parsing tags in markdown

The Luhmann method suggests using tags to categorize entries. I am not sure how I feel about it, but it can certainly help rediscover notes on a topic by following a seemingly random chain of jumps. To include tags in notes without imposing a fixed structure, I wanted to parse the markdown file and identify strings that start with a #. The regex was inspired by the parser in sublimeless:

RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"
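To get a feel for what the pattern matches, here is a minimal sketch that runs it with the standard `re` module. The helper `find_tags` and the sample sentence are made up for illustration and are not part of the original script.

```python
import re

RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"


def find_tags(text):
    """Return every tag-like match in `text`, in order of appearance.

    Hypothetical helper: group 1 of the pattern spans the whole tag,
    including the leading '#'.
    """
    return [match.group(1) for match in re.finditer(RE_TAGS, text)]


sample = "Some thoughts on #zettelkasten and #note-taking, plus #python."
print(find_tags(sample))
```

A heading such as `# Heading` is not picked up, because the `#` must be followed directly by a character outside the excluded set, and whitespace is excluded. On the other hand, anything shaped like `#word` matches, so a URL fragment would also be reported as a tag.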

The tricky part was adding it as an extension to the Python markdown parser so that it would not only transform the content but also store the tags for use in the main script. I followed an approach similar to the one used for wikilinks:

First, define an InlineProcessor that runs after the code has been standardized. Fortunately, the extensions API is well documented, but I still had to read the source code to find out what priority to give it.

In my case, when I define the TagExtension, I use:

md.inlinePatterns.register(TagInlineProcessor(TagInlineProcessor.RE_TAGS, md), 'tags', 65)
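For context, the whole extension could look roughly like the sketch below. It is not the original code: storing the collected tags on the Markdown instance as `md.tags`, stripping the leading `#`, and wrapping each tag in a `<span class="tag">` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as etree

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import InlineProcessor


class TagInlineProcessor(InlineProcessor):
    RE_TAGS = r"(#+([^#\s.,\/!$%\^&\*;{}\[\]'\"=`~()<>”\\]|:[a-zA-Z0-9])+)"

    def handleMatch(self, m, data):
        tag = m.group(1)
        # Assumption: tags are collected on the Markdown instance as `md.tags`,
        # without the leading '#', so the main script can read them later.
        self.md.tags.append(tag.lstrip("#"))
        # Assumption: the tag is rendered as <span class="tag">#tag</span>.
        el = etree.Element("span")
        el.set("class", "tag")
        el.text = tag
        return el, m.start(0), m.end(0)


class TagExtension(Extension):
    def extendMarkdown(self, md):
        self.md = md
        md.tags = []
        # Registering the extension lets md.reset() clear the collected tags.
        md.registerExtension(self)
        md.inlinePatterns.register(
            TagInlineProcessor(TagInlineProcessor.RE_TAGS, md), "tags", 65
        )

    def reset(self):
        self.md.tags = []


md = markdown.Markdown(extensions=[TagExtension()])
html = md.convert("Notes on #zettelkasten and #python.")
print(html)
print(md.tags)
```

After `convert` returns, the main script can read `md.tags`. Calling `md.reset()` before converting the next note keeps tags from one file from leaking into another.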

Higher priorities run first, so the 65 places it right after the SimpleTextInlineProcessor (registered at 70) and just before the AsteriskProcessor (registered at 60), but I am not sure this is the best place.
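Rather than reading the source, one way to check where a priority lands is to ask the registry itself. This is only a sketch: the placeholder processor and its `#\w+` pattern are stand-ins that exist to occupy the slot, and the result depends on the installed Python-Markdown version.

```python
import markdown
from markdown.inlinepatterns import InlineProcessor

md = markdown.Markdown()

# Placeholder processor: the base class is enough to occupy a registry slot.
md.inlinePatterns.register(InlineProcessor(r"#\w+", md), "tags", 65)

registry = md.inlinePatterns
tags_index = registry.get_index_for_name("tags")

# Iterating a Registry yields its items from highest to lowest priority.
names_in_order = [type(processor).__name__ for processor in registry]

# Show the processor that runs just before, the new one, and the one after.
print(names_in_order[tags_index - 1 : tags_index + 2])
```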

Aquiles Carattino
© 2020 Aquiles Carattino
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.