Qstrip
A fast Markdown stripping library for Python with a C backend.
I wrote Qstrip after finding the existing dedicated Markdown stripping package for Python to be very inefficient. Instead of stripping Markdown syntax directly from the input string, it first converts the Markdown to HTML using another library and then extracts the text with bs4 (Beautiful Soup). This works, but it is slow: the round trip through HTML is unnecessary work, and because bs4 is written entirely in Python, it adds significant overhead.
Qstrip is a dedicated Markdown stripper with no dependencies. Its core logic is implemented as a C extension, which makes it lightning fast. The quality of the output text is verified by a collection of unit tests.
Installation
pip install qstrip
Usage
from qstrip import strip_markdown
with open('markdown_file.md', 'r') as f:
    content = f.read()
stripped_content = strip_markdown(content)
print(stripped_content)
Command Line Interface
Once installed, qstrip can also be executed directly from the command line:
$ qstrip --help
usage: qstrip [-h] [-i INPUT] [-o OUTPUT]

Strip markdown

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file to strip markdown from. Defaults to
                        stdin.
  -o OUTPUT, --output OUTPUT
                        Output file to write the stripped text to.
                        Defaults to stdout.

$ echo "[Link to my site](https://example.com)" > markdown.md
$ qstrip -i markdown.md
Link to my site
Current and planned features
- Strip headings
- Strip bold tags
- Strip italic tags
- Strip strikethrough tags
- Strip code blocks
- Strip inline code
- Strip links
- Strip images
- Strip tables
- Handle images inside links
- Strip lists
- Strip blockquotes
- Handle escape sequences
- Support other markup formats (e.g., reStructuredText, HTML/XML)
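
To make a few of the items above concrete, here is a toy pure-Python sketch of what "stripping" means for headings, bold, italic, strikethrough, inline code, links, images, and images inside links. It is not Qstrip's implementation (the real logic lives in the C extension); the function name `toy_strip` and the regular expressions are assumptions made up for illustration, and they ignore escape sequences, code blocks, tables, lists, and blockquotes.

```python
import re


def toy_strip(text: str) -> str:
    """Hypothetical illustration only; not how qstrip works internally."""
    # Images first, so "![alt](url)" is not mistaken for a link: keep the alt text.
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Links: keep the link text, drop the target.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Headings: drop one to six leading '#' markers and the spaces after them.
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Inline code: drop the surrounding backticks.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Bold before italic, so "**" is not consumed as two italic markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    # Strikethrough.
    text = re.sub(r"~~(.+?)~~", r"\1", text)
    # Italic.
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    return text


sample = (
    "# Title\n"
    "Some **bold**, *italic* and ~~old~~ text with `code`.\n"
    "[Link to my site](https://example.com)"
)
print(toy_strip(sample))
```

Because images are handled before links, an image wrapped in a link such as `[![logo](logo.png)](https://example.com)` reduces to its alt text in this sketch. A chain of regular expressions like this rescans the input once per rule and mishandles many edge cases, which is part of why a dedicated stripper is worth having.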