
Qstrip


A fast Markdown stripping library for Python with a C backend.

This library is available on PyPI.

I decided to create my own Markdown stripping library for Python after finding the existing dedicated package highly inefficient. Instead of stripping Markdown syntax from the input string directly, it first converts the Markdown to HTML using another library and then extracts the text with bs4 (Beautiful Soup). Although this works, it is very slow: it performs unnecessary work, and because bs4 is written entirely in Python, it adds significant overhead.
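To give a sense of what that second stage involves, here is a small sketch that extracts the text from already-rendered HTML. It uses the standard library's html.parser as a stand-in for bs4 (an assumption for illustration; this is not the code of the package described above). The text has already been through a full Markdown-to-HTML conversion, and this pass parses all of it a second time, which is the double work a direct stripper avoids.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document, discarding tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for every run of text between tags
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# HTML as a Markdown converter might render "# Title\n\nSome **bold** text"
rendered = "<h1>Title</h1>\n<p>Some <strong>bold</strong> text</p>"
print(html_to_text(rendered))
```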

Qstrip is a dedicated Markdown stripper with no dependencies. Its core logic is implemented as a C extension, which makes it lightning fast. The quality of the output text is verified by a collection of unit tests.

Installation

pip install qstrip

Usage

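from qstrip import strip_markdown

# Read the Markdown source file
with open('markdown_file.md', 'r', encoding='utf-8') as f:
    content = f.read()

# Remove the Markdown syntax, keeping only the plain text
stripped_content = strip_markdown(content)
print(stripped_content)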

Command Line Interface

Once installed, qstrip can also be executed directly from the command line:

$ qstrip --help
usage: qstrip [-h] [-i INPUT] [-o OUTPUT]

Strip markdown

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file to strip markdown from. Defaults to
                        stdin.
  -o OUTPUT, --output OUTPUT
                        Output file to write the stripped text to.
                        Defaults to stdout.
$ echo "[Link to my site](https://example.com)" > markdown.md
$ qstrip -i markdown.md
Link to my site
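The help text above follows the layout that Python's argparse generates. What follows is a minimal sketch of an equivalent front end, assuming the command simply reads the input, calls strip_markdown, and writes the result. The names build_parser and main are hypothetical, and this is not the package's actual source.

```python
import argparse
import sys
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Options mirror the --help output shown for the qstrip command
    parser = argparse.ArgumentParser(prog="qstrip", description="Strip markdown")
    parser.add_argument(
        "-i", "--input",
        help="Input file to strip markdown from. Defaults to stdin.",
    )
    parser.add_argument(
        "-o", "--output",
        help="Output file to write the stripped text to. Defaults to stdout.",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    # Imported here so build_parser can be used without qstrip installed
    from qstrip import strip_markdown

    args = build_parser().parse_args(argv)

    # Read from the given file, or from stdin when -i is omitted
    if args.input:
        with open(args.input, encoding="utf-8") as f:
            text = f.read()
    else:
        text = sys.stdin.read()

    result = strip_markdown(text)

    # Write to the given file, or to stdout when -o is omitted
    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(result)
    else:
        sys.stdout.write(result)
```

Keeping build_parser separate from main lets the option handling be checked on its own. A function like main would typically be registered as a console-script entry point, which is how pip exposes an installed package as a shell command.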

Current and planned features

  • Strip headings
  • Strip bold tags
  • Strip italic tags
  • Strip strikethrough tags
  • Strip code blocks
  • Strip inline code
  • Strip links
  • Strip images
  • Strip tables
  • Handle images inside links
  • Strip lists
  • Strip blockquotes
  • Handle escape sequences
  • Support other markup formats (e.g., reStructuredText, HTML/XML)
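To illustrate what stripping several of the listed elements means, here is a deliberately small pure-Python sketch built on regular expressions. It is not qstrip's C implementation and is far less robust. The function toy_strip and its rules are hypothetical. The sketch skips code blocks, tables, and escape sequences, and it does not shield the contents of code spans from the emphasis rules.

```python
import re

# Ordered (pattern, replacement) rules; each removes one kind of Markdown syntax.
_RULES = [
    (re.compile(r"^[ \t]*>[ \t]?", re.MULTILINE), ""),                # blockquote markers
    (re.compile(r"^[ \t]*#{1,6}[ \t]+", re.MULTILINE), ""),           # ATX headings
    (re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.MULTILINE), ""),  # list markers
    (re.compile(r"!\[([^\]]*)\]\([^)]*\)"), r"\1"),                   # images -> alt text
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),                    # links -> link text
    (re.compile(r"`([^`]+)`"), r"\1"),                                # inline code
    (re.compile(r"\*\*(.+?)\*\*"), r"\1"),                            # bold (asterisks)
    (re.compile(r"(?<!\w)__(.+?)__(?!\w)"), r"\1"),                   # bold (underscores)
    (re.compile(r"\*(.+?)\*"), r"\1"),                                # italic (asterisks)
    (re.compile(r"(?<!\w)_(.+?)_(?!\w)"), r"\1"),                     # italic (underscores)
    (re.compile(r"~~(.+?)~~"), r"\1"),                                # strikethrough
]


def toy_strip(markdown: str) -> str:
    """Apply each rule in order and return the remaining plain text."""
    text = markdown
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


sample = """# Qstrip
> A **fast** Markdown stripper.

- [![logo](logo.png)](https://example.com) with `code`
- ~~slow~~ *quick*"""

print(toy_strip(sample))
```

The order of the rules is the main design point. Blockquote markers go before headings so that a heading inside a quote is still recognized. Images are removed before links, so a linked image such as [![logo](logo.png)](https://example.com) first collapses to [logo](https://example.com) and then to logo. The underscore rules require non-word characters on both sides so that identifiers like snake_case_name survive.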