nh3

Python bindings to the ammonia HTML sanitization library.

Installation

pip install nh3

Usage

Use clean() to sanitize HTML fragments:

>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'

clean() accepts many options for customizing the sanitization, documented in the API reference below. For example, to allow only <b> tags:

>>> nh3.clean("<b><a href='https://example.com'>Hello</a></b>", tags={"b"})
'<b>Hello</b>'

API reference

Python bindings to the ammonia HTML sanitization library (https://github.com/rust-ammonia/ammonia).

nh3.clean(html, tags=None, clean_content_tags=None, attributes=None, attribute_filter=None, strip_comments=True, link_rel='noopener noreferrer', generic_attribute_prefixes=None, tag_attribute_values=None, set_tag_attribute_values=None, url_schemes=None, allowed_classes=None, filter_style_properties=None)

Sanitize an HTML fragment according to the given options. See Cleaner() for detailed sanitizer options.

Parameters:

html (str) – Input HTML fragment

Returns:

Sanitized HTML fragment

Return type:

str

For example:

>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'

Example of using attribute_filter:

>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["a"].add("class")
>>> def attribute_filter(tag, attr, value):
...     if tag == "a" and attr == "class":
...         if "mention" in value.split(" "):
...             return "mention"
...         return None
...     return value
>>> nh3.clean("<a class='mention unwanted'>@foo</a>",
...     attributes=attributes,
...     attribute_filter=attribute_filter)
'<a class="mention" rel="noopener noreferrer">@foo</a>'

Example of preserving the rel attribute from the input (link_rel=None stops clean() from setting its own rel value):

>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["a"].add("rel")
>>> nh3.clean("<a href='https://tag.example' rel='tag'>#tag</a>",
...     link_rel=None, attributes=attributes)
'<a href="https://tag.example" rel="tag">#tag</a>'

nh3.clean_text(html)

Turn an arbitrary string into unformatted HTML.

Roughly equivalent to Python’s html.escape() or PHP’s htmlspecialchars and htmlentities. Escaping is as strict as possible, encoding every character that has special meaning to the HTML parser.

Parameters:

html (str) – Input string to escape

Returns:

Cleaned text

Return type:

str

For example:

>>> import nh3
>>> nh3.clean_text('Robert"); abuse();//')
'Robert&quot;);&#32;abuse();&#47;&#47;'
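
To make "roughly equivalent" concrete, the sketch below runs the standard library's html.escape() on the same input. It uses only the standard library and is meant for comparison with the clean_text() output shown above, not as a replacement for it.

```python
import html

text = 'Robert"); abuse();//'

# html.escape() encodes &, <, > and (by default) both quote characters.
escaped = html.escape(text)
print(escaped)  # Robert&quot;); abuse();//
```

Unlike clean_text() in the example above, html.escape() leaves the space and the slashes untouched.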

nh3.is_html(html)

Determine if a given string contains HTML.

This function parses the full string and checks for any HTML syntax.

Note: This function will return True for strings that contain invalid HTML syntax like <g> and even Vec::<u8>::new().

Parameters:

html (str) – Input string

Return type:

bool

For example:

>>> nh3.is_html("plain text")
False
>>> nh3.is_html("<p>html!</p>")
True

nh3.ALLOWED_TAGS

The default set of tags allowed by clean(). Useful for customizing the default to add or remove some tags:

>>> tags = nh3.ALLOWED_TAGS - {"b"}
>>> nh3.clean("<b><i>yeah</i></b>", tags=tags)
'<i>yeah</i>'

nh3.ALLOWED_ATTRIBUTES

The default mapping of tags to allowed attributes for clean(). Useful for customizing the default to add or remove some attributes:

>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["img"].add("data-invert")
>>> nh3.clean("<img src='example.jpeg' data-invert=true>", attributes=attributes)
'<img src="example.jpeg" data-invert="true">'

nh3.ALLOWED_URL_SCHEMES

The default set of URL schemes permitted on href and src attributes. Useful for customizing the default to add or remove some URL schemes:

>>> url_schemes = nh3.ALLOWED_URL_SCHEMES - {'tel'}
>>> nh3.clean('<a href="tel:+1">Call</a> or <a href="mailto:contact@me">email</a> me.', url_schemes=url_schemes)
'<a rel="noopener noreferrer">Call</a> or <a href="mailto:contact@me" rel="noopener noreferrer">email</a> me.'