nh3
Python bindings to the ammonia HTML sanitization library.
Installation
pip install nh3
Usage
Use clean() to sanitize HTML fragments:
>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'
clean() accepts many options that customize the sanitization, as documented below.
For example, to only allow <b> tags:
>>> nh3.clean("<b><a href='https://example.com'>Hello</a></b>", tags={"b"})
'<b>Hello</b>'
API reference
Python bindings to the ammonia HTML sanitization library (https://github.com/rust-ammonia/ammonia).
- nh3.clean(html, tags=None, clean_content_tags=None, attributes=None, attribute_filter=None, strip_comments=True, link_rel='noopener noreferrer', generic_attribute_prefixes=None, tag_attribute_values=None, set_tag_attribute_values=None, url_schemes=None, allowed_classes=None, filter_style_properties=None)
Sanitize an HTML fragment according to the given options. See Cleaner() for detailed sanitizer options.
- Parameters:
html (str) – Input HTML fragment
- Returns:
Sanitized HTML fragment
- Return type:
str
For example:
>>> import nh3
>>> nh3.clean("<unknown>hi")
'hi'
>>> nh3.clean("<b><img src='' onerror='alert(\\'hax\\')'>XSS?</b>")
'<b><img src="">XSS?</b>'
Example of using attribute_filter:
>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["a"].add("class")
>>> def attribute_filter(tag, attr, value):
...     if tag == "a" and attr == "class":
...         if "mention" in value.split(" "):
...             return "mention"
...         return None
...     return value
>>> nh3.clean("<a class='mention unwanted'>@foo</a>",
...           attributes=attributes,
...           attribute_filter=attribute_filter)
'<a class="mention" rel="noopener noreferrer">@foo</a>'
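As the example suggests, the filter is an ordinary Python function: it returns the value to keep an attribute, a new string to rewrite it, or None to drop it. That means it can be exercised on its own, without calling nh3. The sketch below generalizes the example to an allowlist of class names. ALLOWED_CLASSES, keep_allowed_classes, and sanitize_mentions are hypothetical names introduced for illustration and are not part of nh3.

```python
from copy import deepcopy

# Hypothetical allowlist of class names; not part of nh3.
ALLOWED_CLASSES = {"mention", "hashtag"}


def keep_allowed_classes(tag, attr, value):
    """Attribute filter: keep only allowlisted classes on <a> elements."""
    if tag == "a" and attr == "class":
        kept = [name for name in value.split() if name in ALLOWED_CLASSES]
        # Returning None asks the sanitizer to drop the attribute entirely.
        return " ".join(kept) if kept else None
    # Every other attribute passes through unchanged.
    return value


def sanitize_mentions(html):
    """Run nh3.clean() with the filter above (requires nh3 to be installed)."""
    import nh3  # imported here so the filter itself can be tested without nh3

    attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
    attributes["a"].add("class")
    return nh3.clean(html, attributes=attributes,
                     attribute_filter=keep_allowed_classes)


print(keep_allowed_classes("a", "class", "mention unwanted"))    # mention
print(keep_allowed_classes("a", "class", "unwanted"))            # None
print(keep_allowed_classes("a", "href", "https://example.com"))  # https://example.com
```

Keeping the filter free of nh3 calls makes it easy to unit-test the policy separately from the sanitizer.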
Example of maintaining the rel attribute:
>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["a"].add("rel")
>>> nh3.clean("<a href='https://tag.example' rel='tag'>#tag</a>",
...           link_rel=None, attributes=attributes)
'<a href="https://tag.example" rel="tag">#tag</a>'
- nh3.clean_text(html)
Turn an arbitrary string into unformatted HTML.
Roughly equivalent to Python’s html.escape() or PHP’s htmlspecialchars and htmlentities. Escaping is as strict as possible, encoding every character that has special meaning to the HTML parser.
- Parameters:
html (str) – Input HTML fragment
- Returns:
Cleaned text
- Return type:
str
For example:
>>> import nh3
>>> nh3.clean_text('Robert"); abuse();//')
'Robert&quot;);&#32;abuse();&#47;&#47;'
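For contrast, here is a small sketch of what the standard library does with the same input. html.escape() only rewrites &, <, > and the two quote characters, so the space and slashes survive unchanged. The helper names escape_with_stdlib and escape_with_nh3 are hypothetical. The nh3 half is wrapped in a function so the block runs even where nh3 is not installed.

```python
import html

sample = 'Robert"); abuse();//'


def escape_with_stdlib(text):
    """Escape &, <, > and both quote characters using the standard library."""
    return html.escape(text, quote=True)


def escape_with_nh3(text):
    """Escape with nh3.clean_text() (requires nh3 to be installed)."""
    import nh3  # imported lazily so the stdlib half runs without nh3

    return nh3.clean_text(text)


# The standard library leaves the space and the slashes untouched.
print(escape_with_stdlib(sample))  # Robert&quot;); abuse();//
```

Since clean_text() is described as encoding every character that has special meaning to the HTML parser, its result for the same string should contain more entities than the standard-library output.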
- nh3.is_html(html)
Determine if a given string contains HTML.
This function parses the full string and checks for any HTML syntax.
Note: This function will return True for strings that contain invalid HTML syntax like <g> and even Vec::<u8>::new().
- Parameters:
html (str) – Input string
- Return type:
bool
For example:
>>> nh3.is_html("plain text")
False
>>> nh3.is_html("<p>html!</p>")
True
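One possible use of is_html() is rejecting markup in fields that should hold plain text only. The sketch below is an assumption about how that could look, not a pattern taken from nh3. require_plain_text and looks_like_markup are hypothetical names, and the detector is passed as a parameter so the logic can be tried without nh3 installed.

```python
def require_plain_text(value, contains_html=None):
    """Return value unchanged, or raise ValueError if it contains HTML syntax.

    contains_html defaults to nh3.is_html; a stand-in can be supplied instead.
    """
    if contains_html is None:
        import nh3  # requires nh3 to be installed

        contains_html = nh3.is_html
    if contains_html(value):
        raise ValueError("HTML syntax is not allowed in this field")
    return value


def looks_like_markup(text):
    # Crude stand-in detector used only for this demonstration.
    return "<" in text


print(require_plain_text("plain text", contains_html=looks_like_markup))  # plain text
```

Given the note above, a check like this also rejects harmless strings such as Vec::<u8>::new(), so it suits strict plain-text fields better than free-form comments.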
- nh3.ALLOWED_TAGS
The default set of tags allowed by clean(). Useful for customizing the default to add or remove some tags:
>>> tags = nh3.ALLOWED_TAGS - {"b"}
>>> nh3.clean("<b><i>yeah</i></b>", tags=tags)
'<i>yeah</i>'
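Adding and removing tags in one step comes down to plain set arithmetic. The helper below is a hypothetical sketch (customize is not an nh3 function), and example_defaults is a small stand-in for nh3.ALLOWED_TAGS so the block runs without nh3 installed.

```python
def customize(defaults, add=(), remove=()):
    """Return a new set: defaults plus `add`, minus `remove`."""
    return (set(defaults) | set(add)) - set(remove)


# Stand-in for nh3.ALLOWED_TAGS; the real default set is much larger.
example_defaults = {"a", "b", "i", "p"}

tags = customize(example_defaults, add={"u"}, remove={"b"})
print(sorted(tags))              # ['a', 'i', 'p', 'u']
print(sorted(example_defaults))  # ['a', 'b', 'i', 'p'] (defaults untouched)
```

With nh3 installed, the result of customize(nh3.ALLOWED_TAGS, remove={"b"}) can be passed as the tags argument of clean(), which matches the subtraction shown above.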
- nh3.ALLOWED_ATTRIBUTES
The default mapping of tags to allowed attributes for clean(). Useful for customizing the default to add or remove some attributes:
>>> from copy import deepcopy
>>> attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
>>> attributes["img"].add("data-invert")
>>> nh3.clean("<img src='example.jpeg' data-invert=true>", attributes=attributes)
'<img src="example.jpeg" data-invert="true">'
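The deepcopy in the example matters because the mapping holds one set per tag. A shallow copy would share those inner sets with the original, so calling add() on the copy would also change the defaults. The sketch below illustrates this with a hypothetical helper (with_extra_attributes is not part of nh3) and a small stand-in for nh3.ALLOWED_ATTRIBUTES.

```python
from copy import deepcopy


def with_extra_attributes(defaults, extra):
    """Return a deep copy of `defaults` with the attributes in `extra` added."""
    merged = deepcopy(defaults)
    for tag, names in extra.items():
        # Create the entry if the tag has no allowed attributes yet.
        merged.setdefault(tag, set()).update(names)
    return merged


# Stand-in for nh3.ALLOWED_ATTRIBUTES; the real mapping covers more tags.
example_defaults = {"a": {"href"}, "img": {"src", "alt"}}

attributes = with_extra_attributes(example_defaults, {"img": {"data-invert"}})
print(sorted(attributes["img"]))        # ['alt', 'data-invert', 'src']
print(sorted(example_defaults["img"]))  # ['alt', 'src'] (defaults untouched)
```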
- nh3.ALLOWED_URL_SCHEMES
The default set of URL schemes permitted on href and src attributes. Useful for customizing the default to add or remove some URL schemes:
>>> url_schemes = nh3.ALLOWED_URL_SCHEMES - {'tel'}
>>> nh3.clean('<a href="tel:+1">Call</a> or <a href="mailto:contact@me">email</a> me.', url_schemes=url_schemes)
'<a rel="noopener noreferrer">Call</a> or <a href="mailto:contact@me" rel="noopener noreferrer">email</a> me.'