Table Visualization¶
This section demonstrates visualization of tabular data using the Styler class. For information on visualization with charting please see Chart Visualization. This document is written as a Jupyter Notebook, and can be viewed or downloaded here.
Styler Object and Customising the Display¶
Styling and output display customisation should be performed after the data in a DataFrame has been processed. The Styler is not dynamically updated if further changes to the DataFrame are made. The DataFrame.style attribute is a property that returns a Styler object. It has a _repr_html_ method defined on it, so it is rendered automatically in Jupyter Notebook.
The Styler, which can be used for large data but is primarily designed for small data, currently has the ability to output to these formats:
HTML
LaTeX
String (and CSV by extension)
Excel
(JSON is not currently available)
The first three of these have display customisation methods designed to format and customise the output. These include:
Formatting values and the index and column headers, using .format() and .format_index(),
Renaming the index or column header labels, using .relabel_index()
Hiding certain columns, the index and/or column headers, or index names, using .hide()
Concatenating similar DataFrames, using .concat()
Formatting the Display¶
Formatting Values¶
The Styler distinguishes the display value from the actual value, in both data values and index or column headers. The display value is the text printed in each cell as a string, and we can use the .format() and .format_index() methods to control it with a format spec string or a callable that takes a single value and returns a string. A format can be defined for the whole table, for the index, for individual columns, or for MultiIndex levels. We can also overwrite index names.
Additionally, the format function has a precision argument to help with formatting floats, decimal and thousands arguments to set separators for other locales, an na_rep argument to display missing data, and escape and hyperlinks arguments to help display safe HTML or safe LaTeX. The default formatter is configured to adopt pandas’ global options, such as the styler.format.precision option, which can be set temporarily using with pd.option_context('styler.format.precision', 2):
[2]:
import pandas as pd
import numpy as np
import matplotlib as mpl
[4]:
df = pd.DataFrame({
"strings": ["Adam", "Mike"],
"ints": [1, 3],
"floats": [1.123, 1000.23]
})
df.style \
.format(precision=3, thousands=".", decimal=",") \
.format_index(str.upper, axis=1) \
.relabel_index(["row 1", "row 2"], axis=0)
Using Styler to manipulate the display is a useful feature because maintaining the indexing and data values for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is a more comprehensive example of using the formatting functions whilst still relying on the underlying data for indexing and calculations.
[5]:
np.random.seed(25)  # for reproducibility
weather_df = pd.DataFrame(np.random.rand(10,2)*5,
                          index=pd.date_range(start="2021-01-01", periods=10),
                          columns=["Tokyo", "Beijing"])

def rain_condition(v):
    if v < 1.75:
        return "Dry"
    elif v < 2.75:
        return "Rain"
    return "Heavy Rain"

def make_pretty(styler):
    styler.set_caption("Weather Conditions")
    styler.format(rain_condition)
    styler.format_index(lambda v: v.strftime("%A"))
    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap="YlGnBu")
    return styler

weather_df
[5]:
Tokyo | Beijing | |
---|---|---|
2021-01-01 | 4.350621 | 2.911385 |
2021-01-02 | 1.394195 | 0.929556 |
2021-01-03 | 2.055501 | 0.586878 |
2021-01-04 | 3.424844 | 2.188055 |
2021-01-05 | 2.781147 | 1.835402 |
2021-01-06 | 2.011829 | 0.565204 |
2021-01-07 | 2.235154 | 2.927226 |
2021-01-08 | 0.809926 | 2.603594 |
2021-01-09 | 1.630256 | 3.495931 |
2021-01-10 | 1.831973 | 4.181873 |
[6]:
weather_df.loc["2021-01-04":"2021-01-08"].style.pipe(make_pretty)
Hiding Data¶
The index and column headers can be completely hidden, and specific rows or columns can be selected for exclusion. Both of these options are performed using the same method.
The index can be hidden from rendering by calling .hide() without any arguments, which might be useful if your index is integer based. Similarly, column headers can be hidden by calling .hide(axis="columns") without any further arguments.
Specific rows or columns can be hidden from rendering by calling the same .hide() method and passing in a row/column label, a list-like or a slice of row/column labels for the subset argument.
Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will still start at col2, since col0 and col1 are simply ignored.
[7]:
df = pd.DataFrame(np.random.randn(5, 5))
df.style \
.hide(subset=[0, 2, 4], axis=0) \
.hide(subset=[0, 2, 4], axis=1)
To invert the function to a show functionality it is best practice to compose a list of hidden items.
[8]:
show = [0, 2, 4]
df.style \
.hide([row for row in df.index if row not in show], axis=0) \
.hide([col for col in df.columns if col not in show], axis=1)
Concatenating DataFrame Outputs¶
Two or more Stylers can be concatenated together provided they share the same columns. This is very useful for showing summary statistics for a DataFrame, and is often used in combination with DataFrame.agg.
Since the objects concatenated are Stylers they can independently be styled as will be shown below and their concatenation preserves those styles.
[9]:
summary_styler = df.agg(["sum", "mean"]).style \
.format(precision=3) \
.relabel_index(["Sum", "Average"])
df.style.format(precision=1).concat(summary_styler)
Styler Object and HTML¶
The Styler was originally constructed to support the wide array of HTML formatting options. Its HTML output creates an HTML <table> and leverages CSS styling language to manipulate many parameters including colors, fonts, borders, background, etc. See here for more information on styling HTML tables. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their existing user interface designs.
Below we demonstrate the default output, which looks very similar to the standard DataFrame HTML representation. But the HTML here has already attached some CSS classes to each cell, even if we haven’t yet created any styles. We can view these by calling the .to_html() method, which returns the raw HTML as a string, which is useful for further processing or adding to a file - read on in More about CSS and HTML. This section will also provide a walkthrough for how to convert this default output to represent a DataFrame output that is more communicative. For example, the following subsections build up a Styler, s, step by step:
[10]:
df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]],
index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'),
columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))
df.style
[10]:
Actual Label: | Decision Tree, Tumour | Decision Tree, Non-Tumour | Regression, Tumour | Regression, Non-Tumour | Random, Tumour | Random, Non-Tumour |
---|---|---|---|---|---|---|
Tumour (Positive) | 38.0 | 2.0 | 18.0 | 22.0 | 21 | NaN |
Non-Tumour (Negative) | 19.0 | 439.0 | 6.0 | 452.0 | 226 | 232.0 |
The first step we have taken is to create the Styler object from the DataFrame and then select the range of interest by hiding unwanted columns with .hide().
[13]:
s = df.style.format('{:.0f}').hide([('Random', 'Tumour'), ('Random', 'Non-Tumour')], axis="columns")
s
Methods to Add Styles¶
There are 3 primary methods of adding custom CSS styles to Styler:
Using .set_table_styles() to control broader areas of the table with specified internal CSS. Although table styles allow the flexibility to add CSS selectors and properties controlling all individual parts of the table, they are unwieldy for individual cell specifications. Also, note that table styles cannot be exported to Excel.
Using .set_td_classes() to directly link either external CSS classes to your data cells or link the internal CSS classes created by .set_table_styles(). See here. These cannot be used on column header rows or indexes, and also won’t export to Excel.
Using the .apply() and .applymap() functions to add direct internal CSS to specific data cells. See here. As of v1.4.0 there are also methods that work directly on column header rows or indexes; .apply_index() and .applymap_index(). Note that only these methods add styles that will export to Excel. These methods work in a similar way to DataFrame.apply() and DataFrame.applymap().
Table Styles¶
Table styles are flexible enough to control all individual parts of the table, including column headers and indexes. However, they can be unwieldy to type for individual data cells or for any kind of conditional formatting, so we recommend that table styles are used for broad styling, such as entire rows or columns at a time.
Table styles are also used to control features which can apply to the whole table at once such as creating a generic hover functionality. The :hover pseudo-selector, as well as other pseudo-selectors, can only be used this way.
To replicate the normal format of CSS selectors and properties (attribute value pairs), e.g.
tr:hover {
  background-color: #ffff99;
}
the necessary format to pass styles to .set_table_styles() is as a list of dicts, each with a CSS-selector tag and CSS-properties. Properties can either be a list of 2-tuples, or a regular CSS-string, for example:
[15]:
cell_hover = { # for row hover use <tr> instead of <td>
'selector': 'td:hover',
'props': [('background-color', '#ffffb3')]
}
index_names = {
'selector': '.index_name',
'props': 'font-style: italic; color: darkgrey; font-weight:normal;'
}
headers = {
'selector': 'th:not(.index_name)',
'props': 'background-color: #000066; color: white;'
}
s.set_table_styles([cell_hover, index_names, headers])
Next we just add a couple more styling artifacts targeting specific parts of the table. Be careful here: since we are chaining methods, we need to explicitly instruct the method not to overwrite the existing styles.
[17]:
s.set_table_styles([
{'selector': 'th.col_heading', 'props': 'text-align: center;'},
{'selector': 'th.col_heading.level0', 'props': 'font-size: 1.5em;'},
{'selector': 'td', 'props': 'text-align: center; font-weight: bold;'},
], overwrite=False)
As a convenience method (since version 1.2.0) we can also pass a dict to .set_table_styles() which contains row or column keys. Behind the scenes Styler just indexes the keys and adds relevant .col<m> or .row<n> classes as necessary to the given CSS selectors.
[19]:
s.set_table_styles({
('Regression', 'Tumour'): [{'selector': 'th', 'props': 'border-left: 1px solid white'},
{'selector': 'td', 'props': 'border-left: 1px solid #000066'}]
}, overwrite=False, axis=0)
Setting Classes and Linking to External CSS¶
If you have designed a website then it is likely you will already have an external CSS file that controls the styling of table and cell objects within it. You may want to use these native files rather than duplicate all the CSS in python (and duplicate any maintenance work).
Table Attributes¶
It is very easy to add a class to the main <table> using .set_table_attributes(). This method can also attach inline styles - read more in CSS Hierarchies.
[21]:
out = s.set_table_attributes('class="my-table-cls"').to_html()
print(out[out.find('<table'):][:109])
Data Cell CSS Classes¶
New in version 1.2.0
The .set_td_classes() method accepts a DataFrame with matching indices and columns to the underlying Styler’s DataFrame. That DataFrame will contain strings as css-classes to add to individual data cells: the <td> elements of the <table>. Rather than use external CSS we will create our classes internally and add them to table style. We will save adding the borders until the section on tooltips.
[22]:
s.set_table_styles([ # create internal CSS classes
{'selector': '.true', 'props': 'background-color: #e6ffe6;'},
{'selector': '.false', 'props': 'background-color: #ffe6e6;'},
], overwrite=False)
cell_color = pd.DataFrame([['true ', 'false ', 'true ', 'false '],
['false ', 'true ', 'false ', 'true ']],
index=df.index,
columns=df.columns[:4])
s.set_td_classes(cell_color)
Styler Functions¶
Acting on Data¶
We use the following methods to pass your style functions. Both of those methods take a function (and some other keyword arguments) and apply it to the DataFrame in a certain way, rendering CSS styles.
.applymap() (elementwise): accepts a function that takes a single value and returns a string with the CSS attribute-value pair.
.apply() (column-/row-/table-wise): accepts a function that takes a Series or DataFrame and returns a Series, DataFrame, or numpy array with an identical shape where each element is a string with a CSS attribute-value pair. This method passes each column or row of your DataFrame one-at-a-time or the entire table at once, depending on the axis keyword argument. For columnwise use axis=0, rowwise use axis=1, and for the entire table at once use axis=None.
This method is powerful for applying multiple, complex logic to data cells. We create a new DataFrame to demonstrate this.
[24]:
np.random.seed(0)
df2 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df2.style
[24]:
 | A | B | C | D |
---|---|---|---|---|
0 | 1.764052 | 0.400157 | 0.978738 | 2.240893 |
1 | 1.867558 | -0.977278 | 0.950088 | -0.151357 |
2 | -0.103219 | 0.410599 | 0.144044 | 1.454274 |
3 | 0.761038 | 0.121675 | 0.443863 | 0.333674 |
4 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
5 | -2.552990 | 0.653619 | 0.864436 | -0.742165 |
6 | 2.269755 | -1.454366 | 0.045759 | -0.187184 |
7 | 1.532779 | 1.469359 | 0.154947 | 0.378163 |
8 | -0.887786 | -1.980796 | -0.347912 | 0.156349 |
9 | 1.230291 | 1.202380 | -0.387327 | -0.302303 |
For example we can build a function that colors text if it is negative, and chain this with a function that partially fades cells of negligible value. Since this looks at each element in turn we use applymap.
[25]:
def style_negative(v, props=''):
    return props if v < 0 else None
s2 = df2.style.applymap(style_negative, props='color:red;')\
              .applymap(lambda v: 'opacity: 20%;' if (v < 0.3) and (v > -0.3) else None)
s2
We can also build a function that highlights the maximum value across rows, cols, and the DataFrame all at once. In this case we use apply. Below we highlight the maximum in a column.
[27]:
def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')
s2.apply(highlight_max, props='color:white;background-color:darkblue', axis=0)
We can use the same function across the different axes, highlighting here the DataFrame maximum in purple, and row maximums in pink.
[29]:
s2.apply(highlight_max, props='color:white;background-color:pink;', axis=1)\
.apply(highlight_max, props='color:white;background-color:purple', axis=None)
This last example shows how some styles have been overwritten by others. In general the most recent style applied is active but you can read more in the section on CSS hierarchies. You can also apply these styles to more granular parts of the DataFrame - read more in section on subset slicing.
It is possible to replicate some of this functionality using just classes but it can be more cumbersome. See item 3) of Optimization.
Debugging Tip: If you’re having trouble writing your style function, try just passing it into DataFrame.apply. Internally, Styler.apply uses DataFrame.apply so the result should be the same, and with DataFrame.apply you will be able to inspect the CSS string output of your intended function in each cell.
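The debugging tip can be tried out as follows. This is a sketch on an illustrative DataFrame; note that this variant of highlight_max returns a Series rather than a bare array so that the result keeps its labels, which is a choice made for this sketch and not a requirement:

```python
import numpy as np
import pandas as pd

def highlight_max(s, props=""):
    # Return `props` where the value equals the maximum, and "" elsewhere.
    return pd.Series(np.where(s == np.nanmax(s.values), props, ""), index=s.index)

df = pd.DataFrame({"A": [1.0, 8.0, 2.0], "B": [9.0, 4.0, 5.0]})

# Mirror a columnwise and a rowwise Styler.apply call, but get back plain
# strings that can be inspected cell by cell.
css_by_column = df.apply(highlight_max, props="color:red;", axis=0)
css_by_row = df.apply(highlight_max, props="color:red;", axis=1)
print(css_by_column)
print(css_by_row)
```

Each cell of the resulting DataFrames holds either the CSS string or an empty string, which makes it easy to see where a style would be applied before any rendering takes place.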
Acting on the Index and Column Headers¶
Similar application is achieved for headers by using:
.applymap_index() (elementwise): accepts a function that takes a single value and returns a string with the CSS attribute-value pair.
.apply_index() (level-wise): accepts a function that takes a Series and returns a Series, or numpy array with an identical shape where each element is a string with a CSS attribute-value pair. This method passes each level of your Index one-at-a-time. To style the index use axis=0 and to style the column headers use axis=1.
You can select a level of a MultiIndex but currently no similar subset application is available for these methods.
[31]:
s2.applymap_index(lambda v: "color:pink;" if v>4 else "color:darkblue;", axis=0)
s2.apply_index(lambda s: np.where(s.isin(["A", "B"]), "color:pink;", "color:darkblue;"), axis=1)
Tooltips and Captions¶
Table captions can be added with the .set_caption() method. You can use table styles to control the CSS relevant to the caption.
[32]:
s.set_caption("Confusion matrix for multiple cancer prediction models.")\
.set_table_styles([{
'selector': 'caption',
'props': 'caption-side: bottom; font-size:1.25em;'
}], overwrite=False)
Adding tooltips (since version 1.3.0) can be done using the .set_tooltips() method in the same way you can add CSS classes to data cells by providing a string based DataFrame with intersecting indices and columns. You don’t have to specify a css_class name or any css props for the tooltips, since there are standard defaults, but the option is there if you want more visual control.
[34]:
tt = pd.DataFrame([['This model has a very strong true positive rate',
"This model's total number of false negatives is too high"]],
index=['Tumour (Positive)'], columns=df.columns[[0,3]])
s.set_tooltips(tt, props='visibility: hidden; position: absolute; z-index: 1; border: 1px solid #000066;'
'background-color: white; color: #000066; font-size: 0.8em;'
'transform: translate(0px, -24px); padding: 0.6em; border-radius: 0.5em;')
The only thing left to do for our table is to add the highlighting borders to draw the audience’s attention to the tooltips. We will create internal CSS classes as before using table styles. Setting classes always overwrites so we need to make sure we add the previous classes.
[36]:
s.set_table_styles([ # create internal CSS classes
{'selector': '.border-red', 'props': 'border: 2px dashed red;'},
{'selector': '.border-green', 'props': 'border: 2px dashed green;'},
], overwrite=False)
cell_border = pd.DataFrame([['border-green ', ' ', ' ', 'border-red '],
[' ', ' ', ' ', ' ']],
index=df.index,
columns=df.columns[:4])
s.set_td_classes(cell_color + cell_border)
Finer Control with Slicing¶
The examples we have shown so far for the Styler.apply and Styler.applymap functions have not demonstrated the use of the subset argument. This is a useful argument which permits a lot of flexibility: it allows you to apply styles to specific rows or columns, without having to code that logic into your style function.
The value passed to subset behaves similarly to slicing a DataFrame:
A scalar is treated as a column label
A list (or Series or NumPy array) is treated as multiple column labels
A tuple is treated as (row_indexer, column_indexer)
Consider using pd.IndexSlice to construct the tuple for the last one. We will create a MultiIndexed DataFrame to demonstrate the functionality.
[38]:
df3 = pd.DataFrame(np.random.randn(4,4),
pd.MultiIndex.from_product([['A', 'B'], ['r1', 'r2']]),
columns=['c1','c2','c3','c4'])
df3
[38]:
 | | c1 | c2 | c3 | c4 |
---|---|---|---|---|---|
A | r1 | -1.048553 | -1.420018 | -1.706270 | 1.950775 |
A | r2 | -0.509652 | -0.438074 | -1.252795 | 0.777490 |
B | r1 | -1.613898 | -0.212740 | -0.895467 | 0.386902 |
B | r2 | -0.510805 | -1.180632 | -0.028182 | 0.428332 |
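Since subset behaves much like label-based slicing, one way to preview which cells a tuple built with pd.IndexSlice would select is to pass the same tuple to DataFrame.loc first. The sketch below uses a stand-in DataFrame with the same labels but simple integer values (an assumption made so the selection is easy to read), and the preview is only an approximation of how Styler interprets subset:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=pd.MultiIndex.from_product([["A", "B"], ["r1", "r2"]]),
                   columns=["c1", "c2", "c3", "c4"])

idx = pd.IndexSlice
slice_ = idx[idx[:, "r1"], idx["c2":"c4"]]  # (row_indexer, column_indexer)

# Preview the cells the tuple selects, using ordinary label-based indexing.
preview = df3.loc[slice_]
print(preview)
```

Here the row indexer keeps every first-level label but only the r1 rows, and the column indexer keeps the labels from c2 through c4 inclusive.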
We will use subset to highlight the maximum in the third and fourth columns with red text. We will highlight the subset sliced region in yellow.
[39]:
slice_ = ['c3', 'c4']
df3.style.apply(highlight_max, props='color:red;', axis=0, subset=slice_)\
.set_properties(**{'background-color': '#ffffb3'}, subset=slice_)
If combined with the IndexSlice as suggested then it can index across both dimensions with greater flexibility.
[40]:
idx = pd.IndexSlice
slice_ = idx[idx[:,'r1'], idx['c2':'c4']]
df3.style.apply(highlight_max, props='color:red;', axis=0, subset=slice_)\
.set_properties(**{'background-color': '#ffffb3'}, subset=slice_)
This also provides the flexibility to sub select rows when used with axis=1.
[41]:
slice_ = idx[idx[:,'r2'], :]
df3.style.apply(highlight_max, props='color:red;', axis=1, subset=slice_)\
.set_properties(**{'background-color': '#ffffb3'}, subset=slice_)
There is also scope to provide conditional filtering.
Suppose we want to highlight the maximum across columns 2 and 4 only in the case that the sum of columns 1 and 3 is less than -2.0 (essentially excluding rows (:,'r2')).
[42]:
slice_ = idx[idx[(df3['c1'] + df3['c3']) < -2.0], ['c2', 'c4']]
df3.style.apply(highlight_max, props='color:red;', axis=1, subset=slice_)\
.set_properties(**{'background-color': '#ffffb3'}, subset=slice_)
Only label-based slicing is supported right now, not positional, and not callables.
If your style function uses a subset or axis keyword argument, consider wrapping your function in a functools.partial, partialing out that keyword.
my_func2 = functools.partial(my_func, subset=42)
Optimization¶
Generally, for smaller tables and most cases, the rendered HTML does not need to be optimized, and we don’t really recommend it. There are two cases where it is worth considering:
If you are rendering and styling a very large HTML table, certain browsers have performance issues.
If you are using Styler to dynamically create part of online user interfaces and want to improve network performance.
Here we recommend the following steps to implement:
1. Remove UUID and cell_ids¶
Ignore the uuid and set cell_ids to False. This will prevent unnecessary HTML.
This is sub-optimal:
[43]:
df4 = pd.DataFrame([[1,2],[3,4]])
s4 = df4.style
This is better:
[44]:
from pandas.io.formats.style import Styler
s4 = Styler(df4, uuid_len=0, cell_ids=False)
2. Use table styles¶
Use table styles where possible (e.g. for all cells or rows or columns at a time) since the CSS is nearly always more efficient than other formats.
This is sub-optimal:
[45]:
props = 'font-family: "Times New Roman", Times, serif; color: #e83e8c; font-size:1.3em;'
df4.style.applymap(lambda x: props, subset=[1])
This is better:
[46]:
df4.style.set_table_styles([{'selector': 'td.col1', 'props': props}])
3. Set classes instead of using Styler functions¶
For large DataFrames where the same style is applied to many cells it can be more efficient to declare the styles as classes and then apply those classes to data cells, rather than directly applying styles to cells. It is, however, probably still easier to use the Styler function API when you are not concerned about optimization.
This is sub-optimal:
[47]:
df2.style.apply(highlight_max, props='color:white;background-color:darkblue;', axis=0)\
.apply(highlight_max, props='color:white;background-color:pink;', axis=1)\
.apply(highlight_max, props='color:white;background-color:purple', axis=None)
This is better:
[48]:
build = lambda x: pd.DataFrame(x, index=df2.index, columns=df2.columns)
cls1 = build(df2.apply(highlight_max, props='cls-1 ', axis=0))
cls2 = build(df2.apply(highlight_max, props='cls-2 ', axis=1, result_type='expand').values)
cls3 = build(highlight_max(df2, props='cls-3 '))
df2.style.set_table_styles([
{'selector': '.cls-1', 'props': 'color:white;background-color:darkblue;'},
{'selector': '.cls-2', 'props': 'color:white;background-color:pink;'},
{'selector': '.cls-3', 'props': 'color:white;background-color:purple;'}
]).set_td_classes(cls1 + cls2 + cls3)
4. Don’t use tooltips¶
Tooltips require cell_ids to work and they generate extra HTML elements for every data cell.
5. If every byte counts use string replacement¶
You can remove unnecessary HTML, or shorten the default class names by replacing the default css dict. You can read a little more about CSS below.
[49]:
my_css = {
"row_heading": "",
"col_heading": "",
"index_name": "",
"col": "c",
"row": "r",
"col_trim": "",
"row_trim": "",
"level": "l",
"data": "",
"blank": "",
}
html = Styler(df4, uuid_len=0, cell_ids=False)
html.set_table_styles([{'selector': 'td', 'props': props},
{'selector': '.c1', 'props': 'color:green;'},
{'selector': '.l0', 'props': 'color:blue;'}],
css_class_names=my_css)
print(html.to_html())
<style type="text/css">
#T_ td {
font-family: "Times New Roman", Times, serif;
color: #e83e8c;
font-size: 1.3em;
}
#T_ .c1 {
color: green;
}
#T_ .l0 {
color: blue;
}
</style>
<table id="T_">
<thead>
<tr>
<th class=" l0" > </th>
<th class=" l0 c0" >0</th>
<th class=" l0 c1" >1</th>
</tr>
</thead>
<tbody>
<tr>
<th class=" l0 r0" >0</th>
<td class=" r0 c0" >1</td>
<td class=" r0 c1" >2</td>
</tr>
<tr>
<th class=" l0 r1" >1</th>
<td class=" r1 c0" >3</td>
<td class=" r1 c1" >4</td>
</tr>
</tbody>
</table>
[50]:
html
[50]:
 | 0 | 1 |
---|---|---|
0 | 1 | 2 |
1 | 3 | 4 |
Builtin Styles¶
Some styling functions are common enough that we’ve “built them in” to the Styler, so you don’t have to write them and apply them yourself. The current list of such functions is:
.highlight_null: for use with identifying missing data.
.highlight_min and .highlight_max: for use with identifying extremities in data.
.highlight_between and .highlight_quantile: for use with identifying classes within data.
.background_gradient: a flexible method for highlighting cells based on their, or other, values on a numeric scale.
.text_gradient: similar method for highlighting text based on its, or other, values on a numeric scale.
.bar: to display mini-charts within cell backgrounds.
The individual documentation on each function often gives more examples of their arguments.
Highlight Null¶
[51]:
df2.iloc[0,2] = np.nan
df2.iloc[4,3] = np.nan
df2.loc[:4].style.highlight_null(color='yellow')
Highlight Min or Max¶
[52]:
df2.loc[:4].style.highlight_max(axis=1, props='color:white; font-weight:bold; background-color:darkblue;')
Highlight Between¶
This method accepts ranges as floats, NumPy arrays, or Series, provided the indexes match.
[53]:
left = pd.Series([1.0, 0.0, 1.0], index=["A", "B", "D"])
df2.loc[:4].style.highlight_between(left=left, right=1.5, axis=1, props='color:white; background-color:purple;')
Highlight Quantile¶
Useful for detecting the highest or lowest percentile values.
[54]:
df2.loc[:4].style.highlight_quantile(q_left=0.85, axis=None, color='yellow')
Background Gradient and Text Gradient¶
You can create “heatmaps” with the background_gradient and text_gradient methods. These require matplotlib, and we’ll use Seaborn to get a nice colormap.
[55]:
import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)
df2.style.background_gradient(cmap=cm)
[56]:
df2.style.text_gradient(cmap=cm)
.background_gradient and .text_gradient have a number of keyword arguments to customise the gradients and colors. See the documentation.
Set properties¶
Use Styler.set_properties when the style doesn’t actually depend on the values. This is just a simple wrapper for .applymap where the function returns the same properties for all cells.
[57]:
df2.loc[:4].style.set_properties(**{'background-color': 'black',
'color': 'lawngreen',
'border-color': 'white'})
Bar charts¶
You can include “bar charts” in your DataFrame.
[58]:
df2.style.bar(subset=['A', 'B'], color='#d65f5f')
Additional keyword arguments give more control on centering and positioning, and you can pass a list of [color_negative, color_positive] to highlight lower and higher values or a matplotlib colormap.
To showcase an example, here’s how you can change the above with the new align option, combined with setting vmin and vmax limits, the width of the figure, and underlying CSS props of cells, leaving space to display the text and the bars. We also use text_gradient to color the text the same as the bars using a matplotlib colormap (although in this case the visualization is probably better without this additional effect).
[59]:
df2.style.format('{:.3f}', na_rep="")\
.bar(align=0, vmin=-2.5, vmax=2.5, cmap="bwr", height=50,
width=60, props="width: 120px; border-right: 1px solid black;")\
.text_gradient(cmap="bwr", vmin=-2.5, vmax=2.5)
The following example aims to give a highlight of the behavior of the new align options:
[61]:
HTML(head)
[61]:
Align | All Negative | Both Neg and Pos | All Positive | Large Positive |
---|