I am working in Jupyter Notebook 6.1.4 with Pandas 1.1.3 on Windows 10.
I have a dataframe with 2000 rows:
import pandas as pd
import numpy as np
c = np.random.choice
colours = ['blue', 'yellow', 'green', 'green... no, blue']
knights = ['Bedevere', 'Galahad', 'Arthur', 'Robin', 'Lancelot']
qualities = ['wise', 'brave', 'pure', 'not quite so brave']
dfsize = 2000
knights = pd.DataFrame({'name_id': c(range(dfsize), dfsize, replace=False),
                        'favourite_colour': c(colours, dfsize),
                        'favourite_knight': c(knights, dfsize),
                        'favourite_quality': c(qualities, dfsize)})
I create a copy and a dataframe comparing the two (this will just be a dataframe full of True values):
knights2 = knights.copy()
same = knights.eq(knights2)
I create a function to pass to a pandas Styler object to highlight the dataframe:
def highlight_true(value):
    if value == True:
        return "background-color:green"
    else:
        return "background-color:red"
I display the highlighted dataframe:
display(same.style.applymap(highlight_true))
All cells are highlighted green as expected.
But if I increase the size of the dataframe to, say, 3000 rows and rerun the code above:
dfsize = 3000
It highlights up to row 2049 and then leaves the rest unhighlighted (neither green nor red). Can anyone explain what is happening here? Does applymap have a limit on how large a dataframe it will highlight? Does this happen on all computers and all versions of Jupyter Notebook and Pandas, or just the ones I'm working with? Is there a way to fix this so pandas can highlight large dataframes? Is anyone able to reproduce this result? If so, what size dataframes did you test, and which versions of Pandas and Jupyter Notebook did you use? Thank you.
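To make it easier to reproduce this at several sizes, the setup can be wrapped in a function. This is only a minimal sketch: the helper name `make_knights`, its `seed` argument, and the list name `knight_names` are hypothetical additions for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd

def make_knights(dfsize, seed=0):
    """Build the example dataframe with `dfsize` rows (hypothetical helper)."""
    rng = np.random.RandomState(seed)  # seeded so repeated runs give the same data
    colours = ['blue', 'yellow', 'green', 'green... no, blue']
    knight_names = ['Bedevere', 'Galahad', 'Arthur', 'Robin', 'Lancelot']
    qualities = ['wise', 'brave', 'pure', 'not quite so brave']
    return pd.DataFrame({
        'name_id': rng.choice(range(dfsize), dfsize, replace=False),
        'favourite_colour': rng.choice(colours, dfsize),
        'favourite_knight': rng.choice(knight_names, dfsize),
        'favourite_quality': rng.choice(qualities, dfsize),
    })

# Build the frame and its all-True comparison at the two sizes discussed.
for size in (2000, 3000):
    knights = make_knights(size)
    same = knights.eq(knights.copy())
    print(size, same.shape, bool(same.all().all()))
```

Each `same` frame can then be passed to `same.style.applymap(highlight_true)` as before to see at which size the highlighting stops.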
EDIT: Additional info I've just found, in case it helps anyone figure out what's going wrong. If I create another copy with some different values and compare it to the original:
some_rows = [0, 1000, 2000, 2500, 3000]
dfsize = 3500  # knights is rebuilt with this size (setup code above rerun) before copying
knights3 = knights.copy()
for x in some_rows:
    knights3.loc[x, 'favourite_quality'] = "a different value"
same2 = knights.eq(knights3)
display(same2.style.applymap(highlight_true))
It stops highlighting True values in green at row 2049 as before, but carries on through the dataframe just highlighting the False values in red (and leaving the True values unhighlighted).