I am writing keywords and their corresponding page numbers via LaTeX into text files, which I then process with Python. How can I create, for each keyword, a sorted list of its unique page numbers?
The following code gives me the unique page numbers for each keyword, but they are not sorted.
import pandas as pd
def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)
df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)
grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)
This produces:
   keyword page
0      foo    1
1      foo    2
2      foo    3
3      foo    3
4      foo    4
5      foo    5
6      foo    6
7      foo    7
8      bar    7
9      bar    9
10     bar   10
  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1
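The order is lost because `set` is unordered: the list is sorted first, but converting it to a set afterwards discards that ordering. A minimal sketch of one possible approach, assuming the goal is one row per keyword with de-duplicated page numbers in ascending numeric order, is to deduplicate first, sort as integers, and only then join. The helper name `sorted_unique` is hypothetical, and the pages are kept as integers rather than converted to strings up front.

```python
import pandas as pd


def sorted_unique(pages):
    """Join the distinct page numbers in ascending numeric order."""
    # Deduplicate first, then sort as integers so that 10 comes after 9.
    distinct = sorted(set(int(p) for p in pages))
    return ','.join(str(p) for p in distinct)


df = pd.DataFrame({
    'keyword': ["foo"] * 8 + ["bar"] * 3,
    'page': [1, 2, 3, 3, 4, 5, 6, 7, 7, 9, 10],
})

# One row per keyword; the aggregated column is named 'unique'.
result = (
    df.groupby('keyword')['page']
      .agg(sorted_unique)
      .reset_index(name='unique')
)
print(result)
```

Sorting the integers rather than the strings matters here: a plain string sort would place "10" before "7".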