I need to keep a running total of the position at which each name appears across several DataFrames. So if a name appears in the first position k times, its total would be 1*k. It's best shown in an example:
import pandas as pd
df1 = pd.DataFrame({'name': ['n1', 'n2', 'n3']})
df1['pos'] = df1.index + 1
df2 = pd.DataFrame({'name': ['n1', 'n3', 'n4']})
df2['pos'] = df2.index + 1
print("df1:\n{}\n".format(df1))
print("df2:\n{}\n".format(df2))
# Hack: outer-merge on name, then treat a missing position as 0
df3 = df1.merge(df2, on='name', how='outer')
df3 = df3.fillna(0)
print(df3)
# Sum the desired values
df3['pos'] = df3.pos_x + df3.pos_y
del df3['pos_x']
del df3['pos_y']
# Produce desired output
print("\nDesired Output:\n{}".format(df3))
The output is:
df1:
name pos
0 n1 1
1 n2 2
2 n3 3
df2:
name pos
0 n1 1
1 n3 2
2 n4 3
name pos_x pos_y
0 n1 1 1
1 n2 2 0
2 n3 3 2
3 n4 0 3
Desired Output:
name pos
0 n1 2
1 n2 2
2 n3 5
3 n4 3
In df1 and df2, the pos column is built from the index (index + 1). I'm not picky: the pos column could just as well be the index itself.
Does anyone know a more compact way to get the summed value in the final pos column for each name?
I need to compute this sum over hundreds of thousands of DataFrames that I'll generate iteratively, where the pos column represents the performance of each name.
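To make the iterative part concrete, here is a minimal sketch of the kind of accumulation I have in mind. It is not a definitive solution. It assumes every DataFrame has a name column with no duplicate names inside a single frame, and that the position is the row index plus one. The helper accumulate_positions is a hypothetical name made up for this illustration. It relies on Series.add with fill_value=0, so a name missing on either side counts as 0.

```python
import pandas as pd


def accumulate_positions(frames):
    """Sum 1-based row positions per name across an iterable of DataFrames.

    Hypothetical helper: assumes names are unique within each frame.
    """
    total = pd.Series(dtype="float64")  # running sum, indexed by name
    for df in frames:
        # Position of each name is its row index + 1, keyed by the name.
        pos = pd.Series(df.index.to_numpy() + 1, index=df["name"], dtype="float64")
        # Align on name; a name absent from one side contributes 0.
        total = total.add(pos, fill_value=0)
    total = total.sort_index().astype(int)
    return total.rename("pos").rename_axis("name").reset_index()


df1 = pd.DataFrame({"name": ["n1", "n2", "n3"]})
df2 = pd.DataFrame({"name": ["n1", "n3", "n4"]})
result = accumulate_positions([df1, df2])
print(result)
```

Because frames can be any iterable, a generator that yields each DataFrame as it is calculated should work, so only the running total stays in memory rather than every frame.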