I need to keep a running total of the position at which each name appears. So if the same name is in the first position k times, its value would be 1*k. It's best shown with an example:

import pandas as pd

df1 = pd.DataFrame({'name': ['n1', 'n2', 'n3']})
df1['pos'] = df1.index + 1

df2 = pd.DataFrame({'name': ['n1', 'n3', 'n4']})
df2['pos'] = df2.index + 1

print(f"df1:\n{df1}\n")
print(f"df2:\n{df2}\n")

# Hack: outer-merge on name, then treat a name missing from one frame as 0
df3 = df1.merge(df2, on='name', how='outer')
# fillna leaves float columns behind, so cast back to int
df3 = df3.fillna(0).astype({'pos_x': int, 'pos_y': int})
print(df3)

# Sum the desired values
df3['pos'] = df3.pos_x + df3.pos_y
del df3['pos_x']
del df3['pos_y']

# Produce desired output
print(f"\nDesired Output:\n{df3}")

The output is:

df1:
  name  pos
0   n1    1
1   n2    2
2   n3    3 

df2:
  name  pos
0   n1    1
1   n3    2
2   n4    3 

  name  pos_x  pos_y
0   n1      1      1
1   n2      2      0
2   n3      3      2
3   n4      0      3

Desired Output:
  name  pos
0   n1    2
1   n2    2
2   n3    5
3   n4    3

In df1 and df2, the pos column is constructed from the index. I'm not picky: the pos column could simply be the index itself.

Does anyone know a more compact way to get the summed values in the final pos column for each name?

I need to sum like this over hundreds of thousands of dataframes that I'll calculate iteratively, where the pos column represents the performance of each name.

1 Answer

Another option is to concat rather than merge:

In [11]: df4 = pd.concat([df1, df2])

Then you can groupby 'name', and sum the result (pos):

In [12]: g = df4.groupby('name', as_index=False)

In [13]: g.sum()
Out[13]: 
  name  pos
0   n1    2
1   n2    2
2   n3    5
3   n4    3
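
For the iterative case mentioned in the question, where the frames are produced one at a time, it may be cheaper to keep a running total than to concat everything at the end. Below is a minimal sketch of that idea, not a tested solution for hundreds of thousands of frames. The helper name `accumulate_positions` is hypothetical, and the final cast to int assumes pos always holds whole numbers (drop it if pos is a float score).

```python
import pandas as pd


def accumulate_positions(frames):
    """Sum 'pos' per 'name' over an iterable of DataFrames (hypothetical helper)."""
    total = None
    for df in frames:
        # Collapse each frame to a Series indexed by name; this also
        # handles a name that appears more than once within one frame.
        pos = df.groupby("name")["pos"].sum()
        # fill_value=0 treats a name missing on one side as 0 instead of NaN.
        total = pos if total is None else total.add(pos, fill_value=0)
    if total is None:
        # No frames were supplied.
        return pd.DataFrame(columns=["name", "pos"])
    # Assumption: pos is integer-valued, so undo the float promotion.
    return total.astype(int).reset_index()


# The two frames from the question.
df1 = pd.DataFrame({"name": ["n1", "n2", "n3"], "pos": [1, 2, 3]})
df2 = pd.DataFrame({"name": ["n1", "n3", "n4"], "pos": [1, 2, 3]})

result = accumulate_positions([df1, df2])
print(result)
```

Because `frames` can be a generator, each dataframe can be discarded as soon as it has been added to the running total.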