The following is easy in SQL, but I cannot figure out how to do it in pandas.

My inputs are:

import pandas as pd
symb = pd.DataFrame(['a', 'b', 'c'], columns=['symb'])
fld = pd.DataFrame(['field1', 'field2', 'field3'], columns=['fld'])

I want to be able to get the following DataFrame as output:

symb  fld
a     field1
a     field2
a     field3
b     field1
b     field2
b     field3
c     field1
c     field2
c     field3

Any idea on how to get to this result?

Thanks!

2 Answers

First add a helper column holding the same constant value to both DataFrames. Then merge on that column and drop it from the result:

import pandas as pd
# Constant helper column: every row of symb matches every row of fld.
symb['one'] = 1
fld['one'] = 1
print(pd.merge(symb, fld, on='one').drop('one', axis=1))
  symb     fld
0    a  field1
1    a  field2
2    a  field3
3    b  field1
4    b  field2
5    b  field3
6    c  field1
7    c  field2
8    c  field3
2 Comments

Just one comment: do you know what this operation is called? I could not even google it because I did not know its name (searching for join, merge, and similar terms only leads to pages with general instructions). Knowing the name could help future users find this page.
I think it is called a Cartesian product or cross join.
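Following up on the name: newer pandas releases (1.2 and later, as far as I know) accept how='cross' in merge, which does the cross join directly without the helper column. A minimal sketch, reusing the question's inputs under the conventional pd alias; the variable name cross is just for illustration:

```python
import pandas as pd

symb = pd.DataFrame(['a', 'b', 'c'], columns=['symb'])
fld = pd.DataFrame(['field1', 'field2', 'field3'], columns=['fld'])

# how='cross' pairs every row of symb with every row of fld.
# No helper column and no `on` key are needed (assumes pandas >= 1.2).
cross = symb.merge(fld, how='cross')
print(cross)
```

On older pandas versions the helper-column merge from the answer is still the way to go.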
Solution

pd.DataFrame(index=symb.symb, columns=fld.fld).fillna(0).stack().reset_index()[['symb', 'fld']]

1 Comment

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value.
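To add the context the comment asks for, here is my reading of how the one-liner works, split into steps. This is a sketch rather than the answerer's own explanation; the names grid and pairs are made up for illustration, and I pass 0 as the fill value up front instead of calling fillna(0) afterwards. The fill matters because stack() has historically dropped NaN cells by default, so an all-NaN grid would stack to nothing.

```python
import pandas as pd

symb = pd.DataFrame(['a', 'b', 'c'], columns=['symb'])
fld = pd.DataFrame(['field1', 'field2', 'field3'], columns=['fld'])

# Build a 3x3 grid of zeros: one row per symb value, one column per fld value.
# Passing the Series as index/columns carries their names ('symb', 'fld') over.
grid = pd.DataFrame(0, index=symb['symb'], columns=fld['fld'])

# stack() pivots the column labels into the index, giving one entry
# per (symb, fld) pair; reset_index() turns that index into columns.
pairs = grid.stack().reset_index()[['symb', 'fld']]
print(pairs)
```

The zeros themselves are thrown away at the end; only the row and column labels of the grid are kept.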
