Create pandas dataframe by repeating one row with new multiindex

Question

In Pandas I have a series and a multi-index:

s = pd.Series([1,2,3,4], index=['w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

What is the best way to create a DataFrame that has idx as its index and the values of s in every row, with the index of s becoming the columns?

df =
       w   x   y   z
a  c   1   2   3   4
   d   1   2   3   4
b  c   1   2   3   4
   d   1   2   3   4

piRSquared · Accepted Answer · 2017-06-21 04:09:33Z

3

Use the pd.DataFrame constructor followed by assign:

pd.DataFrame(index=idx).assign(**s)

     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4
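As a rough illustration of why this works: a Series can be unpacked like a mapping, so `**s` passes each label as a keyword argument and each value as a scalar that is broadcast down the empty frame. The sketch below assumes every label in `s.index` is a string; the names `via_unpack` and `via_kwargs` are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# **s unpacks to w=1, x=2, y=3, z=4, so the labels must be strings.
via_unpack = pd.DataFrame(index=idx).assign(**s)

# The same call written out by hand, for comparison.
via_kwargs = pd.DataFrame(index=idx).assign(w=1, x=2, y=3, z=4)

print(via_unpack)
print((via_unpack.values == via_kwargs.values).all())
```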

answered Jun 21, 2017 at 4:09

piRSquared


2 Comments

Allen Qin Over a year ago

This is a very smart solution!

Zhang18 Over a year ago

This is super interesting. The only thing I will note is that assign shuffles the order of s based on its index (see the Notes section in documentation). So if the index names are ['w', 'x', 'y', 'a'] instead, column a will jump to the front. But that's OK for my purpose.
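If the column order matters, one option is to reorder the columns by `s.index` after the call. This is a minimal sketch, not part of the original answer; as far as I know, newer pandas versions keep keyword order on recent Python versions, in which case the reorder is simply a no-op.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'a'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# Reindex the columns by s.index so the order matches s
# regardless of how assign orders its keyword arguments.
df = pd.DataFrame(index=idx).assign(**s).reindex(columns=s.index)
print(list(df.columns))
```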

jezrael · Accepted Answer · 2017-06-21 05:45:09Z

1

You can use numpy.repeat with numpy.ndarray.reshape to duplicate the data (transposing so that each row is a copy of s), then pass the result to the DataFrame constructor:

arr = np.repeat(s.values, len(idx)).reshape(-1, len(idx)).T
df = pd.DataFrame(arr, index=idx, columns=s.index)
print(df)
     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4
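Note that np.repeat on a flat array repeats element by element (1, 1, 1, 1, 2, 2, ...), so the reshape has to be chosen carefully for each row to end up as a copy of s. A possible alternative, shown here only as a sketch and not taken from the answer, is np.tile, which stacks whole copies of the array:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# reps=(len(idx), 1) stacks the whole 1-D array once per row of idx.
arr = np.tile(s.values, (len(idx), 1))
df = pd.DataFrame(arr, index=idx, columns=s.index)
print(df)
```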

Timings:

np.random.seed(123)
s = pd.Series(np.random.randint(10, size=1000))
s.index = s.index.astype(str)
idx = pd.MultiIndex.from_product([np.random.randint(10, size=250), ['a','b','c', 'd']])

In [32]: %timeit (pd.DataFrame(np.repeat(s.values, len(idx)).reshape(len(idx), -1), index=idx, columns=s.index))
100 loops, best of 3: 3.94 ms per loop

In [33]: %timeit (pd.DataFrame(index=idx).assign(**s))
1 loop, best of 3: 332 ms per loop

In [34]: %timeit pd.DataFrame([s]*len(idx),idx,s.index)
10 loops, best of 3: 82.9 ms per loop
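Timing comparisons are only meaningful if all candidates build the same frame, so it can help to check that first. The sketch below does so on a small example where len(s) differs from len(idx), then times each candidate with the standard timeit module. The `candidates` dictionary is illustrative, and the NumPy entry uses np.tile rather than the exact expression timed above.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4], index=['v', 'w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# Hypothetical registry of the three approaches discussed in this thread.
candidates = {
    'numpy': lambda: pd.DataFrame(np.tile(s.values, (len(idx), 1)),
                                  index=idx, columns=s.index),
    'assign': lambda: pd.DataFrame(index=idx).assign(**s).reindex(columns=s.index),
    'list': lambda: pd.DataFrame([s] * len(idx), idx, s.index),
}

expected = np.tile(s.values, (len(idx), 1))
for name, build in candidates.items():
    frame = build()
    # Every row must be a copy of s, one row per entry of idx.
    assert frame.shape == (len(idx), len(s)), name
    assert (frame.values == expected).all(), name
    print(name, timeit.timeit(build, number=20))
```

Absolute numbers will depend on the machine and the pandas version, so only the relative ordering is worth reading from the output.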

edited Jun 21, 2017 at 5:45

answered Jun 21, 2017 at 5:36

jezrael


2 Comments

Zhang18 Over a year ago

Thank you! I learned a lot from your answers to this (and other) questions: there is a trade-off between speed and syntactic localization within pandas. I understand now that if I can resort to NumPy more often, my speed can go up!

jezrael Over a year ago

Yes, if performance is not important, all solutions are nice. Good luck!

Allen Qin · Accepted Answer · 2017-06-21 04:52:40Z

0

Use [s]*len(s) as the data, idx as the index, and s.index as the columns to construct the DataFrame.

pd.DataFrame([s]*len(s),idx,s.index)
Out[56]: 
     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4

answered Jun 21, 2017 at 4:52

Allen Qin


1 Comment

piRSquared Over a year ago

This only works coincidentally because len(s) == len(idx). Try s = pd.Series([0,1,2,3,4], index=['v', 'w', 'x', 'y', 'z']) and it fails. You want this instead pd.DataFrame([s]*len(idx),idx,s.index)
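The difference described in this comment can be reproduced with a short sketch. The `failed` flag is only there for illustration, and the exact exception type and message may vary between pandas versions.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4], index=['v', 'w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# len(s) == 5 rows of data cannot be labelled by the 4 entries of idx.
failed = False
try:
    pd.DataFrame([s] * len(s), idx, s.index)
except Exception as exc:
    failed = True
    print("len(s) version failed:", exc)

# Repeating s once per entry of idx gives one row per index label.
df = pd.DataFrame([s] * len(idx), idx, s.index)
print(df.shape)
```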
