2

In Pandas I have a series and a multi-index:

import pandas as pd

s = pd.Series([1,2,3,4], index=['w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

What is the best way to create a DataFrame that has idx as its index and the values of s in every row, with the index of s becoming the columns?

df =
       w   x   y   z
a  c   1   2   3   4
   d   1   2   3   4
b  c   1   2   3   4
   d   1   2   3   4

3 Answers

3

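Use the pd.DataFrame constructor followed by assign. Unpacking s with ** passes each index label as a keyword argument, so each label becomes a column whose scalar value is broadcast to every row (the labels must therefore be strings):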

pd.DataFrame(index=idx).assign(**s)

     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4

2 Comments

This is a very smart solution!
This is super interesting. The only thing I will note is that assign can reorder the new columns alphabetically by keyword name (see the Notes section in the documentation; this applies to pandas before 0.23 or Python before 3.6, while later versions keep the keyword order). So if the index labels are ['w', 'x', 'y', 'a'] instead, column a will jump to the front. But that's OK for my purpose.
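
If the column order does matter, one option is to reselect the columns in the order of s.index after assign. This is a minimal sketch, assuming string labels (they are passed as keyword names); the reselection is harmless when the order is already preserved:

```python
import pandas as pd

# Labels chosen so that alphabetical order differs from the original order.
s = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'a'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# assign(**s) adds one column per label of s, broadcasting each scalar
# to every row; indexing with list(s.index) then restores the order of s.
df = pd.DataFrame(index=idx).assign(**s)[list(s.index)]
print(df)
```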
1

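You can use numpy.tile to stack one copy of the values per row of idx, then pass the result to the DataFrame constructor. (Repeating each element with numpy.repeat and then reshaping would instead fill each row with a single value.)

import numpy as np

arr = np.tile(s.values, (len(idx), 1))
df = pd.DataFrame(arr, index=idx, columns=s.index)
print(df)
     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4

Timings (In [32] times an np.repeat variant that builds an array of the same shape):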

np.random.seed(123)
s = pd.Series(np.random.randint(10, size=1000))
s.index = s.index.astype(str)
idx = pd.MultiIndex.from_product([np.random.randint(10, size=250), ['a','b','c', 'd']])

In [32]: %timeit (pd.DataFrame(np.repeat(s.values, len(idx)).reshape(len(idx), -1), index=idx, columns=s.index))
100 loops, best of 3: 3.94 ms per loop

In [33]: %timeit (pd.DataFrame(index=idx).assign(**s))
1 loop, best of 3: 332 ms per loop

In [34]: %timeit pd.DataFrame([s]*len(idx),idx,s.index)
10 loops, best of 3: 82.9 ms per loop
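
For a quick correctness check alongside the timings, the sketch below builds the frame with NumPy for a case where len(s) != len(idx) and confirms that every row equals s. It uses np.tile, which is an assumption swapped in here rather than the expression timed above: repeating each element and then reshaping groups equal values within a row, whereas tiling repeats the whole series once per row. The small sizes are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4], index=['v', 'w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])

# Stack len(idx) copies of the values as rows: shape (len(idx), len(s)).
arr = np.tile(s.values, (len(idx), 1))
df = pd.DataFrame(arr, index=idx, columns=s.index)

# Every row should reproduce s exactly.
assert (df.values == s.values).all()
print(df.shape)  # (4, 5)
```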

2 Comments

Thank you! I learned a lot from your answers to this (and other) questions: there is a trade-off between speed and staying within pandas syntax. I understand now that if I can resort to NumPy more often, my code can get faster!
Yes, if performance is not important, all of the solutions are fine. Good luck!
0

Use [s]*len(idx) as data, idx as index and s.index as columns to construct the DataFrame.

pd.DataFrame([s]*len(idx),idx,s.index)
Out[56]: 
     w  x  y  z
a c  1  2  3  4
  d  1  2  3  4
b c  1  2  3  4
  d  1  2  3  4

1 Comment

The multiplier must be len(idx), not len(s): [s]*len(s) only works coincidentally when len(s) == len(idx). With s = pd.Series([0,1,2,3,4], index=['v', 'w', 'x', 'y', 'z']), [s]*len(s) fails, while pd.DataFrame([s]*len(idx),idx,s.index) works.
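
To make the length dependence explicit, the construction can be wrapped in a small helper. This is only a sketch: the name repeat_series_over_index is hypothetical, not a pandas function, and the example deliberately uses a series whose length differs from that of idx:

```python
import pandas as pd

def repeat_series_over_index(s, idx):
    """Return a DataFrame indexed by idx whose every row is a copy of s.

    Hypothetical helper for illustration; not part of pandas.
    """
    # One copy of s per row of idx, so the row count follows idx, not s.
    return pd.DataFrame([s] * len(idx), index=idx, columns=s.index)

s = pd.Series([0, 1, 2, 3, 4], index=['v', 'w', 'x', 'y', 'z'])
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])
df = repeat_series_over_index(s, idx)
print(df)
```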
