
I like using nested data structures, and now I'm trying to understand how to use pandas.

Here is a toy model:

import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [1, 2], 'y': [10, 20]})
b = pd.DataFrame({'x': [3, 4], 'y': [30, 40]})
c = [a, b]

Now I would like to get:

sol = np.array([[[1], [3]], [[2], [4]]])

I have an idea to get both sol[0] and sol[1] as:

# .ix has been removed from pandas; .iloc selects rows by position
s0 = np.array([item[['x']].iloc[0] for item in c])
s1 = np.array([item[['x']].iloc[1] for item in c])

but to build sol this way I would have to loop over the index, and I don't think that is really Pythonic.

1 Answer

It looks like you want just the x columns from a and b. You can concatenate two Series (or DataFrames) into a new DataFrame using pd.concat:

In [132]: pd.concat([a['x'], b['x']], axis=1)
Out[132]: 
   x  x
0  1  3
1  2  4

[2 rows x 2 columns]

Now, if you want a numpy array, use the values attribute:

In [133]: pd.concat([a['x'], b['x']], axis=1).values
Out[133]: 
array([[1, 3],
       [2, 4]], dtype=int64)

And if you want a numpy array with the same shape as sol, then use the reshape method:

In [134]: pd.concat([a['x'], b['x']], axis=1).values.reshape(2,2,1)
Out[134]: 
array([[[1],
        [3]],

       [[2],
        [4]]], dtype=int64)

In [136]: np.allclose(pd.concat([a['x'], b['x']], axis=1).values.reshape(2,2,1), sol)
Out[136]: True
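
As an alternative to hard-coding reshape(2,2,1), here is a minimal sketch that builds the same array with np.stack. It assumes a pandas version that provides DataFrame.to_numpy() (0.24 or later); the name stacked is only illustrative and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [1, 2], 'y': [10, 20]})
b = pd.DataFrame({'x': [3, 4], 'y': [30, 40]})

# df[['x']] keeps the column as a DataFrame, so each array has shape (rows, 1).
# Stacking along axis=1 inserts a "frame" axis, giving shape (rows, frames, 1).
stacked = np.stack([df[['x']].to_numpy() for df in (a, b)], axis=1)

sol = np.array([[[1], [3]], [[2], [4]]])
print(stacked.shape)
print(np.array_equal(stacked, sol))
```

Because the shape comes from the data, this should keep working if the frames have more rows or if more frames are added, provided they all have the same number of rows.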

2 Comments

It looks fine but I still have a doubt since I would like to use the list c and not its elements explicitly.
Then use pd.concat([df['x'] for df in c], axis=1).
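
Putting the comment's suggestion together with the rest of the answer, a sketch of the full pipeline starting from the list c might look like the following. The keys argument is an optional addition (not in the original answer) that labels each column by its position in c instead of repeating the name x; the names wide and result are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'x': [1, 2], 'y': [10, 20]})
b = pd.DataFrame({'x': [3, 4], 'y': [30, 40]})
c = [a, b]

# Take the x column of every frame in the list and place them side by side.
# keys gives each resulting column a distinct label (its position in c).
wide = pd.concat([df['x'] for df in c], axis=1, keys=list(range(len(c))))

# Append a trailing axis of length 1 so the shape is (rows, frames, 1),
# without hard-coding the number of rows or frames.
result = wide.to_numpy()[:, :, np.newaxis]

sol = np.array([[[1], [3]], [[2], [4]]])
print(result.shape)
print(np.array_equal(result, sol))
```

Indexing with np.newaxis avoids the explicit reshape(2,2,1), so the same code applies to a list of any length.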
