Is there a way to tell pandas not to use one of my lists as the index? My code is:

import pandas as pd

A=['a','b','c']
B=[1,2,3]
pd.DataFrame(A,B)  # the second positional argument is `index`, so B becomes the index
   0
1  a
2  b
3  c

What I want is just two columns, so that I can group by column A. How do I do this? Something like the following works, but I would like to skip the column names to get as much performance as possible.

pd.DataFrame({'A':A,'B':B})
   A  B
0  a  1
1  b  2
2  c  3
  • df = pd.DataFrame([A, B]).T should give you what you want. Commented Feb 7, 2018 at 19:00
  • pd.DataFrame(list(zip(A,B))) Commented Feb 7, 2018 at 19:01
  • Both work, but I get an error when I try to group by column 0 like this: pd.DataFrame([A,B]).T.groupby(0).mean(). I think the types are lost when the frame is transposed. Commented Feb 7, 2018 at 19:07
  • When you say, "so I can do groupby column A," what do you mean? What does your desired output look like? Commented Feb 7, 2018 at 19:13
  • pd.DataFrame(list(zip(A,B))).groupby(0, as_index=False).mean()? Commented Feb 7, 2018 at 19:16

1 Answer

If you're actually dealing with only two columns, you can group one series by another.

In [6]: A = ['a', 'a', 'b', 'b', 'c', 'c']

In [7]: B = [1, 2, 3, 4, 5, 6]

In [8]: pd.Series(B).groupby(A).mean()
Out[8]: 
a    1.5
b    3.5
c    5.5
dtype: float64
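
Since the question asks for two columns, here is a minimal sketch of turning the grouped result back into a two-column frame. The column names 'A' and 'B' are only illustrative, and the sketch assumes a reasonably recent pandas.

```python
import pandas as pd

A = ['a', 'a', 'b', 'b', 'c', 'c']
B = [1, 2, 3, 4, 5, 6]

# Group the values in B by the labels in A; A only needs to match B in length.
means = pd.Series(B).groupby(A).mean()

# Name the index and the values, then move the group labels into a column.
result = means.rename_axis('A').reset_index(name='B')
print(result)
```

This only names the columns once, on the small aggregated result, rather than on the full input.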

I've provided some timings below.

In [9]: %timeit pd.Series(B).groupby(A).mean()
1000 loops, best of 3: 1.07 ms per loop

In [10]: %timeit pd.DataFrame({'A': A, 'B': B}).groupby('A').mean()
100 loops, best of 3: 2.66 ms per loop

In [11]: %timeit pd.DataFrame(list(zip(A, B))).groupby(0).mean()
100 loops, best of 3: 2.38 ms per loop
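
One of the comments on the question reports an error when grouping the transposed frame, pd.DataFrame([A, B]).T. A likely cause is that building the frame row-wise mixes strings and integers in every column, so both columns end up with object dtype after the transpose. The sketch below checks the dtype and converts the numeric column back before grouping. Using pd.to_numeric here is my suggestion, not something from the original thread, and the exact behaviour of mean() on object columns may vary between pandas versions.

```python
import pandas as pd

A = ['a', 'a', 'b', 'b', 'c', 'c']
B = [1, 2, 3, 4, 5, 6]

# Each row-wise column holds a string and an int, so everything is object dtype.
frame = pd.DataFrame([A, B]).T
dtype_before = frame[1].dtype
print(dtype_before)

# Convert the value column back to a numeric dtype, then group as usual.
frame[1] = pd.to_numeric(frame[1])
means = frame.groupby(0)[1].mean()
print(means)
```

Building the frame with list(zip(A, B)) or a dict avoids the conversion step, because each column is then inferred from values of a single type.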