I have a data frame...

            A  B  C  D  E  F
0  2018-02-01  2  3  4  5  6
1  2018-02-02  6  7  8  4  2
2  2018-02-03  3  4  5  6  7

...which I convert to a numpy array...

[['2018-02-01' 2 3 4 5 6]
 ['2018-02-02' 6 7 8 4 2]
 ['2018-02-03' 3 4 5 6 7]]

What I would like to do is the following:

  1. Store only columns A, B, and C in the numpy array, rather than all the columns.
  2. I would like to loop over the first column, then the second and the third one. How can I achieve that?

My code is as follows:

import pandas as pd

df = pd.DataFrame([
    ['2018-02-01', 1, 3, 6, 102, 8],
    ['2018-02-01', 2, 3, 4, 5, 6],
    ['2018-02-02', 6, 7, 8, 4, 2],
    ['2018-02-03', 3, 4, 5, 6, 7]
], columns=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)

# --> Here I only want to save columns A, B, and C
nparray = df.values  # as_matrix() in pandas < 1.0
print(nparray)

# --> Loop through the columns; I would like to loop over column A first
for i in nparray:
    print(i)
# Use the values in columns B and C inside that loop
calc = [func(B, C)
        for B, C in zip(nparray)]

Update: I made a numerical example.

            A  B  C  D    E  F
0  2018-02-01  1  3  6  102  8
1  2018-02-01  2  3  4    5  6
2  2018-02-02  6  7  8    4  2
3  2018-02-03  3  4  5    6  7

Dummy code looks like the following (it is more of a nested loop):

loop over date 2018-02-01:

calc = func(Column B + Column C) = 1+3 = 4

next row is the same date so:

calc += func(Column B + Column C) = 4 + 2+ 3 = 9

for date 2018-02-01 the result is 9 and can be stored e.g. in a csv file

loop over date 2018-02-02

calc = func(Column B + Column C) = 6+7 = 13

for date 2018-02-02 the result is 13 and can be stored e.g. in a csv file

loop over date 2018-02-03

calc = func(Column B + Column C) = 3+4 = 7

for date 2018-02-03 the result is 7 and can be stored e.g. in a csv file

etc

5 Comments
  • df['A'].values etc. will give you the relevant NumPy array for that column. Commented Feb 1, 2018 at 13:54
  • Keep in mind that [['2018-02-01' 2 3 4 5 6] ... can never be a homogeneous NumPy array: you can't mix strings and integers in one dtype, so all elements will be stored as objects. You can use a structured array instead, depending on how you want to use it. Commented Feb 1, 2018 at 13:55
  • Without a clear use-case why you want to use NumPy arrays, instead of the Dataframe and Series/columns, I find this an unclear question. If you want to learn about NumPy arrays themselves, start there instead, not with a Dataframe. Commented Feb 1, 2018 at 13:56
  • @MCM, NumPy is great if your data is a single dtype. You should probably use df[['B', 'C', 'D', 'E', 'F']].values to get only the numeric component. Since you are learning, also check the type of your array via x.dtype. For example, you may wish to upcast to int64 or downcast to int8. Commented Feb 1, 2018 at 14:09
  • @all, thanks for the responses. I tried to make my question clearer with an example. Maybe you could make it clearer for me with an example using the values shown above. I am trying to use fewer pandas functions and more NumPy. Commented Feb 1, 2018 at 15:14
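
The dtype points raised in the comments above can be checked directly. The following is a minimal sketch, assuming the four-row data frame from the question; the variable names mixed, numeric, and small are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['2018-02-01', 1, 3, 6, 102, 8],
    ['2018-02-01', 2, 3, 4, 5, 6],
    ['2018-02-02', 6, 7, 8, 4, 2],
    ['2018-02-03', 3, 4, 5, 6, 7],
], columns=['A', 'B', 'C', 'D', 'E', 'F'])

# Mixing the string column A with integer columns gives an object array.
mixed = df[['A', 'B', 'C']].values
print(mixed.dtype)

# Selecting only numeric columns gives a single integer dtype.
numeric = df[['B', 'C']].values
print(numeric.dtype)

# Downcasting is possible when every value fits in the smaller type.
small = numeric.astype(np.int8)
print(small.dtype)
```

An object array still works for looping, but it loses most of the speed benefit of NumPy, which is why the comments suggest keeping the date column separate from the numeric ones.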

2 Answers

  1. df[['A','B','C']].values
  2. df[['B', 'C']].apply(func, axis=1)

Here, func will receive one row at a time, so you could define it this way:

def func(x):
    x.B *= 2
    x.C += 1
    return x
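
As a minimal sketch of the row-wise apply call, assuming the four-row data from the question's update: the question never defines func, so the body used here (adding B and C, as in the update's arithmetic) is only a stand-in, and it returns a value instead of modifying the row.

```python
import pandas as pd

df = pd.DataFrame([
    ['2018-02-01', 1, 3, 6, 102, 8],
    ['2018-02-01', 2, 3, 4, 5, 6],
    ['2018-02-02', 6, 7, 8, 4, 2],
    ['2018-02-03', 3, 4, 5, 6, 7],
], columns=['A', 'B', 'C', 'D', 'E', 'F'])


def func(row):
    # Hypothetical stand-in: one argument, the whole row as a Series.
    return row.B + row.C


# axis=1 passes one row at a time; the result is a Series of row sums.
calc = df[['B', 'C']].apply(func, axis=1)
print(calc.tolist())  # [4, 5, 13, 7]
```

Note that this func takes a single argument (the row), which is the signature apply expects.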

You could also do this, in which case func must accept two arguments (B and C) instead of a single row:

calc = [func(B,C) for B, C in df[['B', 'C']].itertuples(index=False)]

Or this:

calc = [func(x.B, x.C) for x in df.itertuples()]

Be aware that this sort of iterating code, whether using itertuples or apply, is very slow compared with other "vectorized" approaches. But if you insist on using loops, you can, and for small data it will be OK.
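
For the per-date totals described in the question's update, here is a minimal sketch of both styles. It assumes func is plain addition of B and C, as in the update's arithmetic; the names totals and per_date are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['2018-02-01', 1, 3, 6, 102, 8],
    ['2018-02-01', 2, 3, 4, 5, 6],
    ['2018-02-02', 6, 7, 8, 4, 2],
    ['2018-02-03', 3, 4, 5, 6, 7],
], columns=['A', 'B', 'C', 'D', 'E', 'F'])


def func(b, c):
    # Hypothetical stand-in: two arguments, the values of B and C.
    return b + c


# Loop version: accumulate func(B, C) per date in a plain dict.
totals = {}
for row in df.itertuples(index=False):
    totals[row.A] = totals.get(row.A, 0) + func(row.B, row.C)

for date, total in totals.items():
    print(date, total)

# Vectorized version: the same per-date sums via groupby.
per_date = (df['B'] + df['C']).groupby(df['A']).sum()
print(per_date)
# per_date.to_csv('per_date.csv') would write the result to a file
# (the file name is only an example).
```

Both versions should give 9, 13, and 7 for the three dates, matching the hand calculation in the question.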

2 Comments

Thanks a lot. Is it possible to just use my code with a loop instead of using data frame functions?
Thanks a lot for the nice example. Somehow, using your iteration in calc does not work; I am receiving the error message: TypeError: func() takes 1 positional argument but 2 were given. My second question is: how do I see that it loops through the dates? What I want is to loop over a date, using the data in columns B and C, then row 2, etc., until the next date comes. So for equal dates it should iterate through all the data and, as in your example, aggregate the calc values. I hope that is clearer.

For the first part of your question, just select the columns you want to use:

print(df[['A', 'B', 'C']].values)
>>>
[['2018-02-01' 2 3]
 ['2018-02-02' 6 7]
 ['2018-02-03' 3 4]]

The second part of your question is largely moot: iterating over a column of the NumPy array gives the same values as iterating over the corresponding data frame column (strings for column A, integers for B and C).

Hence use:

for k in df.A:
    print(k)
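
If the goal is to stay in NumPy for the per-date calculation from the question's update, one possible sketch follows. It assumes func is plain addition of B and C; the date column is kept in its own string array so that the numeric array keeps an integer dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['2018-02-01', 1, 3, 6, 102, 8],
    ['2018-02-01', 2, 3, 4, 5, 6],
    ['2018-02-02', 6, 7, 8, 4, 2],
    ['2018-02-03', 3, 4, 5, 6, 7],
], columns=['A', 'B', 'C', 'D', 'E', 'F'])

dates = df['A'].to_numpy(dtype=str)   # 1-D array of date strings
values = df[['B', 'C']].to_numpy()    # 2-D integer array, shape (4, 2)

# np.unique returns the sorted distinct dates and, for every row,
# the position of its date in that sorted list.
unique_dates, inverse = np.unique(dates, return_inverse=True)

# Sum B + C for each row, then add each row sum into its date's slot.
row_sums = values.sum(axis=1)
totals = np.zeros(len(unique_dates), dtype=row_sums.dtype)
np.add.at(totals, inverse, row_sums)

for date, total in zip(unique_dates, totals):
    print(date, total)
```

np.add.at is used instead of totals[inverse] += row_sums because the plain indexed assignment would count a repeated date only once.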

1 Comment

Thanks a lot. Is it not possible to extend my example shown above? I updated everything. I hope my problem is clearer now.
