
I have a DataFrame df whose columns are ['col_A', 'col_B', 'col_C'] and which has 1000 rows.

I also have a Series whose index is the column names of the DataFrame and whose values are between 0 and 1000. For instance, s is the Series:

col_A     20
col_B      0
col_C    300
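
For reference, the setup can be reproduced like this (a minimal sketch; the random values are just placeholders, only the shape, column names, and thresholds matter):

import numpy as np
import pandas as pd

# 1000 rows of arbitrary data in the three columns
df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 3)),
                  columns=['col_A', 'col_B', 'col_C'])

# per-column row counts, indexed by the column names of df
s = pd.Series([20, 0, 300], index=['col_A', 'col_B', 'col_C'])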

I would like to change the DataFrame as follows:

df.iloc[0:20, 0] = a certain value   (column A)

I've tried slicing using a for loop, but it's taking too much time. Is there a pandas function able to do this?

My code is:

for i in range(0, 3):
    df.iloc[0:s.iloc[i], i] = -1

More generally, I sometimes need to map a Series index to DataFrame columns, but I struggle to find a fast, less memory-consuming way to do it.

Thank you

  • Post your code with the loop? An MCVE would be nice. Commented Mar 19, 2020 at 9:42
  • I've edited the question to include the code Commented Mar 19, 2020 at 9:53
  • Again, an MCVE would be nice. It's not clear where that s is coming from. Commented Mar 19, 2020 at 9:54
  • @Divakar I've edited; here df is the DataFrame and s is the Series whose index is the column names of df. Commented Mar 19, 2020 at 10:00
  • You are just looping through 3 cols. That's not a lot. Hence, I am not sure if avoiding the loop would make much sense. Commented Mar 19, 2020 at 10:22

1 Answer


For Loop

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,10, (1000,3)), 
                  columns=['col_A', 'col_B', 'col_C'])

s = pd.Series([20,0,300], 
              index=['col_A', 'col_B', 'col_C'])

# .items() yields (column name, threshold) pairs from the Series;
# note that .loc slicing is label-based and end-inclusive, so this fills rows 0 through idx
for col, idx in s.items():
    df.loc[:idx, col] = -1

df
     col_A  col_B  col_C
0       -1     -1     -1
1       -1      3     -1
2       -1      3     -1
3       -1      2     -1
4       -1      1     -1
..     ...    ...    ...
995      2      6      9
996      1      9      5
997      2      6      4
998      4      0      1
999      9      2      8

Pandas Apply

def f(c, s):
    # set the first s[c.name] entries of column c to -1
    c[:s[c.name]] = -1
    return c

df = df.apply(lambda c: f(c, s))
df 
     col_A  col_B  col_C
0       -1      6     -1
1       -1      1     -1
2       -1      6     -1
3       -1      1     -1
4       -1      6     -1
..     ...    ...    ...
995      2      4      3
996      2      0      0
997      8      5      7
998      3      5      5
999      5      7      7

Performance

On my local machine, using N=1000, the for loop is slightly faster. Increasing N to 1M, the apply method is faster:

def for_loop(N):
    df = pd.DataFrame(np.random.randint(0, 10, (N, 3)),
                      columns=['col_A', 'col_B', 'col_C'])
    for col, idx in s.items():
        df.loc[:idx, col] = -1
    return df

def apply_method(N):
    df = pd.DataFrame(np.random.randint(0, 10, (N, 3)),
                      columns=['col_A', 'col_B', 'col_C'])

    def f(c, s):
        c[:s[c.name]] = -1
        return c

    return df.apply(lambda c: f(c, s))

%timeit for_loop(1000)
1.19 ms ± 58.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit apply_method(1000)
185 ms ± 44.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit for_loop(1_000_000)
303 ms ± 25.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit apply_method(1_000_000)
162 ms ± 8.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
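
For completeness, since a vectorized implementation was asked about in the comments: one possible approach (a sketch, assuming s has an entry for every column of df) builds a broadcasted boolean mask and assigns all the cells in one shot:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, (1000, 3)),
                  columns=['col_A', 'col_B', 'col_C'])
s = pd.Series([20, 0, 300], index=['col_A', 'col_B', 'col_C'])

# compare row positions (column vector) against the per-column thresholds
# (row vector); broadcasting yields a (len(df), n_columns) boolean mask
mask = np.arange(len(df))[:, None] < s.reindex(df.columns).to_numpy()

# replace every masked cell with -1 in a single vectorized assignment
df = df.mask(mask, -1)

This avoids both the Python-level loop over columns and apply, so it should scale better as the number of columns grows, which was the concern in the question.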

8 Comments

Loops are not a fast and less-consuming method.
Yes, that's what I'm looking for. I've used a for loop (check my code in the question, I've edited it), but the for loop takes too much time when there are more columns.
@jezrael Please suggest/post a vectorized implementation, I am looking forward to that :). Also, here we are only looping through a series with size len(df.columns)
@FBruzzesi - I am waiting for divakar ;)
Do you think that defining a custom function and using apply would speed this up?
