I have a list:

a = [15, 50, 75]

Using this list, I need to split the main dataframe into smaller dataframes by index, where each number in the list is the number of rows in the corresponding piece.

Say my main dataframe is df. The dataframes I'd like to have are df1 (row index 0-15), df2 (row index 15-65), and df3 (row index 65-125).

Since there are only three, I can easily do something like this:

limit1 = a[0]
limit2 = a[1] + limit1
limit3 = a[2] + limit2

df1 = df.loc[df.index <= limit1]
df2 = df.loc[(df.index > limit1) & (df.index <= limit2)]
df2 = df2.reset_index(drop=True)
df3 = df.loc[(df.index > limit2) & (df.index <= limit3)]
df3 = df3.reset_index(drop=True)

But what if I want to do this on the main dataframe df with a long list? I am looking for something iterable, like the following (which doesn't work):

df1 = df.loc[df.index <= limit1]
for i in range(2,3):
 for j in range(2,3):
  for k in range(2,3):
   df[i] =  df.loc[(df.index > limit[j]) & (df.index <= limit[k])]
   df[i] = df[i].reset_index(drop=True)
   print(df[i])
3 Comments
  • According to your logic it should be 0-15, then 15-65, and then 65-90; otherwise your rule is changing. Commented Dec 11, 2019 at 13:38
  • Yes. That's correct Commented Dec 11, 2019 at 13:45
  • Check my answer Commented Dec 11, 2019 at 13:48

2 Answers

2

You could modify your code to build the dataframes iteratively, cutting slices off the end of the main dataframe.

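dfs = []  # this list collects your partitioned dataframes
a = [15, 50, 75]  # used here as cut positions, not row counts
for idx in a[::-1]:
    dfs.insert(0, df.iloc[idx:])  # rows from idx to the current end
    df = df.iloc[:idx]  # keep only the rows before idx (rebinds df)
dfs.insert(0, df)  # add the last remaining dataframe
print(dfs)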

Another option is to use a list comprehension as follows:

a = [0, 15, 50, 75]
dfs = [df.iloc[a[i]:a[i+1]] for i in range(len(a)-1)]
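
Both snippets above read the numbers in a as cut positions, while the question describes them as row counts. If row counts are what you want, one possible sketch is to turn them into cut positions with a running total first. The sample frame, the column name value, and the names sizes and bounds are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame standing in for the main dataframe `df`.
df = pd.DataFrame({'value': np.arange(140)})

sizes = [15, 50, 75]  # number of rows wanted in each piece

# Running totals turn the sizes into cut positions: 0, 15, 65, 140.
bounds = np.concatenate(([0], np.cumsum(sizes)))

# Pair each start with the next stop and slice by position.
dfs = [
    df.iloc[start:stop].reset_index(drop=True)
    for start, stop in zip(bounds[:-1], bounds[1:])
]

print([len(part) for part in dfs])
```

Because iloc slices by position with the stop excluded, the pieces do not overlap, and reset_index(drop=True) gives each piece its own index starting at 0, as in the question.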

4 Comments

On a 75-row dataframe this gives a 60-row df, then a 10-row df, and then a 0-row df.
Now it gives a 35-row, a 25-row, and a 0-row df.
0-to-15 is missing. I would set: a = [0, 15, 50, 75]
True. Adding dfs.insert(0, df) at the end solves it.
1

This does it. It's better to use a dictionary if you want to store several dataframes and look them up later. Creating variables dynamically in a loop is bad practice, so avoid it.

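import numpy as np
import pandas as pd

df = pd.DataFrame(np.linspace(1, 75, 75), columns=['a'])
a = [15, 50, 25]  # number of rows in each piece
d = {}  # maps a name such as 'df0' to its slice

b = 0  # start position of the current slice
for n, i in enumerate(a):
    d[f'df{n}'] = df.iloc[b:b + i]
    b += i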

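
To see what the dictionary approach produces, here is a small self-contained sketch that rebuilds the same 75-row frame and prints the size of each piece. The reset_index call and the piece_sizes name are my additions for illustration, not part of the code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.linspace(1, 75, 75), columns=['a'])
a = [15, 50, 25]  # requested row counts; they add up to 90, more than the 75 rows available

d = {}
b = 0  # start position of the current slice
for n, i in enumerate(a):
    # reset_index is an addition here so that every piece starts at index 0
    d[f'df{n}'] = df.iloc[b:b + i].reset_index(drop=True)
    b += i

piece_sizes = {name: len(part) for name, part in d.items()}
print(piece_sizes)  # {'df0': 15, 'df1': 50, 'df2': 10}
```

The last piece has only 10 rows because iloc slicing stops at the end of the frame instead of raising an error when the requested counts exceed the number of rows.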
