5

I have the following pandas DataFrame:

import pandas as pd

data = pd.DataFrame({'x1': range(10, 18),    # Create pandas DataFrame
                     'x2': ['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
                     'x3': range(27, 19, -1),
                     'x4': ['x', 'z', 'y', 'y', 'x', 'y', 'z', 'x']})

Now I want to slice this DataFrame at the index positions Pt = [0, 3, 5], so that each slice starts at one of these points, and then put the resulting DataFrames in a list of length 3.

Is there a method or function to achieve this?

3 Answers

3

NumPy has a built-in function for this: np.split. It splits the input array or DataFrame at the given split points. In your case, you probably don't want the rows before the first point, so drop them.

If your first point is 0:

import numpy as np

out = np.split(data, Pt[1:])

Or, for a generic case:

out = np.split(data.iloc[Pt[0]:], Pt[1:])
# or
out = np.split(data, Pt)[1:]

Output:

[   x1 x2  x3 x4
 0  10  a  27  x
 1  11  b  26  z
 2  12  b  25  y,
    x1 x2  x3 x4
 3  13  c  24  y
 4  14  d  23  x,
    x1 x2  x3 x4
 5  15  a  22  y
 6  16  b  21  z
 7  17  d  20  x]

Finally, if you want to split on labels rather than positions (e.g. if your index is a, b, c, ... instead of 0, 1, 2, ...), you can first convert the labels to positions with Index.get_indexer_for:

Pt = data.index.get_indexer_for(Pt)

Example:

data.index = list('abcdefgh')
Pt = data.index.get_indexer_for(['a', 'd', 'f'])
np.split(data.iloc[Pt[0]:], Pt[1:])

[   x1 x2  x3 x4
 a  10  a  27  x
 b  11  b  26  z
 c  12  b  25  y,
    x1 x2  x3 x4
 d  13  c  24  y
 e  14  d  23  x,
    x1 x2  x3 x4
 f  15  a  22  y
 g  16  b  21  z
 h  17  d  20  x]
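
One thing to watch with the label-based variant: get_indexer_for returns -1 for any label that is not in the index, and a -1 would silently be treated as a position. Below is a minimal sketch of a guard for that case. The helper name split_at_labels is hypothetical (not a pandas function); it assumes a unique index and labels given in index order, and it slices with iloc instead of np.split purely to keep the example short.

```python
import pandas as pd

data = pd.DataFrame({'x1': range(10, 18),
                     'x2': ['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
                     'x3': range(27, 19, -1),
                     'x4': ['x', 'z', 'y', 'y', 'x', 'y', 'z', 'x']})
data.index = list('abcdefgh')


def split_at_labels(df, labels):
    """Hypothetical helper: split df at the positions of the given index labels.

    Assumes df.index is unique and labels are listed in index order.
    """
    pos = df.index.get_indexer_for(labels)
    # Labels that are not present in the index come back as -1
    if (pos == -1).any():
        missing = [lab for lab, p in zip(labels, pos) if p == -1]
        raise KeyError(f"labels not found in index: {missing}")
    # Add the end of the frame so the last slice is closed
    bounds = list(pos) + [len(df)]
    return [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


parts = split_at_labels(data, ['a', 'd', 'f'])
```

With the example frame, parts holds three DataFrames starting at labels a, d and f; an unknown label raises a KeyError instead of producing a wrong slice.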

1

First, all values in Pt must be non-negative integers in ascending order, and each must be less than or equal to the length of the DataFrame.

# end of the DataFrame, to close the last slice
Pt.append(len(data))

# list comprehension building the list of DataFrames
L = [data.iloc[Pt[i]:Pt[i+1]] for i in range(len(Pt)-1)]
print(L)

[   x1 x2  x3 x4
0  10  a  27  x
1  11  b  26  z
2  12  b  25  y,    x1 x2  x3 x4
3  13  c  24  y
4  14  d  23  x,    x1 x2  x3 x4
5  15  a  22  y
6  16  b  21  z
7  17  d  20  x]
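
The preconditions stated at the top of this answer can be checked before slicing. The following is only a sketch: check_split_points is a hypothetical helper name, and n_rows stands in for len(data) so the snippet runs on its own.

```python
from numbers import Integral


def check_split_points(points, n_rows):
    """Hypothetical helper: raise ValueError if points are not usable split points."""
    # Every point must be an integer (plain or NumPy) and non-negative
    if any(not isinstance(p, Integral) or p < 0 for p in points):
        raise ValueError("split points must be non-negative integers")
    # Points must not decrease; equal neighbours are allowed here
    # and would yield an empty slice
    if any(a > b for a, b in zip(points, points[1:])):
        raise ValueError("split points must be in ascending order")
    # No point may lie past the end of the DataFrame
    if points and points[-1] > n_rows:
        raise ValueError("split points must not exceed the number of rows")


Pt = [0, 3, 5]
n_rows = 8  # len(data) for the DataFrame in the question
check_split_points(Pt, n_rows)  # passes silently for valid input
```

Calling it before Pt.append(len(data)) turns a bad list of points into an explicit error rather than empty or unexpected slices.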

0

Here's the code to get the answer:

import pandas as pd

data = pd.DataFrame({
    'x1': range(10, 18),
    'x2': ['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
    'x3': range(27, 19, -1),
    'x4': ['x', 'z', 'y', 'y', 'x', 'y', 'z', 'x']
})

Pt = [0, 3, 5]          # slice points
Pt.append(len(data))     # add end index

# Slice DataFrame
dfs = [data.iloc[Pt[i]:Pt[i+1]] for i in range(len(Pt)-1)]

# Display each slice
for i, df in enumerate(dfs, start=1):
    print("Slice", i)
    print(df, "\n")

The output is:

Slice 1
   x1 x2  x3 x4
0  10  a  27  x
1  11  b  26  z
2  12  b  25  y 

Slice 2
   x1 x2  x3 x4
3  13  c  24  y
4  14  d  23  x 

Slice 3
   x1 x2  x3 x4
5  15  a  22  y
6  16  b  21  z
7  17  d  20  x

Explanation

  • Append the length of the DataFrame to include the last segment.
  • Loop through index pairs and use .iloc[start:end].
  • Store each slice in a list (dfs).
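
One side effect of the steps above: Pt.append(len(data)) modifies Pt in place, so running the snippet a second time appends the length again and adds an empty trailing slice. A possible variant, sketched below, builds the boundaries in a new list instead. The function name split_frame is hypothetical, and it assumes Pt is already sorted and within range.

```python
import pandas as pd


def split_frame(df, points):
    """Hypothetical helper: slice df at positional split points.

    Leaves the caller's list of points unchanged.
    """
    # New list with the end of the frame added; points itself is not modified
    bounds = [*points, len(df)]
    # Pair each boundary with the next one: (0, 3), (3, 5), (5, 8), ...
    return [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


data = pd.DataFrame({
    'x1': range(10, 18),
    'x2': ['a', 'b', 'b', 'c', 'd', 'a', 'b', 'd'],
    'x3': range(27, 19, -1),
    'x4': ['x', 'z', 'y', 'y', 'x', 'y', 'z', 'x']
})

Pt = [0, 3, 5]
dfs = split_frame(data, Pt)
```

Because Pt stays [0, 3, 5], the call can be repeated safely and always returns the same three slices.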
