
I have a dataframe as below:

Size    C1      C2      C3      C4      C5      C6      C7      C8      C9
10000   .90     1.10    1.30    1.50    2.10    3.10    5.60    8.40    15.80
15000   1.35    1.65    1.95    2.25    3.15    4.65    8.40    12.60   23.70
20000   1.80    2.20    2.60    3.00    4.20    6.20    11.20   16.80   31.60
25000   2.25    2.75    3.25    3.75    5.25    7.75    14.00   21.00   39.50
30000   2.70    3.30    3.90    4.50    6.30    9.30    16.80   25.20   47.40
35000   3.15    3.85    4.55    5.25    7.35    10.85   19.60   29.40   55.30
40000   3.60    4.40    5.20    6.00    8.40    12.40   22.40   33.60   63.20
45000   4.05    4.95    5.85    6.75    9.45    13.95   25.20   37.80   71.10
50000   4.50    5.50    6.50    7.50    10.50   15.50   28.00   42.00   79.00
10000   .60     .80     1.00    1.20    1.80    2.80    5.30    8.10    15.50
15000   .90     1.20    1.50    1.80    2.70    4.20    7.95    12.15   23.25
20000   1.20    1.60    2.00    2.40    3.60    5.60    10.60   16.20   31.00
25000   1.50    2.00    2.50    3.00    4.50    7.00    13.25   20.25   38.75
30000   1.80    2.40    3.00    3.60    5.40    8.40    15.90   24.30   46.50
35000   2.10    2.80    3.50    4.20    6.30    9.80    18.55   28.35   54.25
40000   2.40    3.20    4.00    4.80    7.20    11.20   21.20   32.40   62.00
45000   2.70    3.60    4.50    5.40    8.10    12.60   23.85   36.45   69.75
50000   3.00    4.00    5.00    6.00    9.00    14.00   26.50   40.50   77.50
1000    0.20    0.20    0.20    0.20    0.20    0.20    0.20    0.20    0.20
2000    0.39    0.39    0.39    0.39    0.39    0.39    0.39    0.39    0.39
3000    0.59    0.59    0.59    0.59    0.59    0.59    0.59    0.59    0.59
4000    0.78    0.78    0.78    0.78    0.78    0.78    0.78    0.78    0.78
5000    0.98    0.98    0.98    0.98    0.98    0.98    0.98    0.98    0.98
6000    1.17    1.17    1.17    1.17    1.17    1.17    1.17    1.17    1.17
7000    1.37    1.37    1.37    1.37    1.37    1.37    1.37    1.37    1.37
8000    1.56    1.56    1.56    1.56    1.56    1.56    1.56    1.56    1.56
9000    1.76    1.76    1.76    1.76    1.76    1.76    1.76    1.76    1.76
10000   1.95    1.95    1.95    1.95    1.95    1.95    1.95    1.95    1.95

Now I would like to split it into 3 dataframes based on 'Size':

df1: from the first 10000 up to (but not including) the next 10000
df2: from the second 10000 up to (but not including) 1000
df3: from 1000 to the end

Alternatively, it would also be fine to add a temporary column to the same dataframe that labels the above ranges as S1, S2 and S3 respectively.

Could anyone guide me on how to go about this?

Regards

2 Answers


Assuming that you want to break wherever Size decreases, you could use the compare-cumsum-groupby pattern:

parts = list(df.groupby((df["Size"].diff() < 0).cumsum()))

which gives me (suppressing boring rows in the middle)

>>> for key, group in parts:
...     print(key)
...     print(group)
...     print("----")
...     
0
    Size    C1    C2    C3    C4     C5     C6    C7    C8    C9
0  10000  0.90  1.10  1.30  1.50   2.10   3.10   5.6   8.4  15.8
1  15000  1.35  1.65  1.95  2.25   3.15   4.65   8.4  12.6  23.7
2  20000  1.80  2.20  2.60  3.00   4.20   6.20  11.2  16.8  31.6
[...]
7  45000  4.05  4.95  5.85  6.75   9.45  13.95  25.2  37.8  71.1
8  50000  4.50  5.50  6.50  7.50  10.50  15.50  28.0  42.0  79.0
----
1
     Size   C1   C2   C3   C4   C5    C6     C7     C8     C9
9   10000  0.6  0.8  1.0  1.2  1.8   2.8   5.30   8.10  15.50
10  15000  0.9  1.2  1.5  1.8  2.7   4.2   7.95  12.15  23.25
11  20000  1.2  1.6  2.0  2.4  3.6   5.6  10.60  16.20  31.00
[...]
16  45000  2.7  3.6  4.5  5.4  8.1  12.6  23.85  36.45  69.75
17  50000  3.0  4.0  5.0  6.0  9.0  14.0  26.50  40.50  77.50
----
2
     Size    C1    C2    C3    C4    C5    C6    C7    C8    C9
18   1000  0.20  0.20  0.20  0.20  0.20  0.20  0.20  0.20  0.20
19   2000  0.39  0.39  0.39  0.39  0.39  0.39  0.39  0.39  0.39
20   3000  0.59  0.59  0.59  0.59  0.59  0.59  0.59  0.59  0.59
[...]
26   9000  1.76  1.76  1.76  1.76  1.76  1.76  1.76  1.76  1.76
27  10000  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95
----
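The same grouping key can also produce the three named frames, plus the optional S1/S2/S3 column the question mentions. This is a minimal sketch on an abbreviated copy of the question's data (only a few Size/C1 rows); the column name `Part` is made up for illustration, and unpacking into `df1, df2, df3` assumes there are exactly three blocks.

```python
import pandas as pd

# Abbreviated copy of the question's data: a few rows per block, Size and C1 only.
df = pd.DataFrame({
    "Size": [10000, 15000, 20000, 10000, 15000, 20000, 1000, 2000, 10000],
    "C1":   [0.90, 1.35, 1.80, 0.60, 0.90, 1.20, 0.20, 0.39, 1.95],
})

# True wherever Size drops compared with the previous row (first row is NaN -> False).
breaks = df["Size"].diff() < 0

# cumsum turns the break flags into block ids: 0, 0, 0, 1, 1, 1, 2, 2, 2.
block = breaks.cumsum()

# Optional temporary column ('Part' is a hypothetical name) holding S1, S2, S3.
df["Part"] = "S" + (block + 1).astype(str)

# One DataFrame per block; raises ValueError if there are not exactly three blocks.
df1, df2, df3 = (group for _, group in df.groupby(block))

print(df)
print(df3)
```

If the number of blocks can vary, keep the groups in a list (as `parts` above) instead of unpacking them.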



It's not very elegant, but this works:

In [259]:
ranges = []
first = df.index[0]
# index labels where Size drops, i.e. where each new block starts
# (assumes the default RangeIndex, so labels equal the positions iloc expects)
criteria = df.index[df['Size'].diff() < 0]
for idx in criteria:
    ranges.append((first, idx))
    first = idx
ranges

Out[259]:
[(0, 9), (9, 18)]

In [261]:
splits = []
for r in ranges:
    splits.append(df.iloc[r[0]:r[1]])
# the last block runs from the final break point to the end of the df
splits.append(df.iloc[ranges[-1][1]:])
splits

Out[261]:
[    Size    C1    C2    C3    C4     C5     C6    C7    C8    C9
 0  10000  0.90  1.10  1.30  1.50   2.10   3.10   5.6   8.4  15.8
 1  15000  1.35  1.65  1.95  2.25   3.15   4.65   8.4  12.6  23.7
 2  20000  1.80  2.20  2.60  3.00   4.20   6.20  11.2  16.8  31.6
 3  25000  2.25  2.75  3.25  3.75   5.25   7.75  14.0  21.0  39.5
 4  30000  2.70  3.30  3.90  4.50   6.30   9.30  16.8  25.2  47.4
 5  35000  3.15  3.85  4.55  5.25   7.35  10.85  19.6  29.4  55.3
 6  40000  3.60  4.40  5.20  6.00   8.40  12.40  22.4  33.6  63.2
 7  45000  4.05  4.95  5.85  6.75   9.45  13.95  25.2  37.8  71.1
 8  50000  4.50  5.50  6.50  7.50  10.50  15.50  28.0  42.0  79.0,
      Size   C1   C2   C3   C4   C5    C6     C7     C8     C9
 9   10000  0.6  0.8  1.0  1.2  1.8   2.8   5.30   8.10  15.50
 10  15000  0.9  1.2  1.5  1.8  2.7   4.2   7.95  12.15  23.25
 11  20000  1.2  1.6  2.0  2.4  3.6   5.6  10.60  16.20  31.00
 12  25000  1.5  2.0  2.5  3.0  4.5   7.0  13.25  20.25  38.75
 13  30000  1.8  2.4  3.0  3.6  5.4   8.4  15.90  24.30  46.50
 14  35000  2.1  2.8  3.5  4.2  6.3   9.8  18.55  28.35  54.25
 15  40000  2.4  3.2  4.0  4.8  7.2  11.2  21.20  32.40  62.00
 16  45000  2.7  3.6  4.5  5.4  8.1  12.6  23.85  36.45  69.75
 17  50000  3.0  4.0  5.0  6.0  9.0  14.0  26.50  40.50  77.50,
      Size    C1    C2    C3    C4    C5    C6    C7    C8    C9
 18   1000  0.20  0.20  0.20  0.20  0.20  0.20  0.20  0.20  0.20
 19   2000  0.39  0.39  0.39  0.39  0.39  0.39  0.39  0.39  0.39
 20   3000  0.59  0.59  0.59  0.59  0.59  0.59  0.59  0.59  0.59
 21   4000  0.78  0.78  0.78  0.78  0.78  0.78  0.78  0.78  0.78
 22   5000  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98
 23   6000  1.17  1.17  1.17  1.17  1.17  1.17  1.17  1.17  1.17
 24   7000  1.37  1.37  1.37  1.37  1.37  1.37  1.37  1.37  1.37
 25   8000  1.56  1.56  1.56  1.56  1.56  1.56  1.56  1.56  1.56
 26   9000  1.76  1.76  1.76  1.76  1.76  1.76  1.76  1.76  1.76
 27  10000  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95]

First, this finds where the size stops increasing:

df['Size'].diff() < 0

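We use that boolean mask to select the index labels where each new block starts, then iterate over those labels to build a list of (start, stop) tuples.

In the last step we slice the df with each of these ranges, and then append one more slice from the last break point to the end of the df.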
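The same idea can be written with positions instead of index labels, so it no longer depends on the index being a contiguous 0..n-1 range. This is a sketch, not the answer's code: `split_on_decrease` is a hypothetical helper name, and the example frame uses a made-up non-contiguous index.

```python
import numpy as np
import pandas as pd


def split_on_decrease(df, col="Size"):
    """Split df into consecutive pieces, starting a new piece wherever `col` decreases.

    Slicing is done by position (iloc), so the index labels are never used.
    """
    # Positions (not labels) of rows whose value is smaller than the previous row's.
    starts = np.flatnonzero(df[col].diff().lt(0).to_numpy())
    # Boundaries: start of frame, every break position, end of frame.
    bounds = [0, *starts.tolist(), len(df)]
    return [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


# Made-up frame with a non-contiguous index, to show labels don't matter.
df = pd.DataFrame(
    {"Size": [10000, 20000, 10000, 20000, 1000, 10000]},
    index=[5, 7, 11, 13, 17, 19],
)
parts = split_on_decrease(df)
for p in parts:
    print(p)
```

Because the final boundary is `len(df)`, the last piece always runs to the end of the frame, so the trailing 10000 row is not lost.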

4 Comments

Thanks EdChum, this definitely works. Just looking for a better solution.
One thing to note: I am missing the last 10000 row in the 3rd dataframe. Is there any fix for that?
I changed df.index[-1]) to df.index[-1])+1 and got it.
The only issue with that is that it assumes the index is contiguous. I've updated my answer: basically, I reduce the range list so I don't append the last range, and instead manually add a slice from the last break point to the end of the df. See the update.
