Create multiple DataFrames from a single DataFrame based on conditions

Question

Hi, I would like to manipulate the following DataFrame so that every time a new "sample" row appears, a new DataFrame is created.

For example, the algorithm should group all analytes, CAS numbers, and values below sample1 into one DataFrame, then start a new DataFrame once it hits sample2, and so on.

I'm new to pandas and Python, so thank you in advance.

Look for tutorials on for loops, lists, and DataFrame.loc. The combination of the three will help you.
– JarroVGIT, Commented Aug 1, 2020 at 20:21

Jakub · Accepted Answer · 2020-08-02 05:29:49Z


import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

# Create list of row indices for each sample name
# For above example: s = [0, 8, 16]
s = list(df['analyte'].index[df['analyte'].str[:6] == 'sample']) + [len(df)]

# Create new dataframes for each sample and print results
samples = {}
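for i, j in zip(s, s[1:]):
    # Rows between two consecutive sample markers (marker row itself excluded)
    sample_df = df.iloc[i+1 : j, :].reset_index(drop=True)
    # The marker row's 'analyte' cell holds the sample name
    sample_name = df['analyte'].iloc[i]
    samples[sample_name] = sample_df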

print(samples['sample2'])
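
As an alternative to slicing by index pairs, the same split can be sketched with a running count of marker rows and groupby. This is a different technique from the loop above, not part of the original answer. It assumes the first row of the frame is a sample marker, and the small frame here is a hypothetical stand-in for df.

```python
import pandas as pd

# Small hypothetical stand-in for the df built above.
df = pd.DataFrame({
    'analyte': ['sample1', 'bacon', 'eggs', 'sample2', 'money'],
    'CAS1': [None, 1, 2, None, 3],
})

# True on marker rows such as 'sample1' and 'sample2'.
is_marker = df['analyte'].str.startswith('sample')

# Running count of markers seen so far: 1 for the first block, 2 for the second, ...
block_id = is_marker.cumsum()

samples = {}
for _, block in df.groupby(block_id):
    # The first row of each block is the marker row holding the sample name.
    name = block['analyte'].iloc[0]
    # Keep everything after the marker row.
    samples[name] = block.iloc[1:].reset_index(drop=True)

print(samples['sample2'])
```

If rows can appear before the first marker, they would form a block numbered 0 whose first row is not a sample name, so that case would need separate handling.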

Other options:

# if CAS1 cell of sample row is NaN (requires: import numpy as np)
sample_indices = list(df['CAS1'].index[df['CAS1'].apply(np.isnan)]) + [len(df)]

# if CAS1 cell of sample row is NaN or None
sample_indices = list(df['CAS1'].index[df['CAS1'].isnull()]) + [len(df)]

# if CAS1 cell of sample row is an empty string
sample_indices = list(df['CAS1'].index[df['CAS1'] == '']) + [len(df)]

edited Aug 2, 2020 at 5:29

answered Aug 1, 2020 at 22:17


8 Comments

Chris Kubicki Over a year ago

Awesome, this worked. Thank you! One question: if each of the output DataFrames is named df1, how do I access them separately?

Jakub Over a year ago

If you don't want to use numpy, you can substitute .apply(np.isnan) with .isnull().

Chris Kubicki Over a year ago

Thanks. One last question: does this method rely on CAS1 repeating itself? For example, would having a "sample 3" with CAS1 values of "18, 6, 3, 7, etc." throw off your algorithm? Thanks again.

Jakub Over a year ago

It only matters what value is in the CAS1 column of the sample row, and that value cannot occur in any of the CAS1 cells below the sample. If the CAS1 sample cell is None or NaN, the algorithm will work using df['CAS1'].isnull(), but only if the CAS1 cells below the sample are not None or NaN. If the sample cell is an empty string, replace df['CAS1'].isnull() with df['CAS1'] == "", which in turn requires that no CAS1 cell under the sample is an empty string. You can also combine both checks with the element-wise | operator, such as (df['CAS1'].isnull() | (df['CAS1'] == "")). Python's 'or' keyword does not work on a Series and raises a ValueError.
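
A minimal sketch of the combined check described in this comment, assuming marker rows hold either a missing value or an empty string in CAS1. The Series below is hypothetical data, not the question's frame. Note that pandas needs the element-wise | operator here, since Python's or raises a ValueError on a Series.

```python
import pandas as pd

# Hypothetical column where marker rows hold either None or an empty string.
cas1 = pd.Series([None, 1, 2, '', 3, 4], name='CAS1')

# Element-wise OR; the comparison is parenthesised because | binds tighter than ==.
is_marker = cas1.isnull() | (cas1 == '')

# Marker positions plus the end of the column, as in the answer's index list.
sample_indices = list(cas1.index[is_marker]) + [len(cas1)]
print(sample_indices)
```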

Jakub Over a year ago

1. The example above has NaNs in CAS1, not blank spaces. 2. I would not use regex if the sample name always begins with the same letters. I put another option above that works. 3. You could append both the sample name and the DataFrame to a 2D list or tuple, then select the DataFrame that you want by using the sample name or the index.
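
Point 3 could look roughly like the sketch below: keep (name, DataFrame) pairs in a list, then select by position or by name. The frames and the variable names frames and by_name are hypothetical stand-ins for what the answer's loop produces.

```python
import pandas as pd

# Hypothetical stand-ins for the per-sample frames built by the loop in the answer.
frames = [
    ('sample1', pd.DataFrame({'analyte': ['bacon', 'eggs'], 'CAS1': [1, 2]})),
    ('sample2', pd.DataFrame({'analyte': ['money'], 'CAS1': [3]})),
]

# Select by position in the list ...
name, second_df = frames[1]

# ... or build a dict once and select by sample name.
by_name = dict(frames)
first_df = by_name['sample1']

print(name)
print(first_df)
```

The dict form is what the answer's samples variable already provides, so samples['sample1'] and samples['sample2'] give each DataFrame separately.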
