Extracting pandas dataframe from another dataframe

Question

Suppose I have the following dataframe:

                                   Date      Open      High       Low     Close     Volume         min         max  Loc
Date
2020-06-15 14:00:00 2020-06-15 14:00:00  0.000123  0.000130  0.000121  0.000128  1467828.0  0.00012081  0.00013040    0
2020-06-15 18:00:00 2020-06-15 18:00:00  0.000128  0.000129  0.000123  0.000125  1264642.0           0           0    1
2020-06-15 22:00:00 2020-06-15 22:00:00  0.000125  0.000126  0.000122  0.000123   723738.0           0           0    2

I'm trying to create a new dataframe where:

The data should be the columns Open, min, max and Loc, but ONLY the rows where both min and max are > 0.
The index of the dataframe should be the column Loc

Now, I know that to create a DataFrame from another DataFrame I can use pandas.concat(), but I don't know how to apply the conditions I explained above. Can anyone help me out with this?

Expected output example:

 Loc    Open          min         max   
   0   0.000123    0.00012081  0.00013040

user13893607 · Answer · 2020-07-10 11:08:06Z

Building your example dataframe:

import pandas as pd

df = pd.DataFrame(
    data={
        "Date": ["2020-06-15 14:00:00", "2020-06-15 18:00:00", "2020-06-15 22:00:00"],
        "Open": [0.000123, 0.000128, 0.000125],
        "High": [0.000130, 0.000129, 0.000126],
        "Low": [0.000121, 0.000123, 0.000122],
        "Close": [0.000128, 0.000125, 0.000123],
        "Volume": [1467828.0, 1264642.0, 723738.0],
        "min": [0.00012081, 0, 0],
        "max": [0.00013040, 0, 0],
        "Loc":  [0, 1, 2],
    }
)

df.set_index("Date", drop=False, inplace=True)
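The drop=False argument is what reproduces the question's layout, where Date shows up both as the index and as a regular column. A minimal sketch contrasting it with the default (the two-column frame and the names kept/dropped are illustrative only):

```python
import pandas as pd

# Illustrative two-column subset of the question's data
df = pd.DataFrame(
    {
        "Date": ["2020-06-15 14:00:00", "2020-06-15 18:00:00"],
        "Open": [0.000123, 0.000128],
    }
)

kept = df.set_index("Date", drop=False)  # Date becomes the index AND stays a column
dropped = df.set_index("Date")           # default drop=True: Date survives only as the index

print(list(kept.columns))     # ['Date', 'Open']
print(list(dropped.columns))  # ['Open']
print(kept.index.name)        # Date
```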

A solution would be this:

# Set the index to a different column
# ("df2" is a copy of "df")
df2 = df.set_index("Loc")

# Keep only some columns
df2 = df2[["Open", "min", "max"]]

# Filter rows based on a condition
df2 = df2[(df2["min"] > 0) & (df2["max"] > 0)]

df2 then looks like this:

         Open       min      max
Loc                             
0    0.000123  0.000121  0.00013
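The printed min (0.000121) differs from the expected 0.00012081 only because pandas rounds floats for display (the display.precision option defaults to 6 digits after the decimal point); the stored value is unchanged. A small sketch, rebuilding df2 by hand for illustration:

```python
import pandas as pd

# df2 rebuilt directly with the single surviving row, indexed by Loc
df2 = pd.DataFrame(
    {"Open": [0.000123], "min": [0.00012081], "max": [0.00013040]},
    index=pd.Index([0], name="Loc"),
)

# The underlying value is not rounded; only its printed form is.
print(df2.loc[0, "min"])  # 0.00012081

# Temporarily show more digits after the decimal point.
with pd.option_context("display.precision", 8):
    print(df2)
```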

jezrael · Accepted Answer · 2020-07-10 11:04:51Z


First, filter the rows with a boolean mask: DataFrame.gt compares both columns with 0, and DataFrame.all(axis=1) keeps only the rows where both comparisons are True. Then select the columns with DataFrame.loc, and finally call DataFrame.set_index:

df = df.loc[df[['min','max']].gt(0).all(axis=1), ['Open','min','max','Loc']].set_index('Loc')
print(df)
         Open       min      max
Loc                             
0    0.000123  0.000121  0.00013
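To see what the one-liner does step by step, here is a sketch on the question's sample values (only the four relevant columns are rebuilt; the names cell_mask, row_mask and out are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Open": [0.000123, 0.000128, 0.000125],
        "min": [0.00012081, 0, 0],
        "max": [0.00013040, 0, 0],
        "Loc": [0, 1, 2],
    }
)

# Element-wise comparison: one boolean per cell of the two selected columns
cell_mask = df[["min", "max"]].gt(0)

# all(axis=1) collapses each row to True only if every cell in that row is True
row_mask = cell_mask.all(axis=1)
print(row_mask.tolist())  # [True, False, False]

# .loc takes the row mask first, then the list of columns to keep
out = df.loc[row_mask, ["Open", "min", "max", "Loc"]].set_index("Loc")
print(out.index.tolist())  # [0]
```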

Or compare both columns separately and chain the masks with & (bitwise AND):

df = df.loc[df['min'].gt(0) & df['max'].gt(0), ['Open','min','max','Loc']].set_index('Loc')
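The .gt(0) calls can also be written with the > operator, but then each comparison needs its own parentheses, because & binds more tightly than > in Python. A sketch checking that the variants build the same mask (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Open": [0.000123, 0.000128, 0.000125],
        "min": [0.00012081, 0, 0],
        "max": [0.00013040, 0, 0],
        "Loc": [0, 1, 2],
    }
)

mask_gt = df["min"].gt(0) & df["max"].gt(0)

# Parentheses are required here: without them, df["min"] > 0 & df["max"] > 0
# is parsed as the chained comparison df["min"] > (0 & df["max"]) > 0,
# which raises a ValueError (ambiguous truth value of a Series).
mask_op = (df["min"] > 0) & (df["max"] > 0)

# Same result as the DataFrame.gt + all(axis=1) version
mask_all = df[["min", "max"]].gt(0).all(axis=1)

print(mask_gt.equals(mask_op))   # True
print(mask_gt.equals(mask_all))  # True
```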

EDIT:

Because of the error

'>' not supported between instances of 'str' and 'int'

the min or max column (or both) must contain string representations of numbers, so convert the values to numbers before applying the solutions above:

df['min'] = pd.to_numeric(df['min'], errors='coerce')
df['max'] = pd.to_numeric(df['max'], errors='coerce')
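With errors='coerce', any value that cannot be parsed becomes NaN instead of raising, and since NaN > 0 evaluates to False, such rows are simply dropped by the filter. A sketch assuming string-typed columns (the values, including the unparseable "n/a", are invented for illustration):

```python
import pandas as pd

# Hypothetical string-typed columns, e.g. as they might arrive from a text source
df = pd.DataFrame(
    {
        "Open": [0.000123, 0.000128, 0.000125],
        "min": ["0.00012081", "0", "n/a"],
        "max": ["0.00013040", "0", "0.0001"],
        "Loc": [0, 1, 2],
    }
)

for col in ["min", "max"]:
    # Unparseable entries such as "n/a" become NaN instead of raising
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df["min"].tolist())  # [0.00012081, 0.0, nan]

# NaN > 0 is False, so the row with the coerced value is filtered out
out = df.loc[df[["min", "max"]].gt(0).all(axis=1), ["Open", "min", "max", "Loc"]].set_index("Loc")
print(out.index.tolist())  # [0]
```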

edited Jul 10, 2020 at 11:04

answered Jul 10, 2020 at 10:52

jezrael

4 Comments

San9096 Over a year ago

Thank you a lot! I'm getting the following error: '>' not supported between instances of 'str' and 'int'. I suppose it has to do with the original dataframe. Maybe I need to convert everything to float or int?

jezrael Over a year ago

@San9096 - I think it means max or min or both contain string representations of numbers. Give me some time for a solution.

San9096 Over a year ago

Yeah, I didn't notice that the column was in string format; to_numeric() should convert it.

San9096 Over a year ago

Yes, I saw your edit and I was using the exact same line. Awesome! Thank you a lot. I accepted.
