1

Suppose I have the following DataFrame:

                                   Date      Open      High       Low     Close     Volume         min         max  Loc
Date
2020-06-15 14:00:00 2020-06-15 14:00:00  0.000123  0.000130  0.000121  0.000128  1467828.0  0.00012081  0.00013040    0
2020-06-15 18:00:00 2020-06-15 18:00:00  0.000128  0.000129  0.000123  0.000125  1264642.0           0           0    1
2020-06-15 22:00:00 2020-06-15 22:00:00  0.000125  0.000126  0.000122  0.000123   723738.0           0           0    2

I'm trying to create a new dataframe where:

  1. The data should be the columns Open, min, max and Loc, but ONLY for rows where min and max are > 0.
  2. The index of the DataFrame should be the column Loc.

Now I know that I can use pandas.concat() to create a DataFrame from another DataFrame, but I don't know how to apply the conditions I explained above. Can anyone help me out with this?

Expected output example:

 Loc    Open          min         max   
   0   0.000123    0.00012081  0.00013040    

2 Answers

3

Building your example dataframe:

import pandas as pd

df = pd.DataFrame(
    data={
        "Date": ["2020-06-15 14:00:00", "2020-06-15 18:00:00", "2020-06-15 22:00:00"],
        "Open": [0.000123, 0.000128, 0.000125],
        "High": [0.000130, 0.000129, 0.000126],
        "Low": [0.000121, 0.000123, 0.000122],
        "Close": [0.000128, 0.000125, 0.000123],
        "Volume": [1467828.0, 1264642.0, 723738.0],
        "min": [0.00012081, 0, 0],
        "max": [0.00013040, 0, 0],
        "Loc":  [0, 1, 2],
    }
)

df.set_index("Date", drop=False, inplace=True)

A solution would be this:

# Set the index to a different column
# ("df2" is a copy of "df")
df2 = df.set_index("Loc")

# Keep only some columns
df2 = df2[["Open", "min", "max"]]

# Filter rows based on a condition
df2 = df2[(df2["min"] > 0) & (df2["max"] > 0)]

df2 would be like this:

         Open       min      max
Loc                             
0    0.000123  0.000121  0.00013
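The three steps above (filter, select columns, set the index) can also be wrapped in a small reusable function. The following is only a sketch: `select_positive` and its parameter names are hypothetical and not part of pandas, and the sample frame keeps just the columns needed for the illustration.

```python
import pandas as pd

# Sample data mirroring the question (only the columns needed here).
df = pd.DataFrame(
    {
        "Open": [0.000123, 0.000128, 0.000125],
        "min": [0.00012081, 0, 0],
        "max": [0.00013040, 0, 0],
        "Loc": [0, 1, 2],
    }
)


def select_positive(frame, value_cols=("min", "max"),
                    keep=("Open", "min", "max"), index_col="Loc"):
    """Hypothetical helper: keep rows where every column in value_cols is > 0,
    keep only the requested columns, and index the result by index_col."""
    mask = (frame[list(value_cols)] > 0).all(axis=1)
    return frame.loc[mask, [index_col, *keep]].set_index(index_col)


result = select_positive(df)
print(result)
```

Because the function returns a new DataFrame, the original `df` is left unchanged and can be filtered again with different columns.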

2

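First build a boolean mask: compare both columns with DataFrame.gt to test for values greater than 0, then use DataFrame.all to require that both are true in each row. Use the mask in DataFrame.loc to filter the rows and select the columns, and finally call DataFrame.set_index: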

df = df.loc[df[['min','max']].gt(0).all(axis=1), ['Open','min','max','Loc']].set_index('Loc')
print(df)
         Open       min      max
Loc                             
0    0.000123  0.000121  0.00013

Or compare both columns separately and chain the masks with & (bitwise AND):

df = df.loc[df['min'].gt(0) & df['max'].gt(0), ['Open','min','max','Loc']].set_index('Loc')
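To check that the two ways of building the mask agree, they can be compared directly. This is a minimal sketch on made-up data; the third row (positive min, zero max) is an assumption added only to show that a row must pass both tests.

```python
import pandas as pd

# Illustrative data: row 2 has a positive "min" but a zero "max".
df = pd.DataFrame(
    {
        "min": [0.00012081, 0, 0.5],
        "max": [0.00013040, 0, 0],
        "Loc": [0, 1, 2],
    }
)

# Variant 1: compare both columns at once, then require all True per row.
mask_all = df[["min", "max"]].gt(0).all(axis=1)

# Variant 2: compare each column separately and combine with bitwise AND.
mask_and = df["min"].gt(0) & df["max"].gt(0)

print(mask_all.tolist())
print(mask_all.equals(mask_and))
```

The first variant scales more easily to many columns, since only the column list changes; the second is easier to read when each column needs a different condition.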

EDIT:

The error

'>' not supported between instances of 'str' and 'int'

means that the min or max column (or both) holds string representations of the values, so convert them to numbers before applying the solutions above:

df['min'] = pd.to_numeric(df['min'], errors='coerce')
df['max'] = pd.to_numeric(df['max'], errors='coerce')
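As a rough end-to-end sketch of the conversion followed by the filter: the string values below, including the unparseable "n/a", are assumptions made up for illustration and are not taken from the question's data. With errors='coerce', anything that cannot be parsed becomes NaN, and NaN never compares as greater than 0, so such rows are filtered out.

```python
import pandas as pd

# Assumed input: "min" and "max" arrive as strings, one value is not a number.
df = pd.DataFrame(
    {
        "Open": [0.000123, 0.000128, 0.000125],
        "min": ["0.00012081", "0", "n/a"],
        "max": ["0.00013040", "0", "0"],
        "Loc": [0, 1, 2],
    }
)

# Convert the string columns to numbers; unparseable values become NaN.
for col in ["min", "max"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)

# The comparison now works; rows with NaN fail the > 0 test and are dropped.
mask = df[["min", "max"]].gt(0).all(axis=1)
result = df.loc[mask, ["Open", "min", "max", "Loc"]].set_index("Loc")
print(result)
```

If silently turning bad values into NaN is not desired, leaving `errors` at its default of 'raise' makes the conversion fail loudly instead.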

4 Comments

Thank you a lot! I'm getting the following error: '>' not supported between instances of 'str' and 'int'. I suppose it has to do with the original dataframe. Maybe I need to convert everything to float or int?
@San9096 - I think it means max or min (or both) hold string representations. Give me some time for a solution.
Yeah, I didn't notice that the column was in string format; to_numeric() should convert it.
Yes, saw your edit and I was using the exact same line. Awesome! Thank you a lot. I accepted.
