0

I have a large dataset from a CSV file. It has two columns, the first is Date/Time in hh:mm:ss:ms form and the other is Pressure in number form. The pressure has randon values throughout that are not numerical values (things like 150+AA42BB43). These appear randomly throughout the 50,000 rows in the file and are not the same.

I need a way to change these Pressure values to numeric so I can perform data manipulation on them.

df_cleaned = df['Pressure'].loc[~df['Pressure'].map(lambda x: isinstance(x, float) | isinstance(x, int))]

I tried this, but it got rid of my Date/Time values and also did not clean all the pressure values while also getting rid of my headers.

I was wondering if anyone had any suggestions on how I can easily clean the data in the 2nd column while also keeping my Date/Time values in the first column accurate.

4
  • you should rather use df_cleaned = df.loc[....] (or even df_cleaned = df[....]) instead of df['Pressure'].loc[...] Commented Oct 6, 2021 at 0:58
  • Using df_cleaned = df['Pressure']... you get only one column (Pressure) and you skip other columns - and this is why you don't have Date/Time. And because it is single column so it may give it as Series instead of DataFrame - and this can remove your header(s) because Series (single column) doesn't need header. Commented Oct 6, 2021 at 1:05
  • you can do isinstance(x, (float, int)) Commented Oct 6, 2021 at 1:07
  • How exactly do you want to clean the pressure values? Just get rid of the non-numeric characters and convert to a float? Commented Oct 6, 2021 at 1:17

2 Answers 2

2

Your problem is that you use

df_cleaned = df['Pressure']

and this get only one column (Pressure) and skip other columns. And when you get single column then it may gives you Series instead of DataFrame - and Series can keep only one column so it doesn't need header to select columns.

You should run it without ['Pressure']

df_cleaned = df.loc[ ~df['Pressure'].map(...) ]

or even

df_cleaned = df[ ~df['Pressure'].map(...) ]

BTW: shorter isinstance(x, (float, int))

But using isinstance may not work if you have float/int values as strings - because isinstance("123", (float, int)) gives False - and you would have to rather try to convert float("123") and int("123") and catch error.


import pandas as pd

data = {
    'DateTime': ['2021.10.04', '2021.10.05', '2021.10.06'], 
    'Pressure': [78, '150+AA42BB43', 23], 
}

df = pd.DataFrame(data)

df_cleaned = df[ df['Pressure'].map(lambda x:isinstance(x, (float, int))) ]

print(df_cleaned)

Result:

     DateTime Pressure
0  2021.10.04       78
2  2021.10.06       23

EDIT:

If you have values as string then you can use to_numeric to convert them and put NaN if value can't be converted

df['Pressure'] = pd.to_numeric(df['Pressure'], errors='coerce')

and then you can filter it usinf isna()

df_cleaned = df[ ~df['Pressure'].isna() ]

import pandas as pd

data = {
    'DateTime': ['2021.10.04', '2021.10.05', '2021.10.06'], 
    'Pressure': ['78.2', '150+AA42BB43', '23'], 
}

df = pd.DataFrame(data)

df['Pressure'] = pd.to_numeric(df['Pressure'], errors='coerce')
print(df)

df_cleaned = df[ ~df['Pressure'].isna() ]
print(df_cleaned)
Sign up to request clarification or add additional context in comments.

Comments

1

I think I've an answer if all your non-numerical values are strings.

Have you tried using pandas replace()? Something like:

df['Pressure'].replace(to_replace = r'.+', value=0, inplace=True, regex=True)

I've used a regex to determine "any string". inplace=True allows that the existing dataframe is modified, instead of creating a new one.

Here, the function will replace any string with a given integer. I am not sure which integer you'd like to put there, so I've just used zero as an example. If you'd like a different integer for each string, you could use a map as explained in this answer.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.