Python Pandas convert 1 column of combination of strings to multiple columns of categorical data

Question

I am working on a project analyzing weather data. Below is an abbreviated version of my CSV file (only the last column, "Conditions", matters here):

Year,Month,Day,Hour,DOW,Maximum Temperature,Minimum Temperature,Temperature,Precipitation,Snow,SnowDepth,Wind Speed,Visibility,Cloud Cover,Relative Humidity,Conditions
2020,3,5,8,3,48.0,48.0,48.0,0.0,0.0,0.0,10.3,9.9,0.0,81.44,Clear
2020,3,5,10,3,56.9,56.9,56.9,0.0,0.0,0.0,6.3,9.9,25.1,55.29,Partially cloudy
2020,3,9,8,0,60.7,60.7,60.7,0.0,0.0,0.0,14.5,8.1,79.6,91.95,Overcast
2020,3,9,10,0,62.5,62.5,62.5,0.01,0.0,0.0,16.0,7.0,94.7,89.95,"Rain, Overcast"
2020,3,17,20,1,66.4,66.4,66.4,0.02,0.0,0.0,8.7,4.3,68.6,88.78,"Rain, Partially cloudy"

and I want to transform it into something like this:

Clear,Partially cloudy,Rain,Overcast
1,0,0,0
0,1,0,0
0,0,0,1
0,0,1,1
0,1,1,0
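To reproduce the answers below, the sample can be loaded into a DataFrame with pd.read_csv. This is a minimal sketch that assumes the CSV text is pasted into a string. With a real file you would pass its path instead (the path in the comment is hypothetical). The quoting in the CSV keeps values such as "Rain, Overcast" in a single field.

```python
import io

import pandas as pd

# The sample rows from the question; the quoted fields keep "Rain, Overcast" as one value.
csv_text = """Year,Month,Day,Hour,DOW,Maximum Temperature,Minimum Temperature,Temperature,Precipitation,Snow,SnowDepth,Wind Speed,Visibility,Cloud Cover,Relative Humidity,Conditions
2020,3,5,8,3,48.0,48.0,48.0,0.0,0.0,0.0,10.3,9.9,0.0,81.44,Clear
2020,3,5,10,3,56.9,56.9,56.9,0.0,0.0,0.0,6.3,9.9,25.1,55.29,Partially cloudy
2020,3,9,8,0,60.7,60.7,60.7,0.0,0.0,0.0,14.5,8.1,79.6,91.95,Overcast
2020,3,9,10,0,62.5,62.5,62.5,0.01,0.0,0.0,16.0,7.0,94.7,89.95,"Rain, Overcast"
2020,3,17,20,1,66.4,66.4,66.4,0.02,0.0,0.0,8.7,4.3,68.6,88.78,"Rain, Partially cloudy"
"""

# With a real file this would be pd.read_csv("path/to/weather.csv") (hypothetical path).
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset["Conditions"].tolist())
```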

I saw that I could use the code below, but I don't know how to handle rows where one cell contains two categories.

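import pandas as pd
# map() converts integer codes (1, 2, ...) to labels; on a column of strings every value becomes NaN
dataset['Conditions'] = dataset['Conditions'].map({1: 'Clear', 2: 'Partially cloudy', 3: 'Rain', 4: 'Snow'})
dataset = pd.get_dummies(dataset, columns=['Conditions'], prefix='', prefix_sep='')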

Thank you in advance : )
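For context, here is a quick sketch of why calling pd.get_dummies directly on the raw column is not enough. Every distinct string, including the combined ones, becomes its own column. The sketch uses only the "Conditions" values from the sample.

```python
import pandas as pd

# The "Conditions" values from the sample data.
conditions = pd.Series(
    ["Clear", "Partially cloudy", "Overcast", "Rain, Overcast", "Rain, Partially cloudy"],
    name="Conditions",
)

# get_dummies treats each distinct string as one category, so "Rain, Overcast"
# gets its own column instead of setting both the Rain and Overcast flags.
naive = pd.get_dummies(conditions, dtype=int)
print(naive.columns.tolist())
```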

Henry Ecker · Accepted Answer · 2021-05-19 23:25:30Z

3

Try str.split + explode, then sum over index level 0:

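# .sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() does the same
dummies = pd.get_dummies(
    dataset['Conditions'].str.split(', ').explode(), dtype=int
).groupby(level=0).sum()

print(dummies)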

dummies:

   Clear  Overcast  Partially cloudy  Rain
0      1         0                 0     0
1      0         0                 1     0
2      0         1                 0     0
3      0         1                 0     1
4      0         0                 1     1
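Here is the same pipeline broken into steps. This is a sketch using only the Conditions values from the sample, and the intermediate names (exploded, one_hot) are just for illustration. The key point is that explode repeats the original row label for each piece, so grouping on index level 0 folds the pieces back into one row per reading.

```python
import pandas as pd

conditions = pd.Series(
    ["Clear", "Partially cloudy", "Overcast", "Rain, Overcast", "Rain, Partially cloudy"],
    name="Conditions",
)

# Steps 1-2: split each string into a list, then explode to one row per condition.
# The original row label is repeated for rows that had two conditions.
exploded = conditions.str.split(", ").explode()
print(exploded.index.tolist())  # [0, 1, 2, 3, 3, 4, 4]

# Step 3: one-hot encode the individual conditions.
one_hot = pd.get_dummies(exploded, dtype=int)

# Step 4: add up rows sharing an original label, giving one row per input row.
dummies = one_hot.groupby(level=0).sum()
print(dummies)
```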

To join back to the original DataFrame:

dummies = pd.get_dummies(
    dataset['Conditions'].str.split(', ').explode(), dtype=int
).groupby(level=0).sum()
# Join back to dataset, replacing the original Conditions column
dataset = dataset.drop(columns='Conditions').join(dummies)
print(dataset.to_string())

   Year  Month  Day  Hour  ...  Clear  Overcast  Partially cloudy  Rain
0  2020      3    5     8  ...      1         0                 0     0
1  2020      3    5    10  ...      0         0                 1     0
2  2020      3    9     8  ...      0         1                 0     0
3  2020      3    9    10  ...      0         1                 0     1
4  2020      3   17    20  ...      0         0                 1     1
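The question lists the columns as Clear, Partially cloudy, Rain, Overcast, but get_dummies sorts them alphabetically. This sketch assumes you want the question's order and then want to save the result as CSV. It uses a trimmed two-column stand-in for the data, and the output file name in the comment is hypothetical.

```python
import pandas as pd

# A trimmed stand-in for the question's data (only two columns kept, for brevity).
dataset = pd.DataFrame({
    "Temperature": [48.0, 56.9, 60.7, 62.5, 66.4],
    "Conditions": ["Clear", "Partially cloudy", "Overcast",
                   "Rain, Overcast", "Rain, Partially cloudy"],
})

dummies = pd.get_dummies(
    dataset["Conditions"].str.split(", ").explode(), dtype=int
).groupby(level=0).sum()

# Reorder the indicator columns to match the layout shown in the question.
order = ["Clear", "Partially cloudy", "Rain", "Overcast"]
dataset = dataset.drop(columns="Conditions").join(dummies[order])

# to_csv() with no path returns the CSV text; to write a file instead, pass a path,
# e.g. dataset.to_csv("weather_encoded.csv", index=False) (hypothetical name).
csv_out = dataset.to_csv(index=False)
print(csv_out)
```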

edited May 19, 2021 at 23:25

answered May 19, 2021 at 22:35

Henry Ecker♦

35.8k · 19 gold badges · 48 silver badges · 67 bronze badges


4 Comments

asdfg Over a year ago

Sorry if this sounds kind of dumb, I'm new to all this. How can I merge "out" into the original CSV file?

Henry Ecker Over a year ago

Instead of or in addition to the conditions column?

asdfg Over a year ago

Replace the "Conditions" column with the 4 columns.

Henry Ecker Over a year ago

It's a join; I've added the complete code to my answer.

Nk03 · Accepted Answer · 2021-05-19 22:41:34Z

3

You can use pd.get_dummies on the split and stacked values:

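# df is the weather DataFrame (called dataset in the question)
result = (
    pd.get_dummies(
        df.Conditions.str.split(', ', expand=True)
        .stack(), dtype=int)
    .groupby(level=0)  # replaces .sum(level=0), removed in pandas 2.0
    .sum()
)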

OUTPUT:

   Clear  Overcast  Partially cloudy  Rain
0      1         0                 0     0
1      0         0                 1     0
2      0         1                 0     0
3      0         1                 0     1
4      0         0                 1     1
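Both answers end at the same table. This sketch checks that on the sample Conditions values. With expand=True, split returns one column per position and pads shorter rows with missing values. Depending on the pandas version, stack() may or may not drop that padding. Either way, get_dummies ignores missing values, so no extra column appears.

```python
import pandas as pd

conditions = pd.Series(
    ["Clear", "Partially cloudy", "Overcast", "Rain, Overcast", "Rain, Partially cloudy"],
    name="Conditions",
)

# expand=True gives a DataFrame with one column per piece; stack() turns it into
# a Series indexed by (original row, piece position).
stacked = conditions.str.split(", ", expand=True).stack()

# Group on the first level (the original row) to get one row per reading again.
via_stack = pd.get_dummies(stacked, dtype=int).groupby(level=0).sum()

# The explode-based approach from the other answer, for comparison.
via_explode = pd.get_dummies(
    conditions.str.split(", ").explode(), dtype=int
).groupby(level=0).sum()

print(via_stack.equals(via_explode))
```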

edited May 19, 2021 at 22:41

answered May 19, 2021 at 22:32

Nk03

15k · 2 gold badges · 11 silver badges · 24 bronze badges


Yiqing Wang · Accepted Answer · 2021-05-19 22:38:29Z

0

import pandas as pd

xx = pd.DataFrame([[1, 2, "ss"], [2, 3, "cc"], [4, 2, "d"]], columns=["v1", "v2", "s"])
# One indicator column per distinct string in column "s"
xx["s"].str.get_dummies()
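Series.str.get_dummies also takes a sep argument (default "|"). Passing sep=", " splits combined values such as "Rain, Partially cloudy" and one-hot encodes the pieces in one call. This is a sketch using the question's Conditions values.

```python
import pandas as pd

conditions = pd.Series(
    ["Clear", "Partially cloudy", "Overcast", "Rain, Overcast", "Rain, Partially cloudy"],
    name="Conditions",
)

# Split on ", " and build one indicator column per individual condition,
# so a combined value sets more than one flag in its row.
dummies = conditions.str.get_dummies(sep=", ")
print(dummies)
```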

answered May 19, 2021 at 22:38

Yiqing Wang

333 bronze badges

1 Comment

Henry Ecker Over a year ago

How does this handle the case when there are multiple categorical data points in the same column "Rain, Partially cloudy" for example?
