I'm trying to split array values to columns.

I've created a Google Colab notebook and you can find my code here.

Here is a screenshot of the data (Hashtags):

Here is a representation of the data.

    codes
1   [71020]
2   [77085]
3   [36415]
4   [99213, 99287]
5   [99233, 99233, 99233]

I want to split these arrays into separate columns.

To something like this (screenshot - Hashtags split to columns):

Here is a representation of it.

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

I tried the following code, which I got from this Stack Overflow post, but it doesn't give the expected results:

df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist())

What am I doing wrong?

  • What are the unexpected results you are getting? Commented May 22, 2022 at 5:07
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. Commented May 22, 2022 at 5:57

1 Answer

The reason is that the lists are still stored as strings in the hashtags column after you read the file with read_csv. You can convert them while reading the data (the following code is taken from the Colab notebook):

import pandas as pd
from ast import literal_eval

url = "https://raw.githubusercontent.com/hashimputhiyakath/datasets/main/hashtags10.csv"

# Notice the added converter to turn strings into lists.
df = pd.read_csv(url, converters={'hashtags': literal_eval})

And then the solution you mentioned will work as expected.

df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist(), index=df.index).add_prefix('hashtag_')
print(df_hashtags_splitted.head(10))
          hashtag_0     hashtag_1         hashtag_2       hashtag_3           hashtag_4       hashtag_5    hashtag_6         hashtag_7  hashtag_8       hashtag_9 hashtag_10 hashtag_11
0         longcovid     covidhelp              None            None                None            None         None              None       None            None       None       None
1            mumbai         covid      hospitalbeds  covidemergency           mahacovid       oxygenbed  mumbaicovid  covid19indiahelp  covidhelp  covidresources       None       None
2   kawahcoffeeshop   coffeelover             kawah       costarica            puravida         heredia       oxygen              None       None            None       None       None
3           lucknow        mumbai         hyderabad           delhi            verified  covidresources    covidhelp  covid19indiahelp       None            None       None       None
4            oxygen          None              None            None                None            None         None              None       None            None       None       None
5  covid19indiahelp        mahara              None            None                None            None         None              None       None            None       None       None
6            oxygen       amadoda              None            None                None            None         None              None       None            None       None       None
7  plasmadonordelhi  plasmamumbai  covid19indiahelp       covidhelp  covidemergency2021            None         None              None       None            None       None       None
8            oxygen  conservation           wilding       rewilding         environment  sustainability  restorative       agriculture   wildlife    biodiversity      water   wildswim
9             covid      verified            mumbai          oxygen  covidemergency2021         covid19    covidhelp    covidresources       None            None       None       None
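To check the diagnosis without downloading the dataset, here is a minimal sketch that uses a small in-memory CSV. The sample rows and the names `csv_text`, `raw`, `parsed`, and `split` are made up for illustration; only `read_csv`, `converters`, and `literal_eval` come from the answer above.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = """hashtags
"['longcovid', 'covidhelp']"
"['oxygen']"
"""

# Without a converter, each cell is a plain string that only looks like a list.
raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, 'hashtags']))

# With the converter, each cell is parsed into a real Python list.
parsed = pd.read_csv(io.StringIO(csv_text), converters={'hashtags': literal_eval})
print(type(parsed.loc[0, 'hashtags']))

# Now the list-to-columns step has real lists to work with.
split = pd.DataFrame(parsed['hashtags'].tolist(), index=parsed.index).add_prefix('hashtag_')
print(split)
```

If the first `print` reports a string type for your own data, that would explain why `tolist()` produced a single column of strings instead of one column per hashtag.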

Alternatively, to convert the strings to lists after you have read the CSV, you can do:

df['hashtags'] = df['hashtags'].map(literal_eval)
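The same after-the-fact conversion can be sketched on the `codes` example from the question, with the columns renamed to start at 1 as in the desired output. The in-memory CSV and the names `csv_text` and `wide` are assumptions made for illustration, not part of the original notebook.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV mirroring the 'codes' example from the question.
csv_text = """codes
[71020]
"[99213, 99287]"
"[99233, 99233, 99233]"
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the string cells to real lists after reading.
df['codes'] = df['codes'].map(literal_eval)

# One column per list position; shorter lists leave missing values.
wide = pd.DataFrame(df['codes'].tolist(), index=df.index)

# Rename the default 0, 1, 2 labels to code_1, code_2, code_3.
wide.columns = [f'code_{i + 1}' for i in wide.columns]
print(wide)
```

Note that columns containing missing values will generally be stored as floats rather than integers, since the gaps are filled with NaN.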

2 Comments

from ast import literal_eval
df['hashtags'] = df['hashtags'].map(literal_eval)
df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist()).add_prefix('hashtag_')
df_hashtags_splitted.head()

@HashimHamzaPuthiyakath right, I had forgotten about the hashtag_ prefix. I added that to the answer as well, thanks!
