I define a function as follows...

def getSentiment(x):
    vs = vaderSentiment
    col = vs(x['messages'].encode('utf-8', 'replace'))
    return col

The column of the DataFrame I am applying the function to contains individual strings per row (two examples)...

There are some classic 'Cat' ones about seatbelts
That would be the fighters steroids… I've told you

When I apply the function using...

df['sentiment']=df.apply(getSentiment, axis=1)

The dicts that resulted from the function are converted into string format in the new sentiment column (two rows as examples)...

sentiment
{'compound': 0.4404, 'neg': 0.0, 'neu': 0.919, 'pos': 0.081} 
{'compound': 0.4404, 'neg': 0.0, 'neu': 0.256, 'pos': 0.744}

Instead, is there a way to apply the function so that the key-value pairs from the dict are returned as individual columns (in addition to the other variables), in effect like this:

compound    neg    neu      pos
0.4404      0.0    0.919    0.081
0.4404      0.0    0.256    0.744

Among other things, I've tried using DataFrame.from_dict and searched other answers on here, but nothing seems applicable.

  • The original DataFrame has a column called messages. That is what the function is applied to. Commented Jan 12, 2016 at 12:32

1 Answer

If the values in the sentiment column are strings, you can apply ast.literal_eval to convert each one to a dictionary and then wrap it in a Series:

import ast
import pandas as pd

print df

#                                           sentiment  tmp
#0  {'compound': 0.4404, 'neg': 0.0, 'neu': 0.919,...   aa
#1  {'compound': 0.4404, 'neg': 0.0, 'neu': 0.256,...  sss

print type(df['sentiment'][0])

#<type 'str'>

df1 = df['sentiment'].apply(lambda x: pd.Series(ast.literal_eval(x)))
print df1

#   compound  neg    neu    pos
#0    0.4404    0  0.919  0.081
#1    0.4404    0  0.256  0.744

If the values in the sentiment column are already dictionaries, you can build a DataFrame from them directly:

print df['sentiment']

#0    {u'neg': 0.0, u'neu': 0.919, u'pos': 0.081, u'...
#1    {u'neg': 0.0, u'neu': 0.256, u'pos': 0.744, u'...

print type(df['sentiment'][0])

#<type 'dict'>

print pd.DataFrame(x for x in df['sentiment'])

#   compound  neg    neu    pos
#0    0.4404    0  0.919  0.081
#1    0.4404    0  0.256  0.744
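
For readers on Python 3 with a recent pandas, the two cases above can be handled in one step, with the new columns joined back onto the original frame. This is only a sketch under stated assumptions: the helper name expand_sentiment, the sample rows, and the tmp column values are illustrative and not part of the original question.

```python
import ast

import pandas as pd


def expand_sentiment(df, column="sentiment"):
    """Return a copy of df with the dicts in `column` expanded into columns.

    Hypothetical helper: accepts real dicts as well as their string form.
    """
    # Parse string values with ast.literal_eval; leave dicts untouched.
    parsed = df[column].apply(
        lambda v: ast.literal_eval(v) if isinstance(v, str) else v
    )
    # Build the score columns from a list of dicts, reusing the original
    # index so that join() lines the rows up correctly.
    scores = pd.DataFrame(parsed.tolist(), index=df.index)
    return df.drop(columns=[column]).join(scores)


# Assumed sample data: one row stored as a string, one as a dict.
df = pd.DataFrame({
    "sentiment": [
        "{'compound': 0.4404, 'neg': 0.0, 'neu': 0.919, 'pos': 0.081}",
        {"compound": 0.4404, "neg": 0.0, "neu": 0.256, "pos": 0.744},
    ],
    "tmp": ["aa", "sss"],
})

result = expand_sentiment(df)
print(result)
```

Passing index=df.index matters when the original frame has a non-default index; without it, join() would align on mismatched labels and could produce NaN rows.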

4 Comments

I can see how this is perfect, thanks. Any idea about the error I get now: ValueError: malformed string?
I think the string is not made up of key: value pairs, so it cannot be converted.
Does print type(df['sentiment'][0]) show a string?
My mistake. sentiment is now a dict. I can now do new = pd.DataFrame(x for x in df['sentiment']) and join new with the original dataset. Thanks so much for persisting with this. :)
