
I've been trying to convert a pandas dataframe column of list elements to JSON and push it to Snowflake as a VARIANT, but I'm stuck on the first step.

I have a pandas dataframe with an ID and a conversation transcript, which looks like this.

Sample dataframe:

ID   transcript
1     ['Joe([email protected]): Hey', 'Smoe([email protected]): Hey!! How are you doing?', 'Joe([email protected]): I'm doing good']

I have multiple rows (conversation transcripts with different IDs) in the same format.

Expected dataframe:

    ID   transcript
    1     {'Joe([email protected]): Hey', 'Smoe([email protected]): Hey!! How are you doing?', 'Joe([email protected]): I'm doing good'}

I tried to convert each individual object to JSON, but a list object has no attribute 'to_json':

df['transcript_json'] = df['transcript'].apply(lambda x: x.to_json())

I also tried converting the whole column into a JSON object, which gave me one big string but didn't get me any closer to what I want.

transcript_list = df['transcript'].to_json()

{"0":["Joe([email protected]): Hey", "Smoe([email protected]): Hey!! How are you doing?", "Joe([email protected]): I'm doing good"]}

I know I'm missing something small here. Any ideas on how to do it would be much appreciated.

  • Apparently, your expected dataframe contains invalid JSON. If you need multiple values, you need to use an array instead of a dict. For instance, the expected JSON would be: ['Joe([email protected]): Hey', 'Smoe([email protected]): Hey!! How are you doing?', 'Joe([email protected]): I'm doing good'] Commented May 19, 2021 at 15:42
  • Ah... I see. That makes sense. I don't have a key for that JSON object. Commented May 19, 2021 at 16:32
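
Following the comments above, here is a minimal sketch of serializing each row's list as a JSON array string with the standard-library json.dumps (a plain list has no to_json method). The sample row, the shortened speaker names without email addresses, and the transcript_json column name are assumptions for illustration only.

```python
import json

import pandas as pd

# Illustrative sample data (assumed): each cell holds a Python list of strings.
df = pd.DataFrame({
    "ID": [1],
    "transcript": [["Joe: Hey", "Smoe: Hey!! How are you doing?", "Joe: I'm doing good"]],
})

# Serialize every list to a JSON array string, one string per row.
df["transcript_json"] = df["transcript"].apply(json.dumps)

print(df.loc[0, "transcript_json"])
```

Each cell of transcript_json is then a string holding a valid JSON array, which should round-trip through json.loads back to the original list.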

2 Answers


It's not clear what you want the end result to be. Your expected output only changes the [] of the original to {}. If you want a dictionary with usable key:value pairs, here's a hacky way to turn the string into a dictionary. The problem is that whenever the same speaker (the key) appears more than once, only that speaker's last message is kept.

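import io
import json
import pandas as pd

data='''
ID   transcript
1   ['Joe([email protected]): Hey', 'Smoe([email protected]): Hey!! How are you doing?', 'Joe([email protected]): I'm doing good']
'''
# Columns are separated by three spaces; a multi-character separator needs the python engine.
df = pd.read_csv(io.StringIO(data), sep='   ', engine='python')
# Rewrite the list-style string into JSON object syntax.
df['transcript'] = df['transcript'].apply(lambda x: x.replace(': ', '": "').replace("['", '{"').replace("']", '"}').replace("', '", '", "'))
print(df['transcript'].apply(lambda x: type(x)))
# Duplicate keys collapse here: json.loads keeps the last value for each key.
print(df['transcript'].apply(lambda x: json.loads(x)))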

Output

0    <class 'str'>
Name: transcript, dtype: object

0    {'Joe([email protected])': 'I'm doing good', 'Smoe([email protected])': 'Hey!! How are you doing?'}
Name: transcript, dtype: object

What format do you need that list object to really be in so you don't lose any data? Can it be a list of properly formatted key:value pairs?
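
One possible shape that keeps every message is a list of records, one per line of the conversation. The sketch below is only an illustration: the to_records helper, the "speaker" and "message" field names, and the shortened sample strings are assumptions, not something taken from the question.

```python
import json


def to_records(transcript):
    """Turn 'speaker: message' strings into a list of speaker/message dicts."""
    records = []
    for item in transcript:
        # partition splits on the first ': ' only, so later colons stay in the message.
        # If an item has no ': ' at all, the whole item becomes the speaker
        # and the message is an empty string.
        speaker, _, message = item.partition(": ")
        records.append({"speaker": speaker, "message": message})
    return records


# Assumed sample data with email addresses omitted.
transcript = ["Joe: Hey", "Smoe: Hey!! How are you doing?", "Joe: I'm doing good"]
records = to_records(transcript)
payload = json.dumps(records)
print(payload)
```

With a dataframe, something like df['transcript'].apply(to_records).apply(json.dumps) would apply the same idea to each row; repeated speakers are kept because each message gets its own record.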


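Split each item in the list on the first ':' to get a key and a value, build a dictionary from those pairs, and then use json.dumps to serialize it to a JSON string. Limiting the split to one occurrence keeps messages that themselves contain a colon from raising an unpacking error.

import json
df.transcript.apply(lambda x: {key: value.strip() for key, value in [item.split(':', 1) for item in x]}).apply(json.dumps)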

OUTPUT:

'{"Joe([email protected])": "I\'m doing good", "Smoe([email protected])": "Hey!! How are you doing?"}'
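
The output above keeps only Joe's last message, because dictionary keys are unique. If a per-speaker dictionary is still the goal, a hedged variant is to collect each speaker's messages in a list. The group_by_speaker helper and the shortened sample strings below are hypothetical, for illustration only.

```python
import json
from collections import defaultdict


def group_by_speaker(transcript):
    """Map each speaker to the list of their messages, in order of appearance."""
    grouped = defaultdict(list)
    for item in transcript:
        # Split on the first ':' only; strip the space that follows it.
        speaker, _, message = item.partition(":")
        grouped[speaker].append(message.strip())
    return dict(grouped)


# Assumed sample data with email addresses omitted.
transcript = ["Joe: Hey", "Smoe: Hey!! How are you doing?", "Joe: I'm doing good"]
grouped = group_by_speaker(transcript)
grouped_json = json.dumps(grouped)
print(grouped_json)
```

This keeps every message, but it loses the interleaving between speakers, so it only fits if the order of the conversation as a whole does not matter.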
