I have a pandas dataframe with a column structured as follows:

  sequences
-------------
[(1838, 2038)]
[]
[]
[(809, 1090)]

I need to loop row by row, so I structured the loop as follows:

import json

for index, row in df.iterrows():
    true_anom_seq = json.loads(row['sequences'])

What I want is a nested list like [[1838, 2038], [], [], [809, 1090]] so that I can iterate through it. The problem is that my code raises this error:

JSONDecodeError: Expecting value: line 1 column 2 (char 1)

I also tried printing row['sequences'][0], and it prints [, so the value is being read as a string.

How can I convert this string to a list?

3 Answers

import re

import pandas as pd

col = {'index': [1, 2, 3, 4], 'sequence': ['[(1838, 2038)]', '[]', '[]', '[(809, 1090)]']}
new_df = pd.DataFrame(col)

new_sequence = []
for index, row in new_df.iterrows():
    # Extract every run of digits and convert it to int (re.findall returns strings).
    # Note: several tuples in one cell are flattened into a single list, and minus signs are ignored.
    one_item = [int(match) for match in re.findall(r'\d+', row['sequence'])]
    new_sequence.append(one_item)

print(new_sequence)  # [[1838, 2038], [], [], [809, 1090]]

Use ast.literal_eval to safely parse a string containing a Python literal (list, tuple, dict, ...):

>>> from ast import literal_eval
>>> literal_eval('[1,2,3]')
[1, 2, 3]
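A short sketch on the exact strings from the question also explains the original error: the cells use Python tuple syntax, and JSON has no tuples, so json.loads stops at the ( at index 1, which is the "line 1 column 2 (char 1)" in the message. literal_eval understands tuples and only evaluates literals, not arbitrary code. The name cells is illustrative:

```python
import json
from ast import literal_eval

# The cell values from the question, as plain strings.
cells = ['[(1838, 2038)]', '[]', '[]', '[(809, 1090)]']

# json.loads rejects "(": JSON only has [...] arrays, not Python tuples.
try:
    json.loads(cells[0])
except json.JSONDecodeError as err:
    json_error = str(err)  # "Expecting value: line 1 column 2 (char 1)"

# literal_eval parses Python literals, including tuples.
parsed = [literal_eval(cell) for cell in cells]
print(parsed)  # [[(1838, 2038)], [], [], [(809, 1090)]]
```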

There is no need to iterate over the dataframe or use a regex. Just apply literal_eval to each value in the sequence column and wrap the result in a list:

from ast import literal_eval
import pandas as pd

col = {'index': [1, 2, 3, 4], 'sequence': ['[(1838, 2038)]', '[]', '[]', '[(809, 1090)]']}
new_df = pd.DataFrame(col)

print(list(new_df.sequence.apply(literal_eval)))
# [[(1838, 2038)], [], [], [(809, 1090)]]
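That result still wraps each pair in a tuple inside a list, e.g. [(1838, 2038)], rather than the [[1838, 2038], [], [], [809, 1090]] asked for in the question. A minimal sketch that closes the gap, assuming that flattening is acceptable when a cell holds more than one tuple; the helper name to_flat_list is hypothetical. It converts the column once and then loops row by row as in the question:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'sequences': ['[(1838, 2038)]', '[]', '[]', '[(809, 1090)]']})


def to_flat_list(cell):
    """Hypothetical helper: parse '[(1838, 2038)]' and flatten its tuples into one list.

    Assumption: flattening is acceptable, so '[(1, 2), (3, 4)]' becomes [1, 2, 3, 4].
    """
    return [value for pair in literal_eval(cell) for value in pair]


# Parse the whole column once instead of inside the loop.
df['sequences'] = df['sequences'].apply(to_flat_list)
nested = df['sequences'].tolist()
print(nested)  # [[1838, 2038], [], [], [809, 1090]]

# Each row now holds a real list that can be iterated directly.
for index, row in df.iterrows():
    for value in row['sequences']:
        print(index, value)
```

If a cell can hold several tuples and they must stay separate, returning [list(pair) for pair in literal_eval(cell)] instead keeps one inner list per tuple.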
