I have a pandas dataframe as shown below.

DF1 =

sid  path
1    '["rome","is","in","province","lazio"]'
1    "['rome', 'is', 'in', 'province', 'naples']"
1    ['N']
1    "['rome', 'is', 'in', 'province', 'in', 'campania']"
....

I want to strip the brackets, quotes, and commas from the path column so that the result looks like this:

DF2 =

    sid                  path
     1         rome is in province lazio
     1         rome is in province naples
     1                    N
     1         rome is in province in campania
 ....

I tried replacing all the unnecessary characters like this:

 DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")

But it didn't work. I suppose it's due to entries like ['N'].

How can I do this? Any help is appreciated!

  • Why is ['N'] not quoted? Is it a list containing a string or is it supposed to be "['N']"? Commented Jun 18, 2018 at 15:02
  • ['N'] is a list in this case. Commented Jun 18, 2018 at 15:07
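
A note on why the chained replace in the question has no effect: Series.replace with a plain string compares against whole cell values, not substrings, so "[" never matches anything. The sketch below illustrates this on a toy column that contains only string-encoded lists (an assumption made for the demo), and contrasts it with the substring-based .str.replace. The .str methods only operate on strings, so cells holding a genuine list such as ['N'] would still need separate handling.

```python
import pandas as pd

# Toy column with string-encoded lists only (no genuine list objects).
s = pd.Series(['["rome","is","in"]', "['rome', 'is', 'in']"])

# Series.replace with a plain string looks for cells equal to "[",
# so nothing matches and the column comes back unchanged.
unchanged = s.replace("[", "")
print(unchanged.equals(s))

# The .str accessor works on substrings: the first pass strips brackets
# and quotes, the second turns commas (plus any spaces) into one space.
cleaned = (
    s.str.replace(r"[\[\]\"']", "", regex=True)
     .str.replace(r",\s*", " ", regex=True)
)
print(cleaned.tolist())
```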

2 Answers

Use ast.literal_eval and str.join.

Demo:

import pandas as pd
import ast
df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]', "['rome', 'is', 'in', 'province', 'naples']", ['N']]})
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)

Output:

                         path
0   rome is in province lazio
1  rome is in province naples
2                           N

1 Comment

Yup, that might work, though it's a bit roundabout since you convert list -> str -> list.
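
One way to avoid the list -> str -> list round trip mentioned in the comment is to parse only the cells that are actually strings. This is a minimal sketch, not part of either answer; the helper name to_text is hypothetical, and it assumes every cell is either a list of strings or a string encoding such a list.

```python
import ast

import pandas as pd


def to_text(value):
    """Hypothetical helper: join a list, parsing it first if it is a string."""
    if isinstance(value, str):
        # Only string-encoded lists go through literal_eval.
        value = ast.literal_eval(value)
    return " ".join(value)


df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N']]})
df["path"] = df["path"].apply(to_text)
print(df)
```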

You can use ast.literal_eval to safely parse lists that are stored as strings. One way to account for entries that are already genuine lists is to catch the ValueError that literal_eval raises when it is given a list.

Note that, if at all possible, you should try to sort these issues upstream before they reach your dataframe.

from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def converter(x):
    try:
        return ' '.join(literal_eval(x))
    except ValueError:
        return ' '.join(x)

df['path'] = df['path'].apply(converter)

print(df)

                              path  sid
0        rome is in province lazio    1
1       rome is in province naples    1
2                                N    1
3  rome is in province in campania    1

2 Comments

Is there any difference between ast.literal_eval and plain eval?
@BubbleBubbleBubbleGut, Yes, eval is unsafe and not recommended.
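
To make the difference concrete, here is a small sketch using only the standard library. ast.literal_eval accepts Python literals (strings, numbers, lists, dicts and so on) and refuses anything else, whereas eval would execute the expression. The expression string below is an illustrative example, not something from the question's data.

```python
import ast

# A string-encoded list is a plain literal, so it parses fine.
parsed = ast.literal_eval("['rome', 'is', 'in']")
print(parsed)

# A function call is not a literal. eval() would run it;
# literal_eval refuses and raises ValueError instead.
expression = "__import__('os').getcwd()"
try:
    ast.literal_eval(expression)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```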
