2

I have a CSV file containing some JSON strings and I want to parse it out and store into dataframe. The file looks like:

file1,"{\"A1\": {\"a\": \"123\"}, \"B1\": {\"b1\": \"456\", \"b2\": \"789\", \"b3\": \"000\"}}",
file2,"{\"A2\": {\"a\": \"321\"}, \"B2\": {\"b1\": \"654\", \"b2\": \"987\"}}"

After get the key in the dictionary. The dateframe I want will be:

 1              2                                    3           
file1   {"A1":{"a":"123"}}    {"B1":{"b1":"456","b2":"789","b3":"000"}}  
file2   {"A2":{"a2":"321"}}   {"B2":{"b1":"654","b2":"987"}}

The value in column 2 and columns 3 will be dictionary. I have tried:

pd.read_csv(file, quotechar='"', header=None) 

but it still separates my JSON in the wrong way...
Any suggestions? Many thanks!

5
  • The formatting of your question isn't really clear, but how far did you get with this already? Parsing csv is easy Commented Mar 11, 2017 at 15:15
  • 1
    If I used pd.read_csv(file), the third comma will separate the dictionary of "B1" into three parts. And the last part will be like "b3":"000"}} , which is not a json form. Commented Mar 11, 2017 at 15:21
  • Your json is quoted. You need to set the quote char, otherwise all commas are split, yes. Commented Mar 11, 2017 at 15:33
  • I have tried pd.read_csv(file,quotechar='"',header=None) It still separate my json in wrong way... Commented Mar 11, 2017 at 16:05
  • I don't have column in this file so I don't know how to use converter in pd.read_csv() ... Commented Mar 11, 2017 at 16:08

1 Answer 1

7

The data you have is using \" to escape a double quote within each cell. This behaviour can be specified by setting both doublequote=True and escapechar='\\' as parameters as follows:

df = pd.read_csv('input.json', doublequote=True, escapechar='\\')
print df

Giving you something like:

       0                                                  1     2
0  file1  {"A1": {"a": "123"}, "B1": {"b1": "456", "b2":...      
1  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...  None
   file1 {"A1": {"a": "123"}, "B1": {"b1": "456", "b2": "789", "b3": "000"}}  \
0  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...                    

   Unnamed: 2  
0         NaN
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.