
Hello, I am working with a JSON file that contains several conversations. Each complete conversation is enclosed in its own pair of brackets, in the following format:

[
    {
        "created": "2017-02-02T11:57:41+0000",
        "from": "Bank",
        "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
    },
    {
        "created": "2017-02-01T22:19:58+0000",
        "from": "Alex ",
        "message": "Could someone please help me?, I am callig to CC and they don't answer"
    },
    {
        "created": "2017-02-01T22:19:42+0000",
        "from": "Alex ",
        "message": "the sms with the corresponding key and token has not arrived"
    },
    {
        "created": "2017-02-01T22:19:28+0000",
        "from": "Alex ",
        "message": "I have issues to make payments from the app"
    },
    {
        "created": "2017-02-01T22:19:18+0000",
        "from": "Alex ",
        "message": "Good afternoon"
    }
],

I would like to parse this JSON so that the questions end up in one column and the answers, which are always provided by the Bank, end up in a second column. For the first interaction this would be:

All User Comments:

"Good afternoon, I have issues to make payments from the app, the sms with the corresponding key and token has not arrived, Could someone please help me?, I am callig to CC and they don't answer"

All Answers:

"Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
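The pairing described above can be sketched for a single conversation in plain Python. This is a minimal sketch, assuming the timestamp format shown in the sample (%Y-%m-%dT%H:%M:%S%z), that every sender other than "Bank" is a user, and that user comments are joined with ", " as in the desired output; summarize is a hypothetical helper name.

```python
from datetime import datetime

# Format of the "created" field, e.g. "2017-02-01T22:19:18+0000"
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# The first conversation from the question, newest message first as in the file
conversation = [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank",
     "message": "Hi Alex, if you have not perform the modification to the data, "
                "please verify your DNI, celphone and the operator to verify it. Thanks."},
    {"created": "2017-02-01T22:19:58+0000", "from": "Alex ",
     "message": "Could someone please help me?, I am callig to CC and they don't answer"},
    {"created": "2017-02-01T22:19:42+0000", "from": "Alex ",
     "message": "the sms with the corresponding key and token has not arrived"},
    {"created": "2017-02-01T22:19:28+0000", "from": "Alex ",
     "message": "I have issues to make payments from the app"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ",
     "message": "Good afternoon"},
]


def summarize(conversation, bank_name="Bank"):
    """Hypothetical helper: return (all_user_comments, all_answers) for one conversation."""
    # Oldest message first, using the parsed timestamp rather than file order
    ordered = sorted(conversation,
                     key=lambda m: datetime.strptime(m["created"], TS_FORMAT))
    # Sender names may carry trailing spaces ("Alex "), so strip before comparing
    comments = [m["message"].strip() for m in ordered if m["from"].strip() != bank_name]
    answers = [m["message"].strip() for m in ordered if m["from"].strip() == bank_name]
    return ", ".join(comments), ", ".join(answers)


all_user_comments, all_answers = summarize(conversation)
print(all_user_comments)
print(all_answers)
```

Sorting on the parsed datetime rather than on the raw string keeps the order correct even if the real data mixes UTC offsets.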

My desired output is to parse the whole JSON into these two columns. Note that the messages can be sorted by their date and time. To get this, I tried:

import json

import numpy as np
import pandas as pd

with open('/home/adolfo/Desktop/CONVERSATIONS/test2.json') as json_data:
    d = json.load(json_data)
    df = pd.DataFrame.from_records(np.concatenate(d))

print(df)

However, I got:

                     created   from  \
0   2017-02-02T11:57:41+0000   Bank   
1   2017-02-01T22:19:58+0000  Alex    
2   2017-02-01T22:19:42+0000  Alex    
3   2017-02-01T22:19:28+0000  Alex    
4   2017-02-01T22:19:18+0000  Alex    
5   2017-02-02T11:57:41+0000   Bank   
6   2017-02-01T22:19:58+0000  Alex    
7   2017-02-01T22:19:42+0000  Alex    
8   2017-02-01T22:19:28+0000  Alex    
9   2017-02-01T22:19:18+0000  Alex    
10  2017-02-01T22:19:12+0000   Bank   
11  2017-02-01T16:22:30+0000   Alex   

                                              message  
0   Hi Alex, if you have not perform the modificat...  
1   Could someone please help me?, I am callig to ...  
2   the sms with the corresponding key and token h...  
3         I have issues to make payments from the app  
4                                      Good afternoon  
5   Hi Alex, if you have not perform the modificat...  
6   Could someone please help me?, I am callig to ...  
7   the sms with the corresponding key and token h...  
8         I have issues to make payments from the app  
9                                      Good afternoon  
10   Hello Alexander, the money is available to be...  
11  hello they have deposited the money into my ac...  
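np.concatenate flattens every conversation into one long list of messages, so the resulting frame no longer records which conversation each message came from, and the rows stay in file order (newest first). A sketch of one way to keep that information, assuming the nested list-of-lists layout of test2.json; the conversation column name is an illustrative choice, and the messages are truncated:

```python
import pandas as pd

# Two conversations in the nested layout of test2.json (messages truncated)
conversations = [
    [
        {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex, if you have not perform"},
        {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"},
    ],
    [
        {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander, the money is available"},
        {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "hello they have deposited the money"},
    ],
]

# Copy each message and tag it with the index of its conversation,
# instead of flattening with np.concatenate, which drops that information
records = [
    dict(msg, conversation=conv_id)
    for conv_id, conv in enumerate(conversations)
    for msg in conv
]
df = pd.DataFrame.from_records(records)
df["created"] = pd.to_datetime(df["created"])

# Oldest message first within each conversation
df = df.sort_values(["conversation", "created"]).reset_index(drop=True)
print(df[["conversation", "created", "from", "message"]])
```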

I would really appreciate support with this task. Here is an example of the JSON:

[
    [
        {
            "created": "2017-02-02T11:57:41+0000",
            "from": "Bank",
            "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
        },
        {
            "created": "2017-02-01T22:19:58+0000",
            "from": "Alex ",
            "message": "Could someone please help me?, I am callig to CC and they don't answer"
        },
        {
            "created": "2017-02-01T22:19:42+0000",
            "from": "Alex ",
            "message": "the sms with the corresponding key and token has not arrived"
        },
        {
            "created": "2017-02-01T22:19:28+0000",
            "from": "Alex ",
            "message": "I have issues to make payments from the app"
        },
        {
            "created": "2017-02-01T22:19:18+0000",
            "from": "Alex ",
            "message": "Good afternoon"
        }
    ],
    [
        {
            "created": "2017-02-01T22:19:12+0000",
            "from": "Bank",
            "message": " Hello Alexander, the money is available to be  withdrawn, you could go to any store the number is 70307002459"
        },
        {
            "created": "2017-02-01T16:22:30+0000",
            "from": "Alex",
            "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
        }
    ]
]

After some useful feedback here, I tried:

import numpy as np
import pandas as pd

df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')

df.created = pd.to_datetime(df.created)

df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')

but I got:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-44-8881c5d91cd0> in <module>()
     63 df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')
     64 
---> 65 df.created = pd.to_datetime(df.created)
     66 
     67 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')

/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   2742             if name in self._info_axis:
   2743                 return self[name]
-> 2744             return object.__getattribute__(self, name)
   2745 
   2746     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'created'
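The AttributeError comes from the shape of the data: test2.json is a list of conversations, each of which is a list of messages, so a frame built directly from it has one column per message position instead of the fields created, from and message. The sketch below reproduces this with the DataFrame constructor on a small nested sample; it assumes pd.read_json lays this input out the same way, which is consistent with the traceback.

```python
import json

import pandas as pd

# Minimal nested file contents: a list of conversations, each a list of messages
text = """[
  [{"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex"},
   {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}],
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander"},
   {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "hello"}]
]"""
data = json.loads(text)

# Built straight from the nested list: one row per conversation,
# one column per message position, each cell holding a whole message dict
naive = pd.DataFrame(data)
print(naive.columns.tolist())      # integer positions, not field names
print("created" in naive.columns)

# One frame per conversation, concatenated: the message fields become columns
fixed = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(data)})
print(fixed.columns.tolist())
```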
  • It is not clear to me what is wrong with the default parsing. Commented Mar 21, 2017 at 23:53
  • @StephenRauch the problem with the initial approach is that it is not sorted the way I need. I just need two columns: one with all the comments and another with all the answers, in the appropriate order. Commented Mar 21, 2017 at 23:55
  • How do you distinguish between comments and answers? Commented Mar 21, 2017 at 23:56
  • @StephenRauch the answers are provided by the Bank, sorry if that was not clear; everything else is provided by users. Commented Mar 21, 2017 at 23:57

1 Answer

import json

import numpy as np
import pandas as pd

j = """[
    [
        {
            "created": "2017-02-02T11:57:41+0000",
            "from": "Bank",
            "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
        },
        {
            "created": "2017-02-01T22:19:58+0000"   ,
            "from": "Alex ",
            "message": "Could someone please help me?, I am callig to CC and they don't answer"
        },
        {
            "created": "2017-02-01T22:19:42+0000",
            "from": "Alex ",
            "message": "the sms with the corresponding key and token has not arrived"
        },
        {
            "created": "2017-02-01T22:19:28+0000",
            "from": "Alex ",
            "message": "I have issues to make payments from the app"
        },
        {
            "created": "2017-02-01T22:19:18+0000",
            "from": "Alex ",
            "message": "Good afternoon"
        }
    ],
    [
        {
            "created": "2017-02-01T22:19:12+0000",
            "from": "Bank",
            "message": " Hello Alexander, the money is available to be  withdrawn, you could go to any store the number is 70307002459"
        }, 
        {            
            "created": "2017-02-01T16:22:30+0000",
            "from": "Alex",
            "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
        }

    ]


]"""

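js = json.loads(j)
# One DataFrame per conversation; the dict keys become the outer index level
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)})

df.created = pd.to_datetime(df.created)

# Label each message, then pivot so answers and questions sit in separate columns
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')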
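The pivot above yields one row per message. To get the two columns exactly as asked in the question (one row per conversation, with the user comments joined in chronological order next to the Bank's answers), one option is to group by conversation instead of pivoting on created. That also avoids unstack, which raises a ValueError when two messages share the same timestamp and label. A sketch under those assumptions, with truncated messages and column labels taken from the question:

```python
import json

import numpy as np
import pandas as pd

# Same nested layout as the example file (messages truncated for brevity)
js = json.loads("""[
  [{"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex, if you have not perform"},
   {"created": "2017-02-01T22:19:28+0000", "from": "Alex ", "message": "I have issues to make payments from the app"},
   {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}],
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": " Hello Alexander, the money is available"},
   {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "hello they have deposited the money"}]
]""")

# One frame per conversation; the dict key becomes the "conversation" index level
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)},
               names=["conversation", "pos"]).reset_index()
df["created"] = pd.to_datetime(df["created"])
df["message"] = df["message"].str.strip()
df["role"] = np.where(df["from"].str.strip() == "Bank", "All Answers", "All User Comments")

# Oldest first, then join the messages of each role within each conversation
table = (df.sort_values(["conversation", "created"])
           .groupby(["conversation", "role"])["message"]
           .agg(", ".join)
           .unstack(fill_value=""))
table = table[["All User Comments", "All Answers"]]
print(table)
```

groupby keeps the row order inside each group, so sorting once beforehand is enough to get the chronological join.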


5 Comments

Hello, I really appreciate the support; however, I got an error when I tried it with the real data set. I am going to update the question with the issue I got.
Your example works really well, but when I use the path to the real data I get an error, and I do not know why.
@neo33 edit your post with the output of print(df.head())
@neo33 I see the issue: the JSON you posted at the beginning is in a different format.
@pirRSquared, could you please help me adapt the answer to my format? I don't understand the issue.
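On the format mismatch: the first snippet in the question ends with "],", which suggests the real file may hold several conversation arrays separated by commas without an outer pair of brackets, or just a single conversation. The loader below is a hedged sketch that normalizes those assumed layouts into a list of conversations; load_conversations is a hypothetical helper, and the layouts are guesses, since the real file was not posted.

```python
import json


def load_conversations(text):
    """Hypothetical helper: return a list of conversations (lists of message dicts).

    Assumed input layouts:
      * a JSON array of conversations (the layout of the example file),
      * a single conversation (a JSON array of message dicts), possibly
        followed by a trailing comma,
      * several conversation arrays separated by commas, with no outer brackets.
    """
    stripped = text.strip().rstrip(",")
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Comma-separated top-level arrays: wrap them in one outer array
        data = json.loads("[" + stripped + "]")
    # A single conversation is a list whose items are message dicts
    if data and isinstance(data[0], dict):
        data = [data]
    return data
```

Its result can stand in for js in the pd.concat line of the answer, after reading the file at /home/adolfo/Desktop/CONVERSATIONS/test2.json with open(...).read().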
