Hello, I am working with a JSON file that contains several conversations. Each complete conversation is enclosed in a pair of square brackets, as follows:
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
I would like to parse this JSON to get the user's questions in one column and match them with the answers, which are always provided by the Bank, in a second column. For the first conversation, that would be:
All User Comments:
"Good afternoon, I have issues to make payments from the app, the sms with the corresponding key and token has not arrived, Could someone please help me?, I am callig to CC and they don't answer"
All Answers:
"Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
My desired output is to parse the whole JSON to build these two columns. Note that everything can be sorted by date and time. In order to get this I tried:
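import json
import numpy as np
import pandas as pd
with open('/home/adolfo/Desktop/CONVERSATIONS/test2.json') as json_data:
    d = json.load(json_data)  # list of conversations, each a list of message dicts
# np.concatenate flattens the nested lists into one flat array of dicts
df = pd.DataFrame.from_records(np.concatenate(d))
print(df)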
However, I got:
created from \
0 2017-02-02T11:57:41+0000 Bank
1 2017-02-01T22:19:58+0000 Alex
2 2017-02-01T22:19:42+0000 Alex
3 2017-02-01T22:19:28+0000 Alex
4 2017-02-01T22:19:18+0000 Alex
5 2017-02-02T11:57:41+0000 Bank
6 2017-02-01T22:19:58+0000 Alex
7 2017-02-01T22:19:42+0000 Alex
8 2017-02-01T22:19:28+0000 Alex
9 2017-02-01T22:19:18+0000 Alex
10 2017-02-01T22:19:12+0000 Bank
11 2017-02-01T16:22:30+0000 Alex
message
0 Hi Alex, if you have not perform the modificat...
1 Could someone please help me?, I am callig to ...
2 the sms with the corresponding key and token h...
3 I have issues to make payments from the app
4 Good afternoon
5 Hi Alex, if you have not perform the modificat...
6 Could someone please help me?, I am callig to ...
7 the sms with the corresponding key and token h...
8 I have issues to make payments from the app
9 Good afternoon
10 Hello Alexander, the money is available to be...
11 hello they have deposited the money into my ac...
I would really appreciate help with this task. This is an example of the JSON:
[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}
]
]
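To make the target concrete, here is a minimal sketch of the two-column result described above. It is only one possible approach, not a confirmed solution. It assumes the file has the nesting shown in the example (a list of conversations, each a list of messages). The inline `raw` string with shortened messages, the `conversation` and `role` columns, and the variable names are all made up for illustration.

```python
import json

import pandas as pd

# Shortened stand-in for the file contents (assumption: same nesting as the example).
raw = """
[
  [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex, please verify your DNI."},
    {"created": "2017-02-01T22:19:28+0000", "from": "Alex ", "message": "I have issues to make payments from the app"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}
  ],
  [
    {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": " Hello Alexander, the money is available."},
    {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "Could I know if I can withdraw the money?"}
  ]
]
"""
conversations = json.loads(raw)

# Flatten to one row per message, remembering which conversation it came from.
rows = [
    dict(msg, conversation=i)
    for i, conv in enumerate(conversations)
    for msg in conv
]
df = pd.DataFrame(rows)

# Parse timestamps and strip stray spaces (e.g. "Alex " in the sample data).
df["created"] = pd.to_datetime(df["created"])
df["from"] = df["from"].str.strip()
df["message"] = df["message"].str.strip()

# Anything not sent by the Bank is treated as a user comment.
df["role"] = df["from"].eq("Bank").map(
    {True: "All Answers", False: "All User Comments"}
)

# Sort chronologically, then join the messages per conversation and role.
result = (
    df.sort_values("created")
      .groupby(["conversation", "role"])["message"]
      .agg(", ".join)
      .unstack(fill_value="")
)
print(result)
```

With the real file, `raw` would be replaced by the contents read from disk. Joining with `", "` mirrors the desired output above; a different separator may be easier to read when messages themselves contain commas.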
After some useful feedback here, I tried:
df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')
df.created = pd.to_datetime(df.created)
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
But I got:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-8881c5d91cd0> in <module>()
63 df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json')
64
---> 65 df.created = pd.to_datetime(df.created)
66
67 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')
/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in __getattr__(self, name)
2742 if name in self._info_axis:
2743 return self[name]
-> 2744 return object.__getattribute__(self, name)
2745
2746 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'created'
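One plausible reading of this traceback (an assumption, not something I have confirmed): because the file is a list of lists, the loaded frame probably has one row per conversation and integer column labels, so there is no `created` column to access. The sketch below uses `pd.DataFrame` on already-parsed data as a stand-in for `read_json`, with hypothetical shortened messages, and contrasts it with flattening one level first.

```python
import pandas as pd

# Hypothetical, shortened data with the same list-of-lists nesting as the file.
nested = [
    [
        {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex"},
        {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"},
    ],
    [
        {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander"},
    ],
]

# Building a frame directly from the nested lists: one row per conversation,
# integer column labels, and a whole message dict in each cell.
wide = pd.DataFrame(nested)
print(list(wide.columns))

# Flattening one level first: one row per message, with named columns.
flat = pd.DataFrame([msg for conv in nested for msg in conv])
print(list(flat.columns))
```

If this is what is happening, the nested lists would need to be flattened before `df.created` can work.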
