I want to import a JSON file (even though it looks more like a plain text file) where each line is a small JSON object with four key/value pairs. Each JSON object should become one row in a pandas DataFrame with four columns.
Example:
# Inside "data.json"
{"time": "2020-07-01:14:27:16.0000", "id": "m38dk117", "position": "66277", "active_current": "17.1"}
{"time": "2020-07-01:14:27:16.0000", "id": "m38dk118", "position": "3277", "active_current": "0.0"}
...
{"time": "2020-07-30:14:27:16.0000", "id": "m38dk006", "position": "73117", "active_current": "0.0"}
data.json is approx. 30 MB and contains roughly 250,000 lines for a single day.
data.json is approx. 900 MB and contains roughly 7.5M lines for a full month.
The following code snippet does the job but is far too slow. Alternatives to pandas are welcome too; I am not limited to pandas, but I am inexperienced in dealing with large amounts of log data.
My current attempt:
import pandas as pd
import json

df = pd.DataFrame()
with open('data.json', 'r') as stacked_json_file:
    row_idx = -1
    for json_line in stacked_json_file:
        row_idx += 1
        # Parse one JSON object per line and append it as a new row
        df = df.append(pd.DataFrame(json.loads(json_line), index=[row_idx]))
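One variant I am considering, to avoid the per-row append entirely: parse every line into a list of dicts first, then construct the DataFrame once at the end. A rough sketch (untested on the full monthly file):

import pandas as pd
import json

# Collect all parsed rows first, then build the DataFrame in one go,
# so pandas does not have to copy the whole frame on every appended row.
rows = []
with open('data.json', 'r') as stacked_json_file:
    for line in stacked_json_file:
        line = line.strip()
        if line:                      # skip any blank lines
            rows.append(json.loads(line))

df = pd.DataFrame(rows)              # columns: time, id, position, active_current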
Is this possibly slow because pd.DataFrame.append does not append in-place, i.e. it copies the whole DataFrame on every iteration?
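Or would it be more idiomatic to let pandas read this line-delimited ("JSON Lines") format directly? A sketch using pd.read_json with lines=True, optionally in chunks for the 900 MB monthly file (the chunk size of 100,000 is an arbitrary example value):

import pandas as pd

# pandas understands line-delimited JSON natively
df = pd.read_json('data.json', lines=True)

# For the ~900 MB monthly file, reading in chunks keeps memory bounded
chunks = pd.read_json('data.json', lines=True, chunksize=100_000)
df = pd.concat(chunks, ignore_index=True)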
