I am trying to iterate through a JSON dict, write selected values into a pandas DataFrame, and then calculate the correlation between the two columns.
The DataFrame should look like this:
calc val
20.1 20
20.2 20
19.8 20
... ...
10.1 10
10.3 10
9.8 10
... ...
5.2 5
5.1 5
5.0 5
... ...
My JSON dict looks like this:
{
    "20um PSL": [
        {
            "imgsize": 20.886688245888603,
            "trigsize": 20.87416236786009,
            ...
        },
        {
            "imgsize": 20.886688245888603,
            "trigsize": 20.87416236786009,
            ...
        },
        {...},
        ...
    ],
    "10um PSL": [
        {
            "imgsize": 10.886688245888603,
            "trigsize": 10.87416236786009,
            ...
        },
        {...},
        ...
    ],
    "5um PSL": [
        {
            "imgsize": 5.886688245888603,
            "trigsize": 5.87416236786009,
            ...
        },
        {...},
        ...
    ]
}
This is my code so far:
sizes = ['20um PSL', '10um PSL', '5um PSL']
for file in json_data[sizes[0]]:
    particles_0 = pd.DataFrame({'calc': file['trigsize'], 'val': sizes_list[0]})
for file in json_data[sizes[1]]:
    particles_1 = pd.DataFrame({'calc': file['trigsize'], 'val': sizes_list[1]})
for file in json_data[sizes[2]]:
    particles_2 = pd.DataFrame({'calc': file['trigsize'], 'val': sizes_list[2]})
df = particles_0.append([particles_1, particles_2])
df.reset_index(drop=True, inplace=True)  # reset the index
My difficulty is that only the last 'trigsize' value of each group ends up in the DataFrame. I am aware that the fault lies in my loop and that I need to change the iteration, but as a Python beginner I cannot find the logical solution. In the end I need to calculate how the numbers in calc correlate with val (which in this case is always 20, 10 or 5). Is there a better way to do this than creating a DataFrame?
DataFrame so far:
calc val
20.1 20
20.1 20
20.1 20
... ...
10.1 10
10.1 10
10.1 10
... ...
5.0 5
5.0 5
5.0 5
... ...
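For reference, here is a minimal sketch of what I think the loop should be doing: collect one row per particle in a list and build the DataFrame once, instead of overwriting the same variable on every iteration. The sample values in json_data and the nominal_sizes mapping (group name to nominal size) are made up for illustration and are not my real data; I assume each entry has a numeric 'trigsize'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, shaped like the JSON shown in the question.
json_data = {
    "20um PSL": [{"trigsize": 20.1}, {"trigsize": 20.2}, {"trigsize": 19.8}],
    "10um PSL": [{"trigsize": 10.1}, {"trigsize": 10.3}, {"trigsize": 9.8}],
    "5um PSL": [{"trigsize": 5.2}, {"trigsize": 5.1}, {"trigsize": 5.0}],
}

# Assumed mapping from group name to its nominal size (the 'val' column).
nominal_sizes = {"20um PSL": 20, "10um PSL": 10, "5um PSL": 5}

# Collect one row per particle; nothing is overwritten between iterations.
rows = [
    {"calc": particle["trigsize"], "val": nominal}
    for name, nominal in nominal_sizes.items()
    for particle in json_data[name]
]

# Build the DataFrame once from the full list of rows.
df = pd.DataFrame(rows)

# Pearson correlation between the two columns.
pearson_r = df["calc"].corr(df["val"])

# The same coefficient via NumPy, if a DataFrame is not otherwise needed.
r_numpy = np.corrcoef(df["calc"], df["val"])[0, 1]

print(df)
print(pearson_r)
```

Building the frame from a list of dicts also avoids DataFrame.append, which was removed in pandas 2.0.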