Python pandas data frame code optimisation

Question

I'm looking to optimize my Python code. I query Elasticsearch (ES) and get a JSON response. I then iterate over the response and store the values in lists, which I assign as columns of a dataframe.

unmtchd_ESdata={"Response from Elasticsearch"}

    for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
        list6.append(unmtchd_ESdata['avg'])
        list7.append(unmtchd_ESdata['key'])
        ....
        ....

    mkt_df=pd.DataFrame()
    mkt_df["market_avg_total_sales_count"]=list6
    mkt_df["pos_code"]=list7
    ...
    ....

At the end, mkt_df will have one column per list, with the values in the order in which they were appended. For example, if list7 is appended with the values [01200000129, 00980030003], the dataframe will look like the one below, and the same applies to the other lists:

   market_avg_total_sales_count     pos_code 
0                        329.75  01200000129 
1                         15.00  00980030003

My question: I'm reading many variables and I want them as dataframe values, but keeping N separate lists makes my program inefficient because all of these operations happen in memory. Any suggestions on how to achieve the same result with lower space and time complexity?

Edit: adding my JSON structure here:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 12170,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filtered": {
      "doc_count": 5,
      "POSCode": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "01200000129",
            "doc_count": 4,
            "POSCodeModifier": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "0",
                  "doc_count": 4,
                  "CSP": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": "5555",
                        "doc_count": 4,
                        "per_stock": {
                          "buckets": [
                            {
                              "key_as_string": "2018-02-26",
                              "key": 1519603200000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-05",
                              "key": 1520208000000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 10
                              }
                            },
                            {
                              "key_as_string": "2018-03-12",
                              "key": 1520812800000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 300
                              }
                            },
                            {
                              "key_as_string": "2018-03-19",
                              "key": 1521417600000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 1000
                              }
                            },
                            {
                              "key_as_string": "2018-03-26",
                              "key": 1522022400000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 9
                              }
                            }
                          ]
                        },
                        "market_week_metrics": {
                          "count": 4,
                          "min": 9,
                          "max": 1000,
                          "avg": 329.75,
                          "sum": 1319,
                          "sum_of_squares": 1090181,
                          "variance": 163810.1875,
                          "std_deviation": 404.7347124969639,
                          "std_deviation_bounds": {
                            "upper": 1139.2194249939278,
                            "lower": -479.71942499392776
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          },
          {
            "key": "00980030003",
            "doc_count": 1,
            "POSCodeModifier": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "0",
                  "doc_count": 1,
                  "CSP": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": "5555",
                        "doc_count": 1,
                        "per_stock": {
                          "buckets": [
                            {
                              "key_as_string": "2018-02-26",
                              "key": 1519603200000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-05",
                              "key": 1520208000000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 15
                              }
                            },
                            {
                              "key_as_string": "2018-03-12",
                              "key": 1520812800000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-19",
                              "key": 1521417600000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-26",
                              "key": 1522022400000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            }
                          ]
                        },
                        "market_week_metrics": {
                          "count": 1,
                          "min": 15,
                          "max": 15,
                          "avg": 15,
                          "sum": 15,
                          "sum_of_squares": 225,
                          "variance": 0,
                          "std_deviation": 0,
                          "std_deviation_bounds": {
                            "upper": 15,
                            "lower": 15
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

The values that I'm trying to fetch:

# Look up the bucket list once, then iterate over it directly
buckets = unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets']
for bucket in buckets:
    # Metrics of the first CSP bucket under the first POSCodeModifier bucket
    metrics = bucket['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']
    list6.append(metrics['avg'])
    list7.append(bucket['key'])
    list8.append(metrics['max'] - metrics['min'])
    list9.append(metrics['max'])
    list10.append(metrics['min'])

I have a feeling that the slow bit is the sequential list appends, not the dataframe series constructions. Are you sure that the bottleneck is in assigning pandas series from lists?
– jpp, Commented Apr 15, 2018 at 18:20
I also have a feeling that the slow bit is the sequential list appends, because I'm using around 20 such appends in my code.
– user7422128, Commented Apr 15, 2018 at 18:22
In that case, you might have to show us some of the JSON. Not all 20 keys and all rows, maybe 4 keys for 4 rows is enough. Then we can suggest a better way for you to build your dataframe.
– jpp, Commented Apr 15, 2018 at 18:24

romulomadu · Accepted Answer · 2018-04-16 17:52:03Z

1

You can create just one list and, on each iteration, append a tuple of length n, where n is the number of columns. For example:

some_list = []
for i in range(3):
    some_list.append((i, i+3))

Results:

[(0, 3), (1, 4), (2, 5)]

Passing it to a dataframe gives:

pd.DataFrame(some_list, columns=['col1', 'col2'])
   col1  col2
0     0     3
1     1     4
2     2     5

Try to adapt it to your solution.
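
As a rough sketch of that adaptation, the loop from the question could collect one tuple per POSCode bucket and build the dataframe in a single call. The dictionary below is a trimmed-down stand-in for the Elasticsearch response (only the keys the loop reads are kept), and the column names market_sales_range, market_max and market_min are made up for illustration; only the first two column names come from the question.

```python
import pandas as pd

# Trimmed-down stand-in for the Elasticsearch response in the question;
# only the keys that the loop reads are kept.
unmtchd_ESdata = {
    "aggregations": {"filtered": {"POSCode": {"buckets": [
        {"key": "01200000129",
         "POSCodeModifier": {"buckets": [{"CSP": {"buckets": [
             {"market_week_metrics": {"min": 9, "max": 1000, "avg": 329.75}}]}}]}},
        {"key": "00980030003",
         "POSCodeModifier": {"buckets": [{"CSP": {"buckets": [
             {"market_week_metrics": {"min": 15, "max": 15, "avg": 15}}]}}]}},
    ]}}}
}

rows = []  # one tuple per POSCode bucket instead of one list per column
for bucket in unmtchd_ESdata["aggregations"]["filtered"]["POSCode"]["buckets"]:
    # Same path as in the question: first POSCodeModifier bucket, first CSP bucket
    metrics = bucket["POSCodeModifier"]["buckets"][0]["CSP"]["buckets"][0]["market_week_metrics"]
    rows.append((
        metrics["avg"],
        bucket["key"],
        metrics["max"] - metrics["min"],
        metrics["max"],
        metrics["min"],
    ))

# The last three column names are hypothetical; the question elides them.
columns = [
    "market_avg_total_sales_count",
    "pos_code",
    "market_sales_range",
    "market_max",
    "market_min",
]
mkt_df = pd.DataFrame(rows, columns=columns)
print(mkt_df)
```

Binding the metrics dictionary to a name once per bucket also avoids repeating the long key path for every value, which is where most of the repeated lookups in the original loop come from.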

edited Apr 16, 2018 at 17:52

answered Apr 15, 2018 at 19:08

user7422128 Over a year ago

Kudos! You saved me from using some extra space.
