I am looking to optimize my Python code. I query Elasticsearch (ES) and get a JSON response. I then iterate over that response and store the values in lists, which I later assign as columns of a DataFrame.

    unmtchd_ESdata={"Response from Elasticsearch"}

    for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
        list6.append(unmtchd_ESdata['avg'])  # key path abbreviated; full paths are given further down
        list7.append(unmtchd_ESdata['key'])
        ....
        ....

    mkt_df=pd.DataFrame()
    mkt_df["market_avg_total_sales_count"]=list6
    mkt_df["pos_code"]=list7
    ...
    ....

In the end, mkt_df has one column per list, with the values in the order in which they were appended. For example, if list7 holds the values ['01200000129', '00980030003'], the DataFrame looks like the one below, and the same applies to the other columns:

   market_avg_total_sales_count     pos_code 
0                        329.75  01200000129 
1                         15.00  00980030003 

My question: I am reading many variables and want them all as DataFrame values. Keeping N separate lists seems to make my program inefficient, because all of these operations happen in memory. Any suggestions on how to reproduce this scenario with lower space and time complexity?

Edit: adding my JSON structure here:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 12170,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filtered": {
      "doc_count": 5,
      "POSCode": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "01200000129",
            "doc_count": 4,
            "POSCodeModifier": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "0",
                  "doc_count": 4,
                  "CSP": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": "5555",
                        "doc_count": 4,
                        "per_stock": {
                          "buckets": [
                            {
                              "key_as_string": "2018-02-26",
                              "key": 1519603200000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-05",
                              "key": 1520208000000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 10
                              }
                            },
                            {
                              "key_as_string": "2018-03-12",
                              "key": 1520812800000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 300
                              }
                            },
                            {
                              "key_as_string": "2018-03-19",
                              "key": 1521417600000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 1000
                              }
                            },
                            {
                              "key_as_string": "2018-03-26",
                              "key": 1522022400000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 9
                              }
                            }
                          ]
                        },
                        "market_week_metrics": {
                          "count": 4,
                          "min": 9,
                          "max": 1000,
                          "avg": 329.75,
                          "sum": 1319,
                          "sum_of_squares": 1090181,
                          "variance": 163810.1875,
                          "std_deviation": 404.7347124969639,
                          "std_deviation_bounds": {
                            "upper": 1139.2194249939278,
                            "lower": -479.71942499392776
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          },
          {
            "key": "00980030003",
            "doc_count": 1,
            "POSCodeModifier": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "0",
                  "doc_count": 1,
                  "CSP": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                      {
                        "key": "5555",
                        "doc_count": 1,
                        "per_stock": {
                          "buckets": [
                            {
                              "key_as_string": "2018-02-26",
                              "key": 1519603200000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-05",
                              "key": 1520208000000,
                              "doc_count": 1,
                              "avg_week_qty_sales": {
                                "value": 15
                              }
                            },
                            {
                              "key_as_string": "2018-03-12",
                              "key": 1520812800000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-19",
                              "key": 1521417600000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            },
                            {
                              "key_as_string": "2018-03-26",
                              "key": 1522022400000,
                              "doc_count": 0,
                              "avg_week_qty_sales": {
                                "value": 0
                              }
                            }
                          ]
                        },
                        "market_week_metrics": {
                          "count": 1,
                          "min": 15,
                          "max": 15,
                          "avg": 15,
                          "sum": 15,
                          "sum_of_squares": 225,
                          "variance": 0,
                          "std_deviation": 0,
                          "std_deviation_bounds": {
                            "upper": 15,
                            "lower": 15
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

The values that I'm trying to fetch:

    for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
        list6.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['avg'])
        list7.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['key'])
        list8.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max']-unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])
        list9.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max'])
        list10.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])
  • I have a feeling that the slow bit is the sequential list appends, not the dataframe series constructions. Are you sure that the bottleneck is in assigning pandas series from lists? Commented Apr 15, 2018 at 18:20
  • I also suspect that the slow bit is the sequential list appends, because I'm using around 20 such appends in my code. Commented Apr 15, 2018 at 18:22
  • In that case, you might have to show us some of the JSON. Not all 20 keys and all rows; maybe 4 keys for 4 rows is enough. Then we can suggest a better way for you to build your dataframe. Commented Apr 15, 2018 at 18:24
  • Added, you can have a look @jpp. Commented Apr 15, 2018 at 18:35

1 Answer

You can create just one list and, on each iteration, append a single tuple of length n, where n is the number of columns. For example:

some_list = []
for i in range(3):
    some_list.append((i, i+3))

Results:

[(0, 3), (1, 4), (2, 5)]

Passing it to a dataframe gives:

pd.DataFrame(some_list, columns=['col1', 'col2'])
   col1  col2
0     0     3
1     1     4
2     2     5

Try to adapt it to your solution.
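
As a rough sketch of how that adaptation might look for the JSON in the question: the response below is a trimmed-down stand-in that keeps only the keys the question's loop reads. The helper name bucket_to_row and the column names market_range, market_max and market_min are assumptions made for illustration; only market_avg_total_sales_count and pos_code come from the question.

```python
import pandas as pd

# Trimmed-down stand-in for the Elasticsearch response in the question;
# only the keys that the question's loop reads are kept.
unmtchd_ESdata = {
    "aggregations": {"filtered": {"POSCode": {"buckets": [
        {"key": "01200000129",
         "POSCodeModifier": {"buckets": [{"CSP": {"buckets": [
             {"market_week_metrics": {"min": 9, "max": 1000, "avg": 329.75}}]}}]}},
        {"key": "00980030003",
         "POSCodeModifier": {"buckets": [{"CSP": {"buckets": [
             {"market_week_metrics": {"min": 15, "max": 15, "avg": 15}}]}}]}},
    ]}}}
}


def bucket_to_row(bucket):
    """Build one row tuple from a POSCode bucket (hypothetical helper)."""
    # Resolve the deep path once instead of repeating it for every value.
    metrics = (bucket["POSCodeModifier"]["buckets"][0]
               ["CSP"]["buckets"][0]["market_week_metrics"])
    return (
        metrics["avg"],
        bucket["key"],
        metrics["max"] - metrics["min"],
        metrics["max"],
        metrics["min"],
    )


buckets = unmtchd_ESdata["aggregations"]["filtered"]["POSCode"]["buckets"]

# One list of tuples replaces list6 ... list10 from the question.
rows = [bucket_to_row(b) for b in buckets]

mkt_df = pd.DataFrame(
    rows,
    columns=[
        "market_avg_total_sales_count",  # from the question
        "pos_code",                      # from the question
        "market_range",                  # assumed name for max - min
        "market_max",                    # assumed name
        "market_min",                    # assumed name
    ],
)
```

Iterating over the buckets directly, rather than over range(len(...)), also avoids walking the full key path from the top of the response for every value.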

1 Comment

kudos! You saved me from using some extra space
