1

I have such a list

[9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}], 9306, '98.89', [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]

I'd like to transform it into a data frame that looks like:

ID   Total Value     Product IDs
9308   127.05        4204
9306   98.89         4200
etc.

also some IDs could have few product ids, so the list should look like:

ID   Total Value     Product IDs
9308   127.05        4204
9308   127.05        4200
9308   127.05        5555

Can anyone help me? I'm a very beginner in Python.

3 Answers 3

1

IIUC, and your input list is really structured well, then you can use some hard coding like this:

df_out = pd.concat([pd.DataFrame(l[::3], columns=['ID']), 
                    pd.DataFrame(l[1::3], columns=['Total Value']), 
                    pd.concat([pd.DataFrame(i) for i in l[2::3]], ignore_index=True)
                      .loc[:, 'product_id']], axis=1)

Output:

     ID Total Value  product_id
0  9308      127.05        4204
1  9306       98.89        4200
Sign up to request clarification or add additional context in comments.

1 Comment

in your case it takes only first product_id for ID. As I said one ID can have multipple product ids
1

You can use the grouper recipe from itertools to easily do this. https://docs.python.org/3/library/itertools.html#itertools-recipes

# Copied from itertools recipes link above
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Then you use that to iterate through your data in chunks of 3, and combine it into a defaultdict to build your data into the correct format for a DataFrame

import collections

clean_data = collections.defaultdict(list)

for entry_id, total_value, product_json in grouper(data, 3):
    for record in product_json:
        clean_data["id"].append(entry_id)
        clean_data["total_value"].append(total_value)
        clean_data["product_id"].append(record["product_id"])
    
df = pd.DataFrame(clean_data)
print(df)
     id total_value  product_id
0  9308      127.05        4204
1  9306       98.89        4200

This will also handle the case where you have more than 1 record in the 3rd of your data (e.g. if you have 2 dictionaries in your list, instead of just 1)

Comments

1

The assumption is that the array has a structure of ID, Total Value and product info that is encoded as json. If we loop through that array three items at a time, then the below should work fine:

def unzip(i,v,d):
    return pd.DataFrame(d).assign(ID=i, TotalValue=v )
df = pd.concat([unzip(i,v,d) for i, v, d in zip(*[iter(js)]*3)])
df[['ID','TotalValue', 'product_id']]

Example output:

enter image description here

Notes: Details on how to iterate multiple items at a time. When working with json, I find it easier to create a dataframe first, and then add additional columns that are broadcasted to all dataframe rows.

Assumption: input string was missing closing "]" otherwise it is not an array. The above code works with this input.

js = [ 9308, '127.05', [{'id': 8568, 'name': 'some product name', 'product_id': 4204, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '139.00', 'subtotal_tax': '0.00', 'total': '118.15', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 118.15}], 
        9306, '98.89',  [{'id': 8566, 'name': 'some product name', 'product_id': 4200, 'variation_id': 0, 'quantity': 1, 'tax_class': '', 'subtotal': '89.99', 'subtotal_tax': '0.00', 'total': '89.99', 'total_tax': '0.00', 'taxes': [], 'meta_data': [], 'sku': '', 'price': 89.99}]]

2 Comments

oh sorry,it doesn't work. It takes first id (like: [{'id': 8568) not product_id
It is replicating your expected result. Could you provide an example input where it is not working?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.