Parse dict column and add new columns to dataframe [duplicate]

Question

The pandas df is created by reading CSV file with more than 100+ columns and 2+ mil rows which looks similar to

project_code,dict_data
A001,{"a": "1", "b": "2", "c": "3"}
B001,{"a": "4", "b": "5", "c": "6"}
C001,{"b": "7", "c": "8"}
D001,{"c": "9"}
E001,{"a": "10", "c": "11"}

Using for loop & if conditions, the dict column key value data is parsed and the df looks as follow. And at times new keys get added and I have to update my if conditions to parse those keys

project_code   dict_data                         a     b     c    d
A001           {"a": "1", "b": "2", "c": "3"}    "1"   "2"   "3"  ""
B001           {"a": "4", "b": "5", "d": "6"}    "4"   "5"   ""   "6"
C001           {"b": "7", "c": "8"}              ""    "7"   "8"  ""
D001           {"d": "9"}                        ""    ""    ""   "9"
E001           {"a": "10", "c": "11"}            "10"  ""    "11" ""

The code takes hours to run and is not the best or efficient way for processing dict data column. Would appreciate if someone could guide or suggest me the better & efficient way.

when you have list or dict in cell, plz provide your input as code. — Panda Kim
– Panda Kim, Commented Jan 21, 2024 at 12:28

J_H · Accepted Answer · 2024-01-21 12:29:50Z

0

Two million rows is an immediate tip off.

Use a relational database already! Postgres would be a good choice, but even the very simple sqlite that comes standard with pythons built-in libraries would be a big improvement on what you’re doing.

The magic of a DB table which is indexed is you can often access just the subset of data that you need, while leaving most rows untouched on disk. This is quite different from being forced to load two million rows before you can even begin to do anything with your data.

Pandas has good support for writing a dataframe to a table and for reading table rows into a new data frame.

answered Jan 21, 2024 at 12:29

J_H

21.2k5 gold badges29 silver badges50 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Parse dict column and add new columns to dataframe [duplicate]

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related