Create Pandas Dataframe from List of Dictionaries with missing values for some keys

Question

Hi everyone.

Below is the code I'm using to parse a text file:

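import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']
rows = []

with open('D:\\python\\redi_fix\\redi_august.txt', 'r') as file:
    for line in file:
        # Each tab-separated field is a tag=value pair; strip the newline
        # so it does not end up in the last value of the line
        for message in line.rstrip('\n').split('\t'):
            try:
                # row_dict is re-created for every field
                row_dict = {}
                tag, val = message.split('=')
                if tag in tags:
                    row_dict[tag] = val
                    rows.append(row_dict)
            except ValueError:
                # Field without exactly one '=': skip it
                pass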

Creating a pandas dataframe from rows yields the following result:

129     30      32      851     9730    9882
r170557 NaN     NaN     NaN     NaN     NaN
NaN     ARCA    NaN     NaN     NaN     NaN
NaN     NaN     100     NaN     NaN     NaN
r170557 NaN     NaN     NaN     NaN     NaN
NaN     ARCA    NaN     NaN     NaN     NaN
NaN     NaN     300     NaN     NaN     NaN
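For reference, pd.DataFrame builds its columns from the union of the keys in a list of dicts and fills every key a dict lacks with NaN, so a list of single-key dicts yields exactly one non-null value per row. A minimal reproduction using the values printed above (kept as strings, as the parser produces them); passing columns=tags is an assumption about how the frame was built, inferred from 851, 9730 and 9882 appearing as all-NaN columns:

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# One single-key dict per matching field, mirroring what the loop appends
rows = [
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'},
]

# Without columns=..., the columns are the union of keys in first-seen order
print(pd.DataFrame(rows).columns.tolist())

# With columns=tags, tags that never occur still appear as all-NaN columns
result = pd.DataFrame(rows, columns=tags)
print(result)
```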

It looks like every value ends up on its own row. What I'm trying to achieve is to have all the values of one message on the same row, for example:

129     30      32      851     9730    9882
r170557 ARCA    100     NaN     NaN     NaN
r170557 ARCA    300     NaN     NaN     NaN
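The fragmentation comes from row_dict being created anew for every field, so each appended dict carries a single tag. A minimal sketch of building one dict per record instead, assuming (not confirmed above) that each line of the file holds one complete tab-separated message; parse_records and the sample lines are hypothetical:

```python
import pandas as pd

TAGS = ['129', '30', '32', '851', '9730', '9882']


def parse_records(lines, tags=TAGS):
    """Hypothetical helper: one dict per line, assuming one message per line."""
    wanted = set(tags)
    rows = []
    for line in lines:
        row = {}  # created once per line, not once per field
        for field in line.rstrip('\n').split('\t'):
            tag, sep, val = field.partition('=')
            if sep and tag in wanted:
                row[tag] = val
        if row:
            rows.append(row)
    # columns=tags keeps the tag order and adds all-NaN columns for absent tags
    return pd.DataFrame(rows, columns=tags)


# Hypothetical sample lines shaped like the records in the question
sample = [
    '129=r170557\t30=ARCA\t32=100\n',
    '129=r170557\t30=ARCA\t32=300\n',
]
df = parse_records(sample)
print(df)
```

With the real file this would be called as: with open(path) as f: df = parse_records(f). Unlike the original loop, str.partition keeps everything after the first '=' in the value instead of skipping fields that contain more than one '='.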

Group by '129', '30', '32': pandas.pydata.org/pandas-docs/stable/generated/… – Keith, Commented Nov 17, 2017 at 21:10

BENY · Accepted Answer · 2017-11-17 21:13:04Z

4

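Starting from your result DataFrame, you can sort each column so that its non-null values come first, then use dropna to remove the rows that are left entirely empty: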

result.apply(lambda x : sorted(x,key=pd.isnull)).dropna(thresh=1)
Out[1171]: 
       129    30     32  851  9730  9882
0  r170557  ARCA  100.0  NaN   NaN   NaN
1  r170557  ARCA  300.0  NaN   NaN   NaN
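A self-contained version of this answer, rebuilding the fragmented frame from the question's printed output. The rows list is reconstructed, not read from the real file, and the values are kept as strings, so column 32 holds '100' rather than 100.0:

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']
rows = [
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'},
]
result = pd.DataFrame(rows, columns=tags)

# Per column: stable-sort the values so non-nulls (isnull -> False) come first
# while keeping their original relative order; then drop all-NaN rows
collapsed = result.apply(lambda x: sorted(x, key=pd.isnull)).dropna(thresh=1)
print(collapsed)
```

dropna(thresh=1) keeps rows with at least one non-null value, which here is the same as dropna(how='all').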

answered Nov 17, 2017 at 21:13

BENY


5 Comments

Bharath M Shetty Over a year ago

sorted is usually my thing :)

Selim Over a year ago

The above solved my issue exactly as intended. I have to admit I don't fully understand why and how it works; I'll need some time to get my head around it. Thanks to everyone for the input.

BENY Over a year ago

@Selim You're welcome :-) If you have any questions, let me know.

Selim Over a year ago

@Wen, could you please shed some light on the logic of the code you suggested, i.e. what exactly your statement does? Thanks for your time.

BENY Over a year ago

@Selim apply goes through the DataFrame column by column; you can think of it as an enhanced for loop. sorted with key=pd.isnull sorts each column's values by that key, so the non-null values come first and the null values last. dropna then drops the rows in which every value is null, since they are no longer needed. :-)
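One consequence of this design: because each column is sorted independently, values are only re-paired correctly when every record contains the same set of tags. A small sketch with hypothetical data, in which record r1 has no 32 field, shows the shift:

```python
import pandas as pd

# Hypothetical data: record r1 lacks tag '32', record r2 has 32=300
rows = [
    {'129': 'r1'}, {'30': 'ARCA'},
    {'129': 'r2'}, {'30': 'ARCA'}, {'32': '300'},
]
result = pd.DataFrame(rows)

# Sorting a single column with key=pd.isnull moves its non-null values to the top
first_col = sorted(result['32'], key=pd.isnull)
print(first_col)

collapsed = result.apply(lambda x: sorted(x, key=pd.isnull)).dropna(thresh=1)
print(collapsed)
# '300' now sits on the row whose 129 value is 'r1', i.e. the wrong record
```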

cs95 · Accepted Answer · 2017-11-17 21:13:01Z

4

If you want to "collapse" your NaNs, you can perform a groupby + agg on first/last:

df.groupby(df['129'].notnull().cumsum(), as_index=False).agg('first')

       129    30     32  851  9730  9882
0  r170557  ARCA  100.0  NaN   NaN   NaN
1  r170557  ARCA  300.0  NaN   NaN   NaN
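The grouping key works because a non-null 129 marks the start of each record, so the cumulative count of non-null 129 values numbers the records; agg('first') then takes the first non-null value of each column within each record. This assumes every record starts with tag 129. A sketch with hypothetical data in which record r1 has no 32 field shows that, unlike sorting each column independently, the values stay with their own record:

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']
# Hypothetical data: record r1 lacks tag '32', record r2 has 32=300
rows = [
    {'129': 'r1'}, {'30': 'ARCA'},
    {'129': 'r2'}, {'30': 'ARCA'}, {'32': '300'},
]
df = pd.DataFrame(rows, columns=tags)

# True where a new record starts; the running sum gives each row its record number
record_id = df['129'].notnull().cumsum()
print(record_id.tolist())  # [1, 1, 2, 2, 2]

# 'first' picks the first non-null value of each column within each record
collapsed = df.groupby(record_id, as_index=False).agg('first')
print(collapsed)
```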

answered Nov 17, 2017 at 21:13

cs95


4 Comments

cs95 Over a year ago

@Wen Nice! I picked this one up from piR or jezrael, I think.

MaxU - stand with Ukraine Over a year ago

Really nice one ! +1

cs95 Over a year ago

@MaxU Thanks, means a lot coming from you. :-)

Selim Over a year ago

This solution also worked, although I don't fully understand it yet. I'll investigate further. Thanks a lot for the help.
