Below is the code I'm using to parse a text file:

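import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']
rows = []

# Read the tab-delimited file; each field is expected to look like tag=value
with open('D:\\python\\redi_fix\\redi_august.txt', 'r') as file:
    content = file.readlines()

for line in content:
    for message in line.split('\t'):
        try:
            row_dict = {}  # a new dict is created for every tag=value pair
            tag, val = message.split('=')
            if tag in tags:
                row_dict[tag] = val
                rows.append(row_dict)
        except ValueError:
            # skip fields that do not split into exactly one tag and one value
            pass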

Creating a pandas dataframe from rows yields the following result:

129     30      32      851     9730    9882
r170557 NaN     NaN     NaN     NaN     NaN
NaN     ARCA    NaN     NaN     NaN     NaN
NaN     NaN     100     NaN     NaN     NaN
r170557 NaN     NaN     NaN     NaN     NaN
NaN     ARCA    NaN     NaN     NaN     NaN
NaN     NaN     300     NaN     NaN     NaN

It looks like each value ends up on its own row. What I'm trying to achieve is to have all the values that belong together on the same row, for example:

129     30      32      851     9730    9882
r170557 ARCA    100     NaN     NaN     NaN
r170557 ARCA    300     NaN     NaN     NaN
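One possible way to get this shape directly at parse time, sketched under the assumption that each line of the file holds one complete record: the loop above creates a new dict for every tag=value pair, so every pair becomes its own row. Building one dict per line avoids that. The sample text below is hypothetical and only stands in for redi_august.txt; tag 55 and the field "garbage" are made up to show what gets skipped.

```python
import io
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# Hypothetical sample standing in for the real file (assumed layout:
# one record per line, tab-separated tag=value fields).
sample = io.StringIO(
    "129=r170557\t30=ARCA\t32=100\t55=XYZ\n"
    "129=r170557\t30=ARCA\t32=300\tgarbage\n"
)

rows = []
for line in sample:
    row_dict = {}  # one dict per line, not per tag=value pair
    for message in line.rstrip('\n').split('\t'):
        # partition never raises; sep is '' when the field has no '='.
        # Unlike split('='), it keeps anything after the first '=' in val.
        tag, sep, val = message.partition('=')
        if sep and tag in tags:
            row_dict[tag] = val
    if row_dict:  # ignore lines without any wanted tag
        rows.append(row_dict)

df = pd.DataFrame(rows, columns=tags)
print(df)
```

Stripping the trailing newline before splitting also keeps it out of the last value on each line. If a record can span several lines in the real file, this sketch does not apply as written.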
2 Answers

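Starting from your result dataframe, sort each column so that the non-null values come first (sorted with key=pd.isnull), then drop the rows that are left entirely NaN (dropna):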

result.apply(lambda x : sorted(x,key=pd.isnull)).dropna(thresh=1)
Out[1171]: 
       129    30     32  851  9730  9882
0  r170557  ARCA  100.0  NaN   NaN   NaN
1  r170557  ARCA  300.0  NaN   NaN   NaN
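To make the individual steps visible, here is a minimal sketch that rebuilds a small result dataframe by hand (the rows list is an assumption that mirrors the output shown in the question) and applies the sort column by column with a dict comprehension instead of apply. The variable names col_30, pushed_up and collapsed are only illustrative.

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# Assumed input: one dict per tag=value pair, as the question's parser produces.
rows = [{'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
        {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'}]
result = pd.DataFrame(rows, columns=tags)

# Step 1: sort a single column. pd.isnull gives False for real values and
# True for NaN, and False sorts before True, so values move to the front.
# sorted() is stable, so the values keep their original relative order.
col_30 = sorted(result['30'], key=pd.isnull)

# Step 2: do the same for every column and rebuild the frame.
pushed_up = pd.DataFrame(
    {col: sorted(result[col], key=pd.isnull) for col in result.columns}
)

# Step 3: rows that are now NaN in every column carry no data; drop them.
collapsed = pushed_up.dropna(how='all')
print(collapsed)
```

Each column is sorted independently, so this only lines the values up correctly when every record contains the same set of tags. If one record lacks a tag that the others have, the later values in that column shift up by one row.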
5 Comments

sorted is usually my thing :)
The above solved my issue exactly as intended. Have to admit I do not fully understand why and how it works, will need some time to get my head around it. Thanks to everyone for the input.
@Selim You're welcome :-) If you have any questions, let me know.
@Wen, can you please shed some light on the logic of the code you suggested? What exactly does your statement do? Thanks for your time.
@Selim apply goes through the dataframe column by column; you can think of it as an enhanced for loop. sorted with key=pd.isnull orders the values of each column so that the non-null values come first and the null values last. dropna then drops the rows that contain only null values, which are not needed. :-)

If you want to "collapse" your NaNs, you can perform a groupby + agg on first/last:

df.groupby(df['129'].notnull().cumsum(), as_index=False).agg('first')

       129    30     32  851  9730  9882
0  r170557  ARCA  100.0  NaN   NaN   NaN
1  r170557  ARCA  300.0  NaN   NaN   NaN
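The grouping key is the part that is easy to miss, so here is a small sketch of how it is built. It assumes that tag 129 is present in every record and always comes first; the rows list mirrors the question's output and the name 'record' for the key is made up for illustration. The sketch uses first() with reset_index rather than as_index=False, purely to keep the key out of the result.

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# Assumed input: one dict per tag=value pair, as the question's parser produces.
rows = [{'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
        {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'}]
df = pd.DataFrame(rows, columns=tags)

# notnull() marks the rows where a new record starts (tag 129 is present);
# cumsum() turns those marks into a running record number.
key = df['129'].notnull().cumsum().rename('record')
print(key.tolist())

# first() takes, per group and per column, the first non-null value,
# which collapses the rows of one record into a single row.
collapsed = df.groupby(key).first().reset_index(drop=True)
print(collapsed)
```

Unlike the column-wise sort, this keeps the values of one record together even when a record lacks some of the other tags, as long as the assumption about tag 129 holds.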

4 Comments

@Wen Nice! Meanwhile, I learned this one from piR or jezrael, I think.
Really nice one ! +1
@MaxU Thanks, means a lot coming from you. :-)
This solution also worked, albeit I don't fully understand it. Will investigate further. Thanks a lot for the help.
