0

Lets say I imported a really messy data from a PFD and I´m cleaning it. I have something like this:

Name Type Date other1 other2 other3
Name1 '' '' Type1 '' Date1
Name2 '' '' '' Type2 Date2
Name3 '' '' Type3 Date3 ''
Name4 '' Type4 '' '' Date4
Name5 Type5 '' Date5 '' ''

And so on. As you can see, Type is always before date on each row, but I basically need to delete all '' (currently empty strings on the DataFrame) while moving everything to the left so they align with their respective Type and Date columns. Additionally, there's more columns to the right with the same problem, but for structural reasons I cant remove ALL '', the solution I´m looking for would just move 'everything to the left' so to speak (as it happens with pd.shift).

I appreciate your help.

1
  • When you shift the values to each row on the left, the column names remain the same? if you could post your desired output it would also be nice. Commented May 3, 2022 at 22:18

2 Answers 2

2
data = df.values.flatten()
pd.DataFrame(data[data != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])

or:

pd.DataFrame(df.values[df.values != ""].reshape(-1, 3), columns = ['Name','Type', 'Date'])

output:

    Name    Type    Date
0   Name1   Type1   Date1
1   Name2   Type2   Date2
2   Name3   Type3   Date3
3   Name4   Type4   Date4
4   Name5   Type5   Date5

without reshape:

pd.DataFrame(df.apply(lambda x: (a:=np.array(x))[a != ""] , axis = 1).values.tolist())

or:

s = df[0].copy()
for col in df.columns[1:]:
    s += " " + df[col]
pd.DataFrame(s.str.split().values.tolist(), columns = ['Name','Type', 'Date'])
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for your help. But in this case a reshape isn't easy to perform, since there's a lot more columns. I'm looking for something similar to a df.shift(axis='columns') but to the left and applying only for each row's case.
@CA I think there is no shift like way for this problem, at least on my knowledge
1

What worked for me was:

while '' in df['Type'].unique():
    for i,row in df.iterrows():
        if row['Type'] == '':
            df.iloc[i, 1:] = df.iloc[i, 1:].shift(-1, fill_value='')

And the same for next column

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.