1

Suppose I have a dataframe looking something like this:

  col1 col2 col3 col4
0    A    B    F    O
1    A         G    Q
2    A    C    G    P
3    A         H     
4    A    D    I     
5    A    D    I     
6    A         J    U
7    A    E         J

How can I shift the values to the left whenever a cell is empty, so that the result looks like this?

  col1 col2  col3  col4
0    A    B     F     O
1    A    G     Q 
2    A    C     G     P
3    A    H  
4    A    D     I 
5    A    D     I 
6    A    J     U 
7    A    E     J 

I thought I could check the current column and, where it's empty, take the value from the next column and then blank out that next column.

for col in df.columns:    
    df[col] = np.where((df[col] == ''), df[f'col{int(col[-1])+1}'], df[col])
    df[f'col{int(col[-1])+1}'] = np.where((df[col] == ''), '', df[col])

But I am failing somewhere. Sample df below.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A','A','A','A','A','A','A','A'],
        'col2': ['B','','C','','D','D','','E'],
        'col3': ['F','G','G','H','I','I','J',''],
        'col4': ['O','Q','P','','','','U','J']
    }
)

3 Answers

3

Replace empty string with NaN

df = df.replace('', np.nan)

Apply dropna row-wise

odf = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)

To retain column names,

odf.columns = df.columns

NOTE: It is generally good practice to represent missing data with NaN rather than an empty string, since methods such as dropna then work directly.

Output

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q  NaN
2    A    C    G    P
3    A    H  NaN  NaN
4    A    D    I  NaN
5    A    D    I  NaN
6    A    J    U  NaN
7    A    E    J  NaN
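
The steps above can be put together as one self-contained sketch using the sample df from the question. Note that the reindex call and the final fillna('') are my additions, not part of the answer as posted: reindex pads the result back to the original width (if every row had at least one empty cell, the apply result would have fewer columns than df and the column assignment would raise a length mismatch), and fillna('') is only needed if you want empty strings back instead of NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

# Treat empty strings as missing, then compact every row to the left.
odf = (
    df.replace('', np.nan)
      .apply(lambda row: pd.Series(row.dropna().to_numpy()), axis=1)
      .reindex(columns=range(df.shape[1]))  # assumption: pad back to the original width
)
odf.columns = df.columns

# Optional: go back to empty strings, as in the question's desired output.
result = odf.fillna('')
print(result)
```

Keep in mind that apply with axis=1 runs a Python function per row, so this may be slow on large frames.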

1 Comment

This answer is sheer Pythonic elegance. What a fantastic line of code! There was a small issue with columns to columns, but it's trivial, especially compared with, for example, stackoverflow.com/questions/72104599/… I think you can see the point. In summary, this post needs a lot of upvotes. This coder is good.
2

One way is to use np.argsort:

s = df.to_numpy()
orders = np.argsort(s=='', axis=1, kind='mergesort')

df[:] = s[np.arange(len(s))[:,None],orders]

Output:

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q     
2    A    C    G    P
3    A    H          
4    A    D    I     
5    A    D    I     
6    A    J    U     
7    A    E    J     

Note: A very similar approach can be found in this question.
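
To spell out why this works, here is a self-contained sketch on the question's sample df. The mask s == '' is False for filled cells and True for empty ones, so a stable argsort along each row puts the filled positions first and keeps their original left-to-right order. I've used np.take_along_axis instead of the fancy indexing above and built a new dataframe rather than assigning into df[:]; both are my substitutions, and the names is_empty, order and shifted are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

values = df.to_numpy()

# False (filled) sorts before True (empty); a stable sort keeps the
# original left-to-right order among the filled cells of each row.
is_empty = values == ''
order = np.argsort(is_empty, axis=1, kind='stable')

# Reorder each row by its own column order.
shifted = np.take_along_axis(values, order, axis=1)
result = pd.DataFrame(shifted, index=df.index, columns=df.columns)
print(result)
```

The stable sort matters here: with the default kind, the relative order of equal keys is not guaranteed, so the filled values in a row could end up swapped.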

1 Comment

NumPy solutions are also very fast.
0

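You can count the number of empty values in each column, sort the columns by that count, and reorder the dataframe accordingly. Note that this reorders whole columns rather than shifting values within each row, so the result is not the same as the output asked for in the question.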

counts = {}
for col in df.columns.to_list():
    counts[col] = (df[col] == '').sum()  # empty means '' here, based on the example you provided
# Sort the dictionary by count (ascending).
counts = dict(sorted(counts.items(), key=lambda item: item[1]))
# Reorder the dataframe's columns accordingly.
df = df[[*counts.keys()]]
df

  col1 col3 col2 col4
0    A    F    B    O
1    A    G         Q
2    A    G    C    P
3    A    H          
4    A    I    D     
5    A    I    D     
6    A    J         U
7    A         E    J
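
For what it's worth, the same column reordering can be sketched without the explicit loop. This is only an alternative spelling of the counting idea above (the names empty_counts, order and reordered are illustrative), and it has the same limitation: it moves whole columns and does not shift values within a row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

# Count empty strings per column in one vectorized step.
empty_counts = df.eq('').sum()

# Stable sort so columns with equal counts keep their original order.
order = empty_counts.sort_values(kind='stable').index
reordered = df[order]
print(reordered)
```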
