Pandas trim leading & trailing white space in a dataframe

Question

I need to develop a function that trims leading and trailing white space.

Here is a simple sample, but the real file contains far more complex rows and columns.

import numpy as np
import pandas as pd

df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

The result should have all leading and trailing white space removed, but keep the spaces in between the text.

df=pd.DataFrame([["A b",2,3],[np.nan,2,3],\
["random",43,4],["any txt is possible","2 1",22],\
["",23,99],["help",23,np.nan]],columns=['A','B','C'])

Mind that the function needs to cover all possible situations. Thank you.

Can you show us both an input and an output of what you're after, as well as what you've tried so far? Try to detail what is going wrong.
– scagood, Commented Mar 29, 2018 at 8:30
@scagood, the second code block shows what the final result should look like.
– S.Gu, Commented Mar 29, 2018 at 8:33

Fred · Accepted Answer · 2024-02-28 10:06:18Z

35

I think you need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:

df = df.map(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
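Since the question asks for a function, the element-wise approach above could be wrapped roughly as follows. This is only a sketch: the names strip_strings and strip_cell are made up for illustration, and the fallback to applymap assumes you may be running a pandas version older than 2.1, where DataFrame.map does not exist.

```python
import numpy as np
import pandas as pd


def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with whitespace stripped from every str cell.

    Hypothetical helper for illustration; non-string cells (numbers, NaN)
    are passed through unchanged.
    """
    def strip_cell(value):
        return value.strip() if isinstance(value, str) else value

    # DataFrame.map was added in pandas 2.1; older versions only have applymap.
    elementwise = df.map if hasattr(df, "map") else df.applymap
    return elementwise(strip_cell)


# Sample data from the question.
df = pd.DataFrame(
    [["A b ", 2, 3], [np.nan, 2, 3], [" random", 43, 4],
     [" any txt is possible ", " 2 1", 22], ["", 23, 99],
     [" help ", 23, np.nan]],
    columns=["A", "B", "C"],
)
clean = strip_strings(df)
```

Because the check is done per cell, mixed columns such as B keep their numbers, and NaN is left alone since it is a float, not a str.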

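If each column has a single dtype, you can use the .str accessor on the object columns instead. With mixed columns you get NaNs for the non-string values, as happens to the numeric values in column B of your sample: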

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
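If you still prefer the .str accessor on a mixed column, one possible workaround (not part of the original answer) is to put the original values back wherever .str.strip() produced NaN. The sketch below uses a small made-up Series that mimics column B; the variable names are illustrative only.

```python
import pandas as pd

# A mixed object column, similar to column B in the sample.
mixed = pd.Series([2, " 2 1", 23], dtype=object)

# .str.strip() returns NaN for every element that is not a string.
stripped = mixed.str.strip()

# Keep the stripped value where there is one, otherwise fall back to the original.
restored = stripped.where(stripped.notna(), mixed)
```

Note that this also restores values that were already NaN in the original column, which is usually what you want.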

(The original answer used applymap, which is deprecated since pandas 2.1.)

edited Feb 28, 2024 at 10:06

Fred


answered Mar 29, 2018 at 8:34

jezrael


DeepKling · Accepted Answer · 2022-02-03 12:46:41Z

28

I think there is a one-liner for that using regex and replace:

df = df.replace(r"^ +| +$", r"", regex=True)

Explanation for the regex:

^ is the start of the string
" +" (a space followed by a plus) is one or more spaces
| is or
$ is the end of the string.

So it searches for leading (line start and spaces) and trailing (spaces and line end) spaces and replaces them with an empty string.
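Note that the pattern above matches literal spaces only. If tabs or newlines should be trimmed as well, a variant with \s (any whitespace character) could look like the sketch below. The sample frame is made up for illustration; non-string cells are left untouched because regex replacement only applies to strings.

```python
import pandas as pd

# Made-up sample with tabs, newlines and a mixed column.
df = pd.DataFrame({
    "A": ["\tA b \n", "  x"],
    "B": [2, " 2 1 "],
})

# \s+ matches one or more whitespace characters (spaces, tabs, newlines).
out = df.replace(r"^\s+|\s+$", "", regex=True)
```

Whitespace between words is preserved because both alternatives are anchored to the start or end of the string.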

edited Feb 3, 2022 at 12:46

answered May 10, 2021 at 7:11

DeepKling


1 Comment

binford Over a year ago

Just FYI: "Regular expressions will only substitute on strings" says pandas docs.
