24

Develop a function that trims leading and trailing whitespace.

Here is a simple sample, but the real file contains far more complex rows and columns.

import numpy as np
import pandas as pd

df=pd.DataFrame([["A b ",2,3],[np.nan,2,3],\
[" random",43,4],[" any txt is possible "," 2 1",22],\
["",23,99],[" help ",23,np.nan]],columns=['A','B','C'])

The result should eliminate all leading and trailing whitespace but retain the spaces in between the text.

df=pd.DataFrame([["A b",2,3],[np.nan,2,3],\
["random",43,4],["any txt is possible","2 1",22],\
["",23,99],["help",23,np.nan]],columns=['A','B','C'])

Note that the function needs to cover all possible situations. Thank you.

2
  • Can you show us both an input and an output of what you're after, as well as what you've tried so far? Try to detail what is going wrong. Commented Mar 29, 2018 at 8:30
  • @scagood, the second code block shows what the final result should look like. Commented Mar 29, 2018 at 8:33

2 Answers

35

I think you need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:

df = df.map(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
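
Since the question asks for a function, the one-liner above can be wrapped as a minimal sketch. The names strip_strings and strip_cell are hypothetical, and the fallback assumes that pandas versions without DataFrame.map still provide applymap:

```python
import numpy as np
import pandas as pd


def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with whitespace stripped from every str cell."""

    def strip_cell(value):
        # Only str values have .strip(); numbers and NaN pass through unchanged.
        return value.strip() if isinstance(value, str) else value

    # Assumption: newer pandas exposes DataFrame.map, older releases only applymap.
    method = "map" if hasattr(pd.DataFrame, "map") else "applymap"
    return getattr(df, method)(strip_cell)


raw = pd.DataFrame(
    [["A b ", 2, 3], [np.nan, 2, 3], [" random", 43, 4],
     [" any txt is possible ", " 2 1", 22], ["", 23, 99], [" help ", 23, np.nan]],
    columns=["A", "B", "C"],
)
cleaned = strip_strings(raw)
print(cleaned)
```

The input frame is left untouched, so the raw and cleaned versions can be compared side by side.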

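If every value in a column is a string, you can use the .str accessor instead. On a mixed column it returns NaN for the non-string values, as happens to the numeric values in column B of your sample: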

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
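
If you want the vectorised .str.strip() but also need to keep the numbers in mixed columns, one possible workaround is to put the original values back wherever the cell was not a string. This is only a sketch; strip_mixed_columns is a hypothetical helper name, and it assumes unique column names:

```python
import numpy as np
import pandas as pd


def strip_mixed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip str cells column by column, leaving non-string cells as they were."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Boolean mask: True where the cell holds a Python str.
        is_str = col.map(lambda v: isinstance(v, str))
        if is_str.any():
            # .str.strip() yields NaN for non-strings; .where() restores the originals.
            out[name] = col.str.strip().where(is_str, col)
    return out


sample = pd.DataFrame(
    [["A b ", 2, 3], [np.nan, 2, 3], [" random", 43, 4],
     [" any txt is possible ", " 2 1", 22], ["", 23, 99], [" help ", 23, np.nan]],
    columns=["A", "B", "C"],
)
result = strip_mixed_columns(sample)
print(result)
```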

(The original answer used applymap, which is deprecated.)


28

I think there is a one-liner for that using regex and replace:

df = df.replace(r"^ +| +$", r"", regex=True)

Explanation for the regex:

  • ^ is the start of the string
  • " +" (a space followed by a plus) is one or more spaces
  • | is or
  • $ is the end of the string.

So it searches for leading (line start and spaces) and trailing (spaces and line end) spaces and replaces them with an empty string.
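
Note that the pattern above only matches literal spaces. If tabs or newlines can also appear at the edges, a variant with \s might be closer to what str.strip() does. The frame below is made-up illustration data, not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a tab and a newline at the edges of one cell.
df = pd.DataFrame(
    [["\t tabbed \n", 1], ["  spaced  ", " 2 1"], [np.nan, 3]],
    columns=["A", "B"],
)

# Pattern from the answer: removes leading/trailing spaces only.
spaces_only = df.replace(r"^ +| +$", "", regex=True)

# Variant: \s matches any whitespace character (space, tab, newline, ...).
any_whitespace = df.replace(r"^\s+|\s+$", "", regex=True)

print(repr(spaces_only.loc[0, "A"]))
print(repr(any_whitespace.loc[0, "A"]))
```

Both calls leave the numeric cells alone, because the regex substitution is applied to strings only.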

1 Comment

Just FYI: the pandas docs say, "Regular expressions will only substitute on strings."
