Remove whitespace from list of strings with pandas/python

Question

I have a dataframe in which one column's values are lists of strings. Here is the structure of the file to read:

[
    {
        "key1":"value1 ",
        "key2":"2",
        "key3":["a","b  2 "," exp  white   space 210"]
    },
    {
        "key1":"value1 ",
        "key2":"2",
        "key3":[]
    }
]

For each item, I need to remove the extra whitespace wherever there is more than one whitespace character. Expected output:

[
    {
        "key1":"value1",
        "key2":"2",
        "key3":["a","b2","exp white space 210"]
    },
    {
        "key1":"value1",
        "key2":"2",
        "key3":[]
    }
]

Note: some rows have empty values, e.g. "key3":[]

Use df.replace('\s+', ' ', regex=True) for multiple spaces and use str.strip for the leading and trailing spaces.
– It_is_Chris, Commented Mar 18, 2022 at 15:20
Please change the question to put a sample problematic input, i.e., your empty list. People should be able to cut and paste your sample and reproduce the actual problem you're struggling with.
– joanis, Commented Mar 18, 2022 at 15:58

gremur · Accepted Answer · 2022-03-18 17:25:07Z

1

If I understand correctly, some of your dataframe cells hold lists as values.

The content of file_name.json is below:

[
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": ["a", "b  2 ", " exp  white   space 210"]
    }, 
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": []
    }
]

A possible solution in this case is the following:

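import re

import pandas as pd

df = pd.read_json("file_name.json")


def cleanup_data(value):
    # Strip each string and collapse internal whitespace runs to one space;
    # lists are cleaned element by element, anything else is returned as-is.
    if isinstance(value, list):
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif isinstance(value, str):
        return re.sub(r'\s+', ' ', value.strip())
    else:
        return value

# apply cleanup function to all cells in dataframe
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df = df.applymap(cleanup_data)

df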

Returns

     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []
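
As a regex-free variation on the same idea, here is a minimal sketch that relies on str.split() with no arguments, which splits on any run of whitespace and ignores leading and trailing whitespace. The helper name normalize_spaces and the in-memory records are assumptions for illustration; they are not part of the code above.

```python
import pandas as pd


def normalize_spaces(value):
    """Collapse whitespace runs in a string, or in each item of a list."""
    if isinstance(value, str):
        # split() with no arguments drops leading/trailing whitespace and
        # treats any run of whitespace as a single separator
        return " ".join(value.split())
    if isinstance(value, list):
        return [normalize_spaces(item) for item in value]
    # numbers, None, etc. are returned unchanged
    return value


# Hypothetical in-memory copy of the sample data (no file needed)
records = [
    {"key1": "value1 ", "key2": "2",
     "key3": ["a", "b  2 ", " exp  white   space 210"]},
    {"key1": "value1 ", "key2": "2", "key3": []},
]
df = pd.DataFrame(records)

# Series.map applies the helper to every cell, one column at a time
cleaned = df.apply(lambda col: col.map(normalize_spaces))
```

With this sample, cleaned holds "value1" in key1 and ["a", "b 2", "exp white space 210"] in the first key3 cell, while the empty list in the second row stays empty. Going through Series.map per column sidesteps the applymap / DataFrame.map naming difference between pandas versions.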

edited Mar 18, 2022 at 17:25

answered Mar 18, 2022 at 16:59


2 Comments

Learner Over a year ago

I have an array of objects, so this will not work

gremur Over a year ago

I updated the code to the new format of input data

user17242583 · Accepted Answer · 2022-03-18 15:24:59Z

0

If I understand correctly:

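from io import StringIO

import pandas as pd

# pandas 2.1+ deprecates passing a literal JSON string, so wrap it in StringIO
df = pd.read_json(StringIO('''{
    "key1":"value1 ",
    "key2":"value2",
    "key3":["a","b   "," exp  white   space "],
    "key2":" value2"
}'''))

df = df.apply(lambda col: col.str.strip().str.replace(r'\s+', ' ', regex=True))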

Output:

>>> df
     key1    key2             key3
0  value1  value2                a
1  value1  value2                b
2  value1  value2  exp white space

>>> df.to_numpy()
array([['value1', 'value2', 'a'],
       ['value1', 'value2', 'b'],
       ['value1', 'value2', 'exp white space']], dtype=object)

answered Mar 18, 2022 at 15:24


5 Comments

Learner Over a year ago

I got this error AttributeError: Can only use .str accessor with string values!.

user17242583 Over a year ago

Will you please show how you're reading the JSON file in the question? I think we're reading it differently, hence the error on your end and not on mine :)

Learner Over a year ago

df = pd.read_json("filename.json")

user17242583 Over a year ago

When I paste your JSON into filename.json, run df = pd.read_json("filename.json") and then df = df.apply(lambda col: col.str.strip().str.replace(r'\s+', ' ', regex=True)), it produces a dataframe just like the one in my answer. So I can't tell what's wrong...

Learner Over a year ago

I guess it is because I have some values that are empty in some rows, e.g. "key3":[]
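
For readers who hit the same AttributeError, the sketch below shows one way it can arise and one way to guard against it. It assumes, without confirmation from the thread, that a column ended up with a non-string dtype (here key2 is built as integers); the helper name clean_column and the sample frame are hypothetical.

```python
import pandas as pd

# Hypothetical frame: key2 is numeric and key3 holds lists (one of them empty)
df = pd.DataFrame({
    "key1": ["value1 ", "value1 "],
    "key2": [2, 2],
    "key3": [["a", "b  2 "], []],
})

# The .str accessor refuses to work on a numeric column
str_accessor_failed = False
try:
    df["key2"].str.strip()
except AttributeError:
    str_accessor_failed = True


def clean_column(col):
    """Clean a column only if every value in it is a string."""
    if col.map(lambda v: isinstance(v, str)).all():
        return col.str.strip().str.replace(r"\s+", " ", regex=True)
    # leave numeric columns and list columns untouched
    return col


cleaned = df.apply(clean_column)
```

This guard only makes the one-liner safe to run on mixed frames; the strings inside the key3 lists are left as they are, so a per-cell function that handles lists is still needed for those.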
