Remove offending characters from strings in list [duplicate]

Question

Sample data to parse (a list of unicode strings):

[u'\n', u'1\xa0', u'Some text here.', u'\n', u'1\xa0', u'Some more text here.', 
u'\n', u'1\xa0', u'Some more text here.']

I want to remove \xa0 from these strings.

EDIT: Current Method Not Working:

def remove_from_list(l, x):
  return [li.replace(x, '') for li in l]

remove_from_list(list, u'\xa0')

I'm still getting the exact same output.

Check these: stackoverflow.com/questions/3939361/…, tutorialspoint.com/python/string_replace.htm
– Rupak, Commented May 17, 2013 at 19:09
Which part of this do you not know how to do? How to turn u'1\xa0' into u'1'? Or how to do the same thing for each element in a list?
– abarnert, Commented May 17, 2013 at 19:11

abarnert · Accepted Answer · 2013-05-17 19:56:24Z

5

The problem is different in each version of your code. Let's start with this:

newli = re.sub(x, '', li)
l[li].replace(newli)

First, newli is already the line you want—that's what re.sub does—so you don't need replace here at all. Just assign newli.

Second, l[li] isn't going to work, because li is the value of the line, not the index.
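To illustrate that second point, here is a small sketch. The two-element list is made up for the example, and `error` and `position` are just illustrative names:

```python
items = [u'\n', u'1\xa0']
li = items[1]  # inside a for loop, li is the element itself, not its position

try:
    items[li]  # a string is not a valid list index
except TypeError as exc:
    error = exc  # Python refuses to index a list with a string

# If you really need the position of a value, you have to look it up
# (or, better, get it from enumerate while looping).
position = items.index(li)
```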

In this version, it's a bit more subtle:

li = re.sub(x, '', li)

re.sub is returning a new string, and you're assigning that string to li. But that doesn't affect anything in the list, it's just saying "li no longer refers to the current line in the list, it now refers to this new string".
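A minimal sketch of that rebinding behaviour, using a made-up two-element list and str.replace in place of re.sub (the effect on the list is the same either way):

```python
items = [u'1\xa0', u'Some text here.']

for li in items:
    # This only rebinds the local name li to a new string.
    # The list still holds the original string objects.
    li = li.replace(u'\xa0', u'')

# items is exactly what it was before the loop;
# li just names the last cleaned-up string.
```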

The only way to replace the list elements is to get the index so you can use the [] operator. And to get that, you want to use enumerate.

So:

import re

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = re.sub(x, '', li)
  return l

But really, you probably do want to use str.replace—it's just that you want to use it instead of re.sub:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = li.replace(x, '')
  return l

Then you don't have to worry about what happens if x is a special character in regular expressions.
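Here is a rough sketch of what can go wrong, assuming the character you want to remove is '.', which is a regex metacharacter. The sample strings are invented for the example. re.escape is the standard way to make re.sub treat the pattern literally, if you do need re.sub for some other reason:

```python
import re

items = [u'a.b', u'c.d']

# As a regex pattern, '.' matches any character, so everything gets removed.
with_regex = [re.sub(u'.', u'', s) for s in items]

# str.replace treats its argument as plain text, so only the dots go.
with_replace = [s.replace(u'.', u'') for s in items]

# Escaping the pattern gives re.sub the same literal behaviour.
with_escape = [re.sub(re.escape(u'.'), u'', s) for s in items]
```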

Also, in Python, you almost never want to modify an object in-place, and also return it. Either modify it and return None, or return a new copy of the object. So, either:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    newli = li.replace(x, '')
    l[index] = newli

… or:

def remove_from_list(l, x):
  new_list = []
  for li in l:
    newli = li.replace(x, '')
    new_list.append(newli)
  return new_list

And you can simplify the latter to a list comprehension, as in unutbu's answer:

def remove_from_list(l, x):
  new_list = [li.replace(x, '') for li in l]
  return new_list

The fact that the second one is easier to write (no need for enumerate, has a handy shortcut, etc.) is no coincidence—it's usually the one you want, so Python makes it easy.

I don't know how else to make this clearer, but one last try:

If you choose the version that returns a fixed-up new copy of the list instead of modifying the list in-place, your original list will not be modified in any way. If you want to use the fixed-up new copy, you have to use the return value of the function. For example:

>>> def remove_from_list(l, x):
...     new_list = [li.replace(x, '') for li in l]
...     return new_list
>>> a = [u'\n', u'1\xa0']
>>> b = remove_from_list(a, u'\xa0')
>>> a
[u'\n', u'1\xa0']
>>> b
[u'\n', u'1']

The problem you're having with your actual code turning everything into a list of 1-character and 0-character strings is that you don't actually have a list of strings in the first place; you have one string that's a repr of a list of strings. So, for li in l means "for each character li in the string l", instead of "for each string li in the list l".
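A sketch of that situation, assuming the string really is the repr of a list of strings. Using ast.literal_eval to turn it back into a list is my suggestion rather than something from the question, and it only makes sense if the text is a plain Python literal; `text`, `per_char`, `parsed` and `cleaned` are illustrative names:

```python
import ast

# One string that merely looks like a list (note the raw string:
# the backslashes are literal characters here).
text = r"[u'\n', u'1\xa0']"

# Iterating over a string yields one character at a time,
# so the comprehension produces a list of tiny strings.
per_char = [c.replace(u'\xa0', u'') for c in text]

# Parse the repr back into a real list first, then clean each element.
parsed = ast.literal_eval(text)
cleaned = [s.replace(u'\xa0', u'') for s in parsed]
```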

edited May 17, 2013 at 19:56

answered May 17, 2013 at 19:20

abarnert


4 Comments

Dan Over a year ago

For some reason it still isn't working. I am using return [li.replace(x, '') for li in l] based on your last line but it still has those characters in place.

Dan Over a year ago

I just updated the answer to show what I did based on this answer.

abarnert Over a year ago

This won't modify l in-place, it will return a new list with those characters stripped out of each string. You have to print that new list, or assign it to something, or whatever.

Dan Over a year ago

I am, just not showing in my example - I'll update my question to show you.

Jon Clements · Accepted Answer · 2013-05-17 19:29:07Z

3

Another option if you're only interested in ASCII chars (as you mention characters, but this also happens to work for the case of the posted example):

[text.encode('ascii', 'ignore') for text in your_list]
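Two things to be aware of with this approach, sketched below with a made-up list: it drops every non-ASCII character, not just \xa0, and encode gives you byte strings back (str in Python 2, bytes in Python 3), so you may want to decode again if you need text:

```python
items = [u'1\xa0', u'caf\xe9']

# 'ignore' silently discards anything that cannot be encoded as ASCII,
# so the accented e disappears along with the non-breaking space.
encoded = [text.encode('ascii', 'ignore') for text in items]

# The results are byte strings; decode them to get unicode text back.
decoded = [raw.decode('ascii') for raw in encoded]
```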

edited May 17, 2013 at 19:29

answered May 17, 2013 at 19:22

Jon Clements


unutbu · Accepted Answer · 2013-05-17 19:18:04Z

1

You could use a list comprehension and str.replace:

>>> items
[u'\n',
 u'1\xa0',
 u'Some text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.']
>>> [item.replace(u'\xa0', u'') for item in items]
[u'\n',
 u'1',
 u'Some text here.',
 u'\n',
 u'1',
 u'Some more text here.',
 u'\n',
 u'1',
 u'Some more text here.']

edited May 17, 2013 at 19:18

answered May 17, 2013 at 19:10

unutbu


4 Comments

abarnert Over a year ago

@DanO'Day: What valid characters do you want to maintain that this version doesn't? This retains everything except for \xa0, which is exactly what you asked for.

Matthias Over a year ago

@DanO'Day: The code didn't change.

Dan Over a year ago

@Matthias my bad, still not working though

abarnert Over a year ago

What does "not working" mean? When you run this exact code in your Python interpreter, do you get different results than unutbu showed? Or are the results unutbu showed wrong in some way?
