2

Sample data to parse (a list of unicode strings):

[u'\n', u'1\xa0', u'Some text here.', u'\n', u'1\xa0', u'Some more text here.', 
u'\n', u'1\xa0', u'Some more text here.']

I want to remove \xa0 from these strings.

EDIT: Current Method Not Working:

def remove_from_list(l, x):
  return [li.replace(x, '') for li in l]

remove_from_list(list, u'\xa0')

I'm still getting the exact same output.

12
  • Have you tried anything yet? Commented May 17, 2013 at 19:07
  • yes, I'll show my attempts Commented May 17, 2013 at 19:09
  • Check these, stackoverflow.com/questions/3939361/…, tutorialspoint.com/python/string_replace.htm Commented May 17, 2013 at 19:09
  • Which part of this do you not know how to do? How to turn u'1\xa0' into u'10'? Or how to do the same thing for each element in a list? Commented May 17, 2013 at 19:11
  • no @abarnert turn it into u'1' Commented May 17, 2013 at 19:12

3 Answers

5

The problem is different in each version of your code. Let's start with this:

newli = re.sub(x, '', li)
l[li].replace(newli)

First, newli is already the line you want—that's what re.sub does—so you don't need replace here at all. Just assign newli.

Second, l[li] isn't going to work, because li is the value of the line, not the index.


In this version, it's a bit more subtle:

li = re.sub(x, '', li)

re.sub is returning a new string, and you're assigning that string to li. But that doesn't affect anything in the list, it's just saying "li no longer refers to the current line in the list, it now refers to this new string".
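To make the rebinding point concrete, here is a minimal sketch. The name items and the sample values are only for illustration, and str.replace stands in for re.sub, since the effect on the list is the same either way:

```python
items = [u'1\xa0', u'Some text here.']

for li in items:
    # This rebinds the local name li to a new string;
    # the element stored in items is never touched.
    li = li.replace(u'\xa0', u'')

# The list still holds the original strings.
unchanged = (items == [u'1\xa0', u'Some text here.'])
```

After the loop, unchanged is True: the loop produced cleaned strings but threw each one away.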


The only way to replace the list elements is to get the index so you can use the [] operator. And to get that, you want to use enumerate.

So:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = re.sub(x, '', li)
  return l

But really, you probably do want to use str.replace—it's just that you want to use it instead of re.sub:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = li.replace(x, '')
  return l

Then you don't have to worry about what happens if x is a special character in regular expressions.
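As a rough illustration of what can go wrong, suppose the thing to remove were a period instead of \xa0 (a made-up case, not from the question). In a regular expression, '.' matches any character except a newline, so re.sub wipes out the whole string unless the pattern is escaped:

```python
import re

items = [u'1.5', u'a.b']

# '.' is a regex metacharacter, so every character gets removed.
with_regex = [re.sub(u'.', u'', s) for s in items]

# re.escape turns the pattern into a literal match.
with_escape = [re.sub(re.escape(u'.'), u'', s) for s in items]

# str.replace always treats its argument literally.
with_replace = [s.replace(u'.', u'') for s in items]
```

Here with_regex ends up as two empty strings, while with_escape and with_replace both keep the other characters. If you do need re.sub for some reason, re.escape is the usual way to make an arbitrary string safe to use as a pattern.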


Also, in Python, you almost never want to modify an object in-place, and also return it. Either modify it and return None, or return a new copy of the object. So, either:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    newli = li.replace(x, '')
    l[index] = newli

… or:

def remove_from_list(l, x):
  new_list = []
  for li in l:
    newli = li.replace(x, '')
    new_list.append(newli)
  return new_list

And you can simplify the latter to a list comprehension, as in unutbu's answer:

def remove_from_list(l, x):
  new_list = [li.replace(x, '') for li in l]
  return new_list

The fact that the second one is easier to write (no need for enumerate, has a handy shortcut, etc.) is no coincidence—it's usually the one you want, so Python makes it easy.


I don't know how else to make this clearer, but one last try:

If you choose the version that returns a fixed-up new copy of the list instead of modifying the list in-place, your original list will not be modified in any way. If you want to use the fixed-up new copy, you have to use the return value of the function. For example:

>>> def remove_from_list(l, x):
...     new_list = [li.replace(x, '') for li in l]
...     return new_list
>>> a = [u'\n', u'1\xa0']
>>> b = remove_from_list(a, u'\xa0')
>>> a
[u'\n', u'1\xa0']
>>> b
[u'\n', u'1']

The problem you're having with your actual code turning everything into a list of 1-character and 0-character strings is that you don't actually have a list of strings in the first place; you have one string that's a repr of a list of strings. So, for li in l means "for each character li in the string l", instead of "for each string li in the list l".
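Here is a small sketch of that situation. The variable text is a hypothetical stand-in for whatever your code actually has; I'm assuming it holds the printed form of a list rather than the list itself. ast.literal_eval is one way to turn such a string back into a real list, though fixing whatever produced the string in the first place is usually the better cure:

```python
import ast

# A single string that merely looks like a list (hypothetical input).
text = "[u'\\n', u'1\\xa0']"

# Iterating over a string yields one character at a time.
per_char = [ch.replace(u'\xa0', u'') for ch in text]

# Parse the repr back into an actual list of strings.
real_list = ast.literal_eval(text)
cleaned = [s.replace(u'\xa0', u'') for s in real_list]
```

per_char has one entry per character of text, starting with '[' and 'u', which is the kind of output you were seeing. cleaned is the two-element list you actually wanted.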


4 Comments

For some reason it still isn't working. I am using return [li.replace(x, '') for li in l] based on your last line but it still has those characters in place.
I just updated the answer to show what I did based on this answer.
This won't modify l in-place, it will return a new list with those characters stripped out of each string. You have to print that new list, or assign it to something, or whatever.
I am, just not showing in my example - I'll update my question to show you.
3

Another option if you're only interested in ASCII chars (as you mention characters, but this also happens to work for the case of the posted example):

[text.encode('ascii', 'ignore') for text in your_list]
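A short sketch of what this does in practice, using made-up sample values. Two caveats: it drops every non-ASCII character, not just \xa0, and on Python 3 the result is bytes, so a decode step is needed if you want text back:

```python
your_list = [u'1\xa0', u'caf\xe9']

# 'ignore' silently discards anything that is not ASCII.
encoded = [text.encode('ascii', 'ignore') for text in your_list]

# Decode again to get text strings rather than byte strings.
decoded = [raw.decode('ascii') for raw in encoded]
```

Note that the accented character in the second string is lost along with the \xa0, which may or may not be what you want.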

Comments

1

You could use a list comprehension and str.replace:

>>> items
[u'\n',
 u'1\xa0',
 u'Some text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.']
>>> [item.replace(u'\xa0', u'') for item in items]
[u'\n',
 u'1',
 u'Some text here.',
 u'\n',
 u'1',
 u'Some more text here.',
 u'\n',
 u'1',
 u'Some more text here.']

4 Comments

@DanO'Day: What valid characters do you want to maintain that this version doesn't? This retains everything except for \xa0, which is exactly what you asked for.
@DanO'Day: The code didn't change.
@Matthias my bad, still not working though
What does "not working" mean? When you run this exact code in your Python interpreter, you get different results than unutbu showed? Or the results unutbu showed are wrong in some way?
