2

I'm trying to replace all identical elements in a list with a new string, and also trying to move away from using loops for everything.

# My aim is to turn:
list = ["A", "", "", "D"]
# into:
list = ["A", "???", "???", "D"]
# but without using a for-loop

I started off with variations of comprehensions:

# e.g. 1
['' = "???"(i) for i in list]
# e.g. 2
list = [list[i] .replace '???' if ''(i) for i in range(len(lst))]

Then I tried to employ Python's map function as seen here:

list[:] = map(lambda i: "???", list)
# I couldn't work out where to add the '""' to be replaced.

Finally I butchered a third solution:

list[:] = ["???" if ''(i) else i for i in list]

I feel like I'm moving further from a sensible line of attack, I just want a tidy way to complete a simple task.

15
  • 1
    Does this answer your question? In-place replacement of all occurrences of an element in a list in python Commented Aug 20, 2021 at 13:30
  • Yes, thank you. However, I also got ample novel solutions to my question, including one which used Python's map function correctly. Commented Aug 20, 2021 at 13:35
  • 2
    note: a list-comprehension is in fact a for loop... Commented Aug 20, 2021 at 13:36
  • @PierreD is it faster or just more concise for a human to read? Commented Aug 20, 2021 at 13:37
  • also: please don't redefine list as a variable. Commented Aug 20, 2021 at 13:37

5 Answers

3

You can try this:

list1 = ["A", "", "", "D"]

list2=list(map(lambda x: "???" if not x else x,list1))

print(list2)

Here is a longer version of the above one:

list1 = ["A", "", "", "D"]
def check_string(string):
    if not string:
        return "???"
    return string

list2=list(map(check_string,list1))
print(list2)

This takes advantage of the fact that an empty string "" is falsy: `not x` is True for "" and False for any non-empty string, so the truth value of each element decides whether "???" or the original string is returned. Output:

['A', '???', '???', 'D']
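
One caveat with the truthiness test, sketched below: `not x` is true for every falsy value, not only the empty string, so a list that also contains `0` or `None` would have those replaced too. The helper names `replace_falsy` and `replace_empty` are made up for this illustration and do not come from the answer.

```python
def replace_falsy(items, filler="???"):
    """Replace every falsy element (e.g. "", 0, None) with filler."""
    return list(map(lambda x: filler if not x else x, items))


def replace_empty(items, filler="???"):
    """Replace only elements equal to the empty string with filler."""
    return list(map(lambda x: filler if x == "" else x, items))


mixed = ["A", "", 0, None, "D"]
print(replace_falsy(mixed))  # ['A', '???', '???', '???', 'D']
print(replace_empty(mixed))  # ['A', '???', 0, None, 'D']
```

For a list that holds only strings, as in the question, both versions give the same result.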

2

For concision, if we allow list comprehensions (which are a form of loop), the following replaces empty strings with '???', consistent with the example in the question, as @ComteHerappait correctly noted.

>>> l = ["A", "", "", "D"]
>>> [e or '???' for e in l]
['A', '???', '???', 'D']

If instead we focus on replacing duplicate elements while keeping the first occurrence of each value, then:

>>> seen = set()
>>> newl = ['???' if e in seen or seen.add(e) else e for e in l]
>>> newl
['A', '', '???', 'D']

Finally, the following replaces every element whose value occurs more than once, including its first occurrence:

from collections import Counter

c = Counter(l)
newl = [e if c[e] < 2 else '???' for e in l]
>>> newl
['A', '???', '???', 'D']
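
The `seen.add(e)` trick above works because `set.add` returns `None`: the first time a value is met, `e in seen` is false and `seen.add(e)` records it but is also falsy, so the element is kept. A more explicit sketch of the two duplicate-handling variants follows. The function names are hypothetical, chosen only for this example, and both assume the elements are hashable.

```python
from collections import Counter


def replace_repeats(items, filler="???"):
    """Keep the first occurrence of each value; replace later repeats."""
    seen = set()
    result = []
    for item in items:
        if item in seen:
            result.append(filler)
        else:
            seen.add(item)
            result.append(item)
    return result


def replace_all_duplicated(items, filler="???"):
    """Replace every element whose value occurs more than once."""
    counts = Counter(items)
    return [filler if counts[item] > 1 else item for item in items]


l = ["A", "", "", "D"]
print(replace_repeats(l))         # ['A', '', '???', 'D']
print(replace_all_duplicated(l))  # ['A', '???', '???', 'D']
```

The explicit loop is longer than the one-line comprehension, but it avoids relying on a side effect inside a conditional expression.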

3 Comments

this works very well for replacing empty strings, but I think the question is about duplicates.
you are correct; the question is ambiguous, see my comment.
Just FWIW, this updated answer responds to all the cases of the OP's question: replacement of empty strings, replacement of duplicates (starting from the first dupe), or replacement of all duplicates. The list comprehension (first code snippet) is also the fastest solution so far, both for short lists and long lists.
1

You could use a list comprehension: compare each element to the target value, and if it's a match, replace it with a different string; otherwise keep the original element.

>>> data = ["A", "", "", "D"]
>>> ['???' if i == '' else i for i in data]
['A', '???', '???', 'D']
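
The question's attempts assign to `list[:]`, which suggests the replacement is meant to happen in place. A minimal sketch of that, assuming other references to the same list should see the change (the name `alias` is only for illustration):

```python
data = ["A", "", "", "D"]
alias = data  # a second name bound to the same list object

# Slice assignment replaces the contents of the existing list
# instead of rebinding the name to a new list.
data[:] = ["???" if item == "" else item for item in data]

print(data)           # ['A', '???', '???', 'D']
print(alias is data)  # True
```

With a plain `data = [...]` assignment, `alias` would still refer to the old, unmodified list.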

2 Comments

That works but contains an explicit 'for' loop which is what the OP wanted to avoid
@DarkKnight What do you think map does under the hood ;) there is no solution to this problem that does not involve explicit or implicit looping
1

How about this:-

myList = ['A', '', '', 'D']
myMap = map(lambda i: '???' if i == '' else i, myList)
print(list(myMap))

...will result in:-

['A', '???', '???', 'D']

2 Comments

That looks a lot like my solution
You're right. We were obviously writing code coincidentally
-1

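If you want to avoid loops, as the title suggests, you can use np.where instead of a list comprehension; in the speed test below it is faster for large lists:

import numpy as np

# dtype='object' keeps the elements as Python strings, so "???" is not truncated
data = np.array(["A", "", "", "D"], dtype='object')
index = np.where(data == '')[0]  # positions of the empty strings
data[index] = "???"
data.tolist()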

and the result:

['A', '???', '???', 'D']

Speed test

# Run in IPython or Jupyter: %timeit is an IPython magic. Assumes numpy is imported as np.
for rep in [1, 10, 100, 1000, 10000]:
    data = ["A", "", "", "D"] * rep
    print(f'array of length {4 * rep}')
    print('np.where:')
    %timeit data2 = np.array(data, dtype='object'); index = np.where(data2 == '')[0]; data2[index] = "???"; data2.tolist()
    print('list-comprehension:')
    %timeit ['???' if i == '' else i for i in data]

and the result:

array of length 4
np.where:
The slowest run took 11.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 10.7 µs per loop
list-comprehension:
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 487 ns per loop
array of length 40
np.where:
The slowest run took 7.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 13 µs per loop
list-comprehension:
100000 loops, best of 5: 2.99 µs per loop
array of length 400
np.where:
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 31 µs per loop
list-comprehension:
10000 loops, best of 5: 26 µs per loop
array of length 4000
np.where:
1000 loops, best of 5: 225 µs per loop
list-comprehension:
1000 loops, best of 5: 244 µs per loop
array of length 40000
np.where:
100 loops, best of 5: 2.27 ms per loop
list-comprehension:
100 loops, best of 5: 2.63 ms per loop

In this test, np.where is faster for lists of roughly 4000 elements or more.
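
`%timeit` is an IPython magic and does not run in a plain Python script. A rough standard-library equivalent for comparing the two list comprehensions that appear on this page might look like the sketch below. The `rep` values and the `number` of runs are arbitrary choices for illustration, and timings vary by machine, so no results are shown.

```python
import timeit

for rep in [1, 100, 10000]:
    data = ["A", "", "", "D"] * rep

    # Both callables close over the current `data` list.
    def with_equality():
        return ["???" if i == "" else i for i in data]

    def with_or():
        return [e or "???" for e in data]

    # The two approaches agree when every element is a string.
    assert with_equality() == with_or()

    t_eq = timeit.timeit(with_equality, number=100)
    t_or = timeit.timeit(with_or, number=100)
    print(f"length {len(data)}: == '' {t_eq:.6f}s, or {t_or:.6f}s (100 runs each)")
```

`timeit.timeit` returns the total time for all `number` runs, so divide by `number` to get a per-call figure comparable to the `%timeit` output.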

8 Comments

this is one of the slowest methods for short lists; For the four-element list of the OP question, it takes 7.89 µs ± 237 ns per loop, which is 23.8x slower than a simple list comprehension. For large lists (that are not yet as np.array), the relative difference decreases; it asymptotically stabilizes to around 1.9x slower.
@PierreD check out the updated post; for large arrays this method is faster
You used the wrong list comprehension. The one I proposed is [e or '???' for e in data]. That ends up 1.9x faster than np.where in your %timeit loop: np.where: 1.83 ms ± 1.43 µs; list comprehension: 959 µs ± 735 ns. Before writing my comment, I had tested up to 100 million random elements. That's why I asserted a 1.9x asymptotic speedup over np.where.
what do you mean by wrong? The list-comprehension I compared with is the solution to identical elements as the title of OP suggests (and as can be seen in other answers). Yours just works for empty elements.
You used %timeit ['???' if i == '' else i for i in data]. That replaces only empty elements, just like most of the answers here. For the case of empty elements, I suggested [e or '???' for e in data], which is between 28x and 1.9x faster than the np.array and np.where approach. That's why I say you used the wrong list comprehension. As for replacing duplicates, the other parts of my answer address that. I note that it seems to be the only answer so far that does it.