
I need a count of all the emails in a list. However, some items contain several emails joined with a | symbol. These items need to be split before counting; otherwise the frequencies come out inaccurate (too low).

I have a list that is something like this:

test = ['[email protected]', '[email protected]|[email protected]', '[email protected]|[email protected]', '[email protected]', '[email protected]']

I performed a set of string operations to split the items. After splitting, the printed result shows double quotes where each pipe used to be (Python's repr() wraps a string in double quotes when it contains single quotes). I then remove the double quotes so that every email ID is enclosed in single quotes.

# convert list to a string
test_str = str(test)

# apply string operation to split by separator '|'
test1 = test_str.split('|')
print(test1)

--> OUTPUT of above print statement:   ["['[email protected]', '[email protected]", "[email protected]', '[email protected]", "[email protected]', '[email protected]', '[email protected]']"]

test2 = str(test1)
test3 = test2.replace('"','')
print(test3)

--> OUTPUT of above print statement: [['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']]
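To see where the double quotes come from and what the string round trip actually produces, here is a minimal sketch that reruns the steps above. The addresses are hypothetical placeholders (the real ones are obfuscated in this post). With these values, removing the double quotes glues the two halves of each split pair back into a single quoted element, so parsing the string back does not yield seven separate emails:

```python
import ast

# Hypothetical placeholder addresses standing in for the obfuscated ones above.
test = ['a@example.com', 'b@example.com|c@example.com',
        'd@example.com|e@example.com', 'f@example.com', 'g@example.com']

test1 = str(test).split('|')
# Each piece contains single quotes and no double quotes, so repr()
# (which str() applies to each list element) wraps it in double quotes.
test2 = str(test1)
test3 = test2.replace('"', '')
print(test3)
# [['a@example.com', 'b@example.com, c@example.com', 'd@example.com, e@example.com', 'f@example.com', 'g@example.com']]

# Parsing the string back shows the split pairs were re-joined, not separated.
parsed = ast.literal_eval(test3)[0]
print(len(parsed))  # 5, not 7
```

Because of this, splitting each list item directly (as both answers below do) is more reliable than converting the whole list to a string first.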

How can I now obtain a count of all the emails? The result is essentially a string; if it were a list, I could use collections.Counter to get the counts easily.

I'd like to get each email together with its count, in descending order of frequency, like this:

{'[email protected]': 3, '[email protected]': 2, '[email protected]': 1, '[email protected]': 1}

Thanks for the help!

2 Answers


You can use collections.Counter with a generator expression that iterates over the input list of strings, skips empty or None items, and then iterates over the emails obtained by splitting each string on |. Use the most_common method to get the counts in descending order:

from collections import Counter
# skip None/empty items, split each on '|', count, then sort by frequency
dict(Counter(e for s in test if s for e in s.split('|')).most_common())

This returns:

{'[email protected]': 3, '[email protected]': 2, '[email protected]': 1, '[email protected]': 1}
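Wrapped as a function, the one-liner can be checked against a list that contains None and an empty string. This is a minimal sketch; the function name count_emails and the sample addresses are hypothetical, and it relies on dict preserving insertion order (guaranteed since Python 3.7):

```python
from collections import Counter

def count_emails(emails):
    """Count addresses in a list whose items may hold several '|'-separated emails."""
    # `if s` skips None and empty strings before splitting.
    counts = Counter(e for s in emails if s for e in s.split('|'))
    # most_common() sorts by count, descending; dict keeps that order.
    return dict(counts.most_common())

sample = ['a@example.com', 'b@example.com|a@example.com', None,
          'a@example.com|b@example.com', 'c@example.com', '']
print(count_emails(sample))
# {'a@example.com': 3, 'b@example.com': 2, 'c@example.com': 1}
```

Addresses with equal counts stay in the order they were first encountered, which is how most_common orders ties.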

3 Comments

That's great. I have one question: what if the list contains a None value? test = ['[email protected]', '[email protected]|[email protected]', None, '[email protected]|[email protected]', '[email protected]', '[email protected]'] How can I get rid of it and still use the code above?
This works, by the way, if there's no None value. Thank you so much.
I've updated my answer with a filter (the if s clause) that skips None items.
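An equivalent way to express that filter, shown here as an alternative rather than the answer's exact code, is the built-in filter(None, iterable), which drops every falsy item (None, ''). The sample addresses are hypothetical:

```python
from collections import Counter

# Hypothetical sample data containing None entries.
test = ['a@example.com', 'b@example.com|a@example.com', None, 'a@example.com', None]

# filter(None, ...) removes falsy items, matching the effect of `if s`.
counts = Counter(e for s in filter(None, test) for e in s.split('|'))
print(counts.most_common())
# [('a@example.com', 3), ('b@example.com', 1)]
```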

Alternatively, you can iterate over the list and call Counter.update on the split result of each string, like this:

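from collections import Counter

test = ['[email protected]', '[email protected]|[email protected]', '[email protected]|[email protected]', '[email protected]', '[email protected]']
c = Counter()
for email_str in test:
    # skip None and empty strings, then count each '|'-separated email
    if email_str:
        c.update(email_str.split('|'))
res = c.most_common()  # list of (email, count) pairs, highest count first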
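A quick check that the if email_str guard in this loop already skips None, and that most_common() returns a list of (email, count) tuples that dict() can turn into the mapping from the question. This is a minimal sketch; the helper name count_with_update and the sample addresses are hypothetical:

```python
from collections import Counter

def count_with_update(items):
    """Hypothetical helper wrapping the loop above; returns (email, count) pairs."""
    c = Counter()
    for email_str in items:
        if email_str:  # None and '' are falsy, so they are skipped here
            c.update(email_str.split('|'))
    return c.most_common()

sample = ['a@example.com', 'b@example.com|a@example.com', None,
          'a@example.com|b@example.com']
res = count_with_update(sample)
print(res)        # [('a@example.com', 3), ('b@example.com', 2)]
print(dict(res))  # {'a@example.com': 3, 'b@example.com': 2}
```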

2 Comments

That works too for the current list, thanks! However, it would not work if there's a None value. How can I make it work with a None value in my list?
Super, thanks so much! I appreciate the help and the quick response. Got to finish my code!
