How to count frequencies/occurrences of all values within a string

Question

I need a count of all the emails in a list. Some entries, however, contain several emails joined together with a | symbol. These need to be split, and the emails counted after splitting, so that the frequencies are not undercounted.

I have a list that is something like this:

test = ['[email protected]', '[email protected]|[email protected]', '[email protected]|[email protected]', '[email protected]', '[email protected]']

I performed a set of operations to split the entries. When I split, the pipe gets replaced by double quotes at that location, so I replace the double quotes with single quotes so that all email IDs are enclosed in single quotes.

# convert list to a string
test_str = str(test)

# apply string operation to split by separator '|'
test1 = test_str.split('|')
print(test1)

--> OUTPUT of above print statement:   ["['[email protected]', '[email protected]", "[email protected]', '[email protected]", "[email protected]', '[email protected]', '[email protected]']"]

test2 = str(test1)
test3 = test2.replace('"','')
print(test3)

--> OUTPUT of above print statement: [['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']]

How can I now obtain a count of all the emails? The result is essentially a string; if it were a list, I could use collections.Counter to easily obtain a count.

I'd like to get a result like the one below, with each email and its count in descending order of frequency:

 ['[email protected]': 3, '[email protected]': 2, '[email protected]': 1, '[email protected]': 1]

Thanks for the help!

blhsing · Accepted Answer · 2019-10-02 05:19:23Z


You can use collections.Counter with a generator expression that iterates over the input list of strings, skips falsy items such as None with the if s filter, and then iterates over the emails obtained by splitting each string on '|'. Use the most_common method to ensure a descending order of counts:

from collections import Counter
dict(Counter(e for s in test if s for e in s.split('|')).most_common())

This returns:

{'[email protected]': 3, '[email protected]': 2, '[email protected]': 1, '[email protected]': 1}
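Because the addresses in the question are obscured, here is a minimal self-contained sketch of the same approach. The addresses (a@example.com and so on) are made-up placeholders, not the asker's data, and the None entry is included only to show that the if s filter skips it.

```python
from collections import Counter

# Hypothetical sample data; the addresses are placeholders, not the asker's.
test = [
    'a@example.com',
    'b@example.com|a@example.com',
    None,
    'c@example.com|b@example.com',
    'a@example.com',
    'd@example.com',
]

# Skip falsy items (None, ''), split each remaining string on '|',
# and count every resulting address.
counts = Counter(e for s in test if s for e in s.split('|'))

# most_common() returns (email, count) pairs sorted by descending count;
# dict() keeps that insertion order on Python 3.7+.
result = dict(counts.most_common())
print(result)
```

Note that the if s test also drops empty strings. If only None should be skipped, `if s is not None` would be the stricter condition.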

edited Oct 2, 2019 at 5:19

answered Oct 2, 2019 at 5:13

blhsing


3 Comments

nlp Over a year ago

That's great. I have one question. What if the list consists of a NoneType value. test = ['[email protected]', '[email protected]|[email protected]', None, '[email protected]|[email protected]', '[email protected]', '[email protected]'] How can I get rid of it and use the above function?

nlp Over a year ago

This works btw if there's no None value. Thank you so much.

blhsing Over a year ago

I've updated my answer with a filter that filters out None items then.

Eugene Prikazchikov · Accepted Answer · 2019-10-02 06:33:37Z


What about iterating over the list and calling counter.update on every string? Like this:

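from collections import Counter

test = ['[email protected]', '[email protected]|[email protected]', '[email protected]|[email protected]', '[email protected]', '[email protected]']
c = Counter()
for email_str in test:
    if email_str:  # skip None and empty strings
        c.update(email_str.split('|'))  # count each email in this entry
res = c.most_common()  # list of (email, count) pairs, most frequent first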
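As a rough sketch of how the result might be consumed: most_common() returns a list of (email, count) tuples rather than a dict, already sorted by descending count. The helper name count_emails, its sep parameter, and the sample addresses below are hypothetical, introduced only for illustration.

```python
from collections import Counter


def count_emails(items, sep='|'):
    """Count addresses in a list whose entries may hold several
    addresses joined by `sep`, or may be None."""
    c = Counter()
    for email_str in items:
        if email_str:  # skips None and empty strings
            c.update(email_str.split(sep))
    # List of (email, count) tuples, most frequent first.
    return c.most_common()


# Placeholder data, not taken from the question.
sample = ['x@example.com|y@example.com', None, 'x@example.com']
for email, n in count_emails(sample):
    print(email, n)
```

Passing the split list to update() matters here: calling update() with a bare string would count individual characters instead of whole addresses.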

edited Oct 2, 2019 at 6:33

answered Oct 2, 2019 at 5:21

Eugene Prikazchikov


2 Comments

nlp Over a year ago

that works too for the current list! Thanks! however if there's a None value that would not work. How can I make it work with a None type value in my list?

nlp Over a year ago

super thanks so much! I appreciate the help and quick response. Got to finish my code!
