TypeError: 'in <string>' requires string as left operand, not generator in Python

Question

I'm trying to parse tweet data.

My data looks like this:

59593936 3061025991 null null <d>2009-08-01 00:00:37</d> <s>&lt;a href="http://help.twitter.com/index.php?pg=kb.page&amp;id=75" rel="nofollow"&gt;txt&lt;/a&gt;</s> <t>honda just recalled 440k accords...traffic around here is gonna be light...win!!</t> ajc8587 15 24 158 -18000 0 0 <n>adrienne conner</n> <ud>2009-07-23 21:27:10</ud> <t>eastern time (us &amp; canada)</t> <l>ga</l>
22020233 3061032620 null null <d>2009-08-01 00:01:03</d> <s>&lt;a href="http://alexking.org/projects/wordpress" rel="nofollow"&gt;twitter tools&lt;/a&gt;</s> <t>new blog post: honda recalls 440k cars over airbag risk http://bit.ly/2wsma</t> madcitywi 294 290 9098 -21600 0 0 <n>madcity</n> <ud>2009-02-26 15:25:04</ud> <t>central time (us &amp; canada)</t> <l>madison, wi</l>

I want to get the total number of tweets and the number of keyword-related tweets. I prepared the keywords in a text file. In addition, I want to get the tweet text and the total number of tweets that contain a mention (@), a retweet (RT), or a URL (I want to save every URL in a separate file).

So I wrote this code:

import time
import os

total_tweet_count = 0
related_tweet_count = 0
rt_count = 0
mention_count = 0
URLs = {}

def get_keywords(filepath, mode):
    with open(filepath, mode) as f:
        for line in f:
            yield line.split().lower()

for line in open('/nas/minsu/2009_06.txt'):
    tweet = line.strip().lower()

    total_tweet_count += 1

    with open('./related_tweets.txt', 'a') as save_file_1:
        keywords = get_keywords('./related_keywords.txt', 'r')

        if keywords in line:
            text =  line.split('<t>')[1].split('</t>')[0]

            if 'http://' in text:
                try:
                    url = text.split('http://')[1].split()[0]
                    url = 'http://' + url

                    if url not in URLs:
                        URLs[url] = []
                    URLs[url].append('\t' + text)

                    save_file_3 = open('./URLs_in_related_tweets.txt', 'a')
                    print >> save_file_3, URLs

                except:
                    pass

            if '@' in text:
                mention_count +=1

            if 'RT' in text:
                rt_count += 1

            related_tweet_count += 1

            print >> save_file_1, text

    save_file_2 = open('./info_related_tweets.txt', 'w')

print >> save_file_2, str(total_tweet_count) + '\t' + srt(related_tweet_count) + '\t' + str(mention_count) + '\t' + str(rt_count)

save_file_1.close()
save_file_2.close()
save_file_3.close()

The following are sample keywords:

Depression
Placebo
X-rays
X-ray
HIV
Blood preasure
Flu
Fever
Oral Health
Antibiotics
Diabetes
Mellitus
Genetic disorders

I think my code has many problems, but the first error is as follows:

Traceback (most recent call last):
  File "health_related_tweets.py", line 23, in <module>
    if keywords in line:
TypeError: 'in <string>' requires string as left operand, not generator

Please help me out!

I think you need to use a regex. It is THE tool to use when you want to extract data from text. See the re module.
– eyquem, Commented Oct 2, 2011 at 17:16
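
As one illustration of this suggestion, the sketch below uses the re module to pull the tweet text and any URLs out of a line in the format shown in the question. The pattern names and the extract_tweet helper are hypothetical, the sample line is an abbreviated copy of the question's second record, and the sketch assumes the first <t>...</t> element on a line is the tweet body (the second one holds the time zone in the sample data).

```python
import re

# First <t>...</t> element on the line; non-greedy so it stops at the first </t>.
TWEET_TEXT = re.compile(r'<t>(.*?)</t>')
# Assumption: a URL is "http://" followed by a run of non-whitespace characters.
URL = re.compile(r'http://\S+')


def extract_tweet(line):
    """Return (text, urls) for one line, or (None, []) if it has no <t> element."""
    match = TWEET_TEXT.search(line)
    if match is None:
        return None, []
    text = match.group(1)
    return text, URL.findall(text)


# Abbreviated copy of the second sample line from the question.
sample = ('22020233 3061032620 null null <d>2009-08-01 00:01:03</d> '
          '<t>new blog post: honda recalls 440k cars over airbag risk '
          'http://bit.ly/2wsma</t> madcitywi <t>central time (us &amp; canada)</t>')
text, urls = extract_tweet(sample)
```

Unlike text.split('http://')[1].split()[0], findall collects every URL in the tweet, not only the first one, and returns an empty list when there is none, so no try/except is needed.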

varunl · Accepted Answer · 2011-10-02 17:05:30Z


The error occurs because get_keywords(...) returns a generator, and when the right-hand side of in is a string, the left-hand side must be a string too. Logically, keywords should be a collection of all the keywords, and for each keyword in that collection you want to check whether it appears in the tweet/line.

Sample code:

keywords = get_keywords('./related_keywords.txt', 'r')
has_keyword = False
for keyword in keywords:
    # Stop at the first keyword found in the line
    if keyword in line:
        has_keyword = True
        break
if has_keyword:
    # Your code here (for the case when the line has at least one keyword)
    pass

(The above code replaces the line if keywords in line:)
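
A more compact variant of the same check, offered as a sketch and not as the answer's own code: load the keywords into a list once, before looping over the tweets, and test each line with any(). The helper names load_keywords and contains_keyword are hypothetical, and the keyword file is assumed to hold one keyword or phrase per line. The code avoids print so that it runs under both Python 2 and Python 3.

```python
def load_keywords(lines):
    """Return lower-cased, non-empty keywords from an iterable of lines."""
    return [line.strip().lower() for line in lines if line.strip()]


def contains_keyword(text, keywords):
    """Return True if any keyword occurs as a substring of the lower-cased text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# In the real script the lines would come from open('./related_keywords.txt').
keywords = load_keywords(["Depression\n", "Blood preasure\n", "Flu\n", "\n"])
tweets = [
    "<t>down with the flu today</t>",
    "<t>honda just recalled 440k accords</t>",
]
matches = [contains_keyword(tweet, keywords) for tweet in tweets]
```

Because the list is built once, the keyword file is not reopened for every tweet. Note that this is plain substring matching, so a keyword such as "flu" would also match inside a longer word like "influence".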


7 Comments

ooozooo Over a year ago

I got another error: Traceback (most recent call last): File "health_related_tweets.py", line 25, in <module> for keyword in keywords: File "health_related_tweets.py", line 13, in get_keywords yield line.split().lower() AttributeError: 'list' object has no attribute 'lower'. I thought I needed to convert both the keywords and the tweets to lower case before matching, so I added .lower() to my code, but that raises this error. How should I fix it?

varunl Over a year ago

Again, that makes sense. line.split() gives you a list (of strings), and lower() works on a string. Can you give me a sample related_keywords.txt?

ooozooo Over a year ago

The related_keywords.txt file contains words like this: Dentist Depression Placebo X-rays X-ray HIV Blood preasure Flu (They are separated by newlines. I mean that each word, such as HIV or X-ray, or phrase, such as Blood preasure, is written on its own line. That is why I split it with .split().)

ooozooo Over a year ago

I put the sample keywords in the main text! Thanks for your help!

varunl Over a year ago

Great. Ideally you would not need the split function at all, because you don't want to split a phrase like "blood pressure" into ["blood", "pressure"]. You are looking for the whole phrase in the text.
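
Putting the comments above together, a corrected get_keywords might look like the following sketch. It assumes one keyword or phrase per line in the file, keeps the original function name, and gives mode a default value (an assumption made here for convenience). strip() removes the trailing newline without breaking phrases apart, so lower() is called on a string and not on a list. The temporary file only stands in for related_keywords.txt so that the example is self-contained.

```python
import os
import tempfile


def get_keywords(filepath, mode='r'):
    """Yield each non-empty line of the file as one lower-cased keyword."""
    with open(filepath, mode) as f:
        for line in f:
            keyword = line.strip().lower()  # strip() keeps "blood preasure" whole
            if keyword:
                yield keyword


# Demonstration with a temporary file standing in for related_keywords.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("HIV\nBlood preasure\nX-ray\n")
    demo_path = tmp.name

keywords = list(get_keywords(demo_path))  # materialize once, reuse for every tweet
os.remove(demo_path)
```

Calling list() on the generator before the tweet loop matters: a generator can be iterated only once, so reusing the same generator object for a second tweet would yield no keywords.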
