I am parsing a tab-separated file where the first field is a Twitter hashtag and the second field is the tweet contents.

My input file looks like:

#trumpisanabuser    of young black men . calling for the execution of the innocent !url "
#centralparkfiv of young black men . calling for the execution of the innocent !url "
#trumppence16   "
#trumppence16   "
#america2that   @user "

My code filters out duplicate contents, such as retweets, by checking whether the second tab-separated field has already been seen.

import sys
import csv

tweetfile = sys.argv[1]
tweetset = set()   # tweet texts already seen
taglines = []      # unique "hashtag<TAB>tweet" lines to keep
with open(tweetfile, "rt") as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print("hashtag: " + str(row[0]) + "\t" + "tweet: " + str(row[1]))
        row[1] = row[1].replace("\\ n", "").rstrip()
        if row[1] in tweetset:
            continue  # duplicate tweet (e.g. a retweet)
        # Strip placeholders and non-alphanumerics to see if any real text remains
        temp = row[1].replace("!url", "")
        temp = temp.replace("@user", "")
        temp = "".join(c for c in temp if c.isalnum())
        if temp:
            taglines.append(row[0] + "\t" + row[1])
        tweetset.add(row[1])

However, the parsing behaves strangely. When I print each parsed row, I get the output below. Can anyone explain why the parsing breaks and produces a row that prints as "hashtag: #trumppence16 tweet:", then a newline, then "#trumppence16"?

hashtag: #centralparkfive   tweet: of young black men . calling for the execution of the innocent !url "
hashtag: #trumppence16  tweet: 
#trumppence16   
hashtag: #america2that  tweet: @user "
  • you have unterminated quotes in the file Commented Jan 3, 2017 at 7:53

1 Answer

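Some of your lines have a lone " as the tweet. In CSV, a field that starts with " is treated as a quoted field, and a quoted field may contain delimiters and newlines. Everything from the opening " to the next closing " is read as a single column value, so the reader pulls the line break and the following line's hashtag into one field.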

You can disable quote handling by setting the quoting option to csv.QUOTE_NONE:

reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
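
To see the difference without touching the real file, here is a minimal sketch that parses a small in-memory sample both ways. The sample text and the `parse` helper are made up for illustration (they only mimic the input shown in the question); `csv.reader`, `io.StringIO`, and `csv.QUOTE_NONE` are standard library.

```python
import csv
import io

# Hypothetical sample imitating the question's input: tab-separated,
# with stray " characters in the tweet column.
sample = (
    '#centralparkfive\tof young black men !url "\n'
    '#trumppence16\t"\n'
    '#trumppence16\t"\n'
    '#america2that\t@user "\n'
)


def parse(text, **kwargs):
    """Illustrative helper: parse tab-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **kwargs))


# Default quoting: a field that starts with " opens a quoted field, which
# runs across the line break until the next " is found.
default_rows = parse(sample)

# QUOTE_NONE: " is an ordinary character, so each line is its own row.
raw_rows = parse(sample, quoting=csv.QUOTE_NONE)

for name, rows in (("default", default_rows), ("QUOTE_NONE", raw_rows)):
    print(name, len(rows))
    for row in rows:
        print("   ", row)
```

With default quoting, the two #trumppence16 lines should collapse into a single row whose second field contains a newline followed by the next hashtag, which matches the odd output in the question. With csv.QUOTE_NONE, each input line comes back as its own two-field row.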