
I am doing a data cleaning task in Python, reading from a text file that contains several sentences. After tokenizing the text, I get a separate list of tokens for each sentence, as follows:

[u'does', u'anyone', u'think', u'that', u'we', u'have', u'forgotten', u'the', u'days', u'of', u'favours', u'for', u'the', u'pn', u's', u'party', u's', u'friends', u'of', u'friends', u'and', u'paymasters', u'some', u'of', u'us', u'have', u'longer', u'memories']

[u'but', u'is', u'the', u'value', u'at', u'which', u'vassallo', u'brothers', u'bought', u'this', u'property', u'actually', u'relevant', u'and', u'represents', u'the', u'actual', u'value', u'of', u'the', u'property']

[u'these', u'monsters', u'are', u'wrecking', u'the', u'reef', u'the', u'cargo', u'vessels', u'have', u'been', u'there', u'for', u'weeks', u'and', u'the', u'passenger', u'ship', u'for', u'at', u'least', u'24', u'hours', u'now', u'https', u'uploads', u'disquscdn', u'com'].

My code is the following:

with open(file_path) as fp:
    comments = fp.readlines()

    for i in range (0, len(comments)):

        tokens = tokenizer.tokenize(no_html.lower())
        print tokens

Here, no_html is the text of the file with the HTML tags removed. How can I get all these tokens into a single list?

Answer

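Instead of comments = fp.readlines(), try comments = fp.read().

readlines() reads all the lines of a file and returns them as a list of strings, one per line, so tokenizing each element gives you one list per line. read() returns the whole file as a single string, so tokenizing it gives you a single list of tokens.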
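Below is a minimal Python 3 sketch of that difference (the question's code is Python 2). It assumes a regex stand-in for the question's `tokenizer`, whose definition isn't shown: the hypothetical `tokenize` helper keeps runs of word characters, which is consistent with the output above splitting `pn's` into `pn` and `s`. `io.StringIO` stands in for the real file.

```python
import io
import re

def tokenize(text):
    # Hypothetical stand-in for the question's tokenizer.tokenize:
    # returns every run of word characters as a token
    return re.findall(r"\w+", text)

sample = "Does anyone think that we have forgotten?\nBut is the value actually relevant?\n"

# readlines() gives one string per line, so tokenizing each one yields one list per line
lines = io.StringIO(sample).readlines()
per_line = [tokenize(line.lower()) for line in lines]

# read() gives the whole file as one string, so tokenizing it yields a single list
whole = io.StringIO(sample).read()
all_tokens = tokenize(whole.lower())

print(len(per_line))   # 2
print(all_tokens[:4])  # ['does', 'anyone', 'think', 'that']
```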

Alternatively, you can keep reading line by line and collect each line's tokens into a single list with extend():

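all_tokens = []
for comment in comments:
    # Tokenize this line (not the whole file in no_html); strip its HTML
    # first if needed, as was done to produce no_html
    tokens = tokenizer.tokenize(comment.lower())
    # extend() adds the individual tokens, not the list itself
    all_tokens.extend(tokens)

print(all_tokens)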
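Here is a self-contained Python 3 sketch of the extend() loop, plus an alternative that flattens the per-line lists with `itertools.chain.from_iterable`. The `tokenize` helper is a hypothetical regex stand-in for the question's tokenizer, and `comments` is hard-coded sample data standing in for the file's lines.

```python
import re
from itertools import chain

def tokenize(text):
    # Hypothetical stand-in for the question's tokenizer: runs of word characters
    return re.findall(r"\w+", text)

# Sample lines standing in for the result of fp.readlines()
comments = [
    "Does anyone think that we have forgotten?\n",
    "These monsters are wrecking the reef\n",
]

# Approach 1: extend one accumulator list with each line's tokens
all_tokens = []
for comment in comments:
    all_tokens.extend(tokenize(comment.lower()))

# Approach 2: flatten the per-line token lists lazily with chain.from_iterable
flat = list(chain.from_iterable(tokenize(c.lower()) for c in comments))

print(all_tokens == flat)  # True
print(len(all_tokens))     # 13
```

Using append() instead of extend() would add each line's token list as a single element, recreating the nested lists from the question.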
Comment

Thank you so much! I used read() instead of readlines() and managed to fix my problem.
