1

I need some help figuring out how to split the words in a text file into a list. I can use something like this:

words = []
for line in open('text.txt'):
    line.split()
    words.append(line)

But if the file contains multiple lines of text, they are split into sublists, e.g.

this is the first line
this is the second line

Becomes:

[['this', 'is', 'the', 'first', 'line'], ['this', 'is', 'the', 'second', 'line']]

How do I make it so that they are in the same list? i.e.

[['this', 'is', 'the', 'first', 'line', 'this', 'is', 'the', 'second', 'line']]

thanks!

EDIT: This program will be opening multiple text files, so the words in each file need to be added to a sublist. So if a file has multiple lines, all the words from these lines should be stored together in a sublist. i.e. Each new file starts a new sublist.

3 Answers

3

You can use a list comprehension like this to flatten the words from all lines into a single list (here input_file is the open file object):

[word for line in input_file for word in line.split()]

This is the same as writing

result = []
for line in input_file:
    for word in line.split():
        result.append(word)

Or you can use itertools.chain.from_iterable, like this

from itertools import chain
with open("Input.txt") as input_file:
    print(list(chain.from_iterable(line.split() for line in input_file)))

1 Comment

I'm not quite sure how to implement this, as my program applies a regex substitution to words (if required) before they are added to the list, i.e. each line in the file is split into words, each word goes through the regex check, and only then is it added to the list.
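
One possible way to fit a per-word regex step into the flattening approach is sketched below. The pattern and the helper names (CLEANUP, clean, words_from_lines) are made up for illustration, since the thread never shows the actual substitution; swap in whatever rule the program really applies.

```python
import re
from itertools import chain

# Hypothetical rule, for illustration only: drop every character
# that is not a letter, digit, or underscore.
CLEANUP = re.compile(r"[^\w]")

def clean(word):
    """Apply the (assumed) regex substitution to a single word."""
    return CLEANUP.sub("", word)

def words_from_lines(lines):
    """Split every line into words, clean each word, return one flat list."""
    all_words = chain.from_iterable(line.split() for line in lines)
    return [clean(word) for word in all_words]

sample = ["this is, the first line!", "this is the second line."]
print(words_from_lines(sample))
```

Because an open file iterates line by line, passing the file object in place of sample should work the same way.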
3

Your code doesn't actually do what you say it does. line.split() just returns a list of words in the line, which you don't do anything with; it doesn't affect line in any way, so when you do words.append(line), you're just appending the original line, a single string.
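
A quick way to see this for yourself, using a literal string in place of a line read from the file:

```python
line = "this is the first line\n"

# split() builds and returns a new list; the string itself is left untouched
returned = line.split()

print(repr(line))   # still the original string, trailing newline included
print(returned)     # the list of words, which the original loop throws away
```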

So, first, you have to fix that:

words = []
for line in open('text.txt'):
    words.append(line.split())

Now, what you're doing is repeatedly appending a new list of words to an empty list. So of course you get a list of lists of words. This is because you're mixing up the append and extend methods of list. append takes any object, and adds that object as a new element of the list; extend takes any iterable, and adds each element of that iterable as separate new elements of the list.
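
The difference is easy to see on a single line of words (the variable names here are only for the demonstration):

```python
line_words = "this is the first line".split()

appended = []
appended.append(line_words)   # adds the whole list as one element

extended = []
extended.extend(line_words)   # adds each word as its own element

print(appended)   # a list containing one list of five words
print(extended)   # a flat list of five words
```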

And if you fix that too:

words = []
for line in open('text.txt'):
    words.extend(line.split())

… now you get what you wanted.
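
For the multi-file case from your edit, a minimal sketch could combine both methods: extend within a file, append once per file. The function name words_per_file and the temporary demo files are assumptions for the example, not anything from your program.

```python
import os
import tempfile

def words_per_file(paths):
    """Return one sublist of words per file, in the order the paths are given."""
    all_words = []
    for path in paths:
        file_words = []
        with open(path) as input_file:
            for line in input_file:
                file_words.extend(line.split())   # flatten lines within a file
        all_words.append(file_words)              # one sublist per file
    return all_words

# Demo with two throwaway files so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")
    second = os.path.join(tmp, "second.txt")
    with open(first, "w") as f:
        f.write("this is the first line\nthis is the second line\n")
    with open(second, "w") as f:
        f.write("another file\n")
    result = words_per_file([first, second])

print(result)
```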

1

Not sure why you want to keep the [[]] but:

words = [open('text.txt').read().split()]
