2

I have a large file of ~100,000 lines. Each line corresponds to a cluster, and each entry on a line is a reference ID for another file (a protein structure in this case), e.g.

1hgn 1dju 3nmj 8kfn
9opu 7gfb 
4bui

I need to read in the file as a list of lists where each line is a sublist, thus preserving the integrity of the cluster, e.g.

nested_list = [['1hgn', '1dju', '3nmj', '8kfn'], ['9opu', '7gfb'], ['4bui']]

My current code creates a nested list, but each sublist contains the whole line as a single string rather than separate entries. As a result, I cannot easily slice the sublists by index.

Any help greatly appreciated.

Thanks, S :-)

2 Answers

13

Super simple:

with open('myfile', 'r') as f:
    data = [line.split() for line in f]
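
If the file might contain blank lines, a slightly more defensive variant could look like the sketch below. It assumes blank lines should be skipped rather than turned into empty sublists; the function name `read_clusters` and the temporary sample file are made up for illustration.

```python
import os
import tempfile


def read_clusters(path):
    """Return one list of IDs per non-blank line of the file at `path`."""
    with open(path, 'r') as f:
        # split() with no arguments splits on any run of whitespace and
        # discards the trailing newline; whitespace-only lines are skipped.
        return [line.split() for line in f if line.strip()]


# Demonstration with a temporary file holding the sample data from the
# question, plus one blank line to show that it is ignored.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1hgn 1dju 3nmj 8kfn\n9opu 7gfb \n\n4bui\n')
    sample_path = tmp.name

nested_list = read_clusters(sample_path)
os.remove(sample_path)

print(nested_list)
```

Iterating over the file object reads one line at a time, so only the resulting lists are held in memory, which should be comfortable for ~100,000 lines.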

1 Comment

Nope - that will do exactly what the OP asked. Yay Python & batteries included.
6

You'll want to investigate the str.split() method.

>>> '1hgn 1dju 3nmj 8kfn'.split()
['1hgn', '1dju', '3nmj', '8kfn']
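
Applied to every line, the same call gives sublists that can be indexed and sliced as usual. A small sketch, using the sample lines from the question as an in-memory list instead of a file (the variable names are illustrative only):

```python
# Sample lines from the question; the second has a trailing space on purpose.
lines = ['1hgn 1dju 3nmj 8kfn', '9opu 7gfb ', '4bui']

# One sublist per line, one string per ID.
nested_list = [line.split() for line in lines]

first_cluster = nested_list[0]        # whole first cluster
first_two_ids = nested_list[0][:2]    # slice within a cluster
last_cluster_id = nested_list[-1][0]  # first ID of the last cluster
cluster_sizes = [len(cluster) for cluster in nested_list]

print(first_two_ids, last_cluster_id, cluster_sizes)
```

Because `split()` without arguments ignores leading and trailing whitespace, the trailing space on the second line does not produce an empty entry.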
