Extract rows based on values from text file using Python

Question

I have a list of records in file A, and I want to extract rows according to the values listed in file B. For example, given the values 4 and 5, every row in file A whose 4th column is 4 or 5 should be extracted. How can I do this in Python? The code below only extracts lines whose index has the value 4.

with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
    with open("A.txt", "rt") as f:
        lines = [line for index, line in enumerate(f.readlines()) if classes[index]== 4]
        lines_all= "".join(lines)

with open("C.txt", "w") as f:
        f.write(lines_all)

A.txt

hg17_ct_ER_ER_1003  36  42  1
hg17_ct_ER_ER_1003  109 129 2
hg17_ct_ER_ER_1003  110 130 2
hg17_ct_ER_ER_1003  129 149 2
hg17_ct_ER_ER_1003  130 150 2
hg17_ct_ER_ER_1003  157 163 3
hg17_ct_ER_ER_1003  157 165 3
hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
hg17_ct_ER_ER_1003  220 226 6

B.txt

4
5

Desired output

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

A good start would be to go to the documentation for CSV and pandas. Both are Python modules for you to look at which would help you.
– Alex Huszagh, Commented Jun 12, 2015 at 3:04
Okay. With the edit you have posted a complete question. My vote for you.
– Bhargav Rao, Commented Jun 12, 2015 at 3:12

Padraic Cunningham · Accepted Answer · 2015-06-12 03:20:58Z

3

Create a set of the lines/numbers from the B file, then compare the last element of each row in the A file to the elements in the set:

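import csv

with open("A.txt") as f, open("B.txt") as f2:
    # Values to keep, one per line of B.txt
    st = set(line.rstrip() for line in f2)
    r = csv.reader(f, delimiter=" ")
    # Keep non-empty rows whose last column is in the set
    data = [row for row in r if row and row[-1] in st]
    print(data)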

[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]

Set delimiter= to whatever separates your columns, or omit it entirely if your file is comma-separated.
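
A note on the delimiter choice: the sample A.txt in the question appears to have two spaces after the first column. Under that assumption, a single-space delimiter makes csv.reader emit an empty field, whereas str.split() with no arguments treats any run of whitespace as one separator. The sketch below uses two rows copied from the question; the names SAMPLE, rows_with_csv, and rows_with_split are made up for illustration.

```python
import csv
import io

# Two rows copied from the question's A.txt (assumed: two spaces after column 1).
SAMPLE = (
    "hg17_ct_ER_ER_1003  179 185 4\n"
    "hg17_ct_ER_ER_1003  220 226 6\n"
)

def rows_with_csv(text):
    """Parse with a single-space delimiter, as in the csv approach."""
    return list(csv.reader(io.StringIO(text), delimiter=" "))

def rows_with_split(text):
    """Parse by splitting each non-blank line on any run of whitespace."""
    return [line.split() for line in text.splitlines() if line.strip()]

csv_rows = rows_with_csv(SAMPLE)
split_rows = rows_with_split(SAMPLE)

print(csv_rows[0])    # contains an empty field where the space is doubled
print(split_rows[0])  # no empty field
```

Either way the last element is still the class value, so a test on row[-1] keeps working; only the intermediate fields differ.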

Or:

with open("A.txt") as f, open("B.txt") as f2:
    st = set(line.rstrip() for line in f2)
    # Split once from the right to isolate the last column; skip blank lines
    data = [line.rstrip() for line in f if line.strip() and line.rsplit(None, 1)[1] in st]
    print(data)

['hg17_ct_ER_ER_1003 179 185 4', 'hg17_ct_ER_ER_1003 197 217 5']
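
The question also writes the matches to C.txt. A minimal sketch of that last step, assuming whitespace-separated columns and one value per line in B.txt, might look like this. The function name extract_rows and its parameters are hypothetical, not part of either answer.

```python
def extract_rows(data_path, keys_path, out_path):
    """Copy lines of data_path whose last column appears in keys_path.

    Matching lines are written unchanged to out_path.
    Returns the number of lines written.
    """
    with open(keys_path) as keys_file:
        # One wanted value per line; blank lines are ignored.
        wanted = {line.strip() for line in keys_file if line.strip()}

    written = 0
    with open(data_path) as data_file, open(out_path, "w") as out_file:
        for line in data_file:
            fields = line.split()
            # Guard against blank lines before indexing the last column.
            if fields and fields[-1] in wanted:
                out_file.write(line)
                written += 1
    return written
```

Calling extract_rows("A.txt", "B.txt", "C.txt") on the files from the question should leave the two rows ending in 4 and 5 in C.txt.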

edited Jun 12, 2015 at 3:20

answered Jun 12, 2015 at 3:13

Padraic Cunningham


8 Comments

liushuaikobe Over a year ago

The output is not what he wanted. See my answer.

Padraic Cunningham Over a year ago

@liushuaikobe, what are you talking about exactly? Your code is just a less efficient version of what I provided

liushuaikobe Over a year ago

There's no need to import csv, and the output of your first approach is a nested list, which isn't the desired output.

Padraic Cunningham Over a year ago

@zehnpaard, we only split once from the right as opposed to splitting on every whitespace, i.e. 'hg17_ct_ER_ER_1003 197 217 5' -> ['hg17_ct_ER_ER_1003 197 217', '5']

zehnpaard Over a year ago

Right, again more efficient given we're looking at the last column, thanks for the clarification.

Community · Accepted Answer · 2017-05-23 12:22:20Z

0

with open("B.txt", "r") as target_file:
    target = [i.strip() for i in target_file]

with open("A.txt", "r") as data_file:
    # filter() is lazy in Python 3, so build the list while the file is still open
    r = list(filter(lambda x: x.strip().rsplit(None, 1)[1] in target, data_file))

print("".join(r))

The output:

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

As mentioned by @Padraic, I changed split()[-1] to rsplit(None, 1)[1].
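
To make the difference between the two calls concrete, here is a small sketch using one row from the question. The variable names are illustrative only.

```python
line = "hg17_ct_ER_ER_1003  197 217 5\n"

# split() breaks the line at every run of whitespace.
all_fields = line.split()

# rsplit(None, 1) splits at most once, starting from the right,
# so only the last column is separated from the rest.
head, last = line.rsplit(None, 1)

print(all_fields)
print([head, last])

# Caveat: a line with a single token yields a one-element list,
# so index [1] would raise IndexError there while [-1] would not.
single = "4\n".rsplit(None, 1)
print(single)
```

Both forms give the same last column for the rows shown in the question; the rsplit version simply does less splitting work per line.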

edited May 23, 2017 at 12:22

CommunityBot

answered Jun 12, 2015 at 3:16

liushuaikobe
