2

I have a list of records in file A that I want to extract according to the numbers in file B. For example, given the values 4 and 5, every row in file A whose 4th column is 4 or 5 should be extracted. How can I do this in Python? The code below only extracts rows based on an index that has the value 4.

with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
    with open("A.txt", "rt") as f:
        lines = [line for index, line in enumerate(f.readlines()) if classes[index]== 4]
        lines_all= "".join(lines)

with open("C.txt", "w") as f:
        f.write(lines_all)

A.txt

hg17_ct_ER_ER_1003  36  42  1
hg17_ct_ER_ER_1003  109 129 2
hg17_ct_ER_ER_1003  110 130 2
hg17_ct_ER_ER_1003  129 149 2
hg17_ct_ER_ER_1003  130 150 2
hg17_ct_ER_ER_1003  157 163 3
hg17_ct_ER_ER_1003  157 165 3
hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
hg17_ct_ER_ER_1003  220 226 6

B.txt

4
5

Desired output

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
2
  • A good start would be to go to the documentation for CSV and pandas. Both are Python modules for you to look at which would help you. Commented Jun 12, 2015 at 3:04
  • 1
    Okay. With the edit you have posted a complete question. My vote for you. Commented Jun 12, 2015 at 3:12

2 Answers

3

Create a set of the numbers from the B file, then compare the last element of each row in the A file to the elements of the set:

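import csv

with open("A.txt") as f, open("B.txt") as f2:
    # Wanted values from B, kept as strings so they compare against csv fields.
    st = set(line.rstrip() for line in f2)
    # skipinitialspace=True collapses runs of spaces between columns.
    r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    # Keep non-empty rows whose last column is one of the wanted values.
    data = [row for row in r if row and row[-1] in st]
    print(data)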

[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]

Set delimiter= to whatever your file uses, or don't set it at all if your file is comma-separated.
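As a rough sketch of the delimiter point, the same filter can be run on tab-separated and comma-separated input. The helper name `filter_rows` and the sample strings are made up for illustration, and `io.StringIO` stands in for a real file:

```python
import csv
import io


def filter_rows(text, wanted, delimiter="\t"):
    """Return the rows of `text` whose last field is in `wanted`.

    Hypothetical helper, not part of the csv module.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty rows so row[-1] never raises IndexError.
    return [row for row in reader if row and row[-1] in wanted]


# Illustrative sample data in two layouts.
tab_text = "hg17_ct_ER_ER_1003\t179\t185\t4\nhg17_ct_ER_ER_1003\t220\t226\t6\n"
comma_text = "hg17_ct_ER_ER_1003,179,185,4\nhg17_ct_ER_ER_1003,220,226,6\n"

tab_rows = filter_rows(tab_text, {"4", "5"})
comma_rows = filter_rows(comma_text, {"4", "5"}, delimiter=",")
print(tab_rows)
print(comma_rows)
```

Both calls should keep only the row ending in 4; the only thing that changes between them is the delimiter argument.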

Or:

with open("A.txt") as f, open("B.txt") as f2:
    st = set(line.rstrip() for line in f2)
    # Split once from the right, so only the last column is separated out.
    data = [line.rstrip() for line in f if line.rsplit(None, 1)[1] in st]
    print(data)

['hg17_ct_ER_ER_1003 179 185 4', 'hg17_ct_ER_ER_1003 197 217 5']
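Since the question writes the result to C.txt, here is a minimal sketch that applies the same set-plus-rsplit idea and writes matching lines out unchanged. The function name `extract_matching` is hypothetical, and the temporary directory only exists to make the demo self-contained; with real files you would pass "A.txt", "B.txt" and "C.txt" directly:

```python
import os
import tempfile


def extract_matching(a_path, b_path, c_path):
    """Copy lines of a_path whose last column appears in b_path to c_path."""
    with open(b_path) as fb:
        # Ignore blank lines so an empty string never ends up in the set.
        wanted = {line.strip() for line in fb if line.strip()}
    with open(a_path) as fa, open(c_path, "w") as fc:
        for line in fa:
            parts = line.rsplit(None, 1)  # [] for a blank line
            if parts and parts[-1] in wanted:
                fc.write(line)


# Demo with a few rows taken from the question's sample data.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "A.txt")
    b_path = os.path.join(tmp, "B.txt")
    c_path = os.path.join(tmp, "C.txt")
    with open(a_path, "w") as f:
        f.write("hg17_ct_ER_ER_1003  157 165 3\n"
                "hg17_ct_ER_ER_1003  179 185 4\n"
                "hg17_ct_ER_ER_1003  197 217 5\n"
                "hg17_ct_ER_ER_1003  220 226 6\n")
    with open(b_path, "w") as f:
        f.write("4\n5\n")
    extract_matching(a_path, b_path, c_path)
    with open(c_path) as f:
        result = f.read()

print(result, end="")
```

Writing line by line avoids holding the whole of A in memory, which may matter if the file is large.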

8 Comments

The output is not what he wanted. See my answer.
@liushuaikobe, what are you talking about exactly? Your code is just a less efficient version of what I provided.
There's no need to import csv, and the output of your first approach is a nested list, which isn't the desired output.
@zehnpaard, we only split once from the right as opposed to splitting on every whitespace, i.e. 'hg17_ct_ER_ER_1003 197 217 5' -> ['hg17_ct_ER_ER_1003 197 217', '5']
Right, again more efficient given we're looking at the last column, thanks for the clarification.
0
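with open("B.txt", "r") as target_file:
    # Wanted values from B, stripped of newlines.
    target = [i.strip() for i in target_file]

with open("A.txt", "r") as data_file:
    # In Python 3 filter() is lazy, so build the list while the file is open.
    r = list(filter(lambda x: x.strip().rsplit(None, 1)[1] in target, data_file))

print("".join(r))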

The output:

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

As mentioned by @Padraic, I changed split()[-1] to rsplit(None, 1)[1].
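To make the difference between the two calls concrete, here is a small sketch on one sample line from the question. The variable names are just for illustration:

```python
line = "hg17_ct_ER_ER_1003  197 217 5\n"

# split() breaks the line on every run of whitespace.
all_fields = line.split()

# rsplit(None, 1) splits only once, starting from the right,
# so the last column is separated and the rest stays in one piece.
head_and_last = line.strip().rsplit(None, 1)

print(all_fields)
print(head_and_last)
```

Both approaches give the same last column; rsplit(None, 1) just avoids splitting the columns that are never inspected.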
