I have a file that looks something like this:
geneA geneB 134
geneC geneF 395
geneH geneD 958
geneF geneC 395
geneB geneA 134
geneD geneH 958
I would like to remove the lines that contain the same pair of genes in the opposite order, so that I am left with just:
geneA geneB 134
geneC geneF 395
geneH geneD 958
This is what I have so far, but I end up with even more duplicates when I try using replace() or an "if not" statement. Any ideas on how I could change this?
    with open(filename, 'r') as handle, open(outfilename, 'a') as w:
        for line in handle:
            element = line.split()
            gene1 = element[0]
            gene2 = element[1]
            for line in handle:
                matchingelement = line.split()
                gene3 = matchingelement[0]
                gene4 = matchingelement[1]
                if gene3 == gene2 and gene4 == gene1:
                    """Remove the line"""
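One thing to note about the attempt above: the inner "for line in handle" reads from the same file object as the outer loop, so it consumes the rest of the file and the outer loop only ever sees the first line. Opening the output in 'a' mode also appends to whatever is already there, so each re-run adds more lines, which may be part of why the duplicates multiply. A possible alternative, as a minimal sketch rather than a definitive fix, is to read the file once and remember each gene pair in an order-independent form. The names drop_reversed_pairs and sample are made up for illustration. The sketch assumes whitespace-separated columns with the two genes first, ignores the third column when comparing, and keeps the first occurrence of each pair (so exact repeats are dropped too).

```python
def drop_reversed_pairs(lines):
    """Return the lines, keeping only the first one seen for each unordered gene pair."""
    seen = set()  # unordered gene pairs encountered so far
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: blank or malformed lines are skipped
        # frozenset ignores order, so (geneA, geneB) and (geneB, geneA) compare equal
        pair = frozenset(fields[:2])
        if pair in seen:
            continue
        seen.add(pair)
        kept.append(line)
    return kept


# Sample data copied from the example file in the question
sample = [
    "geneA geneB 134\n",
    "geneC geneF 395\n",
    "geneH geneD 958\n",
    "geneF geneC 395\n",
    "geneB geneA 134\n",
    "geneD geneH 958\n",
]
result = drop_reversed_pairs(sample)
print("".join(result), end="")
```

To apply this to the files in the question, something like "with open(filename) as handle, open(outfilename, 'w') as w: w.writelines(drop_reversed_pairs(handle))" should work, with 'w' used instead of 'a' so that the output file is overwritten on each run.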