I am trying to learn Python and wanted to write a text parser. I am trying to parse a large FASTA file full of DNA strings (it is 136275 lines long and 9.8 MB in size). My problem is that the program always stops working at the same position (line 16076) and doesn't throw an error.

def file_parser(filepath):
  data = []
  file_content = open(filepath, 'r')
  line = file_content.readline()
  i=0
  while line:
    if line == 0:
      break
    elif line[0] == ">":
      key, name = line.split('|')[-2:]
      dna = ''
      line = file_content.readline()
      i = i+1
      while not line.startswith('>'): #line[0] != ">": #
        dna = dna + line
        line = file_content.readline()
      dna = dna.rstrip('\n')
      name = name.rstrip('\n')
      row = {
        key, 
        name, 
        dna
      }
      data.append(row)
      print(i)
    else:
      print("Your file is corrupted")
  return data

So my question is (as a beginner at writing Python): what's wrong with my code that makes it stop working? I suspect it could be the line.startswith('>') check, which I switched to because I was getting "string index out of range" errors before, but to be honest I'm not really sure.
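One possible explanation, offered as a sketch and not a confirmed diagnosis: at the end of a file, readline() returns an empty string, and ''.startswith('>') is False, so a loop written as `while not line.startswith('>')` never exits after the last record. If 16076 happens to be the number of records in the file (an assumption, since `i` counts headers and not lines), the program would appear to stop there without an error. The helper name `read_sequence` and the sample data below are hypothetical. Unlike the original code, this version also strips the newlines inside the sequence.

```python
import io

def read_sequence(handle):
    """Read sequence lines until the next '>' header or the end of the file.

    Returns the joined sequence and the line that ended the loop
    ('' at end of file, otherwise the next header line).
    """
    parts = []
    line = handle.readline()
    # `line and ...` stops at EOF, where readline() returns ''.
    while line and not line.startswith('>'):
        parts.append(line.strip())
        line = handle.readline()
    return ''.join(parts), line

# Hypothetical sample standing in for the tail of the real file.
sample = io.StringIO("ACGT\nTTGA\n")
dna, next_line = read_sequence(sample)

# What the unguarded condition sees at EOF:
eof_check = ''.startswith('>')  # False, so `not eof_check` stays True forever
```

The same reasoning would explain the earlier "string index out of range" errors: indexing `line[0]` on the empty string returned at EOF raises IndexError, while startswith() silently returns False.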

My test file comes from this source: ftp://ftp.ncbi.nih.gov/genomes/Acanthisitta_chloris/protein/ (it's the .fa.gz file). I use a slightly customized Ubuntu 18.10 and python3.

Thanks for your time.

  • Don't say a file is "large" without supporting that with a concrete number. I don't want to download a potentially very large file just to check. Commented Nov 3, 2018 at 18:49
  • @usr2564301 Ohh, yeah, forgot that, thank you. Commented Nov 3, 2018 at 18:52
  • Thanks! 9.8 MB is not large at all (I process multiples of that with ease), so it should not be stressing Python, or your system in general. Commented Nov 3, 2018 at 18:56
  • Don't use readline() and a while loop. Just loop over the file object to get lines: for line in file_content:, and you can get additional lines in the loop with next(file_content). Commented Nov 3, 2018 at 19:00
  • Are you running this on Windows perhaps? What Python version are you using? Commented Nov 3, 2018 at 19:00
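The suggestion in the comments to loop over the file object could be sketched roughly as follows. This is a variant that uses a single for loop with a "current record" variable instead of calling next() inside the loop, which is a swapped-in simplification and not exactly what the comment describes. The function name `parse_fasta`, the dict keys, and the sample headers are assumptions for illustration. The original code builds a set with `{key, name, dna}`, and a dict is used here on the guess that labeled fields were intended.

```python
import io

def parse_fasta(handle):
    """Return a list of {'key', 'name', 'dna'} dicts from an open FASTA handle."""
    records = []
    current = None
    for line in handle:  # iteration ends cleanly at end of file
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Same header split as the question: last two '|'-separated fields.
            key, name = line.split('|')[-2:]
            current = {'key': key, 'name': name.strip(), 'dna': ''}
            records.append(current)
        elif current is not None:
            current['dna'] += line
        # Lines before the first header are ignored in this sketch.
    return records

# Hypothetical two-record sample; real NCBI headers may look different.
sample = io.StringIO(
    ">gi|1|ref|KEY1| first protein\nACGT\nTTGA\n"
    ">gi|2|ref|KEY2| second protein\nGGCC\n"
)
records = parse_fasta(sample)
```

With a real file this would be called as `with open(filepath) as handle: data = parse_fasta(handle)`. A header containing no '|' at all would raise ValueError at the unpacking step, so a real parser may want to handle that case explicitly.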
