1

I am late convert to Python 3. I am trying to process output from a REST api for protein sequences using urllib.

In legacy python I could use:

self.seq_fileobj = urllib2.urlopen("http://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id))
self.seq_header = self.seq_fileobj.next()
print "Read in sequence information for {}.".format(self.seq_header[:-1])
self.sequence = [achar for a_line in self.seq_fileobj for achar in a_line if achar != "\n"]
print("Sequence:{}\n".format("".join(self.sequence)))

For the same section of code in python 3, I use:

context = ssl._create_unverified_context()
self.seq_fileobj = urllib.request.urlopen("https://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id),context=context)
self.seq_header = next(self.seq_fileobj)
print("Read in sequence information for {}.".format(self.seq_header.rstrip()))
self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

I have read a little about string encoding and decoding to modify my list comprehension for python 3:

self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

Although my code is working- is this the best way to achieve this result where I go from an array of bytes of ascii characters encoded with utf-8 to their resulting strings?. The chr(int(x)) bit is what seems un pythonic to me and I fear I may be missing something.

1 Answer 1

1

You don't need to convert the bytes to strings on a character-to-character basis. Since you want to strip out the newline characters, you can instead read the entire file as bytes, convert the bytes to strings with the decode method (which defaults to the utf-8 encoding as you are using) and remove the newline characters using the str.replace method:

self.sequence = list(self.seq_fileobj.read().decode().replace('\n', ''))
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you..this is so much cleaner and I now understand now that decode basically gets it done inside the list comprehension.
A string itself is a sequence and can be accessed like a list. There is no need to convert it to a list unless the content needs to be modified.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.