More pythonic to convert bytes to string while processing urllib response instead of chr(int(x))

Question

I am a late convert to Python 3. I am trying to process the output of a REST API for protein sequences using urllib.

In legacy Python (Python 2) I could use:

self.seq_fileobj = urllib2.urlopen("http://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id))
self.seq_header = self.seq_fileobj.next()
print "Read in sequence information for {}.".format(self.seq_header[:-1])
self.sequence = [achar for a_line in self.seq_fileobj for achar in a_line if achar != "\n"]
print("Sequence:{}\n".format("".join(self.sequence)))

For the same section of code in Python 3, I use:

context = ssl._create_unverified_context()
self.seq_fileobj = urllib.request.urlopen("https://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id),context=context)
self.seq_header = next(self.seq_fileobj)
print("Read in sequence information for {}.".format(self.seq_header.rstrip()))
self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

I have read a little about string encoding and decoding in order to adapt my list comprehension for Python 3:

self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

Although my code works, is this the best way to go from an array of bytes (ASCII characters encoded as UTF-8) to the corresponding strings? The chr(int(x)) part seems unpythonic to me, and I fear I may be missing something.

blhsing · Accepted Answer · 2019-10-06 15:56:04Z

You don't need to convert the bytes to strings one character at a time. Since you want to strip out the newline characters, you can instead read the rest of the response as bytes, convert it to a string with the decode method (whose default encoding is UTF-8, the one you are using), and remove the newline characters with the str.replace method:

self.sequence = list(self.seq_fileobj.read().decode().replace('\n', ''))
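To try the same idea without a network call, here is a minimal sketch that uses io.BytesIO as a stand-in for the urllib response. The parse_fasta helper and the sample FASTA record are hypothetical, made up for illustration only; the real response object should behave similarly because it is also a binary file-like object.

```python
import io


def parse_fasta(fileobj):
    """Return (header, sequence) from a binary file-like object.

    Hypothetical helper: it assumes a single FASTA record whose
    first line is the header, as in the question.
    """
    # The first line is the header; decode it and drop the trailing newline.
    header = next(fileobj).decode().rstrip()
    # Read the remaining bytes in one go, decode once, strip newlines.
    sequence = fileobj.read().decode().replace("\n", "")
    return header, sequence


# Made-up record standing in for the bytes returned by urlopen().
fake_response = io.BytesIO(
    b">sp|P00000|EXAMPLE Hypothetical protein\nMKTAYIAK\nQRQISFVK\n"
)

header, sequence = parse_fasta(fake_response)
print(header)
print(sequence)
```

Decoding once per response, rather than once per character, avoids the str/int/chr round trip entirely.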

edited Oct 6, 2019 at 15:56

answered Oct 6, 2019 at 15:09

blhsing


2 Comments

harijay Over a year ago

Thank you. This is so much cleaner, and I now understand that decode basically gets it done inside the list comprehension.

GZ0 Over a year ago

A string itself is a sequence and can be accessed like a list. There is no need to convert it to a list unless the content needs to be modified.
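As a rough illustration of this comment, the sketch below uses a made-up sequence string to show that indexing, slicing, and counting work directly on a str, and that a list is only needed when individual residues have to be changed in place.

```python
# Made-up sequence for illustration.
sequence = "MKTAYIAK"

# A str supports the usual read-only sequence operations.
first = sequence[0]            # single residue
window = sequence[2:5]         # slice of three residues
count_k = sequence.count("K")  # number of lysines

# Strings are immutable, so convert to a list only to modify residues.
residues = list(sequence)
residues[0] = "A"
mutated = "".join(residues)

print(first, window, count_k, mutated)
```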
