I have a list of URLs and I want to save each of their targets in a separate text file.
Here's an example of the input file containing the URLs:
~$: head -3 url.txt
http://www.uniprot.org/uniprot/P32234.txt
http://www.uniprot.org/uniprot/P05552.txt
http://www.uniprot.org/uniprot/P07701.txt
I'm currently using a custom Python function to accomplish this task. It works, but it has two main inconveniences: the user has to copy and paste each URL manually (there is no direct file input), and the output contains a spurious b character at the beginning of each line (binary?).
~$: head -3 P32234.txt
b' ID 128UP_DROME Reviewed; 368 AA.'
b' AC P32234; Q9V648;'
b' DT 01-OCT-1993, integrated into UniProtKB/Swiss-Prot.'
Here's the Python code:
def html_to_txt():
    import urllib.request
    url = str(input('Enter URL: '))
    page = urllib.request.urlopen(url)
    with open(str(input('Enter filename: ')), "w") as f:
        for x in page:
            f.write(str(x).replace('\\n', '\n'))
    s = 'Done'
    return s
Is there a cleaner way of doing this using some Unix utilities?
Strings with a b prefix are Python 3 bytes objects. That is what iterating over the response of urllib.request.urlopen yields, and what your str(x) calls are printing. Such data should be written to a file opened in binary mode, not converted to str.