How to read a URL in Python and then print each URL on the website?

Question

I am trying to figure out how to read in only the lines that contain a URL from a website. Every time I run the code I get the error:

AttributeError: module 'urllib' has no attribute 'urlopen'

My code is below:

import os
import subprocess
import urllib

datasource = urllib.urlopen("www.google.com")

while 1:
        line = datasource.readline()
        if line == "": break
        if (line.find("www") > -1) :
                print (line)


li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    os.system('Start .\linkchecker ' + s)

afaik urllib.urlopen is python2 ... in python3 try urllib.request.urlopen
– Joran Beasley, Commented Jun 7, 2017 at 20:15
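
As a quick check of the comment above, here is a minimal sketch, assuming Python 3. It shows that the top-level urllib package has no urlopen attribute and that the function lives in the urllib.request submodule, which has to be imported explicitly:

```python
import urllib
import urllib.request  # the submodule has to be imported explicitly

# In Python 3 the top-level package does not expose urlopen ...
print(hasattr(urllib, "urlopen"))        # False

# ... the function lives in urllib.request instead.
print(callable(urllib.request.urlopen))  # True
```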

danglingpointer · Accepted Answer · 2017-06-07 20:18:00Z

1

This is a very simple example.

This works in Python 3.2 and greater.

import urllib.request
with urllib.request.urlopen("http://www.apple.com") as url:
    r = url.read()
print(r)

For reference, see the related question "Urlopen attribute error".
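
The example above prints the raw page. To get at the second half of the question (printing each URL on the page), one possible approach is to parse the HTML and collect the href of every <a> tag. This is only a sketch using the standard library's html.parser; the names LinkCollector, extract_links and print_links are hypothetical helpers, and the UTF-8 decoding is an assumption about the page's encoding.

```python
from html.parser import HTMLParser
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


def print_links(url):
    """Download a page and print each link on it (needs network access)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html_text = response.read().decode("utf-8", errors="replace")
    for link in extract_links(html_text):
        print(link)


# Offline demonstration on a small, made-up HTML snippet.
sample = '<p><a href="http://www.apple.com">Apple</a> <a href="/about">About</a></p>'
print(extract_links(sample))  # ['http://www.apple.com', '/about']
```

Calling print_links("http://www.apple.com") would then print the links of the live page. Relative links such as /about are printed as they appear in the HTML.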

answered Jun 7, 2017 at 20:18


itzMEonTV · Accepted Answer · 2017-06-07 20:19:12Z

0

You appear to be using Python 3.x, so you should use

urllib.request.urlopen

edited Jun 7, 2017 at 20:19

answered Jun 7, 2017 at 20:16


2 Comments

DYZ Over a year ago

Must be datasource = urllib.request.urlopen("http://www.google.com") (urllib.request.urlopen does not add "http://")

itzMEonTV Over a year ago

removed that part. OP will understand that once it works :)
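
DYZ's point about the missing scheme could be handled with a small helper along these lines. This is just a sketch: ensure_scheme is a hypothetical function, not part of urllib, and defaulting to https is an assumption.

```python
def ensure_scheme(url, default_scheme="https"):
    """Prepend a scheme when the URL has none (hypothetical helper)."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("www.google.com"))         # https://www.google.com
print(ensure_scheme("http://www.google.com"))  # http://www.google.com
```

The result can then be passed to urllib.request.urlopen without triggering the "unknown url type" error.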

Steffi Keran Rani J · Accepted Answer · 2018-02-12 18:19:04Z

The AttributeError occurs because the call should be urllib.request.urlopen instead of urllib.urlopen.

Apart from the AttributeError mentioned in the question, I ran into two more errors.

ValueError: unknown url type: 'www.google.com'

Solution: Rewrite the line defining datasource so that the URL includes the https:// part:

datasource = urllib.request.urlopen("https://www.google.com")

TypeError: a bytes-like object is required, not 'str', raised by the line if (line.find("www") > -1) :

Solution: The response returns bytes, so convert what is read to a str before searching it.
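
The bytes/str mismatch behind the TypeError can be reproduced without any network access. The sketch below uses a made-up bytes value in place of a line read from the response; decoding as UTF-8 is an assumption about the page's encoding.

```python
# Sample bytes standing in for one line read from the response.
raw = b"<a href='https://www.google.com'>Google</a>\n"

# Searching bytes with a str pattern raises a TypeError
# (the exact wording of the message varies by Python version).
try:
    raw.find("www")
except TypeError as exc:
    print("TypeError:", exc)

# Either search with a bytes pattern ...
print(raw.find(b"www") > -1)  # True

# ... or decode to str first and search with a str pattern.
line = raw.decode("utf-8", errors="replace")
print(line.find("www") > -1)  # True

# Note: str() on bytes does not decode; it returns the repr,
# so an empty read never compares equal to "".
print(str(b""))  # b''
```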

The overall solution code is:

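import os
import urllib.request  # "import urllib" alone does not load the request submodule

# urlopen needs the scheme (https://) and returns bytes, so decode each line.
with urllib.request.urlopen("https://www.google.com") as datasource:
    for raw_line in datasource:
        line = raw_line.decode("utf-8", errors="replace")
        if "www" in line:
            print(line)

li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    # Escape the backslash so that "\l" is not treated as an escape sequence.
    os.system('Start .\\linkchecker ' + s)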
