
I'm learning web scraping from a series of tutorials by Chris Reeves. Really great stuff, you should check it out.

I ran into an issue with the example from tutorial no. 10, where Chris explains connecting to a MySQL database. First I had a problem with values not being committed to the table in the database. Then I discovered in the comments that I was missing conn.commit(), which the author of the video does not include in his program. I added that line to my program and it works great; now it looks like this:

from threading import Thread
import urllib
import re
import MySQLdb

# Connect to the local MySQL server (non-default port 3307)
conn = MySQLdb.connect(host="127.0.0.1", port=3307, user="root", passwd="root", db="stock_data")

query = "INSERT INTO tutorial (symbol) values('AAPL')"
x = conn.cursor()
x.execute(query)
conn.commit()   # the commit that was missing from the video
row = x.fetchall()

It connects to my local database and successfully adds AAPL to the tutorial table under the symbol column.
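For reference, reading the row back confirms it (just a quick sketch reusing the connection from above):

x = conn.cursor()
x.execute("SELECT symbol FROM tutorial WHERE symbol = 'AAPL'")
print x.fetchall()   # prints (('AAPL',),) once the INSERT has been committed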

My problems started with the second part of Chris's tutorial, where you are supposed to add a multithreaded part of the code that reads four-letter symbols from an external .txt file and adds everything into the same database.

Now, with my program looking like this:

from threading import Thread
import urllib
import re
import MySQLdb

gmap = {}

def th(ur):
    # Scrape the last-trade price for one symbol from Yahoo Finance
    base = "http://finance.yahoo.com/q?s="+ur
    regex = '<span id="yfs_l84_'+ur.lower()+'">(.+?)</span>'
    pattern = re.compile(regex)
    htmltext = urllib.urlopen(base).read()
    results = re.findall(pattern, htmltext)
    try:
        gmap[ur] = results[0]
    except:
        print "got an error"

# Read the comma-separated symbol list from the external file
symbolslist = open("threads/symbols.txt").read()
symbolslist = symbolslist.replace(" ","").split(",")

print symbolslist

threadlist = []

# Start one thread per symbol, then wait for all of them to finish
for u in symbolslist:
    t = Thread(target=th, args=(u,))
    t.start()
    threadlist.append(t)

for b in threadlist:
    b.join()

# Connect to the database and insert every scraped symbol/price pair
conn = MySQLdb.connect(host="127.0.0.1", port=3307, user="root", passwd="root", db="stock_data")

for key in gmap.keys():
    print key, gmap[key]
    # Build the INSERT statement by string concatenation (as in the tutorial)
    query = "INSERT INTO tutorial (symbol,last) values("
    query = query+"'"+key+"',"+gmap[key]+")"
    x = conn.cursor()
    x.execute(query)
    conn.commit()
    row = x.fetchall()

which is almost exactly like Chris's example (except that I don't load the login data from an external file but put it directly in the code, which is not the problem), I'm getting an error for all the threads, and it looks like this:

Exception in thread Thread-474:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "threads/threads2.py", line 12, in th
    htmltext = urllib.urlopen(base).read()
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 345, in open_http
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 829, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 791, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

And this is, as I said, just one error, for Thread-474, but I'm getting the same one for multiple threads in the IDE, e.g. Thread-441, Thread-390, Thread-391, etc.

What am I missing? Is it something in the code or in my MySQL server setup? According to everything in Chris's example, it should work.

Help anyone?

  • Good day dear dzordz, many many thanks. Where did you add timeout = 10 and socket.setdefaulttimeout(timeout)? Love to hear from you.

2 Answers


Your threads are trying to access a website and have nothing to do with the database; therefore your issue is not with the setup of your DB (you've already tried that and confirmed it works) but with your internet connection.

Are you sure you have a network connection and have set the right proxies, etc.?
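One quick way to narrow it down (just a sketch, using the same URL pattern as the th() function in your question) is to try a single request outside the threads:

import urllib

# One plain, non-threaded request: if this also hangs or raises errno 10060,
# the problem is the network path, not the threading or the database code
test_url = "http://finance.yahoo.com/q?s=AAPL"
page = urllib.urlopen(test_url).read()
print len(page)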


7 Comments

I'm on a wireless connection, could that be a problem? I'm not sure what I need to set up for proxies and how... I'm able to access any website via an ordinary browser.
1. Can you check whether the site you are trying to access can be reached from your browser (finance.yahoo.com)? 2. Can you ping this site from the command line (from where you are running Python)?
1. I can access finance.yahoo.com successfully from the browser. 2. I'm running my program from the Python shell, but if I ping finance.yahoo.com from the Windows command prompt it gives me a response, so from there it looks like everything is OK.
Do you use a proxy? If yes, see if this helps... stackoverflow.com/questions/16936165/…
By the way, I checked your web scraping code and it works as expected :-)

It seems that I was having a problem with the socket timeout...

I have added

import socket

timeout = 10
socket.setdefaulttimeout(timeout)

before the function definition, and it worked as it's supposed to! :)
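In case anyone wonders about the exact placement: roughly like this, near the top of the script, before th() is defined and before any thread starts (sketch of the relevant part only; the rest stays as in my question):

from threading import Thread
import urllib
import re
import socket      # needed for setdefaulttimeout
import MySQLdb

# Give every socket that urllib opens a 10-second timeout, so a slow or
# unresponsive request fails quickly instead of hanging its thread
timeout = 10
socket.setdefaulttimeout(timeout)

# ... the rest of the script (th(), the thread loop, the DB inserts) is unchanged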

2 Comments

Hey, I'm also using Mr. Reeves's tutorials, as my work wants something done with scraping. Where do you add this in the code?
Hi there, I am a big, big fan of Mr. Reeves's tutorials, I love them. Where can I find the code?
