
I have noticed that the problem may be caused by BeautifulSoup or by a recursive data structure. However, the data structure that causes the error looks fine to me:

```python
class Movie:
    def __init__(self, name="", dscore=0, mscore=0, durl="", murl=""):
        self.name = name
        self.dscore = float(dscore)
        self.mscore = float(mscore)
        self.durl = durl
        self.murl = murl

    def __str__(self):
        return unicode(self.name) + u' / ' + unicode(self.dscore) + u' / ' + unicode(self.mscore) \
            + u' / ' + unicode(self.durl) + u' / ' + unicode(self.murl)
```

The problem is triggered by this statement:

```python
DataDict['MovieInfo'] = MovieInfo
```

together with

```python
pickle.dump(DataDict, f, True)
```

Here is the full function:

```python
import pickle
import time


def SaveData():
    # `global` is only required to rebind these names; kept from the original.
    global LinkUrlQueue, MovieSet, MovieInfo, LinkUrlSet, MovieUrlQueue
    DataDict = {}
    DataDict['LinkUrlSet'] = LinkUrlSet
    DataDict['MovieSet'] = MovieSet
    #DataDict['MovieInfo'] = MovieInfo  # enabling this line makes pickle.dump fail
    DataDict['LinkUrlQueue'] = LinkUrlQueue
    DataDict['MovieUrlQueue'] = MovieUrlQueue

    # Human-readable copy of MovieInfo: one UTF-8 encoded movie per line.
    with open('MovieInfo.txt', 'wb') as f:
        for item in MovieInfo:
            f.write(item.__str__().encode('utf8') + '\n')

    try:
        print 'saving data...'
        with open('spider.dat', 'wb') as f:
            pickle.dump(DataDict, f, True)  # protocol True == 1 (binary format)
    except IOError as e:
        print 'IOError, error no: %s' % e.errno
        print 'saving to spider2.dat'
        with open('spider2.dat', 'wb') as f2:
            pickle.dump(DataDict, f2)
        time.sleep(10)
```
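To narrow down which entry of `DataDict` pickle rejects, one option is to pickle each value on its own and record the failures. `find_unpicklable` below is a hypothetical helper, not part of the original code; it is a sketch that should run on Python 2.6+ and Python 3.

```python
import pickle


def find_unpicklable(data):
    """Hypothetical helper: map each key of `data` whose value cannot be
    pickled to a short description of the error raised."""
    failures = {}
    for key, value in data.items():
        try:
            pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        except Exception as exc:  # e.g. PicklingError, TypeError, RecursionError
            failures[key] = '%s: %s' % (type(exc).__name__, exc)
    return failures
```

Calling `find_unpicklable(DataDict)` just before `pickle.dump` should point at the offending key (presumably `'MovieInfo'` here). The same loop can then be run over the attributes of a single movie with `find_unpicklable(vars(movie))` to see which field is the culprit.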

My complete source code:

spider.py: http://paste.ubuntu.com/7149731/

fetch.py: http://paste.ubuntu.com/7149732/

You can download and run it directly.

Coding style suggestions are also welcome.

  • It would help to provide an SSCCE that reproduces the problem. Commented Mar 25, 2014 at 5:39
  • I prefer PEP 8 style: function names are all lowercase, and class names start with an uppercase letter. Commented Mar 25, 2014 at 5:42
  • Don't return a unicode object from the __str__ method. Return it from __unicode__ instead. Commented Mar 25, 2014 at 5:47
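A minimal sketch of what the last two comments suggest, written (as an assumption; the question's code is Python 2 only) to run on both Python 2 and Python 3. The text representation lives in `__unicode__`; `__str__` returns UTF-8 bytes on Python 2 and the same text on Python 3. `save_movie_info` is a hypothetical PEP 8-style replacement for the `MovieInfo.txt` loop in `SaveData`.

```python
import io
import sys


class Movie(object):
    """PEP 8: class names use CapWords."""

    def __init__(self, name=u"", dscore=0, mscore=0, durl=u"", murl=u""):
        self.name = name
        self.dscore = float(dscore)
        self.mscore = float(mscore)
        self.durl = durl
        self.murl = murl

    def __unicode__(self):
        # Text form of the movie: all fields joined by " / ".
        fields = (self.name, self.dscore, self.mscore, self.durl, self.murl)
        return u" / ".join(u"%s" % field for field in fields)

    if sys.version_info[0] >= 3:
        __str__ = __unicode__          # Python 3: str is already text
    else:
        def __str__(self):             # Python 2: __str__ should return bytes
            return self.__unicode__().encode("utf-8")


def save_movie_info(movies, path):
    """Hypothetical helper. PEP 8: function names are lowercase_with_underscores.
    io.open() encodes the text as UTF-8 on both Python 2 and 3."""
    with io.open(path, "w", encoding="utf-8") as f:
        for movie in movies:
            f.write(u"%s\n" % movie.__unicode__())
```

With this split, callers no longer need to call `item.__str__().encode('utf8')` by hand; the file object takes care of the encoding.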

1 Answer


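Well... I finally solved the problem myself.

The reason for this problem is that pickle cannot handle BeautifulSoup objects! More generally, it cannot handle the HTML parser. A value taken from a BeautifulSoup tree is not a plain string: it still references its parent and neighbouring elements, so pickling it drags in the whole parse tree (and the parser) as well.

I realized that when passing values into my functions, I should convert them with str() or unicode() before assigning them, instead of keeping them as BeautifulSoup objects.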
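To show why the conversion matters without depending on BeautifulSoup itself, here is a minimal Python 3 sketch. `Node` and `FakeNavigableString` are hypothetical stand-ins, not BeautifulSoup classes: like a string taken from a real parse tree, the fake string keeps a `parent` reference, so pickling it also pickles every node reachable from it. Converting with `str()` (`unicode()` in Python 2) returns a plain string with no tree attached.

```python
import pickle


class Node(object):
    """Hypothetical stand-in for a parsed HTML element."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class FakeNavigableString(str):
    """Hypothetical stand-in for a BeautifulSoup string: text that also
    remembers its place in the parse tree."""

    def __new__(cls, text, parent=None):
        obj = super(FakeNavigableString, cls).__new__(cls, text)
        obj.parent = parent
        return obj


def build_page(n_rows):
    """Build a fake document tree and return one text node taken from it."""
    root = Node("html")
    body = Node("body", root)
    for i in range(n_rows):
        Node("row%d" % i, body)
    title = Node("title", root)
    return FakeNavigableString("Some Movie", title)


name = build_page(1000)

raw = pickle.dumps(name)         # the string *and* every node reachable from it
clean = pickle.dumps(str(name))  # str() gives a plain str with no tree attached

print(len(raw), len(clean))
```

The first pickle contains all 1000 fake rows even though only the title text was wanted. With a real, much larger page (and parser state that may not be picklable at all), the same effect can make `pickle.dump` fail outright, which is why storing `str(value)` in `Movie` avoids the problem.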

Thanks, everyone!
