I am scraping cricket Test match details. I have checked the scraped results, and now I want to save each page to a file. While saving the HTML to a file, I get the error: 'str' object cannot be interpreted as an integer

This is my code:

import time

import bs4
import requests

for i in range(0, 2000):
    url = 'http://search.espncricinfo.com/ci/content/match/search.html?search=test;all=1;page=%s' % i
    html = requests.get(url)

    print('Checking page %s of 2000' % (i + 1))

    soupy = bs4.BeautifulSoup(html.text, 'html.parser')

    time.sleep(1)
    for new_host in soupy.findAll('a', {'class': 'srchPlyrNmTxt'}):
        try:
            new_host = new_host['href']
        except KeyError:
            continue
        odiurl = BASE_URL + new_host
        new_host = odiurl
        print(new_host)
        html = requests.get(new_host).text
        with open('espncricinfo-fc/{0!s}'.format(str.split(new_host, "/")[4]), "wb") as f:
            f.write(html)

I am getting this error: 'str' object cannot be interpreted as an integer

The error occurs on this line:

with open('espncricinfo-fc/{0!s}'.format(str.split(new_host, "/")[4]), "wb") as f:

  • Where are you getting the error? (which line?) Commented Nov 29, 2018 at 5:25
  • I am getting the error on the last two lines. Commented Nov 29, 2018 at 5:26
  • What is BASE_URL? Commented Nov 29, 2018 at 5:36
  • You're writing in byte mode ("wb"), but I'm guessing you're trying to write str data rather than bytes. What happens if you change requests.get(new_host).text to requests.get(new_host).text.encode()? Commented Nov 29, 2018 at 5:39
  • @ShivamSingh BASE_URL = 'espncricinfo.com' Commented Nov 29, 2018 at 5:39

2 Answers

1

If you are using Python 3.x, try changing the last line to:

f.write(bytes(html, 'UTF-8'))

Also try converting the href to a string explicitly:

new_host = str(new_host['href'])
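
To illustrate the bytes/str point, here is a minimal sketch. It assumes Python 3, uses a hard-coded string in place of requests.get(new_host).text, and writes to a temporary folder rather than espncricinfo-fc; the file names are made up for the example. A file opened with "wb" accepts only bytes, so either encode the text first or open the file in text mode.

```python
import os
import tempfile

# Stand-in for requests.get(new_host).text (an assumption for this sketch).
html = "<html><body>Test match scorecard</body></html>"

# Hypothetical output folder; the question uses 'espncricinfo-fc'.
out_dir = tempfile.mkdtemp()

# Option 1: keep binary mode and encode the str to bytes before writing.
binary_path = os.path.join(out_dir, "binary.html")
with open(binary_path, "wb") as f:
    f.write(html.encode("utf-8"))  # same result as bytes(html, "UTF-8")

# Option 2: open in text mode and let Python do the encoding.
text_path = os.path.join(out_dir, "text.html")
with open(text_path, "w", encoding="utf-8") as f:
    f.write(html)

# Passing a str to a binary-mode file raises TypeError in Python 3.
try:
    with open(os.path.join(out_dir, "broken.html"), "wb") as f:
        f.write(html)
    wrote_str_to_binary = True
except TypeError:
    wrote_str_to_binary = False
```

Both options should produce identical files here. If the server's encoding matters, requests also exposes the raw bytes as response.content, which can be written in "wb" mode without any conversion.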

6 Comments

I am getting the same error as before: 'str' object cannot be interpreted as an integer
Updated the answer.
You need to fix the split call. It should be str.split(new_host, "/")[6], with BASE_URL = "http://espncricinfo.com".
Shivam, I have made both of the edits above, but it is still not saving to the file.
What are you getting exactly? Does 'espncricinfo-fc' folder exist?
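
A small sketch of why the index matters, assuming BASE_URL includes the scheme as suggested above. The href value is hypothetical, chosen only to show the shape of the path; the real links may differ.

```python
import posixpath
from urllib.parse import urlsplit

BASE_URL = "http://espncricinfo.com"   # includes the scheme, as suggested above
href = "/ci/engine/match/12345.html"   # hypothetical href, for illustration only

url = BASE_URL + href
parts = url.split("/")
# parts is ['http:', '', 'espncricinfo.com', 'ci', 'engine', 'match', '12345.html']

same_name_every_time = parts[4]   # 'engine' for every URL of this shape
filename_by_index = parts[6]      # the last path segment

# Taking the last path segment avoids depending on a fixed index.
filename = posixpath.basename(urlsplit(url).path)
```

With index 4, every page of this shape would be written to the same file name, so each download would overwrite the previous one. Creating the folder first with os.makedirs('espncricinfo-fc', exist_ok=True) also rules out a missing directory.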
0

The problem is your print statement. It should read

print('checking %d etc.' % (i + 1))
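
For reference, a quick check of how the two format specifiers behave with an integer, assuming i is an int as it is inside the range loop. This is only a sketch of the formatting behaviour, not a diagnosis of the error.

```python
i = 0  # first value produced by range(0, 2000)

# %s calls str() on the value; %d requires a number.
with_s = 'Checking page %s of 2000' % (i + 1)
with_d = 'Checking page %d of 2000' % (i + 1)

# %d rejects a str argument with a TypeError.
try:
    'Checking page %d of 2000' % '1'
    d_accepts_str = True
except TypeError:
    d_accepts_str = False
```

Both strings come out the same for an integer, so switching to %d only changes anything if the value being formatted is not a number.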

3 Comments

That line works, I have checked. I also removed it, but the error remains the same.
Which line number? Does new_host have a __str__ method? You print it and use str.split on it in the final line.
What does soupy.findAll('a', { ... }) return: a range or a list of strings? You set new_host to BASE_URL + new_host after setting it to a dictionary lookup, new_host['href'].
