How do I use python mechanize to retrieve a file from an aspnetForm submitControl that triggers an Excel file download when I don't know the file URL or file name?

URL of site with Excel file: http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01

I'm trying to get the file downloaded by the Print Excel 'button'.

So far I have:

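import mechanize

# Create the browser object used throughout the rest of the script
br = mechanize.Browser()

r = br.open('http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01')
html = r.read()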

# Show the html title
print br.title()

# Show the available forms
for f in br.forms():
    print f

br.select_form('aspnetForm')
print '\n\nSubmitting...\n'
br.submit("ctl00$ContentPlaceHolder1$btnExtractSched")

print 'Response...\n'
print br.response().info()
print br.response().read  # no parentheses, so this prints the bound method rather than the data

print 'still alive...\n'

for prop, value in vars(br.response()).iteritems():
    print 'Property:', prop, ', Value: ', value

print 'myfile...\n' 

myfile = br.response().read  # again no parentheses: this is the method object, not the file contents

and I get this output:

    Submitting...

    Response...

Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200

<bound method response_seek_wrapper.read of <response_seek_wrapper at 0x2db5248L whose wrapped object = <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>>>
still alive...

Property: _headers , Value:  Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200

Property: _seek_wrapper__read_complete_state , Value:  [False]
Property: _seek_wrapper__have_readline , Value:  True
Property: _seek_wrapper__is_closed_state , Value:  [False]
Property: _seek_wrapper__pos , Value:  0
Property: wrapped , Value:  <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>
Property: _seek_wrapper__cache , Value:  <cStringIO.StringO object at 0x0000000002E8B0D8>

It seems I am very close. Note the Content-Type header: application/vnd.ms-excel

I just don't know what to do next. Where is my file, and how do I get a pointer to it and save it locally for access later?

Update:

I used dir() to get a list of methods/attributes for the response() and then tried a couple of the methods...

print '\ndir(br.response())\n'
for each in dir(br.response()):
    print each

print '\nresponse info...\n'
print br.response().info()

print '\nresponse geturl\n'
print br.response().geturl()

and I get this output...

dir(br.response())

__copy__
__doc__
__getattr__
__init__
__iter__
__module__
__repr__
__setattr__
_headers
_seek_wrapper__cache
_seek_wrapper__have_readline
_seek_wrapper__is_closed_state
_seek_wrapper__pos
_seek_wrapper__read_complete_state
close
get_data
geturl
info
invariant
next
read
readline
readlines
seek
set_data
tell
wrapped
xreadlines

response info...

Date: Thu, 27 Sep 2012 20:55:02 GMT
ETag: W/"fa759b5df29ccd1:0"
Server: Microsoft-IIS/7.5
Connection: Close
Content-Type: application/vnd.ms-excel
X-Powered-By: ASP.NET
Accept-Ranges: bytes
Last-Modified: Thu, 27 Sep 2012 20:55:03 GMT
Content-Length: 691200


response geturl

http://www.ncysaclassic.com/photos/pdftemp/ScheduleExcel165502.xls

I think I already have this file in my br.response. I just don't know how to extract it! Please help.

  • I'm getting closer it seems... Commented Sep 27, 2012 at 20:58
  • These both worked for me:

        print '\nAttempting to write file 1...\n'
        # found this here stackoverflow.com/questions/8116623/…
        # open("/path/to/someFile", "wb").write(urllib2.urlopen("someUrl.com/somePage.html").read())
        open("C:\Users\gregb\Downloads\download.xls", "wb").write(br.response().read())
        print '\nAttempting to write file 2...\n'
        open("C:\Users\gregb\Downloads\urllib2_urlopen.xls", "wb").write(urllib2.urlopen("ncysaclassic.com/photos/pdftemp/…)

    Commented Sep 27, 2012 at 21:37

1 Answer

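# fill out the form, then submit it; the response body is the file itself
response = br.submit()
# open in binary mode ('wb') so the Excel bytes are written unchanged, including on Windows
with open('filename', 'wb') as fileobj:
    fileobj.write(response.read())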
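
Since the file name is not known in advance, one option is to take it from the URL the browser ended up at. The following is only a sketch, assuming the response object exposes `read()` and `geturl()` as the `dir()` listing in the question shows. The helper name `save_response` and the fallback name `download.bin` are hypothetical, not part of mechanize.

```python
import os
import posixpath
import shutil

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as used in the question


def save_response(response, directory="."):
    """Write a file-like HTTP response to disk and return the local path.

    Hypothetical helper: `response` only needs read() and geturl().
    """
    # Use the last path segment of the final URL as the local file name,
    # falling back to a placeholder name if the URL has no file part.
    name = posixpath.basename(urlparse(response.geturl()).path) or "download.bin"
    path = os.path.join(directory, name)
    # Binary mode keeps the bytes intact; text mode can alter them on Windows.
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks via response.read(n)
    return path
```

After `br.submit(...)`, calling `save_response(br.response(), "C:\\Users\\gregb\\Downloads")` should then write the spreadsheet under the name the server gave it, for example `ScheduleExcel165502.xls`.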

6 Comments

Why can't I enter a carriage return in these comments? When I do, my comment is submitted before I'm finished!
Let's try the two spaces suggested in the help.
Let's try again: does this start a new line?
Try it on IE instead of Chrome
I was going to post my code in the comments, but it's difficult if I can't enter a linefeed. Let me try the markup formatting:

    # THIS WORKS!
    # open a local file instance
    fileobj = open("C:\\Users\\gregb\\Downloads\\ncysa_schedule.xls", "w+")
    # write to it from the submit response above
    fileobj.write(br.response().read())
    fileobj.close()
    # that's it!

Do you know how to hide the gzip(True) warning?

    br.set_handle_gzip(True)  # this gives a warning - how to suppress it?
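
On the warning question: if the message is emitted through Python's standard `warnings` module (an assumption here, not something confirmed in this thread), it can be silenced for a single call with `warnings.catch_warnings()`. The sketch below uses a hypothetical stand-in function, `noisy_setup`, in place of `br.set_handle_gzip(True)` so that it runs without mechanize installed.

```python
import warnings


def noisy_setup():
    """Hypothetical stand-in for a call such as br.set_handle_gzip(True)."""
    warnings.warn("experimental feature enabled")
    return True


# Suppress warnings only inside this block; the previous filters are
# restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_setup()
```

In the real script, the body of the `with` block would be the `br.set_handle_gzip(True)` call. Warnings raised elsewhere in the program stay visible.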