I have a clickable link on a dynamically created page that looks like this:

<td align="left"><a id="ucResultsGrid_X77" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ucResultsGrid$X77&quot;, &quot;&quot;, false, &quot;&quot;, &quot;webProperty.aspx?stype=id&amp;s=67&amp;time=201606071553023&amp;id=X77&quot;, false, true))" style="text-decoration:underline;">View Property</a></td><td align="right">X77</td>

After inspecting the page source, it appears that this submits to:

<form name="searchForm" method="post" action="./webSearch.aspx?cad&amp;stype=id&amp;s=67&amp;time=201606071512012" id="searchForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/really long string" />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['searchForm'];
if (!theForm) {
    theForm = document.searchForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}

I've been reading through http://harman-clarke.co.uk/answers/javascript-links-in-scrapy.php and http://cpuknows.com/2015/09/12/scrapy/. From these and other sources (http://doc.scrapy.org/en/latest/faq.html#what-s-this-huge-cryptic-viewstate-parameter-used-in-some-forms and https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition/), I've produced the following spider function:

# Spider method; assumes `from scrapy import FormRequest` at module level.
def parse_third_request(self, response):
    item = response.meta['item']
    yield FormRequest.from_response(
        response, formname='searchForm',
        callback=self.parse_detail_page, meta={'item': item},
    )

However, I'm not clear on how to set the formdata dictionary mentioned in http://doc.scrapy.org/en/latest/topics/request-response.html#request-subclasses. In this case I'm clicking a link, not filling in a form.

1 Answer

The goal is to replicate a request, not a "click" as such. A click could involve several requests, or just some internal JavaScript processing along with the actual response.

formdata is just another argument you can pass to FormRequest. The idea behind from_response is to build the request described by the form you point it at. By default it picks up all the input tags that are already set up, with their name and value (just as a normal form submission would).
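To make the "picks up the existing inputs, then applies formdata on top" idea concrete, here is a minimal standard-library sketch. It is not Scrapy's implementation: `InputCollector`, `default_form_fields` and `merged_fields` are hypothetical names, and the sketch only looks at `<input>` tags, whereas a real form can also contain `select` and `textarea` elements.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # A missing or empty value attribute is sent as an empty string.
            self.fields[name] = attributes.get("value") or ""


def default_form_fields(form_html):
    """Return the fields a plain submission of this form markup would send."""
    collector = InputCollector()
    collector.feed(form_html)
    return collector.fields


def merged_fields(form_html, formdata):
    """Overlay explicit formdata on the defaults; explicit values win."""
    return {**default_form_fields(form_html), **formdata}
```

For the form in the question, the defaults would contain `__EVENTTARGET` and `__EVENTARGUMENT` as empty strings plus the `__VIEWSTATE` value, and anything passed as formdata replaces the matching default.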

Some input tags have no value attribute set, because they are normally filled in later by user input. You need to check which specific parameters are actually sent with the request you are trying to replicate (for example, in the browser's developer tools), and pass those parameters in the formdata dictionary.
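For the link in the question, the values that the JavaScript would fill in can be read straight out of the href. The sketch below is a guess based on the argument order visible in that href (event target, event argument, a boolean, a group string, a URL); `postback_formdata` is a hypothetical helper, and you should confirm against the real request in the browser that these are the parameters being sent.

```python
import html
import re

# Matches each double-quoted string argument inside the javascript: href.
_QUOTED_ARG = re.compile(r'"([^"]*)"')


def postback_formdata(href):
    """Return (formdata, action_url) parsed from a WebForm_PostBackOptions href.

    Assumes the href is copied from the raw page source, so entities such
    as &quot; and &amp; are decoded first.
    """
    decoded = html.unescape(href)
    args = _QUOTED_ARG.findall(decoded)
    if len(args) < 4:
        raise ValueError("unexpected WebForm_PostBackOptions arguments")
    # Assumed order of the quoted arguments: target, argument, group, URL.
    event_target, event_argument, _group, action_url = args[:4]
    formdata = {
        "__EVENTTARGET": event_target,
        "__EVENTARGUMENT": event_argument,
    }
    return formdata, action_url
```

The resulting dictionary can then be passed as `formdata=` to `FormRequest.from_response`, together with `formname='searchForm'` and `dont_click=True` so that no submit element is simulated. If the browser shows the POST going to the URL from the link rather than to the form's own action, `from_response` also accepts a `url=` argument, for example `url=response.urljoin(action_url)`.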
