
I'm stuck scraping a page with Python. The following request, sent from HttpRequester (in Mozilla), gives me the right response:

POST https://www.hpe.com/h20195/v2/Library.aspx/LoadMore
Content-Type: application/json
{"sort": "csdisplayorder", "hdnOffset": "1", "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e", "test": "", "titleSearch": "false", "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,csdocumenttype#41,csproducttype#18964"}
 -- response --
200 OK
Cache-Control:  private, max-age=0
Content-Length:  13701
Content-Type:  application/json; charset=utf-8
Server:  Microsoft-IIS/7.5
X-AspNet-Version:  4.0.30319
X-Powered-By:  ASP.NET
Date:  Sat, 28 May 2016 04:12:57 GMT
Connection:  keep-alive

The exact same operation in Python 2.7.1 using Requests fails with an error. Here is the code snippet:

import requests

jsonContent = {"sort": "csdisplayorder", "hdnOffset": "1", "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e", "test": "", "titleSearch": "false", "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,csdocumenttype#41,csproducttype#18964"}

catResponse = requests.post('https://www.hpe.com/h20195/v2/Library.aspx/LoadMore', json=jsonContent)
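One way to check that this call really sends the same thing HttpRequester showed is to build the request without sending it and inspect the result. This is a minimal sketch written for Python 3; the helper name `build_load_more` is hypothetical, and only `requests.Request` and `.prepare()` come from the Requests library itself.

```python
import json

import requests

LOAD_MORE_URL = "https://www.hpe.com/h20195/v2/Library.aspx/LoadMore"

payload = {
    "sort": "csdisplayorder",
    "hdnOffset": "1",
    "uniqueRequestId": "d6da6a30bdeb4d77b0e607a6b688de1e",
    "test": "",
    "titleSearch": "false",
    "facets": "wildcatsearchcategory#HPE,cshierarchycategory#No,"
              "csdocumenttype#41,csproducttype#18964",
}


def build_load_more(data):
    """Prepare (but do not send) the same POST the snippet above sends."""
    # json= serializes the dict into the body and sets
    # Content-Type: application/json, matching the raw HttpRequester request.
    request = requests.Request("POST", LOAD_MORE_URL, json=data)
    return request.prepare()


prepared = build_load_more(payload)
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(json.loads(prepared.body) == payload)
```

If the method, URL, Content-Type and body all match, the difference has to come from something the snippet does not send at all, such as cookies or other headers.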

The following is the error that I get:

{"Message":"Value cannot be null.\r\nParameter name: source","StackTrace":"   at System.Linq.Enumerable.Contains[TSource](IEnumerable`1 source, TSource value, IEqualityComparer`1 comparer)\r\n   

More information: the POST request I'm trying to reproduce is fired by:

  1. Opening this web page: https://www.hpe.com/h20195/v2/Library.aspx?doctype=41&doccompany=HPE&footer=41&filter_doctype=no&filter_doclang=no&country=&filter_country=no&cc=us&lc=en&status=A&filter_status=rw#doctype-41&doccompany-HPE&prodtype_oid-18964&status-a&sortorder-csdisplayorder&teasers-off&isRetired-false&isRHParentNode-false&titleCheck-false

  2. Clicking the grey "Load more" button at the end of the page

I captured the exact request headers and response from the browser and tried to reproduce them in Postman, in Python, and in HttpRequester (Mozilla).

Postman and Python both return the error above, but HttpRequester succeeds without me setting any headers.

Can anyone think of an explanation for this?

  • Perhaps HttpRequester sends along a cookie, or the server alters behaviour based on the user agent. Impossible to tell, but your requests code is otherwise correct. Commented May 28, 2016 at 4:55
  • Thanks for the quick response. But if HttpRequester sends along a cookie, it should be listed as part of the request headers, right? I don't see any header other than Content-Type in the raw output (listed in my question). I don't believe the User-Agent is the problem, because the user agent "User-Agent: python-requests/2.10.0" worked for a different POST request to the same server. Commented May 28, 2016 at 5:11
  • There are too many headers missing from the HttpRequester output; there is no content-length, no accept, no user-agent. You are not being shown all headers that are sent, so you can't make any assumptions. Commented May 28, 2016 at 5:14
  • The issue was solved by using a requests Session, which kept the ASP.NET_SessionId cookie set by the server and sent it back on later POST requests. Martijn was right: HttpRequester was passing more headers than its raw output showed. The other hint came from Postman: when I enabled the Interceptor and used the browser's cookies (which included the session ID among others), the POST request went through. Commented May 29, 2016 at 5:16
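The fix described in that last comment can be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: it assumes a plain GET of the library page is what makes the server issue ASP.NET_SessionId, and the helper names `load_more` and `prepared_load_more` are hypothetical. The `#...` fragment is dropped from the page URL because browsers never send it to the server.

```python
import requests

# The page URL from the question, without the "#..." fragment.
LIBRARY_URL = (
    "https://www.hpe.com/h20195/v2/Library.aspx?doctype=41&doccompany=HPE"
    "&footer=41&filter_doctype=no&filter_doclang=no&country=&filter_country=no"
    "&cc=us&lc=en&status=A&filter_status=rw"
)
LOAD_MORE_URL = "https://www.hpe.com/h20195/v2/Library.aspx/LoadMore"


def load_more(session, payload):
    """GET the library page, then POST to LoadMore on the same Session."""
    # Assumption: this GET is where the server sets ASP.NET_SessionId;
    # the Session stores any Set-Cookie values in session.cookies.
    session.get(LIBRARY_URL).raise_for_status()
    # Same Session object, so the stored cookies are attached automatically.
    response = session.post(LOAD_MORE_URL, json=payload)
    response.raise_for_status()
    return response.json()


def prepared_load_more(session, payload):
    """Show what the POST would carry (cookies included) without sending it."""
    request = requests.Request("POST", LOAD_MORE_URL, json=payload)
    return session.prepare_request(request)
```

Usage would be along the lines of `with requests.Session() as session: data = load_more(session, jsonContent)`. The key point is that both requests go through one Session; two separate `requests.post()`/`requests.get()` calls would not share cookies.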

1 Answer


If both Postman and requests are receiving an error, then there is more context than what HttpRequester is showing. There are a number of headers that I'd expect to be set almost always, including User-Agent and Content-Length, that are missing here.

The usual suspects are cookies (look for Set-Cookie headers in earlier responses, and preserve them by using a requests.Session() object), the User-Agent header and perhaps a Referer header, but do look at other headers too, such as anything starting with Accept.
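A Session covers both halves of that advice: it keeps cookies between requests, and headers placed on `session.headers` are sent with every request. A minimal sketch follows; the header values are placeholders (assumptions), and you should copy the real ones from your browser's network inspector. Note that the HTTP header is spelled `Referer`.

```python
import requests

# Placeholder, browser-like values; replace with what your browser really sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:46.0) Gecko/20100101 Firefox/46.0",
    "Referer": "https://www.hpe.com/h20195/v2/Library.aspx",  # note the HTTP spelling
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_session(extra_headers):
    """Return a Session that sends extra_headers on every request and keeps cookies."""
    session = requests.Session()
    # session.headers is merged into each request; a headers= argument passed
    # to an individual call still overrides these values.
    session.headers.update(extra_headers)
    return session


session = make_session(BROWSER_HEADERS)
prepared = session.prepare_request(
    requests.Request("POST", "https://www.hpe.com/h20195/v2/Library.aspx/LoadMore", json={})
)
print(prepared.headers["Referer"])
```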

For example, have HttpRequester post to http://httpbin.org/post instead and inspect the returned JSON, which lists the headers that were sent. This won't include cookies (those are domain-specific), but anything else could be something the server looks for. If cookies don't help, try adding those headers one at a time.
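Comparing the two echoes by eye gets tedious, so here is a small sketch of the same idea. `echoed_headers` sends the payload from Python to httpbin and returns the `headers` object from its JSON reply; `header_diff` lists every header the other tool sent that Python either omitted or sent with a different value. Both helper names are hypothetical, and the HttpRequester side is assumed to be pasted in by hand as a dict.

```python
import requests

ECHO_URL = "http://httpbin.org/post"  # echoes the received request back as JSON


def echoed_headers(session, payload):
    """POST payload to httpbin and return the headers it reports receiving."""
    response = session.post(ECHO_URL, json=payload)
    response.raise_for_status()
    return response.json()["headers"]


def header_diff(tool_headers, python_headers):
    """Headers the other tool sent that Python omitted or sent differently.

    Header names are compared case-insensitively, as HTTP requires.
    """
    ours = {name.lower(): value for name, value in python_headers.items()}
    return {
        name: value
        for name, value in tool_headers.items()
        if ours.get(name.lower()) != value
    }
```

Each key in the returned dict is a candidate to add to the Session's headers, one at a time, until the LoadMore request succeeds.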
