I'm trying to get a JSON response from this webpage using the approach below, but all I get is {"message": "Must provide valid one of: query_id, query_hash", "status": "fail"}. I printed the response URL (r.url) in the second script to check whether it matches the one I intended to send, and found that its structure is different.

If I use the URL directly (taken from dev tools) within requests, I get the required content:

import json
import requests

check_url = 'https://www.instagram.com/graphql/query/?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22instagood%22%2C%22first%22%3A2%2C%22after%22%3A%22QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA%3D%3D%22%7D'
r = requests.get(check_url)
print(r.json())

But I can't make the following version work:

import json
import requests

url = 'https://www.instagram.com/explore/tags/instagood/'
query_url = 'https://www.instagram.com/graphql/query/?'

payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    "variables": {"tag_name":"instagood","first":"2","after":"QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    r = s.get(query_url,params=json.dumps(payload))
    print(r.content)

How can I make the above script work?

2 Answers


Your problem is connected to how you encode the params. From the check_url in your first example we can see:

?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22...

This URL has 2 params:

  1. query_hash - string
  2. variables - looks like a URL encoded string, judging by the escape values (%7B%22).

As you have correctly identified, %7B%22 corresponds to {". In other words, the second parameter is a url-escaped JSON string.
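To check this reading of the URL, here is a minimal sketch that uses only the standard library to split the query string of check_url from the question and decode the variables parameter. No request is sent; the names params and variables are just illustrative.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The working URL copied from the question (taken from dev tools).
check_url = 'https://www.instagram.com/graphql/query/?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22instagood%22%2C%22first%22%3A2%2C%22after%22%3A%22QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA%3D%3D%22%7D'

# parse_qs splits the query on '&' and '=' and percent-decodes each value.
params = parse_qs(urlsplit(check_url).query)

# The decoded "variables" value is itself a JSON document.
variables = json.loads(params["variables"][0])

print(sorted(params))   # names of the query parameters
print(variables)        # the decoded JSON object
```

The decoded object has the keys tag_name, first and after, which is the structure the request has to reproduce.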

This points to the fix:

import json
import requests

query_url = 'https://www.instagram.com/graphql/query/?'

variables = {"tag_name": "instagood", "first": "2",
             "after": "QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    # Serialize only the nested dict; requests URL-escapes the resulting string.
    "variables": json.dumps(variables)
}

with requests.Session() as s:
    s.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) '
                               'Chrome/81.0.4044.138 Safari/537.36')
    # Pass the dict itself so that each key becomes its own query parameter.
    r = s.get(query_url, params=payload)
    print(r.content)

As you can see, the params argument passed to s.get is a dict with two keys. It gets translated into ?query_hash=value1&variables=value2.

To get the correct value for variables, we serialize the dict to a JSON string with json.dumps. The requests library then takes care of URL-escaping characters like { and " in that string.
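The encoding step can be checked offline. The sketch below is a standard-library approximation of it, not the code requests runs internally: it builds the query string with urllib.parse.urlencode and compares it with the query string of check_url from the question. Two details are assumptions made only to match the browser URL byte for byte: first is the integer 2 (as in the decoded browser URL) and json.dumps uses compact separators.

```python
import json
from urllib.parse import urlencode, urlsplit

# The working URL copied from the question (taken from dev tools).
check_url = 'https://www.instagram.com/graphql/query/?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22instagood%22%2C%22first%22%3A2%2C%22after%22%3A%22QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA%3D%3D%22%7D'

variables = {
    "tag_name": "instagood",
    "first": 2,  # integer, to match the browser URL
    "after": "QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA==",
}
payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    # Compact separators avoid the spaces json.dumps inserts by default.
    "variables": json.dumps(variables, separators=(",", ":")),
}

# Each dict key becomes one key=value pair; values are percent-encoded.
query_string = urlencode(payload)

print(query_string)
print(query_string == urlsplit(check_url).query)
```

To inspect the URL that requests itself would send, without sending anything, requests.Request("GET", query_url, params=payload).prepare().url returns the final URL as a string.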


0

When I ran your code, the URL built for the API call contained unnecessary escape characters. This is what breaks the call.

Sending a data payload with a GET request is not recommended. A quick solution could be to use a POST request instead. It worked fine!

import json
import requests

url = 'https://www.instagram.com/explore/tags/instagood/'
query_url = 'https://www.instagram.com/graphql/query/?'

payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    "variables": {"tag_name":"instagood","first":"2","after":"QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    r = s.post(query_url,params=json.dumps(payload))
    print(r.content)

2 Comments

Note that URL parameters are not considered part of the request body, so we're free to use GET.
Your suggested script fetches the wrong content. Btw, you should use data instead of params with POST requests.
