I'm trying to build a Python crawler using the requests library. When I use the get method, the text I retrieve looks like THá» THAO, but when I use curl I get THỂ THAO, which is the result I expect. Here is my code:

    import requests
    from bs4 import BeautifulSoup

    def get_raw_channel():
        r = requests.get('http://vtv.vn/')
        raw_html = r.text  # body decoded with r.encoding
        soup = BeautifulSoup(raw_html, 'html.parser')
        o_tags = soup.find_all("option")
        for o_tag in o_tags:
            print(o_tag.text)
            # raw_channel = RawChannel(o_tag.text.strip(), o_tag['value'])
            # channels_file.write(raw_channel.__str__() + '\n')

Here is my curl command:

    curl http://vtv.vn/

Question: why are the results different? How can I get curl's result using requests?
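A likely explanation for THá» THAO is that the page's UTF-8 bytes are being decoded as ISO-8859-1 (Latin-1). The minimal Python 3 sketch below reproduces that offline, assuming the server sends UTF-8. Each of the three UTF-8 bytes of "Ể" becomes its own Latin-1 character, and the last one, U+0082, is an invisible control character, which is why only "á»" shows up in the output.

```python
# Reproduce the mojibake offline: UTF-8 bytes decoded as ISO-8859-1 (Latin-1).
EXPECTED = "TH\u1ec2 THAO"  # "THỂ THAO", the text curl shows

raw = EXPECTED.encode("utf-8")       # bytes on the wire (assumed UTF-8)
garbled = raw.decode("iso-8859-1")   # one character per byte: "THá»" + U+0082 + " THAO"
repaired = garbled.encode("iso-8859-1").decode("utf-8")  # undo the wrong decode

print(raw)                    # b'TH\xe1\xbb\x82 THAO'
print(ascii(garbled))         # 'TH\xe1\xbb\x82 THAO'
print(repaired == EXPECTED)   # True
```

The round trip in the last step only works because ISO-8859-1 maps every byte to exactly one character, so no information is lost by the wrong decode.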
This is the curl response header:

    Date: Mon, 09 Feb 2015 07:59:34 GMT
    Content-Type: text/html
    Transfer-Encoding: chunked
    Connection: close
    Vary: Accept-Encoding
    Server: vtv-rp

And these are the requests response headers:

    {'via': '1.1 TMG', 'proxy-connection': 'Keep-Alive', 'transfer-encoding': 'chunked', 'vary': 'Accept-Encoding', 'server': 'vtv-rp', 'connection': 'Keep-Alive', 'date': 'Mon, 09 Feb 2015 08:19:52 GMT', 'content-type': 'text/html'}
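Both header sets declare Content-Type: text/html with no charset. The requests documentation says that in this case it follows RFC 2616 and assumes ISO-8859-1, and r.text decodes r.content with r.encoding every time it is read. curl, by contrast, writes the raw bytes, which the terminal presumably displays as UTF-8. The sketch below sets r.encoding explicitly before reading r.text. The helper charset_from_content_type is hypothetical, it uses the standard library's email.message.Message to read the charset parameter, and the UTF-8 fallback is an assumption about this site. Unlike the original, this get_raw_channel returns the option texts instead of printing them.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type value, else `default`.

    requests falls back to ISO-8859-1 for text/* without a charset; here the
    fallback is `default`, assumed to be UTF-8 for this site.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or default


def get_raw_channel(url="http://vtv.vn/"):
    """Fetch the page and return the stripped text of its <option> tags."""
    # Third-party imports kept local so the helper above works without them.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(url)
    # r.text decodes r.content with r.encoding, so set it before reading r.text.
    r.encoding = charset_from_content_type(r.headers.get("Content-Type", "text/html"))
    soup = BeautifulSoup(r.text, "html.parser")
    return [o_tag.text.strip() for o_tag in soup.find_all("option")]
```

With the headers shown above, charset_from_content_type("text/html") returns "utf-8", while a header such as "text/html; charset=ISO-8859-1" is still honored. Two alternatives worth trying: r.encoding = r.apparent_encoding, which guesses the encoding from the body, or passing the raw bytes r.content to BeautifulSoup so that it detects the encoding itself, for example from a meta charset tag.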