
When I visit this website in Firefox 13, I get a page with certain content. But when I use wget to download it:

wget http://tinhvan.com

the downloaded HTML page has different content. I tried setting the user agent:

wget -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com

but I got the same result.

What is happening, and how do I get the same result as when I visit the site through Firefox?

UPDATE

Here is what Firefox shows under View Source:

<!DOCTYPE html>

<html dir="ltr" lang="vi">  

    <head id="ctl00_page_header">




            <title>

                Tinhvan Group - Trang chủ       

and here is the page downloaded by wget:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link href="Content/images/main.css" rel="stylesheet" type="text/css" /><link href="Content/images/mail-detail.css" rel="stylesheet" type="text/css" />
    <script src="../../Content/JqueryUI/js/jquery-1.3.2.min.js" type="text/javascript"></script>    
    <title>

    Trang chủ - Tinhvan Group Website
  • It would help to show what wget returns. Commented Jul 31, 2012 at 8:58
  • I think I get the same results. Commented Jul 31, 2012 at 8:59
  • @CharlesB: I've updated the question. Commented Jul 31, 2012 at 9:18
  • Are you by any chance logged in or so from Firefox? What does visiting the site with another browser do? Commented Jul 31, 2012 at 9:47
  • I visit that site without logging in or anything else. I also tried tor-browser and got the same result as Firefox. Commented Jul 31, 2012 at 9:59

1 Answer


Firefox (and not just Firefox; Chrome, IE, and other browsers do it as well) automatically adds Accept* headers to every request.

For example:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US, en;q=0.5

Try:

wget --header="Accept: text/html"  -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com
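If a bare `Accept: text/html` is not enough, a closer imitation of the browser request may help. The following is only a sketch: the function name `fetch_like_firefox` and the output-file argument are hypothetical, and the header values are simply the ones quoted above. `Accept-Encoding` is deliberately left out, on the assumption that your wget may save a gzip-compressed response without decompressing it.

```shell
#!/bin/sh
# User agent string taken from the question.
UA='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1'

# fetch_like_firefox URL OUTFILE   (hypothetical helper, not part of wget)
# Sends the Accept and Accept-Language headers a browser would send,
# plus the Firefox user agent, and writes the response body to OUTFILE.
fetch_like_firefox() {
    url=$1
    out=$2
    wget \
        --header='Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
        --header='Accept-Language: en-US,en;q=0.5' \
        -U "$UA" \
        -O "$out" \
        "$url"
}

# Example call (performs a real download, so it is left commented out):
# fetch_like_firefox http://tinhvan.com page.html
```

Each `--header` option adds one request header, so you can add or remove headers one at a time to find which one the server reacts to.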

Note: if you don't declare an Accept header, wget automatically adds `Accept: */*`, which means "give me anything you have". It seems that the site returns application/xhtml+xml by default, whereas you expect text/html.
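To check which headers wget actually sends, you can look at its debug output. This is a rough sketch, assuming your wget build supports `--debug` and marks the request with `---request begin---` and `---request end---` lines; the function name `show_request_headers` is made up for illustration.

```shell
#!/bin/sh
# show_request_headers URL   (hypothetical helper)
# Prints only the HTTP request that wget sent, as reported by --debug.
show_request_headers() {
    # --debug writes to stderr, so merge it into stdout for sed;
    # -O /dev/null discards the downloaded body.
    wget --debug -O /dev/null "$1" 2>&1 |
        sed -n '/^---request begin---$/,/^---request end---$/p'
}

# Example call (performs a real request, so it is left commented out):
# show_request_headers http://tinhvan.com
```

Comparing this output with the request headers shown in the browser's network tools should reveal which header differs.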
