
Hi, complete newbie here, so bear with me. This seems like a simple job, but I can't find an easy way to do it.

I need to extract a particular piece of text from a webpage, "www.example.com/index.php". I know the text is in a p tag with a certain id. How do I extract it using JavaScript?

What I'm currently trying: I have a JavaScript file (trying.js) on my computer with the following code:

$(document).ready(function () {
    $.get("www.example.com/index.php", function(data) {
        console.log(data)
    }) ;
});

and an HTML page that runs the JavaScript file.

When I open this HTML page in Firefox, nothing shows up in the console. How do I get the website's data? Am I on the right track? Is there a better way to do this?

  • 12
    You can't. JavaScript has a same-origin policy, so you can only access pages on the same domain, or services that support JSONP or CORS. Commented Oct 4, 2013 at 13:03
  • 2
    possible duplicate of Can Javascript read the source of any web page? Commented Oct 4, 2013 at 13:06
  • 1
    You need to write an app, maybe using Selenium or Watin browser automation or my new favorite CSQuery (it has only read access to the DOM but uses JQuery style filters in CSharp and is really fast). Commented Oct 4, 2013 at 13:23
  • 1
    What you're looking for is a page scraper. Javascript can't pull it off because it can only gather data from the domain you're on. You could build it in Ruby, for example, and use one of the many existing gems for this sort of task, like github.com/assaf/scrapi or nokogiri.org Commented Oct 4, 2013 at 13:34
  • 1
    Please take a look at stackoverflow.com/questions/680562/… There are multiple ways discussed. Hope it helps you. Commented Oct 4, 2013 at 13:43

1 Answer

This is caused by the Same-Origin Policy, which for security reasons prevents JavaScript in a browser from reading responses from a different domain unless that server opts in through CORS headers. You can still get the data in one of two ways:

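1 - Use a proxy server
$(document).ready(function () {
    // The proxy address is a placeholder. Encode the target URL so it survives as a query value.
    const target = encodeURIComponent("https://www.example.com/index.php");
    $.get("https://your-proxy-server.com/fetch?url=" + target, function (data) {
        // Parse the returned HTML string into a document that can be queried.
        const parser = new DOMParser();
        const doc = parser.parseFromString(data, 'text/html');
        const el = doc.getElementById('your-id-here');
        console.log(el ? el.textContent : 'Element not found');
    });
});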
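The proxy itself is not shown above, so here is a minimal sketch of what one might look like, assuming Node 18 or later (for the built-in `fetch`). The names `ALLOWED_HOSTS`, `parseTarget`, and `createProxy`, the `/fetch?url=` route, and port 3000 are all made up for illustration. The proxy fetches the page server-side, where the Same-Origin Policy does not apply, and returns it with a CORS header so the browser will hand it to your script.

```javascript
// Minimal proxy sketch for Node 18+. All names here are hypothetical.
const http = require('node:http');

// Only these hosts may be fetched, so the proxy is not an open relay.
const ALLOWED_HOSTS = new Set(['www.example.com']);

// Returns a validated URL object, or null if the "url" query parameter
// is missing, malformed, not http(s), or points at a host not on the list.
function parseTarget(requestUrl) {
    const raw = new URL(requestUrl, 'http://localhost').searchParams.get('url');
    if (!raw) return null;
    try {
        const target = new URL(raw);
        const okProtocol = target.protocol === 'http:' || target.protocol === 'https:';
        return okProtocol && ALLOWED_HOSTS.has(target.hostname) ? target : null;
    } catch {
        return null; // new URL() throws on strings that are not absolute URLs
    }
}

function createProxy() {
    return http.createServer(async (req, res) => {
        const target = parseTarget(req.url);
        if (!target) {
            res.writeHead(400, { 'Content-Type': 'text/plain' });
            res.end('Missing or disallowed url parameter');
            return;
        }
        try {
            // Server-side request: no Same-Origin Policy applies here.
            const upstream = await fetch(target);
            const body = await upstream.text();
            res.writeHead(upstream.status, {
                'Content-Type': 'text/html; charset=utf-8',
                // Lets the browser expose this response to the calling page.
                'Access-Control-Allow-Origin': '*',
            });
            res.end(body);
        } catch (err) {
            res.writeHead(502, { 'Content-Type': 'text/plain' });
            res.end('Upstream request failed');
        }
    });
}

// To run it locally: createProxy().listen(3000);
```

With this running, the client code would call `http://localhost:3000/fetch?url=...` instead of the placeholder proxy address. The host allowlist is a design choice, not a requirement, but without it anyone could use the proxy to fetch arbitrary sites.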

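2 - Use the Fetch API, which works only if the target server sends CORS headers allowing your origin
fetch('https://www.example.com/index.php', {
    method: 'GET',
    headers: {
        'Accept': 'text/html'
    }
})
.then(response => {
    // fetch only rejects on network or CORS failures, so check the HTTP status explicitly.
    if (!response.ok) throw new Error('HTTP ' + response.status);
    return response.text();
})
.then(data => {
    // Parse the returned HTML string into a document that can be queried.
    const parser = new DOMParser();
    const doc = parser.parseFromString(data, 'text/html');
    const el = doc.getElementById('your-id-here');
    console.log(el ? el.textContent : 'Element not found');
})
.catch(error => console.error('Error:', error));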
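To check whether a request is cross-origin in the first place, you can compare origins yourself. The sketch below is only an illustration of the rule (same scheme, host, and port), not what the browser runs internally; `isSameOrigin` and the `my-site.test` address are made-up names. It treats `file:` pages as never matching, since their origin is reported as the string `null`.

```javascript
// Two URLs share an origin only when scheme, host, and port all match.
// URLs with an opaque origin (such as file:) report 'null' and never match.
function isSameOrigin(a, b) {
    const originA = new URL(a).origin;
    const originB = new URL(b).origin;
    return originA !== 'null' && originA === originB;
}

const page = 'https://my-site.test/page.html'; // hypothetical page running the script

console.log(isSameOrigin(page, 'https://my-site.test/api/data'));     // true
console.log(isSameOrigin(page, 'https://www.example.com/index.php')); // false: different host
console.log(isSameOrigin(page, 'http://my-site.test/page.html'));     // false: different scheme
```

Any pair that returns false here needs either CORS headers from the target server or a proxy.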