I am trying to build a scraper in Google Apps Script. There are two elements with the same class name, and I can't find a way to get the second one. My script only outputs the first.

function myFunction() {
    var url = "https://www.zchocolat.com/shop/fr/livraison-cadeau-chocolat/espagne";
    var fromText = '<p class="article"';
    var toText = '">';

    var content = UrlFetchApp.fetch(url).getContentText();
    var scraped = Parser
                .data(content)
                .setLog()
                .from(fromText)
                .to(toText)
                .build();
    Logger.log(scraped);
    return scraped;
}

function SAVE_DATA() {
   var sheet = SpreadsheetApp.openById('').getSheetByName('Feuille 1'); 
   sheet.appendRow([ new Date(), myFunction() ]);

}

2 Answers

How about this sample?

Modification points:

  1. Looking at the Parser library you are using, it seems that calling iterate() returns the matches as an array.
  2. The data you want is the second match, i.e. index 1 of that array.
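To illustrate the difference between stopping at the first match and collecting every match, here is a rough sketch in plain JavaScript. It is not the Parser library's actual implementation: findAllBetween is a hypothetical helper and the sample markup is made up.

```javascript
// Hypothetical helper, not part of the Parser library: collects every
// substring that sits between fromText and toText.
function findAllBetween(content, fromText, toText) {
  var results = [];
  var searchFrom = 0;
  while (true) {
    var start = content.indexOf(fromText, searchFrom);
    if (start === -1) break;            // no more opening markers
    start += fromText.length;
    var end = content.indexOf(toText, start);
    if (end === -1) break;              // opening marker without a closing one
    results.push(content.substring(start, end));
    searchFrom = end + toText.length;   // continue after this match
  }
  return results;
}

// Made-up sample markup, for illustration only.
var sample = '<p class="article">first</p><div>x</div><p class="article">second</p>';
var matches = findAllBetween(sample, '<p class="article">', '</p>');
// matches[0] is what a first-match-only search would give;
// matches[1] is the second element with the same class.
```

Once all matches are in an array, picking the second one is just a matter of indexing.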

Applying these points to your script gives the following.

Modified script:

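function myFunction() {
  var url = "https://www.zchocolat.com/shop/fr/livraison-cadeau-chocolat/espagne";
  var fromText = '<p class="article">';
  var toText = '</p>';
  var content = UrlFetchApp.fetch(url).getContentText();
  // iterate() returns the matches as an array instead of a single string.
  var scraped = Parser
              .data(content)
              .from(fromText)
              .to(toText)
              .iterate();
  // Index 1 is the second <p class="article"> element.
  Logger.log(scraped[1]);
  // Return only that element so SAVE_DATA() writes a single cell value.
  return scraped[1];
}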

Result:

97% de nos colis ont &eacute;t&eacute; livr&eacute;s dans les temps en 2016.
                                        zChocolat a d&eacute;j&agrave; livr&eacute; avec succ&egrave;s 21,923 cadeaux chocolat en Espagne.
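The result still contains HTML entities and the page's original whitespace. If you want clean text in the sheet, something like the sketch below might help. It is plain JavaScript; decodeEntities and cleanText are hypothetical helpers, and the entity table is a small hand-picked assumption, not a complete list.

```javascript
// Hypothetical lookup table: only a few named entities, far from complete.
var NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00A0',
  eacute: '\u00E9', egrave: '\u00E8', agrave: '\u00E0', ccedil: '\u00E7'
};

// Decodes numeric entities (&#233; or &#xE9;) and the named ones listed above.
// Unknown named entities are left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x';
      var code = parseInt(body.substring(isHex ? 2 : 1), isHex ? 16 : 10);
      // fromCharCode only covers code points up to 0xFFFF.
      return String.fromCharCode(code);
    }
    return NAMED_ENTITIES.hasOwnProperty(body) ? NAMED_ENTITIES[body] : match;
  });
}

// Decodes entities, then collapses runs of whitespace into single spaces.
function cleanText(text) {
  return decodeEntities(text).replace(/\s+/g, ' ').trim();
}

var cleaned = cleanText('97% de nos colis ont &eacute;t&eacute; livr&eacute;s\n      dans les temps en 2016.');
```

In the script you would call cleanText(scraped[1]) before writing the value to the sheet.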

You should parse the HTML with XmlService so that you can more easily extract the nodes you want. There are some good examples on this site (https://sites.google.com/site/scriptsexamples/learn-by-example/parsing-html).

You would end up with something like:

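function myFunction() {
    var url = "https://www.zchocolat.com/shop/fr/livraison-cadeau-chocolat/espagne";

    var content = UrlFetchApp.fetch(url).getContentText();

    // XmlService.parse() only accepts well-formed XML, so it throws on
    // pages whose HTML is not valid XML.
    var doc = XmlService.parse(content);
    var root = doc.getRootElement();
    // The paragraphs in question use class="article".
    var articles = getElementsByClassName(root, 'article');
    // The second match, if there is one, is at index 1.
    if (articles.length > 1) Logger.log(articles[1].getText());
}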

function getElementsByClassName(element, classToFind) {  
  var data = [];
  var descendants = element.getDescendants();
  descendants.push(element);  
  for (var i in descendants) {
    var elt = descendants[i].asElement();
    if(elt != null) {
      var classes = elt.getAttribute('class');
      if(classes != null) {
        classes = classes.getValue();
        if(classes == classToFind) data.push(elt);
        else {
          classes = classes.split(' ');
          for (var j in classes) {
            if(classes[j] == classToFind) {
              data.push(elt);
              break;
            }
          }
        }
      }
    }
  }
  return data;
}
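
Note that XmlService.parse() only accepts well-formed XML, and many real pages are not. As a fallback that avoids XML parsing altogether, a rough regex-based sketch might look like the following. innerHtmlByClass is a hypothetical helper; it assumes double-quoted attributes, a plain tag name, and no nested elements of the same tag, so treat it as fragile.

```javascript
// Hypothetical helper: returns the inner HTML of every <tagName> element whose
// class attribute contains className as one of its space-separated tokens.
// Assumes double-quoted attributes and no nested elements of the same tag.
function innerHtmlByClass(html, tagName, className) {
  // tagName is inserted into the pattern as-is, so it must be a plain name like 'p'.
  var pattern = new RegExp(
    '<' + tagName + '\\b[^>]*\\bclass="([^"]*)"[^>]*>([\\s\\S]*?)</' + tagName + '>', 'gi');
  var results = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Compare whole class tokens so 'article' does not match 'articles'.
    var tokens = match[1].split(/\s+/);
    if (tokens.indexOf(className) !== -1) {
      results.push(match[2]);
    }
  }
  return results;
}

// Made-up markup, including a boolean attribute that is not valid XML.
var page = '<script async src="a.js"></script>' +
           '<p class="article">one</p>' +
           '<p class="intro article">two</p>' +
           '<p class="articles">three</p>';
var found = innerHtmlByClass(page, 'p', 'article');
```

Here found[1] would be the second paragraph carrying the article class.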

7 Comments

I'm interested in your method. May I ask a question about your answer?
Sure. It is a method I found on the site that I linked. What is your question?
Thank you for the reply. I tried your script, but it didn't work. The error is: Error: Attribute name "async" associated with an element type "script" must be followed by the ' = ' character. It occurs at var doc = XmlService.parse(content);. I would like to know how to parse var content = UrlFetchApp.fetch("https://www.zchocolat.com/shop/fr/livraison-cadeau-chocolat/espagne").getContentText(); with XmlService.
Does my question make sense?
Can you post all of your code? I can't figure out what the error means without seeing all of it.