
I'm trying to parse an HTML string in my Google Apps Script code to retrieve the text of a <b> tag. The tag contains line breaks in its attributes, and it appears more than once, but I only want the first value. (In this case, only 'foo' is required.)

<b class="
"
>
foo
</b><b class="
"
>
var
</b>

In Google Apps Script, functions such as 'getElementsByTagName' are not available. So I first thought of using a regexp, but it isn't the wise option here. Does anyone have an idea of how I can move forward? Any comment or guess would be highly appreciated!

  • Is there anything else I can do for you regarding this question? Commented Aug 22, 2018 at 22:47

2 Answers


How about using XmlService as a workaround for your situation? XmlService can retrieve the value even when the tags contain several line breaks. There are several possible workarounds for your situation, so please think of this as one of them.

The flow of the sample script is as follows.

Flow :

  1. Add an XML declaration and a root element tag around the HTML.
  2. Parse the resulting XML string using XmlService.
  3. Retrieve the text of the first child element (the first <b> tag) using XmlService.

Sample script :

var html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n'; // Your sample value

var xml = '<?xml version="1.0"?><sampleContents>' + html + '</sampleContents>';
var res = XmlService.parse(xml).getRootElement().getChildren()[0].getText().trim();
Logger.log(res) // foo

Note :

  • This sample script uses your sample HTML. If your actual HTML is more complicated, could you provide it? I would like to modify the script accordingly.


If this isn't what you want, please tell me. I would like to modify it.
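If XmlService cannot parse your HTML, a plain-string alternative is to find the first <b start tag, walk to the > that closes it while ignoring characters inside quoted attribute values (which may contain line breaks), and return the text up to the next </b>. This is a minimal sketch that assumes the V8 runtime. It is not a real HTML parser: getFirstTagText is a hypothetical helper name, and the sketch assumes there are no nested elements with the same name, no comments, and no entities that need decoding.

```javascript
/**
 * Hypothetical helper: returns the trimmed text of the first
 * <tagName ...>...</tagName> element, or null if none is found.
 * Quoted attribute values may contain line breaks or ">".
 * Not a real HTML parser (no nesting, comments, or entity decoding).
 */
function getFirstTagText(html, tagName) {
  // Start tag: "<b" followed by whitespace, ">" or "/" (so "<br" is skipped).
  const openMatch = new RegExp("<" + tagName + "(?=[\\s>/])", "i").exec(html);
  if (!openMatch) return null;

  // Walk to the ">" that ends the start tag, skipping quoted attribute values.
  let quote = null;
  let i = openMatch.index + openMatch[0].length;
  for (; i < html.length; i++) {
    const ch = html[i];
    if (quote) {
      if (ch === quote) quote = null; // closing quote of the attribute value
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote of an attribute value
    } else if (ch === ">") {
      break; // end of the start tag
    }
  }
  if (i >= html.length) return null; // start tag never closed

  // Text up to the first matching end tag, e.g. "</b>".
  const rest = html.slice(i + 1);
  const end = rest.search(new RegExp("</" + tagName + "\\s*>", "i"));
  return end === -1 ? null : rest.slice(0, end).trim();
}

const sample = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n';
console.log(getFirstTagText(sample, "b")); // foo
```

Walking the start tag character by character, rather than stopping at the first >, is what lets a value like title="a>b" pass through without ending the tag early.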

Edit 1 :

Unfortunately, the script above cannot be used for the HTML retrieved from your URL, presumably because a real web page is not well-formed XML, which XmlService requires. So I used "Parser", a GAS library, for your situation. The sample script is as follows.

Sample script :

var url = "https://www.booking.com/searchresults.ja.html?ss=kyoto&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<b class=\"\n\"\n>").to("</b>").build().trim();
Logger.log(res) // US$11

Note :

  • Before you run this script, please install "Parser". For how to install a library, see the Apps Script documentation on libraries.
    • The project key of the library is M1lugvAXKKtUxn_vdAG9JZleS6DrsjUUV
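The Parser call in Edit 1 conceptually takes the text between the first occurrence of the from(...) marker and the next occurrence of the to(...) marker. If you would rather not depend on a library, that idea can be sketched with indexOf. This is an assumption about the behaviour, not the library's actual code. textBetween is a hypothetical name, the markers are matched literally (including their line breaks), and the sketch assumes the V8 runtime.

```javascript
/**
 * Hypothetical helper: returns the text between the first occurrence of
 * `from` and the next occurrence of `to` after it, or null if either
 * marker is missing. Markers are matched literally.
 */
function textBetween(text, from, to) {
  const start = text.indexOf(from);
  if (start === -1) return null;
  const contentStart = start + from.length;
  const end = text.indexOf(to, contentStart);
  if (end === -1) return null;
  return text.slice(contentStart, end);
}

// In Apps Script, `html` would come from UrlFetchApp.fetch(url).getContentText().
const html = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n';
const value = textBetween(html, '<b class="\n"\n>', "</b>");
console.log(value === null ? "not found" : value.trim()); // foo
```

Because the markers are literal strings, any change in the page's whitespace or attribute order makes the lookup fail. Returning null, rather than throwing, lets the caller decide what to do in that case.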


Edit 2 :

The 2nd URL in your comment appears to be different from the 1st one, and the new page contains no <b class=\"\n\"\n> tag, so the value you want cannot be retrieved with the script above. Based on the 1st URL in your comment, I guessed which value you want. Could you please try the following script?

var url = "https://www.booking.com/searchresults.ja.html?ss=kyotogranvia&checkin_year=2018&checkin_month=10&checkin_monthday=1&checkout_year=2018&checkout_month=10&checkout_monthday=2&no_rooms=1&group_adults=1&group_children=0";
var html = UrlFetchApp.fetch(url).getContentText();
var res = Parser.data(html).from("<span class=\"lp-postcard-avg-price-value\">").to("</span>").build().trim();
Logger.log(res) // US$289
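Because the tag that wraps the price can differ from page to page, one option is to try several marker pairs in order and keep the first value found. This is a minimal sketch under that assumption, written for the V8 runtime. firstMatchingValue is a hypothetical helper, the marker pairs are the two used in the scripts above, and the page string is a made-up snippet rather than real booking.com HTML. In Apps Script, the page text would come from UrlFetchApp.fetch(url).getContentText(), as in the scripts above.

```javascript
/**
 * Hypothetical helper: tries each [from, to] marker pair in order and
 * returns the trimmed text of the first pair found in `text`,
 * or null if no pair matches.
 */
function firstMatchingValue(text, markerPairs) {
  for (const [from, to] of markerPairs) {
    const start = text.indexOf(from);
    if (start === -1) continue; // this page doesn't use this marker
    const contentStart = start + from.length;
    const end = text.indexOf(to, contentStart);
    if (end !== -1) return text.slice(contentStart, end).trim();
  }
  return null;
}

// Marker pairs from the two scripts above.
const markers = [
  ['<b class="\n"\n>', "</b>"],
  ['<span class="lp-postcard-avg-price-value">', "</span>"],
];
// Made-up page snippet for illustration only.
const page = '<div><span class="lp-postcard-avg-price-value">\nUS$289\n</span></div>';
console.log(firstMatchingValue(page, markers)); // US$289
```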

6 Comments

Hello Tanaike-san, thank you for your answer! The XmlService approach looks interesting, but as you mentioned, the HTML I'm working with is more complicated, as below: booking.com/… In this HTML, I'd like to retrieve the price in the first <b class=""> tag. Do you still think I'd better go with XmlService, or is there a more appropriate way? Thanks for your kind help!
@pomme I'm really sorry for my incomplete answer. I updated my answer. Could you please confirm it?
@pomme I'm really sorry for my incomplete answer again. Your 2nd URL has no <b class=\"\n\"\n> tag, so my latest script cannot be used. I added a new script for the 2nd one based on my guess about the value you want. Could you please confirm it? I'm worried that the tag might be different for each URL you use.
Tanaike-san, I appreciate your continued support! Thanks to your advice, I can now scrape what I need from Booking.com. I'll keep working on getting used to Google Apps Script!
@pomme Thank you for replying. If you have any questions, feel free to ask me. I'd like to keep learning, too.

Don't use the Parser library (https://www.kutil.org/2016/01/easy-data-scrapping-with-google-apps.html). It is NOT an HTML parser at all; it just looks for text between two regular expressions.

If you insist on trying it anyway, you will need the new Script ID, 1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw, obtained from the link "Completed code of Parser library" (https://script.google.com/d/1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw/edit?usp=drive_web) on the webpage https://www.kutil.org/2016/01/easy-data-scrapping-with-google-apps.html.
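To illustrate what "looking for text between two markers" means in practice, here is a minimal regex sketch. It is not the Parser library's code. firstBoldTextByRegex is a hypothetical name, and the sketch assumes the V8 runtime. It handles the multi-line start tag from the question, but it shows exactly the weakness described above: a > inside an attribute value ends the match early.

```javascript
// Hypothetical helper: text of the first <b ...>...</b> pair, or null.
// [^>]* also matches line breaks, so the multi-line start tag in the
// question works; \b after "b" keeps "<br" from matching.
function firstBoldTextByRegex(html) {
  const m = /<b\b[^>]*>([\s\S]*?)<\/b>/i.exec(html);
  return m ? m[1].trim() : null;
}

const sample = '<b class="\n"\n>\nfoo\n</b><b class="\n"\n>\nvar\n</b>\n';
console.log(firstBoldTextByRegex(sample)); // foo
// Wrong result: the ">" inside the title value ends the start tag early.
console.log(firstBoldTextByRegex('<b title="a>b">bar</b>')); // b">bar
```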

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
