
I have a complex string from which I need to pull single words and/or multi-word phrases.

Here is the string:

<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="5" yahoo:created="2013-07-28T18:37:23Z" yahoo:lang="en-US"><diagnostics><publiclyCallable>true</publiclyCallable><user-time>145</user-time><service-time>141</service-time><build-version>38483</build-version></diagnostics><results><Result xmlns="urn:yahoo:cate">**RED**</Result><Result xmlns="urn:yahoo:cate">**GREEN**</Result><Result xmlns="urn:yahoo:cate">**BLUE**</Result><Result xmlns="urn:yahoo:cate">**A, E, I, O, U **</Result><Result xmlns="urn:yahoo:cate">**SOMETIMES Y**</Result></results></query><!-- total: 145 -->

(I really wish that wouldn't scroll, since it makes it difficult to see the entire picture)

Anyway, I need to be able to pull out the:

RED

GREEN

BLUE

A, E, I, O, U

SOMETIMES Y

++++ By the way, I tried to make those values bold in the big string, but they show up with asterisks instead. Disregard the asterisks; they are not part of the string. However, I'm leaving them in since they make the values easier to find when you look at the entire string. ++++

My goal is to turn that complex string into this:

RED|GREEN|BLUE|A, E, I, O, U|SOMETIMES Y

My preference is to do this on the sheet level using a single nested function (or a combination of multiple functions if necessary).

Failing that, a script version would be preferable to nothing.

I've been at this for hours using SPLIT, FIND, SUBSTITUTE, and a few other things that I tried on a whim - just to try everything. But I've now reached the saturation point of thinking clearly on this, and I'm hoping that someone can put me on a path for how to attack this logically.

I'm truly stumped (and frustrated).

==========================================

I said that I'd post the solution if I figured out the sheet-level solution, so this is it:

=mid(substitute(substitute(regexreplace(mid(A1,find("<Result",A1),find("</query",A1)-find("<Result",A1)),"<.*?>+","-"),"--","|"),"-","|"),2,len(substitute(substitute(regexreplace(mid(A1,find("<Result",A1),find("</query",A1)-find("<Result",A1)),"<.*?>+","-"),"--","|"),"-","|"))-2)
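For reference, the formula works by trimming the string to the span between the first `<Result` and `</query`, replacing every tag with a hyphen, turning the hyphens into `|`, and then dropping the leading and trailing delimiter. It would therefore also split on any hyphen inside a value, and FIND returns an error when there are no `<Result` tags at all. A script version of the same idea might look like the sketch below. The function name `extractResults` is hypothetical, the regular expression assumes the values contain no nested tags, and XML entities such as `&amp;` are not decoded.

```javascript
/* SKETCH ONLY: extractResults is a hypothetical helper name. */
function extractResults(xml) {
  // Capture the text between each <Result ...> and </Result> pair.
  // The match is case-sensitive, so the <results> wrapper is ignored.
  var pattern = /<Result\b[^>]*>([\s\S]*?)<\/Result>/g;
  var values = [];
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    // Trim stray whitespace such as the trailing space in "A, E, I, O, U ".
    values.push(match[1].trim());
  }
  // With zero matches this returns an empty string instead of an error.
  return values.join('|');
}
```

If this is saved in the spreadsheet's script editor, it should be callable from a cell as a custom function, for example `=extractResults(A1)`.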
  • I should add that this example returns 5 keyword(s). However there can be any number of keyword(s) that will be within the string, including zero keywords. Commented Jul 28, 2013 at 21:07

1 Answer


Have you considered using XmlService? https://developers.google.com/apps-script/reference/xml-service

Simple example:

    /* CODE FOR DEMONSTRATION PURPOSES */
    function testXML() {
      var result = [];
      var document = XmlService.parse('<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="5" yahoo:created="2013-07-28T18:37:23Z" yahoo:lang="en-US"><diagnostics><publiclyCallable>true</publiclyCallable><user-time>145</user-time><service-time>141</service-time><build-version>38483</build-version></diagnostics><results><Result xmlns="urn:yahoo:cate">RED</Result><Result xmlns="urn:yahoo:cate">GREEN</Result><Result xmlns="urn:yahoo:cate">BLUE</Result><Result xmlns="urn:yahoo:cate">A, E, I, O, U</Result><Result xmlns="urn:yahoo:cate">SOMETIMES Y</Result></results></query><!-- total: 145 -->');
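      // <results> is in no namespace, so it can be looked up by name alone.
      // Its <Result> children are namespaced, so fetch all children unfiltered.
      var entries = document.getRootElement().getChildren('results')[0].getChildren();
      for (var i = 0, len = entries.length; i < len; ++i) {
        result.push(entries[i].getText());
      }
      // Join with | only to get a readable log line.
      Logger.log(result.join('|'));
    }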

5 Comments

I had NOT considered that, because it is beyond my familiarity with JavaScript to even think of such a solution. But it certainly works! (You are a genius!) By the way, I've never asked this question before because I feel foolish for not knowing the answer, but if I finally ask now, I won't have to feel foolish anymore, and I'll have actually learned something new and useful. So here goes: how do I turn the "logger" result into cell results on the page?
On another note, I have come up with a sheet-level solution using LEN, FIND, RIGHT, REGEXREPLACE, and SPLIT (in separate cells, so far) to accomplish this. Now I just need to figure out how to nest it all and include a CONCATENATE in the mix as well. I'll post the solution if I can figure out how to nest it all properly.
To write to a series of cells in a single row, use this: SpreadsheetApp.getActiveSheet().getRange(1,1,1,result.length).setValues([result]); If you need it in columns, let us know.
Ok, that wasn't something that I would have come up with on my own, so I'm actually glad that I asked. Thank you! Btw, William added a join to his solution, so now I'm trying to include that as part of the result. It's obviously not as simple as adding .join('|') to result, as I encounter errors. I'm skunked.
He added join('|') only to get a readable result with | separators; you shouldn't use it when writing back to a spreadsheet. What you need is a matrix (a two-dimensional array) to be able to call setValues on the spreadsheet. Maybe you should start a new post on how to write array data to a spreadsheet?
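The matrix requirement mentioned in these comments can be sketched as follows. `setValues` expects an array of rows, each row itself an array, so a flat list has to be wrapped differently depending on whether it should fill a row or a column. The helper names `toRow`, `toColumn`, and `writeResults` are hypothetical, and the target cell A1 is only an assumption for illustration.

```javascript
/* SKETCH ONLY: toRow, toColumn and writeResults are hypothetical names. */

// One row: a single inner array holding every value.
function toRow(values) {
  return [values];
}

// One column: one single-element inner array per value.
function toColumn(values) {
  var rows = [];
  for (var i = 0; i < values.length; ++i) {
    rows.push([values[i]]);
  }
  return rows;
}

function writeResults(result) {
  // Nothing to write when there are zero keywords.
  if (result.length === 0) return;
  var sheet = SpreadsheetApp.getActiveSheet();
  // getRange(row, column, numRows, numColumns): 1 row by result.length columns.
  sheet.getRange(1, 1, 1, result.length).setValues(toRow(result));
  // For a column, swap the dimensions and the helper:
  // sheet.getRange(1, 1, result.length, 1).setValues(toColumn(result));
  // For the pipe-joined text in one cell, use setValue with a plain string:
  // sheet.getRange(1, 1).setValue(result.join('|'));
}
```

The joined string fails with `setValues` because it is a single value rather than a two-dimensional array; `setValue` on a one-cell range is the matching call in that case.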
