2

Using the IMPORTXML function, is it possible to construct an XPATH query that pulls the Industry value for a given Wikipedia page?

For example, the value I want to pull from this page - https://en.wikipedia.org/wiki/Target_Corporation - is "Retail" whereas on this page - https://en.wikipedia.org/wiki/Boohoo.com - it would be "Fashion".

2 Answers 2

2
  • You want to create the xpath for retrieving the Industry value for a given Wikipedia page.

If my understanding is correct, as other pattern, how about the formula with this xpath? Please think of this as just one of several answers.

Sample formula:

=IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td")
  • The xpath is //th[text()='Industry']/following-sibling::td.
  • In this case, the URL of https://en.wikipedia.org/wiki/Target_Corporation or https://en.wikipedia.org/wiki/Boohoo.com is put in the cell "A1".

Result:

enter image description here

Reference:

Added:

From your replying, I knew that you want to add 2 more URLs. So all URLs are as follows.

Issue and workaround:

For above URLs, when the formula of =IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td") is used, Retail, Fashion, Retail and Travel, services are returned.

When the xpath is modified to //th[text()='Industry']/following-sibling::td/a, Retail, #N/A, #N/A and Travel are returned.

The reason of this is due to the following difference.

<tr>
  <th scope="row">Industry</th>
  <td class="category"><a href="/wiki/Travel" title="Travel">Travel</a> services</td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;"><a href="/wiki/Retail" title="Retail">Retail</a></td>
</tr>

and

<tr>
  <th scope="row" style="padding-right:0.5em;">Industry</th>
  <td class="category" style="line-height:1.35em;">Fashion</td>
</tr>

By this, I think that unfortunately, in order to retrieve Travel, Retail and Fashion from above, those cannot be directly retrieved with only one xpath. So I used a built-in function for this situation.

Workaround:

In this workaround, I used INDEX. Please think of this as just one of several answers.

=INDEX(IMPORTXML(A1,"//th[text()='Industry']/following-sibling::td"),1,1)
  • The xpath is //th[text()='Industry']/following-sibling::td. This is not modified.
  • In this case, the URL is put in the cell "A1".
  • When 2 values are retrieved, the 1st one is retrieved. By this, I used INDEX.
Result:

enter image description here

Sign up to request clarification or add additional context in comments.

4 Comments

This is very helpful. Thanks! It is leading to odd results in some cases.For this page - en.wikipedia.org/wiki/Woot - it returns a "Resource at url not found" error even though the link works and the HTML appears to have the same structure as the other pages. The bigger problem is that for industries that are described by multiple words (example - en.wikipedia.org/wiki/TripAdvisor), the formula puts the all the words after the second word in neighbouring cells (rather than concatenating everything into a single cell). I wonder if either of these problems can be addressed.
@zgall1 Thank you for replying. I couldn't notice that you want to also use this formula to other URLs. This is due to my poor skill. I deeply apologize for this. I added one more formula for 4 URLs you provided. Could you please confirm it? Unfortunately, I think that in order to retrieve values from 4 URLs, those cannot be directly retrieved with only one xpath. So I used a built-in function for this situation. If that was not the direction you want, I apologize.
I apologize for the late reply. This did work for me.
@zgall1 Thank you for replying. I'm glad your issue was resolved. Thank you, too.
0

try:

=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Boohoo.com", 
 "//td[@class='category']"), 2, 1)

=INDEX(IMPORTXML("https://en.wikipedia.org/wiki/Target_Corporation", 
 "//td[@class='category']"),2,1)

0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.