
I'm trying to build a Google Sheet that pulls information from the IRS Form 990 repository hosted via AWS S3.

Here is the XML file: Example 990 Form in XML

The query is meant to pull the business names under the Schedule I section of the XML source. Each business name is wrapped in BusinessNameLine1Txt tags.

Using the built-in IMPORTXML function in Google Sheets, I've built the following:

=IMPORTXML("https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml", "//Return/ReturnData/IRS990ScheduleI/RecipientTable/RecipientBusinessName/BusinessNameLine1Txt")

When I execute the function with the parameters shown above, I receive an error saying that the imported content is empty. Is my XPath query incorrect, or does it have to do with some quirk in the data?

1 Answer

How about this modification?

=IMPORTXML(A1, "//*[local-name()='BusinessNameLine1Txt']")
  • Put the URL https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml in cell A1.

If I have misunderstood your issue, please tell me and I will modify the answer.

Edit:

=IMPORTXML(A1, "//*[local-name()='IRS990ScheduleI']//*[local-name()='BusinessNameLine1Txt']")

4 Comments

Thanks Tanaike, that was helpful. Do you know how to restrict the query so that it selects only the business names from the Schedule I section?
@Matt I'm really sorry for my incomplete answer. Is the formula you want =IMPORTXML(A1, "//*[local-name()='IRS990ScheduleI']//*[local-name()='BusinessNameLine1Txt']")? If I have misunderstood again, please tell me.
Thank you, this is exactly what I was looking for! Would you happen to know why I couldn't use the syntax I wrote in the original question?
@Matt At first, I tried =IMPORTXML(A1, "/Return"). If the data had no issues, some values would have been returned, but #N/A was returned instead. So I checked the XML data. I found that the Return element declares a default namespace, xmlns="http://www.irs.gov/efile", and that when this value is removed (for example, xmlns=""), your XPath works fine. Because the existing XML cannot be modified, I proposed using local-name(), which matches elements by name regardless of their namespace. If this was not useful, I'm sorry.
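The namespace behaviour described in the last comment can be reproduced outside Sheets. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The SAMPLE document is a hypothetical, heavily trimmed stand-in for the real filing (only the xmlns value and the element names from the question are taken from the source), and the `{*}` wildcard is ElementTree's rough counterpart to the local-name() test, not the same XPath function. It assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the real filing. Only the default
# namespace and the element names quoted in the question are from the source.
SAMPLE = """\
<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <BusinessName>
        <BusinessNameLine1Txt>Filer Org</BusinessNameLine1Txt>
      </BusinessName>
    </IRS990>
    <IRS990ScheduleI>
      <RecipientTable>
        <RecipientBusinessName>
          <BusinessNameLine1Txt>Recipient A</BusinessNameLine1Txt>
        </RecipientBusinessName>
      </RecipientTable>
      <RecipientTable>
        <RecipientBusinessName>
          <BusinessNameLine1Txt>Recipient B</BusinessNameLine1Txt>
        </RecipientBusinessName>
      </RecipientTable>
    </IRS990ScheduleI>
  </ReturnData>
</Return>
"""

root = ET.fromstring(SAMPLE)

# 1. Unqualified names: every element lives in the default namespace,
#    so a path written without any namespace matches nothing.
unqualified = root.findall(".//IRS990ScheduleI//BusinessNameLine1Txt")

# 2. Namespace-qualified names: bind a prefix (here "irs", an arbitrary
#    choice) to the namespace URI and use it in the path.
ns = {"irs": "http://www.irs.gov/efile"}
qualified = [
    el.text
    for el in root.findall(".//irs:IRS990ScheduleI//irs:BusinessNameLine1Txt", ns)
]

# 3. Namespace wildcard: "{*}name" matches the tag in any namespace,
#    similar in spirit to //*[local-name()='name'].
wildcard = [
    el.text
    for el in root.findall(".//{*}IRS990ScheduleI//{*}BusinessNameLine1Txt")
]

print(unqualified)  # no matches
print(qualified)    # Schedule I recipients only
print(wildcard)     # same elements as the qualified query
```

As far as I know, IMPORTXML offers no argument for binding a namespace prefix, so the second approach is not available in Sheets and the local-name() form is the practical equivalent there.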
