
I'm trying to import data from an XML file into PostgreSQL. I would like to create columns from the attributes of the <row> tag. The problem is that when I use xpath to extract the values, I get nothing.

Here is a sample input file, testinput.xml:

<tags>
  <row Id="1" TagName=".net" Count="316293" ExcerptPostId="3624959" WikiPostId="3607476" />
  <row Id="2" TagName="html" Count="1116853" ExcerptPostId="3673183" WikiPostId="3673182" />
  <row Id="3" TagName="javascript" Count="2343663" ExcerptPostId="3624960" WikiPostId="3607052" />
</tags>

Here is my myquery.sql:

DO $$

DECLARE xml_string xml;

BEGIN
xml_string := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('/home/me/data/testinput.xml'), 'UTF8'));

DROP TABLE IF EXISTS tags;

CREATE TABLE tags AS
SELECT
    (xpath('//ID', x))[1]::text as id,
    (xpath('//TagName', x))[1]::text as tagName,
    (xpath('//Count', x))[1]::text as count,
    (xpath('//ExcerptPostId', x))[1]::text as excerptPostId,
    (xpath('//WikiPostId', x))[1]::text as wikiPostId
FROM unnest(xpath('//row', xml_string)) x;
END$$;

SELECT * FROM tags;

This is the output I got:

 id | tagname | count | excerptpostid | wikipostid 
----+---------+-------+---------------+------------
    |         |       |               | 
    |         |       |               | 
    |         |       |               | 
(3 rows)

By the way, my .xml file is around 100GB, so any fast solution would help. Thank you!

  • At 100 GB, XPath can become very memory-intensive, because the whole document must be parsed into memory before any XPath expression is evaluated. Consider a programming language or an external XML tool such as Saxon, and parse the XML file by streaming (i.e., iteratively walking down the tree and releasing elements as you go). Java, PHP, Python, etc. have libraries for this, as well as libraries for connecting to Postgres. Commented May 22, 2022 at 1:32
  • @Parfait Oh! Guess I'll use a different tool then. Thank you! Commented May 22, 2022 at 2:59
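A minimal sketch of the streaming approach suggested in the comment, using only Python's standard `xml.etree.ElementTree.iterparse`. It assumes the file has the flat `<tags><row .../></tags>` shape of the sample; the helper name `rows_to_csv` and the write-a-CSV-then-load route are illustrative choices, not something from the original thread. Streaming also avoids a hard limit: PostgreSQL caps any single field value at 1 GB, so a 100 GB document cannot be read into one `xml` value with `pg_read_binary_file` in the first place.

```python
import csv
import xml.etree.ElementTree as ET

# Attribute names to copy from each <row>, in the column order of the target table.
FIELDS = ["Id", "TagName", "Count", "ExcerptPostId", "WikiPostId"]


def rows_to_csv(xml_source, out):
    """Stream <row> elements from xml_source (a path or binary file object)
    and write their attributes to the text stream `out` as CSV.
    Returns the number of rows written. (Hypothetical helper.)"""
    writer = csv.writer(out)
    writer.writerow(FIELDS)  # header line; skip it on load with HEADER
    events = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(events)  # first event: start of the <tags> root element
    written = 0
    for event, elem in events:
        if event == "end" and elem.tag == "row":
            # A missing attribute becomes an empty, unquoted CSV field.
            writer.writerow([elem.get(name, "") for name in FIELDS])
            written += 1
            # Discard finished rows so memory use does not grow with file size.
            root.clear()
    return written
```

For example, open the output with `open("tags.csv", "w", newline="")` (as the csv module recommends), pass it as `out`, and then load the result in psql with `\copy tags FROM 'tags.csv' WITH (FORMAT csv, HEADER)`, assuming a pre-created `tags` table whose columns follow the order of `FIELDS`. Empty fields load as NULL under COPY's CSV defaults.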

2 Answers


Using xmltable() is typically easier and more flexible:

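-- Runs inside the question's DO block, where xml_string holds the parsed document.
-- Each column reads an attribute of <row>, hence the leading @ in every path.
CREATE TABLE tags 
AS
SELECT x.*
FROM xmltable('/tags/row' passing xml_string
       columns id text path '@Id',
               tag_name text path '@TagName',
               "count" int path '@Count',
               excerpt_post_id  int path '@ExcerptPostId',
               wiki_post_id int path '@WikiPostId') as x;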

1 Comment

I didn't know xmltable() existed. Thank you!

Because the values you intend to extract are attributes (not elements), you need to select them with the @ prefix (the abbreviation for the attribute:: axis):

SELECT
    (xpath('//@Id', x))[1]::text as id,
    (xpath('//@TagName', x))[1]::text as tagName,
    (xpath('//@Count', x))[1]::text as count,
    (xpath('//@ExcerptPostId', x))[1]::text as excerptPostId,
    (xpath('//@WikiPostId', x))[1]::text as wikiPostId
FROM unnest(xpath('//row', xml_string)) x;
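The element-versus-attribute distinction is not specific to PostgreSQL. As a small illustration (a sketch using Python's standard `xml.etree.ElementTree`, not part of the original answer), asking a `<row>` for a child element named `Id` finds nothing, while reading its `Id` attribute returns the value. XML names are also case-sensitive, so the question's `//ID` would still miss the `Id` attribute even with the `@` added.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tags>
  <row Id="1" TagName=".net" Count="316293" ExcerptPostId="3624959" WikiPostId="3607476" />
</tags>"""

root = ET.fromstring(SAMPLE)
row = root.find("row")

# Looking for a child *element* named Id finds nothing: <row> has no children,
# which mirrors why the element-based xpath calls produced empty columns.
id_element = row.find("Id")

# Reading the *attribute* named Id returns its string value, like '@Id' in XPath.
id_attribute = row.get("Id")

# ElementTree's limited XPath subset can still filter rows on an attribute value.
net_rows = root.findall("row[@TagName='.net']")

print(id_element)    # None
print(id_attribute)  # 1
```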

