0

I have a requirement where I am receiving a XML file on a Unix path. I want to copy the content of this xml file in a Postgres table in one column.

Example file received at : /home/bojack/test.xml

<address>
   <street>Main Street</street>
   <city>New York</city>
   <country>USA</country>
</address>

Want to copy this to a table with 2 columns,

id value
1  <address>
   <street>Main Street</street>
   <city>New York</city>
   <country>US</country>
   </address>

Could anyone please help on this?

3
  • You want to put the whole file into a single table cell with newlines? Is that id column type serial or does the previous maximum id value need to be read from the table before loading? Also, have you tried anything? Commented Jan 7, 2024 at 11:26
  • I want to put whole file in single table cell(no newline needed, simple dump). I am getting only one file per day, so.. id will not matter, it would be only 1. I tried using ETL tool, informatica (File to Table transform) but not getting expected result. Commented Jan 7, 2024 at 15:21
  • XML type. Commented Jan 7, 2024 at 17:11

1 Answer 1

1

First the table:

# CREATE TABLE test_table(id serial, value text); -- notice id type serial
CREATE TABLE

Then, let's add a random attribute to an element (just to demonstrate quotes in the content):

$ cat file
<address>
   <street some="attribute">Main Street</street><!--attribute here-->
   <city>New York</city>
   <country>USA</country>
</address>

Now some awk to slurp in the whole file, to duplicate all double quotes in the content and to add (or replace all) double quotes to the beginning and end:

$ awk '                   # tested with gawk, mawk and awk version 20121220
BEGIN {
    RS="^$"               # slurp in the whole file
}
{
    gsub(/"/,"\"\"")      # double the double quotes
    # gsub(/\r?\n/,"")    # to remove newlines, uncomment (untested)
    gsub(/^"*|"*$/,"\"")  # add quotes to the beginning and end
    print                 # output
}' file

Output with above awk (notice "s in the beginning and the end and ""s in the middle):

"<address>
   <street some=""attribute"">Main Street</street>
   <city>New York</city>
   <country>USA</country>
</address>
"

Now to process the file with awk and to feed it's output to psql:

$ awk 'BEGIN{RS="^$"}{gsub(/"/,"\"\"");gsub(/^"*|"*$/,"\"");print}' file |
> psql -h host -d dbname -c "\copy test_table(value) from '/dev/stdin' csv" -U username
Password: 
COPY 1

And finally:

# select * from test_table ;
 id |                      value                       
----+--------------------------------------------------
  1 | <address>                                       +
    |    <street some="attribute">Main Street</street>+
    |    <city>New York</city>                        +
    |    <country>USA</country>                       +
    | </address>                                      +
    | 
(1 row)

This was tested in Linux using Bash shell.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.