0

Following this example, I am having trouble using a PostgreSQL variables in a *.sql script:

  • I want to iterate over a number of XML data files, using a BASH script

  • the BASH script assigns XML file names to a variable, that is passed to SQL script

  • the SQL script, called by that BASH script, loads those data into PostgreSQL

If I source the XML files directly, there is no problem; however, I cannot access that variable, in my SQL script:

In my SQL script (hmdb.sql) I can access the PSQL variable :bash_var (passed from the BASH script):

\echo '\nEXTERNAL VARIABLE (= "datafile", HERE):' :bash_var '\n'

and/or directly reference the XML file,

datafile text := 'hmdb/hmdb.xml';

but not as a variable:

datafile text := 'bash_var';

hmdb.sh

#!/bin/bash

DATA_DIR=data/

for file in $DATA_DIR/*.xml
  do
    bash_var=$(echo ${file##*/})
    echo $bash_var
    psql -d hmdb -v bash_var=$bash_var -f hmdb.sql
done
2
  • 1
    I deal daily with tons of XML files and I always found that approach easy to implement and to maintain: stackoverflow.com/questions/49862106/… doing so you simplify a lot your data import. Commented Apr 27, 2018 at 7:29
  • @JimJones: thank you; funny timing, eh? Mentioned in your answer (other SO question), I am also using xml_split as my file is huge (3.4 GB), and I was getting OOM issues. I'll post back here once I work through this. :-) Commented Apr 27, 2018 at 19:59

1 Answer 1

1

OK, here is my solution.

I post a more detailed answer on my Persagen.com blog.

Basically, I decided to abrogate the DO $$DECLARE ... approach (described in SO 49950384) in favor of the simplified approach, below.

I am then able to access the BASH / PSQL shared variable, :bash_var, thusly:

xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'))))

Here is a sample SQL script, illustrating that usage:

hmdb.sql

\c hmdb

CREATE TABLE hmdb_identifiers (
  id SERIAL,
  accession VARCHAR(15) NOT NULL,
  name VARCHAR(300) NOT NULL,
  cas_number VARCHAR(12),
  pubchem_cid INT,
  PRIMARY KEY (id),
  UNIQUE (accession)
);

\echo '\n[hmdb.sql] bash_var:' :bash_var '\n'

-- UPDATE (2019-05-15): SEE MY COMMENTS BELOW RE: TEMP TABLE!
CREATE TEMP TABLE tmp_table AS 
SELECT 
  (xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
  ,(xpath('//name/text()', x))[1]::text::varchar(300) AS name 
  ,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number 
  ,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid 
-- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;

INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
  SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table;

DROP TABLE tmp_table;

SQL script notes:

  • In the xpath statements I recast the ::text (e.g.: ::text::varchar(15)) per the Postgres table schema.

  • More significantly, if I did not recast the datatypes in the xpath statement and a field entry (e.g. name length) exceeded the SQL varchar(300) length limit, those data threw a PSQL error and the table did not update (i.e. a blank table results).

I uploaded the XML data files used in this answer at this Gist

https://gist.github.com/victoriastuart/d1b1959bd31e4de5ed951ff4fe3c3184

Direct links:


UPDATE (2019-05-15)

In follow-on work, detailed in my research blog post Exporting Plain Text to PostgreSQL, I directly load XML data into PostgreSQL, rather than using temp tables.

TL/DR. In that project, I observed the following improvements.

Parameter | Temp Tables  | Direct Import | Reduction
    Time: | 1048 min     | 1.75 min      | 599x
   Space: | 252,000 MB   | 18 MB         | 14,000x
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.