Import:
Import your data as plain text first using COPY, in case it's not valid JSON (which is the case here: note the trailing commas inside the objects).
You can use pattern matching for some basic cleaning, then parse the result with a simple type cast. If the file is not easily reachable from the db server, you can look into psql's client-side \copy as well as its pgAdmin wrapper; a \copy sketch follows the COPY statement below.
create table public.measurement_samples_raw (sample_row text);
/* /home/username/measurement_samples.json:
[{"name":"A","value":3.300000,"err":1.200000,},{"name":"B","value":730.000000,"err":112.000000,},{"name":"E","value":22.600000,"err":4.700000,},{"name":"H","value":58.300000,"err":11.100000,}]
[{"name":"A","value":2.100000,"err":1.400000,},{"name":"J","value":266.000000,"err":65.000000,},{"name":"K","value":14.700000,"err":3.800000,}] */
copy public.measurement_samples_raw (sample_row)
from '/home/your_username/measurement_samples.json';
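If the file sits on your machine rather than on the database server, a client-side equivalent is (a sketch; \copy resolves the path on the psql client and streams the data over your connection):
\copy public.measurement_samples_raw (sample_row) from '/home/your_username/measurement_samples.json'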
update public.measurement_samples_raw set
sample_row = regexp_replace(
'{"top_key":'||sample_row||'}', --unnamed lists aren't supported, so adding a key and wrapping in {...}
',\s*}', '}',--pattern and replacement to remove trailing commas
'g' --forces the function to replace all instances of the pattern
);
alter table public.measurement_samples_raw
add column sample_id serial,--so that each measurement sample has an ID
add column sample_row_jsonb jsonb;
update public.measurement_samples_raw
set sample_row_jsonb=sample_row::jsonb;
Since Postgres 16, you can also use pg_input_is_valid() to filter incoming values based on whether they match a given type's accepted input format. For version 15 and earlier, you can emulate it with a small function that attempts the cast and traps the error, as sketched below.
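A minimal sketch of both, assuming the staging table above (the helper name is_valid_jsonb is made up for this example):
select sample_id, pg_input_is_valid(sample_row, 'jsonb') as is_valid --Postgres 16+
from public.measurement_samples_raw;

create or replace function public.is_valid_jsonb(input text)
returns boolean language plpgsql as $f$
begin
  perform input::jsonb; --attempt the cast, discard the result
  return true;
exception when others then
  return false; --any parse error means the text is not valid jsonb
end $f$;

update public.measurement_samples_raw
set sample_row_jsonb = sample_row::jsonb
where public.is_valid_jsonb(sample_row); --only parse rows that survive the check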
Since Postgres 17, COPY also offers an ON_ERROR option, and Postgres 18 adds REJECT_LIMIT; a usage sketch follows the quoted docs:
ON_ERROR
Specifies how to behave when encountering an error converting a column's input value into its data type. An error_action value of stop means fail the command, while ignore means discard the input row and continue with the next one. The default is stop.
The ignore option is applicable only for COPY FROM when the FORMAT is text or csv.
A NOTICE message containing the ignored row count is emitted at the end of the COPY FROM if at least one row was discarded. When LOG_VERBOSITY option is set to verbose, a NOTICE message containing the line of the input file and the column name whose input conversion has failed is emitted for each discarded row. When it is set to silent, no message is emitted regarding ignored rows.
REJECT_LIMIT
Specifies the maximum number of errors tolerated while converting a column's input value to its data type, when ON_ERROR is set to ignore. If the input causes more errors than the specified value, the COPY command fails, even with ON_ERROR set to ignore. This clause must be used with ON_ERROR=ignore and maxerror must be positive bigint. If not specified, ON_ERROR=ignore allows an unlimited number of errors, meaning COPY will skip all erroneous data.
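For instance, loading straight into a jsonb column (a sketch; the target table is hypothetical, and the limit of 5 is arbitrary) would discard up to 5 rows that fail jsonb input conversion before aborting:
copy public.measurement_samples_jsonb (sample_row_jsonb)
from '/home/your_username/measurement_samples.json'
with (on_error ignore, reject_limit 5, log_verbosity verbose);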
Storage:
You can normalize the structure and populate it by extracting the elements with JSON functions and operators:
drop table if exists public.measurement_samples;
create table public.measurement_samples (
id serial,
name char(1),
value numeric,
err numeric,
constraint measurement_samples_pk primary key (id, name) --I assume you don't want >=2 values for characteristic 'A' in a single measurement row
);
insert into public.measurement_samples (id, name, value, err)
select sample_id,
(single_measurement_in_sample->>'name')::text,
(single_measurement_in_sample->>'value')::numeric,
(single_measurement_in_sample->>'err')::numeric
from (
select sample_id,
json_array_elements(
sample_row::json->'top_key'
) as single_measurement_in_sample
from public.measurement_samples_raw
) raw_input;
This gives you a gapless structure:
table public.measurement_samples;
| id | name | value      | err        |
|----|------|------------|------------|
| 1  | A    | 3.300000   | 1.200000   |
| 1  | B    | 730.000000 | 112.000000 |
| 1  | E    | 22.600000  | 4.700000   |
| 1  | H    | 58.300000  | 11.100000  |
| 2  | A    | 2.100000   | 1.400000   |
| 2  | J    | 266.000000 | 65.000000  |
| 2  | K    | 14.700000  | 3.800000   |
You can rearrange this to your needs, without wasting space, at the cost of some read performance, unless you make the view materialized (a sketch follows the sample output below):
create view public.v_measurement_samples as
select id,
sum(value) filter (where name='A') as "A_value",
sum(err) filter (where name='A') as "A_err",
sum(value) filter (where name='B') as "B_value",
sum(err) filter (where name='B') as "B_err",
sum(value) filter (where name='C') as "C_value",
sum(err) filter (where name='C') as "C_err",
sum(value) filter (where name='D') as "D_value",
sum(err) filter (where name='D') as "D_err",
sum(value) filter (where name='E') as "E_value",
sum(err) filter (where name='E') as "E_err"
from public.measurement_samples
group by id
order by id;
table public.v_measurement_samples;
| id | A_value  | A_err    | B_value    | B_err      | C_value | C_err | D_value | D_err | E_value   | E_err    |
|----|----------|----------|------------|------------|---------|-------|---------|-------|-----------|----------|
| 1  | 3.300000 | 1.200000 | 730.000000 | 112.000000 |         |       |         |       | 22.600000 | 4.700000 |
| 2  | 2.100000 | 1.400000 |            |            |         |       |         |       |           |          |
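If reads dominate and slightly stale results are acceptable, the same pivot can be materialized (a sketch; the mv_ name is made up, and it must be refreshed after each load):
create materialized view public.mv_measurement_samples as
select * from public.v_measurement_samples;

refresh materialized view public.mv_measurement_samples;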
Working dbfiddle example.