3
\$\begingroup\$

The Finnish personal identity code is defined as:

  • Day, month, and year of birth, in ddmmyy form (six digits, zero padded if necessary)
  • A separator
    • For those born >= 2000, one of A, B, C, D, E, or F
    • For those born in the 1900s, one of Y, X, W, V, U, or -
    • For those born in the 1800s, a plus sign (+)
  • A three-digit individual number, odd for male, even for female. In practice this is between 002 and 899 (900-999 is used for temporary identity codes).
  • A control character (the nine digits modulo 31, table lookup 0-Y)

For example, Anna Finlander's personal identity code is 131052-308T. She is thus born October 13th, 1952. For official information, see here and here.

I wanted to calculate and store the date of birth from the personal identity code in Postgresql. I came up with the following, using a generated column.

create table people (
  id char(11),
  birthday date generated always as (
    case
      when substring(id from 7 for 1) in ('A', 'B', 'C', 'D', 'E', 'F') then make_date((2000 + substring(id from 5 for 2)::int), substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
      when substring(id from 7 for 1) in ('Y', 'X', 'W', 'V', 'U', '-') then make_date((1900 + substring(id from 5 for 2)::int), substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
      else make_date(1800 + substring(id from 5 for 2)::int, substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
    end
  ) stored
);

I generate a bunch of PICs to test with. Generator here (not mine).

insert into people values
('200118E3687'),
('290555X6906'),
('290366W021U'),
('020896X648D'),
('060113E330N'),
('140293-3254'),
('240445Y5535'),
('271208W637P'),
('200712F334D'),
('180330Y745T'),
('010275X898K'),
('290365X369T'),
('020326Y5549'),
('210316F447H'),
('240643V002F'),
('270814B294Y'),
('020437-011D'),
('270693U601M'),
('180841U181F'),
('220311D245B'),
('030791+8405'),
('070192+523X'),
('041138U469C'),
('120902F564P'),
('210948W637R'),
('220288V496H'),
('291193+769C'),
('121020C279Y'),
('160595+647R'),
('250601-547V'),
('230533-381N'),
('271159U448Y'),
('011218D368U'),
('290391+659V'),
('221219-0457'),
('130590X252S'),
('280717V179V'),
('270798+6743'),
('230990V8624'),
('240895W718L'),
('140748Y270S'),
('130940V611N'),
('131119D0603'),
('070796+2278'),
('241098U148L'),
('110656X072P'),
('241052U730H'),
('160185+507D'),
('011002E6761'),
('090380X6594'),
('010594Y9032'),
('010594Y9021'),
('020594X903P'),
('020594X902N'),
('030594W903B'),
('030694W9024'),
('040594V9030'),
('040594V902Y'),
('050594U903M'),
('050594U902L'),
('010516B903X'),
('010516B902W'),
('020516C903K'),
('020516C902J'),
('030516D9037'),
('030516D9026'),
('010501E9032'),
('020502E902X'),
('020503F9037'),
('020504A902E'),
('020504B904H');

select
  id, birthday,
  age(birthday) as age
from people;

     id      |  birthday  |            age            
-------------+------------+---------------------------
 200118E3687 | 2018-01-20 | 6 years 5 mons 1 day
 290555X6906 | 1955-05-29 | 69 years 23 days
 290366W021U | 1966-03-29 | 58 years 2 mons 23 days
 020896X648D | 1996-08-02 | 27 years 10 mons 19 days
 060113E330N | 2013-01-06 | 11 years 5 mons 15 days
 140293-3254 | 1993-02-14 | 31 years 4 mons 7 days
 240445Y5535 | 1945-04-24 | 79 years 1 mon 27 days
 271208W637P | 1908-12-27 | 115 years 5 mons 25 days
 200712F334D | 2012-07-20 | 11 years 11 mons 1 day
 180330Y745T | 1930-03-18 | 94 years 3 mons 3 days
 010275X898K | 1975-02-01 | 49 years 4 mons 20 days
 290365X369T | 1965-03-29 | 59 years 2 mons 23 days
 020326Y5549 | 1926-03-02 | 98 years 3 mons 19 days
 210316F447H | 2016-03-21 | 8 years 3 mons
 240643V002F | 1943-06-24 | 80 years 11 mons 27 days
 270814B294Y | 2014-08-27 | 9 years 9 mons 25 days
 020437-011D | 1937-04-02 | 87 years 2 mons 19 days
 270693U601M | 1993-06-27 | 30 years 11 mons 24 days
 180841U181F | 1941-08-18 | 82 years 10 mons 3 days
 220311D245B | 2011-03-22 | 13 years 2 mons 30 days
 030791+8405 | 1891-07-03 | 132 years 11 mons 18 days
 070192+523X | 1892-01-07 | 132 years 5 mons 14 days
 041138U469C | 1938-11-04 | 85 years 7 mons 17 days
 120902F564P | 2002-09-12 | 21 years 9 mons 9 days
 210948W637R | 1948-09-21 | 75 years 9 mons
 220288V496H | 1988-02-22 | 36 years 3 mons 28 days
 291193+769C | 1893-11-29 | 130 years 6 mons 22 days
 121020C279Y | 2020-10-12 | 3 years 8 mons 9 days
 160595+647R | 1895-05-16 | 129 years 1 mon 5 days
 250601-547V | 1901-06-25 | 122 years 11 mons 26 days
 230533-381N | 1933-05-23 | 91 years 29 days
 271159U448Y | 1959-11-27 | 64 years 6 mons 24 days
 011218D368U | 2018-12-01 | 5 years 6 mons 20 days
 290391+659V | 1891-03-29 | 133 years 2 mons 23 days
 221219-0457 | 1919-12-22 | 104 years 5 mons 30 days
 130590X252S | 1990-05-13 | 34 years 1 mon 8 days
 280717V179V | 1917-07-28 | 106 years 10 mons 24 days
 270798+6743 | 1898-07-27 | 125 years 10 mons 25 days
 230990V8624 | 1990-09-23 | 33 years 8 mons 28 days
 240895W718L | 1995-08-24 | 28 years 9 mons 28 days
 140748Y270S | 1948-07-14 | 75 years 11 mons 7 days
 130940V611N | 1940-09-13 | 83 years 9 mons 8 days
 131119D0603 | 2019-11-13 | 4 years 7 mons 8 days
 070796+2278 | 1896-07-07 | 127 years 11 mons 14 days
 241098U148L | 1998-10-24 | 25 years 7 mons 28 days
 110656X072P | 1956-06-11 | 68 years 10 days
 241052U730H | 1952-10-24 | 71 years 7 mons 28 days
 160185+507D | 1885-01-16 | 139 years 5 mons 5 days
 011002E6761 | 2002-10-01 | 21 years 8 mons 20 days
 090380X6594 | 1980-03-09 | 44 years 3 mons 12 days
 010594Y9032 | 1994-05-01 | 30 years 1 mon 20 days
 010594Y9021 | 1994-05-01 | 30 years 1 mon 20 days
 020594X903P | 1994-05-02 | 30 years 1 mon 19 days
 020594X902N | 1994-05-02 | 30 years 1 mon 19 days
 030594W903B | 1994-05-03 | 30 years 1 mon 18 days
 030694W9024 | 1994-06-03 | 30 years 18 days
 040594V9030 | 1994-05-04 | 30 years 1 mon 17 days
 040594V902Y | 1994-05-04 | 30 years 1 mon 17 days
 050594U903M | 1994-05-05 | 30 years 1 mon 16 days
 050594U902L | 1994-05-05 | 30 years 1 mon 16 days
 010516B903X | 2016-05-01 | 8 years 1 mon 20 days
 010516B902W | 2016-05-01 | 8 years 1 mon 20 days
 020516C903K | 2016-05-02 | 8 years 1 mon 19 days
 020516C902J | 2016-05-02 | 8 years 1 mon 19 days
 030516D9037 | 2016-05-03 | 8 years 1 mon 18 days
 030516D9026 | 2016-05-03 | 8 years 1 mon 18 days
 010501E9032 | 2001-05-01 | 23 years 1 mon 20 days
 020502E902X | 2002-05-02 | 22 years 1 mon 19 days
 020503F9037 | 2003-05-02 | 21 years 1 mon 19 days
 020504A902E | 2004-05-02 | 20 years 1 mon 19 days
 020504B904H | 2004-05-02 | 20 years 1 mon 19 days
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

short ID

As I read the generated clause for the "birthday" column, I worry that we might occasionally accept a short ID and then mangle it. If possible, I would like to see a fatal error raised, or at least see zero such rows actually INSERTed and committed.

Background: All Americans know that Boston is an east coast city that has U.S. postal ZIP codes like 02108. And yet modern data processing systems get it wrong an astonishing amount of the time, truncating it down to 2108, with databases and Excel spreadsheets turning it into an integer. One could use a %05d format string, but then that risks turning a "dropped keystroke" typo into a legitimate looking entry, plus it risks running afoul of ZIP+4 entries. It's the sort of thing that requires constant vigilance, always preserving a ZIP as a string, and remembering to add Boston test records to midwest datasets. And it looks like Anna Suomalainen could suffer a similar fate. Fortunately nearly a third of Finns are born in the first nine days of months, so it would be harder for such errors to slip by undetected.

A simple length check would be a good start. For extra credit you might compute the mod 31 checksum, which would probably involve writing a stored procedure. And repeatedly flipping around DDMMYY to YYYYMMDD is tedious enough that, while you're at it, may as well write a trivial user func for that as well. Alternatively, exploit date parse format strings to make it a little more compact.

Instead of defaulting to else make_date(1800 + ..., I'd like to see an explicit check for a '+' separator, causing short or otherwise mangled inputs to fail.

I like that bad dates such as February 30th produce fatal error -- good.

mixed case

I imagine that IDs sometimes get entered by hand, without the CAPS LOCK key on. So if a separator of 'A' arrives as 'a', the OP code defaults to interpreting it as '+' from centuries ago.

Similarly if a checksum of 'B' arrives as 'b', we should signal fatal error or perhaps just upcase it. We could range check for 002 .. 899, but perhaps the checksum suffices.

You may find initial validation via regexp to be helpful.

\$\endgroup\$
2
  • \$\begingroup\$ Thanks @J_H. Regarding date parse format strings, do you mean using to_date instead of make_date? That's not possible in a generated column, as functions used there have to be immutable, and to_date is only stable. \$\endgroup\$ Commented Jun 25, 2024 at 7:03
  • \$\begingroup\$ Awww! Sad face. Perhaps a user func or stored procedure may be a better fit. Or perhaps we should be more trusting that some well-tested application library is feeding us only well-formatted inputs. \$\endgroup\$ Commented Jun 25, 2024 at 15:34

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.