I'm currently attempting to modify an existing API that interacts with a Postgres database. Long story short, it essentially stores descriptors/metadata that determine where an actual 'asset' (typically a file of some sort) is stored on the server's hard disk.

Currently, it's possible to 'tag' these 'assets' with any number of undefined key-value pairs (e.g. uploadedBy, addedOn, assetType). These tags are stored in a separate table with a structure similar to the following:

+---------------+----------------+-------------+
|assetid (text) | tagid(integer) | value(text) |
|---------------+----------------+-------------|
|someStringValue| 1234           | someValue   |
|---------------+----------------+-------------|
|aDiffStringKey | 1235           | a username  |
|---------------+----------------+-------------|
|aDiffStrKey    | 1236           | Nov 5, 1605 |
+---------------+----------------+-------------+

assetid and tagid are foreign keys from other tables. Think of the assetid as representing a file and the tagid/value pairs as a map of descriptors.

Right now, the API (which is in Java) creates all these key-value pairs as a Map object. This includes things like timestamps/dates. What we'd like to do is somehow store different types of data for the value in the key-value pair, or at least store it differently within the database, so that if we needed to, we could run queries checking date ranges and the like on these tags. However, if they're stored as text items in the db, then we'd have to (a) know that this is actually a date/time/timestamp item, and (b) convert it into something that we could actually run such a query on.

There is only one idea I could think of thus far that doesn't change the layout of the db too much.

It is to expand the assettag table (shown above) to have additional columns for various types (numeric, text, timestamp), allow them to be null, and then, on insert, check the corresponding 'key' to figure out what type of data it really is. However, I can see a lot of problems with that sort of implementation.

Can any PostgreSQL-Ninjas out there offer a suggestion on how to approach this problem? I'm only recently getting thrown back into the deep-end of database interactions, so I admit I'm a bit rusty.

  • This is one reason why it is often a bad idea to use EAV tables to store data. Commented Aug 14, 2013 at 14:28
  • I agree. The joys of inheriting an existing design, I suppose. C'est la vie. Commented Aug 14, 2013 at 14:52

4 Answers

52

You've basically got two choices:

Option 1: A sparse table

Have one column for each data type, but only use the column that matches the data type you want to store. Of course this leads to most columns being null, which is a waste of space, but the purists like it because of the strong typing. It's a bit clunky having to check each column for null to figure out which data type applies. Also, too bad if you actually want to store a null: then you must choose a specific value that "means null", which is more clunkiness.
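To make the sparse layout concrete, here is a minimal Java sketch of how the application side might pick the one column to fill. The class name `SparseTagRow` and the column names `value_text`, `value_numeric` and `value_timestamp` are made up for illustration; they are not part of the existing schema.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the typed columns for one row of a sparse tag table. */
final class SparseTagRow {

    /** Returns column name -> value, with at most one of the typed columns non-null. */
    static Map<String, Object> columnsFor(Object value) {
        Map<String, Object> row = new LinkedHashMap<>();
        // Assumed column names; every typed column starts out null.
        row.put("value_text", null);
        row.put("value_numeric", null);
        row.put("value_timestamp", null);

        if (value instanceof Instant) {
            row.put("value_timestamp", value);
        } else if (value instanceof Number) {
            // Goes through toString(), so NaN or infinite doubles would be rejected here.
            row.put("value_numeric", new BigDecimal(value.toString()));
        } else if (value != null) {
            row.put("value_text", value.toString());
        }
        // A null value leaves every column null, which is the ambiguity described above.
        return row;
    }
}
```

Reading a row back means checking each column in turn for the one that is not null.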

Option 2: Two columns - one for content, one for type

Everything can be expressed as text, so have a text column for the value and another column (int or text) for the type, so your app code can restore the correct value as an object of the correct type. The good things are that you don't have lots of nulls and, importantly, that you can easily extend the types beyond SQL data types to application classes by storing their value as JSON and their type as the class name.
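As a rough illustration of the value-plus-type idea, the app code could look something like the sketch below. The class `TagValueCodec` and the type tags ("text", "bool", "int", "numeric", "date", "timestamp") are assumptions chosen for the example, not an established convention.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical helper: turns a Java value into a (type, text) pair and back. */
final class TagValueCodec {

    /** Type tag to store in the type column (tag names are illustrative). */
    static String typeOf(Object value) {
        if (value == null) throw new IllegalArgumentException("null has no type tag");
        if (value instanceof String) return "text";
        if (value instanceof Boolean) return "bool";
        if (value instanceof Integer || value instanceof Long) return "int";
        if (value instanceof BigDecimal) return "numeric";
        if (value instanceof LocalDate) return "date";
        if (value instanceof Instant) return "timestamp";
        throw new IllegalArgumentException("Unsupported tag type: " + value.getClass());
    }

    /** Text to store in the value column; dates and timestamps come out as ISO-8601. */
    static String encode(Object value) {
        typeOf(value); // fail early on unsupported types
        return value.toString();
    }

    /** Rebuilds the typed object from the two stored columns. */
    static Object decode(String type, String text) {
        switch (type) {
            case "text": return text;
            case "bool": return Boolean.valueOf(text);
            case "int": return Long.valueOf(text); // Integer and Long both come back as Long
            case "numeric": return new BigDecimal(text);
            case "date": return LocalDate.parse(text);
            case "timestamp": return Instant.parse(text);
            default: throw new IllegalArgumentException("Unknown type tag: " + type);
        }
    }
}
```

Supporting another type then means adding one branch to `typeOf` and one case to `decode`, with no schema change.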

I have used option 2 several times in my career and it was always very successful.

6 Comments

Bohemian, thanks for the response. If I went with option 2, how would I go about doing a date-range type query on the values whose type is marked as a date/time type? Would it be best to write the PostgreSQL equivalent of a stored procedure to execute that type of query?
Store dates as the number of days since 1970, or if a timestamp type, the number of seconds since 1970. Google epoch time or unix time.
Nice answer, this is exactly the problem I had and couldn't decide on. Even though in my case Postgres is very efficient at storing null values, your argument about individual types convinced me. Thanks!
What about indexes for range queries in option 2? I think that using option 2 you end up with something that you cannot query efficiently...
Any update on how to efficiently query option 2?
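The epoch suggestion from the comments above could be sketched in Java roughly as follows. The class and method names are hypothetical; only `LocalDate.toEpochDay()` and `Instant.getEpochSecond()` are standard library calls.

```java
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical helper: stores dates and timestamps as epoch numbers in the text column. */
final class EpochTags {

    /** Days since 1970-01-01, for date-typed tags (negative for earlier dates). */
    static long toEpochDays(LocalDate date) {
        return date.toEpochDay();
    }

    /** Whole seconds since 1970-01-01T00:00:00Z, for timestamp-typed tags. */
    static long toEpochSeconds(Instant timestamp) {
        return timestamp.getEpochSecond();
    }

    /** Inclusive range check on a stored value, comparing numerically rather than as text. */
    static boolean inRange(String storedValue, long from, long to) {
        long v = Long.parseLong(storedValue);
        return from <= v && v <= to;
    }
}
```

Note that numbers stored as text do not sort numerically as strings ("10" sorts before "9"), so a range query run in the database would presumably need to cast the value column to a numeric type first.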
8

Another option, depending on what you're doing, could be to have just one value column but store some JSON around the value.

This could look something like:

  {
    "type": "datetime",
    "value": "2019-05-31 13:51:36" 
  } 

That could even go a step further, using a JSON or XML column.

4

I'm not in any way a PostgreSQL ninja, but I think that instead of two columns (one for name and one for type) you could look at the hstore data type:

data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.

Of course, you have to check how dates/timestamps convert to and from this type and see whether it works for you.

1

You can use two different techniques:

  1. If the type can differ for every tagid

Define a main table that records, for every tagid-assetid combination, the name of the table holding the value and the row ID in it, plus the actual data tables:

maintable:
+---------------+----------------+-----------------+---------------+
|assetid (text) | tagid(integer) | tablename(text) | table_id(int) |
|---------------+----------------+-----------------+---------------|
|someStringValue| 1234           | tablebool       | 123           |
|---------------+----------------+-----------------+---------------|
|aDiffStringKey | 1235           | tablefloat      | 123           |
|---------------+----------------+-----------------+---------------|
|aDiffStrKey    | 1236           | tablestring     | 123           |
+---------------+----------------+-----------------+---------------+

tablebool
+-------------+-------------+
| id(integer) | value(bool) |
|-------------+-------------|
| 123         | False       |
+-------------+-------------+

tablefloat
+-------------+--------------+
| id(integer) | value(float) |
|-------------+--------------|
| 123         | 12.345       |
+-------------+--------------+

tablestring
+-------------+---------------+
| id(integer) | value(string) |
|-------------+---------------|
| 123         | 'text'        |
+-------------+---------------+

  2. If every tagid has a fixed type

Create a tag description table:

tag descriptors
+---------------+----------------+-----------------+
|assetid (text) | tagid(integer) | tablename(text) |
|---------------+----------------+-----------------|
|someStringValue| 1234           | tablebool       |
|---------------+----------------+-----------------|
|aDiffStringKey | 1235           | tablefloat      |
|---------------+----------------+-----------------|
|aDiffStrKey    | 1236           | tablestring     |
+---------------+----------------+-----------------+

and the corresponding data tables:

tablebool
+-------------+----------------+-------------+
| id(integer) | tagid(integer) | value(bool) |
|-------------+----------------+-------------|
| 123         | 1234           | False       |
+-------------+----------------+-------------+

tablefloat
+-------------+----------------+--------------+
| id(integer) | tagid(integer) | value(float) |
|-------------+----------------+--------------|
| 123         | 1235           | 12.345       |
+-------------+----------------+--------------+

tablestring
+-------------+----------------+---------------+
| id(integer) | tagid(integer) | value(string) |
|-------------+----------------+---------------|
| 123         | 1236           | 'text'        |
+-------------+----------------+---------------+

All this is just to give a general idea. You should adapt it to your needs.
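On the Java side, routing a value to one of the data tables above might look like this minimal sketch. The table names are the ones used in the diagrams; the class `TagTableRouter` is hypothetical.

```java
/** Hypothetical helper: picks the data table for a tag value based on its Java type. */
final class TagTableRouter {

    /** Returns the name of the table that should hold the given value. */
    static String tableFor(Object value) {
        if (value instanceof Boolean) return "tablebool";
        if (value instanceof Float || value instanceof Double) return "tablefloat";
        if (value instanceof String) return "tablestring";
        // null and any other type fall through to here
        throw new IllegalArgumentException("No data table for value: " + value);
    }
}
```

The returned name is what would go into the tablename column of the main or tag descriptor table.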
