
I've been working on a large-scale project where all the primary keys are stored as RAW. The ID field is auto-generated as a unique 16-byte UUID. I can't find any particular advantage of using a RAW column. Can someone help me understand whether there is any real advantage to storing primary keys in RAW format instead of VARCHAR2?

2 Answers


Well, in database design size typically matters. A bigger key takes more space in storage and on disk, sorting takes longer, etc.

From this point of view, the integer key is the most compact one (implemented as a NUMBER with zero scale, typically allocating between 2 and 8 bytes). UUIDs are used as keys for various reasons, often motivated by concerns that are independent of database design rules.

Additionally, the UUID is often stored as a formatted string in a VARCHAR2 column.

This is a similar design to storing DATEs as strings (which is not considered best practice).

Moreover, a RAW(16) column allocates 16 bytes, whereas the formatted UUID takes 36 bytes.
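To make the 16-versus-36-byte comparison concrete, here is a small Python sketch using the standard `uuid` module (Python is used only for illustration; it is not part of the setup described above). It compares the binary form, which is what a RAW(16) column holds, with the hyphenated text form, and estimates the extra key bytes for a hypothetical row count. The row count is an arbitrary assumption, and the estimate ignores length bytes, block overhead and index structure.

```python
import uuid

def key_sizes(u: uuid.UUID) -> tuple:
    """Return (binary size, formatted text size) in bytes for one UUID."""
    raw_size = len(u.bytes)                   # 16 bytes, as stored in RAW(16)
    text_size = len(str(u).encode("ascii"))   # 32 hex digits + 4 hyphens = 36
    return raw_size, text_size

def extra_bytes(rows: int, u: uuid.UUID) -> int:
    """Rough extra key bytes if `rows` keys are stored as text instead of binary."""
    raw_size, text_size = key_sizes(u)
    return rows * (text_size - raw_size)

if __name__ == "__main__":
    u = uuid.uuid4()
    print(key_sizes(u))            # (16, 36)
    ROWS = 100_000_000             # hypothetical table size
    print(extra_bytes(ROWS, u))    # 2000000000, i.e. about 2 GB per stored copy of the key
```

Every index on the key, and every foreign key column referencing it, stores its own copy, so the difference multiplies.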

So, in summary, IMO these are the recommendations:

  1. Use NUMBER keys.
  2. If you can’t (and have solid arguments for that), use UUIDs in RAW(16) format.

Note that the RAW format is, of course, a bit less convenient to handle than a string (e.g. when setting a bind variable). This often leads to the decision to store the UUID as a string, which is what I encountered in the vast majority of cases.
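As a rough illustration of that conversion step, the Python sketch below turns a formatted UUID string (as a user or an API might supply it) into the 16 raw bytes you would bind to a RAW(16) parameter, and into the 32-character uppercase hex form that HEXTORAW accepts in SQL. The function names are hypothetical and no particular database driver is assumed; how a given driver binds `bytes` to RAW is driver-specific.

```python
import uuid

def to_raw_bytes(formatted: str) -> bytes:
    """Parse a formatted UUID string into the 16 bytes stored in RAW(16)."""
    return uuid.UUID(formatted).bytes

def to_hex_literal(formatted: str) -> str:
    """Return the 32-character uppercase hex form, e.g. for HEXTORAW('...')."""
    return uuid.UUID(formatted).hex.upper()

def from_raw_bytes(raw: bytes) -> str:
    """Format 16 raw bytes back into the familiar hyphenated lowercase string."""
    return str(uuid.UUID(bytes=raw))

if __name__ == "__main__":
    s = "cbf7e2e2-a9e9-40fb-badc-18cb9a4fe663"
    print(to_hex_literal(s))                    # CBF7E2E2A9E940FBBADC18CB9A4FE663
    print(from_raw_bytes(to_raw_bytes(s)) == s) # True
```

`uuid.UUID(bytes=...)` does not check the version bits, so values produced by SYS_GUID round-trip as well.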

Below is a small example illustrating the difference in size:

create table tab
(id INT,
 RAW_UUID RAW(16) 
); 

insert into tab(ID,RAW_UUID) values (1,sys_guid());
insert into tab(ID,RAW_UUID) values (1000000001,sys_guid());

select * from tab;

        ID RAW_UUID                        
---------- --------------------------------
         1 8135869AECF44FB280A04033888FD518
1000000001 DE04ED07DDD84D1AABE9059F38364C7E



select vsize(id), vsize(raw_uuid)  from tab;

 VSIZE(ID) VSIZE(RAW_UUID)
---------- ---------------
         2              16
         6              16
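The VSIZE values for ID follow from how Oracle stores a NUMBER: roughly one exponent byte plus one byte per base-100 mantissa digit, with trailing zero digits not stored. The Python sketch below reproduces that rule for positive integers only. It is a simplified model written for this illustration (zero, negative and fractional values are deliberately not handled), not an Oracle API.

```python
def oracle_number_vsize(n: int) -> int:
    """Approximate VSIZE of a positive integer stored as an Oracle NUMBER.

    Simplified model (assumption): 1 exponent byte + 1 byte per
    base-100 mantissa digit, with trailing zero digits dropped.
    """
    if n <= 0:
        raise ValueError("this sketch handles positive integers only")
    digits = []
    while n:
        n, d = divmod(n, 100)
        digits.append(d)          # least significant base-100 digit first
    # Trailing zeros of the mantissa are at the front of `digits`; drop them.
    while digits and digits[0] == 0:
        digits.pop(0)
    return 1 + len(digits)

if __name__ == "__main__":
    for value in (1, 1000000001):
        print(value, oracle_number_vsize(value))   # 1 2, then 1000000001 6
```

Under this model an ID of up to 14 decimal digits needs at most 8 bytes, which matches the 2 to 8 byte range for integer keys.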

What you can do is define a virtual column (i.e. a column that allocates no storage) that presents the formatted UUID:

alter table tab add ( UUID VARCHAR2(36) GENERATED ALWAYS AS 
    (SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),1,8)||'-'||SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),9,4)||'-'||
    SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),13,4)||'-'||SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),17,4)||'-'||
    SUBSTR(LOWER(RAWTOHEX(RAW_UUID)),21,12)) VIRTUAL VISIBLE); 
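The virtual column expression simply lowercases the 32-character hex string and slices it into 8-4-4-4-12 groups. A Python equivalent (illustration only; the function name is hypothetical) makes the positions easy to check against the SUBSTR offsets, which are 1-based in SQL and 0-based here:

```python
def format_raw_hex(raw_hex: str) -> str:
    """Mirror of the virtual-column expression: hex -> lowercase 8-4-4-4-12."""
    h = raw_hex.lower()                       # LOWER(RAWTOHEX(RAW_UUID))
    if len(h) != 32:
        raise ValueError("expected 32 hex characters (16 bytes)")
    # SUBSTR(h,1,8) SUBSTR(h,9,4) SUBSTR(h,13,4) SUBSTR(h,17,4) SUBSTR(h,21,12)
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))

if __name__ == "__main__":
    print(format_raw_hex("8135869AECF44FB280A04033888FD518"))
    # 8135869a-ecf4-4fb2-80a0-4033888fd518
```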

Now the table has the text form of the UUID as well, and you can use the familiar query:

select * from tab where uuid = 'cbf7e2e2-a9e9-40fb-badc-18cb9a4fe663';

You can even define an index on the virtual column, but before using a UUID at all, always consider Rule 1 above.


3 Comments

This assumes that storage is expensive and processing is cheap; however, the OP should profile this. They may find that storage is cheap and processing is expensive, and that they need a human-readable GUID as a formatted string. In that case it may be better to store the GUID as a VARCHAR2 rather than as RAW; then they need neither the virtual column nor the processing overhead of formatting.
Well, IMO sorting or a hash join on a full-blown UUID requires much more processing than the equivalent on a simple NUMBER (or RAW), @MT0. But my main point is: do not use UUIDs at all, especially if they are centrally assigned as mentioned above. What is the advantage over a sequence?
The main advantage is that it can be generated on the client, so I can handle asynchronous processing much better because the ID is created before the row is persisted.
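A minimal sketch of what that last comment describes, in Python with hypothetical record types: the client assigns the UUID up front, so child records can reference the parent's key before anything has been written to the database, and the 16 bytes can later be bound to a RAW(16) column. No database access is shown.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Order:                      # hypothetical parent record
    id: bytes = field(default_factory=lambda: uuid.uuid4().bytes)

@dataclass
class OrderLine:                  # hypothetical child record
    order_id: bytes
    id: bytes = field(default_factory=lambda: uuid.uuid4().bytes)

# Keys exist before any INSERT, so the whole object graph can be built
# client-side and handed to an asynchronous writer to persist later.
order = Order()
lines = [OrderLine(order_id=order.id) for _ in range(3)]
```

With a sequence, by contrast, the key is normally known only after a round trip to the database.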

A GUID in Oracle is represented as raw(16).

You can get a GUID like this:

select sys_guid() from dual;

That's why you should use raw(16).

3 Comments

A GUID can also be stored as a hexadecimal string such as 3F2504E0-4F89-11D3-9A0C-0305E82C3301. The only reason to choose the RAW binary format over a string is if you are trying to use minimal memory; if you are going to end up converting the RAW to a human-readable hexadecimal string, that takes some computation every time you do it, and the trade-off of somewhat increased storage is probably worth it to use VARCHAR2(38) or CHAR(38) instead of RAW.
On the other hand, with VARCHAR2 there will be some additional character-set processing that is not needed.
IMO, similar reasoning would lead to: "A DATE can also be stored as a character string such as 01-MAY-2021. The only reason to choose the DATE format over a string is if you are trying to use minimal memory", which is probably not true.
