9

I'm currently working on a table that contains hashes stored in bytea format. Converting the hashes to hex strings, however, yields the bytes in the wrong order. Example:

SELECT encode(hash, 'hex') FROM mytable LIMIT 1;

Output: 1a6ee4de86143e81
Expected: 813e1486dee46e1a

Is there a way to reverse the order of bytes for all entries?

1
  • 3
    Why do you expect the hash to be reversed byte-wise? That's ... weird. Commented Nov 30, 2016 at 8:09

8 Answers

7

Here is one way to do it, though I would never do this myself. There is nothing wrong with storing bytes in a database's bytea column, but I wouldn't do bit wrangling in the database. If I did, I would use,

  • a C language function, or
  • some fancy procedural language that didn't require me exploding the inputs into a set of bytes.

This is sql-esque and should work -- here is what we're doing,

  1. Generate a set consisting of a series of offsets 0 - (bytelength-1).
  2. Map those offsets to bytes represented as strings of hex.
  3. String aggregate them in reverse order.

Here is an example,

CREATE TABLE foo AS SELECT '\x813e1486dee46e1a'::bytea AS bar;

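SELECT bar,
       -- to_hex() does not zero-pad, so pad each byte to two hex digits;
       -- order inside the aggregate, since subquery order is not guaranteed to survive
       string_agg(lpad(to_hex(byte), 2, '0'), '' ORDER BY "offset" DESC) AS hash
FROM foo
CROSS JOIN LATERAL (
  SELECT "offset", get_byte(bar, "offset") AS byte
  FROM generate_series(0, octet_length(bar) - 1) AS x("offset")
) AS x
GROUP BY bar;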

Two notes,

  1. offset is a reserved word, which is why it is double-quoted here; a different alias would be cleaner, but you get the point.
  2. This assumes that your hash (bar in the above) is UNIQUE.
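As a cross-check outside the database, the three steps can be mirrored in a few lines of Python. This is only a sketch: the helper name `reverse_hex` is hypothetical, and it formats every byte with two hex digits, which matters because a byte below 0x10 would otherwise print as a single digit.

```python
def reverse_hex(data: bytes) -> str:
    """Return the hex string of `data` with its bytes in reverse order."""
    # Step 1: offsets 0 .. len-1, visited from last to first.
    offsets = range(len(data) - 1, -1, -1)
    # Step 2: map each offset to a two-digit hex string ("0a", not "a").
    pairs = [format(data[i], "02x") for i in offsets]
    # Step 3: concatenate the pairs.
    return "".join(pairs)


print(reverse_hex(bytes.fromhex("1a6ee4de86143e81")))  # 813e1486dee46e1a
```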
0
5

Solutions with tools in vanilla Postgres:

I added a column bytea_reverse to both solutions. Remove it if you don't need it.

With get_byte():

SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT string_agg(lpad(to_hex(get_byte(b, x)), 2, '0'), '') AS text_reverse  -- to_hex() does not zero-pad
   FROM   generate_series(octet_length(t.b) - 1, 0, -1) x
   ) x ON true;

This is similar to what @Evan provided. Most of his excellent explanation applies. But:

  • Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
  • generate_series() can provide numbers in reverse, so we do not need another ORDER BY step.
  • While using a LATERAL join, aggregate in the subquery. Less error prone, easier to integrate with more complex queries, and no need to GROUP BY in the outer query.

With regexp_matches():

SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT string_agg(byte[1], '' ORDER  BY ord DESC) AS text_reverse
   FROM   regexp_matches(encode(t.b, 'hex' ), '..', 'g' ) WITH ORDINALITY AS x(byte, ord)
   ) x ON true;

This is similar to the "verbose" variant @filiprem provided. But:

  • Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
  • Use WITH ORDINALITY to get row numbers "for free". So we neither need another subquery with row_number() nor a double reverse().
  • Reverse ordering can be done in the aggregate function. (But it might be a bit faster to order in the subquery and add another subquery layer to aggregate pre-ordered rows.)
  • One subquery (or two) instead of two CTE is typically faster.
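The regexp_matches() variant can be modelled in Python to see what each piece contributes. A rough sketch, with `reverse_hex_pairs` as a hypothetical name; `enumerate()` stands in for WITH ORDINALITY and the descending sort for the ORDER BY inside the aggregate.

```python
import re


def reverse_hex_pairs(hex_text: str) -> str:
    # Split into two-character chunks, like regexp_matches(..., '..', 'g').
    pairs = re.findall("..", hex_text)
    # enumerate() plays the role of WITH ORDINALITY (1-based position).
    numbered = list(enumerate(pairs, start=1))
    # Aggregate in descending position order.
    numbered.sort(key=lambda item: item[0], reverse=True)
    return "".join(pair for _, pair in numbered)


print(reverse_hex_pairs("1a6ee4de86143e81"))  # 813e1486dee46e1a
```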

2
  • The main disadvantage is that SQL/PLSQL is much slower than scripting languages like Python/Perl/TCL/Java for scalar code that does not touch data in the DB. From this point of view, the "C language function" suggested by @Evan is the best solution, but it is somewhat complicated to distribute. Commented Dec 3, 2016 at 2:13
  • 2
    @Abelisto: Yup, a C function should be faster by orders of magnitude - especially for bytea values longer than a couple of bytes. Commented Dec 3, 2016 at 2:21
4

If you just need to reverse the bytes in a bytea value, there is a (relatively) simple and fast solution using plpythonu:

create or replace function reverse_bytea(p_inp bytea) returns bytea stable language plpythonu as $$
  b = bytearray()
  b.extend(p_inp)
  b.reverse()
  return b
$$;

select encode(reverse_bytea('\x1a6ee4de86143e81'), 'hex');
----
813e1486dee46e1a
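The function body can also be tried outside PostgreSQL. A minimal sketch, assuming the argument arrives as a bytes-like object; `reverse_bytes` is a hypothetical name and not part of PL/Python.

```python
def reverse_bytes(p_inp: bytes) -> bytes:
    b = bytearray(p_inp)  # mutable copy of the input
    b.reverse()           # reverse the byte order in place
    return bytes(b)


print(reverse_bytes(bytes.fromhex("1a6ee4de86143e81")).hex())  # 813e1486dee46e1a
```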

However, I suspect that something is wrong with the data itself (how it is stored, how it is interpreted...).

2
  • I think you should have gone the extra distance if you're going to use plpython: you should have returned the hex-encoded string. Commented Nov 30, 2016 at 5:06
  • 1
    @EvanCarroll And convert the result back to bytea if somebody just needs to reverse the value? IMO one function - one task. However, it's a matter of taste. Commented Nov 30, 2016 at 8:23
3

You could treat the encoded representation as text and use a regexp to reverse it byte by byte.

SELECT string_agg(reverse(b[1]),'')
FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;

Another (more verbose) method:

WITH bytes AS (
  SELECT row_number() over() AS n, byte[1]
  FROM regexp_matches( encode( 'STUFF', 'hex' ), '..', 'g' ) AS byte
), revbytes AS (
  SELECT * FROM bytes ORDER BY n DESC
)
SELECT array_to_string(array_agg(byte),'')
FROM revbytes;

Sample usage:

(filip@[local:/var/run/postgresql]:5432) filip=# SELECT encode( 'STUFF', 'hex' );
   encode   
------------
 5354554646
(1 row)

(filip@[local:/var/run/postgresql]:5432) filip=# SELECT string_agg(reverse(b[1]),'')FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;
 string_agg 
------------
 4646555453
(1 row)
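Why the double reverse works is easier to see step by step. The sketch below is a Python model of the first query; `double_reverse` is a hypothetical name used only for illustration.

```python
import re


def double_reverse(hex_text: str) -> str:
    # First reverse: flips the order of all hex digits, which also
    # swaps the two digits inside every byte.
    flipped = hex_text[::-1]
    # Second reverse: restore the digit order within each two-digit pair.
    return "".join(pair[::-1] for pair in re.findall("..", flipped))


print(double_reverse(b"STUFF".hex()))  # 4646555453
```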
1
  • Nice catch. I thought about regexp_matches and reverse but didn't think of the double reverse :) Commented Nov 29, 2016 at 23:08
3

Thanks to all the suggestions, I wrote this C-language function that works as needed:

#include "postgres.h"
#include "fmgr.h"

#ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;
#endif

Datum bytea_custom_reverse(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(bytea_custom_reverse);
Datum
bytea_custom_reverse(PG_FUNCTION_ARGS) {
  bytea *data = PG_GETARG_BYTEA_P_COPY(0);
  unsigned char *ptr = (unsigned char *) VARDATA(data);

  int32 dataLen = VARSIZE(data) - VARHDRSZ;

  unsigned char *start, *end;

  for ( start = ptr, end = ptr + dataLen - 1; start < end; ++start, --end ) {
    unsigned char swap = *start;
    *start = *end;
    *end = swap;
  }


  PG_RETURN_BYTEA_P(data);
}
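The loop is a standard two-pointer swap on a private copy of the value. The same idea as a Python sketch, with a bytearray standing in for the copied buffer; `reverse_in_place` is a hypothetical name.

```python
def reverse_in_place(buf: bytearray) -> bytearray:
    # Walk inwards from both ends, swapping as we go; when the two
    # indexes meet or cross, every byte has been moved exactly once.
    start, end = 0, len(buf) - 1
    while start < end:
        buf[start], buf[end] = buf[end], buf[start]
        start += 1
        end -= 1
    return buf


print(reverse_in_place(bytearray.fromhex("1a6ee4de86143e81")).hex())
```

An odd-length buffer leaves the middle byte untouched, and an empty buffer never enters the loop.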
2

Inspired by the answer of R. Martin, I created the pg_bswap extension that can reverse byte order for various data types, with decent speed:

https://github.com/pandrewhk/pg_bswap

Code from GitHub edited in:

#include "postgres.h"
#include "fmgr.h"
#include "utils/lsyscache.h"
#include "port/pg_bswap.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(bswap);
Datum
bswap(PG_FUNCTION_ARGS) {
    Oid     typ = get_fn_expr_argtype(fcinfo->flinfo, 0);
    int16   typlen;
    bool    typbyval;

    if (!OidIsValid(typ))
        elog(ERROR, "could not determine data type of input");

    get_typlenbyval(typ, &typlen, &typbyval);

    //pass by reference, variable-length
    if (typlen == -1){
        struct varlena *d = PG_DETOAST_DATUM_COPY(PG_GETARG_VARLENA_PP(0));
        unsigned char *ptr = (unsigned char *) VARDATA_ANY(d);
        int32 dataLen = VARSIZE_ANY_EXHDR(d);

        unsigned char *start, *end;

        for (start = ptr, end = ptr + dataLen - 1; start < end; ++start, --end) {
            unsigned char swap = *start;
            *start = *end;
            *end = swap;
        }
        PG_RETURN_POINTER(d);
    }

    //pass by reference, fixed length
    if (typlen > 0 && !typbyval) {
        unsigned char *d = (unsigned char *) PG_GETARG_POINTER(0);
        unsigned char *r = (unsigned char *) palloc(typlen);

        for (int16 i = 0; i < typlen; ++i)
            *(r+i) = *(d+typlen-i-1); //copy backwards

        PG_RETURN_POINTER(r);
    }

    //pass by value, fixed length
    if (typbyval && typlen == 1)
        PG_RETURN_DATUM(PG_GETARG_DATUM(0));

    if (typbyval && typlen == 2)
        PG_RETURN_UINT16(pg_bswap16(PG_GETARG_UINT16(0)));

    if (typbyval && typlen == 4)
        PG_RETURN_UINT32(pg_bswap32(PG_GETARG_UINT32(0)));

    if (typbyval && typlen == 8)
        PG_RETURN_INT64(pg_bswap64(PG_GETARG_INT64(0)));

    elog(ERROR, "unexpected argument type");
    PG_RETURN_NULL();
}
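For the pass-by-value branches, pg_bswap16/32/64 reverse the bytes of a fixed-width integer. A rough Python model of that behaviour, where `bswap` and its `width` parameter are hypothetical names for illustration and not part of the extension:

```python
def bswap(value: int, width: int) -> int:
    """Reverse the byte order of an unsigned integer stored in `width` bytes."""
    return int.from_bytes(value.to_bytes(width, "big"), "little")


print(hex(bswap(0x1234, 2)))      # 0x3412
print(hex(bswap(0x12345678, 4)))  # 0x78563412
```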
5
  • 1
    Unless you give more details, this may be considered as just spam. Commented Dec 29, 2022 at 12:15
  • 1
    I decided to put in your code - link-only is "frowned upon" here by the powers that be (bitrot/linkrot) - I thought it was fine, but as of the time of writing, it's been over a year since you've visited this site. Anyway, given the current poor health of SE generally in terms of questions being posted relative to, say, 8/9 years ago, I think GitHub is likely to last just a tad longer than this network! I've assumed that your (elegant) contribution can be released under this licence? :-) Commented Feb 14 at 5:10
  • @Vérace thanks! yeah, treat as BSD 2-clause or CC0, whichever feels more familiar Commented Feb 15 at 7:27
  • Oh, wasn't expecting you to pop up - thanks for your input! Just a quick question if I may! There is no file "port/pg_bswap.h" that I could find in the repo - is this a placeholder for (we've all had them) future plans or similar? Commented Feb 15 at 9:51
  • @Vérace zero recollection, probably lack of cleanup after copy-paste Commented Feb 16 at 15:41
1

Was curious if there's a simple way to reverse bytea strings without dealing with hex or regex.

In case someone finds it useful, here's what I came up with:

create or replace function bytea_reverse(bytea)
returns bytea
language sql immutable parallel safe
return (
    select string_agg(substring($1 from byte_pos for 1), '')
      from generate_series(octet_length($1), 1, -1) as byte_pos);

I also had the need to output integers in little endian format that the above function can easily do with bytea_reverse(int4send(x)), but since we know it is always 4 bytes, this is about 30% faster:

create or replace function public.int4send_le(int)
returns bytea
language sql immutable parallel safe
return
(select int4send( $1 << 24
               | ($1 <<  8 & 16711680)
               | ($1 >>  8 & 65280)
               | ($1 >> 24 & 255)));

These correctly reverse both positive and negative integers as bytea:

select int4send(i), int4send_le(i), bytea_reverse(int4send(i)) from generate_series(-2,1) i;

  int4send  | int4send_le | bytea_reverse 
------------+-------------+---------------
 \xfffffffe | \xfeffffff  | \xfeffffff
 \xffffffff | \xffffffff  | \xffffffff
 \x00000000 | \x00000000  | \x00000000
 \x00000001 | \x01000000  | \x01000000
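The masks in int4send_le can be checked against Python's struct module. This is a sketch under one assumption worth stating: Python integers are unbounded, so the value is first reduced to an unsigned 32-bit view and the left shift is masked explicitly, whereas the SQL version relies on int4 arithmetic being 32 bits wide.

```python
import struct


def int4send_le(x: int) -> bytes:
    u = x & 0xFFFFFFFF  # view the signed 32-bit value as unsigned
    swapped = (
        ((u << 24) & 0xFF000000)     # lowest byte moves to the top
        | ((u << 8) & 0x00FF0000)    # mask 16711680
        | ((u >> 8) & 0x0000FF00)    # mask 65280
        | ((u >> 24) & 0x000000FF)   # mask 255
    )
    return struct.pack(">I", swapped)  # int4send emits big-endian bytes


for i in range(-2, 2):
    # struct's "<i" is a signed 32-bit little-endian integer.
    assert int4send_le(i) == struct.pack("<i", i)
    print(i, int4send_le(i).hex())
```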
3
  • This should have been upvoted more often - but then the question was pretty old when you replied - nice piece of bit-twiddling! It'll help for a wee project that I'm considering! Commented Feb 14 at 5:20
  • Just a quick question - do you have a link/reference/whatever for the int4send() function - I've found stuff here and there but I'm wondering if there's a clear text about this (and all of the other similar) functions? Commented Feb 16 at 19:20
  • All these functions are listed in pg_catalog.pg_proc. There are two per each data type, one send and one recv. Look at pg_catalog.pg_type.{typreceive, typsend} for names of functions for each type. These deal with binary protocol. typin and typout are for text protocol. Best to look details up in the source. Commented Feb 23 at 15:32
0

Thanks to everyone who contributed to this thread. Based on the answers, this is my choice for converting a bigint to bytea in little-endian order, like BitConverter.GetBytes() in C#:

with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
SELECT decode(string_agg (
  (case when get_byte(conversionValue, x)<= 15 then ('0')  else  '' end) ||
  to_hex(get_byte(conversionValue, x))
  , ''), 'hex') AS nativeId_reverse  
   FROM mycte,  generate_series(octet_length(conversionValue) - 1, 0, -1) as x;

To search for a value stored in PostgreSQL as little-endian bytea by its bigint representation:

with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
Select * FROM mycte, *SomeByteaFieldTable*
where *SomeByteaId* =                                                                         
(SELECT decode(string_agg (
  (case when get_byte(conversionValue, x)<= 15 then ('0')  else  '' end) ||
  to_hex(get_byte(conversionValue, x))
  , ''), 'hex') AS nativeId_reverse  
   FROM   generate_series(octet_length(conversionValue) - 1, 0, -1) x);
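To verify such values on the client side, the same little-endian layout can be produced with Python's struct module. A small sketch, assuming a signed 64-bit value and a little-endian machine on the C# side (BitConverter.GetBytes() follows the machine's byte order); `int8_le` is a hypothetical helper name.

```python
import struct


def int8_le(value: int) -> bytes:
    """Little-endian bytes of a signed 64-bit integer."""
    return struct.pack("<q", value)


# int8send returns the big-endian form, so the little-endian form
# is simply that byte string reversed.
big_endian = struct.pack(">q", 394112768534335)
assert int8_le(394112768534335) == big_endian[::-1]
print(int8_le(394112768534335).hex())
```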
0
