How to replace multiple special characters in Postgres 9.5

Question

I have a table containing a list of name which might contain special character:

id   name
1    Johän
2    Jürgen
3    Janna
4    Üdyr
...

Is there a function that replaces each character for another specific one? (Not necessarily an unaccented one). Something like this:

SELECT id, function('ä,ü',name,'ae,ue');
Result:

    id   name
    1    Johaen
    2    Juergen
    3    Janna
    4    UEdyr
    ...

Possible duplicate of Multiple split and assign order_id CHECK the TRANSLATE part. How convert every char to <space> — Juan Carlos Oropeza
– Juan Carlos Oropeza, Commented Jul 27, 2016 at 16:59
@JuanCarlosOropeza this is little bit different task - you cannot use translate because it is designed for single characters only. — Pavel Stehule
– Pavel Stehule, Commented Jul 27, 2016 at 17:46
I had a similar problem and what I did was just use replace twice. Though I don’t know if that’s the best / efficient way to go about it. replace(replace(column,‘ä’,’ae’), ‘ü’,’ue’) — DardanM
– DardanM, Commented Jul 1, 2020 at 17:39

Community · Accepted Answer · 2017-05-23 12:22:36Z

replace()

If you want just to replace one or few characters you can use function replace(string text, from text, to text) that replaces all occurrences in string substring. The replace function can be used to replace one character to several characters.

translate()

If you want to translate some letters to other letters you can user function translate(string text, from text, to text) that replaces any character in a string that matches a character in the from by the corresponding character in the to set.

Some data to play with:

drop table if exists xyz;

create table xyz (
    id serial not null,
    name varchar(30)
);

insert into xyz (name) values
    ('Juhänäo'),
    ('Jürgüen'),
    ('Dannäu'),
    ('Übüdyr');

Example of replace function:

select replace(name, 'ä', 'a') from xyz;

This function replaces letter ä in the name column with letter a. Juhänäo becomes Juhanao.

select replace(name, 'ä', 'ae') from xyz;

Now it replaces letter ä with ae.

select replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue') from xyz;

Not very nice, but in the example all ä become ae, ü become ue, and Ü become 'Ue'.

update xyz set name = replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue');

Changes letters and updates rows. The result of the update is following:

Juhaenaeo
Juergueen
Dannaeu
Uebuedyr

Example of translate function:

select translate(name, 'ä,ü,Ü', 'a,u,U') from xyz;

Translates all letters ä to a, ü to u and Ü to U.

update xyz set name = translate(name, 'ä,ü,Ü', 'a,u,U');

Updates table so all predefined letters are translated and the change is saved to the database. The result of the update is following:

Juhanao
Jurguen
Dannau
Ubudyr

More information:

Replace characters with multi-character strings

Postgresql string functions

The commas in translate are superfluous; the correct syntax according to the docs would be translate(name, 'äüÜ', 'auU');.

Pavel Stehule · Accepted Answer · 2016-07-27 17:39:27Z

12

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

answered Jul 27, 2016 at 17:39

Pavel Stehule

46.7k6 gold badges104 silver badges103 bronze badges

6 Comments

Juan Carlos Oropeza Over a year ago

I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed.

Juan Carlos Oropeza Over a year ago

just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 ms

Pavel Stehule Over a year ago

The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions;

klin Over a year ago

Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.

Juan Carlos Oropeza Over a year ago

@PavelStehule Is my fault didnt saw the 9.5 on the title.

|

Tomi Panula-Ontto · Accepted Answer · 2021-10-19 12:14:06Z

1

Avoid writing your own unaccent function. Instead, I recommend using unaccent extension.

create extension unaccent;
select unaccent('Juhänäo');
select unaccent(name) from xyz;

Here is a link to Postgres documentation for Unaccent

answered Oct 19, 2021 at 12:14

Tomi Panula-Ontto

111 bronze badge

Comments

soufiane ELAMMARI · Accepted Answer · 2022-11-10 20:08:51Z

1

you simply use this :

select  translate(column, 'âàãáéêèíóôõüúç', 'aaaaeeeiooouuc') from table

Good luck ^^

answered Nov 10, 2022 at 20:08

soufiane ELAMMARI

1,05912 silver badges14 bronze badges

Comments

Rafs · Accepted Answer · 2023-05-31 12:27:18Z

1

If you are after German letters, then this works:

CREATE OR REPLACE FUNCTION public.udf_transliterate_german(
    german_word character varying)
    RETURNS character varying
    LANGUAGE 'sql'
    COST 100
    IMMUTABLE PARALLEL UNSAFE
AS $BODY$
SELECT REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(german_word,
                                'ä','ae'),
                            'ö','oe' ),
                        'ü','ue'),
                    'ß','ss'),
                'Ä', 'AE'),
            'Ö', 'OE'),
        'Ü', 'UE'),
    'ẞ', 'SS');
$BODY$;

It is not elegant though.

edited May 31, 2023 at 12:27

answered Jul 19, 2021 at 14:04

Rafs

83411 silver badges26 bronze badges

1 Comment

Pavel Stehule Feb 26 at 7:18

Attention - this function can be very slow - it can be problem when it is used very often. When it is possible, then you should to use unaccent extension.

bcsta · Accepted Answer · 2022-12-11 11:47:23Z

This is a generic recursive function written in python (but can easily be replicated in whatever language you prefer) That takes in the original string and a list of substrings to be removed:

def replace_psql_string(str_to_remove, query):

    ss = str_to_remove.split(',')
    if ss[0] != '':
        query = "REGEXP_REPLACE('{0}', '{1}', '', 'gi')".format(query, ss[0])
        return self.replace_psql_string(','.join(ss[1:]), query)
    else:
        return query

# Run it
replace_psql_string("test,foo", "input test string")

The code splits the comma separated string to a list of substrings to remove, Creates a REGEXP_REPLACE function in sql with the first substring. Then, recursively calls the function again with [1:] elements and append each REGEXP query to the passed parameter. So that finally, when no more substrings are in the list to remove, the final aggregated query is returned.

This will output:

REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi')

Then use it in whatever part of your query that makes sense. Can be tested by appending a SELECT in front:

SELECT REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi');


regexp_replace 
----------------
 input  string
(1 row)

Collectives™ on Stack Overflow

How to replace multiple special characters in Postgres 9.5

6 Answers 6

1 Comment

6 Comments

Comments

Comments

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

1 Comment

6 Comments

Comments

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related