16

I have a table containing a list of name which might contain special character:

id   name
1    Johän
2    Jürgen
3    Janna
4    Üdyr
...

Is there a function that replaces each character for another specific one? (Not necessarily an unaccented one). Something like this:

SELECT id, function('ä,ü',name,'ae,ue');
Result:

    id   name
    1    Johaen
    2    Juergen
    3    Janna
    4    UEdyr
    ...
3
  • Possible duplicate of Multiple split and assign order_id CHECK the TRANSLATE part. How convert every char to <space> Commented Jul 27, 2016 at 16:59
  • @JuanCarlosOropeza this is little bit different task - you cannot use translate because it is designed for single characters only. Commented Jul 27, 2016 at 17:46
  • 1
    I had a similar problem and what I did was just use replace twice. Though I don’t know if that’s the best / efficient way to go about it. replace(replace(column,‘ä’,’ae’), ‘ü’,’ue’) Commented Jul 1, 2020 at 17:39

6 Answers 6

22

replace()

If you want just to replace one or few characters you can use function replace(string text, from text, to text) that replaces all occurrences in string substring. The replace function can be used to replace one character to several characters.

translate()

If you want to translate some letters to other letters you can user function translate(string text, from text, to text) that replaces any character in a string that matches a character in the from by the corresponding character in the to set.

Some data to play with:

drop table if exists xyz;

create table xyz (
    id serial not null,
    name varchar(30)
);

insert into xyz (name) values
    ('Juhänäo'),
    ('Jürgüen'),
    ('Dannäu'),
    ('Übüdyr');

Example of replace function:

select replace(name, 'ä', 'a') from xyz;

This function replaces letter ä in the name column with letter a. Juhänäo becomes Juhanao.

select replace(name, 'ä', 'ae') from xyz;

Now it replaces letter ä with ae.

select replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue') from xyz;

Not very nice, but in the example all ä become ae, ü become ue, and Ü become 'Ue'.

update xyz set name = replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue');

Changes letters and updates rows. The result of the update is following:

Juhaenaeo
Juergueen
Dannaeu
Uebuedyr

Example of translate function:

select translate(name, 'ä,ü,Ü', 'a,u,U') from xyz;

Translates all letters ä to a, ü to u and Ü to U.

update xyz set name = translate(name, 'ä,ü,Ü', 'a,u,U');

Updates table so all predefined letters are translated and the change is saved to the database. The result of the update is following:

Juhanao
Jurguen
Dannau
Ubudyr

More information:

Replace characters with multi-character strings

Postgresql string functions

Sign up to request clarification or add additional context in comments.

1 Comment

The commas in translate are superfluous; the correct syntax according to the docs would be translate(name, 'äüÜ', 'auU');.
12

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

6 Comments

I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed.
just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 ms
The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions;
Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.
@PavelStehule Is my fault didnt saw the 9.5 on the title.
|
1

Avoid writing your own unaccent function. Instead, I recommend using unaccent extension.

create extension unaccent;
select unaccent('Juhänäo');
select unaccent(name) from xyz;

Here is a link to Postgres documentation for Unaccent

Comments

1

you simply use this :

select  translate(column, 'âàãáéêèíóôõüúç', 'aaaaeeeiooouuc') from table

Good luck ^^

Comments

1

If you are after German letters, then this works:

CREATE OR REPLACE FUNCTION public.udf_transliterate_german(
    german_word character varying)
    RETURNS character varying
    LANGUAGE 'sql'
    COST 100
    IMMUTABLE PARALLEL UNSAFE
AS $BODY$
SELECT REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(german_word,
                                'ä','ae'),
                            'ö','oe' ),
                        'ü','ue'),
                    'ß','ss'),
                'Ä', 'AE'),
            'Ö', 'OE'),
        'Ü', 'UE'),
    'ẞ', 'SS');
$BODY$;

It is not elegant though.

1 Comment

Attention - this function can be very slow - it can be problem when it is used very often. When it is possible, then you should to use unaccent extension.
0

This is a generic recursive function written in python (but can easily be replicated in whatever language you prefer) That takes in the original string and a list of substrings to be removed:

def replace_psql_string(str_to_remove, query):

    ss = str_to_remove.split(',')
    if ss[0] != '':
        query = "REGEXP_REPLACE('{0}', '{1}', '', 'gi')".format(query, ss[0])
        return self.replace_psql_string(','.join(ss[1:]), query)
    else:
        return query

# Run it
replace_psql_string("test,foo", "input test string")

The code splits the comma separated string to a list of substrings to remove, Creates a REGEXP_REPLACE function in sql with the first substring. Then, recursively calls the function again with [1:] elements and append each REGEXP query to the passed parameter. So that finally, when no more substrings are in the list to remove, the final aggregated query is returned.

This will output:

REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi')

Then use it in whatever part of your query that makes sense. Can be tested by appending a SELECT in front:

SELECT REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi');


regexp_replace 
----------------
 input  string
(1 row)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.