My database has many descriptions that are nearly identical, and I want to group them together. Because they contain different numbers, they end up in separate groups. Is there a way to mask the numbers so that the descriptions become the same?

We can do that in Excel or Notepad++ using find and replace, so is there a way to do it in SQL? I know we can replace text in SQL using the function

REPLACE(column, 'to be replaced', 'new input')

But how can I do this with a regex, since the numbers can be any combination of digits?

I am using PostgreSQL.

Some sample inputs:

sample input description 123
sample input description 456
this is another description 678
this is another description 999

I would like to convert them to:

sample input description xxx
sample input description xxx
this is another description xxx
this is another description xxx

The numbers can appear anywhere in the description.

I am running this on Redshift.

5 Comments
  • Are you using MS SQL Server or PostgreSQL? Commented Jan 18, 2018 at 9:42
  • Please decide on the vendor, and also provide some samples. Commented Jan 18, 2018 at 9:42
  • That's not masking. You want to transform the input based on some unspecified rules that will produce the same result from different input. What are those rules? Commented Jan 18, 2018 at 9:43
  • PostgreSQL supports regex, unlike SQL Server. A .NET regex assembly can be accessed from SQL Server using CLR integration. Commented Jan 18, 2018 at 9:46
  • @Serg and neither should be used carelessly. A query that uses a regex can't use any underlying indexes. The server would have to cache the transformed data before grouping. It's even worse for filtering: the server would have to scan the entire table and consume the entire text value each time before determining whether there's a match or not. This can waste a lot of memory if there are many rows and/or the text field is large. Commented Jan 18, 2018 at 9:51

3 Answers

You'd use

regexp_replace(col, '[[:digit:]]+', '#')

to replace a run of one or more digits with a single #.

Rextester demo: http://rextester.com/BFSP36237

Use the flag 'g' if multiple numbers can occur in a string (without it, PostgreSQL replaces only the first match):

regexp_replace(col, '[[:digit:]]+', '#', 'g')

Rextester demo: http://rextester.com/WHTJ51233
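
For the grouping the question is after, the same expression can be used as the grouping key. This is only a sketch for PostgreSQL: the `descriptions` CTE and its `description` column are made-up stand-ins for the real table, filled with the sample rows from the question.

```sql
-- Hypothetical inline data standing in for the real table and column.
WITH descriptions(description) AS (
    VALUES
        ('sample input description 123'),
        ('sample input description 456'),
        ('this is another description 678'),
        ('this is another description 999')
)
-- Mask every run of digits, then group on the masked text.
SELECT regexp_replace(description, '[[:digit:]]+', '#', 'g') AS masked,
       count(*) AS n
FROM descriptions
GROUP BY masked
ORDER BY masked;
```

With this sample data the four rows collapse into two groups of two. If you want one x per digit (123 becoming xxx, as in the question), drop the + and use 'x' as the replacement, keeping the 'g' flag.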

You can use the REGEXP_REPLACE function in the following form:

select regexp_replace(columnthatneedtomask, '[0-9]', 'x') from table;

Refer to the link below for more information:

https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_REPLACE.html
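
A note for Redshift specifically, going by the linked documentation as I understand it: REGEXP_REPLACE there replaces every occurrence of the pattern by default, and its optional fourth argument is a start position rather than a flags string, so the PostgreSQL-style 'g' is neither needed nor accepted. A minimal sketch for grouping, where `your_table` and `description` are hypothetical names:

```sql
-- Assumes Amazon Redshift; your_table and description are placeholder names.
-- Each run of digits becomes a fixed 'xxx', so rows that differ
-- only in their numbers get the same grouping key.
SELECT regexp_replace(description, '[0-9]+', 'xxx') AS masked,
       count(*) AS n
FROM your_table
GROUP BY 1
ORDER BY 1;
```

Using '[0-9]+' with a fixed replacement also groups numbers of different lengths together, which the one-x-per-digit form would keep apart.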

select regexp_replace(column, '[0-9]', 'x', 'g') as newcolumn from table;

1 Comment

It would be better if you provided some explanation, maybe a link to the documentation. And why from table?
