3

I'm trying to remove non-alphanumeric characters from multiple columns in a table, and I have no permission to create functions or temporary functions. Does anyone here have experience removing non-alphanumeric characters without creating any functions at all? Thanks. I'm using MS SQL Server Management Studio v17.9.1.

7
  • You can run replace() a lot. Commented Mar 1, 2019 at 22:21
  • Did you mean for each column I want to replace? Commented Mar 1, 2019 at 22:24
  • @cheklapkok - are you limited to a single SELECT in which you are to return the column data without the non-alphanumeric characters? Could you give us a little more insight/context as to what your working environment is like? Commented Mar 1, 2019 at 22:40
  • I think @GordonLinoff means what the OP used in this question: T-SQL strip all non-alpha and non-numeric characters. Commented Mar 1, 2019 at 22:40
  • @Forty3 I need to strip out all non-alphanumeric characters from a few columns of a table in the database, but I don't have permission to create functions, temporary functions or procedures at all. So I'm wondering whether there's a way to do it the hard way. Commented Mar 1, 2019 at 22:47

2 Answers

2

If you have to use a single SELECT query, as @Forty3 mentioned, then chaining multiple REPLACE calls, as @Gordon-Linoff suggested, is probably the best option (but definitely not ideal).
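
For reference, a rough sketch of what the chained-REPLACE approach could look like in a single SELECT. The table and column names (user_list_original, fname) are borrowed from the snippet further down, and the list of characters to strip is purely an assumption, so extend it for your data. The second expression uses TRANSLATE, which as far as I know is only available on SQL Server 2017 and later, so treat it as optional.

```sql
-- Hypothetical set of unwanted characters: hyphen, period, comma, apostrophe.
SELECT fname,
       -- One REPLACE per unwanted character, nested inside each other.
       REPLACE(REPLACE(REPLACE(REPLACE(fname, '-', ''), '.', ''), ',', ''), '''', '')
           AS fname_nested,
       -- SQL Server 2017+ only: map every unwanted character to '~', then drop '~'.
       -- Both character lists passed to TRANSLATE must have the same length.
       REPLACE(TRANSLATE(fname, '-.,''', '~~~~'), '~', '')
           AS fname_translate
FROM   user_list_original;
```

Either way you have to know the unwanted characters up front, which is why this is not ideal.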

If you can update the data or use T-SQL, then you could do something like this from https://searchsqlserver.techtarget.com/tip/Replacing-non-alphanumeric-characters-in-strings-using-T-SQL:

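-- The loop only starts if the statement run just before it left @@rowcount > 0.
-- [^a-zA-Z ] keeps letters and spaces; use [^a-zA-Z0-9] to keep digits as well.
while @@rowcount > 0
        -- Each pass removes every occurrence of one disallowed character per row.
        update  user_list_original
        set     fname = replace(fname, substring(fname, patindex('%[^a-zA-Z ]%', fname), 1), '')
        where   patindex('%[^a-zA-Z ]%', fname) <> 0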
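
If you want digits kept as well and would rather not depend on whatever @@rowcount happens to hold before the loop, a variation along these lines might work. This is only a sketch and has not been run against your data; the @rows variable is my own addition, and the table and column names are the ones from the snippet above.

```sql
DECLARE @rows INT = 1;              -- prime the loop explicitly

WHILE @rows > 0
BEGIN
    UPDATE user_list_original
    SET    fname = REPLACE(fname,
                           SUBSTRING(fname, PATINDEX('%[^a-zA-Z0-9]%', fname), 1),
                           '')
    WHERE  PATINDEX('%[^a-zA-Z0-9]%', fname) > 0;

    SET @rows = @@ROWCOUNT;         -- rows changed by the UPDATE just above
END;
```

Each pass removes every occurrence of one offending character per row, so the number of passes depends on how many distinct unwanted characters a row contains rather than on the row count. For additional columns, repeat the loop once per column.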

2 Comments

I'm currently trying this solution, but it seems like it'll take a long time. I'm not sure if it would ever finish since I have millions of rows :\
No SQL-only, non-function solution will perform well with millions of rows.
0

Here is a starting point - you will need to adjust it to accommodate all of the columns which require cleansing:

;WITH allcharcte ( id, textcol1, textcol2, textcol1where, textcol2where )
     AS (SELECT id,
                CAST(textcol1 AS NVARCHAR(255)),
                CAST(textcol2 AS NVARCHAR(255)),
                -- Start the process of looking for non-alphanumeric chars in each
                -- of the text columns. The returned value from PATINDEX is the position
                -- of the non-alphanumeric char and is stored in the *where columns 
                -- of the CTE.
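                -- ISNULL keeps rows with a NULL column from being dropped by the final WHERE.
                ISNULL(PATINDEX(N'%[^0-9A-Z]%', textcol1), 0),
                ISNULL(PATINDEX(N'%[^0-9A-Z]%', textcol2), 0)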
           FROM #temp

         UNION ALL

         -- This is the recursive part. It works through the rows which have been
         -- returned thus far processing them for use in the next iteration
         SELECT prev.id,
                -- If the *where column relevant for each of the columns is NOT NULL
                -- and NOT ZERO, then use the STUFF command to replace the char
                -- at that location with an empty string
                CASE ISNULL(prev.textcol1where, 0)
                  WHEN 0 THEN CAST(prev.textcol1 AS NVARCHAR(255))
                  ELSE CAST(STUFF(prev.textcol1, prev.textcol1where, 1, N'') AS NVARCHAR(255))
                END,
                CASE ISNULL(prev.textcol2where, 0)
                  WHEN 0 THEN CAST(prev.textcol2 AS NVARCHAR(255))
                  ELSE CAST(STUFF(prev.textcol2, prev.textcol2where, 1, N'') AS NVARCHAR(255))
                END,

                -- We now check for the existence of the next non-alphanumeric
                -- character AFTER we replace the most recent finding
                ISNULL(PATINDEX(N'%[^0-9A-Z]%', STUFF(prev.textcol1, prev.textcol1where, 1, N'')), 0),
                ISNULL(PATINDEX(N'%[^0-9A-Z]%', STUFF(prev.textcol2, prev.textcol2where, 1, N'')), 0)
           FROM allcharcte prev
          WHERE ISNULL(prev.textcol1where, 0) > 0
             OR ISNULL(prev.textcol2where, 0) > 0)
SELECT *
  FROM allcharcte
 WHERE textcol1where = 0
   AND textcol2where = 0 

Essentially, it is a recursive CTE that repeatedly replaces the first remaining non-alphanumeric character in each column (found via PATINDEX(N'%[^0-9A-Z]%', <column>)) with an empty string (via STUFF(<column>, <where>, 1, N'')). By replicating the blocks, you should be able to adapt it to any number of columns.
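
To make the roles of a text column and its *where companion more concrete, here is the same PATINDEX/STUFF step run in a plain loop on a single variable. The variable names (@text, @where) and the sample string are made up for illustration.

```sql
DECLARE @text  NVARCHAR(255) = N'AB#12 C';              -- plays the role of textcol1
DECLARE @where INT = PATINDEX(N'%[^0-9A-Z]%', @text);   -- plays the role of textcol1where

WHILE @where > 0
BEGIN
    -- Remove the single character at position @where ...
    SET @text  = STUFF(@text, @where, 1, N'');
    -- ... then look for the next non-alphanumeric character (0 means none left).
    SET @where = PATINDEX(N'%[^0-9A-Z]%', @text);
END;

SELECT @text AS cleaned;   -- AB12C
```

The CTE does the same thing, except that each pass of the loop becomes one level of recursion. Note that the pattern lists only A-Z, so it relies on a case-insensitive collation to treat lowercase letters as alphanumeric; with a case-sensitive or binary collation you would probably want [^0-9A-Za-z] instead.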

EDIT: If you anticipate having more than 100 non-alphanumeric characters to strip out of any one column, you will need to raise the recursion limit (100 by default) by appending an OPTION (MAXRECURSION n) hint to the end of the query.
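
A small standalone illustration of where the hint goes, using a throwaway counter CTE (the name n and the numbers are arbitrary). Without the OPTION clause this query fails once it passes the default limit of 100 levels.

```sql
WITH n (i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 250   -- needs well over 100 recursion levels
)
SELECT MAX(i) AS deepest_level          -- 250
FROM   n
OPTION (MAXRECURSION 300);              -- allowed values are 0 to 32767; 0 removes the limit
```

For the query above, the same OPTION clause would go after the final WHERE of the outer SELECT.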

3 Comments

Thanks. I'll give this a try and will let you know. Do you know if it'll take super long since I have millions of rows?
MILLIONS? Probably not terribly fast. Of course, knowing your dataset is MILLIONS of rows and you are not permitted access to write functions/procedures raises more questions that are probably well outside the scope of SO. Best of luck!
Yes, if it takes too long then I will probably try something else. One thing I'd like to ask, as I don't have much knowledge of this: can you explain a little about textcol and textcolwhere?
