
I have a table that stores strings in an array. I couldn't figure out why, but a simple example looks like this:

+--------+----------------------------------+
| reason |              string              |
+--------+----------------------------------+
| \N     | \N\N\N\NXXX - ABCDEFGH\N\N |
| \N     | \N\N\N\NXXX - ABCDEFGH       |
| \N     | \N\N\N\N                      |
| \N     | \N\N\N\NXXX - ABCDEFGH\N    |
| \N     | \N\N                            |
| \N     | \N\N\N                         |
| \N     | \N                               |
+--------+----------------------------------+

The table above doesn't show it, but the true format of the first string looks like this: (image in the original post)

Basically, what I would like to retrieve is:

+--------+----------------------------------+
| reason |              string              |
+--------+----------------------------------+
| \N     |          XXX - ABCDEFGH          |
+--------+----------------------------------+

The XXX - prefix always stays the same, but ABCDEFGH may be any string. The problem is that I can't use path.path.path_path[4], because the string XXX - ABCDEFGH may be the 4th or any other element of the array (even the 20th).

I tried to use where lower(path.path.string) like ('xxx - %'), but received an error.

Select 
path.path.reason, 
path.path.string
From table_name
Where path.id = '123'
And datestr = '2018-07-21'
4 Comments

  • You should add your table definition and the queries that produce the results we are seeing. It does not seem to be an array... the question is not clear at all either. Commented Oct 25, 2018 at 11:56
  • What is the code of the string delimiter? Commented Oct 25, 2018 at 13:27
  • Hint: SELECT regexp_extract('\N\N\N\NXXX - ABCDEFGH\N\N', '\N\N\N\N(.*?)(\N\N)', 1) Commented Oct 25, 2018 at 16:05
  • What is that character shown in the image? Commented Oct 25, 2018 at 19:13

1 Answer

This regular expression will do the job for you: ([^\N$])+

This assumes the character shown in the image is a $.
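One thing that may be worth checking before relying on this pattern is how your Hive version unescapes \N inside a quoted string literal. My understanding (an assumption, not something confirmed in this thread) is that Hive drops the backslash in front of characters it does not treat as special, in which case the character class would effectively be [^N$] and would also stop at a capital N inside the ABCDEFGH part. A small diagnostic sketch:

```sql
-- Diagnostic sketch: inspect what Hive makes of the '\N' literal
-- before it is handed to the regex engine.
SELECT '\N'         AS literal_as_parsed,  -- shows the unescaped value
       length('\N') AS literal_length;     -- 1 would mean the backslash was dropped
```

If the length comes back as 1, escaping the backslash in the pattern, or matching on the real delimiter character once it is known, would be safer.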

First, you can use regexp_extract() to pull the wanted element out of the string. It has the following syntax:

regexp_extract(string subject, string pattern, int index)
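To make the index argument concrete, here is a short sketch using the sample string from the Hive manual rather than the data in the question. Index 0 returns the whole match, and higher values return the corresponding capture group:

```sql
-- The index selects the whole match (0) or one capture group (1, 2, ...).
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 0) AS whole_match,  -- 'foothebar'
       regexp_extract('foothebar', 'foo(.*?)(bar)', 1) AS group_1,      -- 'the'
       regexp_extract('foothebar', 'foo(.*?)(bar)', 2) AS group_2;      -- 'bar'
```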

Second, you can use regexp_replace(), which has the following syntax:

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
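Keep in mind that PATTERN is a Java regular expression, so characters such as $ have a special meaning. The sketch below, on a made-up input 'a$b', contrasts a literal dollar sign (written as the character class [$]) with a bare $, which is the end-of-input anchor:

```sql
-- '[$]' matches the dollar character itself; a bare '$' matches the
-- empty position at the end of the input.
SELECT regexp_replace('a$b', '[$]', ' ') AS literal_dollar,  -- 'a b'
       regexp_replace('a$b', '$', ' ')   AS end_anchor;      -- 'a$b ' (space appended)
```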

Test Data

WITH string_column 
     AS (SELECT explode(array('XXX - ABCSSSSSSSSSSSGH\N\N', 
                    '\N$\N$\N$\N$XXX - ABCDEFGH$\N\N', 
                    '\N\N\N\N', '\N\N\N\NXXX - ABCDEFGH\N')) AS 
            str_column
        ) 
-- '[$]' matches a literal dollar sign; a bare "$" would be the end-of-input anchor
SELECT regexp_replace(regexp_extract(str_column, '([^\N$])+', 0), '[$]', ' ') 
    AS string_col 
FROM string_column 

This results in:

------------------------------
|         string_col         |
------------------------------
| XXX - ABCSSSSSSSSSSSGH     |
------------------------------
| XXX - ABCDEFGH             |
------------------------------
|                            |
------------------------------
| XXX - ABCDEFGH             |
------------------------------

Note: An index of 0 returns the entire match for the pattern; an index of 1 or higher returns the corresponding capture group.

regexp_extract(str_column, '(,|[^\N$])+', 0) 

The following statement replaces every occurrence of '$' with a space. The dollar sign is written as [$] because a bare $ is the regex end-of-input anchor:

regexp_replace(regexp_extract(str_column, '([^\N$])+', 0), '[$]', ' ')
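If the column turns out to be a genuine array<string> (the question does not confirm this, and the table definition was never posted), an alternative is to explode the array and filter the elements, which does not depend on the element's position. A minimal sketch with hypothetical inline data; sample_rows, str_array and elem are illustrative names, not taken from the original table:

```sql
-- Hypothetical inline data standing in for the real table.
WITH sample_rows AS (
  SELECT array(CAST(NULL AS string), CAST(NULL AS string),
               'XXX - ABCDEFGH', CAST(NULL AS string)) AS str_array
)
SELECT elem AS string_col
FROM sample_rows
LATERAL VIEW explode(str_array) exploded AS elem  -- one row per array element
WHERE lower(elem) LIKE 'xxx - %';                 -- NULL elements are filtered out
```

Because the result is an ordinary one-row-per-element table, it can be joined like any other, which may help with the follow-up joins mentioned in the comments.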

For more information on


2 Comments

Wow, thanks for the comprehensive answer. I have to admit it looks really tricky, because after I retrieve those strings (every string starts with "XXX - ", followed by a string which may be 5 or 100 characters long) I want to use joins etc., so it may get much more complicated for me, but thank you once again! You asked in a comment above what that character in the picture is - I have no idea... if there is a way to check that, please let me know ;)
Can you show me the hive definition for this table?
