
I'm running Hive 0.13. I have a column containing strings that look like this:

a:ABC,b:SDSD,c:213123#a:SDS,b:UIOU,c:89797#a:TYRQQQW,b:UIOUIOYYO,c:546654
a:DFSS,b:TYRTTN,c:12323#a:HJH,b:YTUUUTYUTYT,c:67890
a:TYY,b:OPIUIU,c:86768

They can be of any length; each set of a, b, c values is always separated from the next by a '#'.

Now, what I'm trying to do is extract only the b values, like:

b:SDSD,b:UIOU,b:UIOUIOYYO
b:TYRTTN,b:YTUUUTYUTYT

What I've been trying is something like:

regexp_replace(column,'^channel:+[A-Z]{3,10},','')

I.e., replace every value that isn't b:... with an empty string, but this isn't working.

Could someone please correct me or suggest a better way?

Thanks.

2 Answers

[^b]:[^,]*,?

Try this. Replace each match with an empty string. See the demo.

https://regex101.com/r/wU7sQ0/27
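To see what this pattern does to the strings from the question, here is a minimal sketch in Python. Python's `re` module is only a stand-in for Hive's regex engine here, on the assumption that both treat this simple pattern the same way, and `keep_b` is a hypothetical helper name. Note that `[^,]*` runs across the '#', so each match can swallow a `c:...#a:...,` pair in one go, and the result keeps a trailing comma that still has to be stripped.

```python
import re

# Pattern from the answer: any key other than "b", its value, and an optional comma.
PATTERN = re.compile(r"[^b]:[^,]*,?")


def keep_b(value: str) -> str:
    """Delete every non-b key:value pair, then drop the leftover trailing comma."""
    stripped = PATTERN.sub("", value)
    return stripped.rstrip(",")


rows = [
    "a:ABC,b:SDSD,c:213123#a:SDS,b:UIOU,c:89797#a:TYRQQQW,b:UIOUIOYYO,c:546654",
    "a:DFSS,b:TYRTTN,c:12323#a:HJH,b:YTUUUTYUTYT,c:67890",
    "a:TYY,b:OPIUIU,c:86768",
]

for row in rows:
    print(keep_b(row))
```

In Hive the equivalent would presumably be a `regexp_replace(column, '[^b]:[^,]*,?', '')` call wrapped in a second replace for the trailing comma; that form is untested here.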


8 Comments

Thanks for this. But how would you approach this if, in place of a,b & c, we had words. Say 'a' was supplier,'b' was agent and 'c' was player.
@FenderBender you can use negative lookahead ^((?!agent).)*
Thanks. A slightly different version of this worked for me. But you certainly got me thinking in the right direction.
@FenderBender can you please share what worked for you
@vks - indeed, but one lives in hope! It's just that I'm interested in learning and would, just as you did at the time, like to know what the OP's ultimate solution was! I'm a bit of a dog-with-a-bone that way! :-)
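The version that finally worked for the OP was never posted in this thread. As one possible reading of the negative-lookahead idea for word keys such as supplier, agent and player, here is a hedged sketch in Python (again standing in for Hive's regex engine). The helper `keep_key` and the sample row are made up for illustration. The lookahead is placed right after each separator, so any pair whose key is not the wanted one gets deleted, and the leading comma left behind is stripped afterwards.

```python
import re


def keep_key(value: str, key: str) -> str:
    """Delete every key:value pair whose key is not `key` (hypothetical helper)."""
    # (?:^|[,#])  start of string or a separator
    # (?!key:)    negative lookahead: skip pairs that carry the wanted key
    # \w+:[^,#]*  the unwanted key and its value, up to the next separator
    pattern = rf"(?:^|[,#])(?!{re.escape(key)}:)\w+:[^,#]*"
    return re.sub(pattern, "", value).lstrip(",")


# Made-up sample in the shape described in the comment above.
sample = "supplier:ABC,agent:SDSD,player:213123#supplier:SDS,agent:UIOU,player:89797"
print(keep_key(sample, "agent"))  # -> agent:SDSD,agent:UIOU
```

Because the value part is `[^,#]*` rather than `[^,]*`, a match stops at the '#' and never spans two sets.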

There's a simple way of doing this using REGEXP_MATCHES. Note that this is PostgreSQL syntax rather than Hive; all of the code below can be found here.

CREATE TABLE test
(
  id SMALLINT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  str TEXT NOT NULL
);

Your data:

INSERT INTO test (str) VALUES
('a:ABC,b:SDSD,c:213123#a:SDS,b:UIOU,c:89797#a:TYRQQQW,b:UIOUIOYYO,c:546654'),
('a:DFSS,b:TYRTTN,c:12323#a:HJH,b:YTUUUTYUTYT,c:67890'),
('a:TYY,b:OPIUIU,c:86768');

Instead of DELETE-ing what you don't want, try SELECT-ing the text that you do want as follows:

SELECT
  id,
  UNNEST(REGEXP_MATCHES(str, '(b:[A-Z]+)', 'g')) AS ext
FROM
  test;

Result:

id  ext
1   b:SDSD
1   b:UIOU
1   b:UIOUIOYYO
2   b:TYRTTN
2   b:YTUUUTYUTYT
3   b:OPIUIU

and to concatenate as strings:

SELECT
  id,
  STRING_AGG(ext, ',') AS result
FROM
(
  SELECT
    id,
    UNNEST(REGEXP_MATCHES(str, '(b:[A-Z]+)', 'g')) AS ext
  FROM
    test
) AS t
GROUP BY id
ORDER BY id;

Result:

id  result
1   b:SDSD,b:UIOU,b:UIOUIOYYO
2   b:TYRTTN,b:YTUUUTYUTYT
3   b:OPIUIU
