2

My record looks like this:

0x0000110PPPP111KZY0 H123456789 XYZ 000000000000000000607532030000607532000060753203002014101707199999

I am looking for a regex that splits the first 3 characters (0x0) into one field of a Hive table, the rest (000110PPPP111KZY0) into a second field, and so on. The file is fixed-length with no delimiter.

2
  • Can you give an example of the result you want? At least 2 or 3. Commented Jul 29, 2015 at 2:40
  • How do you want the third field? Please provide a sample result for the given record. Commented Jul 29, 2015 at 6:21

2 Answers

1

I have no experience with Hadoop or Hive, but the following regex should do what I believe you're looking for.

/(\dx\d)(.*)/

This captures 0x0 in the first capture group and everything after it in the second capture group. If you only want the characters that immediately follow 0x0 (so none of H123456789 or the fields after it), use /(\dx\d)([^ ]*)/ instead, which stops at the first space.
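
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the sample record from the question. It uses Python's `re` module purely for illustration (the answer itself names no engine), and it assumes the match is anchored at the start of the record, which is what `re.match` does.

```python
import re

# Sample record copied from the question.
record = ("0x0000110PPPP111KZY0 H123456789 XYZ "
          "000000000000000000607532030000607532000060753203002014101707199999")

# Pattern 1: the 3-character prefix, then everything else on the line.
prefix_all, rest_all = re.match(r"(\dx\d)(.*)", record).groups()

# Pattern 2: the same prefix, then only the characters up to the first space.
prefix_tok, rest_tok = re.match(r"(\dx\d)([^ ]*)", record).groups()

print(prefix_all, "|", rest_all)
print(prefix_tok, "|", rest_tok)
```

With the first pattern the second group runs to the end of the record, spaces included. With the second pattern it stops right before the space that precedes H123456789.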

If I misunderstood what you're looking for, can you clarify exactly which part of the record you'd like to capture? Thanks!


0
SELECT
  regexp_extract(data, '^(\\dx\\d).*', 1),
  regexp_extract(data, '^\\dx\\d(.*)', 1)
FROM (SELECT '0x0000110PPPP111KZY0 ' AS data) a;

This query returns a single Hive row with two fields (the second keeps the trailing space from the sample literal):

0x0 000110PPPP111KZY0 
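
Since the question describes a fixed-length layout, another option is to spell out each field's width in the pattern rather than matching on `\dx\d`. The sketch below is an illustration in Python, not part of the original answer. The helper name `split_fixed` is hypothetical, and only the widths 3 and 17 come from the question; anything after those two fields is returned unsplit because the question does not give the remaining widths.

```python
import re

def split_fixed(record, widths):
    """Split the leading fixed-width fields off a record.

    Returns one string per width, plus a final string holding whatever
    is left of the record after those fields.
    """
    # One capture group of exactly w characters per field, then the remainder.
    pattern = "^" + "".join("(.{%d})" % w for w in widths) + "(.*)$"
    match = re.match(pattern, record)
    if match is None:
        raise ValueError("record is shorter than the declared field widths")
    return list(match.groups())

# Sample record copied from the question.
record = ("0x0000110PPPP111KZY0 H123456789 XYZ "
          "000000000000000000607532030000607532000060753203002014101707199999")

# Widths 3 and 17 are taken from the question; later fields are left unsplit.
fields = split_fixed(record, [3, 17])
print(fields[0])
print(fields[1])
```

The same pattern shape should carry over to Hive, for example `regexp_extract(data, '^(.{3})(.{17})', 2)` for the second field, though that call is an untested assumption here.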
