2

I'm having a problem with the regex syntax in Mysql when it comes to matching whitespaces in Strings.

I have a database of Zipcodes in the format:
1111 AA CITYNAME or 1111 CITYNAME.

From this, I want to extract the zipcode and the cityname, I used the following code:

DROP FUNCTION IF EXISTS GET_POSTALCODE;
CREATE FUNCTION GET_POSTALCODE(input VARCHAR(255))
  RETURNS VARCHAR(255)
  BEGIN
    DECLARE output VARCHAR(255) DEFAULT '';
    IF input LIKE '^[1-9][0-9]{3}[[:blank:]][A-Z]{2}[[:blank:]]%'
    THEN
      SET output = SUBSTRING(input, 1, 7);
    ELSE
      SET output = SUBSTRING(input, 1, 4);
    END IF;
    RETURN output;
  END

I would expect the result for a input string of 9741 NE Groningen to be split up into 9741 NE and Groningen.

But instead I get 9741 and NE Groningen.

I have tried all sorts of things to match the whitespace, which I think is the problem. I tried:

  • [[:blank:]]
  • [:blank:]
  • [[:space:]]
  • [:space:]
  • and the \s method

[:space:] Should match all whitespaces, but yet again, same result.

Nothing I try seems to work, could you maybe point me in the right direction?

Thank you!

5
  • is this "AA" of "1111 AA CITYNAME" always two characters long or can it be more or less? Commented Jun 2, 2016 at 21:51
  • The letters are always 2, not more, but in some cases the 2 letters aren't supplied, and the 4 numbers are immediately followed by a space and the cityName Commented Jun 2, 2016 at 21:54
  • okay - i dont know further right now but perhaps this will point you in a better direction: \s+(?=\S*+$) will give you the last whitespace, but i dont get how to become the part after and before that. Commented Jun 2, 2016 at 22:17
  • Your task is best done before inserting the data into a database. Commented Jun 9, 2016 at 3:27
  • Is this code actually used? According to MySQL specs 'like' does not work on regular expressions: dev.mysql.com/doc/refman/5.7/en/pattern-matching.html Commented Aug 12, 2017 at 13:09

1 Answer 1

4

Foreward

You can enable PCRE if you use this library

Description

^([0-9]{4}(?:[[:blank:]]+[a-z]{2}(?=[[:blank:]]))?)[[:blank:]](.*$)

Regular expression visualization

This regular expression will do the following:

  • find the 4 digit codes followed by an optional two characters
  • match the rest of the string which should be a city name

Example

Live Demo

https://regex101.com/r/sE3xN7/4

Sample text

Note your examples only had 4 digit codes, so I took the liberty of adding an additional digit

1111 AA CITYNAME1
2222 CITYNAME2
3333 Las Vegas
4444 BB Las Vegas
9741 NE Groningen

Sample Matches

MATCH 1
1.  [0-7]   `1111 AA`
2.  [8-17]  `CITYNAME1`

MATCH 2
1.  [18-22] `2222`
2.  [23-32] `CITYNAME2`

MATCH 3
1.  [33-37] `3333`
2.  [38-47] `Las Vegas`

MATCH 4
1.  [48-55] `4444 BB`
2.  [56-65] `Las Vegas`

MATCH 5
1.  [66-73] `9741 NE`
2.  [74-83] `Groningen`

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      [[:blank:]]+             whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
      [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
----------------------------------------------------------------------
        [[:blank:]]               whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [[:blank:]]                 whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    $                        before an optional \n, and the end of a
                             "line"
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.