
I have a collection of all-uppercase address names and numbers, and I want to extract only the first address number encountered in each address. The following examples show what I would like to extract from each:

  • 80 ROSE COTTAGE -> 80
  • 80A ROSE COTTAGE -> 80A
  • 80 A ROSE COTTAGE -> 80 A
  • 80ROSE COTTAGE -> 80 (accidental no-space)
  • [ANY OTHER TEXT] 80 ROSE COTTAGE -> 80

I have found some similar questions answered here and elsewhere on the internet, but they always deal with an address as a whole rather than specifically with the address name and number:

  • Match each address from the address number to the 'street type'
  • regex street address match
  • Regular Expression: Any character that is NOT a letter or number
  • javascript regular expressions address number
  • JavaScript regex to validate an address

The last one mentions a lookahead, which led me to add a negative lookahead to my JavaScript regex that rejects any alphanumeric character following a potential single letter (e.g. 80 A). However, without adding the alternative "digits only" group (\d+), my fourth example above does not return just the number.

((?:\d+\s*[A-Z]?(?![A-Z0-9]))|(?:\d+))
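One way to check the two-alternative pattern against the examples is `String.prototype.match`, which (without the `g` flag) returns only the leftmost match. The sketch below assumes each address is a separate string, wraps the whole alternation in one capture group so the parentheses balance, and uses a hypothetical helper name, `firstNumber`:

```javascript
// Two-alternative pattern from the question, wrapped in a single capture group.
// No "g" flag, so match() returns only the first (leftmost) match.
const twoGroups = /((?:\d+\s*[A-Z]?(?![A-Z0-9]))|(?:\d+))/;

// Hypothetical helper: the first address number in the string, or null.
function firstNumber(address, pattern) {
  const m = address.match(pattern);
  return m ? m[0] : null;
}

const examples = [
  "80 ROSE COTTAGE",
  "80A ROSE COTTAGE",
  "80 A ROSE COTTAGE",
  "80ROSE COTTAGE",
  "[ANY OTHER TEXT] 80 ROSE COTTAGE",
];

for (const address of examples) {
  console.log(`${address} -> ${firstNumber(address, twoGroups)}`);
}
```

For "80ROSE COTTAGE" every attempt of the first alternative fails the lookahead, so the engine falls through to the bare `\d+` alternative and returns 80.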

Is there a way to combine these two groups into a single regular expression? Or is this not possible in JavaScript's regex implementation?

Any help with this would be greatly appreciated.

  • Does it really have to be that complicated? An address usually has only one number, which must be the number you are looking for. If it is directly followed by a letter, as in 80A, or followed by a letter surrounded by spaces, as in 80 A, then that is what you are looking for. Commented Sep 22, 2014 at 11:04
  • Hi, thanks for your reply. The dataset is not perfect: as in my last two examples, sometimes the number is not at the start, or a word follows the number without a separating space. Without the lookahead, I found that 80ROSECOTTAGE would result in 80R when it should be just 80. So I have added the digits-only alternative group for now. This works, but I am wondering whether there is a way to combine them without having two groups. Commented Sep 22, 2014 at 11:23
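The 80R problem described in the comment can be reproduced by comparing a pattern without the lookahead to one with it. This is a minimal sketch; the variable names are illustrative only:

```javascript
const address = "80ROSECOTTAGE";

// Without a lookahead, [A-Z]? happily takes the first letter of the word.
const noLookahead = /\d+\s*[A-Z]?/;
// With a negative lookahead, the letter is kept only if no letter or digit follows it.
const withLookahead = /\d+\s*[A-Z]?(?![A-Z0-9])/;

const a = address.match(noLookahead);
const b = address.match(withLookahead);

console.log(a ? a[0] : null); // 80R
console.log(b ? b[0] : null); // null: every way of matching fails the lookahead
```

On its own the lookahead version finds nothing in this string, because even a bare "80" (or, after backtracking, "8") is immediately followed by a letter or digit. That is why a separate digits-only alternative was needed to recover 80.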

1 Answer

(\d+\s*(?:[A-Z](?![A-Z]))?)

You can try this.

See the demo:

http://regex101.com/r/kM7rT8/13
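If the answer's pattern is used with `String.prototype.match` instead of a substitution, it returns the first number directly. Because `\s*` sits outside the optional group, an address such as 80 ROSE COTTAGE matches as "80 " with a trailing space, so this sketch trims the result. The helper name `extractNumber` is hypothetical:

```javascript
// Pattern from the answer above.
const answerPattern = /(\d+\s*(?:[A-Z](?![A-Z]))?)/;

// Hypothetical helper: returns the first address number, or null if there is none.
function extractNumber(address) {
  const m = address.match(answerPattern);
  // \s* is outside the optional group, so "80 ROSE" matches as "80 "; trim it.
  return m ? m[1].trim() : null;
}

console.log(extractNumber("80 ROSE COTTAGE"));   // 80
console.log(extractNumber("80A ROSE COTTAGE"));  // 80A
console.log(extractNumber("80 A ROSE COTTAGE")); // 80 A
console.log(extractNumber("80ROSE COTTAGE"));    // 80
console.log(extractNumber("A 80 A"));            // 80 A
```

Using `match` rather than `replace` also means surrounding text such as the leading "A" in "A 80 A" is not part of the result.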


Comments

Hi, I am attempting to apply your suggestion to the first group in my regular expression (so that I can drop the second). The best I can manage is: \d+\s*[A-Z]?(?![A-Z]{2,}) This works fine, except that it drops the zero from 80 when the example text is '80ROSE COTTAGE'. Did you mean to apply this some other way? What I require is: [at least one digit] followed by [any amount of whitespace, or none] followed by [any one letter A to Z that is not followed by another A to Z] (or, failing the last part, just the original digits). I hope that makes sense.
@Derek you have to use the .replace function and replace with an empty string (''). Just use the regex given and nothing else.
Hi, sorry, I didn't realise that was the case. Am I right in thinking that carrying out a replace with this on 'A 80 A' would return 'A 80 A' as opposed to '80 A'?
@Derek it would return A 80 A
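The requirement spelled out in the first comment (digits, optional whitespace, then one letter not followed by another letter, or else just the digits) can be written as a single pattern by making the whitespace-plus-letter part one optional group. This is a sketch, not the answerer's pattern: it moves `\s*` inside the optional group so a bare number carries no trailing space, and it keeps the question's `[A-Z0-9]` lookahead. The sketch also reproduces the dropped zero from the comment's attempt, which happens because `\d+` backtracks to "8" and the lookahead then sees "0ROSE". The helper name `firstMatch` is hypothetical:

```javascript
// The attempt from the comment: backtracks \d+ to "8" on "80ROSE COTTAGE".
const attempt = /\d+\s*[A-Z]?(?![A-Z]{2,})/;
// Digits, optionally followed by (whitespace + one letter not followed by a letter or digit).
const combined = /\d+(?:\s*[A-Z](?![A-Z0-9]))?/;

// Hypothetical helper: the first match as a string, or null.
function firstMatch(address, pattern) {
  const m = address.match(pattern);
  return m ? m[0] : null;
}

console.log(firstMatch("80ROSE COTTAGE", attempt));  // 8
console.log(firstMatch("80ROSE COTTAGE", combined)); // 80

const others = [
  "80 ROSE COTTAGE",
  "80A ROSE COTTAGE",
  "80 A ROSE COTTAGE",
  "[ANY OTHER TEXT] 80 ROSE COTTAGE",
  "A 80 A",
];
for (const address of others) {
  console.log(`${address} -> ${firstMatch(address, combined)}`);
}
```

When the optional group cannot be satisfied, the engine simply skips it and keeps the full run of digits, so no separate `\d+` alternative is needed.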
