Need regex to ignore a specific string of only numbers

Question

I'm using a third-party application that uses regex to get matches. It is set to return only the first match, since it is only looking for one piece of information per page. I can't change this setting unless I want it to return all matches as an array, which I rarely want. That option doesn't suit the match I want here.

What I want it to find are ID codes. It just so happens that all the IDs start with 10 and are followed by 4 more digits.

Example:

So I wrote this regex:

10[0-9]{4}

The only problem is that there is a .js file in the header named 10022008.js, and since the application automatically takes the first match, all the IDs get set to the match from that file name.
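A minimal reproduction of the problem, sketched in Python's `re` module rather than the application's own engine. The `page` string is a hypothetical snippet invented for illustration, since the original example is not shown.

```python
import re

# Hypothetical page snippet (an assumption, not from the original post):
# the script file name appears before the real ID.
page = '<script src="10022008.js"></script> <span>#104321</span>'

# The original pattern, taking only the first match as the application does.
first = re.search(r"10[0-9]{4}", page)
print(first.group())  # 100220, the first six digits of the file name
```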

How do I get the regex to ignore that string of numbers, and that string only? None of the similar "ignore" patterns I found while searching have worked.

Are the others surrounded by whitespace? Does a . follow any of them? The simple solution is to use \s in the pattern, as in \s+10[0-9]{4}\s. Post some examples of where the IDs would occur.
– Michael Berkowski, Commented Aug 17, 2012 at 18:13
Sometimes they are, but sometimes they start with #. It varies too, which is annoying.
– Travis Crum, Commented Aug 17, 2012 at 18:24
So you don't know the problem itself. Check your extraction rules against them, and you need to tag a regex question with the language you are using!
– Anirudha, Commented Aug 17, 2012 at 18:25
The most common regex I use looks like this: id="imaunicorn"><a href="(.*?)" id="unicornfriend
– Travis Crum, Commented Aug 17, 2012 at 18:48
Sorry for not being specific, @Anirudha. The extraction rules are whatever I define them to be.
– Travis Crum, Commented Aug 20, 2012 at 13:58

Bohemian · Accepted Answer · 2012-08-17 18:14:21Z

5

Add the "word boundary" regex \b to each end of your regex:

\b10[0-9]{4}\b

The word boundary matches between any "word" character (i.e. \w, which is [0-9a-zA-Z_]) and any non-word character, or vice versa. It is zero-width, so it won't add any characters to your capture.
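A short sketch of the word-boundary pattern, using Python's `re` module as a stand-in for the application's engine. The `page` string is a hypothetical snippet, not taken from the original post.

```python
import re

# Hypothetical page snippet (an assumption for illustration).
page = '<script src="10022008.js"></script> <span>#104321</span>'

pattern = re.compile(r"\b10[0-9]{4}\b")

# "10022008" is skipped: after six digits the next character is another
# digit, so there is no word boundary at that position.
match = pattern.search(page)
print(match.group())  # 104321

# Caveat: a file name that is exactly six digits long would still match,
# because "." is a non-word character and so a boundary exists before it.
short_name = pattern.search("100220.js")
print(short_name.group())  # 100220
```

The pattern works here only because the file name has more than six digits; it does not exclude the file name as such.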


5 Comments

Travis Crum Over a year ago

This one works great! I didn't think of that. I tried a lookahead, but it kept canceling all the matches it found and returned nothing.

Michael Berkowski Over a year ago

Doesn't \b also match the strings followed by . in .js?

Travis Crum Over a year ago

so far it has not, I'm running the crawl right now so I'll know in a few minutes ^_^

Bohemian Over a year ago

@Michael Yes, but in his example the js file has more than 6 chars in its name, so it won't match it

Michael Berkowski Over a year ago

@Bohemian +1 Ah, I didn't look closely at it. I thought the issue was that it was exactly the same length, hence my comment on the OP about whitespace boundaries.

KRyan · Answer · 2012-08-17 18:14:50Z

2

A negative lookahead is one solution. It may not be the most efficient, but I think it is the most readable.

10\d{4}(?!08\.js)

This will match 10 followed by any four digits, provided that those digits are not followed by 08.js.
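A quick sketch of the lookahead pattern in Python's `re` module. The `page` string is a hypothetical snippet, and other regex engines (including the asker's application) may handle lookaheads differently.

```python
import re

# Hypothetical page snippet (an assumption for illustration).
page = '<script src="10022008.js"></script> <span>#104321</span>'

# (?!08\.js) is a negative lookahead: the six digits must not be
# immediately followed by the literal text "08.js".
match = re.search(r"10\d{4}(?!08\.js)", page)
print(match.group())  # 104321
```

This excludes only that one file name, which is what the question asks for, but it has to be updated if the file name ever changes.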


1 Comment

Travis Crum Over a year ago

This works in the Ruby tester I use, but not in the app I am using. I'm not entirely sure how regex in Blosm differs from Ruby, Perl, etc.

Jason Carter · Answer · 2012-08-17 18:14:25Z

-1

I'm not sure what the input data looks like, but could you limit it to the beginning and end of line?

^10[0-9]{4}$
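A small sketch of how the anchored pattern behaves in Python's `re` module, assuming two hypothetical inputs: a page where the ID sits inside a line, and text where the ID is on a line of its own.

```python
import re

# Hypothetical inputs (assumptions for illustration).
page = '<script src="10022008.js"></script> <span>#104321</span>'
one_per_line = "header\n104321\nfooter"

# The ID is embedded in markup, so the anchors never line up with it.
embedded = re.search(r"^10[0-9]{4}$", page)
print(embedded)  # None

# With re.MULTILINE, ^ and $ match at line breaks, so an ID that
# occupies a whole line is found.
standalone = re.search(r"^10[0-9]{4}$", one_per_line, re.MULTILINE)
print(standalone.group())  # 104321
```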


3 Comments

Anirudha Over a year ago

^ and $ would not work because the match is somewhere within the file, not at the start. Use \b instead!

Jason Carter Over a year ago

I guess I didn't understand the question. I assumed the input data was on separate lines, and was being processed as such.

Travis Crum Over a year ago

Thanks for answering my first question, but I tried this too and it didn't work. Anirudha is right about using \b. I don't know what I was thinking, but I tried to use \b to specify what I wanted to ignore rather than what I wanted to find...
