3

I'm pretty new to regex, but I have done some studying. I ran into a problem that might turn out to be impossible to solve with regex, so I need some advice.

I have the following string:

some text key 12, 32, 311 ,465 and 345. some other text dog 612, 
12, 32, 9 and 10. some text key 1, 2.

I'm trying to figure out whether it is possible (using regex only) to extract only the numbers 12 32 311 465 345 1 2, as a set of individual matches.

When I approached this problem, I tried to look for a pattern that matches only the relevant results. So I came up with:

  • get numbers that have a prefix of "key" and do NOT have a prefix of "dog".

But I'm not sure if it is even possible. I know that for the first number I can use (?<=key )+[\d]+ and get it as a result, but for the rest of the numbers (i.e. the second through the fifth), can I "use" the key prefix again?
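
To make the limitation concrete, here is a minimal Java sketch. The class name FirstNumberOnly and the helper findAll are made up for illustration, and the pattern is a simplified form of the look-behind above (without the extra + and the character class). Run against the sample string, it only picks up the first number after each key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumberOnly {
    // Hypothetical helper: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "some text key 12, 32, 311 ,465 and 345. some other text dog 612, "
                + "12, 32, 9 and 10. some text key 1, 2.";
        // The look-behind requires "key " directly before the digits,
        // so only the number right after each "key" qualifies.
        System.out.println(findAll("(?<=key )\\d+", input)); // prints [12, 1]
    }
}
```

The numbers 32, 311, 465, 345 and 2 are skipped because the text immediately before them is a separator, not "key ".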

  • Could you try to rewrite your question? It's not clear. Do you want to extract numbers or digits? What would you like to get from 11, 51, 1dog, dog1? Commented Jul 23, 2015 at 6:38
  • Extracting numbers. The point is to extract only numbers that follow the key string and NOT the dog string. Commented Jul 23, 2015 at 6:46
  • Are you expecting a single string match (i.e. "12, 32, 311 ,465 and 345"), or are you looking for a set of individual matches (i.e. {12,32,311,465,345})? Commented Jul 23, 2015 at 6:50
  • I'm expecting a set of individual matches (edited my question). Thank you. Commented Jul 23, 2015 at 6:54

4 Answers

3

In Java, you can make use of a constrained-width look-behind, which accepts a {n,m} limiting quantifier.

So, you can use

(?<=key(?:(?!dog)[^.]){0,100})[0-9]+

Or, if key and dog are whole words, use \b word boundary:

String pattern = "(?<=\\bkey\\b(?:(?!\\bdog\\b)[^.]){0,100})[0-9]+";

The only problem that may arise is when the distance between dog or key and the numbers is bigger than m. You may increase it to 1000, and I think that would work for most cases.

Sample IDEONE demo

String str = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n12, 32, 9 and 10. some text key 1, 2.";
String str2 = "some text key 1, 2, 3 ,4 and 5. some other text dog 6, 7, 8, 9 and 10. some text, key 1, 2 dog 3, 4 key 5, 6";
Pattern ptrn = Pattern.compile("(?<=key(?:(?!dog)[^.]){0,100})[0-9]+");
Matcher m = ptrn.matcher(str);
while (m.find()) {
   System.out.println(m.group(0));
}
System.out.println("-----");
m = ptrn.matcher(str2);
while (m.find()) {
   System.out.println(m.group(0));
}
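
As a rough illustration of the limit m mentioned above, the sketch below runs the same look-behind with two different upper bounds. The class name LookBehindBound, the helper findAll, and the short test string are made up for this example, and the bound of 5 is deliberately tiny to make the effect visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindBound {
    // Hypothetical helper: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "key 1, 2, 3, 4";
        // With {0,5}, at most 5 characters may sit between "key" and a number,
        // so the numbers further away are not matched.
        System.out.println(findAll("(?<=key(?:(?!dog)[^.]){0,5})[0-9]+", input));   // prints [1, 2]
        // With {0,100}, every number in this short string is within range.
        System.out.println(findAll("(?<=key(?:(?!dog)[^.]){0,100})[0-9]+", input)); // prints [1, 2, 3, 4]
    }
}
```

So the upper bound should be chosen larger than the longest run of text you expect between key and its last number.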

7 Comments

The assumption being made here is that dog and key do not appear in the same "sentence" (delimited by .).
Correct. That is what the sample input implies.
My issue with this solution is that the String: key 1, 2 dog 3, 4 key 5, 6 will not work. I added it to the question example.
@GilPeretz: Write that test case in your question.
@nhahtdh added it. Thank you.
2

I wouldn't recommend using code that you can't understand and customize, but here is my one-pass solution, using the method described in this answer of mine. If you want to understand the construction method, please read the other answer.

(?:key(?>\s+and\s+|[\s,]+)|(?!^)\G(?>\s+and\s+|[\s,]+))(\d+)

Compared to the method described in the other post, I dropped the look-ahead, since in this case, we don't need to check against a suffix.

The separator here is (?>\s+and\s+|[\s,]+). It currently allows "and" with spaces on both sides, or any mix of spaces and commas. I use (?>pattern) to inhibit backtracking, so the order of the alternation is significant. Change it back to (?:pattern) if you want to modify it and you are unsure of what you are doing.
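
To show why the order matters inside (?>pattern), here is a small sketch. The class name SeparatorOrder and the two "swapped" variants are hypothetical and only exist for this comparison; each pattern is the separator followed by \d+, checked against the text " and 345" as a whole:

```java
import java.util.regex.Pattern;

class SeparatorOrder {
    // Separator as written above: the "and" alternative comes first.
    static final String ORIGINAL = "(?>\\s+and\\s+|[\\s,]+)\\d+";
    // Hypothetical variant: alternatives swapped, still atomic.
    static final String SWAPPED_ATOMIC = "(?>[\\s,]+|\\s+and\\s+)\\d+";
    // Hypothetical variant: alternatives swapped, ordinary non-capturing group.
    static final String SWAPPED_PLAIN = "(?:[\\s,]+|\\s+and\\s+)\\d+";

    // True if the regex matches the entire text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String text = " and 345";
        System.out.println(matchesWhole(ORIGINAL, text));       // prints true
        // [\s,]+ grabs the leading space, the atomic group commits to it,
        // and the engine may not go back to try the "and" alternative.
        System.out.println(matchesWhole(SWAPPED_ATOMIC, text)); // prints false
        // Without the atomic group the engine backtracks into the second alternative.
        System.out.println(matchesWhole(SWAPPED_PLAIN, text));  // prints true
    }
}
```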

Sample code:

String input = "some text key 12, 32, 311 ,465 and 345. some other text dog 612,\n12, 32, 9 and 10. some text key 1, 2. key 1, 2 dog 3, 4 key 5, 6. key is dog 23, 45. key 4";
Pattern p = Pattern.compile("(?:key(?>\\s+and\\s+|[\\s,]+)|(?!^)\\G(?>\\s+and\\s+|[\\s,]+))(\\d+)");
Matcher m = p.matcher(input);
List<String> numbers = new ArrayList<>();

while (m.find()) {
    numbers.add(m.group(1));
}

System.out.println(numbers);

Demo on ideone

1 Comment

This is really the correct regex +1. Was going to post similar but dropped due to loose nature of input data structure.
1

You can do it in 2 steps.

(?<=key\\s)\\d+(?:\\s*(?:,|and)\\s*\\d+)*

Grab all the numbers. See demo.

https://regex101.com/r/uK9cD8/6

Then split or extract \\d+ from it. See demo.

https://regex101.com/r/uK9cD8/7
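
In Java, the two steps could look roughly like this. It is only a sketch: the class name TwoStepExtract and the method extract are made up for illustration, and the sample string is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepExtract {
    // Step 1: a run of numbers right after "key", separated by commas or "and".
    private static final Pattern BLOCK =
            Pattern.compile("(?<=key\\s)\\d+(?:\\s*(?:,|and)\\s*\\d+)*");
    // Step 2: the individual numbers inside such a run.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher block = BLOCK.matcher(input);
        while (block.find()) {
            // Scan only the matched run, so numbers after "dog" are never seen.
            Matcher number = NUMBER.matcher(block.group());
            while (number.find()) {
                numbers.add(number.group());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String input = "some text key 12, 32, 311 ,465 and 345. some other text dog 612, "
                + "12, 32, 9 and 10. some text key 1, 2.";
        System.out.println(extract(input)); // prints [12, 32, 311, 465, 345, 1, 2]
    }
}
```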

3 Comments

I'm afraid two steps is not an option for me. Is it possible in a single step?
@GilPeretz 2 steps is a sure-shot way of achieving what you want. You cannot do it in a single regex with 100% accuracy anyway.
Initially, you were looking for a single regex to do it. I see the limitations of a constrained width lookbehind and nested lookarounds scared you off. Anyway, it's great you have solutions to choose from!
1

You can use a positive look-behind, which ensures that your sequence is not preceded by any word except key:

(?<=key)\s(?:\d+[\s,]+)+(?:and )?\d+

Note that here you don't need to use a negative look-behind for dog, because this regex will only match if your sequence is preceded by key.

See demo https://regex101.com/r/gZ4hS4/3

9 Comments

At the end it should be \d+, otherwise it would not match a number with two or more digits in the last position.
@Rahul Actually, after "and" we write one digit! But it's better to use \d+!
It will not match completely if you use only \d in some text key 12, 32, 311 ,465 and 345. some other text dog 612, 12, 32, 9 and 10. some text
Now your regex is good but earlier you used (?<=key)[^\d]*[\d, ]+(?:and )?\d.
@Kasramvd I'm looking for a set of individual results rather than one single result. Is it possible?
