2

Is there a way to check whether a string contains an entire WORD, rather than just a substring?

Envision the following scenario:

public class Test {
    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        String search = "a";

        int counter = 0;
        for(int i = 0; i < text.length; i++) {
            if(text[i].toLowerCase().contains(search)) {
                counter++;
            }
        }

        System.out.println("Counter was " + counter);
    }
}

This evaluates to

Counter was 2

Which is not what I'm looking for, as there is only one instance of the word 'a' in the array.

The way I read it is as follows:

The if-test finds an 'a' in text[0], the 'a' corresponding to "this is [a]". However, it also finds occurrences of 'a' in "banana", and thus increments the counter.

How can I solve this to only include the WORD 'a', and not substrings containing a?

Thanks!

0

5 Answers

6

You could use a regex, using Pattern.quote to escape out any special characters.

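import java.util.regex.Pattern;

// \b is a word boundary; the surrounding .* is needed because matches() must match the whole string
String regex = ".*\\b" + Pattern.quote(search) + "\\b.*";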

int counter = 0;
for(int i = 0; i < text.length; i++) {
    if(text[i].toLowerCase().matches(regex)) {
        counter++;
    }
}

Note this will also find "a" in "this is a; pause" or "Looking for an a?" where a doesn't have a space after it.
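As a minimal sketch of the same idea, the pattern can be compiled once and applied with find(), which avoids the leading and trailing .* and, with the CASE_INSENSITIVE flag, the toLowerCase() call. The class and method names here (WholeWordRegex, countContaining) are made up for illustration. One caveat: \b only behaves as expected when the search term starts and ends with a word character.

```java
import java.util.regex.Pattern;

class WholeWordRegex {
    // Hypothetical helper: counts how many strings contain `word` as a whole word.
    static int countContaining(String[] texts, String word) {
        // Compile once; find() looks for the pattern anywhere in the input.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        int counter = 0;
        for (String t : texts) {
            if (p.matcher(t).find()) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        System.out.println("Counter was " + countContaining(text, "a")); // Counter was 1
    }
}
```

Because find() does not rely on . matching every character, this version also works when the text contains line breaks.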


3 Comments

if(text[i].toLowerCase().matches(regex)) {
Thanks! However, I get the message "the method quote(String) is undefined for type Pattern".
@northerner It was added in Java 5.0. Which version of Java are you using?
1

Could try this way:

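int counter = 0;
for (int i = 0; i < text.length; i++) {
    // Split the string into whitespace-separated words.
    String[] words = text[i].split("\\s+");
    for (String word : words) {
        if (word.equalsIgnoreCase(search)) {
            counter++;
            break; // count each string at most once
        }
    }
}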
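One limitation of splitting on whitespace is that punctuation stays attached to the tokens, so "a;" or "a?" would not equal "a". A possible variant, sketched below with a hypothetical SplitWordCount class, splits on runs of non-word characters instead. Note that \W treats only ASCII letters, digits, and underscore as word characters by default, so this assumes plain ASCII text.

```java
class SplitWordCount {
    // Hypothetical helper: counts strings that contain `word` as a separate token.
    static int countContaining(String[] texts, String word) {
        int counter = 0;
        for (String t : texts) {
            // "\\W+" splits on any run of non-word characters (spaces, punctuation, newlines).
            for (String token : t.split("\\W+")) {
                if (token.equalsIgnoreCase(word)) {
                    counter++;
                    break; // count each string at most once
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] text = {"this is a; pause", "banana", "Looking for an a?"};
        System.out.println("Counter was " + countContaining(text, "a")); // Counter was 2
    }
}
```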

0

If the words are separated by a space, then you can do:

if((" "+text[i].toLowerCase()+" ").contains(" "+search+" "))
{
   ...
}

This adds a space to each end of the original String, e.g. "this is a" becomes " this is a ".

It then searches for the word with flanking spaces, e.g. it searches for " a " when search is "a".
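The padding trick only recognises the space character as a separator. A rough sketch of one way to extend it, assuming that collapsing every run of whitespace (tabs, newlines) into a single space is acceptable for the input, is shown below. PaddedContains and containsWord are illustrative names, and the search term is lower-cased as well so that both sides of the comparison match.

```java
import java.util.Locale;

class PaddedContains {
    // Hypothetical helper: true if `word` appears in `text` delimited by whitespace.
    static boolean containsWord(String text, String word) {
        // Collapse tabs/newlines/multiple spaces to one space, then pad both ends.
        String normalized = " " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ") + " ";
        return normalized.contains(" " + word.toLowerCase(Locale.ROOT) + " ");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("this is a", "a"));          // true
        System.out.println(containsWord("banana", "a"));             // false
        System.out.println(containsWord("hello \nworld", "world"));  // true
    }
}
```

Punctuation directly after the word (as in "a;") is still not handled by this approach.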

5 Comments

What if the a is at the beginning or the end?
@Nadir That is exactly why we are adding spaces!
@Nadir See this : " " +text[i].toLowerCase()+ " "
Quite inefficient, having to create 2 new Strings for each check, when you can just use regex.
What if the word starts on a new line, like "hello \nworld"?
0
import java.util.Arrays;

// Counts how many times the word "a" occurs in a single string.
long count = Arrays.stream("this is a banana".split(" ")).filter(s -> s.equals("a")).count();
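The one-liner above counts occurrences of a word within a single string. To get the number of array elements that contain the word, as in the question, a stream version might look like the following sketch. StreamWordCount and countContaining are names invented for this example, and splitting on whitespace is an assumption about the input.

```java
import java.util.Arrays;

class StreamWordCount {
    // Hypothetical helper: counts strings in which any whitespace-separated token equals `word`.
    static long countContaining(String[] texts, String word) {
        return Arrays.stream(texts)
                .filter(t -> Arrays.stream(t.split("\\s+"))
                                   .anyMatch(w -> w.equalsIgnoreCase(word)))
                .count();
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        System.out.println("Counter was " + countContaining(text, "a")); // Counter was 1
    }
}
```

anyMatch stops at the first matching token, which mirrors the break in the loop-based versions.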

0

Of course, as others have written, you can start playing around with all kinds of patterns to match "words" out of "text".

But depending on the underlying problem you have to solve, this might be far from good enough. Are you facing the problem of finding some pattern in some string, or do you really want to interpret that text in the "human language" sense? When somebody writes down text, there may be subtle typos, strange characters, and all kinds of things that make it hard to really "find" a certain word in that text, unless you dive into the "language processing" aspect of things.

Long story short: if your job is to "locate certain patterns in strings", then all the other answers will do. But if your requirement goes beyond that, such as "some human will be using your application to 'search' huge data sets", then you had better stop now and consider turning to full-text search engines like Elasticsearch or Solr.
