Check if string contains word (not substring!)

Question

Is there a way to check whether a string contains an entire WORD, and not just a substring?

Envision the following scenario:

public class Test {
    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        String search = "a";

        int counter = 0;
        for(int i = 0; i < text.length; i++) {
            if(text[i].toLowerCase().contains(search)) {
                counter++;
            }
        }

        System.out.println("Counter was " + counter);
    }
}

This evaluates to

Counter was 2

Which is not what I'm looking for, as there is only one instance of the word 'a' in the array.

The way I read it is as follows:

The if-test finds an 'a' in text[0], the 'a' corresponding to "this is [a]". However, it also finds occurrences of 'a' in "banana", and thus increments the counter.

How can I solve this to only include the WORD 'a', and not substrings containing a?

Thanks!

Peter Lawrey · Accepted Answer · 2016-04-22 12:16:46Z

6

You could use a regex, using Pattern.quote to escape out any special characters.

String regex = ".*\\b" + Pattern.quote(search) + "\\b.*"; // \b is a word boundary

int counter = 0;
for(int i = 0; i < text.length; i++) {
    if(text[i].toLowerCase().matches(regex)) {
        counter++;
    }
}

Note this will also find "a" in "this is a; pause" or "Looking for an a?" where a doesn't have a space after it.
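To make the word-boundary behaviour concrete, here is a minimal, self-contained sketch of the same idea. It assumes a case-insensitive match is wanted, and it uses Matcher.find() so the surrounding ".*" is not needed. The class and method names (WordBoundaryDemo, countContainingWord) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: counts how many strings contain `search` as a whole word.
    static int countContainingWord(String[] texts, String search) {
        // Pattern.quote escapes regex metacharacters in the search term;
        // \b matches at a boundary between a word character and a non-word character.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(search) + "\\b",
                Pattern.CASE_INSENSITIVE);
        int counter = 0;
        for (String t : texts) {
            if (p.matcher(t).find()) { // find() looks for the pattern anywhere in t
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana", "this is a; pause", "Looking for an a?"};
        // "banana" is skipped; the other three contain the standalone word "a"
        System.out.println("Counter was " + countContainingWord(text, "a")); // Counter was 3
    }
}
```

Compiling the pattern once outside the loop also avoids recompiling the regex for every string, which String.matches does internally.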

edited Apr 22, 2016 at 12:16

answered Apr 22, 2016 at 12:07

Peter Lawrey


3 Comments

Guillaume Barré Over a year ago

if(text[i].toLowerCase().matches(regex)) {

yulai Over a year ago

Thanks! However, I get the message "the method quote(String) is undefined for type Pattern".

Peter Lawrey Over a year ago

@northerner it was added in Java 5.0, which version of Java are you using?

rev_dihazum · Accepted Answer · 2016-04-22 12:11:10Z

1

Could try this way:

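int counter = 0;
for (int i = 0; i < text.length; i++) {
    // Split on runs of whitespace so each element is one word
    String[] words = text[i].split("\\s+");
    for (String word : words) {
        if (word.equalsIgnoreCase(search)) {
            counter++;
            break; // count each string at most once
        }
    }
}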

answered Apr 22, 2016 at 12:11

rev_dihazum


dryairship · Accepted Answer · 2016-04-22 12:09:29Z

0

If the words are separated by a space, then you can do:

if((" "+text[i].toLowerCase()+" ").contains(" "+search+" "))
{
   ...
}

This pads the original String with one space on each side, e.g. "this is a" becomes " this is a ".

It then searches for the word with its flanking spaces, e.g. it searches for " a " when search is "a".
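A limitation of the padding trick is that it only recognises the space character as a separator. As a rough sketch, assuming words may also be separated by tabs or newlines, the text can be normalised first. The helper name containsWord is hypothetical, and punctuation is deliberately left untouched.

```java
class PaddedContainsDemo {
    // Hypothetical helper: true if `search` appears in `text` delimited by whitespace
    // (or by the start/end of the string). Punctuation is NOT treated as a delimiter.
    static boolean containsWord(String text, String search) {
        // Collapse every run of whitespace (spaces, tabs, newlines) into one space
        String normalized = text.toLowerCase().trim().replaceAll("\\s+", " ");
        // Pad both sides so words at the beginning or end are still flanked by spaces
        return (" " + normalized + " ").contains(" " + search.toLowerCase() + " ");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("this is a", "a"));          // true
        System.out.println(containsWord("banana", "a"));             // false
        System.out.println(containsWord("hello \nworld", "world"));  // true
        System.out.println(containsWord("this is a; pause", "a"));   // false: "a;" is not " a "
    }
}
```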

edited Apr 22, 2016 at 12:09

answered Apr 22, 2016 at 12:07

dryairship


5 Comments

Nadir Over a year ago

What if the a is at the beginning or the end?

dryairship Over a year ago

@Nadir That is exactly why we are adding spaces!

dryairship Over a year ago

@Nadir See this : " " +text[i].toLowerCase()+ " "

Nadir Over a year ago

Quite inefficient, having to create 2 new Strings for each check, when you can just use regex

Jinu P C Over a year ago

What if the word starts on a new line, like "hello \nworld"?

Michele Da Ros · Accepted Answer · 2016-04-22 12:15:03Z

0

Arrays.asList("this is a banana".split(" ")).stream().filter((s) -> s.equals("a")).count();
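The one-liner above counts occurrences of "a" within a single string. Adapted to the array from the question, where each string should be counted at most once, a sketch could look like the following. It assumes words are separated by whitespace and that case should be ignored; the class and method names are illustrative only.

```java
import java.util.Arrays;

class StreamWordCountDemo {
    // Hypothetical helper: counts array entries that contain `search`
    // as a whitespace-delimited word, ignoring case.
    static long countContainingWord(String[] texts, String search) {
        return Arrays.stream(texts)
                // keep a string if any of its words equals the search term
                .filter(t -> Arrays.stream(t.split("\\s+"))
                        .anyMatch(w -> w.equalsIgnoreCase(search)))
                .count();
    }

    public static void main(String[] args) {
        String[] text = {"this is a", "banana"};
        System.out.println("Counter was " + countContainingWord(text, "a")); // Counter was 1
    }
}
```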

answered Apr 22, 2016 at 12:15

Michele Da Ros


GhostCat · Accepted Answer · 2016-04-22 12:15:52Z

Of course, as others have written, you can start playing around with all kinds of patterns to match "words" out of "text".

But the thing is: depending on the underlying problem you have to solve, this might (by far) not be good enough. Meaning: are you facing the problem of finding some pattern in some string, or do you really want to interpret that text in the "human language" sense? You know, when somebody writes down text, there might be subtle typos, strange characters, all kinds of stuff that make it hard to really "find" a certain word in that text, unless you dive into the "language processing" aspect of things.

Long story short: if your job is to "locate certain patterns in strings", then all the other answers will do. But if your requirement goes beyond that, like "some human will be using your application to 'search' huge data sets", then you had better stop now and consider turning to full-text search engines like ElasticSearch or Solr.
