1

I am trying to check whether a given host name appears in a list of hosts given as a comma-separated string, like the following:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

// here is a test for host1     
if (list.matches(".*[,^]" + host1 + "[$,].*")) {
    System.out.println(host1 + " matched");
}
else {
    System.out.println(host1 + " not matched");
}

But I got "not matched" for host1 (aa.com). I am not very familiar with regex, so please correct me!

BTW, I don't want a solution that splits the host list into an array and then matches against it. That was too slow because the host list can be quite long. A regex approach may be even worse, but I was trying to make it work first.

7
  • 1
    matches() matches the whole string, not a part of it. You would have to either split the string and compare to each element, or use Pattern ...; Matcher ...;. Commented May 26, 2014 at 18:22
  • What is the pattern of inputs? Commented May 26, 2014 at 18:23
  • Matches doesn't compile regex. Commented May 26, 2014 at 18:24
  • 5
    Since you have a defined list, why not simply do if(Arrays.asList(list.split(",")).contains(host1)){//matched} ? Or you could split the string first and put all the elements in a HashSet. Then checking whether a host is valid or not will be done in constant time. Commented May 26, 2014 at 18:25
  • 1
    @mrres1 I know, where did you see that I'm testing a regular expression? Commented May 26, 2014 at 18:51

5 Answers

1

I also think regexes are too slow if you are looking for an exact match, so I wrote a method that looks for occurrences of the host name in the list and checks that each occurrence is not part of a longer host name (as "a.com" is part of "aa.com"). If an occurrence stands on its own, the result is true: the host is in the list. Here's the code:

boolean containsHost(String list, String host) {
    boolean result = false;
    int i = -1;
    while((i = list.indexOf(host, i + 1)) >= 0) { // while there is next match
        if ((i == 0 || list.charAt(i - 1) == ',') // beginning of the list or has a comma right before it
                && (i == (list.length() - host.length()) // end of the list 
                || list.charAt(i + host.length()) == ',')) { // or has a comma right after it
            result = true;
            break;
        }
    }
    return result;
}

But then I thought it would be even faster to check just three cases: a match at the beginning, in the middle, or at the end of the list, which can be done with the startsWith, contains and endsWith methods respectively. Here's the second option, which I would prefer in your case:

boolean containsHostShort(String list, String host) {
    return list.contains("," + host + ",") || list.startsWith(host + ",") || list.endsWith("," + host);     
}
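
One case the three checks above do not cover is a list that consists of exactly one host and therefore contains no comma. A minimal sketch, assuming such single-entry lists can occur in your data, adds an equals check. The names HostCheck and containsHostExact are made up for this illustration.

```java
// Hypothetical helper class; HostCheck and containsHostExact are illustrative names.
final class HostCheck {
    // Returns true if host is a complete entry of the comma-separated list.
    static boolean containsHostExact(String list, String host) {
        return list.equals(host)                    // the list is this single host
                || list.startsWith(host + ",")      // first entry
                || list.endsWith("," + host)        // last entry
                || list.contains("," + host + ","); // entry in the middle
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHostExact(list, "aa.com"));     // true
        System.out.println(containsHostExact(list, "a.com"));      // false
        System.out.println(containsHostExact("aa.com", "aa.com")); // true
    }
}
```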

UPD: ZouZou's comment on your post also seems good. I would recommend comparing the speed on a list similar in size to your real data and choosing the fastest option.


3 Comments

Thanks Nicko. I eventually got my regex working but found it was too slow: list.matches("(.*[,]|^)" + str1 + "([,].*|$)"); (might not be accurate; I wrote it from memory). So I ended up using exactly the same method (looping, indexOf and checking the boundaries). It is by far the fastest compared to regex and to split-and-compare.
@kee, did you try the second option? It still seems to me that it's faster than the first one.
I didn't get a chance to try it, since the first one gave me pretty good performance and I have a lot of other issues to deal with, but the second one is a lot more concise and easier to read! If I get a chance to measure, I will report back.
0

As mentioned in the comments, you shouldn't use matches(), because it tries to match the regex pattern against the entire comma-delimited string. That is not what you want: you are trying to detect whether a given substring occurs in the comma-separated source string.

To do that, you would use the host name as the pattern with a find-style method (Matcher.find() in Java). However, you can just use a plain substring search such as indexOf or contains, which avoids the overhead of regex compilation. Keep in mind that a bare substring search also finds "a.com" inside "aa.com", so the characters around the hit still need to be checked.

Regexes are meant for matching strings that can vary within a pattern. Never use a regex when you want to do exact string matching.
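
For completeness, here is a sketch of how the regex route could be made to work with find(). In the question's pattern, [,^] and [$,] are character classes, so ^ and $ there are literal characters rather than anchors, which is why aa.com at the start of the list is not matched. The version below uses alternation instead and wraps the host in Pattern.quote so its dots are literal. The class name HostRegex is an assumption for this example, and this is not claimed to be faster than the non-regex options.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name HostRegex is not from the original post.
final class HostRegex {
    // Returns true if host appears as a complete entry in the comma-separated list.
    static boolean containsHost(String list, String host) {
        // (?:^|,) requires the start of the string or a comma before the host.
        // Pattern.quote makes every character of the host literal, including dots.
        // (?=,|$) requires a comma or the end of the string right after the host.
        Pattern p = Pattern.compile("(?:^|,)" + Pattern.quote(host) + "(?=,|$)");
        Matcher m = p.matcher(list);
        // find() looks for the pattern anywhere; matches() would need the whole string.
        return m.find();
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false
        System.out.println(containsHost(list, "ff.net")); // true
    }
}
```

If the same host is looked up many times, compiling the Pattern once and reusing it avoids repeated compilation.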

0

You can use a lambda to stream the array and return a boolean for the match.

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net";  // should be a match

ArrayList<String> alist = new ArrayList<String>();

for(String item : list.split("\\,"))
{
    alist.add(item);
}

boolean contains_host1 = alist.stream().anyMatch(b -> b.equals(host1));
boolean contains_host2 = alist.stream().anyMatch(b -> b.equals(host2));
boolean contains_host3 = alist.stream().anyMatch(b -> b.equals(host3));

System.out.println(contains_host1);
System.out.println(contains_host2);
System.out.println(contains_host3);

Console output:

true
false
true
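
The intermediate ArrayList is not strictly needed. As a shorter sketch of the same idea (Java 8 or later), the array returned by split can be streamed directly. The class name HostStream is made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper class; the name HostStream is illustrative.
final class HostStream {
    // Splits the list on commas and checks whether any entry equals the host.
    static boolean containsHost(String list, String host) {
        return Arrays.stream(list.split(",")).anyMatch(host::equals);
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false
        System.out.println(containsHost(list, "ff.net")); // true
    }
}
```

This still splits the whole string on every call, so it has the same cost profile as the split-based approach the question wanted to avoid.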

0

This works perfectly, without regex:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com";
String host2 = "a.com";
String host3 = "ff.net";
boolean checkingFlag = false;
String[] arrayList = list.split(",");
System.out.println(arrayList.length);

for (int i = 0; i < arrayList.length; i++)
{
    // here is a test for host1
    if (arrayList[i].equalsIgnoreCase(host1))
        checkingFlag = true;
}

if (checkingFlag)
    System.out.println("Matched");
else
    System.out.println("Not matched");

A loop over 1 million records takes barely 20-30 milliseconds to execute. I have edited the answer in response to your comment; you can check this:

long startingTime = System.currentTimeMillis();

for (int i = 0; i < 1000000; i++)
{
    if (i == 999999)
        checkingFlag = true;
}
long endingTime = System.currentTimeMillis();
System.out.println("total time in millisecond:" + (endingTime - startingTime));
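
Note that the loop above only compares integers, so it does not time the string comparison itself. A rough sketch that times the actual lookup, assuming the list is split once and then scanned repeatedly, might look like this. The class name SplitTiming and the method countMatches are invented for the example, and a single timed run like this is only a coarse indication because of JIT warm-up.

```java
// Hypothetical timing sketch; SplitTiming and countMatches are illustrative names.
final class SplitTiming {
    // Splits the list once, then scans the array 'repetitions' times.
    // Returns how many of those scans found the host.
    static int countMatches(String list, String host, int repetitions) {
        String[] hosts = list.split(",");
        int matches = 0;
        for (int r = 0; r < repetitions; r++) {
            for (String h : hosts) {
                if (h.equals(host)) {
                    matches++;
                    break; // stop scanning once the host is found
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        long start = System.nanoTime();
        int matches = countMatches(list, "ff.net", 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(matches + " matches in " + elapsedMs + " ms");
    }
}
```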

2 Comments

True. The only problem with this approach is that it is slow. I should have mentioned this in my post, but the host list can have a few hundred entries (and it changes), and this operation needs to be repeated for over 1M entries, so I was looking for a faster way. Maybe regex is equally slow, but I wanted to make it work first.
@kee I doubt that storing a huge String and running a regular expression on it to see if the host matches will be faster than using an appropriate data structure (such as a HashSet) for this task.
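
The HashSet suggestion from the comments could be sketched as follows, assuming the host list changes far less often than it is queried, so the set can be built once and reused. The class name HostSet is made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper; the name HostSet is illustrative, not from the original post.
final class HostSet {
    private final Set<String> hosts;

    // Splits the comma-separated list once and stores the entries in a hash set.
    HostSet(String list) {
        hosts = new HashSet<>(Arrays.asList(list.split(",")));
    }

    // Expected constant-time lookup of a complete host name.
    boolean contains(String host) {
        return hosts.contains(host);
    }

    public static void main(String[] args) {
        HostSet set = new HostSet("aa.com,bb.com,cc.com,dd.net,ee.com,ff.net");
        System.out.println(set.contains("aa.com")); // true
        System.out.println(set.contains("a.com"));  // false
        System.out.println(set.contains("ff.net")); // true
    }
}
```

When the list changes, a new HostSet would be built from the updated string; the cost of splitting is then paid once per change rather than once per lookup.
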
0

Try this:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

// For host1
Pattern p1 = Pattern.compile("\\b[A-Za-z]{2}\\.com");
Matcher m1 = p1.matcher(list);

if(m1.find()){
   System.out.println(host1 + " matched");
}else{
   System.out.println(host1 + " not matched");
}

// For host2
p1 = Pattern.compile("\\b[A-Za-z]{1}\\.com");
m1 = p1.matcher(list);

if(m1.find()){
     System.out.println(host2 + " matched");
}else{
     System.out.println(host2 + " not matched");
}

//and so on...

The \b means word boundary (so the start of a word in this case). [A-Za-z]{n} means a letter in A-Z or a-z repeated n times, and \.com (written as \\.com inside a Java string literal) matches a literal .com; an unescaped . would match any character.
