0

I have some strings like "paddington road", and I need to extract the word "road" from each of them. How can I do that?

The problem is that I need to process a list of streets and extract words such as "road", "park", "street", "boulevard", and many others.

What would be the best way to do that? The complexity of my approach is O(n*m) (n streets, m words to look for), and since I process more than 5000 streets, performance matters.

I'm extracting the values from a Postgres DB and putting them into a List, but I'm not sure that's the best way. Maybe a hash table is faster to query?

I tried something like this:

    // Parse selectedList
    Iterator<String> it = streets.iterator();
    Iterator<String> it_exception = exception.iterator();

    int counter = streets.size();
    while(it.hasNext()) {   

        while ( it_exception.hasNext() ) {
            // remove substring it_exception.next() from it.next()              
        }               
    }

What do you think?

3
  • When you say you want to "extract" those words, do you need to do anything with the words you are extracting, or do you simply want to remove them from the string? Commented Jan 11, 2012 at 22:36
  • Do you think putting that condition in the select query itself will add to the complexity? Commented Jan 11, 2012 at 22:37
  • Why not use substring() in conjunction with indexOf()? You can also do the same extraction in the SQL query itself: Postgres has substring() and strpos() too. Commented Jan 11, 2012 at 22:39

3 Answers

1

You can try a Set:

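    // Put the keywords in a HashSet so each lookup is O(1) on average.
    Set<String> exceptions = new HashSet<String>(...);
    for (String street : streets) {
        String[] words = street.split(" ");
        StringBuilder res = new StringBuilder();
        for (String word : words) {
            // Keep only the words that are not in the exception set.
            if (!exceptions.contains(word)) {
                res.append(word).append(" ");
            }
        }
        // Trim the trailing space left by the last append.
        System.out.println(res.toString().trim());
    }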

I think the complexity will be O(n), where n is the total number of words across all streets, because each HashSet lookup takes constant time on average.
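
If you also need the removed words themselves and not only the cleaned name, the same idea can be wrapped in two small helpers. This is only a sketch: the class name, the method names, and the hard-coded keyword set are made up for illustration (in your case the keywords would come from the database), and it assumes matching should be case-insensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StreetKeywordFilter {

    // Hypothetical keyword set; in the question these values come from Postgres.
    static final Set<String> KEYWORDS =
            new HashSet<String>(Arrays.asList("road", "park", "street", "boulevard"));

    // Returns the street name with every keyword removed.
    static String removeKeywords(String street) {
        StringBuilder res = new StringBuilder();
        for (String word : street.trim().split("\\s+")) {
            if (!KEYWORDS.contains(word.toLowerCase())) {
                if (res.length() > 0) {
                    res.append(' ');
                }
                res.append(word);
            }
        }
        return res.toString();
    }

    // Returns the keywords found in the street name, in order of appearance.
    static List<String> extractKeywords(String street) {
        List<String> found = new ArrayList<String>();
        for (String word : street.trim().split("\\s+")) {
            String lower = word.toLowerCase();
            if (KEYWORDS.contains(lower)) {
                found.add(lower);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String street : Arrays.asList("paddington road", "Hyde Park", "abbey")) {
            System.out.println(removeKeywords(street) + " | " + extractKeywords(street));
        }
    }
}
```

Splitting on `\\s+` instead of a single space avoids empty tokens when a name contains repeated spaces.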

1

You need to get a new iterator for your list of keywords at each iteration of the outer loop. The easiest way is to use the foreach syntax:

    for (String streetName : streets) {
        for (String keyword : keywords) {
            // find if the string contains the keyword, and perhaps break if found to avoid searching for the other keywords
        }
    }

Don't optimize prematurely. 5000 is nothing for a computer, and street names are short strings. And if you place the most frequent keywords (street, rather than boulevard) at the beginning of the keyword list, you'll have fewer iterations.
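
To make the loop body concrete, here is a minimal sketch of the inner search. The class and method names are hypothetical, and it assumes a plain substring test with `String.contains` is good enough for your data.

```java
import java.util.Arrays;
import java.util.List;

class KeywordScan {

    // Returns the first keyword contained in the street name, or null if none matches.
    // Put the most frequent keywords first so the loop usually exits early.
    static String firstKeyword(String streetName, List<String> keywords) {
        for (String keyword : keywords) {
            if (streetName.contains(keyword)) {
                return keyword; // stop searching once a keyword is found
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("street", "road", "park", "boulevard");
        for (String streetName : Arrays.asList("paddington road", "abbey")) {
            System.out.println(streetName + " -> " + firstKeyword(streetName, keywords));
        }
    }
}
```

Be aware that `contains` matches substrings, so "parkway" would also match "park". If that matters, compare whole words instead.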

1

    List<String> streets = new ArrayList<String>();
    streets.add("paddington road");
    streets.add("paddington park");

    for (String street : streets) {
        String[] words = street.split(" ");
        // Guard against one-word names, which have no second word.
        if (words.length > 1) {
            String secondwrd = words[1];
            System.out.println("secondwrd: " + secondwrd);
        }
    }

You can then keep secondwrd in a list, a string buffer, etc.
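
If the word you want is always the last one rather than the second, a variant with `lastIndexOf` and `substring` avoids building an array at all. This is just a sketch with a made-up helper name, and it assumes words are separated by single spaces.

```java
class LastWord {

    // Returns the text after the last space, or the whole (trimmed) string
    // when it contains no space.
    static String lastWord(String street) {
        String trimmed = street.trim();
        int idx = trimmed.lastIndexOf(' ');
        return idx < 0 ? trimmed : trimmed.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastWord("paddington road"));
        System.out.println(lastWord("abbey"));
    }
}
```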
