0

I need to identify a string in a text and replace it with an empty string. The problem is that it does not always appear as a contiguous word: there may be space characters between individual letters or groups of letters. For example:

For the word "Decent", I may encounter the following values: "D ec ent", "De ce nt", "De ce n t".

Is there a way to identify these strings with a regular expression, using the word "Decent" as input? I am very new to regular expressions. Please help!

TIA!

3
  • What language are you using? I would say avoid a regex here, if you can. Commented Apr 22, 2013 at 2:53
  • Hope that this is not your idea of censoring. Commented Apr 22, 2013 at 2:54
  • I am using vb.net. I faced this problem while parsing a PDF document! Commented Apr 22, 2013 at 3:02

4 Answers

1
\bD\s*e\s*c\s*e\s*n\s*t\s*

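This matches "D ec ent", "De ce nt", "De ce n t" and "Decent" (and also "decent" if the match is case-insensitive),

but not "blade centimeter", because \b requires a word boundary before the D. Note that the trailing \s* also consumes any whitespace that follows the word.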
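As a rough illustration, here is how that pattern could be used to replace the word with an empty string. This is a sketch in Java (the language used elsewhere in this thread), not VB.NET; the class and method names are made up for the example, and the CASE_INSENSITIVE flag is an added assumption so that "decent" is matched as well.

```java
import java.util.regex.Pattern;

class SpacedWordRemover {
    // The pattern from this answer. CASE_INSENSITIVE is an assumption added here
    // so that lower-case "decent" is matched too.
    private static final Pattern DECENT =
            Pattern.compile("\\bD\\s*e\\s*c\\s*e\\s*n\\s*t\\s*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes every match of the pattern from the text.
    static String removeDecent(String text) {
        return DECENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The trailing \s* swallows the space after the word as well.
        System.out.println(removeDecent("a D ec ent idea"));   // prints "a idea"
        // No word boundary before the "d" in "blade", so nothing is removed.
        System.out.println(removeDecent("blade centimeter"));  // prints "blade centimeter"
    }
}
```

Because the pattern has no \b at the end, it would also remove the first six letters of a longer word such as "Decentralized"; whether that matters depends on the input.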


1

If you use

'D ?e ?c ?e ?n ?t ?'

it will match the word with at most one optional space after each letter.
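To see what " ?" does and does not cover, the following sketch compares it with the "\s*" form used in the other answers. It is written in Java only because that is the language of the code elsewhere in this thread; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class SpaceQuantifierDemo {
    // " ?" allows zero or one literal space after each letter.
    static final Pattern SINGLE_SPACE = Pattern.compile("D ?e ?c ?e ?n ?t ?");
    // "\\s*" allows any amount of whitespace (spaces, tabs, line breaks).
    static final Pattern ANY_WHITESPACE = Pattern.compile("D\\s*e\\s*c\\s*e\\s*n\\s*t\\s*");

    // Hypothetical helper: true if the pattern occurs anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // One space, two spaces, and a tab between letters.
        for (String s : new String[] {"D ec ent", "De  cent", "De\tcent"}) {
            System.out.println("[" + s + "] single space: " + found(SINGLE_SPACE, s)
                    + ", any whitespace: " + found(ANY_WHITESPACE, s));
        }
    }
}
```

With a single space between letters both patterns match; with two spaces or a tab only the "\s*" version does. If the PDF extraction can produce runs of spaces, the " ?" form is probably too strict.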

1 Comment

\b is recommended here, to prevent the case of cutting off half the end of some word and half the beginning of another word.
1

The expression "D\s*e\s*c\s*e\s*n\s*t" will do it. Each letter except the last may be followed by zero or more whitespace characters; \s matches any whitespace character, not only spaces. You could replace \s* with " *" (a space followed by an asterisk) if you want to match only literal spaces.


0

First, a bit of code:

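import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsWithSpaces {

    public static void main(String[] args) {
        String test = "Descent D escent De s  cent desce nd";
        String word = "descent";

        // Build the pattern: each letter of the word followed by optional whitespace.
        // Note that the final \s* also consumes whitespace after the word.
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            pattern.append(word.charAt(i)).append("\\s*");
        }
        System.out.println("pattern is: " + pattern);

        // CASE_INSENSITIVE lets "Descent" and "descent" both match.
        Pattern p = Pattern.compile(pattern.toString(), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(test);
        while (m.find()) {
            // group() returns the text of the current match; "desce nd" is not matched.
            System.out.println(m.group() + " matches");
        }
    }

}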

Now for the explanation: \s is a character class for whitespace. This includes spaces, tabs and line breaks. In this piece of code, I take every character of the word I am looking for and append "\s*" to it, with "*" meaning zero or more occurrences.

Also, to avoid it being case-sensitive, I set the CASE_INSENSITIVE flag on the pattern.

Character classes may not have the same name in your programming language of choice, but there should be one for whitespace. Check your documentation.
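Since the question asks to replace the word with an empty string, the same loop can be wrapped in a small helper. The following is only a sketch under a few assumptions that go beyond the code above: the names SpacedWordPattern, forWord and remove are invented, \b is added on both sides (as suggested in the comments on another answer), "\s*" is placed only between letters so that surrounding whitespace is left alone, and each letter is passed through Pattern.quote in case the word contains regex metacharacters.

```java
import java.util.regex.Pattern;

class SpacedWordPattern {
    // Hypothetical helper: builds \b + letters joined by \s* + \b for the given word.
    static Pattern forWord(String word) {
        StringBuilder sb = new StringBuilder("\\b");
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                sb.append("\\s*"); // optional whitespace between letters only
            }
            // Quote each letter so characters like '.' or '+' are taken literally.
            sb.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        sb.append("\\b");
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper: replaces every spaced-out occurrence of the word with "".
    static String remove(String text, String word) {
        return forWord(word).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(remove("a De ce n t idea", "Decent"));
        System.out.println(remove("Decentralized blade centimeter", "Decent"));
    }
}
```

With the boundaries in place, "Decentralized" and "blade centimeter" are left untouched, while the whitespace on either side of a removed word stays in the text. The \b anchors assume the word starts and ends with a letter or digit; for other words they would need to be adjusted.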

2 Comments

\s does include line breaks (\n, \r).
No. It always matches a newline; the flag doesn't have any effect on \s. MULTILINE only affects the meaning of ^ and $.
