Regex to identify a word containing spaces

Question

I need to identify a string in a text and replace it with an empty string. The problem is that it is not always present as a single word: there may be space characters between letters or groups of letters. For example:

For the word "Decent", I may encounter the following values: D ec ent, De ce nt, De ce n t.

Is there a way to identify these strings with a regular expression, using the word "Decent" as input? I am very new to regular expressions. Please help!

TIA!

What language are you using? I would say avoid a regex here, if you can.
– squiguy, Commented Apr 22, 2013 at 2:53
I am using VB.NET. I faced this problem while parsing a PDF document!
– chaituse, Commented Apr 22, 2013 at 3:02

Keith Nicholas · Accepted Answer · 2013-04-22 03:04:05Z

1

\bD\s*e\s*c\s*e\s*n\s*t\s*

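This matches D ec ent, De ce nt, De ce n t and Decent (and decent as well, if you turn on case-insensitive matching),

but not the "de cent" inside blade centimeter, because \b requires a word boundary before the D.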
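As a rough illustration of the replacement the question asks for, here is a minimal Java sketch using the pattern above. Java is used only because another answer on this page uses it; the class name RemoveSpacedWord and the CASE_INSENSITIVE flag are assumptions added for the example, not part of the answer. Note that the trailing \s* also swallows the whitespace after the word.

```java
import java.util.regex.Pattern;

public class RemoveSpacedWord {
    // The pattern from the answer, escaped for a Java string literal.
    // CASE_INSENSITIVE is an assumption added here so "decent" matches too.
    static final Pattern DECENT =
        Pattern.compile("\\bD\\s*e\\s*c\\s*e\\s*n\\s*t\\s*", Pattern.CASE_INSENSITIVE);

    // Replace every match with the empty string.
    static String remove(String text) {
        return DECENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(remove("a De ce n t meal"));  // prints "a meal"
        System.out.println(remove("blade centimeter"));  // prints "blade centimeter"
    }
}
```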


Diego Torres Milano · Accepted Answer · 2013-04-22 02:56:59Z

1

If you use

'D ?e ?c ?e ?n ?t ?'

it will match the word with at most one extra space after each letter.


1 Comment

nhahtdh Over a year ago

\b is recommended here, to prevent the case of cutting off half the end of some word and half the beginning of another word.
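To make the comment concrete, the following Java sketch compares a version of the pattern without \b against one with it. Placing \b at both ends and dropping the trailing " ?" are assumptions made for this example, as are the class and method names. The last line also shows that " ?" tolerates only a single space.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // True if the regex matches anywhere in the text, ignoring case.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String loose = "d ?e ?c ?e ?n ?t";            // no word boundaries
        String anchored = "\\bd ?e ?c ?e ?n ?t\\b";   // \b on both ends (assumption)

        System.out.println(found(loose, "blade centimeter"));     // true: matches "de cent"
        System.out.println(found(anchored, "blade centimeter"));  // false
        System.out.println(found(anchored, "a de ce nt one"));    // true
        System.out.println(found(anchored, "De  cent"));          // false: two spaces in a row
    }
}
```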

Jim Mischel · Accepted Answer · 2013-04-22 02:59:36Z

1

The expression "D\s*e\s*c\s*e\s*n\s*t" will do it. Each letter except the last can be followed by zero or more whitespace characters: \s matches any whitespace character, not just a space. You could replace \s* with " *" (a space followed by an asterisk) if you only want literal spaces.
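The difference between the two variants only shows up when the gaps are tabs or line breaks, which seems plausible for text extracted from a PDF. A small Java sketch, with hypothetical class and field names:

```java
import java.util.regex.Pattern;

public class WhitespaceVsSpace {
    // \s* accepts any whitespace between letters.
    static final Pattern ANY_WHITESPACE = Pattern.compile("D\\s*e\\s*c\\s*e\\s*n\\s*t");
    // " *" accepts only literal space characters.
    static final Pattern SPACES_ONLY = Pattern.compile("D *e *c *e *n *t");

    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String spaced = "De ce nt";
        String tabbed = "De\tce\nnt";  // a tab and a line break instead of spaces

        System.out.println(found(ANY_WHITESPACE, spaced));  // true
        System.out.println(found(SPACES_ONLY, spaced));     // true
        System.out.println(found(ANY_WHITESPACE, tabbed));  // true
        System.out.println(found(SPACES_ONLY, tabbed));     // false
    }
}
```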


rmalchow · Accepted Answer · 2013-04-22 03:01:25Z

0

First, a bit of code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsWithSpaces {

    public static void main(String[] args) {
        String test = "Descent D escent De s  cent desce nd";
        String word = "descent";

        // Build the pattern: every letter of the word followed by optional whitespace.
        // This assumes the word contains no regex metacharacters.
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            pattern.append(word.charAt(i)).append("\\s*");
        }
        System.err.println("pattern is: " + pattern);

        // CASE_INSENSITIVE lets "descent" also match "Descent".
        Pattern p = Pattern.compile(pattern.toString(), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(test);
        while (m.find()) {
            // group() is the matched text, including any trailing whitespace.
            System.err.println(m.group() + " matches");
        }
    }

}

Now for the explanation: \s is a character class for whitespace. It includes spaces, tabs and line breaks. In this piece of code, I take every character of the word I am looking for and append "\s*", where "*" means zero or more occurrences.

Also, to make the match case-insensitive, I set the CASE_INSENSITIVE flag on the pattern.

Character classes may not have the same name in your programming language of choice, but there should be one for whitespace. Check your documentation.
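Since the original goal was to replace the word with an empty string, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the names SpacedWordRemover, spacedPattern and remove are made up for the example, \s* is placed between letters only (so surrounding whitespace is left alone), Pattern.quote guards against regex metacharacters in the word, and \b is added on both ends, which assumes the word starts and ends with a letter or digit.

```java
import java.util.regex.Pattern;

public class SpacedWordRemover {
    // Build \b l1 \s* l2 \s* ... ln \b for the given word.
    static Pattern spacedPattern(String word) {
        StringBuilder sb = new StringBuilder("\\b");
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                sb.append("\\s*");  // optional whitespace between letters only
            }
            // Quote each character so metacharacters are treated literally.
            sb.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        sb.append("\\b");
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Remove every spaced-out occurrence of the word from the text.
    static String remove(String text, String word) {
        return spacedPattern(word).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Only the trailing "desce nd" survives, preceded by the three separating spaces.
        System.out.println("[" + remove("Descent D escent De s  cent desce nd", "descent") + "]");
    }
}
```

Because the whitespace around each match is kept, you may want to collapse repeated spaces afterwards.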


2 Comments

nhahtdh Over a year ago

\s does include line break \n, \r.

nhahtdh Over a year ago

No. It always matches a newline; the flag has no effect on \s. MULTILINE only affects the meaning of ^ and $.
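A quick way to check this in Java (class and method names are made up for the example): \s matches the line break with or without MULTILINE, while ^ only matches after the line break when MULTILINE is set.

```java
import java.util.regex.Pattern;

public class NewlineDemo {
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "De\ncent";  // the word is split by a line break

        // \s matches the line break regardless of MULTILINE.
        System.out.println(found("De\\s*cent", 0, text));                  // true
        System.out.println(found("De\\s*cent", Pattern.MULTILINE, text));  // true

        // MULTILINE changes where ^ can match, nothing else here.
        System.out.println(found("^cent", 0, text));                       // false
        System.out.println(found("^cent", Pattern.MULTILINE, text));       // true
    }
}
```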
