0

Please let me know how to remove double spaces and repeated characters from the string below.

String = Test----$$$$19****45@@@@ Nothing

Clean String = Test-$19*45@ Nothing

I have used the regex "\s+", but it only removes the double spaces. I have tried other regex patterns, but they get too complex. Please help me.

I am using VB.NET.

3
  • In which language are you programming? Not all regex engines are created equally. Commented Jan 3, 2013 at 13:30
  • You can find answers here: http://stackoverflow.com/questions/7780794/javascript-regex-remove-duplicate-characters Commented Jan 3, 2013 at 13:31
  • I am using VB.NET, I don't know anything about regex, and I am trying to learn it. An example pattern would be a great help. Thanks in advance. Commented Jan 3, 2013 at 14:09

4 Answers

5

What you'll want to do is capture any single character in a group, and then match the characters immediately after it that repeat that group (a backreference). The usual pattern for this is (.)\1+, and each match should be replaced with just the captured character (once). How exactly that's done depends on the programming language.

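' Requires "Imports System.Text.RegularExpressions" at the top of the file.
Dim text As String = "Test@@@_&aa&&&"
' (.) captures one character; \1+ matches one or more repeats of it; "$1" keeps a single copy.
Dim result As String = New Regex("(.)\1+").Replace(text, "$1")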

result will now contain Test@_&a&. Alternatively, you can use a lookaround to not remove that backreference in the first place:

Dim text As String = "Test@@@_&aa&&&"
Dim result As String = New Regex("(?<=(.))\1+").Replace(text, "")

Edit: included examples


4 Comments

Thanks Patrickdev. I am using VB.NET, I don't know anything about regex, and I am trying to learn it. An example pattern would be a great help. Thanks in advance.
@Ashi: I've included examples that show how to do it in VB.NET.
Thank you so much Patrickdev. But it removes all doubled characters, and I don't want to remove doubled letters. I just want to collapse repeated $, -, @, *, & and so on. I am sorry I did not post my question correctly.
@Ashi: Just replace the . with [^A-Za-z0-9]. This is a character class that matches anything that's not (the ^ negates it) either an uppercase letter, lowercase letter or number. Therefore, it will remove anything else. Your pattern would then look like ([^A-Za-z0-9])\1+.
1

For a faster alternative try:

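        ' Requires "Imports System.Text" for StringBuilder.
        Dim text As String = "Test@@@_&aa&&&"

        Dim sb As New StringBuilder(text.Length)
        Dim lastChar As Char
        ' Copy each character unless it equals the one just before it.
        ' Note: this collapses every repeated character, letters included.
        For Each c As Char In text
            If c <> lastChar Then
                sb.Append(c)
                lastChar = c
            End If
        Next

        Console.WriteLine(sb.ToString())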

1 Comment

Thanks. I have run 10-15 performance tests on different columns of a large DataTable and always find that string functions are faster than regex. It seems I will have to use string functions to achieve my goal in a better way. Please give me your advice. Thanks again.
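
The loop above collapses every repeated character, including letters, which the asker said elsewhere in the thread they do not want. A minimal sketch of the same string-builder idea, restricted to characters that are not letters or digits, could look like the following. It is written in Java to match the Java answer further down; the class and method names are hypothetical, and the same logic should translate to VB.NET with Char.IsLetterOrDigit. Note that Character.isLetterOrDigit is Unicode-aware, so it is slightly broader than the ASCII class [A-Za-z0-9].

```java
class CollapseSymbolsLoop {
    // Hypothetical helper: drops a character only when it repeats the
    // previous character AND is not a letter or digit.
    static String collapse(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean repeatsPrevious = i > 0 && c == input.charAt(i - 1);
            if (repeatsPrevious && !Character.isLetterOrDigit(c)) {
                continue; // skip the repeated symbol or space
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("Test@@@_&aa&&&")); // prints Test@_&aa&
    }
}
```

Unlike the original loop, the doubled "aa" survives here while the runs of @ and & are reduced to one character each.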
0

Here is a Perl way to replace every run of a repeated non-word character with a single occurrence:

my $String = 'Test----$$$$19****45@@@@ Nothing';
$String =~ s/(\W)\1+/$1/g;
print $String;

output:

Test-$19*45@ Nothing


0

Here's how it would look in Java...

String raw = "Test----$$$$19****45@@@@ Nothing";
String cleaned = raw.replaceAll("(.)\\1+", "$1");
System.out.println(raw);
System.out.println(cleaned);

prints

Test----$$$$19****45@@@@ Nothing
Test-$19*45@ Nothing
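
The pattern (.) above would also collapse doubled letters (for example "Goood" would become "God"), which the asker said in a comment on the first answer they want to avoid. As a sketch only, the character class suggested in that comment, [^A-Za-z0-9], can be dropped into the same Java approach. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class CollapseSymbols {
    // One non-alphanumeric character followed by one or more copies of itself.
    private static final Pattern REPEATED_SYMBOL =
            Pattern.compile("([^A-Za-z0-9])\\1+");

    // Replaces each run of a repeated symbol (or space) with a single copy.
    static String collapse(String input) {
        return REPEATED_SYMBOL.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Test----$$$$19****45@@@@ Nothing")); // prints Test-$19*45@ Nothing
        System.out.println(collapse("Goood@@@  book"));                   // prints Goood@ book
    }
}
```

Because a space is also non-alphanumeric, the same pattern collapses double spaces, so a separate "\s+" pass should not be needed for this input.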
