I have been searching all over the internet for a solution, but could not find one.

I need to remove duplicate characters within a string, but I would also like to include an exception that allows a given integer number of repeated characters to remain in the string.

For example, I tried the following:

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';

$t1 = preg_replace('/(.)\1{3,}/','',$str);
$t2 = preg_replace('/(\S)\1{3,}/','',$str);
$t3 = preg_replace('{(.)\1+}','$1',$str);
$t4 = preg_replace("/[;,:\s]+/",',',$str);
$t5 = preg_replace('/\W/', '', $str);
$t6 = preg_replace( "/[^a-z]/i", "", $str);

echo '$t1 = '.$t1.'<br>';
echo '$t2 = '.$t2.'<br>';
echo '$t3 = '.$t3.'<br>';
echo '$t4 = '.$t4.'<br>';
echo '$t5 = '.$t5.'<br>';
echo '$t6 = '.$t6.'<br>';

Results:

$t1 = This is a 999-999- test Sammy hello 
$t2 = This is a 999-999- test Sammy hello 
$t3 = This -is* a b 9-9-9 * 8 test 4 *#Samy! # helo !
$t4 = This,----------is********,a,bbbb,999-999-9999,********,8888888888,test,4444444444,********##########Sammy!!!!!!,######,hello,!!!!!!
$t5 = Thisisabbbb99999999998888888888test4444444444Sammyhello
$t6 = ThisisabbbbtestSammyhello

The desired output would be:

This ---is*** a bbbb 999-999-9999 *** 8888888888 test 4444444444 ***###Sammy!!! ### hello !!!

As you can see, the desired output leaves the numbers alone and keeps at most 3 of any other repeated character, i.e. ---, ###, ***, and !!!

I would like to be able to change the allowed number of repeats from 3 to any other integer, if possible.

Thanks in advance.

  • /([^0-9])\1{3,}? If you want to allow digits to repeat, then exclude digits from the repetition check. Commented May 25, 2012 at 21:14
  • thanks, but how would the entire preg_replace statement be written to incorporate this? Commented May 25, 2012 at 21:20

2 Answers

This will do it:

preg_replace('/(([^\d])\2\2)\2+/', '$1', $str);

[^\d] matches a single character that isn't a digit.
\2 refers to that captured non-digit character (the second capture group).
$1 refers to the first capture group, which holds the first three repeated characters, so the extra \2+ gets stripped off.
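To make the explanation above concrete, here is a minimal sketch that runs the same pattern on short pieces of the question's string. One thing worth noticing is that [^\d] also matches letters, so bbbb is shortened to bbb, while the question's desired output keeps bbbb. The second pattern, using [^\w\s], is an assumption about the intent (squeeze only punctuation and symbols) and is not part of the answer.

```php
<?php
// The answer's pattern: runs of 4+ identical non-digit characters are cut to 3.
$pattern = '/(([^\d])\2\2)\2+/';

echo preg_replace($pattern, '$1', 'is******** a'), "\n";            // is*** a
echo preg_replace($pattern, '$1', '999-999-9999 8888888888'), "\n"; // unchanged: digits are excluded
echo preg_replace($pattern, '$1', 'a bbbb'), "\n";                  // a bbb (letters are non-digits too)

// Assumed variant, not from the answer: only squeeze characters that are
// neither word characters (letters, digits, underscore) nor whitespace.
$punctOnly = '/(([^\w\s])\2\2)\2+/';

echo preg_replace($punctOnly, '$1', 'a bbbb ######'), "\n";         // a bbbb ###
```

If repeated letters should also be limited to three, the original [^\d] pattern is the one to use.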

Codepad

3 Comments

If I wanted to retain 4 repeated characters, would I change all the 2s to 3s, and so on? I tried using 3s, but that did not work.
@Sammy No, \2 refers to the character captured by the second set of parentheses (the character that matched [^\d]). To retain 4 characters, you would change \2\2 to \2\2\2, which keeps one extra character. Once you need that many, though, it's better to write \2{3}, which is shorthand for \2 three times. E.g. \2{5} is shorthand for \2\2\2\2\2.
Hm, this doesn't work. I use a different regex, which does the trick: preg_replace('{([^\w])\1+}','',$str);
The regex you are looking for is /((.)\2{2})\2*/. To keep n repeated characters, put n-1 in the curly braces: /((.)\2{n-1})\2*/

EDIT: to restrict it to non-digits, or whatever you want, replace . with something else, for example [^\d]: /(([^\d])\2{2})\2*/
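The n-1 rule can be wrapped in a small function so the number of kept repeats becomes a parameter. This is only a sketch: squeezeRuns, its parameter names, and its defaults are hypothetical and not from the answer, and it assumes PHP 7 or later for the type declarations. The replacement string is $1, the first capture group.

```php
<?php
/**
 * Hypothetical helper: limit runs of the same character to $keep copies.
 * $class is the regex fragment for characters that may be squeezed;
 * the default [^\d] leaves runs of digits alone.
 */
function squeezeRuns(string $str, int $keep = 3, string $class = '[^\d]'): string
{
    if ($keep < 1) {
        throw new InvalidArgumentException('$keep must be at least 1');
    }

    // ((X)\2{keep-1}) captures the first $keep copies; \2* matches any surplus.
    $pattern = '/((' . $class . ')\2{' . ($keep - 1) . '})\2*/';

    $result = preg_replace($pattern, '$1', $str);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed');
    }
    return $result;
}

echo squeezeRuns('Sammy!!!!!!'), "\n";              // Sammy!!!
echo squeezeRuns('Sammy!!!!!!', 4), "\n";           // Sammy!!!!
echo squeezeRuns('4444444444 ----------', 2), "\n"; // 4444444444 --
```

Passing a different character class as the third argument, for example '[^\w\s]', would restrict the squeezing to punctuation and symbols.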

1 Comment

I forgot to add that you have to use the captured $1 as the replacement.
