I was trying to find a way to remove duplicated strings within Notepad++ (not removing duplicated lines). For example, if I have the text:

Alice's favorite ice cream (not gelato) is: Vanilla! Alice's favorite ice cream (not gelato) is: Vanilla! Please go get some ice cream from the store

The result should be

Alice's favorite ice cream (not gelato) is: Vanilla! Please go get some ice cream from the store

I tried looking up some examples and found this but it doesn't really work exactly as expected. Thanks in advance.

  • How do you identify what is a duplicated string? Is it always a line that contains the duplicate? Commented Apr 9, 2015 at 15:43
  • Basically every line has a duplicated string, but rather than manually going through and deleting the duplicated portions of the string I would like to automate it somehow. Commented Apr 9, 2015 at 16:17

2 Answers

If you have consecutive duplicates then you can use a regex like this:

(.*)\1

Working demo

Then use \1 or $1 as the replacement string, as you can see in the substitution section of the screenshot.

[Screenshot of the regex demo]
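
To sanity-check the pattern outside the editor, here is a minimal sketch that uses Python's `re` module as a stand-in for the Notepad++ regex engine. That is an assumption: the two engines are not identical, although both support backreferences such as `\1`. The helper name `collapse_repeats` is made up for illustration.

```python
import re

def collapse_repeats(text: str) -> str:
    # (.*) captures a run of characters; \1 requires the same run
    # to follow immediately. Replacing the match with \1 keeps one copy.
    return re.sub(r"(.*)\1", r"\1", text)

line = ("Alice's favorite ice cream (not gelato) is: Vanilla! "
        "Alice's favorite ice cream (not gelato) is: Vanilla! "
        "Please go get some ice cream from the store")

print(collapse_repeats(line))
```

On the example line from the question, the repeated sentence is printed only once and the rest of the line is left as it was.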

Update: as 1010 pointed out in a comment:

beware that it may match duplicates like the 'o' in "look", or parts of a sentence that repeats characters as in "... this is a duplicate"

You can prevent this by requiring a minimum length for the duplicated text. For instance, this regex only matches duplicates that are at least 5 characters long (adjust the number to whatever you consider right):

(.{5,})\1
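
The effect of the minimum length can be seen in a short sketch, again using Python's `re` as an assumed stand-in for the Notepad++ engine. The function `dedupe` and its `min_len` parameter are hypothetical names chosen for this example.

```python
import re

def dedupe(text: str, min_len: int = 5) -> str:
    # Collapse a repeat only when the repeated run is at least
    # min_len characters long.
    pattern = r"(.{%d,})\1" % min_len
    return re.sub(pattern, r"\1", text)

# Short repeats such as the "oo" in "look" are left alone.
print(dedupe("look, this is a duplicate"))

# A longer repeated run is collapsed to a single copy.
print(dedupe("go get some go get some ice cream"))

# With min_len=1 the pattern behaves like (.+)\1 and damages "hurry".
print(dedupe("hurry", min_len=1))
```

A larger minimum lowers the risk of touching ordinary words, at the cost of missing short duplicated phrases.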

7 Comments

In Notepad++ 6.7.5 this just gives me the message "Zero length match" and no changes are made to the file.
@AdrianHHH I'm using 5.9.6 and it works fine for me. I'll try updating the version.
beware that it may match duplicates like the 'o' in "look", or parts of a sentence that repeats characters as in "... this is a duplicate".
@1010 Thanks for the comment. I improved the answer thanks to you.
@Fede Is there any way to accommodate the case where words with double letters come after the duplicated text? For example, "Alice's favorite ice cream (not gelato) is: Vanilla! Alice's favorite ice cream (not gelato) is: Vanilla! Please go get some ice cream from the store, and hurry!" would be changed to "Alice's favorite ice cream (not gelato) is: Vanilla! Please go get some ice cream from the store, and hury!"

I just had a similar problem and wanted to remove duplicate lines. It turns out that there is already a built-in feature to do that.

Just go to "Edit->Line Operations->Remove Consecutive Duplicate Lines".
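
For comparison, here is a rough Python sketch of what such a command is assumed to do: keep the first line of every run of identical adjacent lines. The real menu command may treat whitespace or line endings differently, and the function name is invented for this example. Note that it works on whole lines, so it does not remove a repeat inside a single line.

```python
from itertools import groupby

def remove_consecutive_duplicate_lines(text: str) -> str:
    # groupby clusters adjacent equal lines; keep one line per cluster.
    return "\n".join(key for key, _ in groupby(text.split("\n")))

print(remove_consecutive_duplicate_lines("a\na\nb\na"))
```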
