
I am trying to implement word-level matches in Google Diff Match Patch, but it is beating me up.

The result I get is:

 =I've never been =|-a-|=t=|= th=|-e-|=se places=|
 =I've never been =|=t=|+o+|= th=|+o+|=se places=|

The result I want is:

 =I've never been =|-at these-|= places=|
 =I've never been =|+to those+|= places=|
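The notation above appears to put the original side (equalities and deletions) on the first line and the corrected side (equalities and insertions) on the second. Assuming that reading, this small JavaScript renderer reproduces both pairs of lines from a diff list. `renderTwoLine` is a hypothetical helper, and the `[op, text]` pair format is an assumption; it uses the -1/0/1 values that diff-match-patch uses for delete/equal/insert. It shows that the goal is coarser diff entries, not a different rendering.

```javascript
// Render [op, text] pairs (-1 = delete, 0 = equal, 1 = insert) in the
// two-line notation: line 1 = original side, line 2 = corrected side.
function renderTwoLine(diffs) {
  const marks = { '-1': '-', '0': '=', '1': '+' };
  // Build one side, skipping the operation that belongs only to the other side.
  const side = (skipOp) => diffs
    .filter(([op]) => op !== skipOp)
    .map(([op, text]) => marks[op] + text + marks[op] + '|')
    .join('');
  return [side(1), side(-1)];
}

// Character-level diffs, as currently produced:
const charLevel = [[0, "I've never been "], [-1, 'a'], [0, 't'], [1, 'o'],
                   [0, ' th'], [-1, 'e'], [1, 'o'], [0, 'se places']];
// Word-level diffs, as wanted:
const wordLevel = [[0, "I've never been "], [-1, 'at these'], [1, 'to those'],
                   [0, ' places']];
```

Passing `charLevel` gives the first pair of lines shown above and `wordLevel` gives the second pair.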

The documentation says:

Make a copy of diff_linesToChars and call it diff_linesToWords. Look for the line that identifies the next line boundary: lineEnd = text.indexOf('\n', lineStart);
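As an illustration of what that copy could look like, here is a minimal, self-contained JavaScript sketch modelled on the library's `diff_linesToChars_`. It is not the library's code. The definition of a "word" as a run of non-whitespace plus the whitespace after it is an assumption, and the sketch does not guard against running out of character codes (more than 65,535 distinct words).

```javascript
// Hypothetical word-level copy of diff_match_patch's diff_linesToChars_.
// Each distinct "word" (non-whitespace run plus any whitespace after it) gets
// one character code, so diffing the encoded strings compares whole words.
function diff_linesToWords(text1, text2) {
  const lineArray = [''];      // index 0 left empty, as in the original
  const lineHash = new Map();  // word -> index in lineArray

  function munge(text) {
    const wordEnd = /\S*\s*/y; // sticky: only matches starting at lastIndex
    let chars = '';
    let lineStart = 0;
    while (lineStart < text.length) {
      // Original boundary: lineEnd = text.indexOf('\n', lineStart);
      // Word boundary: the end of the whitespace that follows the next word.
      // The match is never empty here, because at least one character remains.
      wordEnd.lastIndex = lineStart;
      wordEnd.exec(text);
      const lineEnd = wordEnd.lastIndex;
      const word = text.substring(lineStart, lineEnd);
      if (!lineHash.has(word)) {
        lineArray.push(word);
        lineHash.set(word, lineArray.length - 1);
      }
      chars += String.fromCharCode(lineHash.get(word));
      lineStart = lineEnd;
    }
    return chars;
  }

  const chars1 = munge(text1);
  const chars2 = munge(text2);
  return { chars1, chars2, lineArray };
}
```

Each word keeps its trailing whitespace, so concatenating the decoded words reproduces the original text exactly. The decoding step depends on this.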

In the C# version, I found the line to change in diff_linesToCharsMunge and changed it to:

lineEnd = text.Replace(@"/[\n\.,;:]/ g"," ").IndexOf(" ", lineStart);

However, the granularity does not change; it still finds differences at the character level.
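One thing to check in that line: .NET's `String.Replace(string, string)` replaces literal text only. It therefore searches for the exact characters `/[\n\.,;:]/ g`, slashes and brackets included, and returns the text unchanged. In C#, the regex version would be `Regex.Replace(text, @"[\n.,;:]", " ")` from `System.Text.RegularExpressions`. JavaScript has the same string-versus-regex pitfall, which this sketch shows. The sample text and the `nextWordEnd` helper are made up for illustration.

```javascript
// Literal vs. regex replacement, using the boundary characters from the question.
const text = "I've never been at these places, ok.\nNext line";

// A string pattern is matched literally (like .NET's String.Replace), so this
// looks for the exact text "/[\n\.,;:]/ g" and changes nothing.
const literal = text.replace("/[\\n\\.,;:]/ g", " ");

// A regex literal turns every newline, period, comma, semicolon or colon into a space.
const viaRegex = text.replace(/[\n.,;:]/g, " ");

// Hypothetical helper: index of the next space at or after lineStart, treating
// the question's punctuation as spaces. Returns -1 if there is none.
function nextWordEnd(source, lineStart) {
  return source.replace(/[\n.,;:]/g, " ").indexOf(" ", lineStart);
}
```

Each replacement swaps one character for one character, so an index found in the replaced copy is also valid in the original text. The question's line relies on this.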

I am calling:

List<Diff> differences = diffs.diff_main(linepair.Original, linepair.Corrected, true);
diffs.diff_cleanupSemantic(differences); 

I have stepped through to make sure that it is hitting the change I made (incidentally, there is a hard-coded minimum of 100 characters before it kicks in).
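A likely reason the change has no visible effect, based on the JavaScript reference implementation (which the C# port mirrors): the checklines path inside `diff_main` (`diff_lineMode_`) only runs when both texts are over 100 characters. After its line-level pass, it also re-diffs every replaced block character by character, so the final output is character-level however the boundaries are chosen.

The library's "Line or Word Diffs" wiki page instead calls the encoder directly, runs `diff_main(chars1, chars2, false)` on the encoded strings, and decodes the result with `diff_charsToLines_`. The sketch below reproduces that pipeline without the library. `lcsDiff` is a hypothetical, deliberately simple stand-in for `diff_main`, and the `[op, text]` pair format is an assumption. `diff_cleanupSemantic` could still be applied to the decoded diffs afterwards.

```javascript
const DIFF_DELETE = -1, DIFF_INSERT = 1, DIFF_EQUAL = 0;

// Encode each word (non-space run plus trailing whitespace) as one character.
function wordsToChars(text1, text2) {
  const lineArray = [''];              // index 0 unused
  const index = new Map();             // word -> index in lineArray
  const encode = (text) => (text.match(/\S+\s*|\s+/g) || [])
    .map((word) => {
      if (!index.has(word)) { lineArray.push(word); index.set(word, lineArray.length - 1); }
      return String.fromCharCode(index.get(word));
    })
    .join('');
  const chars1 = encode(text1);
  const chars2 = encode(text2);
  return { chars1, chars2, lineArray };
}

// Stand-in for diff_main: a plain O(n*m) LCS diff over the encoded characters.
function lcsDiff(a, b) {
  const n = a.length, m = b.length;
  // dp[i][j] = length of the LCS of a.slice(i) and b.slice(j)
  const dp = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--)
    for (let j = m - 1; j >= 0; j--)
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
  const diffs = [];
  const push = (op, ch) => {           // merge runs of the same operation
    const last = diffs[diffs.length - 1];
    if (last && last[0] === op) last[1] += ch; else diffs.push([op, ch]);
  };
  let i = 0, j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { push(DIFF_EQUAL, a[i]); i++; j++; }
    else if (dp[i + 1][j] >= dp[i][j + 1]) { push(DIFF_DELETE, a[i]); i++; }
    else { push(DIFF_INSERT, b[j]); j++; }
  }
  while (i < n) push(DIFF_DELETE, a[i++]);
  while (j < m) push(DIFF_INSERT, b[j++]);
  return diffs;
}

// Counterpart of diff_charsToLines_: map each character back to its word, in place.
function charsToWords(diffs, lineArray) {
  for (const d of diffs) d[1] = Array.from(d[1], (ch) => lineArray[ch.charCodeAt(0)]).join('');
}

function wordDiff(text1, text2) {
  const { chars1, chars2, lineArray } = wordsToChars(text1, text2);
  const diffs = lcsDiff(chars1, chars2);   // library: dmp.diff_main(chars1, chars2, false)
  charsToWords(diffs, lineArray);          // library: dmp.diff_charsToLines_(diffs, lineArray)
  return diffs;
}
```

For the question's sentences, `wordDiff` returns `[[0, "I've never been "], [-1, "at these "], [1, "to those "], [0, "places"]]`. The trailing space stays inside each edit because every word carries its own trailing whitespace.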

  • Were you able to solve this issue? I am stuck with the same problem. Could you post the code here if you managed to solve it? Commented Feb 8, 2021 at 16:44

2 Answers


I have created a sample .NET project using DiffMatchPatch. It's probably an older version of the DiffMatchPatch file, but word-level and line-level diffs work.

DiffMatchPatchSample

For your sample text above, I get the output below:

at these | to those


1 Comment

Thank you very much, Niketh. I actually gave up on it and moved on to something else. Now that you have a solution, I will take another look at it.

I was stuck with the same problem in PHP's version of this library and found a solution here.

You just have to make a copy of the linesToChars function and call it linesToWords.

Here's how I did it:

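// linesToWords is the user-written copy of DiffToolkit's linesToChars that
// splits on word boundaries instead of newlines; $dmp is assumed to be an
// existing DiffMatchPatch instance.
$dtk = new DiffToolkit();

$a = $dtk->linesToWords($old, $new);
$lineText1 = $a[0];  // $old encoded as one character per word
$lineText2 = $a[1];  // $new encoded the same way
$lineArray = $a[2];  // lookup table: character code => word

// false: the encoded strings are already word-level, so skip line mode
$diffs = $dmp->diff_main($lineText1, $lineText2, false);
$dtk->charsToLines($diffs, $lineArray);  // map the characters in $diffs back to words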
