5

I have the following strings: "abcdefx", "zzdefghij" I would like to extract the common part of the two strings, i.e. here "def". I tried with sed but I couldn't do that unless the common part was a prefix like this:

fprint "%s\n%\n" | sed -e 'N;s:\(.*\).*\n\1.*:\1:'
6
  • 5
    This is not a trivial problem even using a real programming language. Commented Apr 25, 2014 at 14:15
  • What is the use case? This sounds like homework or idle curiosity. Commented Apr 25, 2014 at 14:25
  • I have files which I would like to classify following their base name into directories. The problem is that for one given base there could be files with some prefix separated with '-' or '_' and zero, one, three, or four trailing substrings separated with '-', '_', or even nothing. the only way to determine the base name for files of a given base is to extract the common part of the file names. Commented Apr 25, 2014 at 14:43
  • You might want to check out rosettacode.org/wiki/Longest_common_subsequence -- some solutions for a similar problem in various languages. Commented Apr 25, 2014 at 15:01
  • 1
    (saying the obvious) ... and you have no opportunity to reorganize the generating system to create something that can be parsed more easily? Good luck. Commented Apr 25, 2014 at 15:05

2 Answers 2

7

This pure bash script will find the first longest substring of its two arguments, in a fairly efficient way:

#!/bin/bash

if ((${#1}>${#2})); then
   long=$1 short=$2
else
   long=$2 short=$1
fi

lshort=${#short}
score=0
for ((i=0;i<lshort-score;++i)); do
   for ((l=score+1;l<=lshort-i;++l)); do
      sub=${short:i:l}
      [[ $long != *$sub* ]] && break
      subfound=$sub score=$l
   done
done

if ((score)); then
   echo "$subfound"
fi

Demo (I called the script banana):

$ ./banana abcdefx zzdefghij
def
$ ./banana "I have the following strings: abcdefx, zzdefghij I would like to extract the common part of the two strings, i.e. here def." "I tried with sed but I couldn't do that unless the common part was a prefix like this"
 the common part 
Sign up to request clarification or add additional context in comments.

1 Comment

+1 superb... Had similar algorithm in mind but didn't get time to complete it.
4

I thought this sounded interesting, here is my solution:

first="abcdefx"
second="zzdefghij"

for i in $(seq ${#first} -1 1); do
    for j in $(seq 0 $((${#first}-i))); do
        grep -q "${first:$j:$i}" <<< "$second" && match="${first:$j:$i}" && break 2
    done
done

echo "Longest common substring: ${match:-None found}"

Output:

Longest common substring: def

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.