bash longest common part of two strings

Question

I have the following strings: "abcdefx" and "zzdefghij". I would like to extract the common part of the two strings, which here is "def". I tried with sed, but I could only do it when the common part was a prefix, like this:

printf "%s\n%s\n" "$string1" "$string2" | sed -e 'N;s:\(.*\).*\n\1.*:\1:'

This is not a trivial problem even using a real programming language.
– stark, Commented Apr 25, 2014 at 14:15
What is the use case? This sounds like homework or idle curiosity.
– l0b0, Commented Apr 25, 2014 at 14:25
I have files that I would like to sort into directories according to their base name. The problem is that, for a given base, there can be files with some prefix separated by '-' or '_' and zero, one, three, or four trailing substrings separated by '-', '_', or even nothing. The only way to determine the base name for files of a given base is to extract the common part of the file names.
– user1850133, Commented Apr 25, 2014 at 14:43
You might want to check out rosettacode.org/wiki/Longest_common_subsequence -- some solutions for a similar problem in various languages.
– glenn jackman, Commented Apr 25, 2014 at 15:01
(saying the obvious) ... and you have no opportunity to reorganize the generating system to create something that can be parsed more easily? Good luck.
– shellter, Commented Apr 25, 2014 at 15:05

gniourf_gniourf · Accepted Answer · 2014-04-28 11:06:22Z

7

This pure bash script finds the first longest common substring of its two arguments in a fairly efficient way:

#!/bin/bash

if ((${#1}>${#2})); then
   long=$1 short=$2
else
   long=$2 short=$1
fi

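lshort=${#short}
score=0   # length of the longest common substring found so far
# Stop once fewer than score+1 characters remain after offset i.
for ((i=0;i<lshort-score;++i)); do
   # Only try lengths that would beat the current best.
   for ((l=score+1;l<=lshort-i;++l)); do
      sub=${short:i:l}
      # Quote "$sub" so it is matched literally, not as a glob pattern.
      [[ $long != *"$sub"* ]] && break
      subfound=$sub score=$l
   done
done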

if ((score)); then
   echo "$subfound"
fi

Demo (I called the script banana):

$ ./banana abcdefx zzdefghij
def
$ ./banana "I have the following strings: abcdefx, zzdefghij I would like to extract the common part of the two strings, i.e. here def." "I tried with sed but I couldn't do that unless the common part was a prefix like this"
 the common part
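
For the file-classification use case from the comments, it may be handier to call the same loop as a function than as a separate script. The following is only a sketch under that assumption: the function name longest_common_substring and the sample file names are hypothetical, not part of the original answer.

```bash
#!/bin/bash

# Hypothetical wrapper around the loop above; the name is an assumption.
longest_common_substring() {
   local long short sub subfound='' i l lshort score=0
   if ((${#1} > ${#2})); then
      long=$1 short=$2
   else
      long=$2 short=$1
   fi
   lshort=${#short}
   for ((i = 0; i < lshort - score; ++i)); do
      for ((l = score + 1; l <= lshort - i; ++l)); do
         sub=${short:i:l}
         # "$sub" is quoted so characters such as * or ? match literally.
         [[ $long != *"$sub"* ]] && break
         subfound=$sub score=$l
      done
   done
   # Print nothing when the strings share no character at all.
   if ((score)); then
      printf '%s\n' "$subfound"
   fi
}

# Made-up file names, for illustration only.
longest_common_substring "foo-report_v1.txt" "report-final.txt"
```

With these two sample names the function prints report. Note that a shared extension or separator can win if it is longer than the shared base, so real file names may need the extension stripped first.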

edited Apr 28, 2014 at 11:06

answered Apr 25, 2014 at 16:02

gniourf_gniourf

anubhava Over a year ago

+1 superb... Had similar algorithm in mind but didn't get time to complete it.

Josh Jolly · Answer · 2014-04-25 15:37:04Z

4

I thought this sounded interesting; here is my solution:

first="abcdefx"
second="zzdefghij"

for i in $(seq ${#first} -1 1); do
    for j in $(seq 0 $((${#first}-i))); do
        grep -qF -- "${first:$j:$i}" <<< "$second" && match="${first:$j:$i}" && break 2
    done
done

echo "Longest common substring: ${match:-None found}"

Output:

Longest common substring: def
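
The same longest-first search can also be written without seq and grep, which avoids starting a new process for every candidate substring. This is a minimal sketch, not the author's code: the function name longest_first is hypothetical, and the [[ ... ]] test with a quoted pattern is swapped in for grep so that the candidate is matched literally.

```bash
#!/bin/bash

# Hypothetical pure-bash variant of the loop above; the name is an assumption.
longest_first() {
    local first=$1 second=$2 i j
    # Try candidate lengths from longest to shortest.
    for ((i = ${#first}; i >= 1; --i)); do
        # Slide a window of length i over the first string.
        for ((j = 0; j <= ${#first} - i; ++j)); do
            # The quoted expansion is compared literally, not as a glob.
            if [[ $second == *"${first:j:i}"* ]]; then
                printf '%s\n' "${first:j:i}"
                return 0
            fi
        done
    done
    return 1   # no common character at all
}

longest_first "abcdefx" "zzdefghij" || echo "None found"
```

Because the search starts with the longest candidates, the first hit is already a longest common substring and the function can return immediately.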

answered Apr 25, 2014 at 15:37

Josh Jolly
