What is a shell command to find the longest common substring of two strings in unix? like: foo 'abcdefghi' 'abjklmdefnop' prints: def
-
Does this need to be POSIX? Targeted at any specific distro?Daenyth– Daenyth2012-02-21 20:29:29 +00:00Commented Feb 21, 2012 at 20:29
-
it's best to have it working on most of linuxesuser1081596– user10815962012-02-22 11:26:54 +00:00Commented Feb 22, 2012 at 11:26
-
@user1081596: Then I recommend implementing this in perl, because it will be installed on every linux unless the user has removed it.Daenyth– Daenyth2012-02-23 16:36:00 +00:00Commented Feb 23, 2012 at 16:36
-
@user1081596 In this case, why would Perl be a better choice than Ruby, Python, or any other scripting language?Anderson Green– Anderson Green2013-01-08 20:35:31 +00:00Commented Jan 8, 2013 at 20:35
Add a comment
|
2 Answers
I am not sure if there is a single command that does the job for you but the following bash script should do it.
#!/bin/bash
word1="$1"
word2="$2"
if [ ${#word1} -lt ${#word2} ]
then
word1="$2"
word2="$1"
fi
for ((i=${#word2}; i>0; i--)); do
for ((j=0; j<=${#word2}-i; j++)); do
if [[ $word1 =~ ${word2:j:i} ]]
then
echo ${word2:j:i}
exit
fi
done
done
save the above as a file substr.sh do chmod +x substr.sh
pranithk @ ~
09:24:32 :) $ ./substr.sh 'abcdefghi' 'abcdeghi'
abcde
pranithk @ ~
09:24:33 :) $ ./substr.sh 'abcdefghi' 'abjklmdefnop'
def
Comments
This is known as the longest common subsequence problem and there are some great algorithms for it. Check out the dynamic programming solution (If you google it, you'll find a ton of implementations). If you really want to understand this at an algorithmic level, check out this MIT lecture,
2 Comments
user1081596
Thank you for this nice link. But for now I am afraid I just need a quick standard command line solution and I would not mind if it is implemented with O(n^5) complexity.
Daenyth
@user1081596: What size will your inputs be?