0

How can I find the position of a substring in a string without using str.find() in Python? How should I loop it?

def find_substring(string, substring):
    for i in xrange(len(string)):
        if string[i] == substring[0]:
            print i
        else:
            print False

For example, when string = "ATACGTG" and substring = "ACGT", it should return 2. I want to understand how str.find() works.

6
  • How does my solution work for you? Commented Jul 12, 2014 at 2:54
  • index is basically the same thing as find. I want to solve the problem without using any built-in functions. My idea is: if the substring is in the string, then find i where string[i] == substring[0] and return i. I am not familiar with Python. Commented Jul 12, 2014 at 3:01
  • You're using built-in functions xrange and len, do you mean no string methods? Commented Jul 12, 2014 at 3:06
  • Building your own find function will inevitably be much slower as many of Python's built-in functions are optimized for speed and/or written in C. What is the reason for making your own function? It looks like you're searching through genome data where speed may be vital. Commented Jul 12, 2014 at 3:31
  • See this question for details on how find actually works in Python stackoverflow.com/questions/681649/… Commented Jul 12, 2014 at 3:33

3 Answers 3

1

You can use Boyer-Moore or Knuth-Morris-Pratt. Both build tables in advance so that each mismatch can skip ahead faster. The B-M page has a Python implementation, and both pages refer to other string-searching algorithms.
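
As a rough illustration of the table idea, here is a minimal Knuth-Morris-Pratt sketch. The names build_failure_table and kmp_find are made up for this example, and returning -1 on a miss (and 0 for an empty pattern) is an assumption chosen to mirror str.find. It is not the implementation CPython uses.

```python
def build_failure_table(pattern):
    """For each prefix of pattern, store the length of the longest
    proper prefix of it that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if pattern == '':
        return 0  # assumption: mirror str.find for an empty pattern
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # Reuse the table instead of re-scanning text after a mismatch.
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(kmp_find('ATACGTG', 'ACGT'))
```

The text index i never moves backwards, which is why the search stays linear in the combined length of the text and the pattern.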


3 Comments

The answers in this other question suggest that the Python find implementation is based on Boyer-Moore - stackoverflow.com/questions/681649/…
Yeah, I am sure it is. I am not sure what you are getting at, though. Is this not a good answer because it replicates in python the underlying implementation in C? If you had to implement string.find from scratch in python, what would you do?
It seems like a highly relevant answer to me, that's why I upvoted. I was just providing more information - sorry if it sounded critical.
1

I can't think of a way to do it without any built-in functions at all.

I can:

def find_substring(string, substring):

    def starts_with(string, substring):
        while True:
            if substring == '':
                return True

            if string == '' or string[0] != substring[0]:
                return False

            string, substring = string[1:], substring[1:]

    n = 0

    while string != '' and substring != '':

        if starts_with(string, substring):
            return n

        string = string[1:]

        n += 1

    return -1

print(find_substring('ATACGTG', 'ACGT'))

That is, it avoids the built-ins len(), range(), etc. Without len() we lose some efficiency, since we could otherwise stop as soon as the remaining string is shorter than the substring. The OP asked for iteration, which the above uses, but the recursive variant is a bit more compact:

def find_substring(string, substring, n=0):

    def starts_with(string, substring):
        if substring == '':
            return True

        if string == '' or string[0] != substring[0]:
            return False

        return starts_with(string[1:], substring[1:])

    if string == '' or substring == '':
        return -1

    if starts_with(string, substring):
        return n

    return find_substring(string[1:], substring, n + 1)

print(find_substring('ATACGTG', 'ACGT'))

Comments

0

Under the constraint of not using find, you can use str.index instead, which raises a ValueError if the substring is not found:

def find_substring(a_string, substring):
    try:
        print(a_string.index(substring))
    except ValueError:
        print('Not Found')

and usage:

>>> find_substring('foo bar baz', 'bar')
4
>>> find_substring('foo bar baz', 'quux')
Not Found

If you must loop, you can slide along the string and, wherever the first character matches, check whether the rest of the string starts with the substring:

def find_substring(a_string, substring):
    for i, c in enumerate(a_string):
        if c == substring[0] and a_string[i:].startswith(substring):
            print(i)
            return
    else: 
        print(False)

To do it with no string methods:

def find_substring(a_string, substring):
    for i in range(len(a_string)):
        if a_string[i] == substring[0] and a_string[i:i+len(substring)] == substring:
            print(i)
            return
    else: 
        print(False)
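
Since the question asks for the position to be returned rather than printed, here is a sketch of the same loop that returns the index. Returning -1 when nothing is found is an assumption borrowed from str.find. The loop also stops once the remaining string is shorter than the substring.

```python
def find_substring(a_string, substring):
    n, m = len(a_string), len(substring)
    # Only start positions that leave room for the whole substring are tried;
    # if m > n the range is empty and the function falls through to -1.
    for i in range(n - m + 1):
        if a_string[i:i + m] == substring:
            return i
    return -1  # assumption: same "not found" convention as str.find


print(find_substring('ATACGTG', 'ACGT'))
```

With an empty substring the first slice comparison succeeds immediately, so this version returns 0 instead of raising IndexError on substring[0].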

I can't think of a way to do it without any built-in functions at all.

2 Comments

index is basically the same thing as find. I want to solve the problem without using any built-in functions.
@Nur How's this, no index?
