Skip to main content
Added time comparisons
Source Link
AJNeufeld
  • 35.3k
  • 5
  • 41
  • 103

Review

Structure

Placing multiple statements on one line using the semicolon ; or putting statements after the colon : in flow-control statements is to be eschewed. Don't do it.

Type hints

If you are going to use typehints, and you should, you should use the typehints for both parameters and return values. sep() has no return type. By inspection, it returns a list of strings, so the return type should be list[str] (or prior to Python 3.9, from typing import List and use List[str])

Naming

Your names cur, and res are very terse, but are acceptable (barely).

The variable s however is unacceptable. What is s? What does it mean? Lets examine the one and only comment where it is used: # Q. Well, that was completely and utterly unhelpful.

A much better name for s would be within_quote.

Resulting code

def sep(string: str, *seps: str) -> list[str]:

    result = []
    current = ""
    within_quote = False

    for ch in string:
        if ch == "'":
            within_quote = not within_quote
        if not within_quote and ch in seps:
            result.append(current)
            current = ""
        else:
            current += ch

    result.append(current)
    return result

This is immediately more readable.

Alternate implementation

The following regular-expression solution is close to the same behaviour as your code. It deviates when multiple spaces occur, or quoted terms are not surrounded by spaces. For you example case, however, it returns identical results.

import re

def sep(subject: str, *seps: str) -> list[str]:
    separators = re.escape("".join(seps))
    pattern = f"(?:'[^']+')|(?:[^{separators}]+)"
    return re.findall(pattern, subject)

if __name__ == '__main__':
    result = sep("a b c d 'e f' g h", ' ')
    print(repr(result))

Depending on your actual input requirements, it may or may not be sufficient for your needs.

Time comparisons

On strings with between 1 and one million terms, where a term is either a random "word" or a quoted string of 2-4 random words, and a random word is 1 to 7 random lowercase letters, I get the following time comparisons:

log-log time comparison

Once above strings of 10 words, my solution and Fernandez's solution are both faster than the original. While mine beats Fernandez's in conciseness, their solution is clearly faster.

Review

Structure

Placing multiple statements on one line using the semicolon ; or putting statements after the colon : in flow-control statements is to be eschewed. Don't do it.

Type hints

If you are going to use typehints, and you should, you should use the typehints for both parameters and return values. sep() has no return type. By inspection, it returns a list of strings, so the return type should be list[str] (or prior to Python 3.9, from typing import List and use List[str])

Naming

Your names cur, and res are very terse, but are acceptable (barely).

The variable s however is unacceptable. What is s? What does it mean? Lets examine the one and only comment where it is used: # Q. Well, that was completely and utterly unhelpful.

A much better name for s would be within_quote.

Resulting code

def sep(string: str, *seps: str) -> list[str]:

    result = []
    current = ""
    within_quote = False

    for ch in string:
        if ch == "'":
            within_quote = not within_quote
        if not within_quote and ch in seps:
            result.append(current)
            current = ""
        else:
            current += ch

    result.append(current)
    return result

This is immediately more readable.

Alternate implementation

The following regular-expression solution is close to the same behaviour as your code. It deviates when multiple spaces occur, or quoted terms are not surrounded by spaces. For you example case, however, it returns identical results.

import re

def sep(subject: str, *seps: str) -> list[str]:
    separators = re.escape("".join(seps))
    pattern = f"(?:'[^']+')|(?:[^{separators}]+)"
    return re.findall(pattern, subject)

if __name__ == '__main__':
    result = sep("a b c d 'e f' g h", ' ')
    print(repr(result))

Depending on your actual input requirements, it may or may not be sufficient for your needs.

Review

Structure

Placing multiple statements on one line using the semicolon ; or putting statements after the colon : in flow-control statements is to be eschewed. Don't do it.

Type hints

If you are going to use typehints, and you should, you should use the typehints for both parameters and return values. sep() has no return type. By inspection, it returns a list of strings, so the return type should be list[str] (or prior to Python 3.9, from typing import List and use List[str])

Naming

Your names cur, and res are very terse, but are acceptable (barely).

The variable s however is unacceptable. What is s? What does it mean? Lets examine the one and only comment where it is used: # Q. Well, that was completely and utterly unhelpful.

A much better name for s would be within_quote.

Resulting code

def sep(string: str, *seps: str) -> list[str]:

    result = []
    current = ""
    within_quote = False

    for ch in string:
        if ch == "'":
            within_quote = not within_quote
        if not within_quote and ch in seps:
            result.append(current)
            current = ""
        else:
            current += ch

    result.append(current)
    return result

This is immediately more readable.

Alternate implementation

The following regular-expression solution is close to the same behaviour as your code. It deviates when multiple spaces occur, or quoted terms are not surrounded by spaces. For you example case, however, it returns identical results.

import re

def sep(subject: str, *seps: str) -> list[str]:
    separators = re.escape("".join(seps))
    pattern = f"(?:'[^']+')|(?:[^{separators}]+)"
    return re.findall(pattern, subject)

if __name__ == '__main__':
    result = sep("a b c d 'e f' g h", ' ')
    print(repr(result))

Depending on your actual input requirements, it may or may not be sufficient for your needs.

Time comparisons

On strings with between 1 and one million terms, where a term is either a random "word" or a quoted string of 2-4 random words, and a random word is 1 to 7 random lowercase letters, I get the following time comparisons:

log-log time comparison

Once above strings of 10 words, my solution and Fernandez's solution are both faster than the original. While mine beats Fernandez's in conciseness, their solution is clearly faster.

Source Link
AJNeufeld
  • 35.3k
  • 5
  • 41
  • 103

Review

Structure

Placing multiple statements on one line using the semicolon ; or putting statements after the colon : in flow-control statements is to be eschewed. Don't do it.

Type hints

If you are going to use typehints, and you should, you should use the typehints for both parameters and return values. sep() has no return type. By inspection, it returns a list of strings, so the return type should be list[str] (or prior to Python 3.9, from typing import List and use List[str])

Naming

Your names cur, and res are very terse, but are acceptable (barely).

The variable s however is unacceptable. What is s? What does it mean? Lets examine the one and only comment where it is used: # Q. Well, that was completely and utterly unhelpful.

A much better name for s would be within_quote.

Resulting code

def sep(string: str, *seps: str) -> list[str]:

    result = []
    current = ""
    within_quote = False

    for ch in string:
        if ch == "'":
            within_quote = not within_quote
        if not within_quote and ch in seps:
            result.append(current)
            current = ""
        else:
            current += ch

    result.append(current)
    return result

This is immediately more readable.

Alternate implementation

The following regular-expression solution is close to the same behaviour as your code. It deviates when multiple spaces occur, or quoted terms are not surrounded by spaces. For you example case, however, it returns identical results.

import re

def sep(subject: str, *seps: str) -> list[str]:
    separators = re.escape("".join(seps))
    pattern = f"(?:'[^']+')|(?:[^{separators}]+)"
    return re.findall(pattern, subject)

if __name__ == '__main__':
    result = sep("a b c d 'e f' g h", ' ')
    print(repr(result))

Depending on your actual input requirements, it may or may not be sufficient for your needs.