1

I have written a Python function for some post processing in my text recognition algorithm. It works fine, but it seems to be kind of lengthy and has many if-else-conditions. I am looking for a way to simplify my code:

def postProcessing(new_s): #new_s is a list
  import string
    
  new_s=removeFrontLetters(new_s) #This function has to be called first
 
  if all(ch in string.digits for ch in new_s[:4]): #checking if first 4 elements are digits.
    
    if all(ch in string.ascii_letters for ch in new_s[4:5]): #checking if the [4] and [5] are letters
      
      if len(new_s)>=7:         
        if new_s[6]=='C': 
          
          new_s=new_s[:7] #if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
          new_s=list(reversed(new_s)) 
          return(new_s)

          
        else:
          new_s=new_s[:6] #if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
          new_s=list(reversed(new_s))

          return(new_s)
      

      else:
        new_s=list(reversed(new_s)) #if length<7, the reversed of the given list has to be returned.
        return(new_s)
    else:
      print('not valid') #if the [4] and [5] are not letters, it is not valid
  else:
    print('not valid') #if the [:4] are not digits, it is not valid

This seems very beginner-level and lengthy. I am a beginner, but I am trying to improve my function. Do you have suggestions?

5
  • 1
    We can already notice that there is a potential silent bug in your code. new_s[4:5] gives you the 5th element only, inside a list. Commented Sep 25, 2022 at 12:18
  • What is your function do??? Can you tell us so we can provide some other solutions? Commented Sep 25, 2022 at 12:20
  • @codester_09 this gets a list in first. and it should check if the first 4 elements are digits and the 5th and 6th are letters. If this first two conditions are satisfied, and if the length of the list >= 7, it should check the 7th element. if 7th element is 'C' it should return the reverse of the [:7]. if not 'C' or if length=<7, it should return reverse [:6]. Commented Sep 25, 2022 at 12:28
  • a basic simplification: all(ch in string.digits for ch in new_s[:4]) is also "".join(new_s[:4]).isdigit() (and all(ch in string.ascii_letters for ch in new_s[4:5]) is new_s[4].isalpha() Commented Sep 25, 2022 at 12:28
  • #if the [4] and [5] are not letters you're not testing [5] Commented Sep 25, 2022 at 12:31

3 Answers 3

2

You can invert your if statements and use early returns to reduce the indentation of your code.

def postProcessing(new_s):  # new_s is a list
    import string

    new_s = removeFrontLetters(new_s)  # This function has to be called first

    if not all(ch in string.digits for ch in new_s[:4]):  # checking if first 4 elements are digits.
        raise ValueError("First four elements must be digits")
    
    if not all(ch in string.ascii_letters for ch in new_s[4:5]):  # checking if the [4] and [5] are letters
        raise ValueError("First elements 4 and 5 must be digits")
    
    if len(new_s) <= 7:
        new_s = list(reversed(new_s))  # if length<7, the reversed of the given list has to be returned.
        return (new_s)
    
    if new_s[6] == 'C':

        new_s = new_s[
                :7]  # if length>=7 and the [6] =='C' the reversed of the first 7 elements has to be returned.
        new_s = list(reversed(new_s))
        return (new_s)

    new_s = new_s[:6]  # if length>=7 but the [6] =!'C' the reversed of the first 6 elements has to be returned.
    new_s = list(reversed(new_s))
    return (new_s)
Sign up to request clarification or add additional context in comments.

2 Comments

I assume leaving the obvious 4:5 issue is on purpose?
@njzk2 Yes, left as an exercise for the reader.
2

It's quite neat you ask 'the world' for advise. Did you know there's a dedicated stackexchange site for this? https://codereview.stackexchange.com/.

Unless you insist writing Python code for this, it seems that you need a regular expression here.

So some tips:

  • use regex for pattern matching
  • use variables to express what an expression means
  • use exceptions instead of an 'invalid value' string
  • separate the 'parsing' from the processing to keep functions small and focused
  • use doctest to document and test small functions
def post_process(new_s):
    """
    reverse a string (with a twist)

    what follows is a doctest.  You can run it with
    $ python -m doctest my_python_file.py

    >>> post_process('1234abCdef')
    'Cba4321'
    >>> post_process('1234abdef')
    'ba4321'
    """

    cmds = {
      'C': cmd_c,
      '': cmd_none
    }

    command, content = command_and_content(new_s)
    process = cmds[command]
    return process(content)


def cmd_c(content):
  return 'C' + "".join(reversed(content))
def cmd_none(content):
  return "".join(reversed(content))

The command_and_content function replaces the parsing logic:

def command_and_content(new_s):
    # get help on https://regex101.com/ to find the
    # regular expression for four digits and two letters
    digits_then_ascii = re.compile(r"(\d{4}[a-z]{2})(C?)(.*)")
    if match := digits_then_ascii.match(new_s):
      content = match.group(1)
      command = match.group(2)
      return command, content

    # pylint will ask you to not use an else clause after a return
    # Also, Python advises exceptions for notifying erroneous input
    raise ValueError(new_s)

Comments

1

From the context you provided, I assume that all this processing can happen in-place (i.e. without the need to allocate additional memory). The benefit of lists is that they are mutable, so you can actually do all your operations in-place.

This adheres to the style conventions (PEP 8) and uses correct type annotations (PEP 484):

from string import digits, ascii_letters


def remove_front_letters(new_s: list[str]) -> None:
    ...
    raise NotImplementedError


def post_processing(new_s: list[str]) -> None:
    remove_front_letters(new_s)
    if any(ch not in digits for ch in new_s[:4]):
        raise ValueError("The first 4 characters must be digits")
    if any(ch not in ascii_letters for ch in new_s[4:6]):
        raise ValueError("The 5th and 6th characters must be letters")
    if len(new_s) >= 7:
        if new_s[6] == 'C':
            del new_s[7:]
        else:
            del new_s[6:]
    new_s.reverse()

If you do want a new list, you can just call this function with a .copy() of your input list.

References: list methods; del statement

PS: If you use Python version 3.8 or lower, instead of list[str] you'll need to use typing.List[str].

Also someone mentioned the possibility of replacing the iteration via all() (or any()) with a "".join(...).isdigit() for example. While this is certainly also correct and technically less code, I am not sure it is necessarily more readable. More importantly it creates a new string in the process, which I don't think is necessary.

By the way, you could even reduce that conditional deletion of list elements to a one liner like this:

...
    if len(new_s) >= 7:
        del new_s[7 if new_s[6] == 'C' else 6:]
    new_s.reverse()

But I would argue that this is worse because it is less readable. Personal preference I guess.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.