I'm working on a transpiler in Python right now, and one function of my code finds certain symbols and places spaces around them so they're easier to parse later.

This is the code that initially places the spaces around the characters:

def InsertSpaces(string):
    to_return = list(string)

    for i, char in enumerate(to_return):
        # Surround each special character with a space on both sides
        if char == '?' or char == '#' or char == '@':
            to_return[i] = (to_return[i] + ' ')[::-1]
            to_return[i] += ' '

    print(''.join(to_return))

Although this worked, it created an annoying problem: it put spaces right after the newlines, which could cause problems later on and is just ugly.

So this:

'@0x0D?0A\n@0x1f?@0x2f?48#65#6C#6C#6F#2C#20#57#6F#72#6C#64#21'

Becomes this:

' @ 0x0D ? 0A
 @ 0x1f ?  @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C # 64 # 21'

(Keep in mind that this function splits the string into a list of characters.)

So I wrote this to detect newlines inside the list, so that I could remove the spaces that follow them:

for char in to_return:
    char_next = to_return[to_return.index(char) + 1]
    if (char + char_next) == '':
        print('found a newline')

The issue is that it does not detect any newlines.

When I print the pairs of characters I can see the newline, but the code can't find it. I believe that's because it turns into an actual line break, which a simple string comparison can't read.

 @ 0
0x
x0
0x
D ? 
 ? 0
0x
A


 @ 
 @ 0
0x
x0
1f
f ? 
 ? 0
 @ 0
0x
x0
2f
f ? 
 ? 0
48
8 # 
 # 6
65
5 # 
 # 6
65
C # 
 # 6
65
C # 
 # 6
65
F # 
 # 6
2f
C # 
 # 6
2f
0x
 # 6
5 # 
7 # 
 # 6
65
F # 
 # 6
7 # 
2f
 # 6
65
C # 
 # 6
65
48
 # 6
2f
1f

Is there a way to detect a newline character while iterating through a list of strings?

4 Answers

First off, this is kind of a strange line of code:

to_return[i] = (to_return[i] + ' ')[::-1]

to_return[i] is one character long, so this line is equivalent to:

to_return[i] = ' ' + to_return[i]
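A quick way to check that equivalence (the helper name below is made up for illustration; it only wraps the expression from the question):

```python
def reversed_padding(c: str) -> str:
    """Evaluate the question's (c + ' ')[::-1] expression for c."""
    return (c + ' ')[::-1]

# For a one-character string, reversing 'c ' gives ' c'.
for c in '?#@':
    assert reversed_padding(c) == ' ' + c
```

The equivalence depends on each list element being a single character: for a longer string such as 'ab', the reversal also flips its contents and gives ' ba'.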

Second, if you're just trying to pad every '?', '#', and '@' with spaces, why not use a simple replace:

def InsertSpaces(string):
  return string.replace("?"," ? ").replace("#", " # ").replace("@", " @ ")

or, even shorter, with the re (regex) module:

import re

def InsertSpaces(string):
  return re.sub(r"(#|\?|@)", r" \1 ", string)

4 Comments

I thought of a simple replace as well but ... it does not do what OP asks! This code also inserts spaces before and after newlines. Of course adding yet more replaces could get rid of them, but I imagine there must be a fancier solution.
I bet we can use re.sub again and replace "\n " (a newline followed by a space) with "\n".
This works! return re.sub("\n ", r"\n", re.sub("(#|\?|@)",r" \1 ", string))
Great! Quick suggestion to make your code more readable: split that into two separate statements instead of nesting. Something like: spaced_string=re.sub("(#|\?|@)",r" \1 ", string) and then return spaced_string.replace("\n ","\n"). (also note that I didn't use re.sub for the second replacement since regexes aren't necessary there)
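
Following that suggestion, a minimal sketch of the two-statement version (the name insert_spaces is hypothetical; the logic is the regex from the answer plus the newline fix from these comments):

```python
import re

def insert_spaces(string: str) -> str:
    # Pad every '#', '?' and '@' with one space on each side.
    spaced_string = re.sub(r"(#|\?|@)", r" \1 ", string)
    # Remove the space the padding left directly after each newline.
    return spaced_string.replace("\n ", "\n")

print(repr(insert_spaces('@0x0D?0A\n@0x1f?@0x2f')))
# ' @ 0x0D ? 0A\n@ 0x1f ?  @ 0x2f'
```

The leading space from the first '@' is still there, and a space before a newline (when a symbol ends a line) is not touched by this replacement.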

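Join the list back into a string first (after InsertSpaces, an element such as ' @ ' can be several characters long), then iterate over each character together with the next one and compare the pair against '\n ' (a newline followed by a space):

spaced = ''.join(to_return)
for char, char_next in zip(spaced, spaced[1:]):
    if char + char_next == '\n ':
        print('found a newline')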
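
Building on that pairing, a minimal sketch that removes the space instead of only reporting it (the function name is hypothetical). It takes the joined string and pairs each character with the one before it:

```python
def drop_space_after_newline(text: str) -> str:
    """Remove a space that directly follows a newline in text."""
    kept = [text[:1]]  # the first character has no predecessor, so keep it
    # Pair each character (char) with the one before it (prev).
    for prev, char in zip(text, text[1:]):
        if prev == '\n' and char == ' ':
            continue  # skip the space right after a newline
        kept.append(char)
    return ''.join(kept)

print(repr(drop_space_after_newline(' @ 0x0D ? 0A\n @ 0x1f')))
```

Only the first space after each newline is dropped, which matches what replace('\n ', '\n') would do.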

2 Comments

Is there a way to do it with regex? The other answer has a regex solution to a different problem, and I think it could apply here by replacing '\n ' with '\n'.
Figured it out. Fixes all my problems. re is powerful. return re.sub("\n ", r"\n", re.sub("(#|\?|@)",r" \1 ", string))[1:]

You don't have to "scan" for newlines, as long as you only add a space when the character before (or after) the current one exists and is not a space.

I don't think it can be done with a single regular expression, but with two you can add spaces only where they are needed. Each one needs two lookarounds (lookbehinds for the space before, lookaheads for the space after), because there are two conditions:

  1. There must be at least one character before/after the @, #, or ?;
  2. It must not be a newline.

I added a third condition for consistency:

  3. ...this character must not be a space.

so no space is added where there already is one. (This is only a convenience, because \S happens to match "everything that is not whitespace".)

Why do you need two lookbehinds? Because one of them matches if there is a preceding character (which must not be whitespace), and the other matches if the preceding character is "not" a newline, which includes the start and end of the string itself.
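
Since '\n' is itself whitespace, \S can never match it, so the explicit (?<!\n) and (?!\n) checks never change the result. A quick sketch comparing both versions on the answer's sample string (both function names are hypothetical):

```python
import re

def pad_two_lookarounds(s: str) -> str:
    # The answer's version: \S plus an explicit "not a newline" check.
    s = re.sub(r'(?<=\S)(?<!\n)([@?#])', r' \1', s)
    return re.sub(r'([@?#])(?!\n)(?=\S)', r'\1 ', s)

def pad_one_lookaround(s: str) -> str:
    # \S already excludes '\n', so one lookaround per side is enough.
    s = re.sub(r'(?<=\S)([@?#])', r' \1', s)
    return re.sub(r'([@?#])(?=\S)', r'\1 ', s)

sample = '@0x0D?0A\n@0x1f?@0x2f?48# 65#6C#6C#6F#2C#20#57#6F#72#6C#64#21@'
assert pad_two_lookarounds(sample) == pad_one_lookaround(sample)
```

Keeping the explicit newline checks can still be worthwhile as documentation of intent.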

The following code, with a slightly altered input string to show that it handles edge cases,

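import re

# Named s rather than str, to avoid shadowing the built-in str type
s = '@0x0D?0A\n@0x1f?@0x2f?48# 65#6C#6C#6F#2C#20#57#6F#72#6C#64#21@'

s = re.sub(r'(?<=\S)(?<!\n)([@?#])', r' \1', s)  # add a space before, where needed
s = re.sub(r'([@?#])(?!\n)(?=\S)', r'\1 ', s)    # add a space after, where needed
print('"' + s + '"')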

results in

"@ 0x0D ? 0A
@ 0x1f ? @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C # 64 # 21 @"

where the double quotes are only added to show begin and end of the result string.

Here's a way to modify your function to solve your problem without regex.

In each iteration, check whether the previous or next character is a newline. If it is, don't add a space on that side:

def InsertSpaces(s):
    to_return = []

    for i, char in enumerate(s):
        if char in {'?', '#', '@'}:
            # Leading space unless char is first or follows a newline
            val = ' ' if i > 0 and s[i-1] != '\n' else ''
            val += char
            # Trailing space unless char is last or precedes a newline
            val += ' ' if (i+1) < len(s) and s[i+1] != '\n' else ''
        else:
            val = char
        to_return.append(val)

    return ''.join(to_return)

s = '@0x0D?0A\n@0x1f?@0x2f?48#65#6C#6C#6F#2C#20#57#6F#72#6C#'
print(repr(InsertSpaces(s)))
#'@ 0x0D ? 0A\n@ 0x1f ?  @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C #'

The key is this part:

val = ' ' if i > 0 and s[i-1] != '\n' else ''            #1
val += char                                              #2
val += ' ' if (i+1) < len(s) and s[i+1] != '\n' else ''  #3

  • Line 1: Add either ' ' or '' to the beginning of val based on the conditional. We check that the previous character s[i-1] is not a newline \n. We also have to check that the index is in bounds (i > 0); otherwise, at i == 0, s[i-1] would silently wrap around to the last character, s[-1].
  • Line 2: Always add the current char.
  • Line 3: Similar logic to Line 1, but check the next character and make sure you're not at the end of the string.

This will also not add a space after the special character if it is at the end of the string, or before it if it is at the start. If you want those spaces, you'd have to modify the conditionals slightly.
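
A sketch of that modification (the function name is made up for illustration): a space is skipped only when the neighbouring character is a newline, so symbols at the very start or end of the string get padded too.

```python
def insert_spaces_padding_edges(s):
    to_return = []

    for i, char in enumerate(s):
        if char in {'?', '#', '@'}:
            # Skip the leading space only if the previous char is a newline
            # (at i == 0 there is no previous char, so the space is added).
            val = '' if i > 0 and s[i - 1] == '\n' else ' '
            val += char
            # Skip the trailing space only if the next char is a newline
            # (at the last index there is no next char, so the space is added).
            val += '' if i + 1 < len(s) and s[i + 1] == '\n' else ' '
        else:
            val = char
        to_return.append(val)

    return ''.join(to_return)

print(repr(insert_spaces_padding_edges('@a\n#')))
```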

A couple of other changes that I made:

  • Renamed the input variable to s, because string is the name of a standard-library module
  • Initialized to_return as an empty list that gets appended to, and iterated over enumerate(s) instead of to_return, because bad things can happen when you modify the object you are iterating over
  • Used a set membership test (char in {'?', '#', '@'}) instead of checking each character individually
