Find newline while iterating through array of strings

Question

I'm working on a transpiler in Python right now, and one function of my code finds certain symbols and places spaces around them so they're easier to parse later.

This is the code that initially places the spaces around the characters:

def InsertSpaces(string):
    to_return = list(string)

    for i, char in enumerate(to_return):
        if char == '?' or char == '#' or char == '@':
            to_return[i] = (to_return[i] + ' ')[::-1]
            to_return[i] += ' '

    print(''.join(to_return))

Although this worked, it created an annoying problem: it inserted spaces right after the newlines, which could cause problems later on and is just ugly.

So this:

'@0x0D?0A\n@0x1f?@0x2f?48#65#6C#6C#6F#2C#20#57#6F#72#6C#64#21'

Becomes this:

 ' @ 0x0D ? 0A
  @ 0x1f ?  @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C # 64 # 21'

(Keep in mind this splits up the string into a list)

So I wrote this to detect newlines inside the list, so that I could remove the spaces after them.

for char in to_return:
    char_next = to_return[to_return.index(char) + 1]
    if (char + char_next) == '':
        print('found a newline')

The issue is that it does not detect any newlines.

Printing the pairs of characters, you can see the newline character, but the code can't find it, since (I believe) it turns into an actual line break that a simple string comparison can't read.

 @ 0
0x
x0
0x
D ? 
 ? 0
0x
A


 @ 
 @ 0
0x
x0
1f
f ? 
 ? 0
 @ 0
0x
x0
2f
f ? 
 ? 0
48
8 # 
 # 6
65
5 # 
 # 6
65
C # 
 # 6
65
C # 
 # 6
65
F # 
 # 6
2f
C # 
 # 6
2f
0x
 # 6
5 # 
7 # 
 # 6
65
F # 
 # 6
7 # 
2f
 # 6
65
C # 
 # 6
65
48
 # 6
2f
1f

Is there a way to detect a newline character while iterating through a list of strings?

Christopher Shroba · Accepted Answer · 2018-03-19 14:57:03Z

2

First off, this is kind of a strange line of code:

to_return[i] = (to_return[i] + ' ')[::-1]

to_return[i] is one character long, so this line is equivalent to:

to_return[i] = ' ' + to_return[i]
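
As a quick check of that equivalence, here is a minimal sketch. The helper name prepend_space_by_reversal is hypothetical and only wraps the expression from the question; it shows that the two forms agree for one-character strings and diverge for longer ones.

```python
def prepend_space_by_reversal(ch):
    # The question's approach: append a space, then reverse the result.
    return (ch + ' ')[::-1]

for ch in '?#@':
    # For a one-character string, this matches plain prepending.
    assert prepend_space_by_reversal(ch) == ' ' + ch

# For longer strings the reversal also flips the original characters.
print(repr(prepend_space_by_reversal('ab')))  # ' ba'
```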

Second of all, if you're just trying to pad all the '?', '#', and '@' with spaces, why not try a simple replace:

def InsertSpaces(string):
  return string.replace("?"," ? ").replace("#", " # ").replace("@", " @ ")

or even shorter if you use the re (regex) module:

import re

def InsertSpaces(string):
  return re.sub(r"(#|\?|@)", r" \1 ", string)

edited Mar 19, 2018 at 14:57

answered Mar 19, 2018 at 14:53

Christopher Shroba


4 Comments

Jongware Over a year ago

I thought of a simple replace as well but ... it does not do what OP asks! This code also inserts spaces before and after newlines. Of course adding yet more replaces could get rid of them, but I imagine there must be a fancier solution.

Mercury Platinum Over a year ago

I bet we can use re.sub again and replace "\n " with "\n"

Mercury Platinum Over a year ago

This works! return re.sub("\n ", r"\n", re.sub("(#|\?|@)",r" \1 ", string))

Christopher Shroba Over a year ago

Great! Quick suggestion to make your code more readable: split that into two separate statements instead of nesting. Something like: spaced_string=re.sub("(#|\?|@)",r" \1 ", string) and then return spaced_string.replace("\n ","\n"). (also note that I didn't use re.sub for the second replacement since regexes aren't necessary there)
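
Put together, the two-statement version suggested in the comment above might look like the following sketch. The function name insert_spaces and the sample input are assumptions made for illustration; the two calls are the ones quoted in the comments.

```python
import re

def insert_spaces(string):
    # Step 1: pad every '#', '?', or '@' with one space on each side.
    spaced_string = re.sub(r"(#|\?|@)", r" \1 ", string)
    # Step 2: drop the space that step 1 leaves right after a newline.
    return spaced_string.replace("\n ", "\n")

print(repr(insert_spaces("@0x0D?0A\n@0x1f?")))
# ' @ 0x0D ? 0A\n@ 0x1f ? '
```

Note that this still leaves a space at the very start and end of the string when the input begins or ends with one of the symbols.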

Graipher · Accepted Answer · 2018-03-19 14:50:23Z

1

Iterate over both the current char and the next and use '\n':

for char, char_next in zip(to_return, to_return[1:]):
    if char + char_next == '\n ':
        print('found a newline')
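
One caveat, assuming to_return is the list built by the question's loop: each padded symbol is then a single three-character element such as ' @ ', so the pair after a newline concatenates to '\n @ ' rather than '\n '. A sketch that checks the two elements separately, and strips the leading space while it is at it, could look like this (strip_space_after_newline is a hypothetical helper name):

```python
def strip_space_after_newline(pieces):
    # Work on a copy so the caller's list is left untouched.
    result = list(pieces)
    for i in range(1, len(result)):
        # An element that directly follows a newline loses its leading space.
        if result[i - 1] == '\n' and result[i].startswith(' '):
            result[i] = result[i][1:]
    return result

pieces = ['A', '\n', ' @ ', '0']
print(strip_space_after_newline(pieces))  # ['A', '\n', '@ ', '0']
```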

answered Mar 19, 2018 at 14:50

Graipher


2 Comments

Mercury Platinum Over a year ago

Is there a way to do it with regex? The other answer has a regex solution to a different problem and I think this could apply by replacing \n with '\n'

Mercury Platinum Over a year ago

Figured it out. Fixes all my problems. re is powerful. return re.sub("\n ", r"\n", re.sub("(#|\?|@)",r" \1 ", string))[1:]

Jongware · Accepted Answer · 2018-03-19 15:21:26Z

You don't have to "scan" for a newline, as long as you only add a space when there is a character which is not a space, either before or after, your current character.

I don't think it can be done with a single regular expression, but with two you can add just spaces where needed. It needs two lookbehinds/lookaheads, because there are two conditions:

There must be at least one character before/after the @, #, or ?;
It must not be a newline.

I added a third condition for consistency:

.. this character must not be a space.

so there won't be a space added when there already is one. (It's only a convenience addition, because \S happens to match "everything that is not space-like".)

Why do you need two lookbehinds? Because one of them matches only if there is a character (which must not be space-like), and the other matches if there is "not" a newline, which includes the start and end of the string itself.

The following code, with a slightly altered input string to show off that it works at edge cases,

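import re

# Named text rather than str, so the built-in str type is not shadowed
text = '@0x0D?0A\n@0x1f?@0x2f?48# 65#6C#6C#6F#2C#20#57#6F#72#6C#64#21@'

# Add a space before each symbol that follows a non-whitespace character
text = re.sub(r'(?<=\S)(?<!\n)([@?#])', r' \1', text)
# Add a space after each symbol that precedes a non-whitespace character
text = re.sub(r'([@?#])(?!\n)(?=\S)', r'\1 ', text)
print('"' + text + '"')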

results in

"@ 0x0D ? 0A
@ 0x1f ? @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C # 64 # 21 @"

where the double quotes are only added to show begin and end of the result string.
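
To exercise the edge cases in isolation, the two substitutions can be wrapped in a function, as in this sketch. The name pad_symbols and the short inputs are assumptions for illustration; the patterns are the ones from the answer. Because no space is added where one already exists, applying the function a second time should leave the string unchanged.

```python
import re

def pad_symbols(text):
    # Space before a symbol that follows a non-whitespace character.
    text = re.sub(r'(?<=\S)(?<!\n)([@?#])', r' \1', text)
    # Space after a symbol that precedes a non-whitespace character.
    return re.sub(r'([@?#])(?!\n)(?=\S)', r'\1 ', text)

once = pad_symbols('@a\n#b?')
print(repr(once))                # '@ a\n# b ?'
print(pad_symbols(once) == once)  # True
```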

pault · Accepted Answer · 2018-03-19 15:57:20Z

Here's a way to modify your function to solve your problem without regex.

In each iteration check to see if the previous or next characters are new lines. In those cases, do not add a space:

def InsertSpaces(s):
    to_return = []

    for i, char in enumerate(s):
        if char in {'?', '#', '@'}:
            # Leading space unless at the start of the string or right after a newline
            val = ' ' if i > 0 and s[i-1] != '\n' else ''
            val += char
            # Trailing space unless at the end of the string or right before a newline
            val += ' ' if (i+1) < len(s) and s[i+1] != '\n' else ''
        else:
            val = char
        to_return.append(val)

    return ''.join(to_return)

s = '@0x0D?0A\n@0x1f?@0x2f?48#65#6C#6C#6F#2C#20#57#6F#72#6C#'
print(repr(InsertSpaces(s)))
#'@ 0x0D ? 0A\n@ 0x1f ?  @ 0x2f ? 48 # 65 # 6C # 6C # 6F # 2C # 20 # 57 # 6F # 72 # 6C #'

The key is this part:

val = ' ' if i > 0 and s[i-1] != '\n' else ''            #1
val += char                                              #2
val += ' ' if (i+1) < len(s) and s[i+1] != '\n' else ''  #3

Line 1: Add either ' ' or '' to the beginning of the string based on the conditional. We check that the previous char s[i-1] is not a newline character \n. We also have to check that the index is in bounds (i > 0); a test of (i-1) > 0 would wrongly skip the leading space for a symbol at index 1.
Line 2: Always add the current char.
Line 3: Similar logic to Line 1, but check the next character and make sure you're not at the end of the string.

This will also not add a space after the special character if it is at the end of the string (or at the start). If you want that to happen, you'd have to slightly modify the conditional.
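
As a sketch of that modification, under the assumption that the only place a space should be suppressed is next to a newline, the conditionals can be inverted so that a space is the default. The name insert_spaces_padded_ends is hypothetical.

```python
def insert_spaces_padded_ends(s):
    parts = []
    for i, char in enumerate(s):
        if char in {'?', '#', '@'}:
            # Skip the leading space only when the previous character is a newline.
            before = '' if i > 0 and s[i - 1] == '\n' else ' '
            # Skip the trailing space only when the next character is a newline.
            after = '' if i + 1 < len(s) and s[i + 1] == '\n' else ' '
            parts.append(before + char + after)
        else:
            parts.append(char)
    return ''.join(parts)

print(repr(insert_spaces_padded_ends('@0A\n#1?')))
# ' @ 0A\n# 1 ? '
```

Here a symbol at the start or end of the string is padded on both sides, while a symbol directly before or after a newline still gets no space on that side.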

A couple of other changes that I made:

Renamed the input variable to s, because string is the name of a standard-library module
Initialized to_return to an empty list that is appended to, and iterated over enumerate(s) (instead of to_return), because bad things can happen when you modify the object over which you are iterating
Used in {set} instead of checking for each character individually
