2

I want to separate the alphabetical characters (letters) in a string with commas, but I also have non-alphabetical characters that I want to preserve.

Examples (Input -> Desired Output):

"ABC" -> "A,B,C"
"-ABC" -> "-A,B,C"
"AB-C" -> "A,B,-C"

There can be at most one "-" before a given letter.

My first attempt used the join method, but it inserts a comma between all characters of the string, whether they are letters or not.

For example, ','.join("-ABC") gives "-,A,B,C", which is not what I want.

Any suggestion?

2
  • Will the input string ever end in a non-alpha character? Commented Dec 14, 2016 at 12:39
  • In my practical case it can happen only if the string consists of a single non-alpha character, e.g. "-" Commented Dec 14, 2016 at 15:43

4 Answers

9

Match the letters, but use a negative lookahead to exclude a letter at the end:

re.sub(r'([A-Z])(?!$)', r'\1,', inputstring)

See the online demo at regex101.com, or this Python session:

>>> import re
>>> re.sub(r'([A-Z])(?!$)', r'\1,', 'ABC')
'A,B,C'
>>> re.sub(r'([A-Z])(?!$)', r'\1,', '-ABC')
'-A,B,C'
>>> re.sub(r'([A-Z])(?!$)', r'\1,', 'AB-C')
'A,B,-C'
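A minimal sketch wrapping the pattern above in a helper. The name `comma_split` is hypothetical, and the character class is widened to `[A-Za-z]` on the assumption that lowercase letters may also appear (the answer's `[A-Z]` only matches uppercase). It also leaves the single "-" input from the comments unchanged, since there is no letter to match.

```python
import re

# Hypothetical helper: put a comma after every letter that is not at the end.
# [A-Za-z] is an assumption; the original answer uses [A-Z] (uppercase only).
_LETTER_NOT_LAST = re.compile(r'([A-Za-z])(?!$)')

def comma_split(s):
    return _LETTER_NOT_LAST.sub(r'\1,', s)

for s in ("ABC", "-ABC", "AB-C", "ab-c", "-"):
    print(repr(s), '->', repr(comma_split(s)))
```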

4 Comments

No need to capture the whole pattern, as you can always reference the whole match with r'\g<0>'.
@WiktorStribiżew: meh, you either introduce verbosity in the pattern (the group capture) or in the replacement pattern (having to use g and the angle brackets).
It is not a "verbosity" thing: capturing creates a memory buffer for the submatch.
@WiktorStribiżew: sure. And for larger captures that is going to be an issue. Not for a 1-letter capture, however.
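For reference, a sketch of the variant suggested in the comments: no capturing group in the pattern, with `\g<0>` (the whole match) in the replacement. The function name is hypothetical; the output should match the answer's version.

```python
import re

def comma_split_no_group(s):
    # \g<0> refers to the entire match, so no capturing group is needed.
    return re.sub(r'[A-Z](?!$)', r'\g<0>,', s)

print(comma_split_no_group('AB-C'))  # A,B,-C
```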
1

This is probably a job for a regex, but you can do it with .join; you just need a list comprehension with a test.

If the input string never ends in a non-alpha character, you could do this:

data = ("ABC", "-ABC", "AB-C")
for s in data:
    t = ''.join([c + ',' if c.isalpha() else c for c in s])[:-1]
    print('{!r}\t-> {!r}'.format(s, t))

output

'ABC'   -> 'A,B,C'
'-ABC'  -> '-A,B,C'
'AB-C'  -> 'A,B,-C'

I admit that the [:-1] is a bit kludgy, but it's probably more efficient than doing an index check on every char to see if it's at the end of the string.
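For comparison, a sketch of the index-check alternative mentioned above (the function name is hypothetical). It tests each position with `enumerate` instead of slicing off a trailing comma, and because only the final index is skipped it also copes with strings that end in a non-letter. Any speed difference versus `[:-1]` is not measured here; `timeit` would be the way to check.

```python
def join_with_index_check(s):
    """Add a comma after every letter except one in the last position."""
    last = len(s) - 1
    return ''.join(
        c + ',' if c.isalpha() and i != last else c
        for i, c in enumerate(s)
    )

for s in ("ABC", "-ABC", "AB-C", "A-BC-"):
    print('{!r}\t-> {!r}'.format(s, join_with_index_check(s)))
```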

If the input string can end in a non-alpha character, we can do this:

data = ("ABC", "-ABC", "AB-C", "A-BC-")
for s in data:
    t = ''.join([c + ',' if c.isalpha() else c for c in s[:-1]] + [s[-1]])
    print('{!r}\t-> {!r}'.format(s, t))

output

'ABC'   -> 'A,B,C'
'-ABC'  -> '-A,B,C'
'AB-C'  -> 'A,B,-C'
'A-BC-' -> 'A,-B,C,-'

Ok, it's probably kludgier than the first version, but hey, it works. :)

As I said earlier, a regex substitution is probably the sane way to do this.
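One caveat with the second version: `s[-1]` raises `IndexError` on an empty string. A minimal sketch wrapping it in a function with a guard; the name `comma_join` and the choice to return `''` for empty input are assumptions.

```python
def comma_join(s):
    # s[-1] would raise IndexError on "", so handle that case first.
    if not s:
        return ''
    # Comma after every letter except the last character, which is appended as-is.
    return ''.join([c + ',' if c.isalpha() else c for c in s[:-1]] + [s[-1]])

for s in ("", "-", "ABC", "A-BC-"):
    print(repr(s), '->', repr(comma_join(s)))
```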


1

I know that this is an old post, but isn't this easier?

>>> ','.join('AB-C').replace('-,', '-')
'A,B,-C'
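A quick check of this approach against the question's examples (the function name is hypothetical). Note that it only glues "-" back to the following character; any other non-letter would still be followed by a comma.

```python
def join_then_unsplit_dash(s):
    # Comma between every character, then glue each "-" back to the next char.
    return ','.join(s).replace('-,', '-')

for s in ("ABC", "-ABC", "AB-C", "-"):
    print(repr(s), '->', repr(join_then_unsplit_dash(s)))
```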


0

isalpha is a method that can be called on any string object; it returns a bool telling you whether the string consists only of alphabetic characters.

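def split_char(s):
    # Group each letter with any non-letters that precede it.
    final = []
    temp = ''
    for i in s:
        if i.isalpha():
            final.append(temp + i)
            temp = ''
        else:
            temp = temp + i
    if temp:
        # Keep trailing non-letters (e.g. the input "-") instead of dropping them.
        final.append(temp)
    return final

>>> print(split_char('-ABC'))
['-A', 'B', 'C']
>>> temp_list = split_char('AB-C')
>>> print(','.join(temp_list))
A,B,-C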
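The same grouping can be sketched with `re.findall` instead of an explicit loop; this is a swapped-in technique, not part of the answer. The pattern assumes ASCII letters (whereas `isalpha()` also accepts non-ASCII letters): each match is a run of non-letters followed by one letter, or a trailing run of non-letters at the end of the string. The name `split_char_re` is hypothetical.

```python
import re

# A letter with any non-letters before it, or leftover non-letters at the end.
_GROUP = re.compile(r'[^A-Za-z]*[A-Za-z]|[^A-Za-z]+$')

def split_char_re(s):
    return _GROUP.findall(s)

print(split_char_re('-ABC'))            # ['-A', 'B', 'C']
print(','.join(split_char_re('AB-C')))  # A,B,-C
```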

