I want my string to contain only alphanumeric characters, dashes (-), and underscores (_). That's it. I am trying to write a method that takes a user input string and converts it so that it follows this rule.

My allowed character set is a-zA-Z0-9_-. What I want to do is replace all the spaces with -, and remove every other character that doesn't match that set.

So, the string 'Hello, world!' would get converted into 'Hello-world'. The special characters get removed, and the space is replaced with a -.

What would be the most efficient way to do this in Python? Do I have to iterate over the entire string character by character, or is there a better way? Thanks!
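For comparison, the character-by-character approach the question asks about could look roughly like this. It is a minimal sketch, assuming each single space becomes one - and the allowed set is a-zA-Z0-9_- as stated above; clean_by_loop is a hypothetical name.

```python
import string

# Allowed characters, per the question: ASCII letters, digits, underscore, dash.
ALLOWED = set(string.ascii_letters + string.digits + "_-")


def clean_by_loop(text: str) -> str:
    """Replace each space with '-' and drop anything outside ALLOWED."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("-")   # spaces become dashes
        elif ch in ALLOWED:
            out.append(ch)    # allowed characters are kept unchanged
        # every other character is silently dropped
    return "".join(out)


print(clean_by_loop("Hello, world!"))  # Hello-world
```

This works, but every character passes through a Python-level loop; the regex approach in the answers hands that scan to the re engine instead.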

  • Does your output include digits? They are alphanumeric, but fail your regex Commented Jan 31, 2017 at 15:54
  • Is the Uppercase to lowercase intentional? Commented Jan 31, 2017 at 15:54
  • Could it be that you need this to form a URL from a title? Commented Jan 31, 2017 at 15:54
  • @PatrickHaugh No digits, just A to Z (in both upper and lowercase), dash (-) and underscore (_) allowed. I made a mistake before. It's fixed now. Commented Jan 31, 2017 at 15:56
  • @ppasler Yeah, this is for a URL. Commented Jan 31, 2017 at 15:57

2 Answers

You can do it with two re.sub calls: 1) replace each run of whitespace with -; 2) remove the remaining unwanted characters:

import re

s = 'Hello, world!'
# Replace each run of whitespace with "-", then drop every disallowed character
re.sub(r"[^a-zA-Z_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'
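The nested call runs the whitespace replacement first and the filter second. Splitting it into two named steps makes the intermediate value visible and shows why the order matters: a space is not in the allowed set, so filtering first would delete the spaces before they could become dashes. The variable names here are illustrative.

```python
import re

s = 'Hello, world!'

# Step 1: turn each run of whitespace into a single dash.
dashed = re.sub(r"\s+", "-", s)
print(dashed)       # Hello,-world!

# Step 2: drop everything that is not a letter, underscore or dash.
cleaned = re.sub(r"[^a-zA-Z_-]", "", dashed)
print(cleaned)      # Hello-world

# Reversing the order removes the space before it can become a dash.
wrong_order = re.sub(r"\s+", "-", re.sub(r"[^a-zA-Z_-]", "", s))
print(wrong_order)  # Helloworld
```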

If you want to keep digits in your string:

re.sub(r"[^a-zA-Z0-9_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'
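If the conversion runs on many inputs (for example, every title that becomes a URL), one option is to compile the two patterns once and wrap them in a function. This is a sketch that keeps digits, as in the variant above; to_url_part is a hypothetical name. The re module already caches recently used patterns, so precompiling mainly saves a cache lookup per call and gives the patterns readable names.

```python
import re

# Compiled once at import time and reused on every call.
WHITESPACE = re.compile(r"\s+")
DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")


def to_url_part(text: str) -> str:
    """Turn whitespace runs into '-' and strip characters outside [a-zA-Z0-9_-]."""
    return DISALLOWED.sub("", WHITESPACE.sub("-", text))


print(to_url_part("Hello, world!"))          # Hello-world
print(to_url_part("Top 10   tips\t2017"))    # Top-10-tips-2017
```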

Here [^a-zA-Z_-] matches any single character that is not a letter (upper or lower case), an underscore, or a dash. The dash is placed at the end of the character class [] so that it is treated as a literal character rather than as a range; putting it first (right after ^) or escaping it as \- works as well.
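A quick check of the placement rule: a dash that ends the class, starts it, or is escaped is taken literally, so the three patterns below behave the same. The last part shows what happens when a dash sits between two characters instead. This is an illustrative comparison, not part of the original answer.

```python
import re

s = 'Hello-there, world_1!'

patterns = [
    r"[^a-zA-Z_-]",   # dash last
    r"[^-a-zA-Z_]",   # dash first, right after the negation caret
    r"[^a-zA-Z_\-]",  # dash escaped
]

results = [re.sub(p, "", s) for p in patterns]
print(results)  # ['Hello-thereworld_', 'Hello-thereworld_', 'Hello-thereworld_']

# Between two characters the dash forms a range instead:
# '_' (0x5F) to 'A' (0x41) runs backwards, so the pattern fails to compile.
try:
    re.compile(r"[^a-z_-A]")
except re.error as exc:
    print("re.error:", exc)
```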


What you want is a common step when generating URL names (slugs) for content. Django implements it as django.utils.text.slugify, but that function also converts the text to lowercase. Here is a simplified version of Django's slugify function that preserves case (unlike Django's, it also drops digits):

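import re

def slugify(value):
    # Drop everything except letters, underscore, whitespace and dash, then trim the ends
    value = re.sub(r'[^A-Za-z_\s-]', '', value, flags=re.U).strip()
    # Collapse each run of dashes and/or whitespace into a single dash
    return re.sub(r'[-\s]+', '-', value, flags=re.U)

print(slugify("Hello World!"))
# Hello-World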
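One practical difference from the two-sub answer is the .strip(): leading and trailing whitespace is removed before dashes are inserted, so a padded title does not end up with dangling dashes. The [-\s]+ pattern also merges a dash that sits next to spaces into one. The block below repeats the slugify above so it runs on its own; two_subs is a hypothetical name for the first answer's approach.

```python
import re


def slugify(value):
    # Simplified, case-preserving variant from the answer above.
    value = re.sub(r'[^A-Za-z_\s-]', '', value, flags=re.U).strip()
    return re.sub(r'[-\s]+', '-', value, flags=re.U)


def two_subs(value):
    # The nested re.sub approach from the first answer.
    return re.sub(r"[^a-zA-Z_-]", "", re.sub(r"\s+", "-", value))


title = "  Hello - World!  "
print(slugify(title))   # Hello-World
print(two_subs(title))  # -Hello---World-
```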
