String slugification in Python

Question

I am looking for the best way to "slugify" a string (see what a "slug" is). My current solution is based on this recipe.

I have changed it a little bit to:

import re
import unicodedata

s = 'String to slugify'

slug = unicodedata.normalize('NFKD', s)
slug = slug.encode('ascii', 'ignore').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)

Does anyone see any problems with this code? It works fine, but maybe I am missing something, or you know a better way?
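For readers on Python 3, here is a minimal sketch of the same approach. It assumes the only change needed is decoding the encoded bytes back to str, since re.sub with a str pattern will not accept bytes there. The function name slugify_ascii is illustrative and not part of the question.

```python
import re
import unicodedata


def slugify_ascii(s):
    # Decompose accented characters into base letter + combining mark.
    slug = unicodedata.normalize('NFKD', s)
    # Drop everything that is not ASCII, then go back to str for re.sub.
    slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
    # Replace runs of non-alphanumerics with a hyphen and trim the ends.
    slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
    # Kept from the question for fidelity.
    slug = re.sub(r'[-]+', '-', slug)
    return slug


print(slugify_ascii('String to slugify'))  # string-to-slugify
```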

Are you working with Unicode a lot? If so, the last re.sub might be better wrapped in unicode(); this is what Django does. Also, [^a-z0-9]+ can be shortened by using \w. See django.template.defaultfilters; it's close to yours, but a bit more refined.
– Mike Ramirez, Commented Apr 7, 2011 at 0:23
Are Unicode characters allowed in URLs? Also, I changed \w to a-z0-9 because \w includes the _ character and uppercase letters. Letters are lowercased in advance, so there will be no uppercase letters to match.
– Zygimantas, Commented Apr 7, 2011 at 1:21
'_' is valid (but it's your choice, you did ask); Unicode goes in as percent-encoded characters.
– Mike Ramirez, Commented Apr 7, 2011 at 1:36
Thank you, Mike. Well, I asked the wrong question. Is there any reason to convert it back to a unicode string if we have already replaced all characters except "a-z", "0-9" and "-"?
– Zygimantas, Commented Apr 7, 2011 at 1:47
For Django, I believe it's important to them to have all strings as unicode objects for compatibility. It's your choice if you want this.
– Mike Ramirez, Commented Apr 7, 2011 at 1:51

kratenko · Accepted Answer · 2022-03-30 09:02:27Z

221

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

# Expected results are shown as comments.
slugify("This is a test ---")                 # "this-is-a-test"

slugify("This -- is a ## test ---")           # "this-is-a-test"

# Result as given in this answer; newer releases may return "c-est-deja-l-ete".
slugify('C\'est déjà l\'été.')                # "cest-deja-lete"

slugify('Nín hǎo. Wǒ shì zhōng guó rén')      # "nin-hao-wo-shi-zhong-guo-ren"

slugify('Компьютер')                          # "kompiuter"

slugify('jaja---lol-méméméoo--a')             # "jaja-lol-mememeoo-a"

See More examples

This package does a bit more than what you posted (take a look at the source; it's just one file). The project is still active: it had been updated two days before I originally answered, and more than nine years later (last checked 2022-03-30) it still gets updated.

Careful: there is a second package around, named slugify. If you have both installed, you might run into problems, as they share the same import name. The one named just slugify did not pass everything in my quick check: "Ich heiße" became "ich-heie" (it should be "ich-heisse"). So be sure to pick the right one when using pip or easy_install.
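To see which of the similarly named distributions is actually installed, something like the following sketch may help. It relies only on the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names mentioned in this thread; all three install a
# top-level module called "slugify".
for name in ("python-slugify", "slugify", "awesome-slugify"):
    print(name, "->", installed_version(name))
```

If more than one of these reports a version, uninstalling the ones you do not want avoids the import-name clash.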

edited Mar 30, 2022 at 9:02

answered Feb 15, 2013 at 2:12

kratenko


5 Comments

Rotareti Over a year ago

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

Ghassem Tofighi Over a year ago

@Rotareti Could you please explain why it might not fit all projects? Can't we use anything under an MIT or GPL license and include it in commercial software? I think the only restriction is shipping the license alongside the code we develop. Am I wrong?

Rotareti Over a year ago

@GhassemTofighi In short: You can use it in your commercial software, but if you use it, you must open source your code as well. Anyway IANAL and this is no legal advice.

kratenko Over a year ago

@GhassemTofighi maybe take a look at softwareengineering.stackexchange.com/q/47032/71504 on that topic

Emilien Over a year ago

@Rotareti python-slugify now defaults to the Artistic License'd text-unidecode instead of the GPL-licensed Unidecode, addressing your licensing concern. github.com/un33k/python-slugify/commit/…

Arne · Accepted Answer · 2018-11-12 10:02:03Z

42

Install unidecode for Unicode support:

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
    # Transliterate to ASCII first, then replace runs of non-alphanumerics.
    text = unidecode.unidecode(text).lower()
    return re.sub(r'[\W_]+', '-', text)

text = u"My custom хелло ворлд"
print(slugify(text))

# Output: my-custom-khello-vorld

edited Nov 12, 2018 at 10:02

Arne


answered Dec 3, 2011 at 9:29

Normunds


7 Comments

derevo Over a year ago

Hi, it's a bit strange, but it gives me a result like "my-custom-ndud-d-d3-4-d2d3-4nd-d-".

kratenko Over a year ago

@derevo That happens when you don't pass unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

crodjer Over a year ago

I would suggest against using variable names like str. This hides the builtin str type.

Jorge Leitao Over a year ago

unidecode is GPL, which may not be suitable for some.

Ryan Chou Over a year ago

What about reslugifying or deslugifying?

voronin · Accepted Answer · 2014-03-02 21:01:10Z

12

There is a Python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček')  # one-kozuscek

awesome-slugify github page

answered Mar 2, 2014 at 21:01

voronin


2 Comments

Rotareti Over a year ago

Nice package! But be careful, it's licensed under GPL.

Kalob Taulien Over a year ago

Heads up: this won't automatically .lower() your urls. You'll need to run slugify(text).lower() if you want that.

Animesh Sharma · Accepted Answer · 2014-12-03 05:35:16Z

11

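import re
import unicodedata

# Django 1.x-era imports; newer Django releases no longer ship all of these.
from django.utils import six
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe

def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)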

This is the slugify function from django.utils.text. It should be sufficient for your requirement.
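If you do not want to depend on Django, the same three steps can be sketched with the standard library alone. This is an adaptation and not Django's code: the mark_safe and allow_lazy wrappers are simply dropped, on the assumption that you only need a plain str back.

```python
import re
import unicodedata


def slugify(value):
    # Decompose accents and discard whatever is still non-ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove characters that are not word characters, whitespace or hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(slugify("C'est déjà l'été."))  # cest-deja-lete
```

Note that punctuation is deleted rather than turned into a hyphen, and underscores are kept because \w matches them.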

answered Dec 3, 2014 at 5:35

Animesh Sharma


Gaslight Deceive Subvert · Accepted Answer · 2011-09-07 13:16:09Z

9

The problem is with the ASCII normalization line:

slug = unicodedata.normalize('NFKD', s)

This is Unicode normalization, which does not decompose many characters into ASCII equivalents. Combined with the encode('ascii', 'ignore') step that follows, it simply strips the non-ASCII characters from strings such as:

Mørdag -> mrdag
Æther -> ther
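The behaviour can be checked with the standard library alone. In this small sketch the helper name nfkd_ascii is illustrative: accented letters that decompose into a base letter plus a combining mark survive, while letters with no decomposition (ø, Æ) are dropped.

```python
import unicodedata


def nfkd_ascii(s):
    # NFKD splits "é" into "e" + a combining accent; the ASCII encode with
    # errors='ignore' then discards every remaining non-ASCII code point.
    return unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')


for word in ('déjà', 'Mørdag', 'Æther'):
    print(word, '->', nfkd_ascii(word).lower())
# déjà -> deja
# Mørdag -> mrdag
# Æther -> ther
```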

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

answered Sep 7, 2011 at 13:16

Gaslight Deceive Subvert


Nick Presta · Accepted Answer · 2011-04-06 23:22:30Z

8

It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.

Are you having any problems with it?

answered Apr 6, 2011 at 23:22

Nick Presta


2 Comments

raylu Over a year ago

The code has moved to here.

Spartacus Over a year ago

For the lazies: from django.utils.text import slugify

Spring98 · Accepted Answer · 2024-01-06 11:05:27Z

5

Another simple way to create a slug is:

import re
st = 'String to slugify'
slug = re.sub(r'\W+', '-', st).strip('-').lower()

edited Jan 6, 2024 at 11:05

answered Oct 31, 2022 at 8:51

Spring98


1 Comment

Jonathan DS Apr 17 at 13:41

Please beware that this does not take care of Unicode, and in slugs we generally want to transliterate Unicode characters.
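To illustrate the point about Unicode: on Python 3, \W is Unicode-aware for str patterns, so accented letters count as word characters and stay in the slug untransliterated. Passing re.ASCII turns them into separators instead. A small sketch (the variable names are illustrative):

```python
import re

s = "C'est déjà l'été."

# Default on Python 3: accented letters are word characters and are kept.
unicode_aware = re.sub(r'\W+', '-', s).strip('-').lower()

# With re.ASCII, every non-ASCII letter is treated as a separator.
ascii_only = re.sub(r'\W+', '-', s, flags=re.ASCII).strip('-').lower()

print(unicode_aware)  # c-est-déjà-l-été
print(ascii_only)     # c-est-d-j-l-t
```

Neither result transliterates é to e, which is why the other answers normalize or transliterate the string first.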

BomberMan · Accepted Answer · 2014-11-14 09:21:59Z

4

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

edited Nov 14, 2014 at 9:21

BomberMan


answered Apr 22, 2013 at 14:29

Mikhail Korobov


Jeff Widman · Accepted Answer · 2016-03-16 13:35:01Z

A couple of options on GitHub:

Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.

In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.

Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24

unutbu · Accepted Answer · 2011-04-06 23:36:57Z

1

You might consider changing the last line to

slug=re.sub(r'--+',r'-',slug)

since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.

But, of course, this is quite minor.
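A quick sketch to check the claim: all three patterns give the same result when replacing with a single hyphen. It also shows that, in the question's code, the preceding substitution on [^a-z0-9]+ already collapses runs of hyphens, because '-' is itself outside a-z0-9. The helper name collapse is illustrative.

```python
import re


def collapse(pattern, text):
    # Replace every match of the pattern with a single hyphen.
    return re.sub(pattern, '-', text)


sample = 'a--b---c-d'
results = {p: collapse(p, sample) for p in (r'[-]+', r'-+', r'--+')}
print(results)  # every pattern maps to 'a-b-c-d'

# The earlier step from the question already merges hyphen runs:
slug = re.sub(r'[^a-z0-9]+', '-', 'a -- b').strip('-')
print(slug)  # a-b
```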

answered Apr 6, 2011 at 23:36

unutbu


ostrokach · Accepted Answer · 2018-11-17 01:34:39Z

1

Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.

edited Nov 17, 2018 at 1:34

answered Nov 17, 2018 at 1:27

ostrokach


claudius · Accepted Answer · 2021-10-27 16:45:28Z

0

Going by your example, a quick way to do that could be:

s = 'String to slugify'

slug = s.replace(" ", "-").lower()

answered Oct 27, 2021 at 16:45

claudius
