
I have a database of articles that I want to search through. I had been using the normal Django ORM to search, which was getting way too slow, and then I learned a little about indexes in Django. I'm using MySQL, and I now know that with MySQL I cannot put an index on a TextField, as described here in this Stack Overflow question, which is the problem I was facing. However, in my case I can't change this to a CharField.

I was reading through the MySQL docs, which state:

MySQL cannot index LONGTEXT columns specified without a prefix length on the key part, and prefix lengths are not permitted in functional key parts.

Since TextField in Django is LONGTEXT in MySQL, I came across the Django-MySQL package here and thought that if I could use it to change the LONGTEXT to a MEDIUMTEXT, this might get resolved. So I updated my model like this:

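from django.db.models import Model
from django_mysql.models import SizedTextField

class MyModel(Model):
    ........
    document = SizedTextField(size_class=3)  # size_class 3 maps to MEDIUMTEXT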

However, I still see the same error when running python manage.py makemigrations:

django.db.utils.OperationalError: (1170, "BLOB/TEXT column 'document' used in key specification without a key length")

How can I go about resolving this?

  • Please show us the SELECT that you hope will speed up via the index. This will help us discuss FULLTEXT versus "prefix" versus some other solution. Commented Feb 20, 2022 at 16:40
  • @RickJames I'm simply returning all the articles that contain a given word passed by the client. So it would be something like SELECT * from articles WHERE text CONTAINS searchword Commented Feb 21, 2022 at 16:49

2 Answers


returning all the articles that contain a given word passed by the client. So it would be something like SELECT * from articles WHERE text CONTAINS searchword

Add

FULLTEXT(text)

and use

WHERE MATCH(text) AGAINST("searchword")

or perhaps

WHERE MATCH(text) AGAINST("+searchword" IN BOOLEAN MODE)

It will run very fast. There are caveats -- short words and "stop" words (like "the") are ignored.

(If Django cannot facilitate that, then you have to do it with "raw SQL".)
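A minimal Python sketch of the raw-SQL route follows. It only builds the statement and its parameter list, so it runs without a database. The table `articles` and column `text` come from the comment quoted above; the helper name `fulltext_query` is hypothetical, and the sketch assumes a FULLTEXT index already exists on the column.

```python
def fulltext_query(table, column, word, boolean_mode=False):
    """Build a parameterized MATCH ... AGAINST query (hypothetical helper).

    Table and column names are interpolated directly, so they must come
    from trusted code, never from the client. The search word is passed
    as a query parameter so the database driver handles the quoting.
    """
    mode = " IN BOOLEAN MODE" if boolean_mode else ""
    sql = (
        f"SELECT * FROM `{table}` "
        f"WHERE MATCH(`{column}`) AGAINST (%s{mode})"
    )
    # In boolean mode, a leading "+" marks the word as required.
    term = f"+{word}" if boolean_mode else word
    return sql, [term]


sql, params = fulltext_query("articles", "text", "searchword", boolean_mode=True)
print(sql)
print(params)
```

In a Django project, the pair would typically be handed to something like `MyModel.objects.raw(sql, params)`, which accepts a parameter list alongside the SQL string. That call is not exercised here.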


1 Comment

Marking this as accepted since it contains the relevant block of code for full-text search.

All of these related types, TEXT, MEDIUMTEXT, and LONGTEXT, are too large to be indexed without specifying a prefix. An index prefix means that only the first N characters of the string are included in the index. Like this:

create table mytable (
  t text, 
  index myidx (t(200))
);

The prefix length in this example is 200 characters. So only the first 200 characters are included in the index. Usually this is enough to help performance, unless you had a large number of strings that are identical in their first 200 characters.

The longest prefix that MySQL supports depends on the storage engine and the row format. Old versions of MySQL support an index prefix of up to 767 bytes, which means fewer characters if you use a multi-byte character set like utf8 or utf8mb4. Recent versions of MySQL default to a more modern row format, which supports up to 3072 bytes for an index prefix, and the same division by 3 or 4 bytes per character applies.
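The byte-to-character arithmetic can be sketched in Python as below. The helper name `max_prefix_chars` is hypothetical. The byte limits are taken from the InnoDB documentation: 767 bytes for the older COMPACT and REDUNDANT row formats, and 3072 bytes for DYNAMIC and COMPRESSED. The widths are the worst-case bytes per character for each character set.

```python
def max_prefix_chars(limit_bytes, bytes_per_char):
    """Largest number of characters whose worst-case size fits the byte limit."""
    return limit_bytes // bytes_per_char


# Worst-case bytes per character for two common MySQL character sets.
CHARSETS = {"utf8": 3, "utf8mb4": 4}

for limit in (767, 3072):
    for name, width in CHARSETS.items():
        print(limit, name, max_prefix_chars(limit, width))
```

Under these assumptions, a utf8mb4 column is limited to a 191-character prefix under the older row formats and a 768-character prefix under the newer ones.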

I'm not a regular Django user, so I tried to skim the documentation about defining indexes on model classes. But given a few seconds of reading, I don't see an option to declare a prefix for an index on a long string column.

I think your options are one of the following:

  • Change the column to a shorter string column that can be indexed
  • Create the index using the MySQL client, not using Django migrations
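For the second option, here is a rough Python sketch that generates the prefix-index statement from the earlier example, together with the statement that undoes it. The helper name `prefix_index_sql` is hypothetical, and nothing here talks to a database. The resulting strings could be pasted into the MySQL client, or, as an untested assumption, wrapped in Django's `migrations.RunSQL(forward, reverse)` so the index still travels with the migrations.

```python
def prefix_index_sql(table, column, index_name, prefix_chars):
    """Return (forward, reverse) SQL for a MySQL prefix index.

    Hypothetical helper: identifiers are interpolated directly, so they
    must come from trusted code rather than user input.
    """
    if prefix_chars <= 0:
        raise ValueError("prefix length must be positive")
    forward = (
        f"CREATE INDEX `{index_name}` ON `{table}` "
        f"(`{column}`({prefix_chars}))"
    )
    reverse = f"DROP INDEX `{index_name}` ON `{table}`"
    return forward, reverse


forward, reverse = prefix_index_sql("mytable", "t", "myidx", 200)
print(forward)
print(reverse)
```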

3 Comments

This is helpful, thanks Bill. Yes, in my case I do have a large number of similar articles, so setting a prefix might not be an option. Further, changing to a shorter string column (CharField in Django) may also be difficult since I do not have a limit at the client end; however, articles are around 10-20 KB on average, so if CharField supports this size for MySQL this may work. Lastly, I will have to explore the last option you recommended, creating an index using the MySQL client directly, since I had only been using Django for db operations.
Do you expect to search your articles for keywords, not for the entire exact text? If so, then creating an index wouldn't help anyway. You should use a fulltext index. I don't know if Django supports creating those kinds of indexes through the model classes, or if you'd have to create it manually.
I'm simply returning all the articles that contain a given keyword sent by the client application. A fulltext index looks relevant; I found this answer on how to do this with Django by adding a lookup, which seems interesting.
