How to search in a huge table on Django admin
We all know that the Django admin is a super cool tool for Django. You can check your models, and add/edit/delete records from the tables. If you are familiar with Django, I'm sure you already know about it.
I was given a task: Our client wanted to search in a table by one field. It seems easy enough, right? Well, the tricky part is that the table has 523.803.417 records.
Wow. 523.803.417 records.
At least the model was not that complex:
class HugeTable(models.Model): """Huge table information""" search_field = models.CharField(max_length=10, db_index=True, unique=True) is_valid = models.BooleanField(default=True) def __str__(self): return self.search_field
So for Django admin, it should be a breeze, right? WRONG.
First, I just added the search field on the admin.py:
class HugeTableAdmin(admin.ModelAdmin): search_fields = ('search_field', ) admin.site.register(HugeTable, HugeTableAdmin)
And it worked! I had a functioning search field on my admin.
Only one problem: It took 3mins+ to load the page and 5mins+ to search. But at least it was working, right?
First, let's split the issues:
- Why was it taking +3mins just to load the page?
- Why was it taking +5mins to search if the search field was indexed?
I started tackling the first one, and found it quite easily: Django was getting only 100 records at a time, but it had to calculate the length for the paginator and the "see more" button on the search bar
So near, yet so far
Improving the page load
A quick look at the Django docs told me how to deactivate the "see more" query:
Set show_full_result_count to control whether the full count of objects should be displayed on a filtered admin page (e.g. 99 results (103 total)). If this option is set to False, a text like 99 results (Show all) is displayed instead.
class HugeTableAdmin(admin.ModelAdmin): search_fields = ('search_field', ) show_full_result_count = False admin.site.register(HugeTable, HugeTableAdmin)
That fixed one problem, but how about the other? It seemed I needed to do my paginator.
Thankfully, I found an awesome post by Haki Benita called "Optimizing the Django Admin Paginator" that explained exactly that. Since I didn't need to know the records count, I went with the "Dumb" approach:
from django.core.paginator import Paginator from Django.utils.functional import cached_property class DumbPaginator(Paginator): """ Paginator that does not count the rows in the table. """ @cached_property def count(self): return 9999999999 class HugeTableAdmin(admin.ModelAdmin): search_fields = ('search_field', ) show_full_result_count = False paginator = DumbPaginator admin.site.register(HugeTable, HugeTableAdmin)
And it worked! The page was loading blazingly fast :) But the search was still ultra slow. So let's fix that.
Improving the search
I checked A LOT of options. I almost went with Haystack, but it seemed a bit overkill for what I needed. I finally found this super cool tool: djangoql. It allowed me to search the table by using sql like operations, so I could search by
search_field and make use of the indexation. So I installed it:
INSTALLED_APPS = [ ... 'djangoql', ... ]
from django.core.paginator import Paginator from django.utils.functional import cached_property from djangoql.admin import DjangoQLSearchMixin class DumbPaginator(Paginator): """ Paginator that does not count the rows in the table. """ @cached_property def count(self): return 9999999999 class HugeTableAdmin(DjangoQLSearchMixin, admin.ModelAdmin): show_full_result_count = False paginator = DumbPaginator admin.site.register(HugeTable, HugeTableAdmin)
And it worked! By performing the query:
search_field = "my search query"
I get my results in around 1 second.
Is it done?
Yes! Now my client can search by
search_field on a table of 523.803.417 records, very easily and very quickly.
I'm planning to post more Python/Django things I'm learning by working with this client, so you might want to stay tuned :)