Django: How to change unique_together constraints


All sorts of software applications have one thing in common: they need data. As features and the application evolve, the data model and the data structures change too.

I have been developing software with Django, Python’s amazing web framework. Its documentation is pretty good, and there’s a sizeable community behind it to help with most issues and problems you will face. But I believe that for some things you should be really careful and considerate about the solutions you find. That is valid not just for Django, but for any application. In this article, I want to discuss constraints in Django, namely the process I went through to modify the fields that once made sense to be unique together (pun intended).

Django’s ORM is great. It does so much for the developer that we often forget how hard it is to keep our data consistent and wholesome. A great feature of this ORM is the unique_together option for database models. In summary, it lets the developer specify a set of fields whose combined values must be unique across the table. The Django developers now recommend the newer UniqueConstraint instead, but unique_together is still supported and remains a good, simpler way to define constraints. The reasoning in this post also applies to UniqueConstraint.

When unique_together makes sense

Let’s say you have a model that contains fields A, B, and C. To keep this post simple and easy to read through, let’s keep it abstract:

from django.db import models


class MyModel(models.Model):
    # max_length values are arbitrary placeholders.
    field_A = models.CharField(max_length=100)
    field_B = models.CharField(max_length=100)
    field_C = models.DateTimeField()

    class Meta:
        unique_together = [('field_A', 'field_B')]

This sample shows a database model that ensures every MyModel object has a unique combination of A and B.
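For comparison, here is a minimal sketch of the same rule written with UniqueConstraint. The constraint name and the max_length values are placeholders I picked for illustration; they are not something the original model defines.

```python
from django.db import models


class MyModel(models.Model):
    # max_length values are arbitrary placeholders.
    field_A = models.CharField(max_length=100)
    field_B = models.CharField(max_length=100)
    field_C = models.DateTimeField()

    class Meta:
        constraints = [
            # Same rule as unique_together = [('field_A', 'field_B')].
            # The name below is a hypothetical choice; pick your own.
            models.UniqueConstraint(
                fields=["field_A", "field_B"],
                name="mymodel_unique_field_a_field_b",
            ),
        ]
```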

As mentioned at the beginning of the post, the applications we write evolve, and so does their data. The future might reveal that our initial assumptions are no longer valid, or no longer apply. In our demo, the future might show us that MyModel should actually be unique by fields A, B, and C.

I recently went through this, and I first wondered what would happen if I just updated the constraint. What could go wrong, right? Well, apparently, a lot of things.

To start things off, the migration will likely fail because existing rows violate the new constraint. This is especially true if your update involves an existing field rather than a new one. I then started searching for other possibilities and found a lot of things that got me worried. Some accepted solutions were to modify existing migrations, to (force) reset migrations (that is, delete them all and run makemigrations again with the updated constraint), or to connect to the database and run some SQL commands…
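To see this kind of failure without touching a real project, here is a small sketch that uses Python's built-in sqlite3 module instead of Django. The table and index names are made up for illustration, and I am assuming the table already holds rows that repeat the combination the new constraint covers. Other databases report the same problem with their own error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, field_a TEXT, field_b TEXT, field_c TEXT)"
)
# The first two rows share the same (field_a, field_b, field_c) combination.
conn.executemany(
    "INSERT INTO mymodel (field_a, field_b, field_c) VALUES (?, ?, ?)",
    [("x", "y", "c1"), ("x", "y", "c1"), ("x", "z", "c2")],
)


def try_add_unique_index(connection):
    """Return True if the unique index was created, False if existing rows block it."""
    try:
        connection.execute(
            "CREATE UNIQUE INDEX mymodel_abc_uniq "
            "ON mymodel (field_a, field_b, field_c)"
        )
    except sqlite3.IntegrityError:
        return False
    return True


index_created = try_add_unique_index(conn)
print(index_created)
```

The index is refused while the duplicated rows are still in the table, which is why the data has to be cleaned up before the new constraint goes in.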


Please, don’t

One of my main concerns when handling applications that contain terabytes of data (no, that is not a typo) is to maintain data consistency and continuity. We can’t just start over, as in, delete all the data and fill it back up. Nor do we want to deal with migration issues in production. We can, but we want to avoid it because it is a massive pain.

Modifying existing migrations is a big no-no! Doing that will likely lead you to a turmoil of hair pulling, thanks to cyclic dependencies, inconsistency errors, and even more issues I can’t think of. Never, really never, modify applied migrations. Even if you perform rolling updates, if you ever need to squash migrations, which is a healthy thing to do at some point, it just won’t work. In the end, modifying the past like that can really be disastrous. When data consistency is a priority, it makes you think about all possible solutions and their implications, and define the right strategy and fallback plans.

So, how do you actually work through this? Well, if you learn anything from this exercise, it is that you need to come up with a migration strategy first, and then implement it manually before moving along.

The process

The main reason you have decided to modify this constraint is likely that something in the logic has proven the new constraint would work better and be more accurate than the previous one.

When your constraint involves a new field, it should be fine and easy to work with. Add the new field, give existing rows a default value that keeps every combination unique, apply the constraint, and celebrate.

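My motivation, though, is the case where we need to add an existing field to the constraint. If we have decided to do this, there may already be a few repeated values in there. (Strictly speaking, rows that are unique on A and B are also unique on A, B, and C, so repeats can only appear when the new set of fields drops or swaps one of the old ones, or when rows were written while no constraint was enforced.) The hard part now is to come up with a strategy we are comfortable with to update or remove duplicates.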

In my case, we kept the most recently created object, and it helped that our objects had a creation_date field, but you can go with anything else that works for you.
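As a rough illustration of that "keep the latest" strategy, here is a plain-Python sketch that decides which rows to delete. The function name and the tuple layout are my own assumptions for the example, and creation_date is the field from my real objects, not one that the abstract MyModel defines.

```python
from collections import defaultdict
from datetime import datetime


def ids_to_delete(rows):
    """Given (id, field_a, field_b, field_c, creation_date) tuples, return the
    ids of every row except the most recently created one in each
    (field_a, field_b, field_c) group."""
    groups = defaultdict(list)
    for row_id, a, b, c, created in rows:
        groups[(a, b, c)].append((created, row_id))

    doomed = []
    for members in groups.values():
        members.sort()  # oldest first; ties on creation_date fall back to id
        doomed.extend(row_id for _, row_id in members[:-1])  # keep the newest
    return sorted(doomed)


c1 = datetime(2022, 1, 1)
c2 = datetime(2022, 2, 2)
rows = [
    (1, "x", "y", c1, datetime(2023, 1, 1)),  # older duplicate
    (2, "x", "y", c1, datetime(2023, 6, 1)),  # newest of its group, kept
    (3, "x", "z", c2, datetime(2023, 3, 1)),  # no duplicate, kept
]
print(ids_to_delete(rows))  # [1]
```

Inside a data migration, the tuples could come from a values_list() call on the model and the returned ids could go into a filter(id__in=...).delete() call. That wiring is left out here so the sketch runs on its own.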

I want to give you an example, still in abstract terms, of how to do this so you never have to touch existing migrations, which you shouldn’t. The healthy way to do this starts with removing the constraint. Comment it out or delete it. Our model should then become something like:

class MyModel(models.Model):
    field_A = models.CharField(max_length=100)
    field_B = models.CharField(max_length=100)
    field_C = models.DateTimeField()

If you run makemigrations, it will create a migration that contains something like:

    operations = [
        migrations.AlterUniqueTogether(
            name='mymodel',
            unique_together=set(),
        ),
    ]

This is enough to drop the constraint. The next step, after applying this new migration, is to write the update logic by hand. To do that, start by creating an empty migration (python manage.py makemigrations --empty myapp). In it, write a function outside the Migration class that contains the logic you have thought of to update the existing data:

from django.db import migrations


def remove_duplicates(apps, schema_editor):
    """The de-duplication logic lives in this function."""
    # Use the historical model from the app registry, not a direct import.
    MyModel = apps.get_model("myapp", "MyModel")
    for obj in MyModel.objects.all():
        # Logic to update existing objects goes here. This is just Python
        # code, so you can do anything a script would do anywhere else: go
        # online, fetch some data, do some matching, update the field,
        # check creation dates, ... Anything works, as long as there are no
        # repeated values left after this migration.
        pass


class Migration(migrations.Migration):
    ...

    operations = [
        migrations.RunPython(remove_duplicates),
    ]

All that is left is to apply it. The last step is to add the updated constraint, run makemigrations and migrate, and make sure no inconsistency errors, duplicate-key errors, or any other sort of errors come back:

class MyModel(models.Model):
    field_A = models.CharField(max_length=100)
    field_B = models.CharField(max_length=100)
    field_C = models.DateTimeField()

    class Meta:
        unique_together = [('field_A', 'field_B', 'field_C')]

During this time, you have been updating the data locally and you have created three migrations for the process. When you push this into production and run migrate there, Django will go through all the new migrations in order and update the data for you.
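To make the order of those three steps concrete, here is a sketch of the same sequence in raw SQL, again using Python's built-in sqlite3 module rather than Django. The table, the index names, and the "keep the newest row" rule are assumptions for illustration; the SQL that Django generates for your database will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mymodel (
        id INTEGER PRIMARY KEY,
        field_a TEXT, field_b TEXT, field_c TEXT,
        creation_date TEXT
    );
    CREATE UNIQUE INDEX old_uniq ON mymodel (field_a, field_b);
    INSERT INTO mymodel (field_a, field_b, field_c, creation_date) VALUES
        ('x', 'y', 'c1', '2023-01-01'),
        ('x', 'z', 'c2', '2023-03-01');
    """
)

# Step 1: drop the old constraint (roughly what the first migration asks for).
conn.execute("DROP INDEX old_uniq")

# A row written while no constraint is in place can repeat a combination.
conn.execute(
    "INSERT INTO mymodel (field_a, field_b, field_c, creation_date) "
    "VALUES ('x', 'y', 'c1', '2023-06-01')"
)

# Step 2: data migration; delete every row that has a newer twin.
conn.execute(
    """
    DELETE FROM mymodel
    WHERE EXISTS (
        SELECT 1 FROM mymodel AS newer
        WHERE newer.field_a = mymodel.field_a
          AND newer.field_b = mymodel.field_b
          AND newer.field_c = mymodel.field_c
          AND newer.creation_date > mymodel.creation_date
    )
    """
)

# Step 3: add the new constraint; it goes through once no duplicates remain.
conn.execute(
    "CREATE UNIQUE INDEX new_uniq ON mymodel (field_a, field_b, field_c)"
)

remaining = conn.execute("SELECT id FROM mymodel ORDER BY id").fetchall()
print(remaining)
```

Swapping steps 2 and 3 would make the index creation fail, which is the reason the data migration sits between the two schema migrations.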

My recommendation

Try to retrieve some dumps of the tables you want to update. This is crucial. Locally, you can mess things up and be fine. You don’t care about inconsistencies, duplicates, or any other error. It’s fine, you can reset your local database at any time. In production, well, if anything fails… You are dead :)

If your local data is exactly the same as in production and your migration strategy succeeds locally, then your chances of succeeding in production increase significantly and you’re not that dependent on the migration gods.

So, to reiterate:

  1. Think through how and why you want to update the constraint
  2. Get some production dumps to your local database
  3. Write that strategy into an empty migration, after removing the constraint, but before updating it with the new fields
  4. Migrate and, if successful:
  5. Celebrate.

Thank you for your time reading, I’ll see you next time. gsilvapt
