2

I run a dog salon where dogs get haircuts on an infrequent basis. In order to encourage owners back I would like to send out vouchers for their next visit. The voucher will be based on whether a dog has had a haircut within the last 2 months to 2 years. Beyond 2 years ago we can assume that the customer has been lost and less than 2 months ago is too close to their previous haircut. We will first target owners that have recently visited.

My underlying database is PostgreSQL.

from datetime import timedelta
from django.db import models
from django.db.models import Max, OuterRef, Subquery
from django.utils import timezone


# Dogs have one owner, owners can have many dogs, dogs can have many haircuts

class Owner(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)


class Dog(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE, related_name="dogs")
    name = models.CharField(max_length=255)


class Haircut(models.model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    dog = models.ForeignKey(Dog, on_delete=models.CASCADE, related_name="haircuts")
    at = models.DateField()


today = timezone.now().date()
start = today - timedelta(years=2)
end = today - timedelta(months=2)

It strikes me that the problem can be broken down into two queries. The first is something that aggregates an owner's dogs to most recently cut within the last 2 months to 2 years.

dog_aggregate = Haircut.objects.annotate(Max("at")).filter(at__range=(start, end))

And then joins the result of that to the owners table.

owners_by_shaggiest_dog_1 = Owner.objects # what's the rest of this?

Resulting in SQL similar to:

select
  owner.id,
  owner.name
from
  (
    select
      dog.owner_id,
      max(haircut.at) last_haircut
    from haircut
      left join dog on haircut.dog_id = dog.id
    where
      haircut.at
        between current_date - interval '2' year
            and current_date - interval '2' month
    group by
      dog.owner_id
  ) dog_aggregate
  left join owner on dog_aggregate.owner_id = owner.id
order by
  dog_aggregate.last_haircut asc,
  owner.name;

Through some playing around I have managed to get the correct result with:

haircut_annotation = Subquery(
    Haircut.objects
    .filter(dog__owner=OuterRef("pk"), at__range=(start, end))
    .order_by("-at")
    .values("at")[:1]
)

owners_by_shaggiest_dog_2 = (
    Owner.objects
    .annotate(last_haircut=haircut_annotation)
    .order_by("-last_haircut", "name")
)

However, the resulting SQL seems inefficient as a new query is performed for every row:

select
  owner.id,
  owner.name,
  (
    select
    from haircut
      inner join dog on haircut.dog_id = dog.id
    where haircut.at
            between current_date - interval '2' year
                and current_date - interval '2' month
      and dog.owner_id = (owner.id)
    order by
      haircut.at asc
    limit 1
  ) last_haircut
from
  owner
order by
  last_haircut asc,
  owner.name;

P.S. I don't actually run a dog salon so I can't give you a voucher. Sorry!

3
  • 1
    I am not really following why you need the minimal haircut here? Commented Jun 27, 2019 at 11:02
  • 1
    Are you sure, start and end would be future dates. Aren't they supposed to be dates of past if a customer has visited ? Commented Jun 27, 2019 at 11:05
  • WillemVanOnsem and bak2trak you are both correct. Thank you for spotting the logical inconsistencies. I have corrected the question to reflect. Commented Jun 27, 2019 at 12:29

1 Answer 1

1

Given I understood it correctly, you can make a query like:

from django.db.models import Max

Owners.objects.filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')

The last haircut should be the Maximum here, since as time passes by, the timestamp is larger.

Note however that your query and this query, does not exclude owners of dogs that have been washed more recently. We simply do not take that into account when we calculate the last_haircut.

If you want to exclude such owners, you should build a query like:

from django.db.models import Max

Owners.objects.exclude(
    dogs__haircuts__at__gt=end
).filter(
    dogs__haircuts__at__range=(start, end)
).annotate(
    last_haircut=Max('dogs__haircuts__at')
).order_by('last_haircut', 'name')
Sign up to request clarification or add additional context in comments.

3 Comments

This looks to be the right approach; it doesn't produce the same SQL I proposed above but I suspect the two inner joins are logically the same (I do need to investigate a bit more, however). The answer should use the related name, that is "dogs__haircuts__". I'm also not sure why the exclude is necessary. Is that not covered by the range?
@casr: ah, I did not spot the related_name=...s. Indeed, then you should use the related_name. Will update.
This is essentially what I was after. Thanks! If you want to give those that think in SQL an extra hint you can add this to your answer gist.github.com/casr/7584f5a427ab7f2d00a1b89eaba55f1a . Still not sure why you suggested the exclude though :/

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.