9

The Problem

I'm trying to use the Django ORM to do the equivalent of a SQL NOT IN clause, providing a list of IDs in a subselect to bring back a set of records from the logging table. I can't figure out if this is possible.

The Model

class JobLog(models.Model):
    job_number = models.BigIntegerField(blank=True, null=True)
    name = models.TextField(blank=True, null=True)
    username = models.TextField(blank=True, null=True)
    event = models.TextField(blank=True, null=True)
    time = models.DateTimeField(blank=True, null=True)

What I've Tried

My first attempt was to use exclude, but this does NOT to negate the entire Subquery, rather than the desired NOT IN:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .exclude(
        job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

Unfortunately, this yields this SQL:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"
FROM "view_job_log"
WHERE (
    "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
    AND NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished' AND U0."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
)

What I need is for the third AND clause to be AND "view_job_log"."job_number" NOT IN instead of the AND NOT (.

I've also tried doing the sub-select as it's own query first, with an exclude, as suggested here:

Django equivalent of SQL not in

However, this yields the same problematic result. Then I tried a Q object, which yields a similar query:

query = (
    JobLog.objects.values(
        "username", "subscriber_code", "job_number", "name", "time",
    )
    .filter(
        ~models.Q(job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )),
        time__gte=start,
        time__lte=end,
        event="delivered",
    )
)

This attempt with the Q object yields the following SQL, again, without the NOT IN:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"

FROM "view_job_log" WHERE (
    NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished'
                AND U0."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
    AND "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
)

Is there any way to get Django's ORM to do something equivalent to AND job_number NOT IN (12345, 12346, 12347)? Or am I going to have to drop to raw SQL to accomplish this?

Thanks in advance for reading this entire wall-of-text question. Explicit is better than implicit. :)

3
  • 1
    Your model is missing the event field Commented Mar 13, 2020 at 15:20
  • @Lotram Ah, I had deleted the non-relevant fields from the model and got a wee bit overzealous. Thank you for the catch! Commented Mar 13, 2020 at 18:52
  • 1
    Not sure if this is expected but since JobLog.job_number is nullable your NOT IN won't pass as soon as your subquery returns a NULL (stackoverflow.com/questions/129077/…). That's exactly what the "view_job_log"."job_number" IS NOT NULL SQL added by the ORM tries to protect against when you do exclude(field__in) but you'll be on your own if you use your own notin lookup. Commented Mar 14, 2020 at 22:20

3 Answers 3

9

I think the easiest way to do this would be to define a custom lookup, similar to this one or the in lookup

from django.db.models.lookups import In as LookupIn

class NotIn(LookupIn):
    lookup_name = "notin"

    def get_rhs_op(self, connection, rhs):
        return "NOT IN %s" % rhs

Field.register_lookup(NotIn)

or

class NotIn(models.Lookup):
    lookup_name = "notin"

    def as_sql(self, compiler, connection):
        lhs, params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params.extend(rhs_params)

        return "%s NOT IN %s" % (lhs, rhs), params

then use it in your query:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .filter(
        job_number__notin=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

this generates the SQL:

SELECT
    "people_joblog"."username",
    "people_joblog"."job_number",
    "people_joblog"."name",
    "people_joblog"."time"
FROM
    "people_joblog"
WHERE ("people_joblog"."event" = delivered
    AND "people_joblog"."time" >= 2020 - 03 - 13 15:24:34.691222 + 00:00
    AND "people_joblog"."time" <= 2020 - 03 - 13 15:24:41.678069 + 00:00
    AND "people_joblog"."job_number" NOT IN (
        SELECT
            U0. "job_number"
        FROM
            "people_joblog" U0
        WHERE (U0. "event" = finished
            AND U0. "time" >= 2020 - 03 - 13 15:24:34.691222 + 00:00
            AND U0. "time" <= 2020 - 03 - 13 15:24:41.678069 + 00:00)))
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you, @Lotram! Works like a charm. I'm pondering a notin PR to Django. :)
@Lotram, this is good, but it does not function as intended. You will get different SQL as opposed to ~Q, and you will get wrong results when for example performing a job_number__notin=[].
@MurphyAdam the question was indeed about not generating the same SQL, but rather introduce a NOT IN clause in the generated SQL, so that's what my answer does. I haven't tested every edge case, so it might not always work as intended. If you want to be safer, you can use @simon-charrette 's solution, as a django core developer, he probably knows best
1

You can likely achieve the same results by using an Exists and special casing NULLs.

.filter(
   ~Exists(
       JobLog.objects.filter(
           Q(jobnumber=None) | Q(jobnumber=OuterRef('jobnumber')),
           time__gte=start,
           time__lte=end,
           event='finished',
       )
   )
)

1 Comment

I appreciate the help, but isn't that quite a bit of gymnastics for what should be a fairly straightforward ORM operation?
0

Can you try this:

JobLog.objects.filter(time__gte=start, time__lte=end, event="delivered").exclude(time__gte=start, event='finished').exclude(time__lte=end, event='finished')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.