1

Django 1.3-dev provides several ways to query the database using raw SQL. They are covered here and here. The recommended ways are to use the .raw() or the .extra() methods. The advantage is that if the retrieved data fits the Model you can still use some of it's features directly.

The page I'm trying to display is somewhat complex because it uses lots of information which is spread across multiple tables with different relationships (one2one, one2many). With the current approach the server has to do about 4K queries per page. This is obviously slow due to database to webserver communication.

A possible solution is to use raw SQL to retrieve the relevant data but due to the complexity of the query I couldn't translate this to an equivalent in Django.

The query is:

SELECT clin.iso as iso,
   (SELECT COUNT(*)
       FROM clin AS a
       LEFT JOIN clin AS b
           ON a.pat_id = b.pat_id
       WHERE a.iso = clin.iso
   ) AS multiple_iso,
   (SELECT COUNT(*)
       FROM samptopat
       WHERE samptopat.iso_id = clin.iso
   ) AS multiple_samp,
   (SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
       FROM samptopat
       RIGHT JOIN samptosnp
           USING(samp_id)
       WHERE iso_id = clin.iso
       GROUP BY samp_id
       LIMIT 1 -- Return 1st samp only
   ) AS snp
FROM clin
WHERE iso IN (...)

or alternatively WHERE iso = ....

Sample output looks like:

+-------+--------------+---------------+-------------+
| iso   | multiple_iso | multiple_samp | snp         |
+-------+--------------+---------------+-------------+
|     7 |        19883 |             0 | NULL        |
|     8 |        19883 |             0 | NULL        |
| 21092 |            1 |             2 | G,T,C,G,T,G |
| 31548 |            1 |             0 | NULL        |
+-------+--------------+---------------+-------------+
4 rows in set (0.00 sec)

The documentation explains how one can do a query using WHERE col = %s but not the IN syntax. One part of this question is How do I perform raw SQL queries using Django and the IN statement?

The other part is, considering the following models:

class Clin(models.Model):
    iso = models.IntegerField(primary_key=True)
    pat = models.IntegerField(db_column='pat_id')
    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samptopat_id = models.IntegerField(primary_key=True)
    samp = models.OneToOneField(Samp, db_column='samp_id')
    pat = models.IntegerField(db_column='pat_id')
    iso = models.ForeignKey(Clin, db_column='iso_id')
    class Meta:
        db_table = u'samptopat'

class Samp(models.Model):
    samp_id = models.IntegerField(primary_key=True)
    samp = models.CharField(max_length=8)
    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samptosnp_id = models.IntegerField(primary_key=True)
    samp = models.ForeignKey(Samp, db_column='samp_id')
    snp = models.IntegerField(db_column='snp_id')
    value = models.CharField(max_length=2)
    class Meta:
        db_table = u'samptosnp'

Is it possible to rewrite the above query into something more ORM oriented?

6
  • first, I'd remove the explicit primary keys, django uses integers anyway. Commented Nov 23, 2010 at 16:29
  • Looks like this is pointing to an existing database; hence he needs to specify the primary keys as they already have a name other than id. What I wonder is, both Clin and SampToPat refer to pat_id, but I see no model corresponding to the Pat. I also Don't see a model for Snp For completeness shouldn't you include those models? I can't help thinking that if we had the Pat model available we might be able to come up with ways of handling some of those aggregate functions. Commented Nov 23, 2010 at 16:41
  • @Jordan, yes the model exists but since I was not using them directly in the current query I removed those relationships for the sake of simplicity. Can you elaborate on why would the presence of the related models make aggregate functions easier to work with? Shouldn't the IDs be enough? Commented Nov 23, 2010 at 16:51
  • @Evgeny, as Jordan guessed this is a legacy setup and not all fields follow the Django convention. Commented Nov 23, 2010 at 16:52
  • i understand that, just trying to explain the problem to myself :) Commented Nov 23, 2010 at 16:54

2 Answers 2

1

For a problem like this one, I'd split the query into a small number of simpler ones, I think it's quite possible. Also, I found that MySQL actually may return results faster with this approach.

edit ...Actually after thinking a bit I see that you need to "annotate on subqueries", which is not possible in Django ORM (not in 1.2 at least). Maybe you have to do plain sql here or use some other tool to build the query.

Tried to rewrite your models in more default django pattern, maybe it will help to understand the problem better. Models Pat and Snp are missing though...

class Clin(models.Model):
    pat = models.ForeignKey(Pat)
    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samp = models.ForeignKey(Samp)
    pat = models.ForeignKey(Pat)
    iso = models.ForeignKey(Clin)
    class Meta:
        db_table = u'samptopat'
        unique_together = ['samp', 'pat']

class Samp(models.Model):
    samp = models.CharField(max_length=8)
    snp_set = models.ManyToManyField(Snp, through='SampToSnp')
    pat_set = models.ManyToManyField(Pat, through='SaptToPat')
    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samp = models.ForeignKey(Samp)
    snp = models.ForeignKey(Snp)
    value = models.CharField(max_length=2)
    class Meta:
        db_table = u'samptosnp'

The following seems to mean - get count of unique patients per clinic ...

(SELECT COUNT(*)
   FROM clin AS a
   LEFT JOIN clin AS b
       ON a.pat_id = b.pat_id
   WHERE a.iso = clin.iso
) AS multiple_iso,

Sample count per clinic:

(SELECT COUNT(*)
   FROM samptopat
   WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,

This part is harder to understand, but in Django there is no way to do GROUP_CONCAT in plain ORM.

(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
   FROM samptopat
   RIGHT JOIN samptosnp
       USING(samp_id)
   WHERE iso_id = clin.iso
   GROUP BY samp_id
   LIMIT 1 -- Return 1st samp only
) AS snp
Sign up to request clarification or add additional context in comments.

1 Comment

After playing with your idea for a while I was able to reach a better solution than what I had initially. It turns out that the problem of the large number of queries was caused by the lazy aspect of Django's ORM. Simply by converting some of the QuerySets to lists I was able to reduce the number of queries from ~4k to ~120 and going from 10secs to 0.4secs which is good enough. The solution ended up being retrieving each block individually and forcing bulk retrieval by converting to lists in some locations.
0

Could you explain exactly what you're trying to extract w/ the snp subquery? I see you're joining over the two tables, but it looks like what you really want is Snp objects which have an associated Clin which has the given id. If so, this becomes almost as straightforward to do as a separate query as the other 2:

Snp.objects.filter(samp__pat__clin__pk=given_clin)

or some such thing ought to do the trick. You may have to rewrite that a bit due to all the ways you're violating the conventions, unfortunately.

The others are something like:

Pat.objects.filter(clin__pk=given_clin).count()

and

Samp.objects.filter(clin__pk=given_clin).count()

if @Evgeny's reading is correct (which is how I read it as well).

Often, with Django's ORM, I find I get better results if I try to think about directly what I want in terms of the ORM, instead of trying to translate to or from the SQL I might use if I wasn't using the ORM.

1 Comment

The table samptosnp has both the relationship and the information I'm trying to retrieve. What I'm doing with the subquery is to retrieve all the value in SampToSnp for a matching Samp but limiting it only to the first hit since the relationship between Samp and SampToSnp is one-to-many. The weird query is a workaround on a not so good design of the SampToSnp table which holds the Snp value and also a way to get all the information in one line via SQL directly.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.