0

I am making an app that will search for similar drugs using the logic and models that if a drug is in a particular set of classes then other drugs that are in those and only those classes only then it will return those drugs. Here is the relevant code and dummy data -

views.py

class GetSimilarDrugs(APIView):

  def get(self, request, format=None):
    #import pdb
    #pdb.set_trace()
    get_req = request.GET.get('drugid', '')
    simi_list = []
    comp_class = DrugBankDrugEPClass.objects.filter(drug_bank_id = get_req).values_list('epc_id', flat=True).distinct()
    for drg_id in DrugBankDrugEPClass.objects.values_list('drug_bank_id', flat = True).distinct():
      classtocomp = DrugBankDrugEPClass.objects.filter(drug_bank_id = str(drg_id)).values_list('epc_id', flat=True).distinct()

      complist = list(comp_class)
      tolist = list(classtocomp)
      if complist == tolist:                
        simi_list.append(drg_id)
  return Response({'result':simi_list})

models.py

class DrugBankDrugEPClass(models.Model):
  drug_bank = models.ForeignKey(DrugBankDrugs, on_delete=models.CASCADE)
  epc = models.ForeignKey(DrugBankEPClass, on_delete=models.CASCADE)

Dummy SQL Data

 id   | drug_bank_id | epc_id |
+------+--------------+--------+
|    1 | DB12789      |      1 |
|    2 | DB12788      |      2 |
|    3 | DB00596      |      3 |
|    4 | DB09161      |      4 |
|    5 | DB01178      |      5 |
|    6 | DB01177      |      6 |
|    7 | DB01177      |      6 |
|    8 | DB01174      |      7 |
|    9 | DB01175      |      8 |
|   10 | DB01172      |      9 |
|   11 | DB01173      |     10 |
|   12 | DB12257      |     11 |
|   13 | DB08167      |     12 |
|   14 | DB01551      |     13 |
|   15 | DB01006      |     14 |
|   16 | DB01007      |     15 |
|   17 | DB01007      |     16 |
|   18 | DB01004      |     17 |
|   19 | DB01004      |     18 |
|   20 | DB01004      |     17 |
|   21 | DB01004      |     18 |
|   22 | DB01004      |     19 |
|   23 | DB00570      |     20 |
|   24 | DB01008      |     21 |
|   25 | DB00572      |     22 |
|   26 | DB00575      |      7 |
|   27 | DB00577      |     23 |
|   28 | DB00577      |     24 |
|   29 | DB00577      |     25 |
|   30 | DB00576      |     26 |
|   31 | DB00751      |     27 |
|   32 | DB00751      |     28 |
|   33 | DB00750      |     29 |
|   34 | DB00753      |     30 |
|   35 | DB00752      |     31 |
|   36 | DB00755      |     32 |
|   37 | DB00755      |     32 |
|   38 | DB00757      |     33 |
|   39 | DB00756      |     34 |
|   40 | DB00759      |     35 |
|   41 | DB00759      |     36 |
|   42 | DB00759      |     36 |

I am getting the result but the problem is that it iterating through list everytime and thus taking very much time and for lots of data it is really slow. Is there any other way so it can work faster?

3
  • It's not clear what you are trying to do here. Why are you querying the DrugBankDrugEPClass model so many times? Commented Jan 17, 2018 at 11:56
  • I think in first query he taking some epc_id and finding matching values from whole distinct drug_bank_id in that table. For taking epc_id of distinct drug_bank_id he uses loop. Commented Jan 17, 2018 at 11:59
  • like there is a drug_id called "DB00752" and it has epc_id = [1,2,3] so I want to search for other drugs that has epc_id = [1,2,3] only and return those drugs. Commented Jan 17, 2018 at 12:00

2 Answers 2

1

According to your need I think you can simply do like this :

get_req = request.GET.get('drugid', '')
# Fetching all the epc_ids that belongs to requisted drug_bank_ids
comp_class = DrugBankDrugEPClass.objects.filter(drug_bank_id = get_req).values_list('epc_id', flat=True).distinct()
# filters all drug_bank_ids thats matcth with the epc_ids in requisted
classtocomp = DrugBankDrugEPClass.objects.filter(epc_id__in = comp_class).values_list('drug_bank_id', flat=True).distinct()

UPD :

get_req = request.GET.get('drugid', '')

comp_class = DrugBankDrugEPClass.objects.filter(
     drug_bank_id=get_req).values_list('epc_id', flat=True).distinct()

class_to_comp = DrugBankDrugEPClass.objects.filter(
     epc_id__in=comp_class).values_list('drug_bank_id', 'epc_id')

d = {}
for k, v in class_to_comp:
     d.setdefault(k, []).append(v)

simi_list = [k for k, v in d.items() if v == list(comp_class)]
print(simi_list)

I Think it will be little fast than your code because if i'am looping also as like you did it's not hitting database in each loop. Also its looping through filtered data.

Sign up to request clarification or add additional context in comments.

5 Comments

Your query is almost working but as you are using __in so it is returning drugs even if one element in the list matches while I need all the elements of the list to match with other drugs epc_id
can you please tell me is there any other way?
So are you expecting like if a particular drug_bank_id have 3 epc_id , you want all other drug_bank_ids having all these 3 epc_id ?
Vikas exactly bro!
@l0n3_w01f I have added a piece of new code and just check it by run and tell me whether its meets your requirements or not
0

You can change the loop from

for drg_id in DrugBankDrugEPClass.objects.values_list('drug_bank_id', flat = True).distinct():
  classtocomp = DrugBankDrugEPClass.objects.filter(drug_bank_id = str(drg_id)).values_list('epc_id', flat=True).distinct()

to

drug_ids = DrugBankDrugEPClass.objects.values_list('drug_bank_id', flat = True).distinct()
comps = DrugBankDrugEPClass.objects.filter(drug_bank_id__in = drug_ids).values_list('epc_id', flat=True).distinct()

and then iterate over the comps result set.

Other optimizations you should do is add db_index = True to the necessary fields you're querying.

If you're using Postgres, you can add field params to distinct:

On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply. This translates to a SELECT DISTINCT ON SQL query. Here’s the difference. For a normal distinct() call, the database compares each field in each row when determining which rows are distinct. For a distinct() call with specified field names, the database will only compare the specified field names.

in which you can do the following:

comps = DrugBankDrugEPClass.objects.values_list('drug_bank_id', flat = True).distinct('drug_bank_id', 'epc_id')

Edit to add:

Furthermore, you can use add-ons like django-silk or django-debug-toolbar to profile your queries and applications

1 Comment

Sorry forgot to mention that I am using MySQL and i will try the solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.