Avoid duplicate queries in Django

Question

I am working with CSV data in my application. One of the columns in the CSV file contains a name. For each of the rows, I need to check to see if this name is already registered. If it is, I will store a variable and use that later on:

import csv

data = []
with open(path) as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the "name,age,phone" header row
    for row in reader:
        user = False
        check_user = User.objects.filter(name=row[0])  # one query per CSV row
        if check_user:
            # At most one row matches, so store that object rather than the queryset
            user = check_user[0]
        data.append({'name': row[0], 'age': row[1], 'phone': row[2], 'user': user})

Then in my template I will do something like:

{% for info in data %}
  <td>{{ info.name }}</td>
  <td>{% if info.user %} {{ info.user.name }} {% else %} No user {% endif %}</td>
{% endfor %}

The problem

This all works well. However, the problem is that the list in the CSV file contains many duplicate names. So I can have 1,000 records, with only 10 different names. But in the current scenario, there will be 1,000 queries to the database. What I am trying to do, but I'm not sure how, is to somehow check if the name was already looked up, and if it was, the previous query result should be used instead of doing a new query.

Clarification

In my user table I have alice, bob, marta

In my CSV file I have records like:

name,age,phone
marta,30,12345
marta,30,12345
marta,30,12345
marta,30,12345
bob,22,33555
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939

In the case above, I only have three unique names, so I would like to only make 3 different queries to the database. In my current setup, each row would result in a query which is a waste of resources (and compounded by the fact that the CSV file is VERY large with LOTS of duplicate names).

Can you describe this in more detail? When the same name appears in several CSV rows, do you want to skip it?
– seuling, Commented Jun 26, 2018 at 6:57

Ash Sharma · Accepted Answer · 2018-06-26 07:16:17Z

1

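You can do the following using NumPy.

Load the CSV file using NumPy (as strings, comma-separated, skipping the header row):

import numpy as np

data = np.loadtxt(path, dtype=str, delimiter=',', skiprows=1)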

Ask NumPy to return the unique names from the name column of the data:

names_column = data[:, 0]  # if the names are in the 0th column
unique_names, first_indices = np.unique(names_column, return_index=True)

With return_index=True, np.unique also returns the index of the first occurrence of each unique name in the column, which you can use for further processing.
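If pulling in NumPy only for this step feels heavy, the same information (each unique name and the index of its first row) can be gathered with the standard library. This is a minimal sketch, not part of the original answer; the helper name `unique_names_with_first_index` and the in-memory sample are made up for illustration.

```python
import csv
import io


def unique_names_with_first_index(rows, column=0):
    """Map each distinct value in `column` to the index of the first row holding it."""
    first_index = {}
    for i, row in enumerate(rows):
        # setdefault only stores the index the first time a name is seen
        first_index.setdefault(row[column], i)
    return first_index


# Small in-memory stand-in for the CSV file from the question
sample = io.StringIO(
    "name,age,phone\n"
    "marta,30,12345\n"
    "marta,30,12345\n"
    "bob,22,33555\n"
    "alice,55,1939\n"
    "alice,55,1939\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
first_index = unique_names_with_first_index(reader)
print(sorted(first_index.items()))
# prints [('alice', 3), ('bob', 2), ('marta', 0)]
```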

EDIT

Specifically for the example input you have pasted, you could do the following:

data = np.genfromtxt('in_data', dtype=None, names=True, delimiter=',')
print(np.unique(data['name'], return_index=True))

Output will look like this (the exact dtype shown depends on your Python and NumPy versions):

(array(['alice', 'bob', 'marta'], dtype='|S5'), array([5, 4, 0]))
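Once the unique names are known, one way to use them is to fetch all matching users in a single lookup and reuse the result for every CSV row. The sketch below is an illustration rather than part of the original answer. `fetch_users_by_name` is a hypothetical callable standing in for the database query so the example runs without Django; in a real project it might wrap something like `User.objects.filter(name__in=names)`, assuming `name` is the model field used in the question.

```python
def attach_users(rows, fetch_users_by_name):
    """Build the `data` list using one lookup for all distinct names.

    rows: list of [name, age, phone] lists.
    fetch_users_by_name: hypothetical callable that takes a collection of
        names and returns a dict {name: user} for the names that exist.
    """
    unique_names = {row[0] for row in rows}
    users = fetch_users_by_name(unique_names)  # single lookup
    return [
        # .get() yields None for names that are not registered
        {'name': r[0], 'age': r[1], 'phone': r[2], 'user': users.get(r[0])}
        for r in rows
    ]


# With Django, the callable could look roughly like this (assumption, not run here):
#   lambda names: {u.name: u for u in User.objects.filter(name__in=names)}

calls = []


def fake_fetch(names):
    """Stand-in for the database; records how often it is called."""
    calls.append(set(names))
    registered = {'alice': 'User(alice)', 'bob': 'User(bob)', 'marta': 'User(marta)'}
    return {n: registered[n] for n in names if n in registered}


rows = [
    ['marta', '30', '12345'],
    ['marta', '30', '12345'],
    ['bob', '22', '33555'],
    ['carol', '40', '777'],  # not registered
]
data = attach_users(rows, fake_fetch)
```

Here unregistered names end up with `user` set to `None`, which the template's `{% if info.user %}` check treats the same way as `False`.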

edited Jun 26, 2018 at 7:16

answered Jun 26, 2018 at 7:03

Ash Sharma


Suraj · Accepted Answer · 2018-06-26 07:26:34Z

1

data = []
with open(path) as f:
    reader = csv.reader(f)
    for row in reader:
        ...
        data.append({'name': row[0], 'age': row[1], 'phone': row[2], 'user': user})

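As you're loading the whole CSV into a Python list, you could try converting the list to a set, since a set contains only unique values. Dicts are not hashable, so each row has to be turned into a tuple of its items first; you can always convert back to a list of dicts afterwards. Note that a set does not preserve the original row order.

data_set = {tuple(row.items()) for row in data}
unique_data_list = [dict(row) for row in data_set]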
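Deduplicating `data` after the loop has run does not by itself reduce the number of queries, because each lookup already happened while the list was being built. A minimal sketch of remembering lookups per name during the loop follows; it is an illustration, not part of the original answer. `lookup_user` is a hypothetical callable standing in for the per-name query, which in Django might be something like `User.objects.filter(name=name).first()`.

```python
def build_data(rows, lookup_user):
    """Build the `data` list, calling lookup_user once per distinct name."""
    cache = {}  # name -> previously fetched user (or None if not registered)
    data = []
    for name, age, phone in rows:
        if name not in cache:
            cache[name] = lookup_user(name)  # only reached for unseen names
        data.append({'name': name, 'age': age, 'phone': phone, 'user': cache[name]})
    return data


lookups = []


def fake_lookup(name):
    """Stand-in for the database query; records every call."""
    lookups.append(name)
    registered = {'alice', 'bob', 'marta'}
    return 'User(%s)' % name if name in registered else None


rows = [
    ('marta', '30', '12345'),
    ('marta', '30', '12345'),
    ('bob', '22', '33555'),
    ('alice', '55', '1939'),
    ('alice', '55', '1939'),
]
data = build_data(rows, fake_lookup)
print(lookups)
# prints ['marta', 'bob', 'alice']
```

All five rows are kept and in their original order, but only three lookups are made, one per distinct name.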

In the template,

{% for info in unique_data_list %}
  <td>{{ info.name }}</td>
  <td>{% if info.user %} {{ info.user.name }} {% else %} No user {% endif %}</td>
{% endfor %}

answered Jun 26, 2018 at 7:26

Suraj
