
I am working with CSV data in my application. One of the columns in the CSV file contains a name. For each row, I need to check whether this name is already registered as a user. If it is, I store the matching user in a variable and use it later on:

import csv

data = []
with open(path) as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (name,age,phone)
    for row in reader:
        user = False
        check_user = User.objects.filter(name=row[0])  # one query per row
        if check_user:
            # filter() returns a QuerySet; store the single matching user, not the list
            user = check_user[0]
        data.append({'name': row[0], 'age': row[1], 'phone': row[2], 'user': user})

Then in my template I do something like:

{% for info in data %}
  <tr>
    <td>{{ info.name }}</td>
    <td>{% if info.user %}{{ info.user.name }}{% else %}No user{% endif %}</td>
  </tr>
{% endfor %}

The problem

This all works well. The problem is that the CSV file contains many duplicate names: I can have 1,000 records with only 10 different names, yet the current code makes 1,000 queries to the database. What I am trying to do, but I'm not sure how, is to check whether a name has already been looked up and, if so, reuse the previous query result instead of running a new query.

Clarification

In my user table I have alice, bob and marta.

In my CSV file I have records like:

name,age,phone
marta,30,12345
marta,30,12345
marta,30,12345
marta,30,12345
bob,22,33555
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939
alice,55,1939

In the case above there are only three unique names, so I would like to make only three queries to the database. In my current setup, every row results in a query, which wastes resources (made worse by the fact that the CSV file is VERY large with LOTS of duplicate names).
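A minimal sketch of the caching described above, using a plain dict keyed by name so each distinct name is looked up only once. To keep it runnable without Django, lookup_user is a hypothetical stand-in for the User.objects.filter(name=...) query, and the rows are assumed to be already-parsed [name, age, phone] lists:

```python
def attach_users(rows, lookup_user):
    """Build one dict per row, calling lookup_user at most once per distinct name.

    lookup_user is a hypothetical stand-in for the database query; it should
    return the matching user or None.
    """
    cache = {}  # name -> result of the first lookup for that name (None included)
    data = []
    for name, age, phone in rows:
        if name not in cache:
            cache[name] = lookup_user(name)  # only runs for a name not seen before
        data.append({'name': name, 'age': age, 'phone': phone, 'user': cache[name]})
    return data


if __name__ == '__main__':
    registered = {'alice', 'bob', 'marta'}
    queried = []

    def fake_lookup(name):
        queried.append(name)
        return name if name in registered else None

    rows = ([['marta', '30', '12345']] * 4 + [['bob', '22', '33555']]
            + [['alice', '55', '1939']] * 12)
    data = attach_users(rows, fake_lookup)
    print(len(data), queried)  # 17 ['marta', 'bob', 'alice']
```

In the question's code, lookup_user could be something like lambda name: User.objects.filter(name=name).first(), which returns None when no user matches. Because misses are cached too, an unregistered name is also queried only once.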

  • Can you describe this in more detail? When the same name appears in another CSV row, do you want to skip it? Commented Jun 26, 2018 at 6:57
  • Let me add some examples to clarify. Commented Jun 26, 2018 at 6:57

2 Answers


You can do the following using NumPy:

Load the CSV file using NumPy:

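import numpy as np

# dtype=str keeps the names as text; skiprows=1 skips the header row
data = np.loadtxt(path, dtype=str, delimiter=',', skiprows=1)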

Ask NumPy to return the unique names from the name column:

names_column = data[:, 0]  # if the names are in the 0th column
unique_names, first_indices = np.unique(names_column, return_index=True)

With return_index=True, np.unique also returns the index of the first occurrence of each unique name in the column, which you can use for further processing.
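A sketch of one way to use np.unique for the lookups themselves, offered as an illustration rather than as part of the answer above: with return_inverse=True (a real np.unique parameter), it also returns, for every row, the position of that row's name in the unique array. Each unique name can then be queried once and the results spread back over all rows. lookup_user is a hypothetical stand-in for the database query:

```python
import numpy as np


def users_per_row(names, lookup_user):
    """Return one lookup result per row, querying each unique name only once.

    lookup_user is a hypothetical stand-in for the database query.
    """
    unique_names, inverse = np.unique(names, return_inverse=True)
    # one lookup per unique name (np.unique returns the names sorted)
    results = [lookup_user(str(name)) for name in unique_names]
    # inverse[i] is the index into unique_names for row i
    return [results[i] for i in inverse]


if __name__ == '__main__':
    names = np.array(['marta', 'marta', 'bob', 'alice', 'alice'])
    calls = []

    def fake_lookup(name):
        calls.append(name)
        return name.upper()

    print(users_per_row(names, fake_lookup))  # ['MARTA', 'MARTA', 'BOB', 'ALICE', 'ALICE']
    print(calls)  # ['alice', 'bob', 'marta']
```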

EDIT

Specifically, for the example input you have pasted, you could do the following:

data = np.genfromtxt('in_data', dtype=None, names=True, delimiter=',', encoding='utf-8')
print(np.unique(data['name'], return_index=True))

Output will look like:

(array(['alice', 'bob', 'marta'], dtype='<U5'), array([5, 4, 0]))

import csv

data = []
with open(path) as f:
    reader = csv.reader(f)
    for row in reader:
        ...
        data.append({'name': row[0], 'age': row[1], 'phone': row[2], 'user': user})

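As you're loading the whole CSV into a Python list, you could try converting the list to a set, since a set contains only unique values; you can always convert it back to a list. Dictionaries are not hashable, though, so set(data) raises a TypeError. Build the set from each dictionary's items instead (note that a set does not keep the original row order):

# each dict's items become a hashable tuple; the user value must be hashable too
# (saved Django model instances are)
data_set = {tuple(info.items()) for info in data}
unique_data_list = [dict(items) for items in data_set]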

In the template:

{% for info in unique_data_list %}
  <tr>
    <td>{{ info.name }}</td>
    <td>{% if info.user %}{{ info.user.name }}{% else %}No user{% endif %}</td>
  </tr>
{% endfor %}
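A sketch of applying the same set idea one step earlier, to the names rather than to the finished rows: collecting the distinct names into a set before any lookup runs means the queries happen once per name, and the results can then be attached to every row (duplicate rows are kept). The function name is illustrative, and lookup_user is a hypothetical stand-in for the database query:

```python
def attach_users_via_set(rows, lookup_user):
    """Look up each distinct name once, then attach the result to every row.

    rows is a list of [name, age, phone] lists; lookup_user is a hypothetical
    stand-in for the database query.
    """
    unique_names = {row[0] for row in rows}  # a set keeps each name only once
    users = {name: lookup_user(name) for name in unique_names}
    return [
        {'name': name, 'age': age, 'phone': phone, 'user': users[name]}
        for name, age, phone in rows
    ]
```

With Django, the per-name lookups could even be collapsed into a single query: User.objects.filter(name__in=unique_names) returns all matching users at once, which can then be put into a dict keyed by user.name.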
