Django import-export duplicating rows when importing same file multiple times

Question

I am building a tool that allows users to upload CSV files to update the database using the django-import-export tool. I uploaded my test CSV file with one data row, then uploaded it again and got a duplicated row (with a new primary key but all the other values are the same). The row.import_type value is "updated" but the only thing that is updated is the id.

Then I upload the same file a third time and get an error:

app.models.Role.MultipleObjectsReturned: get() returned more than one Role -- it returned 2!

(I really appreciate the exclamation point in that error message, by the way.)

Ideally I would get a skipped row on the second import and third import of the file. I suppose I'd be okay with an error. The file's contents are:

    Sales Role,System Role,System Plan,id
    Sales Rep,Account Executive,951M-NA,

This is the format users get when they export the CSV dataset. Ideally they would export a file, change a few columns (aside from the name, which is the only entry in import_id_fields), and re-upload the data.
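For reference, here is a small standalone sketch (standard library only, independent of django-import-export) that parses the two lines shown above. It only illustrates what the file contains: the id header is present, but the data row leaves that cell empty.

```python
import csv
import io

# The file contents from the question, reproduced as a string.
CSV_TEXT = (
    "Sales Role,System Role,System Plan,id\n"
    "Sales Rep,Account Executive,951M-NA,\n"
)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
first = rows[0]

print(len(rows))                    # one data row
print(repr(first["Sales Role"]))    # the value used as the import id
print(repr(first["id"]))            # empty string: no id was supplied
```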

In app/resources.py:

    from import_export import resources
    from import_export.fields import Field

    from .models import Role


    class RoleResource(resources.ModelResource):
        # Map each model attribute to its CSV column header.
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            fields = ('id', 'name', 'default_role', 'default_plan')
            import_id_fields = ('name',)  # match existing rows by name
            skip_unchanged = True

From what I can tell, on the second import the get_or_init_instance() method isn't finding the object from the first import, but on the third import it finds both rows. I haven't customized the import workflow described on the Import data workflow page.

What's going wrong here? Do I need to customize the import workflow, or did I miss yet another required attribute in the Resource?

Matthew Hegarty · Accepted Answer · 2021-03-13 07:50:56Z

1

The logic will only skip the row if all declared fields are the same in both the imported row and the persisted object. If any field differs, an update is performed.

For this to work, the fields you declare in import_id_fields have to be a unique match for a row, otherwise you will get MultipleObjectsReturned.

In your case, if duplicate rows are being created, then it must mean that name is not present in the db on the second run. I assume that you have not overridden ModelInstanceLoader and are not running in bulk mode, because either would disrupt the skip-row logic.

By default, import_id_fields is set to the row id, so if you can include this in your export then you are guaranteed to have a unique row. Obviously the users should not change this field, otherwise you will get duplicates.

The MultipleObjectsReturned error comes from here, and it's simply a call to Role.objects.get(name=<n>).
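The skip / update / error behaviour described above can be sketched without Django. This is a simplified model only: classify_row, the list-of-dicts "table", and the local MultipleObjectsReturned class are hypothetical stand-ins for illustration, not the library's actual code.

```python
class MultipleObjectsReturned(Exception):
    """Local stand-in for the Django exception of the same name."""


def classify_row(table, row, id_fields, fields):
    """Return 'new', 'skip' or 'update' for one imported row.

    table is a list of dicts standing in for persisted objects.
    """
    # Look up existing objects using only the import id fields.
    matches = [obj for obj in table
               if all(obj[f] == row[f] for f in id_fields)]
    if len(matches) > 1:
        # The id fields were not a unique match for the row.
        raise MultipleObjectsReturned(
            f"get() returned more than one row -- it returned {len(matches)}!")
    if not matches:
        return "new"
    existing = matches[0]
    # Skip only when every declared field is unchanged.
    unchanged = all(existing[f] == row[f] for f in fields)
    return "skip" if unchanged else "update"


FIELDS = ("name", "default_role", "default_plan")
stored = {"name": "Sales Rep", "default_role": "Account Executive",
          "default_plan": "951M-NA"}
table = [dict(stored)]

same = classify_row(table, dict(stored), ("name",), FIELDS)
changed = classify_row(table, {**stored, "default_plan": "OTHER"},
                       ("name",), FIELDS)
added = classify_row(table, {**stored, "name": "Sales Lead"},
                     ("name",), FIELDS)
print(same, changed, added)
```

If the table ever holds two rows with the same name, the same call raises instead of returning, which mirrors the error seen on the third import.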

edited Mar 13, 2021 at 7:50

answered Mar 12, 2021 at 20:01

Matthew Hegarty


3 Comments

Josh English Over a year ago

I'm typing everything into SO by hand, so that's the source of the typo (I can never remember how to copy from vim to the system clipboard). I did find the error, after working hard to figure out how to subclass the instance loader so I could put in the breakpoint. Removing 'id' from the fields list in the Meta fixed the problem. I'll write that up as the answer.

Matthew Hegarty Over a year ago

If you are using PyCharm, it is easy to add breakpoints to library code, so there is no need to subclass.

Josh English Over a year ago

I'm in vim in a virtual environment, and I did start digging through the source code to figure out where to put the breakpoint, but then finding it on my system, yadda yadda. In short, I'm afraid to mess with somebody else's library, which is why I thought about subclassing.

Josh English · Accepted Answer · 2021-03-12 21:56:16Z

0

Here's the surprisingly simple solution:


    class RoleResource(resources.ModelResource):
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            # 'id' is no longer listed here
            fields = ('name', 'default_role', 'default_plan')
            import_id_fields = ('name',)
            skip_unchanged = True

All I had to do was remove the 'id' from the fields list in the Meta class and now I get the expected behavior.

I can export this file to CSV (and the id column does not appear), edit the list, and re-upload and the system skips and updates and even adds new things as necessary.
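I haven't confirmed why this works. One possible explanation, which is an assumption on my part and not something I verified in the library source: with 'id' listed in fields, the blank id cell in the CSV overwrites the matched object's id with None before saving, and saving a model whose primary key is None inserts a new row. The standalone sketch below simulates that idea; FakeTable and import_row are hypothetical names and do not reflect django-import-export's real implementation.

```python
class MultipleObjectsReturned(Exception):
    """Local stand-in for the Django exception of the same name."""


class FakeTable:
    """Tiny in-memory table with auto-incrementing primary keys."""

    def __init__(self):
        self.rows = {}
        self._next_pk = 1

    def get(self, **lookup):
        matches = [dict(r) for r in self.rows.values()
                   if all(r[k] == v for k, v in lookup.items())]
        if len(matches) > 1:
            raise MultipleObjectsReturned(len(matches))
        return matches[0] if matches else None

    def save(self, obj):
        # As with a Django model, a primary key of None means INSERT.
        if obj.get("id") is None:
            obj["id"] = self._next_pk
            self._next_pk += 1
        self.rows[obj["id"]] = dict(obj)


def import_row(table, row, fields, id_field="name"):
    """Find by id_field, copy the declared fields from the row, save."""
    obj = table.get(**{id_field: row[id_field]}) or {"id": None}
    for f in fields:
        value = row[f]
        # ASSUMPTION: a blank cell is imported as None.
        obj[f] = None if value == "" else value
    table.save(obj)


ROW = {"name": "Sales Rep", "default_role": "Account Executive",
       "default_plan": "951M-NA", "id": ""}

with_id = FakeTable()
import_row(with_id, ROW, ("id", "name", "default_role", "default_plan"))
import_row(with_id, ROW, ("id", "name", "default_role", "default_plan"))
print(len(with_id.rows))    # the second import inserted a duplicate

without_id = FakeTable()
for _ in range(3):
    import_row(without_id, ROW, ("name", "default_role", "default_plan"))
print(len(without_id.rows))  # repeated imports reuse the same row
```

Under that assumption, a third import with 'id' still listed fails in get() with MultipleObjectsReturned, which matches the sequence I saw.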

answered Mar 12, 2021 at 21:56

Josh English


2 Comments

Matthew Hegarty Over a year ago

I don't understand how this fixes it, so you might be masking the true issue. If 'name' is truly unique, then it should match on that unique row, and then skip or update depending on whether or not the other fields are unchanged. I've updated my answer to add more detail.

Josh English Over a year ago

I empty the app's tables by running python manage.py migrate <app> zero and then python manage.py migrate <app> which gives me a fresh start on everything. I upload a csv file with a header row and a data row. I confirm the table has only one row. Then I import the exact same file, and get two rows of the same data (only the id is new).
