
I am building a tool that allows users to upload CSV files to update the database using the django-import-export tool. I uploaded my test CSV file with one data row, then uploaded it again and got a duplicated row (with a new primary key but all the other values are the same). The row.import_type value is "updated" but the only thing that is updated is the id.

Then I upload the same file a third time and get an error:

app.models.Role.MultipleObjectsReturned: get() returned more than one Role -- it returned 2!

(I really appreciate the exclamation point in that error message, by the way.)

Ideally I would get a skipped row on the second and third imports of the file, though I suppose I'd be okay with an error. The file's contents are:

    Sales Role,System Role,System Plan,id
    Sales Rep,Account Executive,951M-NA,

This is the format users get when they export the CSV dataset. Ideally they would export a file, change a few columns (other than the name, which is the import_id_field), and re-upload the data.
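For reference, here is what that file looks like once parsed. This sketch uses only Python's standard csv module, not django-import-export, but it shows one relevant detail: the id column is present in the row, and its value is an empty string rather than missing.

```python
import csv
import io

# The exact file contents from the question: a header row and one data row.
CSV_TEXT = """Sales Role,System Role,System Plan,id
Sales Rep,Account Executive,951M-NA,
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# One data row; the "id" key exists, but its value is the empty string "".
print(rows)
```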

In app/resources.py:

    from import_export import resources
    from import_export.fields import Field

    from .models import Role


    class RoleResource(resources.ModelResource):
        # Map CSV column headers to Role model attributes.
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            fields = ('id', 'name', 'default_role', 'default_plan')
            import_id_fields = ('name',)  # look up existing rows by name
            skip_unchanged = True

From what I can tell, on the second import the get_or_init_instance() method isn't finding the object created by the first import, but on the third import it does find them. I haven't customized the resource's import workflow as described on the Import data workflow page.
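For illustration, the lookup step can be modelled roughly as below. This is a simplified pure-Python sketch, assuming the default loader does the equivalent of Role.objects.get(**{field: value}) for the fields in import_id_fields; the function borrows the name get_or_init_instance but is a hypothetical stand-in working on a list of dicts, not the library's implementation.

```python
class MultipleObjectsReturned(Exception):
    """Stand-in for Role.MultipleObjectsReturned in this pure-Python model."""


def get_or_init_instance(table, import_id_fields, row):
    """Hypothetical model of the lookup step; returns (instance, is_new).

    Builds a filter from import_id_fields, like Role.objects.get(name=...).
    """
    params = {field: row[field] for field in import_id_fields}
    matches = [rec for rec in table
               if all(rec.get(key) == value for key, value in params.items())]
    if len(matches) > 1:
        # Like QuerySet.get(): refuse to pick one when several rows match.
        raise MultipleObjectsReturned(f"get() returned {len(matches)} rows")
    if matches:
        return matches[0], False  # existing object: it will be updated or skipped
    return {}, True               # nothing found: a new object will be created


table = [{"id": 1, "name": "Sales Rep", "default_role": "Account Executive"}]
row = {"name": "Sales Rep", "default_role": "Account Executive"}
instance, is_new = get_or_init_instance(table, ("name",), row)
print(instance["id"], is_new)  # 1 False
```

In this model, once the lookup finds the existing record, any duplicate has to come from a later step of the import rather than from the lookup itself.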

What's going wrong here? Do I need to customize the import workflow, or did I miss yet another required attribute in the Resource?

2 Answers


The logic will only skip the row if all declared fields are the same in both the imported row and the persisted object. If any field differs, an update will be performed.
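A minimal sketch of that comparison, under the simplifying assumption that each declared field maps directly to an attribute of the same name (the real skip_row() compares exported field values, which this does not reproduce). The skip_row function here is a hypothetical stand-in, and the edited plan value is made up for illustration.

```python
def skip_row(instance, row, declared_fields):
    """Simplified sketch of the skip_unchanged check (not the library's code).

    A row is skipped only when an existing instance already holds the
    imported value for every declared field.
    """
    if instance is None:  # no persisted object yet: the row must be created
        return False
    return all(instance.get(field) == row.get(field) for field in declared_fields)


persisted = {"id": 1, "name": "Sales Rep",
             "default_role": "Account Executive", "default_plan": "951M-NA"}
fields = ("name", "default_role", "default_plan")

same = {"name": "Sales Rep", "default_role": "Account Executive",
        "default_plan": "951M-NA"}
changed = dict(same, default_plan="952M-NA")  # hypothetical edited value

print(skip_row(persisted, same, fields))     # True
print(skip_row(persisted, changed, fields))  # False

# If 'id' is also declared and the file's id cell is blank (assumed to be
# cleaned to None), it never equals the stored primary key, so no skip.
print(skip_row(persisted, dict(same, id=None), fields + ("id",)))  # False
```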

For this to work, the fields you declare in import_id_fields must uniquely identify a single row; otherwise you will get MultipleObjectsReturned.
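One way to check whether the import_id_fields really are unique is to count how often each key occurs. The sketch below does this on plain dicts; against the database, the rough equivalent would be Role.objects.values('name').annotate(n=Count('id')).filter(n__gt=1). The duplicate_keys helper and the extra 'Sales Manager' record are hypothetical.

```python
from collections import Counter


def duplicate_keys(records, import_id_fields):
    """Return the import-id key tuples that occur more than once.

    Any key returned here would make a get() lookup on those fields
    raise MultipleObjectsReturned.
    """
    counts = Counter(tuple(rec[f] for f in import_id_fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]


roles = [
    {"id": 1, "name": "Sales Rep"},
    {"id": 2, "name": "Sales Rep"},      # the duplicate left by the second upload
    {"id": 3, "name": "Sales Manager"},  # hypothetical extra record
]
print(duplicate_keys(roles, ("name",)))  # [('Sales Rep',)]
print(duplicate_keys(roles, ("id",)))    # []
```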

In your case, if duplicate rows are being created, it must mean that the name is not present in the db on the second run. I assume that you have not overridden ModelInstanceLoader and are not running in bulk mode, because either of these would disrupt the skip-row logic.

By default, import_id_fields is set to the row id, so if you can include the id in your export, each row is guaranteed to match a unique record. Obviously, users should not change this field, otherwise you will get duplicates.
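A rough sketch of why an exported id makes lookups reliable, assuming import_id_fields = ('id',) and that a blank id cell arrives as None (both are assumptions for illustration; find_by_id is a hypothetical helper, not part of the library). A row carrying its id matches exactly one record even if other columns changed, while a row without an id cannot match anything and is treated as new.

```python
def find_by_id(table, row):
    """Hypothetical lookup keyed on ('id',), the default import_id_fields.

    A blank id (None here) cannot match any saved primary key, so such a
    row is always treated as new.
    """
    if row.get("id") is None:
        return None
    return next((rec for rec in table if rec["id"] == row["id"]), None)


table = [{"id": 1, "name": "Sales Rep"}]
print(find_by_id(table, {"id": 1, "name": "Sales Rep (renamed)"}))  # {'id': 1, 'name': 'Sales Rep'}
print(find_by_id(table, {"id": None, "name": "Sales Rep"}))         # None
```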

The MultipleObjectsReturned error comes from here, and it's simply a call to Role.objects.get(name=<n>).


3 Comments

I'm typing everything into SO by hand, so that's the source of the typo (I can never remember how to copy from vim to the system clipboard). I did find the error, after working hard to figure out how to subclass the instance loader so I could put in the breakpoint. Removing 'id' from the fields list in the Meta fixed the problem. I'll write that up as the answer.
If you are using PyCharm, it is easy to add breakpoints to library code; there is no need to subclass.
I'm using vim in a virtual environment. I did start digging through the source code to figure out where to put the breakpoint, but then I had to find it on my system, yadda yadda. In short, I'm afraid to mess with somebody else's library, which is why I thought about subclassing.
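Without subclassing anything, one way to find where a library is installed inside the active virtualenv is to ask the import system. This is a small standard-library sketch; package_source_dir is a hypothetical helper, and in the question's environment it would presumably be called with "import_export".

```python
import importlib.util
from pathlib import Path


def package_source_dir(package_name):
    """Return the directory holding an installed package's source, or None.

    Resolves against whichever interpreter (and virtualenv) runs this code.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package with no origin file
    return Path(spec.origin).parent


# Demonstrated with a standard-library package so it runs anywhere.
print(package_source_dir("json"))
```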

Here's the surprisingly simple solution:


    class RoleResource(resources.ModelResource):
        name = Field(attribute='name', column_name="Sales Role")
        default_role = Field(attribute='default_role', column_name="System Role")
        default_plan = Field(attribute='default_plan', column_name="System Plan")

        class Meta:
            model = Role
            # 'id' removed: only the mapped columns are imported/exported.
            fields = ('name', 'default_role', 'default_plan')
            import_id_fields = ('name',)
            skip_unchanged = True

All I had to do was remove the 'id' from the fields list in the Meta class and now I get the expected behavior.

I can export this file to CSV (the id column no longer appears), edit the list, and re-upload it, and the system skips, updates, and even adds rows as necessary.
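One plausible explanation for why removing 'id' helps, stated as an assumption rather than a confirmed diagnosis: with 'id' in fields, the blank id cell is imported like any other column. If it is cleaned to None and assigned to the instance found by name, Django's save() treats an object with no primary key as new and performs an INSERT. That would match the symptoms: the second import yields a copy with a new id, and the third lookup by name matches two rows. The simulation below models that sequence in plain Python; FakeTable and import_row are hypothetical stand-ins, not Django or django-import-export APIs.

```python
import itertools


class MultipleObjectsReturned(Exception):
    pass


class FakeTable:
    """Tiny in-memory stand-in for the Role table (illustrative only)."""

    def __init__(self):
        self.rows = []
        self._next_pk = itertools.count(1)

    def get(self, **params):
        matches = [r for r in self.rows if all(r[k] == v for k, v in params.items())]
        if len(matches) > 1:
            raise MultipleObjectsReturned(f"get() returned {len(matches)} rows")
        return matches[0] if matches else None

    def save(self, obj):
        if obj.get("id") is None:  # like Model.save(): no pk means INSERT
            obj = dict(obj, id=next(self._next_pk))
            self.rows.append(obj)
        else:                      # pk present: UPDATE the row with that pk
            self.rows = [obj if r["id"] == obj["id"] else r for r in self.rows]
        return obj


def import_row(table, row, fields):
    found = table.get(name=row["name"])  # import_id_fields = ('name',)
    obj = dict(found) if found else {"id": None}
    for field in fields:
        # Assumption: a blank CSV cell is cleaned to None before assignment.
        obj[field] = row[field] if row[field] != "" else None
    return table.save(obj)


row = {"id": "", "name": "Sales Rep", "default_role": "Account Executive",
       "default_plan": "951M-NA"}

with_id = FakeTable()
for _ in range(2):
    import_row(with_id, row, ("id", "name", "default_role", "default_plan"))
print(len(with_id.rows))  # 2: the blank id turned the second save into an INSERT

without_id = FakeTable()
for _ in range(3):
    import_row(without_id, row, ("name", "default_role", "default_plan"))
print(len(without_id.rows))  # 1
```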

2 Comments

I don't understand how this fixes it, so you might be masking the true issue. If 'name' is truly unique, then it should match on that unique row, and then skip or update depending on whether or not the other fields are unchanged. I've updated my answer to add more detail.
I empty the app's tables by running python manage.py migrate <app> zero and then python manage.py migrate <app>, which gives me a fresh start on everything. I upload a CSV file with a header row and a data row, and confirm the table has only one row. Then I import the exact same file and get two rows of the same data (only the id is new).
