Python: Can dumpdata but cannot loaddata back. UnicodeDecodeError

Question

I have been using Python 2.7, Django 1.5 and PostgreSQL 9.2 for two weeks. Never saw it before. Everything is freshly installed on my Windows 7 machine, so it should have default settings. Django beautifully generates tables in my db. Looks like everything works fine. I am able to dump data from my database by running:

manage.py dumpdata > test.json

or

manage.py dumpdata --indent 4 > test.json

I checked the JSON file and it looks as it should.

Then, I truncate some tables and try to load them from the JSON file with:

python manage.py loaddata --database=T2 test.json    // or without db name

I got the following error:

“UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte”

If I open the test.json file in notepad, save it as utf8 and try again, then I get:

“No JSON object could be decoded”

The file still looks OK, not empty.

By the way, when I open the JSON file with notepad it offers me to save it as Unicode. My database has UTF8 encoding. Please advise. Thank you.

Do not use Notepad to modify the code

– Paulo Bu, Commented Jul 24, 2013 at 20:08
Show the output of print(repr(open('test.json', 'rb').read(4)))

– jfs, Commented Jul 25, 2013 at 16:18
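
The check suggested in the comment above can be extended into a small helper that names the byte order mark it finds. This is only a sketch: `sniff_bom` is a hypothetical name, and it assumes the file is small enough that reading its first four bytes is all that is needed.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(path):
    """Return the encoding suggested by the file's BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None
```

A result of "utf-16-le" for test.json would match the 0xff byte reported in the error message.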

Ducktown · Accepted Answer · 2020-01-27 14:17:58Z

36

What worked for me is following these steps:

- Open the file in regular notepad
- Select save as
- Select encoding "UTF-8" (Not "UTF-8 (With BOM)")
- Save the file.

Now you can use loaddata.

However, this only works for files that are small enough for notepad to open.
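
For files too large to open in Notepad, the same re-save can be sketched in Python. This is an illustration rather than part of the original answer: it assumes Python 3, assumes the source file really is UTF-16 with a BOM, and `reencode_to_utf8` is a hypothetical helper name.

```python
import shutil


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Stream src_path into dst_path as UTF-8 without a BOM.

    The "utf-16" codec consumes the BOM while decoding, and the plain
    "utf-8" codec does not write one. newline="" leaves line endings
    untouched. The file is copied in chunks, so it never has to fit
    in memory.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)
```

Writing to a new file rather than overwriting in place keeps the original dump available if the assumed source encoding turns out to be wrong.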

edited Jan 27, 2020 at 14:17

answered Jan 22, 2020 at 10:05

Ducktown


2 Comments

andyw Over a year ago

Achieved in Notepad++ by setting UTF-8 via Encoding -> UTF-8, then saving.

Psddp Over a year ago

Works in VSCode too

Community · Accepted Answer · 2017-05-23 12:25:15Z

7

0xff in position 0 looks like the start of a little-endian UTF-16 byte order mark (BOM, bytes FF FE) to me. Notepad's "Unicode" save mode is little-endian UTF-16, so that makes sense if you saved your JSON from Notepad after creating it. Notepad will keep a byte order mark even in UTF-8, which could plausibly cause loaddata to fail to parse it.

If you don't have your un-edited json still handy, you'll need to remove the BOM - personally I'd use emacs, but another answer suggested this stand-alone Windows .exe:

http://www.bryntyounce.com/filebomdetector.htm
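
If the file is already UTF-8 and only the BOM is in the way, removing it can also be sketched in a few lines of Python. This is an assumption-laden illustration, not the tool linked above: `strip_utf8_bom` is a hypothetical name, and it reads the whole file into memory and rewrites it in place.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file. Return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        # Write back everything after the three BOM bytes (EF BB BF).
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```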

edited May 23, 2017 at 12:25

CommunityBot

answered Jul 24, 2013 at 20:06

Peter DeGlopper

3 Comments

Elena Kr Over a year ago

Peter, thank you for your reply. I cannot use emacs since I have Windows 7. I did install the utility you suggested and ran it. Indeed, it shows that all files but one doctored by Notepad are UTF-16. However, after running the utility I still have the same “UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte”

Peter DeGlopper Over a year ago

Step 1: convert to UTF-8. Step 2: Remove the BOM.

pst Over a year ago

"I cannot use emacs since I have Windows7": Yes, you can. gnu.org/software/emacs/download.html

Swayam Siddha Panda · Accepted Answer · 2022-01-29 10:54:21Z

4

After some research, I found the solution. In my case, the datadump.json file was the problem.

- Open the file in Notepad
- Click on the "Save as" option
- Go to the encoding section below and select "UTF-8"
- Save the file.

Now you can try running the command again. You are good to go :)

For your reference, I have attached images below.

[Images: Notepad; Save as; UTF-8]

answered Jan 29, 2022 at 10:54

Swayam Siddha Panda

Scott · Accepted Answer · 2022-11-18 14:04:48Z

4

On Windows, running the standard dumpdata command with -Xutf8 has always solved this problem for me:

python -Xutf8 manage.py dumpdata app.mymodel > app/fixtures/mymodel.json

Here is an article for reference: https://dev.to/methane/python-use-utf-8-mode-on-windows-212i
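
A related sketch, offered as an assumption rather than as part of the answer above: if the shell's `>` redirection is what re-encodes the output, the file can be written from Python instead, so the bytes the command emits reach the disk unchanged. `run_to_file` is a hypothetical helper, and the sketch assumes Python 3.7 or later for the `-X utf8` option.

```python
import subprocess
import sys


def run_to_file(args, out_path):
    """Run a command and write its stdout bytes, unmodified, to out_path."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(args, stdout=out, check=True)


# Hypothetical usage, mirroring the command above:
# run_to_file(
#     [sys.executable, "-X", "utf8", "manage.py", "dumpdata", "app.mymodel"],
#     "app/fixtures/mymodel.json",
# )
```

Because the output file is opened in binary mode and handed straight to the child process, no shell is involved in choosing the file's encoding.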

answered Nov 18, 2022 at 14:04

Scott

Aidan Fitzpatrick · Accepted Answer · 2019-07-13 03:50:19Z

2

I found one way to solve this issue: manually write out a new binary JSON file with the following code. rb stands for "read binary" and wb for "write binary".

First, go to shell:

python manage.py shell

Second, rewrite the test.json to a binary file:

# Binary mode ('rb' / 'wb') copies the bytes to the new file unchanged.
with open('path/to/test.json', 'rb') as f:
    data = f.read()

with open('newfile.json', 'wb') as newdata:
    newdata.write(data)

exit()

Then you can load the file:

python manage.py loaddata newfile.json

Above code works for me. Hope it can help you as well.
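
Before running loaddata, it may help to confirm that the file decodes as UTF-8 and parses as JSON, since those are the two failures reported in the question. The following is a minimal sketch with a hypothetical helper name, `fixture_problem`. It assumes Python 3 and loads the whole file into memory.

```python
import json


def fixture_problem(path):
    """Return a short description of what is wrong with the file, or None."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "not valid UTF-8: %s" % exc
    try:
        json.loads(text)
    except ValueError as exc:
        # A leading UTF-8 BOM also ends up here, because the decoded
        # text then starts with U+FEFF, which is not valid JSON.
        return "not valid JSON: %s" % exc
    return None
```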

edited Jul 13, 2019 at 3:50

Aidan Fitzpatrick

answered Aug 27, 2018 at 5:27

Henning Lee

zoro juro · Accepted Answer · 2020-01-14 02:22:29Z

2

I encountered the same problem when loading data. It is a problem with encodings. Install Notepad++ and change the encoding to UTF-8.

In the lower right corner you can see the current encoding. If it is not UTF-8, you can simply change it to UTF-8 from the Encoding menu.

This solution worked for me.

Original post

answered Jan 14, 2020 at 2:22

zoro juro

Caleb Kandoro · Accepted Answer · 2020-01-30 18:45:29Z

1

If you are using a newer version of Windows 10, you can use Notepad to change the encoding from UTF-16 to UTF-8 simply by saving the file again and selecting the encoding option in the save dialog. See the example image below.

answered Jan 30, 2020 at 18:45

Caleb Kandoro

2 Comments

alias51 Over a year ago

Please can you link to the image

John Mc Over a year ago

Wondering why Django's manage.py dumpdata saves it in UTF-16 to begin with. Does anyone know?
