17

I have been using Python 2.7, Django 1.5 and PostgreSQL 9.2 for two weeks and had never used any of them before. Everything is freshly installed on my Windows 7 machine, so it should all have default settings. Django generates the tables in my database without problems, and everything appears to work. I am able to dump data from my database by running:

manage.py dumpdata > test.json

or

manage.py dumpdata --indent 4 > test.json

The resulting JSON file looks as it should.

Then I truncated some tables and tried to reload them from the JSON file with:

python manage.py loaddata --database=T2 test.json    (or without the --database option)

I got the following error:

“UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte”

If I open the test.json file in Notepad, save it as UTF-8 and try again, then I get:

“No JSON object could be decoded”

The file still looks OK, not empty.

By the way, when I open the JSON file with Notepad, it offers to save it as Unicode. My database uses UTF8 encoding. Please advise. Thank you.

2 Comments

  • Do not use Notepad to modify the code. Commented Jul 24, 2013 at 20:08
  • Show the output of print(repr(open('test.json', 'rb').read(4))). Commented Jul 25, 2013 at 16:18

7 Answers

36

What worked for me is following these steps:

- Open the file in regular Notepad
- Select Save As
- Select encoding "UTF-8" (not "UTF-8 (With BOM)")
- Save the file

Now you can use loaddata.

However, this only works for files that are small enough for Notepad to open.
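
For a dump that is too large for Notepad, the same re-encoding can be scripted. This is only a minimal sketch, assuming the file is UTF-16 with a BOM (which is what a first byte of 0xff suggests). The function name reencode_to_utf8 and the output file name are made up for illustration and are not part of Django:

```python
import io
import shutil


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16",
                     chunk_size=1024 * 1024):
    """Stream src_path into dst_path, re-encoding the text as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order and then drops it.
    # newline="" disables newline translation so line endings pass through as-is.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        # Plain "utf-8" (unlike "utf-8-sig") does not write a BOM.
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            # Copy in chunks so the whole file never has to fit in memory.
            shutil.copyfileobj(src, dst, chunk_size)
```

Calling reencode_to_utf8("test.json", "test_utf8.json") and then running python manage.py loaddata test_utf8.json would be the equivalent of the Notepad steps.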


2 Comments

Achieved in Notepad++ by setting UTF-8 via Encoding -> UTF-8, then saving.
Works in VSCode too.
7

0xff in position 0 looks like the start of a little-endian UTF-16 byte order mark (BOM) to me. Notepad's "Unicode" save mode is little-endian UTF-16, so that makes sense if you saved your JSON from Notepad after creating it. Notepad will keep a byte order mark even when saving as UTF-8, which could plausibly cause loaddata to fail to parse the file.
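
One way to check this guess is to look at the first few bytes of the file. Here is a rough sketch using the BOM constants from Python's standard codecs module. The helper name sniff_bom is hypothetical, and a file without any BOM simply yields None:

```python
import codecs

# Check the longer marks first: the UTF-32 LE BOM starts with the same
# two bytes (0xff 0xfe) as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path):
    """Return a codec name that consumes the file's BOM, or None if no BOM is found."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None
```

If sniff_bom('test.json') returns "utf-16", the file was written as UTF-16 and needs converting; "utf-8-sig" means it is UTF-8 but still carries a BOM.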

If you don't still have your unedited JSON handy, you'll need to remove the BOM. Personally I'd use Emacs, but another answer suggested this stand-alone Windows .exe:

http://www.bryntyounce.com/filebomdetector.htm

3 Comments

Peter, thank you for your reply. I cannot use emacs since I have Windows 7. I installed the utility you suggested and ran it. Indeed, it shows that all files but one doctored by Notepad are UTF-16. However, after running the utility I still get the same “UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte”
Step 1: convert to UTF-8. Step 2: Remove the BOM.
"I cannot use emacs since I have Windows7": Yes, you can. gnu.org/software/emacs/download.html
4

After some research, I found the solution. In my case, the datadump.json file was the problem.

  • Open the file in Notepad
  • Click the Save As option
  • In the encoding section at the bottom, select "UTF-8"
  • Save the file

Now you can try running the command again. You are good to go :)

For your reference, I attached screenshots of these steps (Notepad, Save as, UTF-8).


4

On Windows, running the standard dumpdata command with -Xutf8 has always solved this problem for me:

python -Xutf8 manage.py dumpdata app.mymodel > app/fixtures/mymodel.json

Here is an article for reference: https://dev.to/methane/python-use-utf-8-mode-on-windows-212i


2

I found one way to solve this issue: manually writing out a new binary JSON file with the following code. Here rb stands for "read binary" and wb for "write binary".

First, go to shell:

python manage.py shell

Second, rewrite the test.json to a binary file:

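# Read the original fixture as raw bytes
with open('path/to/test.json', 'rb') as f:
    data = f.read()
# Write the same bytes to a new file; "with" closes it automatically
with open('newfile.json', 'wb') as newdata:
    newdata.write(data)
exit()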

Then you can load the file:

python manage.py loaddata newfile.json

The above code worked for me. I hope it helps you as well.
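
Note that copying the bytes unchanged leaves the file's encoding exactly as it was. If the file really is UTF-16, or is UTF-8 with a BOM, a variant that decodes and re-encodes may be needed. Here is a sketch under that assumption. The name rewrite_fixture_as_utf8 is made up, and the whole file is loaded into memory, so it only suits fixtures of moderate size:

```python
import io
import json


def rewrite_fixture_as_utf8(src_path, dst_path):
    """Decode a UTF-16 or UTF-8 (with or without BOM) file and save it as BOM-less UTF-8."""
    with open(src_path, "rb") as f:
        raw = f.read()
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # UTF-16 BOM (little- or big-endian); the codec consumes it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if present and is harmless otherwise.
        text = raw.decode("utf-8-sig")
    # Fail early with ValueError if the content is not valid JSON.
    json.loads(text)
    with io.open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Inside the shell this could be called as rewrite_fixture_as_utf8('path/to/test.json', 'newfile.json') before running loaddata on newfile.json.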


2

I encountered the same problem when loading data. It is an encoding problem. Install Notepad++ and change the file's encoding to UTF-8.

In the lower right corner you can see the current encoding. If it is not UTF-8, you can simply change it to UTF-8 from the Encoding menu.

This solution worked for me.

Original post


1

If you are using a newer version of Windows 10, you can use Notepad to change the encoding from UTF-16 to UTF-8 simply by saving the file again and selecting the encoding option in the save dialog. See the example image below.

2 Comments

Can you please link to the image?
Wondering why Django's manage.py dumpdata saves it in UTF-16 to begin with. Does anyone know?
