1

I'm trying to build a data archive, but my data.gdf does not come out the way it should. data.gdf is supposed to be the concatenation of all files, stored one after another without any delimiter. The starting offsets and lengths stored in data.gdh are therefore critical: if any record is invalid, the file it describes can no longer be extracted, and all following files will most likely not be extractable either. Right now I'm trying to create an archive of PNG files, but it doesn't quite seem to work.

import os

#--------Encryption/Decryption of data---------#
hidden
#--------Encryption/Decryption of data---------#
#                                              #
#--------------------Main----------------------#

with open('Output//data.gdf', 'w') as gdf: # clean data.gdf
    gdf.write('')

files = []
for (path, dirnames, filenames) in os.walk('Data'):
    files.extend(os.path.join(path, name) for name in filenames)

file_data = 'YwuiHg'
for i in files:
    with open(i, 'r') as data:
        with open('Output//data.gdf', 'r') as gdf:
            dataOffset = len(gdf.read())
        with open('Output//data.gdf', 'w') as gdf:
            gdf.write(data.read())
        dataLength = len(data.read())
        file_data += i + str(dataOffset) + 'FR' + str(dataLength) + 'FT' + 'eihwfw'
print file_data
with open('Output//data.gdh', 'w') as gdh:
    gdh.write(encrypt(key, file_data))

When printing the file_data:

YwuiHgData\images\background.png0FR1749FTeihwfwData\images\background1.png5FR354FTeihwfwData\images\gameover.png5FR0FTeihwfwData\images\ground.png5FR1571FTeihwfwData\images\icon.png5FR599FTeihwfwData\images\loadbackground.png5FR314FTeihwfwData\images\medal1.png5FR0FTeihwfwData\images\medal2.png5FR0FTeihwfwData\images\medal3.png5FR0FTeihwfwData\images\medal4.png5FR0FTeihwfwData\images\player1.png5FR0FTeihwfwData\images\player2.png5FR0FTeihwfwData\images\player3.png5FR0FTeihwfwData\images\playerdead.png5FR0FTeihwfwData\images\scorereward.png5FR0FTeihwfwData\images\start.png5FR239FTeihwfw

The offsets and data length also seem to be messed up. How would I fix all this? Thanks a lot!

EDIT: This issue has been fixed by @XavierCombelle, but I have a new problem when I want to load an image, for example the first one in the list, background.png. When I pass its full path, Data\images\background.png, the path is not found, but it is found when I simply pass background.png. Does this have to do with \ being an escape character or something like that? FIXED by myself:

try:
    os.remove('Output//data.gdf')
except OSError:
    pass  # file does not exist, no need to delete

files = []
for (path, dirnames, filenames) in os.walk('Data'):
    files.extend(os.path.join(path, name) for name in filenames)

file_data = 'YwuiHg'
#ab mode writes to the end of the file so need to have clean file when beginning to make new data.gdf otherwise the whole file would be messed up.
print 'Opening data.gdf for writing...\n'
with open('Output//data.gdf', 'ab') as gdf:            
    for i in files:
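        # Note: str.replace returns a new string; the result is discarded here, so this line has no effect.
        i.replace("\ ","\\")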
        with open(i, 'rb') as data_file:
            data = data_file.read()
            dataOffset = str(gdf.tell())
            dataLength = str(len(data))
            print 'Writing to data.gdf ' + i + ' at offset ' + dataOffset + '. Data Length ' + dataLength
            gdf.write(data)
            print 'Storing identity of data into file_data -> ' + i + dataOffset + 'FR' + dataLength + 'FT' + 'eihwfw\n'
            file_data += i + dataOffset + 'FR' + dataLength + 'FT' + 'eihwfw'

print 'Encrypting file_data variable and writing it to data.gdh'
with open('Output//data.gdh', 'w') as gdh:
    gdh.write(encrypt(key, file_data))

exiting = raw_input('Press any key to continue...')
  • If the files contain binary data, you need to read and write them as such (by adding a 'b' to the 'r' or 'w' argument of `open`). Commented Aug 3, 2014 at 19:22
  • Why not use a standard container format like e.g. shelve or an sqlite3 database to store the files? Commented Aug 3, 2014 at 19:56
  • encryption @Roland Smith Commented Aug 3, 2014 at 19:59
  • @JudiSean There is nothing stopping you from putting encrypted data in a shelve or db. But if the key is stored inside your program encryption is pretty much useless. Commented Aug 3, 2014 at 20:04
  • Why is it useless, where would I store the key then? @Roland Smith Commented Aug 3, 2014 at 20:16

2 Answers

1

open('Output//data.gdf', 'w') truncates the output file, discarding the data you have already written to it. So the offset you compute is only ever the length of the previous entry written (the rest of the file's contents are lost each time).
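
To make the truncation visible, here is a minimal sketch that writes three chunks, first reopening with 'wb' each time and then with 'ab'. The scratch path and the chunk contents are made up for the demonstration; they are not from the question.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.gdf')
chunks = (b'first', b'second!', b'xy')

for chunk in chunks:
    with open(path, 'wb') as gdf:   # 'w' empties the file on every open
        gdf.write(chunk)
truncated_size = os.path.getsize(path)   # only the last chunk survived

os.remove(path)
for chunk in chunks:
    with open(path, 'ab') as gdf:   # 'a' keeps what is already there
        gdf.write(chunk)
appended_size = os.path.getsize(path)    # all three chunks are present

print(truncated_size)
print(appended_size)
```

With 'wb' the file only ever holds the most recent chunk, which is why the recorded offsets never grow.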

You can simplify the two open() calls on the output file to a single one, and switch to appending:

with open('Output//data.gdf', 'ab') as gdf:  # 'b' because PNG data is binary
  dataOffset = gdf.tell()
  gdf.write(data.read())

The data length is wrong because the first data.read() reads all the way to the end of the data file, so the second call in len(data.read()) starts at the end of the file and returns an empty string (so the length is 0). Try instead:

dataLength = data.tell()
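
A small sketch of that behaviour, using an in-memory io.BytesIO object as a stand-in for a file opened in binary mode (the byte string is invented for the example):

```python
import io

# Stand-in for open(some_png, 'rb'); the content is made up.
data = io.BytesIO(b'\x89PNG fake image bytes')

first = data.read()             # reads everything; the cursor is now at the end
second = data.read()            # nothing left, so this is empty
length_from_tell = data.tell()  # cursor position == number of bytes read

data.seek(0)                    # rewinding makes the content readable again
third = data.read()

print(len(first))
print(len(second))
print(length_from_tell)
```

Reading once into a variable and taking len() of that variable avoids the problem entirely, and does not rely on tell() at all.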

0

First of all, you have to use binary mode ("rb" or "wb") for any file that is not text.

Each time you open a file with "r" or "w" (or "rb" or "wb"), the position is reset to the start, and "w" additionally empties the file.

To get the dataOffset within a file, you can call its tell() method.

When you call data.read(), it consumes the whole file, so a subsequent read returns an empty string of length 0.

So the core loop could be replaced by:

#wb mode resets data.gdf, so there is no need to clear it with a separate write beforehand
with open('Output//data.gdf', 'wb') as gdf:            
    for i in files:
        with open(i, 'rb') as data_file:
            data = data_file.read()
            dataOffset = gdf.tell()
            gdf.write(data)
            dataLength = len(data)
            file_data += i + str(dataOffset) + 'FR' + str(dataLength) + 'FT' + 'eihwfw'
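
To check that the recorded offsets and lengths are usable, the same loop can be paired with an extraction step. This is only a sketch: build_archive and extract are hypothetical helper names, the records are kept as plain tuples instead of the file_data string, and the two sample files are invented.

```python
import os
import tempfile

def build_archive(paths, gdf_path):
    """Concatenate the files into gdf_path; return (path, offset, length) records."""
    records = []
    with open(gdf_path, 'wb') as gdf:
        for path in paths:
            with open(path, 'rb') as src:
                data = src.read()
            records.append((path, gdf.tell(), len(data)))
            gdf.write(data)
    return records

def extract(gdf_path, offset, length):
    """Return the bytes of one stored file, given its record."""
    with open(gdf_path, 'rb') as gdf:
        gdf.seek(offset)
        return gdf.read(length)

# Demo with two throwaway files (names and contents are made up).
workdir = tempfile.mkdtemp()
samples = [('a.png', b'\x89PNG\r\n\x1a\nAAAA'), ('b.png', b'BB')]
paths = []
for name, payload in samples:
    p = os.path.join(workdir, name)
    with open(p, 'wb') as f:
        f.write(payload)
    paths.append(p)

gdf_path = os.path.join(workdir, 'data.gdf')
records = build_archive(paths, gdf_path)
restored = [extract(gdf_path, off, n) for _, off, n in records]
print(restored == [payload for _, payload in samples])
```

The first sample deliberately contains the bytes \r\n and \x1a, which are the kind of bytes that text mode can mangle on Windows; in binary mode they round-trip unchanged.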

The gdh index format looks very fragile: for example, if you have a file named 127FR49FTeihwfw.png, the name itself contains the FR, FT and eihwfw markers, so the index can no longer be parsed unambiguously.
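
One possible way around this, offered as a suggestion rather than something the question uses, is to serialise the index with the standard json module, which quotes file names so that no marker strings are needed. The records below are illustrative; the lengths are taken from the printed output in the question, the offsets are assumed.

```python
import json

# Illustrative records; the first name is the awkward one mentioned above.
records = [
    {'name': 'Data\\images\\127FR49FTeihwfw.png', 'offset': 0, 'length': 1749},
    {'name': 'Data\\images\\icon.png', 'offset': 1749, 'length': 599},
]

index_text = json.dumps(records)   # this string is what would be encrypted
loaded = json.loads(index_text)    # after decryption, parse it back

print(loaded[0]['name'])
print(loaded[1]['offset'])
```

The resulting index_text could then be passed to the existing encrypt(key, ...) call in place of file_data.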

1 Comment

Thanks, I just made a small edit to your code to make it perfect, so please accept the edit @Xavier Combelle
