I have been trying to use Python to extract compressed BLOBs from Oracle and decompress them, but the bigger BLOBs do not get decompressed completely.

I have tried using DataFrames and files to store the BLOBs and then decompress them, but neither approach works for the bigger BLOBs. Could memory be the issue? What changes can I try?

I cannot share the BLOBs because they are restricted data, and I don't have access to create test data.

I am using the decompression code below from GitHub, which works perfectly for smaller BLOBs:

https://github.com/joeatwork/python-lzw/blob/master/lzw/init.py
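For context, the linked module implements LZW. The sketch below is a minimal textbook LZW over a plain list of integer codes. It is not compatible with the library or with decompress_without_eoi(), because it skips the bit packing and special control codes a real stored format typically adds. It does show one property worth knowing here: decoding a truncated code stream raises no error and simply yields a shorter output.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW: emit one integer code per longest known prefix."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            codes.append(table[w])
            table[wc] = len(table)  # learn the new string
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same table while reading the codes in order."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]  # code refers to the entry being defined right now
        else:
            raise ValueError("invalid LZW code: {}".format(code))
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


data = b"TOBEORNOTTOBEORTOBEORNOT" * 50
codes = lzw_compress(data)
assert lzw_decompress(codes) == data

# Cutting the code stream short raises no error: it just yields less output.
partial = lzw_decompress(codes[: len(codes) // 2])
print(len(data), len(partial), data.startswith(partial))
```

If the bigger BLOBs reach the decompressor already cut short, a silently shortened result is what an LZW decoder would be expected to produce.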

Below is my sample code:

import html2text

# cur is a cx_Oracle cursor opened elsewhere; decompress_without_eoi() and
# striprtf() are helper functions defined elsewhere (not shown here).

sql_string = """select
event_id
,blob_length
,blob_field
from table"""

cur.execute(sql_string)
path = "P:/Folders/"
path1 = "P:/Folders/text/"  # hypothetical: path1 was not defined in the snippet

h = html2text.HTML2Text()
h.ignore_links = True

for row in cur:
    event_id, blob_length, blob = row  # three columns -> indices 0..2, not row[3]
    print('de-BLOBbing {}..\n'.format(event_id))
    name = "clinical_notes_" + str(event_id) + "_" + str(blob_length) + ".txt"
    blobbytes = blob.read()  # read the LOB once and reuse the bytes
    with open(path + name, "wb") as f:
        f.write(blobbytes)
    text = h.handle(striprtf(decompress_without_eoi(blobbytes)))
    with open(path1 + name, "w", encoding="utf-8") as f1:
        f1.write(text)
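One way to find out whether the larger BLOBs are being read completely is to compare the number of bytes actually read with the blob_length column selected in the query. This sketch rests on an assumption the question does not confirm: that blob_length holds the compressed size in bytes. check_blob is a hypothetical helper, and the calls at the bottom use stand-in bytes instead of a real cursor row.

```python
def check_blob(event_id, blob_length, blobbytes):
    """Report whether the bytes read match the blob_length column.

    Assumption (not confirmed by the question): blob_length holds the
    compressed size of the BLOB in bytes.
    """
    actual = len(blobbytes)
    if actual != blob_length:
        print("event {}: read {} bytes, blob_length says {}".format(
            event_id, actual, blob_length))
        return False
    return True


# Stand-in data instead of a real cursor row
print(check_blob(101, 4, b"\x1f\x9d\x90\x41"))  # complete read
print(check_blob(102, 70000, b"\x00" * 50000))  # short read
```

In the loop this would be called as check_blob(row[0], row[1], blobbytes) right after the LOB is read. If blob_length turns out to mean something else, cx_Oracle LOB objects also have a size() method that returns the length stored in the database, which can be compared with len(blobbytes) instead.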

Also, when I put them in a DataFrame, this is what it shows me regarding structure and memory usage (is_blob is the BLOB field): [screenshot: DataFrame structure and memory usage]
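What a DataFrame viewer displays is not a reliable measure of whether the data is complete, since viewers commonly truncate long values. A more direct check, sketched below with hypothetical stand-in data in place of the real DataFrame, is to compute len() of every blob and to ask pandas for the deep memory usage. The last line checks whether the interpreter is 64-bit: CPython has no separate memory limit to raise by default, but a 32-bit interpreter can only address a few GB regardless of installed RAM.

```python
import sys

import pandas as pd

# Hypothetical stand-in for the real data: one small and one large blob.
df = pd.DataFrame({
    "event_id": [1, 2],
    "is_blob": [b"x" * 10, b"y" * 5_000_000],
})

# Length of every blob in bytes; this is what matters, not what a
# viewer happens to display.
df["blob_bytes"] = df["is_blob"].map(len)
print(df[["event_id", "blob_bytes"]])

# deep=True counts the bytes objects themselves, not just the pointers to them
print(df.memory_usage(deep=True))

# A 32-bit interpreter has a much smaller address space than 8 GB of RAM
print("64-bit Python:", sys.maxsize > 2**32)
```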

  • Can you make the sample code reproducible? When I run it, it fails at the line cur.execute(sql_string), because cur is not defined. Even if that ran, I would still be missing your input file. Can you upload a demo file (it can contain random data for privacy reasons) somewhere so we can test? Commented Aug 20, 2018 at 0:24
  • Thanks for trying, @phihag. I can't create any sample data because it always involves patient details, and I don't have access to the compression algorithm. cur is the cx_Oracle cursor on the Oracle connection. Another hint I can give: when I saved the BLOBs in the DataFrame and then looked at its contents in the variable explorer, it didn't show me the complete BLOB. On the other hand, if I compress some test data using the compression code from GitHub, it compresses and decompresses data of any length without problems. Sorry that I am unable to reproduce it. Commented Aug 20, 2018 at 1:57
  • So decompress_without_eoi() expects the whole string of the data to be in memory at once? Then it doesn't make sense to write the data to a file only to read it back again (see our discussion at stackoverflow.com/questions/51868112/…). Alternatives would be to use a streaming interface for decompression (if one exists), or some external tool, or get more memory :) Commented Aug 20, 2018 at 3:44
  • Yes, this is exactly what is happening. The function decompress_without_eoi() expects the whole data to be in memory at once, which in this case is not happening. Sorry, I am not good with memory and Python internals, so I was just trying to explore more :) This issue has been driving me crazy and I still haven't resolved it. When you say "get more memory", can you explain a bit further? Do you mean my computer's RAM? I currently have 8 GB. Is there a way to check how much memory is allocated to Python and increase it if it's too little? Commented Aug 20, 2018 at 5:00
  • Try to identify the cause of the problem. Are you getting errors? An out-of-memory error should be obvious, and since you haven't mentioned one, I'm not sure you have a memory problem. Check that blobbytes contains the whole data to be decompressed. Check each step of the extraction process individually and make sure each one works, e.g. don't call h.handle(striprtf(decompress_without_eoi(...))) until you know that each stage works. Commented Aug 20, 2018 at 23:49
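The stage-by-stage check suggested above can be sketched as a small helper that runs the pipeline one step at a time and prints the size after each step, so a silently truncated result shows up at the stage where it shrinks. run_stages is a hypothetical helper, and zlib stands in for the real decompressor so the example runs on its own.

```python
import zlib


def run_stages(data, stages):
    """Apply each (name, func) stage in order, printing the size after each.

    A stage that raises is reported by name and the exception is re-raised,
    so the failing step is obvious; a stage that silently shrinks the data
    shows up as an unexpectedly small size.
    """
    sizes = [("input", len(data))]
    print("input: {}".format(len(data)))
    for name, func in stages:
        try:
            data = func(data)
        except Exception as exc:
            print("{} failed: {!r}".format(name, exc))
            raise
        sizes.append((name, len(data)))
        print("{}: {}".format(name, len(data)))
    return data, sizes


# zlib stands in for decompress_without_eoi() so this runs on its own.
raw = ("hello " * 1000).encode("utf-8")
text, sizes = run_stages(
    zlib.compress(raw),
    [("decompress", zlib.decompress), ("decode", lambda b: b.decode("utf-8"))],
)
```

With the real data the stage list would be something like [("decompress", decompress_without_eoi), ("striprtf", striprtf), ("html2text", h.handle)], assuming each helper takes the previous stage's output as its only argument, as the nested call in the question suggests.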
