I need to get big Boolean arrays or BitSets from Java into Python via a text file. Ideally I want to go via a Base64 representation to stay compact, but still be able to embed the value in a CSV file. (So the boolean array will be one column in a CSV file.)

However, I am having trouble getting the byte alignment right. Where and how should I specify the correct byte order?

Here is one example. It runs, but the bits do not end up where I expect them.

Java:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Base64;
import java.util.Base64.Encoder;
import java.util.BitSet;

public class basictest {

    public static void main(String[] args) throws Exception {
        Encoder b64 = Base64.getEncoder();
        String name = "name";
        BitSet b = new BitSet();
        b.set(444);
        b.set(777);
        b.set(555);
        byte[] bBytes = b.toByteArray();
        String fp_str = b64.encodeToString(bBytes);
        // try-with-resources closes the writer even if write() throws
        try (BufferedWriter w = new BufferedWriter(new FileWriter("out.tsv"))) {
            w.write(name + "\t" + fp_str + "\n");
        }
    }

}

Python:

import base64
from bitstring import BitArray

filename = "out.tsv"
with open(filename) as file:
    # strip the trailing newline before splitting the tab-separated line
    data = file.readline().strip().split('\t')
b_b64 = data[1]
b_bytes = base64.b64decode(b_b64)
b_bits = BitArray(bytes=b_bytes)

b_bits[444] # False
b_bits[555] # False
b_bits[777] # False
# but
b_bits[556] # True
# it's not shifted:
b_bits[445] # False

3 Answers


I am now reversing the bits in every byte using https://stackoverflow.com/a/5333563/1259675:

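numbits = 8
# mirror each byte: bit i moves to bit (numbits-1-i)
r_bytes = [
    sum(1<<(numbits-1-i) for i in range(numbits) if b>>i&1)
    for b in b_bytes]
# pass real bytes; a plain list would be read as one boolean per element
b_bits = BitArray(bytes=bytes(r_bytes))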

This works, but is there a method that doesn't involve fiddling with the bits myself?
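
For context, here is a minimal sketch of why the per-byte reversal lines things up. It assumes the layout documented for Java's BitSet.toByteArray(), where bit n lives in byte n / 8 under the mask 1 << (n % 8), and it assumes the reader indexes each byte starting from its most significant bit, which is what the results in the question suggest. The helper names (java_bitset_bytes, reverse_bits_per_byte, msb_first_bit) are hypothetical and use only the standard library; the first one merely stands in for the Java output.

```python
def java_bitset_bytes(indices):
    """Stand-in for Java's BitSet.toByteArray(): bit n goes to byte n // 8, mask 1 << (n % 8)."""
    out = bytearray(max(indices) // 8 + 1)
    for n in indices:
        out[n // 8] |= 1 << (n % 8)
    return bytes(out)


# 256-entry lookup table: each byte value mapped to its bit-reversed value.
REVERSE_BITS = bytes(int(f"{v:08b}"[::-1], 2) for v in range(256))


def reverse_bits_per_byte(data):
    """Reverse the bit order inside every byte; the order of the bytes is unchanged."""
    return data.translate(REVERSE_BITS)


def msb_first_bit(data, i):
    """Read bit i the way an MSB-first bit array would index it."""
    return bool(data[i // 8] & (0x80 >> (i % 8)))


raw = java_bitset_bytes([444, 555, 777])
fixed = reverse_bits_per_byte(raw)

print(msb_first_bit(raw, 555), msb_first_bit(raw, 556))      # mirrored within the byte
print([msb_first_bit(fixed, i) for i in (444, 555, 777)])    # after the reversal
```

Under these assumptions, Java index n shows up at MSB-first index 8 * (n // 8) + 7 - (n % 8), which is why 555 appears at 556. The lookup table does the same job as the comprehension above, just once per byte value instead of once per byte.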


If:

  • the highest bit you need to set is "sufficiently small",
  • and the data you want to encode doesn't vary in size too much,

then one approach is to:

  • set a sentinel bit at the lowest and at the highest position in Java,
  • and ignore both of them in Python.

Then it could (should!) work without byte reversal or further transformation:

// assuming a 1024 bit word
public static final int LEFT_SIGN = 0;
public static final int RIGHT_SIGN = 1025; // choose a size that fits your needs, theoretically anywhere in [0 .. Integer.MAX_VALUE - 1]

public static void main(String[] args) throws Exception {
    ...
    b.set(LEFT_SIGN);
    b.set(444 + 1);
    b.set(777 + 1);
    b.set(555 + 1);
    b.set(RIGHT_SIGN);
    ...

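and then in Python:

# Decode as before. The values below assume bits are indexed LSB-first within each
# byte (the layout Java's BitSet.toByteArray() writes); bitstring's BitArray indexes MSB-first.
b_bits[0] # ignore
b_bits[445] # True
b_bits[556] # True
b_bits[778] # True
b_bits[1025] # ignore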

The convention (i.e. the encoding) you rely on would then be the fixed (maximum) "word length", with all its benefits and drawbacks.


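You can use the Python bitarray package for this use case. With endian='little', index n is read from byte n // 8 at bit position n % 8 (least significant bit first), which matches the layout that Java's BitSet.toByteArray() writes.

from bitarray import bitarray
import base64

filename = "out.tsv"
with open(filename) as file:
    data = file.readline().strip().split('\t')

b_b64 = data[1]
b_bytes = base64.b64decode(b_b64)
# little-endian bit order: the first bit of each byte is its least significant bit
bs = bitarray(endian='little')
bs.frombytes(b_bytes)
print(bs)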
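
If installing an extra package is not an option, the same little-endian reading can be sketched with the standard library alone: int.from_bytes with byteorder="little" turns the decoded bytes into one big integer whose bit n corresponds to Java's bit n. This is only an illustrative sketch; decode_set_bits is a hypothetical helper name, and the column value is built in Python as a stand-in for what the Java program would write.

```python
import base64


def decode_set_bits(b64_text):
    """Decode a Base64 column and return the indices of the set bits, lowest first."""
    raw = base64.b64decode(b64_text.strip())
    # Little-endian: byte k contributes bits 8*k .. 8*k + 7, least significant bit first.
    value = int.from_bytes(raw, byteorder="little")
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Stand-in for the Java side: bits 444, 555 and 777 in BitSet.toByteArray() layout
# (98 bytes, because the highest set bit is 777).
value = (1 << 444) | (1 << 555) | (1 << 777)
column = base64.b64encode(value.to_bytes(98, "little")).decode("ascii")

print(decode_set_bits(column))  # [444, 555, 777]
```

A single bit can be tested the same way with (value >> n) & 1. For very large arrays the list comprehension gets slow, so a dedicated bit-array type is probably still the better fit there.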
