The ZIP file format uses DEFLATE for its compression algorithm. So you need to consider how that algorithm works and pick data such that the algorithm finds it easy to compress. According to the wikipedia article, there are two stages of compression. The first uses LZ77 to find repeated sections of data and replace them with short references. The second uses Huffman coding to take the remaining data and strip out redundancy across the whole block. This is called entropic coding - if the information isn't very random (has low entropy) the code replaces common things with short symbols, increasing the entropy.
In general, then, lists with lots of repeated runs (i.e., [111,2,44,93,111,2,44,93...]) will compress well in the first pass. Lists with lots of repeated numbers within other random stuff (i.e., [111,34,43,50,111,34,111,111,2,34,22,60,111,98,2], where 34 and 111 show up often) will compress well in the second pass.
To find suitable numbers, I think the easiest thing to do is just sort each list, then merge them, keeping the merge sorted, until you get to 42000 output numbers. You'll get runs as they happen. This won't be optimal, you might have the number 255 in each input list and you'd miss them using this technique, but it would be easy.
Another approach would be to histogram the numbers into 256 bins. Any bins that stand out indicate numbers that should be grouped. After that, I guess you have to search for sequences. Again, sorting the inputs will probably make this easier.
I just noticed you had the constraint that you have to pick one number from each list. So in both cases you could sort each list then remove duplicates.
Additionally, Huffman codes can be generated using a tree, so I wonder if there's some magic tree structure you could put the numbers into that would automatically give the right answer.
homeworktag if so.