
I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.

Naive method

# generated with protoc
import my_proto

pb = my_proto.Table()
# build the message one row at a time
for _, row in big_table.iterrows():
    e = pb.rows.add()
    e.similarity = row["similarity"]
    e.id = row["id"]

The throughput is about 100 rows per second; at that rate, 1.5 million rows take roughly 15,000 seconds, i.e. several hours in total.

Is there a way to do this in a non-incremental fashion?
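For context, much of the per-row cost here is on the pandas side: iterrows() constructs a full Series object for every row. Below is a minimal sketch of the same loop using itertuples(), which yields lightweight named tuples and is usually several times faster (my_proto and big_table as above):

pb = my_proto.Table()
# itertuples() yields namedtuples instead of Series objects,
# so attribute access replaces the per-row dict-style lookups
for row in big_table.itertuples(index=False):
    e = pb.rows.add()
    e.similarity = row.similarity
    e.id = row.id

# serialize the whole table once at the end
data = pb.SerializeToString()

This keeps the incremental structure but removes most of the pandas overhead; the add() calls themselves are comparatively cheap.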

  • What's the context of your question? I can't tell if your question is pandas-centric or protoc-centric. Are you looking for a single pandas operation to transform your table? Commented Dec 3, 2020 at 3:14
  • @will.cass.wrig I can convert the dataframe to something else like a dict or list; that part doesn't matter as much. What matters is doing batch operations when writing protobuf data (see the sketch after this thread). Commented Dec 3, 2020 at 3:17
  • Sorry, I'm not versed in protocol buffers, but it seems like they can be implemented asynchronously (link). I would try tagging your post with grpc; there's a larger protobuf community under that tag vs protoc. Commented Dec 3, 2020 at 3:35
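Building on the batch idea from the comments: the Python protobuf runtime has no bulk columnar writer, but a repeated message field can be filled in a single extend() call from an iterable of constructed messages. A sketch, assuming the row message type is exposed as my_proto.Row (the actual name depends on the .proto definition):

# pull each column out once; .tolist() converts numpy scalars
# to native Python float/int, which the protobuf setters expect
sims = big_table["similarity"].tolist()
ids = big_table["id"].tolist()

pb = my_proto.Table()
# extend() accepts any iterable of fully constructed messages
pb.rows.extend(
    my_proto.Row(similarity=s, id=i)
    for s, i in zip(sims, ids)
)

This still creates one message object per row, so it is not truly non-incremental, but it takes pandas out of the inner loop entirely.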
