10

I am trying to dynamically parse a given .proto file in Java to decode a Protobuf-encoded binary.

I have the following parsing method, in which the "proto" string contains the content of the .proto file:

public static Descriptors.FileDescriptor parseProto (String proto) throws InvalidProtocolBufferException, Descriptors.DescriptorValidationException {
        DescriptorProtos.FileDescriptorProto descriptorProto = DescriptorProtos.FileDescriptorProto.parseFrom(proto.getBytes());
        return Descriptors.FileDescriptor.buildFrom(descriptorProto, null);
}

Though, on execution the previous method throws an exception with the message "Protocol message tag had invalid wire type.". I use the example .proto file from Google so I guess it is valid: https://github.com/google/protobuf/blob/master/examples/addressbook.proto

Here is the stack trace:

15:43:24.707 [pool-1-thread-1] ERROR com.github.whiver.nifi.processor.ProtobufDecoderProcessor - ProtobufDecoderProcessor[id=42c8ab94-2d8a-491b-bd99-b4451d127ae0] Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:115)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:551)
    at com.google.protobuf.GeneratedMessageV3.parseUnknownField(GeneratedMessageV3.java:293)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:88)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.<init>(DescriptorProtos.java:53)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:773)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet$1.parsePartialFrom(DescriptorProtos.java:768)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:163)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:197)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
    at com.google.protobuf.DescriptorProtos$FileDescriptorSet.parseFrom(DescriptorProtos.java:260)
    at com.github.whiver.nifi.parser.SchemaParser.parseProto(SchemaParser.java:9)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.lambda$onTrigger$0(ProtobufDecoderProcessor.java:103)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:895)
    at org.apache.nifi.util.MockProcessSession.write(MockProcessSession.java:62)
    at com.github.whiver.nifi.processor.ProtobufDecoderProcessor.onTrigger(ProtobufDecoderProcessor.java:100)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:251)
    at org.apache.nifi.util.StandardProcessorTestRunner$RunProcessor.call(StandardProcessorTestRunner.java:245)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Any idea? Thank you!

0

4 Answers 4

12

It looks like you're trying to use FileDescriptorSet.parseFrom to populate a FileDescriptorSet. This will only work if the bytes you're providing are the binary protobuf contents - which is to say: a compiled schema. You can get a compiled schema by using the protoc command-line-tool with the --descriptor_set_out option. What you're actually passing it right now is the text bytes that make up the text schema, which is not what parseFrom expects.

Without a compiled schema, you would need a runtime .proto parser. I'm not aware of one for Java; protobuf-net includes one (protobuf-net.Reflection), but that is C#/.NET. Without an available runtime .proto parser, you'd need to shell-execute protoc instead.

Sign up to request clarification or add additional context in comments.

10 Comments

I see, this makes sense. I will try to find a way to compile my proto file. Thank you for your answer!
@WSH it should just be protoc --descriptor_set_out
Okay, thanks for the precision. However if someone has ever heard about a Java proto compiler, I'm still interested :)
@WSH There probably isn't a Java implementation. The protocol compiler is more complicated than you might expect, and maintaining multiple implementations of the compiler in many languages wouldn't make a lot of sense. What you should try to do is arrange to have your .protos parsed using --descriptor_set_out offline, then send around the compiled descriptor set as needed, rather than try to parse the whole .proto file on-demand.
Hey there @kenton, great to see you still chiming in on protobuf questions. And yes, I found some of those complications when I finally got around to writing a 100% c# implementation and running it through every .proto I could find including most of the public Google API surface :)
|
2

Drawing from the other answers, here's a snippet of working Kotlin code from a library I'm developing. https://github.com/asarkar/okgrpc

private fun lookupProtos(
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path,
    resolved: MutableSet<String>
): List<DescriptorProtos.FileDescriptorProto> {
    val schema = generateSchema(protoPaths, protoFile, tempDir)
    return schema.fileList
        .filter { resolved.add(it.name) }
        .flatMap { fd ->
            fd.dependencyList
                .filterNot(resolved::contains)
                .flatMap { lookupProtos(protoPaths, it, tempDir, resolved) } + fd
        }
}

private fun generateSchema(
    protoPaths: List<String>,
    protoFile: String,
    tempDir: Path
): DescriptorProtos.FileDescriptorSet {
    val outFile = Files.createTempFile(tempDir, null, null)
    val stderr = ByteArrayOutputStream()
    val exitCode = Protoc.runProtoc(
        (protoPaths.map { "--proto_path=$it" } + listOf("--descriptor_set_out=$outFile", protoFile)).toTypedArray(),
        DevNull,
        stderr
    )
    if (exitCode != 0) {
        throw IllegalStateException("Failed to generate schema for: $protoFile")
    }
    return Files.newInputStream(outFile).use { DescriptorProtos.FileDescriptorSet.parseFrom(it) }
}

The idea is to use os72/protoc-jar to write out a compiled schema/file descriptor. Then use FileDescriptorSet.parseFrom to read that file, and recurse on its dependencies.

Comments

1

An alternaive to "shelling out" to exec protoc would be to use a .proto parser written in Java. There seem to be a few floating around - Google something like "proto parser in java". (I'm looking for one for an issue in my project).

1 Comment

Example script using the "wire-schema" library: gist.github.com/jmini/a0241af9a2f13a51532d1f1448ee38e6
-1

Don't use java String to hold the protobuf payload. The issue is that String does translations behind the scenes, and makes assumptions about character sets.

Protobuf works on byte arrays, and the exact representation in the array has to be unchanged. Going to and from String does not work.

2 Comments

That depends on whether they're loading the data, vs loading a schema. A schema (in .proto format) is text.
As Bob said, I am trying to parse a text file so I guess String should not be a problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.