I have a string problem when writing my Lua dissector. My packet looks like:

0000   00 00 00 69 00 10 00 01 00 00 00 ed 00 00 00 0c
0010   bf a6 5f ...

When debugging, the tvb looks the same

The byte at offset 0x10 is 0xbf, but in my dissector function I get a different result. Here's my code:

local str = buf(0x10):string()
local x = string.byte(str, 1)

The variable x should be 0xbf, but it's 0xef, and some other offsets also give 0xef:

local str = buf(0x11):string()
local x = string.byte(str, 1) -- also get 0xef, should be 0xa6

local str = buf(11):string()
local x = string.byte(str, 1) -- also get 0xef, should be 0xed

It seems large values, such as 0xa6, 0xbf and 0xed, always come back as 0xef.

Small values, such as 0x69, 0x5f and 0x0c, come back correctly.

I'm using the latest Wireshark 2.0. Is this a bug?

  • What is :string()? Commented Feb 5, 2016 at 10:29
  • Sorry, I didn't explain well; post updated. :string() is a Wireshark built-in function that converts a tvb to a string. Commented Feb 6, 2016 at 3:23
  • Try checking the values: buf(0x10), buf(0x10):string() too. Commented Feb 6, 2016 at 3:27

2 Answers

7

I don't know much about Wireshark in particular, but I have a pretty good idea what's going on.

You are using Wireshark's tvbrange:string([encoding]) function. The documentation I have found on the Wireshark website says that the default encoding is ENC_ASCII. Bytes in the range of 0x80-0xFF (for which you have reported problems) are not valid ASCII.

What Wireshark is probably doing is converting these to U+FFFD, Unicode's "Replacement Character". This is a standard practice for representing an unknown character in a Unicode string.

Then, Wireshark is probably encoding this string as UTF-8 when returning to Lua. The first byte of U+FFFD's UTF-8 encoding is 0xEF, so that's what you see.
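
The conversion described above can be modelled in plain Lua. This is only a sketch of the suspected behaviour, not Wireshark's actual code; the function name ascii_to_utf8 is made up for illustration. It replaces every byte at or above 0x80 with the UTF-8 encoding of U+FFFD, which is the three bytes EF BF BD:

```lua
-- UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER): EF BF BD.
local REPLACEMENT = string.char(0xEF, 0xBF, 0xBD)

-- Hypothetical model of a strict ASCII decode followed by a UTF-8 encode:
-- every byte outside the ASCII range (0x80-0xFF) becomes U+FFFD.
local function ascii_to_utf8(raw)
    -- Parentheses drop gsub's second return value (the match count).
    return (raw:gsub("[\128-\255]", REPLACEMENT))
end

-- The three bytes at offset 0x10 of the packet in the question.
local packet = string.char(0xbf, 0xa6, 0x5f)
local str = ascii_to_utf8(packet)

print(string.format("0x%02x", string.byte(str, 1)))  -- 0xef
print(#str)                                          -- 7
```

Under this model, 0xbf and 0xa6 each expand to three bytes starting with 0xef, while 0x5f passes through unchanged, which matches the values reported in the question.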

If you want to get the raw byte values from a TVB, maybe try the tvbrange:bytes([encoding]) function. For example:

local bytes = buf(0x10):bytes()
local x = bytes:get_index(0) -- maybe 1, I'm not sure if it would be 0 or 1 indexed

There also may be some encoding you can pass to tvbrange:string that would do what you want, but I couldn't find any good reference for this.


7

Assuming that buf refers to the parameter passed to your dissection routine, it is of type Tvb. When you call it (as in, buf(0x10)), you create a TvbRange instance. Both of them are documented here: https://www.wireshark.org/docs/wsdg_html_chunked/lua_module_Tvb.html

tehtmi is spot on about the reason why you obtain the wrong results: tvbrange:string() returns a string using the ASCII encoding (since the encoding parameter was omitted).

A way to obtain the raw bytes buffer (rather than converting it to an ASCII or UTF-8 string) is:

local x = buf:raw(0x10, 1)

(Using offset 16 and length 1.)

If you ever think about using buf(0x10):raw() directly, note that for some reason this would return the full data source that is backing this Tvb. Maybe a bug or feature... Workaround:

local bytes = buf(0x10)
local x = bytes:raw(bytes:offset(), bytes:len())
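
Since raw() returns a Lua string, string.byte is still needed to get a number out of it. A minimal sketch, assuming the raw(offset, length) form shown above: byte_at is a hypothetical helper, and fake_tvb is only a stand-in so the snippet runs outside Wireshark (it is not part of the Wireshark API).

```lua
-- Hypothetical helper: numeric value of the byte at `offset`.
-- Works with any object exposing raw(offset, length), such as a Wireshark Tvb.
local function byte_at(tvb, offset)
    return string.byte(tvb:raw(offset, 1))
end

-- Stand-in for a Tvb so the sketch runs outside Wireshark.
-- Offsets are 0-based, like Tvb offsets; Lua's string.sub is 1-based.
local fake_tvb = { data = string.char(0x00, 0x0c, 0xbf, 0xa6, 0x5f) }
function fake_tvb:raw(offset, length)
    return self.data:sub(offset + 1, offset + length)
end

print(string.format("0x%02x", byte_at(fake_tvb, 2)))  -- 0xbf
```

Inside a real dissector, byte_at(buf, 0x10) should then give 0xbf for the packet in the question. For a single byte, buf(0x10, 1):uint() is another option that skips the string step entirely.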

2 Comments

Thanks, the raw function should make things a lot easier! I was looking at the documentation on the wiki, wiki.wireshark.org/LuaAPI/Tvb, which doesn't seem to mention raw.
@legoscia Wireshark's Lua API Reference Manual will always be more accurate than the wiki because it is generated directly from the documentation in the Wireshark C code. See also the note at the top of wiki.wireshark.org/LuaAPI
