
I think my solution is more complex than it should be, but I'm not familiar enough with Rust yet to know a better way. Below is the only solution I got working after much trial and error.

Rust Playground


Context

  1. Read an XML file that describes the record structure of a binary file. This step is skipped in the working code below and is simply simulated. See comments in the code blocks.
  2. One of the fields each record provides is a data_type description which is stored as a &str or String depending on how the Rust struct holding the records is defined.
  3. Read the records from the binary file using the XML description information to automate which type the resulting parsed &[u8] is cast into so that it can be used later. This data is also simulated in the code below since parsing the binary file is not the focus.
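As a rough illustration of step 3, here is a minimal sketch of decoding one field from a byte buffer using its description. The `RecordDesc` struct, the `Value` enum, and the `read_record` function are hypothetical names made up for this sketch; only the field names and `data_type` strings come from the XML below. It assumes `location` and `length` are byte offsets into the file.

```rust
use std::convert::TryInto;

/// Hypothetical, trimmed-down record description; fields mirror the XML tags.
struct RecordDesc {
    location: usize, // byte offset into the binary file (assumed)
    length: usize,   // byte length of the field (assumed)
    data_type: String,
}

/// Hypothetical holder for a decoded value.
#[derive(Debug, PartialEq)]
enum Value {
    IEEE754LSBSingle(f32),
    IEEE754LSBDouble(f64),
}

/// Slice the record's bytes out of `file` and decode them based on `data_type`.
/// Returns `None` for an out-of-range slice, a length mismatch, or an unknown type.
fn read_record(file: &[u8], rec: &RecordDesc) -> Option<Value> {
    let end = rec.location.checked_add(rec.length)?;
    let bytes = file.get(rec.location..end)?;
    match rec.data_type.as_str() {
        "IEEE754LSBSingle" => {
            // try_into fails unless the slice is exactly 4 bytes long
            let arr: [u8; 4] = bytes.try_into().ok()?;
            Some(Value::IEEE754LSBSingle(f32::from_le_bytes(arr)))
        }
        "IEEE754LSBDouble" => {
            // try_into fails unless the slice is exactly 8 bytes long
            let arr: [u8; 8] = bytes.try_into().ok()?;
            Some(Value::IEEE754LSBDouble(f64::from_le_bytes(arr)))
        }
        _ => None,
    }
}
```

Matching on the string at every read is what the rest of the question tries to automate with macros.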

Example of a Simplified XML Description File

Assume many more records and many more data types. This is not used in the working code.

use quick_xml;
let xml_desc = r#"
                <Records>
                    <RecordDesc>
                        <name>Single</name>
                        <number>1</number>
                        <location unit="byte">0</location>
                        <data_type>IEEE754LSBSingle</data_type>
                        <length unit="byte">4</length>
                    </RecordDesc>
                    <RecordDesc>
                        <name>Double</name>
                        <number>1</number>
                        <location unit="byte">5</location>
                        <data_type>IEEE754LSBDouble</data_type>
                        <length unit="byte">8</length>
                    </RecordDesc>
                </Records>
                "#;
// quick_xml/serde used to get that string of xml into the rust records
                
#[derive(Serialize, Deserialize, Debug)]
pub struct Records {
    pub records: Vec<Record>,
}

#[derive(Serialize, Deserialize, Debug)]
pub struct Record {
    ...,
    pub data_type: &str, //could also be String, but this example doesn't use this struct
    ...,
}

 
// For each record use data_type to cast into type rust can use


Working Code

First Macro

Creates functions that convert the &mut &[u8] into specific Rust equivalent types. Example output shown in first_example.

macro_rules! type_cast_function {
    ($func_name:ident, $endian:ident, $output_type:ty ) => {
        fn $func_name(input: &mut &[u8]) -> $output_type {
            let (int_bytes, _) = input.split_at(std::mem::size_of::<$output_type>());
            <$output_type>::$endian(int_bytes.try_into().unwrap())

        }
    };
}

Second Macro

Creates impl blocks for unwrapping each specific value from the variants in DataTypes. Example output shown in first_example.

macro_rules! create_unwrap_impl_for_type {
    ($unwrap_name:ident, $variant:path, $output_type:ty) => {
        impl DataTypes { 
            pub fn $unwrap_name(self) -> $output_type {
                match self {
                    $variant(val) => val,
                    _ => panic!(),
                }
            }
        }
    };
}

Create Enum for Various Data Types

Note: The variant names match the case of the data_type values in xml_desc.

#[derive(Debug)]
pub enum DataTypes {
    // 4 Bytes
    IEEE754LSBSingle(f32),
    // 8 Bytes
    IEEE754LSBDouble(f64),
}

First Example

Matches data_type: &str descriptions and generates the relevant function and impl block for unwrapping the value for each match to be used elsewhere.

fn first_example(){
    // Simulated Data that would come from parsing the binary file
    let mut data: &[u8] = &[172, 152, 111, 195];
    let mut data2: &[u8] = &[172, 152, 111, 195, 117, 93, 133, 192];
    
    // Simulated looping through records with different types
    for dtype in ["IEEE754LSBSingle", "IEEE754LSBDouble"] {
        match dtype {
            "IEEE754LSBSingle" => {
                create_unwrap_impl_for_type!(unwrap_le_f32,DataTypes::IEEE754LSBSingle,f32);
                /* 
                outputs:
                    impl DataTypes { 
                        pub fn unwrap_le_f32(self) -> f32 {
                            match self {
                                DataTypes::IEEE754LSBSingle(val) => val,
                                _ => panic!(),
                            }
                        }
                    }
                */
                type_cast_function!(read_le_f32, from_le_bytes, f32);
                /* 
                outputs:
                    fn read_le_f32(input: &mut &[u8]) -> f32 {
                        let (int_bytes, _) = input.split_at(std::mem::size_of::<f32>());
                        f32::from_le_bytes(int_bytes.try_into().unwrap())
                    }
                */
                let single = DataTypes::IEEE754LSBSingle(read_le_f32(&mut data)).unwrap_le_f32();
                println!("First Example\tIEEE754LSBSingle {:?}",single);    
            },
            "IEEE754LSBDouble" => {
                create_unwrap_impl_for_type!(unwrap_le_f64,DataTypes::IEEE754LSBDouble,f64);
                /* 
                outputs:
                    impl DataTypes { 
                        pub fn unwrap_le_f64(self) -> f64 {
                            match self {
                                DataTypes::IEEE754LSBDouble(val) => val,
                                _ => panic!(),
                            }
                        }
                    }
                */
                type_cast_function!(read_le_f64, from_le_bytes, f64);
                /* 
                outputs:
                    fn read_le_f64(input: &mut &[u8]) -> f64 {
                        let (int_bytes, _) = input.split_at(std::mem::size_of::<f64>());
                        f64::from_le_bytes(int_bytes.try_into().unwrap())
                    }
                */
                let double = DataTypes::IEEE754LSBDouble(read_le_f64(&mut data2)).unwrap_le_f64();
                println!("First Example\tIEEE754LSBDouble {:?}",double);
            },
            _ => panic!(),
        };
    }
}

One Macro to Rule Them All

One macro that creates both the function and the impl block by calling the other two macros. It is the only difference between first_example above and second_example below.

macro_rules! generate_casting_extraction_functions {
    ($func_name:ident, $endian:ident, $unwrap_name:ident, $variant:path, $output_type:ty) => {
        create_unwrap_impl_for_type!($unwrap_name, $variant, $output_type);
        type_cast_function!($func_name, $endian, $output_type);
    }
}

Second Example

Matches data_type: &str descriptions as in first_example, but uses the combined macro to generate the relevant function and impl block for each match.

fn second_example(){
    // Simulated Data that would come from parsing the binary file
    let mut data: &[u8] = &[172, 152, 111, 195];
    let mut data2: &[u8] = &[172, 152, 111, 195, 117, 93, 133, 192];
    
    // Simulated looping through records with different types
    for dtype in ["IEEE754LSBSingle", "IEEE754LSBDouble"] {
        match dtype {
            "IEEE754LSBSingle" => {
                // Same output as first_example
                generate_casting_extraction_functions!(read_le_f32_2, from_le_bytes,unwrap_le_f32_2,DataTypes::IEEE754LSBSingle,f32);
                let single = DataTypes::IEEE754LSBSingle(read_le_f32_2(&mut data)).unwrap_le_f32_2();
                println!("Second Example\tIEEE754LSBSingle {:?}",single);    
            },
            "IEEE754LSBDouble" => {
                // Same output as first_example
                generate_casting_extraction_functions!(read_le_f64_2, from_le_bytes,unwrap_le_f64_2,DataTypes::IEEE754LSBDouble,f64);
                let double = DataTypes::IEEE754LSBDouble(read_le_f64_2(&mut data2)).unwrap_le_f64_2();
                println!("Second Example\tIEEE754LSBDouble {:?}",double);
            },
            _ => panic!(),
        };
    }
}
fn main() {
    first_example();
    second_example();
}
  • Can you give us some context at the top of the question? What is the overall purpose of this code? What problem are you solving? Commented Oct 13, 2022 at 19:10
  • @JohnKugelman I've updated the context. Essentially go from a string or string slice description of "IEEE754LSBSingle", for example, and cast it into a type that can be used in Rust Commented Oct 13, 2022 at 19:29
  • So the <Records> is a kind of index telling where each data type is in the binary file? Do you know all of the possible data types ahead of time? If so, you should have serde deserialize data_type as an enum instead, rather than using strings. Commented Oct 13, 2022 at 19:52
  • @PitaJ Records is an index of the descriptions of each data contained in the binary file. Where they're located, what types they are, human readable information about them, etc. All potential types are known and the beginnings of them are shown in the DataTypes enum described. Types may or may not vary from each record. I think I'd run into an issue with the serde deserialize data_type with the DataTypes enum because wouldn't it need the data in the associated type before it would be available? Commented Oct 13, 2022 at 20:06
  • Yes, you would have to have a second enum with just unit variants. Commented Oct 13, 2022 at 20:09

1 Answer


Here's what I would do.

First of all, you want two enums:

  • one with only unit variants that only serves to better represent the data_type string, which I will call DataKind
#[derive(Clone, Copy, Debug)]
enum DataKind {
    // 4 bytes
    IEEE754LSBSingle,
    // 8 bytes
    IEEE754LSBDouble,

    // ...etc
}
  • and one that will hold the data you parse from the binary file
#[derive(Debug)]
enum DataTypes {
    // 4 bytes
    IEEE754LSBSingle(f32),
    // 8 bytes
    IEEE754LSBDouble(f64),

    // ...etc
}

Then, you want a function to interpret the necessary number of input bytes for the target type and store the result in the corresponding DataTypes value:

impl DataKind {
    fn parse(self, input: &mut &[u8]) -> DataTypes {
        match self {
            DataKind::IEEE754LSBSingle => DataTypes::IEEE754LSBSingle({
                let (bytes, _) = input.split_at(std::mem::size_of::<f32>());
                f32::from_le_bytes(bytes.try_into().unwrap())
            }),
            DataKind::IEEE754LSBDouble => DataTypes::IEEE754LSBDouble({
                let (bytes, _) = input.split_at(std::mem::size_of::<f64>());
                f64::from_le_bytes(bytes.try_into().unwrap())
            }),

            // ...etc
        }
    }
}

Thankfully, it's pretty easy to generate all of these at once with a macro:

macro_rules! generate_datatypes_parsing {
    [$( $name:ident($target_type:ty => $conversion:ident) ),+ $(,)*] => {
        #[derive(Clone, Copy, Debug)]
        pub enum DataKind {
            $( $name, )*
        }
        
        #[derive(Debug)]
        pub enum DataTypes {
            $( $name($target_type), )*
        }
        
        impl DataKind {
            fn parse(self, input: &mut &[u8]) -> DataTypes {
                match self {
                $(
                    DataKind::$name => DataTypes::$name({
                        let (bytes, _) = input.split_at(
                            std::mem::size_of::<$target_type>()
                        );
                        <$target_type>::$conversion(bytes.try_into().unwrap())
                    }),
                )*
                }
            }
        }
    };
}

generate_datatypes_parsing![
    IEEE754LSBSingle(f32 => from_le_bytes),
    IEEE754LSBDouble(f64 => from_le_bytes),

    // ...etc
];

Then you can use DataKind::parse like so:

fn main() {
    // Simulated Data that would come from parsing the binary file
    let mut data: &[u8] = &[172, 152, 111, 195];
    let mut data2: &[u8] = &[172, 152, 111, 195, 117, 93, 133, 192];
    
    // parsing will eventually go in a loop somewhere
    println!("First Example\t{:?}", DataKind::IEEE754LSBSingle.parse(&mut data));
    println!("First Example\t{:?}", DataKind::IEEE754LSBDouble.parse(&mut data2));
}
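
Note that parse as written reads from the front of the slice but leaves input where it was. If several values sit back to back in one buffer, a reader could keep the tail from split_at and write it back through the &mut. The sketch below shows that idea with two hypothetical helpers, take_le_f32 and take_le_f64 (these names are not part of the code above), and it panics on short input just like the unwrap version.

```rust
use std::convert::TryInto;

/// Hypothetical helper: read one little-endian f32 and advance the slice past it.
/// Panics if fewer than 4 bytes remain.
fn take_le_f32(input: &mut &[u8]) -> f32 {
    let (bytes, rest) = input.split_at(std::mem::size_of::<f32>());
    *input = rest; // move the caller's cursor forward
    f32::from_le_bytes(bytes.try_into().unwrap())
}

/// Hypothetical helper: read one little-endian f64 and advance the slice past it.
/// Panics if fewer than 8 bytes remain.
fn take_le_f64(input: &mut &[u8]) -> f64 {
    let (bytes, rest) = input.split_at(std::mem::size_of::<f64>());
    *input = rest; // move the caller's cursor forward
    f64::from_le_bytes(bytes.try_into().unwrap())
}
```

Calling take_le_f32 and then take_le_f64 on the same `&mut &[u8]` reads two consecutive fields, and the slice ends up pointing at whatever bytes follow them.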

playground

Why DataKind?

It's best to use an enum like DataKind, because this way you get more guarantees from the compiler. It's also far easier to pass around a Copy enum with no lifetimes than a &str with some lifetime you need to worry about.

Of course, you should #[derive(Deserialize)] for DataKind so serde can do that conversion from &str for you.
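
To make the string-to-enum mapping concrete, here is a std-only sketch that does the conversion by hand with FromStr. This is not what the serde derive generates and it is not serde's API; it only illustrates the mapping that the derive would take care of, assuming the variant names equal the data_type strings. The String error type is a placeholder chosen for the sketch.

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum DataKind {
    IEEE754LSBSingle,
    IEEE754LSBDouble,
}

impl FromStr for DataKind {
    // Placeholder error type for this sketch; a real project might use its own.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "IEEE754LSBSingle" => Ok(DataKind::IEEE754LSBSingle),
            "IEEE754LSBDouble" => Ok(DataKind::IEEE754LSBDouble),
            other => Err(format!("unknown data_type: {}", other)),
        }
    }
}
```

With this in place, `"IEEE754LSBSingle".parse::<DataKind>()` yields the unit variant, and an unrecognized string is rejected once, at the boundary, instead of at every match later on.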

Return a Result

You may want to return a Result from fn parse with a custom error type. If you do, I'd recommend using a custom split_at function that also returns a Result if it goes out of bounds.
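
A minimal sketch of what that could look like follows. ParseError and try_split_at are hypothetical names invented here, and the enums are repeated (with PartialEq added to DataTypes) only so the block stands alone.

```rust
use std::convert::TryInto;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    NotEnoughBytes { needed: usize, available: usize },
}

/// Like `split_at`, but returns an error instead of panicking when `mid` is out of bounds.
fn try_split_at(input: &[u8], mid: usize) -> Result<(&[u8], &[u8]), ParseError> {
    if mid > input.len() {
        return Err(ParseError::NotEnoughBytes { needed: mid, available: input.len() });
    }
    Ok(input.split_at(mid))
}

#[derive(Clone, Copy, Debug)]
enum DataKind {
    IEEE754LSBSingle,
    IEEE754LSBDouble,
}

#[derive(Debug, PartialEq)]
enum DataTypes {
    IEEE754LSBSingle(f32),
    IEEE754LSBDouble(f64),
}

impl DataKind {
    fn parse(self, input: &mut &[u8]) -> Result<DataTypes, ParseError> {
        match self {
            DataKind::IEEE754LSBSingle => {
                let (bytes, _) = try_split_at(*input, std::mem::size_of::<f32>())?;
                // The unwrap cannot fail: `bytes` is exactly 4 bytes long here.
                Ok(DataTypes::IEEE754LSBSingle(f32::from_le_bytes(bytes.try_into().unwrap())))
            }
            DataKind::IEEE754LSBDouble => {
                let (bytes, _) = try_split_at(*input, std::mem::size_of::<f64>())?;
                // The unwrap cannot fail: `bytes` is exactly 8 bytes long here.
                Ok(DataTypes::IEEE754LSBDouble(f64::from_le_bytes(bytes.try_into().unwrap())))
            }
        }
    }
}
```

The length check moves the only real failure mode (a truncated buffer) into the error type, so callers can use `?` instead of catching a panic.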

8 Comments

I started to work on things from your suggestion, but then had to step away from the computer. I'll take a deeper look at this tomorrow. I definitely had a disconnect returning DataTypes from a function, like you do in parse, because I thought generics were the solution and got wrapped up in that for a while before I got my solution. I still have yet to properly implement a generic so I'm bound to do it again until I really understand them. Since this is a question asking about the most Rustic way, I'm going to let responses sit for a while regardless to see what the community thinks.
Is there a way to get something like IEEE754LSBSingleArray([f32;2] => from_le_bytes) to work in your example, or would a second macro be needed to create an impl block with parse_array() logic? I created a separate macro for that behavior, which still worked with how I had things laid out above. It used the tail from let (bytes, _ ) = input.split_at(...); instead of ignoring it.
@William Here's a version that supports calling any conversion function (at the cost of repeating a little more) and that automatically advances the cursor play.rust-lang.org/…
@William have you had a chance to look at this?
Yes, thank you for your response. I'm currently working on a proc_macro solution as an alternative. It's not working right now and I can't work on it after today for the weekend, but here's its current state: github.com/RodogInfinite/binary_type_cast