I need to deserialize a JSON array of a type, let's call it Foo. I have implemented this and it works well in most cases, but I have noticed that the latest version of the data sometimes includes erroneous empty objects.

Prior to this change, each Foo could be deserialized into the following enum:

#[derive(Deserialize)]
#[serde(untagged)]
pub enum Foo<'s> {
    Error {
        // My current workaround is using Option<Cow<'s, str>>
        error: Cow<'s, str>,
    },
    Value {
        a: u32,
        b: i32,
        // etc.
    }
}

/// Foo is part of a larger struct Bar.
#[derive(Deserialize)]
pub struct Bar<'s> {
    foos: Vec<Foo<'s>>,
    // etc.
}

The foos array may hold any of the following JSON values:

// Valid inputs
[]
[{"a": 34, "b": -23},{"a": 33, "b": -2},{"a": 37, "b": 1}]
[{"error":"Unable to connect to network"}]
[{"a": 34, "b": -23},{"error":"Timeout"},{"a": 37, "b": 1}]

// Possible input for latest versions of data 
[{},{},{},{},{},{},{"a": 34, "b": -23},{},{},{},{},{},{},{},{"error":"Timeout"},{},{},{},{},{},{}]

This does not happen very often, but it is enough to cause issues. Normally, the array should contain three or fewer entries, but these extraneous empty objects break that convention. There is no meaningful information I can gain from parsing {}, and in the worst cases there can be hundreds of them in one array.

I do not want to error on parsing {}, as the array still contains other meaningful values, but I do not want to include {} in my parsed data either. Ideally I would also be able to use tinyvec::ArrayVec<[Foo<'s>; 3]> instead of a Vec<Foo<'s>> to save memory and reduce the time spent on allocation during parsing, but I am unable to because of this issue.

How can I skip {} JSON values when deserializing an array with serde in Rust?

I also put together a Rust Playground with some test cases to try different solutions.

  • I don't think you can without a custom Deserialize implementation. Commented Sep 8, 2022 at 18:17
  • Yea, I suspect I need to use #[serde(deserialize_with = "foobar")], but I don't know how I would write one for this use case that can detect empty objects. Commented Sep 8, 2022 at 18:25

2 Answers


serde_with::VecSkipError skips any element that fails to deserialize. Note that it ignores every deserialization error, not only the empty object {}, so it might be too permissive.

#[serde_with::serde_as]
#[derive(Deserialize)]
pub struct Bar<'s> {
    #[serde_as(as = "serde_with::VecSkipError<_>")]
    foos: Vec<Foo<'s>>,
}

Playground


3 Comments

This seems like by far the most convenient solution, but I do worry that it would cover up meaningful errors. I can see this being useful in cases where you have control over the data format and want to filter out malformed results, but since I am receiving data from a third-party source it is harder to determine who is at fault for errors during parsing.
In that case be careful with the solution of BallpointBen. The check for empty objects {} is not quite right. For example, {"a": "Foo"} will be treated as empty and ignored.
Good catch. I'll make sure to apply #[serde(deny_unknown_fields)] to the relevant structs.

The simplest, though not the most performant, solution is to define an enum that captures both the Foo case and the empty case, deserialize into a vector of those, and then filter that vector to keep only the nonempty ones.

#[derive(Deserialize, Debug)]
#[serde(untagged)]
pub enum FooDe<'s> {
    Nonempty(Foo<'s>),
    Empty {},
}

fn main() {
    let json = r#"[
        {},{},{},{},{},{},
        {"a": 34, "b": -23},
        {},{},{},{},{},{},{},
        {"error":"Timeout"},
        {},{},{},{},{},{}
    ]"#;
    let foo_des = serde_json::from_str::<Vec<FooDe>>(json).unwrap();
    let foos = foo_des
        .into_iter()
        .filter_map(|item| {
            use FooDe::*;
            match item {
                Nonempty(foo) => Some(foo),
                Empty {} => None,
            }
        })
        .collect();
    let bar = Bar { foos };
    println!("{:?}", bar);

    // Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}

Conceptually this is simple, but you allocate a lot of space for Empty cases that you ultimately don't need. Instead, you can control exactly how deserialization is done by implementing Deserialize for Bar yourself (in place of the derive) and dropping the empty elements as they are read.

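use serde::de::{SeqAccess, Visitor};
use serde::{Deserialize, Deserializer};
use std::{fmt, marker::PhantomData};

struct BarVisitor<'s> {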
    marker: PhantomData<fn() -> Bar<'s>>,
}

impl<'s> BarVisitor<'s> {
    fn new() -> Self {
        BarVisitor {
            marker: PhantomData,
        }
    }
}

// This is the trait that informs Serde how to deserialize Bar.
impl<'de, 's: 'de> Deserialize<'de> for Bar<'s> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        impl<'de, 's: 'de> Visitor<'de> for BarVisitor<'s> {
            // The type that our Visitor is going to produce.
            type Value = Bar<'s>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a list of objects")
            }

            fn visit_seq<V>(self, mut access: V) -> Result<Self::Value, V::Error>
            where
                V: SeqAccess<'de>,
            {
                let mut foos = Vec::new();

                while let Some(foo_de) = access.next_element::<FooDe>()? {
                    if let FooDe::Nonempty(foo) = foo_de {
                        foos.push(foo);
                    }
                }

                let bar = Bar { foos };

                Ok(bar)
            }
        }

        // Instantiate our Visitor and ask the Deserializer to drive
        // it over the input data, resulting in an instance of Bar.
        deserializer.deserialize_seq(BarVisitor::new())
    }
}

fn main() {
    let json = r#"[
        {},{},{},{},{},{},
        {"a": 34, "b": -23},
        {},{},{},{},{},{},{},
        {"error":"Timeout"},
        {},{},{},{},{},{}
    ]"#;
    let bar = serde_json::from_str::<Bar>(json).unwrap();
    println!("{:?}", bar);

    // Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}

3 Comments

I was able to adapt this into a function that I can apply with #[serde(deserialize_with = "skip_empty")]. The code for that can be found on this Rust Playground. However, I removed access.size_hint(), since on some data formats it seemed likely to partially defeat the purpose of skipping empty objects by still allocating space for them.
You might want to make the checks for FooDe tighter, because right now {"a": "Foo"} is treated as empty. If only the empty object is allowed, you probably need to change the order and mix in some deny_unknown_fields.
@Locke good catch about Vec::with_capacity(). @jonasbb Thanks