2

I am porting some of my more complex C++ code over to Rust as a way to learn the language. One thing I have is a map of values keyed by a std::string held inside the value type, to avoid copying the string into the maps key I use a string_view. Because of how it's used, the key will always point to the valid string held in its value. The Rust below represents the structures I am porting.

use std::collections::HashMap;
use std::sync::Arc;

struct BaseThing {
    name: String,
    data1: String,
    data2: f32,
    // and many more bits of data
}

struct Component {
    pub base_thing: Arc<BaseThing>,
    pub power: i32,
}

type ComponentsMap = HashMap<String, Component>;

struct Value {
    pub components: ComponentsMap,
}

fn add_component(value: &mut Value, base_thing: Arc<BaseThing>, power: i32) {
    value.components.insert(base_thing.name.clone(), Component { base_thing, power });
}

Ideally I want the map to be a keyed by a reference to the contained string. Something like this:

type ComponentsMap = HashMap<&str, Component>;

But I can't figure the borrow checker voodoo to make that work, if it can work at all.

3
  • This question is similar to: Struct property as key and the struct itself as value in HashMaps. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented May 22 at 6:15
  • 3
    Interesting timing: a crate for this was just released: github.com/oxidecomputer/iddqd Commented May 22 at 8:03
  • You can convert your String to Rc<str> then clone the Rc (which only increments the reference counter, without copying the data), store one clone in your struct and use the other clone as key in the HashMap. Commented May 22 at 11:56

4 Answers 4

4

This is fundamentally impossible. See Why can't I store a value and a reference to that value in the same struct?

However, there is a solution. You can do even better than the C++ code.

The key is to use hashbrown::HashTable. hashbrown is the crate that powers the hashmap in std, and its HashTable is a "raw" hash map, with more basic API. Instead of keys and values, you store just values. And the map doesn't know to hash keys or to compare them: you need to provide callback to do that on every operation.

You can build a wrapper around that, where "the key" is just the data you use for the hashing and comparison. You need to be careful, though, to not mutate the key - in other words, the hash and equality of an entry must not change once inserted, otherwise anything can occur (except undefined behavior - e.g. panics, infinite loops, etc.).

Sign up to request clarification or add additional context in comments.

Comments

3

As noted in other answers, this is difficult in general as it would involve self-references.

But as Vi. suggests in their answer, you can use a custom key type to avoid the extra allocation. And in your specific use case, the BaseThing is already wrapped in an Arc, so you can implement the custom key type yourself as a wrapper around the Arc, which is cheap to clone, like so:

use std::borrow::Borrow;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

struct BaseThingName(Arc<BaseThing>);

// The `Borrow` trait implies a contract that `BaseThingName` "behaves
// identically" to the returned value (`self.0.name`), i.e., the `Eq` and `Hash`
// implementations yield the same result as `name` would, so that `HashMap::get`
// can look up the thing by the `name`, relying on the contract.
impl Borrow<str> for BaseThingName {
    fn borrow(&self) -> &str {
        &self.0.name
    }
}

impl Eq for BaseThingName {}

impl PartialEq for BaseThingName {
    fn eq(&self, other: &Self) -> bool {
        self.0.name == other.0.name
    }
}

impl Hash for BaseThingName {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        self.0.name.hash(hasher);
    }
}

(Playground)

(See the official documentation for details on the trait requirements for the key type of a HashMap.)

To use the custom key, replace the key type of ComponentsMap with the wrapper type and update the insertion code to use it instead of a String:

type ComponentsMap = HashMap<BaseThingName, Component>;

pub fn add_component(value: &mut Value, base_thing: Arc<BaseThing>, power: i32) {
    value.components.insert(
        BaseThingName(Arc::clone(&base_thing)),
        Component { base_thing, power },
    );
}

You can look up components by the name even though the key is now a wrapper for the BaseThing as a whole, thanks to the Borrow trait implementation for BaseThingName:

#[test]
fn it_works() {
    let mut value = Value {
        components: ComponentsMap::new(),
    };
    let base_thing = Arc::new(BaseThing {
        name: "hello".to_owned(),
        // …
    });
    add_component(&mut value, base_thing, 42);
    assert_eq!(value.components.get("hello").unwrap().power, 42);
}

Even further, you could implement the traits for Component (or a wrapper around it) instead, and insert it into a HashSet instead of a HashMap:

pub struct SetComponent(Component);

type ComponentsSet = HashSet<SetComponent>;

pub struct Value {
    pub components: ComponentsSet,
    // …
}

impl Borrow<str> for SetComponent {
    fn borrow(&self) -> &str {
        &self.0.base_thing.name
    }
}

impl Eq for SetComponent {}

impl PartialEq for SetComponent {
    fn eq(&self, other: &Self) -> bool {
        self.0.base_thing.name == other.0.base_thing.name
    }
}

impl Hash for SetComponent {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        self.0.base_thing.name.hash(hasher);
    }
}

pub fn add_component(value: &mut Value, base_thing: Arc<BaseThing>, power: i32) {
    value
        .components
        .insert(SetComponent(Component { base_thing, power }));
}

(Playground)

Now it should be even cheaper than the original C++ code (in theory), because the set doesn't allocate for (the smart pointers to) the keys by itself.

1 Comment

Thanks for that, it taught me a bit about the language, but I'll use the iddqd crate for my problem, as it's exactly what I need.
1

Ideally I want the map to be a keyed by a reference to the contained string, but I can't figure the borrow checker voodoo to make that work, if it can work at all.

As others have noted you can't do that directly becausse the borrow checker is lexical. However there are two ways around that:

  1. use a HashSet (or BTreeSet, or IndexSet) and implement Hash and Eq to only use the name attribute of your component (use a wrapper / newtype to do that if you need a full eq/hash on the value struct), aka the second half of tesaguri's answer), although that can be constraining for lookups (it's not clear whether you're looking for the unicity or the indexing)
  2. take a gander at existing crates in the field which solve this exact issue e.g. oxide's recent iddqd or the older multi-index-map

Comments

0

This would make hash table entries self-referential. This is better to be avoided in Rust.

One of the ways to avoid excessive allocations/duplication of String keys use some custom, external type instead of just String.

If many strings are expected to be short, something like smartstring can be used. It would inline small enough keys into the handles themselves.

---

If it is OK to have external "context" to resolve actual names of keys (i.e. ComponentsMap not being self-sufficient), string internment can be used (e.g. from string-interner crate).

struct Components {
   map:  HashMap<string_interner::symbol::SymbolUsize, Component>,
   interner: string_interner::StringInterner,
}

This way using actual key (as a string) becomes harder, but you can control how much memory it takes.

The same Symbol can be used inside the value as well, thus only the handle would be duplicated, not the actual string content.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.