How to implement an intrusive linked list that avoids undefined behavior?

Question

For the third time in a few years I find myself needing an intrusive linked list for a project that does not allow boost (ask management...).

For the third time I find that the intrusive linked list implementation I have works perfectly, but that I really don't like that it makes use of undefined behavior - namely when converting the pointer to a list node into a pointer to the object containing that list node.

That awful code currently looks like this:

struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
};

template <typename T, IntrusiveListNode T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode & node) {
        return *(T*)(((char*)&node)-((size_t)&(((T*)nullptr)->*member)));
    }

    IntrusiveListNode root_;
};

I don't really care how ugly nodeToItem_ gets, but I would like to keep the public interface and syntax of IntrusiveList the same. Specifically, I would like to specify the type of a list type using IntrusiveList<Test, &Test::node_> rather than IntrusiveList<Test, offsetof(Test, node_)>.

It's almost 2016 - is there any way to do this without invoking undefined behavior?

Edit: There have been a few suggested solutions (involving different structures of the list) in the comments which I want to summarize here:

Live with undefined behavior, since the language has seemingly arbitrary limitations that prevent using member pointers in reverse.
Store an additional pointer to the containing class within IntrusiveListNode. This is currently probably the cleanest solution (no change in interface necessary), but does require a third pointer in every list node (small optimizations may be possible).
Derive from IntrusiveListNode and use static_cast. In boost this is the base_hook version of an intrusive linked list. I would like to stick with the member_hook version to avoid introducing multiple inheritance.
Store pointers to the next and previous containing class instead of to the next and previous list node within IntrusiveListNode. This makes creating the root node within the intrusive list difficult. Either the list must include a full instantiation of T (which is not possible e.g. if T is abstract), or the end of the list would need to be a null pointer (which would break --list.end(), allowing forward iteration only).
Boost intrusive lists have a member_hook version that works somehow, but the implementation has not been understood (and it possibly also relies on undefined behavior).

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

Coudnt parse your return statement. Check if github.com/arun11299/Cpp-Intrusive-list/blob/master/… helps. I implemented it long time back. Not sure at what point I left it, but basic things should work. — Arunmu
– Arunmu, Commented Dec 7, 2015 at 13:52
@ddriver No dynamic allocations at run time is the biggest drive behind using intrusive linked lists. Plus the fact that objects will be in several lists at the same time, though that could of course be solved using lists of pointers. — zennehoy
– zennehoy, Commented Dec 7, 2015 at 13:56
@Arunmu Your implementation requires that items to be placed in the list must be derived from ListNode. This only allows items to be added to a single list. I need the node as member variant of an intrusive linked list. — zennehoy
– zennehoy, Commented Dec 7, 2015 at 13:59
I guess I'm not seeing it but...: can't you use a T*s in your list node information and avoid the need to determine the outer node based on member? As far as I know there is no way in the C++ standard to obtain a containing object based on a member (things like offsetof() certainly don't work with non-standard layout types). — Dietmar Kühl
– Dietmar Kühl, Commented Dec 7, 2015 at 14:02

Dietmar Kühl · Accepted Answer · 2015-12-07 20:37:16Z

I would side-step the problem and use a node<T> containing suitable members to link the range. To deal with a bidirectional, intrusive list I'd use an asymmetric node<T> like this:

template <typename T>
class intrusive::node
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    template <typename S, node<S> S::*> friend class intrusive::iterator;

    T*       next;
    node<T>* prev;
public:
    node(): next(), prev() {}
    node(node const&) {}
    void operator=(node const&) {}
};

The basic idea is that the list<T, L> contains a node<T> using the next pointer to point to the first element. That's fairly straight forward: given a pointer p to a T the link to the next node can be traversed using (p->*L).next. However, instead of directly navigating the list using T*, an iterator<T, L> actually uses a pointer to the node<T>: while that isn't necessary for forward traversal, it enables backward traversal and insertions anywhere in the list without special treatment of the list head.

The copy constructor and copy assignment are defined to do nothing to avoid half-inserted nodes when copying a node. Depending on the needs of the nodes it may be more reasonable to rather = delete these operations. However, that's unrelated to the question at hand.

The iterator just uses a pointer to the node<T> whose next member points at the current node. For the first element in the list this is a pointer to the list<T, L>'s node<T> member. Assuming you got a pointer to a suitable node<T> an iterator<T, L> can be created from that:

template <typename T, intrusive::node<T> T::*Link>
class intrusive::iterator
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    node<T>* current;

public:
    explicit iterator(node<T>* current): current(current) {}
    T& operator*() { return *this->operator->(); }
    T* operator->() { return this->current->next; }
    bool operator== (iterator const& other) const {
        return this->current == other.current;
    }
    bool operator!= (iterator const& other) const {
        return !(*this == other);
    }
    iterator& operator++() {
        this->current = &(this->current->next->*Link);
        return *this;
    }
    iterator operator++(int) {
        iterator rc(*this);
        this->operator++();
        return rc;
    }
    iterator& operator--() {
        this->current = this->current->prev;
        return *this;
    }
    iterator operator--(int) {
        iterator rc(*this);
        this->operator--();
        return rc;
    }
};

Dereferencing just uses the next pointer. The same is true for forward iteration which uses the next pointer together with the member pointer to get hold of the address of the next node<T>. Since the iterator's prev already pointesr at a node<T> backward iteration just needs to replace the current node<T> with the prev element.

Finally, this leaves a list maintaining the beginning and the end of the list. Dealing with the bidirectional access and the corresponding access to the last node adds a bit of complexity and the need to actually have a dedicated node. Here is an implementation (which isn't thoroughly tested: I may have messed up some of the links):

template <typename T, intrusive::node<T> T::*Link>
class intrusive::list
{
    node<T> content;

public:
    list() { this->content.prev = &this->content; }
    iterator<T, Link> begin() { return iterator<T, Link>(&this->content); }
    iterator<T, Link> end() { return iterator<T, Link>(this->content.prev); }

    T& front() { return *this->content.next; }
    T& back() { return *(this->content.prev->prev->next); }
    bool empty() const { return &this->content == this->content.prev; }
    void push_back(T& node) { this->insert(this->end(), node); }
    void push_front(T& node) { this->insert(this->begin(), node); }
    void insert(iterator<T, Link> pos, T& node) {
        (node.*Link).next = pos.current->next;
        ((node.*Link).next
         ? (pos.current->next->*Link).prev 
         : this->content.prev) = &(node.*Link);
        (node.*Link).prev = pos.current;
        pos.current->next = &node;
    }
    iterator<T, Link> erase(iterator<T, Link> it) {
        it.current->next = (it.current->next->*Link).next;
        (it.current->next
         ? (it.current->next->*Link).prev
         : this->content.prev) = it.current;
        return iterator<T, Link>(&(it.current->next->*Link));
    }
};

Just for a bit of sanity: here is a function to simply print the list:

template <typename T, intrusive::node<T> T::*Link>
std::ostream& intrusive::operator<< (std::ostream& out, intrusive::list<T, Link>& list)
{
    out << "[";
    if (!list.empty()) {
        std::copy(list.begin(), --list.end(), std::ostream_iterator<T>(out, ", "));
        out << list.back();
    }
    return out << "]";
}

There are few other approaches avoiding the need to do any funky access to the enclosing class. The above avoids a couple of conditions. Assuming I managed to set the appropriate links correct the code would not rely on any implementation defined or undefined behavior.

You'd use the list like this:

class Node {
public:
    intrusive::node<Node> link0;
    intrusive::node<Node> link1;
    int                   n;
    Node(int n): n(n) {}
};
std::ostream& operator<< (std::ostream& out, Node const& n) {
    return out << n.n;
}

int main()
{
    intrusive::list<Node, &Node::link0> l0;
    intrusive::list<Node, &Node::link1> l1;

    Node n[] = { 10, 11, 12, 13, 14, 15 };

    l0.push_front(n[0]);
    l0.push_front(n[1]);
    l0.push_front(n[2]);

    l1.push_back(n[0]);
    l1.push_back(n[1]);
    l1.push_back(n[2]);

    std::cout << "l0=" << l0 << " l1=" << l1 << "\n";
}

Brilliant, this is exactly the answer I was hoping for! I noticed a small bug in the erase method, which currently returns an iterator two after the erased node, rather than directly after the erased node (current of the returned iterator should point to the node prior to the node that was erased, such that current->next will return the node after the one erased). A simple return it; does the trick.

pcfist · Accepted Answer · 2023-09-13 15:12:45Z

4

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

What you are trying to do is take a non-static data member of a C++ object, and convert it to a pointer to its containing class. In order to do that, you have to do some operation of the form:

node_ptr *ptr = ...;
auto p = reinterpret_cast<char*>(ptr) + offset;
T *t = reinterpret_cast<T*>(p);

To make this operation legal C++, you need all of the following to be well-defined:

Getting an byte offset from the particular NSDM for the node to the T that contains it.
Applying that offset to a pointer-to-a-member will result in a pointer value that is legal to cast to its owning type T.

Item 1 is only possible in well-defined C++ via offsetof; the standard provides no other way to compute that offset. And offsetof requires the type (in this case T) to be standard layout.

Of course, offsetof requires the name of the member as a parameter. And you can't pass parameter names through template arguments and the like; you have to do it through a macro. Unless you're willing to force the user to name the member in a specific way.

So there are your restrictions: T must be standard layout, and you have to either use a macro instead of a direct function call, or you must force the user to use a specific name for the member. If you do this, you should be safe, according to C++.

Here's what the code would look like:

struct intrusive_list_node
{
  intrusive_list_node *next;
  intrusive_list_node *prev;

  template<typename T, size_t offset> T *convert()
  {
    auto p = reinterpret_cast<char*>(this); //Legal conversion, preserves address.
    p -= offset; //Legal offset, so long as `offset` is correct
    return reinterpret_cast<T*>(p); //`p` has the same value representation as `T*` did originally, so should be legal.
  }
}

#define CONVERT_FROM_MEMBER(node, T, member_name) node->convert<T, offsetof(T, member_name)>()

edited Sep 13, 2023 at 15:12

pcfist

1931 gold badge3 silver badges9 bronze badges

answered Dec 7, 2015 at 17:56

Nicol Bolas

481k66 gold badges862 silver badges1.1k bronze badges

18 Comments

dtech Over a year ago

Kind of offtopic, but I wonder what exactly is the argumentation for not supporting offsetof for all types. I mean the compiler obviously knows the exact object binary layout.

Arunmu Over a year ago

Boost currently does what the OP is doing for GNU, Intel etc. For MSVC it folllows the ABI to get the correct offset. Function to look 'offset_from_pointer_to_member'.

Arunmu Over a year ago

Maybe the need for standard ABI is the real answer to this question :)

dtech Over a year ago

@Arunmu - that would be great but is not likely to happen, for some reason people are very committed to supporting exotic platforms with odd rules that have long been extinct. I myself am currently implementing a programming language, and happily "limited" it to only supporting 99.99999999999% of all operational devices in existence :)

Ben Voigt Over a year ago

@ddriver: The reason that offsetof is not allowed on "all types" is the same reason a pointer-to-data-member in the general case is much more than an offset: virtual inheritance is evil and breaks many things. To be exact, when virtual inheritance is involved, the compiler does NOT know the exact object layout, and must compute it at runtime with the help of virtual-base-subobject pointers. I have a feeling that "standard-layout" is unnecessarily restrictive, but there doesn't seem to be a term for "types not involving virtual inheritance".

|

Fabian Knorr · Accepted Answer · 2015-12-07 18:12:04Z

If you don't mind changing the IntrusiveListNode type, you can have a node contain a handle pointing to the previous / next node - you'll only have to do the node -> handle lookup, not the reverse.

template<typename Node>
struct IntrusiveListHandle {
    Node *next = nullptr;
    // and Node* prev, etc ...
};

template<typename Node, IntrusiveListHandle<Node> Node::*handle>
struct IntrusiveList {
    Node *first;    

    static Node *next(Node *n) {
        auto h = (n->*handle).next;
    }
};

Usage example:

#include <iostream>

struct Test {
    IntrusiveListHandle<Test> handle;
    std::string value;

    Test(const std::string &v): value(v) {}
};

template<typename IntrusiveList>
void print(const IntrusiveList &list) {
    for (Test *n = list.first; n; n = list.next(n)) {
        std::cout << n->value << "\n";
    }
}

int main() {
    Test hello("hello");    
    Test world("world!");
    hello.handle.next = &world;
    IntrusiveList<Test, &Test::handle> list;
    list.first = &hello;
    print(list);
}

You should avoid undefined behaviour at all costs as compilers are getting smarter and smarter in exploiting UB for optimization - code that works fine now may suddenly break with the next compiler update.

I see that you mentioned reverse iteration. --end() would not work with this code, but the usual approach is to provide both a begin()/end() and an rbegin()/rend() pair to allow reverse iteration.

Jens · Accepted Answer · 2015-12-07 16:23:20Z

1

I think you can achieve the benefits using CRTP:

#include <iostream>
using namespace std;

template<typename T>
struct ListNode
{
    ListNode<T>* next;

    // this would be nodeToItem in the list class
    T* value()
    {
        return static_cast<T*>(this);
    }
};

// This would be your abstract base class
struct A: public ListNode<A>
{
    A(int i): x(i) {}
    virtual ~A() = 0;
    int x;
};

inline A::~A() {}

struct B: public A
{
    B(int i): A(i) {}
    virtual ~B() {}
};

template<typename T>
class IntrusiveList {
public:
IntrusiveList(ListNode<T>* ptr): root(ptr) 
{
    ptr->next = nullptr;
}

void append(ListNode<T>* ptr)
{
    ptr->next = root;
    root = ptr;
}

ListNode<T>* begin() {return root;}
private:
ListNode<T>* root;
};

int main() {
    B b(10);
    B b2(11);
    IntrusiveList<A> l(&b);
    l.append(&b2);

    for(ListNode<A>* n=l.begin(); n != nullptr; n = n->next)
    {
         std::cout << n->value()->x << std::endl;
    }
    return 0;
}

Having the elements in more than one list should be possible by using an array of ListNode pointers in the struct, and passing the index of the array to the list class as either a template parameter or constructor argument. Iterator would also need to store the index in the ListNode array.

edited Dec 7, 2015 at 16:23

answered Dec 7, 2015 at 16:18

Jens

9,4483 gold badges30 silver badges49 bronze badges

7 Comments

JSF Over a year ago

The OP's objections to that design were implied in the question and well explained in the comments before you posted that answer.

zennehoy Over a year ago

This is basically the base_hook version of the intrusive linked list, where the data class is derived from the list node. I'm looking for a solution given a member_hook, where the list node is a member of the data class. Also, given iteration as you show involving a nullptr end, the list root and all next pointers could just be T*s.

Jens Over a year ago

@JSF I haven't seen any discussion of that approach. The question just says he wants an intrusive list without UB, and I think the solution offers this. The comments then discuss details of the "offset-approach", but not CRTP.

JSF Over a year ago

I was using CRTP for years before I ever learned that it was called that. The discussion of having the node be a base class (rather than member) of T implied CRTP because it was still clear and necessary that the node be templated on T. In an existing discussion of a base class templated on the derived class, "CRTP" is just a buzzword, not a new answer.

Ben Voigt Over a year ago

@JSF: Since your goal in using member_hook is "avoid multiple inheritance", I'd like to point out that you can do that here too with a slight modification: template<typename T, TBase> struct ListNode : TBase and then struct A : Base becomes struct A : ListNode<A, Base>

|

Guillaume Racicot · Accepted Answer · 2015-12-07 15:23:46Z

-1

You can Hardly get the original object with the pointer of one of it's member without invoking UB. Why you absolutely can't? Because a IntrusiveListNode can be hold anywhere. There is no clues that a specific IntrusiveListNode is held in a T and another proof you can't do that: The compiler can't know if the node sent to your function is really held in a T. What you are trying to do is undefined behaviour. The right way to do this would be to add a pointer to it's container in IntrusiveListNode.

template<typename T>
struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
    T* item_;
};

template <typename T, IntrusiveListNode<T> T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode<T> & node) {
        return *(node->item_);
    }

    IntrusiveListNode<T> root_;
};

If you can't use template for IntrusiveListNode, you can use void* instead of T*

You can see an example of an implementation of an intrusive linked list here

edited Dec 7, 2015 at 15:23

answered Dec 7, 2015 at 14:58

Guillaume Racicot

42.7k9 gold badges93 silver badges163 bronze badges

7 Comments

user743382 Over a year ago

The question doesn't ask to prove that the IntrusiveListNode is really a member of T, it assumes that it is, and an answer that has defined behaviour if and only if it really is would be wholly appropriate. Your approach works, but at the cost of an additional data member that should not be needed. Besides, by your logic, this is still totally horribly broken: how can you prove that the T takes care to set the list node's item_ properly? You can't. You're assuming the class does this properly. Such assumptions are perfectly valid, but they're equally valid when they're made by the OP.

Guillaume Racicot Over a year ago

This code I probably the best to do what the OP asked for, get rid of the undefined behaviour. He didn't ask for the IntrusiveListNode to be same size in the question, just the public interface to be the same. That's why I proposed to change T* to a void* to keep public interface the same. The downvote is not justified.

user743382 Over a year ago

Quoting one of the OP's comments, when thinking that someone else was suggesting the same thing as in your answer: "Yes, that would be possible, and I have considered it, but it does increase the size of every list node by another pointer." I do believe the OP is looking for the IntrusiveListNode not to increase in size.

JSF Over a year ago

Even after the edit, your basic point about the member vs outer class relationship would apply exactly the same way between a base class and outer class. In a static cast from base class to outer class, the compiler must trust the programmer that this base class was actually a part of the right outer class. The compiler's job in navigating from base class to outer object is identical to navigating from member to outer object. The problem seems to be in the language arbitrary rules, rather than anything wrong with the requested operation.

Guillaume Racicot Over a year ago

sorry, forgot to change the content of the function.

|

Jimmy Venema · Accepted Answer · 2015-12-15 13:59:03Z

With templates it is hard to do. It is possible with macro's, so the needed members _next, _prev etc are in the scope of the object itself, not inside a separate template object. Using the macro's it is possible to avoid typing in code that is very similar every time. In fact I created a Case tool "ClassBuilder" (http://sourceforge.net/projects/classbuilder/) years ago that writes code using macro's to create data structures that uses intrusive linked lists. In the area I work, the normal templated linked list stuff is just way to slow. In our business, it is normal to work with very large in core data structures which are also very dynamic. Thus lots of removals and additions and searches on the lists. With the tool you totally abstracts from the actual implementation, you just create class diagrams and generate the code from there. In a relative simple test case we did, the run time performance of the generated code was 40s and 400s for a C++ solution using the "normal" STL kind of implementation. A C# implementation of the same test case was aborted after a few hours of running. Its implementation was similar to teh STL one, but this one was hit very hard by the Garbage Collector. Due to the dynamic behavior of the test case all memory that could be reclaimed could only be reclaimed in a full scan.

Collectives™ on Stack Overflow

How to implement an intrusive linked list that avoids undefined behavior?

6 Answers 6

1 Comment

18 Comments

Comments

7 Comments

7 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

1 Comment

18 Comments

Comments

7 Comments

7 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related