18

For the third time in a few years I find myself needing an intrusive linked list for a project that does not allow boost (ask management...).

For the third time I find that the intrusive linked list implementation I have works perfectly, but that I really don't like that it makes use of undefined behavior - namely when converting the pointer to a list node into a pointer to the object containing that list node.

That awful code currently looks like this:

struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
};

template <typename T, IntrusiveListNode T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode & node) {
        return *(T*)(((char*)&node)-((size_t)&(((T*)nullptr)->*member)));
    }

    IntrusiveListNode root_;
};

I don't really care how ugly nodeToItem_ gets, but I would like to keep the public interface and syntax of IntrusiveList the same. Specifically, I would like to specify the type of a list type using IntrusiveList<Test, &Test::node_> rather than IntrusiveList<Test, offsetof(Test, node_)>.

It's almost 2016 - is there any way to do this without invoking undefined behavior?


Edit: There have been a few suggested solutions (involving different structures of the list) in the comments which I want to summarize here:

  1. Live with undefined behavior, since the language has seemingly arbitrary limitations that prevent using member pointers in reverse.

  2. Store an additional pointer to the containing class within IntrusiveListNode. This is currently probably the cleanest solution (no change in interface necessary), but does require a third pointer in every list node (small optimizations may be possible).

  3. Derive from IntrusiveListNode and use static_cast. In boost this is the base_hook version of an intrusive linked list. I would like to stick with the member_hook version to avoid introducing multiple inheritance.

  4. Store pointers to the next and previous containing class instead of to the next and previous list node within IntrusiveListNode. This makes creating the root node within the intrusive list difficult. Either the list must include a full instantiation of T (which is not possible e.g. if T is abstract), or the end of the list would need to be a null pointer (which would break --list.end(), allowing forward iteration only).

  5. Boost intrusive lists have a member_hook version that works somehow, but the implementation has not been understood (and it possibly also relies on undefined behavior).

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

36
  • I'd try to fix management first. :-D Commented Dec 7, 2015 at 13:36
  • Coudnt parse your return statement. Check if github.com/arun11299/Cpp-Intrusive-list/blob/master/… helps. I implemented it long time back. Not sure at what point I left it, but basic things should work. Commented Dec 7, 2015 at 13:52
  • @ddriver No dynamic allocations at run time is the biggest drive behind using intrusive linked lists. Plus the fact that objects will be in several lists at the same time, though that could of course be solved using lists of pointers. Commented Dec 7, 2015 at 13:56
  • @Arunmu Your implementation requires that items to be placed in the list must be derived from ListNode. This only allows items to be added to a single list. I need the node as member variant of an intrusive linked list. Commented Dec 7, 2015 at 13:59
  • 1
    I guess I'm not seeing it but...: can't you use a T*s in your list node information and avoid the need to determine the outer node based on member? As far as I know there is no way in the C++ standard to obtain a containing object based on a member (things like offsetof() certainly don't work with non-standard layout types). Commented Dec 7, 2015 at 14:02

6 Answers 6

12

I would side-step the problem and use a node<T> containing suitable members to link the range. To deal with a bidirectional, intrusive list I'd use an asymmetric node<T> like this:

template <typename T>
class intrusive::node
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    template <typename S, node<S> S::*> friend class intrusive::iterator;

    T*       next;
    node<T>* prev;
public:
    node(): next(), prev() {}
    node(node const&) {}
    void operator=(node const&) {}
};

The basic idea is that the list<T, L> contains a node<T> using the next pointer to point to the first element. That's fairly straight forward: given a pointer p to a T the link to the next node can be traversed using (p->*L).next. However, instead of directly navigating the list using T*, an iterator<T, L> actually uses a pointer to the node<T>: while that isn't necessary for forward traversal, it enables backward traversal and insertions anywhere in the list without special treatment of the list head.

The copy constructor and copy assignment are defined to do nothing to avoid half-inserted nodes when copying a node. Depending on the needs of the nodes it may be more reasonable to rather = delete these operations. However, that's unrelated to the question at hand.

The iterator just uses a pointer to the node<T> whose next member points at the current node. For the first element in the list this is a pointer to the list<T, L>'s node<T> member. Assuming you got a pointer to a suitable node<T> an iterator<T, L> can be created from that:

template <typename T, intrusive::node<T> T::*Link>
class intrusive::iterator
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    node<T>* current;

public:
    explicit iterator(node<T>* current): current(current) {}
    T& operator*() { return *this->operator->(); }
    T* operator->() { return this->current->next; }
    bool operator== (iterator const& other) const {
        return this->current == other.current;
    }
    bool operator!= (iterator const& other) const {
        return !(*this == other);
    }
    iterator& operator++() {
        this->current = &(this->current->next->*Link);
        return *this;
    }
    iterator operator++(int) {
        iterator rc(*this);
        this->operator++();
        return rc;
    }
    iterator& operator--() {
        this->current = this->current->prev;
        return *this;
    }
    iterator operator--(int) {
        iterator rc(*this);
        this->operator--();
        return rc;
    }
};

Dereferencing just uses the next pointer. The same is true for forward iteration which uses the next pointer together with the member pointer to get hold of the address of the next node<T>. Since the iterator's prev already pointesr at a node<T> backward iteration just needs to replace the current node<T> with the prev element.

Finally, this leaves a list maintaining the beginning and the end of the list. Dealing with the bidirectional access and the corresponding access to the last node adds a bit of complexity and the need to actually have a dedicated node. Here is an implementation (which isn't thoroughly tested: I may have messed up some of the links):

template <typename T, intrusive::node<T> T::*Link>
class intrusive::list
{
    node<T> content;

public:
    list() { this->content.prev = &this->content; }
    iterator<T, Link> begin() { return iterator<T, Link>(&this->content); }
    iterator<T, Link> end() { return iterator<T, Link>(this->content.prev); }

    T& front() { return *this->content.next; }
    T& back() { return *(this->content.prev->prev->next); }
    bool empty() const { return &this->content == this->content.prev; }
    void push_back(T& node) { this->insert(this->end(), node); }
    void push_front(T& node) { this->insert(this->begin(), node); }
    void insert(iterator<T, Link> pos, T& node) {
        (node.*Link).next = pos.current->next;
        ((node.*Link).next
         ? (pos.current->next->*Link).prev 
         : this->content.prev) = &(node.*Link);
        (node.*Link).prev = pos.current;
        pos.current->next = &node;
    }
    iterator<T, Link> erase(iterator<T, Link> it) {
        it.current->next = (it.current->next->*Link).next;
        (it.current->next
         ? (it.current->next->*Link).prev
         : this->content.prev) = it.current;
        return iterator<T, Link>(&(it.current->next->*Link));
    }
};

Just for a bit of sanity: here is a function to simply print the list:

template <typename T, intrusive::node<T> T::*Link>
std::ostream& intrusive::operator<< (std::ostream& out, intrusive::list<T, Link>& list)
{
    out << "[";
    if (!list.empty()) {
        std::copy(list.begin(), --list.end(), std::ostream_iterator<T>(out, ", "));
        out << list.back();
    }
    return out << "]";
}

There are few other approaches avoiding the need to do any funky access to the enclosing class. The above avoids a couple of conditions. Assuming I managed to set the appropriate links correct the code would not rely on any implementation defined or undefined behavior.

You'd use the list like this:

class Node {
public:
    intrusive::node<Node> link0;
    intrusive::node<Node> link1;
    int                   n;
    Node(int n): n(n) {}
};
std::ostream& operator<< (std::ostream& out, Node const& n) {
    return out << n.n;
}

int main()
{
    intrusive::list<Node, &Node::link0> l0;
    intrusive::list<Node, &Node::link1> l1;

    Node n[] = { 10, 11, 12, 13, 14, 15 };

    l0.push_front(n[0]);
    l0.push_front(n[1]);
    l0.push_front(n[2]);

    l1.push_back(n[0]);
    l1.push_back(n[1]);
    l1.push_back(n[2]);

    std::cout << "l0=" << l0 << " l1=" << l1 << "\n";
}
Sign up to request clarification or add additional context in comments.

1 Comment

Brilliant, this is exactly the answer I was hoping for! I noticed a small bug in the erase method, which currently returns an iterator two after the erased node, rather than directly after the erased node (current of the returned iterator should point to the node prior to the node that was erased, such that current->next will return the node after the one erased). A simple return it; does the trick.
4

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

What you are trying to do is take a non-static data member of a C++ object, and convert it to a pointer to its containing class. In order to do that, you have to do some operation of the form:

node_ptr *ptr = ...;
auto p = reinterpret_cast<char*>(ptr) + offset;
T *t = reinterpret_cast<T*>(p);

To make this operation legal C++, you need all of the following to be well-defined:

  1. Getting an byte offset from the particular NSDM for the node to the T that contains it.
  2. Applying that offset to a pointer-to-a-member will result in a pointer value that is legal to cast to its owning type T.

Item 1 is only possible in well-defined C++ via offsetof; the standard provides no other way to compute that offset. And offsetof requires the type (in this case T) to be standard layout.

Of course, offsetof requires the name of the member as a parameter. And you can't pass parameter names through template arguments and the like; you have to do it through a macro. Unless you're willing to force the user to name the member in a specific way.

So there are your restrictions: T must be standard layout, and you have to either use a macro instead of a direct function call, or you must force the user to use a specific name for the member. If you do this, you should be safe, according to C++.

Here's what the code would look like:

struct intrusive_list_node
{
  intrusive_list_node *next;
  intrusive_list_node *prev;

  template<typename T, size_t offset> T *convert()
  {
    auto p = reinterpret_cast<char*>(this); //Legal conversion, preserves address.
    p -= offset; //Legal offset, so long as `offset` is correct
    return reinterpret_cast<T*>(p); //`p` has the same value representation as `T*` did originally, so should be legal.
  }
}

#define CONVERT_FROM_MEMBER(node, T, member_name) node->convert<T, offsetof(T, member_name)>()

18 Comments

Kind of offtopic, but I wonder what exactly is the argumentation for not supporting offsetof for all types. I mean the compiler obviously knows the exact object binary layout.
Boost currently does what the OP is doing for GNU, Intel etc. For MSVC it folllows the ABI to get the correct offset. Function to look 'offset_from_pointer_to_member'.
Maybe the need for standard ABI is the real answer to this question :)
@Arunmu - that would be great but is not likely to happen, for some reason people are very committed to supporting exotic platforms with odd rules that have long been extinct. I myself am currently implementing a programming language, and happily "limited" it to only supporting 99.99999999999% of all operational devices in existence :)
@ddriver: The reason that offsetof is not allowed on "all types" is the same reason a pointer-to-data-member in the general case is much more than an offset: virtual inheritance is evil and breaks many things. To be exact, when virtual inheritance is involved, the compiler does NOT know the exact object layout, and must compute it at runtime with the help of virtual-base-subobject pointers. I have a feeling that "standard-layout" is unnecessarily restrictive, but there doesn't seem to be a term for "types not involving virtual inheritance".
|
2

If you don't mind changing the IntrusiveListNode type, you can have a node contain a handle pointing to the previous / next node - you'll only have to do the node -> handle lookup, not the reverse.

template<typename Node>
struct IntrusiveListHandle {
    Node *next = nullptr;
    // and Node* prev, etc ...
};

template<typename Node, IntrusiveListHandle<Node> Node::*handle>
struct IntrusiveList {
    Node *first;    

    static Node *next(Node *n) {
        auto h = (n->*handle).next;
    }
};

Usage example:

#include <iostream>

struct Test {
    IntrusiveListHandle<Test> handle;
    std::string value;

    Test(const std::string &v): value(v) {}
};

template<typename IntrusiveList>
void print(const IntrusiveList &list) {
    for (Test *n = list.first; n; n = list.next(n)) {
        std::cout << n->value << "\n";
    }
}

int main() {
    Test hello("hello");    
    Test world("world!");
    hello.handle.next = &world;
    IntrusiveList<Test, &Test::handle> list;
    list.first = &hello;
    print(list);
}

You should avoid undefined behaviour at all costs as compilers are getting smarter and smarter in exploiting UB for optimization - code that works fine now may suddenly break with the next compiler update.

I see that you mentioned reverse iteration. --end() would not work with this code, but the usual approach is to provide both a begin()/end() and an rbegin()/rend() pair to allow reverse iteration.

Comments

1

I think you can achieve the benefits using CRTP:

#include <iostream>
using namespace std;

template<typename T>
struct ListNode
{
    ListNode<T>* next;

    // this would be nodeToItem in the list class
    T* value()
    {
        return static_cast<T*>(this);
    }
};

// This would be your abstract base class
struct A: public ListNode<A>
{
    A(int i): x(i) {}
    virtual ~A() = 0;
    int x;
};

inline A::~A() {}

struct B: public A
{
    B(int i): A(i) {}
    virtual ~B() {}
};

template<typename T>
class IntrusiveList {
public:
IntrusiveList(ListNode<T>* ptr): root(ptr) 
{
    ptr->next = nullptr;
}

void append(ListNode<T>* ptr)
{
    ptr->next = root;
    root = ptr;
}

ListNode<T>* begin() {return root;}
private:
ListNode<T>* root;
};

int main() {
    B b(10);
    B b2(11);
    IntrusiveList<A> l(&b);
    l.append(&b2);

    for(ListNode<A>* n=l.begin(); n != nullptr; n = n->next)
    {
         std::cout << n->value()->x << std::endl;
    }
    return 0;
}

Having the elements in more than one list should be possible by using an array of ListNode pointers in the struct, and passing the index of the array to the list class as either a template parameter or constructor argument. Iterator would also need to store the index in the ListNode array.

7 Comments

The OP's objections to that design were implied in the question and well explained in the comments before you posted that answer.
This is basically the base_hook version of the intrusive linked list, where the data class is derived from the list node. I'm looking for a solution given a member_hook, where the list node is a member of the data class. Also, given iteration as you show involving a nullptr end, the list root and all next pointers could just be T*s.
@JSF I haven't seen any discussion of that approach. The question just says he wants an intrusive list without UB, and I think the solution offers this. The comments then discuss details of the "offset-approach", but not CRTP.
I was using CRTP for years before I ever learned that it was called that. The discussion of having the node be a base class (rather than member) of T implied CRTP because it was still clear and necessary that the node be templated on T. In an existing discussion of a base class templated on the derived class, "CRTP" is just a buzzword, not a new answer.
@JSF: Since your goal in using member_hook is "avoid multiple inheritance", I'd like to point out that you can do that here too with a slight modification: template<typename T, TBase> struct ListNode : TBase and then struct A : Base becomes struct A : ListNode<A, Base>
|
-1

You can Hardly get the original object with the pointer of one of it's member without invoking UB. Why you absolutely can't? Because a IntrusiveListNode can be hold anywhere. There is no clues that a specific IntrusiveListNode is held in a T and another proof you can't do that: The compiler can't know if the node sent to your function is really held in a T. What you are trying to do is undefined behaviour. The right way to do this would be to add a pointer to it's container in IntrusiveListNode.

template<typename T>
struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
    T* item_;
};

template <typename T, IntrusiveListNode<T> T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode<T> & node) {
        return *(node->item_);
    }

    IntrusiveListNode<T> root_;
};

If you can't use template for IntrusiveListNode, you can use void* instead of T*

You can see an example of an implementation of an intrusive linked list here

7 Comments

The question doesn't ask to prove that the IntrusiveListNode is really a member of T, it assumes that it is, and an answer that has defined behaviour if and only if it really is would be wholly appropriate. Your approach works, but at the cost of an additional data member that should not be needed. Besides, by your logic, this is still totally horribly broken: how can you prove that the T takes care to set the list node's item_ properly? You can't. You're assuming the class does this properly. Such assumptions are perfectly valid, but they're equally valid when they're made by the OP.
This code I probably the best to do what the OP asked for, get rid of the undefined behaviour. He didn't ask for the IntrusiveListNode to be same size in the question, just the public interface to be the same. That's why I proposed to change T* to a void* to keep public interface the same. The downvote is not justified.
Quoting one of the OP's comments, when thinking that someone else was suggesting the same thing as in your answer: "Yes, that would be possible, and I have considered it, but it does increase the size of every list node by another pointer." I do believe the OP is looking for the IntrusiveListNode not to increase in size.
Even after the edit, your basic point about the member vs outer class relationship would apply exactly the same way between a base class and outer class. In a static cast from base class to outer class, the compiler must trust the programmer that this base class was actually a part of the right outer class. The compiler's job in navigating from base class to outer object is identical to navigating from member to outer object. The problem seems to be in the language arbitrary rules, rather than anything wrong with the requested operation.
sorry, forgot to change the content of the function.
|
-2

With templates it is hard to do. It is possible with macro's, so the needed members _next, _prev etc are in the scope of the object itself, not inside a separate template object. Using the macro's it is possible to avoid typing in code that is very similar every time. In fact I created a Case tool "ClassBuilder" (http://sourceforge.net/projects/classbuilder/) years ago that writes code using macro's to create data structures that uses intrusive linked lists. In the area I work, the normal templated linked list stuff is just way to slow. In our business, it is normal to work with very large in core data structures which are also very dynamic. Thus lots of removals and additions and searches on the lists. With the tool you totally abstracts from the actual implementation, you just create class diagrams and generate the code from there. In a relative simple test case we did, the run time performance of the generated code was 40s and 400s for a C++ solution using the "normal" STL kind of implementation. A C# implementation of the same test case was aborted after a few hours of running. Its implementation was similar to teh STL one, but this one was hit very hard by the Garbage Collector. Due to the dynamic behavior of the test case all memory that could be reclaimed could only be reclaimed in a full scan.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.