31

When I want to know how an algorithm in the C++ Standard Library could be implemented, I always look at http://en.cppreference.com/w/cpp/algorithm, which is a great source. But sometimes I don't understand an implementation detail and would like an explanation of why it is done that particular way. For example, in the implementation of std::copy_n, why is the first assignment made outside the loop, so that the loop starts at 1?

template< class InputIt, class Size, class OutputIt>
OutputIt copy_n(InputIt first, Size count, OutputIt result)
{
    if (count > 0) {
        *result++ = *first;
        for (Size i = 1; i < count; ++i) {
            *result++ = *++first;
        }
    }
    return result;
}

Additionally: Do you know a site where possible algorithm implementations are explained?

3
  • 2
    this possible implementation is based on the one found in libc++ Commented Jul 15, 2013 at 21:02
  • 2
    Also, general note: possible implementations on cppreference are only provided when a functionally correct, but generally far from optimal, implementation can be written in just a few lines of code: std::shuffle is pushing it, std::stable_sort is not gonna happen. Do look at real implementations in the C++ standard libraries such as LLVM libc++, GNU libstdc++, or what-have-you, if you're interested in what really happens (e.g. when std::copy is compiled as std::memmove) Commented Jul 15, 2013 at 22:33
  • @Cubbi: Before I asked this question, I looked at the MSVC implementation. I had to comb through a lot of function calls to reach the actual implementation. That is why I like cppreference for its "just a few lines of code", which are nevertheless correct, as the copy_n example showed. The algorithm implementations give me inspiration for writing my own generalized, iterator-based algorithms. Thank you for your valuable comments on this question and on the answers; I think it was you who pointed out the bug in the naive implementation of copy_n. Commented Jul 16, 2013 at 7:34

4 Answers

21

Compare it with the naive implementation:

template< class InputIt, class Size, class OutputIt>
OutputIt copy_n(InputIt first, Size count, OutputIt result)
{
  for (Size i = 0; i < count; ++i) {
    *result++ = *first++;
  }
  return result;
}

This version does one more increment of first!

  1. count==0, both do 0 increments of first.

  2. count==1, their version does zero increments of first. The above version does 1.

  3. count==2, their version does one increment of first. The above version does 2.
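The counts in the list above can be checked with a small instrumented iterator. This is only a sketch: CountingIt, copy_n_careful, copy_n_naive and the two helper functions are hypothetical names made up for this illustration, and the two copy functions simply restate the cppreference version and the naive version.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical input iterator over an int array that counts its increments.
struct CountingIt {
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    const int* p;
    int* increments;  // shared counter, so copies made by postfix ++ still count

    const int& operator*() const { return *p; }
    CountingIt& operator++() { ++p; ++*increments; return *this; }
    CountingIt operator++(int) { CountingIt old = *this; ++*this; return old; }
};

// The cppreference-style version: first element outside the loop.
template <class InputIt, class Size, class OutputIt>
OutputIt copy_n_careful(InputIt first, Size count, OutputIt result) {
    if (count > 0) {
        *result++ = *first;
        for (Size i = 1; i < count; ++i) {
            *result++ = *++first;
        }
    }
    return result;
}

// The naive version: one increment per copied element.
template <class InputIt, class Size, class OutputIt>
OutputIt copy_n_naive(InputIt first, Size count, OutputIt result) {
    for (Size i = 0; i < count; ++i) {
        *result++ = *first++;
    }
    return result;
}

// Number of increments of `first` when copying n (at most 4) elements.
int careful_increments(int n) {
    const int src[4] = {10, 20, 30, 40};
    int dst[4] = {};
    int increments = 0;
    copy_n_careful(CountingIt{src, &increments}, n, dst);
    return increments;
}

int naive_increments(int n) {
    const int src[4] = {10, 20, 30, 40};
    int dst[4] = {};
    int increments = 0;
    copy_n_naive(CountingIt{src, &increments}, n, dst);
    return increments;
}
```

Calling careful_increments(n) and naive_increments(n) for n = 0, 1, 2 should reproduce the three cases listed: the careful version performs max(n - 1, 0) increments, the naive one performs n.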

One possible reason is to support iterators that are dereferenceable but not incrementable. At least in the STL days, there was such a distinction. I am not sure whether input iterators still have this property today.

Here is a bug that seems to occur if you use the naive implementation, and here is some documentation that claims "The actual read operation is performed when the iterator is incremented, not when it is dereferenced."

I have not yet tracked down the chapter-and-verse for the existence of dereferenceable, non-incrementable input iterators. Apparently the standard details how many times copy_n dereferences the input/output iterators, but does not detail how many times it increments the input iterator.

The naive implementation increments the input iterator one more time than the non-naive implementation. If we have a single-pass input iterator that reads on ++ with insufficient space, copy_n could block needlessly on further input, trying to read data past the end of the input stream.


15 Comments

@RanEldan Who has an extra condition? Where?
@JimBuck Give an example input, and count the if-tests run, that shows an extra if-test? I'm not seeing it. Both do 1 more if-test than the number of items processed, exactly.
+1, but it may help to point out specifically that the naive implementation is wrong because copy_n has to support input iterators
@ChristianAmmer: Each call to postfix operator++ requires the creation of a copy of the iterator in the current state before performing the increment.
13

That is just an implementation. The implementation in GCC 4.4 is different (and conceptually simpler):

template<typename _InputIterator, typename _Size, typename _OutputIterator>
_OutputIterator
copy_n(_InputIterator __first, _Size __n,
       _OutputIterator __result)
{
  for (; __n > 0; --__n)
    {
      *__result = *__first;
      ++__first;
      ++__result;
    }
  return __result;
}

[With a bit of handwaving: I only show the implementation used when the iterator is a plain input iterator; there is a different implementation for random access iterators.] That implementation has a bug in that it increments the input iterator one more time than expected.

The implementation in GCC 4.8 is a bit more convoluted:

template<typename _InputIterator, typename _Size, typename _OutputIterator>
_OutputIterator
copy_n(_InputIterator __first, _Size __n,
       _OutputIterator __result)
{
  if (__n > 0)
    {
      while (true)
        {
          *__result = *__first;
          ++__result;
          if (--__n > 0)
            ++__first;
          else
            break;
        }
    }
  return __result;
}

6 Comments

As noted here, the above implementation is incorrect. (What @Cubbi said, with more words)
@Yakk You should make an answer with that link, showing that a more naive implementation would be functionally wrong (due to the extra increment).
@Cubby: Correct, there is a bug in this implementation. Updated with the latest implementation in g++ 4.8
The MSVC 2012 library also has slightly different implementations for input iterators, forward iterators, and pointers to scalars (calling memmove() in the last case).
Does the standard say anything about how the iterators may be incremented? I can't see it.
7

With the naive implementation, you increment the input iterator n times, not just n - 1 times. This is not just potentially inefficient (since iterators can have arbitrary and arbitrarily expensive user-defined types), but it may also be outright undesirable when the input iterator doesn't support a meaningful "past-the-end" state.

For a simple example, consider reading n elements from std::cin:

#include <iostream>    // for std::cin
#include <iterator>    // for std::istream_iterator

std::istream_iterator<int> it(std::cin);
int dst[3];

With the naive solution, the program blocks on the final increment:

int * p = dst;

for (unsigned int i = 0; i != 3; ++i) { *p++ = *it++; }   // blocks!

The standard library algorithm doesn't block:

#include <algorithm>

std::copy_n(it, 3, dst);    // fine

Note that the standard doesn't actually explicitly speak about iterator increments. It only says (25.3.1/5) that copy_n(first, n, result) has

Effects: For each non-negative integer i < n, performs *(result + i) = *(first + i).

There is only a note in 24.2.3/3:

[input-iterator] algorithms can be used with istreams as the source of the input data through the istream_iterator class template.
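The difference can be observed without blocking on a terminal by reading from a std::istringstream instead of std::cin. The following is a sketch under assumptions: copy_n_careful and copy_n_naive are hypothetical stand-ins for the two implementations discussed in this thread (not the library's own copy_n), and next_after_careful and next_after_naive are helper names made up for this illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Stand-in for the careful implementation: n - 1 increments for n elements.
template <class InputIt, class Size, class OutputIt>
OutputIt copy_n_careful(InputIt first, Size count, OutputIt result) {
    if (count > 0) {
        *result++ = *first;
        for (Size i = 1; i < count; ++i) {
            *result++ = *++first;
        }
    }
    return result;
}

// Stand-in for the naive implementation: n increments for n elements.
template <class InputIt, class Size, class OutputIt>
OutputIt copy_n_naive(InputIt first, Size count, OutputIt result) {
    for (Size i = 0; i < count; ++i) {
        *result++ = *first++;
    }
    return result;
}

// Copies three ints from `text`, then returns the next int still in the stream
// (or -1 if no further int could be read).
int next_after_careful(const std::string& text) {
    std::istringstream in(text);
    int dst[3];
    copy_n_careful(std::istream_iterator<int>(in), 3, dst);
    int next = -1;
    in >> next;
    return next;
}

int next_after_naive(const std::string& text) {
    std::istringstream in(text);
    int dst[3];
    copy_n_naive(std::istream_iterator<int>(in), 3, dst);
    int next = -1;
    in >> next;
    return next;
}
```

With the input "1 2 3 4 5", the careful version leaves 4 as the next value in the stream, while the naive version has already consumed 4 through its extra increment, so the next value read is 5. On an interactive stream such as std::cin, that extra read is what would wait for a fourth value.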

3 Comments

Is an increment cheaper than a condition?
@RanEldan: Iterator increment can be a user-defined operation, so it can be as expensive as you like. And you're doing n comparisons either way!
General comment: My current understanding is that the standard makes no requirements on the state of a stream after it has been subjected to an algorithm that consumes an input iterator for that stream. I don't see anything that forbids the naive implementation. Cubbi's link above to the GCC bug is an interesting example of unexpected behaviour when "reusing" streams (and it is not clear to me that it is an actual bug, rather than a very insidious gotcha).
1

Because of the initial check

if (count > 0)

we know that count > 0, therefore the author of that code felt that he didn't need to test against count again until he reached the value of 1. Remember that "for" executes the conditional test at the start of every iteration, not at the end.

Size count = 1;
for (Size i = 1; i < count; ++i) {
    std::cout << i << std::endl;
}

would print nothing.

Thus the code eliminates a conditional branch, and if count is 1, it eliminates the need to increment/adjust "first" - hence the pre-increment.
