Introduction
The following code aims to be an RFC 4648 compliant implementation of the Base16, Base32 and Base64 conversions, provided as range adaptors and constrained algorithms in C++23. It can be used at compile time or at run time.
I'll take any feedback, but I am especially interested in feedback from people who have experience with std::ranges, functional programming and lazy evaluation. I have not measured performance yet, but I have tried to make the implementation efficient.
I am also interested in any feedback on RFC 4648 compliance.
A common algorithm
This implementation features a single algorithm for encoding, and a single algorithm for decoding. This comes from the realization that Base16, Base32 and Base64 really are a single encoding/decoding, parametrized by an alphabet whose size is a power of 2. From that size, we can calculate:
- The alphabet symbol width. It is the logarithm in base 2 of the alphabet length: 4 for Base16, 5 for Base32 and 6 for Base64.
- The block size in bits. It is the least common multiple of the symbol width and eight: 8 for Base16, 40 for Base32 and 24 for Base64.
- The number of octets and alphabet symbols in a block: 1 and 2 for Base16, 5 and 8 for Base32, 3 and 4 for Base64.
Padding is used during encoding if the number of octets to encode is not a multiple of the number of octets in a block. That can never happen for Base16, whose block is a single octet, but we can still use a common encoding algorithm: the conditions for padding are simply never fulfilled in that case, so no padding is ever generated.
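As a rough illustration of the padding rule, the number of padding characters can be computed from the octets left over in the final block. The function name padding_count and its parameters are assumptions made for this sketch, not part of the code under review.

```cpp
#include <cstddef>

// Hypothetical helper: number of '=' characters appended when encoding
// total_octets octets, for a block of num_octets octets that is encoded as
// num_symbols symbols of symbol_width bits each.
constexpr int padding_count(std::size_t total_octets, int symbol_width,
                            int num_octets, int num_symbols) {
    const auto leftover = static_cast<int>(
        total_octets % static_cast<std::size_t>(num_octets));
    if (leftover == 0) {
        return 0;  // the input fills whole blocks, no padding needed
    }
    // Symbols needed to carry the leftover bits, rounded up.
    const int used_symbols = (8 * leftover + symbol_width - 1) / symbol_width;
    return num_symbols - used_symbols;
}

// Base64: one leftover octet gives "==", two give "=".
static_assert(padding_count(1, 6, 3, 4) == 2);
static_assert(padding_count(2, 6, 3, 4) == 1);
// Base16: a block is a single octet, so there is never anything left over.
static_assert(padding_count(7, 4, 1, 2) == 0);
```

For Base32 the same formula yields 6, 4, 3 and 1 padding characters for 1, 2, 3 and 4 leftover octets, which matches the test vectors further down.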
On the decoding side, there is likewise a single algorithm, which detects the following errors:
1. Illegal character
2. Missing character(s)
3. Illegal padding
4. Non-canonical encoding
Error 1 is trivial: the input contains a character that is neither an alphabet symbol nor the padding character.
Error 2 is when the total number of input characters is not a multiple of the number of alphabet symbols in a block. For Base16, that happens any time we try to decode an odd number of alphabet symbols.
Error 3 is when padding characters ("=") are found where they should not be. For Base16, that is: everywhere (padding is always illegal in the case of Base16).
Error 4 exists because, when padding is used, some encoded bits typically do not correspond to any decoded bits. Those bits are supposed to always be zero. Otherwise, the encoding is deemed non-canonical and, as for the other errors, the input is rejected.
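To make error 4 concrete, here is a simplified sketch that inspects only the last data symbol before the padding. It assumes a valid number of data symbols for the alphabet, and the name is_canonical is hypothetical; the code under review performs its check on the whole shift register instead.

```cpp
// Hypothetical helper: checks that the bits of the last data symbol that do
// not map to any decoded octet are all zero.
// last_symbol_value: index of the last data symbol in the alphabet
// num_data_symbols:  symbols in the final block that are not padding
// symbol_width:      bits per symbol (5 for Base32, 6 for Base64)
constexpr bool is_canonical(unsigned last_symbol_value, int num_data_symbols,
                            int symbol_width) {
    // Bits left over once the data symbols have been cut into whole octets.
    const int unused_bits = (num_data_symbols * symbol_width) % 8;
    const unsigned mask = (1u << unused_bits) - 1u;
    return (last_symbol_value & mask) == 0;
}

// Base64 "TQ==": 'Q' is index 16 (010000), the 4 unused bits are zero.
static_assert(is_canonical(16, 2, 6));
// Base64 "d2==": '2' is index 54 (110110), the 4 unused bits are not zero.
static_assert(not is_canonical(54, 2, 6));
// Base32 "MZ======": 'Z' is index 25 (11001), the 2 unused bits are not zero.
static_assert(not is_canonical(25, 2, 5));
```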
Range adaptor for encoding, constrained algorithm for decoding
Because encoding cannot fail, a range adaptor, which can be part of a range pipeline, is a great solution. Range adaptors, on the other hand, are not well suited to reporting failure. Constrained algorithms, which can return anything, are much better for that. They can for instance return a std::expected, which is ideal for indicating one of several possible failures, and this is the choice I have made here (inspired by that review).
I have tried to follow the standard library API conventions as closely as possible in this implementation.
Range concepts and limitations
I have tried to support all kinds of ranges in this implementation, but have run into a limitation on the encoding side, at least for as long as views::cache_last has not been released. This is because encoding basically splits the input into chunks (of three octets, for instance, in Base64) and transforms each of those chunks into a chunk of alphabet symbols (four in the case of Base64). Except in the case of Base16, most such symbols are calculated from more than one input octet. The easiest way to achieve that is to make several passes over each chunk, which is not possible for input ranges (there are other ways, but they either prevent reusing standard range adaptors or have other drawbacks, such as not preserving forward_range properties).
Based on our overall goals and the previous limitation, this implementation, for:
- Encoding: supports forward_range and better, preserves range properties up to random_access_range, and preserves sized_range.
- Decoding: supports input_range and better (i.e. all kinds of input ranges).
For an implementation that supports input_range Base16 encoding, see a Base16-only implementation that this implementation is rooted in.
Some highlights
A few examples of what this implementation provides, focused on Base64:
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | encode, "Zm9vYmFy"sv));
static_assert(equal(decode("Zm9vYmFy"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | to_bytes));
// "Many hands make light work." in ASCII
constexpr auto many_etc =
"\x{4D}\x{61}\x{6E}\x{79}\x{20}\x{68}\x{61}\x{6E}\x{64}\x{73}\x{20}\x{6D}\x{61}\x{6B}\x{65}\x{20}"sv
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv;
static_assert(
std::ranges::random_access_range<decltype(many_etc | encode)>);
static_assert(std::ranges::sized_range<decltype(many_etc | encode)>);
static_assert(std::ranges::size(many_etc | encode) == 36);
// Error handling
static_assert(not decode("TWFuT"sv).has_value() and
decode("TWFuT"sv).error() ==
decodexx_error::missing_character);
static_assert(not decode("bGlnaHQgd2=="sv).has_value() and
decode("bGlnaHQgd2=="sv).error() ==
decodexx_error::non_canonical);
Source code
Quite extensive unit tests are provided. Also: run on godbolt.
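#include <algorithm>
#include <array>
#include <bit>
#include <cassert>
#include <concepts>
#include <cstddef>
#include <expected>
#include <functional>
#include <iterator>
#include <limits>
#include <numeric>
#include <ranges>
#include <sstream>
#include <string_view>
#include <utility>
#include <vector>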
enum class decodexx_error : signed char {
illegal_character,
missing_character,
illegal_padding,
non_canonical,
};
template <typename I, typename O>
struct decodexx_error_result {
std::ranges::in_out_result<I, O> in_out_result;
decodexx_error error;
};
namespace detail {
namespace basexx {
constexpr auto padding = '=';
enum class identifier : unsigned char {
base16,
base32,
base32hex,
base64,
};
template <identifier id>
struct alphabet {};
template <identifier id>
inline constexpr auto alphabet_v = alphabet<id>::value;
template <>
struct alphabet<identifier::base16> {
static constexpr auto value = std::string_view{"0123456789ABCDEF"};
};
template <>
struct alphabet<identifier::base32> {
static constexpr auto value =
std::string_view{"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"};
};
template <>
struct alphabet<identifier::base32hex> {
static constexpr auto value =
std::string_view{"0123456789ABCDEFGHIJKLMNOPQRSTUV"};
};
template <>
struct alphabet<identifier::base64> {
static constexpr auto value = std::string_view{
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"};
};
template <identifier id>
inline constexpr std::size_t alphabet_size{alphabet_v<id>.size()};
template <identifier id>
struct shift_register {};
template <identifier id>
using shift_register_t = shift_register<id>::type;
template <>
struct shift_register<identifier::base16> {
using type = unsigned char;
};
template <identifier id>
requires(id == identifier::base32 or id == identifier::base32hex)
struct shift_register<id> {
using type = unsigned long long;
};
template <>
struct shift_register<identifier::base64> {
using type = unsigned long;
};
template <identifier id>
struct alphabet_properties {
static constexpr int width{std::countr_zero(alphabet_size<id>)};
static constexpr int octet_width{8};
static constexpr int block_width{std::lcm(width, octet_width)};
static constexpr int block_num_octets{block_width / octet_width};
static constexpr int block_num_chars{block_width / width};
static_assert(static_cast<int>(sizeof(shift_register_t<id>)) >=
block_num_octets);
};
template <identifier id>
struct encode : std::ranges::range_adaptor_closure<encode<id>> {
template <std::ranges::viewable_range R>
requires(
std::ranges::forward_range<R> and
std::convertible_to<std::ranges::range_reference_t<R>, std::byte>)
constexpr auto operator()(R &&r) const {
using props = basexx::alphabet_properties<id>;
return std::views::cartesian_product(
std::forward<R>(r) |
std::views::chunk(props::block_num_octets),
std::views::iota(0,
static_cast<int>(props::block_num_chars)) |
std::views::reverse) |
std::views::transform([](auto indexed_chunk) -> char {
auto [chunk, i] = indexed_chunk;
auto j = props::block_num_octets - 1;
unsigned char c{};
for (const auto octet : chunk) {
c |= std::rotr(
static_cast<basexx::shift_register_t<id>>(octet),
(props::width * i) - (props::octet_width * j--));
}
return i < (props::block_num_chars -
(props::octet_width *
(props::block_num_octets - j - 1) +
props::width - 1) /
props::width)
? basexx::padding
: alphabet_v<id>.at(c & (alphabet_size<id> - 1));
});
}
};
} // namespace basexx
namespace decodexx {
template <basexx::identifier id>
class max_size {
static constexpr auto input_size_to_max_size(std::size_t input_size)
-> std::expected<std::size_t, decodexx_error> {
using props = basexx::alphabet_properties<id>;
if ((input_size % props::block_num_chars) != 0) {
return std::unexpected{decodexx_error::missing_character};
}
return input_size / props::block_num_chars * props::block_num_octets;
}
public:
template <typename R>
requires(std::ranges::sized_range<R> or std::ranges::forward_range<R>)
// NOLINTNEXTLINE(cppcoreguidelines-missing-std-forward)
constexpr auto operator()(R &&r) const
-> std::expected<std::size_t, decodexx_error> {
if constexpr (std::ranges::sized_range<R>) {
return input_size_to_max_size(std::ranges::size(r));
} else {
return input_size_to_max_size(std::ranges::distance(r));
}
}
};
} // namespace decodexx
namespace basexx {
template <identifier id>
struct try_decode {
template <std::input_iterator I, std::sentinel_for<I> S,
std::weakly_incrementable O, typename Proj = std::identity>
requires(std::convertible_to<std::indirect_result_t<Proj, I>, char> and
std::indirectly_writable<O, std::byte>)
constexpr auto operator()(I first, S last, O result, Proj proj = {}) const
-> std::expected<std::ranges::in_out_result<I, O>,
decodexx_error_result<I, O>> {
using namespace std::views;
using lookup_t =
std::array<unsigned char,
std::numeric_limits<unsigned char>::max() + 1>;
// The following flags would not work for every alphabet size, but they
// work for sizes up to 64, which is enough for us.
static constexpr auto is_valid = 0x40;
static constexpr auto is_padding = 0x80;
static constexpr auto lookup = [] -> lookup_t {
lookup_t a{};
unsigned char i{};
for (const unsigned char c : basexx::alphabet_v<id>) {
a.at(c) = is_valid | i++;
}
a.at(static_cast<unsigned char>('=')) = is_padding | is_valid;
return a;
}();
auto make_error_result =
[first = std::move(first),
result = std::move(result)](decodexx_error error) mutable {
return std::unexpected{decodexx_error_result<I, O>{
.in_out_result = {std::move(first), std::move(result)},
.error = error,
}};
};
while (first != last) {
using props = alphabet_properties<id>;
basexx::shift_register_t<id> shift_register{};
auto num_padding = 0;
auto i = 0;
for (i = 0; i < props::block_num_chars and first != last;
++i, ++first) {
const auto bitset = lookup.at(
static_cast<unsigned char>(std::invoke(proj, *first)));
if (not(bitset & is_valid)) {
return make_error_result(decodexx_error::illegal_character);
}
const auto bitset_is_padding = static_cast<int>(bitset >> 7);
if ((i < 2 and bitset_is_padding) or
(num_padding and not bitset_is_padding)) {
return make_error_result(decodexx_error::illegal_padding);
}
num_padding += bitset_is_padding;
shift_register = (shift_register << props::width) |
(bitset & (alphabet_size<id> - 1));
}
if (i != props::block_num_chars) {
return make_error_result(decodexx_error::missing_character);
}
const int num_missing_octets{
(num_padding * props::width + props::octet_width - 1) /
props::octet_width};
if (shift_register &
((decltype(shift_register){1}
<< (num_missing_octets * props::octet_width)) -
1)) {
return make_error_result(decodexx_error::non_canonical);
}
for (i = 0; i < props::block_num_octets - num_missing_octets; ++i) {
*result++ = static_cast<std::byte>(
shift_register >>
((props::block_num_octets - i - 1) * props::octet_width));
}
}
return std::ranges::in_out_result<I, O>{std::move(first),
std::move(result)};
}
template <std::ranges::input_range R, std::weakly_incrementable O,
typename Proj = std::identity>
requires(std::convertible_to<
std::indirect_result_t<Proj, std::ranges::iterator_t<R>>,
char> and
std::indirectly_writable<O, std::byte>)
// NOLINTNEXTLINE(cppcoreguidelines-missing-std-forward)
constexpr auto operator()(R &&r, O result, Proj proj = {}) const
-> std::expected<
std::ranges::in_out_result<std::ranges::borrowed_iterator_t<R>, O>,
decodexx_error_result<std::ranges::borrowed_iterator_t<R>, O>> {
return (*this)(std::ranges::begin(r), std::ranges::end(r),
std::move(result), std::move(proj));
}
};
template <identifier id, template <typename> typename C>
struct try_decode_to {
template <std::ranges::input_range R, typename Proj = std::identity>
requires std::convertible_to<
std::indirect_result_t<Proj, std::ranges::iterator_t<R>>, char>
constexpr auto operator()(R &&r, Proj proj = {}) const
-> std::expected<C<std::byte>, decodexx_error>;
};
template <identifier id>
struct try_decode_to<id, std::vector> {
template <std::ranges::input_range R, typename Proj = std::identity>
requires std::convertible_to<
std::indirect_result_t<Proj, std::ranges::iterator_t<R>>, char>
constexpr auto operator()(R &&r, Proj proj = {}) const
-> std::expected<std::vector<std::byte>, decodexx_error> {
std::vector<std::byte> v{};
return try_decode<id>{}(std::forward<R>(r),
std::back_insert_iterator{v}, std::move(proj))
.transform(
[v = std::move(v)](auto &&) mutable { return std::move(v); })
.transform_error([](const auto &error) { return error.error; });
}
template <std::ranges::input_range R, typename Proj = std::identity>
requires(std::convertible_to<
std::indirect_result_t<Proj, std::ranges::iterator_t<R>>,
char> and
(std::ranges::sized_range<R> or std::ranges::forward_range<R>))
constexpr auto operator()(R &&r, Proj proj = {}) const
-> std::expected<std::vector<std::byte>, decodexx_error> {
const auto size = decodexx::max_size<id>{}(std::forward<R>(r));
if (not size.has_value()) {
return std::unexpected{size.error()};
}
std::vector<std::byte> v(*size);
const auto result =
try_decode<id>{}(std::forward<R>(r), v.begin(), std::move(proj));
if (result.has_value()) {
v.resize(result->out - v.begin());
}
return result
.transform(
[v = std::move(v)](auto &&) mutable { return std::move(v); })
.transform_error([](const auto &error) { return error.error; });
}
};
template <identifier id, template <typename> typename C>
requires std::default_initializable<C<std::byte>>
struct decode_to {
template <std::ranges::input_range R, typename Proj = std::identity>
requires std::convertible_to<
std::indirect_result_t<Proj, std::ranges::iterator_t<R>>, char>
constexpr auto operator()(R &&r, Proj proj = {}) const -> C<std::byte> {
return try_decode_to<id, C>{}(std::forward<R>(r), std::move(proj))
.value_or(C<std::byte>{});
}
};
} // namespace basexx
} // namespace detail
inline constexpr detail::basexx::encode<detail::basexx::identifier::base16>
encode16{};
inline constexpr detail::basexx::encode<detail::basexx::identifier::base32>
encode32{};
inline constexpr detail::basexx::encode<detail::basexx::identifier::base32hex>
encode32hex{};
inline constexpr detail::basexx::encode<detail::basexx::identifier::base64>
encode64{};
inline constexpr detail::decodexx::max_size<detail::basexx::identifier::base16>
decode16_max_size{};
inline constexpr detail::decodexx::max_size<detail::basexx::identifier::base32>
decode32_max_size{};
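inline constexpr detail::decodexx::max_size<
detail::basexx::identifier::base32hex>
decode32hex_max_size{};
inline constexpr detail::decodexx::max_size<detail::basexx::identifier::base64>
decode64_max_size{};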
inline constexpr detail::basexx::try_decode<detail::basexx::identifier::base16>
try_decode16{};
inline constexpr detail::basexx::try_decode<detail::basexx::identifier::base32>
try_decode32{};
inline constexpr detail::basexx::try_decode<
detail::basexx::identifier::base32hex>
try_decode32hex{};
inline constexpr detail::basexx::try_decode<detail::basexx::identifier::base64>
try_decode64{};
inline constexpr detail::basexx::try_decode_to<
detail::basexx::identifier::base16, std::vector>
try_decode16_to_vector{};
inline constexpr detail::basexx::try_decode_to<
detail::basexx::identifier::base32, std::vector>
try_decode32_to_vector{};
inline constexpr detail::basexx::try_decode_to<
detail::basexx::identifier::base32hex, std::vector>
try_decode32hex_to_vector{};
inline constexpr detail::basexx::try_decode_to<
detail::basexx::identifier::base64, std::vector>
try_decode64_to_vector{};
inline constexpr detail::basexx::decode_to<detail::basexx::identifier::base16,
std::vector>
decode16_to_vector{};
inline constexpr detail::basexx::decode_to<detail::basexx::identifier::base32,
std::vector>
decode32_to_vector{};
inline constexpr detail::basexx::decode_to<
detail::basexx::identifier::base32hex, std::vector>
decode32hex_to_vector{};
inline constexpr detail::basexx::decode_to<detail::basexx::identifier::base64,
std::vector>
decode64_to_vector{};
namespace {
using namespace std::string_view_literals;
using std::ranges::equal;
constexpr auto to_bytes = std::views::transform(
[](auto num) -> std::byte { return static_cast<std::byte>(num); });
void test_encode16() {
static constexpr auto encode = to_bytes | encode16;
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(equal(""sv | encode, ""sv));
static_assert(equal("\x{66}"sv | encode, "66"sv));
static_assert(equal("\x{66}\x{6F}"sv | encode, "666F"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}"sv | encode, "666F6F"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}"sv | encode, "666F6F62"sv));
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | encode, "666F6F6261"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | encode,
"666F6F626172"sv));
}
void test_decode16() {
// RFC 4648 test vectors ("foobar" in ASCII)
static constexpr auto decode = decode16_to_vector;
static_assert(equal(decode(""sv), ""sv | to_bytes));
static_assert(equal(decode("66"sv), "\x{66}"sv | to_bytes));
static_assert(equal(decode("666F"sv), "\x{66}\x{6F}"sv | to_bytes));
static_assert(equal(decode("666F6F"sv), "\x{66}\x{6F}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("666F6F62"sv), "\x{66}\x{6F}\x{6F}\x{62}"sv | to_bytes));
static_assert(equal(decode("666F6F6261"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | to_bytes));
static_assert(equal(decode("666F6F626172"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | to_bytes));
}
void test_decode16_error_handling() {
static constexpr auto decode = try_decode16_to_vector;
static_assert(not decode("Z123"sv).has_value() and
decode("Z123"sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("\x{00}123"sv).has_value() and
decode("\x{00}123"sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("\x{FF}123"sv).has_value() and
decode("\x{FF}123"sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("A1234"sv).has_value() and
decode("A1234"sv).error() ==
decodexx_error::missing_character);
// Note: flagging illegal padding, as opposed to illegal character, could
// seem strange in the case of Base16, since padding is never needed in that
// case. But RFC 4648 specifies "=" as the general padding character, so
// flagging its use in Base16 as illegal padding is not a violation of the
// standard. On the other hand, the non-canonical error cannot occur in
// Base16, because it can only occur if actual padding is used.
static_assert(not decode("1=34"sv).has_value() and
decode("1=34"sv).error() == decodexx_error::illegal_padding);
static_assert(not decode("12=4"sv).has_value() and
decode("12=4"sv).error() == decodexx_error::illegal_padding);
}
void test_encode32() {
static constexpr auto encode = to_bytes | encode32;
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(equal(""sv | encode, ""sv));
static_assert(equal("\x{66}"sv | encode, "MY======"sv));
static_assert(equal("\x{66}\x{6F}"sv | encode, "MZXQ===="sv));
static_assert(equal("\x{66}\x{6F}\x{6F}"sv | encode, "MZXW6==="sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}"sv | encode, "MZXW6YQ="sv));
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | encode, "MZXW6YTB"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | encode,
"MZXW6YTBOI======"sv));
}
void test_decode32() {
// RFC 4648 test vectors ("foobar" in ASCII)
static constexpr auto decode = decode32_to_vector;
static_assert(equal(decode(""sv), ""sv | to_bytes));
static_assert(equal(decode("MY======"sv), "\x{66}"sv | to_bytes));
static_assert(equal(decode("MZXQ===="sv), "\x{66}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("MZXW6==="sv), "\x{66}\x{6F}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("MZXW6YQ="sv), "\x{66}\x{6F}\x{6F}\x{62}"sv | to_bytes));
static_assert(equal(decode("MZXW6YTB"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | to_bytes));
static_assert(equal(decode("MZXW6YTBOI======"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | to_bytes));
}
void test_decode32_error_handling() {
static constexpr auto decode = try_decode32_to_vector;
static_assert(not decode("MYVA\x{00}LUE"sv).has_value() and
decode("MYVA\x{00}LUE"sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("MYVA\x{FF}LUE"sv).has_value() and
decode("MYVA\x{FF}LUE"sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("MYV\x{FF}LUE"sv).has_value() and
decode("MYV\x{FF}LUE"sv).error() ==
decodexx_error::missing_character);
static_assert(not decode("M=YV\x{FF}LUE"sv).has_value() and
decode("M=YV\x{FF}LUE"sv).error() ==
decodexx_error::illegal_padding);
static_assert(not decode("MY==A==="sv).has_value() and
decode("MY==A==="sv).error() ==
decodexx_error::illegal_padding);
static_assert(not decode("MZ======"sv).has_value() and
decode("MZ======"sv).error() ==
decodexx_error::non_canonical);
static_assert(not decode("M7======"sv).has_value() and
decode("M7======"sv).error() ==
decodexx_error::non_canonical);
static_assert(not decode("MY======A"sv).has_value() and
decode("MY======A"sv).error() ==
decodexx_error::missing_character);
}
void test_encode32hex() {
static constexpr auto encode = to_bytes | encode32hex;
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(equal(""sv | encode, ""sv));
static_assert(equal("\x{66}"sv | encode, "CO======"sv));
static_assert(equal("\x{66}\x{6F}"sv | encode, "CPNG===="sv));
static_assert(equal("\x{66}\x{6F}\x{6F}"sv | encode, "CPNMU==="sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}"sv | encode, "CPNMUOG="sv));
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | encode, "CPNMUOJ1"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | encode,
"CPNMUOJ1E8======"sv));
}
void test_decode32hex() {
// RFC 4648 test vectors ("foobar" in ASCII)
static constexpr auto decode = decode32hex_to_vector;
static_assert(equal(decode(""sv), ""sv | to_bytes));
static_assert(equal(decode("CO======"sv), "\x{66}"sv | to_bytes));
static_assert(equal(decode("CPNG===="sv), "\x{66}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("CPNMU==="sv), "\x{66}\x{6F}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("CPNMUOG="sv), "\x{66}\x{6F}\x{6F}\x{62}"sv | to_bytes));
static_assert(equal(decode("CPNMUOJ1"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | to_bytes));
static_assert(equal(decode("CPNMUOJ1E8======"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | to_bytes));
}
void test_encode64() {
static constexpr auto encode = to_bytes | encode64;
// "Man" in ASCII
static_assert(equal("\x{4D}\x{61}\x{6E}"sv | encode, "TWFu"sv));
static_assert(equal("\x{4D}\x{61}"sv | encode, "TWE="sv));
static_assert(equal("\x{4D}"sv | encode, "TQ=="sv));
// "light work." in ASCII
static_assert(equal(
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv |
encode,
"bGlnaHQgd29yay4="sv));
static_assert(
equal("\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}"sv |
encode,
"bGlnaHQgd29yaw=="sv));
static_assert(equal(
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}"sv | encode,
"bGlnaHQgd29y"sv));
static_assert(
equal("\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}"sv | encode,
"bGlnaHQgd28="sv));
static_assert(equal("\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}"sv | encode,
"bGlnaHQgdw=="sv));
// "Many hands make light work." in ASCII
constexpr auto many_etc =
"\x{4D}\x{61}\x{6E}\x{79}\x{20}\x{68}\x{61}\x{6E}\x{64}\x{73}\x{20}\x{6D}\x{61}\x{6B}\x{65}\x{20}"sv
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv;
static_assert(
equal(many_etc | encode, "TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu"sv));
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(equal(""sv | encode, ""sv));
static_assert(equal("\x{66}"sv | encode, "Zg=="sv));
static_assert(equal("\x{66}\x{6F}"sv | encode, "Zm8="sv));
static_assert(equal("\x{66}\x{6F}\x{6F}"sv | encode, "Zm9v"sv));
static_assert(equal("\x{66}\x{6F}\x{6F}\x{62}"sv | encode, "Zm9vYg=="sv));
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | encode, "Zm9vYmE="sv));
static_assert(
equal("\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | encode, "Zm9vYmFy"sv));
}
void test_encode64_preservation_of_range_properties() {
static constexpr auto encode = to_bytes | encode64;
// "Many hands make light work." in ASCII
constexpr auto many_etc =
"\x{4D}\x{61}\x{6E}\x{79}\x{20}\x{68}\x{61}\x{6E}\x{64}\x{73}\x{20}\x{6D}\x{61}\x{6B}\x{65}\x{20}"sv
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv;
static_assert(
std::ranges::random_access_range<decltype(many_etc | encode)>);
static_assert(std::ranges::sized_range<decltype(many_etc | encode)>);
static_assert(std::ranges::size(many_etc | encode) == 36);
}
void test_decode64() {
static constexpr auto decode = decode64_to_vector;
// "Man" in ASCII
static_assert(equal(decode("TWFu"sv), "\x{4D}\x{61}\x{6E}"sv | to_bytes));
static_assert(equal(decode("TWE="sv), "\x{4D}\x{61}"sv | to_bytes));
static_assert(equal(decode("TQ=="sv), "\x{4D}"sv | to_bytes));
// "light work." in ASCII
static_assert(equal(
decode("bGlnaHQgd29yay4="sv),
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv |
to_bytes));
static_assert(
equal(decode("bGlnaHQgd29yaw=="sv),
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}"sv |
to_bytes));
static_assert(equal(
decode("bGlnaHQgd29y"sv),
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}"sv | to_bytes));
static_assert(
equal(decode("bGlnaHQgd28="sv),
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("bGlnaHQgdw=="sv),
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}"sv | to_bytes));
// "Many hands make light work." in ASCII
constexpr auto many_etc =
"\x{4D}\x{61}\x{6E}\x{79}\x{20}\x{68}\x{61}\x{6E}\x{64}\x{73}\x{20}\x{6D}\x{61}\x{6B}\x{65}\x{20}"sv
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv;
static_assert(equal(decode("TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu"sv),
many_etc | to_bytes));
// RFC 4648 test vectors ("foobar" in ASCII)
static_assert(equal(decode(""sv), ""sv | to_bytes));
static_assert(equal(decode("Zg=="sv), "\x{66}"sv | to_bytes));
static_assert(equal(decode("Zm8="sv), "\x{66}\x{6F}"sv | to_bytes));
static_assert(equal(decode("Zm9v"sv), "\x{66}\x{6F}\x{6F}"sv | to_bytes));
static_assert(
equal(decode("Zm9vYg=="sv), "\x{66}\x{6F}\x{6F}\x{62}"sv | to_bytes));
static_assert(equal(decode("Zm9vYmE="sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}"sv | to_bytes));
static_assert(equal(decode("Zm9vYmFy"sv),
"\x{66}\x{6F}\x{6F}\x{62}\x{61}\x{72}"sv | to_bytes));
}
void test_decode64_error_handling() {
static constexpr auto decode = try_decode64_to_vector;
static_assert(not decode("TWFuT"sv).has_value() and
decode("TWFuT"sv).error() ==
decodexx_error::missing_character);
static_assert(not decode("bGlnaHQgd2=="sv).has_value() and
decode("bGlnaHQgd2=="sv).error() ==
decodexx_error::non_canonical);
static_assert(not decode("bGlnaH\x{00}gd28="sv).has_value() and
decode("bGlnaH\x{00}gd28="sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("bGlnaH\x{FF}gd28="sv).has_value() and
decode("bGlnaH\x{FF}gd28="sv).error() ==
decodexx_error::illegal_character);
static_assert(not decode("bGln=HQgd28="sv).has_value() and
decode("bGln=HQgd28="sv).error() ==
decodexx_error::illegal_padding);
}
void test_decode64_input_range() {
std::istringstream many_etc{"TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu"};
auto many_etc_view = std::views::istream<char>(many_etc);
assert(equal(
decode64_to_vector(many_etc_view),
"\x{4D}\x{61}\x{6E}\x{79}\x{20}\x{68}\x{61}\x{6E}\x{64}\x{73}\x{20}\x{6D}\x{61}\x{6B}\x{65}\x{20}"sv
"\x{6C}\x{69}\x{67}\x{68}\x{74}\x{20}\x{77}\x{6F}\x{72}\x{6B}\x{2E}"sv |
to_bytes));
}
void test_decode64_input_range_error_handling() {
std::istringstream missing_character{"TWFuT"};
auto missing_character_view = std::views::istream<char>(missing_character);
const auto missing_char_result =
try_decode64_to_vector(missing_character_view);
assert(not missing_char_result.has_value());
assert(missing_char_result.error() == decodexx_error::missing_character);
}
void test_encode64_decode64() {
// Note: the try_decode functions return updated input and output iterators
// when they are successful. If the passed range to decode is an automatic
// variable, the returned iterators would be dangling. In such a case, even
// if we use the decode functions, which throw away those iterators, the
// decoded expression is not a constant expression. In order to avoid that
// problem, we need to ensure that the encoded range still is in scope when
// the decoding function returns. Hence the somewhat verbose code below.
constexpr auto man = "\x{4D}\x{61}\x{6E}"sv;
constexpr auto encoded = man | to_bytes | encode64;
static_assert(equal(decode64_to_vector(encoded), man | to_bytes));
}
void test_decode64_encode64() {
constexpr auto encoded = "TWFu"sv;
static_assert(equal(decode64_to_vector(encoded) | encode64, encoded));
}
} // namespace
auto main() -> int {
test_encode16();
test_decode16();
test_decode16_error_handling();
test_encode32();
test_decode32();
test_decode32_error_handling();
test_encode32hex();
test_decode32hex();
// Note: no explicit check for Base32 Hex error handling, since only the
// alphabet symbols differ from regular Base32.
test_encode64();
test_decode64();
test_decode64_error_handling();
// "Special functionality" only tested for Base64 to avoid even more
// verbosity. Base16 and Base32 should give the same results. Also worth
// noting: input range encoding is at the time of writing not provided,
// because the best solution to run our algorithm without multi-pass would
// be to use views::cache_last between views::transform and
// views::cartesian_product, but that facility at the time of writing is
// only proposed for standardization
// (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3138r0.html).
// Also, using views::cache_last will not be sufficient: in order for our
// implementation not to be ill-formed with input ranges, we will also need
// to implement a variant of std::transform that does not require
// equality-preservation (see
// https://stackoverflow.com/questions/79069912/stdviewschunk-stdviewstransform-inputs-ranges-and-ill-formedness).
test_encode64_preservation_of_range_properties();
test_decode64_input_range();
test_decode64_input_range_error_handling();
test_encode64_decode64();
test_decode64_encode64();
return 0;
}