
Simple Problem Statement:

Is it possible to have an array of a custom-size data type (3/5/6/7 bytes) in C or Cython?

Background:

I have run across a major memory inefficiency while attempting to code a complex algorithm. The algorithm calls for the storage of an enormous amount of data. All the data is arranged in a contiguous block of memory (like an array). The data is simply a very long list of (usually very large) numbers, and the type of the numbers in a given list is constant: they operate much like a regular C array, where all elements are of the same type.

Problem:

Sometimes it is not efficient to store each number in a standard data size. The normal data types are char, short, int, long, etc. However, if I use an int array to store numbers whose range fits in 3 bytes, I lose 1 byte on each number. This is extremely inefficient, and when you are storing millions of numbers the effect is memory-breaking. There is unfortunately no other way to implement the solution to the algorithm, and I believe a rough implementation of a custom data size is the only way to do this.

What I tried:

I have tried to use char arrays to complete this task, but converting between groups of 0-255 byte values and the larger data type they represent is inefficient in most cases. There is a mathematical method of packing chars into a larger number, and of dividing that larger number back out into its individual chars. Here was an extremely inefficient attempt at this, written in Cython:

def to_bytes(long long number, int length):
    cdef:
        list chars = []
        long long m
        long long d
    
    for _ in range(length):
        m = number % 256
        d = number // 256
        chars.append(m)
        number = d
    
    cdef bytearray binary = bytearray(chars)
    binary = binary[::-1]
    return binary

def from_bytes(string):
    # note: str.encode('hex') is Python 2 only; on Python 3 this would be
    # int.from_bytes(string, 'big')
    cdef long long d = int(str(string).encode('hex'), 16)
    return d

Keep in mind I don't exactly want improvements to this algorithm, but a fundamental way of declaring an array of a certain data type, so I don't have to do this transformation.

  • Maybe I'm missing it, but not using 1 byte for millions of entries is only on the order of megabytes. On modern architectures this is pretty acceptable; is there something more you are talking about? Commented Jul 6, 2014 at 2:06
  • Ah yes, there is not just one array. Sometimes there is more than a standard 8 GB of RAM can handle, especially when the data only requires 5 bytes but uses 8 bytes (long) to store it. I am trying to get the most efficient implementation of a specific size of data in a given array, so that when there are many of these arrays there is no problem. Commented Jul 6, 2014 at 2:11
  • The difficulty isn't just storing the bytes; it's storing the size of each element. It sounds like (and correct me if I'm wrong) you're looking to store each value in the fewest octets needed to represent that value, not the number of octets needed to represent all values of that type, with the hope of increased memory efficiency at the price of potential alignment penalties. If that is the case, a size for each element is needed, and that is not cheap (if you've never coded ASN.1 DER/BER, you don't know what you're missing). Commented Jul 6, 2014 at 2:23
  • Some languages use the keyword packed or such to enforce speed-inefficient but space-efficient code/data. Venture carefully into those dark packed woods though; you may not come out. Commented Jul 6, 2014 at 3:40
  • Not that this isn't an interesting question, but if you're running out of RAM at only 8 GB, it seems like the easy solution would be to buy another stick or rent some time on better hardware. Either option is surprisingly affordable these days. Commented Jul 11, 2014 at 1:41

6 Answers


I think the important question is whether you need access to all the data at the same time.

If you only need to access one chunk of data at a time

If you only need to access one array at a time, then one Pythonic possibility is to use NumPy arrays with data type uint8 and as many columns as the value width requires. When you need to operate on the data, you expand the compressed data into a wider type (here 3-octet numbers into uint32):

import numpy as np

# in this example `compressed` is an Nx3 array of octets (`uint8`)
# (the dtype matters: without it np.empty would give float64 and the
# uint32 view below would fail)
expanded = np.empty((compressed.shape[0], 4), dtype=np.uint8)
expanded[:, :3] = compressed
expanded[:, 3] = 0
expanded = expanded.view('uint32').reshape(-1)

Then the operations are performed on expanded which is a 1-d vector of N uint32 values.

After we are done, the data can be saved back:

# recompress
compressed[:] = expanded.view('uint8').reshape(-1,4)[:,:3]

The time taken for each direction is (on my machine, in Python) approximately 8 ns per element for the example above. Using Cython may not give much performance advantage here, because almost all the time is spent copying the data between buffers somewhere in the dark depths of NumPy.

This is a high one-time cost, but if you are planning to access each element at least once, it is probably less expensive to pay the one-time cost than a similar cost for each operation.


Of course, the same approach can be taken in C:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <sys/resource.h>

#define NUMITEMS 10000000

int main(void)
    {
    uint32_t *expanded;
    uint8_t * cmpressed, *exp_as_octets;
    struct rusage ru0, ru1;
    uint8_t *ep, *cp, *end;
    double time_delta;

    // create some compressed data
    cmpressed = (uint8_t *)malloc(NUMITEMS * 3);

    getrusage(RUSAGE_SELF, &ru0);

    // allocate the buffer and copy the data
    exp_as_octets = (uint8_t *)malloc(NUMITEMS * 4);
    end = exp_as_octets + NUMITEMS * 4;
    ep = exp_as_octets;
    cp = cmpressed;
    while (ep < end)
        {
        // copy three octets out of four
        *ep++ = *cp++;
        *ep++ = *cp++;
        *ep++ = *cp++;
        *ep++ = 0;
        }
    expanded = (uint32_t *)exp_as_octets;

    getrusage(RUSAGE_SELF, &ru1);
    printf("Uncompress\n");
    time_delta = ru1.ru_utime.tv_sec + ru1.ru_utime.tv_usec * 1e-6 
               - ru0.ru_utime.tv_sec - ru0.ru_utime.tv_usec * 1e-6;
    printf("User: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);
    time_delta = ru1.ru_stime.tv_sec + ru1.ru_stime.tv_usec * 1e-6 
               - ru0.ru_stime.tv_sec - ru0.ru_stime.tv_usec * 1e-6;
    printf("System: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);

    getrusage(RUSAGE_SELF, &ru0);
    // compress back
    ep = exp_as_octets;
    cp = cmpressed;
    while (ep < end)
       {
       *cp++ = *ep++;
       *cp++ = *ep++;
       *cp++ = *ep++;
       ep++;
       }
    getrusage(RUSAGE_SELF, &ru1);
    printf("Compress\n");
    time_delta = ru1.ru_utime.tv_sec + ru1.ru_utime.tv_usec * 1e-6 
               - ru0.ru_utime.tv_sec - ru0.ru_utime.tv_usec * 1e-6;
    printf("User: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);
    time_delta = ru1.ru_stime.tv_sec + ru1.ru_stime.tv_usec * 1e-6 
               - ru0.ru_stime.tv_sec - ru0.ru_stime.tv_usec * 1e-6;
    printf("System: %.6lf seconds, %.2lf nanoseconds per element\n", time_delta, 1e9 * time_delta / NUMITEMS);
    }

This reports:

Uncompress
 User: 0.022650 seconds, 2.27 nanoseconds per element
 System: 0.016171 seconds, 1.62 nanoseconds per element
Compress
 User: 0.011698 seconds, 1.17 nanoseconds per element
 System: 0.000018 seconds, 0.00 nanoseconds per element

The code was compiled with gcc -Ofast and is probably relatively close to the optimal speed. The system time is spent with the malloc. To my eye this looks pretty fast, as we are doing memory reads at 2-3 GB/s. (Which also means that while making the code multi-threaded would be easy, there may not be much speed benefit.)

If you want to have the best performance, you'll need to code the compression/decompression routines for each data width separately. (I do not promise the C code above is the absolutely fastest on any machine, I did not take a look at the machine code.)

If you need to random access separate values

If you, instead, need to access only one value here and another there, Python will not offer any reasonably fast methods, as the array-lookup overhead is huge.

In this case I suggest you write C routines to fetch values and put them back. See technosaurus's answer. There are a lot of tricks, but the alignment problems cannot be avoided.

One useful trick when reading the odd-sized array might be (here reading 3 octets from an octet array compressed into a uint32_t value):

value = *(uint32_t *)&compressed[3 * n] & 0x00ffffff;

Then someone else will take care of the possible misalignment, and there'll be one octet of garbage at the end. Unfortunately this cannot be used when writing the values. And - again - this may or may not be faster than any of the other alternatives.
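As a sketch of such fetch/put routines (the helper names are made up; memcpy sidesteps the alignment problem, and the decode assumes a little-endian host):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accessors for an array of unsigned 3-octet values stored
 * little-endian.  memcpy keeps the accesses alignment-safe. */
static uint32_t get_u24(const uint8_t *arr, size_t n)
{
    uint32_t v = 0;
    memcpy(&v, arr + 3 * n, 3);  /* low three octets; top octet stays 0 */
    return v;                    /* assumes a little-endian host */
}

static void put_u24(uint8_t *arr, size_t n, uint32_t v)
{
    memcpy(arr + 3 * n, &v, 3);  /* write only the low three octets */
}
```

A good compiler turns these memcpy calls into plain unaligned loads and stores on x86, so this is usually no slower than the pointer-cast trick, and it also works for writes.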




In C you can define a custom data type to handle the complexities with an arbitrary byte size:

typedef struct byte3 { char x[3]; } byte3;   /* C identifiers may not start with a digit */

You are then able to do all the nice things like passing by value, getting the correct size_t, and creating an array of this type.
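For illustration, a minimal sketch of this (the struct tag is byte3, since C identifiers may not start with a digit; the pack/unpack helpers are hypothetical additions, and their explicit shifts make them byte-order independent):

```c
#include <assert.h>
#include <stdint.h>

/* A 3-byte value type: sizeof(byte3) == 3, and arrays of it are packed. */
typedef struct byte3 { char x[3]; } byte3;

/* Hypothetical helpers storing the value as three little-endian octets. */
static byte3 byte3_pack(uint32_t v)
{
    byte3 b;
    b.x[0] = (char)(v & 0xFF);
    b.x[1] = (char)((v >> 8) & 0xFF);
    b.x[2] = (char)((v >> 16) & 0xFF);
    return b;
}

static uint32_t byte3_unpack(byte3 b)
{
    /* casts through uint8_t avoid sign extension from a signed char */
    return (uint32_t)(uint8_t)b.x[0]
         | ((uint32_t)(uint8_t)b.x[1] << 8)
         | ((uint32_t)(uint8_t)b.x[2] << 16);
}
```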

2 Comments

  • @WhozCraig: No they won't. It doesn't have any alignment requirements; padding is useless. Ideone demo with a size of 3: ideone.com/FNcWxf (We do have to change the name, though, since it's not allowed to start with a digit.)
  • @user2357112 you're completely correct. I seriously need to stop posting comments without appropriate caffeination. A single 3-char member has no alignment specs, even sequenced in an array of these. Nice catch; dropping erroneous comment, thanks for keeping me honest.

You can use a packed bitfield. On GCC, that would look like

typedef struct __attribute__((__packed__)) {
    int x : 24;
} int24;

For an int24 x, x.x behaves pretty much like a 24-bit int. You can make an array of these, and it won't have any unnecessary padding. Note that this will be slower than using ordinary ints; the data will not be aligned, and I don't think there's any instruction for a 24-bit read. The compiler will need to generate extra code for each read and store.
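For illustration, a self-contained check of the size and value range (a sketch assuming GCC or Clang; `__attribute__((__packed__))` is a compiler extension, not standard C):

```c
#include <assert.h>

/* Packed bit-field holding a signed 24-bit value (GCC/Clang extension).
 * With the packed attribute, sizeof(int24) is 3 and arrays have no
 * padding between elements. */
typedef struct __attribute__((__packed__)) {
    int x : 24;
} int24;
```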

4 Comments

  • The only problem with this method is that for "wrong" endianness, multiple bytes would need to be swapped... more operations than if it were just char arrays.
  • @technosaurus: I'm not sure what you mean by that. Why do we care about endianness? There don't seem to be any memory-layout requirements; the very fact that we're choosing the number of bytes used to represent an entry means we're not dealing with a fixed memory layout imposed by some external requirement.
  • Assume that this data was generated on an x86 and this code is run on an x86 -- everything will be fine... until it is compiled/run on a machine with different endianness (ARM, MIPS, ...), where the numbers will be wrong... e.g. 0xFFAA00 could be interpreted as 0x00AAFF, or whatever that machine's endianness dictates. For short/long there are functions for converting between host and network byte order: htonl, htons, ntohl, ntohs... but the OP does not mention the byte order, so it is probably in their machine's native byte order or they'd have noticed and mentioned it.
  • @technosaurus: I'm pretty sure we get to decide the byte order.

MrAlias & user both make good points, so why not combine them?

typedef union __attribute__((__packed__)) {
  int x : 24;
  char s[3];
} u3b;

typedef union __attribute__((__packed__)) {
  long long x : 56;
  char s[7];
} u7b;

For large amounts of data you may save some memory this way but the code will almost definitely be slower due to the unaligned accesses that it will incur. For the most efficiency you should extend them to align to a standard integral length and operate on those (read arrays in multiples of 4 or 8).

Then you will still have endianness issues, so if you need to be compatible with both big- and little-endian data you will need to use the char part of the union for the byte order the bit-field does not match (the union's bit-field only works for one endianness). For the other endianness you would need something along the lines of (the unsigned char casts avoid sign extension when plain char is signed):

int x = (unsigned char)myu3b.s[0] | ((unsigned char)myu3b.s[1]<<8) | ((unsigned char)myu3b.s[2]<<16);
//or
int x = (unsigned char)myu3b.s[2] | ((unsigned char)myu3b.s[1]<<8) | ((unsigned char)myu3b.s[0]<<16);

This method may be just as fast after optimization (compiler dependent), if so you can just use char arrays and skip the union altogether.
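A sketch of that "other endianness" decode wrapped up as a helper (the name u3b_decode_le is made up; the uint8_t casts avoid sign extension from a signed char, and bit 23 is sign-extended manually, so the result does not depend on host byte order):

```c
#include <assert.h>
#include <stdint.h>

typedef union __attribute__((__packed__)) {
    int x : 24;
    char s[3];
} u3b;

/* Decode the three octets as a little-endian signed 24-bit value,
 * independent of the host byte order. */
static int32_t u3b_decode_le(u3b v)
{
    uint32_t x = (uint32_t)(uint8_t)v.s[0]
               | ((uint32_t)(uint8_t)v.s[1] << 8)
               | ((uint32_t)(uint8_t)v.s[2] << 16);
    if (x & 0x800000)            /* sign-extend bit 23 */
        x |= 0xFF000000u;
    return (int32_t)x;
}
```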



I completely support the bit-field approach; just watch out for alignment issues. If you do a lot of random access, you may want to align to your cache-line size and CPU architecture.

In addition, I would suggest looking into another approach:

You could decompress the data you need on the fly, using for instance zlib. If you expect a lot of duplicate values in the stream, this can significantly reduce the I/O traffic as well as the memory footprint. (Assuming the need for random access is not too great.) See here for a quick tutorial on zlib.



With the rate at which processors can rip through instructions, I was interested in how general one could make this and still run in reasonable time.

The problem with packed bit-fields is that they are not standard, and don't work for reading/writing data on machines of different endianness. It occurred to me that little-endian is just the ticket for this problem... so, making a virtue out of wanting to solve the endian issue, the trick seemed to be to store everything little-endian. For, say, 5-byte integers: storing a little-endian value is simple, you just copy the first 5 bytes; loading is not quite so simple, because you have to sign-extend.

The code below will do arrays of 2, 3, 4 and 5 byte signed integers: (a) forcing little-endian, and (b) using packed bit fields for comparison (see BIT_FIELD). As given, it compiles under gcc on linux (64-bit).

The code makes two flying assumptions:

  1. -ve numbers are 2's or 1's complement (no sign & magnitude) !

  2. that structures with alignment == 1 can be read/written at any address, for any size of structure.

The main does some testing and timing. It runs the same test on large arrays of: (a) 'flex' arrays with integer lengths 2, 3, 4 and 5; and (b) simple arrays with integer lengths 2, 4, 4 and 8. On my machine here I got (compiled -O3, maximum optimization):

Arrays of 800 million entries -- not using bit-field
With 'flex' arrays of 10.4G bytes: took 20.160 secs: user 16.600 system 3.500
With simple arrays of 13.4G bytes: took 32.580 secs: user 14.680 system 4.910

Arrays of 800 million entries -- using bit-field
With 'flex' arrays of 10.4G bytes: took 22.280 secs: user 18.820 system 3.380
With simple arrays of 13.4G bytes: took 20.450 secs: user 14.450 system 4.620

So, using reasonably general code, the special length integers take longer, but perhaps not as bad as one might have expected !! The bit-field version comes out slower... I haven't had time to dig into why.

So... looks doable to me.

/*==============================================================================
 * 2/3/4/5/... byte "integers" and arrays thereof.
 */
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stddef.h>
#include <unistd.h>
#include <memory.h>
#include <stdio.h>
#include <sys/times.h>
#include <assert.h>

/*==============================================================================
 * General options
 */
#define BIT_FIELD 0             /* use bit-fields (or not)  */

#include <endian.h>
#include <byteswap.h>

#if __BYTE_ORDER == __LITTLE_ENDIAN
# define htole16(x) (x)
# define le16toh(x) (x)

# define htole32(x) (x)
# define le32toh(x) (x)

# define htole64(x) (x)
# define le64toh(x) (x)

#else
# define htole16(x) __bswap_16 (x)
# define le16toh(x) __bswap_16 (x)

# define htole32(x) __bswap_32 (x)
# define le32toh(x) __bswap_32 (x)

# define htole64(x) __bswap_64 (x)
# define le64toh(x) __bswap_64 (x)
#endif

typedef int64_t imax_t ;

/*------------------------------------------------------------------------------
 * 2 byte integer
 */
#if BIT_FIELD
typedef struct __attribute__((packed)) { int16_t  i : 2 * 8 ; } iflex_2b_t ;
#else
typedef struct { int8_t b[2] ; } iflex_2b_t ;
#endif

inline static int16_t
iflex_get_2b(iflex_2b_t item)
{
#if BIT_FIELD
  return item.i ;
#else
  union
  {
    int16_t     i ;
    iflex_2b_t  f ;
  } x ;

  x.f = item ;
  return le16toh(x.i) ;
#endif
} ;

inline static iflex_2b_t
iflex_put_2b(int16_t val)
{
#if BIT_FIELD
  iflex_2b_t x ;
  x.i = val ;
  return x ;
#else
  union
  {
    int16_t     i ;
    iflex_2b_t  f ;
  } x ;

  x.i = htole16(val) ;
  return x.f ;
#endif
} ;

/*------------------------------------------------------------------------------
 * 3 byte integer
 */
#if BIT_FIELD
typedef struct __attribute__((packed)) { int32_t  i : 3 * 8 ; } iflex_3b_t ;
#else
typedef struct { int8_t b[3] ; } iflex_3b_t ;
#endif

inline static int32_t
iflex_get_3b(iflex_3b_t item)
{
#if BIT_FIELD
  return item.i ;
#else
  union
  {
    int32_t     i ;
    int16_t     s[2] ;
    iflex_2b_t  t[2] ;
  } x ;

  x.t[0] = *((iflex_2b_t*)&item) ;
  x.s[1] = htole16(item.b[2]) ;

  return le32toh(x.i) ;
#endif
} ;

inline static iflex_3b_t
iflex_put_3b(int32_t val)
{
#if BIT_FIELD
  iflex_3b_t x ;
  x.i = val ;
  return x ;
#else
  union
  {
    int32_t     i ;
    iflex_3b_t  f ;
  } x ;

  x.i = htole32(val) ;
  return x.f ;
#endif
} ;

/*------------------------------------------------------------------------------
 * 4 byte integer
 */
#if BIT_FIELD
typedef struct __attribute__((packed)) { int32_t  i : 4 * 8 ; } iflex_4b_t ;
#else
typedef struct { int8_t b[4] ; } iflex_4b_t ;
#endif

inline static int32_t
iflex_get_4b(iflex_4b_t item)
{
#if BIT_FIELD
  return item.i ;
#else
  union
  {
    int32_t     i ;
    iflex_4b_t  f ;
  } x ;

  x.f = item ;
  return le32toh(x.i) ;
#endif
} ;

inline static iflex_4b_t
iflex_put_4b(int32_t val)
{
#if BIT_FIELD
  iflex_4b_t x ;
  x.i = val ;
  return x ;
#else
  union
  {
    int32_t     i ;
    iflex_4b_t  f ;
  } x ;

  x.i = htole32((int32_t)val) ;
  return x.f ;
#endif
} ;

/*------------------------------------------------------------------------------
 * 5 byte integer
 */
#if BIT_FIELD
typedef struct __attribute__((packed)) { int64_t  i : 5 * 8 ; } iflex_5b_t ;
#else
typedef struct { int8_t b[5] ; } iflex_5b_t ;
#endif

inline static int64_t
iflex_get_5b(iflex_5b_t item)
{
#if BIT_FIELD
  return item.i ;
#else
  union
  {
    int64_t     i ;
    int32_t     s[2] ;
    iflex_4b_t  t[2] ;
  } x ;

  x.t[0] = *((iflex_4b_t*)&item) ;
  x.s[1] = htole32(item.b[4]) ;

  return le64toh(x.i) ;
#endif
} ;

inline static iflex_5b_t
iflex_put_5b(int64_t val)
{
#if BIT_FIELD
  iflex_5b_t x ;
  x.i = val ;
  return x ;
#else
  union
  {
    int64_t     i ;
    iflex_5b_t  f ;
  } x ;

  x.i = htole64(val) ;
  return x.f ;
#endif
} ;

/*------------------------------------------------------------------------------
 *
 */
#define alignof(t) __alignof__(t)

/*==============================================================================
 * To begin at the beginning...
 */
int
main(int argc, char* argv[])
{
  int count = 800 ;

  assert(sizeof(iflex_2b_t)  == 2) ;
  assert(alignof(iflex_2b_t) == 1) ;
  assert(sizeof(iflex_3b_t)  == 3) ;
  assert(alignof(iflex_3b_t) == 1) ;
  assert(sizeof(iflex_4b_t)  == 4) ;
  assert(alignof(iflex_4b_t) == 1) ;
  assert(sizeof(iflex_5b_t)  == 5) ;
  assert(alignof(iflex_5b_t) == 1) ;

  clock_t at_start_clock, at_end_clock ;
  struct tms at_start_tms, at_end_tms ;
  clock_t ticks ;

  printf("Arrays of %d million entries -- %susing bit-field\n", count,
                                                      BIT_FIELD ? "" : "not ") ;
  count *= 1000000 ;

  iflex_2b_t* arr2 = malloc(count * sizeof(iflex_2b_t)) ;
  iflex_3b_t* arr3 = malloc(count * sizeof(iflex_3b_t)) ;
  iflex_4b_t* arr4 = malloc(count * sizeof(iflex_4b_t)) ;
  iflex_5b_t* arr5 = malloc(count * sizeof(iflex_5b_t)) ;

  size_t bytes = ((size_t)count * (2 + 3 + 4 + 5)) ;

  srand(314159) ;

  at_start_clock = times(&at_start_tms) ;

  for (int i = 0 ; i < count ; i++)
    {
      imax_t v5, v4, v3, v2, r ;

      v2 = (int16_t)(rand() % 0x10000) ;
      arr2[i] = iflex_put_2b(v2) ;

      v3 = (v2 * 0x100) | ((i & 0xFF) ^ 0x33) ;
      arr3[i] = iflex_put_3b(v3) ;

      v4 = (v3 * 0x100) | ((i & 0xFF) ^ 0x44) ;
      arr4[i] = iflex_put_4b(v4) ;

      v5 = (v4 * 0x100) | ((i & 0xFF) ^ 0x55) ;
      arr5[i] = iflex_put_5b(v5) ;

      r = iflex_get_2b(arr2[i]) ;
      assert(r == v2) ;

      r = iflex_get_3b(arr3[i]) ;
      assert(r == v3) ;

      r = iflex_get_4b(arr4[i]) ;
      assert(r == v4) ;

      r = iflex_get_5b(arr5[i]) ;
      assert(r == v5) ;
    } ;

  for (int i = count - 1 ; i >= 0 ; i--)
    {
      imax_t v5, v4, v3, v2, r, b ;

      v5 = iflex_get_5b(arr5[i]) ;
      b  = (i & 0xFF) ^ 0x55 ;
      assert((v5 & 0xFF) == b) ;
      r  = (v5 ^ b) / 0x100 ;

      v4 = iflex_get_4b(arr4[i]) ;
      assert(v4 == r) ;
      b  = (i & 0xFF) ^ 0x44 ;
      assert((v4 & 0xFF) == b) ;
      r  = (v4 ^ b) / 0x100 ;

      v3 = iflex_get_3b(arr3[i]) ;
      assert(v3 == r) ;
      b  = (i & 0xFF) ^ 0x33 ;
      assert((v3 & 0xFF) == b) ;
      r  = (v3 ^ b) / 0x100 ;

      v2 = iflex_get_2b(arr2[i]) ;
      assert(v2 == r) ;
    } ;

  at_end_clock  = times(&at_end_tms) ;

  ticks = sysconf(_SC_CLK_TCK) ;

  printf("With 'flex' arrays of %4.1fG bytes: "
                                  "took %5.3f secs: user %5.3f system %5.3f\n",
      (double)bytes / (double)(1024 *1024 *1024),
      (double)(at_end_clock - at_start_clock)                 / (double)ticks,
      (double)(at_end_tms.tms_utime - at_start_tms.tms_utime) / (double)ticks,
      (double)(at_end_tms.tms_stime - at_start_tms.tms_stime) / (double)ticks) ;

  free(arr2) ;
  free(arr3) ;
  free(arr4) ;
  free(arr5) ;

  int16_t* brr2 = malloc(count * sizeof(int16_t)) ;
  int32_t* brr3 = malloc(count * sizeof(int32_t)) ;
  int32_t* brr4 = malloc(count * sizeof(int32_t)) ;
  int64_t* brr5 = malloc(count * sizeof(int64_t)) ;

  bytes = ((size_t)count * (2 + 4 + 4 + 8)) ;

  srand(314159) ;

  at_start_clock = times(&at_start_tms) ;

  for (int i = 0 ; i < count ; i++)
    {
      imax_t v5, v4, v3, v2, r ;

      v2 = (int16_t)(rand() % 0x10000) ;
      brr2[i] = v2 ;

      v3 = (v2 * 0x100) | ((i & 0xFF) ^ 0x33) ;
      brr3[i] = v3 ;

      v4 = (v3 * 0x100) | ((i & 0xFF) ^ 0x44) ;
      brr4[i] = v4 ;

      v5 = (v4 * 0x100) | ((i & 0xFF) ^ 0x55) ;
      brr5[i] = v5 ;

      r = brr2[i] ;
      assert(r == v2) ;

      r = brr3[i] ;
      assert(r == v3) ;

      r = brr4[i] ;
      assert(r == v4) ;

      r = brr5[i] ;
      assert(r == v5) ;
    } ;

  for (int i = count - 1 ; i >= 0 ; i--)
    {
      imax_t v5, v4, v3, v2, r, b ;

      v5 = brr5[i] ;
      b  = (i & 0xFF) ^ 0x55 ;
      assert((v5 & 0xFF) == b) ;
      r  = (v5 ^ b) / 0x100 ;

      v4 = brr4[i] ;
      assert(v4 == r) ;
      b  = (i & 0xFF) ^ 0x44 ;
      assert((v4 & 0xFF) == b) ;
      r  = (v4 ^ b) / 0x100 ;

      v3 = brr3[i] ;
      assert(v3 == r) ;
      b  = (i & 0xFF) ^ 0x33 ;
      assert((v3 & 0xFF) == b) ;
      r  = (v3 ^ b) / 0x100 ;

      v2 = brr2[i] ;
      assert(v2 == r) ;
    } ;

  at_end_clock  = times(&at_end_tms) ;

  printf("With simple arrays of %4.1fG bytes: "
                                  "took %5.3f secs: user %5.3f system %5.3f\n",
      (double)bytes / (double)(1024 *1024 *1024),
      (double)(at_end_clock - at_start_clock)                 / (double)ticks,
      (double)(at_end_tms.tms_utime - at_start_tms.tms_utime) / (double)ticks,
      (double)(at_end_tms.tms_stime - at_start_tms.tms_stime) / (double)ticks) ;

  free(brr2) ;
  free(brr3) ;
  free(brr4) ;
  free(brr5) ;

  return 0 ;
} ;
