2

Preamble

So i've noticed that a lot of functional / jitted languages (eg julia, numba jit for python) have virtual methods attached to "functions" rather than classes, and those "functions" can be dispatched based on types of several arguments. example in julia:

function f(x::Int)
 println("x::Int = $x")
end
function f(x::AbstractFloat)
 println("x::Float = $x")
end
function f(x::Int, y::AbstractFloat)
 println("x::Int = $x ; y::Float=y")
end
f(5) # prints "x::Int = 5"
f(6.5) # prints "x::Float = 6.5"
f(7, 8.5) # prints "x::Int = 7 ; y::Float = 8.5"
f(9.5, 10) # throws a MethodError

I want to understand how to implement a dispatching system like this in C. For a function with exactly 1 argument and exactly 0 return values this is a trivial task:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

const int T_int = 1;
const int T_float = 2;

// idk any shorter way of type punning
static inline uintptr_t f2i(float x){union {uintptr_t i; float f;} fi; fi.f = x; return fi.i;}
static inline float i2f(uintptr_t x){union {uintptr_t i; float f;} fi; fi.i = x; return fi.f;}

void f_int(uintptr_t x){printf("int x = %d\n", (int) x);}
void f_float(uintptr_t x) {printf("float x = %f\n", i2f(x));}

typedef void (*Fn1) (uintptr_t);
typedef struct {int T; Fn1 fn;} Method1; 
typedef struct {int len; Method1*methods; } Dispatch1;

void call1(Dispatch1 disp, int T, uintptr_t x){
  for (int i = 0; i < disp.len; i++) {
    if (T == disp.methods[i].T) {
      return disp.methods[i].fn(x);
    }}
  printf("dispatch for type tag %d failed\n", T); exit(1);
}

int main() {
  Dispatch1 f = {2, calloc(2, sizeof(Method1))};
  f.methods[0] = (Method1) {T_int, (Fn1) f_int};
  f.methods[1] = (Method1) {T_float, (Fn1) f_float};

  call1(f, T_int, (uintptr_t) 5); // prints "int x = 5"
  call1(f, T_float, f2i(6.5)); // prints "float x = 6.50000"
  call1(f, 10, (uintptr_t) 5); // prints "dispatch for type tag 10 failed", exits
}

The question:

is there a relatively simple way to implement in C a similar dispatch for arbitrary number of arguments?

if no, is there a relatively simple way to implement this for functions with, say, 0-3 arguments and 0-1 return values?

Note: i know, that this is inefficient, dumb, etc, i don't care.

Edit1: i don't want generic programming in C, i want something like a vtable, but for dispatching on multiple arguments.

Edit2: the context of why i'm doing is, i want to implement something like a Lisp interpreter, but with typecheking capabilities. i want to understand how this feature works without sifting through julia / LLVM codebase.

What i don't understand, is how to compresses function signature, so that each vtable entry would be relatively small and fast to check against argument types.

23
  • 3
    Unlike C++, C doesn't support overloading functions. You just have to give them different names. Commented May 30 at 14:27
  • ... and C++ implementations typically support function overloading by use of name mangling, which looks to the system exactly like giving the functions different names. Commented May 30 at 14:33
  • C11 has the _Generic , check out [stackoverflow.com/questions/9804371/… Commented May 30 at 14:34
  • @dbush, i don't want function overloading, i want a runtime dispatch. Commented May 30 at 14:35
  • 1
    Of course it makes a difference. C function call semantics have strict requirements for matching of declared and actual function types, which includes the specific type of the return value and the number and types of the arguments. Also, the actual arguments must be assignment-compatible with the parameter types. Those are going to cause you a mountain of headaches if you try to represent functions defined in your language directly as C functions, callable via a C function pointer. Python for sure does not do that, nor does Java. Probably not Julia, either. Commented May 30 at 17:11

5 Answers 5

1

I did this by having a static ID associated with my type structure, that gets incremented each time a type is created and is associated with that particular type instance.

I then made a map where the key stores the type ID of every parameter, when a function is invoked the call system looks at all the functions that match the name and chooses the matching function or gives up.

I used this not just for named function overloading, but also for operators. So that, for example, two reals being compared chooses a function that would directly compare them, but a real and a string chooses one that first tries to convert the real to a string and if that succeeds compares them, but if it fails the real gets converted to a string and regular lexicographic comparison is used instead.

Note I did this in C++, so the associative container was quite a bit easier, but the static type ID would work just as well for either language.

Sign up to request clarification or add additional context in comments.

1 Comment

This is a nice answer! And it keeps the functionality to the interpreter side and out of the C side (though I think the OP was asking about the C side, alas).
1

as I initially understood the code, the following code dispatches to several functions:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

#define CHECK(expr, ...) do { \
    if (!(expr)) { \
        fflush(0); \
        fprintf(stderr, "ERROR: %s:%d: expression failed: %s", __func__, __LINE__, #expr); \
        __VA_OPT__(fprintf(stderr, ": " __VA_ARGS__);) \
        fprintf(stderr, "\n"); \
        exit(1); \
    } \
} while(0)

// method library

// maximum number of arguments we support
#define MAXARGS  16

// types
enum type {
    VOID = 0,
    INT,
    FLOAT,
};

const char *type2str(enum type type) {
    switch (type) {
        case VOID: return "void";
        case INT: return "int";
        case FLOAT: return "float";
    }
    return "unknown";
}

// function to call
struct method {
    enum type arg[MAXARGS];
    void (*m)();
};

// an array of overloaded functions
struct methods {
    int len;
    struct method *m;
};

// argument to pass
struct arg {
    enum type type;
    union {
        float f;
        int i;
    } v;
};

// create an argument
struct arg mk_arg_i(int i) {
    return (struct arg){INT, .v.i=i};
}
struct arg mk_arg_f(float f) {
    return (struct arg){FLOAT, .v.f=f};
}
#define mk_arg(x) _Generic((x) \
    , int: mk_arg_i \
    , float: mk_arg_f \
    , double: mk_arg_f \
    )(x)

// call function with given arguments
void method_call(struct method t, struct arg a[MAXARGS]) {
    switch (t.arg[0] + t.arg[1] * 10) {
    case VOID:
        CHECK(a[0].type == VOID);
        CHECK(a[1].type == VOID);
        ((void(*)())t.m)();
        return;
    case INT:
        CHECK(a[0].type == INT);
        CHECK(a[1].type == VOID);
        ((void(*)(int))t.m)(a[0].v.i);
        return;
    case FLOAT:
        CHECK(a[0].type == FLOAT);
        CHECK(a[1].type == VOID);
        ((void(*)(float))t.m)(a[0].v.f);
        return;
    case INT + FLOAT * 10:
        CHECK(a[0].type == INT);
        CHECK(a[1].type == FLOAT);
        ((void(*)(int, float))t.m)(a[0].v.i, a[1].v.f);
        return;
    }
    CHECK(0, "f(%d, %d) with %d %d",
        t.arg[0], t.arg[1], a[0].type, a[1].type);
}

// from an array of functions call first matching function that matches arguments
void method_dispatch(struct methods t, struct arg a[MAXARGS]) {
    for (struct method *end = t.m + t.len, *m = t.m; m != end; m++) {
        int same = 1;
        for (int i = 0; i < MAXARGS; ++i) {
            same &= m->arg[i] == a[i].type;
        }
        if (same) {
            method_call(*m, a);
            return;
        }
    }
    CHECK(0, "MethodError: Could not dispatch %s %s", 
        type2str(a[0].type), type2str(a[1].type));
}

// cool overload on number of args to hide boilerplate
#define dispatch_1(t, a)     method_dispatch(t, (struct arg[MAXARGS]){mk_arg(a)})
#define dispatch_2(t, a, b)  method_dispatch(t, (struct arg[MAXARGS]){mk_arg(a), mk_arg(b)})
#define dispatch_N(_2,_1,N,...) dispatch##N  
#define dispatch(t, ...) dispatch_N(__VA_ARGS__,_2,_1)(t,__VA_ARGS__)

// language

void f_i(int x) {
    printf("x::Int = %d\n", x);
}
void f_f(float x) {
    printf("x::Float = %g\n", x);
}
void f_i_f(int x, float y) {
    printf("x::Int = %d ; y::Float=%g\n", x, y);
}
struct methods f = {
    3,
    (struct method[]) {
        { { INT }, (void(*)())f_i, },
        { { FLOAT }, (void(*)())f_f, },
        { { INT, FLOAT }, (void(*)())f_i_f, },
    },
};

int main() {
    dispatch(f, 5); // prints "x::Int = 5"
    dispatch(f, 6.5); // prints "x::Float = 6.5"
    dispatch(f, 7, 8.5); // prints "x::Int = 7 ; y::Float = 8.5"
    dispatch(f, 9.5, 10); // # throws a MethodError
}

Which prints:

x::Int = 5
x::Float = 6.5
x::Int = 7 ; y::Float=8.5
ERROR: method_dispatch:110: expression failed: 0: MethodError: Could not dispatch float int

But looking at your code, you are happy with writing serializer/deserializer of your data to a chosen common storage - like uintptr_t. Bottom line, this is similar to int main(int argc, char *argv[]) , just instead of char * you use your own datatype that can be converted to your chosen subset of types - int or float. The following code works on arbitrary number of arguments and abstracts them away with a generic "obj" type:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <string.h>
#include <stdarg.h>
#include <limits.h>

// basic

// assert expression is true
#define CHECK(expr, ...) do { \
    if (!(expr)) { \
        fflush(0); \
        fprintf(stderr, "ERROR: %s:%d: expression failed: %s", __func__, __LINE__, #expr); \
        __VA_OPT__(fprintf(stderr, ": " __VA_ARGS__);) \
        fprintf(stderr, "\n"); \
        exit(1); \
    } \
} while(0)

// fail on allocation error
void *xmalloc(size_t n) {
    void *p = malloc(n);
    CHECK(p);
    return p;
}

// append printf
int rasprintf(char **pnt, char *fmt, ...) {
    va_list va;
    va_start(va, fmt);
    int add = vsnprintf(0, 0, fmt, va);
    va_end(va);
    CHECK(add > 0);
    size_t cur = *pnt ? strlen(*pnt) : 0;
    char *new = realloc(*pnt, cur + add);
    CHECK(new);
    va_start(va, fmt);
    int ret = vsnprintf(new + cur, INT_MAX, fmt, va);
    va_end(va);
    CHECK(ret > 0);
    *pnt = new;
    return ret;
}

// enumerate types

enum type {
    VOID,
    INT,
    FLOAT,
};

const char *type2str(enum type type) {
    switch (type) {
        case VOID: return "void";
        case INT: return "int";
        case FLOAT: return "float";
    }
    return "unknown";
}

// obj - represents arbitrary value of some type.

typedef struct {
    enum type type;
    char v[16];
} obj;

// create an argument
obj mk_obj_in(enum type type, void *v, size_t n) {
    obj r = {type};
    CHECK(sizeof(r.v) > n);
    memcpy(r.v, v, n);
    return r;
}
obj mk_obj_i(int x) { return mk_obj_in(INT, &x, sizeof(x)); }
obj mk_obj_f(float x) { return mk_obj_in(FLOAT, &x, sizeof(x)); }
#define mk_obj(x) _Generic((x) \
    , int: mk_obj_i \
    , float: mk_obj_f \
    )(x)

// convert obj into actual value
float obj2float(obj i) { CHECK(i.type == FLOAT); float r; memcpy(&r, i.v, sizeof(r)); return r; }
int obj2int(obj i) { CHECK(i.type == INT); int r; memcpy(&r, i.v, sizeof(r)); return r; }

// method - represents function to call.
struct method {
    enum type ret;
    size_t nargs;
    enum type *args;
    obj (*func)(int nargs, obj *args);
};

struct methods {
    size_t count;
    struct method *v;
};

obj dispatch_in(struct methods methods, size_t nargs, obj *args){
    for (size_t i = 0; i < methods.count; i++) {
        if (methods.v[i].nargs == nargs) {
            int same = 1;
            for (size_t j = 0; j < nargs; ++j) {
                same &= methods.v[i].args[j] == args[j].type;
            }
            if (same) {
                return methods.v[i].func(nargs, args);
            }
        }
    }
    // error
    char *str = 0;
    for (size_t i = 0; i < nargs; i++) {
        CHECK(rasprintf(&str, "%s", type2str(args[i].type)));
        if (i + 1 < nargs) {
            CHECK(rasprintf(&str, ", "));
        }
    }
    CHECK(0, "MethodError: Could not dispatch function with (%s)", str);
    free(str);
}

#define dispatch_1(m, a)  dispatch_in(m, 1, (obj[]){mk_obj(a)})
#define dispatch_2(m, a, b)  dispatch_in(m, 2, (obj[]){mk_obj(a), mk_obj(b)})
#define disaptch_N(_2,_1,N,...) dispatch##N
#define dispatch(m, ...)  disaptch_N(__VA_ARGS__,_2,_1)(m, __VA_ARGS__)

// usage

obj f_int(int n, obj *a) { return mk_obj(printf("int x = %d\n", obj2int(a[0]))); }
obj f_float(int n, obj *a) { return mk_obj(printf("float x = %f\n", obj2float(a[0]))); }
obj f_int_float(int n, obj *a) { return mk_obj(printf("int x = %d\n", obj2int(a[0]), obj2float(a[1]))); }

struct methods f = {
    3,
    (struct method[]){
        { VOID, 1, (enum type[]){ INT }, f_int },
        { VOID, 1, (enum type[]){ FLOAT }, f_float },
        { VOID, 2, (enum type[]){ INT, FLOAT }, f_int_float },
    },
};

int main() {
    dispatch(f, 5); // prints "x::Int = 5"
    dispatch(f, 6.5f); // prints "x::Float = 6.5"
    dispatch(f, 7, 8.5f); // prints "x::Int = 7 ; y::Float = 8.5"
    dispatch(f, 9.5f, 10); // # throws a MethodError
}

is there a relatively simple way to implement in C a similar dispatch for arbitrary number of arguments?

I would say no.

Comments

0

So... what you want to do can actually be done in C. IDK if I actually recommend it, but here you go.

Technically, you are trying to do two distinct things here:

  • default arguments / variable-length argument lists
  • argument type discrimination

For the first problem I have actually written a little library for that. It isn’t particularly unique (others have done it before and in other ways), but it is what I will use for this answer. You can get it from my response at Default arguments in C. Requires C99 or better.

The second problem is solved by the _Generic selector available since C11.

Now we can combine the two for a solution.

#include <stdio.h>
#include <duthomhas/c-default-arguments.h>

// These first three functions are as you gave them in your example

void f_int(int x)
{
  printf("int x = %d\n", x);
}

void f_double(double x)
{
  printf("double x = %f\n", x);
}

void f_int_double(int x, double y)
{
  printf("int x = %d; double y = %f\n", x, y);
}


// Here I add a simple non-standard “point” data type, 
// just to illustrate just how capable all this is.

struct
{
  double x, y;
}
typedef point;

point make_point(double x, double y)
{
  point p = {x, y};
  return p;
}

void f_point(point p)
{
  printf("point p = {%f, %f}\n", p.x, p.y);
}


// Now we combine the magics to create a singular function macro:
//  • variable-length argument lists
//  • generic type selection

#define f(...) DUTHOMHAS_CALL_OVERLOAD(f_, __VA_ARGS__)
#define f_1(x) _Generic((x), \
  int    : f_int, \
  double : f_double, \
  point  : f_point \
  )(x)
#define f_2(a, b) f_int_double(a, b)


// Behold!

int main(void)
{
  f(5);
  f(6.5);
  f(7, 8.5);
//  f(9.5, 10); // Strictly speaking, this is a WARNING for (double) to (int) conversion
  f(make_point(3.141592, 2025));
  return 0;
}

You can get that warning to become an error by either adjusting your compiler settings or by adding generic selectors for the two-argument macro expansion accepting only an (int, float) combination.

Oh, here’s the output. Works for (and tested with) both Clang/GCC and MSVC.

int x = 5
double x = 6.500000
int x = 7; double y = 8.500000
point p = {3.141592, 2025.000000}

Caveats!

This isn’t perfect. It is a pain to manage whenever you add a new function, and the trickier your argument lists get the messier your generic selectors get.

Every invocation of f() must be static. This should be obvious, though — C is strictly-typed and does not provide an easy way to swap out argument lists to the same function call — i.e. the same as for any other function call in C.

(I believe it is possible to get around that too, but there is where you begin digging for gold under sleeping dragons. You might succeed for a specific version of a specific compiler on a specific OS + hardware combination, but it’ll be fragile and crufty in a bad way.)

Parting advice

Rethink your solution. AFAIK most (if not all) Lisp implementations are initially written in C, but after proper bootstrapping they are written in... Lisp!

In other words, what you are trying to do is not necessary on the C side of things. It is implemented by the Lisp interpreter itself.

Good luck!

1 Comment

i'm interested in runtime dispatch, not generics. that's an essential part of the problem
0

yes, your option#3 is pretty similar to my implementation for a single argument. however, i want methods to have varying number of arguments, like in julia (at least in a sensible range like 0-3). what i can't quite figure out is how to combine type tags so that they would be easy to compare. – Daniil Zuev

It sound like your [biggest] issues are:

  1. Getting call1 to search on multiple argument types at once
  2. Do the search efficiently

I've refactored your code to do this.

It's a bit incomplete but it should give you some ideas:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdarg.h>
#include <string.h>

typedef enum {
    T_null,                             // [optional] -- invalid/null type
    T_int,                              // (e.g. f_int)
    T_float,                            // (e.g. f_float)
    T_intflt,                           // (e.g. f_intfloat)
    T_array,                            // (e.g. f_array)
    T_max
} Type;

enum {
    A_max = 8                           // maximum number of arguments
};

typedef struct array Array;             // forward decl of array

// data type
typedef union {
    uintptr_t i;
    float f;
    Array *a;
} Data;

// function signature (i.e. types of arguments)
typedef union {
    uint8_t sig_list[A_max];            // list of argument types
    uint64_t sig_xid;                   // type signature of function
} FnSig;

#define SIGDEF(_sig,_av...) \
    FnSig _sig = { .sig_list = { _av } }

// pointer to function
typedef union {
    void (*Fn_int)(uintptr_t);
    void (*Fn_float)(uintptr_t);
    void (*Fn_intfloat)(int,float);
    void (*Fn_array)(Array *arr);
    void *Fn_void;
} FnPtr;

// function/method descriptor
typedef struct method Method;
struct method {
    Method *next;                   // pointer to next method in hash bucket
    FnPtr fn;                       // pointer to function
    FnSig sig;
};

// linked list table
#if USEHASH
#define BUCKETS     5
#else
#define BUCKETS     1
#endif
typedef struct {
    Method *buckets[BUCKETS];       // hash buckets
} Dispatch;

// array of data (i.e. Array type)
struct array {
    Method *arr_method;                 // pointer to method (i.e. type)
    size_t arr_count;                   // number of array elements
    Data *arr_data;                     // array data
};

#define FNALL(_cmd) \
    _cmd(Fn_int,T_int) \
    _cmd(Fn_float,T_float) \
    _cmd(Fn_intfloat,T_int,T_float) \
    _cmd(Fn_array,T_array)

typedef void (*Fn_int)(uintptr_t);
typedef void (*Fn_float)(uintptr_t);
typedef void (*Fn_intfloat)(int,float);
typedef void (*Fn_array)(Array *arr);

typedef void (*Fn_float)(uintptr_t);
typedef void (*Fn_intfloat)(int,float);
typedef void (*Fn_array)(Array *arr);

Dispatch dispatcher;

// idk any shorter way of type punning
static inline uintptr_t
f2i(float x)
{
    Data fi;

    fi.f = x;

    return fi.i;
}

static inline float
i2f(uintptr_t x)
{
    Data fi;

    fi.i = x;

    return fi.f;
}

void
f_int(uintptr_t x)
{
    printf("int: x = %d\n", (int) x);
}

void
f_float(uintptr_t x)
{
    printf("float: x = %f\n", i2f(x));
}

void
f_intfloat(int a1,float a2)
{

    printf("intfloat: a1=%d a2=%f\n",a1,a2);
}

void
f_array(Array *arr)
{
}

// strhash -- calculate a string hash (hash algorithm that perl uses ;-)
uint32_t
strhash(const void *bf,size_t bflen)
{
    const uint8_t *bp;
    const uint8_t *bfe;
    uint32_t hash;

    bp = bf;
    bfe = bp + bflen;

    hash = 0x37FB26C5;

    for (;  bp < bfe;  ++bp) {
        hash += *bp;
        hash += hash << 10;
        hash ^= hash >> 6;
    }

    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;

    return hash;
}

uint32_t
sighash(const FnSig *sig)
{
#if USEHASH
    uint32_t hash = strhash(sig->sig_list,A_max);
#else
    uint32_t hash = 0;
#endif
    return hash;
}

// sigcalc_v -- get the type vector (i.e. "signature") for the function
uint32_t
sigcalc_v(FnSig *sig,va_list ap)
{

    sig->sig_xid = 0;

    // get the type vector (i.e. "signature") for the function
    int count = 0;
    while (1) {
        int type = va_arg(ap,int);
        if (type == T_null)
            break;
        sig->sig_list[count++] = type;
    }

    uint32_t hash = sighash(sig);

    return hash;
}

// sigcalc -- get the type vector (i.e. "signature") for the function
uint32_t
sigcalc(FnSig *sig,...)
{
    va_list ap;

    va_start(ap,sig);
    uint32_t hash = sigcalc_v(sig,ap);
    va_end(ap);

    return hash;
}

// addmethod -- add method to list
Method *
addmethod(Dispatch *disp,void *fn,...)
{
    va_list ap;

    Method *ret = calloc(1,sizeof(*ret));

    ret->fn.Fn_void = fn;

    FnSig *sig = &ret->sig;

    // get the type vector (i.e. "signature") for the function
    va_start(ap,fn);
    uint32_t hash = sigcalc_v(sig,ap);
    va_end(ap);

    Method *cur = disp->buckets[hash % BUCKETS];
    Method *prev = NULL;
    for (;  cur != NULL;  cur = cur->next)
        prev = cur;

    if (prev != NULL)
        prev->next = ret;
    else
        disp->buckets[hash % BUCKETS] = ret;

    return ret;
}

void
call2(Method *method,va_list ap)
{

    // dispatch on types ...
}

void
call1(Dispatch *disp,const FnSig *sigfind, ...)
{
    va_list ap;

    uint32_t findhash = sighash(sigfind);
    uint32_t bucket_no = findhash % BUCKETS;
    Method *cur = disp->buckets[bucket_no];

    for (;  cur != NULL;  cur = cur->next) {
        FnSig *sigcur = &cur->sig;
        if (sigcur->sig_xid == sigfind->sig_xid) {
            va_start(ap,sigfind);
            call2(cur,ap);
            va_end(ap);
            return;
        }
    }

    printf("dispatch failed for type tag:");
    for (int idx = 0;  idx < A_max;  ++idx)
        printf(" %d",sigfind->sig_list[idx]);
    printf("\n");

    exit(1);
}

SIGDEF(sig1,T_int,T_null);
SIGDEF(sig2,T_float,T_null);
SIGDEF(sig3,T_int,T_float,T_null);
SIGDEF(sig4,T_int,T_float,T_array,T_null);

int
main()
{
#if 0
    Dispatch f = { 2, calloc(2, sizeof(Method)) };
    f.methods[0] = (Method1) {
    T_int, (Fn1) f_int};
    f.methods[1] = (Method1) {
    T_float, (Fn1) f_float};
#endif

    addmethod(&dispatcher,f_int,T_int,T_null);
    addmethod(&dispatcher,f_float,T_float,T_null);
    addmethod(&dispatcher,f_intfloat,T_int,T_float,T_null);
    addmethod(&dispatcher,f_array,T_array,T_null);

    call1(&dispatcher, &sig1, (uintptr_t) 5);

    call1(&dispatcher, &sig2, f2i(6.5));

    call1(&dispatcher, &sig3, (uintptr_t) 5, 23.7);

    // this will fail as there is _no_ T_int,T_float,T_array
    call1(&dispatcher, &sig4, (uintptr_t) 5, 23.7);
}

Comments

0

So after some thinking i came up with this system. it's not exactly idiomatic form pov of C programming, but it does exactly what i want and seems to work well with limited number of arguments (0-7 arguments, 0-1 return value). Key insights:

  • void/unit/empty/none type that most prominent languages have is there not just for mathematical completeness but allows to check type signatures more easily.
  • we normally use integers as tag types and therefore type identity checks become integer comparison. we can extend this analogy to function types, which become like strings (where data types are individual characters), and comparing function types is mostly identical to string comparison. moreover, we can encode function types exactly like we encode strings: with fixed length (my approach), with known width (like most modern languages encode strings), null-terminated (useing the void/unit type tag as null), we can use utf-8-like compression on data type tags (if we have, say, uint32_t tags), we can store function's signature as "umbra-strings", it may be possible to use string hashing functions on these signatures, etc.
  • return type should be the first "character" in the function type string, it actually simplifies function type comparisons (not that much, but still).
  • there should be special void* / any type that matches any other characters (that's the only difference between function type comparison and string comparison).
  • this approach can be trivially extended to variadic arguments, by adding special flag to function signature.

limitations:

  • type unions, abstract types are not very easy to encode in this system (probably needs some sort of separate lookup table system).
  • runtime generic types (typed arrays, hashmaps), are also not encodable in this system, and i'm not even sure that there is a reasonable solution to this problem in this approach (perhaps there can be "compound" type tags, similar to how modern fonts can have compound emojis or ligature-connected characters).
  • there seems to be no good way to deal with key=value arguments in this sort of paradigm, but maybe it's better to just have a map/dict type and pass it as a single argument to functions.

below is the prototype i've ended up with for now:


#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// type system. for now only numeric types, 
// but can be extended to arrays, strings & custom types
typedef union TPun TPun;
typedef struct Obj Obj;
union TPun{ // for type punning
  uintptr_t Any; void* P; Obj (*Fn)(Obj *arg7);
  uint8_t U8; uint16_t U16; uint32_t U32; uint64_t U64;
  int8_t I8; int16_t I16; int32_t I32; int64_t I64;
  float F32; double F64;
};
struct Obj {uint16_t T; TPun V;};

typedef enum {
  T_Unit = 0, T_Any=1, T_P = 2, T_Fn=3,
  T_U8=4, T_U16=5, T_U32=6, T_U64=7,
  T_I8=8, T_I16=9, T_I32=10, T_I64=11,
  T_F32=12, T_F64=13,
} TBase;

const char * TBaseNames[14] = {
  "Unit", "Any", "Ptr", "Fn", 
  "U8", "U16", "U32", "U64",
  "I8", "I16", "I32", "I64",
  "F32", "F64",
};
// dispatch system
typedef struct {uint16_t retT; uint16_t aT[7]; void* fn;} Method7; 
typedef struct {uint16_t len; uint16_t cap; Method7*methods;} Dispatch7;
// Dispatch is a dynamic array of signatures + pointers
int find_method7(Dispatch7 *disp, uint16_t sig[8]){
  for (int i=0; i<(int) disp->len; i++) {
    Method7*m = &disp->methods[i];
    uint16_t* msig = (uint16_t*) m;
    int sig_match = 1;
    for (int a = 0; a<8; a++) {
      sig_match &= (sig[a] == T_Any) | (msig[a] == T_Any) | (msig[a] == sig[a]);
      // any type is expected | returns untyped | types match 
    }
    if (sig_match) {return i;}
  }
  return -1;
}

int put_method7(Dispatch7 *disp, Method7*meth){
  int i = find_method7(disp, (uint16_t*) meth);
  if (i != -1) { // substitute
    disp->methods[i] = *meth;
    return i;
  } else { // append
    if ((disp->cap-disp->len)==0) {// realloc
      uint32_t newcap;
      if (disp->cap == 0) newcap=1;
      else if (disp->cap==1) newcap=4;
      else newcap=disp->cap+4;
      disp->methods = reallocarray(disp->methods, disp->cap+4, sizeof(Method7));
      assert(disp->methods && "don't forget to buy some RAM");
      disp->cap = newcap;
    } // append
    i = disp->len;
    disp->len++;
    disp->methods[i] = *meth;
    return i;
  }
}
int call7(Dispatch7*disp, Obj args[8]){
  uint16_t sig[8];
  for (int i=0; i<8; i++) {sig[i] = args[i].T;}
  int i = find_method7(disp, sig);
  if (i == -1) return 1;
  TPun fn; fn.Fn = disp->methods[i].fn;
  args[0] = fn.Fn(&args[1]);
  return 0;
}
void clear_args7(Obj args[8]){
  for (int i = 0; i<8; i++){
    args[i].T = T_Unit;//0
    args[i].V.Any = 0;
  }
  args[0].T = T_Any; // by default no expectation of return type, 
  // if particular type should be matched, it must be set explicitly
}

// example functions
Obj f_int(Obj *args){
  printf("int x = %ld\n", args[0].V.I64);
  return ((Obj) {T_Unit,{.Any=0}});
}
Obj int_f_int(Obj *args){
  printf("int x = %ld ; return int x = %ld \n", args[0].V.I64, args[0].V.I64+5);
  return (Obj) {T_I64,{.I64=args[0].V.I64+5}}; // to test returns
}
Obj f_float(Obj *args) {
  printf("float x = %f\n", args[0].V.F32);
  return (Obj) {T_Unit,{.Any=0}};
}
Obj f_int_float(Obj *args) {
  printf("int x = %ld ; float y = %f\n", args[0].V.I64, args[1].V.F32);
  return (Obj) {T_Unit,{.Any=0}};
}


Method7 ms[4] = {
    {0, T_I64, 0,0,0,0,0,0, f_int},
    {T_I64, T_I64, 0,0,0,0,0,0, int_f_int},
    {0, T_F32, 0,0,0,0,0,0, f_float},
    {0, T_I64, T_F32, 0,0,0,0,0, f_int_float},
  };
Dispatch7 f = {4, 4, ms};
int main(){
  Obj args[8];
  clear_args7(args);
  args[1] = (Obj) {T_I64,{.I64=5}}; 
  call7(&f, args); // int x = 5
  assert((args[0].T == T_Unit) && "void_f_int should be called");
  clear_args7(args);
  args[0].T = T_I64;
  args[1] = (Obj) {T_I64,{.I64=5}}; 
  call7(&f, args); // int x = 5
  assert((args[0].T == T_I64) && "inf_f_int should be called"); 
  assert((args[0].V.I64 == 10) && "int_f_int should return 10"); 
  clear_args7(args);
  args[1] = (Obj) {T_F32,{.F32=6.5}}; 
  call7(&f, args); // float y = 6.50000
  clear_args7(args);
  args[1] = (Obj) {T_I64, {.I64=7}};
  args[2] = (Obj) {T_F32,{.F32=8.5}};
  call7(&f, args); // int x = 7 ; float y = 8.50000
  clear_args7(args);
  args[1] = (Obj) {T_F32,{.F32=9.5}};
  args[2] = (Obj) {T_I64, {.I64=10}};
  int i = call7(&f, args); 
  assert((i != 0) && "should fail");
}

1 Comment

one additional idea is to store only a hash of function signature in the vtable (with generic/abstract types substituted to Any before computing the hash). then most dispatching will be very fast, but there has to be a fallback mechanism for distinguishing between functions that fall into the same bucket.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.