Waiter! There's a VLA in my C!

Real world ayekat has chosen to pass on some of his knowledge to the microtechnics and electricity people on his daily living site. Mostly this consists of annoying some poor students with elaborate, in-depth facts about the beauty of Unix and the C programming language.

While it might sound like a rather doable job, one never ceases to learn new things. Or, in this case: one never ceases to encounter the ugly sides of the things one so loves and cherishes.

In my case it was a simple sequence of code that I thought would never compile — and the moment it did, I feared I would not be on good terms with it:

        int n;
        scanf("%d", &n);
        int array[n];

-Wall!

-Wextra! -pedantic! man gcc! -WHowCanThisNotThrowAWarning

Variable Length Arrays

What is happening? Obviously the array is allocated at runtime, yet it doesn't seem to require a manual free afterwards. And indeed, a quick test program confirmed my assumption: it is allocated on the stack (the standard itself only promises automatic storage, but the stack is where gcc puts it).
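Such a test could look roughly like the sketch below. It is not the original test program; the function names vla_bytes and distance_to_local are made up for illustration. The first function shows that sizeof on a VLA is evaluated at run time, the second measures how far the VLA sits from an ordinary local variable. A small distance hints that both live in the same stack frame, but the exact number depends on compiler and platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of an n-element VLA; sizeof is evaluated at run time here. */
size_t vla_bytes(int n)
{
    assert(n > 0);              /* a VLA with n <= 0 is undefined behaviour */
    int array[n];
    return sizeof array;
}

/* Distance in bytes between an ordinary local and a VLA in the same frame.
 * The value is platform dependent; it is only meant to be looked at. */
uintptr_t distance_to_local(int n)
{
    int local = 0;
    assert(n > 0);
    int array[n];
    uintptr_t a = (uintptr_t)&local;
    uintptr_t b = (uintptr_t)array;
    return a > b ? a - b : b - a;
}
```

Calling vla_bytes(5) should give 5 * sizeof(int), and there is no free anywhere: the storage is gone as soon as the function returns.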

That doesn't seem to be all that much black magick. After all, alloca (while non-standard) is also used to allocate memory on the stack, and apparently a VLA is more or less just C99 notation for

        int n, *array;

        scanf("%d", &n);
        array = alloca(n * sizeof(int));

So why do I react so allergically to those VLAs?

First, I do program in C99, but I tend to put all variable declarations at the top of the functions. That may be a relic from the C89 times, but for me, this also makes it easier to comprehend what is happening on the stack; I don't like random variable declarations popping up in the middle of the code (and to all people arguing about keeping variable declarations as local as possible: they are local to the function — if you argue with "but for longer functions", modularise your bloody code).

VLAs mess around with my mental picture of what's happening, because a variable declaration depends on executed code (the alloca way makes it clearer, while keeping the same functionality).

As for a technical reason: again, it is allocated on the stack, and the stack has a limited size, and all you can do is sit there and watch and pray… and subsequently enter Damnation:

RETURN VALUE

The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.

Undefined.

Try entering 192837465647382910 into the above program and watch the World burn. This is no longer some non-standard, unsafe technique; it is now some standard, unsafe technique. And to make matters worse, the above-mentioned students use a two-dimensional VLA to store image data on the stack.

What could possibly go wrong…
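For comparison, here is a minimal sketch of a less explosive variant, under a few assumptions of mine: the input arrives as a string, the helper name alloc_from_input and the cap MAX_ELEMS are hypothetical, and the buffer lives on the heap instead of the stack. The point is merely that an absurd size gets rejected (or makes calloc return NULL) instead of silently trampling the stack.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ELEMS 1000000L  /* hypothetical cap; pick one that fits the data */

/* Parse an element count from text and allocate that many zeroed ints.
 * Returns NULL on garbage, non-positive, oversized or failed requests.
 * The caller owns the result and must free() it. */
int *alloc_from_input(const char *text, size_t *out_n)
{
    char *end;
    long n;
    int *array;

    assert(text != NULL && out_n != NULL);
    errno = 0;
    n = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || n <= 0 || n > MAX_ELEMS)
        return NULL;
    array = calloc((size_t)n, sizeof *array);
    if (array != NULL)
        *out_n = (size_t)n;
    return array;
}
```

Feeding it "192837465647382910" simply yields NULL, which is a lot easier to handle than undefined behaviour.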

The Rabbit Hole

As stated above, the students use two-dimensional arrays. So while messing around with those VLAs, I stumbled over another peculiarity in the C language.

void do_something(int width, int height, int **array);

// ...

        int array[w][h];
        do_something(w, h, array);

Now let's see what interesting things gcc has to say about this:

warning: passing argument 3 of ‘do_something’ from incompatible pointer type [-Wincompatible-pointer-types]
  do_something(w, h, array);
                     ^
note: expected ‘int **’ but argument is of type ‘int (*)[(sizetype)(a)]’
 void do_something(int width, int height, int **array);
      ^

So apparently a 2D array is not the same thing as **, fine — but what in the name of the Seven Hells is an int (*)[(sizetype)(a)]?

Multi-Dimensional Arrays

This might seem pretty logical, but I've nevertheless managed to write programs in C for about 8 years without paying proper attention to the subtle differences between pointers and arrays; I knew that if they were statically allocated, the program would handle them differently for sizeof, yet I failed to grasp the exact reason for that, and subsequently also what so radically separates multi-dimensional arrays from pointers.

Yet, if we look at how they are stored in memory, it makes perfect sense: a 2D array is not a 1D array of pointers to separate 1D arrays, but rather one big contiguous 1D array that has been cut into n equally sized pieces. We can't just put classical "double-pointers" there (think about pointer arithmetic violently blowing up and cities burning and people dying and the air stinking and everything being really, really bad).
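The memory layout can be checked with a small sketch like the following (the function name layout_is_contiguous and the constants ROWS and COLS are made up for illustration). It verifies that element a[i][j] sits exactly (i * COLS + j) ints after the start of the array, with no pointers stored anywhere in between.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Returns 1 if every a[i][j] sits at byte offset
 * (i * COLS + j) * sizeof(int) from the start of a, 0 otherwise. */
int layout_is_contiguous(void)
{
    int a[ROWS][COLS];
    const char *base = (const char *)a;
    int i, j;

    /* The whole array is exactly ROWS * COLS ints: no hidden row pointers. */
    assert(sizeof a == ROWS * COLS * sizeof(int));

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if ((const char *)&a[i][j] != base + (i * COLS + j) * sizeof(int))
                return 0;
    return 1;
}
```

An int ** would instead expect to find pointers at the start of that memory, which is why treating the two as interchangeable ends badly.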

So how can we reference a 2D array?

        int a[3][4];
        int **p;
        p = a;      // OK

                    // ... or no, wait - no, actually: BOOM!

Not like this.

Let us have another look at that bizarre int (*)[(sizetype)(a)]. It resembles a function pointer, yet that would make no sense. And it isn't, anyway.

It is simply a notation. The (*) indicates that it is a pointer to something, and the brackets to the right indicate that the "something" is an array of size a. Consequently, a 2D array is like a 1D array of some "units", where each unit has a size of a. And sizetype is simply gcc's internal type for sizes; the cast tells us that the array bound gets converted to a type that can be used to indicate a size, much like a size_t.

So we would need to write this instead:

        int a[3][4];
        int (*p)[4];
        p = a;      // yay!

a[3][4] is an array of three 4-element units. (*p)[4] is a pointer to one such 4-element unit, here the first one in a (it is not a pointer to a pointer), and the notation suddenly starts to make "sense", or whatever you would call that for some weird C notation.
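To see what such a pointer does in practice, here is a small sketch (sum_row and unit_stride are hypothetical helper names). Indexing through p works just like indexing the array itself, and p + 1 advances by one whole 4-element unit rather than by one int.

```c
#include <assert.h>
#include <stddef.h>

/* Sum one row, given a pointer to the first 4-element unit of a 2D array. */
int sum_row(int (*p)[4], int row)
{
    int j, sum = 0;

    assert(p != NULL && row >= 0);
    for (j = 0; j < 4; j++)
        sum += p[row][j];
    return sum;
}

/* Number of bytes that p + 1 advances for an int (*p)[4]. */
size_t unit_stride(void)
{
    int a[3][4];
    int (*p)[4] = a;

    return (size_t)((char *)(p + 1) - (char *)p);
}
```

For an array holding the numbers 1 to 12 row by row, sum_row(a, 1) adds up 5, 6, 7 and 8, and unit_stride() comes out as 4 * sizeof(int).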

Equally, a function would need to be declared like this:

void do_something(int width, int height, int (*array)[height]);

// or with some "sugar":
void do_something(int width, int height, int array[][height]);

// or with even MORE "sugar":
void do_something(int width, int height, int array[width][height]);
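Putting it together, a function with such a signature might be used as in the sketch below. The names fill_grid, sum_grid and heap_grid_sum are made up for illustration and are not from the students' code. The last function is my own suggestion: it keeps the convenient array[x][y] indexing by using a pointer to a variable-length row type, but gets the storage from malloc, so large images do not end up on the stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Store x * height + y in every cell, so each cell gets a unique number. */
void fill_grid(int width, int height, int array[width][height])
{
    int x, y;

    assert(width > 0 && height > 0);
    for (x = 0; x < width; x++)
        for (y = 0; y < height; y++)
            array[x][y] = x * height + y;
}

/* Add up all cells of the grid. */
long sum_grid(int width, int height, int array[width][height])
{
    int x, y;
    long sum = 0;

    for (x = 0; x < width; x++)
        for (y = 0; y < height; y++)
            sum += array[x][y];
    return sum;
}

/* Same thing, but with the grid on the heap; returns -1 if malloc fails. */
long heap_grid_sum(int width, int height)
{
    assert(width > 0 && height > 0);    /* checked before the VLA type is formed */
    int (*grid)[height] = malloc((size_t)width * sizeof *grid);
    long sum;

    if (grid == NULL)
        return -1;
    fill_grid(width, height, grid);
    sum = sum_grid(width, height, grid);
    free(grid);
    return sum;
}
```

With width 3 and height 4 the cells hold 0 to 11, so both the stack and the heap variant should sum to 66.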

Science! Source: Tech Talk About C99 — What Are Variable Length Arrays?
