Mad Man with a Compiler: code

Showing posts with label code. Show all posts

Tuesday, November 2, 2021

Don't Use Modular Arithmetic With Negative Numbers

Most programmers are familiar with the "modulus" operator, which is the percent symbol in the language C and many other programming languages that have followed. Anyone who remembers learning division in elementary school will recall that when you divide a number, say 21 by a smaller number, say 5, you get a whole number part, 4 in this case, but there's a little left over. Later in arithmetic, you learned to continue the division and get 4.2, but when first learning, you would be told to stop at 4 and say you had a remainder of 1. This concept turns out to be incredibly useful given that computers do a lot of work with integers.

In C, when working with integer math, you will observe:

   21 / 5 = 4

   21 % 5 = 1

When I was a math major in college, we worked with modular arithmetic quite a bit both in my number theory class and in my modern algebra class when working with ring theory. Number theory in particular was a wonderful class for a computer scientist, as it is the mathematical basis for pretty much all integer computing, as well as the basis for cryptography.

Now in modular arithmetic, as any mathematician knows, if you're working with mod 5, you always end up with numbers from 0 through 4. Here, 4+1 is 0, and 0-1 is 4. You've just taken the number line and wrapped it into a tight circle.

In computers, this is used in many applications.

Anything needing random numbers typically calls a function that returns a random unsigned integer from zero to however large the integer is (about four billion for a 32-bit integer). If you're writing code to simulate rolling dice for Dungeons and Dragons, then you'll often want a random number from 1 through 20, so you'll take your huge random number, mod it by 20 to get a number from 0 through 19, and add one.

Another common data structure is an array. It doesn't matter here what the elements are, they could be simple integers or complex structures. What matters is that the array has a known number of elements, and you reference elements using integer indices of zero through one less than the array length. So if the array has 50 elements, they're 0 through 49. Now sometimes we have an array where we are keeping a history of events, so we save the index of the next event to write, and after writing it, we add one to move to the next element. But to avoid going past the end, we mod the index by the array length to make it wrap around at the end.

That all works great.

Now suppose we have a circular array as I described, and we want to read it backwards. We should be able to simply keep subtracting one and taking the result mod the array size so that when we go from zero to negative one it will jump back instead to the end (49 in the above example). But we have two problems here.

First, often programmers use unsigned integers, especially for array indices. When you subtract 1 from 0 with an unsigned number, it's an integer underflow, which wraps around to be the largest possible integer (that 4 billion-ish number for 32-bit integers). And unless your array size is a power of 2, that modular arithmetic will not work out right. If you think back to how modular arithmetic is like wrapping the number line into a circle, you can see this if you imagine wrapping a 72-inch tape measure into a 5-inch circle. This would work great most of the way, but since you would have that last two inches at the end, moving backwards from zero would be as if the tape connected instantly back to 72, so subtracting one would go to 71, which would be read mod 5 as 1, not 4 like a mathematician would expect.

Second, even if you're carefully using signed integers for the array indices, the modular operator doesn't follow arithmetic rules as mathematicians would expect. In fact, it's not really a modular operator at all, but really a remainder operator. But those are the same thing, right? Not really. The catch is how the division operator works.

What is -1/5 in integer arithmetic?

Is that 0 or -1?

If it's -1, then you would have a remainder of 4. That's also what you would want from a modular operator, keeping the value always in the 0 through 4 range.

If it's 0, then you would have a remainder of -1. That's not how modular arithmetic works, as you can't have a negative modular result.

In C, the answer is 0. And the so-called modular operator will tell you that -1%5 is -1.

In C, division of negative numbers rounds towards zero instead of negative infinity, which breaks true modular arithmetic.

Fortunately, it's not hard to work around this problem.

Say you're moving backwards through a circular array, and what you want to do is:

   previous_index = ( current_index - 1 ) % number_of_elements;

But since that breaks at zero, just add in the array size:

   previous_index = ( number_of_elements + current_index - 1 ) % number_of_elements;

That simple change prevents the index from ever going negative, and except when wrapping around below zero, the added array size is just erased by the modular operator.

Tuesday, August 15, 2017

A Subtle Difference between C and C++: String Literals

For the most part, C++ is a superset of C. You can write C code, rename the file with a .cpp extension, and the compiler will compile it in C++ mode, generating essentially the same code. In fact, the very first C++ compilers were actually C compilers with extra pre-processing. But there are some subtle differences, and I recently ran into one that has some important implications.

In C, string literals are not constant, but in C++ they are.

For most practical purposes, string literals are constants in C. The compiler puts them into a data segment that the code can't modify. All the libc string functions take (const char *) parameters. But if you assign a (char *) pointer to the start of a string literal, the compiler won't warn you about dropping the 'const' from the type, and you can write code that modifies the string (only to have it fault at run time). In fact, the warning about dropping the 'const' is probably the reason that newer versions of C haven't changed string literals to be constant.

Why does anyone care?

I've found two specific instances where code works when compiled as C++, but not as C because of this.

You can't use string literals as the targets of case statements.

This may seem obvious, as a case statement takes an integer type, not a string. But suppose you want to do a switch based on the first four characters of a string (after ensuring that it's at least four characters long). Imagine the following macro:

#define STR2INT(s) ( ((s[0]) << 24) | ((s[1]) << 16) | ((s[2]) << 8) | (s[3]) )

Now you could write code like:

   switch(STR2INT(option)) {
      case STR2INT("help"):
         ...
      case STR2INT("read"):
         ...
   }

That works in C++. But in C, the compiler complains that the case statements aren't constant expressions. To make it work in C, you have to have a much uglier version of the macro for the case statements:

#define CHARS2INT(w, x, y, z) (((w) << 24) | ((x) << 16) | ((y) << 8) | (z))

Then the code looks like:

   switch(STR2INT(option)) {
      case CHARS2INT('h','e','l','p'):
         ...
      case CHARS2INT('r','e','a','d'):
         ...
   }

That works in both C and C++, but is a pain to write. At least the STR2INT macro works fine in other situations where the compiler insists on constant values.

You can't write asserts based on macro names.

In large software projects, it's not unusual to have sets of macros for specific purposes. These macros are by convention supposed to follow some project-specific format. There even may be a defined correlation between the name of the macro and the value. It would be nice to be able to write asserts based on the macro name to enforce those conventions.

A quick aside on asserts:

Both C and C++ now support compile-time asserts. It used to be that you would write code that would generate a negative shift if the expression wasn't true or something like that. When the assert failed, you would get a compile-time error that was rather confusing until you looked at the offending line. With the new mechanism, the compiler displays whatever message you tell it. You use static_assert(expression,"message"); In C, you have to include <assert.h> or use _Static_assert. This was added in C11 and C++11.

So for a trivial example, suppose we have macros like:

#define VAL_7B 0x7b

Now somewhere we use those macros:

   process_value(VAL_7B);

Obviously real code would have other parameters, but this is enough for our purposes.

To have asserts based on the macro name, what appears to be a function call must also be a macro; presumably a macro wrapper around the real function call. Consider this definition:

#define process_value(v) \
   do { \
      _process_value(v); \
   } while(0)

That's a basic wrapper, forcing a semicolon at the end of the do-while loop. This lets us add in asserts using the '#' preprocessor operator to stringify the input parameter:

#define CHAR2HEX(c) ( (c) <= '9' ? (c) - '0' : (c) - 'A' + 10 ) // Assumes uppercase
#define process_value(v) \
   do { \
      static_assert( (#v)[0]=='V' && (#v)[1]=='A' && (#v)[2]=='L' && (#v)[3]=='_', "Must use a 'VAL_xx' macro here" ); \
      static_assert( CHAR2HEX((#v)[4]) == ((v)>>4)  , "'VAL_xx' macro doesn't match defined value" ); \
      static_assert( CHAR2HEX((#v)[5]) == ((v)&0x0f), "'VAL_xx' macro doesn't match defined value" ); \
      static_assert( (#v)[06]==0, "'VAL_xx' macro format wrong" ); \
      _process_value(v); \
   } while(0)

In C++, that works great. In C, you just can't do that.

And here's something interesting: Why not change the above example to look like:

#define CHAR2HEX(c) ( (c) <= '9' ? (c) - '0' : (c) - 'A' + 10 ) // Assumes uppercase
#define process_value(v) \
   do { \
      static_assert( (#v)[0]=='V' && (#v)[1]=='A' && (#v)[2]=='L' && (#v)[3]=='_', "Must use a 'VAL_xx' macro here" ); \
      static_assert( CHAR2HEX((#v)[4]) <= 0xf  , "'VAL_xx' macro with bad hex value" ); \
      static_assert( CHAR2HEX((#v)[5]) <= 0xf  , "'VAL_xx' macro with bad hex value" ); \
      static_assert( (#v)[06]==0, "'VAL_xx' macro format wrong" ); \
      _process_value( CHAR2HEX((#v)[4])<<4 | CHAR2HEX((#v)[5]) ); \
   } while(0)

This uses the value directly out of the macro name, so you can leave the value off entirely when defining the macro, right? Yes. But it goes further than that. Since the above code only uses the stringification of the parameter, it never expands it. That means it's perfectly happy if you never define the VAL_XX macros at all, which is probably not what you want. Be sure that the wrapper macro expands the macro somewhere if you want to be sure it's actually a defined macro.

Conclusion

So if you've followed my other writing up to this point, you're probably expecting some clever hack to make this work in C. Sorry, but not this time. It would probably be relatively simple to add a compiler option or #pragma directive to make string constants literal in C, but gcc doesn't have this, and I'm not aware of any other compiler that does. (Please comment if you know otherwise.) There are plenty of tricks you could do if you're willing to use additional tools in your build process, like an additional step between the regular preprocessor and the compiler to look for extracting characters from string literals and convert them into character constants (and you could tell the compiler to use a wrapper script to do that as the preprocessor), but that's not likely to be an acceptable option.

You just can't do that in C.

Resources

This is the test file I used to be sure my above examples were correct: c_vs_cpp_example.c

Monday, June 19, 2017

Go To Statement Considered Helpful (part 5: Fun with Switches)

For my final segment on the use of gotos, let us not forget one important hidden goto. The switch statement is essentially a goto with a variable label. This means that the if(0) trick can apply here, too.

Before I get into the meat, I want to point out that many of the examples in this section aren't great code. In most cases the best way is to find a more traditional solution. Unlike the previous segments I've written on using gotos, this is mostly fun hacking food for thought, not something you're likely to include in your code.

You're probably vaguely familiar with Duff's Device. Typically you run across it as a loop unrolling technique that a professor mentions in some computer science class, but you then forget about it. Anyway, here's an example. Without Duff's Device, you have a standard loop as below:

for(i=0; i<n; ++i) do_it_once();

With Duff's Device, the loop is unrolled manually as below:

int loops = (n + 3) / 4;
switch(n%4) {
    do {
        case 0: do_it_once();
        case 3: do_it_once();
        case 2: do_it_once();
        case 1: do_it_once();
    } while ( --loops > 0 );
}

That takes a bit to make sense of. The trick is that first time in, the switch jumps to only do the odd cases, then it hits the do/while loop and goes through all the cases for as many times as is needed. The most common case cited for using that code is writing out a buffer to a register. If you're writing to a serial port or something like that, then that's the right way to do it, but if you're doing something more like a buffer copy, a simpler way that is easier to parse would be to pull the loop outside the switch as in the code below. To keep this example parallel to the Duff's device example, the odd instances are done first, though there's no reason for doing it one way or the other. (In fact, for a buffer copy, you might do odd bytes first and last to optimize for alignment, depending on the needs of the system.)

switch(n%4){
    case 3: do_it_once();
    case 2: do_it_once();
    case 1: do_it_once();
}
for(i=0;i<n/4;++i) do_it_four_times();

That's rarely used, but in very specific cases it can be a big win. For example, a memory copy routine may use something very similar to use large registers for the bulk of the data while still supporting odd numbers of bytes. Duff's Device (with the do loop inside the switch statement) has the advantage of eliminating redundant code, especially when the do_it_four_times() call needs to be the same statement four times, and when a larger number is used. If you're working at this level of manual optimization, they key is to try a number of implementations and run performance testing. When combining clever code with optimizing compilers and specific hardware, there are usually more factors than you're taking into account, and you may be surprised at the results.

Now for what I like to call Crow's Corollary to Duff's Device. I sometimes find that I have a series of cases that handle mostly the same thing, but need a slightly different setup. Other cases have completely different code. See the example below.

switch(var) {
    case A:
        buf_to_use = A_BUF;
        use_buf(buf_to_use);
        break;
    case B:
        buf_to_use = B_BUF;
        use_buf(buf_to_use);
        break;
    case C:
        buf_to_use = C_BUF;
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

Now imagine that "use_buf()" is 15 or 20 lines of code instead of a simple function. (Yes, I'll accept that the best solution is probably to refactor those 15 to 20 lines into a simple function, but I'm having fun looking at other options.) In order to avoid repeating the code that is the same for the similar cases, you can put the case labels inside an if(0) just like with a goto! Here is the example without the repeated code.

switch(var) {
    case A:
        buf_to_use = A_BUF;
        // Fall through but skip over assignment
        if ( 0 ) {
    case B:
            buf_to_use = B_BUF;
        }
        // Fall through but skip over assignment
        if ( 0 ) {
    case C:
            buf_to_use = C_BUF;
        }
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

The importance here is the if(0) construct allows you to avoid duplicating the code that is common to several of the cases without requiring that the initial portion of the cases be identical. This could be implemented with a nested switch statement to set up A, B, and C, but programmers are lazy; I've never seen anyone implement the above example with a nested switch to avoid the use_buf() duplication, even if it's 20 lines of code instead of one. Rarely will someone create a function to factor out the duplicated code. Usually I just see the 20 lines of code duplicated. That's bad. That's what if(0) is there to solve.

Of course, this can be implemented more directly using goto:

switch(var) {
    case A:
        buf_to_use = A_BUF;
        goto use_buf;
    case B:
        buf_to_use = B_BUF;
        goto use_buf;
    case C:
        buf_to_use = C_BUF;
        goto use_buf;
    // General code for cases: A, B, C
    use_buf:
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

The former version with if(0) blocks interleaved with the case statements is more fun, but it's certainly more confusing and hard to read than simply using a goto. The goto version is also much more friendly to your text editor's auto-indent configuration. (I'm not even sure what "proper" indention is for the interleaved if(0) construct!)

And here's what you would need to write if you were being pedantic about not using gotos:

switch(var) {
    case A:
    case B:
    case C:
        switch(var) {
            case A:
                buf_to_use = A_BUF;
                break;
            case B:
                buf_to_use = B_BUF;
                break;
            case C:
                buf_to_use = C_BUF;
                break;
            default:
                flag_error(); // Should be impossible to reach
                break; // avoid warning
        }
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

That's clunkier to write, harder to read, and more difficult to maintain. If you change the list of cases that use_buf() applies to, you have to modify the case statements in both switch statements, which might as well be begging for bugs. The repeated case statements in the nested switch statement still constitute duplicated code--exactly what we're trying to avoid by factoring out the use_buf() code. Keep your code clean and concise. Use the right tool for the job, and sometimes 'goto' is the right tool.

If you really want to write the same code without a goto or repeated cases, there is a way, but I don't recommend it. You can interleave another loop inside the switch statement so the breaks for the cases break the loop, not the switch. It may be a fun hack, but it's not good code:

switch(var) {
    // do loop to change where break goes
        do {
    case A:
            buf_to_use = A_BUF;
            break; // exits do loop, not switch
    case B:
            buf_to_use = B_BUF;
            break; // exits do loop, not switch
    case C:
            buf_to_use = C_BUF;
            break; // exits do loop, not switch
        } while (0);
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break; // regular switch exit
    case D:
        // Completely different code from A, B, C
        ...
}

This is the closest corollary to Duff's Device, but again, don't do that. The above example should be viewed by the compiler as being identical to the goto version. Most C programmers aren't used to mixing up switch statements with other control structures like that, which also makes it harder to understand. Unless you have some arbitrary "no gotos" rule handed down from management, the goto version is the simplest to read and understand. Use the gotos.

Go To Statement Considered Helpful (part 4: Tail Recursion)

One other place where goto can be useful is in implementing tail recursion. As you should recall, tail recursion is when a function returns with a call to itself. For example:

foo(int a)
{
    ...
    if ( recurse )
    {
        return(foo(a+1)); // tail recursion
    }
}

Most compilers will optimize that into a goto to the top of the function. However, in some cases the compiler will fail you, and you have a runtime fault when the stack overflows. Perhaps you turned off optimization to aid in debugging. Perhaps the compiler isn't as smart as it should be. Who knows? If you're running with a limited stack (and stacks are always limited), don't trust your compiler. Replace the recursion with a goto as below:

foo(int a)
{
  tail_recursion:
    ...
    if ( recurse )
    {
        a=a+1;
        goto tail_recursion; // return(foo(a)) without recursion
    }
}

Now you won't blow away your stack if the optimization settings change.

Of course, you can implement that by putting the body of the function inside a while(1){...; break;} loop, and then replace the goto with a continue. However, doing that is just as much of a hack as a goto. It also adds an unnecessary indention level, and it is definitely harder to read than the simple goto.

Unlike the previous examples, I'm not hiding the goto here in a macro. I am using a label name that should tell any experienced programmer what I'm doing. If I were to hide this in a macro, it would be less clear what's going on.

Now you might object that previously I've said to trust the compiler to optimize your code and don't worry about whether it looks like you're adding a few extra instructions. I still stand by that, but this isn't a matter of trusting optimization for a few instructions. This is a matter of your program not crashing. I ran into this exact situation in real code. When the correctness of your code requires optimization, you need to force the code to always be optimized, which is exactly why we need the goto in this case.

Go To Statement Considered Helpful (part 3: Finding Matches in a Loop)

Another time I often find use for a goto is in a for loop that is scanning an array to find a match. If a match is found, a break terminates the loop. See the standard version of this below. It repeats the condition check for the loop to see if the loop terminated early. The other option would be to use a flag to indicate if a match was found. This requires initializing the variable and then setting it if a match is found. Both of these variations are very common.

for (i=0; i<ARRAY_SIZE(data); ++i)
{
    if ( data[i].key == value ) break;
}
if ( i<ARRAY_SIZE(data) )
{
    // data[i].key is our match
    match_found:
    ...
}
else
{
    // No match was found
    ...
}

I hate repeating code; that's a source of bugs, as years later someone might need to make a change and only change one of the two copies. Adding a tracking variable is clunky. Again, goto can save the day. Consider the code below:

for (i=0; i<ARRAY_SIZE(data); ++i)
{
    if ( data[i].key == value ) goto match_found;
}
if ( 0 )
{
    // data[i].key is our match
    match_found:
    ...
}
else
{
    // No match was found
    ...
}

This eliminates the repeated comparison, so if something changes, it only has to change once. A quick reaction that many will have is that this also will be faster than the alternative, but generally compilers are very good at eliminating redundant comparisons, and processors are fast enough that it's rarely worth worrying about saving a few machine instructions; in fact, it's usually best to be willing to sacrifice machine instructions for better code.

Without the goto, you can't scope the loop index to be just the loop, or you end up using an extra boolean tracking variable. Most importantly, without the goto, the code is longer and more complicated, and that's exactly what you don't want. Of course, in most cases, you'll still need the loop index to know which item matched, but sometimes you only need to know if there was a match.

Once again, the C preprocessor comes to the rescue with a clever pair of macros below:

#define EXIT_LOOP_WITH_MATCH(name) goto _loop_match_ ## _name;
#define IF_LOOP_EXITED_WITH_MATCH(_name) if(0) _loop_match_ ## _name:

The macro works just like a regular if statement. You can use it with or without braces, and you can use a regular else condition. The only catch is that the compiler will likely complain about an unused label if you have the IF macro without a matching EXIT macro. That may well be more of a benefit than a restriction.

With the macro, our previous example is nice and clean as seen below:

for (i=0; i<ARRAY_SIZE(data); ++i)
{
    if ( data[i].key == value ) EXIT_LOOP_WITH_MATCH(my_loop);
}
IF_LOOP_EXITED_WITH_MATCH(my_loop)
{
    // data[i].key is our match
    ...
}
else
{
    // No match was found
    ...
}

Those macros may look familiar. They should. They're exactly the same as the THROW and CATCH macros I discussed previously for exception handling in C.

It is interesting to note that this and the previous example for the use of goto (breaking out of a nested loop and avoiding extra code to handle exiting a loop early) are exactly the two cases cited in Kernighan and Ritchie's The C Programming Language book when it discusses the goto statement in section 3.8.

Go To Statement Considered Helpful (part 2: Nested Loops)

If you're in a nested loop in C code, and you want to issue a break or continue statement, but want it to apply to the outer loop, you're going to have some awkward code.

  for(j=0;i<j_end;++j)
  {
      for(i=0;i<end;++i)
      {
          ...
          if (condition) next_j; // break then continue?
          ...
      }
      ...
  }

Implementing that 'next_j' statement could be done with setting a flag, issuing a break, and then checking the flag after the inner loop terminates. What we want is a "goto continue_outer_loop." And when you do this, it can be a perfect place for one of my favorite tricks, the use of "if (0)" for live code. At the end of the outer loop, put in the block of code, "if (0) { continue_outer_loop: continue; }" and you can then use your goto to implement the feature that the language is missing. This looks like:

  for(j=0;i<j_end;++j)
  {
      for(i=0;i<end;++i)
      {
          ...
          if (condition) goto continue_outter_loop;
          ...
      }
      ...
      if(0) continue_outter_loop: continue;
  }

A nice thing to do would be to imagine how the language might be extended to eliminate this need for a goto. Loops could easily be named, allowing the name of the loop to be an optional parameter to a break or continue statement. What if instead of for(i=0;i<10;++i), you could write for(i=0;i<10;++i;index_loop). Then you could use "break index_loop;" inside an inner loop with no need for a goto. Good luck getting the language committees to agree to this anytime soon. On the bright side, Java recognized this shortcoming, so the language provides for named loops; in C and C++, goto lets you implement the missing feature.

In this case, all this can be included in macros, as shown below. The first pass at trying this can result in compiler warnings about unused labels, which is why I added the extra do-nothing gotos. With even the most minimal compiler optimizations, this should provide exactly the same execution as if the language provided for named loops natively. The only restriction is that you can't use the same name for more than one loop in the same function.

/*
 * Named loops:
 *
 * Name any loop, typically the first thing after the opening brace.
 * Then in a nested loop, break or continue from it with
 * BREAK_LOOP/CONTINUE_LOOP.
 *
 * Code Copyright 2015 by Preston Crow
 * used by permission
 * (You have permission to use this as long as you keep
 *  this comment block intact.)
 */
#define NAME_THIS_LOOP(_name)            \
if(0)                                    \
{                                        \
    goto _continue_loop_ ## _name;       \
    _continue_loop_ ## _name : continue; \
}                                        \
if(0)                                    \
{                                        \
    goto _break_loop_ ## _name;          \
    _break_loop_ ## _name : break;       \
}                                        \
do { ; } while (0) /* Force a semicolon */
#define CONTINUE_LOOP(_name) goto _continue_loop_ ## _name
#define BREAK_LOOP(_name) goto _break_loop_ ## _name

Note that the naming macro can go anywhere in the loop where a continue or break statement would work as expected. Using those macros, the code is quite readable:

  for(j=0;i<j_end;++j)
  {
      NAME_THIS_LOOP(outter_loop)
      for(i=0;i<end;++i)
      {
          ...
          if (condition) CONTINUE_LOOP(outter_loop);
          ...
      }
      ...
  }

I thought I was very clever the day I came up with that. I felt a little less clever when researching revealed that others have also suggested very similar macros for the same purpose. In any case, I highly recommend that you use these macros or the like in all C and C++ projects where appropriate.

Go To Statement Considered Helpful (part 1: Exceptions)

The most obvious use of a goto statement in C is for exception handling. Many people before have noted that this is one good use of gotos.

It's not unusual to have a function with a number of checks for errors. When an error is encountered, the function must release locks and free memory acquired and allocated at the start of the function before returning an error to the caller. The simple way to do this is to have all the errors jump to a label near the end of the function where this is done. Often this is the same code that a successful exit from the function also executes. This is certainly better than repeating the same code many different times in the same function. Newer languages have recognized this and implemented this functionality in the language. In C++, you can use throw/catch. In Java, there is also the try/finally construct that can also handle many of these situations.

In C, to handle exceptions, you will often see code with a goto as below:

    buf=malloc(size);
    if ( !buf ) goto error_exit;
    ...
  error_exit:
    printf("Out of memory\n");
    ...

This code is typical enough that many coders are happy with it like that. But what if we could implement exception handling like C++ in C using macros? Well, we can. Two language tricks make this possible. First, you can use "if (0)" to create code that can only be reached by a goto. Second, the target of an if statement can be a labeled statement or block. I'll get back to that in a bit. Here are the macros:

#define THROW(_name) goto handle_exception_ ## _name;
#define CATCH(_name) if(0) handle_exception_ ## _name:

Now there's no need to complain that C lacks exception handling. This works much like the C++ version, but without the parameter. Also, they are scoped for the whole function, as they aren't connected to a “try{...}” block of statements. See the example below:

buf=malloc(size);
if ( !buf ) THROW(out_of_memory);
...
CATCH(out_of_memory) {
    printf("Out of memory\n");
}
if (buf) free(buf);
return;

Wait a second. What's going on with those macros? That's some crazy code that can't be right.

The first trick is to protect the target of the goto in the target of an if(0) statement. That means the target of the goto is in code that can't be reached without the goto. This trick reduces some of the spaghetti code issues normally associated with goto statements.

But what about that label after the if(0)?

Well, I'm taking advantage of a surprising glitch in the C language definition. When I first wrote the code, I was thinking that what I needed for my macro was to be able to write: if(0) label: { statements; } but putting the label in there certainly wouldn't be allowed. Much to my surprise, it compiled and worked! I went back and checked the language definition, and found that it is, in fact, legal code. I can't believe they intended this when defining the language, but as a lucky artifact of how they set up the parsing, it is how they defined it. (Perhaps it was because case statements in a switch work just like labels. The reason isn't particularly important here.) This allows you to macro-ize the CATCH without having any ugliness with the target and braces. You can follow the CATCH macro with a block of statements, a single statement, or just a semicolon to follow the same code that follows.

By hiding all this in the macro, I've effectively extended the C language to add a missing feature.

Go To Statement Considered Harmful (revisited)

Hopefully you've heard of the famous letter by Edsger Dykstra, "Go To Statement Considered Harmful." This was one of the most influential letters ever written in Computer Science, published in Communications of the ACM in March, 1968. This letter effectively defined structured programming for generations of coders. It has been taken as gospel and taught in so many programming classes that most developers take it for granted. The goto statement is just too low-level for modern programming.

This has been taken so far that some languages simply didn't include goto in the language. Java is a prime example. Though in that case, they were hedging their bets, as they kept a reserved word for it, and they did include it in the bytecode.

The letter itself is rather dry and academic. You can find it online, and it's worth the read. (I found it at https://www.cs.utexas.edu/~EWD/transcriptions/EWD02xx/EWD215.html under the original title; it was the ACM editor that selected the now-famous "harmful" title.) It has good points. In general, the use of goto results in spaghetti code that is hard to debug and harder to maintain. Clean code breaks things into functions, loops, and other well-defined control structures, with little obvious need for direct jumps.

I first heard of this when my world of programming consisted of Atari BASIC. The idea of not using goto was rather shocking in that context, but then most BASIC implementations of the day lacked functions, while loops, and a number of other higher-level structures that even shell scripts have today. When I moved on to Pascal and C, there seemed to be little use for the old goto, and I readily accepted the idea that it was a bad idea.

But the idea of avoiding direct jumps has been treated as a religious dogma instead of as a general guideline, and this has been harmful itself. There are many reasons why using an explicit goto is a good idea. In fact, no matter how nicely structured your code is, it's impossible to avoid that pesky goto. How do you think all those nice loops are implemented? At the assembly level, it's all comparisons and jumps.

Now you can say that that's just a technical nitpick, and you're mostly correct. Dykstra even referenced it in his original letter. Just because the good structured statements your language provides are implemented with jumps doesn't mean you should use them directly. However, what it does do is give a good consistent reason for when you should use gotos in your code.

Use a goto only when you need to implement something that your language doesn't provide for.

Following this, I've written a series of examples where a goto statement provides some useful and interesting features for C programmers.

Thursday, December 10, 2015

Portable Inline Assembly

This is a fun little tidbit. Assembly language is, obviously, not portable. If you use inline assembly code in a C program, you won't expect it to run on multiple processors without special handling for each one. Well, there's an exception to that.

First, the trivial exception:

__asm__("");

That's valid code (assuming the gcc compiler, as I will throughout this discussion). But it doesn't do anything, so it's rather pointless. The magic comes in when you specify the input and output parameters. For output, I'll specify two registers, assigned to the variables 'a' and 'b'. For input, I'll use the same registers. The syntax GCC uses is that zero is the first output register, and so-forth. Note that you specify the output first.

__asm__("" \
 :"=r"(a), "=r"(b) \
 :"0"(b),"1"(a) );

Now this does something! We've told it that the input is two registers with the values from variables 'b' and 'a', and the output is the same two registers which are variables 'a' and 'b'. Notice the order. We've swapped two variables. No intermediary storage. No fancy xors. No actual code. What we've done is tell the compiler to swap it's notion of where 'a' and 'b' are stored (with both being registers).

We can make this into a fancy macro:

#define SWAP_ASM(_a,_b) __asm__("" \
    :"=r"(_a), "=r"(_b) \
    :"0"(_b),"1"(_a) );

And demonstrate that it works:

#include <stdio.h>

#define SWAP_ASM(_a,_b) __asm__("" \
    :"=r"(_a), "=r"(_b) \
    :"0"(_b),"1"(_a) );

int main(int argc,char *argv[])
{
   int a='a', b='b';
   (void)argc;(void)argv; // no "unused" warning
   SWAP_ASM(a,b);
   printf("a contains '%c'\n"
          "b contains '%c'\n",a,b);
   return 0;
}

Now admittedly, this isn't very useful. If the variables aren't in registers, the compiler has to generate load and store instructions before and after the swap. Using inline assembly can mess up optimization (probably more so in older releases), so using a variable to do a swap with all your code being in C is likely a better choice. Still, a neat little hack.

Wednesday, November 25, 2015

Solving a Puzzle

Earlier this year, I was given a puzzle as a gift. It's a simple frame with Tetris-like blocks of different shapes, and the object is to get all nine blocks into the frame. Making it more difficult, there will be leftover empty squares throughout the puzzle. I played around with it for a while, and it wasn't too difficult to get to within one square of completing it, but I was probably nowhere close to actually solving it.

Being a programmer, I decided it would be more fun to program my computer to solve the puzzle. I spent a couple of hours on it, and came up with a nice result. I'll share the result below, along with some observations about the program and the puzzle.

First, here's a link to the puzzle: http://www.palmettopuzzleworks.net/

You can buy it directly through Etsy for $20.

Before I delve into my program, let me start by suggesting that this might make a great programming assignment. It's a very well-defined problem, but with a number of subtleties to watch out for.

So let me show you my solution. My intuition was that this would be a computationally complex problem, and it looked like a good fit regardless, so my language of choice was C.

The first thing to do is define the data structures. I've found that in the vast majority of problems, the data structures drive the code. If you have the wrong data structures, you'll never get a good solution, but with the right data structures, the code mostly flows into place.

As you can see in the picture, the board is an 11x11 grid with some spaces filled in. That makes the board an easy two-dimensional array. The largest piece fits in a 4x5 grid, so I created a three-dimensional array for the pieces initialized to a 1 for the spots that are filled and a 0 for the squares not occupied. Thinking ahead, I'm going to have to rotate each piece, so I need to use a square for each piece, not just a rectangle, so each piece is a 5x5 grid.

int grid[11][11];
int pieces[9][5][5] =
{
   // first four pieces are "big" and take four rows
   { { 0,1,1,0,0 },
     { 0,1,1,1,0 },
     { 1,1,1,1,1 },
     { 0,1,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,0,0 },
     { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 0,1,0,0,0 },
     { 0,0,0,0,0 } },
   { { 0,1,0,0,0 },
     { 0,1,1,1,0 },
     { 0,1,1,1,1 },
     { 1,1,1,1,0 },
     { 0,0,0,0,0 } },
   { { 0,1,0,0,0 },
     { 1,1,1,1,0 },
     { 0,1,1,1,1 },
     { 0,1,0,1,0 },
     { 0,0,0,0,0 } },
   // remaining five pieces take three rows
   { { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,1,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,0,0 },
     { 1,1,1,1,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,0,0,0 },
     { 1,1,1,1,0 },
     { 0,1,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } }
};

Initializing the grid is pretty simple: a zero represents an open space, a number indicates a piece (1 through 9), and another constant represents the border (10). Because the border is more filled than open, the code fills the borders, then opens up the spaces where a piece can intrude.

#define BORDER 10

void init_border(void)
{
   // Set grid to zero
   memset(grid,0,sizeof(grid));
   // Border defaults to blocked
   for(int i=0;i<11;++i)
   {
      grid[0][i] = grid[i][0] = grid[10][i] = grid[i][10] = BORDER;
   }
   // Clear spots
   grid[0][2] = grid[0][4] = grid[0][8] = 0;
   grid[10][1] = grid[10][4] = grid[10][6] = grid[10][8] = 0;
   grid[3][0] = grid[5][0] = grid[8][0] = 0;
   grid[1][10] = grid[5][10] = grid[7][10] = grid[9][10] = 0;
}

Now here we come to the key trick to make the whole thing work. I have to try each piece in four different rotations. If I rotate the pieces on the fly, the program will be spending most of its time doing rotations. That's not a good idea for something that is already likely to be quite computationally intensive. So why not save the work and just do it once? This raises one subtlety to the problem: The pieces all fit in a 5x5 grid, but have at least one row or column empty. The solving code will want to assume that the top row and left column are non-empty, so each rotation has to also shift to make that happen. With that explained, there's nothing particularly fancy about the code, though I'll admit that on the first run, it mangled the pieces by doing some reflections as well as rotations.

int all_pieces[9][4][5][5];

void init_rotations(void)
{
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=0; rotation<4; ++rotation )
      {
         if ( rotation == 0 )
         {
            for ( int r=0;r<5;++r )
            {
               for ( int c=0;c<5;++c )
               {
                  all_pieces[piece][rotation][r][c] = pieces[piece][r][c];
               }
            }
         }
         else
         {
            for ( int r=0;r<5;++r )
            {
               for ( int c=0;c<5;++c )
               {
                  all_pieces[piece][rotation][4-c][r] = all_pieces[piece][rotation-1][r][c];
               }
            }
         }
      }
   }
   // Now shift rotations to all be in the upper-left (no all-zero rows or all-zero columns
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=1; rotation<4; ++rotation )
      {
         for ( int i=0; i<5; ++i )
         {
            if ( all_pieces[piece][rotation][0][i] ) goto next; // Something in the first row
         }
         for ( int r=0;r<4;++r )
         {
            for ( int c=0;c<5;++c )
            {
               all_pieces[piece][rotation][r][c] = all_pieces[piece][rotation][r+1][c];
            }
         }
         for ( int c=0;c<5;++c )
         {
            all_pieces[piece][rotation][4][c] = 0;
         }
         --rotation; continue;
      next: ;
      }
   }
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=1; rotation<4; ++rotation )
      {
         for ( int i=0; i<5; ++i )
         {
            if ( all_pieces[piece][rotation][i][0] ) goto next2; // Something in the first column
         }
         for ( int r=0;r<5;++r )
         {
            for ( int c=0;c<4;++c )
            {
               all_pieces[piece][rotation][r][c] = all_pieces[piece][rotation][r][c+1];
            }
         }
         for ( int r=0;r<5;++r )
         {
            all_pieces[piece][rotation][r][4] = 0;
         }
         --rotation; continue;
      next2: ;
      }
   }
}

Before writing the code to try all the combinations and find the solution, I need to be able to display the board. I also need to be able to display the pieces to check that I've coded and rotated them correctly. This is a place where spending a few minutes extra to have nice readable output can save a lot of time in understanding the results. Using a graphics toolkit or generating HTML would be overkill for this problem. Simple printing to the console would be fine, but tossing in some ANSI color codes to make the pieces easier to visualize would be well worth the effort. Fortunately, I have a handy set of ANSI color code defines:

#define _CN "\033[m"          // Reset to default color and attributes
#define _CRED "\033[1;31m"    // Set text to bold red
#define _Cred "\033[31m"      // Set text to red
#define _C_red "\033[4;31m"   // Set text to underlined and red
#define _C_RED "\033[4;1;31m" // Set text to underline and bold red
#define _Bred "\033[41m"      // Set background to red
#define _CGRN "\033[1;32m"
#define _Cgrn "\033[32m"
#define _C_grn "\033[4;32m"
#define _C_GRN "\033[4;1;32m"
#define _Bgrn "\033[42m"
#define _CYEL "\033[1;33m"
#define _Cyel "\033[33m"
#define _C_yel "\033[4;33m"
#define _C_YEL "\033[4;1;33m"
#define _Byel "\033[43m"
#define _CBLU "\033[1;34m"
#define _Cblu "\033[34m"
#define _C_blu "\033[4;34m"
#define _C_BLU "\033[4;1;34m"
#define _Bblu "\033[44m"
#define _CMAG "\033[1;35m"
#define _Cmag "\033[35m"
#define _C_mag "\033[4;35m"
#define _C_MAG "\033[4;1;35m"
#define _Bmag "\033[45m"
#define _CCYN "\033[1;36m"
#define _Ccyn "\033[36m"
#define _C_cyn "\033[4;36m"
#define _C_CYN "\033[4;1;36m"
#define _Bcyn "\033[46m"
#define _CWHT "\033[1;37m"
#define _Cwht "\033[37m"
#define _C_wht "\033[4;37m"
#define _C_WHT "\033[4;1;37m"
#define _Bwht "\033[47m"
#define _CBLK "\033[1;30m"
#define _Cblk "\033[30m"
#define _C_blk "\033[4;30m"
#define _C_BLK "\033[4;1;30m"
#define _Bblk "\033[40m"
/*
 * ANSI attribute strings
 */
#define _A_BOLD "\033[1m"         // Leave color and underline-status as-is, set BOLD
#define _A_UNDER "\033[4m"        // Leave color and bold-status as-is, set UNDERLINE
#define _A_BOLD_UNDER "\033[1;4m" // Just an optimized version of _A_BOLD _A_UNDER

So using those defines to display the grid and the pieces is fairly straightforward. You'll notice that I take advantage of C's string concatenation--any two strings separated only by whitespace are treated as a single string.

void display_grid(void)
{
   printf(_Bblk"             "_CN"\n");
   for (int row=0;row<11;++row)
   {
      printf(_Bblk" "_CN);
      for (int col=0;col<11;++col)
      {
         char c;
         c = grid[row][col] + '0';
         switch(grid[row][col]) {
            case 0: c=' '; printf(_Bwht); break;
            case 1:
               printf(_Bcyn); break;
            case 2:
            case 3:
               printf(_Bblu); break;
            case 4:
            case 5:
               printf(_Bmag); break;
            case 6:
            case 7:
               printf(_Byel); break;
            case 8:
               printf(_Bred); break;
            case 9:
               printf(_Bgrn); break;
            case BORDER: c=' '; printf(_Bblk); break;
         }
         printf("%c"_CN,c);
      }
      printf(_Bblk" "_CN"\n");
   }
   printf(_Bblk"             "_CN"\n");
   printf("\n");
}

void display_pieces(void)
{
   for (int piece=0;piece<9;++piece)
   {
      printf("Piece %d rotations\n",piece+1);
      for (int row=0;row<5;++row)
      {
         for (int rotation=0;rotation<4;++rotation)
         {
            for (int col=0;col<5;++col)
            {
               char c;
               c = all_pieces[piece][rotation][row][col] ? '1' + piece : ' ';
              
               switch(all_pieces[piece][rotation][row][col] * (piece+1)) {
                  case 0: c=' '; printf(_Bwht); break;
                  case 1:
                     printf(_Bcyn); break;
                  case 2:
                  case 3:
                     printf(_Bblu); break;
                  case 4:
                  case 5:
                     printf(_Bmag); break;
                  case 6:
                  case 7:
                     printf(_Byel); break;
                  case 8:
                     printf(_Bred); break;
                  case 9:
                     printf(_Bgrn); break;
                  default:
                     printf("invalid"); break;
               }
               printf("%c"_CN,c);
            }
            printf("      ");
         }
         printf("\n");
      }
   }
}

With the above code, I have all that is needed to display the pre-rotated pieces (which proved vital for debugging mangled rotations) and the game board with any pieces placed in it. Starting out, it looks like:

Piece 1 rotations
 11          1           1        1         
 111        11        11111      1111       
11111      111         111        111       
 1         1111         11        11        
             1                    1         
Piece 2 rotations
222         2           2         2 2       
 222       222         222       2222       
222        2222       222         222       
 2         2 2         222         2        
                                            
Piece 3 rotations
 3           3         3333      3          
 333        333       3333       3333       
 3333       333        333       333        
3333       3333          3       333        
              3                   3         
Piece 4 rotations
 4           4         4 4         4        
4444        444       4444       4444       
 4444       44         4444       44        
 4 4       4444          4       444        
            4                     4         
Piece 5 rotations
 555       5           555       55         
555        555         555       555        
555        555        555        555        
            55                     5        
                                            
Piece 6 rotations
6666       6           6 6       666        
666        666         666        66        
6 6        66         6666       666        
           666                     6        
                                            
Piece 7 rotations
777         7          7 7       777        
7777       777        7777        77        
7 7        77          777       777        
           777                    7         
                                            
Piece 8 rotations
 888       8           8 8       88         
888        888         888        88        
8 8        88         888        888        
            88                     8        
                                            
Piece 9 rotations
99          9          99         99        
9999        99        9999       999        
 99        999          99       99         
           99                     9         
                                            
Empty board

So the the code to actually go through the placements feels anticlimactic at this point. It simply marches through the puzzle, trying every possible combination in much the way I would do if I were doing an exhaustive search by hand. Here's the complete program:

/*
 * Find the solution or solutions to The Infuriater
 *
 * By Preston Crow
 *
 * gcc -std=c99 -W -Wall puzzle.c -o puzzle -O3 && puzzle
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define _CN "\033[m"          // Reset to default color and attributes
#define _CRED "\033[1;31m"    // Set text to bold red
#define _Cred "\033[31m"      // Set text to red
#define _C_red "\033[4;31m"   // Set text to underlined and red
#define _C_RED "\033[4;1;31m" // Set text to underline and bold red
#define _Bred "\033[41m"      // Set background to red
#define _CGRN "\033[1;32m"
#define _Cgrn "\033[32m"
#define _C_grn "\033[4;32m"
#define _C_GRN "\033[4;1;32m"
#define _Bgrn "\033[42m"
#define _CYEL "\033[1;33m"
#define _Cyel "\033[33m"
#define _C_yel "\033[4;33m"
#define _C_YEL "\033[4;1;33m"
#define _Byel "\033[43m"
#define _CBLU "\033[1;34m"
#define _Cblu "\033[34m"
#define _C_blu "\033[4;34m"
#define _C_BLU "\033[4;1;34m"
#define _Bblu "\033[44m"
#define _CMAG "\033[1;35m"
#define _Cmag "\033[35m"
#define _C_mag "\033[4;35m"
#define _C_MAG "\033[4;1;35m"
#define _Bmag "\033[45m"
#define _CCYN "\033[1;36m"
#define _Ccyn "\033[36m"
#define _C_cyn "\033[4;36m"
#define _C_CYN "\033[4;1;36m"
#define _Bcyn "\033[46m"
#define _CWHT "\033[1;37m"
#define _Cwht "\033[37m"
#define _C_wht "\033[4;37m"
#define _C_WHT "\033[4;1;37m"
#define _Bwht "\033[47m"
#define _CBLK "\033[1;30m"
#define _Cblk "\033[30m"
#define _C_blk "\033[4;30m"
#define _C_BLK "\033[4;1;30m"
#define _Bblk "\033[40m"
/*
 * ANSI attribute strings
 */
#define _A_BOLD "\033[1m"         // Leave color and underline-status as-is, set BOLD
#define _A_UNDER "\033[4m"        // Leave color and bold-status as-is, set UNDERLINE
#define _A_BOLD_UNDER "\033[1;4m" // Just an optimized version of _A_BOLD _A_UNDER

int grid[11][11];
int pieces[9][5][5] =
{
   // first four pieces are "big" and take four rows
   { { 0,1,1,0,0 },
     { 0,1,1,1,0 },
     { 1,1,1,1,1 },
     { 0,1,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,0,0 },
     { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 0,1,0,0,0 },
     { 0,0,0,0,0 } },
   { { 0,1,0,0,0 },
     { 0,1,1,1,0 },
     { 0,1,1,1,1 },
     { 1,1,1,1,0 },
     { 0,0,0,0,0 } },
   { { 0,1,0,0,0 },
     { 1,1,1,1,0 },
     { 0,1,1,1,1 },
     { 0,1,0,1,0 },
     { 0,0,0,0,0 } },
   // remaining five pieces take three rows
   { { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,1,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,1,0,0 },
     { 1,1,1,1,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 0,1,1,1,0 },
     { 1,1,1,0,0 },
     { 1,0,1,0,0 },
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } },
   { { 1,1,0,0,0 },
     { 1,1,1,1,0 },
     #if 0
     { 0,1,1,0,0 },
     #else
     { 1,1,1,0,0 },
     #endif
     { 0,0,0,0,0 },
     { 0,0,0,0,0 } }
};

int all_pieces[9][4][5][5];

int place[9][3]; // rotation,row,col for the piece

#define BORDER 10

void init_border(void)
{
   // Set grid to zero
   memset(grid,0,sizeof(grid));
   // Border defaults to blocked
   for(int i=0;i<11;++i)
   {
      grid[0][i] = grid[i][0] = grid[10][i] = grid[i][10] = BORDER;
   }
   // Clear spots
   grid[0][2] = grid[0][4] = grid[0][8] = 0;
   grid[10][1] = grid[10][4] = grid[10][6] = grid[10][8] = 0;
   grid[3][0] = grid[5][0] = grid[8][0] = 0;
   grid[1][10] = grid[5][10] = grid[7][10] = grid[9][10] = 0;
}

void init_rotations(void)
{
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=0; rotation<4; ++rotation )
      {
         if ( rotation == 0 )
         {
            for ( int r=0;r<5;++r )
            {
               for ( int c=0;c<5;++c )
               {
                  all_pieces[piece][rotation][r][c] = pieces[piece][r][c];
               }
            }
         }
         else
         {
            for ( int r=0;r<5;++r )
            {
               for ( int c=0;c<5;++c )
               {
                  all_pieces[piece][rotation][4-c][r] = all_pieces[piece][rotation-1][r][c];
               }
            }
         }
      }
   }
   // Now shift rotations to all be in the upper-left (no all-zero rows or all-zero columns
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=1; rotation<4; ++rotation )
      {
         for ( int i=0; i<5; ++i )
         {
            if ( all_pieces[piece][rotation][0][i] ) goto next; // Something in the first row
         }
         for ( int r=0;r<4;++r )
         {
            for ( int c=0;c<5;++c )
            {
               all_pieces[piece][rotation][r][c] = all_pieces[piece][rotation][r+1][c];
            }
         }
         for ( int c=0;c<5;++c )
         {
            all_pieces[piece][rotation][4][c] = 0;
         }
         --rotation; continue;
      next: ;
      }
   }
   for (int piece=0; piece<9; ++piece)
   {
      for (int rotation=1; rotation<4; ++rotation )
      {
         for ( int i=0; i<5; ++i )
         {
            if ( all_pieces[piece][rotation][i][0] ) goto next2; // Something in the first column
         }
         for ( int r=0;r<5;++r )
         {
            for ( int c=0;c<4;++c )
            {
               all_pieces[piece][rotation][r][c] = all_pieces[piece][rotation][r][c+1];
            }
         }
         for ( int r=0;r<5;++r )
         {
            all_pieces[piece][rotation][r][4] = 0;
         }
         --rotation; continue;
      next2: ;
      }
   }
}

void display_grid(void)
{
   printf(_Bblk"             "_CN"\n");
   for (int row=0;row<11;++row)
   {
      printf(_Bblk" "_CN);
      for (int col=0;col<11;++col)
      {
         char c;
         c = grid[row][col] + '0';
         switch(grid[row][col]) {
            case 0: c=' '; printf(_Bwht); break;
            case 1:
               printf(_Bcyn); break;
            case 2:
            case 3:
               printf(_Bblu); break;
            case 4:
            case 5:
               printf(_Bmag); break;
            case 6:
            case 7:
               printf(_Byel); break;
            case 8:
               printf(_Bred); break;
            case 9:
               printf(_Bgrn); break;
            case BORDER: c=' '; printf(_Bblk); break;
         }
         printf("%c"_CN,c);
      }
      printf(_Bblk" "_CN"\n");
   }
   printf(_Bblk"             "_CN"\n");
   printf("\n");
}

void display_pieces(void)
{
   for (int piece=0;piece<9;++piece)
   {
      printf("Piece %d rotations\n",piece+1);
      for (int row=0;row<5;++row)
      {
         for (int rotation=0;rotation<4;++rotation)
         {
            for (int col=0;col<5;++col)
            {
               char c;
               c = all_pieces[piece][rotation][row][col] ? '1' + piece : ' ';
              
               switch(all_pieces[piece][rotation][row][col] * (piece+1)) {
                  case 0: c=' '; printf(_Bwht); break;
                  case 1:
                     printf(_Bcyn); break;
                  case 2:
                  case 3:
                     printf(_Bblu); break;
                  case 4:
                  case 5:
                     printf(_Bmag); break;
                  case 6:
                  case 7:
                     printf(_Byel); break;
                  case 8:
                     printf(_Bred); break;
                  case 9:
                     printf(_Bgrn); break;
                  default:
                     printf("invalid"); break;
               }
               printf("%c"_CN,c);
            }
            printf("      ");
         }
         printf("\n");
      }
   }
}

void remove_piece(int piece)
{
   for (int row=0;row<11;++row)
   {
      for (int col=0;col<11;++col)
      {
         if ( grid[row][col] == piece+1 )
         {
            grid[row][col] = 0;
         }
      }
   }
}

int place_piece_here(int piece,int rotation,int row,int col)
{
   for ( int r=0; r<5; ++r )
   {
      for ( int c=0; c<5; ++c )
      {
         if ( all_pieces[piece][rotation][r][c] )
         {
            if ( r+row >= 11 || c+col >=11 ) return 0; // Off the grid
            if ( grid[r+row][c+col] ) return 0; // Spot full
         }
      }
   }
   // Yes, it goes here; put it in the grid
   for ( int r=0; r<5; ++r )
   {
      for ( int c=0; c<5; ++c )
      {
         if ( all_pieces[piece][rotation][r][c] )
         {
            grid[r+row][c+col] = piece+1;
         }
      }
   }
   return 1;
}

// Place a piece if possible; return true if placed
int place_piece(int piece)
{
   for ( int rotation=place[piece][0]; rotation<4; ++rotation,place[piece][1]=0,place[piece][2]=0 )
   {
      for ( int row=place[piece][1]; row<9; ++row,place[piece][2]=0 )
      {
         for ( int col=place[piece][2]; col<9; ++col )
         {
            if ( place_piece_here(piece,rotation,row,col) )
            {
               place[piece][0]=rotation;
               place[piece][1]=row;
               place[piece][2]=col;
               return 1;
            }
         }
      }
   }
   return 0;
}

int main(int argc,char *argv[])
{
   (void)argc;(void)argv;
   init_border();
   init_rotations();
   display_pieces();
   printf("Empty board\n");
   display_grid();

   int solutions=0;
   for (int i=0; i<9; ++i )
   {
      if ( place_piece(i) )
      {
         if ( i == 8 )
         {
            ++solutions;
            printf("Solution %d found:\n",solutions);
            display_grid();
            remove_piece(i);
         }
         else
         {
            if ( i >=7 && 0 ) // debug
            {
               printf("Placed %d pieces:\n",i+1);
               display_grid();
            }
            if ( i != 8 )
            {
               for ( int j=0;j<3; ++j)
                  place[i+1][j]=0;
            }
            continue;
         }
      }
      if ( i == 0 )
      {
         if ( !solutions )
         {
            printf("Failed to solve puzzle\n");
         }
         break;
      }
//printf("Can't place piece %d; remove piece %d\n",i+1,i);
      // Advance the previous piece
      --i;
      remove_piece(i);
//      printf("piece %d was at rot,row,col %d,%d,%d\n",i+1,place[i][0],place[i][1],place[i][2]);
      ++place[i][2];
      // Go back one more as the loop will increment
      --i;
//      printf("piece %d was at rot,row,col %d,%d,%d\n",i+1,place[i][0],place[i][1],place[i][2]);
   }
   //display_grid();
   return 0;
}

Now much to my surprise, once I had debugged it, it ran quite fast. I had guessed that it might run for hours before finding the solution, but it ran to completion in under a minute. The biggest surprise was that it found not one, but eight solutions! I shared my results with the creator of the puzzle. He said he had also used a computer to verify that there was only one solution, but apparently his program had a bug. He has made a slight change to the puzzle, so if you buy it now, it will have one unique solution.

In case you're interested in buying and playing with the puzzle yourself, I won't spoil it by posting the solution here. It's simple enough to run the program if you want to see them.

And a note on copyright: While I hold the copyright on all my posts, including the above code, in my opinion, the ANSI color code defines are simple and obvious enough that they can hardly be considered copyrightable, so feel free to use them.