Mad Man with a Compiler: Go To Statement Considered Helpful (part 5: Fun with Switches)

For my final segment on the use of gotos, let us not forget one important hidden goto. The switch statement is essentially a goto with a variable label. This means that the if(0) trick can apply here, too.

Before I get into the meat, I want to point out that many of the examples in this section aren't great code. In most cases the best way is to find a more traditional solution. Unlike the previous segments I've written on using gotos, this is mostly fun hacking food for thought, not something you're likely to include in your code.

You're probably vaguely familiar with Duff's Device. Typically you run across it as a loop unrolling technique that a professor mentions in some computer science class, but you then forget about it. Anyway, here's an example. Without Duff's Device, you have a standard loop as below:

for(i=0; i<n; ++i) do_it_once();

With Duff's Device, the loop is unrolled manually as below:

int loops = (n + 3) / 4;
switch(n%4) {
    do {
        case 0: do_it_once();
        case 3: do_it_once();
        case 2: do_it_once();
        case 1: do_it_once();
    } while ( --loops > 0 );
}
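As a rough way to convince yourself that the device calls the function exactly n times, here is a minimal sketch. The `calls` counter, the body of `do_it_once()`, and the `run_duff()` wrapper are stand-ins added for illustration, and the `n <= 0` guard is an assumption about what you would want: without it, n == 0 lands on case 0 and runs the body four times.

```c
/* Hypothetical stand-in for the real work: just count the calls. */
static int calls = 0;
static void do_it_once(void) { ++calls; }

/* Calls do_it_once() n times via Duff's Device and returns the call count. */
static int run_duff(int n)
{
    calls = 0;
    if (n <= 0)
        return calls;          /* guard added here; the bare device assumes n > 0 */

    int loops = (n + 3) / 4;   /* number of passes, counting the partial first one */
    switch (n % 4) {
        do {
    case 0: do_it_once();      /* fall through */
    case 3: do_it_once();      /* fall through */
    case 2: do_it_once();      /* fall through */
    case 1: do_it_once();
        } while (--loops > 0);
    }
    return calls;
}
```

Calling `run_duff(7)` enters at case 3 for three calls, then makes one full pass of four.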

That takes a bit to make sense of. The trick is that the first time in, the switch jumps into the middle of the loop body and does only the leftover n%4 calls; then it hits the do/while and goes through all four cases as many more times as needed. Note that the code as written assumes n is greater than zero: when n is 0, the switch lands on case 0 and do_it_once() still runs four times. The most common case cited for using that code is writing out a buffer to a register. If you're writing to a serial port or something like that, then that's the right way to do it, but if you're doing something more like a buffer copy, a simpler version that is easier to parse pulls the loop outside the switch, as in the code below. To keep this example parallel to the Duff's Device example, the odd instances are done first, though there's no reason for doing it one way or the other. (In fact, for a buffer copy, you might do odd bytes first and last to optimize for alignment, depending on the needs of the system.)

switch(n%4){
    case 3: do_it_once();
    case 2: do_it_once();
    case 1: do_it_once();
}
for(i=0;i<n/4;++i) do_it_four_times();
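The same kind of check works for the split version. This is a sketch under assumptions: `calls`, the bodies of `do_it_once()` and `do_it_four_times()`, and the `run_split()` wrapper are illustrative stand-ins. Unlike the bare Duff's Device, this form needs no special handling for n == 0, because the switch has no case 0 and the loop runs zero times.

```c
/* Hypothetical stand-ins for the real work: count the calls. */
static int calls = 0;
static void do_it_once(void) { ++calls; }
static void do_it_four_times(void)
{
    do_it_once(); do_it_once(); do_it_once(); do_it_once();
}

/* Handles the n % 4 leftovers first, then the bulk in groups of four. */
static int run_split(int n)
{
    int i;
    calls = 0;
    if (n < 0)
        return calls;          /* assumption: negative counts mean "do nothing" */

    switch (n % 4) {
        case 3: do_it_once();  /* fall through */
        case 2: do_it_once();  /* fall through */
        case 1: do_it_once();
    }
    for (i = 0; i < n / 4; ++i)
        do_it_four_times();
    return calls;
}
```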

That's rarely used, but in very specific cases it can be a big win. For example, a memory copy routine may use something very similar to use large registers for the bulk of the data while still supporting odd numbers of bytes. Duff's Device (with the do loop inside the switch statement) has the advantage of eliminating redundant code, especially when do_it_four_times() would just be the same statement repeated four times, and even more so when the unroll factor is larger than four. If you're working at this level of manual optimization, the key is to try a number of implementations and run performance testing. When combining clever code with optimizing compilers and specific hardware, there are usually more factors than you're taking into account, and you may be surprised at the results.
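To make the memory copy idea concrete, here is a minimal sketch, not a tuned routine. The name `copy_bytes()` and the choice of 4-byte words are assumptions for illustration; the word moves go through `memcpy()` so the sketch stays valid for unaligned pointers, and a real implementation would deal with alignment and wider registers explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy routine: leftover bytes first, then 4-byte words. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    /* Copy the n % 4 odd bytes one at a time. */
    switch (n % 4) {
        case 3: *dst++ = *src++;   /* fall through */
        case 2: *dst++ = *src++;   /* fall through */
        case 1: *dst++ = *src++;
    }

    /* Copy the remaining bytes a word at a time. */
    for (i = 0; i < n / 4; ++i) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        src += sizeof word;
        dst += sizeof word;
    }
}
```

Whether this beats the platform's own `memcpy()` is exactly the kind of thing that has to be measured rather than assumed.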

Now for what I like to call Crow's Corollary to Duff's Device. I sometimes find that I have a series of cases that handle mostly the same thing, but need a slightly different setup. Other cases have completely different code. See the example below.

switch(var) {
    case A:
        buf_to_use = A_BUF;
        use_buf(buf_to_use);
        break;
    case B:
        buf_to_use = B_BUF;
        use_buf(buf_to_use);
        break;
    case C:
        buf_to_use = C_BUF;
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

Now imagine that "use_buf()" is 15 or 20 lines of code instead of a simple function. (Yes, I'll accept that the best solution is probably to refactor those 15 to 20 lines into a simple function, but I'm having fun looking at other options.) In order to avoid repeating the code that is the same for the similar cases, you can put the case labels inside an if(0) just like with a goto! Here is the example without the repeated code.

switch(var) {
    case A:
        buf_to_use = A_BUF;
        // Fall through but skip over assignment
        if ( 0 ) {
    case B:
            buf_to_use = B_BUF;
        }
        // Fall through but skip over assignment
        if ( 0 ) {
    case C:
            buf_to_use = C_BUF;
        }
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

The important point here is that the if(0) construct lets you avoid duplicating the code that is common to several of the cases without requiring that the initial portion of those cases be identical. This could be implemented with a nested switch statement to set up A, B, and C, but programmers are lazy; I've never seen anyone implement the above example with a nested switch to avoid the use_buf() duplication, even if it's 20 lines of code instead of one. Rarely will someone create a function to factor out the duplicated code. Usually I just see the 20 lines of code duplicated. That's bad. That's what if(0) is there to solve.
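For anyone who wants to see the if(0) version actually run, here is a minimal sketch. Everything outside the switch is an assumption made for illustration: the `enum kind` values, the buffer constants, a `use_buf()` that just records its argument, and a placeholder body for case D.

```c
/* Hypothetical values for the selector and the buffers. */
enum kind { A, B, C, D };
enum { A_BUF = 10, B_BUF = 20, C_BUF = 30 };

/* Hypothetical stand-in for the shared code: record which buffer was used. */
static int last_buf = -1;
static void use_buf(int buf) { last_buf = buf; }

/* Returns the buffer handed to use_buf(), or 0 for the unrelated case D. */
static int dispatch_if0(enum kind var)
{
    int buf_to_use = 0;
    last_buf = -1;
    switch (var) {
        case A:
            buf_to_use = A_BUF;
            /* Fall through but skip over assignment */
            if (0) {
        case B:
                buf_to_use = B_BUF;
            }
            /* Fall through but skip over assignment */
            if (0) {
        case C:
                buf_to_use = C_BUF;
            }
            /* General code for cases: A, B, C */
            use_buf(buf_to_use);
            break;
        case D:
            last_buf = 0;   /* placeholder for the completely different code */
            break;
    }
    return last_buf;
}
```

Each case label jumps into the body of an if(0) that normal fall-through never enters, so exactly one assignment runs before the shared code.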

Of course, this can be implemented more directly using goto:

switch(var) {
    case A:
        buf_to_use = A_BUF;
        goto use_buf;
    case B:
        buf_to_use = B_BUF;
        goto use_buf;
    case C:
        buf_to_use = C_BUF;
        goto use_buf;
    // General code for cases: A, B, C
    use_buf:
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

The former version with if(0) blocks interleaved with the case statements is more fun, but it's certainly more confusing and harder to read than simply using a goto. The goto version is also much more friendly to your text editor's auto-indent configuration. (I'm not even sure what "proper" indentation is for the interleaved if(0) construct!)
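A runnable sketch of the goto version follows, under the same kind of assumptions: the `enum kind` values, the buffer constants, the recording `use_buf()`, and the case D placeholder are all invented for illustration. The label is named `shared` here purely to keep it visually distinct from the function; C keeps labels in their own namespace, so reusing the name `use_buf` for the label is also legal.

```c
/* Hypothetical values for the selector and the buffers. */
enum kind { A, B, C, D };
enum { A_BUF = 10, B_BUF = 20, C_BUF = 30 };

/* Hypothetical stand-in for the shared code: record which buffer was used. */
static int last_buf = -1;
static void use_buf(int buf) { last_buf = buf; }

/* Returns the buffer handed to use_buf(), or 0 for the unrelated case D. */
static int dispatch_goto(enum kind var)
{
    int buf_to_use = 0;
    last_buf = -1;
    switch (var) {
        case A:
            buf_to_use = A_BUF;
            goto shared;
        case B:
            buf_to_use = B_BUF;
            goto shared;
        case C:
            buf_to_use = C_BUF;
            goto shared;    /* redundant, but keeps the three cases uniform */
        /* General code for cases: A, B, C */
        shared:
            use_buf(buf_to_use);
            break;
        case D:
            last_buf = 0;   /* placeholder for the completely different code */
            break;
    }
    return last_buf;
}
```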

And here's what you would need to write if you were being pedantic about not using gotos:

switch(var) {
    case A:
    case B:
    case C:
        switch(var) {
            case A:
                buf_to_use = A_BUF;
                break;
            case B:
                buf_to_use = B_BUF;
                break;
            case C:
                buf_to_use = C_BUF;
                break;
            default:
                flag_error(); // Should be impossible to reach
                break; // avoid warning
        }
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break;
    case D:
        // Completely different code from A, B, C
        ...
}

That's clunkier to write, harder to read, and more difficult to maintain. If you change the list of cases that use_buf() applies to, you have to modify the case statements in both switch statements, which might as well be begging for bugs. The repeated case statements in the nested switch statement still constitute duplicated code--exactly what we're trying to avoid by factoring out the use_buf() code. Keep your code clean and concise. Use the right tool for the job, and sometimes 'goto' is the right tool.

If you really want to write the same code without a goto or repeated cases, there is a way, but I don't recommend it. You can interleave another loop inside the switch statement so the breaks for the cases break the loop, not the switch. It may be a fun hack, but it's not good code:

switch(var) {
    // do loop to change where break goes
        do {
    case A:
            buf_to_use = A_BUF;
            break; // exits do loop, not switch
    case B:
            buf_to_use = B_BUF;
            break; // exits do loop, not switch
    case C:
            buf_to_use = C_BUF;
            break; // exits do loop, not switch
        } while (0);
        // General code for cases: A, B, C
        use_buf(buf_to_use);
        break; // regular switch exit
    case D:
        // Completely different code from A, B, C
        ...
}

This is the closest corollary to Duff's Device, but again, don't do that. An optimizing compiler should treat the above example as identical to the goto version, so there is nothing to gain in the generated code. Most C programmers aren't used to mixing switch statements with other control structures like that, which also makes it harder to understand. Unless you have some arbitrary "no gotos" rule handed down from management, the goto version is the simplest to read and understand. Use the gotos.
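If you want to check that the breaks inside the do/while(0) really do land on the shared code, here is a minimal sketch of that variant. The `enum kind` values, the buffer constants, the recording `use_buf()`, and the case D placeholder are assumptions made for illustration.

```c
/* Hypothetical values for the selector and the buffers. */
enum kind { A, B, C, D };
enum { A_BUF = 10, B_BUF = 20, C_BUF = 30 };

/* Hypothetical stand-in for the shared code: record which buffer was used. */
static int last_buf = -1;
static void use_buf(int buf) { last_buf = buf; }

/* Returns the buffer handed to use_buf(), or 0 for the unrelated case D. */
static int dispatch_loop(enum kind var)
{
    int buf_to_use = 0;
    last_buf = -1;
    switch (var) {
        /* do loop to change where break goes */
            do {
        case A:
                buf_to_use = A_BUF;
                break;      /* exits do loop, not switch */
        case B:
                buf_to_use = B_BUF;
                break;      /* exits do loop, not switch */
        case C:
                buf_to_use = C_BUF;
                break;      /* exits do loop, not switch */
            } while (0);
            /* General code for cases: A, B, C */
            use_buf(buf_to_use);
            break;          /* regular switch exit */
        case D:
            last_buf = 0;   /* placeholder for the completely different code */
            break;
    }
    return last_buf;
}
```

A break always leaves the innermost enclosing loop or switch, which is why the three inner breaks fall out to `use_buf()` while the last one leaves the switch.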

Monday, June 19, 2017
