Testing Compilers Using Csmith

This document assumes that the csmith executable is in your path and also that the environment variable CSMITH_HOME points to the directory into which Csmith was untarred and compiled.

Before going into any details, we need to say this:

We strongly request that you:
  • acquire a sophisticated understanding of the C standard before reporting any bug -- this may take months
  • read this entire document before reporting a compiler bug
  • always check if an issue is known before reporting it
  • understand and conform to whatever additional local bug-reporting conventions apply to the compiler you are testing
  • listen to feedback from compiler developers and other members of the community
Failure to heed these instructions will cause you to:
  • waste developers' time
  • probably be publicly flamed
  • definitely be ignored in the future
It is surprisingly difficult to submit good compiler bug reports. We know because we have submitted our share of poor ones.

The Idea

A program generated by Csmith performs random computations, computes a checksum of its global variables, prints the checksum to STDOUT, and then exits. Csmith is designed to guarantee that for every generated program:

A collection of correct C compilers whose implementation-defined behaviors (integer size, etc.) are compatible must all compile the generated program into executables that output the same checksum.

Thus, when a compiler crashes or when changing the compiler or compiler options changes the checksum printed by a Csmith-generated program, a compiler (or Csmith) bug has been found.
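The checksum comparison described above can be sketched in a few lines of bash. This is only an illustration, not the driver shipped with Csmith: the function names, the CC and TIMEOUT_SECS variables, and the output file names are assumptions made for the example, and it relies on the GNU coreutils timeout command to bound each run.

```shell
#!/bin/bash
# Sketch: build one generated program at two optimization levels and
# compare the checksums the two executables print.
# CC, TIMEOUT_SECS, and the file names are illustrative assumptions.
CC=${CC:-gcc}
TIMEOUT_SECS=${TIMEOUT_SECS:-10}

# Compile source file $1 at optimization level $2 into executable $3.
build() {
  "$CC" -I"${CSMITH_HOME}/runtime" "$2" -w "$1" -o "$3"
}

# Print one of: compile-failed, run-failed, same, different.
compare_levels() {
  local src=$1 out_a out_b
  build "$src" -O0 ./test_O0 || { echo "compile-failed"; return 0; }
  build "$src" -O2 ./test_O2 || { echo "compile-failed"; return 0; }
  # Bound each run in case the program takes a very long time.
  out_a=$(timeout "$TIMEOUT_SECS" ./test_O0) || { echo "run-failed"; return 0; }
  out_b=$(timeout "$TIMEOUT_SECS" ./test_O2) || { echo "run-failed"; return 0; }
  if [ "$out_a" = "$out_b" ]; then
    echo "same"
  else
    echo "different"
  fi
}
```

A typical use would be to run csmith > test.c and then compare_levels test.c. A result of "different" is only a candidate bug: it still has to be reduced and checked for validity before it is worth reporting.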

Getting Started #1

You can use Csmith to reproduce some example bugs that we've already found in common versions of GCC.

Getting Started #2

Csmith is just a program generator. To use it for compiler testing, some sort of driver is required. This bash script will do the job:

#!/bin/bash
set -e
while true
do
  csmith > test.c
  gcc-4.0 -I"${CSMITH_HOME}/runtime" -O -w test.c -o /dev/null
done

Here we're looking only for compiler-crash bugs (that is, we're not running the compiler's output) at a single optimization level. We've chosen GCC 4.0.0 because Csmith has an easy time crashing it. A typical run of this script is:

regehr@home:~$ ./driver0.sh
test.c: In function 'func_108':
test.c:1000: internal compiler error: in c_common_type, at c-typeck.c:531
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

The failure-inducing test case is in test.c. It will take only seconds for a failure-inducing test case to be found. If you choose a different compiler, you'll have to wait longer, maybe even days.

A Better Driver

Csmith is distributed with a more sophisticated driver found in ${CSMITH_HOME}/scripts/compiler_test.pl. It should run on suitably-equipped Linux, MacOS, and Windows+Cygwin machines. Before running the driver you should:

  1. create a temporary directory and cd there
  2. create a configuration file listing compilers you want to test; an example is found in ${CSMITH_HOME}/scripts/compiler_test.in

Then run the script:

$CSMITH_HOME/scripts/compiler_test.pl N compiler_test.in

N is the number of tests you want to run (or 0 to keep going indefinitely). By default, this script only looks for compiler crash bugs.

The driver prints some results to STDOUT, but this can be safely ignored. Any interesting programs that it finds will be saved in the script's working directory as crashX.c, wrongX.c, or csmith_bug_X.c.

If you want to test Visual C++ compilers, the path must be set up in Cygwin by inserting "@call <Visual Studio PATH>\VC\bin\vcvars32.bat" into cygwin.bat.

Reporting Compiler Bugs

Finding compiler bugs is the easy part; reporting them (well) is the hard part.

What does a good compiler bug report look like?
  1. It reports an issue that is not already known.
  2. It reports an issue in a compiler that is currently supported. Bugs in obsolete compilers are generally not interesting. In fact, for LLVM and GCC it is best to test the latest version from the SVN repository.
  3. It contains all information necessary to reproduce the problem including compiler version, host platform, target platform, compiler options, expected output, and actual output.
  4. It contains a reduced, preprocessed, reproducible, understandable, and valid test case.
    • A reduced test case is the absolutely smallest test case you can find that triggers the bug. Ideally the test case you report is 3-5 lines of code, and it should seldom be more than 20.
    • A preprocessed test case is important because it removes dependencies on your host system.
    • An understandable test case is one where you, the reporter, fully understand what is going on in every line of the code. This is most important for wrong-code bugs.
    • A reproducible test case is one that triggers the failure every time. If you think you've found a compiler bug but it is not reproducible, you must make it reproducible before reporting it. Turning off ASLR can help.
    • A valid test case is one that does not invoke undefined behavior or depend on unspecified behavior in the C standard. See Annex J for a summary. The issues here are very subtle: be careful and think twice before reporting an issue. Some -- but not all -- bad behaviors can be detected by turning on maximum compiler warnings, by using Valgrind, and by using the -fwrapv and -ftrapv options supported by GCC and Clang.
  5. It does not contain zip files, tarballs, or anything like that.
  6. It is reported to the compiler provider. For example, if you find a bug in the compiler that ships with Ubuntu 10.10, you should report the bug to the Ubuntu people. If you can reproduce the problem using the vanilla upstream compiler version, then it's OK to report it to the compiler vendors.
Violating any of these guidelines is likely to waste people's time.
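Two of the checks above, preprocessing the test case and looking for signed overflow with -ftrapv, might be scripted roughly as follows. This is a sketch under assumptions: the function names, the CC variable, and the trapv_check file name are made up for the example, and a clean result does not prove the test case is valid, since many undefined behaviors are not caught this way.

```shell
#!/bin/bash
# Sketch of two pre-report checks for a reduced test case.
# CC and all function and file names are illustrative assumptions.
CC=${CC:-gcc}

# Write a preprocessed copy of $1 to $2 so the report does not
# depend on headers installed on the reporter's machine.
preprocess() {
  "$CC" -E -I"${CSMITH_HOME}/runtime" "$1" -o "$2"
}

# Build $1 with -ftrapv and run it. Returns 0 if the program exits
# normally, 2 if it does not compile, and the program's own nonzero
# status otherwise (for example, when a signed overflow traps).
runs_without_trap() {
  "$CC" -ftrapv -w "$1" -o ./trapv_check || return 2
  ./trapv_check > /dev/null
}
```

For example, preprocess test.c test.i followed by runs_without_trap test.i. If the second command fails, the test case probably depends on signed integer overflow and should not be reported as a compiler bug.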

It is very easy to accidentally report a bug that is not really a bug because it relies on undefined or unspecified behavior. These behaviors can creep into the program during test case reduction, or they could have been present in the original Csmith output. In the latter case, there is a serious bug in Csmith and we would appreciate it if you reported it to us.

These links all contain good information:

This implementation of Delta debugging can be useful in creating reduced test cases. LLVM has its own Bugpoint tool that is also very helpful. We're working on our own test case reduction tools and will release them perhaps sometime during 2011.
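Test case reducers generally need a small script that decides whether a candidate file still triggers the bug. For a compiler crash, such a check might look like the sketch below. The function name and the CC variable are assumptions for the example, and the exact calling convention (how the file name is passed, which exit status means "still failing") depends on the reducer you use, so check its documentation.

```shell
#!/bin/bash
# Sketch of a "does it still crash?" check for a crash bug.
# CC and the function name are illustrative assumptions.
CC=${CC:-gcc-4.0}

# Returns 0 if compiling $1 still produces an internal compiler error.
still_crashes() {
  "$CC" -O -w "$1" -o /dev/null 2>&1 | grep -q "internal compiler error"
}
```

Matching on the error text rather than on the exit status alone keeps the reducer from drifting toward a file that merely fails to compile. Matching on the full message (for example, the function and source location named in the crash) is stricter still.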

Tuning Csmith

An advantage of random testing is that it is very easy to tune properties of the generated code including the C subset, the size of the program, and other properties. There are two levels at which Csmith can be tuned (or three, really, since you can always hack the code).

First, Csmith has many handy command-line options. To see commonly-used options run csmith -h. To see uncommonly-used options, run csmith -hh. We do not in fact expect that most of the -hh options will be useful to anyone other than ourselves.

Second, you can hack Csmith's probability distributions. Running csmith --dump-default-probabilities <file> will give you a look at some of its internal configuration variables. You'll probably need to look into the code to understand what they do. You can change the numbers in this file and then feed them back to Csmith like this: csmith --probability-configuration <file>. More details can be found in $CSMITH_HOME/doc/probabilities.txt.
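The round trip described above, dumping the defaults, editing them, and feeding them back, can be wrapped in two small helpers. This is only a convenience sketch: the function names and the PROB_FILE variable are assumptions, while the two csmith options are the ones named in the paragraph above.

```shell
#!/bin/bash
# Sketch of the probability-tuning round trip.
# PROB_FILE and the function names are illustrative assumptions.
PROB_FILE=${PROB_FILE:-probs.txt}

# Write Csmith's default probabilities to $PROB_FILE for editing.
dump_probabilities() {
  csmith --dump-default-probabilities "$PROB_FILE"
}

# Generate a program into $1 using the (edited) probabilities.
generate_tuned() {
  csmith --probability-configuration "$PROB_FILE" > "$1"
}
```

Run dump_probabilities once, edit the numbers in probs.txt, and then call generate_tuned test.c as often as needed.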

Where Are The Bugs?

We've put an enormous amount of time into testing Clang and GCC for x86 and x86-64 at the -O0, -O1, -O2, -Os, and -O3 levels. The developers have put a lot of time into fixing the bugs we've reported. Consequently, you are unlikely to find many bugs in these compilers, for these targets, at these optimization levels.

If you want to find compiler bugs, you might consider testing:

Before submitting a lot of bugs, please check and make sure the developers are actually interested in seeing bug reports. Commercial compiler vendors are typically reluctant to fix bugs that are not impacting important customers. Open source developers may also have other priorities. It's entirely possible that nobody cares if some random combination of optimization flags causes a compiler crash for the PDP-11 target.

Frequently Asked Questions

Why doesn't Csmith's output compile as C++? We haven't had time to fix this, but will try to do so. In the meantime, passing Csmith the --no-structs command line option suffices to generate valid C++ most of the time.

Can Csmith be altered to emit programs in a language other than C or C++? Not easily.

How do I turn off ASLR? Google will tell you.