Whimsy¶
Test Framework proposal for gem5.
This framework is by no means final. I am open to suggestions, criticism, and change requests to this framework. This is meant to be a strong starting point for the rewrite of the testing system for gem5.
Please feel free to comment on the gem5-dev-list or create issues on the pull request for this patch.
The development branch for this framework can be found here.
Motivation¶
Note: This section is non-normative.
Current Framework Issues¶
A testing infrastructure already exists for gem5, so why create a new one? There are quite a few issues with the current infrastructure. To list a few:
- The system is scattered across multiple locations.
  - No central location for documentation/information
  - Multiple entry points to the testing system
  - Massive amounts of indirection between SCons, config files, and the current framework
- There is no simple way to add requirements for tests. If a test needs something set up, one must muck around in the gem5 build system.
- Tests do not have a clear reason for failure.
  - Test names give no indication of what they are intended to test
  - Large amounts of output spew take time to pore over
  - Tests are not written with an explicit goal for test coverage in mind
- There are many legacy components.
  - SPARC tests which are not open source
  - Broken tests which everyone has to know have been failing forever
On top of these issues, because we use SCons to run them, current tests are incredibly static and must follow a very specific format. Adding novel tests, such as tests of gdb functionality or unit tests, would require a rewrite of the framework.
Other Frameworks Available¶
We clearly need a new framework, but why write our own again? Before starting this project I explored a few other frameworks. Each had its own issues that made it a poor fit for our needs.
Pytest seemed like the best option since it is written in Python, is relatively popular, and has deep support for objects called fixtures: items that can be set up and torn down. Fixtures almost cover our need to specify build targets. Unfortunately, these fixtures are not enumerated until the specific test that uses them is started, so there is no natural way in pytest to specify all SCons targets up front and execute only a single SCons build.
Even worse than this issue is the bug that exists in pytest's *marks*, which are used to indicate that tests require a fixture. Any test derived from another that adds a mark will back-propagate that mark to its parent (and therefore to sibling tests). This effectively ruins code reuse, something that is very important for our testing, since every test we currently have does the same thing with different fixtures or configs.
There is a posted workaround for this issue, but it is esoteric and requires users to either already know about the bug or spend hours debugging the strange behavior before discovering the workaround.
See also
See here for a simple example of the issue.
Avocado/Autotest was another option briefly explored. This framework has the same issue as pytest: there is no natural way to enumerate all build targets and build them right away. Each test is loaded and run one at a time, so there is no way to gather up all fixture elements and build the ones we want up front.
Options in the 'acceptance test' world were not deeply explored. I have personal experience working with the RobotFramework and found that working in a 'human' language is more difficult and error-prone than working in a modern programming language.
One final consideration was Gtest or another compiled testing framework. Besides the fact that these systems would require being plugged into gem5, being so low level means writing more code and paying greater attention to detail when writing tests. The hope for a testing system is that it should be (relatively) easy to add additional tests.
Definition of Terms¶
NOTE: The remaining sections may contain limited non-normative comments.
Before introducing the framework with a brief overview of the run loop, there are a few terms used in this documentation that readers may not be familiar with. The purpose of this section is to briefly introduce these terms.
Test Suite¶
A whimsy.suite.TestSuite is a completely self-contained unit of testing which must contain one or more TestCase instances. Test suites can rely on Fixtures, have tags (which contained test cases will be tagged with), and be marked fail_fast. When tests are run, test suites automatically pass the fixtures they require to their test cases. Additionally, when querying based on tags, test cases are marked with the same tags as their containing TestSuite.
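As an illustration of the tag-propagation behavior, here is a minimal sketch using stand-in classes; this is not the actual whimsy API, just a model of the semantics described above:

```python
# Stand-in classes (not the real whimsy API) showing how suite tags
# propagate down to the contained test cases.
class TestCase:
    def __init__(self, name, tags=()):
        self.name = name
        self.tags = set(tags)

class TestSuite:
    def __init__(self, name, testcases, tags=(), fail_fast=False):
        self.name = name
        self.fail_fast = fail_fast
        self.tags = set(tags)
        self.testcases = list(testcases)
        for case in self.testcases:   # tag queries match suite and cases
            case.tags |= self.tags

suite = TestSuite('hello-se',
                  [TestCase('Gem5 Run'), TestCase('TestStdout')],
                  tags=('X86', 'quick', 'se'))
print(sorted(suite.testcases[0].tags))  # ['X86', 'quick', 'se']
```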
Test Case¶
A whimsy.test.TestCase is a unit of test that is not necessarily self-contained. An example of a test which is not self-contained would be one which parses the output of a gem5 run against a gold standard. Since this test case relies on gem5 running first, it would no longer pass if run on its own, and therefore the test is not self-contained. Test cases have all the metadata that a test suite has (tags and fixtures). However, they cannot be individually marked fail_fast.
Fixture¶
A whimsy.fixture.Fixture is an object that may require setup or teardown before or after a TestCase or TestSuite has run. When tests are run, they will be handed fixtures from their containing TestSuite and will set up any fixtures that are not already built. This allows test cases to incrementally test results from a single gem5 execution.

Most importantly, fixtures remove the requirement that SCons keep track of test requirements. TestCases and TestSuites now maintain that information on their own, and the runner will make an aggregated call to SCons on their behalf.
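A minimal sketch of the fixture idea, using stand-in classes rather than the actual whimsy.fixture API; TempDirFixture is a hypothetical example, not a fixture that ships with the framework:

```python
import shutil
import tempfile

# Stand-in base class (not the real whimsy.fixture.Fixture): setup and
# teardown hooks, plus a lazy_init flag the runner could inspect.
class Fixture:
    def __init__(self, name, lazy_init=True):
        self.name = name
        self.lazy_init = lazy_init
        self.built = False

    def setup(self):
        self.built = True

    def teardown(self):
        self.built = False

# Hypothetical concrete fixture: a scratch directory for a gem5 run.
class TempDirFixture(Fixture):
    def setup(self):
        self.path = tempfile.mkdtemp()
        super().setup()

    def teardown(self):
        shutil.rmtree(self.path)
        super().teardown()

fixture = TempDirFixture('scratch-dir', lazy_init=False)
fixture.setup()      # the runner would call this before the test runs
assert fixture.built
fixture.teardown()   # and this once the containing suite finishes
```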
Tags¶
Tags are used to mark groups of related tests. Common examples of tags are ISA
(X86, ARM), testing length (quick, long), and system emulation type
(se, fs). Indexes of tags are built by the whimsy.loader.TestLoader so that tests can be queried and run by specifying command-line arguments.
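As a sketch of how such an index might work (stand-in code, not the TestLoader's actual internals; the test names and tags are made up):

```python
from collections import defaultdict

# Map each tag to the names of tests carrying it, so a command-line
# query like "X86 and quick" becomes a set intersection.
tests = {
    'hello-se': {'X86', 'quick', 'se'},
    'boot-fs':  {'ARM', 'long', 'fs'},
}

tag_index = defaultdict(set)
for name, tags in tests.items():
    for tag in tags:
        tag_index[tag].add(name)

print(tag_index['X86'] & tag_index['quick'])  # {'hello-se'}
```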
Fail Fast¶
Fail Fast (written fail_fast throughout this document) has slightly different semantics depending on the use case. In general it means that, given a TestCase failure, some number of subsequent tests will be skipped. The three different semantics of fail_fast given a test failure are as follows:
- The --fail-fast flag will cause all remaining tests to be ignored. The use case for this could be a script that checks on commit whether all tests pass or not. If they don't pass, we avoid wasting time running the remaining tests.
- If a TestCase fails while executing a TestSuite that is marked fail_fast, then the remaining tests in that TestSuite will be skipped. If there are any remaining test suites to run, they will continue to run one at a time.
- Inside of a TestSuite, test cases are stored in hierarchical whimsy.suite.TestList objects. In addition to utility functions, TestList instances have a fail_fast attribute. When a test fails in a TestList, the remaining test cases in that TestList will be skipped. However, if there are any remaining test lists or cases outside of the failed one, but inside of the currently executing TestSuite, they will still be executed.
This last case visually:
- TestList (Marked fail_fast)
  - Gem5 Run (FAILS)
  - TestList (Will all be skipped)
    - TestStdout (skipped)
    - TestStderr (skipped)
Again, with a failure in one of the output checkers:

- TestList (Marked fail_fast)
  - Gem5 Run (PASS)
  - TestList (Not marked fail_fast)
    - TestStdout (FAILS)
    - TestStderr (Still will be run.)
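The two scenarios above can be modeled with a toy tree walker. This is stand-in code illustrating the described skipping semantics, not the whimsy runner itself:

```python
# Toy model of fail_fast inside nested TestLists: a node is either
# ('name', test_fn) for a test case, or ('name', (fail_fast, children))
# for a TestList-like container.
def run(node, skip=False):
    name, payload = node
    if callable(payload):                       # a leaf test case
        if skip:
            return [(name, 'skip')]
        return [(name, 'pass' if payload() else 'FAIL')]
    fail_fast, children = payload               # a TestList-like node
    results, failed = [], False
    for child in children:
        child_results = run(child, skip or (failed and fail_fast))
        results.extend(child_results)
        failed = failed or any(s == 'FAIL' for _, s in child_results)
    return results

# First scenario: the gem5 run fails inside a fail_fast list, so the
# nested output checks are skipped.
tree = ('suite', (True, [
    ('Gem5 Run', lambda: False),
    ('checks', (False, [
        ('TestStdout', lambda: True),
        ('TestStderr', lambda: True),
    ])),
]))
print(run(tree))
# [('Gem5 Run', 'FAIL'), ('TestStdout', 'skip'), ('TestStderr', 'skip')]
```

In the second scenario the inner list is not marked fail_fast, so a TestStdout failure does not prevent TestStderr from running.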
Note

The use case for the TestSuite fail_fast option is more one of convenience. Its semantics differ slightly from the TestList use, but in the general case it just allows users to create a TestSuite and TestCases without an intermediate TestList. It might be worth removing, as I haven't found a use case for it.
File Organization¶
The new layout for tests doesn't change much from the old one. The first minor change is that test cases will be located in a test.py file, and the old test.py files will be renamed config.py. This is part of an effort to make test cases more explicit and discoverable by users. Rather than having a single file that generates all tests, each file can generate its own variants on tests.
So
tests/speed/system-mode/test-name/test.py
changes to
tests/gem5/test-name/config.py
tests/gem5/test-name/test.py
Where test.py will more than likely contain
a whimsy.gem5.suite.gem5_verify_config()
function call. Reference files
will be placed in the same directory they already are in. The only other
difference will be that all ISA names should be capitalized.
Also, as you might have noticed above, the root of all tests is now tests/gem5, with no indication of the speed of the tests in the path name. This can now be done with tags. Reference file path names have also been trimmed a tiny bit.
See also
Test Programs
Test programs will remain in the same directory; only the ISA name will now be capitalized. I would like to keep this consistent throughout the codebase: since we are going to build gem5 using an uppercase ISA name, everywhere else can follow that standard. With this framework it is possible to build or download these test programs each time rather than storing binaries in the repo.
Location of the Framework
Finally, I would suggest that this framework be placed in the ext
directory. The gem5 helpers (under whimsy/gem5
in this repo) could be
placed directly in the tests
dir. The single entrypoint to run tests would
then be located in tests/main.py
. This script would set up paths and
import all the necessary support libraries for testing.
I would expect there should not be too many changes made to this framework once it is solidified. However, I would hope that more gem5-specific Fixture and TestCase types are created, so the tests dir might see a livelier update pattern.
Running Tests¶
The external interface for whimsy is not too different from the one exposed by test.py right now.
To run all tests use the run
subcommand:
./main.py run . # The '.' is optional.
The run subcommand has some optional flags:

- --skip-build: skip the building of SCons targets (like gem5)
- -v: increase verbosity level once per flag
- --uid: run the test item with the given uid
- -h: show help and list more available flags
Typical Runloop¶
In a typical run of whimsy using the run subcommand, whimsy will first parse the command-line flags. Assuming the run command is given, whimsy will then create a whimsy.loader.TestLoader object and use that object to collect all tests in the given directory.
See also
For more info see whimsy.main
Test Collection and Discovery¶
The TestLoader
will recurse down the directory tree looking for test
program file names that match the default_filepath_regex
. Python files that
either begin or end in test
or tests
with a hyphen or underscore will
match. e.g. test-something.py
or special-tests.py
will match, but
tests.py
will not. Additionally, ‘hidden’ files that begin with a .
will be ignored.
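That naming rule could be expressed as a regular expression along the following lines; this is a sketch, and the actual default_filepath_regex in whimsy.loader may be written differently:

```python
import re

# The stem must begin or end with "test"/"tests" joined by a hyphen or
# underscore, and hidden files (leading '.') never match.
filepath_regex = re.compile(r'^(?!\.)(tests?[-_].*|.*[-_]tests?)\.py$')

for name in ('test-something.py', 'special-tests.py',
             'tests.py', '.test-hidden.py'):
    print(name, bool(filepath_regex.match(name)))
```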
Once the TestLoader
has found a file that has a name indicating it
is a test program, the loader will begin to load tests from that file by
calling execfile
on it. TestCase
instances and TestSuite
objects in the test file will be collected automatically. Any
TestCase
objects which are not specifically placed into a
TestSuite
instance will be collected into a TestSuite
created
for the module.
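To make the collection rule concrete, here is a small sketch with stand-in classes; collect() and its namespace handling are illustrative assumptions, not the actual whimsy.loader code:

```python
# Stand-in classes, not the real whimsy API.
class TestCase:
    def __init__(self, name):
        self.name = name

class TestSuite:
    def __init__(self, name, testcases=()):
        self.name = name
        self.testcases = list(testcases)

def collect(namespace, module_name):
    """Gather suites from an executed test file's namespace, wrapping
    any TestCase not already in a suite into one named for the module."""
    suites = [v for v in namespace.values() if isinstance(v, TestSuite)]
    placed = {id(c) for s in suites for c in s.testcases}
    orphans = [v for v in namespace.values()
               if isinstance(v, TestCase) and id(v) not in placed]
    if orphans:
        suites.append(TestSuite(module_name, orphans))
    return suites

# e.g. a test file that defined one explicit suite and one loose case:
case_a, case_b = TestCase('a'), TestCase('b')
namespace = {'suite': TestSuite('explicit', [case_a]), 'loose': case_b}
suites = collect(namespace, 'test-module')
print([s.name for s in suites])  # ['explicit', 'test-module']
```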
See also
For more info on discovery, valid filenames, and collection see
whimsy.loader
Test Running Step¶
Once the tests have been discovered and collected by the TestLoader
,
whimsy.main
will create the requested
whimsy.result.ResultLogger
logger objects used to display results
and/or stream them into a file in a specified format. (Currently a ConsoleLogger, an InternalLogger, and a JUnitLogger exist.) All loggers
are designed to minimize the amount of memory used by writing out test
information as soon as possible rather than storing large strings.
The whimsy.runner.Runner is instantiated using the suites collected by the TestLoader in addition to any of the previously mentioned result loggers. Once the runner begins, it first sets up any Fixture objects that are not marked lazy_init. Once all of these fixtures have been set up, the Runner begins to iterate through its suites.
The run of a suite takes the following steps:

- Iterate through each TestCase, passing suite-level fixtures to them and running them.
- If a TestCase fails, check the fail_fast conditions and fail out if one occurs:
  - The TestSuite or the containing TestList was marked fail_fast.
  - The --fail-fast flag was given as a command line arg.
- Teardown any built fixtures contained in the TestSuite object.
The run of a TestCase follows these steps:

- Start capturing stdout and stderr, logging each into separate files.
- Copy the suite's fixtures and override them with any versions defined in this test case.
- Build all the fixtures that are required for this test.
  - If any fixture build fails by throwing an exception, mark the test as failed.
- Execute the actual test function, catching all exceptions.
  - Any exception other than the whimsy.test.TestSkipException thrown by the whimsy.test.skip() function will result in a fail status for the test.
  - The test passes if no exceptions are thrown and the __call__ returns.
While all of the above steps are executed, calls are made to the result loggers to notify them of results.
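The TestCase run described above can be sketched as follows; this is stand-in code modeled on the names in the text (TestSkipException, skip()), not the actual whimsy implementation, and output capturing is omitted for brevity:

```python
# Toy model of how a test case's exceptions map to statuses.
class TestSkipException(Exception):
    pass

def skip(msg=''):
    """Raise the dedicated skip exception, as whimsy.test.skip() would."""
    raise TestSkipException(msg)

def run_testcase(test, fixtures=()):
    try:
        for fixture in fixtures:      # build required fixtures first
            fixture()
    except Exception:
        return 'FAIL'                 # a fixture build failure fails the test
    try:
        test()                        # execute the actual test function
    except TestSkipException:
        return 'SKIP'                 # skip() is the only non-failing exception
    except Exception:
        return 'FAIL'
    return 'PASS'                     # no exception: the call returned

print(run_testcase(lambda: None))                 # PASS
print(run_testcase(lambda: skip('not for ARM')))  # SKIP
print(run_testcase(lambda: 1 / 0))                # FAIL
```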
See also