I've come to the opinion that when you write software, you should test it, and usually unit testing suffices. But a game is a special kind of software. So special, in fact, that one naturally faces the question, "Should we still unit test a game?"
To be clear, there are varying degrees and different notions of "testing" a game. We could test the software (make it "bug free", or at least have fewer bugs), test the enjoyability of the game (e.g., make sure it's fun, winnable, etc.), test that the UI behaves as desired, run "integration tests", or do end-to-end testing (some sort of "automated player" searching for specific bugs). I'll discuss a few of these notions.
Testing the Code
If we have adopted a model-view-controller architecture of some flavor, then we can unit test the models. I contend we should unit test the models, and use contracts to enforce the assumptions of the model methods. What would this look like?
We can encode assumptions like: Actor::dies() should demand the actor's health is non-positive (i.e., either zero or negative). This could be encoded with an assertion (a "precondition"). Then we could write a unit test to create an actor, give the actor some amount of health points, cause damage, then try to call actor.dies(). There are three test cases (when actor.health() is positive, zero, and negative). The actor should die in the zero and negative cases, and the precondition should reject the call in the positive case.
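As a rough sketch of those three cases, here is one way it could look in Python with the standard unittest module. The Actor class, take_damage(), and is_alive() are assumptions made up for illustration, and the precondition is written as an explicit raise (rather than a bare assert statement) so it still fires when Python runs with assertions disabled.

```python
import unittest


class Actor:
    """Hypothetical minimal model; names mirror the Actor::dies() example."""

    def __init__(self, health):
        self._health = health
        self._alive = True

    def health(self):
        return self._health

    def is_alive(self):
        return self._alive

    def take_damage(self, amount):
        self._health -= amount

    def dies(self):
        # Precondition (the "contract"): only an actor with no health left may die.
        if self._health > 0:
            raise AssertionError("precondition violated: health must be <= 0")
        self._alive = False


class ActorDiesTest(unittest.TestCase):
    def test_dies_with_positive_health_violates_precondition(self):
        actor = Actor(health=10)
        actor.take_damage(3)  # health is now 7
        with self.assertRaises(AssertionError):
            actor.dies()
        self.assertTrue(actor.is_alive())

    def test_dies_with_zero_health_kills_actor(self):
        actor = Actor(health=10)
        actor.take_damage(10)  # health is now 0
        actor.dies()
        self.assertFalse(actor.is_alive())

    def test_dies_with_negative_health_kills_actor(self):
        actor = Actor(health=10)
        actor.take_damage(15)  # health is now -5
        actor.dies()
        self.assertFalse(actor.is_alive())
```

Saved to a file, this can be run with "python -m unittest".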
Organizing Unit Tests
For object-oriented languages, I'm inclined to follow some kind of xUnit testing framework such as JUnit: each class we write (say class Thinger) should have some corresponding test class (e.g., class ThingerTest) where each method of the class is tested several times. So Object Thinger::methodOne() should have several corresponding methods like void ThingerTest::methodOneShouldDoXTest(), and so on.
For Lisp, there's usually a framework given (Clojure has clojure.test, Common Lisp has several frameworks, etc.). The organization is analogous to that for object-oriented languages (functions in module.lisp like (defun thinger-method (...)) should have several test cases handled in module_tests.lisp).
Terminology: JUnit organizes test cases as methods using assertions on a Test class, and these are organized into test suites (analogous to how files are organized into directories). A test runner then iterates through the test suites and executes each test case, recording results (both successes and failures) for later use. The exact terminology varies (some xUnit systems, e.g. in Smalltalk, have test case classes), but the intuition remains the same: test cases organized into test suites, and a test runner that executes the test cases and records the results.
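To make the three terms concrete, here is a small sketch using Python's unittest module, which follows the same xUnit pattern. The clamp() function, build_suite(), and run_suite() are hypothetical names chosen only for this example.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    # Each method below is one test case.
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)


def build_suite():
    # A test suite groups test cases, much as a directory groups files.
    suite = unittest.TestSuite()
    suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest))
    return suite


def run_suite():
    # The test runner executes every test case and records the results.
    runner = unittest.TextTestRunner(verbosity=2)
    return runner.run(build_suite())
```

The object returned by run_suite() keeps the recorded results (how many tests ran, which ones failed), which is the "for later use" part.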
We organize code by modules, which contain classes, which contain methods. These terms are used loosely: C programmers lack a module system and use structs instead of classes and functions instead of methods; Haskell programmers use modules, data types, and functions; etc. Whatever the terminology, we have some kind of ersatz class and ersatz method. Each "class" should have a corresponding test suite, and each method should have several tests. Depending on how we organize classes, we should have a corresponding organization of tests: if we have one file per class, we should have an analogous file per test suite. The motivation for this scheme is to make it obvious where to place tests for code (all the tests for /game/src/models/my_module.code are placed in /game/tests/models/my_module_tests.code, for example). Being consistent matters more than which scheme you choose.
What to Unit Test
Test only code you have written. There's no need to write tests for third-party libraries. We trust that ncurses, the GNU Scientific Library, and so on work as expected. It's only the code we wrote that we want to test.
We should unit test all public functions. While aiming for 100% coverage is ideal, if we get to "a lot" (I dunno, say, ~90% or whatever), we can say it's "good enough". The rationale here is that public functions are used to build our game, so if we have tested them thoroughly enough, then we can have greater confidence in their correctness (they do what we think they should).
Test all code paths. Has each statement in the method been tested? Has each edge in the control-flow graph been tested? Has each branch of every if-else statement been tested? (Or every case in a switch statement?) Has every boolean subexpression of each conditional been tested? Complicated conditional tests could be refactored into predicate functions, which can be independently tested.
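Here is a sketch of that last suggestion in Python. The can_attack() predicate and its parameters are invented for illustration. The idea is that once the compound condition lives in its own function, each boolean subexpression can be flipped on its own.

```python
import unittest


def can_attack(attacker_alive, target_alive, distance, weapon_range, has_ammo):
    """Hypothetical predicate pulled out of a larger combat routine."""
    return attacker_alive and target_alive and has_ammo and distance <= weapon_range


class CanAttackTest(unittest.TestCase):
    # Baseline arguments for which every subexpression is true.
    OK = dict(attacker_alive=True, target_alive=True,
              distance=3, weapon_range=5, has_ammo=True)

    def test_all_conditions_met(self):
        self.assertTrue(can_attack(**self.OK))

    def test_each_failing_condition_blocks_the_attack(self):
        # Flip one subexpression at a time so each is exercised on its own.
        failing = {
            "attacker_alive": False,
            "target_alive": False,
            "has_ammo": False,
            "distance": 6,  # just out of range
        }
        for name, bad_value in failing.items():
            with self.subTest(condition=name):
                args = dict(self.OK, **{name: bad_value})
                self.assertFalse(can_attack(**args))

    def test_target_exactly_at_maximum_range_is_allowed(self):
        # Boundary case for the <= comparison.
        self.assertTrue(can_attack(**dict(self.OK, distance=5)))
```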
Each function should do one thing, and unit tests make sure the function does what we expect/hope. For example, the Actor::dies() method does one thing; once it has been called successfully, Actor::isAlive() should return false. This gives us two cases to consider: one where Actor::dies() fails (e.g., when Actor::health() is positive), the other where Actor::dies() succeeds. In the former case, Actor::isAlive() should return the same result as Actor::health() > 0; in the latter, we should have Actor::isAlive() == false.
Tests should be atomic. Each test case will test exactly one thing. If a test case is testing more than one thing, we should refactor it into multiple test cases.
Tests should be independent of each other. They should not rely on each other (in the sense that they don't call each other, and the outcome of one has no bearing on another).
Tests should be readable. Think of them as not just testing the behavior of the function, but also as an example of how to use the function. This suggests naming each test after the behavior it demonstrates (e.g., ActorTest::diesShouldNotBeAliveTest() or ActorTest::healthy_should_not_be_dead_test(), etc.).
Tests should be repeatable/deterministic. We should test the mechanical parts of the game (e.g., marking an Actor with zero health as "dead") where the same inputs produce the same outputs. If we are testing randomness ("rolling a die"), we should have some way to "mock out" that randomness with something deterministic ("load the die", "use a two-headed coin", etc.) to make sure the methods do what we expect.
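One way to "load the die" is to pass the random source in as a parameter, so a test can swap in a stand-in. The sketch below assumes a hypothetical roll_damage() function and LoadedDie class. random.Random and its randint() method come from Python's standard library.

```python
import random
import unittest


def roll_damage(rng, sides=6, bonus=0):
    """Hypothetical damage roll; the random source is passed in so tests can replace it."""
    return rng.randint(1, sides) + bonus


class LoadedDie:
    """Test double that always 'rolls' the same face."""

    def __init__(self, face):
        self.face = face

    def randint(self, low, high):
        # Ignores the requested range on purpose: the die is loaded.
        return self.face


class RollDamageTest(unittest.TestCase):
    def test_loaded_die_gives_predictable_damage(self):
        self.assertEqual(roll_damage(LoadedDie(4), bonus=2), 6)

    def test_seeded_generator_is_repeatable(self):
        # Two generators with the same seed produce the same first roll.
        first = roll_damage(random.Random(1234))
        second = roll_damage(random.Random(1234))
        self.assertEqual(first, second)
        self.assertIn(first, range(1, 7))
```

In the game itself, the caller would pass a real random.Random() instance. Only the tests use the loaded die or a fixed seed.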
Tests should be fast. Since each test case tests exactly one thing, we should make them small and fast.
Tests should be automated and tracked. We should be able to run the tests with a single command (e.g., "make test" or whatever), and we should include the test code in our git repository. Best practices suggest running the tests and making sure they pass before pushing code out to the repository's master branch ("don't break master").
Testing the Game
"Testing the game" has several distinct meanings: make sure the game is playable, make sure the game is fun, etc. In some sense, unit testing is like checking to make sure each square of the board is flat: but if we glue the edges badly, we could actually end up with a curved board. Unit testing checks locally each function does what we hope, but it doesn't check the game does what we hope. This motivates integration testing and end-to-end testing.
Integration testing can be useful. If we want to make sure dialog options trigger quests and complete quests, we need integration testing. This amounts to setting up a mock game, simulating dialog, then checking the game state matches what we expect. Since multiple "units" are being tested in conjunction (dialog, quests, etc.), we're really testing that they're "integrated" correctly. For people trying to create an old-school Fallout or Wasteland-type game, this can be very useful.
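A minimal sketch of such an integration test in Python follows. The QuestLog and Dialog classes, the option strings, and the quest name are all hypothetical stand-ins for whatever the real game uses. What matters is that the test drives the dialog unit and then inspects the quest unit.

```python
import unittest


class QuestLog:
    """Hypothetical quest tracker: quest name -> "active" or "done"."""

    def __init__(self):
        self.status = {}

    def start(self, quest):
        self.status[quest] = "active"

    def complete(self, quest):
        if self.status.get(quest) != "active":
            raise ValueError(f"quest {quest!r} is not active")
        self.status[quest] = "done"


class Dialog:
    """Hypothetical dialog: each option maps to an (action, quest) effect."""

    def __init__(self, quest_log, effects):
        self.quest_log = quest_log
        self.effects = effects

    def choose(self, option):
        action, quest = self.effects[option]
        if action == "start":
            self.quest_log.start(quest)
        elif action == "complete":
            self.quest_log.complete(quest)
        else:
            raise ValueError(f"unknown action {action!r}")


class DialogQuestIntegrationTest(unittest.TestCase):
    def setUp(self):
        # The "mock game": one quest log wired to one conversation.
        self.log = QuestLog()
        self.dialog = Dialog(self.log, {
            "I'll find your sword.": ("start", "lost_sword"),
            "Here is your sword.": ("complete", "lost_sword"),
        })

    def test_accepting_in_dialog_starts_the_quest(self):
        self.dialog.choose("I'll find your sword.")
        self.assertEqual(self.log.status["lost_sword"], "active")

    def test_reporting_back_completes_the_quest(self):
        self.dialog.choose("I'll find your sword.")
        self.dialog.choose("Here is your sword.")
        self.assertEqual(self.log.status["lost_sword"], "done")

    def test_cannot_complete_a_quest_never_started(self):
        with self.assertRaises(ValueError):
            self.dialog.choose("Here is your sword.")
```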
Testing from the Player's Perspective
I would suggest testing the game from the user's perspective. This is harder to do, and varies depending on the type of game you're making. In my mind, I assume you are programming something like a Baldur's Gate, Fallout, or Wasteland-type game: a mixture of quests, dialog, combat, and possibly more.
What do we hope to test? We'd like the game to be playable (quests "fit together" in a way that the player can get to the end of the game), but we'd also like the game to be fun. This is where design decisions are needed: how do we specify a game to be "fun"? Is there sufficient choice architecture?
Test the game is playable (quests sequence properly). For example, suppose quest A must be completed before quest B, and quest A requires killing actor X, but actor X is the one who issues quest B once quest A is done: quest B can never be triggered. (More concretely: if the king gives us a quest after completing his minister's quest, and the minister asks us to assassinate the king, then there had better be some mechanism for us to continue after killing the king...like the minister takes over. Otherwise, if we are waiting on the deceased king to give us our next quest, we'll be stuck.)
If we have a domain specific language for quests, actors, items (and if we store these in .info files), then we could have a simple helper program which runs through the quests, makes certain the quest items are fetchable, the quest-issuing actors are alive, and there are chains of dialog/quests which start with a specified initial quest and end at the specified final quest.
The helper script should record sequences of quests which are unwinnable, or when there are disconnected components (e.g., killing actor X early in the game prevents quests B, C, and D). At the end, it will print out to the screen a summary (along the lines of "M paths succeed, N paths tried with K character builds") and a more verbose explanation to a file ("Character build C played the quest chain Q1, ..., Qm then got stuck at quest X") possibly with the trajectory of events for reconstruction. We could automate this script to try all variations of skills and stats, too.
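As a very rough sketch of the core of such a helper, here is a depth-first search over quest orderings in Python. Everything in it is an assumption for illustration: the Quest fields (giver, requires, kills), the function name find_dead_ends(), and the two tiny quest lists modeled on the king/minister example. It ignores items, character builds, and the .info format entirely, and it enumerates every ordering, so it only suits small quest sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quest:
    name: str
    giver: str                        # actor who must be alive to issue the quest
    requires: frozenset = frozenset() # quests that must already be done
    kills: frozenset = frozenset()    # actors dead once this quest is done


def find_dead_ends(quests, final_quest):
    """Depth-first search over quest orderings.

    Returns (winning, stuck): lists of quest-name tuples, in play order, that
    reach the final quest or run out of available quests before reaching it.
    """
    winning, stuck = [], []

    def visit(path, done, dead):
        if final_quest in done:
            winning.append(path)
            return
        available = [q for q in quests
                     if q.name not in done
                     and q.requires <= done
                     and q.giver not in dead]
        if not available:
            stuck.append(path)
            return
        for q in available:
            visit(path + (q.name,), done | {q.name}, dead | q.kills)

    visit((), frozenset(), frozenset())
    return winning, stuck


# The minister's quest kills the king, who was supposed to issue the next quest.
broken = [
    Quest("ministers_plot", giver="minister", kills=frozenset({"king"})),
    Quest("kings_errand", giver="king", requires=frozenset({"ministers_plot"})),
]
# One possible fix: the minister takes over and issues the follow-up quest.
fixed = [
    Quest("ministers_plot", giver="minister", kills=frozenset({"king"})),
    Quest("kings_errand", giver="minister", requires=frozenset({"ministers_plot"})),
]
```

With these definitions, find_dead_ends(broken, "kings_errand") reports no winning path and one dead end after "ministers_plot", while the fixed list yields a single winning path and no dead ends.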
I've discussed this idea in passing a few times, and I'll probably make it the subject of a future post...maybe have a minimal working example for people to play with, we'll see. It'll involve a variation of depth-first search along a few distinct play-styles...we'll see, friends, we'll see.
Test you aren't a jerk to the player. Suppose our game has factions and the player has a reputation (loved, liked, undecided, disliked, hated). If our game penalizes the player's reputation when the player kills a member of the faction, then we should beware of the situation when the player witnesses the death of a faction's member: will this tarnish the player's reputation or not? This is a prime candidate for tucking away into a unit test, for regression testing.
If this is part of the plot (the player, witnessing a murder, is then falsely charged with the murder), then it should be written into the game manually. The last thing a player wants is to find the police are after them for...apparently doing nothing. That may be amusing to the programmer (I certainly chuckled), but it's no fun to the player.
There are other similar cases which, when programming, do not immediately sound consequential. But for the player, it feels like the game is designed by a vindictive jerk. It may not be easy to discern when this happens, but once discovered we should try to create unit tests to ensure we aren't jerks.
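The witnessed-death case might be pinned down with a regression test along these lines. The Player class, the on_member_death() hook, and the one-step penalty are all assumptions for the sake of the sketch. Only the five reputation levels come from the example above.

```python
import unittest

# Ordered from worst to best, using the five levels named earlier.
LEVELS = ["hated", "disliked", "undecided", "liked", "loved"]


class Player:
    def __init__(self):
        self.standing = {}  # faction name -> index into LEVELS

    def reputation_with(self, faction):
        return LEVELS[self.standing.get(faction, LEVELS.index("undecided"))]


def on_member_death(player, faction, killer, witnesses=()):
    """Hypothetical event hook, called whenever a faction member dies."""
    # Only the killer is penalized; merely witnessing a death costs nothing.
    if killer is player:
        current = player.standing.get(faction, LEVELS.index("undecided"))
        player.standing[faction] = max(0, current - 1)


class WitnessReputationTest(unittest.TestCase):
    def test_killing_a_member_lowers_reputation(self):
        player = Player()
        on_member_death(player, "guards", killer=player)
        self.assertEqual(player.reputation_with("guards"), "disliked")

    def test_witnessing_a_death_leaves_reputation_alone(self):
        player = Player()
        bandit = object()  # stand-in for some other actor
        on_member_death(player, "guards", killer=bandit, witnesses=[player])
        self.assertEqual(player.reputation_with("guards"), "undecided")
```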
Heuristic "Tests"
These are measurements of symptoms which boring games exhibit. Alas, there's no way to automate the underlying "boring-ness" away.
Test the game has choice and consequences. Does dialog change to reflect the player's actions? Are new quests opened up specific to the player's choices? Do new interactions [dialog, NPC encounters, quests offered] occur after the player chooses particular outcomes?
Is it possible to have a playthrough where the player kills everyone before talking to them? This forces us to design the quests with constraints that force the game to have consequences and the player to have freedom. Chris Sawyer noted this design decision in an interview with IGN as key to game reactivity and player choice. We can enforce this check with a particular playthrough in our helper script.
Test the game takes you to all the locations. What locations are visited by the play-throughs? We could dump the trajectory of locations visited for further analysis. Sometimes a location is visited more frequently than intended, other times a location is never visited. Either is a symptom of a possibly less fun game. We can't enforce this automatically, but the report is an aid for revising the quest design.
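Assuming the play-throughs can dump their trajectories as plain lists of location names, the analysis could start as simply as the sketch below. The function name location_report() and the sample data are hypothetical.

```python
from collections import Counter


def location_report(all_locations, trajectories):
    """Count visits per location across play-throughs and list unvisited ones."""
    visits = Counter(place for path in trajectories for place in path)
    never_visited = sorted(set(all_locations) - set(visits))
    return visits, never_visited


# Hypothetical data: each trajectory is the ordered list of places one
# simulated play-through went to.
locations = ["village", "forest", "castle", "crypt"]
trajectories = [
    ["village", "forest", "castle"],
    ["village", "castle", "village", "castle"],
]
visits, never_visited = location_report(locations, trajectories)
```

Here the crypt is never visited, and the village and castle are each visited three times against the forest's one, which is the kind of imbalance worth a second look.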
In general, test for symptoms of fun. Depending on what your game is trying to accomplish, the criteria for "having fun" vary. Each criterion has different symptoms, and we should figure out how to automate ways to check these symptoms are present in our game. On the flip side, there are certainly symptoms of anti-fun: fun-killing elements we want to avoid. We should also automate ways to check these anti-fun elements are not present in our game.
In some sense, this is the best we can do with automated testing: test for proxies of what we want, and regression-checks against what we dislike. There's no way to properly "test for fun", but we can test the game can be played in different playstyles and for the "kill everyone before you even interact with them" heuristic Chris Sawyer noted.
Concluding Remarks
Game developers tend not to test their games, at least not in the same way that software engineers test their programs. Unit testing is generally discouraged among game developers, for good reason (having unit tests gives a false sense of being "correct", whereas games seek fun, not correctness).
But we can test for an RPG "being playable". We can further make such testing automated. Insofar as we can make such testing scripts, I think we should...at least, I should. Such automated testing checks the quests are ordered correctly and unlockable, speakers are referenced properly, and so on. Again, this doesn't test gameplay, but it tests the game can be played.
As for what this looks like, I'm working on a minimal RPG I've decided to refer to as "project Delaware". (Why Delaware? Nothing special: I've just opted to use the names of states in the U.S. by order of admission to the union. And Delaware is the first state admitted to the union.) I hope to have something to share soon-ish.