Wednesday, October 28, 2020

Programming Exercise: Dice

A short exercise for the reader: come up with a domain specific language for rolling dice. This is a recurring mechanic in Rogue-likes.

Exercise 1. The conventional notation in Dungeons and Dragons is to write "mdn + k" (where "+ k" is optional) to roll m separate n-sided dice, sum them together, then add k to the result and return it. Write a function to accept a string of this form.

What happens when m is not a positive integer? When n is not a positive integer?
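
One possible answer to Exercise 1, as a minimal sketch rather than a definitive solution. The names parse_dice and roll, the regular expression, and the decision to reject non-positive m or n with a ValueError are all my own assumptions; returning zero or clamping would be equally defensible answers to the question above.

```python
import random
import re

# Assumed grammar: "<m>d<n>" optionally followed by "+ <k>"; spaces are ignored.
DICE_PATTERN = re.compile(r"\s*(\d+)\s*d\s*(\d+)\s*(?:\+\s*(\d+))?\s*")


def parse_dice(notation):
    """Return (m, n, k) for a string like "3d6 + 2"; k defaults to 0."""
    match = DICE_PATTERN.fullmatch(notation)
    if match is None:
        raise ValueError(f"not dice notation: {notation!r}")
    m, n = int(match.group(1)), int(match.group(2))
    k = int(match.group(3)) if match.group(3) else 0
    if m < 1 or n < 1:
        # Assumption: a roll of zero dice, or of zero-sided dice, is an error.
        raise ValueError("m and n must be positive integers")
    return m, n, k


def roll(notation, rng=random):
    """Roll m separate n-sided dice, sum them, then add k."""
    m, n, k = parse_dice(notation)
    return sum(rng.randint(1, n) for _ in range(m)) + k
```

Passing the random source in as rng (anything with a randint method) is a design choice that makes the function easy to test with a scripted die.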

Exercise 2. In Dungeons and Dragons, when creating a character, you roll 4d6 but take the highest 3 values. How would you extend the domain language to include this situation?
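
One way to approach Exercise 2, sketched under a loud assumption: the "k" suffix (as in "4d6k3", meaning "roll 4d6, keep the highest 3") is a hypothetical notation I picked for illustration, not something taken from Rogue or Troll. The function name roll_keep_highest is likewise made up.

```python
import random
import re

# Hypothetical extension: "<m>d<n>k<j>" keeps the j highest of the m dice.
KEEP_PATTERN = re.compile(r"\s*(\d+)\s*d\s*(\d+)\s*k\s*(\d+)\s*")


def roll_keep_highest(notation, rng=random):
    """Roll m n-sided dice and return the sum of the j highest values."""
    match = KEEP_PATTERN.fullmatch(notation)
    if match is None:
        raise ValueError(f"not keep-highest notation: {notation!r}")
    m, n, keep = (int(group) for group in match.groups())
    if m < 1 or n < 1 or not 1 <= keep <= m:
        raise ValueError("need m >= 1, n >= 1, and 1 <= j <= m")
    # Sort the individual rolls from highest to lowest, then keep the first j.
    rolls = sorted((rng.randint(1, n) for _ in range(m)), reverse=True)
    return sum(rolls[:keep])
```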

These exercises come from the original Rogue-likes, which had a function that rolled dice based on a domain specific language; this idea was picked up by Troll.

Thursday, October 8, 2020

One scheme to world-building

What makes this game fun? That's the question I want to ask when I've finished a game (or as I'm playing it). I recently re-played Fallout 2, and I noticed there are a lot of small quests which piece together to form a tapestry. Individually, the elements are not complex (deliver this item to so-and-so, talk to [person] about [subject]), but they compose together like lego blocks.

This post will discuss one approach to world-building, which is necessary for coming up with quests and a game-plot. This is more a "review article" than anything innovative on my part.

Levine's "Stars" and "Passions"

Ken Levine has a great talk on "Narrative Legos". Basically, each location (village, fort, etc.) has around a half-dozen named characters ("Stars"), where each character has three "Passions". A "Passion" is what a character cares about relative to what the player can impact upon; it is transparent to the character, and responsive to the player's action. Effectively a "Passion" is a "bank account" for a Star (or a number between -10 and +10, starting at 0); helping a Star with their Passion results in that Star investing positive points into their "account", thereby improving the Star's opinion of the player.

When a Star's opinion of the player (formed by combining the three Passion scores together somehow, e.g., adding them together, taking their geometric or harmonic mean, or whatever) reaches a certain high point, it unlocks certain bonuses. Blacksmiths offer additional bonus gear, Clerics offer additional services, etc. Conversely, if a Star's opinion diminishes, services cost more or are refused outright.
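
To make the bookkeeping concrete, here is a minimal sketch of Stars and Passions as data. Only the -10 to +10 range starting at 0 comes from the description above; the class names, the choice of a plain sum for the opinion, and the thresholds of +15 and -15 are assumptions of mine for illustration.

```python
from dataclasses import dataclass

PASSION_MIN, PASSION_MAX = -10, 10
BONUS_AT, REFUSE_AT = 15, -15  # hypothetical thresholds


@dataclass
class Passion:
    name: str
    score: int = 0

    def adjust(self, delta):
        """Deposit into (or withdraw from) the account, clamped to the range."""
        self.score = max(PASSION_MIN, min(PASSION_MAX, self.score + delta))


class Star:
    def __init__(self, name, passion_names):
        self.name = name
        self.passions = {p: Passion(p) for p in passion_names}

    def help_with(self, passion_name, points):
        self.passions[passion_name].adjust(points)

    def opinion(self):
        # Assumption: combine the scores by adding them together.
        return sum(p.score for p in self.passions.values())


def service_level(star):
    """Map a Star's opinion onto what the player is offered."""
    opinion = star.opinion()
    if opinion >= BONUS_AT:
        return "bonus"
    if opinion <= REFUSE_AT:
        return "refused"
    return "normal"
```

Two Stars whose Passions conflict can then be modeled by calling help_with with positive points on one and negative points on the other for the same player action.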

But two characters could have conflicting passions. This is a goal for writing a plot, because conflicting Passions for Stars means the player is thrust into a zero-sum game...which is fun for the player, and encourages the author to say, "Yes, and I can work this into another story arc!"

The number of characters should be around 5 or 6 to avoid over-burdening the player, and the number of passions should be 3 to avoid accidentally conflicting with too many other character passions (rendering the game unplayable accidentally).

There's a lot to digest here, and how we implement it varies considerably. If we formalize a "passion" using a data structure (a few counters) and functions, we could [should] test that the rewards and punishments are triggered properly upon shifting an NPC's opinion of the player. But this is just the observer pattern.
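
A rough sketch of what "just the observer pattern" could look like here, assuming callbacks that receive the old and new opinion values. The class Opinion, the helper unlock_when_reaching, and the reward string are hypothetical names, not anything from Levine's talk.

```python
class Opinion:
    """A Star's opinion of the player; observers are told about every change."""

    def __init__(self):
        self.value = 0
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def change(self, delta):
        old = self.value
        self.value = old + delta
        for callback in self._observers:
            callback(old, self.value)


def unlock_when_reaching(threshold, unlocked, reward):
    """Build an observer that records the reward once, on crossing upward."""
    def observer(old, new):
        if old < threshold <= new:
            unlocked.append(reward)
    return observer


def check_reward_triggers_once():
    """The kind of test suggested above: the reward fires exactly once."""
    unlocked = []
    opinion = Opinion()
    opinion.subscribe(unlock_when_reaching(5, unlocked, "bonus gear"))
    opinion.change(3)   # 3: below the threshold, nothing unlocked
    assert unlocked == []
    opinion.change(3)   # 6: crosses the threshold
    opinion.change(1)   # 7: already above it, no second unlock
    assert unlocked == ["bonus gear"]
```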

Exercise. Look at Fallout 1 and Fallout 2 (or any RPG you enjoy). For each town, write down who are the Stars and what are their Passions.

Begin with a Map and Needs

I've been playing a lot of post-apocalyptic games recently. I found it helps to begin with a map and ask myself, "How does society reproduce itself? Who produces what goods, and how are those exchanged among the cities producing them?" Looking at a real-world map, I can pick out a few cities that survived the apocalypse, think about trade routes, goods produced [at least food, water, armor, weapons, clothing], and this organizes and frames the thought-process about factions, their aims and beliefs.

These questions lead to more interesting power dynamics: multiple factions within a city disagree about how to distribute the goods, who to trade with, what to prioritize. How far are these factions willing to go to enact their policies?

A town that produces nothing but coordinates trade (like the Hub in Fallout) has its own unique concerns. Just to rattle off a few:

  • If a single partner (call it McGuffinsburg) is its sole source of McGuffins, then the internal politics of McGuffinsburg is a concern of the trading hub.
  • Raiders are a perpetual problem.
  • Power dynamics among competing traders; when there's little or no law, it gets quite cut-throat.
  • Arms dealers in particular encourage secret agents to incite war between two neighboring powers, to profit from selling arms to both sides.
  • Any of these could be reversed (McGuffinsburg exerting power over the trade hub, raiders as sympathetic figures, lawkeepers trying to maintain order, etc.)
  • Any two or more of these could be combined.

One bit of advice: just as each Star has 3 Passions (for the sake of forgiving the player for acting unfavorably against one Passion without alienating the Star, thereby cutting off a potential quest-giver), we should take care to have several sources for each commodity. This is a rule-of-thumb, not a Law: sometimes, it's fun to have a single provider for a good, which causes tension and drives the plot ("We need to finish the [apparatus] to provide [goods] to save the world").

Note: although I have been thinking about a game in a post-apocalyptic setting, there's nothing preventing us from applying it to any setting. It's just a little harder if we have no map. (Post-apocalyptic settings let me be lazy, and use existing maps.)

Recap. So far, we have considered using the economy as a way to organize towns and factions. Each town produces some goods, and towns need to trade with each other to survive. This leads to identifying factions within towns, tensions between towns, and problems to be sorted out. It also leads naturally to using Levine's "Stars" and "Passions" to further refine the game. Both provide natural motivations for quests.

The map gives us a way to visually organize factions, consider how society sustains itself, and trace the relationships between different towns and factions. Such considerations naturally give rise to Stars and their Passions. Altogether we have not formed a plot, but the fertile grounds for a plot as carried out through quests.

Thus the basic process of world-building naturally gives us ideas for quests and plot-lines. The "bottom up" approach with Levine's Stars and Passions combines well with the "top down" approach of drawing a map, determining the villages and towns, coming up with economies, inserting Stars and their Passions within each town, generating factions, and so on.

How real is the economy?

We need to decide how much detail the economy needs. This can serve a variety of purposes: just fluff, determine the actual value of goods, or as grounds for certain quests. Let's consider this last point in particular.

As for modeling the value of commodities, textbook economics has trouble doing this even in the real world. As far as world building cares: value matters for trade routes, and for player bartering with merchants. For trade routes, we only need them to feel "about right" (e.g., goods expensive in one town are imported from a town where those goods are cheap, don't import goods when "domestically produced" versions are cheaper than the imported version, etc.).
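
The "about right" rule for trade routes can be turned into a small sanity check over the world data. This is only a sketch: the function name suspicious_imports and the shapes of the prices and routes tables are assumptions, and transport costs are ignored entirely.

```python
def suspicious_imports(prices, routes):
    """Return routes where the good is not cheaper at the exporting town.

    prices: {town: {good: price}}
    routes: list of (exporter, importer, good) tuples
    """
    problems = []
    for exporter, importer, good in routes:
        # A route only "feels right" if the good is cheaper where it comes from.
        if prices[exporter][good] >= prices[importer][good]:
            problems.append((exporter, importer, good))
    return problems
```

Running this over every route on the map gives a list of trade links worth a second look while revising the lore.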

Suggestion: Update the world map to reflect trade; specifically, roads and ports are built (and improved) over time to better facilitate trade between partners. The quality of roads and ports reflects the trade power between partners.

For the player, accuracy can conflict with fun. When this happens, always side with fun (unless we want to create a constraint for quest lines, e.g., iron shortage causes more expensive equipment...motivating the player to, y'know, fix that shortage). The real concern is that the player has access to equipment matching the challenge. We don't want to give the player overpowered gear too early, nor do we want to force the player to have access only to mediocre equipment.

Economy for lore, well, this provides the grounds for quests. Town A wants to open trade with town B since B produces silk cloth. This isn't reflected in the goods sold in either town, but provides a new quest-line. Alternatively, if B were instead the sole source of iron, then town A could supply only, say, bronze equipment. This combines lore and player experience (which is good and desired: the player should experience consequences of their choices).

In short, put yourself in the world, and ask yourself, "What goods would I have access to? How would I get food? Water? Who produces them? How would I get them? What about luxury goods? Or equipment?" Answering these questions requires us to consider the towns further, and increases the realism. It's not enough to have farms randomly placed around towns (Fallout 4 tried to do that): we must also consider the infrastructure, the shortages, needs, scarcities and abundances. This leads us to consider how towns interact, how factions within a town interact, and helps us build a world.

Generating History and Culture

History, Myth, State formation. If we consider how these Stars and villages interact, we can generate a history from conflict. Items of our Stars become revered artifacts. Music and art memorialize these events. Myths emerge from misunderstanding or deliberate lies. Rivalries build up, grudges between Stars and factions emerge over time. Villages band together, forming quasi-states, which dissolve under stress and strain.

Governments. It's worth noting that we could borrow liberally from history. For example, the Polish–Lithuanian Commonwealth had a unique form of government that is seldom discussed...it could easily generate problems to be remedied for a plot-line. The duchy of Venice picked its leader through a lottery (well, a lottery picked an electorate, who then picked another electorate, and so on; the convoluted process of lotteries and indirect elections resulted in a new duke).

Religion. I don't have much to comment on here. Post-apocalyptic games tend to seldom discuss religion, and the fantasy games I play have similar pantheons. One thing worth considering, the original Rogue had "daemons" responsible for updating the health, etc., for players. I don't think anyone has made these daemons the Pantheon for the game, which could lead to interesting gameplay: praying to daemons leads to temporary buffs, destroying temples related to a daemon results in temporary negative buffs, etc. Or it could have the same effect as praying to the laws of physics (i.e., nothing noticeable).

Communication. And the most underappreciated point of consideration: it takes time to communicate these events. When an event occurs, news of it spreads through caravans and travelers. Spreading information takes time, and plans revolve around information. This is a challenge to code up, because we no longer have a simple observer pattern to update dialog and quest lines.
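
One way to replace the simple observer pattern is to compute, for each event, the day on which news of it reaches each town, and let dialog check against that. The sketch below assumes news travels along caravan routes by the fastest path (Dijkstra's algorithm); the function name, the travel_days table, and the town names in the usage note are all made up.

```python
import heapq


def news_arrival_times(travel_days, origin):
    """Return {town: days after the event at which the news arrives}.

    travel_days: {town: {neighbor: days of travel}}
    """
    arrival = {origin: 0}
    frontier = [(0, origin)]
    while frontier:
        day, town = heapq.heappop(frontier)
        if day > arrival.get(town, float("inf")):
            continue  # stale entry: a faster route was already found
        for neighbor, days in travel_days.get(town, {}).items():
            candidate = day + days
            if candidate < arrival.get(neighbor, float("inf")):
                arrival[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return arrival


def has_heard(arrival, town, days_since_event):
    """True once enough days have passed for the news to reach the town."""
    return town in arrival and days_since_event >= arrival[town]
```

For instance, with routes Mine to Farmtown (4 days), Farmtown to Hub (3 days), and Hub to Port (5 days), an event at the Mine would reach the Port 12 days later, so an NPC in the Port should not comment on it before then.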

Friday, October 2, 2020

Testing the Game

I've come to the opinion that, when writing software, you should test it...usually unit testing suffices. But a game is a special kind of software. So special, one naturally faces the question, "Should we still unit test a game?"

To be clear, there are varying degrees and different notions of "testing" a game. We could test the software (make it "bug free", or at least have fewer bugs), test the enjoyability of the game (e.g., make sure it's fun, winnable, etc.), test that the UI behaves as desired, run "integration tests", or do end-to-end testing (some sort of "automated player" searching for specific bugs). I'll discuss a few of these notions.

Testing the Code

If we have adopted a model-view-controller architecture of some flavor, then we can unit test the models. I contend we should unit test the models, and use contracts to enforce the assumptions of the model methods. What would this look like?

We can encode assumptions like Actor::dies() should demand the actor's health is non-positive (i.e., either zero or negative). This could be encoded with an assertion ("precondition"). Then we could write a unit test to create an actor, give the actor some amount of health points, cause damage, then try to call actor.dies(). There are three test cases (when actor.health() is positive, zero, and negative) which should result in the death of the actor in two cases.
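
The three test cases just described might look as follows in Python's unittest, as a sketch only. The Actor class here is a hypothetical stand-in (with is_alive and take_damage as assumed method names), and the precondition is a plain assert, which Python skips when run with the -O flag.

```python
import unittest


class Actor:
    """Hypothetical minimal model: just health and an alive flag."""

    def __init__(self, health):
        self._health = health
        self._alive = True

    def health(self):
        return self._health

    def is_alive(self):
        return self._alive

    def take_damage(self, amount):
        self._health -= amount

    def dies(self):
        # Precondition (contract): health must be zero or negative.
        assert self._health <= 0, "dies() requires non-positive health"
        self._alive = False


class ActorTest(unittest.TestCase):
    def test_dies_with_positive_health_violates_precondition(self):
        actor = Actor(health=10)
        actor.take_damage(3)
        with self.assertRaises(AssertionError):
            actor.dies()
        self.assertTrue(actor.is_alive())

    def test_dies_with_zero_health_kills_actor(self):
        actor = Actor(health=10)
        actor.take_damage(10)
        actor.dies()
        self.assertFalse(actor.is_alive())

    def test_dies_with_negative_health_kills_actor(self):
        actor = Actor(health=10)
        actor.take_damage(15)
        actor.dies()
        self.assertFalse(actor.is_alive())
```

Saved in a test module, these can be run with "python -m unittest".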

Organizing Unit Tests

For object oriented languages, I'm inclined to follow some kind of xUnit testing framework or JUnit: each class we write (say class Thinger) should have some corresponding test class (e.g., class ThingerTest) where each method of the class is tested several times...so Object Thinger::methodOne() should have several corresponding methods void ThingerTest::methodOneShouldDoXTest(), and so on.

For Lisp, there's usually a framework given (Clojure has clojure.test, Common Lisp has several frameworks, etc.). The organization is analogous to that for object-oriented languages (functions in module.lisp like (defun thinger-method (...)) should have several test cases handled in module_tests.lisp).

Terminology: JUnit organizes test cases as methods using assertions on a Test class, which are organized into test suites (analogous to how files are organized into directories). A test runner then iterates through the test suites and executes each test case, recording results (both successes and failures) for later use. The exact terminology varies (some xUnit systems, e.g. in Smalltalk, have test case classes), but the intuition remains the same: test cases organized into test suites, and a test runner that executes the test cases and records the results.

We organize code by modules, which contain classes, which contain methods. These terms are used loosely: C programmers lack any module system, but use structs instead of classes, and functions instead of methods; Haskell programmers use modules, data types, and functions; etc. Whatever the terminology, we have some kind of ersatz class and ersatz method. Each "class" should have a corresponding test suite, each method should have several tests. Depending on how we organize classes, we should have a corresponding organization of tests: if we have one file per class, we should have an analogous file per test suite. The motivation for this scheme is to make it obvious where to place tests for code (all the tests for /game/src/models/my_module.code are placed in /game/tests/models/my_module_tests.code for example). It's more important to be consistent in whatever scheme you choose.

What to Unit Test

Test only code you have written. There's no need to write tests for third-party libraries. We trust that ncurses, the GNU Scientific Library functions, and so on work as expected. It's only the code we wrote that we want to test.

We should unit test all public functions. While aiming for 100% coverage is ideal, if we get to "a lot" (I dunno, say, ~90% or whatever), we can say it's "good enough". The rationale here is that public functions are used to build our game, so if we have tested them thoroughly enough, then we can have greater confidence in their correctness (they do what we think they should).

Test all code paths. Has each statement in the method been tested? Has each edge in the control-flow graph been tested? Has each branch of every if-else statement been tested? (Or every case in a switch statement been tested?) Has every boolean subexpression of each conditional been tested? Complicated conditional tests could be refactored into predicate functions, which can be independently tested.
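
As an illustration of refactoring a complicated conditional into predicate functions, here is a small sketch. The attack rule, the dictionary-based actors, and every name in it are invented for the example.

```python
def is_alive(actor):
    return actor["health"] > 0


def in_reach(attacker, target):
    # Manhattan distance on a grid, compared against the attacker's reach.
    distance = abs(attacker["x"] - target["x"]) + abs(attacker["y"] - target["y"])
    return distance <= attacker["reach"]


def can_attack(attacker, target):
    """Formerly one long boolean expression; now three testable pieces."""
    return is_alive(attacker) and is_alive(target) and in_reach(attacker, target)
```

Each predicate can now get its own test cases, and the tests for can_attack only need to cover how the pieces combine.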

Each function should do one thing, and unit tests make sure the function does what we expect/hope. For example, the Actor::dies() method does one thing; once it is called, we should have Actor::isAlive() return false. This gives us two cases to consider: one where Actor::dies() fails (e.g., when Actor::health() is positive), the other when Actor::dies() succeeds. The former case should have Actor::isAlive() return the same result as Actor::health() > 0, the latter should have Actor::isAlive() == false.

Tests should be atomic. Each test case will test exactly one thing. If a test case is testing more than one thing, we should refactor it into multiple test cases.

Tests should be independent of each other. They should not rely on each other (in the sense that they don't call each other). A unit test should test exactly one thing.

Tests should be readable. Think of them as not just testing the behavior of the function, but also as an example of how to use the function. This gives us a name for the test (e.g., Actor::diesShouldNotBeAliveTest() or Actor::healthy_should_not_be_dead_test(), etc.).

Tests should be repeatable/deterministic. We should test the mechanical parts of the game (e.g., marking an Actor with zero health as "dead") where the same inputs produce the same outputs. If we are testing randomness ("rolling a die"), we should have some way to "mock out" that randomness with something deterministic ("load the die", "use a two-headed coin", etc.) to make sure the methods do what we expect.
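
A sketch of "loading the die": the game code takes its source of randomness as a parameter, and tests pass in a scripted replacement. LoadedDie and the 2d6 damage rule are hypothetical; the only real API relied on is that Python's random module (and random.Random) provides randint(a, b).

```python
import random


class LoadedDie:
    """Test double with the same randint interface, returning scripted values."""

    def __init__(self, values):
        self._values = list(values)

    def randint(self, low, high):
        value = self._values.pop(0)
        # Guard against scripting a roll the real die could never produce.
        assert low <= value <= high, "scripted roll is out of range"
        return value


def attack_damage(rng=random):
    """Hypothetical rule: damage is the sum of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)
```

In the game, attack_damage() uses the real random module; in a test, attack_damage(LoadedDie([6, 6])) always takes the same path.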

Tests should be fast. Since each test case tests exactly one thing, we should make them small and fast.

Tests should be automated and tracked. We should be able to run the tests with a single command (e.g., "make test" or whatever), and we should include the test code in our git repository. Best practices suggest running the tests and making sure they pass before pushing code out to the repository's master branch ("don't break master").

Testing the Game

"Testing the game" has several distinct meanings: make sure the game is playable, make sure the game is fun, etc. In some sense, unit testing is like checking to make sure each square of the board is flat: but if we glue the edges badly, we could actually end up with a curved board. Unit testing checks locally each function does what we hope, but it doesn't check the game does what we hope. This motivates integration testing and end-to-end testing.

Integration testing can be useful. If we want to make sure dialog options trigger and complete quests, we need integration testing. This amounts to setting up a mock game, simulating dialog, then checking the game state matches what we expect. Since multiple "units" are being tested in conjunction (dialog, quests, etc.), we're really testing that they're "integrated" correctly. For people trying to create an old-school Fallout or Wasteland-type game, this can be very useful.
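
A toy version of such an integration test, assuming a quest log and a dialog object that are wired together. Both classes, the "start"/"complete" effect vocabulary, and the quest and option names are placeholders of mine; a real game would have far richer versions of each.

```python
class QuestLog:
    def __init__(self):
        self.active = set()
        self.completed = set()

    def start(self, quest):
        self.active.add(quest)

    def complete(self, quest):
        self.active.discard(quest)
        self.completed.add(quest)


class Dialog:
    """Maps each dialog option to an effect on the quest log."""

    def __init__(self, quest_log, effects):
        self.quest_log = quest_log
        self.effects = effects  # {option: (action, quest)}

    def choose(self, option):
        action, quest = self.effects[option]
        if action == "start":
            self.quest_log.start(quest)
        elif action == "complete":
            self.quest_log.complete(quest)
        else:
            raise ValueError(f"unknown dialog effect: {action!r}")


def check_dialog_drives_quests():
    """Set up a mock game, simulate dialog, then inspect the game state."""
    log = QuestLog()
    dialog = Dialog(log, {
        "I'll find your cat.": ("start", "lost cat"),
        "Here is your cat.": ("complete", "lost cat"),
    })
    dialog.choose("I'll find your cat.")
    assert log.active == {"lost cat"}
    dialog.choose("Here is your cat.")
    assert log.active == set()
    assert log.completed == {"lost cat"}
```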

Testing from the Player's Perspective

I would suggest testing the game from the user's perspective. This is harder to do, and varies depending on the type of game you're making. In my mind, I assume you are programming something like a Baldur's Gate, Fallout, or Wasteland-type game: a mixture of quests, dialog, combat, and possibly more.

What do we hope to test? We'd like the game to be playable (quests "fit together" in a way that the player can get to the end of the game), but we'd also like the game to be fun. This is where design decisions are needed: how do we specify a game to be "fun"? Is there sufficient choice architecture?

Test the game is playable (quests sequence properly). For example: quest A occurs before quest B, wherein quest A requires killing actor X, but actor X issues quest B after completing quest A, rendering quest B un-triggerable. (More concretely: if the king gives us a quest after completing his minister's quest, and the minister asks us to assassinate the king, then there better be some mechanism for us to continue after killing the king...like the minister takes over. Otherwise, if we are waiting on the deceased king to give us our next quest, we'll be stuck.)

If we have a domain specific language for quests, actors, items (and if we store these in .info files), then we could have a simple helper program which runs through the quests, makes certain the quest items are fetchable, the quest-issuing actors are alive, and there are chains of dialog/quests which start with a specified initial quest and end at the specified final quest.

The helper script should record sequences of quests which are unwinnable, or when there are disconnected components (e.g., killing actor X early in the game prevents quests B, C, and D). At the end, it will print out to the screen a summary (along the lines of "M paths succeed, N paths tried with K character builds") and a more verbose explanation to a file ("Character build C played the quest chain Q1, ..., Qm then got stuck at quest X") possibly with the trajectory of events for reconstruction. We could automate this script to try all variations of skills and stats, too.
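
A very small sketch of such a helper, under heavy assumptions: quests are plain dictionaries with a giver, a set of prerequisite quests, and a set of actors killed on completion, and the search is a naive depth-first walk over every order in which quests could be completed. None of this is a real quest format, and the exhaustive search will not scale to a large game without pruning.

```python
def find_playthroughs(quests, final):
    """Return (winning_paths, stuck_paths) as lists of quest-name lists.

    quests: {name: {"giver": str, "requires": set, "kills": set}}
    """
    winning, stuck = [], []

    def explore(path, done, dead):
        if final in done:
            winning.append(path)
            return
        # A quest is available if it is new, its prerequisites are done,
        # and the actor who issues it is still alive.
        available = [
            name for name, quest in quests.items()
            if name not in done
            and quest["requires"] <= done
            and quest["giver"] not in dead
        ]
        if not available:
            stuck.append(path)
            return
        for name in available:
            explore(path + [name], done | {name}, dead | quests[name]["kills"])

    explore([], frozenset(), frozenset())
    return winning, stuck
```

Feeding it the king-and-minister example (the minister's quest kills the king, and the king is the giver of the next quest) yields no winning paths and one stuck path ending at the minister's quest; making the minister the giver of the follow-up quest turns it into a winning path.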

I've discussed this idea in passing a few times, I'll probably make it the subject of a future post...maybe have a minimal working example for people to play with, we'll see. It'll involve a variation of depth-first search along a few distinct play-styles...we'll see, friends, we'll see.

Test you aren't a jerk to the player. Suppose our game has factions and the player has a reputation (loved, liked, undecided, disliked, hated). If our game penalizes the player's reputation when the player kills a member of the faction, then we should beware of the situation when the player witnesses the death of a faction's member: will this tarnish the player's reputation or not? This is a prime candidate for sticking away into a unit test, for regression testing.
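
Here is what that regression test might look like, as a sketch. The five reputation levels come from the paragraph above; the Reputation class, its on_member_death method, and the rule that a kill drops the player exactly one level are assumptions for illustration.

```python
REPUTATION_LEVELS = ["hated", "disliked", "undecided", "liked", "loved"]


class Reputation:
    """The player's standing with one faction."""

    def __init__(self):
        self._index = REPUTATION_LEVELS.index("undecided")

    @property
    def level(self):
        return REPUTATION_LEVELS[self._index]

    def on_member_death(self, killer, witnesses=()):
        # Only the killer is penalized; witnesses are deliberately ignored.
        if killer == "player":
            self._index = max(0, self._index - 1)


def check_witnessing_a_death_is_not_punished():
    reputation = Reputation()
    reputation.on_member_death(killer="raider", witnesses=("player",))
    assert reputation.level == "undecided"


def check_killing_a_member_is_punished():
    reputation = Reputation()
    reputation.on_member_death(killer="player")
    assert reputation.level == "disliked"
```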

If this is part of the plot (the player, witnessing a murder, is then falsely charged with the murder), then it should be written into the game manually. The last thing a player wants is to find the police are after them for...apparently doing nothing. That may be amusing to the programmer (I certainly chuckled), but it's no fun to the player.

There are other similar cases which, when programming, do not immediately sound consequential. But for the player, it feels like the game is designed by a vindictive jerk. It may not be easy to discern when this happens, but once discovered we should try to create unit tests to ensure we aren't jerks.

Heuristic "Tests"

These are measurements of symptoms which boring games exhibit. Alas, there's no way to automate the underlying "boring-ness" away.

Test the game has choice and consequences. Does dialog change to reflect the player's actions? Are new quests opened up specific to the player's choices? Do new interactions [dialog, NPC encounters, quests offered] occur after the player chooses particular outcomes?

Is it possible to have a playthrough where the player kills everyone before talking to them? This forces us to design the quests with constraints that force the game to have consequences and the player to have freedom. Chris Sawyer noted this design decision in an interview with IGN as key to game reactivity and player choice. We can enforce this check with a particular playthrough in our helper script.

Test the game takes you to all the locations. What locations are visited by the play-throughs? We could dump the trajectory of locations visited for further analysis. Sometimes a location is visited more frequently than intended, other times a location is never visited. This is a symptom of possibly less fun games, which can't be automated to enforce: it's an aid to help revise quest considerations.
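
Dumping and summarizing the trajectories could be as simple as the sketch below, which assumes each play-through is recorded as a list of location names. The function name visit_report and the data shapes are hypothetical.

```python
from collections import Counter


def visit_report(all_locations, trajectories):
    """Count visits per location and list the locations never visited.

    all_locations: iterable of every location name in the game
    trajectories: list of play-throughs, each a list of visited locations
    """
    visits = Counter(
        location for trajectory in trajectories for location in trajectory
    )
    never_visited = sorted(set(all_locations) - set(visits))
    return visits, never_visited
```

The counts only flag candidates for revision; deciding whether an over-visited or unvisited location is actually a problem stays a human judgment.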

In general, test for symptoms of fun. Depending on what your game is trying to accomplish, the criteria for "having fun" vary. Each criterion has different symptoms, and we should figure out how to automate ways to check these symptoms are present in our game. On the flip side, there are certainly symptoms of anti-fun: fun-killing elements we want to avoid. We should also automate ways to check these anti-fun elements are not present in our game.

In some sense, this is the best we can do with automated testing: test for proxies of what we want, and regression-checks against what we dislike. There's no way to properly "test for fun", but we can test the game can be played in different playstyles and for the "kill everyone before you even interact with them" heuristic Chris Sawyer noted.

Concluding Remarks

Game developers tend not to test their games, at least not in the same way that software engineers test their programs. Unit testing is generally discouraged among game developers, for good reason (having unit tests gives a false sense of being "correct", whereas games seek fun not correctness).

But we can test for an RPG "being playable". We can further make such testing automated. Insofar as we can make such testing scripts, I think we should...at least, I should. Such automated testing checks the quests are ordered correctly and unlockable, speakers are referenced properly, and so on. Again, this doesn't test gameplay, but it tests the game can be played.

As for what this looks like, I'm working on a minimal RPG I've decided to refer to as "project Delaware". (Why Delaware? Nothing special: I've just opted to use the names of states in the U.S. by order of admission to the union. And Delaware is the first state admitted to the union.) I hope to have something to share soon-ish.

Bread crumbs and notes on designing a roguelike from scratch.
