When you need to generate data for tests

Pseudo-random data generation for tests

I often say that in tests we break rules: we cheat and lie, as long as the code is exercised and the production code ends up safe and bug-free.

Quite often we need to test what happens when we operate on an aggregate, but the aggregate as a whole is not important. We only care about a few values; the rest is needed sometimes for logging, sometimes for validity checks we would rather gloss over.

So in every test we set up fields that we don't really need, because sometimes there is no working default. That is a distraction from the functionality we are actually testing. Later we read the test and wonder:

"The stock is being set to 42... Does it affect my other calculation under test?
I guess I need to check!"

I have been solving this problem for the past few years with randomized tests, or pseudo-randomized tests, to be precise.
Because the pseudo-random values are derived from a known seed, the tests are repeatable.

Let's look at some code

It is random in the sense that we can't fine-tune the generated values; we only know they pass a rough inspection. Then we set the values that matter, with no distractions or boilerplate in the tests.
Check this Spock test snippet:

def "askingPrice works"() {
    given: "some book with price and margin"
      def b = gen.getBook().toBuilder()
              .price(100.0d)
              .margin(0.3d)
              .build()
    expect: "asking price calculation, rounded down is correct"
      b.calculateAskingPrice().trunc() == 131 //will fail
}

Let me break it down line by line:

  1. the test is called "askingPrice works"
  2. the given: label lets us set up what the test will use
  3. variable b holds a randomly generated book from gen.getBook(); toBuilder() lets us create a copy with some differences
  4. price is set to 100.0
  5. margin is set to 0.3, that is, 30%
  6. build() creates the new object
  7. the expect: label tells us that our assertions follow
  8. calculate the asking price, truncate it to an integer and compare it with the expected value

It will fail, and what we get in the report is:

Condition not satisfied:

b.calculateAskingPrice().trunc() == 131
| |                      |       |
| 130.0                  130.0   false
Book(title=Ssbsevmlh Wd Nyughpsvjjlh Hdbkcing, author=Hjmlta Gq \
, release=2021-11-25, price=100.0, margin=0.3, stockQuantity=272)

Here we had no idea what other fields Book holds, but we can clearly see that the values we set are the ones that matter for the test. The book contains gibberish, but it seems usable.

Now let me show the implementation of gen.getBook():

class Generator {
    //...
    Book getBook(Random random = this.random) {
        def maxDate = LocalDate.of(2022, 10, 31)
        def minDate = LocalDate.of(2015, 1, 1)
        Book.builder()
                .title(getTitle(random))
                .author(getAuthor(random))
                .release(getDate(maxDate, minDate, random))
                .price(random.nextDouble(1_000.0))
                .margin(random.nextDouble(1.0))
                .stockQuantity(random.nextInt(1, 400))
                .build()
    }
}

If it seems weird, here are some clarifications:

  • getBook() takes a random parameter that defaults to the instance member
  • some auxiliary methods generate values (getTitle, getAuthor, getDate)
  • the remaining numbers come straight from random, each within fixed bounds
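
The auxiliary methods are not shown here. Below is a minimal sketch of what such helpers could look like, written in plain Java. It assumes gibberish titles of two to four capitalized words and dates drawn uniformly between two bounds. The class GeneratorHelpers and its methods word, title and date are hypothetical names for illustration; the project's real getTitle, getAuthor and getDate may work differently.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Random;

// Hypothetical helpers: names and behaviour are assumptions, not the project's actual code.
final class GeneratorHelpers {

    // A capitalized gibberish word of minLen..maxLen letters.
    static String word(Random random, int minLen, int maxLen) {
        int length = minLen + random.nextInt(maxLen - minLen + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            char c = (char) ('a' + random.nextInt(26));
            sb.append(i == 0 ? Character.toUpperCase(c) : c);
        }
        return sb.toString();
    }

    // A title made of 2 to 4 gibberish words separated by single spaces.
    static String title(Random random) {
        int words = 2 + random.nextInt(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(word(random, 2, 12));
        }
        return sb.toString();
    }

    // A date picked uniformly between minDate and maxDate, both inclusive.
    // Assumes minDate is not after maxDate.
    static LocalDate date(LocalDate minDate, LocalDate maxDate, Random random) {
        long days = ChronoUnit.DAYS.between(minDate, maxDate);
        return minDate.plusDays(random.nextInt((int) days + 1));
    }
}
```

Every helper takes the Random as a parameter instead of creating its own, so all values still flow from a single source of randomness.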

Now, how is the Generator instance created?

    Generator(long seed = new Random().nextLong()) {
        this.seed = seed
        this.random = new Random(seed)
        println("Using Generator seed= ${seed}")
    }

The default value for seed is randomly generated, but it is then used to create the Random object that governs all the other values we use. An important part is printing that seed, so that if a test fails we can retry with the exact same values by hard-coding the seed for that one run.

Java's Random will generate the same sequence of values when given the same seed.
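
To make the replay idea concrete, here is a small sketch in plain Java. SeededGenerator is a hypothetical stand-in for the Groovy Generator, not the project's code. It assumes two constructors in place of Groovy's default parameter and only produces two of the fields, using the same ranges as getBook().

```java
import java.util.Random;

// Hypothetical Java counterpart of the Generator constructor; names are assumptions.
final class SeededGenerator {
    private final long seed;
    private final Random random;

    // No seed given: pick one at random, like the default parameter in the Groovy version.
    SeededGenerator() {
        this(new Random().nextLong());
    }

    // Seed given: every value this generator produces is determined by it.
    SeededGenerator(long seed) {
        this.seed = seed;
        this.random = new Random(seed);
        System.out.println("Using Generator seed= " + seed);
    }

    long seed() {
        return seed;
    }

    // Same range as stockQuantity in getBook(): 1 to 399.
    int stockQuantity() {
        return 1 + random.nextInt(399);
    }

    // Same range as price in getBook(): 0.0 (inclusive) to 1000.0 (exclusive).
    double price() {
        return random.nextDouble() * 1_000.0;
    }

    public static void main(String[] args) {
        SeededGenerator first = new SeededGenerator();               // logs a fresh seed
        SeededGenerator replay = new SeededGenerator(first.seed());  // "hard-coded" seed
        // Both generators now return the same values in the same order.
        System.out.println(first.stockQuantity() == replay.stockQuantity());
        System.out.println(first.price() == replay.price());
    }
}
```

The replay only holds if values are requested in the same order, so a rerun with a hard-coded seed should execute the same tests in the same sequence as the failing run.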

I have the code above in a simple project that can be inspected here: github/mashimom/pseudo-random-test-data-generation