Is It a Bad Practice to Randomly Generate Test Data?

Is it a bad practice to randomly generate test data?

This is an answer to your second point:

(2) I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate.

I agree. Ideally spec examples should be understandable by themselves. Using fixtures is problematic, because it splits the pre-conditions of the example from its expected results.

Because of this, many RSpec users have stopped using fixtures altogether. Instead, they construct the objects they need in the spec example itself:

describe Item, "#most_expensive" do
  it 'should return the most expensive item' do
    Item.create!(:price => 100)
    Item.create!(:price => 50)

    Item.most_expensive.price.should == 100
  end
end

If you end up with lots of boilerplate code for object creation, you should take a look at some of the many test object factory libraries, such as factory_girl, Machinist, or FixtureReplacement.
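For illustration, here is a hand-rolled sketch of what these factory libraries provide: default attributes plus per-test overrides. The `Item` struct and `Factory` module are hypothetical stand-ins, not part of any of the libraries named above.

```ruby
# A minimal sketch of the factory pattern: sensible defaults plus
# per-test overrides. Item and Factory are invented for this example.
Item = Struct.new(:name, :price)

module Factory
  DEFAULTS = { name: "widget", price: 10 }.freeze

  # Build an Item with default attributes, overriding only the
  # attributes the test actually cares about.
  def self.item(overrides = {})
    attrs = DEFAULTS.merge(overrides)
    Item.new(attrs[:name], attrs[:price])
  end
end

cheap = Factory.item(price: 5)
dear  = Factory.item(price: 100)
```

The point is that each test states only the attributes relevant to it, while the factory supplies the rest.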

What are the downsides of using random values in Unit Testing?

Downsides

Firstly, it makes the test more convoluted and slightly harder to debug, as you cannot directly see all the values being fed in (though there's always the option of generating test cases as either code or data, too). If you're doing some semi-complicated logic to generate your random test data, then there's also the chance that this code has a bug in it. Bugs in test code can be a pain, especially if developers immediately assume the bug is in the production code.

Secondly, it is often impossible to be specific about the expected answer. If you know the answer based on the input, then there's a decent chance you're just aping the logic under test (think about it -- if the input is random, how do you know the expected output?) As a result, you may have to trade very specific asserts (the value should be x) for more general sanity-check asserts (the value should be between y and z).

Thirdly, unless there's a wide range of inputs and outputs, you can often cover the same range using well-chosen values in standard unit tests with less complexity. E.g. pick the numbers -max, (-max + 1), -2, -1, 0, 1, 2, max-1, max (or whatever is interesting for the algorithm).
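As a sketch of this approach, the loop below feeds exactly those hand-picked boundary values to a trivial stand-in function; `clamp_to_byte` is invented for the example.

```ruby
# Instead of random inputs, exercise hand-picked boundary values.
# clamp_to_byte is a trivial stand-in for the code under test.
def clamp_to_byte(n)
  [[n, 0].max, 255].min
end

MAX = 2**31 - 1
boundary_values = [-MAX, -MAX + 1, -2, -1, 0, 1, 2, MAX - 1, MAX]

boundary_values.each do |n|
  result = clamp_to_byte(n)
  raise "out of range for input #{n}" unless (0..255).cover?(result)
end
```

Nine deterministic values, chosen around the interesting edges, often find the same bugs a thousand random ones would.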

Upsides

When done well with the correct target, these tests can provide a very valuable complementary testing pass. I've seen quite a few bits of code that, when hammered by randomly generated test inputs, buckled due to unforeseen edge cases. I sometimes add an extra integration testing pass that generates a shedload of test cases.

Additional tricks

If one of your random tests fails, isolate the 'interesting' value and promote it into a standalone unit test, so that before check-in you can fix the bug and guarantee it never regresses.
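A minimal sketch of that promotion step, assuming a hypothetical `parse_price` function that a random run once broke on empty input:

```ruby
# A value discovered by a random test, promoted into a fixed
# regression test so the failure can never silently return.
# parse_price is a stand-in for whatever code the random run broke.
def parse_price(s)
  Integer(s, 10)   # raises ArgumentError on "", which random testing uncovered
rescue ArgumentError
  0                # the fix: treat unparsable input as zero
end

# The 'interesting' value found by a random run, now hard-coded:
REGRESSION_INPUT = ""

raise "regression!" unless parse_price(REGRESSION_INPUT) == 0
```

The hard-coded value documents the bug and guards against it forever, independently of whatever the random generator produces next run.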

Random data in Unit Tests?

There's a compromise. Your coworker is actually onto something, but I think he's doing it wrong. I'm not sure that totally random testing is very useful, but it's certainly not invalid.

A program (or unit) specification is a hypothesis that there exists some program that meets it. The program itself is then evidence of that hypothesis. What unit testing ought to be is an attempt to provide counter-evidence to refute that the program works according to the spec.

Now, you can write the unit tests by hand, but it really is a mechanical task. It can be automated. All you have to do is write the spec, and a machine can generate lots and lots of unit tests that try to break your code.

I don't know what language you're using, but see here:

Java
http://functionaljava.org/

Scala (or Java)
http://github.com/rickynils/scalacheck

Haskell
http://www.cs.chalmers.se/~rjmh/QuickCheck/

.NET:
http://blogs.msdn.com/dsyme/archive/2008/08/09/fscheck-0-2.aspx

These tools will take your well-formed spec as input and automatically generate as many unit tests as you want, with automatically generated data. They use "shrinking" strategies (which you can tweak) to find the simplest possible test case to break your code and to make sure it covers the edge cases well.
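To make the idea concrete, here is a toy sketch of the generate-and-shrink loop these tools automate. The deliberately false property and the integer-only shrinker are invented for illustration; real libraries generalize both to arbitrary data types and do it far more cleverly.

```ruby
# Toy sketch of what QuickCheck-style tools automate: generate random
# inputs for a property and, on failure, "shrink" the counterexample
# to a minimal one.

# Shrink a failing integer toward zero while it keeps failing.
def shrink(n, property)
  while n != 0
    smaller = n - (n <=> 0)           # one step toward zero
    break if property.call(smaller)   # smaller value passes: stop here
    n = smaller
  end
  n
end

# A deliberately false property, standing in for a buggy program:
property = ->(n) { n.abs < 500 }

rng = Random.new(42)                  # fixed seed: reproducible run
inputs = Array.new(200) { rng.rand(-1000..1000) }
failing = inputs.find { |n| !property.call(n) }
raise "no counterexample in 200 samples" unless failing

minimal = shrink(failing, property)
puts "counterexample: #{failing}, shrunk to: #{minimal}"
```

Whatever random failure the generator stumbles on, the shrinker walks it down to the simplest value that still fails (here, ±500), which is far easier to debug.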

Happy testing!

Is it good to create random tests?

You can work with generated test data, but I would discourage you from using random live-generated input for one reason: repeatability.

If there is a problem in your code, you might have a test that suddenly goes red because you accidentally hit a "wrong" parameter combination... or not.

Such a test is highly unreliable.

Is data-driven testing bad?

I think the main problem is testing with "randomly generated data". It is not clear from your question whether this data is re-generated each time your test harness is run. If it is, then your test results are not reproducible. If some test fails, it should fail every time you run it, not once in a blue moon, upon some weird random test data combination.

So in my opinion you should pre-generate your test data and keep it as a part of your test suite. You also need to ensure that the dataset is large enough and diverse enough to offer sufficient code coverage.

Moreover, as Ben Voigt commented below, testing with random data alone is not enough. You need to identify the corner cases in your algorithms and test them separately, with data tailored specifically for those cases. However, in my opinion, additional testing with random data is also beneficial when/if you are not sure that you know all your corner cases. You may hit them by chance using random data.

Why not use a pseudo random number generator to produce test data?

The claim that "random number generation can create couplings between classes and timing artifacts" is not clear to me.

This becomes clearer if you take the next sentence into account:

because most random number generator classes are thread safe and therefore introduce additional synchronization

It's the memory synchronization that may change the timing of your program. If you look into java.util.Random, you can see that it uses an AtomicLong under the covers, so using it will cause read and write memory barriers as part of generating the test data, which may change how other threads see data and the timing of your application overall.

Which classes and timing artifacts is he referring to here?

Any class that uses threads and relies on memory synchronization may be affected. Basically, all threads and the classes they call.

What kind of couplings the RNG can create?

As @Bill the Lizard commented, the book is saying that by using an RNG, the timing of the program comes to rely on, or be affected by, the RNG's synchronization.

The real lesson here is that the test data you inject into the program should, if possible, not change the timing of your program. This is often difficult and can be impossible, but the goal is to simulate the application's behavior (timing, input, output, ...) as closely as possible in the test.

In terms of a solution, you could use another simple random algorithm that is not synchronized. You could also write a class that stores 10,000 random numbers (or however many you need) up front and then hands them out without synchronization. But by using a class in your tests that does memory synchronization, you are changing the timing of your program.
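As a sketch of both suggestions, an unsynchronized generator and a pre-generated pool, here is a tiny linear congruential generator with no locking at all; the constants are the classic Numerical Recipes LCG parameters, and the class name is invented for this example.

```ruby
# A tiny linear congruential generator with no synchronization:
# each instance is plain per-object state, so using it adds no
# locks or memory barriers to the program under test.
class UnsyncRandom
  MOD = 2**32
  A   = 1_664_525          # classic Numerical Recipes LCG multiplier
  C   = 1_013_904_223      # classic Numerical Recipes LCG increment

  def initialize(seed)
    @state = seed % MOD
  end

  def next_int
    @state = (A * @state + C) % MOD
  end
end

# Alternative suggestion: pre-generate a pool up front and hand the
# values out later without any synchronization.
rng  = UnsyncRandom.new(7)
POOL = Array.new(10_000) { rng.next_int }.freeze
```

Either way, the same seed always yields the same sequence, so a failing run can be replayed exactly.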

Testing with random inputs best practices

I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)

Instead, however, you might like to make sure you've got good data coverage in other ways. For instance:

  • Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).
  • Use a variety of cultures, or "all of them" if that's known.
  • Try "day 0" and "one day after the end of each month" etc.

In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.

You'll never get complete coverage, but it will at least be repeatable.
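For example, the month-boundary idea above can be done programmatically and repeatably; `days_in_month` here is a trivial stand-in for whatever date logic is actually under test.

```ruby
require "date"

# Repeatable, programmatic coverage of month boundaries: first,
# middle, and last day of every month from 1900 to 2100.
# days_in_month is a stand-in for the code under test.
def days_in_month(year, month)
  Date.new(year, month, -1).day   # day -1 is the last day of the month
end

(1900..2100).each do |year|
  (1..12).each do |month|
    last = days_in_month(year, month)
    [1, last / 2, last].each do |day|
      d = Date.new(year, month, day)
      raise "bad date #{d}" unless d.year == year && d.month == month
    end
  end
end
```

That's over 7,000 deterministic cases from a dozen lines, with no literal values to maintain and no randomness to chase.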

EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.
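The two-RNG trick just described might look like this sketch:

```ruby
# One RNG picks a seed; the seed is logged; a second RNG seeded with
# that value drives the test. A failure can then be replayed exactly
# by re-seeding with the logged value.
meta_rng = Random.new              # nondeterministic source
seed     = meta_rng.rand(2**31)
warn "test RNG seed: #{seed}"      # log it so failures can be replayed

rng    = Random.new(seed)
inputs = Array.new(5) { rng.rand(100) }

# Replaying with the logged seed reproduces the exact same inputs:
replay = Random.new(seed)
raise "not reproducible" unless Array.new(5) { replay.rand(100) } == inputs
```

When something interesting happens, the logged seed is all you need to recreate the exact run.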

Is it bad practice to iterate over many random training & test set splits until a high accuracy is achieved?

Is it bad practice to iterate over many random training & test set splits until a high accuracy is achieved?

Yes, this is bad practice. You should be evaluating on data that your model has never been trained on, and this wouldn't really be the case if you train many times to find the best train/test split.

You can put aside a test set before you train the model. Then you can create as many train/validation splits as you want and train the model multiple times. You would evaluate on the test set, on which the model was never trained.
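A plain-Ruby sketch of that scheme, with placeholder integers standing in for real samples:

```ruby
# Hold-out scheme: set a test set aside first, then make as many
# train/validation splits as you like from the remainder. The data
# here is just placeholder integers, not real samples.
data = (1..100).to_a.shuffle(random: Random.new(7))

test_set  = data.first(20)     # never touched during training
remainder = data.drop(20)

3.times do |i|
  shuffled   = remainder.shuffle(random: Random.new(i))
  validation = shuffled.first(16)
  training   = shuffled.drop(16)
  # ... train on `training`, tune on `validation` ...
  raise "leak!" unless (training & test_set).empty? &&
                       (validation & test_set).empty?
end
```

However many train/validation splits you try, the held-out test set stays untouched, so the final evaluation is on data the model has never seen.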

You can also look into nested cross-validation.

Are Test Data Factory Methods Dangerous or Beneficial?

I think a good unit test needs to do its job and focus on robust testing.

I prefer option 1. Using randomized data may lead to unexpected unit test failures, but at the end of the day this is a good thing, because you end up covering scenarios a developer would not have thought of.

Option 1 may lend itself to a bit more debugging work, but if the methods being tested are simple, this is unlikely to be a significant amount of overhead. If the methods being tested are extremely complex, they would likely be difficult to debug anyway, and probably need to be broken up.

Options 2 and 3 introduce human error. Minimizing human error is a key goal of any programming effort. Option 3 is also verbose and unmaintainable. Those issues are likely to eventually cost more time and effort than whatever would be spent on an extra bit of debugging.

My guess is that #1 will take less time and effort in the long run. That's only a guess and not the main concern. The main concern is that #1 is the proper way to ensure a robust test is executed.

Using randomness and/or iterations in unit tests?

I have been using randomness in my test cases. It found some errors in the SUT for me, and it also exposed some errors in my test cases themselves.

Note that test cases get more complex when you use randomness.

  • You'll need a way to re-run your test case with the random value(s) it failed on.
  • You'll need to log the random values used for every test.
  • ...

All in all, I'm throttling back on using randomness, but not dismissing it entirely. As with every technique, it has its value.

For a better explanation of what you are after, look up the term "fuzzing".
