Most of the code I’ll be working with in my new job (BTW blog: I have a new job) is written in Scala and uses property based testing with ScalaCheck. Yesterday I ran into a problem with an existing test suite that suddenly began failing with too many discarded tests:
[info] FormattersSpec
[info] Formatters are invertible for:
[info] + Mapping
[info] + Identifier
[info]
[error] x Metadata
[error] Gave up after only 39 passed tests. 197 tests were discarded. (FormattersSpec.scala:11)
This test generates random Metadata values and makes sure that they
can be serialised and deserialised correctly (i.e. values can be
round-tripped). The property being test here is identical, only the
Arbitrary, Serialise, and Deserialise instances vary in each
case. The truly odd thing is that the pertinent code looks like this:
case class Identifier(name: String)
case class Metadata (id: Identifier, maps: Set[Identifier])
implicit val ArbIdentifier = Arbitrary(
for {
name <- arbitrary[String]
} yield Identifier(name)
)
implicit val ArbMetadata = Arbitrary(
for {
identifier <- arbitrary[Identifier]
mappings <- arbitrary[Set[Identifier]]
} yield Metadata(identifier, mappings)
)My first step was redefining a few related Arbitrary instances to
avoid using suchThat (which discards invalid values) but this didn’t
fix the problem. Eventually I tried redefining ArbMetadata like
this:
implicit val ArbMetadata = Arbitrary(
for {
identifier <- arbitrary[Identifier]
mappings <- Gen.const(Set.empty[Set[Identifier]])
} yield Metadata(identifier, mappings)
)and the problem went away. Trying to use arbitrary[Set[Identifier]]
in various ways in the Scala REPL confirmed that it is the problem; we
can easily generate as large a List[Identifier] as we like, but a
Set[Identifier] fails fairly frequently:
// This always generates a Some[List[Identifier]] value.
Gen.listOfN(100, arbitrary[Identifier]).map(_.length).sample
// Sometimes we get a Some[List[Set[Identifier]]] and others None.
Gen.listOfN(100, arbitrary[Set[Identifier]]).map(_.length).sampleIt appears as though whatever mechanism is used by arbitrary[Set[_]]
to construct the sets, it doesn’t fails when the generator for the
value type returns duplicate elements. You can confirm this easily by
trying arbitrary[Set[Unit]]; any Gen[Unit] has no choice but to
return a the single value of type Unit (or to fail) and, as
expected, this almost never succeeds. Replacing the problematic
arbitrary[Set[Identifier]] in the original code with
arbitrary[Seq[Identifier]].map(_.toSet) resolves the issue:
constructing a set from a list of possibly duplicate Identifiers
always works.
After a bit of reading in the ScalaCheck source code it seems as
though the root cause of this problem is some instance of
CanBuildFrom[Set[_], A, Set[A]] but I’ve no idea how to go
about figure out which one or why it’s broken. In any case, I now know
a bit more about working with Scala.
For more information, see the ScalaCheck issue #89.