ConceptScript 101

Humans suck at thinking up edge cases.

Machines don’t. Nothing finds edge cases better than a few weeks in production.

Our technology makes machines write high-quality test data, filled with the stuff humans forget.

Quick Example

ConceptScript describes random data. Consider a database with some mock user data:

registeredUsers = {
    name: unicodeChar().list().concat(),
    email: internet.email(),
    password: [
        // realistic password distribution
        '123', 'password', 'swordfish',
        latinChar().list().concat()
    ].element()
}.list();

From the description, random samples are taken and used in tests. Here’s a possible one:

 [
     {"password":"swordfish",
      "name":"𫍬󞷒󺂩ð°",
      "email":"~$|+*|~|@9c1--8p.46-nd8e.73--D59vl3"},
     {"password":"swordfish",
      "name":"󃝄񨝄왫󢁚򟬷󸌔",
      "email":"#~?`'@v-37--7-11.cH03.7DQ"},
     {"password":"ET",
      "name":"󾢇",
      "email":"!$-$8#!@4zY4.9M5Y7.00.lC0n.7.5.Y6z.B.E-36--1l.c9"},
     {"password":"password",
      "name":"",
      "email":"!!{}?^4=@9t-2623.3--u-S-l3.4-1x35I.y-6"}
 ]

The generator algorithm is dumb, but honest. Think about the edge cases it will cover:

  • Unicode characters in usernames: encoding mismatch issues should crop up in tests.
  • Atypical emails: poor regex checks will fail, and naive parsers may crash.
  • Empty fields: how well does your system sanitize what comes from the db?
  • A sample may get generated with zero registered users: does a clean, userless installation work correctly?
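To see why a dumb generator hits these cases, here is a rough Python analogue of the description above. It is only a sketch: the helper names (random_list, unicode_char, latin_char) are made up for illustration, the email field is left out for brevity, and none of this is how ConceptScript itself is implemented.

```python
import random
import string

rng = random.Random(0)  # seeded, so a failing sample can be regenerated

def random_list(make_item, max_len=5):
    # Stand-in for .list(): zero or more items, so empty results do happen.
    return [make_item() for _ in range(rng.randint(0, max_len))]

def unicode_char():
    # Any code point except surrogates, which cannot be encoded as UTF-8.
    while True:
        code_point = rng.randint(0, 0x10FFFF)
        if not 0xD800 <= code_point <= 0xDFFF:
            return chr(code_point)

def latin_char():
    return rng.choice(string.ascii_letters)

def user():
    return {
        "name": "".join(random_list(unicode_char)),
        "password": rng.choice(
            ["123", "password", "swordfish", "".join(random_list(latin_char))]
        ),
    }

def registered_users():
    return random_list(user)

# Draw a batch of samples, each one a whole list of users.
samples = [registered_users() for _ in range(200)]
```

Run your code against every entry in samples. The empty user list and the empty name turn up without anyone having to think of them.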

Hand-written unit tests tend to start specific and get more general. A ConceptScript description starts general, and you make it only as specific as it needs to be. Tests that use generated data start catching edge cases immediately!

The description is trivial to tailor to your needs as you realize and document system invariants. For example:

  • If the name field is guaranteed to be non-empty by MySQL, you can change list() to list().nonEmpty().
  • If emails are restricted to a server you control for the tests, you can do email(Domain='test-smtp.oursite.com').

With such trivial, incremental changes you can match your test data to your system:

 [
     {"password":"qDXx",
      "name":"󞗷򰶇󑹧􊛄򱕹",
      "email":"*'|^_/~@test-smtp.oursite.com"},
     {"password":"wkXUZMf",
      "name":"𮴸򆗖󻾲􀁯",
      "email":"`6?&^!`@test-smtp.oursite.com"},
     {"password":"DjdnkDA",
      "name":"􆘣򱄼󖗋𪊒𒯷􎰺򾇰",
      "email":"!*^.+?%=@test-smtp.oursite.com"}
 ]
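The two refinements (non-empty names, a pinned email domain) amount to tightening a bound and fixing a parameter. A minimal Python sketch of the same idea, assuming hypothetical helpers random_string and email that are not part of ConceptScript:

```python
import random
import string

rng = random.Random(1)

def random_string(alphabet, min_len=0, max_len=8):
    # min_len=0 plays the role of list(); min_len=1 plays list().nonEmpty().
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))

def email(domain=None):
    # With no domain, invent one; with a domain, pin it.
    local = random_string("!#$%&'*+-/=?^_`{|}~" + string.ascii_letters, min_len=1)
    if domain is None:
        domain = random_string(string.ascii_lowercase, min_len=1) + ".test"
    return local + "@" + domain

def user():
    return {
        # Invariant: the database guarantees a non-empty name.
        "name": random_string(string.ascii_letters, min_len=1),
        # Invariant: test emails only go to a server we control.
        "email": email(domain="test-smtp.oursite.com"),
    }

users = [user() for _ in range(100)]
```

Each invariant is a one-argument change, so the generator only gets as narrow as the system really is.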

Using ConceptScript

There’s an online editor with autocompletion, linked-in docs, and strong static typechecking. It lets you preview samples of your data at the press of a button, and comes preloaded with examples.


Automated test tools can use an HTTP API, via GET, GET with JSONP, or POST. (Demo to be announced.)
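Until the demo is announced, the sketch below only shows the general shape of the three request styles in Python. The endpoint URL and the parameter names (script, callback) are hypothetical placeholders, not the real API, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL endpoint and parameter names; the real API may differ entirely.
API_URL = "https://conceptscript.example/sample"

def get_request(script, callback=None):
    params = {"script": script}
    if callback is not None:
        # JSONP: the server wraps the JSON as callback(...).
        params["callback"] = callback
    return Request(API_URL + "?" + urlencode(params), method="GET")

def post_request(script):
    # POST carries the script in the request body instead of the URL.
    return Request(API_URL, data=script.encode("utf-8"), method="POST")

def unwrap_jsonp(body, callback):
    # Strip the callback(...) padding and parse the JSON inside.
    prefix = callback + "("
    body = body.strip().rstrip(";")
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("not a JSONP response for " + callback)
    return json.loads(body[len(prefix):-1])
```

Plain GET suits short descriptions, POST suits long ones, and JSONP only matters when the caller is a browser page on another origin.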

There are also wrappers for the API. So far a Unix shell script and an Erlang library are available, with more to come.

Language Overview

Constant Values

Constant test data is what everyone writes, and is boring. Things will get interesting soon, but first we need integers, strings, floats and booleans.

You can give things names. These names are permanently attached to the concept they define, and can’t be reassigned.

Arrays and string-keyed JSON-like objects are supported. Their main purpose is to structure data, so they’re defined often but operated on rarely.

postIds = [5125,5432,6348,8643];
block = {
    number: 150873,
    hash: "33ab9f8731ae7b7501084a69ae91a2f0d340fc4ed658557ff2f",
    transactions: 56,
    total: 2720.41288992,
    size: 23812,
    ttl: 5.32653e8,
    confirmed: true
};

Sample Generators

Generators produce random output according to some format:

stuff = [boolean(), digit(), hexDigit()];
 [false,"8","F"]

You can chain them together to produce fancy data:

hash = hexDigit().vector(64).concat();
xForwardedFor = ipv4().list().nonEmpty();
 "0e522b0cA5fD9c3AC4dd8Dfd2Db1Ca0da9FafaE2Cc414C7F93dDEcb9Fc7C48Ff"
 ["192.114.194.217","69.31.176.218","33.243.170.218","252.40.248.14","39.74.25.48","235.246.165.140","39.29.42.39"]
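Chaining works because every step takes a generator and returns a new one. A small Python sketch of that idea follows. The Gen class and its method names are an illustration chosen to mirror the examples above, not ConceptScript's actual internals.

```python
import random

class Gen:
    """Wraps a zero-argument function that draws one random value."""

    def __init__(self, draw):
        self.draw = draw

    def __call__(self):
        return self.draw()

    def vector(self, n):
        # Fixed-length list of n independent draws.
        return Gen(lambda: [self.draw() for _ in range(n)])

    def list(self, max_len=8):
        # Variable-length list of draws; may be empty.
        return Gen(lambda: [self.draw() for _ in range(random.randint(0, max_len))])

    def non_empty(self):
        # Redraw until the result has at least one element.
        def draw():
            while True:
                value = self.draw()
                if len(value) > 0:
                    return value
        return Gen(draw)

    def concat(self):
        # Join a drawn list of strings into one string.
        return Gen(lambda: "".join(self.draw()))

hex_digit = Gen(lambda: random.choice("0123456789abcdefABCDEF"))
ipv4 = Gen(lambda: ".".join(str(random.randint(0, 255)) for _ in range(4)))

hash_gen = hex_digit.vector(64).concat()
x_forwarded_for = ipv4.list().non_empty()
```

Calling hash_gen() or x_forwarded_for() draws a fresh value each time, just like asking for another sample.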

Many generators have arguments with default values you can override - with other generators! Very fancy data is easy to generate:

files = [
    ['views-','clicks-','revenue-'].element(),
    date(),
    ['.js', '.xml', '.html', '.pdf'].element()
].concat();
fileRequest = url(Scheme=['http','ftp'].element(), Path=files(), Domain='files.oursite.com');
 "http://files.oursite.com:54754/views-2011.09.03.js?cl=5gF0F;k=ZMqA;zYRqwVR4i;aSjHN=SUSVvX;V41mlLBSY#7z3PBgA"
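One way to picture "override an argument with a generator" is that each argument is either a constant or something to draw from. The Python sketch below assumes exactly that; resolve, element and this simplified url (no port, query or fragment) are illustrative names, not the real implementation.

```python
import random

def resolve(arg):
    # Draw from arg if it is a generator (callable), else use it as a constant.
    return arg() if callable(arg) else arg

def element(choices):
    # Generator that picks one of the given constants.
    return lambda: random.choice(choices)

def date():
    return "%04d.%02d.%02d" % (
        random.randint(2000, 2030),
        random.randint(1, 12),
        random.randint(1, 28),
    )

def files():
    prefix = element(["views-", "clicks-", "revenue-"])()
    suffix = element([".js", ".xml", ".html", ".pdf"])()
    return prefix + date() + suffix

def url(scheme="http", domain="example.com", path=""):
    # Every argument may be a constant or a generator.
    return "%s://%s/%s" % (resolve(scheme), resolve(domain), resolve(path))

def file_request():
    return url(scheme=element(["http", "ftp"]), path=files, domain="files.oursite.com")
```

Because constants and generators are interchangeable as arguments, narrowing or widening one field never forces a rewrite of the rest.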

All those generators actually reside in namespaces, like internet.url.scheme() and string.concat(). We infer most of them for you, so you rarely need to type them:

integer.random(0,100);
float.random(0.0,100.0);
 52
 30.049200166551643

Type System

Does this look right to you?

integer.random("zero", email());

How about this?

integer.random(0, [3,5,7,11].element());

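With ConceptScript’s strong static typing, who cares! The typechecker rejects the first (strings where integers belong) and accepts the second (a generator of integers is a fine integer argument).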

  • Errors in your program are discovered as you write it, not as you run it.
  • Tricky code is sanity checked by a machine, so your sleep-addled brain doesn’t have to.
  • Absolutely awesome autocompletion!

It also has total type inference: you can view the types, but you never have to write them.

Evaluation

ConceptScript is about defining generators in terms of generators. Constant values are only used as arguments, or in a list of possible values to choose from.

This is good practice, as it encourages generation of the most varied data possible.

Sometimes that’s insufficient, and a value must be generated once and then re-used. A special binding operator, ‘<-’, can be used instead of ‘=’.

length <- integer.random(0,10);
msg = {
    count: length(),
    mask: boolean().vector(length())
};
 {"mask":[false,true,true,false,false,false,true],"count":7}
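The difference between the two operators is easiest to see side by side. This Python sketch assumes the reading suggested by the example: a name defined with ‘=’ draws again on every use, while ‘<-’ draws once per sample and reuses that value. The function names are made up for illustration.

```python
import random

def length():
    return random.randint(0, 10)

def random_mask(n):
    return [random.random() < 0.5 for _ in range(n)]

def msg_resampled():
    # Like '=': each use of length() draws again, so the fields can disagree.
    return {"count": length(), "mask": random_mask(length())}

def msg_bound():
    # Like '<-': draw once per sample, then reuse the same value everywhere.
    n = length()
    return {"count": n, "mask": random_mask(n)}

resampled = [msg_resampled() for _ in range(200)]
bound = [msg_bound() for _ in range(200)]
```

In bound, count always equals the mask length. In resampled it usually does not, which is exactly the bug the binding operator exists to avoid.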