Generating Random Tests with Python and Pytest

One of the things I’m trying to incorporate into my work on cfltools is highly integrated unit testing. Since cfltools is a forensic utility, it’s critical that I am able to show that the program creates predictable, consistent outputs when fed a variety of data. This ensures that the products of the tool have legally defensible, evidentiary value.

To get this done, we’re going to use pytest, a widely used testing framework for Python. One primary thing my program is intended to do is process IP logs. These logs vary from source to source, but what I’m interested in right now are two critical pieces of data: an IP address and the time that the IP address was recorded. I can extend the general concept later, but for now this is my focus.

So, to test the particular functions of the program I’m developing, I want to generate large logs of random data on the fly. I’ve divided cfltools up into modules (you can see the general structure of the program at its GitHub repository). The module I’m working on just parses log files, and so it’s pretty uncreatively called logparse (which I import as cfltools.logparse).

pytest discovers tests by looking for files named test_*.py (or *_test.py) and, inside them, functions whose names start with test (conventionally test_). So I create a file in cfltools/logparse called test_logparse.py and put most of my unit tests in there.
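As a rough illustration of those default naming rules, here is a small sketch that mimics them with the standard library's fnmatch. It is a simplification, not pytest's actual collection code: real collection is configurable (the python_files and python_functions settings) and also picks up Test-prefixed classes.

```python
from fnmatch import fnmatch

# Simplified model of pytest's default collection rules (an approximation):
# python_files = "test_*.py *_test.py", python_functions = "test" (name prefix).
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")
DEFAULT_FUNCTION_PREFIX = "test"


def is_test_file(filename):
    """Return True if the filename matches one of the default test-file globs."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


def is_test_function(name):
    """Return True if a module-level function with this name would be collected."""
    return name.startswith(DEFAULT_FUNCTION_PREFIX)


# Usage: which names from this post would be picked up?
for name in ("test_logparse.py", "logparse.py"):
    print(name, is_test_file(name))
```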

Creating the Fixtures

Fixtures in pytest are essentially reusable objects that pytest passes into any test that names them as an argument. Let's create a couple of fixtures: one for IP addresses and one for timestamps.

import pytest
from random import randint


@pytest.fixture
def ipv4address():
    """
    Generates a random ipv4 address.
    """
    class IPv4AddrFactory():
        """Generates random ipv4 addresses."""
        def get(self):
            """Return an ipv4 address as a dotted-quad string."""
            # Each octet is drawn from 1-255, so no octet is ever 0
            return '.'.join(str(randint(1, 255)) for _ in range(4))
    return IPv4AddrFactory()


@pytest.fixture
def randomdatetime():
    """
    Generates a random date. Returns a string containing
    a POSIX time between 2010-01-01 and 2011-01-01.
    """
    class RandomDateTimeFactory():
        """Returns a randomly generated date and time."""
        def get(self):
            """Return a date/time as a POSIX timestamp string."""
            # POSIX time for 2010-01-01 1300Z: 1262350800
            # POSIX time for 2011-01-01 1300Z: 1293886800
            # Use these dates to bracket dummy dates.
            date_posix = randint(1262350800, 1293886800)
            return str(date_posix)
    return RandomDateTimeFactory()
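To sanity-check what these factories produce, here is a plain-Python sketch (no pytest needed) that repeats the same generation logic and validates the results with the standard library's ipaddress and datetime modules. The function names are hypothetical; they simply mirror the two get() methods above.

```python
import ipaddress
from datetime import datetime, timezone
from random import randint

LOWER_POSIX = 1262350800  # 2010-01-01 13:00:00 UTC
UPPER_POSIX = 1293886800  # 2011-01-01 13:00:00 UTC


def random_ipv4():
    """Mirror of IPv4AddrFactory.get(): four octets, each 1-255."""
    return '.'.join(str(randint(1, 255)) for _ in range(4))


def random_posix_time():
    """Mirror of RandomDateTimeFactory.get(): a POSIX time as a string."""
    return str(randint(LOWER_POSIX, UPPER_POSIX))


def to_utc(posix_value):
    """Convert a POSIX timestamp (int or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(posix_value), tz=timezone.utc)


def check_samples(count=1000):
    """Generate samples and fail if any address or timestamp is malformed or out of range."""
    for _ in range(count):
        # IPv4Address raises ValueError for a malformed dotted quad
        addr = ipaddress.IPv4Address(random_ipv4())
        assert not addr.is_unspecified  # no octet is 0, so never 0.0.0.0
        stamp = to_utc(random_posix_time())
        assert to_utc(LOWER_POSIX) <= stamp <= to_utc(UPPER_POSIX)
    return True


print(to_utc(LOWER_POSIX))  # 2010-01-01 13:00:00+00:00
```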

I could just return an actual IP address from the fixture, but returning a factory object instead means that a test can ask it for a brand-new random value whenever it needs one. For example, I could, naively, do the following.

import random

import pytest


@pytest.fixture
def randomnumber():
    return random.randint(1, 100)

def functionundertest(number):
    return number * 10

def test_random(randomnumber):
    assert functionundertest(randomnumber) == randomnumber * 10

This is fine if I just want to run the test once. But suppose I enclose the test in a for loop:

def test_random(randomnumber):
    for _i in range(100):
        assert functionundertest(randomnumber) == randomnumber * 10

What I’m doing above is not testing functionundertest against 100 random numbers. pytest evaluates the fixture once per test, so I’m just running the same random number through the check 100 times, which isn’t very informative. What I do instead is have the fixture return a factory object.

@pytest.fixture
def randomnumber():
    class RandomNumber():
        def get(self):
            return random.randint(1,100)
    return RandomNumber()

Now what I get instead is a random number generator, not a single random number. So I can modify my loop from before.

def test_random(randomnumber):
    for _i in range(100):
        testnumber = randomnumber.get()
        assert functionundertest(testnumber) == testnumber * 10

And now I’m generating a bunch of random test cases, constrained by how I define my generator. This is much more informative. I applied the same logic to the fixtures above to generate unique, random “dummy logs” for my program tests. Ideally your tests should be exhaustive, but for now this is enough to shake out unexpected failures and check scalability (since I can use it to make arbitrarily large log files).
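Given how much weight this project puts on predictable, defensible results, it can also help to make the random cases reproducible, so a failing run can be replayed exactly. One option (a variation on the approach above, not part of the original setup) is to give the factory its own seeded random.Random instance. RandomNumber here is a hypothetical standalone version of the class defined inside the fixture.

```python
import random


class RandomNumber:
    """Seedable variant of the fixture's factory (hypothetical)."""

    def __init__(self, seed=None):
        # A private Random instance: seeding it doesn't touch the global RNG
        self._rng = random.Random(seed)

    def get(self):
        return self._rng.randint(1, 100)


def functionundertest(number):
    return number * 10


def run_random_cases(seed, cases=100):
    """Check functionundertest against `cases` random inputs and return the inputs."""
    factory = RandomNumber(seed)
    inputs = [factory.get() for _ in range(cases)]
    for testnumber in inputs:
        assert functionundertest(testnumber) == testnumber * 10
    return inputs


# The same seed always yields the same inputs, so a failure can be replayed
assert run_random_cases(seed=1234) == run_random_cases(seed=1234)
```

In a real fixture the seed would need to be recorded somewhere (printed, or read from an environment variable) so a failing run can be reproduced; that plumbing is left out of this sketch.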

Finally, I create a fixture that combines these fixtures and generates one line of a log. The randomNumOccurances object is another fixture (its definition isn't shown here) that generates a random number of times an IP was detected, which is a feature of some datasets this program is meant to work with.

@pytest.fixture
def iplogline(ipv4address, randomdatetime, randomNumOccurances):
    """
    Generate a random IP logfile line.
    """
    class IPLogLineFactory():
        """Gives us one line of a logfile."""
        def get(self):
            """Returns a list object that is one line of a dummy logfile."""
            return [ipv4address.get(),
                    randomdatetime.get(),
                    randomNumOccurances.get()]
    return IPLogLineFactory()

So that gives me one line of a log file. I build up like this from base components because you never know when you’re going to need just one piece of a fixture to test some small function.
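Since randomNumOccurances isn't shown, here is a hypothetical stand-in written as plain classes, along with the same build-from-components idea applied outside pytest. The 1 to 1000 range and returning the count as a string (to match randomdatetime) are assumptions; in the test module each factory would be returned from its own @pytest.fixture, as above.

```python
from random import randint


class RandomNumOccurancesFactory:
    """Hypothetical: a random count of how many times an IP was seen."""

    def __init__(self, low=1, high=1000):  # bounds are assumptions
        self.low = low
        self.high = high

    def get(self):
        # Returned as a string, like RandomDateTimeFactory.get()
        return str(randint(self.low, self.high))


class IPLogLineFactory:
    """Composes smaller factories into one log line, like the iplogline fixture."""

    def __init__(self, ip_factory, time_factory, count_factory):
        self.parts = (ip_factory, time_factory, count_factory)

    def get(self):
        # Ask each component factory for a fresh value, in column order
        return [part.get() for part in self.parts]
```

Because each piece is its own factory, a test for some small function can request only the piece it needs.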

A Dummy File

Now I want to put all of this together in a large dummy log file. Most of the files I work with are *.csv, but I don’t want a lot of test files floating around, so I want my test harness to generate them on the fly. pytest's built-in tmpdir fixture makes this easy.

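@pytest.fixture
def logfile(tmpdir, iplogline):
    """Generates a dummy CSV file for testing."""
    # tmpdir is pytest's built-in fixture: a fresh temporary directory per test
    testfile = tmpdir.join("logfile.csv")
    with open(testfile, 'w') as file:
        for _i in range(100):
            line = iplogline.get()
            # str() each field so a non-string value (e.g. an int count) can't break the join
            file.write(','.join(str(field) for field in line) + '\n')
    yield testfile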
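Outside of pytest, the same idea can be sketched with the standard library's tempfile and csv modules, which is handy for eyeballing what a generated log looks like. The helper names are hypothetical, and the csv module is a swap for the manual string joining in the fixture (it also quotes any field that happens to contain a comma).

```python
import csv
import os
import tempfile
from random import randint


def dummy_line():
    """Hypothetical stand-in for iplogline.get(): [ip, posix_time, count]."""
    ip = '.'.join(str(randint(1, 255)) for _ in range(4))
    return [ip, str(randint(1262350800, 1293886800)), str(randint(1, 1000))]


def write_dummy_log(path, lines=100):
    """Write `lines` random rows to a CSV file and return the path."""
    # newline='' lets the csv writer control line endings itself
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        for _ in range(lines):
            writer.writerow(dummy_line())
    return path


def read_rows(path):
    """Read the CSV back as a list of rows (lists of strings)."""
    with open(path, newline='') as handle:
        return list(csv.reader(handle))


# Usage: the directory and everything in it is deleted when the block exits
with tempfile.TemporaryDirectory() as tmp:
    log_path = write_dummy_log(os.path.join(tmp, "logfile.csv"), lines=10)
    for row in read_rows(log_path)[:3]:
        print(row)
```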

The logfile fixture writes the file into a temporary directory that pytest creates for each test, and hands the test a path object pointing at it. Now I have a completely random CSV log file, generated on the fly, that I can use for testing. Let's make a basic test. One thing cfltools does is remember log files it has already seen by taking an MD5 checksum of each file and storing it in a database. Let's test that.

def test_open_file_and_checksum(logfile):
    """
    Verifies that a file can be opened and checksummed
    by LogParser() and LogFile().
    """
    from hashlib import md5
    parser = LogParser(logfile)
    with open(logfile) as file:
        data = file.read()
    test_md5 = md5(data.encode('utf-8')).hexdigest()
    assert parser.logfile.md5() == test_md5
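How cfltools computes md5() internally isn't shown here, but for reference, a common way to checksum a file is to hash its raw bytes in fixed-size chunks. Reading in binary mode avoids text-mode newline translation, and chunking keeps memory use flat for very large logs. file_md5 is a hypothetical helper, not part of cfltools.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: chunked hashing gives the same digest as hashing all bytes at once
with tempfile.TemporaryDirectory() as tmp:
    sample = b"10.0.0.1,1262350800,3\n"
    path = os.path.join(tmp, "sample.csv")
    with open(path, 'wb') as handle:
        handle.write(sample)
    assert file_md5(path, chunk_size=4) == hashlib.md5(sample).hexdigest()
```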

And now we have a basic test harness we can bolt things onto.