Chipyard: Running a simple Hello World binary against a RISC-V Rocket core

This guide assumes that you have finished all the steps in my previous post, Setting Up a RISC-V Security Testing Environment, and have managed to generate a basic binary that simulates a RISC-V Rocket core using Verilator.

Once Chipyard is basically up and running, you should have a chipyard folder that looks more or less like this:

~/chipyard$ ls
bootrom    CHANGELOG.md  CONTRIBUTING.md  env.sh      lib      project    riscv-tools-install  sims      tests       tools         vlsi
build.sbt  common.mk     docs             generators  LICENSE  README.md  scripts              software  toolchains  variables.mk

We ran our simulation in the earlier guide from within chipyard/sims/verilator and generated cycle-accurate test results by issuing make run-asm-tests from that directory. That directory should also contain simulator-example-RocketConfig, the binary that serves as our simulation “emulator.” If we run the emulator without specifying a target binary, we should get a short help message.

~/chipyard/sims/verilator$ ls
generated-src  Makefile  output  simulator-example-RocketConfig  verilator_install  verilator.mk
~/chipyard/sims/verilator$ ./simulator-example-RocketConfig
No binary specified for emulator
Usage: ./simulator-example-RocketConfig [EMULATOR OPTION]... [VERILOG PLUSARG]... [HOST OPTION]... BINARY [TARGET OPTION]...
Run a BINARY on the Rocket Chip emulator.

Mandatory arguments to long options are mandatory for short options too.

EMULATOR OPTIONS
  -c, --cycle-count        Print the cycle count before exiting
       +cycle-count
  -h, --help               Display this help and exit
  -m, --max-cycles=CYCLES  Kill the emulation after CYCLES
       +max-cycles=CYCLES
  -s, --seed=SEED          Use random number seed SEED
  -r, --rbb-port=PORT      Use PORT for remote bit bang (with OpenOCD and GDB)
                           If not specified, a random port will be chosen
                           automatically.
  -V, --verbose            Enable all Chisel printfs (cycle-by-cycle info)
       +verbose

EMULATOR DEBUG OPTIONS (only supported in debug build -- try `make debug`)
  -v, --vcd=FILE,          Write vcd trace to FILE (or '-' for stdout)
  -x, --dump-start=CYCLE   Start VCD tracing at CYCLE
       +dump-start

EMULATOR VERILOG PLUSARGS
       +tilelink_timeout=INT
                           Kill emulation after INT waiting TileLink cycles. Off if 0.
                             (default=0)
       +max_core_cycles=INT
                           Kill the emulation after INT rdtime cycles. Off if 0.
                             (default=0)
HOST OPTIONS
  -h, --help               Display this help and exit
       +permissive         The host will ignore any unparsed options up until
                             +permissive-off (Only needed for VCS)
       +permissive-off     Stop ignoring options. This is mandatory if using
                             +permissive (Only needed for VCS)
      --rfb=DISPLAY        Add new remote frame buffer on display DISPLAY
       +rfb=DISPLAY          to be accessible on 5900 + DISPLAY (default = 0)
      --signature=FILE     Write torture test signature to FILE
       +signature=FILE
      --chroot=PATH        Use PATH as location of syscall-servicing binaries
       +chroot=PATH

HOST OPTIONS (currently unsupported)
      --disk=DISK          Add DISK device. Use a ramdisk since this isn't
       +disk=DISK            supported

TARGET (RISC-V BINARY) OPTIONS
  These are the options passed to the program executing on the emulated RISC-V
  microprocessor.

EXAMPLES
  - run a bare metal test:
    ./simulator-example-RocketConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/isa/rv64ui-p-add
  - run a bare metal test showing cycle-by-cycle information:
    ./simulator-example-RocketConfig +verbose $RISCV/riscv64-unknown-elf/share/riscv-tests/isa/rv64ui-p-add 2>&1 | spike-dasm
  - run an ELF (you wrote, called 'hello') using the proxy kernel:
    ./simulator-example-RocketConfig pk hello

Of interest to us right now is the option to run an ELF (Executable and Linkable Format) binary using the emulator.

Creating a RISC-V “Hello World” binary

We’re going to generate a simple C “Hello World” program. It isn’t anything fancy. helloworld.c:

#include <stdio.h>

int main() {
    printf("Hello world!\n");
    return 0;
}

We can also go ahead and put together a simple Makefile.

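CC-X86 = gcc
CFLAGS-X86 = -g
CC-RISCV = riscv64-unknown-elf-gcc
CFLAGS-RISCV = -g
RM = rm -f

# default, all, and clean are not files.
.PHONY: default all clean

default: all
all: hello-x86 hello-riscv

# Rebuild when helloworld.c changes. Recipe lines must begin with a tab.
hello-x86: helloworld.c
	$(CC-X86) $(CFLAGS-X86) -o hello-x86 helloworld.c

hello-riscv: helloworld.c
	$(CC-RISCV) $(CFLAGS-RISCV) -o hello-riscv helloworld.c

clean:
	$(RM) hello-x86 hello-riscv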

I’m doing it this way because we may want to test how the binary runs on our native architecture, in my case x86, just to ensure that the C is behaving as I expect it to. The -g flag embeds extra debugging information that a debugger like gdb can use to examine the binary later, if that’s necessary.

For x86, the compiler is gcc. For RISC-V, it’s riscv64-unknown-elf-gcc, which was built as part of the RISC-V toolchain in the earlier blog post. So, to compile our helloworld.c we’re using gcc -g -o hello-x86 helloworld.c for x86 and riscv64-unknown-elf-gcc -g -o hello-riscv helloworld.c to cross-compile for RISC-V. By default, make will generate both for us.

~/test-binaries/helloworld$ ls
helloworld.c  Makefile
~/test-binaries/helloworld$ make
gcc -g -o hello-x86 helloworld.c
riscv64-unknown-elf-gcc -g -o hello-riscv helloworld.c
~/test-binaries/helloworld$ ls
hello-riscv  helloworld.c  hello-x86  Makefile
~/test-binaries/helloworld$ ./hello-x86
Hello world!
~/test-binaries/helloworld$ file hello-riscv
hello-riscv: ELF 64-bit LSB executable, UCB RISC-V, version 1 (SYSV), statically linked, with debug_info, not stripped

So far, so good.
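
If you want to check the same thing that file reports without relying on that tool, the first few bytes of the ELF header are enough. This is only a minimal sketch: the function names are my own, and it decodes just the class, byte order, type, and machine fields (243 is the machine number assigned to RISC-V).

```python
import struct

EM_RISCV = 243  # e_machine value assigned to RISC-V
ET_EXEC = 2     # e_type value for an executable file


def describe_elf(header: bytes) -> dict:
    """Decode the handful of ELF header fields that `file` summarizes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])   # EI_CLASS byte
    little = header[5] == 1                # EI_DATA byte: 1 = LSB, 2 = MSB
    fmt = "<HH" if little else ">HH"
    # e_type and e_machine are two 16-bit fields starting at offset 16.
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    return {
        "bits": bits,
        "little_endian": little,
        "executable": e_type == ET_EXEC,
        "riscv": e_machine == EM_RISCV,
    }


def describe_elf_file(path: str) -> dict:
    """Read just enough of a file to describe its ELF header."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))
```

Calling describe_elf_file("hello-riscv") on the binary built above would be expected to report a 64-bit, little-endian RISC-V executable, matching the file output.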

The RISC-V Proxy Kernel, Running the Binary

To run a binary against the emulator, we need to use the RISC-V Proxy Kernel. The proxy kernel “handles I/O-related system calls by proxying them to a host computer,” and is necessary to view the STDOUT output of your program.

~/chipyard/sims/verilator$ ./simulator-example-RocketConfig pk ~/test-binaries/helloworld/hello-riscv
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 40615
Hello world!

After a long while the program should deliver our output and drop us back at the terminal. Be patient! This takes surprisingly long to run. To illustrate this, here is a run using time and the -c flag to show the number of cycles.

~/chipyard/sims/verilator$ time ./simulator-example-RocketConfig -c pk ~/test-binaries/helloworld/hello-riscv
This emulator compiled with JTAG Remote Bitbang client. To enable, use +jtag_rbb_enable=1.
Listening on port 42497
Hello world!
*** PASSED *** Completed after 531629 cycles

real    1m41.844s
user    1m41.485s
sys     0m0.361s

On the test machine I used, this works out to a little over 5,000 cycles per second (531,629 cycles in roughly 102 seconds). It’s not fast, but it’s fine for very short test binaries meant to explore the ISA.
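
The arithmetic behind that figure is simple enough to script. Here is a small sketch that pulls the cycle count out of the emulator's final line and divides by the wall-clock time; the function names are mine, and the sample values are copied from the run above.

```python
import re

CYCLES_RE = re.compile(r"Completed after (\d+) cycles")


def parse_cycles(output: str) -> int:
    """Extract the cycle count from the emulator's -c output."""
    match = CYCLES_RE.search(output)
    if match is None:
        raise ValueError("no cycle count found in emulator output")
    return int(match.group(1))


def cycles_per_second(cycles: int, seconds: float) -> float:
    """Simulation throughput in target cycles per host second."""
    return cycles / seconds


sample = "*** PASSED *** Completed after 531629 cycles"
wall_seconds = 1 * 60 + 41.844  # "real 1m41.844s" from the time output
rate = cycles_per_second(parse_cycles(sample), wall_seconds)
print(f"{rate:.0f} cycles per second")  # prints: 5220 cycles per second
```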

Conclusion

At this point we’ve verified the most critical functionality of the Chipyard toolchain on a machine: instantiating an example core and running a test binary of our own design against it. Now we need to be able to instantiate our own, self-defined RISC-V core and run a binary against that, completing our basic toolchain familiarization.

Chipyard: Setting up a RISC-V security testing environment

My master’s thesis work has been in RISC-V security, a topic that has gained substantial relevance following major flaws discovered in popular (but very proprietary) CPU cores from companies like Intel, ARM, and AMD. An advantage for researchers interested in investigating microarchitecture security is that the RISC-V cores used here are completely open source: open to inspection by a researcher and straightforward (sort of) to modify.

Why this sort of research is important will be the subject of another article. For now, I will be focusing on the nuts and bolts of exploring this topic.

Setting Up Chipyard

In order to get started on evaluating the security of these new “open cores,” we will need a basic testing environment. Most of the code describing these cores is freely available on GitHub and is published by the Berkeley Architecture Research team. The main repository we’re going to use is Chipyard. This is more or less a standard environment for RISC-V development at present.

Requirements

This has been tested on a virtual machine running Ubuntu 18.04.1 LTS. The operating system was clean-installed. We need to make sure some basic things exist — hopefully if you’re a hardware developer you’ve already got most of this going. In Ubuntu 18.04, most of this can be handled with sudo apt-get install build-essential git, but you may want to check versions below to be sure.

  • gcc (the GNU Compiler Collection), with a version greater than 7.3. Watch for this. If you are on an earlier version of Ubuntu, for example, you may find you’re on a much older version of GCC.
  • make, version 4.x or later.
  • git
  • libmpc-dev. When this was not installed, the build threw an extremely hard-to-diagnose error about GCC requiring GMP to build.
  • device-tree-compiler

You can install the known requirements listed above with the following.

$ sudo apt-get install -y build-essential git libmpc-dev device-tree-compiler
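
To confirm the gcc and make versions listed above, something like the following sketch can help. It assumes each tool prints its version number on the first line of --version output, which is true for the Ubuntu packages I used but is not guaranteed everywhere; the helper names are my own.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return the last dotted version number in `text` as a tuple of ints."""
    matches = re.findall(r"\d+(?:\.\d+)+", text)
    if not matches:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in matches[-1].split("."))


def tool_version(tool):
    """Run `tool --version`; return the parsed version, or None if unavailable."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    try:
        return parse_version(lines[0]) if lines else None
    except ValueError:
        return None


if __name__ == "__main__":
    # Minimums taken from the requirements list in this post.
    for tool, minimum in (("gcc", (7, 3)), ("make", (4, 0))):
        found = tool_version(tool)
        if found is None:
            status = "missing or unrecognized"
        else:
            status = "ok" if found >= minimum else "too old"
        print(tool, found, status)
```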

Cloning and compiling Chipyard, RISC-V Tools

Begin by cloning Chipyard. This is based on initial setup instructions from within Chipyard’s documentation, and is the “basic” installation.

$ git clone https://github.com/ucb-bar/chipyard.git
$ cd chipyard
$ ./scripts/init-submodules-no-riscv-tools.sh

That will clone down the repository and expand all sub-modules. From here, we need the RISC-V toolchain, which will include all of our necessary compiler tools to generate binaries for the RISC-V architecture.

$ ./scripts/build-toolchains.sh

Some versions of the guide will ask you to run it with a parameter, e.g. ./scripts/build-toolchains.sh riscv-tools, but I found this causes problems.

That will take a while to run. Remember that if you are able to compile with multiple cores, you may benefit from setting the MAKEFLAGS=-jN environment variable, where N is the number of cores you have available. Additionally, don’t forget you can use terminal multiplexers like screen to keep long compilations running when you cannot, for example, keep an ssh session open for the whole compile.
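
If you would rather not hard-code N, it can be derived from the machine. This is just a sketch: build_env is a name I made up, and it only prepares an environment dictionary with MAKEFLAGS set.

```python
import os


def build_env(jobs=None, base=None):
    """Return a copy of the environment with MAKEFLAGS set to -jN."""
    env = dict(os.environ if base is None else base)
    if jobs is None:
        jobs = os.cpu_count() or 1  # cpu_count() may return None
    env["MAKEFLAGS"] = f"-j{jobs}"
    return env


env = build_env(jobs=4, base={"PATH": "/usr/bin"})
print(env["MAKEFLAGS"])  # prints: -j4
```

From the Chipyard root, the result could then be handed to the build script with subprocess.run(["./scripts/build-toolchains.sh"], env=build_env(), check=True).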

Once compilation finishes, be sure to source env.sh from within your Chipyard root directory. This file is emitted at the end of the toolchain compilation process and sets the environment variables that other tools need to find critical parts of the toolchain later. Place that source command in your .bashrc so it’s ready to go each time you open a shell on your simulation machine.
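
A quick way to tell whether env.sh has taken effect in the current shell is to look for what it is supposed to provide. The sketch below assumes env.sh exports a RISCV variable and puts riscv64-unknown-elf-gcc on the PATH; the function name and messages are my own.

```python
import os
import shutil


def toolchain_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means all looks fine."""
    env = os.environ if env is None else env
    problems = []
    riscv = env.get("RISCV")
    if not riscv:
        problems.append("RISCV is not set; has env.sh been sourced?")
    elif not os.path.isdir(riscv):
        problems.append(f"RISCV points at a missing directory: {riscv}")
    if which("riscv64-unknown-elf-gcc") is None:
        problems.append("riscv64-unknown-elf-gcc is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in toolchain_problems():
        print(problem)
```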

Verilator

Once you have Chipyard installed and compiled, you will need some sort of Verilog simulation tool. The most typical open source solution is Verilator.

$ sudo apt-get install verilator

This will allow you to simulate the CPU designs generated by the toolchain.

Chipyard also supports VCS simulation, but that is a proprietary tool and this guide is avoiding proprietary tools wherever possible.

Java

Because the RISC-V cores are defined in Chisel, a Java Runtime Environment is necessary. Chisel (Constructing Hardware In a Scala Embedded Language) is, essentially, a Scala library that you import in the same way you would import a package in a language like Python. You need it to generate the synthesizable Verilog from the Chisel code.

This requirement can be satisfied by running:

$ sudo apt-get install default-jre

Chipyard should handle importing the necessary Scala and Chisel tools on first run of the simulator below.

Testing the Basics

Chipyard basically consists of these components:

  • A hardware construction toolchain meant to generate synthesizable Verilog from Chisel, a “hardware construction language” (HCL) defined as a Scala library.
  • Base Chisel source for RISC-V cores, especially the Rocket core and Berkeley Out-of-Order Machine (BOOM) core.
  • A cross-compiler toolchain (riscv-gnu-toolchain) that will allow you to compile source for the RISC-V architecture.
  • Simulation tools for the cores.

We’re largely following along with the Chipyard documentation here to verify functionality. Once we know everything works correctly we can begin exploring more interesting problems.

First, navigate to chipyard/sims/verilator/ and run make from that directory. Assuming everything worked in your toolchain setup above, Chipyard will generate synthesizable Verilog for a basic Rocket core from the Chisel source. When that is done, you’ll have a binary named simulator-example-RocketConfig in chipyard/sims/verilator/ (which would be the working directory). From here, we can run make run-asm-tests to test the basic functionality of the core.

At the very end of all of this, you should see some output that looks similar to the following:

  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-lwu.out       Completed after 64605 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sd.out        Completed after 84357 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-slliw.out     Completed after 47145 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sllw.out      Completed after 65052 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sltiu.out     Completed after 47052 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sltu.out      Completed after 52589 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sraiw.out     Completed after 52203 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sraw.out      Completed after 65579 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-srliw.out     Completed after 47203 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-srlw.out      Completed after 65396 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-subw.out      Completed after 52501 cycles
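
With this many lines scrolling by, it can be handy to summarize the results rather than read them. The sketch below parses lines in the format shown above; I only know what a PASSED line looks like, so treating other statuses as having the same layout is an assumption, and the names are my own.

```python
import re

RESULT_RE = re.compile(r"\[ (\w+) \]\s+(\S+)\s+Completed after (\d+) cycles")


def parse_results(text):
    """Return (test name, status, cycles) for each result line in `text`."""
    results = []
    for status, path, cycles in RESULT_RE.findall(text):
        name = path.rsplit("/", 1)[-1]
        if name.endswith(".out"):
            name = name[: -len(".out")]
        results.append((name, status, int(cycles)))
    return results


sample = """\
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-lwu.out       Completed after 64605 cycles
  [ PASSED ] /home/brad/chipyard/sims/verilator/output/example.TestHarness.RocketConfig/rv64ui-v-sd.out        Completed after 84357 cycles
"""

results = parse_results(sample)
not_passed = [name for name, status, _ in results if status != "PASSED"]
print(len(results), "results,", len(not_passed), "not passed")
```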

And now we have Chipyard up and running!

Conclusion

Getting into RISC-V development is not easy. There’s a lot of toolchain balkanization, and the tools are rapidly changing as we go. Getting fully into RISC-V development requires some knowledge of:

  • Basic logic design.
  • Computer architecture.
  • The RISC-V assembly language (which can be explored in the excellent primer, The RISC-V Reader by David Patterson and Andrew Waterman).
  • C language programming, especially as related to the Linux kernel.
  • The Chisel Hardware Construction Language (HCL). A good tutorial on this is Digital Design with Chisel by Martin Schoeberl. In addition to this, a good understanding of Scala is advised since Chisel is, more or less, a Scala library / extension.
  • The Verilog Hardware Description Language (HDL), which is what Chisel ultimately generates.

Hopefully this has helped you instantiate at least a good starter environment to experiment within.

Generating Random Tests with Python and Pytest

One of the things I’m trying to incorporate into my work on cfltools is highly integrated unit testing. Since cfltools is a forensic utility, it’s critical that I am able to show that the program creates predictable, consistent outputs when fed a variety of data. This ensures that the products of the tool have legally defensible, evidentiary value.

To get this done, we’re going to use pytest, a very standard library for integrating unit tests into Python. One primary thing my program is intended to do is process IP logs. These logs are going to vary from source to source, but what I’m interested in right now are two critical pieces of data: an IP address and the time at which that IP address was recorded. I can extend the general concept later, but for now this is my focus.

So, to test the particular functions of the program I’m developing, I want to generate large logs of random data on the fly. I’ve divided cfltools up into modules (you can see the general structure of the program at its GitHub repository). The module I’m working on just parses log files, and so it’s pretty uncreatively called logparse (which I import as cfltools.logparse).

pytest will look through files and folders for test files and functions whose names begin with test_. So, I create a file in cfltools/logparse called test_logparse.py and put most of my unit tests in there.

Creating the Fixtures

Fixtures in pytest are essentially reusable objects that we can call over and over again in different tests. Let’s create a couple of fixtures for IP addresses and a timestamp.

from random import randint

import pytest


@pytest.fixture
def ipv4address():
    """
    Generates a random ipv4 address.
    """
    class IPv4AddrFactory():
        """Generates random ipv4 addresses."""
        def get(self):
            """Return an ipv4 address."""
            ipaddr = str(randint(1, 255)) + '.' + \
                     str(randint(1, 255)) + '.' + \
                     str(randint(1, 255)) + '.' + \
                     str(randint(1, 255))
            return ipaddr
    return IPv4AddrFactory()


@pytest.fixture
def randomdatetime():
    """
    Generates a random date. Returns a POSIX time between
    20100101 and 20110101, as a string.
    """
    class RandomDateTimeFactory():
        """Returns a randomly generated date and time."""
        def get(self):
            """Return a date/time"""
            # POSIX date for 20100101 1300Z: 1262350800
            # POSIX date for 20110101 1300Z: 1293886800
            # Use these dates to bracket dummy dates.
            date_posix = randint(1262350800, 1293886800)
            return str(date_posix)
    return RandomDateTimeFactory()

I could just return an actual IP address from the function, but doing it this way means that when I pull the fixture into a unit test, I can generate a completely new random value every time I call get(). For example, I could, naively, do the following.

import random

import pytest


@pytest.fixture
def randomnumber():
    return random.randint(1,100)

def functionundertest(number):
    return number*10

def test_random(randomnumber):
    assert functionundertest(randomnumber) == randomnumber*10

This is fine if I just want to run the test once. But suppose I enclose the test in a for loop:

def test_random(randomnumber):
    for _i in range(100):
        assert functionundertest(randomnumber) == randomnumber*10

What I’m doing above is not testing our functionundertest against 100 random numbers. I’m just running the same random number through the test 100 times, which isn’t very informative. What I do instead is instantiate the fixture as an object.

@pytest.fixture
def randomnumber():
    class RandomNumber():
        def get(self):
            return random.randint(1,100)
    return RandomNumber()

Now what I get instead is a random number generator, not a single random number. So I can modify my loop from before.

def test_random(randomnumber):
    for _i in range(100):
        testnumber = randomnumber.get()
        assert functionundertest(testnumber) == testnumber*10

And now I’m generating a bunch of random test cases constrained by how I define my generator. This is much more informative. I applied the same logic to the fixtures above to generate unique, random “dummy logs” for my program tests. You ideally want your tests to be really exhaustive, but for now this suffices to find random problems and check scalability (since I can use this to make log files arbitrarily large).

Finally, I create a fixture that puts these fixtures together and generates a log. The randomNumOccurances object is another fixture that generates a random number of occurrences that an IP was detected, which is a feature of some datasets this program is meant to work with.

@pytest.fixture
def iplogline(ipv4address, randomdatetime, randomNumOccurances):
    """
    Generate a random IP logfile line.
    """
    class IPLogLineFactory():
        """Gives us one line of a logfile."""
        def get(self):
            """Returns a list object that is one line of a dummy logfile."""
            return [ipv4address.get(),
                    randomdatetime.get(),
                    randomNumOccurances.get()]
    return IPLogLineFactory()

So that gives me one line of a log file. I build up like this from base components because you never know when you’re going to need just one piece of a fixture to test some small function.
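
The randomNumOccurances fixture itself isn't shown in this post. As a rough idea of its shape, here is a hypothetical factory in the same style as the ones above. The class name and the 1 to 1000 bounds are illustrative guesses, not what cfltools actually uses. It returns a string because the log file fixture below joins the fields with string concatenation.

```python
from random import randint


class RandomNumOccurancesFactory:
    """Hypothetical stand-in; the real fixture body is not shown in this post."""

    def __init__(self, low=1, high=1000):
        # Illustrative bounds only, not values taken from cfltools.
        self.low = low
        self.high = high

    def get(self):
        """Return a random occurrence count as a string, ready for CSV output."""
        return str(randint(self.low, self.high))


factory = RandomNumOccurancesFactory()
count = factory.get()
print(count)
```

To use it the same way as the other fixtures, a function named randomNumOccurances decorated with @pytest.fixture would simply return RandomNumOccurancesFactory().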

A Dummy File

Now I want to put all of this together in a large dummy log file. Most of the files I work with are *.csv, but I don’t want to have a lot of test files floating around. I want my test harness to generate them on the fly. pytest can do this.

@pytest.fixture
def logfile(tmpdir, iplogline):
    """Generates a dummy CSV file for testing."""
    testfile = tmpdir.join("logfile.csv")
    with open(testfile, 'w') as file:
        for _i in range(1, 100):
            line = iplogline.get()
            file.write(line[0] + ',' + line[1] + ',' + line[2] + '\n')
    yield testfile

This will create a temporary file in the directory pytest provides and yield the path to it. Now I have a completely random CSV log file, generated on the fly, that I can use for testing. Let’s make a basic test. One thing cfltools does is remember log files it has already seen by taking an MD5 checksum of the file and storing it in a database. Let’s do that.

def test_open_file_and_checksum(logfile):
    """
    Verifies that a file can be opened and checksummed
    by LogParser() and LogFile().
    """
    from hashlib import md5
    parser = LogParser(logfile)
    with open(logfile) as file:
        data = file.read()
    test_md5 = md5(data.encode('utf-8')).hexdigest()
    assert parser.logfile.md5() == test_md5
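
For reference, the checksum side of this can be quite small. The sketch below is not the cfltools implementation, just one plausible way a method like md5() could work: it reads the file in binary chunks so that large logs don't need to fit in memory. The file_md5 name and the chunk size are my own choices.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logfile.csv")
    # newline="" keeps "\n" from being translated on any platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write("10.0.0.1,1262350800,3\n")
    checksum = file_md5(path)

print(checksum == hashlib.md5(b"10.0.0.1,1262350800,3\n").hexdigest())  # prints: True
```

Hashing the raw bytes rather than decoded text means the checksum does not depend on the platform's newline handling or default encoding.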

And now we have a basic test harness we can bolt things on to.