Is Upstream Ready for RISC-V?

August 2022 · 12 minute read

When RISC-V was first introduced to the Nervos ecosystem, a patched toolchain had to be used, since upstream gcc/clang did not provide enough support for RISC-V at the time. However, one argument for using RISC-V in a blockchain was exactly to embrace a wide open source community, where Nervos can both enjoy the contributions of the whole community and contribute back the code built while working on the Nervos ecosystem.

Fast forward a few years, and the question remains: as of August 2022, are upstream gcc, clang, etc. ready to support RISC-V development? It’s time to find out.

For simplicity, all the code within this post is tested on an x86_64 machine running Ubuntu 22.04 (Jammy Jellyfish).

To make this post as broadly applicable as possible, RISC-V programs for both Linux environments and CKB environments will be covered. To verify both environments, 2 different runners will be needed:

$ sudo apt-get install qemu-user-static
$ # install Rust following notes at https://rustup.rs/
$ cargo install --example ckb-vm-runner --git https://github.com/nervosnetwork/ckb-vm --features asm ckb-vm
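
Assuming both installations succeed, a quick check confirms the two runners landed on the PATH (exact version output will vary with your setup):

$ qemu-riscv64-static --version
$ which ckb-vm-runner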

GCC

The latest Ubuntu already provides a gcc package for cross-compiling RISC-V programs:

$ sudo apt-get install gcc-12-riscv64-linux-gnu

In terms of toolchains, RISC-V can still be considered a relatively new architecture. Using newer compiler versions helps avoid bugs while also enjoying new optimizations; we shall see such a use case later.
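
As a quick sanity check, the exact compiler version can be confirmed (the point release will depend on how up to date your Ubuntu packages are):

$ riscv64-linux-gnu-gcc-12 --version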

Now we can try building a simple program for the RISC-V architecture:

$ cat << EOF > fib.c
#include <stdio.h>
#include <stdlib.h>

int fib(int n) {
  if (n <= 1) return n;
  return fib(n - 1) + fib(n - 2);
}

int main(int argc, char* argv[]) {
  if (argc < 2) {
    return -1;
  }

  int result = fib(atoi(argv[1]));
  printf("Result: %d\n", result);
  return result + 3;
}
EOF
$ riscv64-linux-gnu-gcc-12 -O3 fib.c -o fib_gcc
$ qemu-riscv64-static fib_gcc 10
qemu-riscv64-static: Could not open '/lib/ld-linux-riscv64-lp64d.so.1': No such file or directory

Oops, compiling works, but we cannot find the dynamic linking loader required by glibc. Note this file is in fact already on our system:

$ file /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1
/usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1: ELF 64-bit LSB shared object, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, BuildID[sha1]=c779d6eb2d438e3178484f37976cbc8794232379, stripped
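
To double-check which interpreter path a binary requests, readelf (shipped with binutils) can print the program headers; the relevant line should look roughly like the following:

$ readelf -l fib_gcc | grep interpreter
      [Requesting program interpreter: /lib/ld-linux-riscv64-lp64d.so.1]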

Because the host machine uses the x86_64 architecture, the file is actually installed under the prefix /usr/riscv64-linux-gnu. Luckily, we can patch the ELF to fix this:

$ sudo apt-get install patchelf
$ patchelf --set-interpreter /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1 fib_gcc
$ qemu-riscv64-static fib_gcc 10
fib_gcc: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

Running the program again yields a different error, this time generated by the dynamic linking loader, which needs our help to find the correct library path:

$ LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib qemu-riscv64-static fib_gcc 10
Result: 55
$ echo $?
58

The program now runs successfully: the fib result is printed to the console, and the return code of 58 matches return result + 3; in the source (55 + 3).

One convenient trick here: when a binary of a different architecture is executed directly, qemu will be launched automatically:

$ LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib ./fib_gcc 10
Result: 55
$ echo $?
58
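
This trick works via the kernel’s binfmt_misc mechanism: the qemu-user-static package registers qemu as the handler for RISC-V ELF binaries. Assuming binfmt_misc is mounted in the usual location, the registration can be inspected:

$ cat /proc/sys/fs/binfmt_misc/qemu-riscv64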

Another way of solving the dynamic linking problem is to generate a statically linked program instead:

$ riscv64-linux-gnu-gcc-12 -static -O3 fib.c -o fib_gcc_static
$ qemu-riscv64-static fib_gcc_static 10
Result: 55
$ echo $?
58

This way there is no need to patch the ELF, nor to provide LD_LIBRARY_PATH. Which is the better option really depends on the use case.
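
One practical tradeoff worth keeping in mind is binary size: the static binary bundles libc and will be considerably larger (exact sizes depend on the gcc and glibc versions used):

$ ls -l fib_gcc fib_gcc_static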

LLVM

First, follow the steps here to install LLVM 14, which is the latest stable version at the time of writing this post. Or, if you are the impatient type, follow these simple steps:

$ wget https://apt.llvm.org/llvm.sh
$ chmod +x llvm.sh
$ sudo ./llvm.sh 14 all

Now we can build the same program above using clang:

$ /usr/lib/llvm-14/bin/clang -target riscv64-linux-gnu -O3 fib.c -o fib_clang
$ qemu-riscv64-static fib_clang 10
qemu-riscv64-static: Could not open '/lib/ld-linux-riscv64-lp64d.so.1': No such file or directory
$ patchelf --set-interpreter /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1 fib_clang
$ LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib qemu-riscv64-static fib_clang 10
Result: 55
$ echo $?
58

Similarly, static linking also works:

$ /usr/lib/llvm-14/bin/clang -static -target riscv64-linux-gnu -O3 fib.c -o fib_clang_static
$ qemu-riscv64-static fib_clang_static 10
Result: 55
$ echo $?
58

For the moment, there are still some differences between the code generated by gcc and the code generated by clang. We shall see an example of these differences later.

Rust

Developers have been using Rust to build RISC-V programs on CKB for quite some time. Even though capsule is powered by docker, the underlying docker image actually uses stock Rust already. Here is the exact same fibonacci example from above, written in Rust:

$ cargo new rust-riscv-demo
$ cd rust-riscv-demo
$ mkdir .cargo
$ cat << EOF > .cargo/config
[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc-12"
rustflags = ["-C", "link-args=-L/usr/lib/gcc-cross/riscv64-linux-gnu/12"]
EOF
$ cat << EOF > rust-toolchain
nightly-2022-08-15
EOF
$ cat << EOF > src/main.rs
fn fib(n: i32) -> i32 {
    if n <= 1 {
        return n;
    }
    fib(n - 1) + fib(n - 2)
}

fn main() {
    let n = std::env::args().nth(1).expect("input args");
    let n = i32::from_str_radix(&n, 10).expect("parsing");
    let result = fib(n);
    println!("Result: {}", result);
    std::process::exit(result + 3);
}
EOF
$ rustup target add riscv64gc-unknown-linux-gnu
$ cargo build --release --target riscv64gc-unknown-linux-gnu
$ patchelf --set-interpreter /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1 target/riscv64gc-unknown-linux-gnu/release/rust-riscv-demo
$ LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib qemu-riscv64-static target/riscv64gc-unknown-linux-gnu/release/rust-riscv-demo 10
Result: 55
$ echo $?
58
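
As with the C binaries, the file utility can confirm that a genuine RISC-V executable was produced (exact output depends on the file version installed):

$ file target/riscv64gc-unknown-linux-gnu/release/rust-riscv-demo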

Gotchas

Like pretty much everything you have read, the above steps have been simplified over certain details. Here are some gotchas one needs to pay attention to.

If you are mainly interested in RISC-V running in the Linux environment, and not so much in CKB, feel free to jump ahead to the linker relaxation section.

CKB flavored libc

If you try to run the above binaries directly under CKB’s environment, errors will be triggered:

$ ckb-vm-runner fib_gcc_static fib_gcc_static 10
asm exit=Err(MemOutOfBound) cycles=135 r[a1]=0
Error: MemOutOfBound

While CKB strictly implements the standard RISC-V specification, CKB actually has a different runtime environment from Linux. One can think of CKB as a different operating system that also runs on a RISC-V CPU. You might have noticed that the phrase linux-gnu keeps appearing in the steps above where we built RISC-V programs for qemu. This configuration makes the compiled RISC-V program target the Linux environment. CKB, however, requires a different configuration.

Specifically, there are 3 things that are different from a Linux environment:

* CKB has no floating point support, so scripts are built with a soft-float ABI (hence the -D__riscv_soft_float -D__riscv_float_abi_soft defines below);
* CKB has no MMU (hence -DCKB_NO_MMU below), so there is no virtual memory to manage;
* CKB provides its own set of syscalls (such as ckb_debug), which are entirely different from Linux syscalls.

As a result of these differences, libc has to be patched to work with CKB’s runtime environment. Previously, the patched libc was simply packed in when a full custom toolchain was shipped. Now the question is slightly different: with an upstream compiler, how can we switch to a libc that works with CKB?

A viable path here is to build the patched libc using upstream gcc, then configure upstream gcc to link programs against the patched libc, so as to run them on CKB. There is still a non-upstream part involved, but it is a start, and one no longer has to build a custom version of gcc, which can take a huge amount of time.

The first thing to do here is to build the patched libc:

$ sudo apt-get install gcc-riscv64-unknown-elf
$ git clone https://github.com/nervosnetwork/riscv-newlib
$ cd riscv-newlib
$ mkdir dist
$ ./configure --host=x86_64-pc-linux-gnu --target=riscv64-unknown-elf --prefix `pwd`/dist \
  --enable-newlib-io-long-long --enable-newlib-io-c99-formats \
  CFLAGS_FOR_TARGET="-g -O2 -D_POSIX_MODE -DCKB_NO_MMU -D__riscv_soft_float -D__riscv_float_abi_soft" \
  CXXFLAGS_FOR_TARGET="-g -O2 -D_POSIX_MODE -DCKB_NO_MMU -D__riscv_soft_float -D__riscv_float_abi_soft"
$ make
$ make install
$ cd ..
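
If the build succeeds, the rv64imac/lp64 multilib directory should now contain the artifacts referenced by the link commands below, such as crt0.o and libc.a (the exact file list may vary with the newlib revision):

$ ls riscv-newlib/dist/riscv64-unknown-elf/lib/rv64imac/lp64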

Notice that the gcc-riscv64-unknown-elf package is used here: the gcc to use is riscv64-unknown-elf-gcc instead of riscv64-linux-gnu-gcc-12. One way to think about this is that running on CKB resembles running on a bare-metal embedded chip more than running in a Linux-powered environment where GNU libraries are available.
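
The difference between the two toolchains shows up directly in their target triples; -dumpmachine is a standard gcc flag that prints the configured target:

$ riscv64-linux-gnu-gcc-12 -dumpmachine
riscv64-linux-gnu
$ riscv64-unknown-elf-gcc -dumpmachine
riscv64-unknown-elf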

Now it is time to build a simple C program against the patched libc, so it can run on CKB:

$ cat << EOF > fib_ckb.c
#include <stdio.h>
#include <stdlib.h>
#include "ckb_syscalls.h"

int fib(int n) {
  if (n <= 1) return n;
  return fib(n - 1) + fib(n - 2);
}

int main(int argc, char* argv[]) {
  if (argc < 2) {
    return -1;
  }

  int result = fib(atoi(argv[1]));
  char* buf = (char*) malloc(1024);
  snprintf(buf, 1024, "Result: %d", result);
  ckb_debug(buf);
  free(buf);
  return result + 3;
}
EOF
$ curl -LO https://raw.githubusercontent.com/nervosnetwork/ckb-c-stdlib/20578dfb092b3b3761df755395e20ec142a83d6e/ckb_consts.h
$ curl -LO https://raw.githubusercontent.com/nervosnetwork/ckb-c-stdlib/20578dfb092b3b3761df755395e20ec142a83d6e/ckb_syscall_apis.h
$ curl -LO https://raw.githubusercontent.com/nervosnetwork/ckb-c-stdlib/20578dfb092b3b3761df755395e20ec142a83d6e/ckb_syscalls.h
$ riscv64-unknown-elf-gcc -march=rv64imac -mabi=lp64 -nostdlib -static \
  riscv-newlib/dist/riscv64-unknown-elf/lib/rv64imac/lp64/crt0.o -O3 fib_ckb.c -o fib_ckb \
  -I riscv-newlib/dist/riscv64-unknown-elf/include \
  -L riscv-newlib/dist/riscv64-unknown-elf/lib/rv64imac/lp64 \
  -lc -lm -lgcc -lgloss
$ ckb-vm-runner fib_ckb fib_ckb 10
"Result: 55"
asm exit=Ok(58) cycles=6302 r[a1]=0

While this simple program should not require a call to malloc, it is designed this way to test that the compiler is indeed set up for CKB. In case something goes wrong, this program generates a different output:

$ # now the program is compiled using upstream gcc & its stock libc,
$ # which will require a proper sbrk/brk syscall (ecall number 214 on
$ # riscv64 Linux) that is not supported on CKB.
$ riscv64-unknown-elf-gcc -O3 fib_ckb.c -o fib_ckb_error
$ ckb-vm-runner fib_ckb_error fib_ckb_error 10
asm exit=Err(InvalidEcall(214)) cycles=4319 r[a1]=1072
Error: InvalidEcall(214)

A similar method can also be used to compile the program using clang against the patched libc:

$ /usr/lib/llvm-14/bin/clang -target riscv64-unknown-elf \
  -march=rv64imac -mabi=lp64 -nostdlib -static \
  riscv-newlib/dist/riscv64-unknown-elf/lib/rv64imac/lp64/crt0.o -O3 fib_ckb.c -o fib_ckb_clang \
  -I riscv-newlib/dist/riscv64-unknown-elf/include \
  -L riscv-newlib/dist/riscv64-unknown-elf/lib/rv64imac/lp64 \
  -L /usr/lib/gcc/riscv64-unknown-elf/10.2.0/rv64imac/lp64 \
  -lc -lm -lgcc -lgloss
$ ckb-vm-runner fib_ckb_clang fib_ckb_clang 10
"Result: 55"
asm exit=Ok(58) cycles=6924 r[a1]=0

CKB flavored Rust skeleton

Rust, on the other hand, allows one to tweak allocators directly. A demo has been prepared here for you to try out:

$ sudo apt-get install gcc-riscv64-unknown-elf
$ git clone https://github.com/xxuejie/rust-riscv-ckb-demo
$ cd rust-riscv-ckb-demo
$ rustup target add riscv64imac-unknown-none-elf
$ export PATH=$PATH:`pwd`/bin
$ cargo build --release --target riscv64imac-unknown-none-elf
$ ckb-vm-runner target/riscv64imac-unknown-none-elf/release/rust-riscv-ckb-demo rust-riscv-ckb-demo 10
"Result: 55"
asm exit=Ok(58) cycles=10507 r[a1]=631200

Linker Relaxation

3 binaries have been generated from the earlier steps:

* fib_gcc, built by gcc;
* fib_clang, built by clang (LLVM 14);
* rust-riscv-demo, built by Rust.

Let’s disassemble the 3 binaries here:

$ riscv64-linux-gnu-objdump -d fib_gcc > fib_gcc_dump.txt
$ riscv64-linux-gnu-objdump -d fib_clang > fib_clang_dump.txt
$ riscv64-linux-gnu-objdump -d rust-riscv-demo/target/riscv64gc-unknown-linux-gnu/release/rust-riscv-demo > rust-riscv-demo_dump.txt

Using fib as the keyword to do a little searching in each of the dump files, you might find segments similar to the following:

In fib_gcc_dump.txt, lines similar to the following can be found:

658:   8522                    mv      a0,s0
65a:   11a000ef                jal     ra,774 <fib>
65e:   0125093b                addw    s2,a0,s2
662:   3479                    addiw   s0,s0,-2
664:   854a                    mv      a0,s2
666:   ff3419e3                bne     s0,s3,658 <main+0x58>

In fib_clang_dump.txt, lines similar to the following can be found:

71a:   2501                    sext.w  a0,a0
71c:   00000097                auipc   ra,0x0
720:   f9c080e7                jalr    -100(ra) # 6b8 <fib>
724:   842a                    mv      s0,a0
726:   00000517                auipc   a0,0x0
72a:   02650513                addi    a0,a0,38 # 74c <_IO_stdin_used+0x4>

In rust-riscv-demo_dump.txt, code similar to the following can be found:

6ed0:       9101                    srli    a0,a0,0x20
6ed2:       00000097                auipc   ra,0x0
6ed6:       f06080e7                jalr    -250(ra) # 6dd8 <_ZN15rust_riscv_demo3fib17hbf741befdf5e94f2E>
6eda:       00038597                auipc   a1,0x38
6ede:       4065b583                ld      a1,1030(a1) # 3f2e0 <_GLOBAL_OFFSET_TABLE_+0xc0>
6ee2:       c22a                    sw      a0,4(sp)
6ee4:       0048                    addi    a0,sp,4

All of this assembly code involves a call to the fib function. The difference: the code generated by gcc requires only a single jal instruction, while the code generated by clang and Rust involves 2 instructions: an auipc filling in the upper part of ra, and a jalr providing the remaining offset. (In the clang dump, for instance, auipc ra,0x0 at address 0x71c sets ra to 0x71c, and jalr -100(ra) then jumps to 0x71c - 100 = 0x6b8, the address of fib.)

Diving a little into RISC-V assembly, we find that jal is capable of expressing a jump range from -1MB to +1MB, which is more than enough to express the calls above in a single instruction, as the gcc output demonstrates. So why do clang and Rust bother with the more complicated two-instruction sequence?

It turns out there is an optimization named linker relaxation: the compiler conservatively emits the longer auipc/jalr sequence with relocations attached, and the linker later relaxes it into a single jal when the final addresses fit. This grants RISC-V assembly code enough flexibility while also enabling performance optimization. GCC has implemented this optimization for quite some time, but it is only fairly recently that this optimization was merged into LLVM.
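
To observe this from the gcc side, linker relaxation can be switched off with gcc’s -mno-relax flag; searching the resulting dump for fib should then reveal an auipc/jalr pair much like the clang output above (a quick experiment using the same fib.c from earlier):

$ riscv64-linux-gnu-gcc-12 -O3 -mno-relax fib.c -o fib_gcc_norelax
$ riscv64-linux-gnu-objdump -d fib_gcc_norelax > fib_gcc_norelax_dump.txt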

The linker relaxation support has not been included in LLVM 14, the latest stable version. However, it can be verified that LLVM 15 will include this change:

$ wget https://apt.llvm.org/llvm.sh
$ chmod +x llvm.sh
$ sudo ./llvm.sh 15 all
$ /usr/lib/llvm-15/bin/clang -target riscv64-linux-gnu -O3 fib.c -o fib_clang15
$ riscv64-linux-gnu-objdump -d fib_clang15 > fib_clang15_dump.txt

Digging into fib_clang15_dump.txt, lines similar to the following can be found:

70c:   ed5ff0ef                jal     ra,5e0 <strtol@plt>
710:   2501                    sext.w  a0,a0
712:   fa7ff0ef                jal     ra,6b8 <fib>
716:   842a                    mv      s0,a0
718:   00000517                auipc   a0,0x0
71c:   02050513                addi    a0,a0,32 # 738 <_IO_stdin_used+0x4>

This shows that linker relaxation has been applied to the code.

While the nightly version of Rust used here does contain the LLVM commit for linker relaxation, it is not enabled by default; a little tweaking is required:

$ cd rust-riscv-demo
$ cat << EOF > .cargo/config
[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc-12"
rustflags = ["-C", "link-args=-L/usr/lib/gcc-cross/riscv64-linux-gnu/12",
             "-C", "target-feature=+relax" ]
EOF
$ cargo build --release --target riscv64gc-unknown-linux-gnu
$ cd ..
$ riscv64-linux-gnu-objdump -d rust-riscv-demo/target/riscv64gc-unknown-linux-gnu/release/rust-riscv-demo > rust-riscv-demo_relaxed_dump.txt

Digging into rust-riscv-demo_relaxed_dump.txt, it’s obvious that the optimization has indeed been applied:

6e94:       9101                    srli    a0,a0,0x20
6e96:       f27ff0ef                jal     ra,6dbc <_ZN15rust_riscv_demo3fib17hbf741befdf5e94f2E>
6e9a:       00038597                auipc   a1,0x38
6e9e:       4465b583                ld      a1,1094(a1) # 3f2e0 <_GLOBAL_OFFSET_TABLE_+0xc0>
6ea2:       c22a                    sw      a0,4(sp)
6ea4:       0048                    addi    a0,sp,4

Note that linker relaxation is not without controversy; see this post to learn more about it.

Conclusion

As we have seen from the above discussion, there are indeed still rough edges, but building and running a RISC-V program using upstream compilers is much smoother than it was years ago. Personally, I believe now is the time to start working on RISC-V programs using upstream toolchains. The custom patched toolchain has fulfilled its purpose beautifully, and shall now be retired for good.