Exposing GPUs to RISC‑V Hosts: Kernel, Driver, and Userland Considerations

2026-02-18

A developer’s guide to exposing NVIDIA GPUs to RISC‑V hosts over NVLink Fusion: kernel modules, device mapping, PCI passthrough, and CUDA userland steps.

If you’re responsible for building AI inference or HPC platforms on new RISC‑V silicon, you’ve likely hit the same blocker: getting NVIDIA GPUs usable and performant from a RISC‑V host over NVLink Fusion. Hardware announcements in late 2025 made the possibility real, but the engineering path—from kernel modules to userland CUDA—remains complex. This walkthrough gives you a developer‑focused, practical blueprint to expose NVIDIA GPUs to RISC‑V hosts, covering kernel patches, driver modules, device mapping, and userland runtimes so you can validate, deploy, and automate reliably.

As of late 2025 and into 2026, major vendors signaled production plans to combine RISC‑V SoCs with NVIDIA’s NVLink Fusion interconnects (see vendor previews such as SiFive’s announcement). That momentum changes the stack requirements for cloud and edge builders: you need kernel support for the NVLink controllers and the GPU device, a reliable set of kernel modules and IOMMU/VFIO plumbing, and userland libraries (CUDA, NVML, container tooling) that work on RISC‑V distributions. The goal here is to make that operational roadmap practical and repeatable.

Key takeaway: Two practical paths

  • Native driver path — NVIDIA provides RISC‑V aware kernel modules and the CUDA runtime (preview/stable depending on vendor timelines). This delivers best performance and full NVLink Fusion features.
  • Virtualized/passthrough path — Use VFIO/PCI passthrough or mediated‑device approaches to bind GPUs to VMs/containers if native drivers aren’t production-ready on your distro. Slightly lower performance, but faster to validate and easier to integrate with existing cloud tooling.

High‑level stack: what components you’ll touch

From host to application, the pieces you must understand and control:

  • Hardware & firmware: NVLink Fusion controller + GPU, board firmware, RISC‑V SoC FW/BIOS.
  • Device Tree / ACPI: Platform description exposing NVLink controllers and IOMMU to the kernel.
  • Linux kernel: PCI/NVLink probe, IOMMU/VFIO, vendor kernel modules (nvidia, nvidia_drm, nvidia_uvm or vendor equivalents).
  • Driver stack: Kernel modules and userland libraries (CUDA runtime, NVML, libnvidia‑ml, container hooks).
  • Userland tooling: nvidia-smi (or vendor utility), CUDA SDK, container runtime + NVIDIA Container Toolkit, device plugin for orchestration.

Prerequisites (hardware & software)

  • RISC‑V host with NVLink Fusion wiring to NVIDIA GPU cards. Verify electrical and firmware support from your board vendor.
  • Linux kernel baseline: 6.x or newer is recommended in 2026 for better IOMMU and VFIO support, though vendor patches may still be required.
  • Working toolchain on RISC‑V: gcc/clang, kernel headers, make, and cross‑build tools if building offboard.
  • Access to vendor NVIDIA driver packages for RISC‑V (preview packages available from vendors as of late 2025) or the ability to build kernel modules from source.
  • Container/runtime tooling: Docker/CRI‑O, NVIDIA Container Toolkit adapted for RISC‑V (or vendor‑provided container hooks).

Step‑by‑step: preparing the kernel and device description

Before loading drivers, the kernel must know how to enumerate NVLink Fusion devices and place them behind an IOMMU so VFIO passthrough and DMA mappings are safe.

1) Kernel config and patching

Ensure your kernel config enables the following options (examples):

  • CONFIG_PCI and architecture PCI support for RISC‑V platforms.
  • CONFIG_IOMMU_SUPPORT and your platform’s IOMMU driver (e.g., riscv_iommu or vendor equivalent).
  • CONFIG_VFIO, CONFIG_VFIO_PCI and CONFIG_VFIO_IOMMU_TYPE1 for passthrough and mediated devices.
  • CONFIG_DRM if you plan on kernel DRM paths or integrated display stacks.
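
A quick way to check or flip these options before building is the kernel’s scripts/config helper; option names can differ in vendor trees, so treat this as a sketch:

# verify the relevant options in your .config
grep -E "CONFIG_(PCI|IOMMU_SUPPORT|VFIO|VFIO_PCI|VFIO_IOMMU_TYPE1|DRM)=" .config

# or enable them with scripts/config, then resolve dependencies
scripts/config --enable PCI --enable IOMMU_SUPPORT \
  --enable VFIO --enable VFIO_PCI --enable VFIO_IOMMU_TYPE1 --enable DRM
make olddefconfig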

Many NVLink Fusion platforms require vendor kernel patches; work with your board vendor to obtain upstreamed patches or a vendor tree. Apply the patches, then build and install the kernel and modules:

make -j$(nproc)        # build the kernel image and modules
make modules_install   # install modules under /lib/modules/<kernel-version>

2) Device Tree (RISC‑V) or ACPI bindings

On RISC‑V Linux, the Device Tree describes PCI hosts, NVLink controllers, and DMA properties. A minimal device tree node for an NVLink controller might look like:

/ {
  pci@80000000 {
    compatible = "vendor,nvlink-fusion-pcie-host";
    device_type = "pci";
    reg = <0x80000000 0x10000000>;
    #address-cells = <3>;
    #size-cells = <2>;
    ranges = <...>;
    interrupt-parent = &plic;
    interrupts = <...>;
    iommu-map = <0x0 &iommu 0x0 0x10000>;
  };
};

Key fields: compatible (so the kernel picks the NVLink controller driver), reg (PCI host window), and iommu-map (ensures DMA mappings are routed). Work with your vendor to get the correct compatible string and properties.

Loading and managing kernel modules

At runtime, the kernel will load a mix of upstream and vendor modules. Expect these module names or equivalents:

  • nvidia — main kernel driver (handles device probe, BAR mapping).
  • nvidia_drm — DRM integration if needed.
  • nvidia_uvm — Unified Virtual Memory (UVM) helper required by CUDA.
  • vfio_pci — for manual PCI device binding and passthrough.

Typical load sequence:

modprobe nvidia
modprobe nvidia_drm
modprobe nvidia_uvm

If you need to reclaim a device from the kernel for VFIO passthrough, do:

# identify device
lspci -nn | grep -i nvidia

# unbind from driver (example PCI ID 0000:01:00.0)
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind

# bind to vfio-pci
modprobe vfio-pci
echo 10de 2206 > /sys/bus/pci/drivers/vfio-pci/new_id

Replace vendor/device IDs as appropriate. On RISC‑V boards, the PCI address format follows standard Linux 0000:bb:dd.f conventions.
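
To confirm the rebind took effect (still using the example address 0000:01:00.0), check which driver now owns the device and note its IOMMU group; that group number is what you hand to a VM manager or container later:

readlink /sys/bus/pci/devices/0000:01:00.0/driver        # should now point at .../vfio-pci
readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group   # e.g. .../iommu_groups/42
ls /dev/vfio/                                            # the group number should appear here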

Device mapping, BARs and device nodes

Understanding how the kernel exposes the GPU to userland is critical for debugging and container tooling.

  • PCI BARs: GPU registers and memory are mapped via Base Address Registers (BARs). When the NVIDIA kernel module probes, it maps BARs into kernel address space and sets up DMA mappings through the IOMMU.
  • /dev nodes: Traditional userland utilities expect character devices such as /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm. Confirm these nodes exist after module load. Use ls -l /dev/nvidia*.
  • VFIO nodes: If using VFIO passthrough, you’ll use /dev/vfio/vfio and per-group FDs in /dev/vfio/. Bind the PCI device to VFIO and open the group FD from userland or a VM manager.
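
To see those BAR windows and confirm the device nodes listed above, a couple of quick checks (using the example PCI address from earlier) are usually enough:

lspci -v -s 0000:01:00.0 | grep -i "memory at"   # BAR windows mapped by the kernel
ls -l /dev/nvidia* 2>/dev/null                   # native driver nodes
ls -l /dev/vfio/ 2>/dev/null                     # VFIO container and group nodes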

Userland: CUDA, NVML and container tooling

Your applications use userland libraries. In 2026, NVIDIA and some vendors ship RISC‑V builds of the core tooling, though availability may still be preview‑only. Alternative approaches exist if binaries are not yet available.

1) CUDA runtime & toolkit

Install the vendor-provided CUDA runtime for RISC‑V if available. If only source is available, you’ll need to build NVIDIA userland components against your toolchain. The typical runtime components are:

  • libcuda.so (the CUDA driver API library)
  • libnvidia-ml / NVML (management API)
  • CUDA toolkit (nvcc, cuBLAS) plus libraries such as cuDNN

Verify with:

nvidia-smi
# or if vendor tool differs
vendor-gpu-util --list
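
Beyond the management utility, a minimal end‑to‑end check is to compile a tiny program with the RISC‑V CUDA toolkit and enumerate devices. This sketch assumes nvcc from the vendor toolkit is on your PATH:

cat > query.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>
int main() {
  int n = 0;
  cudaError_t err = cudaGetDeviceCount(&n);   // exercises libcuda and the kernel driver
  if (err != cudaSuccess) { std::printf("CUDA error: %s\n", cudaGetErrorString(err)); return 1; }
  std::printf("visible CUDA devices: %d\n", n);
  return 0;
}
EOF
nvcc query.cu -o query && ./query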

2) Containers and the NVIDIA Container Toolkit

For cloud platforms, containers are essential. In 2026 you should either use an updated NVIDIA Container Toolkit that supports RISC‑V or expose the GPU device nodes (or VFIO group) into containers yourself.

Example Docker run binding the device nodes directly (for when the Container Toolkit’s --gpus flag isn’t yet usable on your RISC‑V distro):

docker run --rm \
  --device /dev/nvidia0 --device /dev/nvidiactl \
  --device /dev/nvidia-uvm my-gpu-image:latest

If using VFIO, mount /dev/vfio into the container and pass the group FD to the runtime (requires privileged container or device manager).
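
A sketch of that VFIO variant, where 42 is a placeholder group number (read it from the device’s iommu_group symlink) and my-vfio-image is a placeholder image:

docker run --rm \
  --device /dev/vfio/vfio --device /dev/vfio/42 \
  --ulimit memlock=-1:-1 \
  my-vfio-image:latest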

PCI Passthrough vs Native Driver: tradeoffs

Which path to choose?

  • Native driver: Best performance, access to NVLink Fusion features (peer DMA, multi‑GPU coherency). Requires production‑ready kernel modules and userland runtimes for RISC‑V.
  • VFIO / Passthrough: Faster to validate on unsupported stacks. Works well with VMs and containers, but may lose some NVLink multihost semantics unless vendor supports virtualization features. See our edge cost notes for when passthrough is the pragmatic choice.

Validation & troubleshooting checklist

  1. Confirm PCI/NVLink enumeration: lspci -vvv and dmesg | grep -i nvidia.
  2. Check IOMMU groups: find /sys/kernel/iommu_groups -type l. NVLink devices should be in proper groups for safe passthrough.
  3. Verify /dev nodes: ls -l /dev/nvidia* or ls -l /dev/vfio/*.
  4. Run vendor diagnostic: nvidia-smi or vendor tool. If it fails, check kernel logs for probe errors.
  5. Confirm CUDA sample runs: build and run vectorAdd or deviceQuery samples from the CUDA toolkit.
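
These checks can be combined into a short smoke‑test script for bringup and CI. The nvidia-smi call and device node names below are assumptions that depend on your vendor’s driver packaging:

#!/bin/sh
set -e
lspci -nn | grep -i nvidia || { echo "no NVIDIA device enumerated"; exit 1; }
dmesg | grep -iE "nvidia|nvlink" | tail -n 20       # recent probe messages
find /sys/kernel/iommu_groups -type l | wc -l       # IOMMU groups present?
ls -l /dev/nvidia* /dev/vfio/ 2>/dev/null || true   # device nodes
nvidia-smi || echo "nvidia-smi failed: check dmesg for probe errors"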

Advanced topics and production hardening

NVIDIA’s Multi‑Instance GPU (MIG) or similar features let you carve a physical GPU into multiple logical devices. For NVLink Fusion setups across RISC‑V hosts, verify that the vendor driver exposes MIG partitions and that the NVLink topology is visible to the runtime (peer-to-peer adjacency). You’ll need the kernel driver to expose each instance as a separate device node.
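
If the RISC‑V driver build exposes MIG, the standard nvidia-smi workflow should apply. The profile IDs below are illustrative and depend on the GPU model:

nvidia-smi -i 0 -mig 1        # enable MIG mode on GPU 0 (may require a GPU reset)
nvidia-smi mig -lgip          # list the GPU instance profiles the driver offers
nvidia-smi mig -cgi 9,9 -C    # create two GPU instances plus default compute instances
ls /dev/nvidia-caps/          # MIG instances surface as capability device nodes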

NUMA, memory affinity and performance tuning

On multi‑socket RISC‑V host boards, place GPU‑bound workloads on cores with the lowest latency path to the NVLink controller. Use tools like numactl and tune hugepages and CUDA memory allocation policies for best throughput.
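
A minimal sketch of that pinning, assuming the NVLink controller hangs off NUMA node 0 and infer_server is a placeholder for your workload (check your board’s topology first):

numactl --hardware                                   # inspect node/CPU/memory layout
numactl --cpunodebind=0 --membind=0 ./infer_server   # pin the GPU-bound process to node 0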

Security: IOMMU and VFIO isolation

Always enable and validate the IOMMU. Without it, DMA can access arbitrary host memory. If you use passthrough in multi‑tenant clouds, ensure strict VFIO group boundaries and udev rules to avoid cross‑tenant device exposure.
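
A udev sketch that keeps device nodes group‑restricted; the gpu and vfio group names are assumptions, so pick groups that match your tenancy model:

# /etc/udev/rules.d/70-gpu-access.rules
KERNEL=="nvidia*", MODE="0660", GROUP="gpu"
SUBSYSTEM=="vfio", MODE="0660", GROUP="vfio"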

Case study: Building an inference node with a SiFive RISC‑V host (2026 preview)

We validated a reference flow on a SiFive RISC‑V development board (vendor preview program, late 2025) with an NVLink Fusion‑connected GPU. The high‑level steps we followed:

  1. Applied vendor kernel patches to support the NVLink controller and rebuilt kernel 6.5 with IOMMU and VFIO enabled.
  2. Updated Device Tree to expose the NVLink PCI host window and IOMMU mappings.
  3. Installed vendor RISC‑V NVIDIA kernel modules and the CUDA runtime preview package.
  4. Validated device visibility with lspci, dmesg and nvidia-smi. Ran CUDA deviceQuery and sample inference workloads.
  5. Packaged the runtime with OCI images and used a modified NVIDIA Container Toolkit to mount device nodes into containers for multi‑tenant inference.

Result: a reproducible flow that automated kernel module loading, udev device creation, and container deployment. Performance was within expected ranges compared to x86 baselines for NVLink‑backed workloads, and additional tuning of peer‑to‑peer DMA and CPU affinity improved throughput further.

“Vendor previews in 2025 made NVLink Fusion on RISC‑V possible; engineering the kernel and userland stack in 2026 is key to unlocking real deployments.”

Common pitfalls and fixes

  • Missing /dev nodes: Ensure kernel modules are loaded and udev rules are present. Check kernel logs for probe failures.
  • IOMMU not active: Boot kernel with IOMMU enabled (platform specific: add appropriate boot args). Confirm groups in /sys/kernel/iommu_groups.
  • Device bound to wrong driver: Unbind and rebind to vfio-pci or the vendor driver as needed.
  • CUDA binary incompatibility: Use vendor‑provided RISC‑V CUDA libraries or rebuild apps against the provided toolchain.

Automation & CI recommendations

  • Automate kernel build and DKMS packaging for driver modules to simplify upgrades.
  • Use udev rules to ensure consistent device names and permissions for /dev/nvidia* nodes.
  • In CI, include smoke tests: nvidia-smi, deviceQuery, and a short neural‑net infer test to validate stack health.
  • Version control your device tree and kernel patch sets to make board bringup reproducible.
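
For the DKMS recommendation above, a minimal dkms.conf sketch might look like this; the package name, version, and destination path are placeholders for your vendor’s RISC‑V driver source tree:

# /usr/src/nvidia-vendor-preview/dkms.conf
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="vendor-preview"
BUILT_MODULE_NAME[0]="nvidia"
DEST_MODULE_LOCATION[0]="/kernel/drivers/video"
AUTOINSTALL="yes"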

What’s next for NVLink Fusion on RISC‑V

  • More vendors will upstream NVLink Fusion bindings and RISC‑V kernel patches to Linux; expect accelerated support in the 6.x and 7.x kernel series through 2026.
  • RISC‑V toolchains and container runtime ecosystems will mature, reducing the need for custom cross‑builds of userland GPU libraries.
  • Vendor container toolkits and Kubernetes device plugins will add native RISC‑V support; multi‑tenant NVLink topologies may get first‑class orchestration features.

Actionable checklist to get started this week

  1. Confirm your board vendor supports NVLink Fusion on your RISC‑V SKU and request kernel/device tree patches.
  2. Build or obtain a kernel with IOMMU and VFIO enabled; validate with find /sys/kernel/iommu_groups -type l.
  3. Install vendor NVIDIA kernel modules (or DKMS package) and verify /dev/nvidia* nodes.
  4. Run CUDA deviceQuery and a sample containerized workload; automate the smoke test in CI.

Closing: Get from proof‑of‑concept to production

Exposing NVIDIA GPUs to RISC‑V hosts over NVLink Fusion in 2026 is a realistic, high‑value engineering project for cloud builders and platform teams. The work centers on three domains: kernel/device description to get safe enumeration and DMA, robust kernel modules and driver bindings (or VFIO fallback), and a userland stack (CUDA, NVML, container tooling) that integrates with your CI and orchestration. Start with the checklist above, engage your board vendor for patches and firmware, and automate the kernel/module lifecycle into your platform tooling.

Call to action

If you’re starting an NVLink Fusion + RISC‑V project, join our repository of starter scripts, udev rules, and sample device tree snippets designed for common boards. Sign up for our 2026 RISC‑V GPU bringup workshop and get hands‑on guidance from engineers who validated early SiFive previews and vendor drivers.
