Standardized Open-Source Processor Architecture

Add Hardware and Software Extensions to an Open-Source Architecture for Optimized Design

Jon Gabay for Mouser Electronics

(Source: Shutterstock/wanpatsorn)

How often have we had to learn a new processor architecture and development environment because a new project demands more horsepower and speed than the last one? Experience teaches that choosing a new processor similar to parts we already know can at least ease the pain. But what if learning just one new architecture could position you for quicker and easier designs in the future? Thanks to RISC-V's scalable, replicable, and configurable open-source processors, this is no longer a pipe dream.

Reduced Instruction Set Computer (RISC) architectures are not new; they have been around for decades as a streamlined alternative to the Complex Instruction Set Computers (CISC) that preceded them. Older CISC processors worked very well and, for the most part, followed a von Neumann architecture, where the processor cycles through fetching, decoding, and executing each instruction. Most RISC processors follow a Harvard architecture, where the instruction bus is separated from the data bus; the simultaneous access lets the processor complete most instructions in a single cycle. This makes them fast and deterministic, and it makes compilers and libraries of functions easier to create and to port from machine to machine.

But, like CISC processors, RISC parts from different manufacturers have different internal architectures, peripherals, I/O, and instruction sets. Developing a design therefore requires tools tailored to the specific manufacturer and parts. Development tool-makers deal with this by maintaining separate plug-ins and header files for each processor's toolchain, which is quite a task.

While the RISC-V project began in 2010 at the University of California, Berkeley, with both Berkeley and outside experts and contributors, it is only since 2018 that RISC-V has gained broader attention from processor manufacturers and design engineers. This is due to its promise of freeing designs from lock-in to one manufacturer's part or family of parts.

RISC-V is an open specification for an Instruction Set Architecture (ISA) that allows any manufacturer to build a processor that runs the same code. This open-source instruction-set approach eliminates the need to learn and create a unique development ecosystem for each processor architecture. In addition to dedicated processors, RISC-V cores can be embedded in ASICs and FPGAs for even higher integration.

The key is that the open-source RISC-V instruction set is layered and extensible. Using the standard base instructions still allows the creation of Application-Specific Instruction-set Processors (ASIPs), because designers can add instructions that perform deeply embedded functions more efficiently. These can take the form of chipmakers' differentiating extensions or of ASIC/FPGA hardware developed by design engineers everywhere; EDA vendors can support such extensions through Verilog or VHDL, for example.

Why is the timing of this so important? Advanced RISC Machine (Arm) processors are the dominant RISC force in the market today, but it is not uncommon for a company to pay six-figure licensing fees up front along with substantial per-unit royalties.

Many factors are in play to ensure the long-term sustainability of leading-edge technologies, especially for military and security needs. The open-source RISC-V architecture is well-positioned to satisfy everyone’s needs.

Partly in response to international pressure for a vendor- and country-neutral standard, the RISC-V Foundation (riscv.org), which maintains the specification, reorganized in March 2020 as RISC-V International and is now headquartered in Switzerland instead of the US.

 

Architecture Specifics


Typical RISC-V implementations use a classic five-stage pipeline, and dual-issue designs can execute up to two instructions per clock cycle. A standard load/store architecture is implemented, which separates ALU operations from memory operations. In the Base Integer ISA, loads and stores add an immediate offset to a base register and move data between memory and a source or destination register, allowing complete access to external memory. I/O can be mapped into this space for flexible programmed I/O and block operations. In addition, the Load-Reserved/Store-Conditional (LR/SC) instruction pair supports updates that complete only when conditional criteria are met.
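To make the LR/SC idea concrete, here is a minimal software sketch of the reservation semantics. The class, memory dictionary, and method names are purely illustrative, not any real RISC-V API; the point is that the conditional store succeeds only while the reservation acquired by the load still holds.

```python
# Toy model of RISC-V LR/SC (load-reserved / store-conditional) semantics.
# All names here are illustrative; only the behavior mirrors the ISA.

class Hart:
    """Models one hardware thread holding a reservation on a memory word."""
    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> word
        self.reserved_addr = None     # address covered by the reservation

    def lr(self, addr):
        """lr.w: load a word and acquire a reservation on its address."""
        self.reserved_addr = addr
        return self.memory[addr]

    def sc(self, addr, value):
        """sc.w: store only if the reservation still holds; returns 0 on
        success and non-zero on failure, as the real instruction does."""
        if self.reserved_addr == addr:
            self.memory[addr] = value
            self.reserved_addr = None
            return 0
        return 1

    def invalidate(self):
        """Another hart writing the location would clear the reservation."""
        self.reserved_addr = None

def atomic_increment(hart, addr):
    """Atomic increment built from the LR/SC pair: retry until sc succeeds."""
    while True:
        old = hart.lr(addr)
        if hart.sc(addr, old + 1) == 0:
            return old

mem = {0x1000: 41}
h = Hart(mem)
atomic_increment(h, 0x1000)
print(mem[0x1000])   # 42
```

The retry loop is the idiomatic pattern: if another hart invalidates the reservation between `lr` and `sc`, the store fails and the sequence simply repeats.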

Base integer variants are characterized by integer register width and the size of the user address space. The RV32I and RV64I variants support 32-bit and 64-bit user-level address spaces; a future RV128I will support a flat 128-bit address space. The base ISA is intended to allow natural hardware implementation without over-architecting, keeping it well-positioned to be implemented as an ASIC or FPGA.

Instruction variants cover both 32-bit and 64-bit addressing, and user-level ISA extensions and specialized variants are supported. Like the x86 architectures, RISC-V is little-endian. Accessed memory addresses do not have to be aligned to their word width of 16, 32, 64, or 128 bits. In addition, a "fence" instruction ensures that the results of preceding memory operations are visible to other threads or I/O devices; it can order memory reads and writes separately from I/O, eliminating unnecessary wait times.
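The byte-order point can be illustrated with a few lines of Python using the standard `struct` module (the addresses and values are arbitrary examples): the least significant byte of a word sits at the lowest address, just as on x86, and a halfword can be read back from an odd offset to model a misaligned access.

```python
# Little-endian byte order, as RISC-V shares with x86: the least
# significant byte of a stored word occupies the lowest address.
import struct

word = 0x11223344
mem = struct.pack("<I", word)           # little-endian 32-bit store
print([hex(b) for b in mem])            # ['0x44', '0x33', '0x22', '0x11']

# A 16-bit load from an odd offset models a misaligned access, which the
# base ISA permits (an implementation may trap and emulate it in software).
half = struct.unpack_from("<H", mem, 1)[0]
print(hex(half))                        # bytes 0x33, 0x22 -> 0x2233
```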

What helps is the load-upper-immediate style of instruction, which places an immediate value into the upper 20 bits of a 32-bit register in a single instruction; a second instruction then supplies the lower 12 bits. A program-counter-relative form of the upper-immediate instruction supports position-independent code by letting programs generate 32-bit addresses relative to the program counter. And while 128-bit addressed memory seems unfathomable today, the capability exists.

The 32-register integer set includes dedicated roles for stack, global, and thread pointers. Another 32 floating-point registers are available for arguments, parameters, and results. Register x0 always reads as zero and is common to all implementations. It's also worth noting that RISC-V has an embedded variant with a reduced 16-register integer set for small applications, alongside the standard 32-register integer and floating-point sets.
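The hardwired-zero rule for x0 is easy to model. This toy register file (the class and method names are illustrative, not any real API) reads x0 as zero and silently discards writes to it, which is exactly how the architectural register behaves.

```python
# Toy model of the RISC-V integer register file, capturing only the
# architectural rule that register x0 is hardwired to zero.

class RegFile:
    def __init__(self):
        self._regs = [0] * 32         # x0..x31

    def read(self, idx):
        return self._regs[idx]

    def write(self, idx, value):
        if idx != 0:                  # writes to x0 are silently discarded
            self._regs[idx] = value

rf = RegFile()
rf.write(0, 123)    # attempt to write x0: no effect
rf.write(5, 123)
print(rf.read(0), rf.read(5))   # 0 123
```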

RISC-V allows 16-, 32-, 48-, 64-, and (80 + 16n)-bit instruction lengths. Variable-length instructions are supported, and encodings for instruction lengths of 192 bits and greater are reserved in the present specification. Exceptions, traps, and interrupts are fully supported.
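The length of a variable-length instruction is implied by the low bits of its first 16-bit parcel. The decoder sketch below follows my reading of the encoding scheme in the unprivileged ISA specification; treat the exact bit patterns as an assumption to verify against the current spec rather than a definitive reference.

```python
# Decode a RISC-V instruction's length from its first 16-bit parcel,
# per the variable-length encoding scheme in the unprivileged ISA spec.

def instr_length(parcel):
    """Return the instruction length in bits implied by the first 16-bit
    parcel, or None for the reserved >=192-bit encodings."""
    if parcel & 0b11 != 0b11:
        return 16                     # compressed (C extension) format
    if parcel & 0b11111 != 0b11111:
        return 32                     # standard 32-bit format
    if parcel & 0b111111 == 0b011111:
        return 48
    if parcel & 0b1111111 == 0b0111111:
        return 64
    nnn = (parcel >> 12) & 0b111      # length field in bits [14:12]
    if nnn != 0b111:
        return 80 + 16 * nnn          # the (80 + 16*nnn)-bit formats
    return None                       # reserved for lengths >= 192 bits

print(instr_length(0x0001))   # 16: low two bits are not 11
print(instr_length(0x0013))   # 32: low parcel of addi x0,x0,0 (NOP)
```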

As expected, extensions specify the arithmetic operations: fixed- and floating-point types, integer precisions, high and low halves of returned values, and operation types (Table A).

 

Table A: Simplified Extensions for Multiply and Divide (Source: Author)

Support for variants and vector processing across a range of data-parallel accelerators is an explicit goal of the RISC-V architecture. A relaxed memory model makes it easier for future extensions to work with data-parallel coprocessor or accelerator functions. For example, a user-defined accelerator may be designed to run kernels from essential application domains; here, you can eliminate all but the base integer operations and use only the extensions that make the task at hand run more efficiently.

This can be useful for AI acceleration and machine learning. To increase teraFLOPS per watt, work is underway using Domain-Specific Extensions (DSEs), tensor instructions, and the vector ISA. User-defined hardware accelerators will consistently outperform software-only solutions. These custom accelerators can be chained into the data pipeline to accelerate graphics, multimedia, DSP, real-time motor control, and other specific architectural requirements.

Recently ratified, the vector extension adds seven unprivileged CSRs and 32 vector registers. It is essential to set the vector context status field, mstatus.VS, properly, or any attempt to execute a vector instruction or access a vector CSR can raise an illegal-instruction exception.

While accelerators are suitable for hardware- and compute-intensive tasks, hypervisor instructions are handy when implementing virtual machines as guests or processes. These virtualized housekeeping or compute-intensive functions can run as part of a processor's own code or be offloaded to other cores in the system. Using the "H" extension, hypervisor instructions are part of the privileged instruction set that lets a processor running in machine mode host multiple users, processes, and supervisors. When planned as an orthogonal implementation, configuration bits allow supervisor code to access hypervisor registers or generate interrupts upon access.

Cryptography and machine learning functions take advantage of the vector extensions, which serve as the base for further domain-specific vector extensions. The RISC-V crypto extensions can accelerate cryptographic workloads on both 32-bit and 64-bit data paths. This helps implement NIST AES encryption and decryption, as well as block ciphers, hash functions, entropy-source extensions, crossbar permutations, and more.
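As a purely software reference point for the hash workloads such extensions accelerate, the snippet below computes SHA-2 and SHA-3 digests with Python's standard hashlib; no RISC-V-specific API is involved, and the input is the well-known NIST "abc" test message.

```python
# Software baseline for hash workloads that hardware crypto extensions
# accelerate: SHA-2 and SHA-3 digests via Python's standard hashlib.
import hashlib

msg = b"abc"
sha2 = hashlib.sha256(msg).hexdigest()
sha3 = hashlib.sha3_256(msg).hexdigest()
print(sha2)   # matches the published NIST SHA-256 test vector for "abc"
print(sha3)
```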

A key recent development is the availability of Trusted Execution Environment (TEE) support for SiFive's Freedom SDK development system. The Hex Five MultiZone Security additions to RISC-V support a hardware-enforced separation policy for an "unlimited" number of security zones. This gives firmware developers full control over data, code, peripherals, and interrupts, locking out attempts at breaching security.

Explicit definitions for hardware threads (harts) are part of the privileged instructions, which allow recovering a stalled thread or a thread that is not ready to proceed (waiting for input or in the middle of a calculation, for example). Hardware threads can also make interrupts more efficient, since save and restore operations don't have to be performed for fast, real-time service routines. RISC-V's ISA supports five modes of operation under the hypervisor scheme: machine, supervisor, user, supervisor-under-hypervisor, and user-under-hypervisor. This leaves a lot of flexibility for multiple independent processes to run without stepping on each other.

 

Real-World Examples


To better understand what RISC-V is and how you can leverage the technology going forward, let's use an industry-leading RISC-V developer as an example. SiFive offers a portfolio of RISC-V processor cores for domain-specific systems-on-chip (SoCs), spanning low-power embedded microcontroller-style cores up through multi-core applications processors. In addition, the configurable cores can be tuned to satisfy specific needs, including pipeline, vector, and parallel-processing architectures.

A key advantage of SiFive is its online microarchitecture generator tool, which provides detailed architectural configurations and extensions (Figure 1). Designers can specify single-core designs or designs with eight or more cores, with single or multiple floating-point units, bit-manipulation instructions, flexible interrupt handlers, branch prediction, and more. The architectures are built with design-for-test hooks as well as debug and trace features using JTAG.

Figure 1: The ability to mix and match multiple cores in a unified or monolithic design allows the engineer to tailor a design specific to their needs while maintaining the ability to upgrade or enhance sections without a total redesign. (Source: SiFive)

 

Peripheral ports can be 32 bits wide, and memory ports for code and data can be 64 bits wide, meaning fewer accesses and faster speeds. Memory blocks can be unique to a core or mapped for shared access between cores. A key benefit of SiFive's approach is that it allows intensive and peripheral functions to be mixed and matched heterogeneously and/or monolithically (Figure 2). Compute-intensive tasks can run side by side with data I/O, load/store, and communications operations, with atomic access to shared memory and peripherals.

Figure 2: A flexible online core development tool permits specific needs for each core to be captured and categorized, including multi-core architectures, peripheral architectures, bus widths, and shared resources. A build function creates the custom core, and simulation models are used for functional verification. (Source: Bedford Falls)

Another critical benefit SiFive brings to the party is enhanced security. Featuring the WorldGuard technique, fine-grained security can isolate code execution and data. This includes multiple privilege levels with an unlimited number of user-defined worlds. In addition, world-ID markers isolate processes from each other to ensure protected, isolated execution.

A shield architecture is included to safeguard critical information, featuring a NIST SP 800-90A/B/C-compliant true random number generator for cryptographic and entropy-based secure features. In addition, an AES cryptographic engine is protected against various types of attacks, and a secure crypto hash engine supports the SHA-2 and SHA-3 standards as well as public-key encryption.

Available now as a chip and on a development platform from Crowd Supply is the newer HiFive FE310-G002 32-bit RISC-V SoC and development board. The 3.3V development board features a 32-bit-wide data bus and integrated wireless technology for networking and IoT applications requiring high-end processing power.

Developing a RISC-V-based product around a core provides the highest possible integration and performance, since almost all functions and data transfers can occur monolithically.

While a hard core provides a fully characterized and debugged processor starting point, soft cores allow user-integrated functionality to live side by side with the processor monolithically.

 

The Take-Away


The ability to tailor your microcontroller and microprocessor architectures with freely licensed IP allows standardization of product lines now and in the future. In addition, open source means that once the initial learning curve is climbed, a deep pool of qualified, expert designers is available to continue developing a product's architecture.

Minimal, incremental design improvements don't throw the baby out with the bathwater, and clever hardware and software implementations can be reused going forward. This allows designers to focus on the new and advanced aspects of a design and provides a stable plateau of design elements, libraries, functions, and blocks for the future. It seems like a "low-RISC" solution.

About the Author

After completing his studies in electrical engineering, Jon Gabay has worked with defense, commercial, industrial, consumer, energy, and medical companies as a design engineer, firmware coder, system designer, research scientist, and product developer. As an alternative energy researcher and inventor, he has been involved with automation technology since he founded and ran Dedicated Devices Corp. up until 2004. Since then, he has been doing research and development, writing articles, and developing technologies for next-generation engineers and students.
