How to configure xLen in the Rocket core? - riscv

I am trying to use the Rocket core as a baseline core and add some additional features for research purposes, but I can't find where or how to change the value "xLen".

Rocket Chip uses a default XLen of 64 in its DefaultConfig. However, this can be changed to 32 via a different top-level system configuration, DefaultRV32Config.
If you're working with the Rocket Chip emulator, you can compile these two different configurations with:
cd emulator
CONFIG=DefaultConfig make
CONFIG=DefaultRV32Config make
For reference, take a look at the Rocket Chip System configurations defined in the system package as well as the subsystem configurations:
src/main/scala/system/Configs.scala
src/main/scala/subsystem/Configs.scala
The former defines DefaultConfig and DefaultRV32Config. The latter defines WithRV32. WithRV32 is what changes XLen to 32 (and also sets fLen to 32). Alternatively, you can replicate the behavior of WithRV32 in your own subclass of Config.
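For instance, here is a minimal sketch of such a subclass (the class name MyRV32Config is illustrative, and the package and import paths assume a recent rocket-chip checkout), mirroring how DefaultRV32Config is built:

package freechips.rocketchip.system

import freechips.rocketchip.config.Config
import freechips.rocketchip.subsystem.WithRV32

// Stack WithRV32 on top of the default configuration; fragments earlier
// in the ++ chain override the ones that come after them.
class MyRV32Config extends Config(new WithRV32 ++ new DefaultConfig)

You could then build it in the emulator directory with CONFIG=MyRV32Config make.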

Related

DirectX 12 Ultimate graphics sample generates a D3D12 "CBV Invalid Resource" error

Presently I'm working on updating a Windows 11 DX12 desktop app to take advantage of the technologies introduced by DX12 Ultimate (e.g. mesh shaders, VRS & DXR).
All the official samples for Ultimate compile and run on my machine (Core i9/RTX 3070 laptop), so as a first step I wish to begin migrating as much static (i.e. unskinned) geometry as possible from the conventional (IA-vertex shader) rendering pipeline to the Amplification->Mesh shader pipeline.
I'm naturally using code from the official samples to facilitate this, and in the process I've encountered a very strange issue which only triggers in my app, but not in the compiled source project.
The specific problem relates to setting up meshlet instancing culling & dynamic LOD selection. When setting descriptors into the mesh shader SRV heap, my app was failing to create a CBV:
// Mesh Info Buffers
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc{};
cbvDesc.BufferLocation = m.MeshInfoResource->GetGPUVirtualAddress();
cbvDesc.SizeInBytes = MeshletUtils::GetAlignedSize<UINT>(sizeof(MeshInfo)); // 256 bytes which is correct
device->CreateConstantBufferView(&cbvDesc, OffsetHandle(i)); // generates error
The CBV couldn't be created because the resource's GPU virtual address range was only 16 bytes long:
D3D12 ERROR: ID3D12Device::CreateConstantBufferView:
pDesc->BufferLocation + SizeInBytes - 1 (0x0000000008c1f0ff) exceeds
end of the virtual address range of Resource
(0x000001BD88FE1BF0:'MeshInfoResource', GPU VA Range:
0x0000000008c1f000 - 0x0000000008c1f00f). [ STATE_CREATION ERROR
#649: CREATE_CONSTANT_BUFFER_VIEW_INVALID_RESOURCE]
What made this frustrating was that the code is identical to the official sample, yet the sample ran without issue. After many hours of trying dumb things, I finally decided to examine the size of the MeshInfo structure, and therein lay the solution.
The MeshInfo struct is defined in the sample's Model class as:
struct MeshInfo
{
uint32_t IndexSize;
uint32_t MeshletCount;
uint32_t LastMeshletVertCount;
uint32_t LastMeshletPrimCount;
};
It is 16 bytes in size, and is passed to the resource's description prior to its creation:
auto meshInfoDesc = CD3DX12_RESOURCE_DESC::Buffer(sizeof(MeshInfo));
ThrowIfFailed(device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &meshInfoDesc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&m.MeshInfoResource)));
SetDebugObjectName(m.MeshInfoResource.Get(), L"MeshInfoResource");
But clearly I needed a 256-byte range to conform with D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, so I changed meshInfoDesc to:
auto meshInfoDesc = CD3DX12_RESOURCE_DESC::Buffer(sizeof(MeshInfo) * 16u); // 16 B * 16 = 256 B
And with that change the error is gone.
So my question is: why isn't this GPU virtual address error also occurring in the sample?
PS: It was necessary to rename Model.h/Model.cpp to MeshletModel.h/MeshletModel.cpp for use in my project, which is based on the DirectX Tool Kit framework, where Model.h/Model.cpp files already exist for the DXTK rigid body animation effect.
The solution was explained in the question, so I will summarize it here as the answer to this post.
When creating a constant buffer view on a D3D12 resource, make sure to allocate enough memory to the resource upon creation.
This should be at least 256 bytes to satisfy D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
I still don't know why the sample code on GitHub runs without tripping this requirement. Without having delved into the sample's project configuration in detail, it's possible that D3D12 debug layer errors are being handled differently, but that's purely speculative.

Under Chisel 3, it takes 10 minutes to compile the Verilator-generated C++ of Rocket Chip. Are there any ways to speed this up?

We are modifying the Rocket Chip code. After each modification, we need to run the assembly programs to be sure everything still runs correctly.
To do this, the steps are:
1) Run Chisel to generate Verilog
2) Run the Verilog through Verilator to generate C++
3) Compile the generated C++
4) Run the tests
Step 3 takes about 10 times longer than it did under Chisel 2. At about 10 minutes, it slows development.
Is there any way to speed this up?
I have found that a non-trivial amount of build and run time is spent on not-really-synthesizable constructs that are used for verification support.
For example, I disable the TLMonitors through the Config options. You can find an example in the subsystem Configs.
class WithoutTLMonitors extends Config ((site, here, up) => {
case MonitorsEnabled => false
})
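To apply it, stack it onto the configuration you already build; here is a minimal sketch (the class name FastSimConfig is illustrative, and the imports assume a recent rocket-chip checkout):

import freechips.rocketchip.config.Config
import freechips.rocketchip.subsystem.WithoutTLMonitors
import freechips.rocketchip.system.DefaultConfig

// Drop the TileLink protocol monitors from the default configuration,
// trimming verification-only logic from the generated Verilog.
class FastSimConfig extends Config(new WithoutTLMonitors ++ new DefaultConfig)

Building with CONFIG=FastSimConfig make should then give Verilator noticeably less C++ to compile.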

2 Questions about RISC-V-Privileged-Spec-v1.7

Page 16, Table 3.1:
Base field in mcpuid: RV32I RV32E RV64I RV128I
What is "RV32E"?
Is there an "E" extension?
ECALL (page 30) says nothing about the behavior of the pc.
Meanwhile, mepc (page 28) and mbadaddr (page 29) state that "mepc will point to the beginning of the instruction". I think ECALL should set mepc to the end of the causing instruction so that an ERET would go to the next instruction. Is that right?
As answered by CliffordVienna, RV32E ("embedded") is a new base ISA which uses 16 registers and makes some of the counter registers optional.
I would not recommend implementing an RV32E core, as it is probably an unnecessary over-optimization in core size that limits your ability to use a large body of RV*I code. But if performance is not needed, and you really need the core to be a tad smaller, and the core is not connected to a memory hierarchy that would dominate the area/power anyway, and you are willing to deal with the tool-chain headaches... then maybe an RV32E core is appropriate.
ECALL is treated like an exception, and will redirect the PC to the appropriate trap handler based on the current privilege level. MEPC will be set to the current PC of the ecall instruction.
You can verify this behavior by analyzing the Berkeley RV64G Rocket processor (https://github.com/ucb-bar/rocket/blob/master/src/main/scala/csr.scala), or by looking at the Spike ISA simulator (starting here: https://github.com/riscv/riscv-isa-sim/blob/master/riscv/insns/scall.h). Careful: as of 2015 Jun 27 the code is still in flux regarding the Privileged Spec.
If we look at how Spike handles eret ("sret": https://github.com/riscv/riscv-isa-sim/blob/master/riscv/insns/sret.h) for example, we have to be a bit careful. The PC is set to "mepc", but it's the trap handler's job to advance the PC by 4. We can see that done, for example, by the proxy kernel in some of the handler functions here (https://github.com/riscv/riscv-pk/blob/master/pk/handlers.c).
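To make the mepc behavior concrete, here is a minimal Chisel-style sketch (not the actual csr.scala code; the module and signal names are illustrative):

import chisel3._

class MepcSketch extends Module {
  val io = IO(new Bundle {
    val exception = Input(Bool())      // asserted when an ecall (or any trap) fires
    val pc        = Input(UInt(64.W))  // PC of the trapping instruction
    val retTarget = Output(UInt(64.W)) // where an eret sends the core
  })
  val reg_mepc = RegInit(0.U(64.W))
  when (io.exception) {
    reg_mepc := io.pc  // mepc captures the address of the ecall itself
  }
  // On eret the core returns to mepc unmodified; the trap handler must
  // add 4 to mepc in software to resume at the instruction after the ecall.
  io.retTarget := reg_mepc
}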
A draft of the RV32E (embedded) spec can be found here (via isa-dev mailing list):
https://lists.riscv.org/lists/arc/isa-dev/2015-06/msg00022/rv32e.pdf
It's RV32I with 16 instead of 32 registers and without the counter instructions.

How to simulate the RISCV Rocket chip

We generated the binary file using the riscv-gcc compiler. We are feeding this binary data to the Rocket chip through these signals:
io_host_in_valid, input [15:0] io_host_in_bits
Since io_host_in_bits is 16 bits wide, we drive it twice for each instruction, in little-endian order.
We are not getting any response from the Rocket core (HTIF).
How can we simulate the Rocket core? Is it possible to simulate it in Xilinx Vivado 2014 and debug the design?
Can anyone help me with this?
Regards,
Santhosh Kumar.
For more information on the Rocket Chip infrastructure, I recommend checking out the slides and videos from the first RISC-V Bootcamp.
The Rocket Chip can be simulated/debugged in two different ways: C simulator and Verilog. For information on using these modes, please consult the Rocket Chip README.
Yunsup's response on the riscv-hw mailing list:
I would take a look at http://riscv.org/tutorial-hpca2015/riscv-rocket-chip-generator-tutorial-hpca2015.pdf for an overview of interfaces and the FPGA setup.
Here's a link to the test bench we use to test the Rocket chip: https://github.com/ucb-bar/rocket-chip/blob/master/vsrc/rocketTestHarness.v. I would take a look at the htif_tick function, whose implementation can be found at https://github.com/ucb-bar/rocket-chip/blob/master/csrc/vcs_main.cc. It calls a method on htif_emulator_t (https://github.com/ucb-bar/rocket-chip/blob/master/csrc/htif_emulator.h), which inherits from htif_pthread_t (https://github.com/riscv/riscv-fesvr/blob/master/fesvr/htif_pthread.cc). You should also take a look at https://github.com/riscv/riscv-fesvr/blob/master/fesvr/htif.cc.
The host interface (HostIO) doesn't take instructions directly; it's a port for the front-end server (https://github.com/riscv/riscv-fesvr/tree/master/fesvr) to access target memory and the core's control and status registers (CSRs).

32 bit Datapath RISCV core

I'm trying to parameterize the Rocket core by changing the configuration in PublicConfig.scala.
However, when I change XprLen and L1D_SETS to 32, I get a compilation error.
What is the proper way to generate a 32-bit datapath with the Rocket Chip Generator, if that is possible?
Rocket Chip does not currently support generating a 32-bit processor.
While the required changes to the datapath would be minimal, the host-target interface for communicating with the front-end server (as Rocket currently only runs in a tethered mode) has only been spec'ed out for 64-bit cores.
Also, L1D_SETS is the number of "sets" in the L1 data cache (such that L1D_WAYS * L1D_SETS * 64 bytes per line is the total cache capacity in bytes).
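As a quick sanity check of that capacity formula, with purely illustrative parameter values:

// Total L1 D-cache capacity = ways * sets * bytes per line.
val l1dWays   = 4                              // hypothetical
val l1dSets   = 64                             // hypothetical
val lineBytes = 64
val capacity  = l1dWays * l1dSets * lineBytes  // 4 * 64 * 64 = 16384 bytes = 16 KiB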
