How to create memory in rocket-chip generator which will map as block ram in FPGA after synthesis? - riscv

I tried using Mem(1024,UInt(width=xLen)); but after synthesizing generated verilog file in Xilinx vivado.The memory mapped as distributed ram.
It's really tough to understand and edit generated RTL file. Is there any explicit way to define memory which can inferred as block ram.?
Thanks & Regards,

It looks like chisel is optimizing away the memory. Try using DontTouch construct if you want the Memory as a single unit DontTouch. Another solution is to use a wrapper around Mem. Create a Module around the Mem and then use DontTouch construct so that it remains as a single unit. And are you sure you want to use Mem construct? Mem is Asynchronous , if you want Synchronous Memory use SyncReadMem Chisel construct. Also check your Verilog file before and after synthesizing maybe Chisel isn't the culprit here. Check if Xilinx Vivado isn't optimizing it away.

Related

How do Instruction Fetch and Memory Access co-operate without an Instruction Cache?

I've been looking into RISC-V and the RISC pipeline, and realised that a Memory Access can happen at the same time as the Instruction Fetch. Assuming that it is a fairly basic implementation without any cache, this is a hazard. I did a bit of digging and found this. It talks about inserting a bubble/stall, which makes sense, but how would one go about that? I thought about using a NOP, but the IF to actually grab that would still cause contention. Is the stall inserted by the Load/Store instruction? Or is it something else?
In computer architecture, there are two kind of architectures - Harvard and von Neumann. In Harvard architecture,instruction and data memory are separate but von Neumann has same instruction and data memory resulting in structural hazards.
Generally in RISC-V, we use separate instruction and data memory to avoid structural hazard. As in, if you have only data cache, architecturally you should fetch instruction directly from the memory.

How to work with dynamically allocated memory in Assembly (x64/Linux)?

I'm trying to build a toy language compiler (that generates assembly for NASM) and so far so good, but I got really stuck in the topic of dynamic memory allocation. It's the only part on assembly that's stopping me from starting my implementation. The goal is just to learn how things work at the low level.
Is there a good and comprehensive guide/tutorial/book about how to dynamically allocate, use and free memory using Assembly (preferably x64/Linux)? I have found some tips here and there mentioning brk, sbrk and mmap, but I don't know how to use them and I feel that there is more to it than just checking the arguments and the return value of these syscalls. How do they work exactly?
For example, in this post, it is mentioned that sbrk moves the border of the data segment. Can I know where the border is initially/after calling sbrk? Can I just use my initial data segment for the first dynamic allocations (and how)?
This other post explains how free works in C, but it does not explain how C actually gives the memory back to the OS. I have also started to read some books on assembly, but somehow they seem to ignore this topic (perhaps because it's OS specific).
Are there some working assembly code examples? I really couldn't find enough information.
I know one way is to use glibc's malloc, but I wanted to know how it is done from assembly. How do compiled languages or even LLVM do it? Do they just use C's malloc?
malloc is inteface provided for userspace programs. It may have different implementations, such as ptmalloc, tcmalloc and jemalloc. Depending on different environment, you can choosing different allocators to use and even implement your own allocator. As I know, jemalloc manages memory for userspace programs by mmap a block of demanded memory, and jemalloc controls when the block of memory frees to kernel/system.(I know jemalloc is used in Android.) Also jemalloc also uses sbrk depending on different states od system memory. For more detailed info, I think you have to read the codes of defferent allocators you wanted to learn.

Writing data on memory at different clocks

I want to write data on a common memory coming from different clock domains how can I do that?
I have one common memory block and that memory block works on clock having frequency clk. Now I want to write data on that memory coming from different clock domains i.e clk1, clk2, clk3, clk4 etc. How to do that?
I was thinking of using FIFO for each clock domain i.e. 1st FIFO have input clock clk1 and output at clk(same as memory), the 2md FIFO will have input clock clk2 and output at clk(same as memory) and so on.. But it seems that my design will outgrew if I will use large number of FIFOs. Kindly tell me the correct approach.
To pass data-units (bytes, words etc) safely between clock domains the asynchronous FIFO is the only safe solution. Note that the FIFO does not have to be deep, but in that case you may need flow control.
You may need flow control anyway as you have many sources all accessing the same memory.
But it seems that my design will outgrew if I will use large number of FIFOs.
Then you have a design problem: your FPGA is not large enough to implement the solution you have chosen. So either go to a bigger FPGA or find fundamentally different solution to your problem.
RAM can be write by clock domain A and read by clock domain B with different clocks (Dual-ported RAM):
http://www.asic-world.com/examples/verilog/ram_dp_ar_aw.html
This RAM should be used by some controller as asynchronous FIFO.
Many fpga have dedicated RAM components for example:
UFM Altera, Xilnx BRAM, Cypress Delta39K Cluster Memory Block etc.
Change your device if you have problem with large FIFO.

FPGA large input data

I am trying to send a 4 kilobyte string to an FPGA, what is the easiest way that this can be done?
This is the link for the fpga that I am using. I am using Verilog and Quartus.
The answer to your question depends a lot on what is feeding this data into the FPGA. Even if there isn't a specific protocol you need to adhere to (SPI, Ethernet, USB, etc.), there is the question of how fast you need to accept the data, and how far the data has to travel. If it's very slow, you can create a simple interface using regular IO pins with a parallel data bus and a clock. If it's much faster, you may need to explore using high speed serial interfaces and the special hard logic available on your chip to handle those speeds. Even if it's slower, but the data needs to travel over some distance, a serial interface may be a good idea to minimize cable costs.
One thing I would add to #gbuzogany 's answer: You probably want to configure that block of memory in the FPGA as a FIFO so you can handle the data input clock running at a different rate than the internal clock of your FPGA.
You can use your FPGA blocks to create a memory inside the FPGA chip (you can do that from Quartus). The creation assistant allows you to initialise this memory with anything you want (e.g: a 4KB string). The problem is that in-FPGA memory uses many of your FPGA blocks, but for a board like this it must not be a problem.
Here is a video explaining how to do that on Quartus:
https://www.youtube.com/watch?v=1nhTDOpY5gU
You can use string for memory initialization. It's easy in Verilog in 'initial begin end' block.
There are 2 ways:
1. You can create a memory block by using Xilinx Core Generator and then load the initial data to the memory, then use the data for the code. Of course, you have to convert the string to the binary data.
2. You can write a code which has a memory to store the string, it can be a First-In-First-Out FIFO memory. Then, you write a testbench to read the string from a text file and write data into the FIFO. Your FPGA can read the string from the FIFO.

Windows CE (RTOS) class-libraries for latency of interrupts and threads and USB?

I am getting started in working with Windows CE to utilize RTOS to reduce latency concerns with interrupts and threads and USB. What class-libraries(visual c++) can you point me to that would be good to have learned well to speed up the learning curve?
Thanks
That's a really, really broad question. The most important piece of advice I'll give you is that if you're after determinism and speed (your reference to an RTOS leads me to think you consider these important) then you need to be aware that any memory allocation or deallocation in a piece of code makes it non-deterministic.
C++ classes often have allocations and deallocations buried in them, so whatever you choose (and whatever you write), use them wisely. Sometimes they'll allow you to provide custom allocators (e.g. Boost) which you can use to just pull memory from an already allocated heap you create somewhere.
Keep the real-time parts of the code as small and simple as possible.

Resources