32 bit hella cache access for coprocessor acculator example - riscv

I've implemented 32 bit rocket chip with rocc example, but in accumulator example while accessing data through hella cache interface using do_load instruction. The io_mem_response_valid signal remains high for two clock cycle so data in reg file is overwritten by data of next memory location.
vivado simulation waveform for simple do_load instruction
May be memory response interface default setting to transfer 64 byte or else. please assist me. how to change burst size?
Thanks & Regards,
Sanket

I just changed value of io.mem.req.bits.size = log2Ceil(4).U (i.e. 2) from https://github.com/chipsalliance/rocket-chip/blob/master/src/main/scala/tile/LazyRoCC.scala. which may informed response size for io.mem.resp interface.

Related

Two master components controlling same slave (address assignation), Intel Quartus Prime Platform Designer (Qsys)

I am doing a project using DE1-SoC (FPGA + ARM cortex A9). You can see a part of the design (Qsys, platform designer) here
An on chip memory (RAM, image_memory) is being mastered by two different masters. One of the masters is well known h2f_lw_axi_master (provided by the Quartus Prime software to make the ARM and FPGA data exchange possible) and the other one zpc_1 is a custom master block that I designed.
The basic idea in this project is that after the FPGA is configured, one should be able to write data to the on chip memory and zpc_1 reads the content of the memory and works on it.
The length of each word is 512 bits (64bytes) and there are 1200 words (so address assigned starts from 0x0002_0000 and ends at 0x0003_2bff, enough space for 76800 = (512 * 1200) /8 bytes. The hps uses uint512_t (from boost library of c++) type data to write and zpc_1 has readdata width of 512 bits. The addresses are assigned with respect to h2f_lw_axi_master.
I have two questions related to this system.
1. Should the address for reading data in zpc_1 HDL code start from 0x20000 offset and increment by 0x40 (64) at each cycle to read the data word by word? (or any other method)
2. The zpc_1 is being able to read the first word and continuously working according to the instructions in first word, what might be the reason?
If you need additional information to answer the question and/or question is not clear enough to understand, do not hesitate to ask about more information (comment).
The problem was when one of the masters was interacting with the slave, the slave did not properly allow the other one (in the protocol there is a signal called 'waitrequest', I was not using that signal properly, when I used it that signal properly, the slave was always sending waitrequest which helped me to debug the problem as well).
Tried dual port RAM as shown here and modified the component by properly using the 'waitrequest' signal and everything started working properly.
Now the answers:
Q1: Should the address for reading data in zpc_1 HDL code start from 0x20000 offset and increment by 0x40 (64) at each cycle to read the data word by word? (or any other method)
A1: You can define another address offset with respect to the custom master component as you want, and start reading from that address offset (I used 0x00000000 as in the picture ). The address should increment by 0x40 (64) at each cycle to read the data word by word as #Unn commented.
Q2: The zpc_1 is being able to read the first word and continuously working according to the instructions in first word, what might be the reason?
A2: The reason is the slave (Single port RAM) was not able to respond correctly to both masters at the same time through single port, replacing it with dual port RAM solves the problem.

Is interrupt jitter causing the annoying wobble in audio using the mcu's dac?

I had a assignment for college where we needed to play a precompiled wav as integer array through the PWM and DAC. Now, I wanted more of a challenge, so I went out of my way and created a audio dac over usb using the micro controller in question: The STM32F051. It basically listens to my soundcard output using a wasapi loopback recorder, changes the resolution from 16 to 12 bit (since the dac on the stm32 only has a 12 bit resolution) and sends it over using usart using 10x sample rate as baud rate (in my case 960000). All done in C#.
On the microcontroller I simply use a interrupt for usart and push the received data to the dac.
It works pretty well, much better than PWM, and at a decent sample frequency of 48kHz.
But... here it comes.. When there is some (mostly) high pitch symphonic melody it starts to sound "wobbly".
Here a video where you can hear it: https://youtu.be/xD3uTP9etuA?t=88
I read up on the internet a bit about DIY dac's and someone somewhere (don't remember where) mentioned that MCU's in general have interrupt jitter. So may basic question is: Is interrupt jitter actually causing this? If so, are there ways to limit the jitter happening?
Or is this something entirely different?
I am thinking of trying to compact the pcm data send over serial (as said before, resolution of 12 bits, but are sent in packet of 2 8bits forming 16bits, hence twice the samplerate as the baud rate, so my plan is trying to shift 12 bits to the MSB and adding four bits of the next 12 bit value to the current 16 bit variable, hence only needing 12 transfers instead of 16 per 8 samples. Might read upon more efficient ways of compacting data for transport.), put the samples in a buffer and then use another timer that triggers at 48kHz for sending the samples to the dac. Would this concept work? Or would I just waste time?
For code, here is the project: https://github.com/EldinZenderink/SoundOverSerial

What is the best way to sample periodicaly gpio-pins in Linux?

I like to sample a signal which is generated by the pins of my Raspberry Pi. I made the experience that high sample rates are difficult to realize.
First I done a fast approach with Python(super slow). Then I changed to ANSI C + the bcm2835.h lib. I gained significant performance gains.
Now I am asking my self the question: How to do the best sampling under Linux?
My tries were undertaken in user space. But what is about, switching to kernel space? I can write a simple character device kernel module. In this module the pins are periodically checked. If the state changed some information is put into a buffer. This i/o buffer is polled by a synchronous file read for a application in user space. The best solution for me would be, if the pin checking can be done with a fixed frequency(sample period should be constant for signal processing).
The set up for this could be:
#kernel: character module + kernel thread + gpio device tree interface + DSP at constant sampling time
#user space: i/o app reading synchronous from character module
Ideas/tips?
I have a solution for you.
I have written such a module:
https://github.com/Appyx/gpio-reflect
You can synchronously read any signal from the GPIO pins.
You can use the output and calculate the signal with your sampling rate.
Just divide the periods.

High Speed Serial

I have a system which uses a UART clocked at 26 Mhz. This is a 16850 UART on a i86 architecture. I have no problems accessing the port. The largest incoming message is about 56 bytes, the largest outgoing about 100. The baud rate divisor needs to be 1 so seterial /dev/ttyS4 baud_base 115200 is OK and opening at 115200. There is no flow control. Specifying part 16850 does NOT set the FIFO to deep. I was losing bytes. All the data is byte, unsigned char.
I wrote a routine that uses ioperm to set the deep FIFO's to 64 and now a read/write works meaning that the deep FIFO's are NOT being enabled by serial_core.c or 8250.c, at least in a deep manner.
With the deep FIFO set using s brute force, post open(fd, "/dev/ttyS4", NO_BLOCKING, etc I get reliably the correct number of bytes but I tend to get the same word missing a bit. Not a byte, a bit.
All this same stuff runs fine under DOS so it is not a hardware issue.
I have opened the port for raw, no delays, no party, 8 bits, 2 stop.
Has anyone seen issues reading serial ports are relatively high speeds with short bursts of data?
Yes, I have tried custom baud, etc. The FIFO levels made the biggest improvement. This is a ISA bus card using IRQ7.
It appears the serial driver for Linux sucks and has way to much latency and far to many features for really basic raw operation.
Has anyone else tried very high speed data without flow control or had similar issues. As I stated, I get the correct number of bytes and all the data is correct except 1 bit in byte 4.
I am pretty stumped.

Verilog: Common bus implementation issue

I've been coding a 16-bit RISC microprocessor in Verilog, and I've hit yet another hurdle. After the code writing task was over, I tried to synthesize it. Found a couple of accidental mistakes and I fixed them. Then boom, massive error.
The design comprises of four 16-bit common buses. For some reason, I'm getting a multiple driver error for these buses from the synthesis tool.
The architecture of the computer is inspired by and is almost exactly the same as the Magic-1 by Bill Buzzbee, excluding the Page Table mechanism. Here's Bill's schematics PDF: Click Here. Scroll down to page 7 for the architecture.
The control matrix is responsible for handling when the buses and driven, and I am absolutely sure that there is only one driver for each bus at any given instance. I was wondering whether this could be the problem, since the synthesis tool probably doesn't know this.
Tri-state statements enable writing to a bus, for example:
assign io [width-1:0] = (re)?rd_out [width-1:0]:0; // Assign IO Port the value of memory at address add if re is true.
EDIT: I forgot to mention, the io port is bidirectional (inout) and is simply connected to the bus. This piece of code is from the RAM, single port. All other registers other than the RAM have separate input and output ports.
The control matrix updates a 30-bit state every negative edge, for example:
state [29:0] <= 30'b100000000010000000000000100000; // Initiate RAM Read, Read ALU, Write PC, Update Instruction Register (ins_reg).
The control matrix is rather small, since I only coded one instruction to test out the design before I spent time on coding the rest.
Unfortunately, it's illogical to copy-paste the entire code over here.
I've been pondering over this for quite a few days now, and pointing me over to the right direction would be much appreciated.
When re is low, the assign statement should be floating (driving Zs).
// enable ? driving : floating
assign io [width-1:0] = (re) ? rd_out [width-1:0] : {width{1'bz}};
If it is driving any other value then the synthesizer will treat is as a mux and not a tri-state. This is where the conflicting driver message come from.

Resources