Vivado HLS GPIO switch data for Zybo Board

I am building a custom IP core in Vivado HLS to run within an image/video processing system that runs in embedded Linux on the Zybo board. The core takes image/video data in via an AXI stream, performs a processing task (say Sobel), then outputs this to another AXI stream. This works; however, I wish to use the on-board switches of the Zybo to determine which processing task should be run (the default is a passthrough).
I cannot find a resource or simple example that shows (in HLS, not IP Integrator or the Vivado SDK) how to create an HLS RESOURCE/INTERFACE to read the data from the GPIO switches. What I have is the code below in my top module:
#include <hls_video.h>
#include "ip_types.h"

void MultiImaging(AXI_STREAM& inputStream, AXI_STREAM& outputStream, int rows, int cols, bool sw0, bool sw1)
{
#pragma HLS INTERFACE axis port=inputStream
#pragma HLS INTERFACE axis port=outputStream
#pragma HLS RESOURCE variable=rows core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS RESOURCE variable=cols core=AXI_SLAVE metadata="-bus_bundle CONTROL_BUS"
#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols
    // Are these two correct for the switches?
#pragma HLS INTERFACE axis port=sw0
#pragma HLS INTERFACE axis port=sw1
    // Are these two correct for the switches?
#pragma HLS RESOURCE variable=sw0 core=AXI_SLAVE // GPIO?
#pragma HLS RESOURCE variable=sw1 core=AXI_SLAVE // GPIO?

    RGB_IMAGE img(rows, cols);
    RGB_IMAGE oimg(rows, cols);
    RGB_IMAGE sobel_output(rows, cols);
    RGB_IMAGE imgh(rows, cols);
    RGB_IMAGE imgv(rows, cols);
    RGB_IMAGE hsobel(rows, cols);
    RGB_IMAGE vsobel(rows, cols);
    GRAY_IMAGE imgGray(rows, cols);
    GRAY_IMAGE oimgGray(rows, cols);

#pragma HLS dataflow
    hls::AXIvideo2Mat(inputStream, img);

    // Passthrough
    if (sw1 == 0 && sw0 == 0) {
        //..code here
    }
    // Sobel
    else if (sw1 == 0 && sw0 == 1) {
        //..code here
    }
    // Threshold
    else if (sw1 == 1 && sw0 == 0) {
        //..code here
    }
    //..etc
}
The above works and gives the proper output for C Simulation and C Synthesis. It errors out in C/RTL Cosimulation with: "OpenCV Error: Sizes of input arguments do not match." This makes no sense to me, since all RGB_IMAGEs are initialized with the same rows/cols.

Well, in this specific case the size of the data is NOT determined only by the ROWs and COLs.
Have a look in your header file; there should be something like:
// typedef video library core structures
typedef hls::stream<ap_axiu<24,1,1,1> > AXI_STREAM;
typedef hls::Scalar<3, unsigned char> RGB_PIXEL;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
This matters because you are using AXI_STREAM: these typedefs define how many bits there are in a pixel, how many color channels a pixel has, and so on. If the row and column counts of the images are the same, the message "Sizes of input arguments do not match" refers to this kind of mismatch between the type definitions and the arguments of the top-level function.
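For example (a sketch only, assuming MAX_HEIGHT/MAX_WIDTH come from ip_types.h), the bit width of the stream element has to agree with the hls::Mat pixel type on both the input and output side:
// 24 data bits per stream beat <-> HLS_8UC3 = 3 channels x 8 bits: consistent.
typedef hls::stream<ap_axiu<24, 1, 1, 1> > AXI_STREAM;
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC3> RGB_IMAGE;
// A single-channel Mat is only 8 bits per pixel; mixing it with the 3-channel
// types in the same processing chain without an explicit conversion (e.g.
// hls::CvtColor) means the element types no longer match.
typedef hls::Mat<MAX_HEIGHT, MAX_WIDTH, HLS_8UC1> GRAY_IMAGE;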

Related

Why does Vivado HLS split this ap_memory interface?

So I have the following bit of code:
int post_quantum_kem_encr( unsigned char m[32],
                           unsigned char pk[800],
                           unsigned char coin[32],
                           unsigned char c[736]) {
#pragma HLS INTERFACE ap_memory port = m
#pragma HLS INTERFACE ap_memory port = pk
#pragma HLS INTERFACE ap_memory port = coin
#pragma HLS INTERFACE ap_memory port = c
#pragma HLS INTERFACE ap_none port = return

    some_crypto(m, pk, coin, c);
    return crypto_kem_enc_def;
}
Synthesizing this and exporting it as IP results in the following IP block:
My question is, why is c split up into c_d0 and c_d1? (The same goes for pk and coin.) It doesn't happen for m, so it seems to be some kind of optimization. I, however, would like it to just do straight single-byte access to the memory element I'm hooking it up to.
It seems that Vivado HLS is trying to treat port c as an interface to a 2-port memory. You can try to add the following directive to force 1-port mode:
#pragma HLS RESOURCE variable=c core=RAM_1P
This directive may look like a request to use the FPGA's internal RAM, but it seems that it just configures the ap_memory interface.
P.S. I found some information about this here (they used core=RAM_2P_BRAM to force a 2-port interface): https://www.xilinx.com/content/dam/xilinx/support/documents/sw_manuals/xilinx2014_3/ug871-vivado-high-level-synthesis-tutorial.pdf, pages 78-82.
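For clarity, here is roughly where the directives would sit in the function from the question; only the RESOURCE lines are new, and applying them to pk and coin as well is an assumption based on the same symptom:
int post_quantum_kem_encr(unsigned char m[32],
                          unsigned char pk[800],
                          unsigned char coin[32],
                          unsigned char c[736]) {
#pragma HLS INTERFACE ap_memory port=m
#pragma HLS INTERFACE ap_memory port=pk
#pragma HLS INTERFACE ap_memory port=coin
#pragma HLS INTERFACE ap_memory port=c
#pragma HLS INTERFACE ap_none port=return
    // Force a single-port ap_memory protocol so each array gets one data port.
#pragma HLS RESOURCE variable=c core=RAM_1P
#pragma HLS RESOURCE variable=pk core=RAM_1P
#pragma HLS RESOURCE variable=coin core=RAM_1P

    some_crypto(m, pk, coin, c);
    return crypto_kem_enc_def;
}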

Detect where non-speech loud sound exists in a speech audio file

Is there a way to detect the positions (start and end) where other loud sounds than speech exist in an audio file? For example, the sound of tapping something, popping sound effect, mouse click sound effect, short computer-generated music, etc.
In summary, the conditions are:
The sound is not human voice.
The sound is louder than the average volume of the human speech in that audio file.
There are a number of readily available open-source Voice Activity Detectors (VADs). If the following conditions are met:
The given audio frame is NOT classified as speech and
The audio frame energy is above the adaptive threshold calculated on speech frames
classify the frame as "loud non-speech".
RNNoise, a noise suppression library, has a very good VAD and the algorithm easily works in real time.
Here is a rough example of how to use the library to get the VAD:
#include <stdio.h>
#include "rnnoise.h"

#define FRAME_SIZE 480

int main(int argc, char **argv) {
    int i;
    int framepos = 0;
    float vad;
    float x[FRAME_SIZE];
    FILE *f1;
    DenoiseState *st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <noisy speech>\n", argv[0]);
        return 1;
    }
    st = rnnoise_create(NULL);
    f1 = fopen(argv[1], "rb");
    if (f1 == NULL) {
        perror("fopen");
        return 1;
    }
    while (1) {
        short tmp[FRAME_SIZE];
        fread(tmp, sizeof(short), FRAME_SIZE, f1);
        if (feof(f1)) break;
        for (i = 0; i < FRAME_SIZE; i++) x[i] = tmp[i];
        vad = rnnoise_process_frame(st, x, x);
        if (vad < 0.1) printf("Non-speech frame position %d VAD %f\n", framepos, vad);
        framepos += FRAME_SIZE;
    }
    rnnoise_destroy(st);
    fclose(f1);
    return 0;
}
I didn't compile or run it, so you might need to fix a line or two to get it working.
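If you also want to apply the second condition (frame energy above an adaptive threshold computed on speech frames), a rough sketch of that bookkeeping could look like the helper below; the 0.5 VAD cut-off and the exponential-average factors are arbitrary choices of mine, not part of RNNoise:
#include <stddef.h>

// Hypothetical helper: given the RNNoise VAD probability and the frame samples,
// keep a running average of the energy of speech frames and flag frames that
// are both non-speech and louder than that average.
static int is_loud_non_speech(float vad, const float *x, size_t n,
                              double *speech_energy_avg) {
    double energy = 0.0;
    for (size_t i = 0; i < n; i++) energy += (double)x[i] * x[i];
    energy /= (double)n;

    if (vad > 0.5) {
        // Speech frame: update the adaptive threshold (simple exponential average).
        *speech_energy_avg = 0.95 * (*speech_energy_avg) + 0.05 * energy;
        return 0;
    }
    // Non-speech frame: "loud" if it exceeds the average speech-frame energy.
    return energy > *speech_energy_avg;
}
In the read loop above you would call it right after rnnoise_process_frame and record the frame position whenever it returns 1.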

What is wrong with the following code in Vivado HLS?

The following code should read a value from DDR, decrement it, write the result back to the same address, and read the next value, repeating 256 times.
Instead, on the first run it decrements only the first two values (axi_ddr[0] and axi_ddr[1]), and on consecutive runs it decrements only the first value (axi_ddr[0]).
#include "ap_cint.h"
#include <stdio.h>
#include "string.h"
void hls_test(volatile int256 axi_ddr[256], uint32 *axi_lite_status_control){
#pragma HLS INTERFACE s_axilite port=axi_lite_status_control register bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE m_axi depth=256 port=axi_ddr bundle=DDR
int256 axi_ddr_reg;
int256 diff = 1;
uint9 i = 0;
if (*axi_lite_status_control == 1){
for(i = 0; i < 256; i++){
axi_ddr_reg = axi_ddr[i];
axi_ddr[i] = axi_ddr_reg -diff;
}
*axi_lite_status_control = 2;
}
}
Both simulation and cosimulation pass as intended, and I cannot figure out what is causing the issue.
I also tried C++, but it ended in the same behavior. The only time it was different was when I forgot to give an initial value to the variable diff, after which the value in all 256 DDR locations became 0x0.
Could somebody please point out what I am missing?
The code looks fine to me and should work flawlessly. However, if you're saying that both simulation and cosimulation pass, then something might be wrong with either your test code or your hardware implementation.
Also, for the C++ version of the code, you should be using the ap_int<N>/ap_uint<N> types defined in ap_int.h instead of ap_cint.h.
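For reference, a rough sketch of what that C++ variant might look like; the pragmas are carried over unchanged from the question, and the arbitrary-precision types simply mirror the original int256/uint32/uint9:
#include "ap_int.h"

void hls_test(volatile ap_int<256> axi_ddr[256], ap_uint<32> *axi_lite_status_control) {
#pragma HLS INTERFACE s_axilite port=axi_lite_status_control register bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE m_axi depth=256 port=axi_ddr bundle=DDR

    ap_int<256> diff = 1;

    if (*axi_lite_status_control == 1) {
        // Read, decrement, and write back each 256-bit word.
        for (ap_uint<9> i = 0; i < 256; i++) {
            ap_int<256> axi_ddr_reg = axi_ddr[i];
            axi_ddr[i] = axi_ddr_reg - diff;
        }
        *axi_lite_status_control = 2;
    }
}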

Linux kernel: invoke callback function in user space from kernel space

I am writing a Linux user-space application in which I want to invoke a registered callback function in user space from kernel space.
That is, an interrupt arrives on a GPIO pin (switch press event) and the registered function gets called in user space.
Is there any method available to do this?
Thanks
I found the code below after a lot of digging, and it works perfectly for me.
Handling interrupts from GPIO
In many cases, a GPIO input can be configured to generate an interrupt when it changes state, which allows you to wait for the interrupt rather than polling in an inefficient software loop. If the GPIO bit can generate interrupts, the file edge exists. Initially, it has the value none, meaning that it does not generate interrupts. To enable interrupts, you can set it to one of these values:
• rising: Interrupt on rising edge
• falling: Interrupt on falling edge
• both: Interrupt on both rising and falling edges
• none: No interrupts (default)
You can wait for an interrupt using the poll() function with POLLPRI as the event. If you want to wait for a rising edge on GPIO 48, you first enable interrupts:
#echo 48 > /sys/class/gpio/export
#echo rising > /sys/class/gpio/gpio48/edge
Then, you use poll() to wait for the change, as shown in this code example:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <poll.h>

int main(void) {
    int f;
    struct pollfd poll_fds[1];
    int ret;
    char value[4];
    int n;

    f = open("/sys/class/gpio/gpio48/value", O_RDONLY);
    if (f == -1) {
        perror("Can't open gpio48");
        return 1;
    }
    poll_fds[0].fd = f;
    poll_fds[0].events = POLLPRI | POLLERR;
    while (1) {
        printf("Waiting\n");
        ret = poll(poll_fds, 1, -1);
        if (ret > 0) {
            /* Rewind and re-read the value after each event. */
            lseek(f, 0, SEEK_SET);
            n = read(f, value, sizeof(value));
            printf("Button pressed: read %d bytes, value=%c\n", n, value[0]);
        }
    }
    return 0;
}
You have to implement a handler in a kernel module that triggers, e.g., a char device; from user space it can then be accessed by polling (or ioctl() calls). It seems that this is the only way at the moment.
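As a rough illustration of that approach from the user-space side (the device node /dev/button_dev and its poll support are hypothetical and depend entirely on how the kernel module is written):
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <poll.h>

/* User-space side only: block on a custom char device until the kernel
 * module's interrupt handler wakes us up, then run the "callback". */
int main(void) {
    int fd = open("/dev/button_dev", O_RDONLY); /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;

    while (poll(&pfd, 1, -1) > 0) {
        char buf[8];
        read(fd, buf, sizeof(buf)); /* consume the event */
        printf("switch pressed: run the user-space callback here\n");
    }
    close(fd);
    return 0;
}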

DirectX 11.1 trying to create device to not trigger Timeout Detection Recovery

I am trying to use C++ AMP to execute a long running kernel on the GPU. This requires using DirectX to create a device which won't timeout. I am setting the flag but it is still triggering Timeout Detection Recovery. I have a dedicated Radeon HD 7970 in my box without a monitor plugged into it. Is there anything else I need to do to keep Windows 8 from canceling my kernel before it is finished?
#include <amp.h>
#include <amp_math.h>
#include <amp_graphics.h>
#include <d3d11.h>
#include <dxgi.h>
#include <vector>
#include <iostream>
#include <iomanip>
#include "amp_tinymt_rng.h"
#include "timer.h"
#include <assert.h>

#pragma comment(lib, "d3d11")
#pragma comment(lib, "dxgi")

// Inside main()
unsigned int createDeviceFlags = D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT;
ID3D11Device *pDevice = nullptr;
ID3D11DeviceContext *pContext = nullptr;
D3D_FEATURE_LEVEL targetFeatureLevels = D3D_FEATURE_LEVEL_11_1;
D3D_FEATURE_LEVEL featureLevel;

auto hr = D3D11CreateDevice(pAdapter,
                            D3D_DRIVER_TYPE_UNKNOWN,
                            nullptr,
                            createDeviceFlags,
                            &targetFeatureLevels,
                            1,
                            D3D11_SDK_VERSION,
                            &pDevice,
                            &featureLevel,
                            &pContext);
if (FAILED(hr) || (featureLevel != D3D_FEATURE_LEVEL_11_1))
{
    std::wcerr << "Failed to create Direct3D 11 device" << std::endl;
    return 10;
}

accelerator_view noTimeoutAcclView = concurrency::direct3d::create_accelerator_view(pDevice);
wcout << noTimeoutAcclView.accelerator.description;

// Set up kernel here
concurrency::parallel_for_each(noTimeoutAcclView, e_size, [=](index<1> idx) restrict(amp) {
    // Execute kernel here
});
Your snippet looks good; the problem has to be elsewhere. Here are a few ideas:
• Double-check all parallel_for_each invocations and make sure they all use an accelerator_view on the device you created with this snippet (explicitly pass the accelerator_view as the first argument to parallel_for_each); see the sketch after this list.
• If possible, reduce the problem size and see if your code runs without a TDR; perhaps something else is causing it (driver bugs, for example, are a common cause of TDRs). Once you know that your algorithm runs correctly for a smaller problem, you can go back to investigating why the TDR is triggered for the larger problem size.
• Disable TDR completely (see the MSDN article on TDR registry keys) and see if your large problem set ever completes; if so, go back to the first point, since it indicates that your code is running on an accelerator_view that has TDR enabled.
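As a rough illustration of the first point (the extent size and kernel body below are placeholders, not taken from the question): allocate your data against the same accelerator_view and pass that view explicitly to every parallel_for_each, so no kernel silently runs on the default, TDR-enabled accelerator.
// Route every kernel through the no-timeout view explicitly.
concurrency::extent<1> e_size(1024);
concurrency::array<float, 1> data(e_size, noTimeoutAcclView); // allocate on that device too
concurrency::parallel_for_each(noTimeoutAcclView, e_size,
    [&data](concurrency::index<1> idx) restrict(amp) {
        data[idx] = 0.0f; // placeholder kernel body
    });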
Good luck!
