I am trying to run lab 4, part C of the Zynq Book tutorials in Vivado HLS (HLS is included in Vitis in newer versions), but when I right-click at the step that adds a directive, as described in the book, the add-directive window does not open. I tried this separately in Vivado 2015.1, 2018.3 and 2021.2, and the result is the same in all of them. The step I'm having trouble with is as follows. Tutorial Book Link is here. Although I researched the problem a lot online, I found little, but someone who encountered this problem on the Xilinx forum explained in this link that it is a bug and that the same directive can be added to the C file with a #pragma line. Since I am new to this, I would appreciate help with adding the directive using the #pragma line for step (m) of the tutorial.
The c code mentioned is as follows.
void nco (ap_fixed<16,2> *sine_sample, ap_ufixed<16,12> step_size){
ap_ufixed<12,12> address;
temp+=step_size; // Accumulator. Values will wrap around on overflow.
address = ap_ufixed<12,12>(temp); // Cast address to a 12-bit integer value.
*sine_sample = sine_lut[(int)address]; // Assign sine sample from LUT based on current address
}
Yes, it happens at times that Vivado (now Vitis) HLS GUI does not show the directive. However, it's a blessing in disguise since it compels you to manually add the pragmas in your code and as a result you may actually understand the pragmas and their effect on the generated RTL. From the example above it seems you will be using INTERFACE and RESOURCE pragmas.
As an HLS enthusiast, I would like to start briefly with the basic meaning and syntax of both pragmas; you can then check my solution to your question at the end.
pragma HLS interface:
The INTERFACE pragma specifies how RTL ports are created from the function definition during interface synthesis.
The ports in the RTL implementation are derived from:
Any function-level protocol that is specified.
Function arguments.
Global variables accessed by the top-level function and defined outside its scope.
Below is the syntax of the INTERFACE pragma:
Syntax:
#pragma HLS interface <mode> port=<name> bundle=<string> \
register register_mode=<mode> depth=<int> offset=<string> \
clock=<string> name=<string> \
num_read_outstanding=<int> num_write_outstanding=<int> \
max_read_burst_length=<int> max_write_burst_length=<int>
To understand each parameter in the syntax, please read the details in the following references:
Pages 102 to 107 of the Vivado HLS Optimization Methodology Guide.
From page 438 onwards of the Vivado Design Suite User Guide: High-Level Synthesis, about set_directive_interface.
pragma HLS resource:
Specify that a specific library resource (core) is used to implement a variable (array, arithmetic operation or function argument) in the RTL. If the RESOURCE pragma is not specified, Vivado HLS determines the resource to use.
#pragma HLS resource variable=<variable> core=<core>\
latency=<int>
To understand each parameter in the syntax, please read the details in the following references:
Pages 120 to 121 of the Vivado HLS Optimization Methodology Guide, about pragma HLS resource.
Pages 452 and 453 of the Vivado Design Suite User Guide: High-Level Synthesis, about set_directive_resource.
Your solution:
Updated: I am posting this solution again after running the code myself in Vivado HLS, with more clarification:
void nco (ap_fixed<16,2> *sine_sample, ap_ufixed<16,12> step_size) {
#pragma HLS RESOURCE variable=sine_sample core=AXI4LiteS
#pragma HLS RESOURCE variable=step_size core=AXI4LiteS
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS RESOURCE variable=return core=AXI4LiteS
// Define the pcore interface and group into AXI4 slave "slv0"
/* Value to hold the current address value of the sine LUT
* 12-bit unsigned fixed-point, all integer bits.
* Overflow is set to "wrap around" by default. */
ap_ufixed<12,12> address;
temp+=step_size; // Accumulator. Values will wrap around on overflow.
address = ap_ufixed<12,12>(temp); // Cast address to a 12-bit integer value.
*sine_sample = sine_lut[(int)address]; // Assign sine sample from LUT based on current address
}
On some Vivado HLS versions the above solution may give a warning like the following:
WARNING: [HLS 200-41] Resource core 'AXI4LiteS' on port '&sine_sample' is deprecated. Please use the interface directive to specify the AXI interface.
Hence, we use the INTERFACE directive for all variables as follows, which gives no warnings:
void nco (ap_fixed<16,2> *sine_sample, ap_ufixed<16,12> step_size) {
#pragma HLS INTERFACE s_axilite port=sine_sample
#pragma HLS INTERFACE s_axilite port=step_size
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE ap_ctrl_none port=return
// Define the pcore interface and group into AXI4 slave "slv0"
/* Value to hold the current address value of the sine LUT
* 12-bit unsigned fixed-point, all integer bits.
* Overflow is set to "wrap around" by default. */
ap_ufixed<12,12> address;
temp+=step_size; // Accumulator. Values will wrap around on overflow.
address = ap_ufixed<12,12>(temp); // Cast address to a 12-bit integer value.
*sine_sample = sine_lut[(int)address]; // Assign sine sample from LUT based on current address
}
I'll try to give you a more pedagogical answer, rather than a specific one to your problem.
The idea of Xilinx HLS-specific pragmas is to guide the HLS "compiler" in generating a hardware component close to the designer intentions.
As such, there exist a lot of those HLS pragmas and the Xilinx Vitis-HLS User Guide is full of references to look up to.
In practice, where you place a pragma in your code depends on which pragma it is (after all, pragmas are lines of code like any other).
For instance, in your code snippet:
void nco (ap_fixed<16,2> *sine_sample, ap_ufixed<16,12> step_size){
#pragma HLS INTERFACE port=sine_sample ...
#pragma HLS INTERFACE port=step_size ...
ap_ufixed<12,12> address;
static ap_ufixed<12,12> temp = 0;
#pragma HLS RESOURCE variable=temp ...
temp += step_size; // Accumulator. Values will wrap around on overflow.
address = ap_ufixed<12,12>(temp); // Cast address to a 12-bit integer value.
*sine_sample = sine_lut[(int)address]; // Assign sine sample from LUT based on current address
}
The INTERFACE pragma can be placed anywhere within your function, since it refers to the function arguments and they are available everywhere in the function scope. The RESOURCE pragma, on the other hand, must be placed after the definition of the variable it is bound to, i.e. temp.
Another example where pragma order matters even more is in loops and nested loops. For example, imagine placing the pragma PIPELINE in these three different positions:
#pragma HLS PIPELINE II=1 // Applies the pipeline to the whole function and fully unrolls both the N-loop and the M-loop
for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1 // Applies the pipeline to the N-loop and fully unrolls the M-loop
for (int j = 0; j < M; ++j) {
#pragma HLS PIPELINE II=1 // Applies the pipeline to the M-loop only; both loops stay rolled
// ...
}
}
Note: in the example, imagine only one of the pragmas being active at a time, with the other two commented out. I'm putting them together just for the sake of the example (having all three active wouldn't make much sense, and I believe the function-level pragma would simply take over the other ones).
In general, I would recommend you to carefully read the documentation of each pragma you intend to use and how and where to place it in your code.
Good luck!
Presently I'm working on updating a Windows 11 DX12 desktop app to take advantage of the technologies introduced by DX12 Ultimate (i.e. mesh shaders, VRS & DXR).
All the official samples for Ultimate compile and run on my machine (Core i9/RTX3070 laptop), so as a first step I wish to begin migrating as much static (i.e. unskinned) geometry as possible from the conventional (IA-vertex shader) rendering pipeline to the Amplification->Mesh shader pipeline.
I'm naturally using code from the official samples to facilitate this, and in the process I've encountered a very strange issue which only triggers in my app, but not in the compiled source project.
The specific problem relates to setting up meshlet instancing culling & dynamic LOD selection. When setting descriptors into the mesh shader SRV heap, my app was failing to create a CBV:
// Mesh Info Buffers
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc{};
cbvDesc.BufferLocation = m.MeshInfoResource->GetGPUVirtualAddress();
cbvDesc.SizeInBytes = MeshletUtils::GetAlignedSize<UINT>(sizeof(MeshInfo)); // 256 bytes which is correct
device->CreateConstantBufferView(&cbvDesc, OffsetHandle(i)); // generates error
A CBV into the descriptor range couldn't be generated because the resource's GPU address range was created with only 16 bytes:
D3D12 ERROR: ID3D12Device::CreateConstantBufferView:
pDesc->BufferLocation + SizeInBytes - 1 (0x0000000008c1f0ff) exceeds
end of the virtual address range of Resource
(0x000001BD88FE1BF0:'MeshInfoResource', GPU VA Range:
0x0000000008c1f000 - 0x0000000008c1f00f). [ STATE_CREATION ERROR
#649: CREATE_CONSTANT_BUFFER_VIEW_INVALID_RESOURCE]
What made this frustrating was the code is identical to the official sample, but the sample was compiling without issue. But after many hours of trying dumb things, I finally decided to examine the size of the MeshInfo structure, and therein lay the solution.
The MeshInfo struct is defined in the sample's Model class as:
struct MeshInfo
{
uint32_t IndexSize;
uint32_t MeshletCount;
uint32_t LastMeshletVertCount;
uint32_t LastMeshletPrimCount;
};
It is 16 bytes in size, and passed to the resource's description prior to its creation:
auto meshInfoDesc = CD3DX12_RESOURCE_DESC::Buffer(sizeof(MeshInfo));
ThrowIfFailed(device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &meshInfoDesc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&m.MeshInfoResource)));
SetDebugObjectName(m.MeshInfoResource.Get(), L"MeshInfoResource");
But clearly I needed a 256 byte range to conform with D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, so I changed meshInfoDesc to:
auto meshInfoDesc = CD3DX12_RESOURCE_DESC::Buffer(sizeof(MeshInfo) * 16u);
And the project now runs without the error.
So my question is: why isn't this GPU virtual address error also occurring in the sample?
PS: It was necessary to rename Model.h/Model.cpp to MeshletModel.h/MeshletModel.cpp for use in my project, which is based on the DirectX Tool Kit framework, where Model.h/Model.cpp files already exist for the DXTK rigid body animation effect.
The solution was explained in the question, so I will summarize it here as the answer to this post.
When creating a constant buffer view on a D3D12 resource, make sure to allocate enough memory to the resource upon creation.
This should be at least 256 bytes to satisfy D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
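As a rough illustration of the sizing rule, the helper below rounds a structure size up to the next multiple of 256. It is a minimal sketch in plain C: the name aligned_cb_size and the CB_ALIGNMENT macro are made up for this example (the macro stands in for D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT from d3d12.h), and it is not taken from the sample.

```c
#include <stddef.h>

/* Stand-in for D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT (256). */
#define CB_ALIGNMENT 256u

/* Round size up to the next multiple of CB_ALIGNMENT.
 * Relies on CB_ALIGNMENT being a power of two. */
static size_t aligned_cb_size(size_t size)
{
    return (size + (CB_ALIGNMENT - 1u)) & ~(size_t)(CB_ALIGNMENT - 1u);
}
```

For the 16-byte MeshInfo this gives 256, the same value the question passes as cbvDesc.SizeInBytes. Using one such value both for the buffer description and for the view keeps the view inside the resource's GPU virtual address range.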
I still don't know why the sample code on GitHub could compile without this requirement. Without having delved into the sample's project configuration in detail, it's possible that D3D12 debug layer errors are being dealt with differently, but that's purely speculative.
My group and I are trying to create a synthesizer out of a DE2-115 board for our undergraduate capstone project.
The only thing we can't figure out is how to properly output the frequencies mapped to the "keys" through the on-board audio port. We've scoured the web and all the provided documentation, including the datasheets for the codec, but we can't figure out how to get it to work properly in VHDL.
Has anyone ever worked with outputting audio through the WM8731 and if so, would they be willing to help us out?
I did that some years ago, wasn't too hard, but I used a NIOS processor with SOPC builder.
I used the Altera University Program IP cores available here.
These cores provide different functionality related to the DE2 and possibly other Altera-sponsored boards.
According to my logs, I used 3 of these cores to make audio work.
The altera_up_avalon_audio_and_video_config, which is used to configure the audio CODEC chip at initialization.
The second IP provides the data in/out interface with the audio chip: altera_up_avalon_audio.
The last one is altera_up_avalon_clocks. I can't remember exactly what it does, but as the name implies, it's necessary for clocking the audio chip. I think it takes an input clock and uses a PLL to generate the right clock for the CODEC.
As I said, I used a NIOS processor. Still according to my logs, the C code I used is:
void audio_isr(void* context, alt_u32 id)
{
const int len = 2682358;
static signed char *ptr = test_snd;
unsigned int x[128];
alt_up_audio_dev *audio_dev = (alt_up_audio_dev *)context;
unsigned int n = alt_up_audio_write_fifo_space(audio_dev, ALT_UP_AUDIO_RIGHT);
for(unsigned int i = 0; i < n; i++) {
x[i] = 0x800000 + ((int)*ptr++) << 9;
if (ptr > test_snd+len) {
ptr = test_snd;
printf("Done\n");
}
}
alt_up_audio_write_fifo(audio_dev, x, n, ALT_UP_AUDIO_RIGHT);
alt_up_audio_write_fifo(audio_dev, x, n, ALT_UP_AUDIO_LEFT);
}
static void audio_init(void)
{
alt_up_audio_dev *audio_dev = alt_up_audio_open_dev (AUDIO_0_NAME);
if ( audio_dev == NULL)
printf ("Error: could not open audio device \n");
else
printf ("Opened audio device \n");
alt_up_audio_reset_audio_core(audio_dev);
alt_up_audio_disable_write_interrupt(audio_dev);
alt_up_audio_disable_read_interrupt(audio_dev);
alt_irq_register(AUDIO_0_IRQ, (void *)audio_dev, audio_isr);
alt_up_audio_enable_write_interrupt(audio_dev);
}
I don't remember how well that worked. Well enough to deserve a commit, but it was still a test, so don't give it too much importance. My final code was way too complicated to present here.
Hopefully, this is enough to get you started on the right track, which is to use Altera's IP. These IP are clear-source AFAIR, so if you don't want the NIOS, it should be simpler to start from their source than from scratch.
You will probably need three modules: a clock generator, audio configuration, and an audio serializer/deserializer. You don't need a NIOS II based design. Please check the Altera lab experiment to understand how it works.
experiment link - https://www.altera.com/support/training/university/materials-lab-exercises.html#Digital-Logic-Exercises
pdf link - ftp://ftp.altera.com/up/pub/Altera_Material/Laboratory_Exercises/Digital_Logic/DE2-115/vhdl/lab12_VHDL.pdf.
Also check the demo files.
In Linux device driver development, the file_operations structure uses struct module *owner.
What is the use of this structure when we always initialize it with THIS_MODULE?
When can this field be set to NULL?
This field tells who the owner of the struct file_operations is. This prevents the module from being unloaded while it is in operation. When initialized with THIS_MODULE, the current module holds the ownership of it.
Minimal runnable example
Whenever you create a kernel module, the kernel's build machinery generates a struct module object for you, and makes THIS_MODULE point to it.
This struct contains many fields, some of which can be set with module macros such as MODULE_VERSION.
This example shows how to access that information: module_info.c:
#include <linux/module.h>
#include <linux/kernel.h>
static int myinit(void)
{
/* Set by default based on the module file name. */
pr_info("name = %s\n", THIS_MODULE->name);
pr_info("version = %s\n", THIS_MODULE->version);
return 0;
}
static void myexit(void) {}
module_init(myinit)
module_exit(myexit)
MODULE_VERSION("1.0");
MODULE_LICENSE("GPL");
Dmesg outputs:
name = module_info
version = 1.0
Some MODULE_INFO fields can also be "accessed" in the following ways:
cat /sys/module/module_info/version
modinfo /module_info.ko | grep -E '^version:'
Since the address of that struct module object must be unique across all modules, it serves as a good argument for fops.owner as mentioned at: https://stackoverflow.com/a/19468893/895245. Here is a minimal example of that usage.
Tested in Linux kernel 4.16 with this QEMU + Buildroot setup.
[1] struct module *owner is commonly used in some structures and is not an operation at all; it is a pointer to the module that "owns" the structure. This field is used to prevent the module from being unloaded while its operations are in use. Almost all the time, it is simply initialized to THIS_MODULE, a macro defined in <linux/module.h>.
[2] I would not recommend setting it to NULL, because it may lead to driver malfunction and other problems. Instead, follow good Linux kernel development practices.
In some architectures the ".owner" was removed, so make sure your distro and architecture are still using it.
I hope it helps your understanding.
References: LDD3, kernel newbies.
file_operations is one of the main structures used to connect device numbers to the file operations of a driver.
There are lots of function pointers in the structure. The first field is struct module *owner, which is not a function pointer at all but points to a struct module, defined in <linux/module.h>.
When initialized to THIS_MODULE, it records the current module as the owner.
One of the main reasons to initialize struct module *owner to THIS_MODULE is to prevent the module from being unloaded while in use.
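The "prevent unloading while in use" idea can be pictured with a small user-space toy model. This is only a sketch: toy_module, toy_fops and the toy_* functions are invented for illustration and are not kernel APIs. In the real kernel the corresponding reference counting is done with try_module_get() and module_put() on the owner.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct module; only a use count is modelled. */
struct toy_module {
    const char *name;
    int refcount;
};

/* Toy stand-in for struct file_operations; only the owner field is modelled. */
struct toy_fops {
    struct toy_module *owner;
};

/* Opening a file pins the owning module, if an owner is set. */
static void toy_open(const struct toy_fops *fops)
{
    if (fops->owner != NULL)
        fops->owner->refcount++;
}

/* Closing the file drops the reference again. */
static void toy_release(const struct toy_fops *fops)
{
    if (fops->owner != NULL && fops->owner->refcount > 0)
        fops->owner->refcount--;
}

/* Unloading is refused while anything still holds a reference. */
static bool toy_can_unload(const struct toy_module *mod)
{
    return mod->refcount == 0;
}
```

In this model, a toy_fops whose owner is NULL never pins anything, so the module could be removed while a file is still open, which is the situation the owner field exists to avoid.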
I am interested in writing emulators for the Game Boy and other handheld consoles, and I read that the first step is to emulate the instruction set. I found a link here that suggests beginners emulate the Commodore 64's 8-bit microprocessor; the thing is, I don't know a thing about emulating instruction sets. I know the MIPS instruction set, so I think I can manage understanding other instruction sets, but what is meant by emulating them?
NOTE: If someone can provide me with a step-by-step guide to instruction set emulation for beginners, I would really appreciate it.
NOTE #2: I am planning to write in C.
NOTE #3: This is my first attempt at learning the whole emulation thing.
Thanks
EDIT: I found this site that is a detailed step-by-step guide to writing an emulator which seems promising. I'll start reading it, and hope it helps other people who are looking into writing emulators too.
Emulator 101
An instruction set emulator is a software program that reads binary data from a software device and carries out the instructions that data contains as if it were a physical microprocessor accessing physical data.
The Commodore 64 used a 6502 Microprocessor. I wrote an emulator for this processor once. The first thing you need to do is read the datasheets on the processor and learn about its behavior. What sort of opcodes does it have, what about memory addressing, method of IO. What are its registers? How does it start executing? These are all questions you need to be able to answer before you can write an emulator.
Here is a general overview of how it would look like in C (Not 100% accurate):
uint8_t RAM[65536]; //Declare a memory buffer for emulated RAM (64k)
uint16_t A; //Declare Accumulator
uint16_t X; //Declare X register
uint16_t Y; //Declare Y register
uint16_t PC = 0; //Declare Program counter, start executing at address 0
uint16_t FLAGS = 0; //Start with all flags cleared
//Return 1 if the carry flag is set 0 otherwise, in this example, the 3rd bit is
//the carry flag (not true for actual 6502)
#define CARRY_FLAG(flags) ((0x4 & (flags)) >> 2)
#define ADC 0x69
#define LDA 0xA9
#define ADC_SIZE 2 //Opcode byte plus one immediate operand byte
#define LDA_SIZE 2 //Opcode byte plus one immediate operand byte
while (executing) {
    switch(RAM[PC]) { //Grab the opcode at the program counter
    case ADC: //Add with carry
        A = A + RAM[PC+1] + CARRY_FLAG(FLAGS);
        UpdateFlags(A);
        PC += ADC_SIZE;
        break;
    case LDA: //Load accumulator
        A = RAM[PC+1];
        UpdateFlags(A);
        PC += LDA_SIZE;
        break;
    default:
        //Invalid opcode! Stop here rather than loop on the same byte forever
        executing = 0;
        break;
    }
}
According to this reference, ADC actually has 8 opcodes on the 6502 processor, which means you will have 8 different ADC cases in your switch statement, one for each opcode and memory addressing scheme. You will have to deal with endianness and byte order, and of course pointers. I would get a solid understanding of pointers and type casting in C if you don't already have one. To manipulate the flags register you need a solid understanding of bitwise operations in C. If you are clever you can make use of C macros and even function pointers to save yourself some work, as in the CARRY_FLAG example above.
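To give a feel for the bitwise work involved, here is a minimal sketch of an 8-bit add-with-carry that updates the carry, zero, overflow and negative flags. The bit positions follow the real 6502 status register layout (C = bit 0, Z = bit 1, V = bit 6, N = bit 7), unlike the simplified CARRY_FLAG macro above. The function name adc is chosen for this example, and decimal (BCD) mode is deliberately ignored.

```c
#include <stdint.h>

#define FLAG_C 0x01u  /* carry */
#define FLAG_Z 0x02u  /* zero */
#define FLAG_V 0x40u  /* signed overflow */
#define FLAG_N 0x80u  /* negative (bit 7 of the result) */

/* Add m and the incoming carry to a; update *flags; return the 8-bit result.
 * Binary mode only: decimal mode is not modelled in this sketch. */
static uint8_t adc(uint8_t a, uint8_t m, uint8_t *flags)
{
    unsigned sum = (unsigned)a + m + (*flags & FLAG_C);
    uint8_t result = (uint8_t)sum;
    uint8_t f = *flags & (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);

    if (sum > 0xFFu)
        f |= FLAG_C;                 /* carry out of bit 7 */
    if (result == 0)
        f |= FLAG_Z;
    if (~(a ^ m) & (a ^ result) & 0x80u)
        f |= FLAG_V;                 /* operands share a sign, result differs */
    if (result & 0x80u)
        f |= FLAG_N;

    *flags = f;
    return result;
}
```

A helper like this can be shared by all the ADC cases in the switch, with each case only differing in how it fetches the operand m.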
Every time you execute an instruction, you must advance the program counter by the size of that instruction, which is different for each opcode. Some opcodes don't take any arguments, so their size is just 1 byte, while others take an operand, such as the immediate byte in the LDA example above or a 16-bit address. All this should be pretty well documented.
Branch instructions (BEQ, BNE, etc. on the 6502) are simple: if some flag is set in the flags register, then load the PC with the target address. This is how "decisions" are made in a microprocessor, and emulating them is simply a matter of changing the PC, just as the real microprocessor would do.
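On the 6502 specifically, the conditional branches (BEQ, BNE and friends) take a signed 8-bit offset relative to the address of the following instruction, rather than an absolute address. A minimal sketch for two of them is shown below; the function name exec_branch is invented for this example, and page-crossing cycle counts are not modelled.

```c
#include <stdint.h>

#define OP_BNE 0xD0u  /* branch if the zero flag is clear */
#define OP_BEQ 0xF0u  /* branch if the zero flag is set */
#define FLAG_Z 0x02u  /* zero flag: bit 1 of the 6502 status register */

/* Execute the BEQ or BNE found at pc and return the new program counter. */
static uint16_t exec_branch(const uint8_t *ram, uint16_t pc, uint8_t flags)
{
    uint8_t opcode = ram[pc];
    uint8_t raw = ram[(uint16_t)(pc + 1)];
    /* Interpret the operand byte as a two's-complement offset. */
    int offset = (raw < 0x80) ? raw : raw - 0x100;
    /* Offsets are relative to the instruction after the 2-byte branch. */
    uint16_t next = (uint16_t)(pc + 2);
    int zero = (flags & FLAG_Z) != 0;
    int taken = (opcode == OP_BEQ && zero) || (opcode == OP_BNE && !zero);

    return taken ? (uint16_t)(next + offset) : next;
}
```

When the branch is not taken, the program counter simply moves past the two-byte instruction, exactly as for any other opcode.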
The hardest part about writing an instruction set emulator is debugging. How do you know if everything is working like it should? There are plenty of resources for helping you. People have written test codes that will help you debug every instruction. You can execute them one instruction at a time and compare the reference output. If something is different, you know you have a bug somewhere and can fix it.
This should be enough to get you started. The important thing is that you have A) A good solid understanding of the instruction set you want to emulate and B) a solid understanding of low level data manipulation in C, including type casting, pointers, bitwise operations, byte order, etc.
I would like to write a VPI/PLI interface which will open audio files (i.e. wav, aiff, etc)
and present the data to Verilog simulator. I am using Icarus at the moment and wish to
use libsndfile to handle input file formats and data type conversion.
I am not quite sure what to use in the C code ... have looked at IEEE 1364-2001 and still
confused which functions am I supposed to use.
Ideally I'd like to have a Verilog module with a data port (serial or parallel), a clock input
and a start/stop pin. I'd like to implement two modules, one for playback from a file, and another to record output from a filter under test.
Can I do it all in C and just instantiate the module in my testbench, or will I have to write
a function (say $read_audio_data) and a wrapper module to call it on each clock pulse?
Or maybe I need to create the module, then get a handle for it and pass a value/vector
to the handle somehow?
I am not quite concerned about how file names will be set, as I probably
wouldn't do it from the verilog code anyway.
And I will probably stick to 24-bit integer samples for the time being and
libsndfile supposed to handle conversion quite nicely.
Perhaps, I'll stick to serial for now (may be even do in the I2S-like fashion) and
de-serialise it in Verilog if needed.
I have also looked at the Icarus plug-in which implements a video camera that reads PNG files,
though there are many more aspects to image processing than there are to audio.
Hence that code looks a bit overcomplicated to me at the moment, and I didn't manage to get
it to run either.
I suggest approaching it like this:
figure out your C/Verilog interface
implement the audio file access with that interface in mind, but not worrying about VPI
implement the C/Verilog glue using VPI
The interface can probably be pretty simple: one function to open the audio file and specify any necessary parameters (sample size, big/little endian, etc.), and another function that returns the next sample. If you need to support reading from multiple files in the same simulation, you'll need to pass some sort of handle to the PLI functions to identify which file you're reading from.
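The "open" and "next sample" pair might look roughly like the sketch below, written without any VPI so it can be tested on its own. It is only an illustration: the names sample_source, source_open and source_next are hypothetical, and an in-memory buffer of packed 24-bit little-endian PCM stands in for the audio file. A libsndfile-backed version would replace that buffer, and the struct doubles as the per-file handle mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file handle: packed 24-bit little-endian PCM in memory. */
typedef struct {
    const uint8_t *bytes;  /* raw sample data */
    size_t length;         /* size of the data in bytes */
    size_t pos;            /* current read offset in bytes */
} sample_source;

/* "Open": bind the handle to a buffer and rewind it. */
static void source_open(sample_source *src, const uint8_t *bytes, size_t length)
{
    src->bytes = bytes;
    src->length = length;
    src->pos = 0;
}

/* "Next sample": store the next sample, sign-extended to 32 bits, in *out.
 * Returns 1 on success and 0 once fewer than 3 bytes remain. */
static int source_next(sample_source *src, int32_t *out)
{
    if (src->length - src->pos < 3)
        return 0;

    const uint8_t *p = src->bytes + src->pos;
    uint32_t raw = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);

    /* Bit 23 is the sign bit of a 24-bit sample. */
    *out = (raw & 0x800000u) ? (int32_t)raw - 0x1000000 : (int32_t)raw;
    src->pos += 3;
    return 1;
}
```

The VPI glue for something like $ReadSample would then only need to look up the right handle, call source_next, and write the result back to the simulator.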
The Verilog usage could be as simple as:
initial $OpenAudioFile ("filename");
always @(posedge clk)
audio_data <= $ReadSample;
The image-vpi sample looks like a reasonable example to start from. The basic idioms to use in the C code are:
Argument access
// Get a handle to the system task/function call that invoked your PLI routine
vpiHandle tf_obj = vpi_handle (vpiSysTfCall, NULL);
// Get an iterator for the arguments to your PLI routine
vpiHandle arg_iter = vpi_iterate (vpiArgument, tf_obj);
// Iterate through the arguments
vpiHandle arg_obj;
arg_obj = vpi_scan (arg_iter);
// do something with the first argument
arg_obj = vpi_scan (arg_iter);
// do something with the second argument
Retrieving values from Verilog
s_vpi_value v;
v.format = vpiIntVal;
vpi_get_value (handle, &v);
// value is in v.value.integer
Writing values to Verilog
s_vpi_value v;
v.format = vpiIntVal;
v.value.integer = 0x1234;
vpi_put_value (handle, &v, NULL, vpiNoDelay);
To read or write values larger than 32 bits, you will need to use vpiVectorVal instead of vpiIntVal, and de/encode an s_vpi_vecval structure.
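As a rough sketch of what that encoding involves: a vector value is passed as an array of aval/bval word pairs, 32 bits per pair with the least significant word first, where bval = 0 marks plain 0/1 bits and a set bval bit marks X or Z. The code below uses a local stand-in struct (vecval_word) instead of the real s_vpi_vecval so that it compiles without vpi_user.h; the helper names are invented for this example.

```c
#include <stdint.h>

/* Local stand-in for the aval/bval pair of s_vpi_vecval in vpi_user.h. */
typedef struct {
    uint32_t aval;
    uint32_t bval;
} vecval_word;

/* Encode a two-state value of up to 64 bits into two words, LSW first. */
static void encode_u64(uint64_t value, vecval_word out[2])
{
    for (int i = 0; i < 2; i++) {
        out[i].aval = (uint32_t)(value >> (32 * i));
        out[i].bval = 0;  /* no X or Z bits */
    }
}

/* Decode two words back to a 64-bit value.
 * Returns 0 if any bit is X or Z (bval set), 1 otherwise. */
static int decode_u64(const vecval_word in[2], uint64_t *value)
{
    if (in[0].bval != 0 || in[1].bval != 0)
        return 0;
    *value = ((uint64_t)in[1].aval << 32) | in[0].aval;
    return 1;
}
```

With the real header, the format field would be set to vpiVectorVal and value.vector would point at such an array before calling vpi_put_value or after calling vpi_get_value.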
I have spent a few days now implementing the PLI testbench. In case anyone reads this and finds it useful, here is my source code. There is a readme file, and below is a screenshot of some basic results ;)
Use git clone git://github.com/errordeveloper/sftb to obtain the code repo, or download it from github.com.
I have also written about this in my new blog, so hopefully anyone who searches for this sort of thing will find it. I couldn't find anything similar, hence I started this project!
This sounds like a good fit for Cocotb, an open-source project which abstracts VPI to provide a Pythonic interface to your DUT. You wouldn't have to write any additional Verilog testbench or wrapper RTL, or call VPI tasks or functions from Verilog, as the testbenches are pure Python.
Your testbench as described would look something like this:
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge
# Whatever audio-file IO library you happen to like best...
from scikits.audiolab import wavread
@cocotb.test()
def stream_file(dut, fname="testfile.wav"):
    # Start a clock generator
    cocotb.fork(Clock(dut.clk, 5000).start())
    data, sample_frequency, encoding = wavread(fname)
    # Assumption: convert to a plain list so the emptiness test and pop(0) work
    data = list(data)
    result = []
    while data:
        yield RisingEdge(dut.clk)
        dut.data_in <= data.pop(0)
        result.append(dut.data_out.value.integer)
    # Write result to output file
Disclaimer: I'm one of the Cocotb developers and thus potentially biased, however I'd also challenge anybody to produce functionality similar to the above testbench as quickly and with fewer lines of (maintainable) code.