WCETK: How to interpret test results - windows-ce

doing a graphic performance test with CETK (direct draw performance, gdi performance).
Can someone explain how to interpret the test results?
In detail, what do Rgn Count, SampleCount, Min, Max, Std Deviation etc. mean?
Is there some documentation on these available?
Thanks in advance.
Regards, Timm
Here, some example results:
Rgn Count=0
SampleCount=10
Min=0.000000
Max=1000.00000
Mean=700.000000
Std Deviation=577.350269
CV%=82.478610

I hope you are familiar with Filters and Pins.
Sample Count:
When a pin delivers media data to another pin, it does not pass a direct pointer to the memory buffer. Instead, it delivers a pointer to a COM object that manages the memory. This object, called a media sample. An allocator creates a finite pool of samples. The allocator uses reference counting to keep track of the samples. The GetBuffer method returns a sample with a reference count of 1. If the reference count goes to zero, the sample goes back into the allocator's pool, where it can be used in the next GetBuffer call. This is somewhat similar to counting semaphores. While a filter is using a buffer, it holds reference count on the sample. The allocator uses the reference count to determine when it can re-use the buffer. This prevents a filter from overwriting a buffer that another filter is still using. A sample does not return to the allocator's pool of available samples until every filter has released it.
Standard Deviation:
The number of times the benchmark engine samples a particular test case depends on the standard deviation of an initial sample.
I'm not sure about the remaining terms. Maybe, this link could shed some light.

Related

Calculation dynamic delay in AnyLogic

Good day!
Please, help me understand how the Delay block works in AnyLogic. Suppose we deal with a multichannel transmission network.
The model has 2 sources. Suppose these sources generate packets every 1 sec. Packets from different sources have different priorities and need different quantities of resources to be served (it is set up with Priority and Resource_quantity parameters respectively). The Priority_queue in the model is priority-based. The proposed model put the packets into the channels in accordance with Resource availability in the channel. Firstly, it tries to put the packet to the first channel. If there are no available resources it puts the packet into the second channel. If there are no resources in both channels it waits (it is realized with Hold block).
I noticed that if I set delays in blocks delay1 and delay2 with static parameters (for ex. 2 sec) the model works ok. But then I try to calculate it before these blocks the model doesn't take into consideration it at all. And in this case, the model works without any delays.
What did I do wrong here?
I will appreciate any help.
The delay is calculated in Exit block and is written into the variable delay of the agent. I tried to add traceln(agent.delay) as #Jaco-Ben suggested right after calculation of the delay and it showed zero. In this case it doesn't also seize resources :(
Thank #Jaco-Ben for the useful comments.
The delay is zero because
the result of division in Java depends on the types of the operands.
If both operands are integer, the result will be integer as well. To
let Java perform real division (and get a real number as a result) at
least one of the operands must be of real type.
So it was my problem.
To solve it I assigned double to one of the operands :
agent.delay = (double)agent.Resource_quantity/ChannelResources1.idle();
However, it is strange why it shows correct values in the database.

Why GBuffers need to be created for each frame in D3D12?

I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithread example and don't understand why the shadow map (generated in the first pass as a DSV, consumed in the second pass as SRV) is created for each frame (actually only 2 copies, as the FrameResource is reused every 2 frames).
The code that creates the shadow map resource is here, in the FrameResource class, instances of which is created here.
There is actually another resource that is created for each frame, the constant buffer. I kind of understand the constant buffer. Because it is written by CPU (D3D11 dynamic usage) and need to remain unchanged until the GPU finish using it, so there need to be 2 copies. However, I don't understand why the shadow map needs to do the same, because it is only modified by GPU (D3D11 default usage), and there are fence commands to separate reading and writing to that texture anyway. As long as the GPU follows the fence, a single texture should be enough for the GPU to work correctly. Where am I wrong?
Thanks in advance.
EDIT
According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".
The key issue is that you don't want to stall the GPU for best performance. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.
FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.
Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better than it's likely not noticeable.
With all that said, typically you don't need to double-buffered depth/stencil buffers for rendering, although if you wanted to make the initial write of the depth-buffer overlap with the read of the previous depth-buffer passes then you would want distinct buffers per frame for performance and correctness.
The 'writes' are all complete before the 'reads' in DX12 because of the injection of the 'Resource Barrier' into the command-list:
void FrameResource::SwapBarriers()
{
// Transition the shadow map from writeable to readable.
m_commandLists[CommandListMid]->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_DEPTH_WRITE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
}
void FrameResource::Finish()
{
m_commandLists[CommandListPost]->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_DEPTH_WRITE));
}
Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so it may be just an artifact of convenience to have two shadow buffers instead of just one.

What exactly does ALSA's snd_pcm_delay() return?

I want to use snd_pcm_delay() to query the delay until the sample I am about write to the ALSA buffer are hearable. I expect this value to vary between individual calls. Though, on two system this value is constant. The function returns a value that is always equal to the period size on one platform and on the other platform it is equal to the buffer size (two times the period size in my code).
Is my understanding of snd_pcm_delay() wrong? Is it a driver problem?
The delay is proportional to the number of samples in the buffer (the inverse of snd_pcm_avail()), plus a time that describes how much time is needed to move samples from the buffer to the speakers. The latter part is driver dependent and might not be implemented.
If the device takes out samples one entire period at a time (some DMA controllers have no better granularity for reporting the current position), then the delay value will appear to stay constant for a time, and then jump by an entire period. And you see that jump only before you have re-filled the buffer.

Variable length messages in Verilog (serial CRC-32)

I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with organization of data before calculation. As specified here , the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's, invert of completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
Since X is a viarable, you will need to bit assignments with a for-loop. The for-loop needs to be inside an always block and the for-loop must static unroll (ie the starting index, ending index, and step value must be constants).
for(i=0; i<32; i=i+1) begin
if (i<X)
newregister[i] <= oldregister[i];
else
newregister[i] <= 1'b0; // pad zeros
end

explain me a difference of how MRTG measures incoming data

Everyone knows that MRTG needs at least one value to be passed on it's input.
In per-target options MRTG has 'gauge', 'absolute' and default (with no options) behavior of 'what to do with incoming data'. Or, how to count it.
Lets look at the elementary, yet popular example :
We pass cumulative data from network interface statistics of 'how much packets were recieved by the interface'.
We take it from '/proc/net/dev' or look at 'ifconfig' output for certain network interface. The number of recieved bytes is increasing every time. Its cumulative.
So as i can imagine there could be two types of possible statistics:
1. How fast this value changes upon the time interval. In oher words - activity.
2. Simple, as-is growing graphic that just draw every new value per every minute (or any other time interwal)
First graphic will be saltatory (activity). Second will just grow up every time.
I read twice rrdtool's and MRTG's docs and can't understand which option mentioned above counts what.
I suppose (i am not sure) that 'gauge' draw values as is, without any differentiation calculations (good for measuring how much memory or cpu is used every 5 minutes). And default or 'absolute' behavior tryes to calculate the speed between nearby measures, but what's the differencr between last two?
Can you, guys, explain in a simple manner which behavior stands after which option of three options possible?
Thanks in advance.
MRTG assumes that everything is being measured as a rate (even if it isnt a rate)
Type 'gauge' assumes that you have already calculated the rate; thus, the provided value is stored as-is (after Data Normalisation). This is appropriate for things like CPU usage.
Type 'absolute' assumes the value passed is the count since the last update. Thus, the value is divided by the number of seconds since the last update to get a rate in thingies per second. This is rarely used, and only for certain unusual data sources that reset their value on being read - eg, a script that counts the number of lines in a log file, then truncates the log file.
Type 'counter' (the default) assumes the value passed is a constantly growing count, possibly that wraps around at 16 or 64 bits. The difference between the value and its previous value is divided by the number of seconds since the last update to get a rate in thingies per second. If it sees the value decrease, it will assume a counter wraparound at 16 or 64 bit. This is appropriate for something like network traffic counters, which is why it is the default behaviour (MRTG was originally written for network traffic graphs)
Type 'derive' is like 'counter', but will allow the counter to decrease (resulting in a negative rate). This is not possible directly in MRTG but you can manually create the necessary RRD if you want.
All types subsequently perform Data Normalisation to adjust the timestamp to a multiple of the Interval. This will be more noticeable for Gauge types where the value is small than for counter types where the value is large.
For information on this, see Alex van der Bogaerdt's excellent tutorial

Resources