Which one is better performance wise? - operation

Performance wise, which is better:
a. filesizeInMB = filesize / (1024 * 1024)
or
b. filesizeInMB = filesize / 1024 / 1024

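Division is comparatively expensive, so it is usually better to do less of it. Form a performs a single division (the constant 1024 * 1024 is folded at compile time), while form b performs two, so a would be better.
On the other hand, if filesize is an integer, a reasonably smart compiler can see that dividing by a power of two such as 1024 is equivalent to a bit shift. In that case both forms reduce to cheap shifts and any difference is unlikely to be measurable.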
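As a sanity check that the two forms agree, here is a minimal Python sketch. It assumes filesize is a non-negative integer and says nothing about speed in a compiled language; the function names are made up for illustration. It also shows the 20-bit shift that a compiler could substitute for the division.

```python
MIB = 1024 * 1024  # 2**20 bytes


def size_in_mb_a(filesize: int) -> int:
    """Form a: one integer division by the pre-multiplied constant."""
    return filesize // MIB


def size_in_mb_b(filesize: int) -> int:
    """Form b: two successive integer divisions by 1024."""
    return filesize // 1024 // 1024


def size_in_mb_shift(filesize: int) -> int:
    """Equivalent bit shift: dividing by 2**20 is a right shift by 20 bits."""
    return filesize >> 20


# All three agree for non-negative integer sizes.
for n in (0, 1, MIB - 1, MIB, 5 * MIB + 123):
    assert size_in_mb_a(n) == size_in_mb_b(n) == size_in_mb_shift(n)
```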

Related

Linux Huge pages memory usage calculation

I read an article about the Linux HugePages technology, and there is one important detail I don't understand.
Here is the phrase:
For example, if you use HugePages with 64-bit hardware, and you want
to map 256 MB of memory, you may need one page table entry (PTE). If
you do not use HugePages, and you want to map 256 MB of memory, then
you must have 256 MB * 1024 KB/4 KB = 65536 PTEs.
I don't understand what the 1024 KB in this formula is. I think it should be just 256 MB / 4 KB to calculate the number of page table entries. Is there a typo in the formula, or am I wrong?
I agree that it is confusing. After reading it several times, I believe it is simply a matter of unit conversion. At school, the mathematics, physics and chemistry teachers always told us to use the same units when doing operations in order to obtain coherent results.
The value 256 is expressed in megabytes (MB). To divide it by 4 kilobytes (KB), you first need to convert it into kilobytes, hence the multiplication by 1024 KB (= 1 MB). So the operation is literally (256 x 1024) / 4 = 65536, which is the simplification of (256 x 1024 x 1024) / (4 x 1024).
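The unit conversion above can be checked by working in bytes throughout. This is only a sketch of the arithmetic; the helper name is hypothetical and it is not how the kernel computes anything.

```python
KIB = 1024
MIB = 1024 * KIB


def page_table_entries(mapping_bytes: int, page_bytes: int) -> int:
    """Number of pages (one PTE each) needed to map mapping_bytes, rounded up."""
    return -(-mapping_bytes // page_bytes)  # ceiling division


# 256 MB mapped with ordinary 4 KB pages.
entries = page_table_entries(256 * MIB, 4 * KIB)
print(entries)  # 65536
```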

Multi threading in OpenCl

I have started working with OpenCL and have some basic knowledge of how work-groups and kernels work. Suppose I have a vector of size 1024, and the WorkGroupSize of my GPU is 256. So my VectorSize is a multiple of my WorkGroupSize, and this works pretty well as an example. But in real-world scenarios, the VectorSize is not always evenly divisible by the WorkGroupSize. So how do I deal with such problems? Is there any way to pass null values to make the VectorSize evenly divisible by the WorkGroupSize?
It is absolutely possible to pad input buffers to a whole multiple of the work-group size you select for your kernel. However, it often isn't practical, because you need an algorithm that can naturally handle uninitialized or extra invalid data without error.
A far simpler solution is just to pass the input buffer length as an argument and then enclose the calculation code in an if statement based on the thread index, something like:
__kernel void kernel(....., unsigned int N)
{
    unsigned int tid = get_global_id(0);
    if (tid < N) {
        /* kernel buffer access goes here */
    }
}
This doesn't cause a significant performance penalty, because the conditional evaluates uniformly across every work-group except the last one. You then round the number of work-groups you launch up to the next whole number (that is, round the global work size up to the next multiple of the work-group size) to ensure the whole input buffer is processed.
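The rounding step can be sketched on the host side as follows. This is plain arithmetic in Python, not OpenCL API code, and both function names are hypothetical. The second function only imitates the `tid < N` guard to show that the padded work-items do no work.

```python
def round_up_global_size(n_items: int, local_size: int) -> int:
    """Smallest multiple of local_size that is >= n_items."""
    groups = (n_items + local_size - 1) // local_size  # ceiling division
    return groups * local_size


def active_work_items(n_items: int, local_size: int) -> int:
    """Count the work-items that pass the `tid < N` guard in the kernel."""
    global_size = round_up_global_size(n_items, local_size)
    return sum(1 for tid in range(global_size) if tid < n_items)


print(round_up_global_size(1100, 256))  # 1280, i.e. 5 work-groups of 256
print(active_work_items(1100, 256))     # 1100
```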
You do not need to fill the work-group yourself: queueing a kernel with fewer than the maximum number of work-items per work-group is fine.
So, for example, if you have 1100 items, you could work in groups of:
[256, 256, 256, 256, 76] and this will run as fast as 5 groups of 256 (1280 items).
Obviously, if you run 6 smaller groups [200, 200, 200, 200, 200, 100], it will be slower.
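The group sizes quoted above follow directly from the item count and the chosen group size, as this small sketch shows (the helper name is made up). Whether a runtime accepts a partial final group directly depends on the OpenCL version and device, so treat this purely as arithmetic.

```python
def split_into_groups(n_items: int, group_size: int) -> list[int]:
    """Full groups of group_size, followed by one partial group if needed."""
    full, remainder = divmod(n_items, group_size)
    groups = [group_size] * full
    if remainder:
        groups.append(remainder)
    return groups


print(split_into_groups(1100, 256))  # [256, 256, 256, 256, 76]
print(split_into_groups(1100, 200))  # [200, 200, 200, 200, 200, 100]
```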

FFMPEG Understanding AVFrame::linesize (Audio)

As per the documentation of AVFrame, for audio, linesize is the size in bytes of each plane, and only linesize[0] may be set. However, I am unsure whether linesize[0] holds the per-plane buffer size, or whether it holds the complete buffer size and has to be divided by the number of channels to get the per-plane buffer size.
For example, when I call
int data_size = av_samples_get_buffer_size(NULL, iDesiredNoOfChannels, iAudioSamples, (AVSampleFormat)iDesiredFormat, 0);
with iDesiredNoOfChannels = 2, iAudioSamples = 1024 and iDesiredFormat = AV_SAMPLE_FMT_FLTP, I get data_size = 8192. That is straightforward: each sample is 4 bytes and there are 2 channels, so the total memory is (1024 * 4 * 2) bytes. As such, linesize[0] should be 4096 for planar audio, and data[0] and data[1] should each be of size 4096. However, pFrame->linesize[0] gives 8192, so to get the size per plane I have to compute pFrame->linesize[0] / pFrame->channels. Is this behaviour different from what the documentation suggests, or is my understanding of the documentation wrong?
Old question but thought I'd answer it anyway for people who may be wondering the same thing.
In all audio AVFrames, only linesize[0] may be set, and all planes are required to be the same size. You should not be using linesize[1] and so on. I don't know why it was designed this way, because it is not consistent with video frames. Just remember that, whether the audio is interleaved or planar, only linesize[0] matters, so if linesize[0] holds the full buffer size, as in the question, you have to divide it by the channel count to get the per-plane size.
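The sizes discussed in this thread can be reproduced with a little arithmetic. The sketch below is not FFmpeg API code: the table and helper names are hypothetical, and alignment padding is ignored (the question passes 0 for the alignment argument).

```python
# Bytes per sample for a few sample formats (hypothetical lookup table).
BYTES_PER_SAMPLE = {"s16": 2, "s16p": 2, "flt": 4, "fltp": 4}


def total_buffer_size(channels: int, samples: int, fmt: str) -> int:
    """Bytes needed for all channels together, ignoring alignment padding."""
    return channels * samples * BYTES_PER_SAMPLE[fmt]


def per_plane_size(samples: int, fmt: str) -> int:
    """Bytes in one channel plane of a planar format, ignoring alignment."""
    return samples * BYTES_PER_SAMPLE[fmt]


total = total_buffer_size(2, 1024, "fltp")
plane = per_plane_size(1024, "fltp")
print(total, plane, total // 2)  # 8192 4096 4096
```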

Transforming Audio Samples From Time Domain to Frequency Domain

As a software engineer, I am facing some difficulties while working on a signal processing problem. I don't have much experience in this area.
What I am trying to do is sample environmental sound at a 44100 Hz sampling rate and, for fixed-size windows, test whether a specific frequency (20 kHz) is present and stronger than a threshold value.
Here is what I do, following the excellent answer to "How to extract frequency information from samples from PortAudio using FFTW in C":
102400 samples (about 2322 ms) are gathered from the audio port at a 44100 Hz sampling rate. Sample values are between 0.0 and 1.0.
int samplingRate = 44100;
int numberOfSamples = 102400;
float samples[numberOfSamples] = ListenMic_Function(numberOfSamples,samplingRate);
Window size or FFT Size is 1024 samples (23.2 ms)
int N = 1024;
Number of windows is 100
int noOfWindows = numberOfSamples / N;
Splitting samples to noOfWindows (100) windows each having size of N (1024) samples
float windowSamplesIn[noOfWindows][N];
for i := 0 to noOfWindows - 1
    windowSamplesIn[i] = subarray(samples, i*N, (i+1)*N);
endfor
Applying Hanning window function on each window
float windowSamplesOut[noOfWindows][N];
for i := 0 to noOfWindows - 1
    windowSamplesOut[i] = HanningWindow_Function(windowSamplesIn[i]);
endfor
Applying FFT on each window (real to complex conversion done inside the FFT function)
float frequencyData[noOfWindows][samplingRate/2];
for i := 0 to noOfWindows - 1
    frequencyData[i] = RealToComplex_FFT_Function(windowSamplesOut[i], samplingRate);
endfor
In the last step, I use the FFT function implemented at this link: http://www.codeproject.com/Articles/9388/How-to-implement-the-FFT-algorithm because I cannot implement an FFT function from scratch.
What I am not sure about is this: when N (1024) samples are given to the FFT function as input, samplingRate/2 (22050) decibel values are returned as output. Is that what an FFT function does?
I understand that because of Nyquist Frequency, I can detect half of sampling rate frequency at most. But is it possible to get decibel values for each frequency up to samplingRate/2 (22050) Hz?
Thanks,
Vahit
See "How do I obtain the frequencies of each value in an FFT?"
From a 1024-sample input, you get back N/2 + 1 = 513 meaningful frequency levels (bins 0 through 512), not 22050.
So, yes, within your window, you'll get back a level for the Nyquist frequency.
The lowest frequency level you'll see is for DC (0 Hz), the next one up will be for SampleRate/1024, or around 43 Hz, the next for 2 * SampleRate/1024, and so on, up to 512 * SampleRate / 1024 Hz, which is 22050 Hz.
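The bin-to-frequency mapping described above can be written out as a short sketch. The function names are illustrative only and do not belong to any FFT library. A real-input FFT of N samples has N/2 + 1 distinct bins when DC and Nyquist are both counted.

```python
def bin_frequency(k: int, sample_rate: float, fft_size: int) -> float:
    """Centre frequency in Hz of FFT bin k."""
    return k * sample_rate / fft_size


def nearest_bin(freq_hz: float, sample_rate: float, fft_size: int) -> int:
    """Index of the bin whose centre frequency is closest to freq_hz."""
    return round(freq_hz * fft_size / sample_rate)


SAMPLE_RATE = 44100
N = 1024

num_bins = N // 2 + 1                    # 513 bins, from DC to Nyquist
k = nearest_bin(20000.0, SAMPLE_RATE, N)
print(num_bins)                          # 513
print(k)                                 # 464
print(bin_frequency(k, SAMPLE_RATE, N))  # 19982.8125
```

So for the 20 kHz test in the question, the level of interest sits in bin 464 of each 1024-sample window, roughly 17 Hz below the target frequency.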
Since you are only using one band of your FFT, I would expect your results to be tarnished by side-band effects, even with proper windowing. It might work, but you might also get false positives with some input frequencies. Also, your signal is close to your Nyquist frequency, so you are assuming a fairly good signal path up to your FFT. I don't think this is the right approach.
I think a better approach to this kind of signal detection would be a high-order filter (depending on your requirements, I would guess fourth or fifth order, which isn't actually that high). If you don't know how to design a high-order filter, you could use two or three second-order filters in series. Designing a second-order filter, sometimes called a "biquad", is described here:
http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt
albeit very tersely and with some assumptions of prior knowledge. I would use a high-pass (HP) filter with corner frequency as low as you can make it, probably between 18 and 20 kHz. Keep in mind there is some attenuation at the corner frequency, so after applying a filter multiple times you will drop a little signal.
After you filter the audio, take the RMS or average amplitude (that is, the average of the absolute value), to find the average level over a time period.
This technique has several advantages over what you are doing now, including better latency (you can start detecting within a few samples), better reliability (you won't get false-positives in response to loud signals at spurious frequencies), and so on.
This post might be of relevance: http://blog.bjornroche.com/2012/08/why-eq-is-done-in-time-domain.html
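The filter-then-measure approach described in this answer can be sketched as follows. It uses the high-pass biquad formulas from the Audio EQ Cookbook linked above, two identical second-order sections in series, and an RMS level. The 19 kHz corner, the Q of 1/sqrt(2), the two stages and all function names are assumptions chosen for illustration, not values taken from the answer.

```python
import math


def highpass_coefficients(cutoff_hz, sample_rate, q=1 / math.sqrt(2)):
    """Normalised high-pass biquad coefficients (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = ((1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0)
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)  # a1, a2 after dividing by a0
    return b, a


def biquad(samples, b, a):
    """Run one second-order section (direct form I) over the samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def highband_level(samples, sample_rate, cutoff_hz=19000.0, stages=2):
    """RMS level after `stages` identical high-pass biquads in series."""
    b, a = highpass_coefficients(cutoff_hz, sample_rate)
    for _ in range(stages):
        samples = biquad(samples, b, a)
    return rms(samples)


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


high = highband_level(tone(20000.0, 44100, 4410), 44100)
low = highband_level(tone(1000.0, 44100, 4410), 44100)
print(high > low)  # True: the 20 kHz tone passes, the 1 kHz tone is rejected
```

Detection then amounts to comparing the returned level against a threshold chosen for the actual signal path, which this sketch does not attempt to pick.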

NodeJS ReadStream not reading bufferSize bytes at a time

I have code where the NodeJS server reads a file and streams it to the response. It looks like this:
var fStream = fs.createReadStream(filePath, {'bufferSize': 128 * 1024});
fStream.pipe(response);
The issue is that Node reads the file exactly 40960 bytes at a time. However, my app would be much more efficient (for reasons not relevant to this question) if it read 131072 (128 * 1024) bytes at a time.
Is there a way to force Node to read 128 * 1024 bytes at a time?
The accepted answer is wrong. You can force Node to read (128*1024) bytes at a time using the highWaterMark option.
var fStream = fs.createReadStream('/foo/bar', { highWaterMark: 128 * 1024 });
The documentation specifically states that 'the amount of data potentially buffered depends on the highWaterMark option passed into the streams constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects.'
Also, the default buffer size is 64 KiB.
I'm new here, so bear with me....
I found this in node's sources:
var toRead = Math.min(pool.length - pool.used, ~~this.bufferSize);
and:
var kPoolSize = 40 * 1024;
So it seems that the buffer size is limited to 40 KB (40 * 1024 = 40960 bytes, which matches the read size in the question), no matter what you provide. You could try to change the value in the code and rebuild node. That's probably not a very maintainable solution, though.
