Lost Events With In Memory Recording - etw

Why do I keep getting lost events although I record into memory buffers?
That makes no sense to me. How can single events be lost when no buffers are dropped? I have tracked the issue down to the CLR Rundown session, which always loses some events. The problem seems to be that I have a large number of managed processes (ca. 60) which all try to send their events to ETW at the same time.
I can repro this with
C>xperf -start ClrRundown -on "Microsoft-Windows-DotNETRuntime":0x118:5+"Microsoft-Windows-DotNETRuntimeRundown":0x118:5 -buffersize 512 -minbuffers 512 -maxbuffers 1024 -Buffering
C>xperf -Loggers ClrRundown
Logger Name : ClrRundown
Logger Id : 1e
Logger Thread Id : 0000000000000000
Buffer Size : 512
Maximum Buffers : 512
Minimum Buffers : 512
Number of Buffers : 512
Free Buffers : 504
Buffers Written : 0
Events Lost : **29**
Log Buffers Lost : 0
Real Time Buffers Lost: 0
Flush Timer : 0
Age Limit : 0
Log File Mode : Buffered StopOnHybridShutdown IndependentSession
Maximum File Size : 0
Log Filename :
Trace Flags : ".NET Common Language Runtime":0x118:0x5+"Microsoft-Windows-DotNETRuntimeRundown":0x118:0x5
I do not care about the few lost events, but I always get a warning from WPA when opening such a trace. That confuses occasional users of WPA, who are afraid that they did something wrong, and it blocks loading the trace file.
Is there a way to prevent losing events? The only other flag I found was -NoPerProcessorBuffering from xperf, which did not help either. Increasing the buffer size to 8MB did not change anything.
If there is no way to record data without lost events, is there a cheap and fast way to reset the lost events count of the resulting unmerged ETL file?

Since there is no way to get rid of these spurious dropped events, I have decided to reset the lost event counter of the ETL file directly.
Calling the following method resets the LostEvents count, which is an int:
// Requires: using System.IO;
// Lost event offset is taken from _TRACE_LOGFILE_HEADER32/64, which is the same for x64 and x86.
const int LostEventOffset = 0x98;
private static void ResetLostEvents(string etlFile)
{
    using (var file = File.OpenWrite(etlFile))
    {
        file.Seek(LostEventOffset, SeekOrigin.Begin);
        using (BinaryWriter overwriter = new BinaryWriter(file))
        {
            // Overwrite the 4-byte lost events counter with zero.
            overwriter.Write((int)0);
        }
    }
}
This was tested on Windows 7 and Windows 10 (x86 and x64) and has worked for all ETL files I have tried so far.
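For readers who want to patch a trace outside of .NET, the same overwrite can be sketched in Python. This is a minimal sketch rather than a tested tool: the 0x98 offset and the 4-byte width are taken from the C# snippet above, the name reset_lost_events is made up for illustration, and little-endian byte order is an assumption (it only affects the value that is read back, since zero is the same in either order).

```python
import struct

# Offset of the lost events counter, as given in the C# snippet above.
LOST_EVENT_OFFSET = 0x98


def reset_lost_events(etl_path):
    """Overwrite the 4-byte lost events counter with zero; return the old value."""
    with open(etl_path, "r+b") as etl:  # r+b patches in place and never truncates
        etl.seek(LOST_EVENT_OFFSET)
        raw = etl.read(4)
        if len(raw) != 4:
            raise ValueError("file is too short to contain the counter")
        (old_count,) = struct.unpack("<i", raw)  # assumed little-endian int
        etl.seek(LOST_EVENT_OFFSET)
        etl.write(struct.pack("<i", 0))
    return old_count
```

Returning the previous value makes it easy to log how many events were reported as lost before the header was patched. Working on a copy of the trace is the safer choice.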


During traffic generation from client to server, why does socket's send buffer queue get stuck at one point even when the size of send buffer is more?

I am trying to develop a socket congestion algorithm in diameter stack by comparing the socket send buffer size[default max size] and the actual bytes available in send buffer queue.
getsockopt(ainfo->socket,SOL_SOCKET,SO_SNDBUF,(void *)&n, &m); // Getting the max default size
retval = ioctl(ainfo->socket,TIOCOUTQ,&bytes_available); // Getting the actual bytes available
Testing scenario:
Start the client and server.
Once handshake is successful, start the traffic from client to server.
Block the packets at server's end using iptables.
Check the netstat output and check the send buffer size with help of ss command.
The send buffer size gets stuck after some time at a certain number. For example, the size of the send buffer is 87090 [tb as shown in ss output]. The Send-Q is stuck at some random number which is much smaller than tb [for example: 54344]. Sometimes it increases up to 135920. Ideally it should reach somewhere around 80k and then get stuck. Can someone explain this unusual behavior to me?
Any help is appreciated.
Thanks!
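The two queries from the question can be sketched in Python for experimenting outside the diameter stack. This is only a sketch under assumptions: it is Linux only (fcntl and termios are not available everywhere), the function names are made up for illustration, and TIOCOUTQ is treated here as the number of bytes still held in the send queue rather than as free space.

```python
import fcntl
import socket
import struct
import termios


def send_buffer_limit(sock):
    """Return SO_SNDBUF, the send buffer limit reported by the kernel."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def send_queue_bytes(sock):
    """Return the TIOCOUTQ value for the socket (Linux only)."""
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


def fill_ratio(queued, limit):
    """Fraction of the send buffer limit occupied by queued bytes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return queued / limit
```

With the numbers from the question, fill_ratio(54344, 87090) is roughly 0.62, so a congestion threshold expressed as a fraction of the limit would need to sit below that value to trigger in the stuck state described above.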

how to tune linux network buffer size

I'm reading "Kafka: The Definitive Guide". On page 35 (networking section) it says:
... The first adjustment is to change the default and maximum amount of
memory allocated for the send and receive buffers for each socket. This will significantly increase performance for large transfers. The relevant parameters for the send and receive buffer default size per socket are net.core.wmem_default and
net.core.rmem_default......
In addition to the socket settings, the send and receive buffer sizes for TCP sockets must be set separately using the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters.
Why should we set both net.core.wmem and net.ipv4.tcp_wmem?
Short answer - r/wmem_default are used for setting static socket buffer sizes, while tcp_r/wmem are used for controlling TCP send/receive window size and buffers dynamically.
More details:
By tracking the usages of r/wmem_default and tcp_r/wmem (kernel 4.14) we can see that r/wmem_default are only used in sock_init_data():
void sock_init_data(struct socket *sock, struct sock *sk)
{
sk_init_common(sk);
...
sk->sk_rcvbuf = sysctl_rmem_default;
sk->sk_sndbuf = sysctl_wmem_default;
This initializes the socket's buffers for sending and receiving packets. The values might later be overridden in sock_setsockopt():
int sock_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
{
struct sock *sk = sock->sk;
...
sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
...
sk->sk_rcvbuf = max_t(int, val * 2, SOCK_MIN_RCVBUF);
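The arithmetic in the excerpt above can be modelled in a few lines of Python. This is a sketch only: effective_sndbuf is a made-up name, the clamp against wmem_max sits in the part of the kernel function that the excerpt elides with "...", and the min_sndbuf default of 4608 is a placeholder, since the real SOCK_MIN_SNDBUF depends on the kernel build.

```python
def effective_sndbuf(requested, wmem_max, min_sndbuf=4608):
    """Model of the SO_SNDBUF arithmetic: clamp, double, apply the floor."""
    clamped = min(requested, wmem_max)   # the request is limited by wmem_max
    return max(clamped * 2, min_sndbuf)  # value is doubled, then floored


# A 64 KiB request under a 212992-byte wmem_max ends up as 128 KiB.
doubled = effective_sndbuf(65536, 212992)
# A request above wmem_max is clamped first, then doubled.
clamped = effective_sndbuf(1_000_000, 212992)
```

The same shape applies to the receive side with rmem_max and SOCK_MIN_RCVBUF.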
Usages of tcp_rmem are found in these functions: tcp_select_initial_window() in tcp_output.c and __tcp_grow_window(), tcp_fixup_rcvbuf(), tcp_clamp_window() and tcp_rcv_space_adjust() in tcp_input.c. In all usages this value is used for controlling the receive window and/or the socket's receive buffer dynamically, meaning it would take the current traffic and the system parameters into consideration.
A similar search for tcp_wmem shows that it is only used for dynamic changes in the socket's send buffer in tcp_init_sock() (tcp.c) and tcp_sndbuf_expand() (tcp_input.c).
So when you want the kernel to better tune your traffic, the most important values are tcp_r/wmem. The socket's buffer size is usually overridden by the user, so the default value doesn't really matter. For exact tuning operations, try reading the comments in tcp_input.c marked as "tuning". There's a lot of valuable information there.
Hope this helps.

AudioUnit (Mac) AudioUnitRender internal buffer clash

I recently designed a Sound recorder on a mac using AudioUnits. It was designed to behave like a video security system, recording continuously, with a graphics display of power levels for playback browsing.
I've noticed that every 85 minutes distortion appears for 3 minutes. After a day of elimination, it appears that the sound acquisition that occurs before the callback is called uses a circular buffer, and the callback's AudioUnitRender call extracts from this buffer at a slightly slower speed, which eventually causes the internal buffer write to wrap around and catch up with the AudioUnitRender reads. A duplex operation test shows the latency ever increasing; after 85 minutes you hear about 200-300 ms of latency, and the noise begins as the render buffer frame contains a combination of buffer segments from the end and the beginning of the buffer, i.e. long and short latencies. As the pointers drift apart, the noise disappears and you hear clean audio with the original short latency; then it repeats 85 minutes later. This still happens even with low-impact callback processing. I've seen some posts regarding latency but none regarding clashes. Has anyone seen this?
osx 10.9.5, xcode 6.1.1
code details:-
//modes 1=playback, 2=record, 3=both
AudioComponentDescription outputcd = {0}; // 10.6 version
outputcd.componentType = kAudioUnitType_Output;
outputcd.componentSubType = kAudioUnitSubType_HALOutput; //allows duplex
outputcd.componentManufacturer = kAudioUnitManufacturer_Apple;
AudioComponent comp = AudioComponentFindNext (NULL, &outputcd);
if (comp == NULL) {printf ("can't get output unit");exit (-1);}
CheckError (AudioComponentInstanceNew(comp, au),"Couldn't open component for outputUnit");
//tell input bus that its's input, tell output it's an output
if(mode==1 || mode==3) r=[self setAudioMode:*au :0];//play
if(mode==2 || mode==3) r=[self setAudioMode:*au :1];//rec
// register render callback
if(mode==1 || mode==3) [self setCallBack:*au :0];
if(mode==2 || mode==3) [self setCallBack:*au :1];
// if(mode==2 || mode==3) [self setAllocBuffer:*au];
// get default stream, change amt of channels
AudioStreamBasicDescription audioFormat;
UInt32 k=sizeof(audioFormat);
r= AudioUnitGetProperty(*au,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
1,
&audioFormat,
&k);
audioFormat.mChannelsPerFrame=1;
r= AudioUnitSetProperty(*au,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
1,
&audioFormat,
k);
//start
CheckError (AudioUnitInitialize(outputUnit),"Couldn't initialize output unit");
//record callback
OSStatus RecProc(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList * ioData)
{
myView * mv2=(__bridge myView*)inRefCon;
AudioBuffer buffer,buffer2;
OSStatus status;
buffer.mDataByteSize = inNumberFrames *4 ;// buffer size
buffer.mNumberChannels = 1; // one channel
buffer.mData =mv2->rdata;
buffer2.mDataByteSize = inNumberFrames *4 ;// buffer size
buffer2.mNumberChannels = 1; // one channel
buffer2.mData =mv2->rdata2;
AudioBufferList bufferList;
bufferList.mNumberBuffers = 2;
bufferList.mBuffers[0] = buffer;
bufferList.mBuffers[1] = buffer2;
status = AudioUnitRender(mv2->outputUnit, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, &bufferList);
[mv2 recproc :mv2->rdata :mv2->rdata2 :inNumberFrames];
return noErr;
}
You seem to be using the HAL output unit for pulling input. There might not be a guarantee that the input device and output device sample rates are exactly locked. Any slow slight drift in the sample rate of either device could eventually cause a buffer underflow or overflow.
One solution might be to find and set an input device for a separate input audio unit instead of depending on the default output unit. Try a USB mic, for instance.
According to this thread, https://www.native-instruments.com/forum/threads/latency-drift-problem-on-macbook.175551/, the problem appears to be a USB audio driver bug in Mavericks. I didn't find a kext replacement solution anywhere.
After making a sonar-type tester (a 1-cycle 22 kHz square wave click sent to the speaker every 600 ms, displaying the recorded frame number at which the click was detected), I could see a drift of 3 to 4 samples per second, along with the distortion and the latency drift reset after 1.5 hrs. I then looked for a way to access the buffer pointers to stabilise the latency drift, but also had no luck.
Also, API latency queries show no change as it drifts.
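As a rough sanity check on these numbers, the latency built up by a constant drift can be estimated as below. This is a back-of-the-envelope sketch: the function name is made up, the drift is assumed to be constant, and the 44100 Hz sample rate is an assumption.

```python
def accumulated_latency_ms(drift_samples_per_sec, elapsed_sec, sample_rate_hz):
    """Latency built up by a constant sample drift, in milliseconds."""
    drifted_samples = drift_samples_per_sec * elapsed_sec
    return 1000.0 * drifted_samples / sample_rate_hz


# 3 samples of drift per second over 85 minutes at an assumed 44100 Hz.
latency_ms = accumulated_latency_ms(3, 85 * 60, 44100)
```

Under these assumptions the result is a few hundred milliseconds, which is the same order as the latency heard just before the distortion starts.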
I did find that you could reset the latency with audiounitstop then audiounitstart (same thread), but it worked only if a single audiounit bus was active system-wide. Research also showed that the latency could be reset by toggling the hardware device sample rate in Audio MIDI Setup. This is a bit aggressive and would be uncomfortable for some.
My design toggled the nominal sample rate (AudioObjectSetPropertyData with kAudioDevicePropertyNominalSampleRate) every 60 minutes (48000 then back to 44100), waiting for the change notification through a callback between the two changes.
This causes a 2-second gap in audio input and output every hour. Safari playing a YouTube video would mute and freeze the video for 1-2 seconds during this time. VLC showed the same, but the video remained smooth during the 2 seconds of silence.
Like I said, it wouldn't work for everyone, but I chose a system-wide 2-second mute every hour over a recording that has 3 minutes of fuzzy audio every 1.5 hrs. It's been posted that a Yosemite upgrade fixes this, although some have also found crackling after going up to Yosemite.

TCP receiving window size higher than net.core.rmem_max

I am running iperf measurements between two servers, connected through 10Gbit link. I am trying to correlate the maximum window size that I observe with the system configuration parameters.
In particular, I have observed that the maximum window size is 3 MiB. However, I cannot find the corresponding values in the system files.
By running sysctl -a I get the following values:
net.ipv4.tcp_rmem = 4096 87380 6291456
net.core.rmem_max = 212992
The first value tells us that the maximum receiver window size is 6 MiB. However, TCP tends to allocate twice the requested size, so the maximum receiver window size should be 3 MiB, exactly as I have measured it. From man tcp:
Note that TCP actually allocates twice the size of the buffer requested in the setsockopt(2) call, and so a succeeding getsockopt(2) call will not return the same size of buffer as requested in the setsockopt(2) call. TCP uses the extra space for administrative purposes and internal kernel structures, and the /proc file values reflect the larger sizes compared to the actual TCP windows.
However, the second value, net.core.rmem_max, states that the maximum receiver window size cannot be more than 208 KiB. And this is supposed to be the hard limit, according to man tcp:
tcp_rmem
max: the maximum size of the receive buffer used by each TCP socket. This value does not override the global net.core.rmem_max. This is not used to limit the size of the receive buffer declared using SO_RCVBUF on a socket.
So, how come I observe a maximum window size larger than the one specified in net.core.rmem_max?
NB: I have also calculated the bandwidth-delay product: window_size = Bandwidth x RTT, which is about 3 MiB (10 Gbps @ 2 msec RTT), thus verifying my traffic capture.
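The bandwidth-delay product from the note above can be worked through as follows. This is only a sketch with a made-up function name, using the 10 Gbit/s and 2 ms figures from the question.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes for a bandwidth in bit/s and an RTT in ms."""
    return bandwidth_bits_per_sec * rtt_ms / 8000  # /1000 for ms, /8 for bits


bdp = bdp_bytes(10_000_000_000, 2)  # 10 Gbit/s link, 2 ms round trip
bdp_mib = bdp / 2**20               # convert bytes to MiB
```

With exactly these inputs the product is 2,500,000 bytes, or roughly 2.4 MiB, which is the same order as the 3 MiB window observed in the capture.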
A quick search turned up:
https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/net/ipv4/tcp_output.c
in void tcp_select_initial_window()
if (wscale_ok) {
/* Set window scaling on max possible window
* See RFC1323 for an explanation of the limit to 14
*/
space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
while (space > 65535 && (*rcv_wscale) < 14) {
space >>= 1;
(*rcv_wscale)++;
}
}
max_t takes the higher value of the arguments. So the bigger value takes precedence here.
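To see what that loop yields for the values in the question, it can be mirrored in Python. This is a sketch with a made-up function name; the default window_clamp is an assumption chosen large enough not to interfere.

```python
def initial_rcv_wscale(tcp_rmem_max, rmem_max, window_clamp=2**30):
    """Mirror the window scale loop from tcp_select_initial_window()."""
    space = max(tcp_rmem_max, rmem_max)  # the larger sysctl wins
    space = min(space, window_clamp)
    wscale = 0
    while space > 65535 and wscale < 14:
        space >>= 1
        wscale += 1
    return wscale


# Values from the question: tcp_rmem[2] = 6291456, rmem_max = 212992.
scale_with_tcp_rmem = initial_rcv_wscale(6291456, 212992)
# What the scale would be if only rmem_max counted.
scale_with_core_only = initial_rcv_wscale(212992, 212992)
```

The first call gives a scale of 7 and the second a scale of 2, so the advertised window can only grow into the MiB range because tcp_rmem[2] takes part in the max.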
One other reference to sysctl_rmem_max is made where it is used to limit the argument to SO_RCVBUF (in net/core/sock.c).
All other tcp code refers to sysctl_tcp_rmem only.
So without looking deeper into the code, you can conclude that a bigger net.ipv4.tcp_rmem will override net.core.rmem_max in all cases except when setting SO_RCVBUF (whose check can be bypassed using SO_RCVBUFFORCE).
net.ipv4.tcp_rmem takes precedence over net.core.rmem_max according to https://serverfault.com/questions/734920/difference-between-net-core-rmem-max-and-net-ipv4-tcp-rmem:
It seems that the tcp-setting will take precedence over the common max setting
But I agree with what you say, this seems to conflict with what's written in man tcp, and I can reproduce your findings. Maybe the documentation is wrong? Please find out and comment!

Android-dev AudioRecord without blocking or threads

I wish to record the microphone audio stream so I can do realtime DSP on it.
I want to do so without having to use threads and without having .read() block while it waits for new audio data.
UPDATE/ANSWER: It's a bug in Android. 4.2.2 still has the problem, but 5.01 IS FIXED! I'm not sure where the divide is but that's the story.
NOTE: Please don't say "Just use threads." Threads are fine but this isn't about them, and the android developers intended for AudioRecord to be fully usable without me having to specify threads and without me having to deal with blocking read(). Thank you!
Here is what I have found:
When the AudioRecord object is initialized, it creates its own internal ring type buffer.
When .start() is called, it begins recording to said ring buffer (or whatever kind it really is.)
When .read() is called, it reads either half of bufferSize or the specified number of bytes (whichever is less) and then returns.
If there are more than enough audio samples in the internal buffer, then read() returns instantly with the data. If there are not enough yet, then read() waits until there are, then returns with the data.
.setRecordPositionUpdateListener() can be used to set a Listener, and .setPositionNotificationPeriod() and .setNotificationMarkerPosition() can be used to set the notification Period and Position, respectively.
However, the Listener seems to be never called unless certain requirements are met:
1: The Period or Position must be equal to bufferSize/2 or (bufferSize/2)-1.
2: A .read() must be called before the Period or Position timer starts counting - in other words, after calling .start() then also call .read(), and each time the Listener is called, call .read() again.
3: .read() must read at least half of bufferSize each time.
So using these rules I am able to get the callback/Listener working, but for some reason the reads are still blocking and I can't figure out how to get the Listener to only be called when there is a full read's worth.
If I rig up a button view to click to read, then I can tap it, and if I tap rapidly, read blocks. But if I wait for the audio buffer to fill, then the first tap is instant (read returns right away) but subsequent rapid taps are blocked because read() has to wait, I guess.
Any insight would be greatly appreciated on how I might make the Listener work as intended - in such a way that my listener gets called when there's enough data for read() to return instantly.
Below are the relevant parts of my code.
I have some log statements in my code which send strings to logcat, which allows me to see how long each command is taking, and this is how I know that read() is blocking.
(And the buttons in my simple test app are also very slow to respond when it is reading repeatedly, but the CPU is not pegged.)
Thanks,
~Jesse
In my OnCreate():
bufferSize=AudioRecord.getMinBufferSize(samplerate,AudioFormat.CHANNEL_CONFIGURATION_MONO,AudioFormat.ENCODING_PCM_16BIT)*4;
recorder = new AudioRecord (AudioSource.MIC,samplerate,AudioFormat.CHANNEL_CONFIGURATION_MONO,AudioFormat.ENCODING_PCM_16BIT,bufferSize);
recorder.setRecordPositionUpdateListener(mRecordListener);
recorder.setPositionNotificationPeriod(bufferSize/2);
//recorder.setNotificationMarkerPosition(bufferSize/2);
audioData = new short [bufferSize];
recorder.startRecording();
samplesread=recorder.read(audioData,0,bufferSize);//This triggers it to start doing the callback.
Then here is my listener:
public OnRecordPositionUpdateListener mRecordListener = new OnRecordPositionUpdateListener()
{
public void onPeriodicNotification(AudioRecord recorder) //This one gets called every period.
{
Log.d("TimeTrack", "AAA");
samplesread=recorder.read(audioData,0,bufferSize);
Log.d("TimeTrack", "BBB");
//player.write(audioData, 0, samplesread);
//Log.d("TimeTrack", "CCC");
reads++;
}
@Override
public void onMarkerReached(AudioRecord recorder) //This one gets called only once -- when the marker is reached.
{
Log.d("TimeTrack", "AAA");
samplesread=recorder.read(audioData,0,bufferSize);
Log.d("TimeTrack", "BBB");
//player.write(audioData, 0, samplesread);
//Log.d("TimeTrack", "CCC");
}
};
UPDATE: I have tried this on Android 2.2.3, 2.3.4, and now 4.0.3, and all act the same.
Also: There is an open bug on code.google about it - one entry started in 2012 by someone else then one from 2013 started by me (I didn't know about the first):
UPDATE 2016: Ahhhh, finally, after years of wondering if it was me or Android, I have an answer! I tried my above code on 4.2.2 and got the same problem. I tried the above code on 5.01, AND IT WORKS!!! And the initial .read() call is NOT needed anymore either. Now, once .setPositionNotificationPeriod() and .startRecording() are called, mRecordListener just magically starts getting called every time there is data available, so it no longer blocks, because the callback is not called until enough data has been recorded. I haven't listened to the data to know if it's recording correctly, but the callback is happening like it should, and it is not blocking the activity like it used to!
http://code.google.com/p/android/issues/detail?id=53996
http://code.google.com/p/android/issues/detail?id=25138
If folks who care about this bug log in and vote for and/or comment on the bug maybe it'll get addressed sooner by Google.
It's a late answer, but I think I know where Jesse made a mistake. His read call blocks because he requests as many shorts as the buffer size, but the buffer size is in bytes and a short is 2 bytes. If we make the short array the same length as the buffer, we will read twice as much data.
The solution is to use audioData = new short[bufferSize/2]. If the buffer size is 1000 bytes, this way we request 500 shorts, which are 1000 bytes.
He should also change samplesread=recorder.read(audioData,0,bufferSize) to samplesread=recorder.read(audioData,0,audioData.length)
UPDATE
OK, Jesse. I can see where another mistake may be: the positionNotificationPeriod. This value has to be large enough that the listener isn't called too often, and we need to make sure that when the listener is called, the bytes to read are ready to be collected. If the bytes aren't ready when the listener is called, the main thread will be blocked by the recorder.read(audioData, 0, audioData.length) call until the requested bytes have been collected by AudioRecord.
You should calculate the buffer size and the short array length based on the time interval you set, i.e. how often you want the listener to be called. The position notification period, buffer size and short array length all have to be adjusted consistently. Let me show you an example:
int periodInFrames = sampleRate / 10;
int bufferSize = periodInFrames * 1 * 16 / 8;
audioData = new short [bufferSize / 2];
int minBufferSize = AudioRecord.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
if (bufferSize < minBufferSize) bufferSize = minBufferSize;
recorder = new AudioRecord(AudioSource.MIC, sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
recorder.setRecordPositionUpdateListener(mRecordListener);
recorder.setPositionNotificationPeriod(periodInFrames);
recorder.startRecording();
public OnRecordPositionUpdateListener mRecordListener = new OnRecordPositionUpdateListener() {
public void onPeriodicNotification(AudioRecord recorder) {
samplesread = recorder.read(audioData, 0, audioData.length);
byte[] bytes = short2byte(audioData);
player.write(bytes, 0, bytes.length);
}
};
private byte[] short2byte(short[] data) {
int dataSize = data.length;
byte[] bytes = new byte[dataSize * 2];
for (int i = 0; i < dataSize; i++) {
bytes[i * 2] = (byte) (data[i] & 0x00FF);
bytes[(i * 2) + 1] = (byte) (data[i] >> 8);
data[i] = 0;
}
return bytes;
}
So now a bit of explanation.
First we set how often the listener has to be called to collect audio data (periodInFrames). PositionNotificationPeriod is expressed in frames. The sampling rate is expressed in frames per second, so for a 44100 sampling rate we have 44100 frames per second. I divided it by 10 so the listener will be called every 4410 frames = 100 milliseconds, which is a reasonable time interval.
Now we calculate the buffer size based on our periodInFrames so that no data is overwritten before we collect it. Buffer size is expressed in bytes. Our time interval is 4410 frames; each frame contains 1 sample for mono or 2 samples for stereo, so we multiply by the number of channels (1 in your case). Each sample takes 1 byte for ENCODING_PCM_8BIT or 2 bytes for ENCODING_PCM_16BIT, so we multiply by the bits per sample (16 for ENCODING_PCM_16BIT, 8 for ENCODING_PCM_8BIT) and divide by 8.
Then we set the audioData length to half of bufferSize, so we make sure that when the listener gets called, the bytes to read are already there waiting to be collected. That's because a short contains 2 bytes and bufferSize is expressed in bytes.
Then we check whether bufferSize is large enough to successfully initialize the AudioRecord object. If it's not, we set bufferSize to its minimal size; we don't need to change our time interval or the audioData length.
In our listener we read and store the data in the short array. That's why we use audioData.length instead of the buffer size, because only audioData.length tells us the number of shorts the buffer contains.
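The sizing arithmetic described above can be checked quickly outside Android. This is a sketch in Python with made-up names; it only reproduces the calculation and does not touch the AudioRecord API.

```python
def record_buffer_plan(sample_rate, channels=1, bits_per_sample=16, periods_per_sec=10):
    """Return (period in frames, buffer size in bytes, short array length, period in ms)."""
    period_frames = sample_rate // periods_per_sec
    buffer_bytes = period_frames * channels * bits_per_sample // 8
    shorts = buffer_bytes // 2  # a short is 2 bytes
    period_ms = 1000 * period_frames / sample_rate
    return period_frames, buffer_bytes, shorts, period_ms


# 44100 Hz, mono, 16-bit, listener called ten times per second.
plan = record_buffer_plan(44100)
```

For these inputs the plan is 4410 frames per period, an 8820-byte buffer, a 4410-element short array and a 100 ms period, matching the explanation above.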
I had it working some time ago so please let me know if it will work for you.
I'm not sure why you're avoiding spawning separate threads, but if it's because you don't want to have to deal with coding them properly, you can use .schedule on a Timer object after each .read, where the time interval is set to the time it takes to get your buffer filled (number of samples in buffer / sampleRate). Yes, I know this is using a separate thread, but this advice was given assuming that the reason you were avoiding threads was to avoid having to code them properly.
This way, the longest time it can possibly block the thread for should be negligible. But I don't know why you'd want to do that.
If the above reason is not why you're avoiding using separate threads, may I ask why?
Also, what exactly do you mean by realtime? Do you intend to playback the affected audio using, let's say, an AudioTrack? Because the latency on most Android devices is pretty bad.
