ffmpeg memory leak with av_read_frame

I use av_read_frame in two ways below:
1.
for (;;) {
    if (av_read_frame(pFormatCtx, packet) >= 0) {
        av_packet_unref(packet);
    }
}
2.
for (;;) {
    packet = av_packet_alloc();
    if (av_read_frame(pFormatCtx, packet) >= 0) {
        av_packet_free(&packet);
    }
}
Nothing else is done besides reading the file and calling the corresponding free functions of the FFmpeg API.
Both variants have a memory leak: reading a 1.72 GB film file increases memory usage by about 300 MB.
Does anyone have a solution?
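For reference, here is a minimal sketch of how FFmpeg's own demuxing examples structure this loop, assuming pFormatCtx was already opened with avformat_open_input() as in the question: the packet is allocated once, unreferenced after every successful read, and the packet and format context are freed when the loop ends.
AVPacket *packet = av_packet_alloc();            // allocate the packet shell once
if (!packet)
    return AVERROR(ENOMEM);

while (av_read_frame(pFormatCtx, packet) >= 0)   // stops on error or end of file
{
    /* ... use packet->data / packet->size here ... */
    av_packet_unref(packet);                     // release this packet's payload
}

av_packet_free(&packet);                         // free the packet shell itself
avformat_close_input(&pFormatCtx);               // also releases demuxer-internal buffers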

Related

How to mmap() a large file without risking the OOM killer?

I've got an embedded ARM Linux box with a limited amount of RAM (512MB) and no swap space, on which I need to create and then manipulate a fairly large file (~200MB). Loading the entire file into RAM, modifying the contents in-RAM, and then writing it back out again would sometimes invoke the OOM-killer, which I want to avoid.
My idea to get around this was to use mmap() to map this file into my process's virtual address space; that way, reads and writes to the mapped memory-area would go out to the local flash-filesystem instead, and the OOM-killer would be avoided since if memory got low, Linux could just flush some of the mmap()'d memory pages back to disk to free up some RAM. (That might make my program slow, but slow is okay for this use-case)
However, even with the mmap() call, I'm still occasionally seeing processes get killed by the OOM-killer while performing the above operation.
My question is, was I too optimistic about how Linux would behave in the presence of both a large mmap() and limited RAM? (i.e. does mmap()-ing a 200MB file and then reading/writing to the mmap()'d memory still require 200MB of available RAM to accomplish reliably?) Or should mmap() be clever enough to page out mmap'd pages when memory is low, but I'm doing something wrong in how I use it?
FWIW my code to do the mapping is here:
void FixedSizeDataBuffer :: TryMapToFile(const std::string & filePath, bool createIfNotPresent, bool autoDelete)
{
   const int fd = open(filePath.c_str(), (createIfNotPresent?(O_CREAT|O_EXCL|O_RDWR):O_RDONLY)|O_CLOEXEC, S_IRUSR|(createIfNotPresent?S_IWUSR:0));
   if (fd >= 0)
   {
      if ((autoDelete == false)||(unlink(filePath.c_str()) == 0)) // so the file will automatically go away when we're done with it, even if we crash
      {
         const int fallocRet = createIfNotPresent ? posix_fallocate(fd, 0, _numBytes) : 0;
         if (fallocRet == 0)
         {
            void * mappedArea = mmap(NULL, _numBytes, PROT_READ|(createIfNotPresent?PROT_WRITE:0), MAP_SHARED, fd, 0);
            if (mappedArea != MAP_FAILED)  // note: mmap() signals failure with MAP_FAILED, not NULL
            {
               printf("FixedSizeDataBuffer %p: Using backing-store file [%s] for %zu bytes of data\n", this, filePath.c_str(), _numBytes);
               _buffer = (uint8_t *) mappedArea;
               _isMappedToFile = true;
            }
            else printf("FixedSizeDataBuffer %p: Unable to mmap backing-store file [%s] to %zu bytes (%s)\n", this, filePath.c_str(), _numBytes, strerror(errno));
         }
         else printf("FixedSizeDataBuffer %p: Unable to pad backing-store file [%s] out to %zu bytes (%s)\n", this, filePath.c_str(), _numBytes, strerror(fallocRet));
      }
      else printf("FixedSizeDataBuffer %p: Unable to unlink backing-store file [%s] (%s)\n", this, filePath.c_str(), strerror(errno));
      close(fd);  // no need to hold this anymore AFAIK, the memory-mapping itself will keep the backing store around
   }
   else printf("FixedSizeDataBuffer %p: Unable to create backing-store file [%s] (%s)\n", this, filePath.c_str(), strerror(errno));
}
I can rewrite this code to just use plain-old-file-I/O if I have to, but it would be nice if mmap() could do the job (or if not, I'd at least like to understand why not).
After much further experimentation, I determined that the OOM-killer was visiting me not because the system had run out of RAM, but because RAM would occasionally become sufficiently fragmented that the kernel couldn't find a set of physically-contiguous RAM pages large enough to meet its immediate needs. When this happened, the kernel would invoke the OOM-killer to free up some RAM to avoid a kernel panic, which is all well and good for the kernel but not so great when it kills a process that the user was relying on to get his work done. :/
After trying and failing to find a way to convince Linux not to do that (I think enabling a swap partition would avoid the OOM-killer, but doing that is not an option for me on these particular machines), I came up with a hack work-around; I added some code to my program that periodically checks the amount of memory fragmentation reported by the Linux kernel, and if the memory fragmentation starts looking too severe, preemptively orders a memory-defragmentation to occur, so that the OOM-killer will (hopefully) not become necessary. If the memory-defragmentation pass doesn't appear to be improving matters any, then after 20 consecutive attempts, we also drop the VM Page cache as a way to free up contiguous physical RAM. This is all very ugly, but not as ugly as getting a phone call at 3AM from a user who wants to know why their server program just crashed. :/
The gist of the work-around implementation is below; note that DefragTick(Milliseconds) is expected to be called periodically (preferably once per second).
// Returns how safe we are from fragmentation-based OOM-killer visits.
// Returns -1 if we can't read the data for some reason.
static int GetFragmentationSafetyLevel()
{
   int ret = -1;
   FILE * fpIn = fopen("/sys/kernel/debug/extfrag/extfrag_index", "r");
   if (fpIn)
   {
      char buf[512];
      while(fgets(buf, sizeof(buf), fpIn))
      {
         const char * dma = (strncmp(buf, "Node 0, zone", 12) == 0) ? strstr(buf+12, "DMA") : NULL;
         if (dma)
         {
            // dma = e.g.: "DMA -1.000 -1.000 -1.000 -1.000 0.852 0.926 0.963 0.982 0.991 0.996 0.998 0.999 1.000 1.000"
            const char * s = dma+4;  // skip past "DMA "
            ret = 0;  // ret now becomes a count of "safe values in a row"; a safe value is any number less than 0.500, per me
            while((s)&&((*s == '-')||(*s == '.')||(isdigit(*s))))
            {
               const float fVal = atof(s);
               if (fVal < 0.500f)
               {
                  ret++;
                  // Advance (s) to the next number in the list
                  const char * space = strchr(s, ' ');  // to the next space
                  s = space ? (space+1) : NULL;
               }
               else break;  // oops, a dangerous value! Run away!
            }
         }
      }
      fclose(fpIn);
   }
   return ret;
}
// should be called periodically (e.g. once per second)
void DefragTick(Milliseconds current_time_in_milliseconds)
{
   if ((current_time_in_milliseconds-m_last_fragmentation_check_time) >= Milliseconds(1000))
   {
      m_last_fragmentation_check_time = current_time_in_milliseconds;

      const int fragmentationSafetyLevel = GetFragmentationSafetyLevel();
      if (fragmentationSafetyLevel < 9)
      {
         m_defrag_pending = true;  // trouble seems to start at level 8
         m_fragged_count++;        // note that we still seem fragmented
      }
      else m_fragged_count = 0;    // we're in the clear!

      if ((m_defrag_pending)&&((current_time_in_milliseconds-m_last_defrag_time) >= Milliseconds(5000)))
      {
         if (m_fragged_count >= 20)
         {
            // FogBugz #17882
            FILE * fpOut = fopen("/proc/sys/vm/drop_caches", "w");
            if (fpOut)
            {
               const char * warningText = "Persistent Memory fragmentation detected -- dropping filesystem PageCache to improve defragmentation.";
               printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
               fprintf(fpOut, "3");
               fclose(fpOut);
               m_fragged_count = 0;
            }
            else
            {
               const char * errorText = "Couldn't open /proc/sys/vm/drop_caches to drop filesystem PageCache!";
               printf("%s\n", errorText);
            }
         }

         FILE * fpOut = fopen("/proc/sys/vm/compact_memory", "w");
         if (fpOut)
         {
            const char * warningText = "Memory fragmentation detected -- ordering a defragmentation to avoid the OOM-killer.";
            printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
            fprintf(fpOut, "1");
            fclose(fpOut);
            m_defrag_pending = false;
            m_last_defrag_time = current_time_in_milliseconds;
         }
         else
         {
            const char * errorText = "Couldn't open /proc/sys/vm/compact_memory to trigger a memory-defragmentation!";
            printf("%s\n", errorText);
         }
      }
   }
}
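For completeness, here is a hedged usage sketch (not part of the original post) of how DefragTick() might be driven about once per second, using CLOCK_MONOTONIC as the millisecond source. The Milliseconds type is assumed to be constructible from a plain integer count, and the surrounding loop stands in for whatever event loop the program already has.
#include <time.h>
#include <unistd.h>

static Milliseconds NowMillis()
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return Milliseconds((ts.tv_sec * 1000LL) + (ts.tv_nsec / 1000000LL));
}

void RunDefragWatchdogLoop()
{
   for (;;)
   {
      DefragTick(NowMillis());  // DefragTick() itself rate-limits to one check per second
      sleep(1);
   }
}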

Windows Filtering Platform Network Slowdown Due to Spinlock

I am writing a Windows Filtering Platform kernel-mode driver. The goal of the driver is to capture all traffic on a particular layer and communicate this traffic back to user mode so that it can be analyzed further. The driver never needs to block any traffic; the classifyOut action is always set to FWP_ACTION_CONTINUE.
The following code is used in my Classify function to queue up the packets that are received.
classifyOut->actionType = FWP_ACTION_CONTINUE;
do
{
    if ((classifyOut->rights & FWPS_RIGHT_ACTION_WRITE) == 0)
    {
        break;
    }

    if (layerData != NULL)
    {
        PNET_BUFFER_LIST netBufferList = (PNET_BUFFER_LIST) layerData;
        PNET_BUFFER netBuffer = NET_BUFFER_LIST_FIRST_NB(netBufferList);

        if (packetQueueSize >= 2048)
        {
            ExInterlockedRemoveHeadList(&packetQueue, &packetQueueLock);
            packetQueueSize--;
        }

        ULONG netBufferSize = NET_BUFFER_DATA_LENGTH(netBuffer);
        PACKET_ITEM* allocatedPacket = InitalizePacketItem(
            netBuffer,
            netBufferSize
        );
        if (allocatedPacket == NULL)
        {
            classifyOut->actionType = FWP_ACTION_BLOCK;
            classifyOut->rights &= ~FWPS_RIGHT_ACTION_WRITE;
            break;
        }

        ExInterlockedInsertTailList(
            &packetQueue,
            &allocatedPacket->listEntry,
            &packetQueueLock
        );
        allocatedPacket = NULL;
        packetQueueSize++;
    }
} while (FALSE);
The PACKET_ITEM struct is defined as follows:
typedef struct _PACKET_ITEM {
    LIST_ENTRY listEntry;
    PVOID data;
    ULONG dataLen;
} PACKET_ITEM;
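The question calls InitalizePacketItem() and FreePacketItem() without showing them. Purely as a hedged sketch (an assumption, not the asker's actual code), one plausible shape for these helpers is below, copying the NET_BUFFER contents into non-paged pool via NdisGetDataBuffer(); the pool tag 'tkPq' is made up for illustration.
// Hedged sketch only -- not the asker's real helpers.
PACKET_ITEM* InitalizePacketItem(PNET_BUFFER netBuffer, ULONG netBufferSize)
{
    PACKET_ITEM* item = (PACKET_ITEM*)ExAllocatePoolWithTag(
        NonPagedPoolNx, sizeof(PACKET_ITEM), 'tkPq');
    if (item == NULL)
    {
        return NULL;
    }

    item->dataLen = netBufferSize;
    item->data = ExAllocatePoolWithTag(NonPagedPoolNx, netBufferSize, 'tkPq');
    if (item->data == NULL)
    {
        ExFreePoolWithTag(item, 'tkPq');
        return NULL;
    }

    // NdisGetDataBuffer returns a pointer to contiguous packet data; it only
    // copies into the supplied storage when the buffer is fragmented.
    PVOID contiguous = NdisGetDataBuffer(netBuffer, netBufferSize, item->data, 1, 0);
    if (contiguous == NULL)
    {
        ExFreePoolWithTag(item->data, 'tkPq');
        ExFreePoolWithTag(item, 'tkPq');
        return NULL;
    }
    if (contiguous != item->data)
    {
        RtlCopyMemory(item->data, contiguous, netBufferSize);
    }

    return item;
}

VOID FreePacketItem(PACKET_ITEM* item)
{
    if (item != NULL)
    {
        if (item->data != NULL)
        {
            ExFreePoolWithTag(item->data, 'tkPq');
        }
        ExFreePoolWithTag(item, 'tkPq');
    }
}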
I am using the inverted call model to communicate this packet data from kernel mode to user mode. The following code is used in the kernel driver once it detects the correct IOCTL has been sent.
status = WdfRequestRetrieveOutputBuffer(request, 0, &buffer, &bufferSize);
if (!NT_SUCCESS(status))
{
    break;
}

PLIST_ENTRY listEntry = ExInterlockedRemoveHeadList(&packetQueue, &packetQueueLock);
if (listEntry == NULL)
{
    break;
}

PACKET_ITEM* packetItem = CONTAINING_RECORD(
    listEntry,
    struct _PACKET_ITEM,
    listEntry
);

RtlCopyMemory(
    buffer,
    packetItem->data,
    packetItem->dataLen);

status = STATUS_SUCCESS;
WdfRequestCompleteWithInformation(
    request,
    status,
    packetItem->dataLen
);
FreePacketItem(packetItem);
This code seems to slow the network down greatly after a short while, causing timeouts when trying to load websites in a web browser, for example.
I assume this is caused by the spinlocks combined with the sheer volume of packets crossing the network that this driver captures.
My questions are the following:
Is it likely that the spinlock is causing my problems here? If so:
Is it possible to set classifyOut->actionType and return immediately, before allocating any memory and copying the data into my queue? I assume this would prevent the slowdown from happening?
What else should I be doing differently to prevent this?
If not:
What is causing the slowdown here?

ALSA - Retrieving audio buffer timestamps

I have a simple C program that plays audio using the ALSA APIs and I wish to know the precise timing of the audio buffers.
I am attempting to retrieve the timestamps from the audio driver using ALSA's snd_pcm_htimestamp functionality, which returns two values - a timestamp and a frame count.
However, the timestamp returned from ALSA is unset (zero values). The second returned variable, the "number of available frames when timestamp was grabbed", looks to be set correctly. Does anyone have an idea as to why the timestamps are seemingly unset?
I am configuring timestamps to be activated in my setup like so:
err = snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
if (err < 0) {
    printf("Unable to set timestamp mode: %s\n", snd_strerror(err));
    return err;
}
And I verify that it has been set:
snd_pcm_tstamp_t timestampMode;
err = snd_pcm_sw_params_get_tstamp_mode(swparams, &timestampMode);
if (timestampMode != SND_PCM_TSTAMP_ENABLE)
{
    // error ...
}
Then in the program's main while loop, after I feed ALSA with samples using snd_pcm_writei, I attempt to obtain that buffer's timestamp like so:
snd_pcm_writei(pcmHandle, samples, frameCount);

snd_htimestamp_t ts;
snd_pcm_uframes_t avail;
err = snd_pcm_htimestamp(pcmHandle, &avail, &ts);
if (err < 0)
{
    printf("Unable to get timestamp: %s\n", snd_strerror(err));
    return err;
}
printf("avail: %lu\n", (unsigned long)avail);
printf("%lld.%.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
However, whilst avail seems to be set, ts is always 0.000000000.
I am on a Raspberry Pi running Raspbian with an ADA1475 audio interface.
Thanks in advance,
Andy
The change to swparams must be applied to the PCM interface with snd_pcm_sw_params().
/* Allocate a temporary swparams struct */
snd_pcm_sw_params_t *swparams;
snd_pcm_sw_params_alloca(&swparams);

/* Retrieve current SW parameters. */
snd_pcm_sw_params_current(pcmHandle, swparams);

/* Change software parameters. */
snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
snd_pcm_sw_params_set_tstamp_type(pcmHandle, swparams, SND_PCM_TSTAMP_TYPE_GETTIMEOFDAY);

/* Apply updated software parameters to PCM interface. */
snd_pcm_sw_params(pcmHandle, swparams); // <-- Change takes effect here.
ALSA allows software parameters to be changed at any time, even while the stream is running.
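As a hedged follow-up sketch (not from the original answer): to confirm the mode actually took effect on the device, the current software parameters can be re-read after the snd_pcm_sw_params() call and checked.
/* Re-read the device's current sw params after applying them. */
snd_pcm_sw_params_t *check;
snd_pcm_sw_params_alloca(&check);
snd_pcm_sw_params_current(pcmHandle, check);

snd_pcm_tstamp_t mode;
snd_pcm_sw_params_get_tstamp_mode(check, &mode);
if (mode != SND_PCM_TSTAMP_ENABLE)
    printf("Timestamp mode was not applied\n");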

Libav and xaudio2 - audio not playing

I am trying to get audio playing with libav using XAudio2. The XAudio2 code I am using works with an older FFmpeg using avcodec_decode_audio2, but that has been deprecated in favour of avcodec_decode_audio4. I have tried following various libav examples, but can't seem to get the audio to play. Video plays fine (or rather it just plays too fast right now, as I haven't written any sync code yet).
First the audio is initialized with no errors, then the video, and then the packet loop:
while (1) {
    // is this packet from the video or audio stream?
    if (packet.stream_index == player.v_id) {
        add_video_to_queue(&packet);
    } else if (packet.stream_index == player.a_id) {
        add_sound_to_queue(&packet);
    } else {
        av_free_packet(&packet);
    }
}
Then in add_sound_to_queue:
int add_sound_to_queue(AVPacket * packet) {
    AVFrame *decoded_frame = NULL;
    int done = AVCODEC_MAX_AUDIO_FRAME_SIZE;
    int got_frame = 0;

    if (!decoded_frame) {
        if (!(decoded_frame = avcodec_alloc_frame())) {
            printf("[ADD_SOUND_TO_QUEUE] Out of memory\n");
            return -1;
        }
    } else {
        avcodec_get_frame_defaults(decoded_frame);
    }

    if (avcodec_decode_audio4(player.av_acodecctx, decoded_frame, &got_frame, packet) < 0) {
        printf("[ADD_SOUND_TO_QUEUE] Error in decoding audio\n");
        av_free_packet(packet);
        //continue;
        return -1;
    }

    if (got_frame) {
        int data_size;
        if (packet->size > done) {
            data_size = done;
        } else {
            data_size = packet->size;
        }

        BYTE * snd = (BYTE *)malloc(data_size * sizeof(BYTE));
        XMemCpy(snd,
            AudioBytes,
            data_size * sizeof(BYTE)
        );

        XMemSet(&g_SoundBuffer, 0, sizeof(XAUDIO2_BUFFER));
        g_SoundBuffer.AudioBytes = data_size;
        g_SoundBuffer.pAudioData = snd;
        g_SoundBuffer.pContext = (VOID*)snd;

        XAUDIO2_VOICE_STATE state;
        while (g_pSourceVoice->GetState(&state), state.BuffersQueued > 60) {
            WaitForSingleObject(XAudio2_Notifier.hBufferEndEvent, INFINITE);
        }
        g_pSourceVoice->SubmitSourceBuffer(&g_SoundBuffer);
    }
    return 0;
}
I can't seem to figure out the problem; I have added error messages in init, opening the video, codec handling, etc. As mentioned before, the XAudio2 code works with an older FFmpeg, so maybe I have missed something with avcodec_decode_audio4?
If this snippet of code isn't enough, I can post the whole code; these are just the places in the code where I think the problem would be :(
I don't see you accessing decoded_frame anywhere after decoding. How do you expect to get the data out otherwise?
BYTE * snd = (BYTE *)malloc( data_size * sizeof(BYTE));
This also looks very fishy, given that data_size is derived from the packet size. The packet size is the size of the compressed data; it has very little to do with the size of the decoded PCM frame.
The decoded data is located in decoded_frame->extended_data, which is an array of pointers to data planes, see here for details. The size of the decoded data is determined by decoded_frame->nb_samples. Note that with recent Libav versions, many decoders return planar audio, so different channels live in different data buffers. For many use cases you need to convert that to interleaved format, where there's just one buffer with all the channels. Use libavresample for that.
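As a hedged illustration of that point (not the asker's code): the decoded size can be computed from nb_samples with av_samples_get_buffer_size(), and for a non-planar sample format all channels live in decoded_frame->extended_data[0].
// Hedged sketch: size the copy from the decoded frame, not from packet->size.
// Assumes an interleaved (non-planar) sample format; planar output would first
// need conversion with libavresample as described above.
int data_size = av_samples_get_buffer_size(NULL,
                                           player.av_acodecctx->channels,
                                           decoded_frame->nb_samples,
                                           player.av_acodecctx->sample_fmt,
                                           1 /* no alignment padding */);
if (data_size > 0 && !av_sample_fmt_is_planar(player.av_acodecctx->sample_fmt)) {
    BYTE *snd = (BYTE *)malloc(data_size);
    memcpy(snd, decoded_frame->extended_data[0], data_size);
    // hand `snd` and `data_size` to the XAudio2 buffer exactly as in the question
}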

Embedded Linux poll() returns constantly

I have a particular problem. Poll keeps returning when I know there is nothing to read.
So the setup is as follows: I have two file descriptors which form part of an fd set that poll() watches. One is for a pin high-to-low change (GPIO). The other is for a proxy input. The problem occurs with the proxy input.
The order of processing is: the main function starts; it then polls; data is written to the proxy; poll() returns; the data is accepted and sent over SPI; the receiving slave device signals that it wants to send an ACK by dropping the GPIO low; poll() senses this drop and reacts.
Infinite POLLINs :(
If I have no timeout on the poll() call, the program works perfectly. The moment I include a timeout, poll() returns continuously. Not sure what I am doing wrong here.
while(1)
{
    memset((void*)fdset, 0, sizeof(fdset));

    fdset[0].fd = gpio_fd;
    fdset[0].events = POLLPRI;  // POLLPRI - There is urgent data to read
    fdset[1].fd = proxy_rx;
    fdset[1].events = POLLIN;   // POLLIN - There is data to read

    rc = poll(fdset, nfds, 1000); //POLL_TIMEOUT);
    if (rc < 0)       // Error
    {
        printf("\npoll() failed/Interrupted!\n");
    }
    else if (rc == 0) // Timeout occurred
    {
        printf(" poll() timeout\n");
    }
    else
    {
        if (fdset[1].revents & POLLIN)
        {
            printf("fdset[1].revents & POLLIN\n");
            if ((resultR = read(fdset[1].fd, command_buf, 10)) < 0)
            {
                printf("Failed to read Data\n");
            }
            if (fdset[0].revents & POLLPRI)
            //if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
            {
                lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
                len = read(fdset[0].fd, reader, 64);
            }
        }
    }
}
So that is the gist of my code.
I have also used GDB and while debugging, I found that the GPIO descriptor was set with revents = 0x10, which means that an error occurred and that POLLPRI also occurred.
In this question, something similar was addressed. But I do read all the time whenever I get POLLIN. It is a bit surprising that this problem only occurs when I include the timeout; if I replace the poll timeout with -1, it works perfectly.
When poll() fails (returning -1) you should do something with errno, perhaps through perror(); and your nfds (the second argument to poll) is never set, but it should be the constant 2.
The GCC compiler would probably have given a warning about nfds not being set, at least with all warnings enabled (-Wall).
(I'm guessing that nfds, being uninitialized, might hold some "random" large value... so the kernel might be polling other "random" file descriptors, the ones in your fdset beyond the first two entries.)
BTW, you could strace your program. And using the fdset name is a bit confusing (it could refer to select(2)).
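A minimal sketch of that point, using the two descriptors from the question:
/* Size the array explicitly and derive nfds from it rather than leaving it
 * uninitialized; report errno when poll() fails. */
struct pollfd fdset[2];
const nfds_t nfds = sizeof(fdset) / sizeof(fdset[0]);   /* == 2 */

fdset[0].fd = gpio_fd;   fdset[0].events = POLLPRI;
fdset[1].fd = proxy_rx;  fdset[1].events = POLLIN;

int rc = poll(fdset, nfds, 1000);
if (rc < 0)
    perror("poll");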
Assuming I fixed your formatting properly in your question, it looks like you have a missing } between the POLLIN block and the next if() that checks POLLPRI. It would possibly work better this way:
if (fdset[1].revents & POLLIN)
{
    printf("fdset[1].revents & POLLIN\n");
    if ((resultR = read(fdset[1].fd, command_buf, 10)) < 0)
    {
        printf("Failed to read Data\n");
    }
}
if (fdset[0].revents & POLLPRI)
//if( (gpio_fd != -1) && (FD_ISSET(gpio_fd, &err)))
{
    lseek(fdset[0].fd, 0, SEEK_SET); // Read from the start of the file
    len = read(fdset[0].fd, reader, 64);
}
Although you can do whatever you want with indentation in C/C++/Java/JavaScript, not doing it right can bite you really hard. Hopefully, I'm wrong and your original code was correct.
Another thing I often see: people not using { ... } at all and ending up with code like:
if(expr) do_a; do_b;
and of course, do_b; will be executed all the time, whether expr is true or false... and although you could fix the above with a comma like so:
if(expr) do_a, do_b;
the only safe way to do it right is to use the braces:
if(expr)
{
    do_a;
    do_b;
}
Always make sure your indentation is perfect and write small functions so you can see that it is indeed perfect.
