Prevent backup reads from getting into the Linux page cache

AFAIK all disk reads on Linux go through the page cache.
Is there a way to prevent reads (done by a backup process) from getting into the page cache?
Imagine:
A server runs fine: most operations don't need to touch the disk because enough memory is available.
Now the backup process starts and does a lot of reading. The bytes it reads end up in memory (the page cache) even though nobody will want to read the same bytes again in the next few hours.
The backup data fills up the memory, and more important pages get evicted from the cache.
Server performance gets worse because more operations now have to touch the disk, since the relevant pages were evicted from the cache.
My preferred solution:
Tell Linux that the reads done by the backup process don't need to be stored in the page cache.

If you're using rsync, there is the --drop-cache flag, according to this question.
There is also the nocache utility, which will
minimize the effect an application has on the Linux file system cache
Use case: backup processes that should not interfere with the present state of the cache.
With dd, there is direct I/O to bypass the cache, according to this question.
dd also has a nocache option; see info coreutils 'dd invocation' for details.
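Under the hood, tools like nocache use the posix_fadvise() hint; if the backup logic is your own code, you can issue the same hint directly. A minimal sketch (error handling trimmed; backup_read is just an example name):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read a file for backup purposes, then tell the kernel that its pages
     * do not need to stay in the page cache. */
    static int backup_read(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        char buf[1 << 16];
        while (read(fd, buf, sizeof buf) > 0) {
            /* ... hand buf to the backup destination ... */
        }

        /* Advise the kernel to drop the cached pages for the whole file
         * (len == 0 means "to the end of the file"). */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        return backup_read(argv[1]) == 0 ? 0 : 1;
    }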

Related

Why does IBM MQ do triple write in the recovery log?

As far as I know, IBM MQ uses a triple write to persist the recovery log, with steps as below:
assuming the page (4 KB) already has 1 KB of data in it,
it loads the page from disk first,
modifies the page to add the new contents (3 KB) and writes it to some other place,
then writes the page to disk.
If the page is broken, it can be restored from the backup copy when recovering from a power failure.
This is claimed to prevent partial writes: writing the page in place could corrupt the original data if the machine crashes mid-write and the FS/disk doesn't support atomic writes.
My question is: why not just write an append-only log, like a WAL, instead of updating in place?
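I can't speak for MQ's actual implementation, but the torn-page protection described above roughly amounts to persisting a scratch copy of the page before overwriting it in place. A hedged sketch, with names and layout invented for illustration:

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096

    /* Illustrative torn-page protection: persist a scratch copy of the page
     * before updating it in place, so a crash mid-write can be repaired. */
    int update_page(int data_fd, int scratch_fd, off_t page_off,
                    const char *payload, size_t len, size_t at)
    {
        char page[PAGE_SIZE];

        if (at + len > sizeof page)
            return -1;

        /* 1. Load the page that already holds some data (the 1 KB above). */
        if (pread(data_fd, page, sizeof page, page_off) < 0)
            return -1;

        /* 2. Modify it and write the full page "to some other place". */
        memcpy(page + at, payload, len);
        if (pwrite(scratch_fd, page, sizeof page, 0) < 0 || fdatasync(scratch_fd) < 0)
            return -1;

        /* 3. Only now overwrite the page in place. If this write is torn by a
         *    crash, recovery restores the page from the scratch copy. */
        if (pwrite(data_fd, page, sizeof page, page_off) < 0 || fdatasync(data_fd) < 0)
            return -1;

        return 0;
    }

By contrast, an append-only WAL just appends the new record and defers the in-place page write to a later checkpoint, which is the trade-off the question is asking about.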

How to flush the dirty pages of OS file cache to disks?

Linux uses the remaining free memory for its file cache.
In my application (written in C++), I'd like to flush the dirty pages to disks explicitly from time to time
(Using O_DIRECT is not appropriate for me)
I tried fflush(), but it doesn't seem to be what I want (it only flushes the C stdio buffers into the kernel; it doesn't write dirty pages out to disk)
Is there any way to flush the dirty pages of OS file cache to disks?
Thanks
You can use sync_file_range() to encourage flushing on Linux, but confusingly you can't use sync_file_range() to guarantee file durability/data integrity - it is simply a hint that might help get flushing underway (see this Linux Plumbers Conference 2019 video of Postgres developer Andres Freund complaining about sync_file_range()'s manpage, and the reply from filesystem developer Jan Kara). In short: it can help trigger flushing, but you'll need to add/use something else to guarantee durability.
I believe all the usual file descriptor sync-style calls (fsync(), fdatasync(), etc.) also hint that you want writeback to start occurring, but in a more heavy-handed fashion compared to sync_file_range() (because they also force the device's volatile write caches to be flushed)...
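A minimal hinting helper in C might look like the sketch below (function names are mine; as noted above, this only starts writeback and is not a durability barrier):

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Ask the kernel to start writing back the dirty pages in
     * [offset, offset + nbytes). This only kicks off writeback; pair it with
     * fdatasync()/fsync() when you actually need durability. */
    int start_writeback(int fd, off_t offset, off_t nbytes)
    {
        return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }

    /* Heavier variant: wait for any in-flight writeback of the range, submit
     * the dirty pages, then wait for those writes to complete. Still not a
     * durability guarantee, because the device's volatile write cache is not
     * flushed. */
    int flush_range_and_wait(int fd, off_t offset, off_t nbytes)
    {
        return sync_file_range(fd, offset, nbytes,
                               SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER);
    }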

Why does w3wp memory keep increasing?

I am on a medium instance which has 3 GB of RAM. When I start my web app, the w3wp process starts at, say, 80 MB. I notice that as time passes this goes up and up. I took a memory dump of the process when it was at 570 MB and the site had been running for 5 days, to see whether any .NET objects were consuming a lot of memory, but found that the largest object was 18 MB, which was a set of string objects.
I am not using any cache objects since I'm using redis for my session storage, and in actual fact the dump showed that there was nothing in the cache.
Now my question is the following: I am thinking that since I have 3 GB of memory, IIS will retain some pages in memory (cached) so the website is faster whenever there are requests, and that is the reason the memory keeps increasing. What I'm concerned about is whether I have a memory leak of some kind, even though I am disposing of all EntityFramework objects after use, as well as any other streams that need to be disposed. When some specific threshold is reached, I am assuming that old cached data which was in memory gets removed and new pages are included. Am I right in saying this?
I want to point out that in the past I was on a small instance and the memory percentage never went above 70%, and now I am on a medium instance and the memory is already at 60%... very strange with the same code.
I can send memory dump if anyone would like to help me out.
There is an issue that is affecting a small number of Web Apps, and that we're working on patching.
There is a workaround if you are hitting this particular issue:
Go to Kudu Console for your app (e.g. https://{yourapp}.scm.azurewebsites.net/DebugConsole)
Go into the LogFiles folder. If you are running into this issue, you will have a very large eventlog.xml file
Make that file readonly, by running attrib +r eventlog.xml
Optionally, restart your Web App so you have a clean w3wp
Monitor whether the usage still goes up
The one downside is that you'll no longer get those events generated, but in most cases they are not needed (and this is temporary).
The problem has been identified, but we don't have an ETA for the deployment yet.

Encrypted filesystems and the Linux buffer cache

I currently use Berkeley DBs fronted by a Java server for a high-performance disk-backed cache. Provided you warm it up before allowing it to face live traffic, your update rate is low, and your working set fits in memory, the Linux buffer cache does an excellent job. It's measurably faster than memcache, in part because you don't need to context switch to the memcached and back on read. We're very happy with the performance.
We're going to be adding some data to the cache that we're not comfortable leaving on disk in plain text. We've measured and are unhappy with the performance of decrypting during request processing, so we're looking for solutions that decrypt only when the data is loaded from disk and then keep it available in memory.
Before building something that does this, I wanted to find out if we can simply slide in an encrypted filesystem and continue to rely on the OS to manage the cache for us. I haven't found any documentation that tells me at what layer the decryption is done.
So my question is: Can anyone tell me, for any particular Linux encrypted FS, whether the (en|de)cryption is done below the buffer cache (and therefore the cache contains plaintext) or above (and the cache contains ciphertext)?
The buffer cache sits below the actual filesystem, so it will cache encrypted data. See the diagram at IBM's Anatomy of a Filesystem. But since you want unencrypted data cached: so long as your encrypted filesystem was created using the 'loop' device, the buffer cache will also contain an unencrypted copy of your data, so it should be fast (at the cost of more memory for in-use FS buffers).
I haven't played with this, but am pretty sure that buffer cache and VM are not aware of the encryption, so you should see comparable performance with your usage.
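One way to check this empirically on a given encrypted filesystem is to read the same file twice and compare the timings. A rough sketch (drop the page cache first, e.g. echo 3 > /proc/sys/vm/drop_caches, or reboot, so the first pass is genuinely cold):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Read the whole file once and return the elapsed wall-clock seconds. */
    static double timed_read(const char *path)
    {
        char buf[1 << 16];
        struct timespec t0, t1;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1.0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (read(fd, buf, sizeof buf) > 0)
            ;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        close(fd);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file-on-encrypted-mount>\n", argv[0]);
            return 1;
        }
        /* The first pass pays for disk I/O plus decryption; the second pass is
         * served from the page cache. If the warm pass is as fast as on a plain
         * filesystem, cached reads are not paying the decryption cost again. */
        printf("cold pass: %.3f s\n", timed_read(argv[1]));
        printf("warm pass: %.3f s\n", timed_read(argv[1]));
        return 0;
    }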

Should I fsck ext3 on embedded system?

We have a number of embedded systems requiring r/w access to the filesystem which resides on flash storage with block device emulation. Our oldest platform runs on compact flash and these systems have been in use for over 3 years without a single fsck being run during bootup and so far we have no failures attributed to the filesystem or CF.
On our newest platform we used USB-flash for the initial production and are now migrating to Disk-on-Module for r/w storage. A while back we had some issues with the filesystem on a lot of the devices running on USB-storage so I enabled e2fsck in order to see if that would help. As it turned out we had received a shipment of bad flash memories so once those were replaced the problem went away. I have since disabled e2fsck since we had no indication that it made the system any more reliable and historically we have been fine without it.
Now that we have started putting in Disk-on-Module units I've started seeing filesystem errors again. Suddenly the system is unable to read/write certain files and if I try to access the file from the emergency console I just get "Input/output error". I enabled e2fsck again and all the files were corrected.
O'Reilly's "Building Embedded Linux Systems" recommends running e2fsck on ext2 filesystems but does not mention it in relation to ext3 so I'm a bit confused to whether I should enable it or not.
What are your takes on running fsck on an embedded system? We are considering putting the binaries on a r/o partition and only the files which have to be modified on a r/w partition on the same flash device, so that fsck can never accidentally delete important system binaries. Does anyone have any experience with that kind of setup (good/bad)?
I think the answer to your question relates more to what kind of coherency requirements your application has for its data. That is, what has to be guaranteed if power is lost without a formal shutdown of the system? In general, none of the desktop-operating-system-style file systems handle this all that well without the application specifically closing/syncing files and flushing the disk caches, etc., at key transaction points, to ensure that what you need to maintain is in fact committed to the media.
Running fsck fixes the file system, but without the above care there are no guarantees about which of your changes will actually be kept. I.e., it's not exactly deterministic what you'll lose as a result of the power failure.
I agree that putting your binaries or other important read-only data on a separate read-only partition does help ensure that they can't erroneously get tossed due to an fsck correction of file-system structures. At a minimum, putting them in a different sub-directory off the root than where the R/W data is held will help. But in both cases, if you support software updates, you still need a scheme to deal with writing the "read-only" areas anyway.
In our application, we actually maintain a pair of directories for things like binaries, and the system is set up to boot from either one of the two areas. During software updates, we update the first directory, sync everything to the media and verify the MD5 checksums on disk before moving on to the second copy's update. During boot, a copy is only used if its MD5 checksum is good. This ensures that you always boot a coherent image.
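A hedged sketch of the "sync everything to the media, then verify before switching" step (the helper name is invented, a byte-for-byte comparison stands in for the MD5 check, and a full implementation would also fsync the containing directory):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Write an updated file into the inactive image directory, push it to the
     * media, then read it back and compare before the caller switches over. */
    int install_and_verify(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len || fdatasync(fd) < 0) {
            close(fd);
            return -1;
        }

        /* Best effort: drop the cached pages so the read-back below is more
         * likely to come from the flash device rather than the page cache. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);

        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        char buf[4096];
        size_t off = 0;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            if (off + (size_t)n > len || memcmp(data + off, buf, (size_t)n) != 0) {
                close(fd);
                return -1;          /* mismatch: do not switch boot images */
            }
            off += (size_t)n;
        }
        close(fd);
        return (off == len) ? 0 : -1;
    }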
Dave,
I always recommend running fsck after a number of reboots, but not every time.
The reason is that ext3 is journaled, so unless you enable writeback mode (where data is not journaled), most of the time your metadata/file-system structures should be in sync with your data (files).
But as Jeff mentioned, that doesn't guarantee anything about the layer above the file system. It means you can still get "corrupted" files, because some of the records probably didn't get written to the file system.
I'm not sure what embedded device you're running on, but how often does it get rebooted?
If it's a controlled reboot, you can always do "sync; sync; sync" before restarting.
I've been using CF myself for years, and on very rare occasions I got file-system errors.
fsck does help in that case.
And about separating your partitions, I doubt the advantage of it. For every file on the file system there is metadata associated with it. Most of the time, if you don't change the files, e.g. binaries/system files, then this metadata shouldn't change either. Unless you have faulty hardware, like cross-talk between writes and reads, those read-only files should be safe.
Most problems arise when you have something writable, and regardless of where you put it, it can cause problems if the application doesn't handle it well.
Hope that helps.
