Fstab cache seems funky, filling up - linux

We have an Amazon server with an S3 bucket mounted as a partition via s3fs. The mount itself seems to be working, but the directory specified by the use_cache option is filling up very rapidly and is not shrinking back down. Is this normal?
The config in fstab is:
s3fs#filemanager /home/user/mounts/FileManager fuse user,use_cache=/home/user/tmp,allow_other,uid=NN,gid=NNN 0 0
Both the mounted directory and the cache are growing at the same rate. Am I doing it wrong?

From the documentation:
If enabled via "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on s3 it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to s3 if it has been changed.
The folder specified by use_cache is just a local cache. It can be deleted at any time. s3fs re-builds it on demand. Note: this directory grows unbounded and can fill up a file system dependent upon the bucket and reads to that bucket. Take precaution by using a quota system or routinely clearing the cache (or some other method).
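Since the docs recommend routinely clearing the cache, one way to do that is a small daily cron script. This is only a sketch, assuming the cache path from the fstab line above and a one-day retention period (both are choices to tune, not anything from the question):
#!/bin/sh
# e.g. saved as /etc/cron.daily/clean-s3fs-cache (name is illustrative)
# Delete cached copies not accessed within the last 24 hours;
# s3fs simply re-downloads a file on the next read that needs it.
find /home/user/tmp -type f -atime +0 -delete
Depending on your s3fs-fuse version there may also be mount options such as ensure_diskfree (keep a minimum amount of free disk space) and del_cache (clear the cache on mount/unmount); check man s3fs before relying on them.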

Related

AWS Lambda with Node - saving files into Lambda's file system

I need to save files I get from S3 into a Lambda's file system, and I wanted to know if I can do that simply by using fs.writeFileSync?
Or do I still have to use the context function as described here:
How to Write and Read files to Lambda-AWS with Node.js
(I tried to find newer examples, but could not.)
What is the recommended method?
Please advise.
Yes, you can use the typical fs functions to read/write from local disk, but be aware that writing is limited to the /tmp directory, and the default maximum disk space available to your Lambda function in that location is 512 MB. Also note that files written there may persist into the next (warm) Lambda invocation.
If you want to simply download an object from S3 to the local disk (assuming it will fit in the available diskspace) then you can combine AWS SDK methods and Node.js streaming to stream the content to disk.
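To make the streaming approach concrete, here is a minimal sketch, assuming the AWS SDK for JavaScript v3 (bundled with recent Node.js Lambda runtimes); the bucket, key and /tmp filename are placeholders:
// Stream an S3 object straight to /tmp without buffering the whole object in memory.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const { createWriteStream } = require("fs");
const { pipeline } = require("stream/promises");

const s3 = new S3Client({});

exports.handler = async () => {
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: "my-bucket", Key: "my-object" })
  );
  // In the Node.js runtime, Body is a readable stream, so it can be piped to disk.
  await pipeline(Body, createWriteStream("/tmp/my-object"));
  return { saved: "/tmp/my-object" };
};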
Also, it's worth noting that, depending on your app, you may be able to process the entire S3 object in RAM via streaming, without any need to actually persist to disk. This is helpful if your object size is over 512 MB.
Update: as of March 2022, you can now configure Lambda functions with more ephemeral storage in /tmp, up to 10 GB. You get 512 MB included with your Lambda function invocation and are charged for the additional configured storage above 512 MB.
If you need to persist very large files, consider using Elastic File System.
Lambda does not allow general access to the local file system; it is meant to be an ephemeral environment. It does allow access to the /tmp folder, but only up to a maximum of 512 MB. If you want storage along with your function, you will need to use AWS S3 or AWS EFS.
Here's an article from AWS explaining this.
Here's the docs on adding storage to Lambda.

Prevent reading and writing to directory or mount point when block storage is not mounted

I have an application that depends on the data from a mount point, e.g. /mnt/db
What I need to prevent is any read/write to the directory when the actual block storage is not mounted, and to do so without modifying the application to check before each file system write, since I have no control over it. This usually happens during machine restarts and similar reboots.
How can this be achieved without scripting?
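No answer was posted for this one, but one technique sometimes used for exactly this situation (my suggestion, not something from the thread) is to mark the bare mount-point directory immutable while nothing is mounted on it, on filesystems that support the attribute (e.g. ext4):
# Run as root while /mnt/db is NOT mounted (path taken from the question).
chattr +i /mnt/db
# Writes/creates under the bare /mnt/db now fail with "Operation not permitted",
# so nothing silently lands on the root filesystem. Once the block device is
# mounted on top, the mounted filesystem's own attributes apply and the
# application behaves normally.
This blocks writes to the unmounted directory; reads are not refused as such, they just see an empty directory.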

How to prevent file/disk corruption when a Linux OS gets shut down/reset non-gracefully?

We have our application running in a Linux VM. The application does a lot of reads/writes to config files on disk, along with logging. We often notice that when the VM gets reset (non-gracefully), some of the config files/log files that are in use get corrupted. Are there any file system settings/tuning (we use ext3/4) or fs driver settings we can apply to avoid file corruption when an abrupt shutdown/restart happens?
Check this documentation:
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
In summary, you have three options for mounting your partition:
data=journal
data=ordered
data=writeback
Mounting your partition with "data=journal" option is the safest way of writing data to disk. As you can read in the provided link, with this config option enabled, all data are committed into the journal prior to being written into the main file system.
You can make that option permanent by adding it to your /etc/fstab config file, in the 'options' column.
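As a hedged illustration of that fstab entry (the device, mount point and remaining fields are placeholders, not taken from the question):
# /etc/fstab: full data journaling for the partition holding the config and log files
/dev/vdb1   /data   ext4   defaults,data=journal   0   2
Keep in mind that data=journal writes every block twice (once to the journal, once to its final location), so write throughput drops, and whichever mode you choose, the application still needs to fsync() its files at the right moments for them to be durable across a hard reset.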

Linux: will file reads from CIFS be cached in memory?

I am writing a streaming server for Linux that reads files from CIFS mounts and sends them over a socket. Ideally, Linux will cache the file in memory so that subsequent reads will be faster. Is this the case? Can I tell the kernel to cache network reads?
Edit: there will be multiple reads, but no writes, on these files.
Thanks!
Update: I've tested this on a CIFS volume, using fadvise POSIX_FADV_WILLNEED to cache the file locally (using linux-ftools on the command line). It turns out that the volume needs to be mounted in read-write mode for this to work. In read-only mode, the fadvise seems to be ignored. This must have something to do with the Samba oplock mechanism.
Subject to the usual cache coherency rules [1] in CIFS, yes, the kernel CIFS client will cache file data.
[1] Roughly, CIFS is uncached in principle, but by taking oplocks the client can cache data more aggressively. For an explanation of CIFS locking, see e.g. the Samba manual at http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/locking.html . If the client(s) open the files in read-only mode, then I suspect the client will use level 2 oplocks, and as no conflicting access takes place, multiple clients should be able to hold level 2 oplocks for the same files. Only when some client requests write access to the files will the oplocks be broken.
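Not from the answer above, but related for reference: the in-kernel CIFS client also exposes a cache= mount option (cache=strict is the usual default, cache=loose relaxes coherency, cache=none disables client-side caching of file data). An illustrative mount line, with the server, share and credentials file as placeholders:
mount -t cifs //fileserver/media /mnt/media -o rw,cache=strict,credentials=/etc/cifs-credentials
Once the client is allowed to cache (i.e. it holds an oplock), simply reading a file once, e.g. cat /mnt/media/somefile > /dev/null, is enough to pull it into the page cache for subsequent reads.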

Append all logs to /var/log

Application scenario:
I have the (normal/permanent) /var/log mounted on an encrypted partition (/dev/LVG/log). /dev/LVG/log is not accessible at boot time; it needs to be manually activated later via su from ssh.
A RAM drive (using tmpfs) is mounted to /var/log at init time (in rc.local).
Once /dev/LVG/log is activated, I need a good way of appending everything in the tmpfs to /dev/LVG/log, before mounting it as /var/log.
Any recommendations on what would be a good way of doing so? Thanks in advance!
The only thing you can do is block until you somehow verify that /var/log is mounted on an encrypted VG, or queue log entries until that has happened if your app must start on boot, which could get kind of expensive. You can't be responsible for every other app on the system, and I can't see any reason to encrypt boot logs.
Then again, if you know the machine has heap to spare, a log queue that flushed once some event said it was OK to write to disk would seem sensible. That's no more expensive than the history that most shells keep, as long as you take care to avoid floods of events that could fill up the queue.
This does not account for possible log loss, but could with a little imagination.
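A minimal sketch of that queue-and-flush idea, in Node.js purely for illustration; the log path and function names are invented:
const fs = require("fs");

const queue = [];      // in a real implementation, cap its length to guard against floods
let logVolumeReady = false;

function log(line) {
  if (logVolumeReady) {
    fs.appendFileSync("/var/log/myapp.log", line + "\n");  // placeholder path
  } else {
    queue.push(line);  // held in memory until the encrypted volume is mounted
  }
}

// Call this once the encrypted /var/log has been activated and mounted.
function markLogVolumeReady() {
  logVolumeReady = true;
  while (queue.length > 0) {
    fs.appendFileSync("/var/log/myapp.log", queue.shift() + "\n");
  }
}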
There is a risk you could lose logging. You might want to try writing your logs to a file in /tmp, which is tmpfs and thus in memory. You could then append the content to your encrypted volume and remove the file in /tmp. Of course, if your machine failed to boot and went down again, /tmp would be erased, and so you'd lose a good way of working out why.
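For the append step the question itself asks about, a rough shell sketch; the temporary mount point, the logger service name and the assumption that /dev/LVG/log has already been activated are all mine, not from the thread:
service rsyslog stop              # stop writers so the tmpfs can be unmounted
mkdir -p /mnt/log.real
mount /dev/LVG/log /mnt/log.real
# Append everything collected in the tmpfs onto the permanent log files
# (top-level files only; recurse if you keep subdirectories under /var/log).
for f in /var/log/*; do
    [ -f "$f" ] && cat "$f" >> "/mnt/log.real/$(basename "$f")"
done
umount /mnt/log.real
umount /var/log                   # drop the tmpfs
mount /dev/LVG/log /var/log       # the permanent, encrypted /var/log takes over
service rsyslog start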
