How does newTempHashMap work in mapdb? Is it just a file-backed hash map or is there an in-memory caching layer?
It creates a file in the temporary folder. The file is deleted when the JVM exits.
Related
I have a memory mapped file opened in my Java program and I keep on writing new data into it.
Generally, the OS takes care of flushing the contents to the disk.
If some other process wants to copy the file to another location (possibly with rsync), does it get the latest contents of that file, i.e. the contents that are in the memory map at that instant?
I would expect that if a copy of a file that is memory mapped by a process is triggered, one of these two things should happen:
The contents of the file should be flushed to disk, so that other process sees that latest content of the file.
When a copy is triggered, it is generally copied to memory (when we are doing something like rsync) and then the contents will be written to the destination file from the memory. So, if they are to be copied from the memory, then the pages of the file are already in memory, since the other process is using it (memory mapped), so this page will be accessed and there is no need to flush to disk.
What happens exactly? Is it something other than the two cases above?
Is the behavior the same on both Windows and Linux?
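A small experiment can make the second case above concrete. This is only a sketch, and it assumes Linux, where (as far as I know) a shared mapping and an ordinary read() are served from the same page cache; it says nothing about Windows. The function name mmap_write_then_read is made up for the illustration.

```python
import mmap
import os
import tempfile


def mmap_write_then_read(payload: bytes) -> bytes:
    """Write payload through a shared mapping, then read the file normally.

    payload must be non-empty. No flush or msync is issued before the read.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, len(payload))
        # On Unix, mmap.mmap() creates a shared (MAP_SHARED) mapping by default.
        with mmap.mmap(fd, len(payload)) as mapped:
            mapped[:] = payload
            # A separate reader, standing in for the "other process".
            with open(path, "rb") as reader:
                return reader.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

If the returned bytes equal the payload, the reader saw the mapped contents without any explicit flush to disk.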
We are working on an iMX6Sx Freescale board, building the Linux kernel distro with Yocto.
I would like to know whether there is a way to check that file system operations (in particular, writes) have really completed, so that we avoid closing or killing a process while operations are still running.
To be clearer: we have to perform some actions (copying files, writes, ...) when our application has to switch off, and we need to know when they are really completed (I think they are asynchronous).
Thanks in advance
Andrea
If you want to ensure all the writes are committed to storage and the filesystem is updated:
call fsync() on the file descriptor,
open the parent directory and call fsync() on that file descriptor
When both of these are done, the kernel has flushed everything from memory and ensured the filesystem is updated regarding the file you operate on.
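The two steps above can be sketched in Python, whose os.fsync() wraps the same system call. This is a minimal sketch assuming a POSIX system where a directory can be opened read-only and fsynced; durable_write is a hypothetical helper name, not something from the question.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # step 1: flush the file's data and metadata
    finally:
        os.close(fd)

    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # step 2: flush the directory entry for the file
    finally:
        os.close(dir_fd)
```

When durable_write() returns without raising, both fsync() calls have succeeded for that file.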
Another approach is to call sync(), which ensures all kernel data are written to storage for all files and filesystem metadata.
Note:
if your application works with FILE* instead of file descriptors, you first need to ensure written data is flushed from your application to the kernel, either by calling fflush() or by calling fclose() on the FILE*
If you kill an application, any write operation it has already performed will not be cancelled or interrupted, and you can make sure it's committed to storage by calling sync(), or by opening the same file and calling fsync() on it.
If you kill an application arbitrarily, you can't expect everything to be consistent. Perhaps the application was making two important writes to a database, config file, etc., and you terminated it after the first one; the file might then be damaged according to its format.
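The first note has a direct analogue in Python, where a buffered file object plays roughly the role of a FILE*. A minimal sketch, with write_and_commit as a made-up name:

```python
import os


def write_and_commit(path: str, text: str) -> None:
    """Push data from the userspace buffer to the kernel, then to storage."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)         # may still sit in Python's own buffer
        f.flush()             # buffer -> kernel, comparable to fflush()
        os.fsync(f.fileno())  # kernel -> storage, as with fsync() on an fd
```

For the "flush everything" approach, os.sync() is the Python wrapper around sync().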
How do backup programs make sure they get a consistent copy of a file, when file locks in Linux are mostly advisory?
For example, if some other process does not respect file locks and writes to a file, how can I create a consistent copy of that file?
This is quite an interesting topic, the modern way seems to be to use a filesystem snapshot; another way is to use a block-device snapshot.
In any case, some kind of snapshot is the best solution. ZFS has snapshots (but is not available as a "first class" filesystem under Linux), as does btrfs (which is quite new).
Alternatively, an LVM volume can have a block-level snapshot taken (which can then be mounted read-only in a temporary location while a backup is taken).
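The LVM route could look roughly like the sequence below. This is a dry-run sketch that only builds the command lines and does not run them. The volume group, volume, mount point, destination and snapshot size are placeholders, and using rsync for the copy step is my assumption; any backup tool would do.

```python
def snapshot_backup_commands(vg, lv, mount_point, dest, size="1G"):
    """Return the commands for a snapshot-based backup, in order.

    All arguments are placeholders; nothing is executed here.
    """
    snap = f"{lv}_backup_snap"  # hypothetical snapshot name
    return [
        # Take a block-level snapshot of the logical volume.
        ["lvcreate", "--snapshot", "--size", size, "--name", snap, f"/dev/{vg}/{lv}"],
        # Mount the snapshot read-only in a temporary location.
        ["mount", "-o", "ro", f"/dev/{vg}/{snap}", mount_point],
        # Copy the frozen view of the files.
        ["rsync", "-a", f"{mount_point}/", dest],
        # Clean up: unmount and drop the snapshot.
        ["umount", mount_point],
        ["lvremove", "--force", f"/dev/{vg}/{snap}"],
    ]
```

Each entry could be passed to subprocess.run(cmd, check=True) as root. Some filesystems may need extra mount options when mounting a snapshot of a volume that is already mounted.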
If you had mandatory file locks, then a backup program would disrupt normal operation of (for example) a database so that it was not able to work correctly. Moreover, unless there was a mechanism to atomically take a mandatory lock on every file in the filesystem, there would be no way to take a consistent backup (i.e. with every file as it was at the same moment).
I hope you've all seen the wonderful site, Linux Ate My Ram. This is usually great, but it presents a problem for me. I have a secure file that I'm decrypting with gpg and then reading into memory to process. The unencrypted file is deleted a short time later, but I do NOT want that decrypted file to be saved in Linux's in-memory file cache.
Is there a way to explicitly prevent a file from being saved in Linux's cache?
Thanks!
Use gpg -d, which will cause GPG to output the file to STDOUT, so then you can have it all in memory.
Depending on how paranoid you are, you may want to use mlock as well.
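If the processing is done from a script, the same idea is to read gpg's stdout through a pipe so that no decrypted file is ever created. A minimal sketch; capture_stdout and decrypt_to_memory are made-up names, and it assumes gpg is on the PATH and can obtain the passphrase non-interactively or via its agent.

```python
import subprocess


def capture_stdout(argv):
    """Run a command and return everything it wrote to stdout as bytes."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout


def decrypt_to_memory(encrypted_path):
    """Decrypt with gpg -d (--decrypt); the plaintext arrives over a pipe."""
    return capture_stdout(["gpg", "--decrypt", encrypted_path])
```

This sketch does nothing about swapping: the returned bytes live in ordinary process memory, which is where mlock would come in.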
If you really, really need gpg's output to be a file, you could put that file on a ramfs file system. The file's contents will only exist in non-swappable memory pages.
You can attach a ramfs file system to your tree by running (as root):
mount none /your/mnt/point -t ramfs
You may have also heard of tmpfs. It's similar in that its files have no permanent storage and generally exist only in RAM. However, for your use, you want to avoid this file system because tmpfs files can be swapped to disk.
Sure. Shred the file as you delete it.
shred -u $FILE
Granted, it doesn't directly answer your question, but I still think it's a solution---whatever's living in the cache is now randomly-generated garbage. :-)
The shred documentation says shred is "not guaranteed to be effective" (see bottom). So if I shred a document on my Ext3 filesystem or on a RAID, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the
traditional way to do things, but many modern file system designs
do not satisfy this assumption. The following are examples of file
systems on which shred is not effective, or is not guaranteed to be
effective in all file system modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies
(and shred is thus of limited effectiveness) only in data=journal
mode, which journals file data in addition to just metadata. In
both the data=ordered (default) and data=writeback modes, shred
works as usual. Ext3 journaling modes can be changed by adding
the data=something option to the mount options for a
particular file system in the /etc/fstab file, as documented in the
mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
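That loop is simple enough to sketch. The following is not shred's actual code, just an illustration of overwrite, flush, check success, and repeat; the function name, pass count and chunk size are arbitrary choices of mine.

```python
import os


def overwrite_passes(path, passes=3):
    """Overwrite a file's contents with random bytes, several times over.

    Like shred, this only asks the filesystem to overwrite the file;
    it cannot tell whether the original blocks were really reused.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(passes):
            os.lseek(fd, 0, os.SEEK_SET)
            remaining = size
            while remaining > 0:
                chunk = os.urandom(min(remaining, 65536))
                # os.write raises OSError on failure ("check success").
                remaining -= os.write(fd, chunk)
            os.fsync(fd)  # flush this pass before starting the next one
    finally:
        os.close(fd)
```

Nothing in it looks at where the blocks actually end up on disk, which is exactly the limitation described here.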
So, journaling filesystems may not overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
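The renaming behaviour of -u can be imitated in a few lines. This is a rough sketch of the idea rather than shred's real algorithm; the all-zeros names and the function name are my own choices.

```python
import os


def obscure_name_then_unlink(path):
    """Rename a file to ever-shorter names, then delete it.

    Starts with a name of the same length as the original and shortens
    it one character at a time down to one character.
    Returns the last name the file had before it was unlinked.
    """
    directory, name = os.path.split(path)
    current = path
    for length in range(len(name), 0, -1):
        candidate = os.path.join(directory, "0" * length)
        if os.path.exists(candidate):
            continue  # name already taken, try the next shorter one
        os.rename(current, candidate)
        current = candidate
    os.unlink(current)
    return current
```

As with the overwrite itself, whether the old directory entries are really gone from disk depends on the filesystem.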
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred only works on partitions, not individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-image files)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
The concern is that data might exist in more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, journaling file systems and other advanced file systems may temporarily write your file's data to multiple locations on the disk. Shred, running after the fact, has no way of knowing about this and has no way of knowing where the data may have been temporarily written. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.