Is moving a file safer than deleting it if you want to remove all traces of it? [closed] - security

I recently accidentally ran "rm -rf *" in a directory and deleted some files that I needed. However, I was able to recover most of them using photorec. Apparently, "deleting" a file just removes the references to it; the data is not truly gone until it is overwritten by something else.
So if I wanted to remove the file completely, couldn't I just execute
mv myfile.txt /temp/myfile.txt
(or move it to external storage)?

You should consider using the Linux command shred, which overwrites the target file multiple times before deleting it, making it practically 'impossible' to recover the file.
You can read a bit about the shred command here.
Just moving the file does not protect you for good: if you move it to external storage, the local copy of the file is deleted just as it is with the rm command.
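A minimal sketch of a shred invocation (the filename is an example; the flags are standard GNU shred options):
# overwrite 3 times, add a final pass of zeros, then deallocate and remove the file
shred -v -n 3 -z -u myfile.txt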

No, that won't help either.
A move between file systems is really still just a "copy + rm" internally. The original storage location of the file on the "source" media is still there, just marked as available. A move WITHIN a file system doesn't touch the file's bytes at all; it just updates the bookkeeping to say "file X is now in location Y".
To truly wipe a file, you must overwrite all of its bytes. And yet again, technology gets in the way of that: if you're using a solid-state storage medium, there is a VERY high chance that writing 'garbage' to the file won't touch the actual transistors the file is stored in, but will instead be written somewhere completely different.
For magnetic media, repeated overwriting with alternating 0x00, 0xFF, and random bytes will eventually totally nuke the file. For SSD/flash systems, the drive either has to offer a "secure erase" option, or you have to smash the chips into dust. For optical media it's even more complicated: -R media cannot be erased, only destroyed; for -RW, I don't know how many repeated write cycles are required to truly erase the bits.
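For what it's worth, the "secure erase" option on SATA drives is usually driven through hdparm; a hedged sketch of the common recipe (the device and password are placeholders, the drive must support ATA security and not be in a "frozen" state, and the whole drive is wiped, so double-check your drive's documentation before running anything like this):
# set a temporary security password, then issue the ATA secure-erase command
hdparm --user-master u --security-set-pass tmppass /dev/sdX
hdparm --user-master u --security-erase tmppass /dev/sdX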

No (and not just because moving it somewhere else on your computer is not removing it from the computer). The way to completely remove a file is to completely overwrite the space on the disk where it resided. The Linux command shred will accomplish this.

Basically, no: in most file systems you can't guarantee that a file is overwritten without going very low level. Removing a file and/or moving it will only change the pointer to the file, not the file's data on the file system in any way. Even the Linux command shred won't guarantee a file's removal on many file systems, since it assumes files are overwritten in place.
On SSDs it's even more likely that your data stays there for a long time: even if the file system attempts to overwrite blocks in place, the SSD will remap the write to a new block (erasing flash takes a long time; if the drive wrote in place, it would be very slow).
In the end, with modern file systems and disks, your best chance of storing files securely is to keep them encrypted to begin with. If they're stored anywhere in clear text they can be very hard to remove, whereas recovering an encrypted file from disk (or from a backup, for that matter) won't be much use to anyone without the encryption key.
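A minimal sketch of that approach using GnuPG with a passphrase (filenames are examples; shred on the original plaintext is still only best-effort, for the reasons above):
# keep only the ciphertext around; best-effort removal of the plaintext copy
gpg --symmetric --cipher-algo AES256 secret.txt   # writes secret.txt.gpg
shred -u secret.txt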

Related

Prepend to Very Large File in Fixed Time or Very Fast [closed]

I have a very large file (>500 GB) that I want to prepend with a relatively small header (<20 KB). Commands such as:
cat header bigfile > tmp
mv tmp bigfile
or similar approaches (e.g., with sed) are very slow.
What is the fastest method of writing a header to the beginning of an existing large file? I am looking for a solution that can run under CentOS 7.2. It is okay to install packages from CentOS install or updates repo, EPEL, or RPMForge.
It would be great if some method exists that doesn't involve relocating or copying the large amount of data in the bigfile. That is, I'm hoping for a solution that can operate in fixed time for a given header file regardless of the size of the bigfile. If that is too much to ask for, then I'm just asking for the fastest method.
Compiling a helper tool (as in C/C++) or using a scripting language is perfectly acceptable.
Is this something that needs to be done once, to "fix" a design oversight perhaps? Or is it something that you need to do on a regular basis, for instance to add summary data (say, the number of data records) to the beginning of the file?
If you need to do it just once, then your best option is to accept that a mistake has been made and take the consequences of the retro-fix. As long as you make your destination drive different from the source drive, you should be able to fix up a 500 GB file within about two hours, so after a week of batch processes running after hours you could have upgraded perhaps thirty or forty files.
If this is a standard requirement for all such files, and you think you can apply the change only when the file is complete -- some sort of summary information, perhaps -- then you should reserve the space at the beginning of each file and leave it empty. Then it is a simple matter of seeking into the header region and overwriting it with the real data once it can be supplied.
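Something like this, assuming the file was created with a zero-filled placeholder region at the front and that header.bin is no larger than that region (both filenames are hypothetical):
# write the header over the reserved region in place, without truncating the rest of the file
dd if=header.bin of=bigfile conv=notrunc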
As has been explained, standard file systems require the whole of a file to be copied in order to add something at the beginning.
If your 500 GB file is on a standard hard disk, which will allow data to be read at around 100 MB per second, then reading the whole file will take about 5,120 seconds, or roughly an hour and a half.
As long as you arrange for the destination to be a separate drive from the source, you can mostly write the new file in parallel with the read, so it shouldn't take much longer than that. But there's no way to speed it up beyond that, I'm afraid.
If you were not bound to CentOS 7.2, your problem could be solved (with some reservations¹) by fallocate, which provides the needed functionality for the ext4 filesystem starting from Linux 4.2 and for the XFS filesystem since Linux 4.1:
int fallocate(int fd, int mode, off_t offset, off_t len);
This is a nonportable, Linux-specific system call. For the portable,
POSIX.1-specified method of ensuring that space is allocated for a
file, see posix_fallocate(3).
fallocate() allows the caller to directly manipulate the allocated
disk space for the file referred to by fd for the byte range starting
at offset and continuing for len bytes.
The mode argument determines the operation to be performed on the
given range. Details of the supported operations are given in the
subsections below.
...
Increasing file space
Specifying the FALLOC_FL_INSERT_RANGE flag (available since Linux 4.1)
in mode increases the file space by inserting a hole within the
file size without overwriting any existing data. The hole will start
at offset and continue for len bytes. When inserting the hole inside
file, the contents of the file starting at offset will be shifted
upward (i.e., to a higher file offset) by len bytes. Inserting a
hole inside a file increases the file size by len bytes.
...
FALLOC_FL_INSERT_RANGE requires filesystem support. Filesystems that
support this operation include XFS (since Linux 4.1) and ext4 (since
Linux 4.2).
¹ fallocate allows prepending data to the file only in multiples of the filesystem block size. So it will solve your problem only if it's acceptable for you to pad the extra space with whitespace, comments, etc.
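If a sufficiently recent util-linux is available, the same system call can be driven from the shell; a sketch, assuming a kernel and filesystem with FALLOC_FL_INSERT_RANGE support, a 4096-byte block size, and hypothetical filenames:
truncate -s 4096 header.bin                                  # pad the header out to one block
fallocate --insert-range --offset 0 --length 4096 bigfile    # shift the existing data up by one block
dd if=header.bin of=bigfile conv=notrunc                     # write the padded header into the hole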
Without support for fallocate() + FALLOC_FL_INSERT_RANGE, the best you can do is:
Increase the file (so that it has its final size);
mmap() the file;
memmove() the data;
Fill the header data in the beginning.

How to estimate a file size from header's sector start address?

Suppose I have a deleted file in the unallocated space of a Linux partition and I want to retrieve it.
Suppose I can get the start address of the file by examining its header.
Is there a way to estimate the number of blocks that need to be analyzed from there (this depends on the size of the image)?
In general, Linux/Unix does not support recovering deleted files: if a file is deleted, it should be gone. This is also good for security: one user should not be able to recover data from a file that was deleted by another user by creating a huge empty file spanning almost all free space.
Some filesystems even support so-called secure delete; that is, they can automatically wipe a file's blocks on delete (but this is not common).
You can try to write a utility which opens the whole partition that your filesystem is mounted on (say, /dev/sda2) as one huge file, reads it, and scans for remnants of your original data, but if the file was fragmented (which is highly likely), the chances are very small that you will be able to recover much of the data in usable form.
Having said all that, there are some utilities which try to be a bit smarter than a simple scan and can attempt to undelete your files on Linux, such as extundelete. It may work for you, but success is never guaranteed. Of course, you must be root to use it.
And finally, if you want to be able to recover anything from that filesystem, you should unmount it right now and take a backup of it using dd, or pipe the dd output through gzip to save space.
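For example, a compressed image of the partition mentioned above could be taken along these lines (the backup path is a placeholder and must live on a different filesystem):
dd if=/dev/sda2 bs=4M | gzip > /mnt/backup/sda2.img.gz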

Is there a Linux filesystem, perhaps fuse, which gives the directory size as the size of its contents and its subdirs? [closed]

If there isn't, how feasible would it be to write one? That is, a filesystem which keeps, for each directory, the recursive size of its contents, and which stays up to date not by recalculating the size on every filesystem change but by, for example, updating the directory size when a file is removed or grows.
I am not aware of such a file system. From the filesystem's point of view, a directory is a file.
You can use:
du -s -h <dir>
to display the total size of all the files in the directory.
From the filesystem's point of view, the size of a directory is the size of the information needed to record its existence on the medium. Note that the "size" of a directory containing files that total 10 GB is essentially the same as the "size" of an empty directory, because the bookkeeping needed to mark its existence takes the same storage space. That is why the combined size of the files (and sockets, links, and other objects) inside is not the same thing as the "directory size". Subdirectories can also be mounted from various locations, including remote ones, and even recursively.
In a sense, "directory size" is a human notion: files are not physically "inside" directories; a directory is just a container marker, in the same way a device file is marked as a special file. The cost of recounting and updating a total directory size depends more on the NUMBER of items in it than on the sum of their sizes, and a modern filesystem can keep hundreds of thousands of files (if not more) "in" one directory even without subdirectories, so keeping those totals up to date could be quite a heavy task compared with the possible benefit of having the information. In short, when you run a command such as du (disk usage), or when Windows computes a directory size, having the kernel and filesystem driver do the same work instead wouldn't be faster: counting is counting.
There are quota systems, which keep and update information about the total size of files owned by a particular user or group. They are, however, limited to monitoring partitions separately, since quotas may be enabled or disabled per partition. Moreover, quota usage is updated, as you said, when a file grows or is removed, which is why the information can become inaccurate; for this reason quotas are rebuilt from time to time, e.g. by a cron job that scans all files in all directories from scratch on the partition where they are enabled.
Also note that the bottleneck for I/O operations (including reading information about files) is usually the speed of the medium itself, then the communication bus, and then the CPU, whereas you seem to be assuming every filesystem is as fast as a RAM FS. A RAM FS is probably the most trivial filesystem, kept entirely in RAM, which makes I/O operations very fast. You could build it as a module and try to add the functionality you've described; you would learn many interesting things :)
FUSE stands for "filesystem in userspace", and filesystems implemented with FUSE are usually quite slow. They make sense when functionality in a particular case is more important than speed; for example, you could create a pseudo-filesystem based on temperature readings from a newly bought e-thermometer connected to your computer via USB. They are not speed demons, though :)

Shred: Doesn't work on Journaled FS?

Shred documentation says shred is "not guaranteed to be effective" (See bottom). So if I shred a document on my Ext3 filesystem or on a Raid, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the
traditional way to do things, but many modern file system designs
do not satisfy this assumption. The following are examples of file
systems on which shred is not effective, or is not guaranteed to be
effective in all file system modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies
(and shred is thus of limited effectiveness) only in data=journal
mode, which journals file data in addition to just metadata. In
both the data=ordered (default) and data=writeback modes, shred
works as usual. Ext3 journaling modes can be changed by adding
the data=something option to the mount options for a
particular file system in the /etc/fstab file, as documented in the
mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
So, journaling filesystems won't overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
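For example (the filename is hypothetical), running shred with both options lets you watch the overwrite passes and the progressive renames before the final unlink:
shred -v -u secret.txt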
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred is only reliable on whole partitions, not on individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-image files)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
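A hedged sketch of what that looks like at the partition level (the device name is a placeholder; this destroys everything on the partition, not just one file):
# one random pass over the entire partition; add -z for a final pass of zeros
shred -v -n 1 /dev/sdX1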
The concern is that data might exist on more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, file systems that journal or other advanced file systems may write your file's data in multiple locations, temporarily, on the disk. Shred -- after the fact -- has no way of knowing about this and has no way of knowing where the data may have been temporarily written to disk. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.

rm not freeing diskspace [closed]

I've rm'ed a 2.5 GB log file, but it doesn't seem to have freed any space.
I did:
rm /opt/tomcat/logs/catalina.out
then this:
df -hT
and df reported my /opt mount still at 100% used.
Any suggestions?
Restart Tomcat. If the file is in use and you remove it, the space only becomes available when the process holding it open finishes.
As others suggested, the file is probably still open in other processes. To find out which ones, you can run
lsof /opt/tomcat/logs/catalina.out
which lists the processes holding it open. You will probably find Tomcat in that list.
Your Problem:
It's possible that a running program is still holding on to the file.
Your Solution:
Per the other answers here, you can simply shut down Tomcat to stop it from holding on to the file.
If that is not an option, or if you simply want more details, check out this question: Find and remove large files that are open but have been deleted - it suggests some harsher ways to deal with it that may be more useful to your situation.
More Details:
The Linux/Unix filesystem treats an open file handle as just another name for the file. rm removes the "name" from the file as seen in the directory tree. Until the handles are closed, the file still has other "names" and so it still exists. The file system doesn't reap files until they are completely unnamed.
It might seem a little odd, but doing it this way allows for useful things like hard links: a hard link is essentially an alternate name for the same file.
This is why it is important to always call your language's equivalent of close() on a file handle when you are done with it. This notifies the OS that the file is no longer being used. Sometimes this can't be helped, which is likely the case with Tomcat; refer to Bill Karwin's answer to read why.
Depending on the file system, this is usually implemented as a sort of reference count, so there may not be any real names involved. It can also get weird if things like stdin and stderr are redirected to a file or another byte stream (most commonly done with services).
This whole idea is closely related to the concept of inodes, so if you are the curious type, I'd recommend checking that out first.
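A small demonstration of that behaviour (all names and sizes are arbitrary):
yes | head -c 100M > big.log   # create a throwaway 100 MB file
tail -f big.log &              # keep a handle on it open
TAILPID=$!
rm big.log                     # the name is gone ...
df -h .                        # ... but the space is still in use
kill $TAILPID                  # close the last handle (tail may take a moment to exit)
df -h .                        # now the space has been freed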
Discussion
It doesn't work so well anymore, but you used to be able to update the entire OS, start up a new HTTP daemon using the new libraries, and finally close the old one when no more clients were being serviced by it (releasing the old handles). HTTP clients wouldn't even miss a beat.
Basically, you can completely wipe out the kernel and all the libraries "from underneath" running programs. But since the "name" still exists for the older copies, the files still exist on disk for those particular programs. Then it is a matter of restarting all the services, etc. While this is an advanced usage scenario, it is a reason why some Unix systems have years of uptime on record.
Restarting Tomcat will release any hold Tomcat has on the file. However, to avoid restarting Tomcat (e.g. if this is a production environment and you don't want to bring the services down unnecessarily), you can usually just overwrite the file:
cp /dev/null /opt/tomcat/logs/catalina.out
Or even shorter and more direct:
> /opt/tomcat/logs/catalina.out
I use these methods all the time to clear log files for currently running server processes in the course of troubleshooting or disk clearing. This leaves the inode alone but clears the actual file data, whereas trying to delete the file often either doesn't work or at the very least confuses the running process' log writer.
As FerranB and Paul Tomblin have noted on this thread, the file is in use and the disk space won't be freed until the file is closed.
The problem is that you can't signal the Catalina process to close catalina.out, because the file handle isn't under control of the java process. It was opened by shell I/O redirection in catalina.sh when you started up Tomcat. Only by terminating the Catalina process can that file handle be closed.
There are two solutions to prevent this in the future:
Don't allow output from Tomcat apps to go into catalina.out. Instead use the swallowOutput property, and configure log channels for output. Logs managed by log4j can be rotated without restarting the Catalina process.
Modify catalina.sh to pipe output to cronolog instead of simply redirecting to catalina.out. That way cronolog will rotate logs for you.
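A hedged sketch of that catalina.sh modification (the exact redirection line varies between Tomcat versions, and the cronolog path is an example): where the startup command currently ends with >> "$CATALINA_OUT" 2>&1 &, pipe it through cronolog instead, e.g.
2>&1 | /usr/sbin/cronolog "$CATALINA_BASE"/logs/catalina.%Y-%m-%d.out >> /dev/null &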
The best solution is to use echo (as #ejoncas suggested):
$ echo '' > huge_file.log
This operation is quite safe and fast (it frees roughly 1 GB per second), especially when you are operating on a production server.
Don't simply remove the file using rm: you would first have to stop the process writing to it, otherwise the disk space won't be freed.
refer to: http://siwei.me/blog/posts/how-to-deal-with-huge-log-file-in-production
UPDATE: the origin of my story.
In 2013, when I was working for youku.com, I found one Saturday that a core server was down; the reason was that the disk was full (with log files).
So I simply ran rm log_file.log (without stopping the web app process), but found that (1) no disk space was freed and (2) the log file was no longer visible to me.
So I had to restart my web server (a Rails app), and the disk space was finally freed.
This was an important lesson for me. It taught me that echo '' > log_file.log is the correct way to free disk space if you don't want to stop the running process that is writing to the file.
If something still has it open, the file won't actually go away. You probably need to signal Catalina somehow to close and reopen its log files.
If there is a second hard link to the file then it won't be deleted until that is removed as well.
Run this command to check which deleted files are still occupying disk space:
$ sudo lsof | grep deleted
It will show the deleted files that are still held open.
Then kill the owning process by PID:
$ sudo kill <pid>
$ df -h
Check now; the space should have been freed.
If not, run the command below to see which files above a given threshold size are occupying the space, and delete them:
# cd /
# du --threshold=(SIZE)
Give any size for the threshold; it will list everything larger than that.
Is the rm journaled/scheduled? Try a sync command to force the write.
