data integrity issue with ntfs but not ext3 - linux

In my application I am continually writing data to file1 and flushing it to the device. In another thread, I am reading data from file1 and writing it to file2.
Every time I do the fwrite + fflush on file1, I signal to the other thread to start reading from it. The other thread reads data from file1 and dumps it into file2. Pretty simple logic. Additionally, after every few minutes, I seek back to start of file1 and start overwriting old data.
Now my problem is that once I start overwriting data in file1, the data read into file2 is sometimes the old data (i.e. data written in the previous iteration) even though writer thread has signaled that it wrote the new data (and flushed it).
I am writing to and reading from a solid state drive (128 GB SAMSUNG 470 Series, if that helps) on [C + linux + arm platfrom]. I feel that there is an issue with the processor cache. Perhaps the write goes into the cache and the read by the reader thread comes from the flash, and hence the stale data.
The catch here is that this problem occurs if the SSD is formatted with NTFS. If I format it with ext3, the problem goes away. Unfortunately, NTFS is a hard requirement. Another interesting observation is that if I have two reader threads, both get stale data at different instants.
Event after disabling the SSD write cache (with hdparm -W0 /dev/sda1), I get the same problem with NTFS. I am badly stuck up with this since more than a week.
Any idea what is happening, and why is it happening that way?
Any help will be worth its weight in gold...
EDIT Turns out that the NTFS driver does not like me overwriting a file by rewinding the file pointer. Is this a known thing?

Ok, so I found the issue myself (and how rarely does that happen !!!).
I found that there was a problem with the C library buffering (fread/fwrite). So I do fflush() before every fread(). This solves my problem (I don't know what exactly went wrong with the driver but I am assuming that there is some issue with the "read" buffering of the C library I/O functions, when reading from the same location of the file second time around).
Thanks #Asad Rasheed and #jrtipton for your inputs :)

Related

File contents lost after power outage

I am using C++ ofstream to write a log file on Linux. When I monitor the file contents with tail -f command I can see the contents are correctly populated. But if a power outage happens and I check the file again after power cycle, the last couple lines of records are gone. With hexdump I can see those records turned into null characters '\0' instead. I tried flush() and manipulator std::endl and they don't help anyway.
Is it true what tail showed to me was not actually written to the disk and they were just in buffer? The inode table wasn't update before the power outage? I can accept this fact but I don't understand why the records turned to null characters if they weren't written to the file.
Btw, I tried Google's glog and have the same results (a bunch of null characters at the end). I also tried zlog, a C library. and found it only lost the last records but didn't replace them with null chars.
Well, when you have a power outage, and then start the system again, the linux kernel tries to forward the journal log to detect and correct the inconsistencies held from memory to disk when the system crashed. Normally this means to redo and commit all operations possible until the system crash, but undo (and erase) all data not commited on the time of the crash.
Linux (and other un*x kernels, like freebsd) has a facility called ordered data write, that forces metadata (like block pointers from inodes, or directory entries) to be updated after the actual data they point to is effectively written on disk, so inconsistencies reduce to a minimum. I don't know the actual linux implementation, but for example, in freebsd what you point (a block of zeros in a file instead of the actual data written) is completely impossible with freebsd kernel (well, you can do it on purpose, but not accidentally) The most probable thing is that linux probably just manages the blocks info and not the file contents, or it has updated the file size pointer and not the data up to there. This should not happen as it's an already solved problem.
The other thing is how many data you have written or why what you see on the screen doesn't appear after the system crash. Probably you have heard about something called delayed write that allows the kernel to save write operations to disk on busy systems by not writing immediately data onto disk, but waiting some time so updates can be resolved in core memory buffers before they go to disk. Disk writes, anyway, are forced after some time delay, that means 5secs in linux (I try to remember, there's a lot of time I checked that value last time, I'm in doubt between 5 and 30 seconds) so you can lose your last five seconds at most.

Can file size be used to detect a partial append?

I'm thinking about ways for my application to detect a partially-written record after a program or OS crash. Since records are only ever appended to a file (never overwritten), is a crash while writing guaranteed to yield a file size that is shorter than it should be? Is this guaranteed even if the file was opened in read-write mode instead of append mode, so long as writes are always at the end of the file? This would greatly simplify crash recovery, since comparing the last record's expected size and position with the actual file size would be enough to detect a partial write.
I understand that random-access writes can be reordered by the filesystem, but I'm having trouble finding information on whether this can happen when appending. I imagine an out-of-order append would require the filesystem to create a "hole" at the tail of the (sparse) file, write blocks beyond the hole, and then fill in the blocks in between, but I'm hoping that such an approach would be so inefficient that nobody would ever implement their filesystem that way.
I suppose another problem might be a filesystem updating the directory entry's file size field before appending the new blocks to to the file, and the OS crashing in between. Does this ever happen in practice? (ext4, perhaps?) Is there a quick way to detect it? (And what happens when trying to read the unwritten blocks that should exist according to the file's size?)
Is there anything else, such as write reordering performed by a disk/flash drive, that would get in the way of using file size as a way to detect a partial append? I don't expect to be able to compensate for this sort of drive trickery in my application, but it would be good to know about.
If you want to be SURE that you're never going to lose records, you need a consistent journaling or transactional system for your files.
There is absolutely no guarantee that a write will have been fulfilled unless you either set O_DIRECT [which you probably do not want to do], or you use markers to indicate aht "this has been fully committed", that are only written when the file is closed. You can either do that in the mainfile, or, for example, have a file that records, externally, "last written record". If you open & close that file, it should be safe as long as the APP is what is crashing - if the OS crashes [or is otherwise abruptly stopped - e.g. power cut, disk unplugged, etc], all bets are off.
Write reordering and write caching is/can be done at all levels - the C library, the OS, the filesystem module and the hard disk/controller itself are all ABLE to reorder writes.

using files as IPC on linux

I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info can fit well in such dimension. Can I make this assumption?
A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is e.g. less than 512 bytes), and I believe that if the writer is opening and writing quickly that file (in particular, if you avoid open(2)-ing a file and keeping the opened file handle for a long time -many seconds-, then write(2)-ing later -once, a small string- inside it), you don't need to bother.
If you are paranoid, but do assume that readers are (like grep) opening a file and reading it quickly, you could write to a temporary file and rename(2)-ing it when written (and close(2)-ed) in totality.
As Duck suggested, locking the file in both readers and writers is also a solution.
I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are are going to print (or possibly eof or error out) whatever is there.
In any case, why not avoid the whole mess and just use file locks. Have the writer flock (or similar) and the readers check the lock. If they get the lock they know they are ok to read.

How to wait for the dd command to finish copying before continuing my script?

Title pretty much nailed it, Im copying files to a flash drive and then doing some things to those files. Well I have noticed that after running the dd command the flash drive is still flashing and not all the files are on the device.
Does anyone know how to maybe run a simple loop (in script) to wait on the dd process to finish? I have been googling for about 2-3 hours now trying to figure it out and its beyond me if its even possible.
Thanks in advance!
Try the sync command:
sync writes any data buffered in memory out to disk. This can
include (but is not limited to) modified superblocks, modified inodes,
and delayed reads and writes. This must be implemented by the kernel;
The sync program does nothing but exercise the sync system call.
The kernel keeps data in memory to avoid doing (relatively slow)
disk reads and writes. This improves performance, but if the computer
crashes, data may be lost or the file system corrupted as a result.
The sync command ensures everything in memory is written to disk.
Most likely you are seeing the operating system caching the writes. If you really want to make sure that everything is written to the flash drive so that it is safe to remove, it needs to be unmounted.

How to parallelize file reading and writing

I have a program which reads data from 2 text files and then save the result to another file. Since there are many data to be read and written which cause a performance hit, I want to parallize the reading and writing operations.
My initial thought is, use 2 threads as an example, one thread read/write from the beginning, and another thread read/write from the middle of the file. Since my files are formatted as lines, not bytes(each line may have different bytes of data), seek by byte does not work for me. And the solution I could think of is use getline() to skip over the previous lines first, which might be not efficient.
Is there any good way to seek to a specified line in a file? or do you have any other ideas to parallize file reading and writing?
Environment: Win32, C++, NTFS, Single Hard Disk
Thanks.
-Dbger
Generally speaking, you do NOT want to parallelize disk I/O. Hard disks do not like random I/O because they have to continuously seek around to get to the data. Assuming you're not using RAID, and you're using hard drives as opposed to some solid state memory, you will see a severe performance degradation if you parallelize I/O(even when using technologies like those, you can still see some performance degradation when doing lots of random I/O).
To answer your second question, there really isn't a good way to seek to a certain line in a file; you can only explicitly seek to a byte offset using the read function(see this page for more details on how to use it.
Queuing multiple reads and writes won't help when you're running against one disk. If your app also performed a lot of work in CPU then you could do your reads and writes asynchronously and let the CPU work while the disk I/O occurs in the background. Alternatively, get a second physical hard drive: read from one, write to the other. For modestly sized data sets that's often effective and quite a bit cheaper than writing code.
This isn't really an answer to your question but rather a re-design (which we all hate but can't help doing). As already mentioned, trying to speed up I/O on a hard disk with multiple threads probably won't help.
However, it might be possible to use another approach depending on data sensitivity, throughput needs, data size, etc. It would not be difficult to create a structure in memory that maintains a picture of the data and allows easy/fast updates of the lines of text anywhere in the data. You could then use a dedicated thread that simply monitors that structure and whose job it is to write the data to disk. Writing data sequentially to disk can be extremely fast; it can be much faster than seeking randomly to different sections and writing it in pieces.

Resources