Delphi 10.3 Linux exclusive file access - linux

How can I lock a file in Linux with a filestream?
Creating a filestream like in the example below works perfectly on Windows: the file is locked and cannot be deleted or written to from other sessions until the stream is freed. Under Linux I can delete the file or write to it from another session without any problems.
var f: TFileStream;
...
f := TFileStream.Create(TPath.Combine(FTemp, lowerCase(Name)), fmOpenReadWrite + fmCreate);
...
NEW FINDINGS 1.1.2020
Linux does not automatically apply an atomic lock on files like Windows does. So I tried applying a lock after creating the file:
function flock(handle, operation: integer): integer; cdecl; external libc name _PU + 'flock';
const
LOCK_EX = 2;
...
f := TFileStream.Create(fn, fmCreate, fmShareExclusive);
flock(f.handle, LOCK_EX);
There is a small race condition, since creating the file and locking it is not a single step, but for my application this is not a problem. When looking at the created lock file on the Linux console the difference is obvious:
Without flock():
mint@minti:/tmp/itclock$ lsof wirsing
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
_TestLibB 5417 mint 14u REG 8,1 8 2755101 wirsing
With flock():
mint@minti:/tmp/itclock$ lsof wirsing
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
_TestLibB 6365 mint 14uW REG 8,1 0 2755117 wirsing
The difference is the capital W, which indicates an exclusive lock.
Unfortunately this does not solve the problem, as creating a second filestream just creates a second exclusive lock in the same process, and deleting the file from another process is still possible. If there were a way to read the lock information that lsof <file> shows from within Delphi ...
Another finding (new year, new ideas)
As I run my tests as unit tests and create several objects in the same process to test the locks, this might be the cause: Windows does not allow accessing the locked file even from within the same process, but Linux apparently does. I need something that behaves like Windows, where it makes no difference whether the file is locked by a different process or by a thread of the same process. Perhaps Linux offers a completely different way to accomplish such a locking mechanism?
And still: Any help is very appreciated!

Finally I found a usable solution that at least serves my requirements:
On Windows I continue using the simple approach with fmShareExclusive. On Linux I apply a file lock as described above in the question. To check for the exclusive lock I run a command line via popen and capture the result before deleting the lock file and then creating/locking it.
lsof -Fl <my flock file name>
This outputs three lines if a Lock exists:
p7590
f14
lW
I look for lW in the result, which indicates an exclusive lock on the file, and I can react to that as required.
This solution still works if a process crashes, since the file lock disappears with it.
I am aware that this is not a very elegant solution, but it seems to be robust and reliable enough.
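For anyone who wants to prototype the same check outside Delphi, here is a minimal sketch in Python of the idea (the file name is a placeholder): run lsof -Fl on the lock file and look for the lW field that marks an exclusive lock.

import subprocess

def has_exclusive_flock(path):
    # lsof -Fl prints one field per line: p<pid>, f<fd>, l<lock>.
    # A line starting with 'l' and containing 'W' marks an exclusive lock.
    result = subprocess.run(["lsof", "-Fl", path], capture_output=True, text=True)
    return any(line.startswith("l") and "W" in line
               for line in result.stdout.splitlines())

if has_exclusive_flock("/tmp/itclock/wirsing"):
    print("file is locked, do not delete or recreate it")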
Comments and suggestions are very welcome!

Related

Windows/Python check if file is open or in use

I am using Python to monitor a folder and check whether files are being copied in; if so, I replicate them to a new location.
I am using the following to monitor the folder:
fsmonitor
The issue I am facing is that I am unable to discern whether a file is in use and its contents are still being written to disk. If so, I want to wait until copying is complete and only then start copying it to my new location.
So how do I find out if a file is in use/open?
I have seen some suggestions here where you try to write to the file in question, and if that fails it indicates that the file is in use:
example answer (I've seen similar in python)
But I am reluctant to use such a method for fear that it might cause corruption or similar issues.
Is there an alternative/safer way to do this? Or is testing write permissions safe?
Is anyone familiar with pywin32? Does it provide such tools? The site looks arcane, so I wonder whether it covers the latest APIs provided by Windows; even fsmonitor mentioned above uses the same library, and I wonder if there are newer/more efficient ways to do this.
Currently, I am using psutil's proc.open_files() to loop through all processes and all files and list the open files. If a file I am concerned about appears in this list, I wait and try again. However, this creates a humongous list of files and uses 12% of my CPU to build it, so I desperately need an alternative.
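For reference, a minimal sketch of that psutil loop (the target path is a placeholder), which shows why it is so expensive: it has to walk every open file of every process.

import psutil

def is_open_by_any_process(target):
    for proc in psutil.process_iter():
        try:
            # open_files() enumerates every file the process has open
            if any(f.path == target for f in proc.open_files()):
                return True
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue  # skip processes we cannot inspect or that just exited
    return False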
In response to Adrian McCarthy
I started out assuming that it is safe to act on whatever fsmonitor puts out, but consider the following output, which is for a single file copy:
0 86 0
create C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe 3684bf38
create C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe 3684bf38
0 86 0
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe a8cf3250
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe a8cf3250
0 160 0
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64.exe caef5c64
So the conundrum is: at which 'modify' do I start copying the file? I could wait a few seconds or minutes to see whether another 'modify' appears for that file, but how do I decide how long to wait? A large file over SFTP may take 30 minutes, so I need something scalable.
Also, I would like to avoid making multiple copy actions per file, since that would make the script inefficient.
This might help you:
check if a file is open in Python
Here is some code:
try:  # try to open the file
    with open("file", "r") as file:
        pass  # some code here
except IOError:
    pass  # if it throws an error, that means the file is in use
I think you're unnecessarily concerned about working with the file while another process still has it open.
On Windows, fsmonitor uses the ReadDirectoryChangesW mechanism. That means you'll get a notification about a change after it happens. So if a process writes to foo.log, you'll get a notification after the write operation has completed. (In fact, I think it's after the update of the directory metadata.)
To copy the file, you need read access. So just go ahead and open it for reading.
If it opens, then it's safe to read, even if another process has it open. You cannot corrupt a file by reading it even if another process is writing to it.
If it fails to open, then another process has it open and is intentionally preventing other processes from reading it (probably because they know they'll be actively updating it). In that case, you can try again later.
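A minimal sketch of that open-and-retry approach in Python (paths and the retry delay are placeholders):

import shutil
import time

def copy_when_readable(src, dst, retry_delay=5.0):
    while True:
        try:
            # If the writer holds the file exclusively, open() raises and we retry later.
            with open(src, "rb") as f_in, open(dst, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)
            return
        except OSError:
            time.sleep(retry_delay)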
Trying to first check whether another process is using the file doesn't actually help because the answer could change between the moment you check and the moment you try to act on that information.
When you open a file, the system does the permission check and the opening under a mutex*, so the answer cannot change in between. There's no way for you to simulate that yourself from user-mode code. Once you have the file open, you can safely use it.
If you try to read from a file at the same moment another process tries to write to it, the system will ensure that the read will get the data as it was before the write or as it is after the write. It won't get a result that's a mixture of old and new.
That said, if you're reading the file with a bunch of small read operations while another process is writing to it with a bunch of small write operations, it's possible you might capture some intermediate state of the file. But that's okay. The original file is unharmed, and those writes will trigger another fsmonitor notification, so your code will start over and make another copy of the file.
* I'm using "mutex" in a generic sense: It uses some sort of synchronization mechanism, but it might not necessarily be a Windows Mutex object.

How to use multithreading to write downloaded data to one file in a multithreaded download application

I want to make a multithreaded download using IdHTTP (Indy). A principal thread starts secondary threads, and each secondary thread creates a file "fileThreadNB" that is supposed to contain the downloaded data. The secondary thread downloads a part of the file on the server using IdHTTP.Request.Range and writes the downloaded data to fileThreadNB. Then all these files (the files created by the secondary threads) are copied into one file to reproduce the file on the server. But this copying takes a lot of time, especially when the file on the server is big. Is there another way that allows the threads to write their data to the same file? To be clearer: thread 0 downloads from position 0 to m and writes to fileX from position 0 to m, ..., thread n downloads from position j to filesize-1 and writes to fileX from position j to filesize-1.
Note: the threads must write their data to the hard drive, so I can resume the download later if something bad occurs.
I tried this code instead:
procedure TSecondaryThread.Execute;
begin
  HTTP.Request.Range := Format('%d-%d', [BeginPos, BeginPos + BlockSize - 1]);
  File.Position := BeginPos;
  HTTP.Get(url, File);
end;
BlockSize is the same for all threads, BeginPos changes from one thread to another; the two variables are initialised in TSecondaryThread.Create.
NB:
When I use a single secondary thread, the file downloads correctly, but when I use more I get this error: External SIGSEGV, and the downloaded file ends up bigger than the file on the server.
File is a global variable.
I guess that the problem is due to File.Position := BeginPos; but I don't know how to fix it. I would be grateful if someone could help me solve this.
As you know the file size, you can create an empty file with that size already allocated, then just take care that each thread writes to its own range. There should be no concurrency issues.
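This is not Delphi, but a minimal Python sketch of the same idea (URL, path and size are placeholders): pre-allocate the file, then let each thread open its own handle and write its block at the right offset, so no file position is shared between threads.

import threading
import urllib.request

def download_range(url, path, start, end):
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    data = urllib.request.urlopen(req).read()
    # Each thread uses its own handle, so there is no shared file position.
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(data)

def parallel_download(url, path, size, thread_count=4):
    # Create an empty file of the final size up front.
    with open(path, "wb") as f:
        f.truncate(size)
    block = size // thread_count
    threads = []
    for i in range(thread_count):
        start = i * block
        end = size - 1 if i == thread_count - 1 else start + block - 1
        t = threading.Thread(target=download_range, args=(url, path, start, end))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()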

How do dev files work?

How do the Linux people make /dev files? You can write to them and the data is immediately gone.
I can imagine some program that constantly reads such a dev file:
FILE *fp;
char buffer[255];
int result;

fp = fopen(fileName, "r");
if (!fp) {
    printf("Open file error");
    return;
}
while (1)
{
    memset(buffer, 0, sizeof buffer);   /* clear before reading so the buffer stays NUL-terminated */
    result = fscanf(fp, "%254c", buffer);
    if (result != 1)
        break;                          /* stop on EOF or read error */
    printf("%s", buffer);
    fflush(stdout);
    sleep(1);
}
fclose(fp);
But how do they delete the content in there? Closing the file and opening it again in "w" mode is not how they do it, because you can do e.g. cat > /dev/tty.
What are files? Files are names in a directory structure which denote objects. When you open a file like /home/joe/foo.txt, the operating system creates an object in memory representing that file (or finds an existing one, if the file is already open), binds a descriptor to it which is returned and then operations on that file descriptor (like read and write) are directed, through the object, into file system code which manipulates the file's representation on disk.
Device entries are also names in the directory structure. When you open some /dev/foo, the operating system creates an in-memory object representing the device, or finds an existing one (in which case there may be an error if the device does not support multiple opens!). If successful, it binds a new file descriptor to the device object and returns that descriptor to your program. The object is configured in such a way that operations like read and write on the descriptor are directed to call into the specific device driver for device foo, and correspond to doing some kind of I/O with that device.
Such entries in /dev/ are not files; a better name for them is "device nodes" (a justification for which is the name of the mknod command). Only when programmers and sysadmins are speaking very loosely do they call them "device files".
When you do cat > /dev/tty, there isn't anything which is "erasing" data "on the other end". Well, not exactly. Basically, cat is calling write on a descriptor, and this results in a chain of function calls which ends up somewhere in the kernel's tty subsystem. The data is handed off to a tty driver which will send the data into a serial port, or socket, or into a console device which paints characters on the screen or whatever. Virtual terminals like xterm use a pair of devices: a master and slave pseudo-tty. If a tty is connected to a pseudo-tty device, then cat > /dev/tty writes go through a kind of "trombone": they bubble up on the master side of the pseudo-tty, where in fact there is a while (1) loop in some user-space C program receiving the bytes, like from a pipe. That program is xterm (or whatever); it removes the data and draws the characters in its window, scrolls the window, etc.
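A small Python illustration of the point (assuming it runs on Linux with a controlling terminal): writing to /dev/tty just hands the bytes to the tty driver, nothing is stored "in" the node, and stat reports a character device rather than a regular file.

import os, stat

# Bytes written to a device node go to the driver; the node itself stores nothing.
with open("/dev/tty", "w") as tty:
    tty.write("hello from the tty driver\n")

st = os.stat("/dev/tty")
print(stat.S_ISCHR(st.st_mode))  # True: a character device node, not a regular file
print(st.st_size)                # 0: there is no stored content to grow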
Unix is designed so that devices (tty, printer, etc) are accessed like everything else (as a file) so the files in /dev are special pseudo files that represent the device within the file-system.
You don't want to delete the contents of such a device file, and honestly it could be dangerous for your system if you write to them willy-nilly without understanding exactly what you are doing.
Device files are not normal files, if "normal file" refers to an arbitrary sequence of bytes, often stored on a medium. But not all files are normal files.
More broadly, files are an abstraction referring to a system service and/or resource, a service being something you can send information to for some purpose (e.g., for a normal file, write data to storage) and a resource being something you request data from for some purpose (e.g., for a normal file, read data from storage). C defines a standard for interfacing with such a service/resource.
Device files fit within this definition, but they do not necessarily match my more specific "normal file" examples of reading and writing to and from storage. You can directly create dev files, but the only meaningful reason to do so is within the context of a kernel module. More often you may refer to them (e.g., with udev), keeping in mind they are actually created by the kernel and represent an interface with the kernel. Beyond that, the functioning of the interface differs from dev file to dev file.
I've also found quite a nice explanation:
http://lwn.net/images/pdf/LDD3/ch18.pdf

How to tell which file was created first?

On a Linux system (the one in front of me is Ubuntu 10.04, but that shouldn't matter), how can I tell which of two files created within the same second was created first? The process I control creates neither of them; in all other respects the ctime would, I think, do the trick, but the 1-second resolution is a problem.
For background, I'm trying to reliably determine whether a potentially stale pidfile refers to the current process with that pid. If there's a better way to do that, I'm all ears.
Actually, on modern Unices with modern filesystems, the file modification time is stored in a timespec. Details:
The standard says stat looks like this WRT times:
struct timespec st_atim Last data access timestamp.
struct timespec st_mtim Last data modification timestamp.
struct timespec st_ctim Last file status change timestamp.
And a timespec
time_t tv_sec seconds
long tv_nsec nanoseconds
So, doing a stat on my Linux 2.6.39:
Access: 2011-07-14 15:38:20.016666721 +0300
Modify: 2011-06-10 03:06:12.000000000 +0300
Change: 2011-06-17 11:01:35.416667110 +0300
In conclusion, I think you've got enough precision there if the hardware is supplying it.
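If the filesystem does store sub-second timestamps, something like this (file names are placeholders) reads them at nanosecond resolution via os.stat:

import os

def earlier_ctime(path_a, path_b):
    # st_ctime_ns is the status-change time in integer nanoseconds
    a = os.stat(path_a).st_ctime_ns
    b = os.stat(path_b).st_ctime_ns
    return path_a if a < b else path_b

print(earlier_ctime("file_one", "file_two"))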
You can try ls -rt to sort the files by time, in the hope that the file metadata has more precision than the default listing format displays. But if the file system doesn't have the information, there is no way to do this.
Other options? You could add an ID to the file and always increment it but as soon as you try to load this ID from the file system (when you create a new process), you'll run into problems with locking.
So how can you make sure the PID file is not stale? Answer: Use the daemon script. It runs a process in the background and makes sure the PID file gets deleted as soon as the process exits.

Syncing two files when one is still being written to

I have an application (video stream capture) which constantly writes its data to a single file. The application typically runs for several hours, creating a file of roughly 1 gigabyte. Soon after it quits (within a few seconds), I'd like to have 2 copies of the file it was writing: say, one in /mnt/disk1 and another in /mnt/disk2 (the latter is a USB flash drive with a FAT32 filesystem).
I don't really like the idea of modifying the application to write 2 copies simultaneously, so I thought of:
Application starts and begins to write the file (let's call it /mnt/disk1/file.mkv)
Some utility starts, copies what's already there in /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
After reaching the initial sync state, it continues to follow the file as it is written, in the manner of tail -f, copying everything it gets from /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
Several hours pass
Application quits, we stop our syncing utility
Afterwards, we run a quick rsync /mnt/disk1/file.mkv /mnt/disk2/file.mkv just to make sure they're the same. If they are already identical, it should just run a quick check and finish fairly soon.
What is the best approach for syncing the 2 files, preferably using simple Linux shell utilities? Maybe I could use some clever trick with FUSE / an md device / tee / tail -f?
Solution
The best possible solution for my case seems to be
mencoder ... -o >(
tee /mnt/disk1/file.mkv |
tee /mnt/disk2/file.mkv |
mplayer -
)
This uses bash/zsh-specific magic called "process substitution", which eliminates the need to create named pipes manually with mkfifo, and it displays what's being encoded as a bonus :)
Hmmm... the file is not usable while it's being written, so why don't you "trick" your program into writing through a pipe/FIFO and use a second, very simple program to create the 2 copies?
This way, you have your two copies as soon as the original process ends.
Read the manual page on tee(1).
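A rough sketch of what that second, very simple program could look like in Python, assuming the capture application can be pointed at a FIFO (all paths are placeholders):

import os

fifo = "/tmp/capture.fifo"      # the capture app writes here instead of a real file
copies = ["/mnt/disk1/file.mkv", "/mnt/disk2/file.mkv"]

os.mkfifo(fifo)
try:
    with open(fifo, "rb") as src:
        outs = [open(p, "wb") for p in copies]
        try:
            while True:
                chunk = src.read(64 * 1024)
                if not chunk:           # the writer closed the pipe
                    break
                for out in outs:        # duplicate every chunk, like tee
                    out.write(chunk)
        finally:
            for out in outs:
                out.close()
finally:
    os.remove(fifo)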
