glibc function fails on large file - linux

I have a utility I wrote years ago in C++ which takes all the files in all the subdirectories of a given directory and moves them to new numbered subdirectories based on a count of the files. It has worked without error for several years.
Yesterday it failed for the first time. It always fails on a 2.7 GB video file, perhaps the largest this utility has ever encountered. The file itself is not corrupt. It will play in a video player. I can move it from the command line or with file manager apps without a problem.
I use nftw() to walk the directory subtree. On this file, nftw() returns an error code of -1 on encountering the file, before calling my callback function. Since (I thought) the code is only dealing with filenames and not actually opening or reading the file, I don't understand why the file size should be an issue.
The number of open file descriptors is not the problem, nor is the number of files. It was in a subtree of over 5,000 files, but when I move it to one of only 50 it still fails, while the original subtree is processed without error. File permissions are not the problem: this file has the same permissions as all the others, including ACL permissions.
The question is: Is file size the issue? Why?
The file system is ext4.
ldd --version /usr/lib/i386-linux-gnu/libc.so
ldd (Ubuntu GLIBC 2.27-3ubuntu1.4) 2.27
Linux version 4.15.0-161-generic (buildd@lgw01-amd64-050)
(gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))
#169-Ubuntu SMP Fri Oct 15 13:39:59 UTC 2021

Since you're using a 32-bit application, to work properly with files larger than 2 GB you should compile with -D_FILE_OFFSET_BITS=64 so that the 64-bit file-handling syscalls and types are used.
In particular, nftw() calls stat() which fails with EOVERFLOW if the size of the file exceeds 2 GB: https://man7.org/linux/man-pages/man2/stat.2.html
Also, regarding mmap() (which it seems you're not using, but just in case, since a comment mentioned it): you can't map all of 4 GB, because some of the address space is reserved for the kernel (typically 1 GB on Linux), and more is used by the stack(s), shared libraries, etc. Maybe you'll be able to map 2 GB at a time, if you're lucky.
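For illustration, here is a minimal sketch of such a walker (not the asker's actual utility; the walk root and the descriptor limit are arbitrary). Built as a 32-bit program with, for example, g++ -m32 -D_FILE_OFFSET_BITS=64, the stat() performed inside nftw() uses a 64-bit st_size, so a file larger than 2 GB no longer makes the walk fail with EOVERFLOW:

// Minimal sketch, not the asker's code. Build e.g.:
//   g++ -m32 -D_FILE_OFFSET_BITS=64 walk.cpp -o walk
#define _XOPEN_SOURCE 700   // nftw() needs X/Open extensions on glibc
#include <ftw.h>
#include <cstdio>

// Called by nftw() for every entry; with _FILE_OFFSET_BITS=64 the st_size
// field is 64-bit, so files over 2 GB are reported instead of aborting the walk.
static int list_entry(const char *path, const struct stat *sb,
                      int typeflag, struct FTW * /*ftwbuf*/)
{
    if (typeflag == FTW_F)
        std::printf("%lld bytes  %s\n",
                    static_cast<long long>(sb->st_size), path);
    return 0;                                   // 0 keeps the walk going
}

int main(int argc, char *argv[])
{
    const char *root = (argc > 1) ? argv[1] : ".";   // directory to walk (arbitrary)
    if (nftw(root, list_entry, 20, FTW_PHYS) == -1) { // 20 = max fds nftw may keep open
        std::perror("nftw");
        return 1;
    }
    return 0;
}

Without -D_FILE_OFFSET_BITS=64, the same 32-bit build returns -1 from nftw() as soon as it reaches a file whose size does not fit in a 32-bit off_t.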

Related

SYSTEM ERROR: I/O error 0 in writeto, ret 2048, file 56(/mfgtmp/tmp/srtE5yybD), addr 77010944. (290) - PROGRESS 4GL

I suddenly get the error below when my Progress program has been running for more than 80 minutes. I think this is an OS error, and error 0 suggests it is out of disk space. I checked the disk space and it shows 14 GB available, but I am not sure why I am getting this error.
Did a write run out of disk space (exceeding the 14 GB) and stop, so that the available 14 GB stayed the same as it was?
SYSTEM ERROR: I/O error 0 in writeto, ret 2048, file 56(/mfgtmp/tmp/srtE5yybD), addr 77010944. (290)
By default temp files are created "unlinked". Because of this, the space they were using is automatically reclaimed by the OS if the session crashes, so you will often have a situation where your temp file ran out of space, the session crashed, and then when you investigate there is plenty of free space.
You can change the default behavior by using the -t (lower case) startup parameter. This will result in the files not being removed if a session crashes - so the space will not be returned to the OS. You will have to manually delete "stale" files if you enable -t.
On UNIX -t will also make the files visible in the -T (upper case) directory so that you can see their growth in real time.
On Windows the files are always visible, but the current length is not consistently reported by system tools.
If your temp files are being written to a different filesystem than your working directory (the -T startup parameter is where temp files go) then you should have a "protrace.pid" file corresponding to the crashed session's process id and the timestamp of the crash. This will then lead you to the 4gl code that was creating the very large srt file.
14GB is far beyond "reasonable" so you really should look at that code and see if there is a better way to do whatever it is doing.
There are a number of k-base articles on that issue, for instance: https://knowledgebase.progress.com/articles/Knowledge/000027351
When you check disk space, please make sure you're checking the correct file system (/mfgtmp in this case).
The error message references an srt file, so you might want to make the srt file usage less heavy; see this article for some initial help: https://knowledgebase.progress.com/articles/Knowledge/P95930
Or: https://knowledgebase.progress.com/articles/Knowledge/P84475

In node.js how can I know whether fs.stat() will return usable crtime and/or birthtime fields for a given file/path/volume/fs?

I recently learned that different OSes and even different filesystems under the same OS support different subsets of the timestamps returned by lstat.
The Stats object returned gives us four times, each in two different flavours.
JS Date objects:
atime: the last time this file was accessed
mtime: the last time this file was modified
ctime: the last time the file status was changed
birthtime: the creation time of this file
(atimeMs, mtimeMs, ctimeMs, and birthtimeMs are the same times expressed as milliseconds since the POSIX Epoch)
"Modified" means the file's contents were changed by being written to etc.
"Changed" means the file's metadata such as owners and permissions was changed.
Linux has traditionally never supported the concept of birth time, but as more new filesystems came to support it, support has recently been added to (hopefully) all relevant layers of the Linux stack, if I have read correctly.
But Windows and Mac both do support birth time as do their native filesystems.
Windows, on the other hand, did not traditionally support a concept of file change separate from file modification. But to comply with POSIX it added support at the API level and to NTFS. (It doesn't seem to be exposed anywhere in the GUI or command line, though.) The FAT filesystem does not support it.
When I call lstat on a file on Windows on an NTFS drive the results for ctime look good. When I call it on a file on a FAT drive, ctime contains junk. (In my case it's always 2076-11-29T08:54:34.955Z for every file.)
I don't know if this is a bug.
I don't know what birthtime returns on Linux on filesystems that don't support it. Hopefully null or undefined but perhaps also garbage. I also don't know what Linux or Mac return in ctime for files on FAT volumes.
So is there a way in Node to get info on which of these features are supported for a given file/path/fs?

Compare a running process in memory with an executable in disk

I have a big project which will load an executable (let's call it greeting) into memory, but for some reason (e.g. there are many files called greeting under different directories), I need to know if the process in memory is exactly the one I want to use.
I know how to compare two files: diff, cmp, cksum and so on. But is there any way to compare a process in memory with an executable in hard disk?
According to this answer you can get the contents of the memory version of the binary from the proc file system. I think you can cksum the original and the in-memory version.
According to the man page of /proc, under Linux 2.2 and later, /proc/[pid]/exe is a symbolic link containing the actual pathname of the executed command. Apparently, the binary is loaded into memory, and /proc/[pid]/exe points to the content of the binary in memory.
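As a rough sketch of that suggestion (the PID and the candidate path are supplied on the command line; nothing here is specific to the asker's project), one can read /proc/<pid>/exe and the on-disk candidate and compare them byte for byte:

// Rough sketch: compare /proc/<pid>/exe with a file on disk byte for byte.
// The PID and candidate path below are placeholders given on the command line.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into memory; returns an empty vector if it cannot be opened.
static std::vector<char> read_all(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        std::cerr << "usage: " << argv[0] << " <pid> <path-to-executable>\n";
        return 2;
    }
    const std::string proc_exe = std::string("/proc/") + argv[1] + "/exe";
    const std::vector<char> running = read_all(proc_exe);  // what the process was started from
    const std::vector<char> on_disk = read_all(argv[2]);   // the candidate file on disk

    if (!running.empty() && running == on_disk) {
        std::cout << "same binary\n";
        return 0;
    }
    std::cout << "different (or unreadable)\n";
    return 1;
}

From the shell, running cksum on /proc/<pid>/exe and on the candidate file should likewise produce matching checksums when they are the same binary.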

Unformatted direct access file portability [duplicate]

This question already has an answer here:
Reading writing fortran direct access unformatted files with different compilers
(1 answer)
Closed 6 years ago.
I have a Fortran code which writes an unformatted direct-access file. The problem is that the size and the contents of the file change when I switch to different platforms:
The first platform is Windows (32-bit version of the program using the Intel compiler, a version from around 2009).
The second platform is Linux (64-bit version of the program with the gfortran compiler v4.9.0).
Unfortunately the file produced on Linux cannot be read from Windows. The Linux file is 5-6 times smaller, although the total number of records written seems to be the same. I opened both files with a hex editor, and the main difference is that a lot of zeros exist in the Windows version of the file.
Is there any way to produce exactly the same file on Linux?
If it helps, you can find both files here: https://www.dropbox.com/sh/erjlf5sps40in0e/AAC4XEi-p4nnTNzhyai_ZCZVa?dl=0
I open the file with the command: OPEN(IAST,FILE=ASTFILR,ACCESS='DIRECT',FORM='UNFORMATTED',RECL=80)
I write with the command:
WRITE(IAST,REC=IRC) (SNGL(PHI(I)-REF), I=IBR,IER)
I read with the command: READ(IAST,REC=IRC,ERR=999) (PHIS(I), I=1,ISTEP)
where PHIS is a REAL*4 array
The issue is that by default Intel Fortran specifies that RECL= is in units of words, whereas GFortran uses bytes. There's an Intel Fortran compiler option that you can use to make it use byte units. On Linux that option is
-assume byterecl
for Windows I'm not sure what the syntax is, maybe something like
/assume:byterecl

How to create a file of size more than 2GB in Linux/Unix?

I have this homework where I have to transfer a very big file from one source to multiple machines using a BitTorrent-like algorithm. Initially I cut the file into chunks and transfer the chunks to all the targets. The targets have the intelligence to share the chunks they have with other targets. It works fine. I wanted to transfer a 4 GB file, so I tarred together four 1 GB files. It didn't error out when I created the 4 GB tar file, but at the other end, while assembling all the chunks back into the original file, it errors out saying the file size limit was exceeded. How can I go about solving this 2 GB limitation problem?
I can think of two possible reasons:
You don't have Large File Support enabled in your Linux kernel
Your application isn't compiled with large file support (you might need to pass gcc extra flags to tell it to use 64-bit versions of certain file I/O functions, e.g. gcc -D_FILE_OFFSET_BITS=64; see the sketch after this list).
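To make the second point concrete, here is a tiny check (not from the original question) that prints the width of off_t; a 32-bit build without -D_FILE_OFFSET_BITS=64 reports 4 bytes and hits the 2 GB limit, while a build with the flag (or a native 64-bit build) reports 8 bytes:

// Quick check of whether a build has large-file support enabled:
// with -D_FILE_OFFSET_BITS=64 (or on a native 64-bit build) off_t is 8 bytes.
#include <cstdio>
#include <sys/types.h>   // off_t

int main()
{
    std::printf("sizeof(off_t) = %zu bytes\n", sizeof(off_t));
    return 0;
}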
This depends on the filesystem type. When using ext3, I have no such problems with files that are significantly larger.
If the underlying disk is FAT, NTFS or CIFS (SMB), you must also make sure you use the latest version of the appropriate driver. There are some older drivers that have file-size limits like the ones you experience.
Could this be related to a system limit configuration?
$ ulimit -a
vi /etc/security/limits.conf
vivek hard fsize 1024000
If you do not want any limit, remove the fsize entry from /etc/security/limits.conf.
If your system supports it, you can get hints with: man largefile.
You should use fseeko and ftello; see fseeko(3).
Note that you should define _FILE_OFFSET_BITS as 64 before any #include:
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
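Building on those two lines, here is a hedged, self-contained sketch (the filename big.dat is arbitrary): it seeks 3 GB into a new file with a 64-bit offset and writes a single byte there, which a 32-bit build can only do because _FILE_OFFSET_BITS is set to 64:

#define _FILE_OFFSET_BITS 64   /* must come before any #include */
#include <stdio.h>
#include <sys/types.h>         /* off_t */

int main(void)
{
    FILE *f = fopen("big.dat", "wb");   /* arbitrary example filename */
    if (!f) { perror("fopen"); return 1; }

    /* With _FILE_OFFSET_BITS=64, off_t is 64-bit even in a 32-bit build,
       so seeking past 2 GB works. */
    off_t target = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GB */
    if (fseeko(f, target, SEEK_SET) != 0) { perror("fseeko"); return 1; }

    fputc(0, f);                                    /* write one byte at 3 GB */
    printf("file ends at offset %lld\n", (long long)ftello(f));
    fclose(f);
    return 0;
}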
