I want to check certain files and see whether their types match their extensions. What I'm currently doing is running the file command to get the MIME type (or the basic output from file) and comparing it with the file extension. However, some file types return the same MIME type, .sfx and .dll for example.
I also have some files with no extension at all, and I need to be able to determine their type correctly.
I want to be able to identify all file types correctly, but the ones I'm most interested in at the moment are:
dll
msi
com
cpl
exe
ocx
tmp
upd
Is there any other tool that checks and returns a file's type?
EDIT
I wrote a Node.js script that can be used as a Linux command. I created my own file signature database by merging public databases; it has the following format for each file extension:
"ISO" : [
    {
        "signature": "4344303031", // byte sequence
        "size": 5, // size of byte sequence
        "offset": 32769 // offset in the file for the signature bytes
    },
    {
        "signature": "4344303031",
        "size": 5,
        "offset": 34817
    },
    {
        "signature": "4344303031",
        "size": 5,
        "offset": 36865
    }
]
Now, I first take the extension from the file's name (text.iso gives .iso) and check that extension's signature bytes against the file to see whether it really is an ISO file. If it is, I return iso as the result.
If it's not, I check the signature byte sequences of every extension in my database against the given file to see whether any of them match. If there is a match, I return that result.
If I cannot find a match, I run the file command to get the file's MIME type and look it up in another database I created, which maps MIME types to extensions. The format of the MIME type database is like this:
"application/atom+xml": [
    "atom",
    "xml"
],
"application/atomcat+xml": [
    "atomcat"
],
"application/atomsvc+xml": [
    "atomsvc"
]
This solution currently meets my project's needs. Maybe it will help someone else as well.
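For readers who want to try the same lookup order without the Node.js script, here is a minimal Python sketch of the signature step. It is not the author's implementation: the SIGNATURES dictionary is a hypothetical in-memory copy of the database format shown above (with lowercase keys), and the names matches and detect are made up for illustration.

```python
from pathlib import Path

# Hypothetical in-memory copy of the signature database described above.
SIGNATURES = {
    "iso": [
        {"signature": "4344303031", "size": 5, "offset": 32769},
        {"signature": "4344303031", "size": 5, "offset": 34817},
        {"signature": "4344303031", "size": 5, "offset": 36865},
    ],
}


def matches(path, entry):
    """Return True if the bytes at entry['offset'] equal the signature."""
    expected = bytes.fromhex(entry["signature"])
    with open(path, "rb") as handle:
        handle.seek(entry["offset"])
        return handle.read(entry["size"]) == expected


def detect(path, signatures=SIGNATURES):
    """Try the extension from the file name first, then every other entry."""
    claimed = Path(path).suffix.lstrip(".").lower()
    candidates = [claimed] if claimed in signatures else []
    candidates += [ext for ext in signatures if ext != claimed]
    for ext in candidates:
        if any(matches(path, entry) for entry in signatures[ext]):
            return ext
    return None  # the caller would fall back to the MIME type lookup here
```

Returning None marks the point where the file command and the MIME type database would take over.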
Using Python after pip install filemagic:
>>> import magic
>>> with magic.Magic() as m: m.id_filename('tmp.py')
...
'Python script, ASCII text executable'
>>> with magic.Magic() as m: m.id_filename('test.html')
...
'HTML document, ASCII text'
Linux has a built-in file command: man file
The main difference between Windows and *nix is that DOS/Windows has built-in dependencies on file suffix. For example, an executable must be named ".exe" (or .com); a .bat file must be named ".bat" (or .cmd).
Linux, macOS, BSD, etc. have no such restriction. Instead, a file must have the "execute" permission set in order to be runnable. This is true for both binary executables (e.g. compiled code) and scripts (e.g. Python, Perl ... or a shell script).
Instead of relying only on file suffix, the "file" command also looks at self-identifying "magic numbers" or other "header information" in the file itself.
SUGGESTION:
If the built-in "file" doesn't meet your needs, perhaps you can wrap it in a shell script that:
1) Checks for certain "well known suffixes" (use basename to extract the suffix), and/or
2) Calls "file" as a fallback
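The two steps above could be sketched as follows. The suggestion was a shell script; this is the same idea in Python, under the assumption that the file command is on the PATH. The KNOWN_SUFFIXES table and the identify function are hypothetical names used only for illustration.

```python
import os
import subprocess

# Hypothetical suffix table; extend it with the suffixes you care about.
KNOWN_SUFFIXES = {
    ".exe": "Windows executable",
    ".dll": "Windows DLL",
    ".msi": "Windows installer",
}


def identify(path):
    """Trust a well-known suffix first, otherwise ask the file command."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix in KNOWN_SUFFIXES:
        return KNOWN_SUFFIXES[suffix]
    # Fallback: --brief drops the leading "filename:" from file's output.
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Note that this trusts the suffix whenever it is known, so it does not detect a renamed file; swapping the order of the two steps would give the opposite trade-off.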
In my package, I would like to use one .po file for each .py script it contains.
Here is my file tree :
foo
    mainscript.py
    commands/
        commandOne.py
    locales/fr/LC_MESSAGES/
        mainscript_fr.po
        commandOne_fr.po
In mainscript.py, I have the following lines to apply gettext to the strings:
if "fr" in os.environ['LANG']:
    traduction = gettext.translation('mainscript_fr', localedir='./locales', languages=['fr'])
    traduction.install()
else:
    gettext.install('')
So far, this works as expected. But now I would like to add another .po file to translate the strings in commandOne.py.
I tried the following code :
if "fr" in os.environ['LANG']:
    traduction = gettext.translation('commandOne_fr', localedir='../locales', languages=['fr'])
    traduction.install()
else:
    gettext.install('')
But I get "FileNotFoundError: [Errno 2] No translation file found for domain: 'commandOne_fr'".
How can I use multiple files like that? The package is a CLI, so a single file contains many strings (help text, verbose mode, etc.), and having a single .po file with hundreds of strings is not acceptable.
Note: mainscript.py calls a function from commandOne.py, which itself inherits from an abstract class that contains other strings to translate, so I hope any solution will also be applicable to the abstract class's file.
Thank you
Translations are retrieved from .mo files, not .po files; see https://docs.python.org/3/library/gettext.html#gettext.translation. Most probably you have to compile commandOne_fr.po into commandOne_fr.mo with the msgfmt program.
Two more hints:
What you are doing looks like a premature optimization. You won't have any performance problem until the number of translations gets really big. Rather wait for that to happen.
Why the _fr in the name of the translation files? The language code fr is already a path component.
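As a rough sketch of loading one domain per module, assuming each .po file has already been compiled (for example with msgfmt commandOne_fr.po -o commandOne_fr.mo) and sits under <localedir>/fr/LC_MESSAGES/. The helper name load_translator is hypothetical. Two details may matter here: a relative localedir such as '../locales' is resolved against the current working directory rather than the script's location, and install() binds _ globally in builtins, so a second install() replaces the first for the whole process. Binding a module-level _ avoids both.

```python
import gettext


def load_translator(domain, locale_dir, languages=("fr",)):
    """Return a gettext function for one domain (one compiled .mo file).

    fallback=True makes gettext hand back the untranslated string instead of
    raising FileNotFoundError when
    <locale_dir>/<language>/LC_MESSAGES/<domain>.mo is missing.
    """
    translation = gettext.translation(
        domain, localedir=locale_dir, languages=list(languages), fallback=True
    )
    return translation.gettext


# In commandOne.py this could look like the following (paths are assumptions):
#   import os
#   HERE = os.path.dirname(os.path.abspath(__file__))
#   _ = load_translator("commandOne_fr", os.path.join(HERE, "..", "locales"))
#   print(_("some message"))
```

The abstract class's module could call the same helper with its own domain name, so each file keeps a separate catalogue.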
I was reading the Node.js file system documentation for fs.writeFile(filename, data, [options], callback).
I have seen [options] pretty often but never used it for anything. Can someone give me an example? None of the cases I've had used this option.
For anyone ending up here off a search looking for a flags reference, here it is:
Flag   Description
r      Open file for reading. An exception occurs if the file does not exist.
r+     Open file for reading and writing. An exception occurs if the file does not exist.
rs     Open file for reading in synchronous mode.
rs+    Open file for reading and writing, asking the OS to open it synchronously. See notes for 'rs' about using this with caution.
w      Open file for writing. The file is created (if it does not exist) or truncated (if it exists).
wx     Like 'w' but fails if the path exists.
w+     Open file for reading and writing. The file is created (if it does not exist) or truncated (if it exists).
wx+    Like 'w+' but fails if the path exists.
a      Open file for appending. The file is created if it does not exist.
ax     Like 'a' but fails if the path exists.
a+     Open file for reading and appending. The file is created if it does not exist.
ax+    Like 'a+' but fails if the path exists.
I'm guessing you're interested in how an options parameter generally works in JavaScript, as opposed to what the individual options are, which are stated in the docs:
options Object
encoding String | Null default = 'utf8'
mode Number default = 438 (aka 0666 in Octal)
flag String default = 'w'
Generally, the options parameter is an object, with properties that are the options you want to modify. So if you wanted to modify two of the options on fs.writeFile, you'd add each one as a property to options:
fs.writeFile(
    "foo.txt",
    "bar",
    {
        encoding: "base64",
        flag: "a"
    },
    function(){ console.log("done!") }
)
And if you're confused as to what these three params are used for, the docs for fs.open have everything you need. It includes all the possibilities for flag, and a description for mode. The callback is called once the writeFile operation is complete.
fs.writeFile(filename, data, {flag: "wx"}, function(err){
    if(err) throw err
    console.log('Data written to file, ', filename)
})
As you can see in the snippet above, the third parameter is the options object. Its properties are optional and control how the file is opened.
I have passed "wx" as the flag, which means the file is opened for writing and created if it doesn't exist, but the call fails if the file already exists.
By default, "w" is used as the flag.
For further reading on the different options, see the Node.js fs documentation.
These are the options.
encoding (string or NULL), default value is 'utf8'
mode (number), default value is 438 (aka 0666 in Octal)
flag (string), default value is 'w'
I have a Perl script that traverses a set of directories. When it hits one particular entry it blows up with an Invalid argument error, and I want to be able to skip that entry programmatically. I thought I could start by finding out the file type with the file command, but that blows up too:
$ file /sys/devices/virtual/net/br-ex/speed
/sys/devices/virtual/net/br-ex/speed: ERROR: cannot read `/sys/devices/virtual/net/br-ex/speed' (Invalid argument)
If I print out the mode of the file with the perl or python stat function it tells me 33060 but I'm not sure what all the bits mean and I'm hoping a particular one would tell me not to try to look inside. Any suggestions?
To understand the number stat gave you, convert it to octal (in Python, oct(...)).
Then you'll see that 33060 is 100444 in octal. The leading 100 is the file type (a regular file); the last three digits (444) are the permissions. The first of these digits is the file owner's permissions, the second is the group's and the third is everyone else's.
You can read each of these digits (all 4 in your case) as 3 binary bits in this order:
read-write-execute.
Since owner, group and other all have 4 in your case, each one translates to 100 in binary, which means that only the read bit is set: all three can read the file and nothing else.
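As far as file permissions go, reading /sys/devices/virtual/net/br-ex/speed should have succeeded.
A read can still fail for reasons that the permission bits don't capture:
- Either speed is a directory (directories need execute permission to be entered, and cannot be read like a file).
- Or it's a special file, which can be tested using the -f flag in Perl or bash, or using os.path.isfile(...) in Python. Files under /sys are generated by the kernel, however, and can report themselves as regular files while still rejecting a read, so catching the read error itself is the more robust way to skip them.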
Anyhow, you can use the following links to filter files & directories according to their permissions in the 3 languages you mentioned:
ways to test permissions in perl.
ways to test permissions in python.
ways to test permissions in bash.
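To make the decoding above concrete, here is a small Python sketch that uses the standard stat module to split a mode such as 33060 into file type and permissions, plus a hypothetical read_or_skip helper that treats a failed read as "skip this entry". The function names are made up for illustration; the same pattern in Perl would be checking the return value of open and read.

```python
import stat


def describe_mode(mode):
    """Break a numeric st_mode into its file type and permission parts."""
    return {
        "octal": oct(mode),
        "regular_file": stat.S_ISREG(mode),
        "directory": stat.S_ISDIR(mode),
        "permissions": stat.filemode(mode),
    }


def read_or_skip(path):
    """Return the file's bytes, or None if opening or reading fails."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError:  # covers EINVAL, EACCES, ENOENT, EISDIR, ...
        return None
```

For the mode in the question, describe_mode(33060) reports octal 0o100444, a regular file, and permissions -r--r--r--, so stat alone gives no hint that the read will fail; read_or_skip is what actually lets the traversal continue.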
Not related to this particular case, but I hit the same error when I ran file on a malicious ELF (Linux executable) file. In that case it was because the program headers of the ELF had been intentionally corrupted. Looking at the source code of the file command makes this clear: it checks the ELF headers and bails out with the same error if the headers are corrupted:
/*
 * Loop through all the program headers.
 */
for ( ; num; num--) {
    if (pread(fd, xph_addr, xph_sizeof, off) <
        CAST(ssize_t, xph_sizeof)) {
        file_badread(ms);
        return -1;
    }
TL;DR: The file command checks not only the magic bytes; it also performs other checks to validate a file type.
I'm trying to edit an existing binary file using NodeJS.
My code goes something like this:
file = fs.createWriteStream("/path/to/existing/binary/file", {flags: "a"});
file.pos = 256;
file.write(new Buffer([0, 1, 2, 3, 4, 5]));
On OS X, this works as expected (the bytes at 256..261 get replaced with 0..5).
On Linux, however, the 6 bytes get appended to the end of the file. This is also mentioned in the Node.js API reference:
On Linux, positional writes don't work when the file is opened in append mode. The kernel ignores the position argument and always appends the data to the end of the file.
How do I get around this?
Open with the r+ flag instead of a. r+ is the portable way to say that you want to read and/or write at arbitrary positions in the file, and that the file should already exist.
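The same open mode behaves the same way outside Node.js. As an illustration only (Python rather than Node.js, with a hypothetical helper name patch_bytes), opening with "r+b" and seeking overwrites bytes in place on both Linux and OS X:

```python
def patch_bytes(path, offset, data):
    """Overwrite len(data) bytes at offset without truncating or appending.

    "r+b" opens an existing file for reading and writing in binary mode;
    unlike append mode, writes land at the current seek position.
    """
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(data)
```

Calling patch_bytes("/path/to/existing/binary/file", 256, bytes([0, 1, 2, 3, 4, 5])) would replace bytes 256..261 and leave the file size unchanged, provided the file is at least 262 bytes long.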
I'm having the following problem. I want to write a program in Fortran90 which I want to be able to call like this:
./program.x < main.in > main.out
In addition to "main.out" (whose name I can set when calling the program), secondary outputs have to be written, and I want them to have a name similar to either "main.in" or "main.out" (they are not actually called "main"). However, when I use:
INQUIRE(UNIT=5,NAME=sInputName)
The content of sInputName becomes "Stdin" instead of the name of the file. Is there some way to obtain the names of the files that are linked to stdin/stdout when the program is called?
Unfortunately, the point of I/O redirection is that your program doesn't have to know what the input and output files are. On Unix-based systems you cannot find them in the command-line arguments, because < main.in > main.out is processed by the shell, which uses these files to set up standard input and output before your program is invoked.
You have to remember that standard input and output will sometimes not be files at all; they could be a terminal or a pipe, e.g.
./generate_input | ./program.x | less
So one solution is to redesign your program so that the output file is an explicit argument.
./program.x --out=main.out
That way your program knows the filename. The cost is that your program is now responsible for opening (and maybe creating) the file.
That said, on Linux systems you can actually find out where your standard file handles are pointing from the special /proc filesystem. There will be symbolic links in place for each file descriptor:
/proc/<process_id>/fd/0 -> standard_input
/proc/<process_id>/fd/1 -> standard_output
/proc/<process_id>/fd/2 -> standard_error
Sorry, I don't know Fortran, but in pseudocode, checking the output file could look like this:
out_name = readlink( "/proc/" + getpid() + "/fd/1" )
if( isNormalFile( out_name ) )
    ...
Keep in mind what I said earlier: there is no guarantee this will actually be a normal file. It could be a terminal device, a pipe, a network socket, whatever... Also, I do not know what other operating systems this works on besides Red Hat/CentOS Linux, so it may not be that portable. It is more of a diagnostic tool.
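The pseudocode above can be tried out quickly in Python before porting the idea to Fortran. This is only a sketch under the assumption of a Linux-style /proc filesystem; the function names fd_target and redirected_file are hypothetical, and /proc/self is used as a shortcut for /proc/<process_id>.

```python
import os


def fd_target(fd):
    """Return what a file descriptor points at via /proc, or None if unavailable."""
    try:
        return os.readlink(f"/proc/self/fd/{fd}")
    except OSError:  # no /proc filesystem, or the descriptor is not open
        return None


def redirected_file(fd):
    """Return the path if fd refers to a regular file, otherwise None."""
    target = fd_target(fd)
    # Pipes and sockets show up as names like "pipe:[12345]", and terminals
    # as device nodes, so the isfile check filters them out.
    if target is not None and os.path.isfile(target):
        return target
    return None
```

With ./program < main.in > main.out, redirected_file(0) and redirected_file(1) would be expected to return the full paths of main.in and main.out; in a pipeline or on a terminal they return None.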
Maybe the intrinsic subroutines get_command and/or get_command_argument can help. They were introduced in Fortran 2003 and return, respectively, the full command line used to invoke the program or the specified argument. Note, though, that shell redirections such as < main.in are not part of the command line the program receives.