How do I avoid this race condition with readdir/inotify? - linux

Suppose I want to invoke some command on all files in a directory and set a watch to invoke that command on all files that get created in that directory. If I do:
while( ( sdi = readdir( d )) != NULL ) { ... }
closedir( d );
/* Files created here will be missed */
inotify_add_watch( ... );
then some files will potentially be missed. If I call inotify_add_watch()
before the readdir(), files may be acted on twice (it would require
a fair bit of infrastructure to prevent acting twice, and it seems the
edge cases would be difficult to handle). Is there a simple way to avoid
having to record the names of all files worked on during the readdir loop and
compare them to the names returned in the inotify_event structure? I can
minimize the number of necessary comparisons with:
while( ( sdi = readdir( d )) != NULL ) { ... }
inotify_add_watch( ... );
while( ( sdi = readdir( d )) != NULL ) { /* record name */ ... }
closedir( d );
And usually the second readdir() loop will do nothing, but this feels like a bad hack.

You simply can't. The more you hack, the more race conditions you'll get.
The simplest working solution is to set the watch before calling opendir(), and keep a set of already-processed names (or their hashes).
But even this isn't perfect. A user can have the file open in a text editor: you fix it, the user saves it, and the directory contains an unfixed file anyway, even though it's on your list.
The best approach is for the program to distinguish already-processed files by their content. In other words: set the watch, call the command on the readdir() results, then call it on the inotify results, and let the command itself detect whether a file has already been handled.

Related

Modx TV multi select list not saving values

I have a TV multi select list type that is evaluating a snippet:
@EVAL return $modx->runSnippet('getFeaturedResourceTree');
Evaluating this snippet:
<?php
$output = array();
$context = $modx->resource->get('context_key');
$sql = "select * from modx_site_content where context_key = '$context' order by `pagetitle`;";
$results = $modx->query($sql);
foreach($results as $result){
$output[] = $result['pagetitle'].'=='.$result['id'];
}
$output = implode('||', $output);
echo $output;
return;
This does work in the manager; I can select multiple resources in the list. However, when I save the TV, nothing is actually saved: the TV values are not present in the database, and when I reload the resource, the TV field is blank.
What could the problem be here?
I'm fairly certain you can accomplish what you're trying to do with an @SELECT binding rather than @EVAL. This has 2 potential benefits:
@EVAL is Evil, LOL. Not all the time, mind you; there are certainly legitimate uses of @EVAL, but I've personally always tried very hard to find an alternative whenever I've considered using it.
The method I'm about to show you has worked for me in the past, so I'm speculating it will work for you.
@SELECT pagetitle, id FROM modx_site_content WHERE context_key = 'web' ORDER BY `pagetitle`
If you're using @EVAL because you have multiple contexts and you want the context of the Resource currently being edited, then you could use your Snippet, but I would try:
Rather than echo-ing your output, return it.
Call the snippet in a Chunk, and render the Chunk on a test page to ensure it has the output you want, formatted for the TV Input Options exactly the way it should be.
If the Chunk output passes the test, call it into the TV Input Options field with the @CHUNK binding.
One more note: I can't remember if the current Resource is available in the TV as $modx->resource or $resource, but that might be something you want to double check.

How to detect hidden files with node.js on OS X

Poking around I was unable to discover a way to detect hidden files in OS X with node (nodejs).
Of course, we can easily find the ".dot_hidden" files, but on the Mac there are files/folders that are "protected" system files that most users shouldn't fiddle with. In the Finder GUI, they are invisible, or greyed out when hidden files are forced to be shown via "AppleShowAllFiles".
I did discover a reference to UF_HIDDEN : 0x8000 here:
https://developer.apple.com/library/mac/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileSystemDetails/FileSystemDetails.html
Using node's stat, we can return 2 additional bits of info that may provide a clue for the hidden status:
mode: 33188,   // File protection.
ino: 48064969, // File inode number. An inode is a file system
               // data structure that stores information about a file.
I'm not really a hex/binary guy, but it looks like we can grab the stat's "ino" property, apply the 0x8000 mask, and determine whether the file is being hinted as hidden or not.
I didn't have any success with the 0x8000 mask on the mode, but did have some with ino.
Here's what I've got. Checking the "ino" returns 0 or 1726; when it's 1726, the file seems to match a hidden file in OS X.
var fs = require("fs");
var dir = "/";
var list = fs.readdirSync(dir);
list.forEach(function(f){
    // easy dot hidden files
    var hidden = (f.substr(0, 1) == ".");
    var ino = 0;
    var syspath = dir + "/" + f;
    if( ! hidden ){
        var stats = fs.statSync(syspath);
        ino = parseInt( stats.ino & 0x8000, 8);
        // ino yields 0 when hidden and 1726 when not?
        if(ino){
            hidden = true;
        }
    }
    console.log(syspath, hidden, ino);
});
So my question is: am I applying the 0x8000 mask properly to the ino value to yield a proper result?
And how would one go about parsing the ino property to get at all the other flags contained within it?
The inode number (stats.ino) is a number which uniquely identifies a file; it has nothing to do with the hidden status of the file. (Indeed, it's possible to set or clear the hidden flag on a file at any time, and this won't change the inode number.)
The hidden flag is part of the st_flags field in the struct stat structure. Unfortunately, it doesn't look like the node.js fs module exposes this value, so you may need to shell out to the stat shell utility if you need to get this information on Mac OS X. (Short version: stat -f%f file will print a file's flags, represented in decimal.)

WebSite Scraper: How to make my parallel threads print to separate output locations

I'm putting together a program that stores the content of 5 or 6 webpages in an array and then extracts 'the Titles' from each page. So far it retrieves the page content, except that when I try to print the extracted 'Titles', I can only print to one output file.
When I googled for a solution, it took me down every road except the one to my question. Can someone suggest some ways I can print the 'Titles' of each page to separate output files?
This is my code:
#!/usr/bin/perl -w
use warnings;
use threads;
use LWP::UserAgent qw();
use WWW::Mechanize;
my @threads = ();
my @urls = qw(http://site1.com http://site2.com);
foreach my $url ( @urls ) {
    push @threads, async {
        my $mech = WWW::Mechanize->new();
        printf( "Loaded: %s \n", $url );
        my $res = $mech->get( $url );
        my $ducktales = $mech->title;
        $_->join for @threads;
        open( DATA, ">C:/Users/User/Desktop/11.txt" ) or die "cant";
        print DATA $ducktales;
    };
}
First, let's look at your open:
open(DATA,">C:/Users/User/Desktop/11.txt")
You are using a bareword handle, DATA. Such handles are package-global, meaning that if you opened different files at different points in your code, each new open would cause the previously opened file to be closed.
On top of that, the DATA filehandle is special, and you should probably not trample on it.
So, first, use lexical filehandles:
open my $data, ...
Next, if an error occurs, you do not show the name of the file or the error message, accessible through $!. This means you are only thinking in terms of single, global filehandles.
open my $data, '>', $data_file
or die "Cannot open '$data_file' for writing: $!";
Now, where does $data_file come from? If I understand correctly, you want one data file per URL. Therefore, it makes sense to name the data file based on the URL, restricting the name to consist of some safe subset of characters.
For now, forget about threads, and write the subroutine that will take a URL, fetch it, extract the title, and write it to a file based on the URL:
sub extract_and_write_title {
my $url = shift;
# fetch document
# extract title
# if success, open file named based on URL
# write title, close file
return;
}
Now, in your main loop, you can create threads based on this routine:
push @threads, threads->create(
    \&extract_and_write_title,
    $url,
);
You can fill in the blanks. As a rule, I do not give random people in the intarwebs complete scraping solutions.

Searching for related events in log file

Let's say I have a log file, which contains lines describing certain events. E.g.:
15.03.2014 (14:23) Thing #25 deleted, user #david, session #45
15.03.2014 (15:00) Thing #26 created, user #alex, session #54
...
I can easily extract standalone events using grep - it works fine even if I don't know all the information about an event.
But I want to make a step further and investigate related events. Consider following lines in log:
15.03.2014 (14:23) Thing #25 created, user #david, session #45
...
17.03.2014 (15:00) Thing #25 deleted, user #david, session #54
I want to search for Thing #X created, user #Y, session #Z events only if they are succeeded by Thing #X deleted, user #Y, session #M event, where M and Z are different.
Of course I can do that in 5-10 lines of code: search events of the first type, take all succeeding lines, search events of the second type, filter.
But maybe there is some tool for this and I will be reinventing the wheel?
Perl is a very powerful tool for these sorts of tasks, and can handle it with a one-liner, something like this:
cat txt | perl -n -e 'if (m^Thing #(\d+).*? (created|deleted).*? user #(\S+),.*? session #(\d+)^) { my $id = "$3.$1"; if ($2 eq "created") { $db{$id} = [$4,$_] } else { if (exists($db{$id}) && $db{$id}[0] != $4) { print $db{$id}[1]."$_" } delete $db{$id} } }'
Here's the same thing as a shell script, for ease of reading:
#!/usr/bin/perl
while (<>) {
    if (m^Thing #(\d+).*? (created|deleted).*? user #(\S+),.*? session #(\d+)^) {
        my $id = "$3.$1";
        if ($2 eq "created") {
            $db{$id} = [$4,$_];
        } else {
            if (exists($db{$id}) && $db{$id}[0] != $4) {
                print $db{$id}[1]."$_";
            }
            delete $db{$id};
        }
    }
}
This will print out the create/destroy line pairs where a given user created and destroyed a particular Thing with a different session id.
Note the script assumes that 'Thing' identifiers are user-specific, and treats cases where one user creates Thing X and another destroys Thing X as separate Things (if this is not true and users share Things, change $id to "$1"). It also assumes Things are destroyed at most once per create (if multiple deletes per create are possible, remove the delete line). Obviously I don't have your actual input file, so you may need to adjust the regexp to match the actual format.
This approach may be notably better than performing multiple searches as suggested in the OP, because it does everything in a single pass through the log with no temporary files; thus it may be more efficient/appropriate for very large log files. The memory utilization scales with the number of 'Things' that are live at any point, so should be reasonable unless your log has a huge number of very long-lived Things.
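For comparison, here is the same single-pass pairing logic sketched in Python rather than Perl. The regex and the sample line format come from the question; the function name and return shape are made up for the sketch:

```python
import re

# Same fields as the Perl regex: thing id, action, user, session id.
LINE_RE = re.compile(
    r"Thing #(\d+).*? (created|deleted).*? user #(\S+?),.*? session #(\d+)")

def find_mismatched_pairs(lines):
    """Return (create_line, delete_line) pairs where the same user
    created and deleted the same Thing under different session ids."""
    live = {}    # (user, thing) -> (create session, create line)
    pairs = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        thing, action, user, session = m.groups()
        key = (user, thing)
        if action == "created":
            live[key] = (session, line)
        elif key in live:
            if live[key][0] != session:
                pairs.append((live[key][1], line))
            del live[key]       # assume at most one delete per create
    return pairs
```

As in the Perl version, memory scales with the number of Things live at any moment, and keying on (user, thing) treats each user's Things as separate; key on the Thing id alone if Things are shared.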

IsolatedStorage.GetFileNames fails on MonoTouch/MonoDroid

I was trying out MonoTouch/MonoAndroid and everything was going well until I called the IsolatedStorageFile.GetFileNames(string) function. The parameter was "Foo/Foo1/*". The result is a SecurityException with no message.
The directory "Foo/Foo1" exists, because it has just been found using IsolatedStorageFile.GetDirectoryNames() call.
I identified this bit in Mono sources that throws the exception (in IsolatedStorageFile.cs):
DirectoryInfo[] subdirs = directory.GetDirectories (path);
// we're looking for a single result, identical to path (no pattern here)
// we're also looking for something under the current path (not outside isolated storage)
if ((subdirs.Length == 1) && (subdirs [0].Name == path) && (subdirs[0].FullName.IndexOf(directory.FullName) >= 0)) {
afi = subdirs [0].GetFiles (pattern);
} else {
// CAS, even in FullTrust, normally enforce IsolatedStorage
throw new SecurityException ();
}
I can't step into it with the debugger, so I don't know why the condition is false. This happens on both iOS and Android. A similar issue was logged a long time ago at http://www.digipedia.pl/usenet/thread/12492/1724/#post1724, but there are no replies.
The same code works on Windows Phone 7 without problems (with \ for path separators).
Has anyone got any ideas what might be causing it? Is the uppercase in the directory names a problem?
It is a bug in Mono. IsolatedStorage will not work with paths that contain more than one directory level in a row (such as Foo/Foo1/*).
I copied the code of GetFileNames() method from Mono to my project so that I can debug it. I found out that the problem is in the 2nd term of this condition (IsolatedStorageFile.cs:846):
if ((subdirs.Length == 1) && (subdirs [0].Name == path) && (subdirs[0].FullName.IndexOf(directory.FullName) >= 0)) {
afi = subdirs [0].GetFiles (pattern);
} else {
// CAS, even in FullTrust, normally enforce IsolatedStorage
throw new SecurityException ();
}
For example when path passed to GetFileNames() is "Foo/Bar/*", subdirs[0].Name will be "Bar" while path will be "Foo/Bar" and the condition will fail causing the exception.
