Searching for related events in a log file

Let's say I have a log file which contains lines describing certain events, e.g.:
15.03.2014 (14:23) Thing #25 deleted, user #david, session #45
15.03.2014 (15:00) Thing #26 created, user #alex, session #54
...
I can easily extract standalone events using grep; that works fine even if I don't know all the information about an event.
But I want to go a step further and investigate related events. Consider the following lines in the log:
15.03.2014 (14:23) Thing #25 created, user #david, session #45
...
17.03.2014 (15:00) Thing #25 deleted, user #david, session #54
I want to search for Thing #X created, user #Y, session #Z events only if they are succeeded by a Thing #X deleted, user #Y, session #M event, where M and Z differ.
Of course I can do that in 5-10 lines of code: search for events of the first type, take all succeeding lines, search for events of the second type, filter.
But maybe there is some tool for this and I would be reinventing the wheel?

Perl is a very powerful tool for these sorts of tasks, and can handle it with a one-liner, something like this:
cat txt | perl -n -e 'if (m^Thing #(\d+).*? (created|deleted).*? user #(\S+),.*? session #(\d+)^) { my $id = "$3.$1"; if ($2 eq "created") { $db{$id} = [$4,$_] } else { if (exists($db{$id}) && $db{$id}[0] != $4) { print $db{$id}[1]."$_" } delete $db{$id} } }'
Here's the same thing as a standalone Perl script, for ease of reading:
#!/usr/bin/perl
my %db;
while (<>) {
    if (m^Thing #(\d+).*? (created|deleted).*? user #(\S+),.*? session #(\d+)^) {
        my $id = "$3.$1";
        if ($2 eq "created") {
            $db{$id} = [$4, $_];
        } else {
            if (exists($db{$id}) && $db{$id}[0] != $4) {
                print $db{$id}[1] . $_;
            }
            delete $db{$id};
        }
    }
}
This will print out the created/deleted line pairs where a given user created and deleted a particular Thing with a different session id.
Note the script assumes that 'Thing' identifiers are user-specific, and treats cases where one user creates Thing X and another deletes Thing X as separate Things (if this is not true and users share Things, change $id to "$1"). It also assumes Things are deleted at most once per create (if multiple deletes per create are possible, remove the delete line). Obviously I don't have your actual input file, so you may need to adjust the regexp to match the actual format.
This approach may be notably better than performing multiple searches as suggested in the OP, because it does everything in a single pass through the log with no temporary files; thus it may be more efficient/appropriate for very large log files. The memory utilization scales with the number of 'Things' that are live at any point, so should be reasonable unless your log has a huge number of very long-lived Things.
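For instance, with the question's sample lines saved in a file (the script and file names here are assumed for illustration), the script prints the matching pair, since sessions #45 and #54 differ:
$ perl find_pairs.pl sample.log
15.03.2014 (14:23) Thing #25 created, user #david, session #45
17.03.2014 (15:00) Thing #25 deleted, user #david, session #54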

Related

Dart and Process.run

I have three issues with Process.run that I don't understand.
First, when I use ps aux in my Linux shell I get a nicely formatted list of processes, but when I do it inside Visual Studio I get it back as one huge string. I would like to create a data class so I can map those values,
i.e. MEM and CPU, so that later I can do myObject.CPU and see how much CPU the process with a certain PID uses.
Second, I would like to monitor this data so that it's updated every 0.5 seconds. But fetching the string, formatting it, and outputting it in the UI, every 0.5 seconds, sounds insane.
Third, executing something like ps -U \$USER u in Process.run does not work, since it passes the literal string $USER instead of the name of the current user (Process.run does not go through a shell, so no variable expansion happens).
So far I have two attempts.
Future<List<List<String>>> processPlayground() async {
  // This works on Windows/Linux/Mac
  var shell = Shell(workingDirectory: "/");
  List<List<String>> main = [];
  var results = await shell.run('''
ps -U moonlight u
''');
  for (var element in results.outLines) {
    main.add(element.replaceAll(_whitespaceRE, " ").split(" ").toList());
  }
  return main;
}
This works somewhat: I have a data class where I define mem, CPU, user and so on, then I format this and present it in the Flutter UI. But it's very slow, doing this every 0.5 seconds is inefficient, and the UI lags.
I also have this:
test() async {
  var process = await Process.start("ps", ["aux"]);
  process.stdout.transform(utf8.decoder).forEach((element) {
    sideList.add(element);
  });
}

tz() {
  print(sideList[0]);
  mainList.add(sideList[0].replaceAll(_whitespaceRE, " ").split(" ").toList());
  print(mainList[0]);
}
But here the formatting somehow does not work. Here I also don't run the commands in a shell; I don't know if that makes a difference.
TLDR
I need a way to get the list of processes, updated every 0.5 seconds with current values, and I later need a way to put that into a table for further manipulation, similar to what Task Manager on Windows or System Monitor on GNOME does.
Or should I use a different approach?
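For reference, a minimal sketch of the polling approach described in the TLDR, using plain Process.run in a periodic timer; the watchProcesses/onUpdate names and the parsing are assumptions for illustration, not a tested solution:
import 'dart:async';
import 'dart:io';

final _whitespaceRE = RegExp(r'\s+');

// Polls `ps aux` every 0.5 s and hands the parsed rows to onUpdate.
void watchProcesses(void Function(List<List<String>> rows) onUpdate) {
  Timer.periodic(const Duration(milliseconds: 500), (_) async {
    // Process.run buffers the whole output, so stdout is one big string here.
    var result = await Process.run('ps', ['aux']);
    var rows = (result.stdout as String)
        .trim()
        .split('\n')
        .skip(1) // drop the USER/PID/%CPU/%MEM header row
        .map((line) => line.replaceAll(_whitespaceRE, ' ').split(' '))
        .toList();
    onUpdate(rows);
  });
}

void main() {
  watchProcesses((rows) => print('got ${rows.length} processes'));
}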

How to Create Same resource Twice in Puppet

My requirement is to do some repetitive file configuration stuff using a loop, something like the following:
$no_of_managers = 2
$array = ['One','two','Three']

define loop() {
  notice("Configuring The Manager Nodes!!")
  if ($name == $no_of_managers + 1) {
    notice("Loop Iteration Finished!!!")
  }
  else {
    notice("Iteration Number : $name \n")
    # Doing All Stuff Here
    resource { $array: }
    $next = $name + 1
    loop { $next: }
  }
}

loop { "1": }

define resource() {
  # Doing my other Stuff
  notice("The Parsed value Name : ${name}\n")
}
Now when the second iteration runs, the following error occurs:
Error: Duplicate declaration: Resource[One] is already declared in file
How can I overcome this? What I'm doing is a cluster setup. Is there a workaround for this? I'm a newbie to Puppet, so your kind guidance is highly appreciated.
The Use Case:
I'm trying to set up a cluster which has multiple Manager/Worker nodes, and using this script the user has the privilege to select how many manager nodes he needs. The first loop is there for that: to copy the necessary files and create the required number of nodes.
The second loop is there to push all the .erb templates. Because each node has slightly different configs, the .erb files have their own logic inside them.
So after each iteration I want to push the .erb templates to the respective node.
In Puppet 3.x, you cannot build a loop in the fashion you are trying.
resource { $array: }
is already a loop over the contents of $array, if you will.
It is not really clear what you are trying to solve. If you can make your question a bit more concrete, we may be able to suggest an actual implementation.
Update
If you really want to go down this road, you need to generate unique names for your derived resources.
$local_names = regsubst($array, '$', "-$name")
resource { $local_names: }
In your defined type, you will have to retrieve the original meaning by removing the suffix.
define resource() {
  $orig_name = regsubst($name, '-[0-9]+$', '')
  # use $orig_name where you used $name before
}
Note that even exported resources must have unique names, so the transformation may have to happen in the manifest of the receiving node.
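To illustrate, a hypothetical walk-through of the first iteration, using the question's $array:
# With $array = ['One','two','Three'] and $name = "1":
$local_names = regsubst($array, '$', "-$name")
# => ['One-1', 'two-1', 'Three-1']
resource { $local_names: }
# This declares Resource[One-1], Resource[two-1], Resource[Three-1], so the
# second iteration's One-2, two-2, Three-2 no longer collide with them.
# Inside resource(), regsubst($name, '-[0-9]+$', '') recovers 'One' etc.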

CRM PlugIn Pass Variable Flag to New Execution Pipeline

I have records that have an index attribute to maintain their position in relation to each other.
I have a plugin that performs a renumbering operation on these records when the index is changed or new one created. There are specific rules that apply to items that are at the first and last position in the list.
If a new (or changed existing) item is inserted into the middle (not technically the middle... just somewhere between start and end) of the list, a renumbering kicks off to make room for the record.
This renumbering process fires in a new execution pipeline: we are updating record D, and when I tell record E to change (to make room for D), that of course fires the plugin on the Update message.
This renumbering is fine until we reach the end of the list, where the plugin gets into a loop with the first business rule, which maintains the first and last records differently.
So I am trying to think of ways to pass a flag to the execution context spawned by the renumbering process, so the recursion skips the boundary business rules if IsRenumbering == true.
My thoughts / ideas:
I have thought of using the Depth check > 1, but that isn't a reliable value, as I can't explicitly turn it on or off... it may happen to work, but that is not engineering a solid solution, just hoping nothing goes bump. Further, a colleague far more knowledgeable than I said that when a workflow calls a plugin, the depth value is off and can't be trusted.
All my variables are scoped at the Execute level so as to avoid variable pollution at the class level. However, if I had a dictionary, a tuple, or something at the class level, with one value being the thread id and the other the flag value, then perhaps my subsequent execution context could check whether the same owning thread id had any values entered.
Any thoughts or other ideas on how to pass context information to a new pipeline would be greatly appreciated.
Per Nicknow's suggestion I tried SharedVariables, but they seem to be going out of scope:
First time firing post-op:
if (base.Stage == EXrmPluginStepStage.PostOperation)
{
    ...snip...
    foreach (var item in RenumberSet)
    {
        Context.ParentContext.SharedVariables[recordrenumbering] = "googly";
        Entity renumrec = new Entity("abcd") { Id = item.Id };
        #region We either add or subtract indexes based upon sortdir
        ...snip...
        renumrec["abc_indexfield"] = TmpIdx + 1;
        break;
        .....snip.....
        #endregion
        OrganizationService.Update(renumrec);
    }
}
Now we come into the pre-op of the recursion process kicked off by the above post-op OrganizationService.Update(renumrec), and based upon this check it seems the shared variable didn't carry over:
if (!Context.SharedVariables.Contains(recordrenumbering))
{
    //Trace.Trace("Null Set");
    //Context.SharedVariables[recordrenumbering] = IsRenumbering;
    Context.SharedVariables[recordrenumbering] = "Null Set";
}
Throwing an InvalidPluginExecutionException reveals:
Sanity Checks:
Depth : 2
Entity: ...
Message: Update
Stage: PreOperation [20]
User: 065507fe-86df-e311-95fe-00155d050605
Initiating User: 065507fe-86df-e311-95fe-00155d050605
ContextEntityName: ....
ContextParentEntityName: ....
....
IsRenumbering: Null Set
What you are looking for is IExecutionContext.SharedVariables. Whatever you add here is available throughout the entire transaction. Since you'll have child pipelines, you'll want to look at the ParentContext for the value. This can all get a little tricky, so be sure to do a lot of testing; I've run into many issues with SharedVariables and looping operations in Dynamics CRM.
Here is some sample (very untested) code to get you started.
public static bool GetIsRenumbering(IPluginExecutionContext pluginContext)
{
    var keyName = "IsRenumbering";
    var ctx = pluginContext;
    while (ctx != null)
    {
        if (ctx.SharedVariables.Contains(keyName))
        {
            return (bool)ctx.SharedVariables[keyName];
        }
        ctx = ctx.ParentContext;
    }
    return false;
}

public static void SetIsRenumbering(IPluginExecutionContext pluginContext)
{
    var keyName = "IsRenumbering";
    pluginContext.SharedVariables.Add(keyName, true);
}
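And a rough sketch of how these helpers might be wired into the plugin's Execute method (untested; the renumbering calls are placeholders from the question):
public void Execute(IServiceProvider serviceProvider)
{
    var context = (IPluginExecutionContext)serviceProvider
        .GetService(typeof(IPluginExecutionContext));

    // If an ancestor pipeline flagged a renumbering pass, skip the
    // first/last boundary rules and stop the recursion here.
    if (GetIsRenumbering(context))
        return;

    // ...renumbering logic...

    // Flag this context before issuing the child updates, so the plugin
    // executions they trigger can find the value via ParentContext.
    SetIsRenumbering(context);
    // OrganizationService.Update(renumrec); etc.
}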
A very simple solution: add a bit field to the entity called "DisableIndexRecalculation." When your first plugin runs, make sure to set that field to true for all of your updates. In the same plugin, check whether "DisableIndexRecalculation" is true: if so, set it to null (by removing it from the TargetEntity entirely) and stop executing the plugin. If it is null, do your index recalculation.
Because you are immediately removing the field from the TargetEntity if it is true the value will never be persisted to the database so there will be no performance penalty.
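A minimal sketch of that guard, assuming the custom boolean attribute's schema name is "new_disableindexrecalculation" (a made-up name):
var target = (Entity)context.InputParameters["Target"];

// If a previous pass set the flag, strip it (so it is never persisted)
// and skip the recalculation entirely.
if (target.GetAttributeValue<bool>("new_disableindexrecalculation"))
{
    target.Attributes.Remove("new_disableindexrecalculation");
    return;
}

// ...index recalculation; set the flag on every update it issues:
// updateEntity["new_disableindexrecalculation"] = true;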

How do I avoid this race condition with readdir/inotify?

Suppose I want to invoke some command on all files in a directory and set a watch to invoke that command on all files that get created in that directory. If I do:
while( ( sdi = readdir( d )) != NULL ) { ... }
closedir( d );
/* Files created here will be missed */
inotify_add_watch( ... );
then some files will potentially be missed. If I call inotify_add_watch() before the readdir(), files may be acted on twice (it would require a fair bit of infrastructure to prevent acting twice, and it seems that the edge cases would be difficult to handle). Is there a simple way to avoid having to record the names of all files worked on during the readdir loop and comparing those to the names returned in the inotify_event structure? I can minimize the number of necessary comparisons with:
while( ( sdi = readdir( d )) != NULL ) { ... }
inotify_add_watch( ... );
while( ( sdi = readdir( d )) != NULL ) { /* record name */ ... }
closedir( d );
And usually the second readdir() loop will do nothing, but this feels like a bad hack.
You simply can't. The more you hack, the more race conditions you'll get.
The simplest actually-working solution is to set the watch before calling opendir(), and keep a list (set) of already-processed names (or their hashes).
But this isn't perfect either. A user can have a file open in a text editor; you fix it, the user saves it, and the directory contains an unfixed file anyway, even though it's on your list.
The best method would be for the program to actually distinguish processed files by their content. In other words: set the watch, call the command on the readdir() results, then call it on the inotify results, and let the command itself decide whether the file is already fine or not.
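As a rough illustration of the watch-first ordering with a name set (a minimal sketch only: a naive linked list stands in for a real set, and most error handling is trimmed):
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/inotify.h>

struct seen { char *name; struct seen *next; };

static int was_seen(const struct seen *list, const char *name) {
    for (; list; list = list->next)
        if (strcmp(list->name, name) == 0)
            return 1;
    return 0;
}

static struct seen *remember(struct seen *list, const char *name) {
    struct seen *node = malloc(sizeof *node);
    node->name = strdup(name);
    node->next = list;
    return node;
}

int main(void) {
    const char *dir = ".";   /* directory to process */
    struct seen *seen = NULL;

    /* 1. Set the watch BEFORE scanning, so no creation slips between. */
    int ifd = inotify_init();
    int wd = inotify_add_watch(ifd, dir, IN_CREATE | IN_MOVED_TO);
    if (ifd < 0 || wd < 0) { perror("inotify"); return 1; }

    /* 2. Scan existing entries, acting on and remembering each name. */
    DIR *d = opendir(dir);
    struct dirent *sdi;
    while ((sdi = readdir(d)) != NULL) {
        /* act_on(sdi->d_name); */
        seen = remember(seen, sdi->d_name);
    }
    closedir(d);

    /* 3. In the event loop, skip names the scan already handled;
     *    anything created during the scan shows up here as well. */
    char buf[4096];
    ssize_t len;
    while ((len = read(ifd, buf, sizeof buf)) > 0) {
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len > 0 && !was_seen(seen, ev->name)) {
                /* act_on(ev->name); */
                seen = remember(seen, ev->name);
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}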

How can I implement an anti-spamming technique on my IRC bot?

I run my bot in a public channel with hundreds of users. Yesterday a person came in and just abused it.
I would like to let anyone use the bot, but if they spam commands consecutively, and they aren't a bot "owner" like me (for when I debug), I would like to add them to an ignore list which expires in an hour or so.
One way I'm thinking of would be to save all commands by all users in a dictionary such as:
({
    'meder#freenode': [{command: '.weather 20851', timestamp: 209323023}],
    'jack#efnet': [{command: '.seen john'}]
})
I would set up a cron job to flush this out every 24 hours, but basically I would determine whether a person has made X commands within a duration of, say, 15 seconds, and add them to an ignore list.
Actually, as I'm writing this I thought of a better idea: maybe instead of storing each user's commands, just store the bot's commands in a list and keep pushing until it reaches a limit of, say, 15.
var lastCommands = [], limit = 5;

function handleCommand(timeObj, action) {
    if (lastCommands.length < limit) {
        action();
    } else {
        // enumerate through lastCommands and compare the timestamps of all 5 commands
        // if the user is the same for all 5 commands, and...
        // if the timestamps are all within the vicinity of 20 seconds
        // add the user to the ignoreList
    }
}

watch_for('command', function() {
    handleCommand({timestamp: 2093293032, user: user}, function() { message.say('hello there!'); });
});
I would appreciate any advice on the matter.
Here's a simple algorithm:
Every time a user sends a command to the bot, increment a number that's tied to that user. If this is a new user, create the number for them and set it to 1.
When a user's number is incremented to a certain value (say 15), set it to 100.
Every <period> seconds, run through the list and decrement all the numbers by 1. Zero means the user's number can be freed.
Before executing a command and after incrementing the user's counter, check to see if it exceeds your magic max value (15 above). If it does, exit before executing the command.
This lets you rate limit actions and forgive excesses after a while. Divide your desired ban length by the decrement period to find the number to set when a user exceeds your threshold (100 above). You can also add to the number if a particular user keeps sending commands after they've been banned.
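A minimal sketch of this counter scheme in JavaScript (all names and constants here are made up for illustration):
var counts = new Map();     // user -> counter
var MAX = 15;               // threshold before a ban
var BANNED = 100;           // counter value set on a ban
var PERIOD_MS = 60 * 1000;  // decrement interval

function allowCommand(user) {
    var n = (counts.get(user) || 0) + 1;
    // Trip the ban: jump the counter so it takes BANNED periods to decay.
    counts.set(user, n >= MAX ? BANNED : n);
    return n < MAX;
}

// Forgive excesses over time: decrement every counter each period.
setInterval(function() {
    for (var [user, n] of counts) {
        if (n <= 1) counts.delete(user);  // zero means the entry can be freed
        else counts.set(user, n - 1);
    }
}, PERIOD_MS);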
Well Nathon has already offered a solution, but it's possible to reduce the code that's needed.
var user = {};
user.lastCommandTime = new Date().getTime(); // time the user sent his last command
user.commandCount = 0;                       // command limit counter
user.maxCommandsPerSecond = 1;               // commands allowed per second

function handleCommand(obj, action) {
    var user = obj.user, now = new Date().getTime();
    var timeDifference = now - user.lastCommandTime;
    user.commandCount = Math.max(user.commandCount - (timeDifference / 1000 * user.maxCommandsPerSecond), 0) + 1;
    user.lastCommandTime = now;
    if (user.commandCount <= user.maxCommandsPerSecond) {
        console.log('command!');
    } else {
        console.log('flooding');
    }
}

var obj = {user: user};
var e = 0;
function foo() {
    handleCommand(obj, 'foo');
    e += 250;
    setTimeout(foo, 400 + e);
}
foo();
In this implementation there's no need for a list or a global callback every X seconds; instead we just reduce commandCount every time a new message arrives, based on the time difference to the last command. It's also possible to allow different command rates for specific users.
All we need are three new properties on the user object :)
Redis
I would use the insanely fast advanced key-value store Redis to write something like this, because:
It is insanely fast.
There is no need for a cron job, because you can set an expire on keys.
It has atomic operations to increment keys.
You can use redis-cli for prototyping.
I myself really like node_redis as a Redis client. It is a really fast client, which can easily be installed using npm.
Algorithm
I think my algorithm would look something like this:
For each user, create a unique key which counts the consecutively executed commands. Also set an expire equal to the time after which you no longer flag a user as a spammer. Let's assume the spammer has nickname x and the expire is 15.
Inside redis-cli:
incr x
expire x 15
When you do a get x after 15 seconds, the key does not exist anymore.
If the value of the key is bigger than the threshold, flag the user as a spammer.
get x
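The same scheme with node_redis might look roughly like this (a sketch assuming the older callback-style API; the threshold value is made up):
var redis = require('redis');
var client = redis.createClient();

var WINDOW_SECONDS = 15; // how long until a user is forgiven
var THRESHOLD = 5;       // commands per window before flagging

function checkSpammer(nick, callback) {
    client.incr(nick, function(err, count) {
        if (err) return callback(err);
        client.expire(nick, WINDOW_SECONDS);  // (re)start the decay window
        callback(null, count > THRESHOLD);    // true => flag as spammer
    });
}

checkSpammer('x', function(err, isSpammer) {
    if (!err && isSpammer) console.log('flagging x as spammer');
});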
These answers seem to be going about this the wrong way.
IRC servers will disconnect your client, regardless of whether you're "debugging" or not, if the client or bot floods a channel or the server in general.
Make a blanket flood control, using the method #nmichaels has detailed, but on the bot's network connection to the server itself.
