I want Logstash, when it's processing input entries, to simply drop entries that are older than N days.
I assume I'll use the date module and obviously drop, but I don't know how to connect them.
The only way that I know to do date-level comparison is via Ruby code. You need the date filter to parse the timestamp first (that's its own issue).
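For reference, a minimal date filter sketch (the source field name "log_timestamp" and the format are assumptions that need to match your actual input):
date {
  # "log_timestamp" and "ISO8601" are placeholders for your actual field and format
  match => ["log_timestamp", "ISO8601"]
  target => "@timestamp"
}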
Once you have parsed the date into a field (e.g., event["@timestamp"]), then you can use it to determine whether you want to ignore the event or not:
5.0:
ruby {
  code => "event.cancel if (Time.now.to_f - event.get('@timestamp').to_f) > (60 * 60 * 24 * 5)"
}
Pre-5.x:
ruby {
  code => "event.cancel if (Time.now.to_f - event['@timestamp'].to_f) > (60 * 60 * 24 * 5)"
}
In this case, 5 is N.
Also, it's worth pointing out that this is relative to the system clock of the machine where Logstash happens to be running. If that clock is inaccurate, it will impact the date math. Similarly, if the source machine's system clock is wrong, it can be a problem too.
Drawing on Alain's good point, you could use this to store the lag time, in addition to just dropping based on it.
5.0:
ruby {
  code => "event.set('lag_seconds', Time.now.to_f - event.get('@timestamp').to_f)"
}
# 5 represents the number of days to allow
if [lag_seconds] > (60 * 60 * 24 * 5) {
  drop { }
}
Pre-5.x:
ruby {
  code => "event['lag_seconds'] = Time.now.to_f - event['@timestamp'].to_f"
}
# 5 represents the number of days to allow
if [lag_seconds] > (60 * 60 * 24 * 5) {
  drop { }
}
Using this approach, you would then be indexing lag_seconds, which is a fractional number of seconds, allowing you to analyze lag in your index if this goes into Elasticsearch or some other data store.
I'm implementing some security measures and one of them is a site wide throttle on too many failed requests as protection against distributed brute force attacks.
The question I am stuck with is, after how many failed login requests should I start to throttle?
Now one reasonable approach is, as mentioned here, "using a running average of your site's bad-login frequency as the basis for an upper limit". If the site averages 100 failed logins, 300 (with some buffer added) might be a good threshold.
Now I don't have a running average, and I don't want someone to have to actively increase the upper limit as the user base grows. I want a dynamic formula that calculates this limit based on the number of active users.
The difficulty is that with only a few users, the user-to-threshold ratio should be much higher than with, say, 100k users. For example, with 50 users the limit could be set at 50% of the total user count, which means allowing 25 failed login requests site-wide in a given timespan. With 100k users that ratio should drop to something more like 1%: 1000 failed login requests in, say, an hour is already a lot (the numbers are probably not accurate at all, I am not a security expert; they are only examples to illustrate).
I was wondering, is there any mathematical formula that could achieve this in a neat way?
This is a chart of what I think the formula should be calculating approximately:
Here is what I have now (I know it's terrible, any suggestion will be better I'm sure):
$threshold = 1;
if ($activeUsers <= 50) {
    // Global limit is the same as the total of each user's individual limit
    $threshold *= $activeUsers; // If the user limit is 4, the global threshold will be 4 * user count
} elseif ($activeUsers <= 200) {
    // The global limit allows each user to make half of the individual limit simultaneously
    // over the last defined timespan
    $threshold = $threshold * $activeUsers / 2;
} elseif ($activeUsers <= 600) {
    $threshold = $threshold * $activeUsers / 2.5;
} elseif ($activeUsers <= 1000) {
    $threshold = $threshold * $activeUsers / 3.5;
} else { // More than 1000
    $threshold = $threshold * $activeUsers / 5;
}
return $threshold;
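For what it's worth, one way to get a smooth curve through the two example points in the question (roughly 25 allowed failures at 50 users, roughly 1000 at 100k users) is a power-law fit; the following is only a hedged sketch, and the constants are simply the result of that fit rather than security-tested values:
// Hypothetical sketch: fit $threshold = c * $activeUsers^k through the two example points above
function globalFailureThreshold(int $activeUsers): float
{
    return 3.74 * pow($activeUsers, 0.485); // ~25 at 50 users, ~1000 at 100k users
}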
I ended up not using a mathematical formula, but rather a ratio of failed to total login requests.
Code looks like this:
$loginAmountStats = $this->requestTrackRepository->getLoginAmountStats();

// Calc integer amount from given percentage and total logins
$allowedFailureAmount = $loginAmountStats['login_total'] / 100 * $this->settings['login_failure_percentage'];

if ($loginAmountStats['login_failures'] > $allowedFailureAmount) {
    // If changed, update SecurityServiceTest distributed brute force test expected error message
    $msg = 'Maximum amount of tolerated requests reached site-wide.';
    throw new SecurityException('captcha', SecurityException::GLOBAL_LOGIN, $msg);
}
This code is written in C#:
int x = Environment.TickCount;
docs for Environment.TickCount
Gets the number of milliseconds elapsed since the system started. TickCount cycles between Int32.MinValue, which is a negative number, and Int32.MaxValue once every 49.8 days.
TickCount will increment from zero to 2147483647 over approximately 24.9 days, then jump back to -2147483648, which is a negative number, and then increment back towards zero during the next 24.9 days.
We can use int result = Environment.TickCount & Int32.MaxValue; to make it rotate between 0 and 2147483647 every 24.9 days.
I want an equivalent method in NodeJS, which would yield the same result.
I searched on npmjs but didn't find a similar function.
os.uptime() is the closest method to what you need which
Returns the system uptime in number of seconds
NodeJS docs
But it is a valid question what the maximum limit for the above method will be.
In NodeJS the maximum safe integer is Number.MAX_SAFE_INTEGER, which is 9007199254740991. In seconds, that is roughly 285 million years, so I guess we can treat it as the maximum value for said method.
If you want the same functionality as C#'s TickCount, you will need to create your own custom method, maybe something like the ones given below:
const os = require('os');

// this method will cycle between 0 and 2147483647
function TickCount() {
  const milliseconds_elapsed = os.uptime() * 1000; // convert the uptime to milliseconds
  return milliseconds_elapsed % 2147483648;
}
// this method will cycle between -2147483648 and 2147483647
// note: it will not start from 0
function TickCount() {
  const milliseconds_elapsed = os.uptime() * 1000; // convert the uptime to milliseconds
  return (milliseconds_elapsed % 4294967296) - 2147483648;
}
// this method will cycle between -2147483648 and 2147483647
// note: it will start from 0, go up to 2147483647,
// then wrap around to -2147483648 and continue the cycle
function TickCount() {
  const milliseconds_elapsed = os.uptime() * 1000; // convert the uptime to milliseconds
  if (milliseconds_elapsed <= 2147483647) {
    return milliseconds_elapsed;
  }
  return ((milliseconds_elapsed - 2147483648) % 4294967296) - 2147483648;
}
The Microsoft docs say Environment.TickCount is an integer that "contains the amount of time in milliseconds that has passed since the last time the computer was started".
When searching for that I found this question, and the answers suggest using process.uptime() or os.uptime().
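As a side note (a minimal sketch, not from the original answers): process.uptime() measures how long the current Node process has been running, while os.uptime() measures how long the operating system has been up, so only the latter matches the "since the system started" semantics of Environment.TickCount.
const os = require('os');

// System uptime in seconds -- what the TickCount() sketches above build on
console.log('system uptime (s):', os.uptime());

// Process uptime in seconds -- resets whenever the Node process restarts
console.log('process uptime (s):', process.uptime());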
I am dealing with some Rust code that works with durations of days, but the implementation of Duration::days(n) is, per the documentation, n * 24 * 60 * 60 seconds, which isn't n days because not all days are 24 * 60 * 60 seconds long.
This behaviour is well documented:
pub fn days(days: i64) -> Duration
Makes a new Duration with given number of days. Equivalent to
Duration::seconds(days * 24 * 60 * 60) with overflow checks. Panics
when the duration is out of bounds.
Is there a way with Rust Chrono to get a duration that is, strictly, 1 day rather than a fixed number of seconds, and that is compatible with the DateTime types? Not all days are the same number of seconds; seconds and days are quite different units. If there were such a function, then the following would always give a result that is the same time of day on the following day:
let start = Local::now();
let one_day_later = start + function_that_returns_a_duration_of_days(1);
Again, Duration::days(1) is not such a function because it returns 1 * 24 * 60 * 60 seconds, rather than 1 day.
For example, with TZ set to America/Denver the following:
let start = Local.ymd(2019, 3, 10).and_hms(0, 0, 0);
println!("start: {}", start);
let end = Local.ymd(2019, 3, 11).and_hms(0, 0, 0);
println!("end: {}", end);
let elapsed_seconds = end.timestamp() - start.timestamp();
println!("elapsed_seconds: {}", elapsed_seconds);
let end2 = start + Duration::days(1);
println!("end2: {}", end2);
let elapsed_seconds2 = end2.timestamp() - start.timestamp();
println!("elapsed_seconds2: {}", elapsed_seconds2);
Returns:
start: 2019-03-10 00:00:00 -07:00
end: 2019-03-11 00:00:00 -06:00
elapsed_seconds: 82800
end2: 2019-03-11 01:00:00 -06:00
elapsed_seconds2: 86400
It adds 86400 seconds, rather than 1 day.
I can get the correct result with:
let one_day_later =
    (start.date() + Duration::days(1)).and_hms(start.hour(), start.minute(), start.second());
But I would prefer a function that returns a duration of days, and in general would like to know more about Rust Chrono's capabilities for handling durations. Does it have durations with units other than seconds? What about weeks, months and years, which also have variable numbers of seconds?
I should probably say that I don't know Rust, having only worked with it for a few days, and I haven't read much of the source code. I did look at it, but found it difficult to understand due to my limited familiarity with the language.
A Duration is an amount of time. There is no amount of time that when added to an instant, always yields the same time on the next day, because as you have noticed, calendar days may have different amounts of time in them.
Not only years, weeks and days, but even hours and minutes do not always comprise the same amount of time (Leap second). A Duration is an amount of time, not a "calendar unit". So no, a Duration is not capable of expressing an idea like "same time next week".
The easiest way to express "same time next day" is with the succ and and_time methods on Date:
let one_day_later = start.date().succ().and_time(start.time()).unwrap();
and_time returns None if the time does not exist on the new date, so the unwrap will panic in that case.
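Applied to the DST example from the question (a hedged sketch; the printed values assume TZ is set to America/Denver, as above):
use chrono::prelude::*;
use chrono::Duration;

fn main() {
    // With TZ=America/Denver, 2019-03-10 is the 23-hour spring-forward day from the question.
    let start = Local.ymd(2019, 3, 10).and_hms(0, 0, 0);

    // "Same wall-clock time on the next calendar day": succ() advances the Date by one day,
    // and_time() re-attaches the original time and returns None if that time doesn't exist.
    let next_day = start.date().succ().and_time(start.time()).unwrap();
    println!("next_day: {}", next_day); // expected: 2019-03-11 00:00:00 -06:00

    // Duration::days(1) adds exactly 86_400 seconds instead:
    let plus_fixed_day = start + Duration::days(1);
    println!("plus_fixed_day: {}", plus_fixed_day); // 2019-03-11 01:00:00 -06:00
}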
What is the best and fastest way to iterate over Collection objects in Groovy? I know there are several Groovy collection utility methods, but they use closures, which are slow.
The final result in your specific case might be different; however, benchmarking 5 different iteration variants available in Groovy shows that the old Java for-each loop is the most efficient one. Take a look at the following example, where we iterate over 100 million elements and calculate their total sum in a very imperative way:
@Grab(group='org.gperfutils', module='gbench', version='0.4.3-groovy-2.4')
import java.util.concurrent.atomic.AtomicLong
import java.util.function.Consumer
def numbers = (1..100_000_000)
def r = benchmark {
    'numbers.each {}' {
        final AtomicLong result = new AtomicLong()
        numbers.each { number -> result.addAndGet(number) }
    }
    'for (int i = 0 ...)' {
        final AtomicLong result = new AtomicLong()
        for (int i = 0; i < numbers.size(); i++) {
            result.addAndGet(numbers[i])
        }
    }
    'for-each' {
        final AtomicLong result = new AtomicLong()
        for (int number : numbers) {
            result.addAndGet(number)
        }
    }
    'stream + closure' {
        final AtomicLong result = new AtomicLong()
        numbers.stream().forEach { number -> result.addAndGet(number) }
    }
    'stream + anonymous class' {
        final AtomicLong result = new AtomicLong()
        numbers.stream().forEach(new Consumer<Integer>() {
            @Override
            void accept(Integer number) {
                result.addAndGet(number)
            }
        })
    }
}
r.prettyPrint()
This is just a simple example where we try to benchmark the cost of iterating over a collection, regardless of the operation executed for each element (all variants perform the same operation to give the most accurate comparison). And here are the results (time measurements are expressed in nanoseconds):
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
WARNING: Timed out waiting for "numbers.each {}" to be stable
user system cpu real
numbers.each {} 7139971394 11352278 7151323672 7246652176
for (int i = 0 ...) 6349924690 5159703 6355084393 6447856898
for-each 3449977333 826138 3450803471 3497716359
stream + closure 8199975894 193599 8200169493 8307968464
stream + anonymous class 3599977808 3218956 3603196764 3653224857
Conclusion
Java's for-each is as fast as Stream + anonymous class (Groovy 2.x does not allow using lambda expressions).
The old for (int i = 0; ...) loop is almost twice as slow as for-each, most probably because of the additional cost of retrieving the element at a given index.
Groovy's each method is a little bit faster than the stream + closure variant, and both are more than twice as slow as the fastest one.
It's important to run benchmarks for your specific use case to get the most accurate answer. For instance, the Stream API will most probably be the best choice if other operations are applied alongside the iteration (filtering, mapping etc.), as in the sketch below. For simple iteration from the first to the last element of a given collection, the old Java for-each might give the best results, because it produces very little overhead.
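A minimal sketch of what such a pipeline could look like (the filter and mapping steps are only illustrative, not part of the benchmark):
// Hypothetical pipeline: keep even numbers, double them, and sum the result
long sum = numbers.stream()
        .filter { int n -> n % 2 == 0 }
        .mapToLong { int n -> n * 2L }
        .sum()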
Also, the size of the collection matters. For instance, if we use the above example but iterate over 100k elements instead of 100 million, then the slowest variant costs 0.82 ms versus 0.38 ms for the fastest. If you build a system where every nanosecond matters, then you have to pick the most efficient solution. But if you build a simple CRUD application, then it doesn't matter whether iterating over a collection takes 0.82 or 0.38 milliseconds; the cost of a database connection is at least 50 times bigger, so saving approximately 0.44 milliseconds would not make any impact.
// Results for iterating over 100k elements
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
user system cpu real
numbers.each {} 717422 0 717422 722944
for (int i = 0 ...) 593016 0 593016 600860
for-each 381976 0 381976 387252
stream + closure 811506 5884 817390 827333
stream + anonymous class 408662 1183 409845 416381
UPDATE: Dynamic invocation vs static compilation
There is also one more factor worth taking into account: static compilation. Below you can find the results of the same benchmark iterating over a collection of 10 million elements:
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.10-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
user system cpu real
Dynamic each {} 727357070 0 727357070 731017063
Static each {} 141425428 344969 141770397 143447395
Dynamic for-each 369991296 619640 370610936 375825211
Static for-each 92998379 27666 93026045 93904478
Dynamic for (int i = 0; ...) 679991895 1492518 681484413 690961227
Static for (int i = 0; ...) 173188913 0 173188913 175396602
As you can see, turning on static compilation (with the @CompileStatic class annotation, for instance) is a game changer. Of course the Java for-each loop is still the most efficient, but its static variant is almost 4 times faster than the dynamic one. The static Groovy each {} is about 5 times faster than the dynamic each {}, and the static for loop is also about 4 times faster than the dynamic for loop.
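For reference, a minimal sketch of what the statically compiled for-each variant could look like (the class and method names are only illustrative):
import groovy.transform.CompileStatic
import java.util.concurrent.atomic.AtomicLong

@CompileStatic
class StaticIteration {
    // Same work as in the benchmark, but compiled statically instead of dispatched dynamically
    static long sum(List<Integer> numbers) {
        final AtomicLong result = new AtomicLong()
        for (int number : numbers) {
            result.addAndGet(number)
        }
        return result.get()
    }
}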
Conclusion: for 10 million elements the static numbers.each {} takes 143 milliseconds, while the static for-each takes 93 milliseconds for the same collection. It means that for a collection of 100k elements the static numbers.each {} will cost approximately 0.14 ms and the static for-each approximately 0.09 ms. Both are very fast, and the real difference only appears when the collection size explodes to 100+ million elements.
Java stream from Java compiled class
And to give you some perspective, here is a Java class with stream().forEach() over 10 million elements for comparison:
Java stream.forEach() 87271350 160988 87432338 88563305
Just a little bit faster than statically compiled for-each in Groovy code.
I just can't figure out how to configure a cron job in Quartz with an initial delay.
So I need something that runs every hour with an initial delay of 10 minutes.
"* * 0/1 * * ?"
Here's a late answer; hopefully it helps others. I solved the issue by having two scheduled methods in my service class:
@EnableScheduling
public class DeviceService {

    @Scheduled(initialDelayString = "${devices.update.initial}", fixedDelay = 2592000000L)
    public void initialUpdateDevices() {
        updateDevices();
    }

    @Scheduled(cron = "${devices.update.cron}")
    public void cronUpdateDevices() {
        updateDevices();
    }

    private void updateDevices() {
        ...
    }
}
The initial delay and the cron expression are set in application.properties. The fixedDelay is there because Spring doesn't allow initialDelay on its own. I set it to 2592000000 ms, which is 30 days. In our application, the potential extra update doesn't do any harm.
In application.properties:
devices.update.initial = 600000
devices.update.cron = 0 30 1 * * *
Initially run after 10 minutes (600000 ms) and then every night at 01:30.
In application-test.properties for unit testing:
devices.update.initial = 86400000
devices.update.cron = 0 30 1 24 12 *
None of our unit tests take 1 day to execute so 86400000 milliseconds is a safe bet. The cron "0 30 1 24 12 *" is set to Christmas Eve's night when people should be dreaming of nice things.