Logstash - grok is not parsing double digit float values - logstash

Grok is able to parse float values with single digit like 1.2 using BASE16FLOAT
but throws [0] "_grokparsefailure" when parsing double digit like 12.5
Example:
works for the log event
02:10:28 CPU Util %: 0.1 / 0.2 / 0.6 Disk Util %: 0.0 / 0.0 / 0.0
but not for
02:09:46 CPU Util %: 1.3 / 2.3 / 4.2 Disk Util %: 5.6 / 12.5 / 40.9
Logstash filter used
"message" => "%{TIME:time} CPU Util %: %{BASE16FLOAT:MIN_CPU} / %{BASE16FLOAT:AVG_CPU} / %{BASE16FLOAT:MAX_CPU} Disk Util %: %{BASE16FLOAT:MIN_DISK} / %{BASE16FLOAT:AVG_DISK} / %{BASE16FLOAT:MAX_DISK}"
I dont understand why it works for single digit float values but not for a double digit values.

You can use %{NUMBER} and ${SPACE}
"message" => "%{TIME:time}%{SPACE}CPU Util %:%{SPACE}%{NUMBER:MIN_CPU} /%{SPACE}%{NUMBER:AVG_CPU} /%{SPACE}%{NUMBER:MAX_CPU}%{SPACE}Disk Util %:%{SPACE}%{NUMBER:MIN_DISK} /%{SPACE}%{NUMBER:AVG_DISK} /%{SPACE}%{NUMBER:MAX_DISK}"

Related

Wrong timing in physics simulation with piston

I have written a very basic simulation of a pendulum. I used the equation α = -g/l * sin(φ). I set g/l to 50, so I expect to get a period of 0.89 seconds (Using the formula 2π*√(l/g)). However, I get a period of around 6.5 seconds. The relevant code part is
let ang_acc = - 50.0 * pendulum.angle.to_radians().sin();
pendulum.ang_vel += ang_acc * dt;
pendulum.angle += pendulum.ang_vel * dt;
The complete code is on GitHub. I tried to make it as basic as possible and commented everything important.

Efficient way to iterate collection objects in Groovy 2.x?

What is the best and fastest way to iterate over Collection objects in Groovy. I know there are several Groovy collection utility methods. But they use closures which are slow.
The final result in your specific case might be different, however benchmarking 5 different iteration variants available for Groovy shows that old Java for-each loop is the most efficient one. Take a look at the following example where we iterate over 100 millions of elements and we calculate the total sum of these numbers in the very imperative way:
#Grab(group='org.gperfutils', module='gbench', version='0.4.3-groovy-2.4')
import java.util.concurrent.atomic.AtomicLong
import java.util.function.Consumer
def numbers = (1..100_000_000)
def r = benchmark {
'numbers.each {}' {
final AtomicLong result = new AtomicLong()
numbers.each { number -> result.addAndGet(number) }
}
'for (int i = 0 ...)' {
final AtomicLong result = new AtomicLong()
for (int i = 0; i < numbers.size(); i++) {
result.addAndGet(numbers[i])
}
}
'for-each' {
final AtomicLong result = new AtomicLong()
for (int number : numbers) {
result.addAndGet(number)
}
}
'stream + closure' {
final AtomicLong result = new AtomicLong()
numbers.stream().forEach { number -> result.addAndGet(number) }
}
'stream + anonymous class' {
final AtomicLong result = new AtomicLong()
numbers.stream().forEach(new Consumer<Integer>() {
#Override
void accept(Integer number) {
result.addAndGet(number)
}
})
}
}
r.prettyPrint()
This is just a simple example where we try to benchmark the cost of iteration over a collection, no matter what the operation executed for every element from collection is (all variants use the same operation to give the most accurate results). And here are results (time measurements are expressed in nanoseconds):
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
WARNING: Timed out waiting for "numbers.each {}" to be stable
user system cpu real
numbers.each {} 7139971394 11352278 7151323672 7246652176
for (int i = 0 ...) 6349924690 5159703 6355084393 6447856898
for-each 3449977333 826138 3450803471 3497716359
stream + closure 8199975894 193599 8200169493 8307968464
stream + anonymous class 3599977808 3218956 3603196764 3653224857
Conclusion
Java's for-each is as fast as Stream + anonymous class (Groovy 2.x does not allow using lambda expressions).
The old for (int i = 0; ... is almost twice slower comparing to for-each - most probably because there is an additional effort of returning a value from the array at given index.
Groovy's each method is a little bit faster then stream + closure variant, and both are more than twice slower comparing to the fastest one.
It's important to run benchmarks for a specific use case to get the most accurate answer. For instance, Stream API will be most probably the best choice if there are some other operations applied next to the iteration (filtering, mapping etc.). For simple iterations from the first to the last element of a given collection choosing old Java for-each might give the best results, because it does not produce much overhead.
Also - the size of collection matters. For instance, if we use the above example but instead of iterating over 100 millions of elements we would iterate over 100k elements, then the slowest variant would cost 0.82 ms versus 0.38 ms. If you build a system where every nanosecond matters then you have to pick the most efficient solution. But if you build a simple CRUD application then it doesn't matter if iteration over a collection takes 0.82 or 0.38 milliseconds - the cost of database connection is at least 50 times bigger, so saving approximately 0.44 milliseconds would not make any impact.
// Results for iterating over 100k elements
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.9-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
user system cpu real
numbers.each {} 717422 0 717422 722944
for (int i = 0 ...) 593016 0 593016 600860
for-each 381976 0 381976 387252
stream + closure 811506 5884 817390 827333
stream + anonymous class 408662 1183 409845 416381
UPDATE: Dynamic invocation vs static compilation
There is also one more factor worth taking into account - static compilation. Below you can find results for 10 millions element collection iterations benchmark:
Environment
===========
* Groovy: 2.4.12
* JVM: OpenJDK 64-Bit Server VM (25.181-b15, Oracle Corporation)
* JRE: 1.8.0_181
* Total Memory: 236 MB
* Maximum Memory: 3497 MB
* OS: Linux (4.18.10-100.fc27.x86_64, amd64)
Options
=======
* Warm Up: Auto (- 60 sec)
* CPU Time Measurement: On
user system cpu real
Dynamic each {} 727357070 0 727357070 731017063
Static each {} 141425428 344969 141770397 143447395
Dynamic for-each 369991296 619640 370610936 375825211
Static for-each 92998379 27666 93026045 93904478
Dynamic for (int i = 0; ...) 679991895 1492518 681484413 690961227
Static for (int i = 0; ...) 173188913 0 173188913 175396602
As you can see turning on static compilation (with #CompileStatic class annotation for instance) is a game changer. Of course Java for-each is still the most efficient, however its static variant is almost 4 times faster than the dynamic one. Static Groovy each {} is faster 5 times faster than the dynamic each {}. And static for loop is also 4 times faster then the dynamic for loop.
Conclusion - for 10 millions elements static numbers.each {} takes 143 milliseconds while static for-each takes 93 milliseconds for the same size collection. It means that for collection of size 100k static numbers.each {} will cost 0.14 ms and static for-each will take 0.09 ms approximately. Both are very fast and the real difference starts when the size of collection explodes to +100 millions of elements.
Java stream from Java compiled class
And to give you a perspective - here is Java class with stream().forEach() on 10 millions of elements for a comparison:
Java stream.forEach() 87271350 160988 87432338 88563305
Just a little bit faster than statically compiled for-each in Groovy code.

Heap overflow in Haskell

I'm getting Heap exhausted message when running the following short Haskell program on a big enough dataset. For example, the program fails (with heap overflow) on 20 Mb input file with around 900k lines. The heap size was set (through -with-rtsopts) to 1 Gb. It runs ok if longestCommonSubstrB is defined as something simpler, e.g. commonPrefix. I need to process files in the order of 100 Mb.
I compiled the program with the following command line (GHC 7.8.3):
ghc -Wall -O2 -prof -fprof-auto "-with-rtsopts=-M512M -p -s -h -i0.1" SampleB.hs
I would appreciate any help in making this thing run in a reasonable amount of space (in the order of the input file size), but I would especially appreciate the thought process of finding where the bottleneck is and where and how to force the strictness.
My guess is that somehow forcing longestCommonSubstrB function to evaluate strictly would solve the problem, but I don't know how to do that.
{-# LANGUAGE BangPatterns #-}
module Main where
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (maximumBy, sort)
import Data.Function (on)
import Data.Char (isSpace)
-- | Returns a list of lexicon items, i.e. [[w1,w2,w3]]
readLexicon :: FilePath -> IO [[B.ByteString]]
readLexicon filename = do
text <- B.readFile filename
return $ map (B.split '\t' . stripR) . B.lines $ text
where
stripR = B.reverse . B.dropWhile isSpace . B.reverse
transformOne :: [B.ByteString] -> B.ByteString
transformOne (w1:w2:w3:[]) =
B.intercalate (B.pack "|") [w1, longestCommonSubstrB w2 w1, w3]
transformOne a = error $ "transformOne: unexpected tuple " ++ show a
longestCommonSubstrB :: B.ByteString -> B.ByteString -> B.ByteString
longestCommonSubstrB xs ys = maximumBy (compare `on` B.length) . concat $
[f xs' ys | xs' <- B.tails xs] ++
[f xs ys' | ys' <- tail $ B.tails ys]
where f xs' ys' = scanl g B.empty $ B.zip xs' ys'
g z (x, y) = if x == y
then z `B.snoc` x
else B.empty
main :: IO ()
main = do
(input:output:_) <- getArgs
lexicon <- readLexicon input
let flattened = B.unlines . sort . map transformOne $ lexicon
B.writeFile output flattened
This is the profile ouput for the test dataset (100k lines, heap size set to 1 GB, i.e. generateSample.exe 100000, the resulting file size is 2.38 MB):
Heap profile over time:
Execution statistics:
3,505,737,588 bytes allocated in the heap
785,283,180 bytes copied during GC
62,390,372 bytes maximum residency (44 sample(s))
216,592 bytes maximum slop
96 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6697 colls, 0 par 1.05s 1.03s 0.0002s 0.0013s
Gen 1 44 colls, 0 par 4.14s 3.99s 0.0906s 0.1935s
INIT time 0.00s ( 0.00s elapsed)
MUT time 7.80s ( 9.17s elapsed)
GC time 3.75s ( 3.67s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 1.44s ( 1.35s elapsed)
EXIT time 0.02s ( 0.00s elapsed)
Total time 13.02s ( 12.85s elapsed)
%GC time 28.8% (28.6% elapsed)
Alloc rate 449,633,678 bytes per MUT second
Productivity 60.1% of total user, 60.9% of total elapsed
Time and Allocation Profiling Report:
SampleB.exe +RTS -M1G -p -s -h -i0.1 -RTS sample.txt sample_out.txt
total time = 3.97 secs (3967 ticks # 1000 us, 1 processor)
total alloc = 2,321,595,564 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
longestCommonSubstrB Main 43.3 33.1
longestCommonSubstrB.f Main 21.5 43.6
main.flattened Main 17.5 5.1
main Main 6.6 5.8
longestCommonSubstrB.g Main 5.0 5.8
readLexicon Main 2.5 2.8
transformOne Main 1.8 1.7
readLexicon.stripR Main 1.8 1.9
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 45 0 0.1 0.0 100.0 100.0
main Main 91 0 6.6 5.8 99.9 100.0
main.flattened Main 93 1 17.5 5.1 89.1 89.4
transformOne Main 95 100000 1.8 1.7 71.6 84.3
longestCommonSubstrB Main 100 100000 43.3 33.1 69.8 82.5
longestCommonSubstrB.f Main 101 1400000 21.5 43.6 26.5 49.5
longestCommonSubstrB.g Main 104 4200000 5.0 5.8 5.0 5.8
readLexicon Main 92 1 2.5 2.8 4.2 4.8
readLexicon.stripR Main 98 0 1.8 1.9 1.8 1.9
CAF GHC.IO.Encoding.CodePage 80 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 74 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 70 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 66 0 0.0 0.0 0.0 0.0
CAF System.Environment 65 0 0.0 0.0 0.0 0.0
CAF Data.ByteString.Lazy.Char8 54 0 0.0 0.0 0.0 0.0
CAF Main 52 0 0.0 0.0 0.0 0.0
transformOne Main 99 0 0.0 0.0 0.0 0.0
readLexicon Main 96 0 0.0 0.0 0.0 0.0
readLexicon.stripR Main 97 1 0.0 0.0 0.0 0.0
main Main 90 1 0.0 0.0 0.0 0.0
UPDATE: The following program can be used to generate sample data. It expects one argument, the number of lines in the generated dataset. The generated data will be saved to the sample.txt file. When I generate 900k lines dataset with it (by running generateSample.exe 900000), the produced dataset makes the above program fail with heap overflow (the heap size was set to 1 GB). The resulting dataset is around 20 MB.
module Main where
import System.Environment (getArgs)
import Data.List (intercalate, permutations)
generate :: Int -> [(String,String,String)]
generate n = take n $ zip3 (f "banana") (f "ruanaba") (f "kikiriki")
where
f = cycle . permutations
main :: IO ()
main = do
(n:_) <- getArgs
let flattened = unlines . map f $ generate (read n :: Int)
writeFile "sample.txt" flattened
where
f (w1,w2,w3) = intercalate "\t" [w1, w2, w3]
It seems to me you've implemented a naive longest common substring, with terrible space complexity (at least O(n^2)). Strictness has nothing to do with it.
You'll want to implement a dynamic programming algo. You may find inspiration in the string-similarity package, or in the lcs function in the guts of the Diff package.

Why do hGetBuf, hPutBuf, etc. allocate memory?

In the process of doing some simple benchmarking, I came across something that surprised me. Take this snippet from Network.Socket.Splice:
hSplice :: Int -> Handle -> Handle -> IO ()
hSplice len s t = do
a <- mallocBytes len :: IO (Ptr Word8)
finally
(forever $! do
bytes <- hGetBufSome s a len
if bytes > 0
then hPutBuf t a bytes
else throwRecv0)
(free a)
One would expect that hGetBufSome and hPutBuf here would not need to allocate memory, as they write into and read from a pre-allocated buffer. The docs seem to back this intuition up... But alas:
individual inherited
COST CENTRE %time %alloc %time %alloc bytes
hSplice 0.5 0.0 38.1 61.1 3792
hPutBuf 0.4 1.0 19.8 29.9 12800000
hPutBuf' 0.4 0.4 19.4 28.9 4800000
wantWritableHandle 0.1 0.1 19.0 28.5 1600000
wantWritableHandle' 0.0 0.0 18.9 28.4 0
withHandle_' 0.0 0.1 18.9 28.4 1600000
withHandle' 1.0 3.8 18.8 28.3 48800000
do_operation 1.1 3.4 17.8 24.5 44000000
withHandle_'.\ 0.3 1.1 16.7 21.0 14400000
checkWritableHandle 0.1 0.2 16.4 19.9 3200000
hPutBuf'.\ 1.1 3.3 16.3 19.7 42400000
flushWriteBuffer 0.7 1.4 12.1 6.2 17600000
flushByteWriteBuffer 11.3 4.8 11.3 4.8 61600000
bufWrite 1.7 6.9 3.0 9.9 88000000
copyToRawBuffer 0.1 0.2 1.2 2.8 3200000
withRawBuffer 0.3 0.8 1.2 2.6 10400000
copyToRawBuffer.\ 0.9 1.7 0.9 1.7 22400000
debugIO 0.1 0.2 0.1 0.2 3200000
debugIO 0.1 0.2 0.1 0.2 3200016
hGetBufSome 0.0 0.0 17.7 31.2 80
wantReadableHandle_ 0.0 0.0 17.7 31.2 32
wantReadableHandle' 0.0 0.0 17.7 31.2 0
withHandle_' 0.0 0.0 17.7 31.2 32
withHandle' 1.6 2.4 17.7 31.2 30400976
do_operation 0.4 2.4 16.1 28.8 30400880
withHandle_'.\ 0.5 1.1 15.8 26.4 14400288
checkReadableHandle 0.1 0.4 15.3 25.3 4800096
hGetBufSome.\ 8.7 14.8 15.2 24.9 190153648
bufReadNBNonEmpty 2.6 4.4 6.1 8.0 56800000
bufReadNBNonEmpty.buf' 0.0 0.4 0.0 0.4 5600000
bufReadNBNonEmpty.so_far' 0.2 0.1 0.2 0.1 1600000
bufReadNBNonEmpty.remaining 0.2 0.1 0.2 0.1 1600000
copyFromRawBuffer 0.1 0.2 2.9 2.8 3200000
withRawBuffer 1.0 0.8 2.8 2.6 10400000
copyFromRawBuffer.\ 1.8 1.7 1.8 1.7 22400000
bufReadNBNonEmpty.avail 0.2 0.1 0.2 0.1 1600000
flushCharReadBuffer 0.3 2.1 0.3 2.1 26400528
I have to assume this is on purpose... but I have no idea what that purpose might be. Even worse: I'm just barely clever enough to get this profile, but not quite clever enough to figure out exactly what's being allocated.
Any help along those lines would be appreciated.
UPDATE: I've done some more profiling with two drastically simplified testcases. The first testcase directly uses the read/write ops from System.Posix.Internals:
echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
threadWaitRead $ Fd 0
len <- c_read 0 buf 1
c_write 1 buf (fromIntegral len)
yield
As you'd hope, this allocates no memory on the heap each time through the loop. The second testcase uses the read/write ops from GHC.IO.FD:
echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
len <- readRawBufferPtr "read" stdin buf 0 1
writeRawBufferPtr "write" stdout buf 0 (fromIntegral len)
UPDATE #2: I was advised to file this as a bug in GHC Trac... I'm still not sure it actually is a bug (as opposed to intentional behavior, a known limitation, or whatever) but here it is: https://ghc.haskell.org/trac/ghc/ticket/9696
I'll try to guess based on the code
Runtime tries to optimize small reads and writes, so it maintains internal buffer. If your buffer is 1 byte long, it will be inefficient to use it dirrectly. So internal buffer is used to read bigger chunk of data. It is probably ~32Kb long. Plus something similar for writing. Plus your own buffer.
The code has an optimization -- if you provide buffer bigger then the internal one, and the later is empty, it will use your buffer dirrectly. But the internal buffer is already allocated, so it will not less memory usage. I don't know how to dissable internal buffer, but you can open feature request if it is important for you.
(I realize that my guess can be totally wrong.)
ADD:
This one does seem to allocate, but I still don't know why.
What is your concern, max memory usage or number of allocated bytes?
c_read is a C function, it doesn't allocate on haskell's heap (but may allocate on C heap.)
readRawBufferPtr is Haskell function, and it is usual for haskell functions to allocate a lot of memory, that quickly becomes a garbage. Simply because of immutability. It is common for haskell program to allocate e.g 100Gb while memory usage is under 1Mb.
It seems like the conclusion is: it's a bug.

fast conversion from string time to milliseconds

For a vector or list of times, I'd like to go from a string time, e.g. 12:34:56.789 to milliseconds from midnight, which would be equal to 45296789.
This is what I do now:
toms = function(time) {
sapply(strsplit(time, ':', fixed = T),
function(x) sum(as.numeric(x)*c(3600000,60000,1000)))
}
and would like to do it faster.
Here's an example data set for benchmarking:
times = rep('12:34:56.789', 1e6)
system.time(toms(times))
# user system elapsed
# 9.00 0.04 9.05
You could use the fasttime package, which seems to be about an order of magnitude faster.
library(fasttime)
fasttoms <- function(time) {
1000*unclass(fastPOSIXct(paste("1970-01-01",time)))
}
times <- rep('12:34:56.789', 1e6)
system.time(toms(times))
# user system elapsed
# 6.61 0.03 6.68
system.time(fasttoms(times))
# user system elapsed
# 0.53 0.00 0.53
identical(fasttoms(times),toms(times))
# [1] TRUE

Resources