I've migrated a fairly large ANTLR2 grammar to ANTLR4, and have reached the point where the output of both grammars is almost identical, apart from a few edge cases. However, some files take an extremely long time to parse (even with SLL prediction mode and the BailErrorStrategy), so I'm wondering how to find which rules should be fixed first.
I've already gathered some statistics with Parser#setProfile(), but I don't know how to interpret the results in each DecisionInfo object. Is there any good documentation on how to start optimizing a large ANTLR4 grammar, and on which rabbit to chase first?
As I didn't know what to look for in the DecisionInfo objects, here's what I found; it helped me improve the parse time by at least an order of magnitude.
First, I enabled profiling on the grammar with org.antlr.v4.runtime.Parser.setProfile(boolean profile), then executed the parser with org.antlr.v4.runtime.Parser.getInterpreter().setPredictionMode(PredictionMode.SLL) on thousands of files, and browsed through the rules with the highest prediction time:
// Report the decisions where more than 100 ms was spent in prediction, slowest first
Arrays.stream(parser.getParseInfo().getDecisionInfo())
    .filter(decision -> decision.timeInPrediction > 100000000)
    .sorted((d1, d2) -> Long.compare(d2.timeInPrediction, d1.timeInPrediction))
    .forEach(decision -> System.out.println(String.format(
        "Time: %d in %d calls - LL_Lookaheads: %d Max k: %d Ambiguities: %d Errors: %d Rule: %s",
        decision.timeInPrediction / 1000000, // nanoseconds to milliseconds
        decision.invocations,
        decision.SLL_TotalLook,
        decision.SLL_MaxLook,
        decision.ambiguities.size(),
        decision.errors.size(),
        Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));
and then on the highest max lookahead, using the same lambda except for:
.filter(decision -> decision.SLL_MaxLook > 50)
.sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
This surfaced four rules where most of the time was spent, and in this case knowing where to look was enough to see what had to be changed.
I created this simple algorithm to study its complexity: Algorithm 1
I worked out its complexity, O(n^3), and timed it with the time command in the Linux shell: Timing execution Algorithm 1
So I decided to increase the complexity to roughly O(n^5): Algorithm 2
But when I use the time command, the measured times don't increase the way I expected: Timing execution Algorithm 2
I think the time is not increasing because of the values of n: when n is high, the variable a inside the f function can't hold such a large number.
Using a debugger and n = 1000000000, I got a garbage result (-1486618624), because the maximum value an int can hold in C++ is 2147483647. The computation overflows, and that may be why you're not seeing any changes. Try a wider type such as long long, which will serve you better in this case. (I wanted to post this as a comment, but I don't have enough reputation.)
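The original algorithm isn't shown here, but the wraparound itself is easy to reproduce. Below is a minimal sketch in Go rather than C++, since Go's int32 has the same 2147483647 ceiling as a typical 32-bit C++ int and, in practice, wraps the same way:

// Minimal sketch of 32-bit signed overflow: summing past 2147483647 wraps.
package main

import "fmt"

func main() {
	n := int32(1000000000)

	var a int32
	for i := 0; i < 3; i++ {
		a += n // the third addition exceeds 2147483647 and wraps
	}
	fmt.Println(a) // -1294967296, not 3000000000

	var b int64 // the analogue of C++ long long
	for i := 0; i < 3; i++ {
		b += int64(n)
	}
	fmt.Println(b) // 3000000000, as expected
}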
I am trying to get node CFS scheduler throttling as a percentage. For that, I read two values twice (ignoring timeslices) from /proc/schedstat, which has the following format:
$ cat /proc/schedstat
version 15
timestamp 4297299139
cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857
The two values I read are the 7th and 8th numbers on the cpu line, labelled CpuTime and RunqTime here (the 9th is the timeslice count I ignore).
So I read the file, sleep for some time, read it again, compute the elapsed time and the delta between the values, and then calculate the percentage with the following code:
cpuTime := float64(delta.CpuTime) / delta.TimeDelta / 10000000 // ns per second -> percent
runqTime := float64(delta.RunqTime) / delta.TimeDelta / 10000000
percent := runqTime
The problem is that percent can come out at something like 2000%.
I assumed that RunqTime is a cumulative counter expressed in nanoseconds, so I divided by 10^7 to turn a nanoseconds-per-second ratio into a 0-100% range, with TimeDelta being the difference between measurements in seconds. What is wrong with this? How do I do it properly?
I, for one, do not know how to interpret the output of /proc/schedstat.
You do quote an answer to a unix.stackexchange question, with a link to a mail in LKML that mentions a possible patch to the documentation.
However, "schedstat" is a term which is suspiciously missing from my local man proc page, and from the copies of man proc I could find on the internet. Actually, when searching for schedstat on Google, the results I get either do not mention the word "schedstat" (for example : I get links to copies of the man page, which mentions "sched" and "stat"), or non authoritative comments (fun fact : some of them quote that answer on stackexchange as a reference ...)
So at the moment: if I really had to understand what's in that output, I think I would read the scheduler code for my version of the kernel.
As far as "how do you compute delta ?", I understand what you intend to do, I had in mind something more like "what code have you written to do it ?".
By running cat /proc/schedstat; sleep 1 in a loop on my machine, I see that the "timestamp" entry is incremented by ~250 units on each iteration (so I honestly can't say what the underlying unit of that field is ...).
To compute delta.TimeDelta: do you use that field, or do you take two readings of time.Now()?
The other deltas are less ambiguous; I imagine you took the difference between the counters you see :)
Do note that, on my mainly idle machine, I sometimes see increments higher than 10^9 over a second on these counters. So again: I do not know how to interpret these numbers.
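For what it's worth, here is a minimal sketch of the sampling I would start from, pairing each read of /proc/schedstat with time.Now(). It assumes the 7th and 8th numbers on the cpu0 line are cumulative running time and runqueue-wait time in nanoseconds, as the kernel's sched-stats documentation describes; treat that as an assumption, not gospel. Also note that if the wait counter sums over all tasks, several tasks waiting at once could legitimately push the result past 100%.

// Sketch: sample /proc/schedstat twice, compute deltas against wall time.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

type sample struct {
	cpuTime  uint64 // cumulative ns spent running (7th field, assumed)
	runqTime uint64 // cumulative ns spent waiting to run (8th field, assumed)
	at       time.Time
}

func read() (sample, error) {
	data, err := os.ReadFile("/proc/schedstat")
	if err != nil {
		return sample{}, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 10 && fields[0] == "cpu0" {
			cpu, _ := strconv.ParseUint(fields[7], 10, 64)
			runq, _ := strconv.ParseUint(fields[8], 10, 64)
			return sample{cpuTime: cpu, runqTime: runq, at: time.Now()}, nil
		}
	}
	return sample{}, fmt.Errorf("cpu0 line not found")
}

func main() {
	s1, err1 := read()
	time.Sleep(1 * time.Second)
	s2, err2 := read()
	if err1 != nil || err2 != nil {
		panic("could not read /proc/schedstat")
	}
	elapsed := s2.at.Sub(s1.at).Seconds()
	// runqueue-wait nanoseconds per elapsed second, expressed as a percentage
	percent := float64(s2.runqTime-s1.runqTime) / (elapsed * 1e9) * 100
	fmt.Printf("runqueue wait: %.2f%%\n", percent)
}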
I have a list of SECONDS.MICROSECONDS CLOCK_MONOTONIC timestamps like those below:
5795.944152
5795.952708
5795.952708
5795.960820
5795.960820
5795.969092
5795.969092
5795.977502
5795.977502
5795.986061
5795.986061
5795.994075
5795.994075
5796.002382
5796.002382
5796.010860
5796.010860
5796.019241
5796.019241
5796.027452
5796.027452
5796.035709
5796.035709
5796.044158
5796.044158
5796.052453
5796.052453
5796.060785
5796.060785
5796.069053
Each one represents a particular action to be performed.
What I need to do, preferably in Python (though the programming language doesn't really matter), is to speed up the actions: something like applying a 2X, 3X, etc. speed-up to this list, so the values have to decrease in a way that matches the chosen ?X factor.
I thought of dividing each timestamp by the desired speed factor, but it doesn't work that way: dividing also rescales the starting point t0, shifting the whole timeline instead of only compressing the gaps between actions.
As suggested by @RobertDodier, I managed to find a quick and simple solution: keep the first timestamp t0 fixed and divide only each offset from it.
speed = 2
speedtimestamps = [t0 + (t - t0)/speed for t in timestamps]
Just make sure to remove the first line, containing the t0 timestamp itself.
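The same rescaling, sketched in Go with the first few sample values from above, just to make the arithmetic concrete (it assumes the timestamps are already parsed as float64):

// Anchor at the first timestamp and divide only the offsets, so the
// intervals shrink by the speed factor while the timeline still starts at t0.
package main

import "fmt"

func main() {
	timestamps := []float64{5795.944152, 5795.952708, 5795.960820}
	speed := 2.0

	t0 := timestamps[0]
	scaled := make([]float64, len(timestamps))
	for i, t := range timestamps {
		scaled[i] = t0 + (t-t0)/speed
	}
	fmt.Println(scaled)
	// The first gap was ~8.556 ms; after scaling it is ~4.278 ms.
}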
I am interested in the correlation between points at distances of 0 to 2 km on a linear network. I am using the following statement for the empirical data; it completes in 2 minutes:
obs<-linearK(c, r=seq(0,2,by=0.20))
Now I want to check whether randomness can be accepted, so I compute envelopes over the same r range:
acceptance_enve<-envelope(c, linearK, nsim=19, fix.n = TRUE, funargs = list(r=seq(0,2,by=0.20)))
But this shows an estimated time of a little under 3 hours. Is such a large time difference normal? Is my syntax correct for passing the extra r sequence argument through to envelope?
Is there an efficient way to shorten this 3-hour execution time for the envelopes?
I have the road network of a whole city, so it is quite large, and I have checked that there are no disconnected subgraphs.
> c
Point pattern on linear network
96 points
Linear network with 13954 vertices and 19421 lines
Enclosing window: rectangle = [559.653, 575.4999] x [4174.833, 4189.85] Km
Thank you.
EDIT AFTER COMMENT
> system.time({s <- runiflpp(npoints(c), as.linnet(c));
+ linearK(s, r=seq(0,2,by=0.20))})
   user  system elapsed
343.047 104.428 449.650
EDIT 2
I made some really small changes, deleting some peripheral network segments that seem to have little or no effect on the overall network. This also led to some long segments being split into smaller ones. But now, on the same network with a different point pattern, the estimated time is even longer:
> month1envelope=envelope(months[[1]], linearK ,nsim = 39, r=seq(0,2,0.2))
Generating 39 simulations of CSR ...
1, 2, [etd 12:03:43]
The new network is
> months[[1]]
Point pattern on linear network
310 points
Linear network with 13642 vertices and 18392 lines
Enclosing window: rectangle = [560.0924, 575.4999] x [4175.113, 4189.85] Km
System config: macOS 10.9, 2.5 GHz, 16 GB RAM, R 3.3.3, RStudio 1.0.143
You don't need to use funargs in this context. Arguments can be passed directly through the ... argument, so I suggest:
acceptance_enve <- envelope(c, linearK, nsim=19,
fix.n = TRUE, r=seq(0,2,by=0.20))
Please try this to see if it accelerates the execution.
Context: I'm writing something to process log data which involves loading several GB of data into memory and cross checking various things, finding correlations in data and writing the results out to another file. (This is essentially a cooking/denormalization step before loading into a Druid.io cluster.) I want to avoid having to write the information to a database for both performance and code simplicity - it is assumed that in the foreseeable future the volume of data processed at one time can be handled by adding memory to the machine.
My question is whether it is a good idea to explicitly deduplicate strings in my code, and if so, what a good approach would be. Many of the values in these log files are the exact same pieces of text (a rough guess: only about 25% of the text values in a file are unique).
Since we're talking about gigabytes of data, and while RAM is cheap and swap is possible, there is still a limit, and if I'm careless I will very likely hit it. If I do something like this:
strstore := make(map[string]string)

// do some work that involves slicing and dicing some text, resulting in:
var a = "some string that we figured out that has about a 75% chance of being duplicate"
// note that about 10 other such variables are calculated here; only "a" is shown for simplicity

if _, ok := strstore[a]; !ok {
	strstore[a] = a // first sighting: remember this copy as the canonical one
} else {
	a = strstore[a] // duplicate: reuse the canonical copy already in the map
}

// now do some more stuff with "a" and keep it in a struct which is in
// some map some place
It would seem to me that this would have the effect of "reusing" existing strings, at the cost of a hash lookup and compare. Seemingly a good trade-off.
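Since about ten variables would need the same treatment, the lookup-or-store could be wrapped in one helper. A minimal sketch (the type and method names are mine, not an existing API):

// interner maps each distinct string to its first-seen canonical copy.
package main

import "fmt"

type interner map[string]string

// intern returns the canonical copy of s, storing s itself on first sight.
func (in interner) intern(s string) string {
	if canon, ok := in[s]; ok {
		return canon
	}
	in[s] = s
	return s
}

func main() {
	strstore := make(interner)
	a := strstore.intern("some string that is probably a duplicate")
	_ = strstore.intern("some string that is probably a duplicate")
	fmt.Println(a, len(strstore)) // still only one entry in the store
}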
However, this might not help much if the strings that are genuinely new fragment memory, leaving various holes that go unreclaimed.
I could also keep one big byte array/slice holding the character data and index into that, but it would make the code hard to write (especially having to mess around with conversions between []byte and string, which involve their own allocations), and I would probably just be doing a poor job of something that is really the Go runtime's domain anyway.
I'm looking for any advice on approaches to this problem, or on whether anyone's experience with this sort of thing has yielded particularly useful mechanisms to address it.
There are a lot of interesting data structures and algorithms that you could use here. It depends on what you are trying to do in the stats and processing stages.
I am not sure how compressible your logs are, but you could pre-process the data, again depending on your use cases: https://github.com/alecthomas/mph/blob/master/README.md
Take a look at some of these data structures as well for background:
https://github.com/Workiva/go-datastructures