What does this DTrace script output mean? - node.js

I am tracing DTrace probes in my restify.js application (restify is an HTTP server framework for node.js that provides DTrace support). I am using the sample DTrace script from the restify documentation:
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* remember when each route started, keyed by request id */
restify*:::route-start
{
    track[arg2] = timestamp;
}

restify*:::handler-start
/track[arg3]/
{
    h[arg3, copyinstr(arg2)] = timestamp;
}

restify*:::handler-done
/track[arg3] && h[arg3, copyinstr(arg2)]/
{
    @[copyinstr(arg2)] = quantize((timestamp - h[arg3, copyinstr(arg2)]) / 1000000);
    h[arg3, copyinstr(arg2)] = 0;
}

restify*:::route-done
/track[arg2]/
{
    @[copyinstr(arg1)] = quantize((timestamp - track[arg2]) / 1000000);
    track[arg2] = 0;
}
And the output is:
use_restifyRequestLogger
value ------------- Distribution ------------- count
-1 | 0
0 |######################################## 2
1 | 0
use_validate
value ------------- Distribution ------------- count
-1 | 0
0 |######################################## 2
1 | 0
pre
value ------------- Distribution ------------- count
0 | 0
1 |#################### 1
2 |#################### 1
4 | 0
handler
value ------------- Distribution ------------- count
128 | 0
256 |######################################## 2
512 | 0
route_user_read
value ------------- Distribution ------------- count
128 | 0
256 |######################################## 2
512 | 0
I was wondering about the value field - what does it mean?
Why is it 128/256/512, for example? I guess it represents the time/duration, but it is in a strange format - is it possible to show milliseconds, for example?

The output is a histogram. You are getting a histogram because you are using the quantize function in your D script. The DTrace documentation says the following on quantize:
A power-of-two frequency distribution of the values of the specified expressions. Increments the value in the highest power-of-two bucket that is less than the specified expression.
The 'value' column is the result of (timestamp - track[arg2]) / 1000000, where timestamp is the current time in nanoseconds. So the value shown is a duration in milliseconds.
Putting this all together, the route_user_read result graph is telling you that you had 2 requests that fell into the 256 bucket, i.e. they took somewhere between 256 and 512 milliseconds.
This output is useful when you have a lot of requests and want to get a general sense of how your server is performing (you can quickly spot a bimodal distribution, for example). If you just want to see how long each individual request takes, try using the printf function instead of quantize.
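For example, a minimal, untested sketch of that last suggestion would replace the aggregation in the route-done clause with a printf call, so each request is reported on its own line:

restify*:::route-done
/track[arg2]/
{
    /* one line per request instead of a histogram */
    printf("%s took %d ms\n", copyinstr(arg1), (timestamp - track[arg2]) / 1000000);
    track[arg2] = 0;
}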

Related

Why does doubling the CPU limit lead to only a 20% time cost improvement?

I use Python 3 to do some encrypted calculation with Microsoft SEAL and am looking for some performance improvement.
I do it like this (a rough sketch of the setup is shown right after this list):
create a shared memory block to hold the plaintext data
(Use numpy array in shared memory for multiprocessing)
start multiple processes with multiprocessing.Process (there is a parameter controlling the number of processes, thus limiting the CPU usage)
the processes read from shared memory and do some encrypted calculation
wait for the calculations to end and join the processes
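Here is a hypothetical, simplified sketch of that structure (the names such as worker and the plain NumPy placeholder math are made up - the real code uses SEAL objects - and it assumes Python 3.8+ for multiprocessing.shared_memory):

import numpy as np
from multiprocessing import Process, Queue, shared_memory

def worker(shm_name, shape, dtype, result_queue):
    # attach to the shared plaintext array created by the parent process
    shm = shared_memory.SharedMemory(name=shm_name)
    plaintext = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # ... the encrypted SEAL calculation would happen here ...
    result_queue.put(float(plaintext.sum()))  # placeholder for the real result
    shm.close()

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)

    # copy the plaintext data into shared memory once
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data

    n_procs = 4  # the parameter that limits CPU usage
    results = Queue()
    procs = [Process(target=worker, args=(shm.name, data.shape, data.dtype, results))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    print([results.get() for _ in procs])  # drain results before joining
    for p in procs:
        p.join()

    shm.close()
    shm.unlink()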
I run this program on a 32-vCPU / 64 GB x86 Linux server; the CPU model is Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz.
I notice that if I double the number of processes there is only about a 20% time cost improvement.
I've tried three different process counts:
| process nums | 7 | 13 | 27 |
| time ratio | 0.8 | 1 | 1.2 |
Why is this improvement disproportionate to the resources I use (CPU & memory)?
Conceptual knowledge or specific Linux command lines are both welcome.
Thanks.
FYI:
My sub-process code looks like this:
def sub_process_main(encrypted_bytes, plaintext_array, result_queue):
    # init
    # status_sign
    while shared_int > 0:
        # seal load and some other calculation
        encrypted_matrix_list = seal.Ciphertext.load(encrypted_bytes)
        shared_plaintext_matrix = seal.Encoder.encode(plaintext_array)
        # ... do something
        for some loop:
            time1 = time.time()
            res = []
            for i in range(len(encrypted_matrix_list)):
                enc = seal.evaluator.multiply_plain(encrypted_matrix_list[i], shared_plaintext_matrix[i])
                res.append(enc)
            time2 = time.time()
            print(f'time usage: {time2 - time1}')
            # ... do something
    result_queue.put(final_result)
I actually print the time for every part of my code, and here is the time cost (in seconds) for this part of the code:
| process nums | 13 | 27 |
| occurrence | 1791 | 864 |
| total time (s) | 1698.2140 | 1162.8330 |
| average (s) | 0.9482 | 1.3459 |
I've monitored some metrics with top, pidstat and vmstat for both the 13-core and the 27-core runs (screenshots omitted), but I don't know if any of them are abnormal. One thing I noticed in top for the 27-core run: why is it using all cores rather than exactly 27 cores? Does it have anything to do with Hyper-Threading?

Why doesn't an infinite loop result in an error in model checking with Promela and Spin?

If I write the following code in Promela and run it in Spin in verifier mode, it ends with 0 errors. It does report that toggle and init had unreached states, but those seem to be only warnings.
byte x = 0; byte y = 0;

active proctype toggle() {
    do
    :: x == 1 -> x = 0
    :: x == 0 -> x = 1
    od
}

init {
    (y == 1);
}
I was confused by this because I thought this would give me an 'invalid end state' error. If I replace the body of the toggle proctype with a simple skip statement, it does error out as I expected.
Why is this? Is there a way to force the verifier to report the infinite loop as an error?
Regarding the 'unreached in proctype' messages, adding an end label to the do loop doesn't seem to do anything.
I am running spin 6.5.0 and ran the following commands:
spin.exe -a test.pml
gcc -o pan pan.c
pan.exe
These are the outputs, for reference.
With do loop:
pan.exe
(Spin Version 6.5.0 -- 1 July 2019)
+ Partial Order Reduction
Full statespace search for:
never claim - (none specified)
assertion violations +
acceptance cycles - (not selected)
invalid end states +
State-vector 20 byte, depth reached 3, errors: 0
4 states, stored
1 states, matched
5 transitions (= stored+matched)
0 atomic steps
hash conflicts: 0 (resolved)
Stats on memory usage (in Megabytes):
0.000 equivalent memory usage for states (stored*(State-vector + overhead))
0.292 actual memory usage for states
64.000 memory used for hash table (-w24)
0.343 memory used for DFS stack (-m10000)
64.539 total actual memory usage
unreached in proctype toggle
..\test2.pml:7, state 8, "-end-"
(1 of 8 states)
unreached in init
..\test2.pml:10, state 2, "-end-"
(1 of 2 states)
pan: elapsed time 0.013 seconds
pan: rate 307.69231 states/second
With skip:
pan.exe
pan:1: invalid end state (at depth 0)
pan: wrote ..\test2.pml.trail
(Spin Version 6.5.0 -- 1 July 2019)
Warning: Search not completed
+ Partial Order Reduction
Full statespace search for:
never claim - (none specified)
assertion violations +
acceptance cycles - (not selected)
invalid end states +
State-vector 20 byte, depth reached 1, errors: 1
2 states, stored
0 states, matched
2 transitions (= stored+matched)
0 atomic steps
hash conflicts: 0 (resolved)
Stats on memory usage (in Megabytes):
0.000 equivalent memory usage for states (stored*(State-vector + overhead))
0.293 actual memory usage for states
64.000 memory used for hash table (-w24)
0.343 memory used for DFS stack (-m10000)
64.539 total actual memory usage
pan: elapsed time 0.015 seconds
pan: rate 133.33333 states/second
In this example
byte x = 0; byte y = 0;

active proctype toggle() {
    do
    :: x == 1 -> x = 0
    :: x == 0 -> x = 1
    od
}

init {
    (y == 1);
}
the init process is blocked forever (because y == 1 is always false), but the toggle process can always execute something. Therefore, there is no invalid end state error.
Instead, in this example
byte x = 0; byte y = 0;

active proctype toggle() {
    skip;
}

init {
    (y == 1);
}
the init process is still blocked forever, but the toggle process can execute its only instruction skip; and then terminate. At this point, none of the remaining processes (i.e. only init) has any instruction it can execute, so Spin terminates with an invalid end state error.
~$ spin -a -search test.pml
pan:1: invalid end state (at depth 0)
pan: wrote test.pml.trail
(Spin Version 6.5.0 -- 17 July 2019)
...
State-vector 20 byte, depth reached 1, errors: 1
...
Is there a way to force the verifier to report the infinite loop as an error?
Yes. There are actually multiple ways.
The simplest approach is to use option -l of Spin:
~$ spin --help
...
-l: search for non-progress cycles
...
With this option, Spin reports any infinite loop that does not pass through a state marked with a progress label.
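For instance (a sketch, not part of your original model's intent), if going around the toggle loop counted as useful work, you could mark the loop head with a progress label and -l would no longer flag that cycle:

byte x = 0; byte y = 0;

active proctype toggle() {
progress:    /* every pass around the loop revisits this labeled state */
    do
    :: x == 1 -> x = 0
    :: x == 0 -> x = 1
    od
}

init {
    (y == 1);
}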
This is the output on your original problem:
~$ spin -search -l test.pml
pan:1: non-progress cycle (at depth 2)
pan: wrote test.pml.trail
(Spin Version 6.5.0 -- 17 July 2019)
...
State-vector 28 byte, depth reached 9, errors: 1
...
~$ spin -t test.pml
spin: couldn't find claim 2 (ignored)
<<<<<START OF CYCLE>>>>>
spin: trail ends after 10 steps
#processes: 2
x = 0
y = 0
10: proc 1 (:init::1) test.pml:10 (state 1)
10: proc 0 (toggle:1) test.pml:5 (state 4)
2 processes created
An alternative approach is to use LTL model checking. For instance, you may state that at some point the number of processes (see _nr_pr) that are in execution becomes equal to 0 (or more, if you admit some infinite loops), or check that a particular process terminates correctly using remote references.
Both cases are contained in the following example:
byte x = 0; byte y = 0;

active proctype toggle() {
    do
    :: x == 1 -> x = 0
    :: x == 0 -> x = 1
    od;
end:
}

init {
    (y == 1);
}
// sooner or later, the process toggle
// with _pid == 0 will reach the end
// state
ltl p1 { <> toggle[0]@end };
// sooner or later, the number of processes
// that are currently running becomes 0,
// (hence, there can be no infinite loops)
ltl p2 { <> (_nr_pr == 0) };
Both the first
~$ spin -a -search -ltl p1 test.pml
~$ spin -t test.pml
ltl p1: <> ((toggle[0]@end))
ltl p2: <> ((_nr_pr==0))
<<<<<START OF CYCLE>>>>>
Never claim moves to line 4 [(!((toggle[0]._p==end)))]
spin: trail ends after 8 steps
#processes: 2
x = 0
y = 0
end = 0
8: proc 1 (:init::1) test.pml:10 (state 1)
8: proc 0 (toggle:1) test.pml:3 (state 5)
8: proc - (p1:1) _spin_nvr.tmp:3 (state 3)
2 processes created
and the second
~$ spin -a -search -ltl p2 test.pml
~$ spin -t test.pml
ltl p1: <> ((toggle[0]@end))
ltl p2: <> ((_nr_pr==0))
<<<<<START OF CYCLE>>>>>
Never claim moves to line 11 [(!((_nr_pr==0)))]
spin: trail ends after 8 steps
#processes: 2
x = 0
y = 0
end = 0
8: proc 1 (:init::1) test.pml:10 (state 1)
8: proc 0 (toggle:1) test.pml:3 (state 5)
8: proc - (p2:1) _spin_nvr.tmp:10 (state 3)
2 processes created
In both cases, the LTL property is found to be false.
Regarding the 'unreached in proctype' messages, adding an end label to the do loop doesn't seem to do anything.
The end label(s) are used to remove the "invalid end state" error that would otherwise be found.
For example, modifying your previous example as follows:
byte x = 0; byte y = 0;

active proctype toggle() {
    skip;
}

init {
end:
    (y == 1);
}
makes the error go away:
~$ spin -a -search test.pml
(Spin Version 6.5.0 -- 17 July 2019)
...
State-vector 20 byte, depth reached 1, errors: 0
...
One should only ever use an end label when one is willing to guarantee that a process being stuck with no executable instruction is not a symptom of an undesired deadlock situation.

Golang : fatal error: runtime: out of memory

I am trying to use this package from GitHub for string matching. My dictionary is 4 MB. When creating the Trie, I got fatal error: runtime: out of memory. I am using Ubuntu 14.04 with 8 GB of RAM and Go version 1.4.2.
It seems the error comes from line 99 (in the current version) here: m.trie = make([]node, max)
The program stops at this line.
This is the error:
fatal error: runtime: out of memory
runtime stack:
runtime.SysMap(0xc209cd0000, 0x3b1bc0000, 0x570a00, 0x5783f8)
/usr/local/go/src/runtime/mem_linux.c:149 +0x98
runtime.MHeap_SysAlloc(0x57dae0, 0x3b1bc0000, 0x4296f2)
/usr/local/go/src/runtime/malloc.c:284 +0x124
runtime.MHeap_Alloc(0x57dae0, 0x1d8dda, 0x10100000000, 0x8)
/usr/local/go/src/runtime/mheap.c:240 +0x66
goroutine 1 [running]:
runtime.switchtoM()
/usr/local/go/src/runtime/asm_amd64.s:198 fp=0xc208518a60 sp=0xc208518a58
runtime.mallocgc(0x3b1bb25f0, 0x4d7fc0, 0x0, 0xc20803c0d0)
/usr/local/go/src/runtime/malloc.go:199 +0x9f3 fp=0xc208518b10 sp=0xc208518a60
runtime.newarray(0x4d7fc0, 0x3a164e, 0x1)
/usr/local/go/src/runtime/malloc.go:365 +0xc1 fp=0xc208518b48 sp=0xc208518b10
runtime.makeslice(0x4a52a0, 0x3a164e, 0x3a164e, 0x0, 0x0, 0x0)
/usr/local/go/src/runtime/slice.go:32 +0x15c fp=0xc208518b90 sp=0xc208518b48
github.com/mf/ahocorasick.(*Matcher).buildTrie(0xc2083c7e60, 0xc209860000, 0x26afb, 0x2f555)
/home/go/ahocorasick/ahocorasick.go:104 +0x28b fp=0xc208518d90 sp=0xc208518b90
github.com/mf/ahocorasick.NewStringMatcher(0xc208bd0000, 0x26afb, 0x2d600, 0x8)
/home/go/ahocorasick/ahocorasick.go:222 +0x34b fp=0xc208518ec0 sp=0xc208518d90
main.main()
/home/go/seme/substrings.go:66 +0x257 fp=0xc208518f98 sp=0xc208518ec0
runtime.main()
/usr/local/go/src/runtime/proc.go:63 +0xf3 fp=0xc208518fe0 sp=0xc208518f98
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208518fe8 sp=0xc208518fe0
exit status 2
This is the content of the main function (taken from the same repo: test file)
var dictionary = InitDictionary()
var bytes = []byte("Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney) says they will pay this on the usual credit terms (30 days).")
var precomputed = ahocorasick.NewStringMatcher(dictionary) // line 66 here
fmt.Println(precomputed.Match(bytes))
Your structure is awfully inefficient in terms of memory; let's look at the internals. But before that, a quick reminder of the space required for some Go types (these are the 32-bit sizes):
bool: 1 byte
int: 4 bytes
uintptr: 4 bytes
[N]type: N*sizeof(type)
[]type: 12 + len(slice)*sizeof(type)
Now, let's have a look at your structure:
type node struct {
    root    bool       // 1 byte
    b       []byte     // 12 + len(slice)*1 bytes
    output  bool       // 1 byte
    index   int        // 4 bytes
    counter int        // 4 bytes
    child   [256]*node // 256*4 = 1024 bytes
    fails   [256]*node // 256*4 = 1024 bytes
    suffix  *node      // 4 bytes
    fail    *node      // 4 bytes
}
OK, you should already have a guess of what happens here: each node weighs more than 2 KB, which is huge! Finally, let's look at the code that you use to initialize your trie:
func (m *Matcher) buildTrie(dictionary [][]byte) {
    max := 1
    for _, blice := range dictionary {
        max += len(blice)
    }
    m.trie = make([]node, max)
    // ...
}
You said your dictionary is 4 MB. If it is 4 MB in total, then at the end of the for loop max = 4M. If instead it holds 4M different words, then max = 4M * avg(word_length).
We'll take the first scenario, the nicest one. You are initializing a slice of 4M nodes, each of which uses about 2 KB. Yup, that makes a nice 8 GB necessary.
You should review how you build your trie. From the Wikipedia page on the Aho-Corasick algorithm, each node contains one character, so there are at most 256 children of the root, not 4M nodes.
Some material to make it right: https://web.archive.org/web/20160315124629/http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf
The node type has a memory size of 2084 bytes.
I wrote a little program to demonstrate the memory usage: https://play.golang.org/p/szm7AirsDB
As you can see, the three strings (11(+1) bytes in size) dictionary := []string{"fizz", "buzz", "123"} require about 24 KB of memory (12 nodes * 2084 bytes).
If your dictionary has a length of 4 MB, you would need about 4,000,000 * 2084 bytes ≈ 8 GB of memory.
So you should try to decrease the size of your dictionary.
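If you want to double-check the per-node footprint on your own machine, a small sketch like the following (assuming the same field layout as shown above) prints it directly; note that on a 64-bit build pointers and ints are 8 bytes each, so the result will be roughly double the 32-bit figure of 2084 bytes:

package main

import (
    "fmt"
    "unsafe"
)

// same field layout as the matcher's node type shown above
type node struct {
    root    bool
    b       []byte
    output  bool
    index   int
    counter int
    child   [256]*node
    fails   [256]*node
    suffix  *node
    fail    *node
}

func main() {
    // per-node footprint for the platform this is compiled on
    fmt.Println(unsafe.Sizeof(node{}), "bytes per node")
}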
Setting the resource limit to unlimited worked for me:
if ulimit -a reports 0, run ulimit -c unlimited.
Maybe set a real size limit to be more secure.

Is the bash variable $RANDOM supposed to have a uniform distribution?

I understand that the bash variable $RANDOM generates a random integer within a range, but are these numbers supposed to follow (or approximate) a uniform discrete distribution?
I just printed $RANDOM a million times, turned it into a histogram, and viewed it with gnumeric, and the graph shows a very Normal distribution!
for n in `seq 1 1000000`; do echo $RANDOM ; done > random.txt
gawk '{b=int($1/100);a[b]++};END{for (n in a) {print n","a[n]}}' random.txt > hist.csv
gnumeric hist.csv
So, if you want an approximately linear distribution, use $(( $RANDOM % $MAXIMUM )) and don't use it with $MAXIMUM larger than 16383, or 8192 to be safe. You could concatenate $RANDOM % 1000 several times if you want really large numbers, as long as you take care of leading zeros.
If you do want a normal distribution, use $(( $RANGE * $RANDOM / 32767 + $MINIMUM)), and remember this is only integer math.
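As a sketch of that "concatenate several draws" idea, doing it with arithmetic instead of string concatenation sidesteps the leading-zero problem entirely:

# build a 0..999999 value from two $RANDOM draws; note that RANDOM % 1000
# carries a small modulo bias because 32768 is not an exact multiple of 1000
hi=$(( RANDOM % 1000 ))
lo=$(( RANDOM % 1000 ))
big=$(( hi * 1000 + lo ))
echo "$big"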
The Bash documentation doesn't actually say so:
RANDOM
Each time this parameter is referenced, a random integer between 0 and 32767 is generated.
Assigning a value to this variable seeds the random number generator.
Reading that, I would certainly assume that it's intended to be linear; it wouldn't make much sense IMHO for it to be anything else.
And looking at the bash source code, the implementation of $RANDOM is indeed intended to produce a linear distribution (this is from variable.c in the bash 4.2 source):
/* The random number seed.  You can change this by setting RANDOM. */
static unsigned long rseed = 1;
static int last_random_value;
static int seeded_subshell = 0;

/* A linear congruential random number generator based on the example
   one in the ANSI C standard.  This one isn't very good, but a more
   complicated one is overkill. */

/* Returns a pseudo-random number between 0 and 32767. */
static int
brand ()
{
  /* From "Random number generators: good ones are hard to find",
     Park and Miller, Communications of the ACM, vol. 31, no. 10,
     October 1988, p. 1195. filtered through FreeBSD */
  long h, l;

  /* Can't seed with 0. */
  if (rseed == 0)
    rseed = 123459876;

  h = rseed / 127773;
  l = rseed % 127773;
  rseed = 16807 * l - 2836 * h;

#if 0
  if (rseed < 0)
    rseed += 0x7fffffff;
#endif

  return ((unsigned int)(rseed & 32767));	/* was % 32768 */
}
As the comments imply, if you want good random numbers, use something else.
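For instance (just one possibility among many), these produce better-quality random numbers from the shell:

# 2 bytes from the kernel's CSPRNG, printed as an unsigned 16-bit integer (0..65535)
od -An -N2 -tu2 /dev/urandom

# or, with GNU coreutils, a uniform pick from an arbitrary range
shuf -i 0-32767 -n 1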

Is there a way to access DHR on the Apple 2 from Applesoft Basic?

When using Applesoft Basic on the Apple 2 with an 80 column card, is there a way to create DHR graphics using only POKE?
I have found a number of solutions using third party extensions such as Beagle Graphics, but I really want to implement it myself. I've searched my Nibble magazine collection, and basic books, but have been unable to find any detailed information.
Wikipedia:
Double High-Resolution: The composition of the Double Hi-Res screen is very complicated. In addition to the 64:1 interleaving, the pixels in the individual rows are stored in an unusual way: each pixel was half its usual width and each byte of pixels alternated between the first and second bank of 64KB memory. Where three consecutive on pixels were white, six were now required in double high-resolution. Effectively, all pixel patterns used to make color in Lo-Res graphics blocks could be reproduced in Double Hi-Res graphics.
The ProDOS implementation of its RAM disk made access to the Double Hi-Res screen easier by making the first 8 KB file saved to /RAM store its data at 0x012000 to 0x013fff by design. Also, a second page was possible, and a second file (or a larger first file) would store its data at 0x014000 to 0x015fff. However, access via the ProDOS file system was slow and not well suited to page-flipping animation in Double Hi-Res, beyond the memory requirements.
Wikipedia says that DHR uses 64:1 interleaving, but gives no reference to the implementation. Additionally, Wikipedia says you can use the /RAM disk to access it, but again gives no reference to the implementation.
I am working on a small program that plots a simple version of Connet's Circle Pattern. Speed isn't really as important as resolution.
A member of the comp.sys.apple2.programmer answered my question at: http://groups.google.com/group/comp.sys.apple2.programmer/browse_thread/thread/b0e8ec8911b8723b/78cd953bca521d8f
Basically, you map in the auxiliary memory from the 80-column card, then plot on the HR screen and poke the DHR memory location for the pixel you are trying to light/darken.
The best full example routine is:
5 HGR : POKE 49237,0 : CALL 62450 : REM clear hires then hires.aux
6 POKE 49246,0 : PG = 49236
7 SVN = 7 : HCOLOR= SVN : P5 = .5
9 GOTO 100
10 X2 = X * 4 : CL = CO : TMP = 8 : FOR I = 3 TO 0 STEP -1 : BIT = CL >= TMP: CL = CL - BIT * TMP : TMP = TMP * P5
20 X1 = X + I: HCOLOR= SVN * BIT
30 XX = INT (X1 / SVN): H = XX * P5: POKE PG + (H= INT (H)),0
40 XX = INT (( INT (H) + (( X1 / SVN) - XX)) * SVN + P5)
50 HPLOT XX,Y: POKE PG, 0: NEXT : RETURN
100 FOR CO = 0 TO 15 : C8 = CO * 8
110 FOR X = C8 TO C8 + SVN: FOR Y = 0 TO 10 : GOSUB 10 : NEXT : NEXT
120 NEXT
130 REM color is 0 to 15
140 REM X coordinate is from 0 to 139
150 REM Y coordinate is from 0 to 191
