fast conversion from string time to milliseconds

For a vector or list of times, I'd like to go from a string time, e.g. 12:34:56.789 to milliseconds from midnight, which would be equal to 45296789.
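(That works out to 12*3600000 + 34*60000 + 56*1000 + 789 = 45296789.)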
This is what I do now:
toms = function(time) {
  sapply(strsplit(time, ':', fixed = T),
         function(x) sum(as.numeric(x) * c(3600000, 60000, 1000)))
}
and would like to do it faster.
Here's an example data set for benchmarking:
times = rep('12:34:56.789', 1e6)
system.time(toms(times))
# user system elapsed
# 9.00 0.04 9.05

You could use the fasttime package, which seems to be about an order of magnitude faster.
library(fasttime)
fasttoms <- function(time) {
  1000 * unclass(fastPOSIXct(paste("1970-01-01", time)))
}
times <- rep('12:34:56.789', 1e6)
system.time(toms(times))
# user system elapsed
# 6.61 0.03 6.68
system.time(fasttoms(times))
# user system elapsed
# 0.53 0.00 0.53
identical(fasttoms(times),toms(times))
# [1] TRUE
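This works because fastPOSIXct parses the string in C and interprets it as UTC, so pasting the time onto "1970-01-01" (day zero of the Unix epoch) yields seconds since midnight; unclass drops the POSIXct class, and multiplying by 1000 gives plain milliseconds.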

Related

Perf stat HW counters

perf stat ./myapp
and the result looks like this (it's just an example):
Performance counter stats for 'myapp':
83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
3,228,188 page-faults:u # 0.039 M/sec
229,570,665,834 cycles:u # 2.742 GHz
313,163,853,778 instructions:u # 1.36 insn per cycle
69,704,684,856 branches:u # 832.559 M/sec
2,078,861,393 branch-misses:u # 2.98% of all branches
83.409183620 seconds time elapsed
74.684747000 seconds user
8.739217000 seconds sys
perf stat prints user time and system time, and a HW counter is incremented regardless of which application the CPU is executing.
For HW counters like cycles or instructions, does perf count them only for "myapp"?
For instance (cs = context switch), suppose execution alternates between "myapp" and context-switch code, and the instruction counter reads as follows at each boundary:
inst:  0         10        20        50        80        100
       |--myapp--|---cs----|--myapp--|---cs----|--myapp--|  end
That is 60 instructions for "myapp", but the value of the HW counter at the end is 100. Does perf stat then print out 60?

Perf output is less than the number of actual instruction

I tried to count the number of instructions of an add-loop application on a RISC-V FPGA, using a very simple RV32IM core running a Linux 5.4.0 buildroot.
add.c:
#include <stdio.h>

int main()
{
    int a = 0;
    for (int i = 0; i < 1024*1024; i++)
        a++;
    printf("RESULT: %d\n", a);
    return a;
}
I used the -O0 compile option so that the loop is not optimized away, and the resulting dump file is the following:
000103c8 <main>:
103c8: fe010113 addi sp,sp,-32
103cc: 00812e23 sw s0,28(sp)
103d0: 02010413 addi s0,sp,32
103d4: fe042623 sw zero,-20(s0)
103d8: fe042423 sw zero,-24(s0)
103dc: 01c0006f j 103f8 <main+0x30>
103e0: fec42783 lw a5,-20(s0)
103e4: 00178793 addi a5,a5,1 # 12001 <__TMC_END__+0x1>
103e8: fef42623 sw a5,-20(s0)
103ec: fe842783 lw a5,-24(s0)
103f0: 00178793 addi a5,a5,1
103f4: fef42423 sw a5,-24(s0)
103f8: fe842703 lw a4,-24(s0)
103fc: 001007b7 lui a5,0x100
10400: fef740e3 blt a4,a5,103e0 <main+0x18>
10404: fec42783 lw a5,-20(s0)
10408: 00078513 mv a0,a5
1040c: 01c12403 lw s0,28(sp)
10410: 02010113 addi sp,sp,32
10414: 00008067 ret
As you can see, the application loops from 103e0 to 10400, which is 9 instructions, so the total number of instructions must be at least 9 * 1024^2 = 9,437,184.
But the result of perf stat is pretty weird
RESULT: 1048576
Performance counter stats for './add.out':
3170.45 msec task-clock # 0.841 CPUs utilized
20 context-switches # 0.006 K/sec
0 cpu-migrations # 0.000 K/sec
38 page-faults # 0.012 K/sec
156192046 cycles # 0.049 GHz (11.17%)
8482441 instructions # 0.05 insn per cycle (11.12%)
1145775 branches # 0.361 M/sec (11.25%)
3.771031341 seconds time elapsed
0.075933000 seconds user
3.559385000 seconds sys
The total number of instructions perf counted (8,482,441) is lower than 9 * 1024^2; the difference is about 10%.
How is this happening? I would expect perf's output to be larger than that, because the perf tool measures not only add.out itself but also the overhead of perf and of context switching.

process.hrtime returns non-matching seconds and milliseconds

I use process.hrtime() to calculate the time a process takes in sec and millisec as follows:
router.post(
  "/api/result-store/v1/indexing-analyzer/:searchID/:id",
  async (req, res) => {
    var hrstart = process.hrtime();
    // some code which takes time
    var hrend = process.hrtime(hrstart);
    console.info("Execution time (hr): %ds %dms", hrend[0], hrend[1] / 1000000);
  }
);
I followed this post for the code:
https://blog.abelotech.com/posts/measure-execution-time-nodejs-javascript/
So I expected the seconds and the milliseconds to match, but here is what I get:
Execution time (hr): 54s 105.970357ms
This is very strange, since when I convert 54 s to milliseconds I get 54000, so I don't see where this "105.970357ms" comes from. Is there anything wrong with my code? Why do I see this mismatch?
According to the process.hrtime() documentation, it returns an array [seconds, nanoseconds], where nanoseconds is the remaining part of the real time that can't be represented in second precision.
1 second = 10^9 nanoseconds
1 millisecond = 10^6 nanoseconds
In your case the execution took 54 seconds and 105.970357 milliseconds or
54000 milliseconds + 105.970357 milliseconds.
Or if you need it in seconds: (hrend[0]+ hrend[1] / Math.pow(10,9))
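Equivalently, if you want a single figure in milliseconds: hrend[0] * 1000 + hrend[1] / 1e6, which for your measurement is 54000 + 105.970357 ≈ 54105.97 ms.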

What does pcpu signify and why multiply by 1000?

I was reading about calculating the cpu usage of a process.
seconds = utime / Hertz
total_time = utime + stime
IF include_dead_children
total_time = total_time + cutime + cstime
ENDIF
seconds = uptime - starttime / Hertz
pcpu = (total_time * 1000 / Hertz) / seconds
print: "%CPU" pcpu / 10 "." pcpu % 10
What I don't get is this: by 'seconds' the algorithm seems to mean the time the computer spent doing operations other than the process we're interested in, before it started, since uptime is the time our computer has spent being operational and starttime is the time our process started.
Then why are we dividing total_time by seconds [the time the computer spent doing something else] to get pcpu? It doesn't make sense.
The standard meanings of the variables:
# Name Description
14 utime CPU time spent in user code, measured in jiffies
15 stime CPU time spent in kernel code, measured in jiffies
16 cutime CPU time spent in user code, including time from children
17 cstime CPU time spent in kernel code, including time from children
22 starttime Time when the process started, measured in jiffies
/proc/uptime: the uptime of the system (seconds), and the amount of time spent in the idle process (seconds)
Hertz: number of clock ticks per second
Now that you've provided what each of the variables represent, here's some comments on the pseudo-code:
seconds = utime / Hertz
The above line is pointless, as the new value of seconds is never used before it's overwritten a few lines later.
total_time = utime + stime
Total running time (user + system) of the process, in jiffies, since both utime and stime are.
IF include_dead_children
total_time = total_time + cutime + cstime
ENDIF
This should probably just say total_time = cutime + cstime, since the definitions seem to indicate that, e.g. cutime already includes utime, plus the time spent by children in user mode. So, as written, this overstates the value by including the contribution from this process twice. Or, the definition is wrong... Regardless, the total_time is still in jiffies.
seconds = uptime - starttime / Hertz
uptime is already in seconds; starttime / Hertz converts starttime from jiffies to seconds, so seconds becomes essentially "the time in seconds since this process was started".
pcpu = (total_time * 1000 / Hertz) / seconds
total_time is still in jiffies, so total_time / Hertz converts that to seconds, which is the number of CPU seconds consumed by the process. That divided by seconds would give the scaled CPU-usage percentage since process start if it were a floating point operation. Since it isn't, it's scaled by 1000 to give a resolution of 1/10%. The scaling is forced to be done early by the use of parentheses, to preserve accuracy.
print: "%CPU" pcpu / 10 "." pcpu % 10
And this undoes the scaling by finding the quotient and the remainder when dividing pcpu by 10, and printing those values in a format that looks like a floating-point value.
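As a concrete illustration, here is a minimal C sketch of that computation with made-up sample values (the numbers are hypothetical, not read from a real /proc entry):
#include <stdio.h>

int main(void)
{
    /* Hypothetical sample values, as they might come from /proc/[pid]/stat
       (fields 14, 15, 22) and /proc/uptime. */
    long utime = 1500;        /* user-mode CPU time, in jiffies */
    long stime = 500;         /* kernel-mode CPU time, in jiffies */
    long starttime = 24000;   /* process start time, in jiffies after boot */
    long uptime = 300;        /* system uptime, in seconds */
    long hertz = 100;         /* jiffies per second */

    long total_time = utime + stime;            /* CPU time used, still in jiffies */
    long seconds = uptime - starttime / hertz;  /* wall-clock seconds since the process started */

    /* Scale by 1000 before dividing so integer arithmetic keeps 1/10 % resolution. */
    long pcpu = (total_time * 1000 / hertz) / seconds;

    printf("%%CPU %ld.%ld\n", pcpu / 10, pcpu % 10);  /* prints "%CPU 33.3" for these values */
    return 0;
}
With these numbers the process has used 2000 jiffies = 20 CPU seconds over 60 elapsed seconds, so the program prints 33.3, i.e. 33.3% of one CPU.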

Why doesn't perf report cache misses?

According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:
[joel@panda goog]$ perf stat ./hash
Performance counter stats for './hash':
869.447863 task-clock # 0.997 CPUs utilized
92 context-switches # 0.106 K/sec
4 cpu-migrations # 0.005 K/sec
1,041 page-faults # 0.001 M/sec
2,628,646,296 cycles # 3.023 GHz
819,269,992 stalled-cycles-frontend # 31.17% frontend cycles idle
132,355,435 stalled-cycles-backend # 5.04% backend cycles idle
4,515,152,198 instructions # 1.72 insns per cycle
# 0.18 stalled cycles per insn
1,060,739,808 branches # 1220.015 M/sec
2,653,157 branch-misses # 0.25% of all branches
0.871766141 seconds time elapsed
What am I missing? I already searched the man page and the web, but didn't find anything obvious.
Edit: my CPU is an Intel i5 2300K, if that matters.
On my system, an Intel Xeon X5570 @ 2.93 GHz, I was able to get perf stat to report cache references and misses by requesting those events explicitly, like this:
perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5
Performance counter stats for 'sleep 5':
10573 cache-references
1949 cache-misses # 18.434 % of all cache refs
1077328 cycles # 0.000 GHz
715248 instructions # 0.66 insns per cycle
151188 branches
154 faults
0 migrations
5.002776842 seconds time elapsed
The default set of events did not include cache events, which matches your results; I don't know why:
perf stat -B sleep 5
Performance counter stats for 'sleep 5':
0.344308 task-clock # 0.000 CPUs utilized
1 context-switches # 0.003 M/sec
0 CPU-migrations # 0.000 M/sec
154 page-faults # 0.447 M/sec
977183 cycles # 2.838 GHz
586878 stalled-cycles-frontend # 60.06% frontend cycles idle
430497 stalled-cycles-backend # 44.05% backend cycles idle
720815 instructions # 0.74 insns per cycle
# 0.81 stalled cycles per insn
152217 branches # 442.095 M/sec
7646 branch-misses # 5.02% of all branches
5.002763199 seconds time elapsed
In the latest source code, the default event list still does not include cache-misses or cache-references:
struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
So the man page and most of the web are out of date as of now.
I've spent some minutes trying to understand perf. I found the cache misses by first recording and then reporting the data (using both perf tools).
To see a list of events:
perf list
For example, in order to check last-level-cache load misses, you will need to use the event LLC-load-misses, like this:
perf record -e LLC-load-misses ./your_program
Then report the results:
perf report -v
