How do ZeroMQ HEARTBEAT sockopts() settings work?

How do ZeroMQ HEARTBEAT sockopts() settings work? - python-3.x

I'm using python's pyzmq==22.2.1 which should support ZeroMQ 4.2.0 (according to the API)
I'm trying to make use of the heartbeat socket options (ZMQ_HEARTBEAT_IVL, ZMQ_HEARTBEAT_TIMEOUT and ZMQ_HEARTBEAT_TTL). However, when I set these socket options, I am not receiving the expected TimeoutException or any exception on my socket. It just seems to sit there doing nothing.
What is the expected behaviour after setting these socket options ?
On the server side, how does the server detect the client has timeout and missed a heartbeat and vice versa for the client (is there an exception or something that's supposed to be thrown or something ?).
I've setup a simple router-dealer echo example below:
# Server Code:
import zmq
c = zmq.Context()
s = c.socket(zmq.ROUTER)
s.setsockopt(zmq.HEARTBEAT_IVL, 1000)
s.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)
s.setsockopt(zmq.HEARTBEAT_TTL, 5000)
s.bind('tcp://127.0.0.1:5555')
while True:
id, data = s.recv_multipart()
s.send_multipart([id, data], zmq.NOBLOCK)
# Client Code
import zmq
import time
c = zmq.Context()
s = c.socket(zmq.DEALER)
s.HEARTBEAT_IVL = 1000
s.HEARTBEAT_TIMEOUT = 5000
s.connect('tcp://127.0.0.1:5555')
i = 0
while True:
s.send(str(i).encode())
print(s.recv())
i += 1
time.sleep(1)

Q : What is the expected behaviour after setting these socket options ?
A :well,there are two-fold effect of the said settings. One, that actually works for your setup goals ( i.e. going & sending (most probably ZMTP/3.1) ZMTP_PING connection-oriented service-sublayer "ZMTP/3.1-service-packets" and reciprocally, not sure, but most often, adequately formed "ZMTP/{3.1|2.x|1.0}-service-packets" (hopefully delivered) back. These "service-packets" are visible on the wire-line (if present - an inproc://-transport-class and vmci://-transport-class too have no actual wire a typical user can hook-on and sniff-traffic in, but some kind of pointer-acrobatics used for RAM-mapping), so a protocol-analyser will "see" them id decodes like this:
a local-initiator
MAY send:
+------+----+
| %xNN | 24 |
+------+----+
0 1
flags size
+------+---+---+---+---+
| %xNN | P | I | N | G |
+------+---+---+---+---+
2 3 4 5 6
ZMTP/3.1-Command name "PING"
+---+---+
| |
+---+---+
7 8 ping-ttl 2B
MAY be zero
MAY be ttl stored as [0:15], being a 16-bit TTL in 1/100 [s] ~ max 6553 [s]
ttl provides a strong hint
to the other peer to disconnect
if no further traffic is received after that time.
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
ping-context (max 16B)
MAY be zero
MAY be context-specific context-payload, not more than 16B
a remote-peer
SHALL respond:
+------+----+
| %xNN | 22 |
+------+----+
0 1
flags size
+------+---+---+---+---+
| %xNN | P | O | N | G |
+------+---+---+---+---+
2 3 4 5 6
ZMTP/3.1-Command name "PONG"
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
ping-context (echoed as obtained from PING)
ZMTP/3.1 v/s ZMTP/3.0 v/s ZMTP/2.x v/s ZMTP/1.0 differences and details get solved during version-negotiation phase, once the sublayer of connection-setup services performs all low-level needed handshaking (re)negotiations among peers trying to agree on version-, auth- & security- related principles
The second effect of these ( under-the-hood of the Context()-engine instance performed negotiations ) is that you shall never see any direct interaction with ZeroMQ-(abstract)-Message-Transport-Protocol ( ZMTP ) defined service-setup-&-maintenance-processes.
We simply enjoy the (above ZMTP generated) exposed API-calls to setup, configure and harness "our" user-level operated Signalling / Messaging infrastructure meta-plane, that is based on all the know-how "hidden" under-the-hood (and should remain so - sure, unless one decides to roll up sleeves and help develop the ZeroMQ system towards its next generation)
Q : ( is there an exception or something that's supposed to be thrown or something ? )
A :
This is why all above had to have been told first, as the due reasoning, for which there ought be a fair & honest answer no to your second question.

Related

Counting 15's in Cribbage Hand

Background
This is a followup question to my previous finding a straight in a cribbage hand question and Counting Pairs in Cribbage Hand
Objective
Count the number of ways cards can be combined to a total of 15, then score 2 points for each pair. Ace worth 1, and J,Q,K are worth 10.
What I have Tried
So my first poke at a solution required 26 different formulas. Basically I checked each possible way to combine cards to see if the total was 15. 1 way to add 5 cards, 5 ways to add 4 cards, 10 ways to add 3 cards, and 10 ways to add 2 cards. I thought I had this licked until I realized I was only looking at combinations, I had not considered the fact that I had to cap the value of cards 11, 12, and 13 to 10. I initially tried an array formula something along the lines of:
MIN(MOD(B1:F1-1,13)+1,10)
But the problem with this is that MIN takes the minimum value of all results not the individual results compared to 10.
I then tried it with an IF function, which worked, but involved the use of CSE formula even wehen being used with SUMPRODUCT which is something I try to avoid when I can
IF(MOD(B1:F1-1,13)+1<11,MOD(B1:F1-1,13)+1,10)
Then I stumble on an answer to a question in code golf which I modified to lead me to this formula, which I kind of like for some strange reason, but its a bit long in repetitive use:
--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)
My current working formulas are:
5 card check
=(SUMPRODUCT(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15)*2
4 card checks
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,4}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,3,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{2,3,4,5}))=15)*2
3 card checks
same as 4 card checks using all combinations for 3 cards in the {1,2,3}.
There are 10 different combinations, so 10 different formulas.
The 2 card check was based on the solution by Tom in Counting Pairs in Cribbage Hand and all two cards are checked with a single formula. (yes it is CSE)
2 card check
{=SUM(--(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)+TRANSPOSE(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15))}
Question
Can the 3 and 4 card combination sum check be brought into a single formula similar to the 2 card check?
Is there a better way to convert cards 11,12,13 to a value of 10?
Sample Data
| B | C | D | E | F | POINTS
+----+----+----+----+----+
| 1 | 2 | 3 | 17 | 31 | <= 2 (all 5 add to 15)
| 1 | 2 | 3 | 17 | 32 | <= 2 (Last 4 add to 15)
| 11 | 18 | 31 | 44 | 5 | <= 16 ( 4x(J+5), 4X(5+5+5) )
| 6 | 7 | 8 | 9 | 52 | <= 4 (6+9, 7+8)
| 1 | 3 | 7 | 8 | 52 | <= 2 (7+8)
| 2 | 3 | 7 | 9 | 52 | <= 2 (2+3+K)
| 2 | 4 | 6 | 23 | 52 | <= 0 (nothing add to 15)
Excel Version
Excel 2013

For 5:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2
For 4:
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2
For 3
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2
For 2:
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
All together:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
For older versions we need to "trick" INDEX into accepting the arrays as Row and Column References:
We do that by using N(IF({1},[thearray]))
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$5))),N(IF({1},{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}))),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$10))),N(IF({1},{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}))),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
This is a CSE That must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.

Adding zeros to a string without generating a new variable

I am trying to add zeros to a string variable in such a way that all levels of the variables have same number of digits (assume 3).
clear
input tina bina str4 pine
1 10 "99"
1 11 "99"
2 11 "99"
2 11 "99"
3 12 "."
4 12 "888"
5 14 "88"
6 15 "777"
7 16 "77"
8 17 "0"
8 18 "7"
end
I managed to do this by generating a new variable which stores the number of digits I need to add to each observation in order to reach 3:
generate pi=3-strlen(pine)
replace pine= ("0"*pi) + pine if strlen(pine)<3
I wonder if there is a way to obtain the same result but without generating the variable?
I tried the following but it does not work :
replace pine= ("0"*(`=3-strlen(pine)')) + pine if strlen(pine)<3
Probably I am not so clear about what happens when I evaluate expressions.

Your approach does not work because it evaluates the expression for the first observation only:
. display `= 3 - strlen(pine)'
1
The single quotes are not required:
replace pine = ("0" * (3-strlen(pine) ) ) + pine if strlen(pine) < 3
+--------------------+
| tina bina pine |
|--------------------|
1. | 1 10 099 |
2. | 1 11 099 |
3. | 2 11 099 |
4. | 2 11 099 |
5. | 3 12 00. |
|--------------------|
6. | 4 12 888 |
7. | 5 14 088 |
8. | 6 15 777 |
9. | 7 16 077 |
10. | 8 17 000 |
|--------------------|
11. | 8 18 007 |
+--------------------+

I know there is already an accepted answer, but I wanted to throw out my suggestion. This is maybe a little bit simpler than the other answer and is straightforward to explain. You just want to replace a string variable of real numbers with leading zeros and keep it as a string. You can easily do this by running:
replace pine = string(real(pine),"%03.0f")
Depending on your goal this is maybe better than the previous answer, because it maintains your missing value as missing and not add zeros to it. Hopefully this helpful.

Spark: count events based on two columns

I have a table with events which are grouped by a uid. All rows have the columns uid, visit_num and event_num.
visit_num is an arbitrary counter that occasionally increases. event_num is the counter of interactions within the visit.
I want to merge these two counters into a single interaction counter that keeps increasing by 1 for each event and continues to increase when then next visit has started.
As I only look at the relative distance between events, it's fine if I don't start the counter at 1.
|uid |visit_num|event_num|interaction_num|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 2 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 2 | 4 |
| 2 | 1 | 1 | 500 |
| 2 | 2 | 1 | 501 |
| 2 | 2 | 2 | 502 |
I can achieve this by repartitioning the data and using the monotonically_increasing_id like this:
df.repartition("uid")\
.sort("visit_num", "event_num")\
.withColumn("iid", fn.monotonically_increasing_id())
However the documentation states:
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As the id seems to be monotonically increasing by partition this seems fine. However:
I am close to reaching the 1 billion partition/uid threshold.
I don't want to rely on the current implementation not changing.
Is there a way I can start each uid with 1 as the first interaction num?
Edit
After testing this some more, I notice that some of the users don't seem to have consecutive iid values using the approach described above.
Edit 2: Windowing
Unfortunately there are some (rare) cases where more thanone row has the samevisit_numandevent_num`. I've tried using the windowing function as below, but due to this assigning the same rank to two identical columns, this is not really an option.
iid_window = Window.partitionBy("uid").orderBy("visit_num", "event_num")
df_sample_iid=df_sample.withColumn("iid", fn.rank().over(iid_window))

The best solution is the Windowing function with rank, as suggested by Jacek Laskowski.
iid_window = Window.partitionBy("uid").orderBy("visit_num", "event_num")
df_sample_iid=df_sample.withColumn("iid", fn.rank().over(iid_window))
In my specific case some more data cleaning was required but generally, this should work.

RFC 1035 Header Structure

I'm studying about dns and would like to understand about this information, because I could not fully understand.
The header contains the following fields:
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ID |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR| Opcode |AA|TC|RD|RA| Z | RCODE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QDCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ANCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NSCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ARCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
I would to know what mean this numbers on top.

The numbers across the top are simply the bit numbers within the 16 bit word, although as is common with the RFC series of documents they're ordered from most significant bit to least, instead of the (more intuitive) other way around.
So, for example, given an array data of octets containing that header, the ID would be:
(data[0] << 8) | data[1]
and the QR bit would be the most significant bit of data[2]

Oracle 11.2 has a delay of 2 seconds for simple SQL at random times

A simple table join is done usualy in 0.0XX seconds and sometimes in 2.0XX seconds (according to PL/SQL Developer SQL execution). It sill happens when running from SQL Plus.
If I run the SQL 10 times, 8 times it runns fine and 2 times in 2+ seconds.
It's a clean install of Oracle 11.2.0.4 for Linux x86_64 on Centos 7.
I've installed Oracle recommended patches:
Patch 19769489 - Database Patch Set Update 11.2.0.4.5 (Includes CPUJan2015)
Patch 19877440 - Oracle JavaVM Component 11.2.0.4.2 Database PSU (Jan2015)
No change after patching.
The 2 tables have:
LNK_PACK_REP: 13 rows
PACKAGES: 6 rows
In SQL Plus i've enabled all statistics and runned the SQL multiple time. Only the time is changed from 0.1 to 2.1 from time to time. No other statistic is changed if I compare a run in 0.1 second with a run in 2.1 second. The server has 16 Gb RAM and 8 CPU core. Server load is under 0.1 (no user is using the server for the moment).
Output:
SQL> select PACKAGE_ID, id, package_name from LNK_PACK_REP LNKPR INNER JOIN PACKAGES P ON LNKPR.PACKAGE_ID = P.ID;
PACKAGE_ID ID PACKAGE_NAME
3 3 RAPOARTE
3 3 RAPOARTE
121 121 VANZARI
121 121 VANZARI
121 121 VANZARI
2 2 PACHETE
2 2 PACHETE
1 1 DEPARTAMENTE
1 1 DEPARTAMENTE
81 81 ROLURI
81 81 ROLURI
PACKAGE_ID ID PACKAGE_NAME
101 101 UTILIZATORI
101 101 UTILIZATORI
13 rows selected.
Elapsed: 00:00:02.01
Execution Plan
Plan hash value: 2671988802
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 351 | 3 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 13 | 351 | 3 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 13 | 351 | 3 (0)| 00:00:01 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 6 | 84 | 2 (0)| 00:00:01 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | P->P | HASH |
| 6 | PX BLOCK ITERATOR | | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
| 7 | TABLE ACCESS FULL| PACKAGES | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 8 | BUFFER SORT | | | | | | Q1,02 | PCWC | |
| 9 | PX RECEIVE | | 13 | 169 | 1 (0)| 00:00:01 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10000 | 13 | 169 | 1 (0)| 00:00:01 | | S->P | HASH |
| 11 | INDEX FULL SCAN | UNQ_PACK_REP | 13 | 169 | 1 (0)| 00:00:01 | | | |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
3 - access("LNKPR"."PACKAGE_ID"="P"."ID")
Note
dynamic sampling used for this statement (level=2)
Statistics
24 recursive calls
0 db block gets
10 consistent gets
0 physical reads
0 redo size
923 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
13 rows processed
Table 1 structure:
-- Create table
create table PACKAGES
(
id NUMBER(3) not null,
package_name VARCHAR2(150),
position NUMBER(3),
activ NUMBER(1)
)
tablespace UM
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table PACKAGES
add constraint PACKAGES_ID primary key (ID)
using index
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate indexes
create index PACKAGES_ACTIV on PACKAGES (ID, ACTIV)
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
Table 2 structure:
-- Create table
create table LNK_PACK_REP
(
package_id NUMBER(3) not null,
report_id NUMBER(3) not null
)
tablespace UM
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table LNK_PACK_REP
add constraint UNQ_PACK_REP primary key (PACKAGE_ID, REPORT_ID)
using index
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate indexes
create index LNK_PACK_REP_REPORT_ID on LNK_PACK_REP (REPORT_ID)
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
In Oracle Enterprise Manager in SQL Monitor I can see the SQL that is runned multiple times. All runns have "Database Time" 0.0s (under 10 microsconds if I hover the list) and "Duration" 0.0s for normal run and 2.0s for thoose with delay.
If I go to Monitored SQL Executions for that run of 2.0s I have:
Duration: 2.0s
Database Time: 0.0s
PL/SQL & Java: 0.0
Wait activity: % (no number here)
Buffer gets: 10
IO Requests: 0
IO Bytes: 0
Fetch calls: 2
Parallel: 4
Theese numbers are consistend with a fast run except Duration that is even smaller than Database Time (10,163 microseconds Database Time and 3,748 microseconds Duration) both dispalyed as 0.0s if no mouse hover.
I don't know what else to check.

Parallel queries cannot be meaningfully tuned to within a few seconds. They are designed for queries that process large amounts of data for a long time.
The best way to optimize parallel statements with small data sets is to temporarily disable it:
alter system set parallel_max_servers=0;
(This is a good example of the advantages of developing on workstations instead of servers. On a server, this change affects everyone and you probably don't even have the privilege to run the command.)
The query may be simple but parallelism adds a lot of complexity in the background.
It's hard to say exactly why it's slower. If you have the SQL Monitoring report the wait events may help. But even those numbers may just be generic waits like "CPU". Parallel queries have a lot of overhead, in expectation of a resource-intensive, long-running query. Here are some types of overhead that may explain where those 2 seconds come from:
Dynamic sampling - Parallelism may automatically cause dynamic sampling, which reads data from the tables. Although dynamic sampling used for this statement (level=2)
may just imply missing optimizer statistics.
OS Thread startup - The SQL statement probably needs to start up 8 additional OS threads, and prepare a large amount of memory to hold all the intermediate data. Perhaps
the parameter PARALLEL_MIN_SERVERS could help prevent some time used to create those threads.
Additional monitoring - Parallel statements are automatically monitored, which requires recursive SELECTs and INSERTs.
Caching - Parallel queries often read directly from disk and skip reading and writing into the buffer cache. The rules for when it caches data are complicated and undocumented.
Downgrading - Finding the correct degree of parallelism is complicated. For example, I've compiled a list of 39 factors that influence the DOP. It's possible that one of those is causing downgrading, making some queries fast and others slow.
And there are probably dozens of other types of overhead I can't think of. Parallelism is great for massively improving the run-time of huge operations. But it doesn't work well for tiny queries.

The delay is due to parallelism as suggested by David Aldridge and Jon Heller but I don't agree the solution proposed by Jon Heller to disable parallelism for all queries (at system level). You can play with "alter session" to disable it and re-enable it before running big queries. The exact reason of the delay it's still unknown as the query finish fast in 8 out of 10 runs and I would expect a 10/10 fast run.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string