Accumulo-monitor experiencing a Java RuntimeException when creating a GeoMesa instance

In order to set up a local GeoMesa-Accumulo stack, I've cloned a git repository (https://github.com/geodocker/geodocker-geomesa).
Next, to create an instance, I executed the following commands:
$ cd geodocker-geomesa/geodocker-accumulo-geomesa/
$ docker-compose up
However, there seems to be a problem with the accumulo-monitor, as I keep getting the following message:
accumulo-monitor_1 | java.lang.RuntimeException: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
accumulo-monitor_1 | at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:164)
accumulo-monitor_1 | at org.apache.accumulo.server.problems.ProblemReports$3.hasNext(ProblemReports.java:260)
accumulo-monitor_1 | at org.apache.accumulo.server.problems.ProblemReports.summarize(ProblemReports.java:320)
accumulo-monitor_1 | at org.apache.accumulo.monitor.Monitor.fetchData(Monitor.java:395)
accumulo-monitor_1 | at org.apache.accumulo.monitor.Monitor$2.run(Monitor.java:555)
accumulo-monitor_1 | at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
accumulo-monitor_1 | at java.lang.Thread.run(Thread.java:748)
accumulo-monitor_1 | Caused by: org.apache.accumulo.core.client.impl.ThriftScanner$ScanTimedOutException
accumulo-monitor_1 | at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:252)
accumulo-monitor_1 | at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80)
accumulo-monitor_1 | at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:154)
accumulo-monitor_1 | ... 6 more
What should I do to resolve this issue and successfully create a GeoMesa Accumulo instance?
Furthermore, what is the impact of this error? Could I ignore it? An accumulo-monitor instance has been created, as it is listed when using
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4da4fb183893 quay.io/geomesa/accumulo-geomesa:geomesa-2.2.0-accumulo-1.9.2 "/sbin/entrypoint.sh…" 33 minutes ago Up 33 minutes 0.0.0.0:9995->9995/tcp, 0.0.0.0:50095->50095/tcp geodockeraccumulogeomesa_accumulo-monitor_1
8edbea5b4d96 quay.io/geomesa/accumulo-geomesa:geomesa-2.2.0-accumulo-1.9.2 "/sbin/entrypoint.sh…" 3 hours ago Up 3 hours geodockeraccumulogeomesa_accumulo-master_1
f0d4a29f278a quay.io/geomesa/hdfs:geomesa-2.2.0-accumulo-1.9.2 "/sbin/entrypoint.sh…" 3 hours ago Up 3 hours geodockeraccumulogeomesa_hdfs-data_1
3786a6292a10 quay.io/geomesa/zookeeper:latest "/sbin/entrypoint.sh…" 3 hours ago Up 3 hours 2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp geodockeraccumulogeomesa_zookeeper_1
b30c99bfcf0a quay.io/geomesa/hdfs:geomesa-2.2.0-accumulo-1.9.2 "/sbin/entrypoint.sh…" 3 hours ago Up 3 hours 0.0.0.0:50070->50070/tcp geodockeraccumulogeomesa_hdfs-name_1
6220fd54dabb quay.io/geomesa/geoserver:geomesa-2.2.0-accumulo-1.9.2 "/opt/tomcat/bin/cat…" 3 hours ago Up 3 hours 0.0.0.0:9090->9090/tcp geodockeraccumulogeomesa_geoserver_1
(Note that there is a difference in 'created' because I tried some things to resolve the RuntimeException.)

Related

How do ZeroMQ HEARTBEAT sockopts() settings work?

I'm using Python's pyzmq==22.2.1, which should support ZeroMQ 4.2.0 (according to the API).
I'm trying to make use of the heartbeat socket options (ZMQ_HEARTBEAT_IVL, ZMQ_HEARTBEAT_TIMEOUT and ZMQ_HEARTBEAT_TTL). However, when I set these socket options, I am not receiving the expected TimeoutException or any exception on my socket. It just seems to sit there doing nothing.
What is the expected behaviour after setting these socket options?
On the server side, how does the server detect that the client has timed out and missed a heartbeat, and vice versa for the client (is there an exception or something that's supposed to be thrown or something?)
I've set up a simple router-dealer echo example below:
# Server Code:
import zmq

c = zmq.Context()
s = c.socket(zmq.ROUTER)
s.setsockopt(zmq.HEARTBEAT_IVL, 1000)
s.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)
s.setsockopt(zmq.HEARTBEAT_TTL, 5000)
s.bind('tcp://127.0.0.1:5555')

while True:
    id, data = s.recv_multipart()
    s.send_multipart([id, data], zmq.NOBLOCK)

# Client Code
import zmq
import time

c = zmq.Context()
s = c.socket(zmq.DEALER)
s.HEARTBEAT_IVL = 1000
s.HEARTBEAT_TIMEOUT = 5000
s.connect('tcp://127.0.0.1:5555')

i = 0
while True:
    s.send(str(i).encode())
    print(s.recv())
    i += 1
    time.sleep(1)
Q : What is the expected behaviour after setting these socket options?
A : Well, there is a two-fold effect of the said settings. One, that actually works towards your setup goals: going & sending (most probably ZMTP/3.1) ZMTP_PING connection-oriented service-sublayer "ZMTP/3.1-service-packets" and, reciprocally (not sure, but most often), receiving adequately formed "ZMTP/{3.1|2.x|1.0}-service-packets" (hopefully delivered) back. These "service-packets" are visible on the wire-line (if present; an inproc://-transport-class and a vmci://-transport-class have no actual wire a typical user could hook on and sniff traffic in, just some kind of pointer-acrobatics used for RAM-mapping), so a protocol analyser will "see" them and decode them like this:
a local-initiator
MAY send:
+------+----+
| %xNN | 24 |
+------+----+
0 1
flags size
+------+---+---+---+---+
| %xNN | P | I | N | G |
+------+---+---+---+---+
2 3 4 5 6
ZMTP/3.1-Command name "PING"
+---+---+
| |
+---+---+
7 8 ping-ttl 2B
MAY be zero
MAY be ttl stored as [0:15], being a 16-bit TTL in 1/100 [s] ~ max 6553 [s]
ttl provides a strong hint
to the other peer to disconnect
if no further traffic is received after that time.
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
ping-context (max 16B)
MAY be zero
MAY be context-specific context-payload, not more than 16B
a remote-peer
SHALL respond:
+------+----+
| %xNN | 22 |
+------+----+
0 1
flags size
+------+---+---+---+---+
| %xNN | P | O | N | G |
+------+---+---+---+---+
2 3 4 5 6
ZMTP/3.1-Command name "PONG"
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
ping-context (echoed as obtained from PING)
ZMTP/3.1 v/s ZMTP/3.0 v/s ZMTP/2.x v/s ZMTP/1.0 differences and details get resolved during the version-negotiation phase, once the sublayer of connection-setup services performs all the needed low-level handshaking (re)negotiations among peers trying to agree on version-, auth- & security-related principles.
The second effect of these negotiations (performed under the hood of the Context()-engine instance) is that you shall never see any direct interaction with the ZeroMQ-(abstract)-Message-Transport-Protocol (ZMTP) defined service-setup-&-maintenance processes.
We simply enjoy the exposed API-calls (built above ZMTP) to set up, configure and harness "our" user-level operated Signalling / Messaging infrastructure meta-plane, which is based on all the know-how "hidden" under the hood (and it should remain so; sure, unless one decides to roll up sleeves and help develop the ZeroMQ system towards its next generation).
Q : ( is there an exception or something that's supposed to be thrown or something? )
A :
This is why all of the above had to be told first, as the due reasoning for which there has to be a fair & honest answer of no to your second question: no exception gets thrown at the user level.
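If the goal is for your own code to react to a dead peer, a common pattern is an application-level receive timeout, for example with a zmq.Poller; the engine-level heartbeats keep the TCP connection verified underneath, while the poller timeout is what actually surfaces the problem to user-level code. A minimal sketch (endpoint, timeout values and payloads are illustrative, not taken from the question):

import zmq

ctx = zmq.Context()
s = ctx.socket(zmq.DEALER)
# engine-level heartbeating, negotiated and serviced entirely inside the Context()-engine
s.setsockopt(zmq.HEARTBEAT_IVL, 1000)      # emit a ZMTP PING every 1 s
s.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)  # let the engine drop the connection after 5 s of silence
s.connect('tcp://127.0.0.1:5555')

poller = zmq.Poller()
poller.register(s, zmq.POLLIN)

s.send(b'hello')
# application-level timeout: the engine never raises an exception for us,
# so the user-level code decides how long it is willing to wait for a reply
events = dict(poller.poll(timeout=7000))   # milliseconds
if s in events:
    print(s.recv())
else:
    print('no reply within 7 s; treat the peer as gone (reconnect, alert, give up, ...)')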

Pandas: Sliding window, summing app 14 day data

I do wonder how it is possible to make sliding windows in Pandas.
I have a dataframe with three columns.
Country | Number | DayOfTheYear
===================================
No | 50 | 0
No | 20 | 1
No | 37 | 2
I would love to see 14-day chunks for every country and day combination.
The country part can be ignored for the moment, since I can filter those manually in some way. But imagine there is only one country: is there a smart way to get some sort of summed-up sliding window, resulting in something like the following?
Country | Sum | DatesOftheYear
===================================
No | 504 | 0-13
No | 207 | 1-14
No | 337 | 2-15
I would also accept it if they were disjoint, being only 0-13, 14-27, etc.
But I just cannot get along with Pandas. I know an old SQL solution, but does anybody have a nice idea for Pandas?
If you want a rolling window over your dataframe, you can simply use the .rolling function of pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
In your case: df["Number"].rolling(14).sum()
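To sketch how that extends to the per-country case, a minimal example could look like this (the column names come from the question; the sample data and the Sum14 column name are illustrative):

import pandas as pd

# toy data: two countries, 20 days each, all values 5
df = pd.DataFrame({
    "Country": ["No"] * 20 + ["Se"] * 20,
    "Number": [5] * 40,
    "DayOfTheYear": list(range(20)) * 2,
})

# 14-day sliding sum, computed separately per country;
# the first 13 days of each country stay NaN because no full window exists yet
df = df.sort_values(["Country", "DayOfTheYear"])
df["Sum14"] = (
    df.groupby("Country")["Number"]
      .rolling(14)
      .sum()
      .reset_index(level=0, drop=True)
)

# disjoint 14-day buckets (0-13, 14-27, ...) instead of a sliding window
buckets = df.groupby(["Country", df["DayOfTheYear"] // 14])["Number"].sum()

A plain df["Number"].rolling(14).sum() would let the window slide across country boundaries, which is why the groupby comes first here.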

Include a label for metadata purposes without aggregating on it

I have a Gauge metric that indicates the error status of a variable number of project replications to mirrors:
project A -> project A Mirror 1 -> project A Mirror 2
project B -> project B Mirror 1 -> project B Mirror 2
...
There is one value per mirror, per project, where 1 is a successful mirror and 0 is a failure. A status label includes a variable error message string if there was a failure, and a type label differentiates mirrors.
The full data at a single point in time might look something like this:
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectB",status="Something went wrong: full error message",type="a"} 0
mirror_info{instance="",job="mirror-status",path="projectB",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectC",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectC",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectD",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectD",status="Something different went wrong: full error message",type="b"} 0
mirror_info{instance="",job="mirror-status",path="projectE",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectE",status="ok",type="b"} 1
I want to be able to show a table like this:
| project | status a | status b |
| ------- | -------- | --------- |
| projectA | ok | ok |
| projectB | Something went wrong: full error message | ok |
| ... | ... | ... |
| projectD | ok | Something different went wrong: full error message |
The issue I'm running into is that I can't aggregate on path without losing the status, and I can't include the status without getting a different entry for every single error message variant.
I'm too much of a beginner at PromQL to know if such a thing is even possible, and I'm fully aware that Prometheus may not even be the right tool for this; however, that particular requirement is beyond my control in this case.

Excel Summing Sequences?

I'm trying to do sequential summing on a spreadsheet.
The first rows are data by date, and I want to do sums by the week, but Excel's auto-populate keeps screwing it up and I don't know how to fix that.
Date A
----
1 | 5
2 | 5
3 | 5
4 | 5
5 | 5
6 | 5
7 | 5
8 | 5
9 | 5
10 | 5
11 | 5
12 | 5
13 | 5
14 | 5
So what I want in another area is:
Week Total
1 | =sum(A1:A7)
2 | =sum(A8:A14)
3 | =sum(A15:A21)
4 | continue like this for 52 weeks
but what Excel keeps giving me with its auto-populating is
Week Total
1 | =sum(A1:A7) #The first iteration
2 | =sum(A2:A8) #auto generated
3 | =sum(A3:A9) #auto generated
How can I get Excel to give me the results I want here? I've been searching on summing for a while and can't seem to even phrase my question right.
=sum(indirect("A"&(row()*7-6)&":A"&(row()*7)))
pasted in row 1 and filled down should work;
at least in Sheets it does (and the Excel docs say INDIRECT works there too).
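An OFFSET-based variant of the same "jump 7 rows per week" idea should also work; this is just a sketch assuming the data starts in A1, pasted in row 1 and filled down:
=SUM(OFFSET($A$1,(ROW()-1)*7,0,7,1))
Here OFFSET($A$1,(ROW()-1)*7,0,7,1) returns a 7-row, 1-column range starting 7*(week-1) rows below A1, so week 1 sums A1:A7, week 2 sums A8:A14, and so on.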

Oracle 11.2 has a delay of 2 seconds for simple SQL at random times

A simple table join is usually done in 0.0XX seconds and sometimes in 2.0XX seconds (according to PL/SQL Developer SQL execution). It still happens when running from SQL*Plus.
If I run the SQL 10 times, 8 times it runs fine and 2 times it takes 2+ seconds.
It's a clean install of Oracle 11.2.0.4 for Linux x86_64 on CentOS 7.
I've installed Oracle recommended patches:
Patch 19769489 - Database Patch Set Update 11.2.0.4.5 (Includes CPUJan2015)
Patch 19877440 - Oracle JavaVM Component 11.2.0.4.2 Database PSU (Jan2015)
No change after patching.
The 2 tables have:
LNK_PACK_REP: 13 rows
PACKAGES: 6 rows
In SQL*Plus I've enabled all statistics and run the SQL multiple times. Only the elapsed time changes, from 0.1 to 2.1 seconds, from time to time. No other statistic changes if I compare a run of 0.1 seconds with a run of 2.1 seconds. The server has 16 GB RAM and 8 CPU cores. Server load is under 0.1 (no user is using the server at the moment).
Output:
SQL> select PACKAGE_ID, id, package_name from LNK_PACK_REP LNKPR INNER JOIN PACKAGES P ON LNKPR.PACKAGE_ID = P.ID;
PACKAGE_ID ID PACKAGE_NAME
3 3 RAPOARTE
3 3 RAPOARTE
121 121 VANZARI
121 121 VANZARI
121 121 VANZARI
2 2 PACHETE
2 2 PACHETE
1 1 DEPARTAMENTE
1 1 DEPARTAMENTE
81 81 ROLURI
81 81 ROLURI
PACKAGE_ID ID PACKAGE_NAME
101 101 UTILIZATORI
101 101 UTILIZATORI
13 rows selected.
Elapsed: 00:00:02.01
Execution Plan
Plan hash value: 2671988802
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 351 | 3 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 13 | 351 | 3 (0)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 13 | 351 | 3 (0)| 00:00:01 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 6 | 84 | 2 (0)| 00:00:01 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | P->P | HASH |
| 6 | PX BLOCK ITERATOR | | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
| 7 | TABLE ACCESS FULL| PACKAGES | 6 | 84 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 8 | BUFFER SORT | | | | | | Q1,02 | PCWC | |
| 9 | PX RECEIVE | | 13 | 169 | 1 (0)| 00:00:01 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10000 | 13 | 169 | 1 (0)| 00:00:01 | | S->P | HASH |
| 11 | INDEX FULL SCAN | UNQ_PACK_REP | 13 | 169 | 1 (0)| 00:00:01 | | | |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
3 - access("LNKPR"."PACKAGE_ID"="P"."ID")
Note
dynamic sampling used for this statement (level=2)
Statistics
24 recursive calls
0 db block gets
10 consistent gets
0 physical reads
0 redo size
923 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
13 rows processed
Table 1 structure:
-- Create table
create table PACKAGES
(
id NUMBER(3) not null,
package_name VARCHAR2(150),
position NUMBER(3),
activ NUMBER(1)
)
tablespace UM
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table PACKAGES
add constraint PACKAGES_ID primary key (ID)
using index
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate indexes
create index PACKAGES_ACTIV on PACKAGES (ID, ACTIV)
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
Table 2 structure:
-- Create table
create table LNK_PACK_REP
(
package_id NUMBER(3) not null,
report_id NUMBER(3) not null
)
tablespace UM
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate primary, unique and foreign key constraints
alter table LNK_PACK_REP
add constraint UNQ_PACK_REP primary key (PACKAGE_ID, REPORT_ID)
using index
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
-- Create/Recreate indexes
create index LNK_PACK_REP_REPORT_ID on LNK_PACK_REP (REPORT_ID)
tablespace UM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
next 1M
minextents 1
maxextents unlimited
);
In Oracle Enterprise Manager, in SQL Monitor, I can see the SQL that is run multiple times. All runs have "Database Time" 0.0s (under 10 microseconds if I hover over the list) and "Duration" 0.0s for a normal run and 2.0s for those with the delay.
If I go to Monitored SQL Executions for that run of 2.0s I have:
Duration: 2.0s
Database Time: 0.0s
PL/SQL & Java: 0.0
Wait activity: % (no number here)
Buffer gets: 10
IO Requests: 0
IO Bytes: 0
Fetch calls: 2
Parallel: 4
These numbers are consistent with a fast run, except Duration, which in a fast run is even smaller than Database Time (10,163 microseconds Database Time and 3,748 microseconds Duration); both are displayed as 0.0s without a mouse hover.
I don't know what else to check.
Parallel queries cannot be meaningfully tuned to within a few seconds. They are designed for queries that process large amounts of data for a long time.
The best way to optimize parallel statements with small data sets is to temporarily disable it:
alter system set parallel_max_servers=0;
(This is a good example of the advantages of developing on workstations instead of servers. On a server, this change affects everyone and you probably don't even have the privilege to run the command.)
The query may be simple but parallelism adds a lot of complexity in the background.
It's hard to say exactly why it's slower. If you have the SQL Monitoring report, the wait events may help. But even those numbers may just be generic waits like "CPU". Parallel queries have a lot of overhead, in expectation of a resource-intensive, long-running query. Here are some types of overhead that may explain where those 2 seconds come from:
Dynamic sampling - Parallelism may automatically cause dynamic sampling, which reads data from the tables (although "dynamic sampling used for this statement (level=2)" may just imply missing optimizer statistics).
OS Thread startup - The SQL statement probably needs to start up 8 additional OS threads, and prepare a large amount of memory to hold all the intermediate data. Perhaps setting the parameter PARALLEL_MIN_SERVERS would prevent some of the time spent creating those threads.
Additional monitoring - Parallel statements are automatically monitored, which requires recursive SELECTs and INSERTs.
Caching - Parallel queries often read directly from disk and skip reading and writing into the buffer cache. The rules for when it caches data are complicated and undocumented.
Downgrading - Finding the correct degree of parallelism is complicated. For example, I've compiled a list of 39 factors that influence the DOP. It's possible that one of those is causing downgrading, making some queries fast and others slow.
And there are probably dozens of other types of overhead I can't think of. Parallelism is great for massively improving the run-time of huge operations. But it doesn't work well for tiny queries.
The delay is due to parallelism, as suggested by David Aldridge and Jon Heller, but I don't agree with the solution proposed by Jon Heller to disable parallelism for all queries (at the system level). You can play with "alter session" to disable it and re-enable it before running big queries. The exact reason for the delay is still unknown, as the query finishes fast in 8 out of 10 runs and I would expect 10/10 fast runs.
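For reference, a session-scoped variant of that approach might look like the sketch below; the ALTER SESSION statements are standard Oracle syntax and the SELECT is the query from above, but treat it as an illustration rather than the recommended fix:

-- disable parallel query only for the current session
ALTER SESSION DISABLE PARALLEL QUERY;

-- run the small, latency-sensitive statement serially
select PACKAGE_ID, id, package_name
from LNK_PACK_REP LNKPR
INNER JOIN PACKAGES P ON LNKPR.PACKAGE_ID = P.ID;

-- re-enable parallelism before the big, long-running statements
ALTER SESSION ENABLE PARALLEL QUERY;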
