How to monitor the throughput of Heron Cluster - heron

I need to measure the throughput of my Heron cluster, but there is no such metric in the Heron UI. Does anyone have ideas about how to monitor the throughput of a Heron cluster? Thanks.
The result of running heron-explorer is as follows:
yitian@heron01:~$ heron-explorer metrics aurora/yitian/devel SentenceWordCountTopology
[2018-08-03 21:02:09 +0000] [INFO]: Using tracker URL: http://127.0.0.1:8888
'spout' metrics:
container id         jvm-uptime-secs  jvm-process-cpu-load  jvm-memory-used-mb  emit-count   ack-count    fail-count
-------------------  ---------------  --------------------  ------------------  -----------  -----------  ----------
container_3_spout_6  2053             0.253257              146                 1.13288e+07  1.13278e+07  0
container_4_spout_7  2091             0.150625              137.5               1.1624e+07   1.16228e+07  231
'count' metrics:
container id          jvm-uptime-secs  jvm-process-cpu-load  jvm-memory-used-mb  emit-count  execute-count  ack-count    fail-count
--------------------  ---------------  --------------------  ------------------  ----------  -------------  -----------  ----------
container_6_count_12  2092             0.184742              155.167             0           4.6026e+07     4.6026e+07   0
container_5_count_9   2091             0.387867              146                 0           4.60069e+07    4.60069e+07  0
container_6_count_11  2092             0.184488              157.833             0           4.58158e+07    4.58158e+07  0
container_4_count_8   2091             0.443688              129.833             0           4.58722e+07    4.58722e+07  0
container_5_count_10  2091             0.382577              118.5               0           4.60091e+07    4.60091e+07  0
'split' metrics:
container id         jvm-uptime-secs  jvm-process-cpu-load  jvm-memory-used-mb  emit-count   execute-count  ack-count    fail-count
-------------------  ---------------  --------------------  ------------------  -----------  -------------  -----------  ----------
container_1_split_2  2091             0.143034              75.3333             4.59453e+07  4.59453e+06    4.59453e+06  0
container_3_split_5  2042             1.12248               79.1667             4.64862e+07  4.64862e+06    4.64862e+06  0
container_2_split_3  2150             0.139837              83.6667             4.59443e+07  4.59443e+06    4.59443e+06  0
container_1_split_1  2091             0.145702              104.167             4.59454e+07  4.59454e+06    4.59454e+06  0
container_2_split_4  2150             0.138453              106.333             4.59443e+07  4.59443e+06    4.59443e+06  0
[2018-08-03 21:02:09 +0000] [INFO]: Elapsed time: 0.031s.

You can use the execute-count of your sink component to measure the output of your topology. If each of your components has a 1:1 input:output ratio, then this will be your throughput.
However, if you are windowing tuples into batches or splitting tuples (like separating sentences into individual words), then things get a little more complicated. You can get the input into your topology by looking at the emit-count of your spout components. You could then compare this with your bolt execute-counts to create your own throughput metric.
An easy way to get programmatic access to these metrics is via the Heron Tracker REST API. You can use your chosen language's HTTP library (like Requests for Python) to query the last 3 hours of data for a running topology. If you require more than 3 hours of data (the maximum stored by the topology TMaster) you will need to use one of the other metrics sinks to send metrics to an external database. Heron currently provides sinks for saving to local files, Graphite or Prometheus. InfluxDB support is in the works.
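For example, a minimal sketch in Python using Requests (the endpoint and parameter names follow the Heron Tracker REST API, but the metric name and interval here are assumptions you should adapt to your setup):

import requests

TRACKER = "http://127.0.0.1:8888"
params = {
    "cluster": "aurora",
    "environ": "devel",
    "topology": "SentenceWordCountTopology",
    "component": "count",
    "metricname": "__execute-count/default",  # assumed metric name; check your tracker
    "interval": 3 * 60 * 60,                  # last 3 hours, in seconds
}

# Query the tracker; summing execute-counts across instances gives a
# rough throughput figure for the 'count' bolt.
resp = requests.get(TRACKER + "/topologies/metrics", params=params)
resp.raise_for_status()
print(resp.json())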

Related

How can I set Slurm Partition QoS?

I created a partition QOS for my Slurm partition, but it isn't working. How can I solve this problem? If anyone knows, please let me know. The following steps show what I did.
1. Create QoS
$sacctmgr show qos format="Name,MaxWall,MaxTRESPerUser%30,MaxJob,MaxSubmit,Priority,Preempt"
Name MaxWall MaxTRESPU MaxJobs MaxSubmit Priority Preempt
---------- ----------- ------------------------------ ------- --------- ---------- ----------
normal 0
batchdisa+ 0 0 10
2. Attach QoS to partition
$scontrol show partition
PartitionName=sample01
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=batchdisable
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=computenode0[1-2]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=2 TotalNodes=2 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
3. Run jobs
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
67109044 sample01 testjob test R 1:42 1 computenode01
67109045 sample01 testjob test R 1:39 1 computenode02
I was able to solve the problem by adding the following setting to slurm.conf:
AccountingStorageEnforce=associations
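For reference, a sketch of applying that change (note: depending on your Slurm version, enforcing the QoS limits themselves may additionally require the limits and qos flags for this option):

# in slurm.conf
AccountingStorageEnforce=associations

# then have the controller re-read the configuration
$ scontrol reconfigure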

How to control Spark Stream with stream of dynamic queries?

I have a stream of data coming from Kafka; call it SourceStream.
I have another stream whose individual values are Spark SQL queries, each along with a window size.
I want those queries to be applied to the SourceStream data, and the results of the queries passed to the sink.
E.g.:
Source Stream
Id     type  timestamp  user      amount
-----  ----  ---------  --------  ------
uuid1  A     342342     ME        10.0
uuid2  B     234231     YOU       120.10
uuid3  A     234234     SOMEBODY  23.12
uuid4  A     234233     WHO       243.1
uuid5  C     124555     IT        35.12
...
Query Stream
Id       window------query
-------  --------  -----
uuid13   1 hour    select 'uuid13' as u, max(amount) as output from df where type = 'A' group by ..
uuid21   5 minute  select 'uuid121' as u, count(1) as output from df where amount > 100 group by ..
uuid321  1 day     select 'uuid321' as u, sum(amount) as output from df where amount > 100 group by ..
...
Each query in the query stream would be applied to the source stream's incoming data over the window specified along with the query, and the output would be sent to the sink.
In what ways can I implement this with Spark?
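One possible starting point, offered as a minimal sketch rather than a complete answer: use Structured Streaming's foreachBatch (Spark 2.4+) to run the current query set against each micro-batch. All names here (topics, paths) are hypothetical, and this ignores the per-query window size, which would additionally need per-query window aggregation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dynamic-queries").getOrCreate()

# The source stream from Kafka (topic and servers are hypothetical).
source = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "source-topic")
          .load())

def apply_queries(batch_df, batch_id):
    # Refresh the query set every micro-batch; a real system might consume
    # these from their own Kafka topic instead of a JSON snapshot.
    queries = spark.read.json("/path/to/query-snapshot").collect()
    batch_df.createOrReplaceTempView("df")  # the queries select from "df"
    for q in queries:
        spark.sql(q["query"]).write.mode("append").parquet("/sink/" + q["Id"])

(source.writeStream
       .foreachBatch(apply_queries)
       .start()
       .awaitTermination())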

Why do I see a spike of steps per second in tensorflow training initially?

Hi tensorflow experts,
I see the following training speed profile using dataset API and prefetching of 128, 256, 512, or 1024 batches (each of 128 examples):
INFO:tensorflow:Saving checkpoints for 0 into
INFO:tensorflow:loss = 0.969178, step = 0
INFO:tensorflow:global_step/sec: 70.3812
INFO:tensorflow:loss = 0.65544295, step = 100 (1.422 sec)
INFO:tensorflow:global_step/sec: 178.33
INFO:tensorflow:loss = 0.47716027, step = 200 (0.560 sec)
INFO:tensorflow:global_step/sec: 178.626
INFO:tensorflow:loss = 0.53073615, step = 300 (0.560 sec)
INFO:tensorflow:global_step/sec: 132.039
INFO:tensorflow:loss = 0.4849593, step = 400 (0.757 sec)
INFO:tensorflow:global_step/sec: 121.437
INFO:tensorflow:loss = 0.4055175, step = 500 (0.825 sec)
INFO:tensorflow:global_step/sec: 122.379
INFO:tensorflow:loss = 0.28230205, step = 600 (0.817 sec)
INFO:tensorflow:global_step/sec: 122.163
INFO:tensorflow:loss = 0.4917924, step = 700 (0.819 sec)
INFO:tensorflow:global_step/sec: 122.509
The initial spike of 178 steps per second is reproducible across multiple runs and different prefetching amounts. I am trying to understand the underlying multi-threading mechanism and why that happens.
Additional information:
My CPU usage peaks at 1800% on a 48-core machine, and my GPU usage is consistently at only 9%. So it's pretty surprising that neither of these is exhausted. I am wondering whether the mutex in queue_runner is keeping the CPU from realizing its full potential, as described here?
Thanks,
John
[update] I also observed the same spike when I use prefetch_to_device(gpu_device, ..), with similar buffer sizes. Surprisingly, prefetch_to_device only slows things down, by about 10%.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into
INFO:tensorflow:loss = 1.3881096, step = 0
INFO:tensorflow:global_step/sec: 52.3374
INFO:tensorflow:loss = 0.48779136, step = 100 (1.910 sec)
INFO:tensorflow:global_step/sec: 121.154
INFO:tensorflow:loss = 0.3451385, step = 200 (0.827 sec)
INFO:tensorflow:global_step/sec: 89.3222
INFO:tensorflow:loss = 0.37804496, step = 300 (1.119 sec)
INFO:tensorflow:global_step/sec: 80.4857
INFO:tensorflow:loss = 0.49938473, step = 400 (1.242 sec)
INFO:tensorflow:global_step/sec: 79.1798
INFO:tensorflow:loss = 0.5120025, step = 500 (1.263 sec)
INFO:tensorflow:global_step/sec: 81.2081
It's common to see a spike in steps per second at the start of a training run, because the CPU has had time to fill up the prefetch buffer. Your steps per second after the spike are very reasonable compared to the start, but the low CPU usage might indicate a bottleneck.
The first question is whether you are using the Dataset API in combination with the Estimator. From your terminal output I suspect you are; if not, I would start by changing your code to use the Estimator class. If you are already using the Estimator class, then make sure you are following the best-practice performance guidelines as documented here.
If you are doing all of the above already, then there is a bottleneck in your pipeline. Given the low CPU usage, I would guess you are experiencing an I/O bottleneck: you might have your dataset on a slow medium (a hard drive), or you might not be using a serialized format and so are saturating the IOPS (again, a hard drive or network storage). In either case, start by using a serialized data format such as TFRecords, and upgrade your storage to an SSD or multiple hard drives in RAID 0, 1, or 10, your pick.
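For illustration, a minimal sketch of an Estimator-style input_fn with TFRecord input, parallel parsing, and prefetching (TF 1.x API; the file pattern and feature spec are placeholders for your own schema):

import tensorflow as tf

def input_fn(file_pattern="/data/train-*.tfrecord", batch_size=128):
    # Placeholder feature spec; replace with your own TFRecord schema.
    feature_spec = {"x": tf.FixedLenFeature([10], tf.float32),
                    "y": tf.FixedLenFeature([], tf.int64)}

    def parse_fn(record):
        parsed = tf.parse_single_example(record, feature_spec)
        return {"x": parsed["x"]}, parsed["y"]

    dataset = tf.data.TFRecordDataset(tf.gfile.Glob(file_pattern))
    dataset = dataset.map(parse_fn, num_parallel_calls=8)  # parallel decode
    dataset = dataset.batch(batch_size)
    return dataset.prefetch(1)  # keep at least one batch ready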

Divide price into groups by using K-means

There is a task of dividing product prices into 3 groups: {high, avg, low}. I have tried to implement it via K-means using the sklearn package. The data is a pandas DataFrame of float64 type.
dfcl
Out[173]:
price
product_option_id
10012|0 372.15
10048|0 11.30
10049|0 12.26
10050|0 6.20
10051|0 5.90
10052|0 9.00
10053|0 11.10
10054|0 9.30
10055|0 4.20
10056|0 5.60
import pandas as pd
import sklearn.cluster

# Convert DataFrame to matrix (as_matrix() is deprecated in newer pandas; .values works too)
mat = dfcl.as_matrix()
# Using sklearn
km = sklearn.cluster.KMeans(n_clusters=3)
km.fit(mat)
# Get cluster assignment labels
labels = km.labels_
# Format results as a DataFrame
results = pd.DataFrame(data=labels, columns=['cluster'], index=dfcl.index)
I have gotten the results, but they seem very unbalanced between the groups:
print('Total features -', len(results))
print('Cluster 0 -',len(results.loc[results['cluster'] == 0]))
print('Cluster 1 -',len(results.loc[results['cluster'] == 1]))
print('Cluster 2 -',len(results.loc[results['cluster'] == 2]))
Total features - 5222
Cluster 0 - 4470
Cluster 1 - 733
Cluster 2 - 19
By the way, when I re-run the fitting, the data sometimes swaps heavily between clusters. Is there any way to solve the problem of such unbalanced groups, and to keep the cluster names static across re-runs? I've also tried normalizing the data using preprocessing.MinMaxScaler(), and it didn't help.
Maybe there are other clustering algorithms that can do what I want, or some other hacks? For example, the output of a second run:
Total features - 5222
Cluster 0 - 733
Cluster 1 - 4470
Cluster 2 - 19
Probably your data distribution is already skewed. K-means minimizes squared errors; it does not care about balanced clusters.
Furthermore, k-means does not produce "low" or "high" - you need to assign such semantics yourself. You cannot assume that cluster 2 is "high".
It may be worth looking at a histogram of the data and then defining thresholds for "low" and "high" as you see fit.
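For illustration, a minimal sketch of both ideas, reusing the variable names from the question's code: relabeling the fitted clusters by their center values so the names stay stable across runs, and quantile thresholds as a clustering-free alternative:

import numpy as np
import pandas as pd

# Option 1: make labels stable by ranking cluster centers, so 0/1/2
# always mean low/avg/high regardless of k-means' arbitrary label order.
order = np.argsort(km.cluster_centers_.ravel())        # center indices, ascending
relabel = {old: new for new, old in enumerate(order)}
results['cluster'] = [relabel[l] for l in km.labels_]

# Option 2: skip clustering and cut the price distribution at fixed
# quantiles; this gives roughly equal-sized groups by construction.
dfcl['group'] = pd.qcut(dfcl['price'], q=3, labels=['low', 'avg', 'high'])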

Getting total number of key-value pairs in RocksDB

Is it possible to efficiently get the number of key-value pairs stored in a RocksDB key-value store?
I have looked through the wiki, and haven't seen anything discussing this topic thus far. Is such an operation even possible?
In code, you could use db->GetProperty("rocksdb.estimate-num-keys", &num) to obtain the estimated number of keys stored in a RocksDB instance.
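The same property is reachable from other languages; for example, a sketch in Python assuming the third-party python-rocksdb bindings (the C++ API above is the canonical interface):

import rocksdb

db = rocksdb.DB("/tmp/rocksdbtest", rocksdb.Options(create_if_missing=True))
# The property value is returned as bytes, e.g. b'111507'.
num_keys = db.get_property(b"rocksdb.estimate-num-keys")
print(int(num_keys))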
Another option is to use the sst_dump tool with the --show_properties argument to get the number of entries, although the result will be on a per-file basis. For example, the following command will show the properties of each SST file under the specified rocksdb directory:
sst_dump --file=/tmp/rocksdbtest-691931916/dbbench --show_properties --command=none
And here's the sample output:
Process /tmp/rocksdbtest-691931916/dbbench/000005.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 845
# entries: 27857
raw key size: 668568
raw average key size: 24.000000
raw value size: 2785700
raw average value size: 100.000000
data block size: 3381885
index block size: 28473
filter block size: 0
(estimated) table size: 3410358
filter policy name: N/A
# deleted keys: 0
Process /tmp/rocksdbtest-691931916/dbbench/000008.sst
Sst file format: block-based
Table Properties:
------------------------------
# data blocks: 845
# entries: 27880
raw key size: 669120
...
Combined with some shell commands, you can get the total number of entries:
sst_dump --file=/tmp/rocksdbtest-691931916/dbbench --show_properties --command=none | grep entries | cut -c 14- | awk '{x+=$0}END{print "total number of entries: " x}'
And this will generate the following output:
total number of entries: 111507
There is no way to get the count exactly. But RocksDB 3.4, which was released recently, exposes a way to get an estimated key count; you can try it.
https://github.com/facebook/rocksdb/releases
