I'm using Ganglia, which uses RRDTool as its database, to get charts about my servers, but after installing and setting things up, I noticed that there's a bit of confusion in the way the data is represented, at least in the Ganglia web interface.
Let me give some examples:
(I removed my IP addresses for security reasons)
What does this chart mean? I would like to see the amount of disk used on my servers. The Y axis suggests Kbytes; is that right? How should I read this chart?
Another example is this one:
What does the Y-axis mean? What is being represented here? Which unit: KB, MB, GB?
I think a chart shouldn't leave room for the imagination; anyone should be able to look at it and know what is being represented.
So that's what I'd like to achieve, but how can I correct this lack of information (or misleading information)?
We are moving to an online D365 environment and we're trying to determine how much data we're using in the Dataverse tables. Under Capacity I can go into the environment details and see how much space is used per table, but it's a daily value. We're trying to remove some data because we're reaching some of our capacity limits, but I can't find anywhere that shows how much data is being used per table in real time. Thanks for any advice on how to pull this.
This article does mention how to get to the capacity limits and usage, but all values appear to be updated only daily:
https://learn.microsoft.com/en-us/power-platform/admin/capacity-storage?source=docs
I'm trying to find some way to see the data used in real time.
We are trying to do a POC to change the way we store content in a Geode region. We operate on sketches (sizes can vary from 1 GB to 30 GB) and currently break them into parcels, storing the parcels in the region. We then read these parcels and merge them to recreate the complete sketch for our processing. We are seeing some inconsistencies in the data due to cache eviction and are trying to come up with an approach for storing the complete object in the region instead of storing the parts.
I was looking at the Geode documentation but couldn't find the size limit for an entry in a region, so I wanted to reach a broader group in case anyone has done anything similar or has some insight into it.
Thanks for your response in advance.
Best Regards,
Amit
From what I've been able to investigate, the maximum object size is set at 1 GB; have a look at GEODE-478 and commit 1e3f89ddcd for further details. It's worth mentioning, as a side note, that objects that big might cause problems with GC, so you might want to stay away from them.
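For what it's worth, if you do stay with the parcel approach, a minimal sketch of keeping each entry safely under that ceiling could look like this (plain Scala; `putParcel` is just a placeholder for whatever `region.put` call you already use, not a Geode API):

```scala
// Hypothetical sketch: split a serialized sketch into parcels below the ~1 GB
// per-entry limit and store each parcel under a derived key.
val maxParcelBytes: Int = 512 * 1024 * 1024 // stay well under the 1 GB ceiling

def storeSketch(sketchId: String,
                sketch: Array[Byte],
                putParcel: (String, Array[Byte]) => Unit): Int = {
  val parcels = sketch.grouped(maxParcelBytes).toSeq
  parcels.zipWithIndex.foreach { case (parcel, i) =>
    putParcel(s"$sketchId#$i", parcel) // e.g. region.put(key, parcel)
  }
  parcels.size // store this count too, so readers know how many parcels to merge
}
```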
Cheers.
I am doing a broadcast join of two tables A and B.
B is a cached table created with the following Spark SQL:
create table B as select segment_ids_hash from stb_ranker.c3po_segments
where
from_unixtime(unix_timestamp(string(dayid), 'yyyyMMdd')) >= CAST('2019-07-31 00:00:00.000000000' AS TIMESTAMP)
and
segmentid_check('(6|8|10|12|14|371|372|373|374|375|376|582|583|585|586|587|589|591|592|594|596|597|599|601|602|604|606|607|609|610|611|613|615|616)', seg_ids) = true
cache table B
The column 'segment_ids_hash' is of integer type and the result contains 36.4 million records.
The cached table size is about 140 MB, as shown below
Then I did the join as follows:
select count(*) from A broadcast join B on A.segment_ids_hash = B.segment_ids_hash
Here broadcast exchange data size is about 3.2 GB.
My question is why the broadcast exchange data size (3.2GB) is so much bigger than the raw data size (~140 MB). What are the overheads? Is there any way to reduce the broadcast exchange data size?
Thanks
TL;DR: I'm also still learning where this data size metric comes from. It is probably only the estimated size of the operation and may not reflect the actual size of the data, so don't worry about it too much for now.
Full version:
Update: came back to correct some mistakes. I saw that the previous answer was lacking some depth, so I'll try to dig deeper into this as best I can (I'm still relatively new to answering questions).
Update 2: rephrased and removed an overdone joke (sorry)
OK, this might get very long, but I think this metric is not really the direct size of the data.
To begin with, I made a test run for this one to reproduce the results with 200 executors and 4 cores:
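The test was roughly along these lines (a sketch only: the tables are synthetic stand-ins, not the original query; only the shape matters, i.e. a ~36M-row single-int-column table broadcast to the probe side):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-datasize-repro").getOrCreate()
import spark.implicits._

// Build side: ~36.4M int keys, cached like "cache table B" in the question.
val b = spark.range(0L, 36400000L).select($"id".cast("int").as("segment_ids_hash"))
b.cache().count()

// Probe side: a larger table with the same key column.
val a = spark.range(0L, 100000000L).select($"id".cast("int").as("segment_ids_hash"))

// Force the broadcast, then read the "data size" metric off the
// BroadcastExchange node in the SQL tab of the Spark UI.
val joined = a.join(broadcast(b), Seq("segment_ids_hash"))
println(joined.count())
```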
This returned these results:
Now I see something interesting: the dataSize for my test is around 1.2 GB, not 3.2 GB, which led me to read Spark's source code.
Going to GitHub, I see that the 4 numbers on the BroadcastExchange node correspond to this:
First link, BroadcastExchangeExec: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
The data size metric corresponds to this part:
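Paraphrasing the relevant part of that file (the exact lines move around between Spark versions): the metric is taken from the built relation's estimated size, not from the size of the serialized or cached input.

```scala
// Paraphrase of BroadcastExchangeExec, not standalone code:
val relation = mode.transform(input, Some(numRows))

val dataSize = relation match {
  case map: HashedRelation =>
    map.estimatedSize
  case arr: Array[InternalRow] =>
    arr.map(_.asInstanceOf[UnsafeRow].getSizeInBytes.toLong).sum
}

longMetric("dataSize") += dataSize
```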
I found that the relation val here appears to be a HashedRelationBroadcastMode.
Go to HashedRelation https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala:
Since we have Some(numRows) (the number of rows of the DataFrame), the match takes the first case (lines 926:927).
Go back to the constructor-like part of HashedRelation (its apply method):
Since the join key is a hashed int, the type is not Long, so the join uses UnsafeHashedRelation:
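The dispatch being described looks roughly like this (paraphrased from HashedRelation.apply on master; details differ across versions):

```scala
// Paraphrase, not standalone code: HashedRelationBroadcastMode.transform passes
// Some(numRows) down to HashedRelation.apply, which picks the implementation
// based on the join key type.
if (!input.hasNext) {
  EmptyHashedRelation
} else if (key.length == 1 && key.head.dataType == LongType) {
  LongHashedRelation(input, key, sizeEstimate, mm)   // single long key: cheaper map
} else {
  UnsafeHashedRelation(input, key, sizeEstimate, mm) // everything else
}
```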
On to UnsafeHashedRelation:
Now we go to the place in UnsafeHashedRelation that determines the estimated size; I found this:
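Paraphrased, the line in question is essentially just a delegation (may differ slightly by version):

```scala
// Paraphrase of UnsafeHashedRelation: the estimated size is whatever the
// underlying BytesToBytesMap reports as its memory consumption.
override def estimatedSize: Long = binaryMap.getTotalMemoryConsumption
```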
Focusing on the estimated size, our target is the binaryMap object (later in the code, map is assigned from binaryMap).
Then it goes here:
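The construction of binaryMap in UnsafeHashedRelation.apply looks roughly like this (paraphrased; the comments are mine):

```scala
// Paraphrase, not standalone code: capacity comes from a row-count estimate and
// pages are sized by the memory manager / spark.buffer.pageSize, so the map's
// footprint is tied to page allocation rather than to the raw input bytes.
val binaryMap = new BytesToBytesMap(
  taskMemoryManager,
  (sizeEstimate * 1.5 + 1).toInt, // extra slots to keep the hash table sparse
  pageSizeBytes)
```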
binaryMap is a BytesToBytesMap, which corresponds to https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
Jumping to the getTotalMemoryConsumption method (the one that produces estimatedSize), we get:
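That file is Java; a Scala-flavored paraphrase of getTotalMemoryConsumption is roughly:

```scala
// Paraphrase: the total is the memory reserved by all of the map's data pages
// plus the pointer array, i.e. allocated pages rather than the bytes actually
// occupied by the rows. That alone can push the number well past the raw size.
def getTotalMemoryConsumption: Long = {
  val pagesSize = dataPages.map(_.size()).sum
  pagesSize + (if (longArray != null) longArray.memoryBlock().size() else 0L)
}
```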
This is my current dead end. Just my two cents: I don't think it is a bug, just the estimated size of the join, and since it is an estimate it doesn't have to be very accurate (though, to be honest, the difference is strangely large in this case).
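Some back-of-envelope arithmetic with the numbers from the question at least shows the scale of the gap (rough figures, not measurements):

```scala
// The cached size and the broadcast "data size" measure very different things:
// the cache is compressed columnar storage, the hashed relation is UnsafeRows
// plus hash-map pages (and whatever slack the page allocation leaves).
val rows           = 36.4e6
val cachedBytes    = 140e6  // in-memory cached table
val broadcastBytes = 3.2e9  // dataSize on the BroadcastExchange node

println(f"cached:    ${cachedBytes / rows}%.1f bytes/row")    // ~3.8 bytes/row
println(f"broadcast: ${broadcastBytes / rows}%.1f bytes/row") // ~88 bytes/row
```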
In case you want to keep playing with the dataSize here, one approach is to directly influence the binaryMap object by changing the inputs to its constructor. Look back at this:
There are two knobs that can be configured: MEMORY_OFFHEAP_ENABLED and the buffer page size (BUFFER_PAGESIZE). Perhaps you can experiment with those two configurations during spark-submit. This is also the reason the BroadcastExchange size doesn't change even when you change the number of executors and cores.
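A minimal sketch of trying those two settings (the values are just examples for experimentation, not recommendations; the keys are the ones behind MEMORY_OFFHEAP_ENABLED and BUFFER_PAGESIZE):

```scala
import org.apache.spark.sql.SparkSession

// Equivalent to passing --conf flags to spark-submit; off-heap also needs a size.
val spark = SparkSession.builder()
  .appName("broadcast-datasize-experiment")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .config("spark.buffer.pageSize", "8m")
  .getOrCreate()
```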
So, in conclusion, I think this data size is an estimate produced by a fairly involved mechanism (I'm also waiting for someone with more expertise to explain it properly as I keep digging), not directly the size you mentioned in the first image (140 MB). As such, it's probably not worth spending much time trying to reduce this particular metric.
Some bonus related stuff:
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkPlan-BroadcastExchangeExec.html
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-UnsafeRow.html
I ran into an issue related to this.
The broadcast table is only 2.5 GB, but Spark raised an error saying the broadcast table hit 8 GB, so it couldn't broadcast the table. I think that is due to the estimated size.
Another thing is that my broadcast table reads two columns from one batch of files, and the overall size of those batch files is around 12 GB.
I'm trying to view our peak bytes received/sec count for an Event Hub in order to scale it properly. However, the portal shows vastly different results in the "daily" view versus the "hourly" view.
This is the graph using the "hourly" view:
From here, it looks like I'm peaking at around 2.5 MB/s. However, if I switch to the "daily" view, the numbers are vastly different:
So I can't make sense of this. It's the exact same counter, yet it shows vastly different results. Does anyone know if the Azure portal performs any "adding" or similar?
Edit: Note that the counter is "bytes received per second". It shouldn't matter whether I look at the hourly or daily view; the number of items per second shouldn't be affected by that (though it clearly is).
When I tried to monitor Cassandra_node with JMX, I had a problem.
In detail, I got a negative value from jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"].
In the Cassandra Wiki, the definition of this metric is:
Total disk space used (in bytes) for this node
Is it possible to get a negative value from this metric? And why?
Yes, it's possible. It depends a bit on the version, but there were some bugs, such as CASSANDRA-8205 and CASSANDRA-7239, around the load in particular. If it's operating as it should, though, this value will be accurate.
You can always drop down to the OS level and monitor it by running du on the data directory.
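If it helps to cross-check what your monitoring tool reports, a minimal sketch for reading the same attribute straight over JMX (assuming the default Cassandra JMX port 7199 on localhost; adjust for your node):

```scala
import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Read the Storage Load counter directly over JMX.
val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi")
val connector = JMXConnectorFactory.connect(url)
try {
  val mbsc = connector.getMBeanServerConnection
  val load = mbsc.getAttribute(
    new ObjectName("org.apache.cassandra.metrics:type=Storage,name=Load"), "Count")
  println(s"Storage Load (bytes): $load") // compare against `du` on the data directory
} finally {
  connector.close()
}
```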