Cassandra: high GC activity while the cluster seems to do nothing

I have shut down every web service that uses Cassandra.
I have also shut down every ETL job that uses Cassandra.
The last compaction of a domain-level table is from yesterday (2021-11-18T15:47:00.822). Since then, only compactions on system tables have occurred:
Compaction History:
id keyspace_name columnfamily_name compacted_at bytes_in bytes_out rows_merged
c0f4b1e0-4917-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-19T10:04:51.198 78314 19505 {1:12, 4:601}
5cd3e350-490f-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-19T09:04:47.237 115889 26314 {4:6}
9ba752d0-48fe-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-19T07:04:51.197 77987 19558 {1:12, 4:601}
3786d260-48f6-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-19T06:04:47.238 115994 26169 {4:6}
765a41e0-48e5-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-19T04:04:51.198 77853 19531 {1:8, 4:601}
12399a60-48dd-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-19T03:04:47.238 115978 26290 {4:6}
510cbbc0-48cc-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-19T01:04:51.196 78419 19595 {1:12, 4:601}
ecec1440-48c3-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-19T00:04:47.236 115838 26175 {4:6}
2bbf83c0-48b3-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-18T22:04:51.196 77380 19566 {1:12, 4:601}
c79edc40-48aa-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-18T21:04:47.236 116007 26208 {4:6}
06735d30-489a-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-18T19:04:51.203 76300 19101 {1:9, 2:3, 3:2, 4:599}
a2517d30-4891-11ec-bf5a-0d5dfeeee6e2 system size_estimates 2021-11-18T18:04:47.235 115858 26258 {4:6}
e3e5a870-4882-11ec-bf5a-0d5dfeeee6e2 system_distributed repair_history 2021-11-18T16:19:14.807 5220983 5232639 {1:49, 2:1, 3:2}
e10c5ba0-4880-11ec-bf5a-0d5dfeeee6e2 system sstable_activity 2021-11-18T16:04:51.034 75302 19166 {1:46, 2:33, 3:50, 4:549}
Still, the Cassandra cluster shows high garbage collector activity:
WARN [Service Thread] 2021-11-18 19:14:17,736 GCInspector.java:283 - ParNew GC in 1073ms. CMS Old Gen: 13461870544 -> 13461916520; Par Eden Space: 1716774456 -> 0; Par Survivor Space: 13116112 -> 57443048
WARN [Service Thread] 2021-11-18 19:14:19,116 GCInspector.java:283 - ParNew GC in 1070ms. CMS Old Gen: 13461916520 -> 13461979400; Par Eden Space: 1714728464 -> 0; Par Survivor Space: 57443048 -> 37282896
WARN [Service Thread] 2021-11-18 19:14:20,466 GCInspector.java:283 - ParNew GC in 1070ms. CMS Old Gen: 13461979400 -> 13462018112; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 37282896 -> 17129408
WARN [Service Thread] 2021-11-18 19:14:21,816 GCInspector.java:283 - ParNew GC in 1070ms. CMS Old Gen: 13462018112 -> 13462045144; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 17129408 -> 39569800
WARN [Service Thread] 2021-11-18 19:14:23,164 GCInspector.java:283 - ParNew GC in 1071ms. CMS Old Gen: 13462045144 -> 13462076376; Par Eden Space: 1717080600 -> 0; Par Survivor Space: 39569800 -> 26910864
WARN [Service Thread] 2021-11-18 19:14:24,524 GCInspector.java:283 - ParNew GC in 1071ms. CMS Old Gen: 13462076376 -> 13462113800; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 26910864 -> 36179936
WARN [Service Thread] 2021-11-18 19:14:25,869 GCInspector.java:283 - ParNew GC in 1069ms. CMS Old Gen: 13462113800 -> 13462137272; Par Eden Space: 1717733528 -> 0; Par Survivor Space: 36179936 -> 30547296
WARN [Service Thread] 2021-11-18 19:14:27,230 GCInspector.java:283 - ParNew GC in 1069ms. CMS Old Gen: 13462137272 -> 13462163256; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 30547296 -> 33604888
WARN [Service Thread] 2021-11-18 19:14:28,574 GCInspector.java:283 - ParNew GC in 1073ms. CMS Old Gen: 13462163256 -> 13462187040; Par Eden Space: 1715261960 -> 0; Par Survivor Space: 33604888 -> 28871272
WARN [Service Thread] 2021-11-18 19:14:29,946 GCInspector.java:283 - ParNew GC in 1069ms. CMS Old Gen: 13462187040 -> 13462216656; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 28871272 -> 37053656
WARN [Service Thread] 2021-11-18 19:14:31,328 GCInspector.java:283 - ParNew GC in 1070ms. CMS Old Gen: 13462216656 -> 13462237976; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 37053656 -> 23342920
WARN [Service Thread] 2021-11-18 19:14:32,743 GCInspector.java:283 - ParNew GC in 1071ms. CMS Old Gen: 13462237976 -> 13462278432; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 23342920 -> 21896200
WARN [Service Thread] 2021-11-18 19:14:34,206 GCInspector.java:283 - ParNew GC in 1071ms. CMS Old Gen: 13462278432 -> 13462343008; Par Eden Space: 1718091776 -> 0; Par Survivor Space: 21896200 -> 20168000
WARN [Service Thread] 2021-11-18 19:14:35,696 GCInspector.java:283 - ParNew GC in 1070ms. CMS Old Gen: 13462343008 -> 13462438104; Par Eden Space: 1717981344 -> 0; Par Survivor Space: 20168000 -> 29781856
WARN [Service Thread] 2021-11-18 19:14:37,115 GCInspector.java:283 - ParNew GC in 1072ms. CMS Old Gen: 13462438104 -> 13462532752; Par Eden Space: 1717180224 -> 0; Par Survivor Space: 29781856 -> 15873392
...
WARN [Service Thread] 2021-11-19 10:34:10,753 GCInspector.java:283 - ParNew GC in 1081ms. CMS Old Gen: 21366236160 -> 22047866248; Par Eden Space: 1692018856 -> 0;
WARN [Service Thread] 2021-11-19 10:34:11,961 GCInspector.java:283 - ParNew GC in 1080ms. CMS Old Gen: 22047866248 -> 22711292400; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:13,190 GCInspector.java:283 - ParNew GC in 1082ms. CMS Old Gen: 22711292400 -> 23322328920; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:14,414 GCInspector.java:283 - ParNew GC in 1076ms. CMS Old Gen: 23322328920 -> 23938244632; Par Eden Space: 1710429576 -> 0;
WARN [Service Thread] 2021-11-19 10:34:15,628 GCInspector.java:283 - ParNew GC in 1083ms. CMS Old Gen: 23938244632 -> 24531937352; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:17,014 GCInspector.java:283 - ParNew GC in 1079ms. CMS Old Gen: 24531937352 -> 25077213400; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:18,219 GCInspector.java:283 - ParNew GC in 1082ms. CMS Old Gen: 25077213400 -> 25634088464; Par Eden Space: 1689565160 -> 0;
WARN [Service Thread] 2021-11-19 10:34:19,423 GCInspector.java:283 - ParNew GC in 1085ms. CMS Old Gen: 25634088464 -> 26549529728; Par Eden Space: 1714413672 -> 0;
WARN [Service Thread] 2021-11-19 10:34:20,656 GCInspector.java:283 - ParNew GC in 1088ms. CMS Old Gen: 26549529728 -> 27291610392; Par Eden Space: 1707391776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:21,951 GCInspector.java:283 - ParNew GC in 1080ms. CMS Old Gen: 27290538440 -> 27875777144; Par Eden Space: 1718054488 -> 0;
WARN [Service Thread] 2021-11-19 10:34:23,171 GCInspector.java:283 - ParNew GC in 1082ms. CMS Old Gen: 27788203256 -> 28539500224; Par Eden Space: 1717476200 -> 0;
WARN [Service Thread] 2021-11-19 10:34:24,404 GCInspector.java:283 - ParNew GC in 1084ms. CMS Old Gen: 28313984168 -> 28943208880; Par Eden Space: 1690698568 -> 0;
WARN [Service Thread] 2021-11-19 10:34:25,674 GCInspector.java:283 - ParNew GC in 1079ms. CMS Old Gen: 28649641192 -> 29197701416; Par Eden Space: 1667998792 -> 0;
WARN [Service Thread] 2021-11-19 10:34:26,911 GCInspector.java:283 - ParNew GC in 1075ms. CMS Old Gen: 28973128960 -> 29454364992; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:28,137 GCInspector.java:283 - ParNew GC in 1079ms. CMS Old Gen: 29252627776 -> 29846619728; Par Eden Space: 1718091776 -> 0;
WARN [Service Thread] 2021-11-19 10:34:29,345 GCInspector.java:283 - ParNew GC in 1083ms. CMS Old Gen: 28703301152 -> 29313662360; Par Eden Space: 1684884992 -> 0;
How is this possible?
Thank you

A contractor of ours had set the GC to CMS even though the heap was larger than 32 GB. That is why we saw these messages and the long GC pauses.
Switching the GC to G1GC solved the issue.
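For reference, a minimal sketch of the change, assuming the GC flags live in conf/jvm.options (the exact file name and defaults vary by Cassandra version; the G1 values shown are the ones commonly shipped as the commented-out G1 section):
### CMS settings to remove or comment out
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:CMSInitiatingOccupancyFraction=75
### G1 settings
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500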

Related

Java eden space is not 8 times larger than s0 space

According to Oracle's documentation, the default value for SurvivorRatio is 8, which means each survivor space should be one-eighth the size of the eden space.
But in my application that doesn't hold:
$ jmap -heap 48865
Attaching to process ID 48865, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.45-b02
using thread-local object allocation.
Parallel GC with 8 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 4294967296 (4096.0MB)
NewSize = 89128960 (85.0MB)
MaxNewSize = 1431306240 (1365.0MB)
OldSize = 179306496 (171.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 67108864 (64.0MB)
used = 64519920 (61.53099060058594MB)
free = 2588944 (2.4690093994140625MB)
96.14217281341553% used
From Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
To Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
PS Old Generation
capacity = 179306496 (171.0MB)
used = 0 (0.0MB)
free = 179306496 (171.0MB)
0.0% used
7552 interned Strings occupying 605288 bytes.
But in VisualVM the eden space is 1.332 GB and S0 is 455 MB, so eden is only about 3 times larger than S0, not 8.
You have neither disabled adaptive sizing (-XX:-UseAdaptiveSizePolicy) nor set -Xms equal to -Xmx, so the JVM is free to resize the heap generations (and the survivor spaces) at runtime. In that case the estimated maximum survivor size is
MaxSurvivor = NewGen / MinSurvivorRatio
where -XX:MinSurvivorRatio=3 by default. With MaxNewSize = 1365 MB this gives 1365 / 3 = 455 MB, which matches the S0 size VisualVM reports. Note: this is an estimated maximum, not the actual size.
See also this answer.
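A minimal sketch of pinning the sizes, assuming you actually want the fixed 8:1 ratio from the documentation (heap values mirror the jmap output above and are otherwise illustrative):
java -Xms4g -Xmx4g -Xmn1365m -XX:SurvivorRatio=8 -XX:-UseAdaptiveSizePolicy ...
With the heap and young generation fixed and adaptive sizing disabled, the Parallel collector should keep eden at eight times each survivor space.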

I am suffering a Java G1 issue

Has anyone encountered this kind of issue with the Java G1 GC?
In the first highlighted log entry the user time is about 4 seconds,
but in the second one the user time is 0 and the system time is about 4 seconds.
With G1 the system time shouldn't be that high; is this a bug in G1?
Below are my GC arguments:
-Xms200g -Xmx200g -Xmn30g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -verbose:gc -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
2018-01-07T04:54:39.995+0800: 906650.864: [GC (Allocation Failure) 2018-01-07T04:54:39.996+0800: 906650.865: [ParNew
Desired survivor size 1610612736 bytes, new threshold 6 (max 6)
- age 1: 69747632 bytes, 69747632 total
- age 2: 9641544 bytes, 79389176 total
- age 3: 10522192 bytes, 89911368 total
- age 4: 11732392 bytes, 101643760 total
- age 5: 9158960 bytes, 110802720 total
- age 6: 10917528 bytes, 121720248 total
: 25341731K->170431K(28311552K), 0.2088528 secs] 153045380K->127882325K(206569472K), 0.2094236 secs] [Times: **user=4.53 sys=0.00, real=0.21 secs]**
Heap after GC invocations=32432 (full 10):
par new generation total 28311552K, used 170431K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 0% used [0x00007f6058000000, 0x00007f6058000000, 0x00007f6658000000)
from space 3145728K, 5% used [0x00007f6658000000, 0x00007f666266ffe0, 0x00007f6718000000)
to space 3145728K, 0% used [0x00007f6718000000, 0x00007f6718000000, 0x00007f67d8000000)
concurrent mark-sweep generation total 178257920K, used 127711893K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
}
2018-01-07T04:54:40.205+0800: 906651.074: Total time for which application threads were stopped: 0.2269738 seconds, Stopping threads took: 0.0001692 seconds
{Heap before GC invocations=32432 (full 10):
par new generation total 28311552K, used 25336255K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 100% used [0x00007f6058000000, 0x00007f6658000000, 0x00007f6658000000)
from space 3145728K, 5% used [0x00007f6658000000, 0x00007f666266ffe0, 0x00007f6718000000)
to space 3145728K, 0% used [0x00007f6718000000, 0x00007f6718000000, 0x00007f67d8000000)
concurrent mark-sweep generation total 178257920K, used 127711893K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
2018-01-07T04:55:02.541+0800: 906673.411: [GC (Allocation Failure) 2018-01-07T04:55:02.542+0800: 906673.411: [ParNew
Desired survivor size 1610612736 bytes, new threshold 6 (max 6)
- age 1: 93841912 bytes, 93841912 total
- age 2: 11310104 bytes, 105152016 total
- age 3: 8967160 bytes, 114119176 total
- age 4: 10278920 bytes, 124398096 total
- age 5: 11626160 bytes, 136024256 total
- age 6: 9077432 bytes, 145101688 total
: 25336255K->195827K(28311552K), 0.1926783 secs] 153048149K->127918291K(206569472K), 0.1932366 secs] [Times: **user=0.00 sys=4.07, real=0.20 secs]**
Heap after GC invocations=32433 (full 10):
par new generation total 28311552K, used 195827K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 0% used [0x00007f6058000000, 0x00007f6058000000, 0x00007f6658000000)
from space 3145728K, 6% used [0x00007f6718000000, 0x00007f6723f3cf38, 0x00007f67d8000000)
to space 3145728K, 0% used [0x00007f6658000000, 0x00007f6658000000, 0x00007f6718000000)
concurrent mark-sweep generation total 178257920K, used 127722463K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
}
2018-01-07T04:55:02.735+0800: 906673.604: Total time for which application threads were stopped: 0.2149603 seconds, Stopping threads took: 0.0002262 seconds
2018-01-07T04:55:14.673+0800: 906685.542: Total time for which application threads were stopped: 0.0183883 seconds, Stopping threads took: 0.0002046 seconds
2018-01-07T04:55:14.797+0800: 906685.666: Total time for which application threads were stopped: 0.0135349 seconds, Stopping threads took: 0.0002472 seconds
2018-01-07T04:55:14.810+0800: 906685.679: Total time for which application threads were stopped: 0.0129019 seconds, Stopping threads took: 0.0001014 seconds
2018-01-07T04:55:14.823+0800: 906685.692: Total time for which application threads were stopped: 0.0125939 seconds, Stopping threads took: 0.0002915 seconds
2018-01-07T04:55:21.597+0800: 906692.466: Total time for which application threads were stopped: 0.0137018 seconds, Stopping threads took: 0.0001683 seconds
{Heap before GC invocations=32433 (full 10):
Your command line specifies -XX:+UseConcMarkSweepGC (together with -XX:+UseParNewGC), so you are running CMS, not G1 - this isn't a G1 issue.
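If the goal really is to run G1, the ParNew/CMS options would be dropped in favour of something like the following (illustrative only; -Xmn and the CMS* flags don't apply to G1, and G1 normally sizes the young generation itself):
-Xms200g -Xmx200g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime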

Cassandra nodes going down on trying to query

Cassandra nodes go down and the query fails with a consistency error.
INFO [Service Thread] 2017-07-10 02:49:18,159 GCInspector.java:258 - ConcurrentMarkSweep GC in 6330ms. CMS Old Gen: 2908389776 -> 2987845256; Par Eden Space: 671088640 -> 0;
INFO [Service Thread] 2017-07-10 02:49:27,138 GCInspector.java:258 - ConcurrentMarkSweep GC in 8897ms. CMS Old Gen: 2987845256 -> 3324514112; Par Eden Space: 671088640 -> 0;
INFO [Service Thread] 2017-07-10 02:49:34,948 GCInspector.java:258 - ConcurrentMarkSweep GC in 7667ms. CMS Old Gen: 3324514112 -> 3342860256; Par Eden Space: 671088640 -> 277520992;
INFO [Service Thread] 2017-07-10 02:49:45,485 GCInspector.java:258 - ConcurrentMarkSweep GC in 9951ms. CMS Old Gen: 3342860256 -> 3342860216; Par Eden Space: 671088640 -> 671088632; Par Survivor Space: 83886072 -> 21614264
INFO [Service Thread] 2017-07-10 02:49:54,541 GCInspector.java:258 - ConcurrentMarkSweep GC in 8684ms. CMS Old Gen: 3342860264 -> 3342860232; Par Eden Space: 671088632 -> 671088616; Par Survivor Space: 83886064 -> 72300944
Garbage Collection seems to be taking a long time.
What could be causing it? How can I fix the problem?

Spark: GraphX API OOM errors after unpersisting useless RDDs

I have hit an Out Of Memory error for unknown reasons. I release the useless RDDs immediately, but after several rounds of the loop the OOM error still occurs. My code is as follows:
// single source shortest path
def sssp[VD](graph:Graph[VD,Double], source: VertexId): Graph[Double, Double] = {
graph.mapVertices((id, _) => if (id == source) 0.0 else Double.PositiveInfinity)
.pregel(Double.PositiveInfinity)(
(id, dist, newDist) => scala.math.min(dist, newDist),
triplet => {
if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
}
else {
Iterator.empty
}
},
(a, b) => math.min(a, b)
)
}
def selectCandidate(candidates: RDD[(VertexId, (Double, Double))]): VertexId = {
Random.setSeed(System.nanoTime())
val selectLow = Random.nextBoolean()
val (vid, (_, _)) = if (selectLow) {
println("Select lowest bound")
candidates.reduce((x, y) => if (x._2._1 < y._2._1) x else y)
} else {
println("Select highest bound")
candidates.reduce((x, y) => if (x._2._2 > y._2._2) x else y)
}
vid
}
val g = {/* load graph from hdfs*/}.partitionBy(EdgePartition2D,eParts).cache
println("Vertices Size: " + g.vertices.count )
println("Edges Size: " + g.edges.count )
val resultDiameter = {
val diff = 0d
val maxIterations = 100
val filterJoin = 1e5
val vParts = 100
var deltaHigh = Double.PositiveInfinity
var deltaLow = Double.NegativeInfinity
var candidates = g.vertices.map(x => (x._1, (Double.NegativeInfinity,
Double.PositiveInfinity)))
.partitionBy(new HashPartitioner(vParts))
.persist(StorageLevel.MEMORY_AND_DISK) // (vid, low, high)
var round = 0
var candidateCount = candidates.count
while (deltaHigh - deltaLow > diff && candidateCount > 0 && round <= maxIterations) {
val currentVertex = dia.selectCandidate(candidates)
val dist: RDD[(VertexId, Double)] = dia.sssp(g, currentVertex)
.vertices
.partitionBy(new HashPartitioner(vParts)) // join more efficiently
.persist(StorageLevel.MEMORY_AND_DISK)
val eccentricity = dist.map({ case (vid, length) => length }).max
println("Eccentricity = %.1f".format(eccentricity))
val subDist = if(candidateCount > filterJoin) {
println("Directly use Dist")
dist
} else { // when candidates is small than filterJoin, filter the useless vertices
println("Filter Dist")
val candidatesMap = candidates.sparkContext.broadcast(
candidates.collect.toMap)
val subDist = dist.filter({case (vid, length) =>
candidatesMap.value.contains(vid)})
.persist(StorageLevel.MEMORY_AND_DISK)
println("Sub Dist Count: " + subDist.count)
subDist
}
var previousCandidates = candidates
candidates = candidates.join(subDist).map({ case (vid, ((low, high), d)) =>
(vid,
(Array(low, eccentricity - d, d).max,
Array(high, eccentricity + d).min))
}).persist(StorageLevel.MEMORY_AND_DISK)
candidateCount = candidates.count
println("Candidates Count 1 : " + candidateCount)
previousCandidates.unpersist(true) // release useless rdd
dist.unpersist(true) // release useless rdd
deltaLow = Array(deltaLow,
candidates.map({ case (_, (low, _)) => low }).max).max
deltaHigh = Array(deltaHigh, 2 * eccentricity,
candidates.map({ case (_, (_, high)) => high }).max).min
previousCandidates = candidates
candidates = candidates.filter({ case (_, (low, high)) =>
!((high <= deltaLow && low >= deltaHigh / 2d) || low == high)
})
.partitionBy(new HashPartitioner(vParts)) // join more efficiently
.persist(StorageLevel.MEMORY_AND_DISK)
candidateCount = candidates.count
println("Candidates Count 2:" + candidateCount)
previousCandidates.unpersist(true) // release useless rdd
round += 1
println(s"Round=${round},Low=${deltaLow}, High=${deltaHigh}, Candidates=${candidateCount}")
}
deltaLow
}
println(s"Diameter $resultDiameter")
println("Complete!")
The main data in the while block are a graph object g and an RDD candidates. g is used to compute a single-source shortest path in each round, and the graph structure does not change. The size of candidates decreases round by round.
In each round I manually unpersist the useless RDDs in blocking mode, so I would expect there to be enough memory for the following operations. However, the job fails with OOM in round 6 or 7, seemingly at random. By the time the program reaches round 6 or 7, candidates has shrunk considerably, to about 10% or less of its original size. A sample of the output follows; the candidates size decreases from 15,288,624 in round 1 to 67,451 in round 7:
Vertices Size: 15,288,624
Edges Size: 228,097,574
Select lowest bound
Eccentricity = 12.0
Directly use Dist
Candidates Count 1 : 15288624
Candidates Count 2:15288623
Round=1,Low=12.0, High=24.0, Candidates=15288623
Select lowest bound
Eccentricity = 13.0
Directly use Dist
Candidates Count 1 : 15288623
Candidates Count 2:15288622
Round=2,Low=13.0, High=24.0, Candidates=15288622
Select highest bound
Eccentricity = 18.0
Directly use Dist
Candidates Count 1 : 15288622
Candidates Count 2:6578370
Round=3,Low=18.0, High=23.0, Candidates=6578370
Select lowest bound
Eccentricity = 12.0
Directly use Dist
Candidates Count 1 : 6578370
Candidates Count 2:6504563
Round=4,Low=18.0, High=23.0, Candidates=6504563
Select lowest bound
Eccentricity = 11.0
Directly use Dist
Candidates Count 1 : 6504563
Candidates Count 2:412789
Round=5,Low=18.0, High=22.0, Candidates=412789
Select highest bound
Eccentricity = 17.0
Directly use Dist
Candidates Count 1 : 412789
Candidates Count 2:288670
Round=6,Low=18.0, High=22.0, Candidates=288670
Select highest bound
Eccentricity = 18.0
Directly use Dist
Candidates Count 1 : 288670
Candidates Count 2:67451
Round=7,Low=18.0, High=22.0, Candidates=67451
The tail end of the spark.info log:
16/12/12 14:03:09 WARN YarnAllocator: Expected to find pending requests, but found none.
16/12/12 14:06:21 INFO YarnAllocator: Canceling requests for 0 executor containers
16/12/12 14:06:33 WARN YarnAllocator: Expected to find pending requests, but found none.
16/12/12 14:14:26 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
16/12/12 14:18:14 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:123)
at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:218)
at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:260)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:347)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:744)
16/12/12 14:18:14 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-552217672-100.76.16.204-1470826698239:blk_1377987137_304302272
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)
16/12/12 14:14:39 WARN AbstractConnector:
java.lang.OutOfMemoryError: Java heap space
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
at org.spark-project.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:109)
at org.spark-project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:938)
at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
16/12/12 14:20:06 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
16/12/12 14:19:38 WARN DFSClient: Error Recovery for block BP-552217672-100.76.16.204-1470826698239:blk_1377987137_304302272 in pipeline 100.76.15.28:9003, 100.76.48.218:9003, 100.76.48.199:9003: bad datanode 100.76.15.28:9003
16/12/12 14:18:58 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/12/12 14:20:49 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-198] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
16/12/12 14:20:49 INFO SparkContext: Invoking stop() from shutdown hook
16/12/12 14:20:49 INFO ContextCleaner: Cleaned shuffle 446
16/12/12 14:20:49 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveRdd(2567)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at scala.concurrent.Future$class.recover(Future.scala:324)
at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:376)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:104)
at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1630)
at org.apache.spark.ContextCleaner.doCleanupRDD(ContextCleaner.scala:208)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:185)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:180)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:180)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:173)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:68)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:364)
... 12 more
16/12/12 14:20:49 WARN QueuedThreadPool: 5 threads could not be stopped
16/12/12 14:20:49 INFO SparkUI: Stopped Spark web UI at http://10.215.154.152:56338
16/12/12 14:20:49 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/12/12 14:20:49 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/12/12 14:21:04 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveRdd(2567)] in 2 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
The tail end of the gc.log:
2016-12-12T14:10:43.541+0800: 16832.953: [Full GC 2971008K->2971007K(2971008K), 11.4284920 secs]
2016-12-12T14:10:54.990+0800: 16844.403: [Full GC 2971007K->2971007K(2971008K), 11.4479110 secs]
2016-12-12T14:11:06.457+0800: 16855.870: [GC 2971007K(2971008K), 0.6827710 secs]
2016-12-12T14:11:08.825+0800: 16858.237: [Full GC 2971007K->2971007K(2971008K), 11.5480350 secs]
2016-12-12T14:11:20.384+0800: 16869.796: [Full GC 2971007K->2971007K(2971008K), 11.0481490 secs]
2016-12-12T14:11:31.442+0800: 16880.855: [Full GC 2971007K->2971007K(2971008K), 11.0184790 secs]
2016-12-12T14:11:42.472+0800: 16891.884: [Full GC 2971008K->2971008K(2971008K), 11.3124900 secs]
2016-12-12T14:11:53.795+0800: 16903.207: [Full GC 2971008K->2971008K(2971008K), 10.9517160 secs]
2016-12-12T14:12:04.760+0800: 16914.172: [Full GC 2971008K->2971007K(2971008K), 11.0969500 secs]
2016-12-12T14:12:15.868+0800: 16925.281: [Full GC 2971008K->2971008K(2971008K), 11.1244090 secs]
2016-12-12T14:12:27.003+0800: 16936.416: [Full GC 2971008K->2971008K(2971008K), 11.0206800 secs]
2016-12-12T14:12:38.035+0800: 16947.448: [Full GC 2971008K->2971008K(2971008K), 11.0024270 secs]
2016-12-12T14:12:49.048+0800: 16958.461: [Full GC 2971008K->2971008K(2971008K), 10.9831440 secs]
2016-12-12T14:13:00.042+0800: 16969.454: [GC 2971008K(2971008K), 0.7338780 secs]
2016-12-12T14:13:02.496+0800: 16971.908: [Full GC 2971008K->2971007K(2971008K), 11.1536860 secs]
2016-12-12T14:13:13.661+0800: 16983.074: [Full GC 2971007K->2971007K(2971008K), 10.9956150 secs]
2016-12-12T14:13:24.667+0800: 16994.080: [Full GC 2971007K->2971007K(2971008K), 11.0139660 secs]
2016-12-12T14:13:35.691+0800: 17005.104: [GC 2971007K(2971008K), 0.6693770 secs]
2016-12-12T14:13:38.115+0800: 17007.527: [Full GC 2971007K->2971006K(2971008K), 11.0514040 secs]
2016-12-12T14:13:49.178+0800: 17018.590: [Full GC 2971007K->2971007K(2971008K), 10.8881160 secs]
2016-12-12T14:14:00.076+0800: 17029.489: [GC 2971007K(2971008K), 0.7046370 secs]
2016-12-12T14:14:02.498+0800: 17031.910: [Full GC 2971007K->2971007K(2971008K), 11.3424300 secs]
2016-12-12T14:14:13.862+0800: 17043.274: [Full GC 2971008K->2971006K(2971008K), 11.6215890 secs]
2016-12-12T14:14:25.503+0800: 17054.915: [GC 2971006K(2971008K), 0.7196840 secs]
2016-12-12T14:14:27.857+0800: 17057.270: [Full GC 2971008K->2971007K(2971008K), 11.3879990 secs]
2016-12-12T14:14:39.266+0800: 17068.678: [Full GC 2971007K->2971007K(2971008K), 11.1611420 secs]
2016-12-12T14:14:50.446+0800: 17079.859: [GC 2971007K(2971008K), 0.6976180 secs]
2016-12-12T14:14:52.782+0800: 17082.195: [Full GC 2971007K->2971007K(2971008K), 11.4318900 secs]
2016-12-12T14:15:04.235+0800: 17093.648: [Full GC 2971007K->2971007K(2971008K), 11.3429010 secs]
2016-12-12T14:15:15.598+0800: 17105.010: [GC 2971007K(2971008K), 0.6832320 secs]
2016-12-12T14:15:17.930+0800: 17107.343: [Full GC 2971008K->2971007K(2971008K), 11.1898520 secs]
2016-12-12T14:15:29.131+0800: 17118.544: [Full GC 2971007K->2971007K(2971008K), 10.9680150 secs]
2016-12-12T14:15:40.110+0800: 17129.522: [GC 2971007K(2971008K), 0.7444890 secs]
2016-12-12T14:15:42.508+0800: 17131.920: [Full GC 2971007K->2971007K(2971008K), 11.3052160 secs]
2016-12-12T14:15:53.824+0800: 17143.237: [Full GC 2971007K->2971007K(2971008K), 10.9484100 secs]
2016-12-12T14:16:04.783+0800: 17154.196: [Full GC 2971007K->2971007K(2971008K), 10.9543950 secs]
2016-12-12T14:16:15.748+0800: 17165.160: [GC 2971007K(2971008K), 0.7066150 secs]
2016-12-12T14:16:18.176+0800: 17167.588: [Full GC 2971007K->2971007K(2971008K), 11.1201370 secs]
2016-12-12T14:16:29.307+0800: 17178.719: [Full GC 2971007K->2971007K(2971008K), 11.0746950 secs]
2016-12-12T14:16:40.392+0800: 17189.805: [Full GC 2971007K->2971007K(2971008K), 11.0036170 secs]
2016-12-12T14:16:51.407+0800: 17200.819: [Full GC 2971007K->2971007K(2971008K), 10.9655670 secs]
2016-12-12T14:17:02.383+0800: 17211.796: [Full GC 2971007K->2971007K(2971008K), 10.7348560 secs]
2016-12-12T14:17:13.128+0800: 17222.540: [GC 2971007K(2971008K), 0.6679470 secs]
2016-12-12T14:17:15.450+0800: 17224.862: [Full GC 2971007K->2971007K(2971008K), 10.6219270 secs]
2016-12-12T14:17:26.081+0800: 17235.494: [Full GC 2971007K->2971007K(2971008K), 10.9158450 secs]
2016-12-12T14:17:37.016+0800: 17246.428: [Full GC 2971007K->2971007K(2971008K), 11.3107490 secs]
2016-12-12T14:17:48.337+0800: 17257.750: [Full GC 2971007K->2971007K(2971008K), 11.0769460 secs]
2016-12-12T14:17:59.424+0800: 17268.836: [GC 2971007K(2971008K), 0.6707600 secs]
2016-12-12T14:18:01.850+0800: 17271.262: [Full GC 2971007K->2970782K(2971008K), 12.6348300 secs]
2016-12-12T14:18:14.496+0800: 17283.909: [GC 2970941K(2971008K), 0.7525790 secs]
2016-12-12T14:18:16.890+0800: 17286.303: [Full GC 2971006K->2970786K(2971008K), 13.1047470 secs]
2016-12-12T14:18:30.008+0800: 17299.421: [GC 2970836K(2971008K), 0.8139710 secs]
2016-12-12T14:18:32.458+0800: 17301.870: [Full GC 2971005K->2970873K(2971008K), 13.0410540 secs]
2016-12-12T14:18:45.512+0800: 17314.925: [Full GC 2971007K->2970893K(2971008K), 12.7169690 secs]
2016-12-12T14:18:58.239+0800: 17327.652: [GC 2970910K(2971008K), 0.7314350 secs]
2016-12-12T14:19:00.557+0800: 17329.969: [Full GC 2971008K->2970883K(2971008K), 11.1889000 secs]
2016-12-12T14:19:11.767+0800: 17341.180: [Full GC 2971006K->2970940K(2971008K), 11.4069700 secs]
2016-12-12T14:19:23.185+0800: 17352.597: [GC 2970950K(2971008K), 0.6689360 secs]
2016-12-12T14:19:25.484+0800: 17354.896: [Full GC 2971007K->2970913K(2971008K), 12.6980050 secs]
2016-12-12T14:19:38.194+0800: 17367.607: [Full GC 2971004K->2970902K(2971008K), 12.7641130 secs]
2016-12-12T14:19:50.968+0800: 17380.380: [GC 2970921K(2971008K), 0.6966130 secs]
2016-12-12T14:19:53.266+0800: 17382.678: [Full GC 2971007K->2970875K(2971008K), 12.9416660 secs]
2016-12-12T14:20:06.233+0800: 17395.645: [Full GC 2971007K->2970867K(2971008K), 13.2740780 secs]
2016-12-12T14:20:19.527+0800: 17408.939: [GC 2970881K(2971008K), 0.7696770 secs]
2016-12-12T14:20:22.024+0800: 17411.436: [Full GC 2971007K->2970886K(2971008K), 13.8729770 secs]
2016-12-12T14:20:35.919+0800: 17425.331: [Full GC 2971002K->2915146K(2971008K), 12.8270160 secs]
2016-12-12T14:20:48.762+0800: 17438.175: [GC 2915155K(2971008K), 0.6856650 secs]
2016-12-12T14:20:51.271+0800: 17440.684: [Full GC 2971007K->2915307K(2971008K), 12.4895750 secs]
2016-12-12T14:21:03.771+0800: 17453.184: [GC 2915320K(2971008K), 0.6249910 secs]
2016-12-12T14:21:06.377+0800: 17455.789: [Full GC 2971007K->2914274K(2971008K), 12.6835220 secs]
2016-12-12T14:21:19.129+0800: 17468.541: [GC 2917963K(2971008K), 0.6917090 secs]
2016-12-12T14:21:21.526+0800: 17470.938: [Full GC 2971007K->2913949K(2971008K), 13.0442320 secs]
2016-12-12T14:21:36.588+0800: 17486.000: [GC 2936827K(2971008K), 0.7244690 secs]
So the logs suggest there might be a memory leak, and it could be in one of two places:
1) my code, or 2) the code in the Spark GraphX API.
Can anyone help me find the cause if it is in my code?
I don't think the unpersist() API is causing the out-of-memory error. The OutOfMemoryError is caused by the collect() call, because collect() (an action, unlike a transformation) fetches the entire RDD to a single driver machine (a join-based alternative is sketched below).
A few suggestions:
Increasing the driver memory is one partial solution, which you have already implemented.
If you are on JDK 8, use the G1GC collector to manage large heaps.
You can experiment with storage levels (MEMORY_AND_DISK, OFF_HEAP, etc.) to fine-tune them for your application.
Have a look at this official documentation guide for more details.
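If the broadcast of candidates.collect.toMap is indeed the culprit, here is a minimal sketch of a join-based alternative (names mirror the question's code; this is an untested sketch, not a drop-in fix):
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Keep the filtering distributed: an inner join keeps only the vertex ids
// still present in candidates, so nothing is collected to the driver.
def filterDist(dist: RDD[(VertexId, Double)],
               candidates: RDD[(VertexId, (Double, Double))]): RDD[(VertexId, Double)] =
  dist.join(candidates)                   // only ids present in both RDDs survive
    .mapValues { case (d, _) => d }       // keep the distance, drop the (low, high) bounds
    .persist(StorageLevel.MEMORY_AND_DISK)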
I haven't solved the problem completely, but I have fixed it in part:
Increase the driver memory. I mentioned above that the job stopped in round 6 or 7, but when I doubled the driver memory it ran until round 14, so a driver-memory OOM might be one cause.
Save the candidates RDD to HDFS and continue the process in a later run, so the earlier computation is not wasted.
Serialize the candidates RDD with Kryo. It costs some computation for encoding and decoding, but saves a great amount of memory.
These are not a perfect solution, but they do work in my case; I hope someone can offer a better one. A sketch of the last two points follows.
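A rough sketch of those two points, with a placeholder HDFS path (Kryo itself is enabled by setting spark.serializer=org.apache.spark.serializer.KryoSerializer when submitting the job):
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Persist candidates in serialized form (smaller heap footprint, some CPU cost)
// and write a copy to HDFS so a later run can resume instead of recomputing.
// Intended to replace the plain MEMORY_AND_DISK persist inside the loop.
def saveRound(candidates: RDD[(VertexId, (Double, Double))],
              round: Int): RDD[(VertexId, (Double, Double))] = {
  val persisted = candidates.persist(StorageLevel.MEMORY_AND_DISK_SER)
  persisted.saveAsObjectFile(s"hdfs:///tmp/diameter/candidates_round_$round")  // placeholder path
  persisted
}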

Haskell space leak in hash table insertion

I have been coding a histogram and I have had some great help on here. I am using a hash table to store the keys and frequency values because the distribution of the keys is unknown; they might not be sorted or consecutive.
The problem with my code is that it spends too much time in GC, which looks like a space leak: the time spent in GC is 60.3%, so my productivity is a poor 39.7%.
What is going wrong? I have tried to make things strict in the histogram function and I have also inlined it (GC time went from 69.1% to 59.4%).
Please note I have simplified this code by not updating the frequencies in the HT.
{-# LANGUAGE BangPatterns #-}
import qualified Data.HashTable.IO as H
import qualified Data.Vector as V

type HashTable k v = H.BasicHashTable k v

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
  where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec !n = do
    ht <- H.newSized n
    go ht (n-1)
  where
    go ht = go'
      where
        go' (-1) = return ht
        go' !i = do
            let (k,v) = vec V.! i
            H.insert ht k v
            go' (i-1)
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"
Here's how it is compiled:
ghc --make -O3 -fllvm -rtsopts histogram.hs
Diagnosis:
jap#devbox:~/dev$ ./histogram +RTS -sstderr
done
863,187,472 bytes allocated in the heap
708,960,048 bytes copied during GC
410,476,592 bytes maximum residency (5 sample(s))
4,791,736 bytes maximum slop
613 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1284 colls, 0 par 0.46s 0.46s 0.0004s 0.0322s
Gen 1 5 colls, 0 par 0.36s 0.36s 0.0730s 0.2053s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.51s ( 0.50s elapsed)
GC time 0.82s ( 0.82s elapsed)
EXIT time 0.03s ( 0.04s elapsed)
Total time 1.36s ( 1.36s elapsed)
%GC time 60.3% (60.4% elapsed)
Alloc rate 1,708,131,822 bytes per MUT second
Productivity 39.7% of total user, 39.7% of total elapsed
For the sake of comparison, this is what I get running your code as posted:
863,187,472 bytes allocated in the heap
708,960,048 bytes copied during GC
410,476,592 bytes maximum residency (5 sample(s))
4,791,736 bytes maximum slop
613 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1284 colls, 0 par 1.01s 1.01s 0.0008s 0.0766s
Gen 1 5 colls, 0 par 0.81s 0.81s 0.1626s 0.4783s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.04s ( 1.04s elapsed)
GC time 1.82s ( 1.82s elapsed)
EXIT time 0.04s ( 0.04s elapsed)
Total time 2.91s ( 2.91s elapsed)
%GC time 62.6% (62.6% elapsed)
Alloc rate 827,493,210 bytes per MUT second
Productivity 37.4% of total user, 37.4% of total elapsed
Given that your vector elements are just (Int, Int) tuples, we have no reason not to use Data.Vector.Unboxed instead of plain Data.Vector. That already leads to significant improvement:
743,148,592 bytes allocated in the heap
38,440 bytes copied during GC
231,096,768 bytes maximum residency (4 sample(s))
4,759,104 bytes maximum slop
226 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 977 colls, 0 par 0.23s 0.23s 0.0002s 0.0479s
Gen 1 4 colls, 0 par 0.22s 0.22s 0.0543s 0.1080s
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.04s ( 1.04s elapsed)
GC time 0.45s ( 0.45s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 1.49s ( 1.49s elapsed)
%GC time 30.2% (30.2% elapsed)
Alloc rate 715,050,070 bytes per MUT second
Productivity 69.8% of total user, 69.9% of total elapsed
Next, instead of hand-rolling recursion over the vector, we might use the optimised functions the vector library provides for that purpose. Code...
import qualified Data.HashTable.IO as H
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
  where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec n = do
    ht <- H.newSized n
    V.mapM_ (\(k, v) -> H.insert ht k v) vec
    return ht
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"
... and result:
583,151,048 bytes allocated in the heap
35,632 bytes copied during GC
151,096,672 bytes maximum residency (3 sample(s))
3,003,040 bytes maximum slop
148 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 826 colls, 0 par 0.20s 0.20s 0.0002s 0.0423s
Gen 1 3 colls, 0 par 0.12s 0.12s 0.0411s 0.1222s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.92s ( 0.92s elapsed)
GC time 0.32s ( 0.33s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 1.25s ( 1.25s elapsed)
%GC time 25.9% (26.0% elapsed)
Alloc rate 631,677,209 bytes per MUT second
Productivity 74.1% of total user, 74.0% of total elapsed
81MB saved, not bad at all. Can we do even better?
A heap profile (which should be the first thing you think of when having memory consumption woes - debugging them without one is shooting in the dark) will reveal that, even with the original code, peak memory consumption happens very early on. Strictly speaking we do not have a leak; we just spend a lot of memory from the beginning. Now, note that the hash table is created with ht <- H.newSized n, with n = 5000000. Unless you expect to have so many different keys (as opposed to elements), that is extremely wasteful. Changing the initial size to 10 (the number of keys you actually have in your test) improves things dramatically:
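Concretely, the only change to the original program is the size passed when creating the table (10 here simply because this test data has ten distinct keys):
ht <- H.newSized 10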
432,059,960 bytes allocated in the heap
50,200 bytes copied during GC
44,416 bytes maximum residency (2 sample(s))
25,216 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 825 colls, 0 par 0.01s 0.01s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0002s 0.0003s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.90s ( 0.90s elapsed)
GC time 0.01s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.91s ( 0.90s elapsed)
%GC time 0.6% (0.6% elapsed)
Alloc rate 481,061,802 bytes per MUT second
Productivity 99.4% of total user, 99.4% of total elapsed
Finally, we might as well make our life simpler and try using the pure, yet efficient, hash map from unordered-containers. Code...
import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
  where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> M.HashMap Int Int
histogram vec =
    V.foldl' (\ht (k, v) -> M.insert k v ht) M.empty vec

main :: IO ()
main = do
    print $ M.size $ histogram kv
    putStrLn "done"
... and result.
55,760 bytes allocated in the heap
3,512 bytes copied during GC
44,416 bytes maximum residency (1 sample(s))
17,024 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.00s 0.00s 0.0002s 0.0002s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.34s ( 0.34s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.34s ( 0.34s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 162,667 bytes per MUT second
Productivity 99.9% of total user, 100.0% of total elapsed
~60% faster. It remains to be seen how it would scale with a larger number of keys, but with your test data unordered-containers ends up being not only more convenient (pure functions; actually updating the histogram values only takes changing M.insert to M.insertWith) but also faster.
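For completeness, a sketch of the variant that actually accumulates frequencies, i.e. the M.insertWith change mentioned above:
import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

-- insertWith applies (+) when the key already exists, so counts accumulate
-- instead of being overwritten.
histogram :: V.Vector (Int, Int) -> M.HashMap Int Int
histogram = V.foldl' (\ht (k, v) -> M.insertWith (+) k v ht) M.empty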
