Suffering a Java G1 garbage-collection issue

Has anyone encountered this kind of issue with the Java G1 GC?
In the first highlighted collection below, the user time is about 4 s,
but in the second one the user time is 0 s and the system time is about 4 s.
With G1, system time shouldn't be high. Is this a bug in G1?
Below are my GC arguments:
-Xms200g -Xmx200g -Xmn30g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -verbose:gc -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
2018-01-07T04:54:39.995+0800: 906650.864: [GC (Allocation Failure) 2018-01-07T04:54:39.996+0800: 906650.865: [ParNew
Desired survivor size 1610612736 bytes, new threshold 6 (max 6)
- age 1: 69747632 bytes, 69747632 total
- age 2: 9641544 bytes, 79389176 total
- age 3: 10522192 bytes, 89911368 total
- age 4: 11732392 bytes, 101643760 total
- age 5: 9158960 bytes, 110802720 total
- age 6: 10917528 bytes, 121720248 total
: 25341731K->170431K(28311552K), 0.2088528 secs] 153045380K->127882325K(206569472K), 0.2094236 secs] [Times: **user=4.53 sys=0.00, real=0.21 secs**]
Heap after GC invocations=32432 (full 10):
par new generation total 28311552K, used 170431K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 0% used [0x00007f6058000000, 0x00007f6058000000, 0x00007f6658000000)
from space 3145728K, 5% used [0x00007f6658000000, 0x00007f666266ffe0, 0x00007f6718000000)
to space 3145728K, 0% used [0x00007f6718000000, 0x00007f6718000000, 0x00007f67d8000000)
concurrent mark-sweep generation total 178257920K, used 127711893K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
}
2018-01-07T04:54:40.205+0800: 906651.074: Total time for which application threads were stopped: 0.2269738 seconds, Stopping threads took: 0.0001692 seconds
{Heap before GC invocations=32432 (full 10):
par new generation total 28311552K, used 25336255K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 100% used [0x00007f6058000000, 0x00007f6658000000, 0x00007f6658000000)
from space 3145728K, 5% used [0x00007f6658000000, 0x00007f666266ffe0, 0x00007f6718000000)
to space 3145728K, 0% used [0x00007f6718000000, 0x00007f6718000000, 0x00007f67d8000000)
concurrent mark-sweep generation total 178257920K, used 127711893K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
2018-01-07T04:55:02.541+0800: 906673.411: [GC (Allocation Failure) 2018-01-07T04:55:02.542+0800: 906673.411: [ParNew
Desired survivor size 1610612736 bytes, new threshold 6 (max 6)
- age 1: 93841912 bytes, 93841912 total
- age 2: 11310104 bytes, 105152016 total
- age 3: 8967160 bytes, 114119176 total
- age 4: 10278920 bytes, 124398096 total
- age 5: 11626160 bytes, 136024256 total
- age 6: 9077432 bytes, 145101688 total
: 25336255K->195827K(28311552K), 0.1926783 secs] 153048149K->127918291K(206569472K), 0.1932366 secs] [Times: **user=0.00 sys=4.07, real=0.20 secs**]
Heap after GC invocations=32433 (full 10):
par new generation total 28311552K, used 195827K [0x00007f6058000000, 0x00007f67d8000000, 0x00007f67d8000000)
eden space 25165824K, 0% used [0x00007f6058000000, 0x00007f6058000000, 0x00007f6658000000)
from space 3145728K, 6% used [0x00007f6718000000, 0x00007f6723f3cf38, 0x00007f67d8000000)
to space 3145728K, 0% used [0x00007f6658000000, 0x00007f6658000000, 0x00007f6718000000)
concurrent mark-sweep generation total 178257920K, used 127722463K [0x00007f67d8000000, 0x00007f9258000000, 0x00007f9258000000)
Metaspace used 54995K, capacity 55688K, committed 56028K, reserved 57344K
}
2018-01-07T04:55:02.735+0800: 906673.604: Total time for which application threads were stopped: 0.2149603 seconds, Stopping threads took: 0.0002262 seconds
2018-01-07T04:55:14.673+0800: 906685.542: Total time for which application threads were stopped: 0.0183883 seconds, Stopping threads took: 0.0002046 seconds
2018-01-07T04:55:14.797+0800: 906685.666: Total time for which application threads were stopped: 0.0135349 seconds, Stopping threads took: 0.0002472 seconds
2018-01-07T04:55:14.810+0800: 906685.679: Total time for which application threads were stopped: 0.0129019 seconds, Stopping threads took: 0.0001014 seconds
2018-01-07T04:55:14.823+0800: 906685.692: Total time for which application threads were stopped: 0.0125939 seconds, Stopping threads took: 0.0002915 seconds
2018-01-07T04:55:21.597+0800: 906692.466: Total time for which application threads were stopped: 0.0137018 seconds, Stopping threads took: 0.0001683 seconds
{Heap before GC invocations=32433 (full 10):

Your command line specifies -XX:+UseConcMarkSweepGC (together with -XX:+UseParNewGC), so you are running the CMS collector, not G1; this isn't a G1 issue.
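If you actually want to run G1, the CMS/ParNew flags above would be replaced rather than combined with it. A minimal flag sketch (the pause-time target is only illustrative, and -Xmn is deliberately left out because fixing the young-generation size interferes with G1's pause-time sizing):

-Xms200g -Xmx200g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps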

Related

What does the Pinned Handle in the gcroot result mean practically in the context of debugging high memory usage?

I have the following gcroot output:
(lldb) gcroot 0x00007FC1EB641670
HandleTable:
00007FC3900B1388 (pinned handle)
-> 00007FC278027028 System.Object[]
-> 00007FC1E82CF848 System.Collections.Concurrent.ConcurrentDictionary`2[[Dapper.SqlMapper+Identity, Dapper.StrongName],[Dapper.SqlMapper+CacheInfo, Dapper.StrongName]]
-> 00007FBEEAA5E998 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[Dapper.SqlMapper+Identity, Dapper.StrongName],[Dapper.SqlMapper+CacheInfo, Dapper.StrongName]]
-> 00007FBEEAA51000 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Dapper.SqlMapper+Identity, Dapper.StrongName],[Dapper.SqlMapper+CacheInfo, Dapper.StrongName]][]
-> 00007FBE6BCB00C0 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Dapper.SqlMapper+Identity, Dapper.StrongName],[Dapper.SqlMapper+CacheInfo, Dapper.StrongName]]
-> 00007FBE6BCAFFA0 Dapper.SqlMapper+Identity
-> 00007FC1EB641670 System.String
Found 1 unique roots (run 'gcroot -all' to see all roots).
What is the meaning of the address 00007FC3900B1388? How can I locate the relevant logic in the code?
EDIT 1
The System.Object[] object at 00007FC278027028 takes 130584 bytes:
(lldb) dumpobj 00007FC278027028
Name: System.Object[]
MethodTable: 00007fc315075510
EEClass: 00007fc315075488
Size: 130584(0x1fe18) bytes
Array: Rank 1, Number of elements 16320, Type CLASS
That is, it must be on the LOH (large object heap).
Please find below the output of eeheap -gc:
Number of GC Heaps: 8
------------------------------
Heap 0 (000000000125D930)
generation 0 starts at 0x00007FBE89EA6AA0
generation 1 starts at 0x00007FBE8895C1B0
generation 2 starts at 0x00007FBE67FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FBE67FFE000 00007FBE67FFF000 00007FBE8A4DD0A8 00007FBE94E3F000 0x224de0a8(575529128) 0x2ce40000(753139712)
Large object heap starts at 0x00007FC267FFF000
segment begin allocated committed allocated size committed size
00007FC267FFE000 00007FC267FFF000 00007FC277C6C938 00007FC277C8D000 0xfc6d938(264689976) 0xfc8e000(264822784)
00007FBCB442A000 00007FBCB442B000 00007FBCB51FCA48 00007FBCB51FD000 0xdd1a48(14490184) 0xdd2000(14491648)
Allocated Heap Size: Size: 0x32f1d428 (854709288) bytes.
Committed Heap Size: Size: 0x3d8a0000 (1032454144) bytes.
------------------------------
Heap 1 (0000000001265A20)
generation 0 starts at 0x00007FBF06D4BB48
generation 1 starts at 0x00007FBF0594AB08
generation 2 starts at 0x00007FBEE7FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FBEE7FFE000 00007FBEE7FFF000 00007FBF0766C740 00007FBF11B4B000 0x1f66d740(526833472) 0x29b4c000(699711488)
Large object heap starts at 0x00007FC277FFF000
segment begin allocated committed allocated size committed size
00007FC277FFE000 00007FC277FFF000 00007FC287F9DFF0 00007FC287FF5000 0xff9eff0(268038128) 0xfff6000(268394496)
00007FBC84424000 00007FBC84425000 00007FBC84EA2150 00007FBC84EA3000 0xa7d150(10998096) 0xa7e000(11001856)
Allocated Heap Size: Size: 0x30089880 (805869696) bytes.
Committed Heap Size: Size: 0x3a5c0000 (979107840) bytes.
------------------------------
Heap 2 (000000000126D8A0)
generation 0 starts at 0x00007FBF89893538
generation 1 starts at 0x00007FBF88849088
generation 2 starts at 0x00007FBF67FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FBF67FFE000 00007FBF67FFF000 00007FBF89EC4EB8 00007FBF94ADD000 0x21ec5eb8(569138872) 0x2cade000(749592576)
Large object heap starts at 0x00007FC287FFF000
segment begin allocated committed allocated size committed size
00007FC287FFE000 00007FC287FFF000 00007FC297FF8718 00007FC297FF9000 0xfff9718(268408600) 0xfffa000(268410880)
00007FBCE4430000 00007FBCE4431000 00007FBCE53E9098 00007FBCE53EA000 0xfb8098(16482456) 0xfb9000(16486400)
Allocated Heap Size: Size: 0x32e77668 (854029928) bytes.
Committed Heap Size: Size: 0x3da91000 (1034489856) bytes.
------------------------------
Heap 3 (0000000001275720)
generation 0 starts at 0x00007FC0079EF878
generation 1 starts at 0x00007FC006993E30
generation 2 starts at 0x00007FBFE7FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FBFE7FFE000 00007FBFE7FFF000 00007FC00C09B238 00007FC013259000 0x2409c238(604619320) 0x2b25a000(723886080)
Large object heap starts at 0x00007FC297FFF000
segment begin allocated committed allocated size committed size
00007FC297FFE000 00007FC297FFF000 00007FC2A7E5CBC0 00007FC2A7E7D000 0xfe5dbc0(266722240) 0xfe7e000(266854400)
00007FBCF4432000 00007FBCF4433000 00007FBCF55AB6D8 00007FBCF55CC000 0x11786d8(18319064) 0x1199000(18452480)
Allocated Heap Size: Size: 0x350724d0 (889660624) bytes.
Committed Heap Size: Size: 0x3c271000 (1009192960) bytes.
------------------------------
Heap 4 (000000000127D5A0)
generation 0 starts at 0x00007FC088CAE0B0
generation 1 starts at 0x00007FC087A4C4E8
generation 2 starts at 0x00007FC067FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FC067FFE000 00007FC067FFF000 00007FC08AC581F0 00007FC093EA6000 0x22c591f0(583373296) 0x2bea7000(736784384)
Large object heap starts at 0x00007FC2A7FFF000
segment begin allocated committed allocated size committed size
00007FC2A7FFE000 00007FC2A7FFF000 00007FC2B7F24C40 00007FC2B7F45000 0xff25c40(267541568) 0xff46000(267673600)
00007FBD237BE000 00007FBD237BF000 00007FBD24FBC270 00007FBD24FBD000 0x17fd270(25154160) 0x17fe000(25157632)
Allocated Heap Size: Size: 0x3437c0a0 (876069024) bytes.
Committed Heap Size: Size: 0x3d5eb000 (1029615616) bytes.
------------------------------
Heap 5 (0000000001285420)
generation 0 starts at 0x00007FC1097B5D90
generation 1 starts at 0x00007FC10872DB10
generation 2 starts at 0x00007FC0E7FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FC0E7FFE000 00007FC0E7FFF000 00007FC10DAD4388 00007FC114D2C000 0x25ad5388(632116104) 0x2cd2d000(752013312)
Large object heap starts at 0x00007FC2B7FFF000
segment begin allocated committed allocated size committed size
00007FC2B7FFE000 00007FC2B7FFF000 00007FC2C7DD3C68 00007FC2C7DF4000 0xfdd4c68(266161256) 0xfdf5000(266293248)
00007FBC74422000 00007FBC74423000 00007FBC74AD9BA0 00007FBC74ADA000 0x6b6ba0(7039904) 0x6b7000(7041024)
Allocated Heap Size: Size: 0x35f60b90 (905317264) bytes.
Committed Heap Size: Size: 0x3d1d9000 (1025347584) bytes.
------------------------------
Heap 6 (000000000128D2A0)
generation 0 starts at 0x00007FC1887CB3E0
generation 1 starts at 0x00007FC1877EFF98
generation 2 starts at 0x00007FC167FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FC167FFE000 00007FC167FFF000 00007FC18A95FE60 00007FC193A66000 0x22960e60(580259424) 0x2ba67000(732327936)
Large object heap starts at 0x00007FC2C7FFF000
segment begin allocated committed allocated size committed size
00007FC2C7FFE000 00007FC2C7FFF000 00007FC2D7F7E3F8 00007FC2D7FD5000 0xff7f3f8(267908088) 0xffd6000(268263424)
00007FBCD442E000 00007FBCD442F000 00007FBCD52B9180 00007FBCD52BA000 0xe8a180(15245696) 0xe8b000(15249408)
Allocated Heap Size: Size: 0x3376a3d8 (863413208) bytes.
Committed Heap Size: Size: 0x3c8c8000 (1015840768) bytes.
------------------------------
Heap 7 (0000000001295120)
generation 0 starts at 0x00007FC2086004E0
generation 1 starts at 0x00007FC2075711A8
generation 2 starts at 0x00007FC1E7FFF000
ephemeral segment allocation context: none
segment begin allocated committed allocated size committed size
00007FC1E7FFE000 00007FC1E7FFF000 00007FC20879C120 00007FC213FA3000 0x2079d120(544854304) 0x2bfa4000(737820672)
Large object heap starts at 0x00007FC2D7FFF000
segment begin allocated committed allocated size committed size
00007FC2D7FFE000 00007FC2D7FFF000 00007FC2E7F465B8 00007FC2E7F67000 0xff475b8(267679160) 0xff68000(267812864)
00007FBCC442C000 00007FBCC442D000 00007FBCC514E798 00007FBCC51A5000 0xd21798(13768600) 0xd78000(14123008)
Allocated Heap Size: Size: 0x31405e70 (826302064) bytes.
Committed Heap Size: Size: 0x3cc84000 (1019756544) bytes.
------------------------------
GC Allocated Heap Size: Size: 0x199cdd658 (6875371096) bytes.
GC Committed Heap Size: Size: 0x1e5872000 (8145805312) bytes.
The pinned handle value is 0x00007FC3900B1388. I am not sure how to compute the range that owns it. I do not see any numbers that produce a range that covers 0x00007FC3900B1388.
BTW, I have found the relevant static inside the Dapper.SqlMapper class:
public static class SqlMapper
{
    ...
    private static readonly ConcurrentDictionary<Identity, CacheInfo> _queryCache;
}
https://github.com/DapperLib/Dapper/blob/ca00feeb5fafe5262166689c0bec2b80b53add4e/Dapper/SqlMapper.cs#L59

Java eden space is not 8 times larger than s0 space

According to Oracle's docs, the default value of SurvivorRatio is 8, which means each survivor space will be one-eighth the size of the eden space.
But in my application it doesn't work:
$ jmap -heap 48865
Attaching to process ID 48865, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.45-b02
using thread-local object allocation.
Parallel GC with 8 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 4294967296 (4096.0MB)
NewSize = 89128960 (85.0MB)
MaxNewSize = 1431306240 (1365.0MB)
OldSize = 179306496 (171.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 67108864 (64.0MB)
used = 64519920 (61.53099060058594MB)
free = 2588944 (2.4690093994140625MB)
96.14217281341553% used
From Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
To Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
PS Old Generation
capacity = 179306496 (171.0MB)
used = 0 (0.0MB)
free = 179306496 (171.0MB)
0.0% used
7552 interned Strings occupying 605288 bytes.
But in VisualVM the eden space is 1.332 GB and S0 is 455 MB; eden is only about 3 times larger than S0, not 8.
You have neither disabled adaptive sizing with -XX:-UseAdaptiveSizePolicy nor set -Xms equal to -Xmx, so the JVM is free to resize the heap generations (and survivor spaces) at runtime. In this case the estimated maximum survivor size is
MaxSurvivor = NewGen / MinSurvivorRatio
where -XX:MinSurvivorRatio=3 by default. Note: this is an estimated maximum, not the actual size. With your MaxNewSize of 1365 MB this gives 1365 / 3 = 455 MB, which is exactly the S0 capacity VisualVM shows you.
See also this answer.
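If you want the survivor spaces to stay at the configured one-eighth of eden, a minimal flag sketch (assuming the Parallel GC shown in the jmap output; the 4 GB figure just mirrors the MaxHeapSize above):

-Xms4g -Xmx4g -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8

With adaptive sizing off and the heap pinned to a fixed size, the generations are no longer resized at runtime.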

Spark: GraphX API OOM errors after unpersisting useless RDDs

I have hit an OutOfMemory error for unknown reasons. I release the useless RDDs immediately, but after several rounds of the loop the OOM error still occurs. My code is as follows:
// single source shortest path
def sssp[VD](graph: Graph[VD, Double], source: VertexId): Graph[Double, Double] = {
  graph.mapVertices((id, _) => if (id == source) 0.0 else Double.PositiveInfinity)
    .pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => scala.math.min(dist, newDist),
      triplet => {
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      (a, b) => math.min(a, b)
    )
}

def selectCandidate(candidates: RDD[(VertexId, (Double, Double))]): VertexId = {
  Random.setSeed(System.nanoTime())
  val selectLow = Random.nextBoolean()
  val (vid, (_, _)) = if (selectLow) {
    println("Select lowest bound")
    candidates.reduce((x, y) => if (x._2._1 < y._2._1) x else y)
  } else {
    println("Select highest bound")
    candidates.reduce((x, y) => if (x._2._2 > y._2._2) x else y)
  }
  vid
}

val g = {/* load graph from hdfs */}.partitionBy(EdgePartition2D, eParts).cache
println("Vertices Size: " + g.vertices.count)
println("Edges Size: " + g.edges.count)

val resultDiameter = {
  val diff = 0d
  val maxIterations = 100
  val filterJoin = 1e5
  val vParts = 100
  var deltaHigh = Double.PositiveInfinity
  var deltaLow = Double.NegativeInfinity
  var candidates = g.vertices
    .map(x => (x._1, (Double.NegativeInfinity, Double.PositiveInfinity)))
    .partitionBy(new HashPartitioner(vParts))
    .persist(StorageLevel.MEMORY_AND_DISK) // (vid, low, high)
  var round = 0
  var candidateCount = candidates.count
  while (deltaHigh - deltaLow > diff && candidateCount > 0 && round <= maxIterations) {
    val currentVertex = dia.selectCandidate(candidates)
    val dist: RDD[(VertexId, Double)] = dia.sssp(g, currentVertex)
      .vertices
      .partitionBy(new HashPartitioner(vParts)) // join more efficiently
      .persist(StorageLevel.MEMORY_AND_DISK)
    val eccentricity = dist.map({ case (vid, length) => length }).max
    println("Eccentricity = %.1f".format(eccentricity))
    val subDist = if (candidateCount > filterJoin) {
      println("Directly use Dist")
      dist
    } else { // when candidates is smaller than filterJoin, filter out the useless vertices
      println("Filter Dist")
      val candidatesMap = candidates.sparkContext.broadcast(candidates.collect.toMap)
      val subDist = dist
        .filter({ case (vid, length) => candidatesMap.value.contains(vid) })
        .persist(StorageLevel.MEMORY_AND_DISK)
      println("Sub Dist Count: " + subDist.count)
      subDist
    }
    var previousCandidates = candidates
    candidates = candidates.join(subDist).map({ case (vid, ((low, high), d)) =>
      (vid,
        (Array(low, eccentricity - d, d).max,
         Array(high, eccentricity + d).min))
    }).persist(StorageLevel.MEMORY_AND_DISK)
    candidateCount = candidates.count
    println("Candidates Count 1 : " + candidateCount)
    previousCandidates.unpersist(true) // release useless rdd
    dist.unpersist(true) // release useless rdd
    deltaLow = Array(deltaLow,
      candidates.map({ case (_, (low, _)) => low }).max).max
    deltaHigh = Array(deltaHigh, 2 * eccentricity,
      candidates.map({ case (_, (_, high)) => high }).max).min
    previousCandidates = candidates
    candidates = candidates
      .filter({ case (_, (low, high)) =>
        !((high <= deltaLow && low >= deltaHigh / 2d) || low == high)
      })
      .partitionBy(new HashPartitioner(vParts)) // join more efficiently
      .persist(StorageLevel.MEMORY_AND_DISK)
    candidateCount = candidates.count
    println("Candidates Count 2:" + candidateCount)
    previousCandidates.unpersist(true) // release useless rdd
    round += 1
    println(s"Round=${round},Low=${deltaLow}, High=${deltaHigh}, Candidates=${candidateCount}")
  }
  deltaLow
}
println(s"Diameter $resultDiameter")
println("Complete!")
The main data in the while block are a graph object g and an RDD candidates. g is used to compute single-source shortest paths in each round, and the graph structure does not change. The size of candidates decreases round by round.
In each round I manually unpersist the useless RDDs in blocking mode, so I would expect enough memory to be available for the following operations. However, the job dies with an OOM in round 6 or 7, seemingly at random. By the time the program reaches round 6 or 7, candidates has shrunk dramatically, to about 10% or less of its original size. Sample output follows; the candidates size decreases from 15,288,624 in round 1 to 67,451 in round 7:
Vertices Size: 15,288,624
Edges Size: 228,097,574
Select lowest bound
Eccentricity = 12.0
Directly use Dist
Candidates Count 1 : 15288624
Candidates Count 2:15288623
Round=1,Low=12.0, High=24.0, Candidates=15288623
Select lowest bound
Eccentricity = 13.0
Directly use Dist
Candidates Count 1 : 15288623
Candidates Count 2:15288622
Round=2,Low=13.0, High=24.0, Candidates=15288622
Select highest bound
Eccentricity = 18.0
Directly use Dist
Candidates Count 1 : 15288622
Candidates Count 2:6578370
Round=3,Low=18.0, High=23.0, Candidates=6578370
Select lowest bound
Eccentricity = 12.0
Directly use Dist
Candidates Count 1 : 6578370
Candidates Count 2:6504563
Round=4,Low=18.0, High=23.0, Candidates=6504563
Select lowest bound
Eccentricity = 11.0
Directly use Dist
Candidates Count 1 : 6504563
Candidates Count 2:412789
Round=5,Low=18.0, High=22.0, Candidates=412789
Select highest bound
Eccentricity = 17.0
Directly use Dist
Candidates Count 1 : 412789
Candidates Count 2:288670
Round=6,Low=18.0, High=22.0, Candidates=288670
Select highest bound
Eccentricity = 18.0
Directly use Dist
Candidates Count 1 : 288670
Candidates Count 2:67451
Round=7,Low=18.0, High=22.0, Candidates=67451
The tail end of the spark.info log:
16/12/12 14:03:09 WARN YarnAllocator: Expected to find pending requests, but found none.
16/12/12 14:06:21 INFO YarnAllocator: Canceling requests for 0 executor containers
16/12/12 14:06:33 WARN YarnAllocator: Expected to find pending requests, but found none.
16/12/12 14:14:26 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
16/12/12 14:18:14 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:123)
at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:218)
at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:260)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:347)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:744)
16/12/12 14:18:14 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-552217672-100.76.16.204-1470826698239:blk_1377987137_304302272
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)
16/12/12 14:14:39 WARN AbstractConnector:
java.lang.OutOfMemoryError: Java heap space
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
at org.spark-project.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:109)
at org.spark-project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:938)
at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
16/12/12 14:20:06 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
16/12/12 14:19:38 WARN DFSClient: Error Recovery for block BP-552217672-100.76.16.204-1470826698239:blk_1377987137_304302272 in pipeline 100.76.15.28:9003, 100.76.48.218:9003, 100.76.48.199:9003: bad datanode 100.76.15.28:9003
16/12/12 14:18:58 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/12/12 14:20:49 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-198] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
16/12/12 14:20:49 INFO SparkContext: Invoking stop() from shutdown hook
16/12/12 14:20:49 INFO ContextCleaner: Cleaned shuffle 446
16/12/12 14:20:49 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveRdd(2567)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at scala.concurrent.Future$class.recover(Future.scala:324)
at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:376)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:104)
at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1630)
at org.apache.spark.ContextCleaner.doCleanupRDD(ContextCleaner.scala:208)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:185)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:180)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:180)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:173)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:68)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:364)
... 12 more
16/12/12 14:20:49 WARN QueuedThreadPool: 5 threads could not be stopped
16/12/12 14:20:49 INFO SparkUI: Stopped Spark web UI at http://10.215.154.152:56338
16/12/12 14:20:49 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/12/12 14:20:49 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/12/12 14:21:04 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveRdd(2567)] in 2 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#-213595070]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
The tail end of the gc.log:
2016-12-12T14:10:43.541+0800: 16832.953: [Full GC 2971008K->2971007K(2971008K), 11.4284920 secs]
2016-12-12T14:10:54.990+0800: 16844.403: [Full GC 2971007K->2971007K(2971008K), 11.4479110 secs]
2016-12-12T14:11:06.457+0800: 16855.870: [GC 2971007K(2971008K), 0.6827710 secs]
2016-12-12T14:11:08.825+0800: 16858.237: [Full GC 2971007K->2971007K(2971008K), 11.5480350 secs]
2016-12-12T14:11:20.384+0800: 16869.796: [Full GC 2971007K->2971007K(2971008K), 11.0481490 secs]
2016-12-12T14:11:31.442+0800: 16880.855: [Full GC 2971007K->2971007K(2971008K), 11.0184790 secs]
2016-12-12T14:11:42.472+0800: 16891.884: [Full GC 2971008K->2971008K(2971008K), 11.3124900 secs]
2016-12-12T14:11:53.795+0800: 16903.207: [Full GC 2971008K->2971008K(2971008K), 10.9517160 secs]
2016-12-12T14:12:04.760+0800: 16914.172: [Full GC 2971008K->2971007K(2971008K), 11.0969500 secs]
2016-12-12T14:12:15.868+0800: 16925.281: [Full GC 2971008K->2971008K(2971008K), 11.1244090 secs]
2016-12-12T14:12:27.003+0800: 16936.416: [Full GC 2971008K->2971008K(2971008K), 11.0206800 secs]
2016-12-12T14:12:38.035+0800: 16947.448: [Full GC 2971008K->2971008K(2971008K), 11.0024270 secs]
2016-12-12T14:12:49.048+0800: 16958.461: [Full GC 2971008K->2971008K(2971008K), 10.9831440 secs]
2016-12-12T14:13:00.042+0800: 16969.454: [GC 2971008K(2971008K), 0.7338780 secs]
2016-12-12T14:13:02.496+0800: 16971.908: [Full GC 2971008K->2971007K(2971008K), 11.1536860 secs]
2016-12-12T14:13:13.661+0800: 16983.074: [Full GC 2971007K->2971007K(2971008K), 10.9956150 secs]
2016-12-12T14:13:24.667+0800: 16994.080: [Full GC 2971007K->2971007K(2971008K), 11.0139660 secs]
2016-12-12T14:13:35.691+0800: 17005.104: [GC 2971007K(2971008K), 0.6693770 secs]
2016-12-12T14:13:38.115+0800: 17007.527: [Full GC 2971007K->2971006K(2971008K), 11.0514040 secs]
2016-12-12T14:13:49.178+0800: 17018.590: [Full GC 2971007K->2971007K(2971008K), 10.8881160 secs]
2016-12-12T14:14:00.076+0800: 17029.489: [GC 2971007K(2971008K), 0.7046370 secs]
2016-12-12T14:14:02.498+0800: 17031.910: [Full GC 2971007K->2971007K(2971008K), 11.3424300 secs]
2016-12-12T14:14:13.862+0800: 17043.274: [Full GC 2971008K->2971006K(2971008K), 11.6215890 secs]
2016-12-12T14:14:25.503+0800: 17054.915: [GC 2971006K(2971008K), 0.7196840 secs]
2016-12-12T14:14:27.857+0800: 17057.270: [Full GC 2971008K->2971007K(2971008K), 11.3879990 secs]
2016-12-12T14:14:39.266+0800: 17068.678: [Full GC 2971007K->2971007K(2971008K), 11.1611420 secs]
2016-12-12T14:14:50.446+0800: 17079.859: [GC 2971007K(2971008K), 0.6976180 secs]
2016-12-12T14:14:52.782+0800: 17082.195: [Full GC 2971007K->2971007K(2971008K), 11.4318900 secs]
2016-12-12T14:15:04.235+0800: 17093.648: [Full GC 2971007K->2971007K(2971008K), 11.3429010 secs]
2016-12-12T14:15:15.598+0800: 17105.010: [GC 2971007K(2971008K), 0.6832320 secs]
2016-12-12T14:15:17.930+0800: 17107.343: [Full GC 2971008K->2971007K(2971008K), 11.1898520 secs]
2016-12-12T14:15:29.131+0800: 17118.544: [Full GC 2971007K->2971007K(2971008K), 10.9680150 secs]
2016-12-12T14:15:40.110+0800: 17129.522: [GC 2971007K(2971008K), 0.7444890 secs]
2016-12-12T14:15:42.508+0800: 17131.920: [Full GC 2971007K->2971007K(2971008K), 11.3052160 secs]
2016-12-12T14:15:53.824+0800: 17143.237: [Full GC 2971007K->2971007K(2971008K), 10.9484100 secs]
2016-12-12T14:16:04.783+0800: 17154.196: [Full GC 2971007K->2971007K(2971008K), 10.9543950 secs]
2016-12-12T14:16:15.748+0800: 17165.160: [GC 2971007K(2971008K), 0.7066150 secs]
2016-12-12T14:16:18.176+0800: 17167.588: [Full GC 2971007K->2971007K(2971008K), 11.1201370 secs]
2016-12-12T14:16:29.307+0800: 17178.719: [Full GC 2971007K->2971007K(2971008K), 11.0746950 secs]
2016-12-12T14:16:40.392+0800: 17189.805: [Full GC 2971007K->2971007K(2971008K), 11.0036170 secs]
2016-12-12T14:16:51.407+0800: 17200.819: [Full GC 2971007K->2971007K(2971008K), 10.9655670 secs]
2016-12-12T14:17:02.383+0800: 17211.796: [Full GC 2971007K->2971007K(2971008K), 10.7348560 secs]
2016-12-12T14:17:13.128+0800: 17222.540: [GC 2971007K(2971008K), 0.6679470 secs]
2016-12-12T14:17:15.450+0800: 17224.862: [Full GC 2971007K->2971007K(2971008K), 10.6219270 secs]
2016-12-12T14:17:26.081+0800: 17235.494: [Full GC 2971007K->2971007K(2971008K), 10.9158450 secs]
2016-12-12T14:17:37.016+0800: 17246.428: [Full GC 2971007K->2971007K(2971008K), 11.3107490 secs]
2016-12-12T14:17:48.337+0800: 17257.750: [Full GC 2971007K->2971007K(2971008K), 11.0769460 secs]
2016-12-12T14:17:59.424+0800: 17268.836: [GC 2971007K(2971008K), 0.6707600 secs]
2016-12-12T14:18:01.850+0800: 17271.262: [Full GC 2971007K->2970782K(2971008K), 12.6348300 secs]
2016-12-12T14:18:14.496+0800: 17283.909: [GC 2970941K(2971008K), 0.7525790 secs]
2016-12-12T14:18:16.890+0800: 17286.303: [Full GC 2971006K->2970786K(2971008K), 13.1047470 secs]
2016-12-12T14:18:30.008+0800: 17299.421: [GC 2970836K(2971008K), 0.8139710 secs]
2016-12-12T14:18:32.458+0800: 17301.870: [Full GC 2971005K->2970873K(2971008K), 13.0410540 secs]
2016-12-12T14:18:45.512+0800: 17314.925: [Full GC 2971007K->2970893K(2971008K), 12.7169690 secs]
2016-12-12T14:18:58.239+0800: 17327.652: [GC 2970910K(2971008K), 0.7314350 secs]
2016-12-12T14:19:00.557+0800: 17329.969: [Full GC 2971008K->2970883K(2971008K), 11.1889000 secs]
2016-12-12T14:19:11.767+0800: 17341.180: [Full GC 2971006K->2970940K(2971008K), 11.4069700 secs]
2016-12-12T14:19:23.185+0800: 17352.597: [GC 2970950K(2971008K), 0.6689360 secs]
2016-12-12T14:19:25.484+0800: 17354.896: [Full GC 2971007K->2970913K(2971008K), 12.6980050 secs]
2016-12-12T14:19:38.194+0800: 17367.607: [Full GC 2971004K->2970902K(2971008K), 12.7641130 secs]
2016-12-12T14:19:50.968+0800: 17380.380: [GC 2970921K(2971008K), 0.6966130 secs]
2016-12-12T14:19:53.266+0800: 17382.678: [Full GC 2971007K->2970875K(2971008K), 12.9416660 secs]
2016-12-12T14:20:06.233+0800: 17395.645: [Full GC 2971007K->2970867K(2971008K), 13.2740780 secs]
2016-12-12T14:20:19.527+0800: 17408.939: [GC 2970881K(2971008K), 0.7696770 secs]
2016-12-12T14:20:22.024+0800: 17411.436: [Full GC 2971007K->2970886K(2971008K), 13.8729770 secs]
2016-12-12T14:20:35.919+0800: 17425.331: [Full GC 2971002K->2915146K(2971008K), 12.8270160 secs]
2016-12-12T14:20:48.762+0800: 17438.175: [GC 2915155K(2971008K), 0.6856650 secs]
2016-12-12T14:20:51.271+0800: 17440.684: [Full GC 2971007K->2915307K(2971008K), 12.4895750 secs]
2016-12-12T14:21:03.771+0800: 17453.184: [GC 2915320K(2971008K), 0.6249910 secs]
2016-12-12T14:21:06.377+0800: 17455.789: [Full GC 2971007K->2914274K(2971008K), 12.6835220 secs]
2016-12-12T14:21:19.129+0800: 17468.541: [GC 2917963K(2971008K), 0.6917090 secs]
2016-12-12T14:21:21.526+0800: 17470.938: [Full GC 2971007K->2913949K(2971008K), 13.0442320 secs]
2016-12-12T14:21:36.588+0800: 17486.000: [GC 2936827K(2971008K), 0.7244690 secs]
So the logs suggest there might be a memory leak, in one of two places:
1) my code, or 2) the Spark GraphX code.
Can anyone help me find the cause if it is in my code?
I don't think the unpersist() API is causing the out-of-memory. The OutOfMemory is caused by collect(), because collect() (which is an action, unlike a transformation) fetches the entire RDD to a single driver machine.
A few suggestions (a config sketch of the first point follows below):
Increasing the driver memory is one partial solution, which you have already implemented. If you are working with JDK 8, use the G1 collector to manage large heaps.
You can play with storage levels (MEMORY_AND_DISK, OFF_HEAP, etc.) to fine-tune caching for your application.
Have a look at the official documentation guide for more details.
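As a sketch, the first suggestion expressed with spark-submit's own options (the 8g value is illustrative, not tuned; --driver-memory and --driver-java-options are standard spark-submit flags):

spark-submit \
  --driver-memory 8g \
  --driver-java-options -XX:+UseG1GC \
  ...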
I haven't solved the problem completely, but I have fixed it partly:
Increase the driver memory. I mentioned above that the job stopped in round 6 or 7, but when I doubled the driver memory it stopped at round 14 instead. So I think driver-memory OOM might be one reason.
Save the candidates RDD to HDFS, and continue the process from there the next time, so the earlier computation is not wasted.
Serialize the candidates RDD with Kryo. It costs some computation for encoding and decoding, but saves a great amount of memory (a config sketch follows this list).
These are not a perfect solution, but they do work in my case. I hope others can offer the perfect one.
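For the Kryo point, a sketch in spark-defaults.conf form (standard property names; the buffer value is illustrative, and on Spark versions before 1.4 the second key was spelled spark.kryoserializer.buffer.max.mb):

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m

Pairing this with a serialized storage level such as StorageLevel.MEMORY_AND_DISK_SER (instead of MEMORY_AND_DISK) can shrink the cached candidates RDD further.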

Why not Full GC?

Eden is 8 MB, survivor1 and survivor2 are 2 MB in total, and the old area is 10 MB. When alloc4 is created, the first Minor GC is triggered and alloc1/alloc2/alloc3 are moved to the old area. When alloc6 is created, alloc4 is moved to the old area and alloc5 is moved to a survivor space. When alloc7 is created, eden can't hold it, so it should be moved to the old area; but the old area already holds alloc1/alloc2/alloc3/alloc4 (9 MB), so it can't hold alloc7 either, and the old area should trigger a Full GC to reclaim alloc1 and alloc3. So why is the 3rd GC a minor GC and not a full GC?
/**
 * VM Args: -Xms20M -Xmx20M -Xmn10M -XX:SurvivorRatio=8 -XX:+PrintGCDetails
 *
 * @author yikebocai#gmail.com
 * @since 2013-3-26
 */
public class Testjstat {
    private static final int _1MB = 1024 * 1024;

    public static void main(String[] args) throws InterruptedException {
        byte[] alloc1 = new byte[2 * _1MB];
        byte[] alloc2 = new byte[2 * _1MB];
        byte[] alloc3 = new byte[1 * _1MB];
        // first Minor GC
        byte[] alloc4 = new byte[4 * _1MB];
        byte[] alloc5 = new byte[_1MB / 4];
        // second Minor GC
        byte[] alloc6 = new byte[6 * _1MB];
        alloc1 = null;
        alloc3 = null;
        // first Full GC
        byte[] alloc7 = new byte[3 * _1MB];
    }
}
The GC detail is:
[GC [DefNew: 5463K->148K(9216K), 0.0063046 secs] 5463K->5268K(19456K), 0.0063589 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC [DefNew: 4587K->404K(9216K), 0.0046368 secs] 9707K->9620K(19456K), 0.0046822 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[GC [DefNew: 6548K->6548K(9216K), 0.0000373 secs][Tenured: 9216K->6144K(10240K), 0.0124560 secs] 15764K->12692K(19456K), [Perm : 369K->369K(12288K)], 0.0126052 secs] [Times: user=0.00 sys=0.02, real=0.01 secs]
Heap
def new generation total 9216K, used 6712K [0x322a0000, 0x32ca0000, 0x32ca0000)
eden space 8192K, 81% used [0x322a0000, 0x3292e2a8, 0x32aa0000)
from space 1024K, 0% used [0x32aa0000, 0x32aa0000, 0x32ba0000)
to space 1024K, 0% used [0x32ba0000, 0x32ba0000, 0x32ca0000)
tenured generation total 10240K, used 9216K [0x32ca0000, 0x336a0000, 0x336a0000)
the space 10240K, 90% used [0x32ca0000, 0x335a0030, 0x335a0200, 0x336a0000)
compacting perm gen total 12288K, used 369K [0x336a0000, 0x342a0000, 0x376a0000)
the space 12288K, 3% used [0x336a0000, 0x336fc548, 0x336fc600, 0x342a0000)
ro space 10240K, 51% used [0x376a0000, 0x37bccf58, 0x37bcd000, 0x380a0000)
rw space 12288K, 54% used [0x380a0000, 0x38738f50, 0x38739000, 0x38ca0000)

Why does this code consume so much heap?

Here is the full repository. This is a very simple test which inserts 50000 random things into the database with the postgresql-simple database binding. It uses MonadRandom and can generate Things lazily.
Here is the lazy Thing generator.
Here is case 1 and the specific snippet of code using the Thing generator:
insertThings c = do
    ts <- genThings
    withTransaction c $ do
        executeMany c "insert into things (a, b, c) values (?, ?, ?)" $
            map (\(Thing ta tb tc) -> (ta, tb, tc)) $ take 50000 ts
Here is case 2, which just dumps Things to stdout:
main = do
    ts <- genThings
    mapM print $ take 50000 ts
In the first case I have very bad GC times:
cabal-dev/bin/posttest +RTS -s
1,750,661,104 bytes allocated in the heap
619,896,664 bytes copied during GC
92,560,976 bytes maximum residency (10 sample(s))
990,512 bytes maximum slop
239 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3323 colls, 0 par 11.01s 11.46s 0.0034s 0.0076s
Gen 1 10 colls, 0 par 0.74s 0.77s 0.0769s 0.2920s
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.97s ( 3.86s elapsed)
GC time 11.75s ( 12.23s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 14.72s ( 16.09s elapsed)
%GC time 79.8% (76.0% elapsed)
Alloc rate 588,550,530 bytes per MUT second
Productivity 20.2% of total user, 18.5% of total elapsed
While in the second case times are great:
cabal-dev/bin/dumptest +RTS -s > out
1,492,068,768 bytes allocated in the heap
7,941,456 bytes copied during GC
2,054,008 bytes maximum residency (3 sample(s))
70,656 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2888 colls, 0 par 0.13s 0.16s 0.0001s 0.0089s
Gen 1 3 colls, 0 par 0.01s 0.01s 0.0020s 0.0043s
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.00s ( 2.37s elapsed)
GC time 0.14s ( 0.16s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 2.14s ( 2.53s elapsed)
%GC time 6.5% (6.4% elapsed)
Alloc rate 744,750,084 bytes per MUT second
Productivity 93.5% of total user, 79.0% of total elapsed
I have tried heap profiling, but I did not understand the results. It looks like all 50000 Things are constructed in memory first, then transformed into query ByteStrings, and then these strings are sent to the database. But why does that happen? How do I find the guilty code?
GHC version is 7.4.2.
Compilation flags are -O2 for all libraries and the package itself (compiled by cabal-dev in a sandbox).
I've checked the profile with formatMany and 50k Things. Memory builds up steadily and then drops quickly; the maximum memory used is slightly over 40 MB. The main cost centres are buildQuery and escapeStringConn, followed by toRow. Half of the data is ARR_WORDS (byte strings), Actions and lists.
formatMany pretty much builds one long ByteString from pieces assembled out of nested lists of Actions. The Actions are converted to ByteString Builders, which retain their ByteStrings until they are used to produce the final long strict ByteString. Those ByteStrings live a long life, right up until the final BS is constructed.
The strings need to be escaped by libpq, so any non-Plain action's ByteString is passed to libpq and replaced with a new one in escapeStringConn and friends, adding more garbage.
If you replace the Text in Thing with another Int, GC time drops from 75% to 45%.
I've tried to reduce the use of temporary lists in formatMany and buildQuery by replacing mapM with foldM over a Builder. It does not help much, but increases code complexity a bit.
TL;DR: Builders can't be consumed lazily, because all of them are needed to produce the final strict ByteString (which is pretty much an array of bytes).
If you have a problem with memory, split the executeMany call into chunks inside the same transaction, as sketched below.
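A minimal sketch of that chunking, reusing genThings and Thing from the question; chunksOf comes from the split package, and the chunk size of 1000 is an arbitrary illustration:

import Data.List.Split (chunksOf)

insertThingsChunked c = do
    ts <- genThings
    let rows = map (\(Thing ta tb tc) -> (ta, tb, tc)) $ take 50000 ts
    -- one transaction, but only one chunk's Builders are alive at a time
    withTransaction c $
        mapM_ (executeMany c "insert into things (a, b, c) values (?, ?, ?)")
              (chunksOf 1000 rows)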
