Spark job cannot acquire resources from Mesos cluster - apache-spark

I am using Spark Job Server (SJS) to create contexts and submit jobs.
My cluster includes 4 servers.
master1: 10.197.0.3
master2: 10.197.0.4
master3: 10.197.0.5
master4: 10.197.0.6
But only master1 has a public ip.
First of all, I set up ZooKeeper on master1, master2 and master3, with zookeeper ids 1 to 3.
I intend to use master1, master2 and master3 as the masters of the cluster.
That means I set quorum=2 (a majority of the 3 masters).
The ZooKeeper connection string is zk://master1:2181,master2:2181,master3:2181/mesos.
On each server I also start mesos-slave, so I have 4 slaves and 3 masters.
As you can see, all slaves are connected.
But when I create a job and run it, it cannot acquire any resources.
From the logs I can see that it continually DECLINEs the offers. These logs are from the master:
I0523 15:01:00.116981 32513 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4264 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4@10.197.0.3:28765
I0523 15:01:00.117086 32513 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4265 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4@10.197.0.3:28765
I0523 15:01:01.460502 32508 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (914)@127.0.0.1:5050
I0523 15:01:02.117753 32510 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0000 (sql_context) at scheduler-9b4637cf-4b27-4629-9a73-6019443ed30b@10.197.0.3:28765
I0523 15:01:02.118099 32510 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4@10.197.0.3:28765
I0523 15:01:02.119299 32508 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4266 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0000 (sql_context) at scheduler-9b4637cf-4b27-4629-9a73-6019443ed30b@10.197.0.3:28765
I0523 15:01:02.119858 32515 master.cpp:3641] Processing DECLINE call for offers: [ dc18c89f-d802-404b-9221-71f0f15b096f-O4267 ] for framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4@10.197.0.3:28765
I0523 15:01:02.900946 32509 http.cpp:312] HTTP GET for /master/state from 10.197.0.3:35778 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36' with X-Forwarded-For='113.161.38.181'
I0523 15:01:03.118147 32514 master.cpp:5324] Sending 1 offers to framework dc18c89f-d802-404b-9221-71f0f15b096f-0001 (sql_context-1) at scheduler-f5196abd-f420-48c6-b2fe-0306595601d4@10.197.0.3:28765
On one of my slaves I checked the logs:
W0523 14:53:15.487599 32681 status_update_manager.cpp:475] Resending status update TASK_FAILED (UUID: 3c3a022c-2032-4da1-bbab-c367d46e07de) for task driver-20160523111535-0003 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019
W0523 14:53:15.487773 32681 status_update_manager.cpp:475] Resending status update TASK_FAILED (UUID: cfb494b3-6484-4394-bd94-80abf2e11ee8) for task driver-20160523112724-0001 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020
I0523 14:53:15.487820 32680 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 3c3a022c-2032-4da1-bbab-c367d46e07de) for task driver-20160523111535-0003 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019 to master@10.197.0.3:5050
I0523 14:53:15.488008 32680 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: cfb494b3-6484-4394-bd94-80abf2e11ee8) for task driver-20160523112724-0001 of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020 to master@10.197.0.3:5050
I0523 15:02:24.120436 32680 http.cpp:190] HTTP GET for /slave(1)/state from 113.161.38.181:63097 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
W0523 15:02:24.165690 32685 slave.cpp:4979] Failed to get resource statistics for executor 'driver-20160523111535-0003' of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0019: Container 'cac7667c-3309-4380-9f95-07d9b888e44e' not found
W0523 15:02:24.165771 32685 slave.cpp:4979] Failed to get resource statistics for executor 'driver-20160523112724-0001' of framework a9871c4b-ab0c-4ddc-8d96-c52faf0e66f7-0020: Container '9c661311-bf7f-4ea6-9348-ce8c7f6cfbcb' not found
From the SJS logs:
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4565 with attributes: Map() mem: 63403.0 cpu: 8
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4566 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:10,305] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4567 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:10,366] WARN cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[2016-05-23 15:04:10,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
[2016-05-23 15:04:11,306] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4568 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:11,306] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4569 with attributes: Map() mem: 63403.0 cpu: 8
[2016-05-23 15:04:11,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
[2016-05-23 15:04:12,308] DEBUG oarseMesosSchedulerBackend [] [] - Declining offer: dc18c89f-d802-404b-9221-71f0f15b096f-O4570 with attributes: Map() mem: 47244.0 cpu: 8
[2016-05-23 15:04:12,505] DEBUG cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/sql_context] - parentName: , name: TaskSet_0, runningTasks: 0
From master2's logs:
May 23 08:19:44 ants-vps mesos-master[1866]: E0523 08:19:44.273349 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
May 23 08:19:54 ants-vps mesos-master[1866]: I0523 08:19:54.274245 1899 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (1257)@127.0.0.1:5050
May 23 08:19:54 ants-vps mesos-master[1866]: E0523 08:19:54.274533 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
May 23 08:20:04 ants-vps mesos-master[1866]: I0523 08:20:04.275291 1897 replica.cpp:673] Replica in VOTING status received a broadcasted recover request from (1260)@127.0.0.1:5050
May 23 08:20:04 ants-vps mesos-master[1866]: E0523 08:20:04.275512 1902 process.cpp:1958] Failed to shutdown socket with fd 28: Transport endpoint is not connected
From master3:
May 23 08:21:05 ants-vps mesos-master[22023]: I0523 08:21:05.994082 22042 recover.cpp:193] Received a recover response from a replica in EMPTY status
May 23 08:21:15 ants-vps mesos-master[22023]: I0523 08:21:15.994051 22043 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
May 23 08:21:15 ants-vps mesos-master[22023]: I0523 08:21:15.994529 22036 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (1282)@127.0.0.1:5050
How can I find the cause of these issues and fix them?
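For context, here is a minimal sketch of the Spark properties that govern whether the coarse-grained Mesos backend accepts or declines an offer. The property names are standard Spark-on-Mesos settings; the values are placeholders, not my actual configuration:
import org.apache.spark.SparkConf;
// Sketch only: once spark.cores.max cores are held, further offers are declined;
// an offer is also declined if it cannot fit spark.executor.memory plus overhead.
SparkConf conf = new SparkConf()
        .setMaster("mesos://zk://master1:2181,master2:2181,master3:2181/mesos")
        .set("spark.mesos.coarse", "true")   // coarse-grained mode (CoarseMesosSchedulerBackend)
        .set("spark.cores.max", "8")         // placeholder cap on total acquired cores
        .set("spark.executor.memory", "4g"); // placeholder executor memory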

Related

Running into RedisTimeoutException and other exceptions with Redisson and Azure Redis Cache

A lot of timeout exceptions and Can't add slave exceptions
Steps to reproduce or test case: intermittent
Redis version: Azure Redis Cache with 5 shards (4.0.14, 3.2.7)
Redisson version: 3.11.4
Redisson configuration: default clustered config with the following overrides:
REDIS_ENABLED | true
REDIS_KEEP_ALIVE | true
REDIS_THREADS | 512
REDIS_NETTY_THREADS | 1024
REDIS_MASTER_CONNECTION_MINIMUM_IDLE_SIZE | 5
REDIS_MASTER_CONNECTION_POOL_SIZE | 10
REDIS_SLAVE_CONNECTION_MINIMUM_IDLE_SIZE | 5
REDIS_SLAVE_CONNECTION_POOL_SIZE | 10
REDIS_TIMEOUT | 1000
REDIS_RETRY_INTERVAL | 500
REDIS_TCP_NO_DELAY | true
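For reference, a sketch of what those overrides map to in Redisson's Java API; the node address below is a placeholder, and the numbers mirror the table above:
import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
// Sketch only: clustered config carrying the overrides from the table.
Config config = new Config();
config.setThreads(512)
      .setNettyThreads(1024)
      .useClusterServers()
      .addNodeAddress("rediss://<your-cache>:15002") // placeholder endpoint
      .setMasterConnectionMinimumIdleSize(5)
      .setMasterConnectionPoolSize(10)
      .setSlaveConnectionMinimumIdleSize(5)
      .setSlaveConnectionPoolSize(10)
      .setTimeout(1000)
      .setRetryInterval(500)
      .setKeepAlive(true)
      .setTcpNoDelay(true);
RedissonClient client = Redisson.create(config);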
I see the following exceptions in the log:
exception: {
    class: org.redisson.client.RedisConnectionException
    thrownfrom: unknown
}
level: ERROR
logger_name: org.redisson.cluster.ClusterConnectionManager
message: Can't add slave: rediss://:15002
process: 6523
stack_trace: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (READONLY), params: [], Redis client: [addr=rediss://:15002]
at org.redisson.client.RedisConnection.lambda$async$1(RedisConnection.java:207)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:680)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:755)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:483)
... 2 common frames omitted
Wrapped by: org.redisson.client.RedisConnectionException: Unable to connect to Redis server: /:15002
at org.redisson.connection.pool.ConnectionPool$1.lambda$run$0(ConnectionPool.java:160)
at org.redisson.misc.RedissonPromise.lambda$onComplete$0(RedissonPromise.java:183)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at org.redisson.misc.RedissonPromise.tryFailure(RedissonPromise.java:96)
at org.redisson.connection.pool.ConnectionPool.promiseFailure(ConnectionPool.java:330)
at org.redisson.connection.pool.ConnectionPool.lambda$createConnection$1(ConnectionPool.java:296)
at org.redisson.misc.RedissonPromise.lambda$onComplete$0(RedissonPromise.java:183)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at org.redisson.misc.RedissonPromise.tryFailure(RedissonPromise.java:96)
at org.redisson.client.RedisClient$2$1.run(RedisClient.java:240)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
stack_trace: org.redisson.client.RedisTimeoutException: Unable to get connection! Try to increase 'nettyThreads' and/or connection pool size settings Node source: NodeSource [slot=15393, addr=redis://:15007, redisClient=null, redirect=MOVED, entry=null], command: (PSETEX), params: [some key, 3600000, PooledUnsafeDirectByteBuf(ridx: 0, widx: 457, cap: 512)] after 0 retry attempts
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:209)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:680)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:755)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:483)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
level: ERROR
logger_name: org.redisson.cluster.ClusterConnectionManager
message: Can't add master: rediss://:15007 for slot ranges: [[15019-15564], [12288-13652], [4096-5461]]
process: 6574
thread_name: redisson-netty-2-718
timestamp: 2019-10-29 22:32:15.592
My logs are flooded with these exceptions, and when I log in to the Azure portal I see the CPU metric for Redis spiked to 100%. Any help is appreciated.

Intermittent connectionTimeout errors in Spark streaming job

I have a Spark (2.1) streaming job that writes processed data to Azure blob storage every batch (with a batch interval of 1 min). Every now and then (once every couple of hours) I get a 'java.net.ConnectException' with a connection-timed-out message. The operation does get retried and eventually succeeds, but the issue delays the completion of the 1 min streaming batch, causing it to finish in 2 to 3 min when the error occurs.
Below is the executor log snippet with the error message. I have spark.executor.cores=5.
Is there some kind of limit on the number of connections that might be causing this?
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting operation.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting operation with location 'PRIMARY' per location mode 'PRIMARY_ONLY'.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Starting request to 'http://<name>.blob.core.windows.net/rawData/2017/10/11/16/data1.json' at 'Wed, 11 Oct 2017 16:09:02 GMT'.}
17/10/11 16:09:02 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Waiting for response.}
..
17/10/11 16:11:09 WARN root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Retryable exception thrown. Class = 'java.net.ConnectException', Message = 'Connection timed out (Connection timed out)'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Checking if the operation should be retried. Retry count = '0', HTTP status code = '-1', Error Message = 'An unknown failure occurred : Connection timed out (Connection timed out)'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {The next location has been set to 'PRIMARY', per location mode 'PRIMARY_ONLY'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {The retry policy set the next location to 'PRIMARY' and updated the location mode to 'PRIMARY_ONLY'.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Operation will be retried after '0'ms.}
17/10/11 16:11:09 INFO root: {89f867cc-cbd3-4fa9-a549-4e07be3f69b0}: {Retrying failed operation.}

Cassandra HeartBeat logs

I am using Cassandra in my Spring application. Every time a request comes in or a connection is idle, Cassandra prints heartbeat DEBUG logs through my application's logback. I want to stop these heartbeat entries from cluttering my debug logs.
2016-11-03 11:37:27,241 DEBUG [cluster1-nio-worker-2] [com.datastax.driver.core.Connection] [line : 1093 ] [] - Connection[/10.41.123.31:9042-2, inFlight=0, closed=false] heartbeat query succeeded
2016-11-03 11:37:30,990 DEBUG [cluster1-nio-worker-6] [com.datastax.driver.core.Connection] [line : 1023 ] [] - Connection[/10.41.123.31:9042-6, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
2016-11-03 11:37:30,991 DEBUG [cluster1-nio-worker-1] [com.datastax.driver.core.Connection] [line : 1023 ] [] - Connection[/10.41.123.31:9042-1, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
2016-11-03 11:37:30,990 DEBUG [cluster1-nio-worker-5] [com.datastax.driver.core.Connection] [line : 1023 ] [] - Connection[/10.41.123.31:9042-5, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
2016-11-03 11:37:30,990 DEBUG [cluster1-nio-worker-7] [com.datastax.driver.core.Connection] [line : 1023 ] [] - Connection[/10.41.123.31:9042-7, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat
2016-11-03 11:37:30,993 DEBUG [cluster1-nio-worker-5] [com.datastax.driver.core.Connection] [line : 1093 ] [] - Connection[/10.41.123.31:9042-5, inFlight=0, closed=false] heartbeat query succeeded
What logger are you using?
If you are using log4j, you can configure the log level like this:
import java.util.Properties;
import org.apache.log4j.PropertyConfigurator;
import org.slf4j.LoggerFactory;
private static final org.slf4j.Logger log = LoggerFactory.getLogger(Application.class);
public Application() {
    final Properties properties = new Properties();
    // Raising the root level to INFO drops the driver's DEBUG heartbeat lines.
    properties.put("log4j.rootLogger", "INFO, A1");
    properties.put("log4j.appender.A1", "org.apache.log4j.ConsoleAppender");
    properties.put("log4j.appender.A1.layout", "org.apache.log4j.SimpleLayout");
    PropertyConfigurator.configure(properties);
}
Then the heartbeats should no longer be printed.
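Since the question mentions logback rather than log4j, a similar effect can be had with logback's programmatic API (a sketch, assuming logback-classic is on the classpath). This raises the level of the driver's Connection logger only, so the rest of your DEBUG output is kept:
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;
// The logger name matches the class shown in the heartbeat log lines above.
Logger connectionLogger =
        (Logger) LoggerFactory.getLogger("com.datastax.driver.core.Connection");
connectionLogger.setLevel(Level.INFO);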

Can't connect to scassandra (stubbed Cassandra) using datastax-driver

I am having trouble connecting to Cassandra.
I'm trying to connect to s-cassandra (which is a stubbed Cassandra, as can be reviewed here) with the DataStax Node.js Cassandra driver.
For some reason, passing "127.0.0.1:8042" as a contact point to the driver results in a DriverInternalError (though sometimes it works randomly, and I still haven't figured out why it sometimes does and sometimes doesn't).
The DriverInternalError I get:
{"name": "DriverInternalError",
"stack": "...",
"message": "Local datacenter could not be
determined",
"info": "Represents a bug inside the driver or in a
Cassandra host." }
That is what I see from Cassandra Driver's log:
log event: info -- Adding host 127.0.0.1:8042
log event: info -- Getting first connection
log event: info -- Connecting to 127.0.0.1:8042
log event: verbose -- Socket connected to 127.0.0.1:8042
log event: info -- Trying to use protocol version 4
log event: verbose -- Sending stream #0
log event: verbose -- Sent stream #0 to 127.0.0.1:8042
{"name":"application-storage","hostname":"Yuris-MacBook-Pro.local","pid":1338,"level":30,"msg":"Kafka producer is initialized","time":"2016-08-05T12:53:53.124Z","v":0}
log event: verbose -- Received frame #0 from 127.0.0.1:8042
log event: info -- Protocol v4 not supported, using v2
log event: verbose -- Done receiving frame #0
log event: verbose -- disconnecting
log event: info -- Connection to 127.0.0.1:8042 closed
log event: info -- Connecting to 127.0.0.1:8042
log event: verbose -- Socket connected to 127.0.0.1:8042
log event: info -- Trying to use protocol version 2
log event: verbose -- Sending stream #0
log event: verbose -- Sent stream #0 to 127.0.0.1:8042
log event: verbose -- Received frame #0 from 127.0.0.1:8042
log event: info -- Connection to 127.0.0.1:8042 opened successfully
log event: info -- Connection pool to host 127.0.0.1:8042 created with 1 connection(s)
log event: info -- Control connection using protocol version 2
log event: info -- Connection acquired to 127.0.0.1:8042, refreshing nodes list
log event: info -- Refreshing local and peers info
log event: verbose -- Sending stream #1
log event: verbose -- Done receiving frame #0
log event: verbose -- Sent stream #1 to 127.0.0.1:8042
log event: verbose -- Received frame #1 from 127.0.0.1:8042
log event: warning -- No local info provided
log event: verbose -- Sending stream #0
log event: verbose -- Done receiving frame #1
log event: verbose -- Sent stream #0 to 127.0.0.1:8042
log event: verbose -- Received frame #0 from 127.0.0.1:8042
log event: info -- Peers info retrieved
log event: error -- Tokenizer could not be determined
log event: info -- Retrieving keyspaces metadata
log event: verbose -- Sending stream #1
log event: verbose -- Done receiving frame #0
log event: verbose -- Sent stream #1 to 127.0.0.1:8042
log event: verbose -- Received frame #1 from 127.0.0.1:8042
log event: verbose -- Sending stream #0
log event: verbose -- Done receiving frame #1
log event: verbose -- Sent stream #0 to 127.0.0.1:8042
log event: verbose -- Received frame #0 from 127.0.0.1:8042
log event: info -- ControlConnection connected to 127.0.0.1:8042 and is up to date
I've tried playing with the firewall and allowing the application, but that didn't help. Though sometimes it does work randomly, and I still haven't figured out why.
I am on Mac OS X El Capitan.
The solution that helped me:
I needed to prime the system.local table with a prime-query-single request:
{
    query: 'prime-query-single',
    header: {'Content-Type': 'application/json'},
    body: {
        "when": {
            "query": "SELECT * FROM system.local WHERE key='local'"
        },
        "then": {
            "rows": [
                {
                    "cluster_name": "custom cluster name",
                    "partitioner": "org.apache.cassandra.dht.Murmur3Partitioner",
                    "data_center": "dc1",
                    "rack": "rc1",
                    "tokens": ["1743244960790844724"],
                    "release_version": "2.0.1"
                }
            ],
            "result": "success",
            "column_types": {
                "tokens": "set<text>"
            }
        }
    }
}
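To apply it, the prime is POSTed to scassandra's admin interface. A sketch in Java, assuming the default admin port 8043 and the prime-query-single endpoint (adjust both if your setup differs):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
// Sketch only: sends a trimmed-down version of the prime above.
String prime = "{\"when\":{\"query\":\"SELECT * FROM system.local WHERE key='local'\"},"
             + "\"then\":{\"rows\":[{\"cluster_name\":\"custom cluster name\","
             + "\"data_center\":\"dc1\",\"rack\":\"rc1\"}],\"result\":\"success\"}}";
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://127.0.0.1:8043/prime-query-single"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(prime))
        .build();
HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.statusCode()); // 200 means the prime was accepted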

BitTorrent client: trouble downloading last few blocks from peers

The BitTorrent client I'm working on is almost working, except that it cannot get the last few blocks from peers even though the requests have been sent. I get no data from the peers except keep-alive messages, and the peers close their connections after sending a few keep-alives.
I know there is an end-game mode, where the last few blocks tend to trickle in slowly. But in my case, all peers stop sending any data when there are a few blocks left and close their connections one by one (a sketch of end-game mode follows the log excerpt below).
What could be the problem causing this?
2016-07-19 15:05:27,131 - main.torrent_client - INFO - we have piece 69
2016-07-19 15:05:27,131 - main.torrent_client - INFO - downloaded: 71, total: 72
2016-07-19 15:05:27,132 - main.torrent_client - INFO - peer queue {70, 71}
2016-07-19 15:05:27,132 - main.torrent_client - INFO - 2 blocks left to request from 198.251.56.71
2016-07-19 15:05:27,132 - main.torrent_client - INFO - pop a block from piece 70
2016-07-19 15:05:27,132 - main.torrent_client - INFO - peer queue {71}
2016-07-19 15:05:27,132 - main.torrent_client - INFO - 1 blocks left to request from 198.251.56.71
2016-07-19 15:05:27,133 - main.torrent_client - INFO - pop a block from piece 71
2016-07-19 15:05:30,138 - main.torrent_client - INFO - requested {'index': 71, 'begin_offset': 0, 'request_length': 16384} from 198.251.56.71
2016-07-19 15:05:27,133 - main.pieces - INFO - Done requesting all the pieces!!!!!!!!!
2016-07-19 15:06:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:06:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:07:15,856 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:07:48,065 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:07:48,065 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:09:16,796 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:09:18,066 - main.torrent_client - INFO - just sent keep alive message to {'port': 57430, 'host': '198.251.56.71'}
2016-07-19 15:10:48,070 - main.torrent_client - INFO - just sent keep alive message to {'port': 29063, 'host': '38.76.93.8'}
2016-07-19 15:11:17,785 - main.torrent_client - DEBUG - Peer {'port': 29063, 'host': '38.76.93.8'} sent KEEP ALIVE message
2016-07-19 15:13:48,073 - main.torrent_client - DEBUG - connection closed by {'port': 57430, 'host': '198.251.56.71'}
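For reference, a minimal sketch of the end-game behavior described above; Peer and Block are placeholder types, not taken from my client:
import java.util.List;
import java.util.Set;
// Placeholder abstractions for the sketch.
interface Peer {
    boolean isChokingUs();
    boolean hasPiece(int pieceIndex);
    void sendRequest(Block b);
    void sendCancel(Block b);
}
record Block(int pieceIndex, int beginOffset, int length) {}
class EndGame {
    // Once every remaining block has been requested at least once, re-request
    // each outstanding block from ALL peers that can serve it.
    void enter(Set<Block> outstanding, List<Peer> peers) {
        for (Block b : outstanding) {
            for (Peer p : peers) {
                if (!p.isChokingUs() && p.hasPiece(b.pieceIndex())) {
                    p.sendRequest(b); // duplicate requests are intentional in end-game
                }
            }
        }
    }
    // When one copy of a block arrives, cancel the duplicates everywhere else.
    void onBlockReceived(Block b, List<Peer> peers) {
        for (Peer p : peers) {
            p.sendCancel(b);
        }
    }
}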
