What does -1000 mean in spark exit status - apache-spark

I'm doing something with Spark SQL and got the error below:
YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to
remove executor 1 for reason Container marked as failed:
container_1568946404896_0002_02_000002 on host: worker1. Exit status:
-1000. Diagnostics: [2019-09-20 10:43:11.474]Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@76430b7c
rejected from
org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@16970b[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
1]
I'm trying to figure it out by checking the meaning of Exit status: -1000; however, googling returned no useful info.
According to this thread, -1000 is not even mentioned.
Any comment is welcome, thanks.

Related

Grok, logs processing with different values

I have a logfile that I am parsing with telegraf.logparser, which then sends the results to influxdb. The problem is that the lines in my logfile have different fields:
2016-12-06 11:13:34 job id: mHiMMDmCDFKDmGXNMhm, lrmsid: 13370
2016-12-06 11:14:34 job id: seeeeeewsda33rfddSD, lrmsid: 13371
2016-12-06 11:14:37 job id: dmABFKDmqKcNDmHBFKD, failure: "Timeout"
I can match a single one of those lines with
%{TIMESTAMP_ISO8601} job id: %{WORD:jobid}, lrmsid: %{WORD:lrmsid}
or
%{TIMESTAMP_ISO8601} job id: %{WORD:jobid}, failure: "%{WORD:fail}"
But how can I combine them, so that if lrmsid is not set I get lrmsid=null and failure="Timeout", and if lrmsid is set I get lrmsid=12345 and failure=null?
Please try this one:
(lrmsid: %{WORD:lrmsid})?(failure: "%{WORD:failure}")?
It should capture either lrmsid or failure, if I have not missed anything.
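The same optional-group idea can be sketched in Python's re module (used here purely for illustration; grok's %{WORD:name} corresponds roughly to a named capture group, and an absent optional group comes back as None, i.e. the "null" field asked for):

```python
import re

# Rough Python-regex equivalent of the grok pattern above: both the
# lrmsid part and the failure part are optional, so whichever one is
# missing in a given line yields None instead of failing the match.
LINE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) job id: (?P<jobid>\w+)'
    r'(?:, lrmsid: (?P<lrmsid>\w+))?'
    r'(?:, failure: "(?P<failure>\w+)")?'
)

m1 = LINE.match('2016-12-06 11:13:34 job id: mHiMMDmCDFKDmGXNMhm, lrmsid: 13370')
m2 = LINE.match('2016-12-06 11:14:37 job id: dmABFKDmqKcNDmHBFKD, failure: "Timeout"')
print(m1.group('lrmsid'), m1.group('failure'))  # 13370 None
print(m2.group('lrmsid'), m2.group('failure'))  # None Timeout
```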

ArangoDB-asynchronize-log invokes msync too much

Here is the question: why does ArangoDB sync data thousands of times per second in async mode? Is it my misconfiguration or expected behavior?
Recently I've been testing async inserts with ArangoDB and MongoDB. In my test, the average latency of ArangoDB is 2x that of MongoDB. After tuning I found that their I/O patterns differ, and I think this is the root cause of ArangoDB's poor async-insert performance.
ArangoDB invokes msync continuously, thousands of times per second, like the following. This causes too much iowait and too much jbd2 activity.
05:42:21.138119 msync(0x7f50fdd75000, 4096, MS_SYNC) = 0 <0.000574>
05:42:21.138843 msync(0x7f50fdd75000, 8192, MS_SYNC) = 0 <0.000558>
05:42:21.139541 msync(0x7f50fdd76000, 4096, MS_SYNC) = 0 <0.000351>
05:42:21.139928 msync(0x7f50fdd76000, , MS_SYNC) = 0 <0.000555>
05:42:21.140532 msync(0x7f50fdd77000, 4096, MS_SYNC) = 0 <0.000318>
05:42:21.141002 msync(0x7f50fdd77000, 8192, MS_SYNC) = 0 <0.000714>
05:42:21.141755 msync(0x7f50fdd78000, 4096, MS_SYNC) = 0 <0.000345>
05:42:21.142133 msync(0x7f50fdd78000, 4096, MS_SYNC) = 0 <0.000725>
Mongo: invokes fdatasync just a few times per second.
Test env:
All tests run in a single VM: 8 vCPU, 24 GB RAM, 120 GB disk, CentOS 6.7.
It's a single-threaded async insert test based on the Java driver with YCSB.
Conf for ArangoDB:
v2.8.7
Server, scheduler, and V8 thread counts are all set to 1.
The collection is created with waitForSync=false, and insert requests are sent with waitForSync=false.
Start cmd:
/usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp-path /var/tmp/arangod --log.tty --supervisor --wal.sync-interval=1000
Collection properties:
{
  "doCompact" : true,
  "journalSize" : 33554432,
  "isSystem" : false,
  "isVolatile" : false,
  "waitForSync" : false,
  "keyOptions" : {
    "type" : "traditional",
    "allowUserKeys" : true
  },
  "indexBuckets" : 8
}
Detailed trace log:
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef3318 - 0x7ff9beef37f2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef37f8 - 0x7ff9beef3cd2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef3cd8 - 0x7ff9beef41b2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef41b8 - 0x7ff9beef4692, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef4698 - 0x7ff9beef4b72, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef4b78 - 0x7ff9beef5052, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef5058 - 0x7ff9beef5532, length: 1242, wfs: false
ArangoDB, as a multi-model database, covers more use cases than MongoDB. While it can act as a replacement, the additional features imply different requirements for the default configuration settings and implementation details.
When you work with e.g. graphs and want them to stay persistent, you can lower the probability of data loss by syncing more frequently.
ArangoDB does these syncs in another thread; when trying to reproduce your setup, we found that this thread actually syncs more often than the sync-interval configuration value in /etc/arangodb/arangod.conf would suggest:
[wal]
sync-interval=10000
We fixed this; it improves performance a bit when writing locally via Foxx or the arangod emergency console (which you get if you start arangod with the --console parameter instead of in daemon mode).
However, it doesn't significantly change the performance when e.g. using arangosh to sequentially insert 100 k documents:
var time = require("internal").time;
var s = time();
db._drop('test');
db._create('test');
for (var i = 0; i < 100000; i++) { db.test.save({ i: i }); }
require("internal").print(time() - s);
In general, your numbers are similar to those in our performance comparison, so that's what is to be expected with ArangoDB 2.8.
Currently you can use the bulk import facility to reduce the HTTP communication overhead.
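To illustrate the bulk-import idea, here is a small Python sketch (the endpoint URL, host/port, and collection name are assumptions for illustration) that packs documents into newline-delimited JSON batches, so you make a few large POSTs to /_api/import instead of one HTTP round trip per document:

```python
import json

# Batch documents into newline-delimited JSON payloads for ArangoDB's
# bulk import endpoint (POST /_api/import?type=documents&collection=test);
# host/port and collection name below are placeholders.
def to_import_batches(docs, batch_size=1000):
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        yield "\n".join(json.dumps(d) for d in batch)

docs = [{"i": i} for i in range(2500)]
batches = list(to_import_batches(docs))
print(len(batches))  # 3 batches instead of 2500 single-document requests
# Each batch body would then be POSTed to e.g.
# http://127.0.0.1:8529/_api/import?type=documents&collection=test
```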

BadIDChoice RENDER in python 3.3 and tk/tcl displayed on X

I have a fairly complicated GUI written with Python's tkinter running on Linux, and one of its components (containing a Text widget that updates frequently) causes the GUI to crash infrequently (about once a day).
The GUIs are displayed on X servers on both Mac OS X (through X11) and Gnome 2.28.2, with the same behavior. My Python version is 3.3 and the Tk/Tcl version is 8.5. The error I get is:
X Error of failed request: BadIDChoice (invalid resource ID chosen for this connection)
Major opcode of failed request: 148 (RENDER)
Minor opcode of failed request: 4 (RenderCreatePicture)
Resource id in failed request: 0x116517f
Serial number of failed request: 15106831
Current serial number in output stream: 15106872
An strace looks like:
11:03:29.632041 recvfrom(13, 0x3bae1d4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:03:29.632059 recvfrom(13, 0x3bae1d4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:03:29.632147 poll([{fd=13, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=13, revents=POLLOUT}])
11:03:29.632164 writev(13, [{"\224\4\5\0D\304\361\0\17\274\361\0i\4\0\0\0\0\0\0\224\27\n\0\3\f\340\0\301\v\340\0"..., 5032}, {NULL, 0}, {"", 0}], 3) = 5032
11:03:29.632193 poll([{fd=13, events=POLLIN}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])
11:03:29.637040 recvfrom(13, "\0\16\302\276x\304\361\0\4\0\224\0\1\0\0\0`\16\330\3\1\0\0\0\243\304\342\210\377\177\0\0"..., 4096, 0, NULL, NULL) = 136
11:03:29.637135 open("/usr/share/X11/XErrorDB", O_RDONLY) = 35
11:03:29.637217 fstat(35, {st_mode=S_IFREG|0644, st_size=41532, ...}) = 0
11:03:29.637360 read(35, "!\n! Copyright 1993, 1995, 1998 "..., 41532) = 41532
11:03:29.637387 close(35) = 0
11:03:29.637820 write(2, "X Error of failed request: BadI"..., 91) = 91
...
My GUI is single-threaded (and uses the after() call to monitor sockets for I/O).
Does anyone know what might be wrong? Is there any better debugging that I could be doing to figure out what the X Error part means?
Infrequent crashes (once a day) with the following logs...
X Error of failed request: BadIDChoice (invalid resource ID chosen for this connection)
Major opcode of failed request: 148 (RENDER)
Minor opcode of failed request: 4 (RenderCreatePicture)
...appear to be a telltale signature of a known issue within xcb as mentioned in the following thread:
Bug 458092 - Crashes with BadIdChoice X errors
The patch for it is available here.
Based on the git history, this xcb bug should be fixed in libX11-1.1.99.2 and above (released roughly 8 years ago).
For further reference, here is the email thread with the complete discussion.

spring integration task executor queue filled with more records

I started building a Spring Integration app in which the input gateway generates a fixed number (50) of records and then stops. There are basic filters/routers/transformers in the middle, and the final service activator and task executor are configured as follows:
<int:service-activator input-channel="inChannel" output-channel="outChannel" ref="svcProcessor">
    <int:poller fixed-rate="100" task-executor="myTaskExecutor"/>
</int:service-activator>
<task:executor id="myTaskExecutor" pool-size="5" queue-capacity="100"/>
I tried to put some debug info at the beginning of the svcProcessor method:
@Qualifier(value="myTaskExecutor")
@Autowired
ThreadPoolTaskExecutor executor;

@ServiceActivator
public Order processOrder(Order order) {
    log.debug("---- " + "executor size: " + executor.getActiveCount() +
        " q: " + executor.getThreadPoolExecutor().getQueue().size() +
        " r: " + executor.getThreadPoolExecutor().getQueue().remainingCapacity() +
        " done: " + executor.getThreadPoolExecutor().getCompletedTaskCount() +
        " task: " + executor.getThreadPoolExecutor().getTaskCount()
    );
    //
    // processing an order takes up to 5 seconds.
    //
    return order;
}
After the program has run for a while, the log shows the queue has grown past 50, and eventually it gets a rejection exception:
23:38:31.096 DEBUG [myTaskExecutor-2] ---- executor size: 5 q: 44 r: 56 done: 11 task: 60
23:38:31.870 DEBUG [myTaskExecutor-5] ---- executor size: 5 q: 51 r: 49 done: 11 task: 67
23:38:33.600 DEBUG [myTaskExecutor-4] ---- executor size: 5 q: 69 r: 31 done: 11 task: 85
23:32:46.792 DEBUG [myTaskExecutor-1] ---- executor size: 5 q: 72 r: 28 done: 11 task: 88
It looks like the active count and the sum of queue size plus remaining capacity match the configured 5 and 100, but I am not clear why there are more than 50 records in the queue, or why taskCount is larger than the limit of 50.
Am I looking at the wrong info from the executor and the queue?
Thanks
UPDATE:
(not sure if I should open another question)
I tried the XML version of the cafeDemo from spring-integration (branch SI3.0.x) and used the pool provided in the document, but with a 100-millisecond rate and an added capacity:
<int:service-activator input-channel="hotDrinks" ref="barista" method="prepareHotDrink" output-channel="preparedDrinks">
<int:poller task-executor="pool" fixed-rate="100"/>
</int:service-activator>
<task:executor id="pool" pool-size="5" queue-capacity="200"/>
After I ran it, it also got a rejection exception after around the 20th delivery:
org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@6c31732b[Running, pool size = 5, active threads = 5, queued tasks = 200, completed tasks = 0]]
Only about 32 orders were placed before the exception, so I am not sure why queued tasks = 200 and completed tasks = 0?
THANKS
getTaskCount() gives the total number of tasks assigned to the executor since it started, so it will increase over time.
The other values are approximate, not exact, per the Java documentation:
getCompletedTaskCount()
Returns the approximate total number of tasks that have completed execution.
getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Ideally getTaskCount() and getCompletedTaskCount() will increase linearly with time, as they include all tasks assigned since the start of execution of your code. However, activeCount should be less than 50, but being an approximate number it will sometimes go beyond 50 by a small margin.
Refer to:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
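Why a bounded task queue overflows even with only 50 input records can be shown with a small stand-in (Python here, purely for illustration): a fixed-rate poller submits a new task on every tick regardless of backlog, so when the 5 workers (each taking ~5 seconds per order) can't keep up, the queue fills to capacity and further submissions are rejected, much like the TaskRejectedException above.

```python
import queue

# A bounded queue standing in for the executor's work queue
# (queue-capacity="100" in the XML config above).
capacity = 100
q = queue.Queue(maxsize=capacity)

submitted = rejected = 0
for tick in range(160):          # the poller fires 160 times...
    try:
        q.put_nowait(tick)       # ...submitting a task each tick
        submitted += 1
    except queue.Full:           # analogous to TaskRejectedException
        rejected += 1

print(submitted, rejected)  # 100 60
```

With no consumer draining the queue, the first 100 submissions fill it and the remaining 60 are rejected; slow consumers merely delay the same outcome.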

Computing eigenvalues in parallel for a large matrix

I am trying to compute the eigenvalues of a big matrix in MATLAB using the Parallel Computing Toolbox.
I first tried:
A = rand(10000,2000);
A = A*A';
matlabpool open 2
spmd
C = codistributed(A);
tic
[V,D] = eig(C);
time = gop(@max, toc) % Time for all labs in the pool to complete.
end
matlabpool close
The code starts its execution:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
But after a few minutes, I got the following error:
Error using distcompserialize
Out of Memory during serialization
Error in spmdlang.RemoteSpmdExecutor/initiateComputation (line 82)
fcns = distcompMakeByteBufferHandle( ...
Error in spmdlang.spmd_feval_impl (line 14)
blockExecutor.initiateComputation();
Error in spmd_feval (line 8)
spmdlang.spmd_feval_impl( varargin{:} );
I then tried to apply what I saw in tutorial videos for the parallel toolbox:
>> job = createParallelJob('configuration', 'local');
>> task = createTask(job, @eig, 1, {A});
>> submit(job);
>> waitForState(job, 'finished');
>> results = getAllOutputArguments(job)
>> destroy(job);
But after a two-hour computation, I got:
results =
Empty cell array: 2-by-0
My computer has 2 GiB of memory and an Intel dual-core CPU (2 × 2 GHz).
My questions are the following:
1/ Looking at the first error, I guess my memory is not sufficient for this problem. Is there a way I can divide the input data so that my computer can handle this matrix?
2/ Why is the second result I get empty (after 2 hours of computation...)?
EDIT: @pm89
You were right, an error occurred during the execution:
job =
Parallel Job ID 3 Information
=============================
UserName : bigTree
State : finished
SubmitTime : Sun Jul 14 19:20:01 CEST 2013
StartTime : Sun Jul 14 19:20:22 CEST 2013
Running Duration : 0 days 0h 3m 16s
- Data Dependencies
FileDependencies : {}
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 2
TaskID of errors : [1 2]
- Scheduler Dependent (Parallel Job)
MaximumNumberOfWorkers : 2
MinimumNumberOfWorkers : 1
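Regarding question 1, one memory-saving route can be sketched (shown here in Python/NumPy for illustration rather than MATLAB, with small stand-in dimensions): since A*A' is symmetric positive semi-definite, its eigenvalues are the squared singular values of A padded with zeros, so you can decompose the 10000×2000 factor instead of ever forming and diagonalizing the 10000×10000 product.

```python
import numpy as np

# For A of size m x n with m >> n, eig(A @ A.T) on the m x m product is
# equivalent to svd(A) on the m x n factor: the eigenvalues are the
# squared singular values plus m - n zeros (rank(A @ A.T) <= n).
rng = np.random.default_rng(0)
m, n = 50, 10                      # small stand-ins for 10000 x 2000
A = rng.standard_normal((m, n))

direct = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]      # needs the m x m matrix
via_svd = np.zeros(m)                                    # only needs the m x n factor
via_svd[:n] = np.linalg.svd(A, compute_uv=False) ** 2    # singular values squared

print(np.allclose(direct, via_svd))
```

In MATLAB terms the same idea is svd(A) (or eig(A'*A), a 2000×2000 problem) rather than eig(A*A'), which also sidesteps serializing the full 10000×10000 matrix to the workers.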
