ArangoDB batch mode doesn't work with Java Driver - arangodb

I am evaluating ArangoDB using Spring Batch.
I tried to insert some data and, without batch mode, it works as expected.
However, if batch mode is on, the execution of the program hangs.
I am using ArangoDB 2.3.3 and com.arangodb:arangodb-java-driver:[2.2-SNAPSHOT,2.2]
arangoDriver.startBatchMode();
for(Account acc : items){
acc.getRecordHash();
acc.getIdHash();
arangoDriver.createDocument("AccountCollection", acc);
}
arangoDriver.executeBatch();
Any ideas what I am doing wrong?

I tried to reproduce what you are doing. First of all, does the collection "AccountCollection" exist? If not, you would get an error in the batch result, but the program still should not hang. I created a unit test:
@Test
public void test_StartCancelExecuteBatchMode() throws ArangoException {
driver.startBatchMode();
ArrayList<Account> items = new ArrayList<Account>();
items.add(new Account());
items.add(new Account());
items.add(new Account());
items.add(new Account());
for(Account acc : items){
acc.getRecordHash();
acc.getIdHash();
driver.createDocument("AccountCollection", acc, true, false);
}
driver.executeBatch();
}
This works perfectly and returns:
EOB
16:47:01.862 [main] DEBUG com.arangodb.http.HttpManager - [RES]http-POST: statusCode=200
16:47:01.862 [main] DEBUG com.arangodb.http.HttpManager - [RES]http-POST: text=--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request1
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033214501
Content-Type: application/json; charset=utf-8
Etag: "48033214501"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033214501","_rev":"48033214501","_key":"48033214501"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request2
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033411109
Content-Type: application/json; charset=utf-8
Etag: "48033411109"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033411109","_rev":"48033411109","_key":"48033411109"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request3
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033607717
Content-Type: application/json; charset=utf-8
Etag: "48033607717"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033607717","_rev":"48033607717","_key":"48033607717"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request4
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033804325
Content-Type: application/json; charset=utf-8
Etag: "48033804325"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033804325","_rev":"48033804325","_key":"48033804325"}
--dlmtrMLTPRT--
But even when I create intentional errors, the application never "hangs".
Frank just sent me your source code; I will take a look into it. Can you try to find out where the program is hanging? Is "executeBatch" reached at all?

I already imported 1.6 million documents with your code and everything still works.
I guess it might be necessary to monitor your system resources during the import; if anything unusual occurs, let us know. Generally speaking, it does not seem to be best practice to perform a one-time bulk import like this through the Java API. I would recommend using arangoimp to import the data directly into the database, which will be much faster. It is documented here.
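For a one-time load, an arangoimp invocation would look roughly like this (the file name is a placeholder and the exact options may differ between versions, so check the arangoimp documentation for your release):
$ arangoimp --file accounts.json --type json --collection "AccountCollection" --create-collection true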

You need to increase the number of open file descriptors. The Mac has a very low default limit (256). ArangoDB stores the data in datafiles of a certain chunk size; with large datasets more files are needed (and some file descriptors are already used for communication and other purposes).
When ArangoDB runs out of file descriptors, it can neither extend the dataset nor answer new requests. Therefore the import process will hang.
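As a quick way to see which limit a process is running with, here is a small Python sketch (assuming a Unix-like system; the resource module is not available on Windows, and the limit that actually matters is the one in effect for the ArangoDB server process):
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# the soft limit can be raised up to the hard limit for the current process, e.g.:
# resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))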

Related

Why does ServletResponseUtil.sendFile send a file as page content?

I want to generate a file and have the browser download it to disk as a file.
I send it like this:
ServletResponseUtil.sendFile(httpServletRequest, httpServletResponse, fileName, baos.toByteArray(), ContentTypes.APPLICATION_OCTET_STREAM);
As a result, the file is not downloaded to disk, but rendered on the browser page as
�� ࡱ �; �� V ���� ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������Root Entry �������� �FW� Workbook ������������ !� SummaryInformation( ���� D
Response headers:
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Liferay-Portal: Liferay Community Edition Portal 7.2.0 CE GA1 (Mueller / Build 7200 / June 4, 2019)
Content-Disposition: attachment; filename="test.xls"
Content-Type: application/octet-stream;charset=UTF-8
Content-Length: 46080
Date: Tue, 24 Dec 2019 07:21:40 GMT
That most likely happens because you're running this code within an action or render method of your portlet. Portlets render only fragments of HTML pages, which are always embedded in a proper HTML page. If you interfere with that by obtaining the HttpServletRequest and (more specifically) the HttpServletResponse, you'll just mix the portal's rendered output with your own.
You should use the serveResource method to host your implementation; there you don't even need to fall back to the HttpServletResponse, but can simply stream your output to the ResourceResponse.
Even if you don't end up with mixed HTML/XLS content but see only the actual XLS output, you might interfere with Liferay's SPA loader, which has expectations about return values, e.g. when rendering it expects an HTML portal page, not binary data. That is not the case with the serveResource method, which is expected to return arbitrary MIME types.

Python3 Unable to Decode POST Request Result

I am sending a POST request with sockets and trying to decode the received HTML and print to the terminal.
This works fine in my initial GET request but when I try to decode and print the POST request I just get garbled text.
How can I change my decode so that the text is readable?
The POST request:
body = "hash="+md5
headers = """\
POST / HTTP/1.1\r
Host: url.com:57555\r
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r
Accept-Language: en-US,en;q=0.5\r
Accept-Encoding: gzip, deflate\r
Referer: http://url.com:57555/\r
Content-Type: application/x-www-form-urlencoded\r
Content-Length: 32\r
Connection: close\r
Cookie: PHPSESSID=some_cookie\r
Upgrade-Insecure-Requests: 1\r
\r\n"""
payload = headers + body
s.sendall(payload.encode('utf-8'))
res = s.recv(4096)
print(str(res, errors='replace'))
Result...
python3 emdee5.py
HTTP/1.1 200 OK
Date: Sun, 26 May 2019 22:01:26 GMT
Server: Apache/2.4.18 (Ubuntu)
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 282
Connection: close
Content-Type: text/html; charset=UTF-8
]�1o� ���
ʒ��Ҩ��b�V��LN��؜
p�$����Py��d��FP��l� ^�֞i�ĜmA��F7i�zd}��VͩK8}ߠ���!�n�W>�wL9ۅr�#Ȑ����� 4i��ec{"%��0���)������W���A�I��"��GD�;�܉"J��JA}x��l1��3٠.y�>Om�#5��9
��ڨ�p�j����JN���MQ̀)�:�p�P{K���4J^-��+�7�oV'E;'=�����l�
Your request explicitly says that you are willing to accept a compressed response:
Accept-Encoding: gzip, deflate\r
And this is therefore what you get in the response
Content-Encoding: gzip
So, the body is compressed with gzip (which explains the garbled output) and you would need to decompress it. Given that you currently don't seem able to deal properly with compressed responses, you should not claim in your request that you support them, i.e. remove the Accept-Encoding header.
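If you do want to keep the compressed transfer, here is a minimal sketch of decompressing the body (assuming you have already split the raw response bytes into headers and body at the b"\r\n\r\n" boundary):
import gzip

def decode_body(body):
    # body: raw response body bytes sent with Content-Encoding: gzip
    return gzip.decompress(body).decode('utf-8')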
Apart from that more is likely wrong with your request:
body = "hash="+md5
...
Content-Length: 32\r
...
payload = headers + body
...
Given that md5 is 32 hex characters (or 16 bytes binary), the body consisting of "hash=" + md5 is likely not 32 characters long, as you claim in your Content-Length.
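To avoid hard-coding it, the length is easy to compute from the encoded body (md5 is assumed to be the 32-character hex digest from the question):
body = "hash=" + md5
content_length = len(body.encode('utf-8'))   # 5 + 32 = 37, not 32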
POST / HTTP/1.1\r
Additionally you send an HTTP/1.1 request, so you have to be able to deal with chunked responses, but your code does not handle these.
res = s.recv(4096)
Similarly, your code blindly assumes that the full response can be retrieved within a single recv, which need not be the case.
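A minimal sketch of reading the whole response (this works here only because the request sends Connection: close, so the server closes the connection when it is done):
chunks = []
while True:
    chunk = s.recv(4096)
    if not chunk:              # empty result means the server closed the connection
        break
    chunks.append(chunk)
res = b"".join(chunks)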
In summary: unless you have a deeper understanding of how HTTP works (which you do not seem to have), it is recommended that you use an existing library to handle HTTP for you, since these were written by developers who do understand HTTP.
And even if you already understand HTTP, you will likely use these libraries anyway, since you will know that HTTP is far from trivial and that it makes no sense to reimplement all the necessary details and edge cases yourself when something robust already exists.
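As a rough sketch of the same request done with the requests library (the URL and cookie value are the placeholders from the question, and md5 is the hash computed earlier):
import requests

# requests computes Content-Length, negotiates compression and
# transparently handles gzip-encoded and chunked responses
resp = requests.post(
    "http://url.com:57555/",
    data={"hash": md5},
    cookies={"PHPSESSID": "some_cookie"},
)
print(resp.text)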

Python3 requests.get is too slow

EDIT Adding info:
requests version: 2.21.0
Server info: a Python implementation on Windows which includes 10 instances of threading.Thread, each creating an HTTPServer with a handler based on BaseHTTPRequestHandler. My do_GET looks like this:
def do_GET(self):
rc = 'some response'
self.send_response(200)
self.send_header('Content-type', 'text/html')
self.send_header('Access-Control-Allow-Origin', '*')
self.end_headers()
self.wfile.write(rc.encode('utf-8'))
I'm getting a strange behaviour.
Using the curl command line, the GET command finishes quickly:
curl "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1"
However, using Python's requests.get() takes too much time. I isolated it down to
python -c "import requests; requests.get('http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1')"
I scanned through many other questions here and have tried many things, without success.
Here are some of my findings:
If I add timeout=0.2, the call ends quickly without any error.
However, adding timeout=5 or timeout=(5,5) doesn't make it take longer. It always seems to wait a full second before returning with results.
Working with a session wrapper and cancelling keep-alive didn't improve things. I mean something like this:
with requests.Session() as session:
session.headers.update({'Connection': 'close'})
url = "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=%d&tmr=0" % i
session.get(url, timeout=2)
Enabling full debug, I'm getting the following output:
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:3020
send: b'GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:29 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1" 200 None
url=http://localhost:3020/pbio/powermtr?cmd=read-power-density
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
send: b'GET /pbio/powermtr?cmd=read-power-density HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:30 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/powermtr?cmd=read-power-density HTTP/1.1" 200 None
6.710,i=4
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=4&tmr=0
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
...
From the docs:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
It took me 3 years to find an answer.
I still do not understand why, but at least I can suggest a working solution.
According to these docs, the timeout can be specified as a tuple, like this:
(timeout for connection, timeout for interval without data)
Although I do not understand why requests is waiting for [timeout] before issuing the connection, I can tell it to wait very little for the connection, and specify another timeout for the data.
So what I'm doing now is giving a timeout of, say, (0.01, 4). The connection is then immediate, and if the data stalls for 4 seconds, a timeout exception is raised.
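A minimal sketch of that call, using the URL from the question:
import requests

url = "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1"
# (connect timeout, read timeout): fail fast on the connection,
# but allow up to 4 seconds of silence before a ReadTimeout is raised
r = requests.get(url, timeout=(0.01, 4))
print(r.status_code)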
Some interesting reading can be found here.
Hoping this info will help others!

Calculate the size of response header saved in a variable in JMeter

I have extracted the complete response headers and stored them in a variable; below are the response headers:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Length: 2951
Content-Type: application/json
Date: Tue, 17 Apr 2018 06:34:03 GMT
Expires: 0
Pragma: no-cache
Strict-Transport-Security: max-age=3175000 ; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Vcap-Request-Id: de1csd11-2721-de3a-548e-573sdt3fae22
X-Xss-Protection: 1; mode=block
Connection: close
Now I want to calculate the size of the above parsed data using Beanshell, Groovy or JavaScript.
I used "variable.length()" but it did not help me, as the parsed data contains newline characters.
Can anyone please help me?
Thanks in advance.
Regards,
Hari
For Groovy/Beanshell:
variable.replaceAll("\\n","").size()
For JavaScript:
variable.replace(/(?:\r\n|\r|\n)/g, '').length
Be aware that when it comes to high loads, Groovy's performance will be much better.
References:
JMeter Best Practices
Apache Groovy - Why and How You Should Use It

Is there any way to prepend a string to the beginning of a CouchDB doc ID? I.e. "article_<regular random ID here>"

The title pretty much explains my problem. If this isn't possible, how can I write my own IDs? Specifically, how do I generate an ID string that I know isn't currently taken by CouchDB, so that I can do the string append myself and save it?
You can use an update handler to combine a prefix of your choosing and an automatically generated UUID. An update handler basically intercepts an HTTP request, does some additional transformation and returns the new document (and a message to be sent in response to the user).
In an update handler, the req object is also given the UUID that CouchDB would have assigned to the document. If there is no existing document for the current request, you can create a new one entirely from scratch.
function (doc, req) {
// if no document exists
if (!doc) {
// create a new one
doc = {
// req.id is the generated UUID for this request
_id: "article_" + req.id
};
}
// always return an array 1) the new doc and 2) the string response message
return [ doc, "New Article Created" ];
}
The example here is really trivial, and more complete examples are given in that wiki page that I linked to. I just wanted to show you nothing more than what you needed for your question. Update handlers are a very powerful way to implement some application logic at the database layer.
I prefer Dominic's answer but for completeness, you could also pre-load a bunch of UUIDs from CouchDB and combine them yourself. To get a batch of five:
$ curl -i example.iriscouch.com/_uuids?count=5
HTTP/1.1 200 OK
Server: CouchDB/1.2.0 (Erlang OTP/R15B)
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
ETag: "86908CQ9GCBC01P6VF18YM2P4"
Date: Sun, 12 Feb 2012 02:03:23 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 187
Cache-Control: must-revalidate, no-cache
Connection: keep-alive
{"uuids":["da769daee5996412c8f598155c002885","da769daee5996412c8f598155c0034b4","da769daee5996412c8f598155c00406f","da769daee5996412c8f598155c004c18","da769daee5996412c8f598155c00501a"]}
Now you can compose your own IDs. You have more work ahead of you to write and maintain the code that does all this, but the advantage is that you do not need an update function executing a JavaScript function (and potentially spawning a JavaScript process on the server) for every document update.
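A rough sketch of that approach in Python using the requests library (the database name "mydb" is a placeholder; /_uuids is the standard CouchDB endpoint shown above):
import requests

# fetch a batch of UUIDs from CouchDB and add the prefix ourselves
uuids = requests.get("http://example.iriscouch.com/_uuids",
                     params={"count": 5}).json()["uuids"]
doc_ids = ["article_" + u for u in uuids]

# each document is then saved under its prefixed _id
requests.put("http://example.iriscouch.com/mydb/" + doc_ids[0],
             json={"type": "article"})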
Personally, I would start with Dominic's technique and only move to mine if/when you hit performance problems.
