How to set max request body size in arangodb server? - arangodb

I'm trying to do an 'arangorestore' operation on my local server. When I start it, I see:
ERROR internal error: got error from server: HTTP 413 (Request Entity Too Large)
How do I configure the server so that 'arangorestore' works properly?

ArangoDB raises a 413 error when the server receives a request body bigger than the maximum allowed value of 512 MB. arangorestore has a --batch-size option, and it should cap the batch size at the maximum allowed value automatically; you can use this option explicitly to send smaller batches.
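For illustration, a hedged sketch of restoring with an explicitly lowered batch size (the endpoint, the dump directory name, and the 32 MB value are example assumptions, not required values):
arangorestore --server.endpoint tcp://127.0.0.1:8529 --input-directory dump --batch-size 33554432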

If you're doing line-based imports on a large file, you can easily split it into sub-files by number of lines with the Unix split command:
split -l 1000000 my-huge-import.json import-split
split has other options as well. Then you can loop over the output files, named import-splitaa, import-splitab, etc., and call curl on each one, as in the sketch below.
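A rough sketch of that loop, posting each chunk to ArangoDB's bulk import API (the collection name "mycollection" and the local endpoint are assumptions; add authentication options as needed):
# POST each split file as one batch of line-delimited JSON documents
for f in import-split*; do
  curl -X POST --data-binary @"$f" \
    "http://localhost:8529/_api/import?type=documents&collection=mycollection"
done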

Related

tailLines and sinceTime in logging API, both not working simultaneously

I am using Container Engine, and my pods are hosted there.
I am trying to fetch logs using the log API:
http://localhost:8000/api/v1/namespaces/app-test/pods/designer-0/log?tailLines=100&sinceTime=2017-09-17T10:47:58Z
If I use either of the query params separately, it works and shows the proper result, but if I use them simultaneously, only the last 100 lines are returned and the sinceTime param appears to be ignored.
My scenario is that I need logs from a specific time, in chunks of, say, 100 lines at a time.
I am not sure whether it is a bug or just not implemented.
I found this in the API reference manual:
https://kubernetes.io/docs/api-reference/v1.6/
tailLines - If set, the number of lines from the end of the logs to show. If not specified, logs are shown from the creation of the container or sinceSeconds or sinceTime
So that means if you specify tailLines, it starts from the end. I don't see any option explicitly mentioned other than limitBytes, but you will have to play around with it, as it does not guarantee a number of lines.
tailLines=X tells the server to start that many lines from the end
sinceTime tells the server to start from the specified time
the options are mutually exclusive
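For comparison, a hedged aside: the same two options are exposed by kubectl (assuming kubectl is configured against the same cluster as the question's API endpoint):
# --tail maps to tailLines, --since-time maps to sinceTime
kubectl logs designer-0 -n app-test --tail=100 --since-time=2017-09-17T10:47:58Z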
Thanks all,
I later realized that it is not ignoring sinceTime; the intended functionality of tailLines is to return lines from the end.
So if I specify sinceTime = 10 PM yesterday, it will return the records from that time, and if tailLines is also mentioned, it will return the most recent lines from that chunk.
So it was working as expected. I need to play with limitBytes to get the logs in chunks from that time, instead of the full logs, as sketched below.
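A rough sketch of that idea, reusing the endpoint from the question (the 1 MB chunk size is an arbitrary example value; advance sinceTime between calls to page through the logs):
curl "http://localhost:8000/api/v1/namespaces/app-test/pods/designer-0/log?sinceTime=2017-09-17T10:47:58Z&limitBytes=1048576"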

Logstash custom log that has XML tags inside

I have a custom log file that has plain text as well as XML tags. How do I capture these in separate fields? Here is what it looks like:
1/10/2017 4:16:35 AM :
Error thrown is:
No Error
*************************************************************************
Request sent is:
<InventoryMgmtRequest xmlns="http://www.af.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:MsgHeader MessageType="FIXORD" MsgDate="10.01.2017 04:16:32" SystemOfOrigin="ISCS_DE" CommunityID="SG888" xmlns:ns0="http://www.av.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID></ns0:ReservationID><ns0:CRD></ns0:CRD></ns0:MsgHeader><ns0:MsgBody xmlns:ns0="http://www.ab.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtRequest"><ns0:Product Sku="CH562EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>1</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product><ns0:Product Sku="CH563EE" Qty="1" IsExpress="false" IsTangible="true" Region="EMEA" Country="DE"><ns0:ProdType></ns0:ProdType><ns0:LineItemNum>2</ns0:LineItemNum><ns0:JCID></ns0:JCID></ns0:Product></ns0:MsgBody></InventoryMgmtRequest>
*************************************************************************
Response received is:
<ns0:InventoryMgmtResponse xmlns:ns0="http://www.ad.com/Ecommerce/Worldwide/AvailabilityService/Schemas/InventoryMgmtResponse"><ns0:MsgHeader MsgDate="10.01.2017 04:16:32" MessageType="FIXORD"><ns0:OrderID>SCEO4151547</ns0:OrderID><ns0:ReservationID /><ns0:ReadyToRelease>true</ns0:ReadyToRelease></ns0:MsgHeader><ns0:MsgBody><ns0:Product SKU="CH562EE" LSPSKU="9432GFT" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>7169</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product><ns0:Product SKU="CH563EE" LSPSKU="9432GFU" OutOfStock="false" FulfillmentSite="00ZF" SKUExist="true" Region="EMEA" Country="DE" IsTangible="true"><ns0:EDD>TBA</ns0:EDD><ns0:FutureUsed>false</ns0:FutureUsed><ns0:CurrentQty>2389</ns0:CurrentQty><ns0:FutureQty>-1</ns0:FutureQty></ns0:Product></ns0:MsgBody></ns0:InventoryMgmtResponse>
*************************************************************************
Also, I don't want to capture the line separators (the lines full of **** at the end) in my grok fields.
There is no simple answer here, I'm afraid. Logstash and other log processing tools work line by line; each line is an event. If your events span more than one line you can use the multiline codec, which is pretty powerful, but in my experience you are better off trying to get the logs onto single lines at the source; this makes it so much easier to write a pattern and get the process working reliably.
The issues you have here are many, but if, for example, one of your messages is retransmitted for some reason (sent via TCP) or simply lost (sent via UDP), your pattern will break because part of the message that Logstash is expecting is not there.
The best thing you can do, in my opinion, is to try to change the logging process to save a single line per event. Most logging tools should allow this with the right config options. Ideally, get your application to log in JSON format (assuming you're processing logs to store them in Elasticsearch); this involves the lowest overhead on the Logstash server, since Elasticsearch stores documents as JSON. All you would then need to do is pass each event/log line to the json filter, and the fields are generated from the names given by your application, as in the sketch below.
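As a hedged illustration of that last point (not from the original answer), a minimal filter block that parses application-emitted JSON lines into fields might look like this:
filter {
  json {
    # parse the raw event line (Logstash's default "message" field) into fields
    source => "message"
  }
}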

Processing large JSONs using Logstash - not working and not printing any error

I started using Logstash (on Windows); my main use case will be passing Logstash a large JSON (10 MB), filtering the JSON somehow, and writing it out to Elasticsearch.
For now, I don't really care about the JSON filtering (I will care after I get this to work). I want the file to pass through Logstash and reach my Elasticsearch.
The client that feeds Logstash uses a TCP connection.
My simple Logstash configuration file looks like:
input {
  tcp {
    port => 7788
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    codec => "json"
  }
  stdout {
    codec => rubydebug
  }
}
This does work for me on small JSON inputs like:
{"foo":"bar", "bar": "foo"}
I see Logstash working and passing the data to Elasticsearch, and everything's OK.
Also, when using the default codec ("text") it worked, but not as expected.
My problem starts when the inputs are large JSONs.
Assuming I have a 10 MB JSON, what do I need to do with it so Logstash will be able to handle it over TCP as JSON? Should the file be indented or not? What encoding should I use before I convert it into bytes? What codec/settings should my Logstash have?
BTW, when I use curl and throw the large JSON directly at Elasticsearch, it works, so there are no problems with the JSON itself.
Is there any way I can get some better tracing, or at least know why I fail?
I found out that the problem wasn't the length but the lack of a newline, so all I needed to do was add a newline to the end of my log files.
BTW, there is no 4K length limit, at least not when working with TCP.
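A quick way to verify this from the command line, as a rough sketch (port 7788 comes from the config above; nc flag behaviour varies by platform):
# echo appends the trailing newline that lets the TCP/json input treat the line as a complete event
echo '{"foo":"bar", "bar": "foo"}' | nc localhost 7788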

The XML parser detected error code 302

I am using the XML-INTO op-code to parse a web service request. Every now and then I get errors in the logs
(RNX0351 - "The XML parser detected error code 302").
The help for a 302 is
302 The parser does not support the requested CCSID value or the first character of the XML document was not '<'
To the best of my knowledge, the first character is "<", and the request is generated from a previous web service call, so I would be very surprised if the CCSID has changed.
The error is repeatable for the specific query, so it is almost certainly data related; I am just unsure how I would go about identifying the offending item.
Any thoughts on how to determine the issue, or better yet, how to overcome it?
cheers
CCSID is an AS400/iSeries/Power Systems attribute, and it applies to the whole IFS. It's like a declaration of what is inside the file, or in other words what its internal encoding "should be".
The encoding of the data content and the file's own CCSID (the envelope) are supposed to match, and the box uses this attribute to display and handle the corresponding characters.
It sounds like you receive data in one encoding, but the file's CCSID doesn't match.
Try changing the CCSID on your file (only the envelope), e.g. 37 (American), 500 (Latin-1), 819 (UTF-8), 850 (DOS), 1252 (Windows), and display the file afterwards. You can check first using ls -Sla yourfile in QSH or QP2TERM, or EDTF as well. CHGATTR allows you to change the CCSID, as does setccsid in QSH (again).
This approach helped me find related issues. Remember that although data may be visible on the four hundred, it may not be visible through a shared folder in Windows; that means the file's CCSID and the content encoding don't match.
Hope it helps.
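As a rough sketch of the commands mentioned above (the path is hypothetical), checking and then re-tagging a stream file from QSH or QP2TERM might look like:
# show the file's attributes, including its current CCSID
ls -Sla /home/myuser/request.xml
# re-tag the envelope with a different CCSID (819 as an example), then redisplay the file
setccsid 819 /home/myuser/request.xml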
Hi, I've seen this error with XML data uploaded to AS400/iSeries/IBM i via FTP with CCSID 819 (ISO 8859-1 ASCII), where the file had some binary garbage in the first few positions. Changing the encoding to CCSID 1208 (UTF-8 with IBM PUA) using FTP "quote type c 1208" cleared the problem and XML-INTO was successful.
So my suggestion for XML parser error 302 when using XML-INTO is to look at the file (WRKLNK ...), and if the first character is not "<" but instead some binary garbage, then try CCSID 1208 for UTF-8.
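A hedged sketch of that FTP transfer (host name and file paths are hypothetical); after connecting, the quote command asks the IBM i FTP server to tag the uploaded stream file as CCSID 1208:
ftp myibmi
quote type c 1208
put request.xml /home/myuser/request.xml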
The statements in this answer about what 819 is and which CCSID represents UTF-8 do not agree with the previous answer, but they are correct according to IBM documentation:
https://www-01.ibm.com/software/globalization/ccsid/ccsid819.html
https://www-01.ibm.com/software/globalization/ccsid/ccsid1208.html
I've been working on this problem for a couple of hours; for me the solution was to use the option ccsid=UCS2 when using a data structure or variable to store the XML.
Something like this:
XML-INTO customer %XML( xmlSource : 'ccsid=UCS2');
I have the program running with CCSID 870, and every conversion of the CCSID on the xmlSource field didn't work.
The strange thing is that when I use the file with CCSID 850, everything works fine.
I mention that because this is the first page you find when searching for this problem.
Maybe this helps someone.

Is it possible to read only first N bytes from the HTTP server using Linux command?

Here is the question.
Given the URL http://www.example.com, can we read the first N bytes out of the page?
Using wget, we can download the whole page.
Using curl, there is -r; 0-499 specifies the first 500 bytes. That seems to solve the problem.
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
Using urllib in Python. Similar question here, but according to Konstantin's comment, is that really true?
Last time I tried this technique it failed because it was actually impossible to read only a specified amount of data from the HTTP server, i.e. you implicitly read the whole HTTP response and only then read the first N bytes out of it. So in the end you ended up downloading the whole 1 GB malicious response.
So the problem is: how can we read the first N bytes from the HTTP server in practice?
Regards & Thanks
You can do it natively with the following curl command (no need to download the whole document). According to the curl man page:
RANGES
HTTP 1.1 introduced byte-ranges. Using this, a client can request to get only one or more subparts of a specified document. curl supports this with the -r flag.
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this/
Get the last 500 bytes of a document:
curl -r -500 http://www.get.this/
`curl` also supports simple ranges for FTP files as well.
Then you can only specify start and stop position.
Get the first 100 bytes of a document using FTP:
curl -r 0-99 ftp://www.get.this/README
It works for me even with a Java web app deployed to GigaSpaces.
curl <url> | head -c 499
or
curl <url> | dd bs=1 count=499
should do
Also there are simpler utilities with perhaps broader availability, like
netcat host 80 <<"HERE" | dd count=499 of=output.fragment
GET /urlpath/query?string=more&bloddy=stuff
HERE
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
You will have to get the whole page anyway, so you can get it with curl and pipe it to head, for example.
head
-c, --bytes=[-]N
print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file
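Putting the excerpt above together with curl for the URL from the question, a trivial illustration:
# fetch the page and keep only the first 500 bytes
curl http://www.example.com | head -c 500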
I came here looking for a way to time the server's processing time, which I thought I could measure by telling curl to stop downloading after 1 byte or something.
For me, the better solution turned out to be to do a HEAD request, since this usually lets the server process the request as normal but does not return any response body:
time curl --head <URL>
Make a socket connection. Read the bytes you want. Close, and you're done.
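A minimal sketch of that idea in bash, using the /dev/tcp pseudo-device (assumes a plain HTTP server on port 80; not a robust client):
# open a TCP connection to the host on file descriptor 3
exec 3<>/dev/tcp/www.example.com/80
# send a bare HTTP request
printf 'GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n' >&3
# read only the first 500 bytes of the response, then close the socket
head -c 500 <&3
exec 3<&-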
