Batch Requests into ArangoDB failing - arangodb

I am trying to import many thousands of records into Arango. I am attempting to use the batch/bulk import feature of Arango described at: https://docs.arangodb.com/3.0/HTTP/BatchRequest/index.html to do a combination of PUT and POST requests to either insert new records, or update existing records if they already exist. My end solution needs to run from a Python script, presumably using pyArango. I have created a sample HTTP request
POST http://<arango_server>:8529/_db/myDB/_api/batch
that looks something like the following:
Content-Type: multipart/form-data; boundary=P1X7QNCB
Content-Length: <calculated by python or REST Client>
Authorization: Basic <calculated by python requests session or REST Client>
--P1X7QNCB
Content-type: application/x-arango-batchpart
Content-Id: 1
POST /_api/document/model/foo HTTP/1.1
{"data": "bar"}
--P1X7QNCB
I have not been able to get this to process successfully in Arango. I have tried using python similar to the following (that generates the above request, even if my approximation of the code below has typos):
url = "/_api/document/" + collection + "/" + nodeKey + " HTTP/1.1"
postString = ("--P1X7QNCB\r\n"
"Content-type: application/x-arango-batchpart\r\n"
"Content-Id: " + str(counter) + "\r\n"
"\r\n"
"\r\n"
"PUT " + url+ "\r\n\r\n\r\n" + json.dumps(nodeData) + "\r\n")
batchHeaders = {"Content-Type": "multipart/form-data; boundary=P1X7QNCB"}
response = self.db.connection.session.post(self.db.URL + "/batch", data=postString, headers=batchHeaders)
and using a REST client where I manually post the content. In both cases I get the following response back:
{"error":true,"errorMessage":"invalid multipart message received","code":400,"errorNum":400}
And the following is logged in the arango log file:
WARNING received a corrupted multipart message
Is it obvious to anyone what I am doing wrong, or where I can look for more details on why ArangoDB is rejecting the requests?
Thanks!

ArangoDB throws this error when it tries to extract the next part of a multipart MIME container and fails.
You should inspect your boundary strings and check that the last one properly terminates the container with two trailing dashes (--).
ngrep or Wireshark tend to be very useful for inspecting what programs really send, which is sometimes not what you think. They also let you capture samples from other programs that do it correctly.
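As a rough illustration of the point about the closing boundary, here is a minimal sketch of how such a body could be assembled in Python. The helper name build_batch_body and the document paths are made up for the example, and the part layout (part headers, blank line, embedded request line, blank line, JSON) is my reading of the batch request documentation linked in the question, not something tested against a server. Note that the snippet in the question never emits the final --P1X7QNCB-- line.

```python
import json


def build_batch_body(operations, boundary="P1X7QNCB"):
    """Assemble a multipart body for POST /_api/batch (hypothetical helper).

    operations: iterable of (method, path, document) tuples;
    document may be None for requests without a body.
    """
    crlf = "\r\n"
    pieces = []
    for content_id, (method, path, document) in enumerate(operations, start=1):
        payload = "" if document is None else json.dumps(document)
        pieces.append(
            f"--{boundary}{crlf}"
            f"Content-Type: application/x-arango-batchpart{crlf}"
            f"Content-Id: {content_id}{crlf}"
            f"{crlf}"  # blank line ends the part headers
            f"{method} {path} HTTP/1.1{crlf}"
            f"{crlf}"  # blank line ends the embedded request's headers
            f"{payload}{crlf}"
        )
    # The container must be closed with the boundary plus two trailing dashes.
    pieces.append(f"--{boundary}--{crlf}")
    return "".join(pieces)


# Example paths and documents are assumptions, loosely based on the question.
body = build_batch_body([
    ("POST", "/_api/document/model", {"data": "bar"}),
    ("PUT", "/_api/document/model/foo", {"data": "baz"}),
])
print(body)
```

The resulting string would be sent with the same Content-Type: multipart/form-data; boundary=P1X7QNCB header used in the question.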

Related

Malformed XML response with manual newline characters

I am using a GET request to fetch what I expect to be an XML document from an endpoint. The response has the following structure:
' <itunes:explicit>clean</itunes:explicit>\n' +
' <itunes:episode>11</itunes:episode>\n' +
' <itunes:episodeType>full</itunes:episodeType>\n' +
(This is from a console log in a Node.js function).
I haven't encountered a response like this before and am having trouble doing anything useful with it. I've tried:
Changing the response type and encoding of my GET function
Parsing the response with an XML parser - this throws an error
Removing the newline and + characters manually with regex (I'd like to avoid this, but it doesn't seem to work anyway)
It's worth saying that the response looks as you'd expect in a browser window:
Am I missing something fundamental about how this data is encoded / structured and what is the best way to turn it into something I can work with?
Rookie error. In case anyone else stumbles across this: I expected the response from my Axios GET request to be the XML itself. The XML was actually in the data property of the response:
const response = await axios.get(url);
const myXML = response.data;

Python Client Rest API Invocation - Invalid character found in method name [{}POST]. HTTP method names must be tokens

Client
Python Version - 3.9,
Python Requests module version - 2.25
Server
Java 13,
Tomcat 9.
I have a Tomcat+Java based server exposing REST APIs. I am writing a client in Python to consume those APIs. Everything is fine until I send an empty body in a POST request, which is a valid use case for us. If I send an empty body I get a 400 Bad Request error - Invalid character found in method name [{}POST]. HTTP method names must be tokens. If I send the empty request from Postman, Java, or curl it works fine; the problem occurs only when I use Python as the client.
Following is python snippet -
json_object={}
header = {'alias': 'A', 'Content-Type' : 'application/json', 'Content-Length' : '0'}
resp = requests.post(url, auth=(username, password), headers=header, json=json_object)
I also tried the data parameter instead of json to send the payload, without much success.
I captured Wireshark dumps to understand it further and found that the request Tomcat received does not conform to RFC 2616 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html), especially this part -
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
In the Wireshark dumps the request line instead looked like - {}POST MY-APP-URI HTTP/1.1
As we can see, the empty body ends up in front of the HTTP method, hence Tomcat reports an error.
I then looked at python http library code -client.py. Following are relevant details -
File - client.py
Method - _send_output (starting at line #1001) - It first sends the headers at line #1010 and then the body further down in the code. I thought (I could be wrong here) that because the headers (310 bytes) are much longer than the body (2 bytes), the body gets pushed before the complete headers are on the wire, so the TCP frames are ordered in such a way that the body appears first. To corroborate this I added a delay of 1 second just after sending the headers (line #1011) and bingo, the error disappeared and it started working fine. I am not sure this analysis is correct; can someone in the know confirm it or tell me how to fix this?

Adding message to gmail error "Payload parts count different from expected"

I am adding a message to a gmail folder using this (example) URL:
https://www.googleapis.com/gmail/v1/users/user@domain.com/messages/import?uploadType=multipart
The body of the request looks like this:
--test_abc123
Content-Type: application/json; charset=UTF-8
{
"labelIds": [ "Label_525" ],
"raw": "RnJvbTogIlNlY3RpZ28gQ2VydGlmaWNh..."
}
--test_abc123--
The raw data is a base64 encoded standard MIME message that looks normal to me. The result of this POST is http error 400 with the error response "Payload parts count different from expected 2. Request payload parts count: 1".
I can supply the original MIME text if that is helpful, but let me emphasize that I have been running this code for several years without problem. I've tried different messages to test this out, but it appears that Google has changed something to break my software.
Is Google objecting to my raw data, or something about the MIME encoding? Any ideas what the problem could be?
---- Addendum ----
I have gotten a few messages to work, they seem to all have image or data attachments. However I really don't see any problem with the messages that are failing - I can import them into Office 365 or Thunderbird or anything else and they render just fine. As a test, I tried importing the message below, which was taken from the MIME RFC. It fails with the same error. I think that Google has changed something to make their MIME parser very fussy, but I don't see how I can fix my input data.
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"
This is the preamble. It is to be ignored, though it
is a handy place for mail composers to include an
explanatory note to non-MIME compliant readers.
--simple boundary
This is implicitly typed plain ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii
This is explicitly typed plain ASCII text.
It DOES end with a linebreak.
--simple boundary--
This is the epilogue. It is also to be ignored.
Addendum 2: I tried a simple upload (using the content-type header message/rfc822) and it worked, except the message was unlabeled. How would I specify what label I want applied to a message? I was originally trying to follow the documentation (link), which tells me to create the JSON body that I gave above. This allows me to specify the label. But I cannot seem to use this body in a simple upload. The content type is either invalid, or what Gmail imports is just literally the JSON body; it does not parse out the raw data. If you could point me to a specific example showing the URI, message body, and HTTP headers (not Java code), that would be very useful to me.
OK never mind, I got it working by adding an empty message/rfc822 part to the body of the multipart upload. That satisfies Google, and the empty part is ignored in favor of the raw data.
You are doing a multipart upload, see here:
The body of the request is formatted as a multipart/related content
type [RFC2387] and contains exactly two parts. The parts are
identified by a boundary string, and the final boundary string is
followed by two hyphens.
This is why it works only for your messages with images or attachments, since your message
--test_abc123
is only one part.
In the past there was no check whether this condition was fulfilled, so you may have gotten away with using multipart for one-part messages.
That is no longer possible, so if you have a single-part message, you should use simple upload.
If you do not know in advance how many parts your message has, you can always try the multipart upload first inside a try...catch statement and fall back to a simple upload request in the catch branch if it fails.
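For anyone who wants to see what a body with exactly two parts looks like, here is a minimal sketch in Python. It mirrors the workaround from the question (the JSON part plus a second, empty message/rfc822 part) and the rule quoted above that the final boundary is followed by two hyphens. The helper name build_two_part_body and the placeholder raw value are made up for the example, and I have not verified the result against Gmail.

```python
import json


def build_two_part_body(metadata, media=b"", boundary="test_abc123"):
    """Return (content_type_header, body) for a two-part multipart/related
    upload (hypothetical helper, not part of any Google client library)."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    body = crlf.join([
        delimiter,
        b"Content-Type: application/json; charset=UTF-8",
        b"",  # blank line between part headers and part content
        json.dumps(metadata).encode("utf-8"),
        delimiter,
        b"Content-Type: message/rfc822",
        b"",
        media,  # empty by default, as in the workaround from the question
        delimiter + b"--",  # final boundary followed by two hyphens
        b"",
    ])
    return f'multipart/related; boundary="{boundary}"', body


# "<base64 message>" is a placeholder, not real data.
content_type, body = build_two_part_body(
    {"labelIds": ["Label_525"], "raw": "<base64 message>"}
)
print(content_type)
print(body.decode("utf-8"))
```

The returned content_type would go into the Content-Type header of the POST, and body into the request body.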

Python requests: chunked post request

I am trying to send a POST request through the requests module with headers["Transfer-encoding"] = "chunked", but I am getting back:
<BODY><h2>Bad Request - Invalid Content Length</h2><hr><p>HTTP Error 400. There is an invalid content length or chunk length in the request.</p>
I am sending a json string. headers["Content-Type"] = "application/json" is also given.
Does anybody know if I am missing some setting? Maybe I should set the chunk-size somewhere?
Analysing the headers of the request attached to the response I actually get a content-length header different from zero.
I also tried to create a custom generator from the JSON string and pass it to the post method as data=, but it seems to simply hang there (even beyond the given timeout=).
Your error says you didn't create the request properly (it's a 4xx error, not a 5xx, which would indicate a server issue).
Transfer-Encoding: chunked is for sending data in chunks: the body of your message consists of an unspecified number of chunks that you send as, let's say, a stream. I would suggest reading this.
Each chunk should have its size (in hexadecimal) in front of the data. For instance:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
9\r\n
Some data\r\n
6\r\n
Python\r\n
If you want to send chunked requests with the Python requests module, you probably need a generator for that. Please see this. With so little information I can't help you more.
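To make the framing concrete, here is a small standard-library sketch of what a chunked body looks like on the wire. The function name encode_chunked is made up for the example; it prefixes each chunk with its size in hexadecimal and ends with the zero-length chunk that terminates a chunked body.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as a chunked message body
    (illustrative helper, not part of requests or http.client)."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body early
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n"  # size in hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)


framed = encode_chunked([b"Some data", b"Python"])
print(framed)
```

You normally don't build this by hand: the requests documentation describes passing a generator (an iterable without a known length) as data=, in which case the library does the chunk framing itself.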

how to properly call a REST-API with an * in the URL

i searched the internet (and stackoverflow :D) to find an answer for the following question - and found none that i understood.
background:
we want to use a python script to connect our company's CMDB with our AWX/Ansible infrastructure.
the CMDB has a REST API which supports a (halfway) proper export.
i'm currently stuck with the implementation of the correct API call.
i can call the API itself and authenticate, but i can't call the proper filter to get the results i need.
the filter is realized by having the following string within the URL (more in the attached code example)
Label LIKE "host*"
it seems that python has a problem with the *.
error message:
InvalidURL(f"URL can't contain control characters. {url!r} "
I found some bug reports that there is an issue within some python versions, but i'm way too new to properly understand if this affects me here :D
used python version 3.7.4
PS: let's see if i can get the markup right :D
i switched the called URL to determine where exactly the problem occurs.
it only occurs when i use the SQL like filter part.
this part is essential since i just want our "hosts" to be returned and not the whole CMDB itself.
#import the required classes and such
from http.client import HTTPConnection
import json
#create a HTTP connection client
client = HTTPConnection("cmdb.example.company")
#basic auth and some header details
headers = {'Content-Type': 'application/json',
'Authorization' : 'Basic my-auth-token'}
#working API call
client.request('GET', '/cmdb/rest/hosts?attributes=Label,Keywords,Tag,Description&limit=10', headers=headers)
#broken API call returns - InvalidURL(f"URL can't contain control characters. {url!r} "
client.request('GET', '/cmdb/rest/hosts?filter=Label LIKE "host*"&attributes=Label,Keywords,Tag,Description&limit=10', headers=headers)
#check and convert the response into a readable (JSON) format
response = client.getresponse()
data = response.read()
#debugging print - show that the returned data is bytes?!
print(data)
#convert the returned data into json
my_json = data.decode('utf8').replace("'", '"')
data = json.loads(my_json)
#only return the data part from the JSON and ignore the meta-overhead
text = json.dumps(data["data"], sort_keys=True, indent=4)
print(text)
so, i want to know how to properly call the API with the described filter and resolve the displayed error.
can you give me an example i can try or pin-point a beginners mistake i made?
am i affected by the mentioned python bug regarding the URL call with * in it?
thanks for helping me out :)
soooo i found my beginners mistake myself:
i used the URL from my browser - and my browser automatically encodes the special characters within the URL.
i found the following piece of code within Python3 URL encoding guide and modified the string to fit my needs :)
import urllib.parse
query = ' "host*"'
urllib.parse.quote(query)
'%20%22host%2A%22'
Result: '%20%22host%2A%22'
%20 = " "
%22 = '"'
%2A = "*"
so the final code looks somewhat like this:
#broken API call returns - InvalidURL(f"URL can't contain control characters. {url!r} "
client.request('GET', '/cmdb/rest/hosts?filter=Label LIKE "host*"&attributes=Label,Keywords,Tag,Description&limit=10', headers=headers)
filter=Label LIKE "host*"
#fixed API call
client.request('GET', '/cmdb/rest/hosts?filter=Label%20LIKE%20%22host%2A%22&attributes=Label,Keywords,Tag,Description&limit=10', headers=headers)
filter=Label%20LIKE%20%22host%2A%22
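Instead of encoding the filter by hand, the query string can also be built with urllib.parse.urlencode. This is a minimal sketch: the helper name build_path is made up, and the parameter values are the ones from the question. Passing quote_via=quote encodes spaces as %20 (the default quote_plus would use +), and safe="," keeps the commas in the attribute list readable. As far as I can tell, the characters http.client rejects in the original URL are the spaces rather than the *.

```python
from urllib.parse import quote, urlencode


def build_path(base, params):
    """Return base plus a percent-encoded query string (illustrative helper)."""
    query = urlencode(params, quote_via=quote, safe=",")
    return f"{base}?{query}"


path = build_path("/cmdb/rest/hosts", {
    "filter": 'Label LIKE "host*"',
    "attributes": "Label,Keywords,Tag,Description",
    "limit": 10,
})
print(path)
```

The resulting path can be passed to client.request('GET', path, headers=headers) in place of the hand-encoded string.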
