I am sending a POST request with sockets and trying to decode the received HTML and print to the terminal.
This works fine in my initial GET request, but when I try to decode and print the POST response I just get garbled text.
How can I change my decode so that the text is readable?
POST
body = "hash="+md5
headers = """\
POST / HTTP/1.1\r
Host: url.com:57555\r
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r
Accept-Language: en-US,en;q=0.5\r
Accept-Encoding: gzip, deflate\r
Referer: http://url.com:57555/\r
Content-Type: application/x-www-form-urlencoded\r
Content-Length: 32\r
Connection: close\r
Cookie: PHPSESSID=some_cookie\r
Upgrade-Insecure-Requests: 1\r
\r\n"""
payload = headers + body
s.sendall(payload.encode('utf-8'))
res = s.recv(4096)
print(str(res, errors='replace'))
Result...
python3 emdee5.py
HTTP/1.1 200 OK
Date: Sun, 26 May 2019 22:01:26 GMT
Server: Apache/2.4.18 (Ubuntu)
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 282
Connection: close
Content-Type: text/html; charset=UTF-8
]�1o� ���
ʒ��Ҩ��b�V��LN��
p�$����Py��d��FP��l� ^�֞i�ĜmA��F7i�zd}��VͩK8}ߠ���!�n�W>�wL9ۅr�#Ȑ����� 4i��ec{"%��0���)������W���A�I��"��GD�;�܉"J��JA}x��l1��3٠.y�>Om�#5��9
��ڨ�p�j����JN���MQ̀)�:�p�P{K���4J^-��+�7�oV'E;'=�����l�
Your request explicitly says that you are willing to accept a compressed response:
Accept-Encoding: gzip, deflate\r
And this is therefore what you get in the response:
Content-Encoding: gzip
So the body is compressed with gzip (which explains the garbled output) and you would need to decompress it. Given that your code currently cannot properly deal with compressed responses, you should not claim in your request that you support them, i.e. remove the Accept-Encoding header.
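If you do want to keep advertising gzip support, you have to decompress the body yourself. A minimal sketch with Python's gzip module, assuming the complete response bytes are already in res (see below about partial reads):

import gzip

# Split the raw response into the header block and the body at the
# first blank line.
header_blob, _, body = res.partition(b"\r\n\r\n")

# Only decompress when the server actually gzip-encoded the body.
if b"Content-Encoding: gzip" in header_blob:
    body = gzip.decompress(body)

print(body.decode("utf-8", errors="replace"))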
Apart from that, more is likely wrong with your request:
body = "hash="+md5
...
Content-Length: 32\r
...
payload = headers + body
...
Given that an md5 digest is 32 hex characters (or 16 bytes binary), the body consisting of "hash=" + md5 is likely not 32 characters long as you claim in your Content-Length.
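Rather than hard-coding the length, compute it from the actual body. A sketch with a trimmed-down header set (same hypothetical host as in the question):

body = "hash=" + md5  # "hash=" (5 chars) + 32 hex chars = 37 bytes
headers = (
    "POST / HTTP/1.1\r\n"
    "Host: url.com:57555\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body.encode('utf-8'))}\r\n"
    "Connection: close\r\n"
    "\r\n"
)
payload = headers + body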
POST / HTTP/1.1\r
Additionally, you send an HTTP/1.1 request, so you have to be able to deal with chunked responses - but your code does not.
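For illustration, a minimal decoder for a chunked body (a sketch only; a real implementation also has to handle trailers and data that arrives incrementally):

def dechunk(body: bytes) -> bytes:
    # Each chunk is "<hex size>\r\n<data>\r\n"; a zero-size chunk ends the body.
    out = b""
    while body:
        size_line, _, rest = body.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            break
        out += rest[:size]
        body = rest[size + 2:]  # skip the chunk data and its trailing \r\n
    return out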
res = s.recv(4096)
Similarly, your code blindly assumes that the full response can be retrieved with a single recv, which need not be the case.
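Since the request sends Connection: close, a simple fix is to read until the server closes the connection, for example:

chunks = []
while True:
    data = s.recv(4096)
    if not data:  # empty result means the server closed the connection
        break
    chunks.append(data)
res = b"".join(chunks)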
In summary: unless you have a deeper understanding of how HTTP works (which you do not seem to have), it is recommended that you use existing libraries to handle HTTP for you, since these were written by developers who understand HTTP.
And even if you already have an understanding of HTTP, you will likely use these libraries anyway, since you will know that HTTP is far from trivial and that it makes no sense to reimplement all the necessary details and edge cases yourself if something robust already exists.
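With the requests library, for example, the whole exchange above shrinks to a few lines, and compression, chunked transfer, and partial reads are all handled for you (same hypothetical host and cookie as in the question):

import requests

# md5 is the 32-character hex digest from the question.
resp = requests.post(
    "http://url.com:57555/",
    data={"hash": md5},                   # Content-Length is computed for you
    cookies={"PHPSESSID": "some_cookie"},
)
print(resp.text)                          # already decompressed and decoded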
Related
I want to generate a file and have the browser download it to disk as a file.
I send it like this:
ServletResponseUtil.sendFile(httpServletRequest, httpServletResponse, fileName, baos.toByteArray(), ContentTypes.APPLICATION_OCTET_STREAM);
As a result, the file is not downloaded to disk as a file, but is dumped into the browser page as
�� ࡱ �; �� V ���� ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������Root Entry �������� �FW� Workbook ������������ !� SummaryInformation( ���� D
Response headers:
HTTP/1.1 200
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Liferay-Portal: Liferay Community Edition Portal 7.2.0 CE GA1 (Mueller / Build 7200 / June 4, 2019)
Content-Disposition: attachment; filename="test.xls"
Content-Type: application/octet-stream;charset=UTF-8
Content-Length: 46080
Date: Tue, 24 Dec 2019 07:21:40 GMT
That most likely happens because you're running this code within an action or render method of your portlet. Portlets render only fragments of HTML pages, and those fragments are always embedded in a complete HTML page. If you interfere with that by obtaining the HttpServletRequest and (more specifically) the HttpServletResponse, you'll just mix the portal's rendered output with your own.
You should use the serveResource method to host your implementation; there you don't even need to drop down to the HttpServletResponse, but can just stream your output to the ResourceResponse.
Even if you don't end up with mixed HTML/XLS content but see only the actual XLS output, you might interfere with Liferay's SPA loader, which has expectations about return values: when rendering, it expects an HTML portal page, not binary data. That is not an issue with the serveResource method, as it is expected to return arbitrary MIME types.
I've discovered a domain name (web site and API) which adds a header like this to each HTTP response:
    XTVOpalL: Gtm; path=/; Max-Age=900
The header name looks random. Here are a few other examples:
    XRQOJalT: LtZ; path=/; Max-Age=900
    XYjOzalA: Ntx; path=/; Max-Age=900
    XykOMalm: ytD; path=/; Max-Age=900
Note the leading 4 spaces. And compare to other response headers:
HTTP/1.1 301 Moved Permanently
Date: Sat, 05 May 2018 11:52:25 GMT
Server: Apache
Location: http://example.com/wp/
Content-Length: 229
Content-Type: text/html; charset=iso-8859-1
Set-Cookie: visid_incap_993094=GuEL85vzTDKQUJ9jfphhgvma7VoAAAAAQUIPAAAAAACgWz3NlkG3smvkXeB6Ewyl; expires=Sun, 05 May 2019 08:21:45 GMT; path=/; Domain=.example.com
Set-Cookie: nlbi_993094=z0NWEcMl0wAVBr8CiwzebQAAAACu2KRRlrUCoWpyWKTrUAJF; path=/; Domain=.example.com
Set-Cookie: incap_ses_115_993094=/xoUXc5Kags3fAFBHpCYAfma7VoAAAAABT/i1XAh1J4D/02wGnXO9w==; path=/; Domain=.example.com
Set-Cookie: ___utmvmicuVtwf=peInjtBXhca; path=/; Max-Age=900
Set-Cookie: ___utmvaicuVtwf=wYxmyOU; path=/; Max-Age=900
Set-Cookie: ___utmvbicuVtwf=TZr
    XYjOzalA: Ntx; path=/; Max-Age=900
X-Iinfo: 13-63374213-63374214 NNNN CT(222 -1 0) RT(1525521145044 0) q(0 0 2 0) r(5 5) U11
X-CDN: Incapsula
The main problem: this header is sometimes the first header in the response, which, in turn, is considered a vulnerability.
In my case it looks like this:
HTTP/1.1 301 Moved Permanently
    XYjOzalA: Ntx; path=/; Max-Age=900
Date: Sat, 05 May 2018 11:52:25 GMT
Server: Apache
Location: http://example.com/wp/
...
Quoting the HTTP/1.1 RFC, https://www.rfc-editor.org/rfc/rfc7230#section-3:
A sender MUST NOT send whitespace between the start-line and the first header field.
...
The presence of such whitespace in a request might be an attempt to
trick a server into ignoring that field or processing the line after
it as a new request, either of which might result in a security
vulnerability if other implementations within the request chain
interpret the same message differently. Likewise, the presence of
such whitespace in a response might be ignored by some clients or
cause others to cease parsing.
This results in node.js throwing an error when trying to parse these HTTP responses. The error code is HPE_INVALID_HEADER_TOKEN, which is thrown only if HTTP headers are malformed.
Question: What is it? Who's doing it? Why?
"What is it?"
This is a bug on the server side, as it violates the HTTP protocol.
It was actually discussed in the HTTP working group in 2013, as "a bug in a python library", and I think the conclusion by Julian Reschke is correct:
It's not a legal field name, thus not a legal start of a header field line.
...
It's forbidden by the grammar, so it's invalid.
"Who's doing it? Why?"
When the developer generates the random HTTP header name, they introduce the four leading whitespace characters by accident.
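As a purely hypothetical illustration of how such a bug could slip in (none of this is from the actual server): a printf-style width specifier silently left-pads short names with spaces:

import random
import string

def random_header_name() -> str:
    # Builds names like "XTVOpalL": an "X" plus 7 random letters.
    name = "X" + "".join(random.choices(string.ascii_letters, k=7))
    # BUG: the width specifier pads the 8-character name to 12 characters,
    # producing an illegal header name like "    XTVOpalL".
    return "%12s" % name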
EDIT Adding info:
requests version: 2.21.0
Server info: a Python implementation on Windows that starts 10 threading.Thread instances, each creating an HTTPServer with a handler based on BaseHTTPRequestHandler. My do_GET looks like this:
def do_GET(self):
    rc = 'some response'
    self.send_response(200)
    self.send_header('Content-type', 'text/html')
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(rc.encode('utf-8'))
I'm getting a strange behaviour.
Using the curl command line, the GET command is finished quickly:
curl "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1"
However, using Python's requests.get() takes much more time. I isolated it down to
python -c "import requests; requests.get('http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1')"
I scanned through many other questions here and have tried many things, without success.
Here are some of my findings:
If I add timeout=0.2, the call ends quickly without any error.
However, adding timeout=5 or timeout=(5, 5) doesn't make it take longer. It always seems to wait a full second before returning the results.
Working with a session wrapper and disabling keep-alive didn't improve things either. I mean this:
with requests.Session() as session:
    session.headers.update({'Connection': 'close'})
    url = "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=%d&tmr=0" % i
    session.get(url, timeout=2)
Enabling full debug, I'm getting the following output:
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:3020
send: b'GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:29 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1" 200 None
url=http://localhost:3020/pbio/powermtr?cmd=read-power-density
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
send: b'GET /pbio/powermtr?cmd=read-power-density HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:30 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/powermtr?cmd=read-power-density HTTP/1.1" 200 None
6.710,i=4
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=4&tmr=0
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
...
From the docs:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
It took me 3 years to find an answer.
I still do not understand why, but at least I can suggest a working solution.
According to these docs, the timeout can be specified as a tuple, like this:
(timeout for connection, timeout for interval without data)
Although I do not understand why requests waits for the full timeout before making the connection, I can tell it to wait very little for the connection and specify a separate timeout for the data.
So what I do now is pass a timeout of, say, (0.01, 4). The connection is then immediate, and if no data arrives for 4 seconds, a timeout exception is raised.
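Concretely (using the same local URL as above):

import requests

# First element: connect timeout (tiny, since the server is local).
# Second element: read timeout (max seconds to wait between received bytes).
resp = requests.get(
    "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1",
    timeout=(0.01, 4),
)
print(resp.status_code)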
Some interesting reading can be found here.
Hoping this info will help others!
I have extracted the complete response header and stored it in a variable; below is the response header:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Length: 2951
Content-Type: application/json
Date: Tue, 17 Apr 2018 06:34:03 GMT
Expires: 0
Pragma: no-cache
Strict-Transport-Security: max-age=3175000 ; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Vcap-Request-Id: de1csd11-2721-de3a-548e-573sdt3fae22
X-Xss-Protection: 1; mode=block
Connection: close
Now I want to calculate the size of the above parsed data using Beanshell, Groovy, or JavaScript.
I used variable.length(), but it did not help me, as the parsed data contains newline characters.
Can anyone please help me?
For Groovy/Beanshell (stripping both \r and \n, since HTTP header lines end with \r\n):
variable.replaceAll("[\\r\\n]", "").size()
For JavaScript:
variable.replace(/(?:\r\n|\r|\n)/g, '').length
Be aware that when it comes to high loads, Groovy's performance will be much better.
References:
JMeter Best Practices
Apache Groovy - Why and How You Should Use It
I am evaluating ArangoDB using Spring Batch.
I tried to insert some data and, without batch mode, it works as expected.
However, if batch mode is on, the execution of the program hangs.
I am using ArangoDB 2.3.3 and com.arangodb:arangodb-java-driver:[2.2-SNAPSHOT,2.2].
arangoDriver.startBatchMode();
for (Account acc : items) {
    acc.getRecordHash();
    acc.getIdHash();
    arangoDriver.createDocument("AccountCollection", acc);
}
arangoDriver.executeBatch();
Any ideas what I am doing wrong?
I tried to reproduce what you are doing. First of all, does the collection "AccountCollection" exist? If not, you would get an error in the batch result, but the program still should not hang. I created a unit test:
@Test
public void test_StartCancelExecuteBatchMode() throws ArangoException {
    driver.startBatchMode();
    ArrayList<Account> items = new ArrayList<Account>();
    items.add(new Account());
    items.add(new Account());
    items.add(new Account());
    items.add(new Account());
    for (Account acc : items) {
        acc.getRecordHash();
        acc.getIdHash();
        driver.createDocument("AccountCollection", acc, true, false);
    }
    driver.executeBatch();
}
This works perfectly and returns:
EOB
16:47:01.862 [main] DEBUG com.arangodb.http.HttpManager - [RES]http-POST: statusCode=200
16:47:01.862 [main] DEBUG com.arangodb.http.HttpManager - [RES]http-POST: text=--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request1
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033214501
Content-Type: application/json; charset=utf-8
Etag: "48033214501"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033214501","_rev":"48033214501","_key":"48033214501"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request2
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033411109
Content-Type: application/json; charset=utf-8
Etag: "48033411109"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033411109","_rev":"48033411109","_key":"48033411109"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request3
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033607717
Content-Type: application/json; charset=utf-8
Etag: "48033607717"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033607717","_rev":"48033607717","_key":"48033607717"}
--dlmtrMLTPRT
Content-Type: application/x-arango-batchpart
Content-Id: request4
HTTP/1.1 202 Accepted
Location: /_db/unitTestDatabase/_api/document/AccountCollection/48033804325
Content-Type: application/json; charset=utf-8
Etag: "48033804325"
Content-Length: 95
{"error":false,"_id":"AccountCollection/48033804325","_rev":"48033804325","_key":"48033804325"}
--dlmtrMLTPRT--
But even when I create intentional errors, the application never "hangs".
Frank just sent me your source code; I'll take a look at it. Can you try to find out where the program is hanging? Is "executeBatch" reached at all?
I already imported 1.6 million documents with your code and everything still works.
I guess it might be necessary to monitor your system resources during the import; if anything unusual occurs, let us know. Generally speaking, it does not seem to be best practice to perform a one-time bulk import like this using the Java API. I would recommend using arangoimp to import the data directly into the database; this will be much faster. It is documented here.
You need to increase the number of open file descriptors. The Mac has a very low limit (256). ArangoDB stores the data in datafiles of a certain chunk size. With large datasets more files are needed (and some file descriptors are already used for communication and other things).
When ArangoDB runs out of file descriptors, it can neither extend the dataset nor answer new requests. Therefore the import process will hang.