I cannot seem to make the script print out JUST the content viewed by the page
I would like this to be using sockets module. No other libraries like requests or urllib
I cannot really try much. So I instantly committed a sin and came here first ^^'
My code:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
print(r)
sock.close()
I want the printed result to be:
test
test1
test2
test3
but what I get is
HTTP/1.1 200 OK
Date: Tue, 09 Apr 2019 14:20:45 GMT
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=xxx; expires=Wed, 08-Apr-20 14:20:45 GMT; path=/; domain=.pastebin.com; HttpOnly
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
CF-Cache-Status: MISS
Server: cloudflare
CF-RAY: 4c4d1f9f685ece41-LHR
19
test
test1
test2
test3
Just extract out the content after \r\r\n\n by using string.split and print it
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
#Extract the content after splitting the string on \r\n\r\n
content_list = r.split('\r\n\r\n')[1].split('\r\n')
content = '\r\n'.join(content_list)
print(content)
#19
#test
#test1
#test2
#test3
sock.close()
You are doing a HTTP/1.1 request and therefore the web server might reply with a response body in chunked transfer encoding. In this mode each chunk is prefixed by the size in hexadecimal. You either need to implement this mode or you could simply do a HTTP/1.0 request in which case the server will not use chunked transfer encoding since this was only introduced with HTTP/1.1.
Anyway, if you don't want to use any existing libraries but do your own HTTP it is expected that you actually understand HTTP. Understanding means that you have read the relevant standards, because that's what standards are for. For HTTP/1.1 this is originally RFC 2616 which was later slightly reworked into RFC 7230-7235. Once you started reading these standards you likely appreciate that there are existing libraries which deal with these protocols, since these are far from trivial.
Related
I'm trying to upload a large file (over 10Gb) to Azure Blob Storage using SAS tokens.
I generate the tokens like this
val storageConnectionString = s"DefaultEndpointsProtocol=https;AccountName=${accountName};AccountKey=${accountKey}"
val storageAccount = CloudStorageAccount.parse(storageConnectionString)
val client = storageAccount.createCloudBlobClient()
val container = client.getContainerReference(CONTAINER_NAME)
val blockBlob = container.getBlockBlobReference(path)
val policy = new SharedAccessAccountPolicy()
policy.setPermissionsFromString("racwdlup")
val date = new Date().getTime();
val expiryDate = new Date(date + 8640000).getTime()
policy.setSharedAccessStartTime(new Date(date))
policy.setSharedAccessExpiryTime(new Date(expiryDate))
policy.setResourceTypeFromString("sco")
policy.setServiceFromString("bfqt")
val token = storageAccount.generateSharedAccessSignature(policy)
Then I tried the Put Blob API and hit the following error
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: BlockBlob' -F file=#10gb.csv https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=.... -v
< HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit.
< Content-Length: 290
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: f08a1473-301e-006a-4423-837a27000000
< x-ms-version: 2019-02-02
< x-ms-error-code: RequestBodyTooLarge
< Date: Sat, 05 Sep 2020 01:24:35 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>RequestBodyTooLarge</Code><Message>The request body is too large and exceeds the maximum permissible limit.
RequestId:f08a1473-301e-006a-4423-837a27000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:24:35.7712576Z</Message><MaxLimit>268435456</MaxLimit></Error>%
After that tried uploading it using PageBlob (I saw in the documentation something like size can be up to 8 TiB)
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: PageBlob' -H 'x-ms-blob-content-length: 1099511627776' -F file=#10gb.csv https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=... -v
< HTTP/1.1 400 The value for one of the HTTP headers is not in the correct format.
< Content-Length: 331
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: b00d5c32-101e-0052-3125-83dee7000000
< x-ms-version: 2019-02-02
< x-ms-error-code: InvalidHeaderValue
< Date: Sat, 05 Sep 2020 01:42:24 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:b00d5c32-101e-0052-3125-83dee7000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:42:24.5137237Z</Message><HeaderName>Content-Length</HeaderName><HeaderValue>10114368132</HeaderValue></Error>%
Not sure what is the proper way to go about uploading such large file?
Check the different blob types here: https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs
The page blob actually limits the maximum size to 8TB but it's optimal for for random read and write operation.
On the other hand:
Block blobs are optimized for uploading large amounts of data efficiently. Block blobs are comprised of blocks, each of which is identified by a block ID. A block blob can include up to 50,000 blocks.
So block blobs is the way to go as it supports sizes of up to
190.7 TB (preview mode)
Now you need to use the put block https://learn.microsoft.com/en-us/rest/api/storageservices/put-block to upload the blocks that will form your blob.
To copy large files to a blob you can use azcopy:
Authenticate first:
azcopy login
Then copy the file:
azcopy copy 'C:\myDirectory\myTextFile.txt' 'https://mystorageaccount.blob.core.windows.net/mycontainer/myTextFile.txt'
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json
Warp has a settingsMaxTotalHeaderLength field which by default is 50*1024 : https://hackage.haskell.org/package/warp-3.3.10/docs/src/Network.Wai.Handler.Warp.Settings.html#defaultSettings
I suppose this means 50KB? But, when I try to send a header with ~33KB, server throws bad request:
curl -v -w '%{size_request} %{size_upload}' -H #temp.log localhost:8080/v1/version
Result:
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /v1/version HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: */*
> myheader2: <big header snipped>
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
< Date: Wed, 22 Jul 2020 13:15:19 GMT
< Server: Warp/3.3.10
< Content-Type: text/plain; charset=utf-8
<
* Closing connection 0
Bad Request33098 0
(note that the request size is 33098)
Same thing works with 32.5KB header file.
My real problem is actually that I need to set settingsMaxTotalHeaderLength = 160000 to send a header of size ~55KB. Not sure if this is a bug or I am misreading something?
Congratulations, it looks like you found a bug in warp. In the definition of push, there's some double-counting going on. Near the top, bsLen is calculated as the complete length of the header so far, but further down in the push' Nothing case that adds newline-free chunks, the length is updated as:
len' = len + bsLen
when bsLen already accounts for len. There are similar problems in the other push' cases, where start and end are offsets into the complete header and so shouldn't be added to len.
I'm having trouble getting response text and a response body returned when I run the code below. The "HTTP/1.1 200 OK" message comes back along with response headers, but no response body. I've confirmed this result using Fiddler2 and also looking a netsh trace log.
Other URLs (http://real-chart.finance.yahoo.com/table.csv?s=CELG&d=6&e=26&f=2014&g=d&a=2&b=26&c=1990&ignore=.csv) for example, do return response text as well as a response body.
Why is there a problem with this URL and how can I get it to return a response body?
Sub testlogin()
fileUrl = "http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=XNYS:HFC®ion=USA&culture=en-US&productCode=COM&reportType=is&period=&dataType=A&order=desc&columnYear=5&rounding=3&view=raw"
Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
WHTTP.Open "GET", fileUrl, False
WHTTP.Send
MsgBox WHTTP.Status
MsgBox WHTTP.ResponseText
MsgBox WHTTP.ResponseBody
MsgBox WHTTP.GetAllResponseHeaders
Set WHTTP = Nothing
End Sub
Have you studied those response headers that are returned by the GET calls to both URLs?
Morningstar is like this:
Cache-Control: max-age=0
Connection: keep-alive
Date: Sat, 26 Jul 2014 22:07:33 GMT
Pragma: no-cache
Content-Length: 0
===>> Content-Type: text/html;charset=UTF-8 <<===
===>> Content-Encoding: gzip <<===
Server: Apache
Set-Cookie: JSESSIONID=6FAF41A612ABB32B0C670AB07BF0D8A5; HttpOnly
Vary: User-Agent
Vary: Accept-Encoding
com.coradiant.appvis: vid=ad&sid=CONTROLLER_1&tid=da615c36-2a18-4129-bcd7-1cbb139ab52b
Content-Disposition: attachment;filename=""HFC Income Statement.csv""
ExpiresDefault: access plus 2 hours
Yahoo Finance is like this:
Cache-Control: private
Connection: close
Date: Sat, 26 Jul 2014 22:10:00 GMT
Transfer-Encoding: chunked
===>> Content-Type: text/csv <<===
P3P: policyref=""http://info.yahoo.com/w3c/p3p.xml"", CP=""CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV""
Set-Cookie: B=d3svnbl9t89po&b=3&s=4i; expires=Tue, 26-Jul-2016 22:10:00 GMT; path=/; domain=.yahoo.com
Vary: Accept-Encoding
I've sort-of highlighted the Content-Type and Content-Encoding headers (where available).
Basically, the content returned is different for the two calls. Clearly, Excel can interpret the second case where the content type is "text/csv" but the first one is a strange gzipped html page that I guess Excel can't understand.
I can't possibly give you a solution to this issue, but the content of the headers could certainly explain the difference in behaviour you're seeing.
I'm running Varnish 3.0.2, Apache2, Pressflow. I am trying to get ESI up and working for the first time, and it does work, but only the first time the parent page is requested. After that it just pulls the parent page and the replaced content from the cache. The only thing I can think of is that the included content is being cached permanently, even though I am telling it to not cache the included file at all, here is the object for the included file being stored...
11 ObjProtocol c HTTP/1.1
11 ObjResponse c OK
11 ObjHeader c Date: Wed, 18 Jul 2012 23:25:56 GMT
11 ObjHeader c X-Powered-By: PHP/5.3.3-1ubuntu9.10
11 ObjHeader c Last-Modified: Wed, 18 Jul 2012 23:25:56 +0000
11 ObjHeader c Expires: Sun, 11 Mar 1984 12:00:00 GMT
11 ObjHeader c Vary: Cookie,Accept-Encoding
11 ObjHeader c ETag: "1342653956"
11 ObjHeader c Content-Encoding: gzip
11 ObjHeader c Content-Length: 656
11 ObjHeader c Content-Type: text/html
11 ObjHeader c Server: Apache/2.2.11
11 ObjHeader c Cache-Control: no-store
I've spent a full day on this, searched, read every article I can find, tried a whole heap of config tweaking, both in the VCL and in the HTTP headers. I can't see anything I'm doing wrong.
This is a snippet from my VCL, trying to force it to not store in the cache
sub vcl_fetch {
set beresp.do_esi = true;
if (req.url ~ "^/esi_") {
set beresp.http.Cache-Control = "no-store";
set beresp.ttl = 0s;
}
}
I would add that I am seeing nothing to inidicate errors in the varnishlog. I've tried using just the path, and host + path in the include src, but no difference. It simply won't ask the backend for fresh content. If you were looking at the logs for the second and subsequent requests, you wouldn't realise it was an ESI page.
provide in the sub vcl_recv {} something that tells varnish not to lookup for the request in the cache and define an additional http response element from your backend server, which is handled by a condition in vcl. e.g. "pragma: no-cache" ..
you might extend this condition in the vcl_recv with ~ "^/esi_" ..
sub vcl_recv(
# ...
# the rest goes here ..
# ...
if ((req.url ~ "^/esi_") && (req.http.pragma ~ "no-cache")) {
return (pass);
}
# ...
}
I'm using mod_security with the latest core rules.
It triggers on all my pages whenever I use a querystring.. ie.
www.mypage.com/index.php?querystring=1
I get a warning that it exceeds maximum allowed number of arguements, however the base config defines max_numb_args to = 255 which of course it doesn't exceed.
Any ideas why?
Base conf:
SecRuleEngine On
SecAuditEngine RelevantOnly
SecAuditLog /var/log/apache2/modsec_audit.log
SecDebugLog /var/log/apache2/modsec_debug_log
SecDebugLogLevel 3
SecDefaultAction "phase:2,pass,log,status:500"
SecRule REMOTE_ADDR "^127.0.0.1$" nolog,allow
SecRequestBodyAccess On
SecResponseBodyAccess On
SecResponseBodyMimeType (null) text/html text/plain text/xml
SecResponseBodyLimit 2621440
SecServerSignature Apache
SecUploadDir /tmp
SecUploadKeepFiles Off
SecAuditLogParts ABIFHZ
SecArgumentSeparator "&"
SecCookieFormat 0
SecRequestBodyInMemoryLimit 131072
SecDataDir /tmp
SecTmpDir /tmp
SecAuditLogStorageDir /var/log/apache2/audit
SecResponseBodyLimitAction ProcessPartial
SecAction "phase:1,t:none,nolog,pass,setvar:tx.max_num_args=255"
Rule that triggers:
# Maximum number of arguments in request limited
SecRule &TX:MAX_NUM_ARGS "#eq 1" "chain,phase:2,t:none,pass,nolog,auditlog,msg:'Maximum number of arguments in request reached',id:'960335',severity:'4',rev:'2.0.7'"
SecRule &ARGS "#gt %{tx.max_num_args}" "t:none,setvar:'tx.msg=%{rule.msg}',setvar:tx.anomaly_score=+%{tx.notice_anomaly_score},setvar:tx.policy_score=+%{tx.notice_anomaly_score},setvar:tx.%{rule.id}-POLICY/SIZE_LIMIT-%{matched_var_name}=%{matched_var}"
And the log ouput:
--ad5dc005-C--
queryString=2
--ad5dc005-F--
HTTP/1.1 200 OK
X-Powered-By: PHP/5.3
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SESSION=ak19oq36gpi94rco2qbi6j2k20; path=/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1272
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
--ad5dc005-H--
Message: Operator GT matched 0 at ARGS. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_23_request_limits.conf"] [line "30"] [id "960335"] [rev "2.0.7"] [msg "Maximum number of arguments in request reached"] [severity "WARNING"]
Message: Operator GE matched 0 at TX:anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_49_inbound_blocking.conf"] [line "18"] [msg "Inbound Anomaly Score Exceeded (Total Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Message: Warning. Operator GE matched 0 at TX:inbound_anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_60_correlation.conf"] [line "35"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Apache-Handler: application/x-httpd-php
Stopwatch: 1279667800315092 76979 (1546* 7522 72931)
Producer: ModSeurity for Apache/2.5.11 (http://www.modsecurity.org/); core ruleset/2.0.7.
Server: Apache
I was using the lib from Ubuntu.. which had the .11 version. I uninstalled it, compiled from source .12 version and now it's alive, kicking and screaming!
Latest CSR rules needs the .12 version. Cheers.