Why are emojis in a raw email shown in hexadecimal, and how do I decode them in Python? - python-3.x

We are working on a service that will parse a user's email. The following is an example of a raw email with an emoji. The hexadecimal sequence =F0=9F=99=82 in the email is the emoji 🙂. I was able to verify this here.
Why do the emojis come through this way, and is there any way to parse them in Python 3?
I found a way to parse them with bytes.fromhex after manually removing the = symbols, and it works:
bytes.fromhex('F0 9F 99 82').decode('utf-8')
Is there any other way to handle this in Python 3?
Thanks in advance
Example of a raw email:
MIME-Version: 1.0
Date: Wed, 22 Sep 2021 18:45:41 +0530
References: <CAFsQotqyCTbnR7ANDZX9oHYtHwtSf-im8pNj6N9pMXytbn+kbw@mail.gmail.com>
In-Reply-To: <CAFsQotqyCTbnR7ANDZX9oHYtHwtSf-im8pNj6N9pMXytbn+kbw@mail.gmail.com>
Message-ID: <CAAqby4THk2GSGbD-RXy5-6bwaEKHuBeWARMjUDRVvi+4_OTFVg@mail.gmail.com>
Subject: Re: wowoww
From: email1 email1 <email1@gmail.com>
To: email2 email2 <email2@gmail.com>
Content-Type: multipart/related; boundary="000000000000a8da4205cc954fc4"
--000000000000a8da4205cc954fc4
Content-Type: multipart/alternative; boundary="000000000000a8da4105cc954fc3"
--000000000000a8da4105cc954fc3
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
AmsterDam
On Tue, Aug 31, 2021 at 12:11 PM email1 email1 <email2@gmail.com>
wrote:
> [image: unarchive.png]
> =F0=9F=99=82=E2=98=BA
> fsdfsdf
> sdf
> ds
> f
> sdf
> ds
> f
>
>
>
--000000000000a8da4105cc954fc3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">AmsterDam<br></div><br><div class=3D"gmail_quote"><div dir=
=3D"ltr" class=3D"gmail_attr">On Tue, Aug 31, 2021 at 12:11 PM email1 email1=
i <<a href=3D"mailto:email2@gmail.com">email2@gmail.com</a=
>> wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px=
0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><di=
v dir=3D"ltr"><img src=3D"cid:ii_kszpbrxo0" alt=3D"unarchive.png" width=3D"=
566" height=3D"283"><br><div>=F0=9F=99=82=E2=98=BA</div><div>fsdfsdf <br></=
div><div>sdf</div><div>ds</div><div>f</div><div>sdf</div><div>ds</div><div>=
f</div><div><br></div><br></div>
</blockquote></div>
--000000000000a8da4105cc954fc3--
--000000000000a8da4205cc954fc4
Content-Type: image/png; name="unarchive.png"
Content-Disposition: attachment; filename="unarchive.png"
Content-Transfer-Encoding: base64
X-Attachment-Id: ii_kszpbrxo0
Content-ID: <ii_kszpbrxo0>
--000000000000a8da4205cc954fc4--

This is MIME quoted-printable encoding, which is very common in email: the part is declared with Content-Transfer-Encoding: quoted-printable, and each =XX token is one hex-encoded byte of the UTF-8 text. You'll need to parse the email with a tool like email.parser from the standard library; it will take care of decoding this format into normal strings.
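A minimal sketch of that approach (the filename raw_email.eml is just a placeholder for wherever your raw message lives):

from email import policy
from email.parser import BytesParser

# policy.default enables the modern EmailMessage API, which transparently
# decodes quoted-printable bodies and the declared charset.
with open("raw_email.eml", "rb") as f:
    msg = BytesParser(policy=policy.default).parse(f)

# get_body() picks the best-matching part; get_content() returns a decoded
# str, so =F0=9F=99=82 comes back as the actual 🙂 character.
body = msg.get_body(preferencelist=("plain",))
print(body.get_content())

If you only have an isolated snippet rather than a full message, the standard-library quopri module also decodes the =XX form directly, with no manual stripping of = needed: quopri.decodestring(b'=F0=9F=99=82').decode('utf-8') returns '🙂'.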

Related

How to upload a 10 GB file using a SAS token

I'm trying to upload a large file (over 10 GB) to Azure Blob Storage using SAS tokens.
I generate the tokens like this:
val storageConnectionString = s"DefaultEndpointsProtocol=https;AccountName=${accountName};AccountKey=${accountKey}"
val storageAccount = CloudStorageAccount.parse(storageConnectionString)
val client = storageAccount.createCloudBlobClient()
val container = client.getContainerReference(CONTAINER_NAME)
val blockBlob = container.getBlockBlobReference(path)
val policy = new SharedAccessAccountPolicy()
policy.setPermissionsFromString("racwdlup")
val date = new Date().getTime();
val expiryDate = new Date(date + 8640000).getTime()
policy.setSharedAccessStartTime(new Date(date))
policy.setSharedAccessExpiryTime(new Date(expiryDate))
policy.setResourceTypeFromString("sco")
policy.setServiceFromString("bfqt")
val token = storageAccount.generateSharedAccessSignature(policy)
Then I tried the Put Blob API and hit the following error:
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: BlockBlob' -F file=@10gb.csv https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=.... -v
< HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit.
< Content-Length: 290
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: f08a1473-301e-006a-4423-837a27000000
< x-ms-version: 2019-02-02
< x-ms-error-code: RequestBodyTooLarge
< Date: Sat, 05 Sep 2020 01:24:35 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>RequestBodyTooLarge</Code><Message>The request body is too large and exceeds the maximum permissible limit.
RequestId:f08a1473-301e-006a-4423-837a27000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:24:35.7712576Z</Message><MaxLimit>268435456</MaxLimit></Error>%
After that I tried uploading it as a page blob (the documentation says a page blob can be up to 8 TiB):
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: PageBlob' -H 'x-ms-blob-content-length: 1099511627776' -F file=@10gb.csv https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=... -v
< HTTP/1.1 400 The value for one of the HTTP headers is not in the correct format.
< Content-Length: 331
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: b00d5c32-101e-0052-3125-83dee7000000
< x-ms-version: 2019-02-02
< x-ms-error-code: InvalidHeaderValue
< Date: Sat, 05 Sep 2020 01:42:24 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:b00d5c32-101e-0052-3125-83dee7000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:42:24.5137237Z</Message><HeaderName>Content-Length</HeaderName><HeaderValue>10114368132</HeaderValue></Error>%
What is the proper way to go about uploading such a large file?
Check the different blob types here: https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs
Page blobs do allow a maximum size of 8 TiB, but they are optimized for random read and write operations.
On the other hand:
Block blobs are optimized for uploading large amounts of data efficiently. Block blobs are comprised of blocks, each of which is identified by a block ID. A block blob can include up to 50,000 blocks.
So block blobs are the way to go, as they support sizes of up to 190.7 TiB (in preview).
Your first attempt failed because a single Put Blob call is capped at 256 MiB on that service version (the MaxLimit of 268435456 bytes in your error output). Instead, use Put Block (https://learn.microsoft.com/en-us/rest/api/storageservices/put-block) to upload the blocks that will form your blob, then commit them with Put Block List, as sketched below.
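A rough sketch of that flow in Python (assuming the requests library; the URL, chunk size, and block-ID scheme are illustrative, and blob_url must already carry a valid SAS query string):

import base64
import requests
from urllib.parse import quote

# Hypothetical values: blob_url already includes the SAS token.
blob_url = "https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv?sv=...&sig=..."
chunk_size = 64 * 1024 * 1024  # 64 MiB per block, safely under the per-block limit

block_ids = []
with open("10gb.csv", "rb") as f:
    index = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # Block IDs must be base64-encoded and all the same length within a blob.
        block_id = base64.b64encode(f"{index:08d}".encode()).decode()
        r = requests.put(f"{blob_url}&comp=block&blockid={quote(block_id, safe='')}",
                         data=chunk)
        r.raise_for_status()
        block_ids.append(block_id)
        index += 1

# Commit the uploaded blocks, in order, with Put Block List.
body = ("<?xml version='1.0' encoding='utf-8'?><BlockList>"
        + "".join(f"<Latest>{b}</Latest>" for b in block_ids)
        + "</BlockList>")
requests.put(f"{blob_url}&comp=blocklist", data=body).raise_for_status()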
To copy large files to a blob you can use azcopy:
Authenticate first:
azcopy login
Then copy the file:
azcopy copy 'C:\myDirectory\myTextFile.txt' 'https://mystorageaccount.blob.core.windows.net/mycontainer/myTextFile.txt'
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs?toc=/azure/storage/blobs/toc.json

How can I view raw content with HTTP request?

I cannot seem to make the script print out JUST the content served by the page.
I would like this to use the socket module only, with no other libraries like requests or urllib.
I cannot really try much, so I instantly committed a sin and came here first ^^'
My code:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
print(r)
sock.close()
I want the printed result to be:
test
test1
test2
test3
but what I get is
HTTP/1.1 200 OK
Date: Tue, 09 Apr 2019 14:20:45 GMT
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=xxx; expires=Wed, 08-Apr-20 14:20:45 GMT; path=/; domain=.pastebin.com; HttpOnly
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
CF-Cache-Status: MISS
Server: cloudflare
CF-RAY: 4c4d1f9f685ece41-LHR
19
test
test1
test2
test3
Just extract the content after \r\n\r\n using string.split and print it:
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.1\r\nHost: pastebin.com\r\n\r\n")
r = sock.recv(4096).decode("utf-8")
#Extract the content after splitting the string on \r\n\r\n
content_list = r.split('\r\n\r\n')[1].split('\r\n')
content = '\r\n'.join(content_list)
print(content)
#19
#test
#test1
#test2
#test3
sock.close()
You are doing an HTTP/1.1 request, and therefore the web server may reply with a response body in chunked transfer encoding. In this mode each chunk is prefixed with its size in hexadecimal; that's the 19 you see before test (0x19 = 25 bytes). You either need to implement this mode, or you can simply do an HTTP/1.0 request, in which case the server will not use chunked transfer encoding, since that was only introduced with HTTP/1.1.
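A minimal sketch of the HTTP/1.0 variant (same pastebin URL as in the question):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("pastebin.com", 80))
# HTTP/1.0 request: the server won't use chunked transfer encoding.
sock.sendall(b"GET /raw/yWmuKZyb HTTP/1.0\r\nHost: pastebin.com\r\n\r\n")

# With HTTP/1.0 the server closes the connection when done, so read to EOF.
response = b""
while True:
    data = sock.recv(4096)
    if not data:
        break
    response += data
sock.close()

# The body is everything after the first blank line.
headers, _, body = response.partition(b"\r\n\r\n")
print(body.decode("utf-8"))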
Anyway, if you don't want to use any existing libraries but would rather do your own HTTP, it is expected that you actually understand HTTP. Understanding means you have read the relevant standards, because that's what standards are for. For HTTP/1.1 this was originally RFC 2616, later slightly reworked into RFC 7230-7235. Once you start reading these standards, you will likely come to appreciate that there are existing libraries dealing with these protocols, since they are far from trivial.

WinHTTPRequest Returning Empty Response Text and Body

I'm having trouble getting response text and a response body returned when I run the code below. The "HTTP/1.1 200 OK" message comes back along with response headers, but no response body. I've confirmed this result using Fiddler2 and also by looking at a netsh trace log.
Other URLs (http://real-chart.finance.yahoo.com/table.csv?s=CELG&d=6&e=26&f=2014&g=d&a=2&b=26&c=1990&ignore=.csv, for example) do return response text as well as a response body.
Why is there a problem with this URL, and how can I get it to return a response body?
Sub testlogin()
fileUrl = "http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=XNYS:HFC&region=USA&culture=en-US&productCode=COM&reportType=is&period=&dataType=A&order=desc&columnYear=5&rounding=3&view=raw"
Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
WHTTP.Open "GET", fileUrl, False
WHTTP.Send
MsgBox WHTTP.Status
MsgBox WHTTP.ResponseText
MsgBox WHTTP.ResponseBody
MsgBox WHTTP.GetAllResponseHeaders
Set WHTTP = Nothing
End Sub
Have you studied those response headers that are returned by the GET calls to both URLs?
Morningstar is like this:
Cache-Control: max-age=0
Connection: keep-alive
Date: Sat, 26 Jul 2014 22:07:33 GMT
Pragma: no-cache
Content-Length: 0
===>> Content-Type: text/html;charset=UTF-8 <<===
===>> Content-Encoding: gzip <<===
Server: Apache
Set-Cookie: JSESSIONID=6FAF41A612ABB32B0C670AB07BF0D8A5; HttpOnly
Vary: User-Agent
Vary: Accept-Encoding
com.coradiant.appvis: vid=ad&sid=CONTROLLER_1&tid=da615c36-2a18-4129-bcd7-1cbb139ab52b
Content-Disposition: attachment;filename="HFC Income Statement.csv"
ExpiresDefault: access plus 2 hours
Yahoo Finance is like this:
Cache-Control: private
Connection: close
Date: Sat, 26 Jul 2014 22:10:00 GMT
Transfer-Encoding: chunked
===>> Content-Type: text/csv <<===
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Set-Cookie: B=d3svnbl9t89po&b=3&s=4i; expires=Tue, 26-Jul-2016 22:10:00 GMT; path=/; domain=.yahoo.com
Vary: Accept-Encoding
I've sort-of highlighted the Content-Type and Content-Encoding headers (where available).
Basically, the content returned is different for the two calls. Clearly, Excel can interpret the second case, where the content type is "text/csv", but the first one is a gzip-compressed HTML page, which I guess Excel can't understand.
I can't give you a definitive solution to this issue, but the content of the headers certainly explains the difference in behaviour you're seeing.
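If you want to confirm the gzip theory from outside VBA, here is a quick sketch in Python (assuming the Morningstar endpoint still answers as it did in the question):

import gzip
import urllib.request

url = ("http://financials.morningstar.com/ajax/ReportProcess4CSV.html"
       "?t=XNYS:HFC&region=USA&culture=en-US&productCode=COM&reportType=is"
       "&period=&dataType=A&order=desc&columnYear=5&rounding=3&view=raw")
req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(req) as resp:
    raw = resp.read()
    # Decompress manually if the server really did gzip the body.
    if resp.headers.get("Content-Encoding") == "gzip":
        raw = gzip.decompress(raw)
print(raw.decode("utf-8", errors="replace"))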

Varnish ESI include only works when parent page is fetched from backend

I'm running Varnish 3.0.2, Apache 2, Pressflow. I am trying to get ESI up and working for the first time, and it does work, but only the first time the parent page is requested. After that it just pulls both the parent page and the included content from the cache. The only thing I can think of is that the included content is being cached permanently, even though I am telling Varnish not to cache the included file at all. Here is the object being stored for the included file...
11 ObjProtocol c HTTP/1.1
11 ObjResponse c OK
11 ObjHeader c Date: Wed, 18 Jul 2012 23:25:56 GMT
11 ObjHeader c X-Powered-By: PHP/5.3.3-1ubuntu9.10
11 ObjHeader c Last-Modified: Wed, 18 Jul 2012 23:25:56 +0000
11 ObjHeader c Expires: Sun, 11 Mar 1984 12:00:00 GMT
11 ObjHeader c Vary: Cookie,Accept-Encoding
11 ObjHeader c ETag: "1342653956"
11 ObjHeader c Content-Encoding: gzip
11 ObjHeader c Content-Length: 656
11 ObjHeader c Content-Type: text/html
11 ObjHeader c Server: Apache/2.2.11
11 ObjHeader c Cache-Control: no-store
I've spent a full day on this, searched, read every article I can find, and tried a whole heap of config tweaking, both in the VCL and in the HTTP headers. I can't see anything I'm doing wrong.
This is a snippet from my VCL, trying to force it not to store the include in the cache:
sub vcl_fetch {
set beresp.do_esi = true;
if (req.url ~ "^/esi_") {
set beresp.http.Cache-Control = "no-store";
set beresp.ttl = 0s;
}
}
I would add that I am seeing nothing to indicate errors in the varnishlog. I've tried using just the path, and host + path, in the include src, but no difference. It simply won't ask the backend for fresh content. If you were looking at the logs for the second and subsequent requests, you wouldn't realise it was an ESI page.
In sub vcl_recv {}, provide something that tells Varnish not to look the request up in the cache: define an additional HTTP element sent by your backend server (e.g. "Pragma: no-cache") and handle it with a condition in the VCL.
You might extend this condition in vcl_recv with ~ "^/esi_", like so:
sub vcl_recv {
    # ...
    # the rest goes here ...
    # ...
    if ((req.url ~ "^/esi_") && (req.http.pragma ~ "no-cache")) {
        return (pass);
    }
    # ...
}

mod_sec triggers on CRS rule _23

I'm using mod_security with the latest core rules.
It triggers on all my pages whenever I use a query string, e.g.
www.mypage.com/index.php?querystring=1
I get a warning that the request exceeds the maximum allowed number of arguments, even though the base config sets tx.max_num_args to 255, which of course the request doesn't exceed.
Any ideas why?
Base conf:
SecRuleEngine On
SecAuditEngine RelevantOnly
SecAuditLog /var/log/apache2/modsec_audit.log
SecDebugLog /var/log/apache2/modsec_debug_log
SecDebugLogLevel 3
SecDefaultAction "phase:2,pass,log,status:500"
SecRule REMOTE_ADDR "^127.0.0.1$" nolog,allow
SecRequestBodyAccess On
SecResponseBodyAccess On
SecResponseBodyMimeType (null) text/html text/plain text/xml
SecResponseBodyLimit 2621440
SecServerSignature Apache
SecUploadDir /tmp
SecUploadKeepFiles Off
SecAuditLogParts ABIFHZ
SecArgumentSeparator "&"
SecCookieFormat 0
SecRequestBodyInMemoryLimit 131072
SecDataDir /tmp
SecTmpDir /tmp
SecAuditLogStorageDir /var/log/apache2/audit
SecResponseBodyLimitAction ProcessPartial
SecAction "phase:1,t:none,nolog,pass,setvar:tx.max_num_args=255"
Rule that triggers:
# Maximum number of arguments in request limited
SecRule &TX:MAX_NUM_ARGS "@eq 1" "chain,phase:2,t:none,pass,nolog,auditlog,msg:'Maximum number of arguments in request reached',id:'960335',severity:'4',rev:'2.0.7'"
SecRule &ARGS "@gt %{tx.max_num_args}" "t:none,setvar:'tx.msg=%{rule.msg}',setvar:tx.anomaly_score=+%{tx.notice_anomaly_score},setvar:tx.policy_score=+%{tx.notice_anomaly_score},setvar:tx.%{rule.id}-POLICY/SIZE_LIMIT-%{matched_var_name}=%{matched_var}"
And the log output:
--ad5dc005-C--
queryString=2
--ad5dc005-F--
HTTP/1.1 200 OK
X-Powered-By: PHP/5.3
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SESSION=ak19oq36gpi94rco2qbi6j2k20; path=/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1272
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
--ad5dc005-H--
Message: Operator GT matched 0 at ARGS. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_23_request_limits.conf"] [line "30"] [id "960335"] [rev "2.0.7"] [msg "Maximum number of arguments in request reached"] [severity "WARNING"]
Message: Operator GE matched 0 at TX:anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_49_inbound_blocking.conf"] [line "18"] [msg "Inbound Anomaly Score Exceeded (Total Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Message: Warning. Operator GE matched 0 at TX:inbound_anomaly_score. [file "/etc/apache2/conf/modsecurity_crs/base_rules/modsecurity_crs_60_correlation.conf"] [line "35"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5, SQLi=, XSS=): Maximum number of arguments in request reached"]
Apache-Handler: application/x-httpd-php
Stopwatch: 1279667800315092 76979 (1546* 7522 72931)
Producer: ModSecurity for Apache/2.5.11 (http://www.modsecurity.org/); core ruleset/2.0.7.
Server: Apache
I was using the lib from Ubuntu, which was the 2.5.11 version. I uninstalled it, compiled the 2.5.12 version from source, and now it's alive, kicking and screaming!
The latest CRS rules need the .12 version. Cheers.
