Cypress: intercept a network request with compression-type gzip to simulate mapbox - node.js

I am currently having trouble mocking a particular request that mapbox-gl makes. When the map is loaded from Mapbox, .pbf files are requested, and I have not been able to mock them.
My guess is that the core issue is an open Cypress bug, issue #16420.
I tried a lot of different intercept variants and all kinds of response headers. I gzipped, compressed, and brotli-compressed the file that I serve via fixture, and I tried different encodings for the fixture. Nothing worked. One of the interceptors looks basically like this:
cy.intercept({
  method: 'GET',
  url: '**/fonts/v1/mapbox/DIN%20Offc%20Pro%20Italic,Arial%20Unicode%20MS%20Regular/0-255.pbf?*',
}, {
  fixture: 'fonts/italic.arial.0-255.pbf,binary',
  statusCode: 204,
  headers: {
    'Connection': 'keep-alive',
    'Keep-Alive': 'timeout=5',
    'Transfer-Encoding': 'chunked',
    'access-control-allow-origin': '*',
    'access-control-expose-headers': 'Link',
    'age': '11631145',
    'cache-control': 'max-age=31536000',
    'content-encoding': 'compress',
    'content-type': 'application/x-protobuf',
    'date': 'Sat, 19 Feb 2022 20:46:43 GMT',
    'etag': 'W/"b040-+eCb/OHkPqToOcONTDlvpCrjmvs"',
    'via': '1.1 4dd80d99fd5d0f6baaaf5179cd921f72.cloudfront.net (CloudFront)',
    'x-amz-cf-id': '4uY9rjBgR_R12nkfHFrBMLEpNuWygW9DkmODlMEzwJHABTGCGg8pww==',
    'x-amz-cf-pop': 'FRA56-P7',
    'x-cache': 'Hit from cloudfront',
    'x-origin': 'Mbx-Fonts'
  }
}).as('get.0-255.pbf').as('getItalicArial0-255');
Even if this is a bug, there has to be some kind of workaround to serve the file in a Cypress test without an active internet connection. It would be great not to have to rely on the network in tests, so all kinds of workarounds and dirty tricks are welcome to make this intercept work.
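One possible direction (a minimal sketch, not verified against this bug): read the .pbf fixture as a raw Buffer and serve it from a route handler, leaving out the content-encoding header entirely so Cypress does not try to decompress the body. The URL pattern and fixture path are the ones from the intercept above; the 200 status and trimmed-down header set are assumptions.
cy.fixture('fonts/italic.arial.0-255.pbf', null).then((pbf) => {
  // passing null as the encoding makes cy.fixture() yield a Buffer
  cy.intercept({
    method: 'GET',
    url: '**/fonts/v1/mapbox/DIN%20Offc%20Pro%20Italic,Arial%20Unicode%20MS%20Regular/0-255.pbf?*',
  }, (req) => {
    req.reply({
      statusCode: 200, // assumption: return the body rather than a 204
      body: pbf,       // raw Buffer, served without any content-encoding header
      headers: {
        'content-type': 'application/x-protobuf',
        'access-control-allow-origin': '*',
      },
    });
  }).as('getItalicArial0-255');
});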

Related

Scrape Website protected by Cloudflare without cookies using Python and Requests

Usually when a website is protected by Cloudflare, it sets a cookie on the very first request, so when you try to fetch the site without that cookie it returns 403 Forbidden.
This website, Oddschecker, is a sports odds aggregator and does things differently.
Inspecting in a private session, you can see the headers don't contain any cookie or any reference to Cloudflare.
Yet, this is my code:
import cloudscraper

headers = {
    'authority': 'www.oddschecker.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-gpc': '1',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'es-ES,es;q=0.9',
}
url = "https://www.oddschecker.com/"
session = cloudscraper.create_scraper()
response = session.get(url=url, headers=headers)
and response has a 403 status. Why is that? How is Cloudflare blocking my access if they don't set any cookie for it and I'm using a library designed to handle JavaScript challenges?
This is a snippet of the response in Postman (also 403)
Just for the sake of it, I tried to recreate the POST request shown there, so I did:
url="https://sparrow.cloudflare.com/api/v1/event"
payload={'event':"feedback clicked",'properties':{'errorCode':1020,'version':2}}
headers={'Content-Type':"application/json","Sparrow-Source-Key":"c771f0e4b54944bebf4261d44bd79a1e"}
r=sesion.post(url=url,headers=headers,data=json.dumps(payload))
r.headers --> {'Date': 'Tue, 22 Mar 2022 23:19:25 GMT', 'Content-Type': 'text/plain;charset=UTF-8', 'Content-Length': '9', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': 'https://sparrow.cloudflare.com', 'Vary': 'Origin, Accept-Encoding', 'access-control-allow-headers': 'Content-Type, Sparrow-Client-ID, Sparrow-Source-Key, Origin', 'access-control-allow-methods': 'POST, OPTIONS', 'access-control-max-age': '600', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '6f02a6f2f8a9668f-MAD'}
Funny enough, this one did return 200 and its r.content is b"Filtered.", though I don't know whether that means anything or not.
So, how do I make this work? How is it pushing me out?
I don't know how Cloudflare is doing it, but I realized that Cloudflare creates cookies like cf_clearance a while after your first access to the website. If you keep retrying your requests in the browser, the cookies will be generated.
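If a cf_clearance cookie has been issued to your browser, one hedged workaround (a sketch with placeholder values, not a guaranteed fix) is to copy that cookie plus the exact same User-Agent into the cloudscraper session, since the clearance is tied to the user agent (and often the IP):
import cloudscraper

scraper = cloudscraper.create_scraper()
# must match the browser that received the cf_clearance cookie
scraper.headers.update({'user-agent': '<same UA string as the browser>'})
# placeholder value: copy cf_clearance from the browser's dev tools
scraper.cookies.set('cf_clearance', '<value from browser>', domain='.oddschecker.com')

response = scraper.get('https://www.oddschecker.com/')
print(response.status_code)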

Error 403 while using exchangelib to access Outlook Exchange server to read emails

I'm trying to read emails from a Microsoft Exchange server using EWS and exchangelib in Python for an email classification problem, but I am unable to connect to the Exchange server.
I've tried specifying the version and auth_type, using a certificate (which gives an SSL verify error), and using the SMTP address in place of the username, and it still doesn't connect.
Here is my code:
from exchangelib import Credentials, Account, EWSDateTime, EWSTimeZone, Configuration, DELEGATE, IMPERSONATION, NTLM, ServiceAccount, Version, Build
USER_NAME = 'domain\\user12345'
ACCOUNT_EMAIL = 'john.doe@ext.companyname.com'
ACCOUNT_PASSWORD = 'John#1234'
ACCOUNT_SERVER = 'oa.company.com'
creds = Credentials(USER_NAME, ACCOUNT_PASSWORD)
config = Configuration(server=ACCOUNT_SERVER, credentials=creds)
account = Account(primary_smtp_address=ACCOUNT_EMAIL, config=config, autodiscover=False, access_type=DELEGATE)
print('connecting ms exchange server account...')
print(type(account))
print(dir(account))
account.root.refresh()
Here is the error I am getting:
TransportError: Unknown failure
Retry: 0
Waited: 10
Timeout: 120
Session: 26271
Thread: 15248
Auth type: <requests_ntlm.requests_ntlm.HttpNtlmAuth object at 0x00000259AA1BD588>
URL: https://oa.company.com/EWS/Exchange.asmx
HTTP adapter: <requests.adapters.HTTPAdapter object at 0x00000259AA0DB7B8>
Allow redirects: False
Streaming: False
Response time: 0.28100000000085856
Status code: 403
Request headers: {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'Keep-Alive', 'Content-Type': 'text/xml; charset=utf-8', 'Content-Length': '469', 'Authorization': 'NTLM TlRMTVNTUAADAAAAGAAYAG0AAAAOAQ4BhQAAAAwADABYAAAACQAJAGQAAAAAAAAAbQAAABAAEACTAQAANoKJ4gYBsR0AAAAP7Pyb+wBnMdrlhr4FKVqPbklDSUNJQkFOS0xURElQUlUzODE5MAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJTLmMBLPHowOJZ46XDs+4ABAQAAAAAAAAZdRefQLNUB+Fc6Z26oxvgAAAAAAgAYAEkAQwBJAEMASQBCAEEATgBLAEwAVABEAAEAEgBIAFkARABFAFgAQwBIADAAOAAEACAAaQBjAGkAYwBpAGIAYQBuAGsAbAB0AGQALgBjAG8AbQADADQASABZAEQARQBYAEMASAAwADgALgBpAGMAaQBjAGkAYgBhAG4AawBsAHQAZAAuAGMAbwBtAAUAIABpAGMAaQBjAGkAYgBhAG4AawBsAHQAZAAuAGMAbwBtAAcACAAGXUXn0CzVAQYABAACAAAACgAQAD9EWlwiiUs304wucsxnkyQAAAAAAAAAALfelDwG05hYOMUqY/e60PY=', 'Cookie': 'ClientId=SINZWMOJKWSKDGEKASFG; expires=Fri, 26-Jun-2020 10:13:02 GMT; path=/; HttpOnly'}
Response headers: {'Cache-Control': 'private', 'Server': 'Microsoft-IIS/8.5', 'request-id': 'ae4dee8d-34e0-471c-8252-b8c1056c8ea0', 'X-CalculatedBETarget': 'pqrexch05.domain.com', 'X-DiagInfo': 'PQREXCH05', 'X-BEServer': 'PQREXCH05', 'X-AspNet-Version': '4.0.30319', 'Set-Cookie': 'exchangecookie=681afc8a0905459182363cce9a98d021; expires=Sat, 27-Jun-2020 10:13:02 GMT; path=/; HttpOnly, X-BackEndCookie=S-1-5-21-1343024091-725345543-504838010-1766210=u56Lnp2ejJqBy87Iysqem5nSy8mbnNLLyZ7H0sfIysbSy5vMz8qdzcvPnpzHgYHNz87G0s/I0s3Iq87Pxc7Mxc/N; expires=Sat, 27-Jul-2019 10:13:02 GMT; path=/EWS; secure; HttpOnly', 'Persistent-Auth': 'true', 'X-Powered-By': 'ASP.NET', 'X-FEServer': 'PQREXCH05', 'Date': 'Thu, 27 Jun 2019 10:13:01 GMT', 'Content-Length': '0'}
Request data: b'<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<s:Envelope xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types"><s:Header><t:RequestServerVersion Version="Exchange2013_SP1"/></s:Header><s:Body><m:ResolveNames ReturnFullContactData="false"><m:UnresolvedEntry>ICICIBANKLTD\\IPRU38190</m:UnresolvedEntry></m:ResolveNames></s:Body></s:Envelope>'
Response data: b''
You might need to configure access policy for EWS using PowerShell.
For example (to allow all apps to use REST and EWS):
Set-OrganizationConfig -EwsApplicationAccessPolicy EnforceBlockList -EwsBlockList $null
Taken from Microsoft docs on Set-OrganizationConfig.
Please search for EwsApplicationAccessPolicy in that documentation for more granular access control examples.

With Nock, can you match any domain?

I have got this unit test code which mocks an HTTP request:
nock(/localhost/)
  .post(/validate-user$/, credentials)
  .reply(401);
This works fine. However, I don't really care about the domain. In fact, the domain will change. This will break my unit tests.
Can I specify a wildcard domain for Nock?
I've tried nock(/.*/) but it does not work.
The Nock recorder outputs:
<<<<<<-- cut here -->>>>>>
nock('https://localhost:41109', {"encodedQueryParams":true})
  .post('/mock-validate-user', {"username":"alexxx","password":"alexxx100"})
  .reply(200, {"username":"alexxx","email":"alexxx#foo.com","admin":true},
    [ 'Content-Type',
      'application/json; charset=utf-8',
      'Content-Length',
      '59',
      'ETag',
      'W/"3b-JaaJntKHf2/nzwhkqdNoFA"',
      'Date',
      'Thu, 02 Mar 2017 16:11:06 GMT',
      'Connection',
      'close' ]);
<<<<<<-- cut here -->>>>>>
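One option that may help (a sketch based on nock's documented filteringScope option, not verified against the asker's nock version): keep a placeholder origin and let a callback decide whether a scope matches, so any host and port are accepted. The path regex and credentials mirror the test above.
const nock = require('nock');

// values taken from the recorder output; in the real test this is the existing credentials object
const credentials = { username: 'alexxx', password: 'alexxx100' };

// filteringScope receives the request's origin; returning true accepts any domain/port
nock('https://localhost', { filteringScope: () => true })
  .post(/validate-user$/, credentials)
  .reply(401);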

Request email audit export fails with status 400 and "Premature end of file."

According to https://developers.google.com/admin-sdk/email-audit/#creating_a_mailbox_for_export, I am trying to request the email audit export of a user in G Suite this way:
import httplib2

def requestAuditExport(account):
    credentials = getCredentials()
    http = credentials.authorize(httplib2.Http())
    url = 'https://apps-apis.google.com/a/feeds/compliance/audit/mail/export/helpling.com/' + account
    status, response = http.request(url, 'POST', headers={'Content-Type': 'application/atom+xml'})
    print(status)
    print(response)
And I get the following result:
{'content-length': '22', 'expires': 'Tue, 13 Dec 2016 14:19:37 GMT', 'date': 'Tue, 13 Dec 2016 14:19:37 GMT', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'x-xss-protection': '1; mode=block', 'content-type': 'text/html; charset=UTF-8', 'x-content-type-options': 'nosniff', '-content-encoding': 'gzip', 'server': 'GSE', 'status': '400', 'cache-control': 'private, max-age=0', 'alt-svc': 'quic=":443"; ma=2592000; v="35,34"'}
b'Premature end of file.'
I cannot see where the problem is; can someone please give me a hint?
Thanks in advance!
Kay
Fix it by going into the Admin Console, opening the Manage API client access page under Security, and adding the Client ID and scope needed for the Directory API. For more information, check this document.
Okay, I found out what was wrong and fixed it myself. In the end it looks like this:
http = getCredentials().authorize(httplib2.Http())
url = 'https://apps-apis.google.com/a/feeds/compliance/audit/mail/export/helpling.com/'+account
headers = {'Content-Type': 'application/atom+xml'}
xml_data = """<atom:entry xmlns:atom='http://www.w3.org/2005/Atom' xmlns:apps='http://schemas.google.com/apps/2006'> \
<apps:property name='includeDeleted' value='true'/> \
</atom:entry>"""
status, response = http.request(url, 'POST', headers=headers, body=xml_data)
Not sure if it was about the body or the header, though the 'Premature end of file.' response suggests the server was trying to parse an XML body that wasn't there. It works now and I hope it will help others.
Thanks anyway.

Python 3.5.2 Iterating a get request

Hoping someone can tell me whether this script is functioning the way I intended it to, and if not explain what I am doing wrong.
The RESTful API I am using has a parameter pageSize ranging from 10 to 50; I used pageSize=50. There was another parameter, pageNumber, that I did not use.
So, I thought this would be the right way to make the GET request:
# Python 3.5.2
import requests

r = requests.get(url, stream=True)
with open("file.txt", 'w', newline='', encoding='utf-8') as fd:
    text_out = r.text
    fd.write(text_out)
UPDATE
I think I understand a bit better. I read the documentation in more detail, but I am still missing how to get the entire data set from the API. Here is some more information:
verbs = requests.options(r.url)
print(verbs.headers)
{'Server': 'ninx', 'Date': 'Sat, 24 Dec 2016 22:50:13 GMT',
'Allow': 'OPTIONS,HEAD,GET', 'Content-Length': '0', 'Connection': 'keep-alive'}
print(r.headers)
{'Transfer-Encoding': 'chunked', 'Vary': 'Accept-Encoding',
'X-Entity-Count': '50', 'Connection': 'keep-alive',
'Content-Encoding': 'gzip', 'Date': 'Sat, 24 Dec 2016 23:59:07 GMT',
'Server': 'ninx', 'Content-Type': 'application/json; charset=UTF-8'}
Should I create a session and use the previously unused pageNumber parameter to build a new URL until the X-Entity-Count is zero? Or is there a better way?
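That pageNumber loop might look roughly like this (a sketch with a placeholder base URL, assuming the API takes pageNumber/pageSize as query parameters and reports the batch size in the X-Entity-Count header):
import requests

url = "https://api.example.com/resource"  # placeholder endpoint
session = requests.Session()

page = 1
with open("file.txt", 'w', newline='', encoding='utf-8') as fd:
    while True:
        r = session.get(url, params={"pageSize": 50, "pageNumber": page})
        r.raise_for_status()
        if int(r.headers.get("X-Entity-Count", 0)) == 0:
            break  # nothing returned for this page: we are past the last page
        fd.write(r.text)
        page += 1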
I found a discussion that helped clear this matter up for me...this updated question should probably be deleted...
API pagination best practices
