Scrape Website protected by Cloudflare without cookies using Python and Requests


Usually, when a website is protected by Cloudflare, it sets a cookie with a value on the very first request, so a client that fetches the page without that cookie gets a 403 Forbidden response.
This website, Oddschecker, is a sports odds aggregator, and it does things differently.
Inspecting it in a private session, you can see that the headers contain neither a cookie nor any reference to Cloudflare.
Yet this is my code:
import cloudscraper

headers = {
'authority': 'www.oddschecker.com',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-gpc': '1',
'sec-fetch-site': 'none',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'accept-language': 'es-ES,es;q=0.9'}
url = "https://www.oddschecker.com/"
session=cloudscraper.create_scraper()
response=session.get(url=url, headers=headers)
and the response has a 403 status. Why is that? How is Cloudflare keeping me out if it doesn't set any cookie, and I'm using a library designed to handle JavaScript challenges?
This is a snippet of the response in Postman (also 403)
Out of curiosity, I tried to recreate the POST request shown there:
import json

url = "https://sparrow.cloudflare.com/api/v1/event"
payload = {'event': "feedback clicked", 'properties': {'errorCode': 1020, 'version': 2}}
headers = {'Content-Type': "application/json", "Sparrow-Source-Key": "c771f0e4b54944bebf4261d44bd79a1e"}
r = session.post(url=url, headers=headers, data=json.dumps(payload))
r.headers --> {'Date': 'Tue, 22 Mar 2022 23:19:25 GMT', 'Content-Type': 'text/plain;charset=UTF-8', 'Content-Length': '9', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': 'https://sparrow.cloudflare.com', 'Vary': 'Origin, Accept-Encoding', 'access-control-allow-headers': 'Content-Type, Sparrow-Client-ID, Sparrow-Source-Key, Origin', 'access-control-allow-methods': 'POST, OPTIONS', 'access-control-max-age': '600', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '6f02a6f2f8a9668f-MAD'}
Funnily enough, this one did return 200, and its r.content is b"Filtered.", which may or may not mean something.
So how do I make this work? How is it keeping me out?
Come on, don't be shy.
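The errorCode 1020 in that payload matches Cloudflare's "Error 1020: Access Denied", which is shown when a firewall rule configured by the site owner blocks the request. That would be consistent with a 403 that involves no cookie and no JavaScript challenge at all. A minimal sketch of a hypothetical helper, `looks_like_cloudflare_block`, that flags such responses from the status code, headers and body; the checks it makes are assumptions based on the responses described here, not a documented contract:

```python
from typing import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str], body: str) -> bool:
    """Heuristic: True if the response looks like a Cloudflare 1020 block page.

    Assumption: a 403 served by Cloudflare (Server: cloudflare or a CF-RAY id)
    whose body mentions 1020 is a firewall-rule block, not a JS challenge.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = lower.get("server", "").lower() == "cloudflare" or "cf-ray" in lower
    return status == 403 and served_by_cloudflare and "1020" in body
```

Usage: `looks_like_cloudflare_block(response.status_code, response.headers, response.text)`. If it returns True, rotating headers or JS-solving libraries is unlikely to help, because the rule is evaluated before any challenge is served.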

I don't know exactly how Cloudflare does it, but I noticed that Cloudflare creates cookies such as cf_clearance some time after your first visit to the website. If you keep retrying your requests in the browser, the cookies will be generated.
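A minimal sketch of reusing such a cookie, assuming you have copied the cf_clearance value from your browser's developer tools once it has been generated. The clearance cookie is commonly reported to be tied to the browser (User-Agent) and IP that obtained it, so the sketch sends the same User-Agent too. The helper name `session_with_clearance` and the cookie domain `.oddschecker.com` are assumptions, and whether this gets past a firewall-rule block at all is not guaranteed:

```python
import requests


def session_with_clearance(cf_clearance: str, user_agent: str,
                           domain: str = ".oddschecker.com") -> requests.Session:
    """Build a Session that replays a cf_clearance cookie copied from a browser.

    Both values are placeholders: paste the ones your own browser used.
    The default domain is an assumption about where the cookie was set.
    """
    session = requests.Session()
    # Send the same User-Agent the browser used when it obtained the cookie.
    session.headers.update({"User-Agent": user_agent})
    session.cookies.set("cf_clearance", cf_clearance, domain=domain, path="/")
    return session


# Usage (performs a network request, so it is left commented out):
# s = session_with_clearance("<value from browser>", "Mozilla/5.0 ... Chrome/99.0.4844.51 Safari/537.36")
# print(s.get("https://www.oddschecker.com/").status_code)
```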

Related

HTTP Request using Python

To those reading my thread: thank you in advance for your help. I'd also ask for a bit of leniency with any incorrect terminology, as I am still a 'Newbie'.
I've been trying to retrieve stock codes from the KRX website, as I could not find any other source for the information I need. I tried to use the requests library in Python, but the data I needed was loaded asynchronously, which made it inaccessible.
The problem is that to retrieve the information, I need to make two requests: the first retrieves a code to be used in the body of the second request. But when I make the second request, it returns an empty list.
I managed to locate the API calls that retrieve the stock codes, as shown below:
[Screenshot: TwoRequests]
As far as I can tell, it requires two API calls: one to retrieve a code, which works as an access token for the second request, which in turn returns the stock codes I am trying to retrieve.
I've managed to retrieve the code for the first request with the following code:
import requests
url = 'https://global.krx.co.kr/contents/COM/GenerateOTP.jspx'
headers = {
'Cookie': 'SCOUTER=x22rkf7ltsmr7l; __utma=88009422.986813715.1652669493.1652669493.1652669493.1; SCOUTER=z6pj0p85muce99; JSESSIONID=bOnAJtLWSpK1BiCuhWD0ldj1TqW5z6wEcn65oVgtyie841OlbdJs3fEHpUs1QtAV.bWRjX2RvbWFpbi9tZGNvd2FwMS1tZGNhcHAwMQ==; JSESSIONID=C2794518AD56B7119F0DA630B73B05AA.58tomcat2',
'Connection': 'keep-alive',
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,ko;q=0.8',
'host': 'global.krx.co.kr',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36',
}
params = {
'bld': 'COM/stock_isu_info',
'name': 'finderBld',
'_': '1668677450106',
}
# make the GET request that returns the code (stream=True only defers reading the body)
response = requests.get(url, headers=headers, params=params, stream=True)
# response = requests.get(url, params=params, headers=headers)
relay_data = response.text
But when I send a request to the second endpoint with the code in the payload, it returns an empty list, whereas I was expecting the following response for the second request:
[Screenshot: PayloadNeeded]
The code I used to make the second request is the following (I added lots of values to the headers and body in the hope of retrieving the data by mimicking the values used on the web page):
url = 'https://global.krx.co.kr/contents/GLB/99/GLB99000001.jspx'
headers = {
# ':authority': 'global.krx.co.kr',
# ':method': 'POST',
# ':path': '/contents/GLB/99/GLB99000001.jspx',
# ':scheme': 'https',
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,ko;q=0.8',
# 'content-length' is left out: requests computes it from the body
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie': 'SCOUTER=x22rkf7ltsmr7l; __utma=88009422.986813715.1652669493.1652669493.1652669493.1; SCOUTER=z6pj0p85muce99; JSESSIONID=bOnAJtLWSpK1BiCuhWD0ldj1TqW5z6wEcn65oVgtyie841OlbdJs3fEHpUs1QtAV.bWRjX2RvbWFpbi9tZGNvd2FwMS1tZGNhcHAwMQ==; JSESSIONID=C2794518AD56B7119F0DA630B73B05AA.58tomcat2',
'origin': 'https://global.krx.co.kr',
'referer': 'https://global.krx.co.kr/contents/GLB/99/GLB99000001.jsp',
'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'sec-gpc': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36',
'x-requested-with': 'XMLHttpRequest',
}
payload = {
'market_gubun': '0',
'isu_cdnm': 'All',
'isu_cd': '',
'isu_nm': '',
'isu_srt_cd': '',
'sort':'',
'ck_std_ind_cd': '20',
'par_pr': '',
'cpta_scl': '',
'sttl_trm': '',
'lst_stk_vl': '1',
'in_lst_stk_vl': '',
'in_lst_stk_vl2': '',
'cpt': '1',
'in_cpt': '',
'in_cpt2': '',
'nat_tot_amt': '1',
'in_nat_tot_amt': '',
'in_nat_tot_amt2': '',
'pagePath': '/contents/GLB/03/0308/0308010000/GLB0308010000.jsp',
'code': relay_data,
'pageFirstCall': 'Y',
}
# make request with url, headers, body
response = requests.post(url, headers=headers, data=payload)
print(response.text)
And here is the output for the code above:
{"DS1":[]}
Any help would be very much appreciated.
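One thing worth ruling out: the code from GenerateOTP.jspx may be tied to the server-side session (the JSESSIONID cookie), in which case both calls must share the same cookies rather than a hard-coded Cookie header copied from the browser. A minimal sketch, assuming that is the cause (not confirmed in this thread), that runs both requests through one session. `fetch_stock_codes` is a hypothetical helper, and `otp_params`/`form` stand for the `params` and `payload` dicts from the question:

```python
import requests

OTP_URL = "https://global.krx.co.kr/contents/COM/GenerateOTP.jspx"
DATA_URL = "https://global.krx.co.kr/contents/GLB/99/GLB99000001.jspx"


def fetch_stock_codes(session, otp_params: dict, form: dict) -> dict:
    """Request a code, then post it as 'code' using the same session (shared cookies)."""
    # Step 1: the code comes back as plain text.
    otp_response = session.get(OTP_URL, params=otp_params, timeout=10)
    otp_response.raise_for_status()
    code = otp_response.text.strip()

    # Step 2: send the form with the freshly issued code. Content-Length is
    # computed by requests from the body, so it is not set by hand.
    data = dict(form, code=code)
    headers = {
        "Referer": "https://global.krx.co.kr/contents/GLB/99/GLB99000001.jsp",
        "X-Requested-With": "XMLHttpRequest",
    }
    response = session.post(DATA_URL, data=data, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()


# Usage (network access required):
# with requests.Session() as s:
#     s.get("https://global.krx.co.kr/contents/GLB/99/GLB99000001.jsp", timeout=10)  # fresh session cookies
#     result = fetch_stock_codes(s, params, payload)
```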

How can I get the JSON data automatically instead of copying and pasting it manually?

I want to get the JSON data at the target URL (https://xueqiu.com/stock/cata/stocktypelist.json?page=1&size=300).
To get it manually, I open it in the browser and copy and paste it. I want a smarter way, programmatic and automatic. I have tried several ways, and all of them failed.
Method 1: the traditional way, with wget or curl:
wget https://xueqiu.com/stock/cata/stocktypelist.json?page=1&size=300
--2021-02-09 11:55:44-- https://xueqiu.com/stock/cata/stocktypelist.json?page=1
Resolving xueqiu.com (xueqiu.com)... 39.96.249.191
Connecting to xueqiu.com (xueqiu.com)|39.96.249.191|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-02-09 11:55:44 ERROR 403: Forbidden.
Method 2: Scrapy with Selenium:
>>> from selenium import webdriver
>>> browser = webdriver.Chrome()
>>> url="https://xueqiu.com/stock/cata/stocktypelist.json?page=1&size=300"
>>> browser.get(url)
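This is what I get in the browser:
{"error_description":"遇到错误，请刷新页面或者重新登录帐号后再试","error_uri":"/stock/cata/stocktypelist.json","error_code":"400016"}
(The error_description means: "An error occurred; please refresh the page or log in to your account again and retry.")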
Method 3: set up mitmproxy:
mitmweb --listen-host 127.0.0.1 -p 8080
Set the proxy in the browser and open the target URL in the browser.
Error output in the terminal:
Web server listening at http://127.0.0.1:8081/
Opening in existing browser session.
Proxy server listening at http://127.0.0.1:8080
127.0.0.1:41268: clientconnect
127.0.0.1:41270: clientconnect
127.0.0.1:41268: HTTP/2 connection terminated by client: error code: 0, last stream id: 0, additional data: None
Error in the browser:
error_description "遇到错误，请刷新页面或者重新登录帐号后再试" (An error occurred; please refresh the page or log in to your account again and retry.)
error_uri "/stock/cata/stocktypelist.json"
error_code "400016"
The site protects its data very effectively. Is there really no way to get the data automatically?
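A minimal sketch of a more automatic route, assuming (this is not confirmed in the thread) that an anonymous visit to the xueqiu.com home page sets the guest cookies, such as xq_a_token, that the JSON endpoint checks for. One session visits the home page first and then calls the endpoint, so those cookies are sent automatically instead of being copied by hand. `fetch_stock_list` is a hypothetical helper name:

```python
import requests

HOME_URL = "https://xueqiu.com/"
LIST_URL = "https://xueqiu.com/stock/cata/stocktypelist.json"
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
)


def fetch_stock_list(page: int = 1, size: int = 300, session=None) -> dict:
    """Visit the home page first (assumed to set guest cookies), then fetch the JSON."""
    if session is None:
        session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    # Assumption: this response stores xq_a_token and related cookies on the session.
    session.get(HOME_URL, timeout=10).raise_for_status()
    # params are URL-encoded by requests, so '&' needs no shell quoting here.
    response = session.get(LIST_URL, params={"page": page, "size": size}, timeout=10)
    response.raise_for_status()  # an error such as 400016 surfaces as an exception
    return response.json()
```

Separately, the wget log in Method 1 shows only ?page=1 because the unquoted & ends the command in the shell; quoting the URL fixes that, although it would not by itself avoid the 403.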

You could use the requests module:
import requests
cookies = {
'xq_a_token': '176b14b3953a7c8a2ae4e4fae4c848decc03a883',
'xqat': '176b14b3953a7c8a2ae4e4fae4c848decc03a883',
'xq_r_token': '2c9b0faa98159f39fa3f96606a9498edb9ddac60',
'xq_id_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTYxMzQ0MzE3MSwiY3RtIjoxNjEyODQ5MDY2ODI3LCJjaWQiOiJkOWQwbjRBWnVwIn0.VuyNicSjIvVkp9FrCzIlRyx8487XM4HH1C3X9KsFA2FipFiilSifBhux9pMNRyziHHiEifhX-xOgccc8IG1mn8cOylOVy3b-L1YG2T5Hs8MKgx7qm4gnV5Mzm_5_G5BiNtO44aczUcmp0g53dp7-0_Bvw3RlwXzT1DTvCKTV-s_zfBsOPyFTfiqyDUxU-oBRvkz1GpgVJzJL4EmZ8zDE2PBqeW00ueLLC7qPW50WeDCsEFS4ZPAvd2SbX9JPk-lU2WzlcMck2S9iFYmpDwuTeQuPbSeSl6jt5suwTImSgJDIUP9o2TX_Z7nNRDTYxvbP8XlejSt8X0pRDPDd_zpbMQ',
'u': '661612849116563',
'device_id': '24700f9f1986800ab4fcc880530dd0ed',
'Hm_lvt_1db88642e346389874251b5a1eded6e3': '1612849123',
's': 'c111f3y1kn',
'Hm_lpvt_1db88642e346389874251b5a1eded6e3': '1612849252',
}
headers = {
'Connection': 'keep-alive',
'Cache-Control': 'no-cache',
'sec-ch-ua': '"Chromium";v="88", "Google Chrome";v="88", ";Not A Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
'Accept': 'image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'no-cors',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'image',
'Accept-Language': 'en-US,en;q=0.9',
'Pragma': 'no-cache',
'Referer': '',
}
params = (
('page', '1'),
('size', '300'),
)
response = requests.get('https://xueqiu.com/stock/cata/stocktypelist.json', headers=headers, params=params, cookies=cookies)
print(response.status_code)
json_data = response.json()
print(json_data)
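The cookie values above were copied from a browser session and will not stay valid indefinitely. xq_id_token has the three dot-separated parts of a JWT, and a JWT payload may carry a standard exp (expiry) claim as a Unix timestamp. A small standard-library sketch that decodes the payload (without verifying the signature) so you can check whether a copied token has expired; `jwt_payload` and `is_expired` are hypothetical helper names:

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. The signature is NOT verified."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the '=' padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token has an 'exp' claim that lies in the past."""
    exp = jwt_payload(token).get("exp")
    now = time.time() if now is None else now
    return exp is not None and exp <= now
```

Usage: `print(jwt_payload(cookies['xq_id_token']))` shows the claims, and `is_expired(cookies['xq_id_token'])` tells you whether it is time to copy fresh cookies.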

You could use Scrapy:
import json
import scrapy


class StockSpider(scrapy.Spider):
    name = 'stock_spider'
    start_urls = ['https://xueqiu.com/stock/cata/stocktypelist.json?page=1&size=300']
    custom_settings = {
        'DEFAULT_REQUEST_HEADERS': {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:85.0) Gecko/20100101 Firefox/85.0',
            'Host': 'xueqiu.com',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US',
            'Accept-Encoding': 'gzip,deflate,br',
            'Connection': 'keep-alive',
            'Cache-Control': 'no-cache',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'Pragma': 'no-cache',
            'Referer': '',
        },
        'ROBOTSTXT_OBEY': False
    }
    # pass HTTP 400 responses to parse() instead of dropping them
    handle_httpstatus_list = [400]

    def parse(self, response):
        json_result = json.loads(response.body)
        yield json_result
Run the spider with scrapy crawl stock_spider inside a Scrapy project, or with scrapy runspider <file>.py for a standalone file.

Decode weirdly formatted requests response data in Python

I'm scraping a website through its XHR calls, but I'm having trouble understanding what they return. Here is my code so you can replicate it:
import requests

url = "https://api.aiscore.com/v1/web/api/matches?lang=4&sport_id=2&date=20201112&tz=01:00"
headers={'authority': 'api.aiscore.com',
'sec-ch-ua': '"Chromium";v="86", ""Not\\A;Brand";v="99", "Google Chrome";v="86"',
'accept': 'application/json, text/plain, */*',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36',
'origin': 'https://www.aiscore.com',
'sec-fetch-site': 'same-site',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.aiscore.com/',
'accept-language': 'es-ES,es;q=0.9,en;q=0.8'}
response=requests.get(url=url,headers=headers)
>>> response.text
'zø÷\x01\n¬\x01\n\x0fmo07dzsdgcxknxy\x10\x02"6\x08\x1c\x12\x08Lituania*(d9051e0b77f8bb5521389618e70e2ada.png!w80*\x17Lietuvos krepsinio lygaB%990525c8080452649a7b68501bf53f57.jpegPàÑ\x04º\x01\x17lietuvos-krepsinio-lygaØ\x01\x04\nn\n\x0f4ndqmrsxyja5kve\x10\x02"5\x08;\x12\x07Uruguay*(75497a22409db78dcc52c291e078bc10.png!w80*\x0eUruguay Leagueº\x01\x0euruguay-leagueØ\x01\x04\n°\x01\n\x0fw34kgpsnpf1ko92\x10\x02"7\x084\x12\tArgentina*(3536be57ce0713954e454ae6c53ec023.png!w80*\x1bLiga Nacional de BosquetbolB$1dae06c27aea581100fa54e7c50d892a.gifº\x01\x1bliga-nacional-de-bosquetbolØ\x01\x04\nn\n\x0f8vmqy6sjvlbek9r\x10\x02"7\x08>\x12\tVenezuela*(e95294b730f61c8175550ec244bfcb50.png!w80*\rVenezuela LPBº\x01\rvenezuela-lpbØ\x01\x04\nm\n\x0f9oj7x6srxzfe7g3\x10\x02"6\x08<\x12\x08Colombia*(ef3388cc5659bccb742fb8af762f1bfd.png!w80*\rColombia Ligaº\x01\rcolombia-ligaØ\x01\x04\nf\n\x0f2jr7o9s29gh170e\x10\x02"3\x08�N\x12\rOtros paÃ\xadses*\x1faiscore_default_country.png!w80*\x0bVietnam VBAº\x01\x0bvietnam-vbaØ\x01\x04\nZ\n\x0f9oj7x6s5gse7g3y\x10\x02"\x1b\x08…\x01\x12\x06Kosovo*\x0ekosovo.png!w80*\x11Kosovo Division Iº\x01\x11kosovo-division-iØ\x01\x04\nž\x01\n\x0fzjek3ps65hvqo29\x10\x02"5\x08 \x12\x07Croacia*(560d4c6ff431c86546f3fcec72c748c7.png!w80*\x13Croatian A-1 LeagueB$f0e275b18af005a1e8c931582f7bb3bd.gifº\x01\x13croatian-a-1-leagueØ\x01\x04\ni\n\x0fw34kgpso8b1ko92\x10\x02"6\x08\x17\x12\x08RumanÃ\xada*(0c7d5ae44b2a0be9ebd7d6b9f7d60f20.png!w80*\x0bRomania CUPº\x01\x0bromania-cupØ\x01\x04\n»\x01\n\x0f2ezk90s60c2kn51\x10\x02",\x08‘N\x12\rInternacional*\x181552909490097161.png!w80*&Balkan International Basketball LeagueB$7e2cccee37653bb9ffb7a37152dd1091.jpgº\x01&balkan-international-basketball-leagueØ\x01\x04\n®\x01\n\x0f9oj7x6srpae7g3y\x10\x02"7\x08\r\x12\tDinamarca*(424214945ba5615eca039bfe5d731c09.png!w80*\x18Danish Basketball 
LeagueB$533c6b6c2fc9e2fb51cb315dc279eaa8.pngPð¢\x04º\x01\x18danish-basketball-leagueØ\x01\x04\nº\x01\n\x0f0xo17p8sxu3kjw5\x10\x02"3\x08E\x12\x05China*(38aed2a85bd723ca6749216d37b9a989.png!w80*!National Basketball League(China)B$89c4ae7c1e21e4219149a70f6faab848.jpgPØ\xad\x03º\x01\x1fnational-basketball-leaguechinaØ\x01\x04\n‰\x01\n\x0fn527rjspdyb1kev\x10\x02"6\x08\x1f\x12\x08TurquÃ\xada*(221cdfb73049678e244380b45872cbb2.png!w80*\x1bTurkey Federation Cup Womenº\x01\x1bturkey-federation-cup-womenØ\x01\x04\n„\x01\n\x0fr8lk2yso9t0736d\x10\x02"5\x08\x14\x12\x07Ucrania*(f01fc92b23faa973f3492a23d5a705c5.png!w80*\x04UBSLB$115e3dd59ec30a24a18a22e4a285ce6f.jpgPð¢\x04º\x01\x04ubslØ\x01\x04\n�\x01\n\x0fyzrknxslwujqle4\x10\x02"4\x08#\x12\x06Chipre*(ea2ba3f8011e19e3101ce65fdcefbcc4.png!w80*\x13Apollon Limassol BCB$ef154681b0256ec0693beb6c7116a71d.pngº\x01\x13apollon-limassol-bcØ\x01\x04\n²\x01\n\x0fyw6975lsjh2k23e\x10\x02";\x08G\x12\rCorea del Sur*(e0152977a41868ebc941f4e85e90a31b.png!w80*\x18Korean Basketball LeagueB$4c0624a972fbbb2390e46fac63e09430.pngP�Å\x03º\x01\x18korean-basketball-leagueØ\x01\x04\nÆ\x01\n\x0f5xvkjvs93frk938\x10\x02"5\x08\x0e\x12\x07Austria*(9891739094756d2605946c867b32ad28.png!w80*%Osterreichische Basketball BundesligaB$6e340c3414dcc2a34ea78667a5ca686f.jpgPð¢\x04º\x01%osterreichische-basketball-bundesligaØ\x01\x04\n¥\x01\n\x0f5xvkjvs3gtrk938\x10\x02"6\x08\x04\x12\x08Alemania*(d8b00929dec65d422303256336ada04f.png!w80*\x16Germany Basketball CupB$eb44b5509fa3ae45fd315b1858a9f811.jpgº\x01\x16germany-basketball-cupØ\x01\x04\n”\x01\n\x0f59gkl6symwt3kxd\x10\x02"%\x08’N\x12\x06Europa*\x181552909490161265.png!w80*\x16Adriatic Basketball D2B$f4c95209caa7f3bcb3c95011d2236895.pngº\x01\x16adriatic-basketball-d2Ø\x01\x04\nÓ\x01\n\x0fw6975ls3js2k23e\x10\x02"7\x08H\x12\tAustralia*(4442e4af0916f53a07fb8ca9a49b98ed.png!w80*/Womenâ€™s National Basketball 
League(Australia)B$13b83bf4ccd68cb087b6a0527dc1f481.pngº\x01*womens-national-basketball-leagueaustraliaØ\x01\x04\n\x7f\n\x0fmo07dzsonaxknxy\x10\x02"%\x08’N\x12\x06Europa*\x181552909490161265.png!w80*\nEuroleagueB$c14acac79745b832757d690233ddade0.jpg°\x01\x01º\x01\neuroleagueØ\x01\x04\nÃ\x01\n\x0fjmo07dzsgtxknxy\x10\x02"5\x08\x03\x12\x07EspaÃ±a*(907eba32d950bfab68227fd7ea22999b.png!w80*"Asociacion de Clubes de BaloncestoB$d0468eba49d311f847c7f592a647abc4.pngP°á\x04°\x01\x01º\x01"asociacion-de-clubes-de-baloncestoØ\x01\x04\n�\x01\n\x0f59gkl6s6pb3kxdv\x10\x02",\x08‘N\x12\rInternacional*\x181552909490097161.png!w80*\x0fClub FriendshipB$255ca39a3d57a8ba68c42e11f9212862.pngº\x01\x0fclub-friendshipØ\x01\x04\n\x7f\n\x0f1edq0esmg2hykxg\x10\x02"(\x08“N\x12\tAmÃ©ricas*\x181552909490206875.png!w80*\x1dPuerto Rico Superior Nacionalº\x01\x1dpuerto-rico-superior-nacionalØ\x01\x04\n¹\x01\n\x0fr1edq0es5uykxgo\x10\x02"3\x08E\x12\x05China*(38aed2a85bd723ca6749216d37b9a989.png!w80*\x1eChinese Basketball AssociationB$4bcdfa94d226fd5d7c740b463c182aa0.jpgPøÌ\x03¨\x01\x01º\x01\x1echinese-basketball-associationØ\x01\x04\n¢\x01\n\x0f0m2q19s2xhpk6xw\x10\x02"4\x085\x12\x06Brasil*(42537f0fb56e31e20ab9c2305752087d.png!w80*\x14Novo Basquete BrasilB$d5e2224278126aeca3d057e45b685838.jpg°\x01\x01º\x01\x14novo-basquete-brasilØ\x01\x04\n~\n\x0fyzrknxs43ujqle4\x10\x02"3\x08\x12\x12\x05Rusia*(5feb168ca8fb495dcc89b1208cdeb919.png!w80*\x17VTB United Youth Leagueº\x01\x17vtb-united-youth-leagueØ\x01\x04\nÀ\x01\n\x0fw34kgpsdwi1ko92\x10\x02"%\x08’N\x12\x06Europa*\x181552909490161265.png!w80*-European Basketball Championship Qualifier(W)B$6c4e1a58ab3c7d46f07f896acbda1bbe.jpgº\x01+european-basketball-championship-qualifierwØ\x01\x04\n�\x01\n\x0fyzrknxs9whjqle4\x10\x02"4\x08F\x12\x06JapÃ³n*(53a577bb3bc587b0c28ab808390f1c9b.png!w80*\tB2 
LeagueB$692b6587fa9458aebb563b2e1c8c08fb.pngPØ\xad\x03º\x01\tb2-leagueØ\x01\x04\n·\x01\n\x0fw2j374wsou4ko6d\x10\x02"4\x08%\x12\x06Israel*(5a548c2f5875f10bf5614b7c258876cf.png!w80*\x1eIsrael Basketball Super LeagueB$3d858fe19e03080fe8d986250b8811f9.pngPàÑ\x04º\x01\x1eisrael-basketball-super-leagueØ\x01\x04\x12Ì\x01\n\x0fvmqy6sw50g0sgk9\x10\x02"\x11\n\x0fw34kgpsnpf1ko92*\x002\x11\n\x0feg676jsjyrtpkry:\x11\n\x0f5wv784slrlhnqrjB\x08\x10\x01\x18¥º²ý\x05b\x05,,4\x16\x00j\x05D60(\x00xˆ„²ý\x05€\x01\n¨\x01D°\x01\x0fÀ\x01\x03Ê\x01\x00Ò\x01\x00ò\x01K:I\n\x16\n\x031.1\n\x05-24.5\n\x050.666\n\x011\n\x15\n\x0418.0\n\x030.0\n\x051.004\n\x010\n\x16\n\x050.869\n\x05188.5\n\x030.8\n\x010\n\x00\x12Ë\x01\n\x0fedq0esoz0zdsekx\x10\x02"\x11\n\x0f9oj7x6srxzfe7g3*\x002\x11\n\x0fg676jsdrmdcpkry:\x11\n\x0fvmqy6sj281i4k9rB\x08\x10\x01\x18‹Á²ý\x05b\x05&*$0\x00j\x05\x1c((6\x00x�’²ý\x05€\x01\n¨\x01#°\x01\x12À\x01\x03Ê\x01\x00Ò\x01\x00ò\x01J:H\n\x15\n\x040.74\n\x031.5\n\x050.952\n\x011\n\x13\n\x031.4\n\x030.0\n\x042.75\n\x011\n\x18\n\x050.769\n\x05158.5\n\x050.909\n\x010\n\x00\x12Ç\x01\n\x0fjr7o9s14dewhg70\x10\x02"\x11\n\x0f1edq0esmg2hykxg*\x002\x11\n\x0fezk90snezniwkn5:\x11\n\x0fvrqw9solz5id7n2B\x02\x10\x01b\x05D(&,\x00j\x05"\x1e\x1e:\x00x�’²ý\x05€\x01\n¨\x01#°\x01\x06À\x01\x03Ê\x01\x00Ò\x01\x00ò\x01L:J\n\x15\n\x050.869\n\x0419.5\n\x030.8\n\x011\n\x15\n\x051.005\n\x030.0\n\x0417.0\n\x010\n\x18\n\x050.833\n\x05160.5\n\x050.833\n\x010\n\x00\x12Î\x01\n\x0fjek3psgr01pb9qo\x10\x02"\x11\n\x0f0xo17p8sxu3kjw5*\x002\x11\n\x0fzjek3pszw9tdqo2:\x11\n\x0f8lk2ysoge3c3736B\x08\x10\x01\x18î½³ý\x05b\x05(\x128&\x00j\x05B<2.\x00xØ�³ý\x05€\x01\n¨\x01#°\x01%À\x01\x03Ê\x01\x00Ò\x01\x00ò\x01M:I\n\x16\n\x031.3\n\x05-35.5\n\x050.588\n\x011\n\x13\n\x0417.0\n\x030.0\n\x031.0\n\x010\n\x18\n\x050.769\n\x05187.5\n\x050.909\n\x010\n\x00H\x01\x12Ç\x01\n\x0fndkzysj0o50bx73\x10\x02"\x11\n\x0fyzrknxs43ujqle4*\x002\x11\n\x0fo17p8s00leb2kjw:\x11\n\x0f8vrqw9sx4osd7n2B\x02\x10\x01b\x05$\x1c00\x00j\x05(\x14$\x18\x00xðº³ý\x05€\x01\n¨\x0
1#°\x01FÀ\x01\x03Ê\x01\x00Ò\x01\x00ò\x01L:J\n\x17\n\x050.833\n\x0420.5\n\x050.833\n\x011\n\x15\n\x051.004\n\x030.0\n\x0418.0\n\x010\n\x16\n\x040.83\n\x05147.5\n\x040.83\n\x010\n\x00\x12ý\x01\n\x0f9gkl6srd0jdamkx\x10\x02"\x11\n\x0fw6975ls3js2k23e*\x002\x11\n\x0f2j374wsjzrarko6:\x11\n\x0f0m2q19s3votmk6xB\x0c\x08\x01\x10\x01\x18º„´ý\x05 qb\x05"4* \x00j\x05\x1e "\x1e\x00x€×³ý\x05€\x01\x08¨\x01Ä\x01°\x01Ÿ,À\x01\x02Ê\x01\x00Ò\x01\x00ò\x01v:G\n\x15\n\x050.952\n\x0417.5\n\x030.8\n\x010\n\x15\n\x051.005\n\x030.0\n\x0421.0\n\x011\n\x15\n\x040.85\n\x05150.5\n\x030.9\n\x010\n\x00B+\x12)\n\'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e #%+.0126y…\x01±\x01Ü\x01\x12x\n\x0fxvkjvsnyjdpt8k9\x10\x02"\x11\n\x0f0xo17p8sxu3kjw5*\x002\x11\n\x0fndkzyszv4gie73z:\x11\n\x0fr8lk2ys258b3736B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xˆå³ý\x05€\x01\x0f°\x01\x04À\x01\x08Ê\x01\x00Ò\x01\x00ò\x01\x00\x12ý\x01\n\x0fo07dzsvp9g8fmkn\x10\x02"\x11\n\x0f0xo17p8sxu3kjw5*\x002\x11\n\x0fr8lk2ys24dt3736:\x11\n\x0f0ndkzyspp3be73zB\x0b\x10\x01\x18¹„´ý\x05 ä\x02b\x05$*\x14\x00\x00j\x05&(\x18\x00\x00xˆå³ý\x05€\x01\x06¨\x01À\x01°\x01¥\x1cÀ\x01\x02Ê\x01\x00Ò\x01\x00ò\x01w:F\n\x14\n\x050.869\n\x036.5\n\x030.8\n\x010\n\x14\n\x051.285\n\x030.0\n\x033.5\n\x010\n\x16\n\x050.869\n\x05173.5\n\x030.8\n\x010\n\x00B+\x12)\n\'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e #%+.0126y…\x01±\x01Ü\x01H\x01\x12ñ\x01\n\x0fo07dzsvp8lzumkn\x10\x02"\x11\n\x0fyw6975lsjh2k23e*\x002\x11\n\x0f1edq0es14vs4kxg:\x11\n\x0fw6975lspvoank23B\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x\xa0�´ý\x05€\x01\x01¨\x01Ä\x01°\x01·\x03À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01v:G\n\x15\n\x040.91\n\x04-3.5\n\x040.91\n\x010\n\x14\n\x042.55\n\x030.0\n\x041.57\n\x010\n\x16\n\x040.91\n\x05166.5\n\x040.91\n\x010\n\x00B+\x12)\n\'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e 
#%+.0126y…\x01±\x01Ü\x01\x12Â\x01\n\x0f527rjslze4ra4ke\x10\x02"\x11\n\x0fw6975ls3js2k23e*\x002\x11\n\x0f59gkl6s2evt1kxd:\x11\n\x0f9gkl6syo1nf1kxdB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x\xa0�´ý\x05€\x01\x01¨\x01#°\x01mÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01I:G\n\x15\n\x040.95\n\x05-16.5\n\x030.8\n\x010\n\x14\n\x0410.0\n\x030.0\n\x041.03\n\x010\n\x16\n\x040.87\n\x05152.5\n\x040.87\n\x010\n\x00\x12Â\x01\n\x0fedq0esozpglhekx\x10\x02"\x11\n\x0fyzrknxs9whjqle4*\x002\x11\n\x0f63kvlsm4l9sp7ez:\x11\n\x0f0ndkzyswrofe73zB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x\xa0�´ý\x05€\x01\x01¨\x01#°\x01fÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01I:G\n\x16\n\x040.83\n\x05-14.5\n\x040.83\n\x010\n\x13\n\x037.5\n\x030.0\n\x041.07\n\x010\n\x16\n\x040.83\n\x05155.5\n\x040.83\n\x010\n\x00\x12Â\x01\n\x0fedq0esoz063sekx\x10\x02"\x11\n\x0fn527rjspdyb1kev*\x002\x11\n\x0f63kvlsjoz0bp7ez:\x11\n\x0fvmqy6sjn90b4k9rB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x\xa0�´ý\x05€\x01\x01¨\x01#°\x01\x1eÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01I:G\n\x15\n\x040.83\n\x0412.5\n\x040.83\n\x010\n\x14\n\x041.11\n\x030.0\n\x045.75\n\x010\n\x16\n\x040.83\n\x05140.5\n\x040.83\n\x010\n\x00\x12Á\x01\n\x0f527rjslzyz3i4ke\x10\x02"\x11\n\x0f59gkl6s6pb3kxdv*\x002\x11\n\x0fwv784spvnwhnqrj:\x11\n\x0fndqmrszd24tgkveB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x°«´ý\x05€\x01\x01¨\x01#°\x01\'À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01H:F\n\x15\n\x040.83\n\x0414.5\n\x040.83\n\x010\n\x13\n\x041.07\n\x030.0\n\x037.5\n\x010\n\x16\n\x040.83\n\x05143.5\n\x040.83\n\x010\n\x00\x12x\n\x0f9gkl6srdogehmkx\x10\x02"\x11\n\x0fyzrknxs43ujqle4*\x002\x11\n\x0foj7x6sw1erir7g3:\x11\n\x0fzjek3ps2vlsdqo2B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x°«´ý\x05€\x01\x01°\x01\x12À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01\x00\x12Ü\x01\n\x0fl6kers6ndgnsvq5\x10\x02"\x11\n\x0f0xo17p8sxu3kjw5*\x002\x11\n\x0fr8lk2ys2lga3736:\x11\n\x0fg63kvlsyrwup7ezB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x¸¹´ý\x05€\x01\x01¨\x01À\x01°\x01.À\x01\x
01Ê\x01\x00Ò\x01\x00ò\x01b:3\n\x15\n\x040.83\n\x0444.5\n\x040.83\n\x010\n\x00\n\x16\n\x040.83\n\x05203.5\n\x040.83\n\x010\n\x00B+\x12)\n\'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e #%+.0126y…\x01±\x01Ü\x01\x12©\x01\n\x0fm2q19swxl9laek6\x10\x02"\x11\n\x0f0xo17p8sxu3kjw5*\x002\x11\n\x0fxo17p8s26xt2kjw:\x11\n\x0f9oj7x6s41pcr7g3B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x¸¹´ý\x05€\x01\x01¨\x01€\x01°\x01 À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01-B+\x12)\n\'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e #%+.0126y…\x01±\x01Ü\x01\x12ó\x01\n\x0fjr7o9s1806lfg70\x10\x02"\x11\n\x0fr1edq0es5uykxgo*\x030Ÿ\x082\x11\n\x0fxo17p8sd42s2kjw:\x11\n\x0f0ndkzys0dzse73zB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xä»´ý\x05€\x01\x01¨\x01Ä\x01°\x01ƒ\x02À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01u:E\n\x14\n\x040.87\n\x033.5\n\x040.87\n\x010\n\x13\n\x041.55\n\x030.0\n\x032.5\n\x010\n\x16\n\x040.91\n\x05193.5\n\x040.83\n\x010\n\x00B,\x12*\n(E\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e #%+.0126y…\x01±\x01Ü\x01\x12°\x01\n\x0foj7x6s0ve85c47g\x10\x02"\x11\n\x0fw34kgpsdwi1ko92*\x002\x11\n\x0fn527rjszgmi8kev:\x11\n\x0fn527rjszlgc8kevB\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xÀÇ´ý\x05€\x01\x01¨\x01#°\x01\x0fÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x015:3\n\x15\n\x040.83\n\x0417.5\n\x040.83\n\x010\n\x00\n\x16\n\x040.83\n\x05127.5\n\x040.83\n\x010\n\x00\x12ó\x01\n\x0fo17p8s9mx00bykj\x10\x02"\x11\n\x0fr1edq0es5uykxgo*\x030Ÿ\x082\x11\n\x0fw6975ls4x8ink23:\x11\n\x0f5wv784sryjtnqrjB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xÀÇ´ý\x05€\x01\x01¨\x01Ä\x01°\x01„\x01À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01u:E\n\x14\n\x040.91\n\x031.5\n\x040.83\n\x010\n\x13\n\x031.8\n\x030.0\n\x041.95\n\x010\n\x16\n\x040.87\n\x05210.5\n\x040.87\n\x010\n\x00B,\x12*\n(E\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x13\x15\x16\x19\x1d\x1e 
#%+.0126y…\x01±\x01Ü\x01\x12{\n\x0f527rjslzedea4ke\x10\x02"\x11\n\x0f2jr7o9s29gh170e*\x002\x11\n\x0f6975ls30n2unk23:\x11\n\x0fezk90sdgj5fwkn5B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xÀÇ´ý\x05€\x01\x01¨\x01#°\x01\x14À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01\x00\x12x\n\x0f6975lsz45n6hgk2\x10\x02"\x11\n\x0f59gkl6symwt3kxd*\x002\x11\n\x0fvmqy6sj5ows4k9r:\x11\n\x0fzjek3psrn9hdqo2B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xÀÇ´ý\x05€\x01\x01°\x01\nÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01\x00\x12À\x01\n\x0f34kgpse6p9rieko\x10\x02"\x11\n\x0fn527rjspdyb1kev*\x002\x11\n\x0fndkzys8o5wue73z:\x11\n\x0fjek3psjdx3tdqo2B\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xÄÎ´ý\x05€\x01\x01¨\x01#°\x01\x0eÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01G:E\n\x15\n\x040.77\n\x0412.5\n\x040.91\n\x010\n\x12\n\x031.1\n\x030.0\n\x036.5\n\x010\n\x16\n\x040.83\n\x05152.5\n\x040.83\n\x010\n\x00\x12Â\x01\n\x0fedq0esoz5x2hekx\x10\x02"\x11\n\x0fw34kgpsnpf1ko92*\x002\x11\n\x0fw6975lsmpwtnk23:\x11\n\x0f527rjsel0li8kevB\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xàÿ´ý\x05€\x01\x01¨\x01#°\x01\x0eÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01G:E\n\x14\n\x040.83\n\x035.0\n\x040.83\n\x010\n\x13\n\x041.45\n\x030.0\n\x032.6\n\x010\n\x16\n\x040.83\n\x05155.5\n\x040.83\n\x010\n\x00\x12¯\x01\n\x0fm2q19sw611liek6\x10\x02"\x11\n\x0fw34kgpsdwi1ko92*\x002\x11\n\x0f5wv784sr80bnqrj:\x11\n\x0fndqmrszxmgtgkveB\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xè�µý\x05€\x01\x01¨\x01#°\x01\nÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x014:2\n\x14\n\x040.87\n\x0445.5\n\x030.8\n\x010\n\x00\n\x16\n\x040.83\n\x05142.5\n\x040.83\n\x010\n\x00\x12Á\x01\n\x0fwv784smr6o4coqr\x10\x02"\x11\n\x0fn527rjspdyb1kev*\x002\x11\n\x0fndkzys8j55se73z:\x11\n\x0fjr7o9sl1e0i370eB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00xè�µý\x05€\x01\x01¨\x01#°\x01\x01¨\x01#°\x01\x0fÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01J:H\n\x16\n\x040.83\n\x05-14.5\n\x040.83\n\x010\n\x14\n\x047.75\n\x030.0\n\x041.07\n\x010\n\x16\n\x040.83\n\x05160.5\n\x
040.83\n\x010\n\x00\x12Ã\x01\n\x0fndkzysj085rux73\x10\x02"\x11\n\x0fw34kgpsnpf1ko92*\x002\x11\n\x0fxo17p8sgvdb2kjw:\x11\n\x0fyzrknxs60phnqleB\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x�Ôµý\x05€\x01\x01¨\x01#°\x01\rÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01H:F\n\x15\n\x040.83\n\x04-5.5\n\x040.83\n\x010\n\x13\n\x032.7\n\x030.0\n\x041.41\n\x010\n\x16\n\x040.83\n\x05156.5\n\x040.83\n\x010\n\x00\x12{\n\x0fm2q19sw3g94cek6\x10\x02"\x11\n\x0fr8lk2yso9t0736d*\x002\x11\n\x0f2ezk90s1zwbwkn5:\x11\n\x0f4ndqmrsvrrbgkveB\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x�Ôµý\x05€\x01\x01¨\x01#°\x01\tÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01\x00\x12Á\x01\n\x0fndqmrs8vogjsrkv\x10\x02"\x11\n\x0fw2j374wsou4ko6d*\x002\x11\n\x0fmo07dzsplnt9knx:\x11\n\x0f2ezk90s80mswkn5B\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x�Ôµý\x05€\x01\x01¨\x01#°\x01\x12À\x01\x01Ê\x01\x00Ò\x01\x00ò\x01H:F\n\x15\n\x040.83\n\x04-3.5\n\x040.83\n\x010\n\x13\n\x032.3\n\x030.0\n\x041.55\n\x010\n\x16\n\x040.83\n\x05161.5\n\x040.83\n\x010\n\x00\x12Â\x01\n\x0fg676jsr8wlrsokr\x10\x02"\x11\n\x0fyzrknxslwujqle4*\x002\x11\n\x0feg676js8mphpkry:\x11\n\x0fo17p8s0yzja2kjwB\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x�Ôµý\x05€\x01\x01¨\x01#°\x01\nÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01I:G\n\x15\n\x040.83\n\x0412.5\n\x040.83\n\x010\n\x14\n\x041.11\n\x030.0\n\x045.75\n\x010\n\x16\n\x040.83\n\x05147.5\n\x040.83\n\x010\n\x00\x12{\n\x0foj7x6s0l6z2i47g\x10\x02"\x11\n\x0fyzrknxslwujqle4*\x002\x11\n\x0fvmqy6sjrrxc4k9r:\x11\n\x0fw6975ls92esnk23B\x02\x10\x01b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00\x00\x00x�Ôµý\x05€\x01\x01¨\x01#°\x01\nÀ\x01\x01Ê\x01\x00Ò\x01\x00ò\x01\x00\x12À\x01\n\x0fvrqw9s3jnzra47n\x10\x02"\x11\n\x0fw2j374wsou4ko6d*\x002\x11\n\x0f2j374wsg6gbrko6:\x11\n\x0fw34kgpsjddb8ko9B\x00b\x05\x00\x00\x00\x00\x00j\x05\x00\x00\x00(bf97e6db8ebd32e6e2d72d534aae307e.jpg!w80š\x01\rwuhan-dangdai\x1a`\n\x0f1edq0eszz0f4kxg\x10\x022\x0fAlbania 
Woman\'s:(828790716b6f4f33b06cd3a9e3c17374.jpg!w80š\x01\x0falbania-woman\'s\x1aD\n\x0fjek3psjdx3tdqo2\x10\x022\x16Sakarya Yukselis Womenš\x01\x16sakarya-yukselis-women\x1an\n\x0foj7x6sr6l2cr7g3\x10\x022\x16Edremit Belediye Women:(f777a5ea9b704e38a33945809ae34b53.png!w80š\x01\x16edremit-belediye-women\x1a^\n\x0f8vrqw9s5p3ad7n2\x10\x022\x0eUCAM Murcia CB:(e3a3ec04a37490b55dfea4d2c2157e0a.png!w80š\x01\x0eucam-murcia-cb'
I've tried setting response.encoding, and I've tried json.loads(response.text).
I know the data is roughly organized with the competitions at the top as an index, all the matches in the middle, and another index with the teams' names at the bottom, but to actually parse it I would like something cleaner to work with.
If it is of any use, response.headers is:
{'Date': 'Thu, 12 Nov 2020 09:36:59 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '31740', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d7cae3684159a856129fed3ae7f419c231605173818; expires=Sat, 12-Dec-20 09:36:58 GMT; path=/; domain=.aiscore.com; HttpOnly; SameSite=Lax; Secure, aiclient=7dn5ocn5lq50ejo; Path=/; Domain=.aiscore.com; Max-Age=5184000', 'Access-Control-Allow-Origin': 'https://www.aiscore.com', 'Access-Control-Allow-Credentials': 'true', 'cache-control': 'no-transform', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '065d69cdda0000111dpo898000000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '5f0f458fred9111d-MAD'}

The header Content-Type: application/octet-stream means that the body is binary data:
[https://stackoverflow.com/questions/20508788/do-i-need-content-type-application-octet-stream-for-file-download]
However, this response does not look like a binary file, so something is not right with it.
Please tell me if this helped. :)
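The dump above looks like Protocol Buffers wire format rather than broken text. For example, `\n\x0fmo07dzsdgcxknxy\x10\x02` reads as field 1 (tag byte 0x0a, length-delimited) holding a 15-byte string, followed by field 2 (tag byte 0x10, varint) with the value 2. Without the site's .proto schema the field names are unknown, but a schema-less decoder can at least split the bytes into numbered fields. A minimal sketch under that assumption, with hypothetical helper names; run it on response.content (the raw bytes), not response.text, which has already been decoded as text:

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def decode_fields(buf: bytes) -> list[tuple[int, int, int | bytes]]:
    """Split one level of protobuf wire format into (field_number, wire_type, value).

    Length-delimited values come back as raw bytes; they may be UTF-8 strings
    or nested messages (call decode_fields on them again to look inside).
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 2, 5):
            if wire_type == 2:  # length-delimited
                size, pos = read_varint(buf, pos)
            else:  # fixed 64-bit (1) or 32-bit (5)
                size = 8 if wire_type == 1 else 4
            if pos + size > len(buf):
                raise ValueError("truncated field")
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

Usage: `decode_fields(response.content)`. If the guess is right, recursing into the length-delimited values should expose the competition, match and team records described in the question.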

Why am I being detected as a robot when I am replicating the exact request a browser makes?

This is the website: "https://www.interlinecenter.com/". It makes a request to "http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y" to load HTML content into an iframe. I am making exactly the same request, with the same headers the browser sends, but I am not getting the same content.
Here is the code I am using:
import requests
from lxml import html

url = 'http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
# headers copied from the browser's request for the iframe
header = {
'Host': 'cs.cruisebase.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.interlinecenter.com/',
'Connection': 'keep-alive',
'Cookie': 'visid_incap_312345=yt2dprI6SuGoy44xQsnF36dOwV0AAAAAQUIPAAAAAAAqm0pG5WAWOGjtyY8GOrLv; __utma=15704100.1052110012.1572947038.1574192877.1575447075.6; __utmz=15704100.1575447075.6.6.utmcsr=interlinecenter.com|utmccn=(referral)|utmcmd=referral|utmcct=/; ASP.NET_SessionId=pzd3a0l5kso41hhbqf3jiqlg; nlbi_312345=/7dzbSeGvDjg2/oY/eQfhwAAAACv806Zf3m7TsjHAou/y177; incap_ses_1219_312345=tMxeGkIPugj4d1gaasLqECHE5l0AAAAAg1IvjaYhEfuSIYLXtc2f/w==; LastVisitedClient=605; AWSELB=85D5DF550634E967F245F317B00A8C32EB84DA2B6B927E6D5CCB7C26C3821788BFC50D95449A1BA0B0AFD152140A70F5EA06CBB8492B21E10EC083351D7EBC4C68F086862A; incap_ses_500_312345=6PJ9FxwJ3gh0vta6kVvwBthz510AAAAAvUZPdshu8GVWM2sbkoUXmg==; __utmb=15704100.2.10.1575447075; __utmc=15704100; __utmt_tt=1',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0'
}
response = requests.get(url, timeout=10, headers=header)
byte_data = response.content
source_code = html.fromstring(byte_data)  # parse the returned HTML with lxml
print(response)
print(byte_data)
This is the response I am getting:
<Response [200]>
<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=9&xinfo=10-99927380-0%200NNN%20RT%281575456049298%202%29%20q%280%20-1%20-1%200%29%20r%281%20-1%29%20B12%284%2c316%2c0%29%20U2&incident_id=500000240101726326-477561257670738314&edet=12&cinfo=04000000&rpinfo=0" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 500000240101726326-477561257670738314</iframe></body></html>
I need to extract/scrape the data at "https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y".
Note: I don't want to use Selenium WebDriver to get the data. Any help will be much appreciated, thanks!
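Note that the response above has status 200 even though the request was blocked, so checking `response.status_code` alone will not reveal the problem. A minimal sketch that flags the block page, assuming the two marker strings seen in that body (`_Incapsula_Resource` and `Incapsula incident ID`) keep appearing on it; the helper name `is_incapsula_block` is hypothetical:

```python
# Marker strings copied from the block page shown above (assumed to be stable).
INCAPSULA_MARKERS = ("_Incapsula_Resource", "Incapsula incident ID")


def is_incapsula_block(body: str) -> bool:
    """Return True if an HTML body looks like an Incapsula block page."""
    return any(marker in body for marker in INCAPSULA_MARKERS)


blocked = ('<iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=9">'
           'Request unsuccessful. Incapsula incident ID: '
           '500000240101726326-477561257670738314</iframe>')
print(is_incapsula_block(blocked))                     # True
print(is_incapsula_block("<table>hot deals</table>"))  # False
```

Used as `if is_incapsula_block(response.text): ...`, this lets a scraper stop or retry instead of parsing the block page as if it were the real content.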

Did you try getting the headers by loading the target URL directly?
I sent a GET request to https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y with the following headers, and I was able to get the complete response.
import requests

url = 'https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Cookie':'ENTER COOKIES',  # placeholder: paste your own browser cookies here
'DNT':'1',
'Host':'cs.cruisebase.com',
'Pragma':'no-cache',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
response = requests.get(url, headers=headers)
I have left a placeholder in the Cookie field. You will have to enter your own cookies, otherwise the page won't load. You can copy them from the request headers in Chrome's developer tools (Network tab).
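If you copy the whole Cookie header from Chrome, you can also split it into a dict and pass it through requests' `cookies=` argument instead of hard-coding the header. A minimal sketch; the helper `cookie_header_to_dict` is hypothetical, and the sample values are taken from the question's Cookie header:

```python
def cookie_header_to_dict(cookie_header: str) -> dict:
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        # partition() splits on the first '=', so any '=' inside the value is kept
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


copied = ("ASP.NET_SessionId=pzd3a0l5kso41hhbqf3jiqlg; LastVisitedClient=605; "
          "incap_ses_500_312345=6PJ9FxwJ3gh0vta6kVvwBthz510AAAAAvUZPdshu8GVWM2sbkoUXmg==")
cookies = cookie_header_to_dict(copied)
print(cookies["LastVisitedClient"])  # 605
```

You would then call `requests.get(url, headers=headers, cookies=cookies)`, after dropping the hard-coded 'Cookie' entry from `headers` so the two sources don't clash. Keep in mind that session cookies like these expire, so they will need to be refreshed from the browser from time to time.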

Python requests text only returning ï»¿ï»¿ instead of HTML

I'm trying to scrape the link to a file to download later from a website.
My code:
import requests

outage_page = 'https://www.oasis.oati.com/cgi-bin/webplus.dll?script=/woa/woa-planned-outages-report.html&Provider=MISO'
s = requests.Session()
req = s.get(outage_page, stream=True, verify='my cert path is here')
print(req, '\n', req.headers, '\n', req.raw, '\n', req.encoding, '\n', req.content, '\n', req.text)
This is the output I get:
{'Content-Type': 'text/html', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'Server': 'Microsoft-IIS/7.5', 'X-Powered-By': 'ASP.NET', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'Date': 'Mon, 26 Aug 2019 15:48:39 GMT', 'Content-Length': '136'}
ISO-8859-1
b'\xef\xbb\xbf\xef\xbb\xbf\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n \r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n'
ï»¿ï»¿
Process finished with exit code 0
I expected req.text to return HTML that I could scrape, but it only returns ï»¿ï»¿. The other print statements are just for reference here. What am I doing wrong?
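The ï»¿ï»¿ characters are not garbled HTML. The bytes `\xef\xbb\xbf` are the UTF-8 byte order mark (BOM), and because the Content-Type is `text/html` with no charset, requests falls back to ISO-8859-1 (the `req.encoding` shown above), which renders each BOM as three visible characters. The short sketch below, using the start of `req.content` from the output, shows that once the BOMs and whitespace are removed nothing is left, i.e. the server sent back an essentially empty page rather than the report:

```python
# Start of req.content from the output above (trimmed).
raw = b"\xef\xbb\xbf\xef\xbb\xbf\r\n\r\n\r\n \r\n\r\n"

# ISO-8859-1 maps each BOM byte to a visible character: 0xEF 'ï', 0xBB '»', 0xBF '¿'.
print(raw.decode("iso-8859-1").strip())  # ï»¿ï»¿

# Decoded as UTF-8, each EF BB BF is U+FEFF; dropping those and the whitespace
# leaves an empty string, so the body contains no HTML at all.
text = raw.decode("utf-8").replace("\ufeff", "").strip()
print(repr(text))  # ''
```

So changing the decoding will not produce the page; the request itself has to succeed first.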

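I'm going to go ahead and post my solution. I converted my certificate file from .cer to .pem, attached the cert to the session instead of passing it to get(), and added headers to the request. I changed verify to False because verify refers to the server-side certificate, not the client-side one.
import requests

outage_page = 'https://www.oasis.oati.com/cgi-bin/webplus.dll?script=/woa/woa-planned-outages-report.html&Provider=MISO'

# create the connection; the client certificate is attached to the whole session
s = requests.Session()
s.cert = 'path/to/cert.pem'
head = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'
}
# verify=False disables checking of the server's certificate (it does not affect s.cert)
req = s.get(outage_page, headers=head, verify=False)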
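The answer does not say how the .cer file was converted. One way to do it with only the standard library is sketched below, assuming the .cer file holds a single certificate in either binary DER or already-PEM (Base64 text) form; `cer_to_pem` and the file names are hypothetical. `ssl.DER_cert_to_PEM_cert` only re-encodes the bytes, it does not validate them.

```python
import ssl
from pathlib import Path


def cer_to_pem(cer_path: str, pem_path: str) -> None:
    """Write a PEM copy of a .cer certificate (DER or already PEM)."""
    data = Path(cer_path).read_bytes()
    if data.lstrip().startswith(b"-----BEGIN"):
        pem = data.decode("ascii")  # already PEM text, copy it unchanged
    else:
        # Treat the file as binary DER: Base64-encode it between
        # -----BEGIN CERTIFICATE----- / -----END CERTIFICATE----- lines.
        pem = ssl.DER_cert_to_PEM_cert(data)
    Path(pem_path).write_text(pem)


# Hypothetical usage:
# cer_to_pem("client.cer", "client.pem")
```

For a client certificate, requests also needs the matching private key: either put the key in the same .pem file, or pass a tuple such as `s.cert = ('client.pem', 'client.key')`.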


Scrape Website protected by Cloudflare without cookies using Python and Requests - python-3.x

I don't know exactly how Cloudflare does it, but I noticed that Cloudflare creates cookies such as cf_clearance some time after your first access to the website. If you keep retrying your requests in the browser, those cookies will eventually be generated.
