CloudFlare - requests with empty HTTP_CF_CONNECTING_IP - security

We have a website (with CloudFlare in front).
We are constantly getting scanned/checked for vulnerabilities and requests look like this:
2020-01-28 14:19:59 Content type: application/x-www-form-urlencoded
2020-01-28 14:19:59 Request content: <?=md5("phpunit")?>
2020-01-28 14:19:59 HTTP referer:
2020-01-28 14:19:59 User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
2020-01-28 14:19:59 HTTP_CF_CONNECTING_IP:
2020-01-28 14:19:59 HTTP_CF_IPCOUNTRY:
2020-01-28 14:19:59 Query: path=vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
2020-01-28 14:19:59 REMOTE_ADDR: 5.101.0.209
2020-01-28 14:19:59 REMOTE_HOST:
I have added the address 5.101.0.209 to the firewall in CloudFlare, but the requests are still coming through (somehow).
I have the following questions:
How can requests come via CloudFlare while the HTTP_CF_CONNECTING_IP variable is empty?
How would you recommend defending against such scanning?
Why does the CloudFlare firewall not block such requests? What could be the reasons?
Thanks.
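One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: REMOTE_ADDR is 5.101.0.209, the scanner's own address, not a CloudFlare edge address. A request that really passed through CloudFlare would normally arrive from a CloudFlare IP and carry the real client IP in HTTP_CF_CONNECTING_IP. So these requests may be reaching the origin server directly by its IP, which would explain both the empty CF headers and why the CloudFlare firewall rule has no effect. A minimal sketch of a check for that case follows. The function name `came_through_cloudflare` is hypothetical, and the range list is an assumed snapshot of CloudFlare's published IPv4 ranges that should be verified against the current official list.

```python
import ipaddress

# ASSUMPTION: snapshot of CloudFlare's published IPv4 ranges. The list can
# change, so verify it against the current official list before relying on it.
CLOUDFLARE_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22", "103.31.4.0/22",
    "141.101.64.0/18", "108.162.192.0/18", "190.93.240.0/20", "188.114.96.0/20",
    "197.234.240.0/22", "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
)]


def came_through_cloudflare(remote_addr):
    """Return True if remote_addr falls inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Empty or malformed address: treat it as not coming from CloudFlare.
        return False
    return any(addr in network for network in CLOUDFLARE_RANGES)


# The address from the log is the scanner itself, not a CloudFlare edge.
print(came_through_cloudflare("5.101.0.209"))
```

If this reading is right, the usual defence is to allow only CloudFlare's ranges on ports 80/443 at the origin's own firewall, so that direct-to-IP requests never reach the application.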

Related

Using proxy to make request results in bad request (400) error code

I'm using node-fetch and https-proxy-agent to make a request through a proxy. However, I get a 400 error code from the site I'm scraping, but only when I pass the agent; without it, everything works fine.
import fetch from 'node-fetch';
import Proxy from 'https-proxy-agent';
const ip = PROXIES[Math.floor(Math.random() * PROXIES.length)]; // PROXIES is a list of ips
const proxyAgent = Proxy(`http://${ip}`);
fetch(url, {
  agent: proxyAgent,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.72 Safari/537.36'
  }
}).then(res => res.text()).then(console.log)
This results in a 400 error code.
I have absolutely no idea why this is happening. If you want to reproduce the issue, I'm scraping https://azlyrics.com. Please let me know what is wrong.
The issue has been fixed. I did not notice that I was making a request to an https site through an http proxy: the site uses the https protocol, but the proxies were http only. Switching to https proxies works. Thank you.

How can I send http post request on python with protobuf text as params?

I want to send an HTTP request in Python with protobuf data as the parameters. I copied the protobuf data from Charles Proxy (a web debugging proxy tool).
The protobuf text of the request data was: 1 { 1: "2345654456765" }
I tried this, but it is not working:
import requests
r = requests.post('https://api.website.com/version/auth/login?locale=en',data={1:{1:'2345654456765'}},headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4181.9 Safari/537.36','platform': 'web',})
print(r.content)
I have no idea how to pass this as a parameter; I have always worked with JSON data. Does anyone know the solution?
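A minimal sketch of one possible approach, under stated assumptions: the Charles rendering `1 { 1: "2345654456765" }` is taken to mean that field 1 is an embedded message whose own field 1 is a string. The real .proto schema is unknown, so this hand-encodes the protobuf wire format instead of using generated classes. The helper names `encode_varint` and `length_delimited` are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number, payload):
    """Encode a length-delimited field (wire type 2): tag, length, bytes."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# ASSUMPTION: inner field 1 is a string, outer field 1 is an embedded message.
inner = length_delimited(1, "2345654456765".encode("utf-8"))
body = length_delimited(1, inner)
print(body.hex())
```

The resulting bytes would then be sent as the raw request body, for example `requests.post(url, data=body, headers=...)`, instead of a dict, which requests form-encodes. The server may also expect a Content-Type such as application/x-protobuf; that is an assumption to check against the headers Charles recorded.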

Python: Access Denied at Random Points When Using Requests

I am using requests and BeautifulSoup to crawl the popular comic store comiXology and build a list of all comic titles, issues, and release dates, so I am requesting a massive number of web pages. Unfortunately, partway through I get the error:
you do not have access to (URL) on this server
I tried using a function that recursively retries the request, but this isn't working.
I'm not posting the whole code because it is very long.
def getUrl(url):
    try:
        page = requests.get(url)
    except:
        getUrl(url)
    return page
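Two things in the helper above are worth noting. The result of the recursive call is never returned, so after an exception `page` is unbound. Also, an "access denied" page most likely arrives as an ordinary HTTP response (for example a 403), not as an exception, so the `except` branch may never run; that second point is an assumption about how the site responds. A minimal sketch of a bounded retry with backoff that checks the status code follows. The name `fetch_with_retry` and its parameters are hypothetical, and the fetch function is passed in so the logic can be tried without a network.

```python
import time


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to `retries` times; return the first 200 response.

    Returns None if every attempt fails. `fetch` is any callable returning an
    object with a `status_code` attribute, such as requests.get.
    """
    for attempt in range(retries):
        try:
            response = fetch(url)
            if response.status_code == 200:
                return response
        except Exception:
            # With requests, catching requests.RequestException would be narrower.
            pass
        if attempt < retries - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

With requests this would be called as `page = fetch_with_retry(requests.get, url)`. Pausing between attempts also lowers the request rate, which may matter if the site is throttling the crawler.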
The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent. Validating User-Agent header on server side is a common operation so be sure to use valid browser’s User-Agent string to avoid getting blocked.
(Source: http://go-colly.org/articles/scraping_related_http_headers/)
The main thing you need to do is set a legitimate User-Agent, so add headers that emulate a browser:
# This is a standard user-agent of Chrome browser running on Windows 10
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36' }
Example:
from bs4 import BeautifulSoup
import requests
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
resp = requests.get('http://example.com', headers=headers).text
soup = BeautifulSoup(resp, 'html.parser')
Additionally, you can send a few more headers to look like a legitimate browser:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip',
    'DNT': '1',  # Do Not Track request header
    'Connection': 'close'
}

Request Headers: 41d9251ae3b6e89193fe, what does it mean?

As we know, request headers usually look like "User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36" or "Accept:application/json, text/plain".
But this time, I find a request header: "41d9251ae3b6e89193fe:1b237fc9847ec56e144031e03cc72d704777ef4167026f236eb3dd8d2c5b15ad837a20c8ce14459ae9f5c36e581b1322229b548178cce6cdf07ebbea7765f2df", and I have never seen a request header like that.
What does it mean?

Log user browser details in Node.js

Is there a way to log which browser/OS/etc. the user is using from my Node.js app?
Thanks
I believe you want the information stored in the "User-Agent" request header. Node.js lowercases incoming header names, so read it with a lowercase key:
var useragent = request.headers['user-agent'];
My user-agent, for Chrome, is: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1"
