I want to test whether some URLs are broken or not.
Right now I just assert that some words I know should be on these pages are present. I feel this isn't the best I can do. Any help?
I figured it out:
You can detect a "500 Internal Server Error" using Poltergeist.
Inspecting network traffic
You can inspect the network traffic (i.e. what resources have been loaded) on the current page by calling page.driver.network_traffic. This returns an array of request objects. A request object has a response_parts method containing data about the response chunks.
So this will work:
page.driver.network_traffic.each do |request|
  request.response_parts.uniq(&:url).each do |response|
    puts "Error : #{response.url}" if response.status == 500
  end
end
I have a Pyramid web app with fail2ban set up to jail ten consecutive 404 statuses (i.e. bots that probe for vulnerabilities), Sentry error logging and, as far as I know, there are no security vulnerabilities. However, every few days I get a notification of a 502 caused by a null byte attack. This is harmless, but it has become very tiresome and I ignored a bizarre but legitimate human-user–generated 502 status as a result.
A null byte attack in Pyramid, in my set-up, raises a URLDecodeError ('utf-8' codec can't decode byte 0xc0 in position 16: invalid start byte) at the url dispatch level, so is not routed to the notfound_view_config decorated view.
Is there any way to capture %EF/%BF in requests in Pyramid or should I block them in Apache?
Comment by Steve Piercy converted into an Answer:
A search in the Pyramid issue tracker yields several related results. The first hit provides one way to deal with it.
In brief, the exception_view_config(ExceptionClass, renderer) decorator captures the exception and behaves like notfound_view_config or forbidden_view_config (which, in contrast to view_config, are not passed declared routes).
So the 404 view could look like:
from pyramid.view import notfound_view_config
from pyramid.exceptions import URLDecodeError
from pyramid.view import exception_view_config

# Both decorators route to the same view: undecodable URLs and ordinary 404s.
@exception_view_config(context=URLDecodeError, renderer='json')
@notfound_view_config(renderer='json')
def notfound_view(request):
    request.response.status = 404
    return {"status": "error"}
This can be tested by visiting http://0.0.0.0:👾👾/%EF%BF in the browser (where 👾👾 is the port served onto).
However, there are two additional considerations.
It does not play well with the debug toolbar (pyramid.includes = pyramid_debugtoolbar in the local configuration ini file).
Also, an error gets raised if any dynamic attribute like request.path_info gets accessed. So either the response is minimally formatted or request.environ['PATH_INFO'] is assigned a new value before any operation in the view (e.g. usage data etc.).
The view call happens after the debugtoolbar error is raised, however, so the first point still stands even with a request.environ['PATH_INFO'] = 'hacked'.
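To see why a path such as /%EF%BF triggers the decode error in the first place, here is a minimal standard-library sketch. It is not Pyramid's own decoding code; the helper name path_decodes_as_utf8 is hypothetical and only imitates the idea of percent-decoding the path and then decoding the bytes as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def path_decodes_as_utf8(raw_path: str) -> bool:
    """Return True if the percent-decoded path is valid UTF-8.

    Hypothetical helper for illustration only; Pyramid performs its own
    decoding and raises URLDecodeError when that step fails.
    """
    try:
        # %EF%BF becomes the bytes EF BF, a truncated three-byte sequence.
        unquote_to_bytes(raw_path).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("/about", "/%EF%BF", "/%C0"):
        print(candidate, path_decodes_as_utf8(candidate))
```

A check like this could in principle run before routing (or be mirrored in an Apache rule), but where to put it depends on the deployment.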
Bonus
As this is unequivocally an attack, this could be customised to play well with fail2ban to block the hacker IP as described here by using a unique status code, say 418, at the first occurrence.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("privnote.com", 80))
#s = ssl.wrap_socket(s, keyfile=None, certfile=None, server_side=False, cert_reqs=ssl.CERT_NONE, ssl_version=ssl.PROTOCOL_SSLv23)
def claim_note(note_url):
    s.sendall(b'DELETE /'+note_url.encode()+b'HTTP/1.1\r\nX-Requested-With: XMLHttpRequest\r\nHost: privnote.com\r\n')
    print(s.recv(4096))
This is my code. Let me start by saying that I have tried many different things apart from this. I’ve tried both the HTTPS and HTTP ports, 443 and 80. I’ve commented and uncommented the statement that wraps the socket with SSL. All with the same outcome: either the API returns absolutely nothing, or it tells me the request couldn’t be understood by the server. I was looking at a GitHub repo where only one header was used, X-Requested-With, because the request was an Ajax call. I tried adding User-Agent and Content-Type, and now I’m just using Host and X-Requested-With. It’s a DELETE request and the URL is the first 8 chars after the link. I’ve also tried adding \r\n\r\n at the end and even tried Content-Length. I don’t know what else to do. I want to know why the server is saying that.
There are multiple problems with your code. If you actually print out the request you are trying to send, it will look like this:
b'DELETE /node_urlHTTP/1.1\r\nX-Requested-With: XMLHttpRequest\r\nHost: privnote.com\r\n'
There are two problems with this line: a missing space between /node_url and HTTP/1.1, and a missing final \r\n as the end-of-header marker. Once these are fixed you get a successful response, a 302 redirect to the HTTPS version:
b'HTTP/1.1 302 Found\r\nDate:...\r\nLocation: https://privnote.com/node_url ...
Repeating the request over HTTPS with a valid node_url succeeds (with an invalid node_url you get an error saying that DELETE is not an allowed method):
s.connect(("privnote.com", 443))
s = ssl.wrap_socket(s)
...
b'HTTP/1.1 200 OK\r\n ...
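Putting both fixes together, a request builder might look like the sketch below. The function names build_delete_request and send_over_tls are made up for illustration, and the Connection: close header is an addition not present in the original request. Note that ssl.wrap_socket was removed in Python 3.12, so the sketch uses ssl.create_default_context instead.

```python
import socket
import ssl


def build_delete_request(note_id: str, host: str = "privnote.com") -> bytes:
    """Build a raw HTTP/1.1 DELETE request for /<note_id>."""
    lines = [
        f"DELETE /{note_id} HTTP/1.1",  # note the space before HTTP/1.1
        f"Host: {host}",
        "X-Requested-With: XMLHttpRequest",
        "Connection: close",  # assumption: lets the reader stop at EOF
        "",  # empty line that terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_over_tls(request: bytes, host: str = "privnote.com") -> bytes:
    """Send the request over HTTPS (port 443) and return the first chunk."""
    context = ssl.create_default_context()
    with socket.create_connection((host, 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(request)
            return tls_sock.recv(4096)


if __name__ == "__main__":
    # Only prints the bytes; call send_over_tls() to actually contact the server.
    print(build_delete_request("node_url"))
```

Printing the request before sending it, as done here, is the quickest way to spot a missing space or a missing terminator.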
UPDATE: See MarkLogic 8 - Stream large result set to a file - JavaScript - Node.js Client API for someone's answer on how to do this in Javascript. This question is specifically asking about XQuery.
I have a web application that consumes rest services hosted in node.js.
Node simply proxies the request to XQuery which then queries MarkLogic.
These queries already have paging setup and work fine in the normal case to return a page of data to the UI.
I need to have an export feature such that when I put a URL parameter of export=all on a request, it doesn't look up a single page anymore.
At that point it should get the whole result set, even if it's a million records, and save it to a file.
The actual request needs to return immediately saying, "We will notify you when your download is ready."
One suggestion was to use xdmp:spawn to call the XQuery in the background which would save the results to a file. My actual HTTP request could then return immediately.
For the spawn piece, I think the idea is that I run my query with different options in order to get all results instead of one page. Then I would loop through the data and create a string variable to call xdmp:save with.
Some questions: Is this a good idea? Is there a better way? If I loop through the result set and it happens to be very large (gigabytes), it could cause memory issues.
Is there no way to directly stream the results to a file in XQuery?
Note: Another idea I had was to intercept the request at the proxy (node) layer and then do an xdmp:estimate to get the record count and then loop through querying each page and flushing it to disk. In this case I would need to find some way to return my request immediately yet process in the background in node which seems to have some ideas here: http://www.pubnub.com/blog/node-background-jobs-async-processing-for-async-language/
One possible strategy would be to use a self-spawning task that, on each iteration, gets the next page of the results for a query.
Instead of saving the results directly to a file, however, you might want to consider using xdmp:http-post() to send each page to a server:
http://docs.marklogic.com/xdmp:http-post?q=xdmp:http-post&v=8.0&api=true
In particular, the server could be a Node.js server that appends each page as it arrives to a file or any other datasink.
That way, Node.js could handle the long-running asynchronous IO with minimal load on the database server.
When a self-spawned task hits the end of the query, it can again use an HTTP request to notify Node.js to close the file and report that the export is finished.
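The page-at-a-time idea can be sketched independently of MarkLogic. In the Python sketch below, fetch_page is a hypothetical callable standing in for whatever returns one page of results (for example an HTTP call to the paged XQuery endpoint), and export_in_pages is an illustrative name; only one page is held in memory at a time, and each page is appended to the file as it arrives.

```python
from typing import Callable, List


def export_in_pages(
    fetch_page: Callable[[int, int], List[str]],
    out_path: str,
    page_size: int = 1000,
) -> int:
    """Append every page returned by fetch_page to out_path.

    fetch_page(start, page_size) is a hypothetical function that returns
    the records beginning at the 1-based index `start`, or an empty list
    once the result set is exhausted. Returns the number of records written.
    """
    total = 0
    start = 1
    with open(out_path, "a", encoding="utf-8") as sink:
        while True:
            page = fetch_page(start, page_size)
            if not page:
                break  # end of the result set: the export is finished
            for record in page:
                sink.write(record + "\n")
            sink.flush()  # push each page to disk before fetching the next
            total += len(page)
            start += len(page)
    return total
```

In the setup described above, the loop body would correspond to one iteration of the self-spawning task, and the final break to the "export finished" notification.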
Hoping that helps,
I have problems when I use bottlenose.
According to its instructions, I need to add an error_handler.
Following the instructions, I added this function:
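import random
import time
from urllib.error import HTTPError  # assumes Python 3; urllib2 on Python 2

def error_handler(err):
    ex = err['exception']
    if isinstance(ex, HTTPError) and ex.code == 404:
        # Wait a random, exponentially distributed time, then ask for a retry.
        time.sleep(random.expovariate(0.1))
        return True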
The example in the instructions says to use this line:
amazon = bottlenose.Amazon(ErrorHandler=error_handler)
I have this:
amazon = bottlenose.Amazon(AWSAccessKeyId=ACCESS_KEY_ID, AWSSecretAccessKey=SECRET_KEY, AssociateTag=ASSOC_TAG)
But I'm not getting a correct response. Why?
Are you submitting requests too quickly? You need to slow down. One request per second is a good speed.
The Amazon Product Advertising API returns errors in three categories so that you can easily determine how best to handle the problem:
2XX errors are caused by mistakes in the request. For example, your
request might be missing a required parameter. The error message in
the response gives a clear indication what is wrong.
4XX errors are non-transient errors. Upon receiving this error,
resubmit the request.
5XX errors are transient errors reflecting an error internal to
Amazon. A 503 error means that you are submitting requests too
quickly and your requests are being throttled. If this is the case,
you need to slow your request rate to one request per second.
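One way to stay at or below one request per second is a small client-side throttle. This is a generic sketch, not part of bottlenose or the Amazon API; the Throttle class is a hypothetical helper, and the clock and sleep parameters exist only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Ensure successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each API request would space the requests out; an error handler such as the one in the question can still cover the occasional throttling response.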
I query the view like this:
/db/_design/myviewname/_view/foo?key=%22ABC123%22
The result is the following:
{
  total_rows: 3,
  offset: 3,
  rows: [ ]
}
All good.
Since no doc was found I'd like to throw a 404 from a show or list.
Is that possible?
According to the wiki, you can issue redirect responses via Show/List functions. In the same way, it is possible to send out arbitrary HTTP status codes, such as 404:
function (head, req) {
  start({ code: 404 });
}
I'm not sure if 404 would be the right choice here. It really means not found.
From RFC 2616 (HTTP/1.1):
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
I think there is a more appropriate response status code: 204 No Content, which sounds more like what you really want to tell the client.
10.2.5 204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent's active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent's active view.
The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.
To set a custom status code, you simply specify it in the object passed to the start function, like this:
function (head, req) {
  start({ "code": 204 });
}