We're caching 404s for images because sometimes our app is released ahead of the actual images, and we'd like to be able to clear them based on status code rather than clearing ALL the images or specific images one by one.
However, I am new to Varnish and unsure whether that is doable, as I couldn't find any specific documentation on clearing based on status code.
You can either PURGE an image or BAN it.
Purging: deletes a specific object from the cache; to do so you need to know the host and the URL of the specific object you want to purge.
Banning: a ban can use a regex, and for your use case something along these lines should work.
In vcl_recv:
if (req.method == "BAN") {
    # Bans are matched against stored objects, so ban on obj.status here
    ban("obj.status == 404");
    return (synth(200, "Ban added"));
}
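If you'd rather trigger the ban over HTTP than from the command line, here is a minimal sketch (assuming Varnish listens on localhost:80 and handles the custom BAN method as above; the use of Python requests is illustrative, not required by Varnish):

import requests

# Send a request with the custom BAN method handled in vcl_recv above.
resp = requests.request("BAN", "http://localhost:80/")
print(resp.status_code, resp.reason)  # expect 200 "Ban added"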
It seems that the purge method is just an overlay on VCL's ban.
Testing with varnishadm, I've found that to ban on a specific status code, only obj.status is accepted.
varnishadm ban obj.status == 404
verify with:
varnishadm ban.list
I have a problem with python3_anticaptcha (an API provided by anti-captcha.com). I've searched the web, contacted support, and tried for over a month, but no luck.
API doc:
https://anticaptcha.atlassian.net/wiki/spaces/API/pages/416972814/GeeTestTaskProxyless+-+captcha+from+geetest.com+without+proxy
I am doing an auto-login on a website, and copied the API example from anti-captcha's docs:
from python3_anticaptcha import GeeTestTaskProxyless

def runGee(self, challenge):
    print("start gee")
    try:
        # Enter the key to the AntiCaptcha service from your account.
        ANTICAPTCHA_KEY = "mycode"
        # required parameters
        websiteURL = "https://www.nike.com.hk"
        gt = "2328764cdf162e8e60cc0b04383fef81"
        print("solving1")
        print("challenge:", challenge)
        # example of using GeeTestTask without a proxy
        result = GeeTestTaskProxyless.GeeTestTaskProxyless(
            anticaptcha_key=ANTICAPTCHA_KEY,
            websiteURL=websiteURL,
            gt=gt).captcha_handler(challenge=challenge)
        print("solving2")
        print(result)
        print("--end gee--")
    except Exception as err:
        print(err)
        print("--end with error--")
However, the GeeTest task runs for over 3 minutes (or more) and gets an error every time, usually an error like:
{'errorId': 34, 'errorCode': 'ERROR_TOKEN_EXPIRED', 'errorDescription': 'Captcha provider reported that additional variable token has expired.', 'taskId': 1204556667}
or
{'errorId': 12, 'errorCode': 'ERROR_CAPTCHA_UNSOLVABLE', 'errorDescription': 'Captcha could not be solved by 5 different workers.', 'taskId': 1204060350}
...etc., depending on what parameters I passed.
May I know whether I am passing the right values to the GeeTest task, or is something wrong in the code?
Moreover, if GeeTest returns the correct value, do I need to do anything else to pass the captcha (or pass the code to the GeeTest server), or has GeeTestTaskProxyless already done everything?
It is extremely hard for me; has anyone used this API successfully? Thanks
The problem is not in anticaptcha but in the GeeTest provider.
The challenge token can only be used once: when your browser loads the GeeTest captcha, it expires the token.
To fix this problem, you only need to block the request that consumes the token in your browser.
Go to DevTools and add a block for the GeeTest captcha API in the browser.
You can integrate this into Selenium automatically with the following commands:
driver.execute_cdp_cmd('Network.setBlockedURLs', {"urls": ["api.geetest.com/get.php"]})
driver.execute_cdp_cmd('Network.enable', {})
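For context, a hedged sketch of how those two calls might sit in a Selenium script (the driver setup and target URL are assumptions, not from the anti-captcha docs):

from selenium import webdriver

# Block GeeTest's token-consuming endpoint before loading the page,
# so the challenge token stays fresh for the solving service.
driver = webdriver.Chrome()
driver.execute_cdp_cmd('Network.setBlockedURLs', {"urls": ["api.geetest.com/get.php"]})
driver.execute_cdp_cmd('Network.enable', {})
driver.get("https://www.nike.com.hk")  # page that embeds the GeeTest captcha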
It seems to me that those errors are caused by a proxy (if you use any) or just a bad IP.
Personally, I use another captcha service and I haven't had such problems with it.
I advise you to try it; it's actually much easier: https://2captcha.com/2captcha-api#solving_geetest
You should send a request like this one:
https://2captcha.com/in.php?key=1abc234de56fab7c89012d34e56fa7b8&method=geetest&gt=f1ab2cdefa3456789012345b6c78d90e&challenge=12345678abc90123d45678ef90123a456b&api_server=api-na.geetest.com&pageurl=https://www.example.com/page/
What you need to achieve is to get a correct answer from it, like this one:
{
    "challenge": "1a2b3456cd67890e12345fab678901c2de",
    "validate": "09fe8d7c6ba54f32e1dcb0a9fedc8765",
    "seccode": "12fe3d4c56789ba01f2e345d6789c012|jordan"
}
Then you just need to submit that answer on the site; just read the first link I gave you.
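As a hedged illustration of that submit-and-poll flow (all key and GeeTest parameter values below are the placeholders from the URL above, not real ones):

import time
import requests

API_KEY = "1abc234de56fab7c89012d34e56fa7b8"  # placeholder 2captcha key

# Submit the GeeTest task parameters scraped from the target page.
submit = requests.get("https://2captcha.com/in.php", params={
    "key": API_KEY,
    "method": "geetest",
    "gt": "f1ab2cdefa3456789012345b6c78d90e",
    "challenge": "12345678abc90123d45678ef90123a456b",
    "api_server": "api-na.geetest.com",
    "pageurl": "https://www.example.com/page/",
    "json": 1,
}).json()
task_id = submit["request"]

# Poll res.php until the answer is ready; GeeTest solutions can take a while.
while True:
    time.sleep(5)
    answer = requests.get("https://2captcha.com/res.php", params={
        "key": API_KEY, "action": "get", "id": task_id, "json": 1,
    }).json()
    if answer["request"] != "CAPCHA_NOT_READY":
        break

print(answer["request"])  # the challenge/validate/seccode answer to submit to the site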
Cheers.
I want to clear all pending updates (pending_update_count) for my bot!
The output of the command below (obviously, I replaced the real API token with xxx)
https://api.telegram.org/botxxxxxxxxxxxxxxxx/getWebhookInfo
is this:
{
    "ok": true,
    "result": {
        "url": "",
        "has_custom_certificate": false,
        "pending_update_count": 5154
    }
}
As you can see, I have 5154 unread updates so far!! (I'm pretty sure these pending updates are errors, because no one uses this bot; it's just a test bot.)
By the way, this pending_update_count number is increasing fast!
Just while writing this post, the number increased by 51 and reached 5205!
I just want to clear these pending updates.
I'm pretty sure this bot has been stuck in an infinite loop!
Is there any way to get rid of it?
P.S.:
I also cleared the webhook URL, but nothing changed!
UPDATE:
The output of getWebhookInfo is this :
{
"ok":true,
"result":{
"url":"https://somewhere.com/telegram/webhook",
"has_custom_certificate":false,
"pending_update_count":23,
"last_error_date":1482910173,
"last_error_message":"Wrong response from the webhook: 500 Internal Server Error",
"max_connections":40
}
}
Why do I get Wrong response from the webhook: 500 Internal Server Error?
I think you have two options:
set a webhook that does nothing, just answers 200 OK to Telegram's servers. Telegram will send all updates to this URL and the queue will be cleared (see the sketch after this list).
disable the webhook, fetch the pending updates with the getUpdates method, and then turn the webhook on again
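A minimal sketch of the first option, assuming a Flask app with a made-up route (both are illustrative, not part of the Bot API):

from flask import Flask

app = Flask(__name__)

# Acknowledge every incoming update with 200 OK so Telegram drains the queue.
@app.route("/telegram/webhook", methods=["POST"])
def webhook():
    return "", 200  # accept and discard the update

if __name__ == "__main__":
    app.run(port=8443)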
Update:
The problem with the webhook is on your side. You can try to emulate Telegram's POST query to your URL.
It can be something like this:
{"message_id":1,"from":{"id":1,"first_name":"FirstName","last_name":"LastName","username":"username"},"chat":{"id":1,"first_name":"FirstName","last_name":"LastName","username":"username","type":"private"},"date":1460957457,"text":"test message"}
You can send this text as a POST request body with Postman, for example, and then try to debug your backend.
For anyone looking at this in 2020 and beyond, the Telegram API now supports clearing the pending messages via a drop_pending_updates parameter in both setWebhook and deleteWebhook, as per the API documentation.
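For illustration, a hedged sketch of that parameter (the token is a placeholder):

import requests

# Delete the webhook and drop all pending updates in one call.
resp = requests.post(
    "https://api.telegram.org/bot<token>/deleteWebhook",
    params={"drop_pending_updates": True},
)
print(resp.json())  # {"ok": true, ...} on success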
Just add return 1; at the end of your hook method.
Update:
Commonly this happens because of query delays with the database.
I solved it like this (a Python sketch follows the list):
POST tg.api/bottoken/setWebhook with an empty "url"
POST tg.api/bottoken/getUpdates
POST tg.api/bottoken/getUpdates with "offset" set to the last update_id that appeared before
doing this several times
POST tg.api/bottoken/getWebhookInfo
to check whether they are all gone.
POST tg.api/bottoken/setWebhook with the "url" filled in
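A hedged Python sketch of those steps (the token and webhook URL are placeholders):

import requests

api = "https://api.telegram.org/bot<token>"
requests.post(api + "/setWebhook", data={"url": ""})  # empty url detaches the webhook

# Repeatedly fetch updates, acknowledging each batch via the offset parameter.
offset = None
while True:
    batch = requests.post(api + "/getUpdates", data={"offset": offset}).json()["result"]
    if not batch:
        break
    offset = batch[-1]["update_id"] + 1

print(requests.post(api + "/getWebhookInfo").json())  # pending_update_count should be 0
requests.post(api + "/setWebhook", data={"url": "https://somewhere.com/telegram/webhook"})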
If you are using a webhook, you can follow these steps.
In your web browser, enter the following URL with the right value of your bot token:
https://api.telegram.org/bot<token>/getWebhookInfo
You will get a result like this on your screen
{"ok":true,"result":{"url":"url_value",...}}
On the displayed result, copy the entire url_value without quotes and substitute it into this second URL:
https://api.telegram.org/bot<token>/setWebhook?url=url_value&drop_pending_updates=True
Enter the second URL, with the right bot token and url_value, in your web browser, then press ENTER.
Done!
I solved it by changing file access permissions (set the file's permissions to 755), and second, by increasing the memory limit in the php.ini file.
A quick & dirty way is to get a temporary webhook here: https://webhook.site/ and set your webhook to that (it will answer with an HTTP 200 code every time, resetting your pending messages to zero).
I faced the same issue with my Telegram bot after a user edited an existing message. My bot received updates with editedMessage continuously, but update.hasMessage() was empty. As a result, the number of updates increased rapidly and my bot got stuck.
I solved this issue by adding handling for the case when the message is missing: send a 200 code:
public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
    update = MAPPER.readValue(event.getBody(), Update.class);
    if (!update.hasMessage()) {
        return new APIGatewayProxyResponseEvent()
                .withStatusCode(200) // return code 200 so Telegram stops re-sending the update
                .withBody("message is missing")
                .withIsBase64Encoded(false);
    }
... ... ...
I'm using https://github.com/firebase/flashlight to index data for searches.
However, this morning I deleted the whole firebase index, so it should be empty (this has worked before, but it seems that when the nodejs app.js crashes in some cases, it causes the cache to get "stuck"), yet I still see old search results from my nodejs app somehow...
I've tried:
http://localhost:9200/_cache/clear
and
http://localhost:9200/_flush
http://localhost:9200/firebase/_flush
They all say successful, but I still get old results out of, seemingly, nowhere.
I can also see in the console that it refreshes every 60 seconds, and deleting the whole firebase has worked before without problems...
I even saw the message housekeeping: found 60 orphans (removing them now) in the console, so it should be refreshed by now...
I tried restarting Elasticsearch as well as the whole Linux/Debian server...
In the config.js I have two indexes:
exports.paths = [
{
path: "tags",
index: "firebase",
type: "tag",
filter: function(data) { return data.name !== 'system'; }
},
{
path: "tracks",
index: "firebase",
type: "track",
filter: function(data) { return data.name !== 'system'; }
}
];
And strangely enough, I have no problem whatsoever when using the 'track' type instead of the 'tag' one...
What am I missing here?
Update:
So, I just deleted the firebase tracks index while the nodejs script was running, and the script crashed... Same problem, different index. So the crashing script must be causing it... So, how do I clear this stuck cache?
So I fixed it by simply doing:
curl -XDELETE 'localhost:9200/firebase'
Thanks to: https://github.com/elasticsearch/elasticsearch/issues/7541#issuecomment-54724302
I'm guessing Elasticsearch is not aware (and has not been told) of the relevance of its current index; perhaps the Flashlight script I'm using is not informing it about what the index should contain. But since this only happens when the node script crashes after you suddenly delete your whole firebase index, it should be catchable somehow. I'm happy that I can at least fix it like this; rebuilding the index is not a big issue/task right now, but in the future it might be.
A wild guess: maybe you are not posting the queries correctly. You said you have tried the following links:
http://localhost:9200/_cache/clear
http://localhost:9200/_flush
http://localhost:9200/firebase/_flush
If you are accessing the URLs from a browser, they won't clear anything; you have to POST to them. It is ambiguous from your question whether you did that, since both GET and POST return the same kind of result (showing total, successful and failed). Try this from the command line using curl:
curl -XPOST 'http://localhost:9200/_cache/clear'
curl -XPOST 'http://localhost:9200/_flush'
Or create an AJAX request with jQuery, or use Fiddler.
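If scripting it is easier, here are the same POST calls as a hedged Python sketch (assuming Elasticsearch on localhost:9200):

import requests

# POST (not GET) the cache-clear and flush endpoints.
for path in ("/_cache/clear", "/_flush", "/firebase/_flush"):
    resp = requests.post("http://localhost:9200" + path)
    print(path, resp.json())  # each response reports total/successful/failed shards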
Try to optimize your indices by sending the following POST request to your Elasticsearch server:
curl -XPOST 'http://localhost:9200/_optimize?max_num_segments=1&wait_for_merge=true'
This makes Lucene actually delete the deleted documents from disk and merge the indices.
I query the view like this:
/db/_design/myviewname/_view/foo?key=%22ABC123%22
The result is the following:
{
    "total_rows": 3,
    "offset": 3,
    "rows": []
}
All good.
Since no doc was found I'd like to throw a 404 from a show or list.
Is that possible?
According to the wiki, you can issue redirect responses from show/list functions. As such, it is also possible to send out arbitrary HTTP status codes (like 404).
function (head, req) {
    start({ code: 404 });
}
I'm not sure 404 would be the right choice here. It really means not found.
From the HTTP/1.1 specification, RFC 2616:
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
There is another, more appropriate response status code, I think: 204 No Content, which sounds more like what you really want to tell the client.
10.2.5 204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent's active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent's active view.
The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.
Now, to set a custom response code (and headers), you simply specify it in the object passed to the start function, like this:
function (head, req) {
    start({ "code": 204 });
}
We recently put Varnish in front of our Drupal site because the server was suffering heavy load, and we are very pleased in general.
The only remaining problem is that we sometimes get an infinite redirect loop in the cached data. We found this through our HTTP monitoring: we check the front page every minute, and the cached page sometimes contains the full front page, but with a Location header set that sends the user to the front page again.
We are not quite sure what could cause this, and also have no clue how to track it down. Of course, the best way to handle this would be on the Drupal side, but we can't really tell why it happens.
Is there a way to log the cases when this happens? Or is it possible to detect this in Varnish and mark the current cache content as invalid?
Of course, we don't want to always pass intentional redirects to the origin server, only the ones that would cause an infinite loop.
I hope to hear some ideas on how we can track this down further. Many thanks in advance for all kinds of hints.
I have found a workaround for this:
sub vcl_fetch {
    # Fix a strange problem: HTTP 301 redirects to the same page sometimes go into a loop
    if (beresp.http.Location == "http://" + req.http.host + req.url) {
        if (req.restarts > 2) {
            unset beresp.http.Location;
            #set beresp.http.X-Restarts = req.restarts;
        } else {
            return (restart);
        }
    }
}
I give the backend a second (and third) chance to return a proper page. If that fails as well, the Location header is removed. This works because the proper page is served with just an additional invalid Location header.
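On the original monitoring question, a hedged external probe (the URL is a placeholder) could log exactly these cases by checking whether Location points back at the requested page:

import requests

# Flag a response whose Location header redirects to itself,
# i.e. the infinite-loop symptom described above.
url = "http://www.example.com/"
resp = requests.get(url, allow_redirects=False)
loc = resp.headers.get("Location")
if loc and loc.rstrip("/") == url.rstrip("/"):
    print("potential redirect loop:", resp.status_code, loc)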
The accepted answer by @philip, updated for Varnish 4:
sub vcl_backend_response {
    # Fix a strange problem: HTTP 301 redirects to the same page sometimes go into a loop
    if (beresp.http.Location == "http://" + bereq.http.host + bereq.url) {
        if (bereq.retries > 2) {
            unset beresp.http.Location;
            #set beresp.http.X-Restarts = bereq.retries;
        } else {
            return (retry);
        }
    }
}