There are 5 static components without a far-future expiration date.
https://fonts.googleapis.com/css?family=Poppins%3A300%2Cregular%2C500%2C600%2C700%2Clatin-ext%2Clatin%2Cdevanagari&ver=1.0.0
https://fonts.googleapis.com/css?family=Poppins%3A700%2C300&ver=1581165984
https://fonts.googleapis.com/css?family=Poppins:700%2C300%2C400%7CRoboto:400%2C900
https://fonts.googleapis.com/css?family=Montserrat%3Aregular%2C700&ver=6.0.5
https://pixel.wp.com/g.gif?v=ext&j=1%3A8.1.1&blog=167300497&post=1633&tz=0&srv=alshuwaifatmarbles.com&host=alshuwaifatmarbles.com&ref=&fcp=2042&rand=0.1473167431208402
I'm trying to make a timer that checks when the last request to a specific path was made; if the last request was made more than a minute ago, I want the script to delete a document from a MongoDB database.
I've tried to achieve this with sessions, but I haven't been successful. I've also tried saving the current time to the DB with the request and then checking it, but I don't know how to make the timer "run in the background", if that's even possible. I also want the timer to run separately for every ID (the ID is included in a table sent with the request).
Using a TTL index may help you. It would allow you to set an expiry date for documents when they are accessed, and MongoDB will take care of deleting them after the time is up.
Add a TTL index to your collection:
db.entites.createIndex({ requestedAt: 1 }, { expireAfterSeconds: 60 });
When a request is made, update the documents you want to expire by setting the current date for the indexed field:
db.entites.updateMany({ requestId }, { $set: { requestedAt: new Date() } });
Any document with the requestedAt field set will be deleted 60 seconds after its timestamp. (Note that MongoDB's TTL monitor runs roughly once a minute, so deletion may lag the expiry time slightly.)
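For example, here is a minimal end-to-end sketch, assuming an Express app and the official MongoDB Node.js driver (the database name, the entities collection, and the /some/path/:requestId route are placeholders, not names from your project):
// A minimal sketch: refresh requestedAt on every request so the TTL index
// removes the document about a minute after the last request.
const express = require('express');
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const entities = client.db('mydb').collection('entities');

  // One-time setup: documents expire ~60 seconds after their requestedAt value.
  await entities.createIndex({ requestedAt: 1 }, { expireAfterSeconds: 60 });

  const app = express();
  app.get('/some/path/:requestId', async (req, res) => {
    await entities.updateMany(
      { requestId: req.params.requestId },
      { $set: { requestedAt: new Date() } }
    );
    res.sendStatus(200);
  });

  app.listen(3000);
}

main().catch(console.error);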
Here is an option you can try (a minimal sketch follows these steps):
For each request, store some details (e.g., a timestamp) in a variable called last_request_time. The last_request_time value gets updated on every request.
Run a cron job: a background job that runs every 10 seconds (or 15, 20, or 30 seconds). See node-cron.
The job:
Calculate the difference between current_time and last_request_time
If the difference is greater than 60 seconds:
Set the last_request_time value to null
Delete document from database
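A minimal sketch of this approach, assuming node-cron and the official MongoDB Node.js driver (the entities collection, the field names, and the recordRequest helper are placeholders you would adapt to your app):
// A minimal sketch: an in-memory map tracks the last request time per ID,
// and a background job deletes documents for IDs idle for over a minute.
const cron = require('node-cron');
const { MongoClient } = require('mongodb');

// requestId -> timestamp of the most recent request
const lastRequestTimes = new Map();

// Call this from the request handler for the path you are watching.
function recordRequest(requestId) {
  lastRequestTimes.set(requestId, Date.now());
}

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const entities = client.db('mydb').collection('entities');

  // Background job: runs every 10 seconds.
  cron.schedule('*/10 * * * * *', async () => {
    const now = Date.now();
    for (const [requestId, lastRequestTime] of lastRequestTimes) {
      if (now - lastRequestTime > 60 * 1000) {
        lastRequestTimes.delete(requestId);      // stop tracking this ID
        await entities.deleteOne({ requestId }); // delete its document
      }
    }
  });
}

main().catch(console.error);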
I'm using the xkey vmod to purge objects through tags. I've set up my Varnish configuration to support xkey, but I can't find any resource on how to send that data through varnishadm or VCL. Currently, I'm using an HTTP BAN:
curl -X BAN -H 'X-Purge-Regex: 1.pbf' varnish
to invalidate with BAN.
Also, is it possible to send the xkey value comma-separated?
My cached URLs are something like:
1. www.example.com/foo/xyz?name="t1;t2"
2. www.example.com/foo/abc?name="t1"
3. www.example.com/foo/xyz?name="t2"
Currently, with a BAN URL, I pass the t1 value as a regex, and that is able to invalidate #1 and #2. But now with xkey:
How do I send the HTTP request with xkey?
Is there a way xkey supports multiple tags in a single request?
Can I send xkey with (xyz, t2)? With this, I want to invalidate #1 and #2.
Install vmod_xkey
In order to use vmod_xkey, you need to install it by compiling https://github.com/varnish/varnish-modules from source. Please make sure you select the right branch in GitHub, based on the Varnish version you use.
The xkey API
vmod_xkey has 2 functions:
xkey.purge(), which will immediately remove content from cache
xkey.softpurge(), which will mark content as expired but keep it around for asynchronous revalidation
The VCL code
Here's the VCL code you can use to invalidate content using tags:
vcl 4.1;

import xkey;
import std;

acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405));
        }
        if (!req.http.x-xkey-purge) {
            return (synth(400, "x-xkey-purge header missing"));
        }
        set req.http.x-purges = xkey.purge(req.http.x-xkey-purge);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return (synth(200, req.http.x-purges + " objects purged"));
        } else {
            return (synth(404, "Key not found"));
        }
    }
}
Please ensure acl purge contains the right IP addresses or IP ranges prior to using this.
By adding import xkey; to the VCL file, secondary keys are automatically registered in Varnish, and can be used later on.
The PURGE request method is used to trigger xkey.purge() and the x-xkey-purge request header is used to specify the tags.
Registering keys
Registering keys happens by specifying them in the Xkey response header. You can register a single key, but you can also add multiple ones.
Multiple keys are separated by space or comma.
Here's an example where 3 keys are registered: category_sports, id_1265778, and type_article:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Cache-Control: public, s-maxage=60
Xkey: category_sports id_1265778 type_article
Removing content based on keys
By performing a PURGE call and specifying the right X-Xkey-Purge value, you remove the content that matches those keys.
Here's an example where we remove all objects matching the category_sports tag for all pages on the http://example.com website:
PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: category_sports
Here's another example where we invalidate content that matches the foo and bar keys:
PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: foo bar
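If you prefer to trigger this from the command line, like your existing BAN call, roughly equivalent curl commands would look like this (the hostname and keys are placeholders; the client must match the purge ACL):
# Purge everything tagged category_sports
curl -X PURGE -H 'X-Xkey-Purge: category_sports' http://example.com/

# Multiple keys in a single request, separated by spaces (or commas)
curl -X PURGE -H 'X-Xkey-Purge: foo bar' http://example.com/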
We are using StormCrawler 1.13 to crawl site pages. In one environment it does not crawl pages that have a robots meta noindex nofollow, but when we deploy the same modules in another environment, pages with noindex nofollow are crawled as well. Below is our crawler-conf.yaml.
# Custom configuration for StormCrawler
# This is used to override the default values from crawler-default.xml and provide additional ones
# for your custom components.
# Use this file with the parameter -conf when launching your extension of ConfigurableTopology.
# This file does not contain all the key values but only the most frequently used ones. See crawler-default.xml for an extensive list.
config:
  topology.workers: 1
  topology.message.timeout.secs: 300
  topology.max.spout.pending: 100
  topology.debug: false

  fetcher.threads.number: 50

  # give 2gb to the workers
  worker.heap.memory.mb: 2048

  # mandatory when using Flux
  topology.kryo.register:
    - com.digitalpebble.stormcrawler.Metadata

  # metadata to transfer to the outlinks
  # used by Fetcher for redirections, sitemapparser, etc...
  # these are also persisted for the parent document (see below)
  # metadata.transfer:
  #  - customMetadataName

  # lists the metadata to persist to storage
  # these are not transfered to the outlinks
  metadata.persist:
    - _redirTo
    - error.cause
    - error.source
    - isSitemap
    - isFeed

  http.agent.name: "Anonymous Coward"
  http.agent.version: "1.0"
  http.agent.description: "built with StormCrawler Archetype ${version}"
  http.agent.url: "http://someorganization.com/"
  http.agent.email: "someone@someorganization.com"

  # The maximum number of bytes for returned HTTP response bodies.
  # The fetched page will be trimmed to 65KB in this case
  # Set -1 to disable the limit.
  http.content.limit: -1

  # FetcherBolt queue dump => comment out to activate
  # if a file exists on the worker machine with the corresponding port number
  # the FetcherBolt will log the content of its internal queues to the logs
  # fetcherbolt.queue.debug.filepath: "/tmp/fetcher-dump-{port}"

  parsefilters.config.file: "parsefilters.json"
  urlfilters.config.file: "urlfilters.json"

  # revisit a page daily (value in minutes)
  # set it to -1 to never refetch a page
  fetchInterval.default: 1440

  # revisit a page with a fetch error after 2 hours (value in minutes)
  # set it to -1 to never refetch a page
  fetchInterval.fetch.error: 120

  # never revisit a page with an error (or set a value in minutes)
  fetchInterval.error: -1

  # custom fetch interval to be used when a document has the key/value in its metadata
  # and has been fetched successfully (value in minutes)
  # fetchInterval.FETCH_ERROR.isFeed=true: 30
  # fetchInterval.isFeed=true: 10

  # configuration for the classes extending AbstractIndexerBolt
  # indexer.md.filter: "someKey=aValue"
  indexer.url.fieldname: "url"
  indexer.text.fieldname: "content"
  indexer.canonical.name: "canonical"
  indexer.md.mapping:
    - parse.title=title
    - parse.keywords=keywords
    - parse.description=description
    - domain=domain

  # Metrics consumers:
  topology.metrics.consumer.register:
    - class: "org.apache.storm.metric.LoggingMetricsConsumer"
      parallelism.hint: 1
Please let me know if I need to make changes in the above configuration or in any other StormCrawler settings.
Thank you.
The behaviour of meta noindex is not configurable in 1.13, so any difference between your environments can't be due to a difference in configuration.
How did you generate the topology? Did you use the archetype?
PS: it is good practice to set the http.agent.* configs.
I'm trying to store a persistent cookie with SimpleCookie, so I'm doing something like:
def add_cookie(self, name, value, expiry = None):
    self.cookies[name] = value
    if expiry is not None:
        self.cookies[name]['expires'] = expiry
    print(self.cookies[name].OutputString())
Output of print:
remember_me=blabla; expires=Sun, 02 Jul 2017 13:30:57 GMT
Of course, it's then passed to wsgiref.simple_server's start_response function, with something like
('Set-Cookie', cookie['remember_me'].OutputString())
and the cookie is created on the browser/client side; however, the expiry time is not updated.
Any idea how to set the correct expiry time and make a persistent cookie instead of a session cookie?
Thanks.
Issue solved: the described method is fine. I was just automatically overwriting the expiry the next time around, and that's why the expiry always disappeared.
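For anyone hitting the same problem, here is a minimal sketch of the working pattern (the names add_cookie and remember_me mirror the snippet above; everything else is illustrative):
# Minimal sketch: the cookie stays persistent only if later responses don't
# re-emit it without an "expires" attribute (which would make it a session cookie again).
from http.cookies import SimpleCookie
from datetime import datetime, timedelta
from wsgiref.handlers import format_date_time

cookies = SimpleCookie()

def add_cookie(name, value, expiry=None):
    cookies[name] = value
    if expiry is not None:
        # "expires" must be an HTTP date string, e.g. "Sun, 02 Jul 2017 13:30:57 GMT"
        cookies[name]['expires'] = expiry

# Set the persistent cookie once, with an expiry 30 days from now.
expires_at = format_date_time((datetime.now() + timedelta(days=30)).timestamp())
add_cookie('remember_me', 'blabla', expires_at)

# On later responses, only re-set the cookie if you pass the expiry again;
# re-setting it without one silently turns it back into a session cookie.
headers = [('Set-Cookie', cookies['remember_me'].OutputString())]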
Is there any way to change the expiration of a Parse session token to something other than 1 year or no-expiry?
Ideally I'd like to change the expiry to something like 14 days.