mod_pagespeed many "Unrecognized Script" messages in the console - mod-pagespeed

I am running mod_pagespeed (Apache) on my server and it is functioning great, apart from many "info level" messages in the admin console log. The site is currently scoring 94/89 on GTMetrix and I'm at the point of trying to squeeze that last bit of juice from this wonderful module.
These messages refer to unrecognised scripts that mod_pagespeed was attempting to process. In most cases they contain json or text/html, i.e.:
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:271] Unrecognized script:'<script type='application/ld+json'></script> 271...295'
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:88] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 88...88'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:47] Unrecognized script:'<script type="text/html" id="wpb-modifications"></script> 47...47'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:11] Unrecognized script:'<script type='application/ld+json'></script> 11...33'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com//:11] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 11...11'
Does anyone here know of any way to configure Pagespeed to be able to recognise this format?
I've already fixed some issues with unrecognised image links within some tags:
ModPagespeedUrlValuedAttribute img data-lazy-src image
ModPagespeedUrlValuedAttribute a data-tolb-src image
And those assets are now being rewritten/compressed.
My main problem is that I don't know how to get Pagespeed to recognise these script tags.
Is there a way? (Is it even necessary for these type of script tags?)
Many thanks in advance.

Related

Remixd Error when Hooking up shared folder to Remix

The first type of error occurs when I run the listener on my Windows 10 using a terminal (powershell). The listener (remixd) starts OK then when I go to connect to the browser session I get the following:
PS C:\Windows\System32> remixd -s D:\zz210201_shared_folder --remix-ide https://remix.ethereum.org
[WARN] You may now only use IDE at https://remix.ethereum.org to connect to that instance
[WARN] Any application that runs on your computer can potentially read from and write to all files in the directory.
[WARN] Symbolic links are not forwarded to Remix IDE
Wed Feb 03 2021 18:29:11 GMT+0700 (Indochina Time) Remixd is listening on 127.0.0.1:65520
setup notifications for D:\zz210201_shared_folder
Error: Error: EPERM: operation not permitted, lstat 'D:\System Volume Information'
When I use the desktop version I get the following errors when trying to make the connection.
Wed Feb 03 2021 18:13:57 GMT+0700 (Indochina Time) Remixd is listening on 127.0.0.1:65520
Wed Feb 03 2021 18:14:14 GMT+0700 (Indochina Time) Connection from origin package://6fd22d6fe5549ad4c4d8fd3ca0b7816b.mod rejected.
Wed Feb 03 2021 18:14:15 GMT+0700 (Indochina Time) Connection from origin package://6fd22d6fe5549ad4c4d8fd3ca0b7816b.mod rejected.
Wed Feb 03 2021 18:14:29 GMT+0700 (Indochina Time) Connection from origin package://6fd22d6fe5549ad4c4d8fd3ca0b7816b.mod rejected.
Wed Feb 03 2021 18:14:30 GMT+0700 (Indochina Time) Connection from origin package://6fd22d6fe5549ad4c4d8fd3ca0b7816b.mod rejected.
Wed Feb 03 2021 18:18:31 GMT+0700 (Indochina Time) Connection from origin https://remix.ethereum.org rejected.
Wed Feb 03 2021 18:18:32 GMT+0700 (Indochina Time) Connection from origin https://remix.ethereum.org rejected.
Not sure how to proceed to get it working. Any clues appreciated.
Fix Remixd EPERM error
npm uninstall -g remixd
npm install -g #remix-project/remixd --force

axe-crawler selenium webdriver chrome error

I am running axe-crawler from the command line using node and getting the following error
*ERROR: Thu Jul 23 2020 11:14:32 GMT+0100 (British Summer Time)
Error encountered in using Selenium Webdriver:
ERROR: Thu Jul 23 2020 11:14:32 GMT+0100 (British Summer Time)
[object Object]*
Not really very descriptive. Can anyone point me in the right direction to either be able to get the proper error or even point out where I might be going wrong?
As there did not seem to be a way of accessing the errors, I ended up using a C#/node hybrid, more from a sensible debugging need. I scraped the site with dotnet HtmlAgilityPack and then ran axe-core through node/javascript to produce the json response.

Couchdb Logging

Due to application requirements, I have an externally accessible CouchDB instance. I would like to see what IP addresses are attempting to authenticate with my database. By checking the couchdb.log file, I can see failed authentication attempts. They look similar to this.
[Mon, 29 Sep 2014 13:43:32 GMT] [info] [<0.28472.7>] 127.0.0.1 - - GET
/offline_master/ 401
However, no matter where I connect from, it seems that the IP address that is logged is always 127.0.0.1. Am I mis-understanding how this works? I would really like to see the IP address that is attempting to connect.
The 127.0.0.1 is the address couchDB is bound to. It's there because you can set up couchdb to respond differently depending on what host name is being used.
The only way to get the client ip address is by turning the logging level to "debug". You can do this in the configuration page in futon.
You get records like this (client IP is on 1st line):
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] 'GET' / {1,1} from "192.168.1.52"
Headers: [{'Accept',"*/*"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.30.0"}]
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] OAuth Params: []
[Tue, 30 Sep 2014 00:14:27 GMT] [info] [<0.451.4>] 127.0.0.1 - - GET / 200
Be careful with this. The debug logs are extremely verbose. It doesn't take long to fill up a hard drive.
It is possible to set log levels by module. The module you need to set is couch_httpd. Set the default for the rest to "error" or "fatal".
See: 3.6.2 Per module logging

Custom logging in couch.log from couchapp?

Is it possible to write in the couchdb server log (the one defined by default.ini or local.ini in [log]) from a couchapp? (But from somewhere else than a view)
If that's not possible, maybe there's a workaround which would allow to log successful or unsuccessful authentication attemps in the couchdb server log? I'd like to process this server side and would like to avoid logging all httpd activity and grepping for user logging patterns, which doesn't seem to be easy or pretty...
Cheers,
Jun
A year later I find that it was in fact possible to log from views (or lists or any Javascript Design Doc functions) using the log() function: http://docs.couchdb.org/en/1.6.1/query-server/javascript.html#log
log(message)
Log a message to the CouchDB log (at the INFO level).
Arguments:
message – Message to be logged
function(doc){
log('Procesing doc ' + doc['_id']);
emit(doc['_id'], null);
}
After the map function has run, the following line can be found in CouchDB logs (e.g. at /var/log/couchdb/couch.log):
[Sat, 03 Nov 2012 17:38:02 GMT] [info] [<0.7543.0>] OS Process #Port<0.3289> Log :: Processing doc 8d300b86622d67953d102165dbe99467
Who would have guessed :)
I'm pretty sure you can't write to couch.log from a view, it's a sandboxed system.
Getting a record of connections to the server is possible though. Here's a dump from my couch.log, with an HTTP error in there:
/
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.160.0>] Opening index for db: test idx: _design/ivet sig: "f6b64ef8593e23cac644c13b895b7607"
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.121.0>] 127.0.0.1 - - GET /test/_design/ivet/_view/medicationWHP/foobar?include_docs=true 200
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.121.0>] 127.0.0.1 - - GET /test/_design/ivet/_view/medicationWHP/foobar?include_docs=true 500
[Sat, 13 Sep 2014 08:18:57 GMT] [error] [<0.121.0>] httpd 500 error response:
{"error":"json_encode","reason":"{bad_term,{key,null}}"}
[Sat, 13 Sep 2014 08:19:05 GMT] [info] [<0.36.0>] Apache CouchDB has started on http://127.0.0.1:5984/
You can see it has the VERB PATH CODE format for each line, so you can filter that for whatever you need. (Unauthorized is 401) You can also access the log through /_log. Details on that are here:
http://docs.couchdb.org/en/latest/api/server/common.html#log
To get all that information, you'll need to have the log level set to info. You can do this at the config screen in futon.
To do it server-side, you'd probably need to use node.js or something like that. Just have it consume the /_log endpoint, and filter each line by the HTTP response code.

Is it possible to prevent fetching of remote design document in couchdb

Update
As #AkshatJiwanSharma suggested I have tried a few things while locally replicating. Very instructive! I have renamed the question since the problem is not that the design document gets replicated, in fact it isn't replicated, but it is fetched via an HTTP GET as part of the initial replication "negotiation" phase.
I've moved the original question to the bottom to make the new question clearer. The new question is:
It seems inefficient (particularly in the case of CouchApps) to fetch the entire design document - i.e. the entire remote app - when initiating a replication with a remote source. Can this be avoided?
It is particularly problematic in our case, on high latency links (less than 7.2Kbps), with relatively large design documents (3MB).
Remote Target
I have first tried by using a "remote" target by setting the replication target to http://127.0.0.1:5984/emr_replica.
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18947.7>] Document `88fa1b1a1315d27ded663466c6003578` triggered replication `e8e66a554d198b88b6263a572a072fd3+continuous`
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18946.7>] starting new replication `e8e66a554d198b88b6263a572a072fd3+continuous` at <0.18947.7> (`emr_demo` -> `http://127.0.0.1:5984/emr_replica/`)
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18928.7>] 127.0.0.1 - - POST /emr_replica/_revs_diff 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18915.7>] y.y.y.y - - GET /_utils/_sidebar.html 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18916.7>] y.y.y.y - - GET /_replicator/88fa1b1a1315d27ded663466c6003578?revs_info=true 200
In that case the design document doesn't seem to be fetched.
Remote Source
Then setting the source as "remote" like this
{
"_id": "88fa1b1a1315d27ded663466c6003a4a",
"_rev": "3-b6408e98acafe729da0153c35d9df113",
"source": "http://127.0.0.1:5984/emr_demo",
"target": "emr_replica",
"continuous": true,
"filter": "emr/user_data",
"owner": "jun"
}
Then the server fetches the remote design document before starting the replication (GET /emr_demo/_design/emr 200).
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19687.7>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19686.7>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.19687.7> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - HEAD /emr_demo/ 200
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - GET /emr_demo/_design/emr 200
[Fri, 08 Aug 2014 08:42:18 GMT] [info] [<0.19656.7>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c244273fda0e?revs=true&open_revs=%5B%221-ef8967557f2e99eb137f963daccddb3f%22%5D&latest=true 200
Further testing shows that this fetching of the design document is only done once. Further replications (including after restarting the server) only fetch the changes with the appropriate filter:
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.520.0>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.519.0>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.520.0> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.335.0>] 127.0.0.1 - - GET /emr_demo/_changes?filter=emr%2Fuser_data&feed=continuous&style=all_docs&since=1607&heartbeat=1666 200
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.334.0>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c24427560310?atts_since=%5B%2218-b613d3160bd09c45ac07a5485c9c7bce%22%5D&revs=true&open_revs=%5B%2219-d50438143337a3a0af5ed8ceb75b42f5%22%5D&latest=true 200
Former question
We're trying to use the couchdb replication over a very high latency link (slow, frequent disconnections,...). We want to avoid to replicate the design document which is heavy. We have a filter in place and when using the following curl command, the design document doesn't appear, as expected:
curl http://x.x.x.x:5984/emr/_changes?filter=emr/user_data
Our replication document is:
{
"_id": "e0e38be8cc0b11356dfb03bc8400074d",
"_rev": "1-d77117f03d63099e1e505b9f9de3371d",
"source": "http://x.x.x.x:5984/emr",
"target": "emr",
"continuous": true,
"filter": "emr/user_data",
"create_target": true,
"owner": "jun"
}
We have deactivated authentication while we're debugging. When using an existing database and removing create_target, the same problem occurs.
The source server outputs the following:
[Mon, 10 Mar 2014 21:22:03 GMT] [info] [<0.135.0>] Retrying HEAD request to http://x.x.x.x:5984/emr/ in 0.25 seconds due to error {conn_failed,{error,etimedout}}
[Mon, 10 Mar 2014 21:23:47 GMT] [info] [<0.135.0>] Retrying GET request to http://x.x.x.x:5984/emr/_design/emr in 0.25 seconds due to error req_timedout
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replicator, request GET to "http://x.x.x.x:5984/emr/_design/emr" failed due to error {error,req_timedout}
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replication manager, error processing document `e0e38be8cc0b11356dfb03bc8400074d`: Couldn't open document `_design/emr` from source database `http://x.x.x.x:5984/emr/`: {'EXIT',{http_request_failed,"GET","http://x.x.x.x:5984/emr/_design/emr",
{error,{error,req_timedout}}}}
When using tcpdump, it's clear that the replication fails because the replication manager attempts to download the heavy design document (http://x.x.x.x:5984/emr/_design/emr).
FYI the replicator's configuration is:
replicator connection_timeout 5000
db _replicator
http_connections 1
max_replication_retry_count 3
retries_per_request 1
socket_options [{keepalive, true}, {nodelay, true}]
ssl_certificate_max_depth 3
verify_ssl_certificates false
worker_batch_size 1
worker_processes 1
EDIT: The user_data function (which correctly hides the design document when ran through curl as above) is :
exports.user_data = function(doc, req) {
if (doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports") {
return true;
}
return false;
}
Hope someone can help!
Suggestion
Try defining a filter function in another, small, dedicated design document and see if that fixes your problem.
// replicator document:
{
"_id": "e0e38be8cc0b11356dfb03bc8400074d",
"_rev": "1-d77117f03d63099e1e505b9f9de3371d",
"source": "http://x.x.x.x:5984/emr",
"target": "emr",
"continuous": true,
"filter": "small-design-doc/user_data",
"create_target": true,
"owner": "jun"
}
// _design/small-design-doc
// -- will be replicated, but is quite small:
{
"_id": "_design/small-design-doc",
"_rev": "1-...",
"filters": {
"user_data": "function(doc, req) { ... }"
}
}
Explanation
According to a current snapshot of the source code, it seems the replicator is trying to fetch the design document (_design/emr) from the source database, simply because the filter function is defined there (emr/user_data).
If you specify a filter function in another design document, the replicator should try to download that very document before executing replication. So you cannot quite circumvent downloading any design document, but you are able to select which one.
Great question by the way. And very thoroughly investigated!

Resources