I am running mod_pagespeed (Apache) on my server and it is functioning great, apart from many "info level" messages in the admin console log. The site is currently scoring 94/89 on GTMetrix and I'm at the point of trying to squeeze that last bit of juice from this wonderful module.
These messages refer to unrecognised scripts that mod_pagespeed was attempting to process. In most cases they contain json or text/html, i.e.:
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:271] Unrecognized script:'<script type='application/ld+json'></script> 271...295'
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:88] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 88...88'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:47] Unrecognized script:'<script type="text/html" id="wpb-modifications"></script> 47...47'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:11] Unrecognized script:'<script type='application/ld+json'></script> 11...33'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com//:11] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 11...11'
Does anyone here know of any way to configure Pagespeed to be able to recognise this format?
I've already fixed some issues with unrecognised image links within some tags:
ModPagespeedUrlValuedAttribute img data-lazy-src image
ModPagespeedUrlValuedAttribute a data-tolb-src image
And those assets are now being rewritten/compressed.
My main problem is that I don't know how to get Pagespeed to recognise these script tags.
Is there a way? (Is it even necessary for these type of script tags?)
Many thanks in advance.
Due to application requirements, I have an externally accessible CouchDB instance. I would like to see what IP addresses are attempting to authenticate with my database. By checking the couchdb.log file, I can see failed authentication attempts. They look similar to this.
[Mon, 29 Sep 2014 13:43:32 GMT] [info] [<0.28472.7>] 127.0.0.1 - - GET
/offline_master/ 401
However, no matter where I connect from, it seems that the IP address that is logged is always 127.0.0.1. Am I mis-understanding how this works? I would really like to see the IP address that is attempting to connect.
The 127.0.0.1 is the address couchDB is bound to. It's there because you can set up couchdb to respond differently depending on what host name is being used.
The only way to get the client ip address is by turning the logging level to "debug". You can do this in the configuration page in futon.
You get records like this (client IP is on 1st line):
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] 'GET' / {1,1} from "192.168.1.52"
Headers: [{'Accept',"*/*"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.30.0"}]
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] OAuth Params: []
[Tue, 30 Sep 2014 00:14:27 GMT] [info] [<0.451.4>] 127.0.0.1 - - GET / 200
Be careful with this. The debug logs are extremely verbose. It doesn't take long to fill up a hard drive.
It is possible to set log levels by module. The module you need to set is couch_httpd. Set the default for the rest to "error" or "fatal".
See: 3.6.2 Per module logging
Update
As #AkshatJiwanSharma suggested I have tried a few things while locally replicating. Very instructive! I have renamed the question since the problem is not that the design document gets replicated, in fact it isn't replicated, but it is fetched via an HTTP GET as part of the initial replication "negotiation" phase.
I've moved the original question to the bottom to make the new question clearer. The new question is:
It seems inefficient (particularly in the case of CouchApps) to fetch the entire design document - i.e. the entire remote app - when initiating a replication with a remote source. Can this be avoided?
It is particularly problematic in our case, on high latency links (less than 7.2Kbps), with relatively large design documents (3MB).
Remote Target
I have first tried by using a "remote" target by setting the replication target to http://127.0.0.1:5984/emr_replica.
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18947.7>] Document `88fa1b1a1315d27ded663466c6003578` triggered replication `e8e66a554d198b88b6263a572a072fd3+continuous`
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18946.7>] starting new replication `e8e66a554d198b88b6263a572a072fd3+continuous` at <0.18947.7> (`emr_demo` -> `http://127.0.0.1:5984/emr_replica/`)
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18928.7>] 127.0.0.1 - - POST /emr_replica/_revs_diff 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18915.7>] y.y.y.y - - GET /_utils/_sidebar.html 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18916.7>] y.y.y.y - - GET /_replicator/88fa1b1a1315d27ded663466c6003578?revs_info=true 200
In that case the design document doesn't seem to be fetched.
Remote Source
Then setting the source as "remote" like this
{
"_id": "88fa1b1a1315d27ded663466c6003a4a",
"_rev": "3-b6408e98acafe729da0153c35d9df113",
"source": "http://127.0.0.1:5984/emr_demo",
"target": "emr_replica",
"continuous": true,
"filter": "emr/user_data",
"owner": "jun"
}
Then the server fetches the remote design document before starting the replication (GET /emr_demo/_design/emr 200).
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19687.7>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19686.7>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.19687.7> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - HEAD /emr_demo/ 200
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - GET /emr_demo/_design/emr 200
[Fri, 08 Aug 2014 08:42:18 GMT] [info] [<0.19656.7>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c244273fda0e?revs=true&open_revs=%5B%221-ef8967557f2e99eb137f963daccddb3f%22%5D&latest=true 200
Further testing shows that this fetching of the design document is only done once. Further replications (including after restarting the server) only fetch the changes with the appropriate filter:
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.520.0>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.519.0>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.520.0> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.335.0>] 127.0.0.1 - - GET /emr_demo/_changes?filter=emr%2Fuser_data&feed=continuous&style=all_docs&since=1607&heartbeat=1666 200
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.334.0>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c24427560310?atts_since=%5B%2218-b613d3160bd09c45ac07a5485c9c7bce%22%5D&revs=true&open_revs=%5B%2219-d50438143337a3a0af5ed8ceb75b42f5%22%5D&latest=true 200
Former question
We're trying to use the couchdb replication over a very high latency link (slow, frequent disconnections,...). We want to avoid to replicate the design document which is heavy. We have a filter in place and when using the following curl command, the design document doesn't appear, as expected:
curl http://x.x.x.x:5984/emr/_changes?filter=emr/user_data
Our replication document is:
{
"_id": "e0e38be8cc0b11356dfb03bc8400074d",
"_rev": "1-d77117f03d63099e1e505b9f9de3371d",
"source": "http://x.x.x.x:5984/emr",
"target": "emr",
"continuous": true,
"filter": "emr/user_data",
"create_target": true,
"owner": "jun"
}
We have deactivated authentication while we're debugging. When using an existing database and removing create_target, the same problem occurs.
The source server outputs the following:
[Mon, 10 Mar 2014 21:22:03 GMT] [info] [<0.135.0>] Retrying HEAD request to http://x.x.x.x:5984/emr/ in 0.25 seconds due to error {conn_failed,{error,etimedout}}
[Mon, 10 Mar 2014 21:23:47 GMT] [info] [<0.135.0>] Retrying GET request to http://x.x.x.x:5984/emr/_design/emr in 0.25 seconds due to error req_timedout
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replicator, request GET to "http://x.x.x.x:5984/emr/_design/emr" failed due to error {error,req_timedout}
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replication manager, error processing document `e0e38be8cc0b11356dfb03bc8400074d`: Couldn't open document `_design/emr` from source database `http://x.x.x.x:5984/emr/`: {'EXIT',{http_request_failed,"GET","http://x.x.x.x:5984/emr/_design/emr",
{error,{error,req_timedout}}}}
When using tcpdump, it's clear that the replication fails because the replication manager attempts to download the heavy design document (http://x.x.x.x:5984/emr/_design/emr).
FYI the replicator's configuration is:
replicator connection_timeout 5000
db _replicator
http_connections 1
max_replication_retry_count 3
retries_per_request 1
socket_options [{keepalive, true}, {nodelay, true}]
ssl_certificate_max_depth 3
verify_ssl_certificates false
worker_batch_size 1
worker_processes 1
EDIT: The user_data function (which correctly hides the design document when ran through curl as above) is :
exports.user_data = function(doc, req) {
if (doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports") {
return true;
}
return false;
}
Hope someone can help!
Suggestion
Try defining a filter function in another, small, dedicated design document and see if that fixes your problem.
// replicator document:
{
"_id": "e0e38be8cc0b11356dfb03bc8400074d",
"_rev": "1-d77117f03d63099e1e505b9f9de3371d",
"source": "http://x.x.x.x:5984/emr",
"target": "emr",
"continuous": true,
"filter": "small-design-doc/user_data",
"create_target": true,
"owner": "jun"
}
// _design/small-design-doc
// -- will be replicated, but is quite small:
{
"_id": "_design/small-design-doc",
"_rev": "1-...",
"filters": {
"user_data": "function(doc, req) { ... }"
}
}
Explanation
According to a current snapshot of the source code, it seems the replicator is trying to fetch the design document (_design/emr) from the source database, simply because the filter function is defined there (emr/user_data).
If you specify a filter function in another design document, the replicator should try to download that very document before executing replication. So you cannot quite circumvent downloading any design document, but you are able to select which one.
Great question by the way. And very thoroughly investigated!
Whenever I throw a command at hubot in a room i.e #hubot help I get nothing, but typing help in 1-1 chat works fine, am I missing something? has anyone else had this problem??
i have followed the setup instructions to the tee, and it still wont work
Did you end up specifying a host? I was having a similar problem, and removing the hostname fixed it. See the GH issue here:
https://github.com/github/hubot/issues/651
In my case, I could see in the logs that Hubot was receiving the commands, and even replying with the correct response, but the response was never showing up in the channel:
In the chat room (no visible in-room response):
#hubot ping
[Mon Feb 24 2014 02:04:21 GMT+0000 (UTC)] DEBUG Message '[object Object]' matched regex //^[#]?hubot[:,]?\s*(?:PING$)/i/
[Mon Feb 24 2014 02:04:21 GMT+0000 (UTC)] DEBUG OUT >
<message to="XXXXXX_hubot_test#conf.hipchat.com" type="chat" from="XXXXXX_XXXXXX#chat.hipchat.com/hubot-hipchat">
<inactive xmlns="http://jabber/protocol/chatstates"/>
<body>PONG</body>
</message>
In 1:1 messages:
[Mon Feb 24 2014 02:06:01 GMT+0000 (UTC)] DEBUG Message '[object Object]' matched regex //^[#]?hubot[:,]?\s*(?:PING$)/i/
[Mon Feb 24 2014 02:06:01 GMT+0000 (UTC)] DEBUG OUT >
<message to="XXXXX_188883#chat.hipchat.com" type="chat" from="XXXXXXXX#chat.hipchat.com/hubot-hipchat">
<inactive xmlns="http://jabber/protocol/chatstates"/>
<body>PONG</body>
</message>
I think the issue is that it uses chat.hipchat.com for 1:1 chat, and conf.hipchat.com for rooms. If you specified a specific hostname, you'll get one or the other, but not both.
If you set the environmental variables via CLI, to unset it, do
unset HUBOT_HIPCHAT_HOST.
Have you only tried: #hubot help? You can set up your prefix via: Here It does say that the
For example, the HipChat Adapter converts #hubot into hubot: before passing it to Hubot.
But I'd go ahead and try the following. Also attempt to try it locally via bin/hubot running it via the shell.
Hubot help
hubot help
Not to mention check heroku logs to make sure that hubot is showing up in your hipchat channel correctly.
Hope this helps.
I don't want to see a log for every request the server receives when I'm testing (it makes reading the results much harder). Is there a simple way to start up Node so that it doesn't do that?
I'm referring the the lines that look like this just to be perfectly clear:
127.0.0.1 - - [Mon, 07 Jan 2013 15:59:52 GMT] "GET / HTTP/1.1" 200 1039 "-" "Mozilla/5.0 Chrome/10.0.613.0 Safari/534.15 Zombie.js/1.4.1"
NodeJS does not do this automatically.
Assuming you are using express, you need to remove the logger middleware. Remove this line:
app.use(express.logger());