Is it possible to prevent fetching of the remote design document in CouchDB?

Update
As @AkshatJiwanSharma suggested, I have tried a few things while replicating locally. Very instructive! I have renamed the question, since the problem is not that the design document gets replicated (in fact it isn't replicated), but that it is fetched via an HTTP GET as part of the initial replication "negotiation" phase.
I've moved the original question to the bottom to make the new question clearer. The new question is:
It seems inefficient (particularly in the case of CouchApps) to fetch the entire design document - i.e. the entire remote app - when initiating a replication with a remote source. Can this be avoided?
It is particularly problematic in our case: slow, high-latency links (under 7.2 Kbps) and relatively large design documents (about 3 MB).
Remote Target
I first tried using a "remote" target, setting the replication target to http://127.0.0.1:5984/emr_replica.
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18947.7>] Document `88fa1b1a1315d27ded663466c6003578` triggered replication `e8e66a554d198b88b6263a572a072fd3+continuous`
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18946.7>] starting new replication `e8e66a554d198b88b6263a572a072fd3+continuous` at <0.18947.7> (`emr_demo` -> `http://127.0.0.1:5984/emr_replica/`)
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18928.7>] 127.0.0.1 - - POST /emr_replica/_revs_diff 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18915.7>] y.y.y.y - - GET /_utils/_sidebar.html 200
[Fri, 08 Aug 2014 08:36:20 GMT] [info] [<0.18916.7>] y.y.y.y - - GET /_replicator/88fa1b1a1315d27ded663466c6003578?revs_info=true 200
In that case the design document doesn't seem to be fetched.
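For reference, the replication document for that test looked roughly like this (the _id comes from the log above; the remaining fields are illustrative and mirror the remote-source document in the next section):
{
    "_id": "88fa1b1a1315d27ded663466c6003578",
    "source": "emr_demo",
    "target": "http://127.0.0.1:5984/emr_replica",
    "continuous": true,
    "filter": "emr/user_data",
    "owner": "jun"
}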
Remote Source
I then set the source as "remote", like this:
{
    "_id": "88fa1b1a1315d27ded663466c6003a4a",
    "_rev": "3-b6408e98acafe729da0153c35d9df113",
    "source": "http://127.0.0.1:5984/emr_demo",
    "target": "emr_replica",
    "continuous": true,
    "filter": "emr/user_data",
    "owner": "jun"
}
This time the server fetches the remote design document before starting the replication (GET /emr_demo/_design/emr 200):
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19687.7>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19686.7>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.19687.7> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - HEAD /emr_demo/ 200
[Fri, 08 Aug 2014 08:42:17 GMT] [info] [<0.19648.7>] 127.0.0.1 - - GET /emr_demo/_design/emr 200
[Fri, 08 Aug 2014 08:42:18 GMT] [info] [<0.19656.7>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c244273fda0e?revs=true&open_revs=%5B%221-ef8967557f2e99eb137f963daccddb3f%22%5D&latest=true 200
Further testing shows that the design document is only fetched once. Subsequent replications (including after restarting the server) only fetch the changes, with the appropriate filter applied:
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.520.0>] Document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.519.0>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` at <0.520.0> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.335.0>] 127.0.0.1 - - GET /emr_demo/_changes?filter=emr%2Fuser_data&feed=continuous&style=all_docs&since=1607&heartbeat=1666 200
[Fri, 08 Aug 2014 09:06:36 GMT] [info] [<0.334.0>] 127.0.0.1 - - GET /emr_demo/5cc2db69a32a84091b96c24427560310?atts_since=%5B%2218-b613d3160bd09c45ac07a5485c9c7bce%22%5D&revs=true&open_revs=%5B%2219-d50438143337a3a0af5ed8ceb75b42f5%22%5D&latest=true 200
Former question
We're trying to use CouchDB replication over a very high-latency link (slow, with frequent disconnections). We want to avoid replicating the design document, which is heavy. We have a filter in place, and when using the following curl command the design document doesn't appear, as expected:
curl http://x.x.x.x:5984/emr/_changes?filter=emr/user_data
Our replication document is:
{
    "_id": "e0e38be8cc0b11356dfb03bc8400074d",
    "_rev": "1-d77117f03d63099e1e505b9f9de3371d",
    "source": "http://x.x.x.x:5984/emr",
    "target": "emr",
    "continuous": true,
    "filter": "emr/user_data",
    "create_target": true,
    "owner": "jun"
}
We have deactivated authentication while we're debugging. When using an existing database and removing create_target, the same problem occurs.
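For quick testing while debugging, the same filtered replication can also be kicked off as a one-off POST to the _replicate endpoint instead of a _replicator document (the host and options here simply mirror the document above):
curl -X POST http://127.0.0.1:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://x.x.x.x:5984/emr", "target": "emr", "filter": "emr/user_data", "create_target": true}'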
The source server outputs the following:
[Mon, 10 Mar 2014 21:22:03 GMT] [info] [<0.135.0>] Retrying HEAD request to http://x.x.x.x:5984/emr/ in 0.25 seconds due to error {conn_failed,{error,etimedout}}
[Mon, 10 Mar 2014 21:23:47 GMT] [info] [<0.135.0>] Retrying GET request to http://x.x.x.x:5984/emr/_design/emr in 0.25 seconds due to error req_timedout
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replicator, request GET to "http://x.x.x.x:5984/emr/_design/emr" failed due to error {error,req_timedout}
[Mon, 10 Mar 2014 21:24:14 GMT] [error] [<0.135.0>] Replication manager, error processing document `e0e38be8cc0b11356dfb03bc8400074d`: Couldn't open document `_design/emr` from source database `http://x.x.x.x:5984/emr/`: {'EXIT',{http_request_failed,"GET","http://x.x.x.x:5984/emr/_design/emr",
{error,{error,req_timedout}}}}
When using tcpdump, it's clear that the replication fails because the replication manager attempts to download the heavy design document (http://x.x.x.x:5984/emr/_design/emr).
FYI the replicator's configuration is:
[replicator]
connection_timeout = 5000
db = _replicator
http_connections = 1
max_replication_retry_count = 3
retries_per_request = 1
socket_options = [{keepalive, true}, {nodelay, true}]
ssl_certificate_max_depth = 3
verify_ssl_certificates = false
worker_batch_size = 1
worker_processes = 1
EDIT: The user_data function (which correctly hides the design document when run through curl as above) is:
exports.user_data = function(doc, req) {
    if (doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports") {
        return true;
    }
    return false;
};
Hope someone can help!

Suggestion
Try defining a filter function in another, small, dedicated design document and see if that fixes your problem.
// replicator document:
{
    "_id": "e0e38be8cc0b11356dfb03bc8400074d",
    "_rev": "1-d77117f03d63099e1e505b9f9de3371d",
    "source": "http://x.x.x.x:5984/emr",
    "target": "emr",
    "continuous": true,
    "filter": "small-design-doc/user_data",
    "create_target": true,
    "owner": "jun"
}

// _design/small-design-doc
// -- will be replicated, but is quite small:
{
    "_id": "_design/small-design-doc",
    "_rev": "1-...",
    "filters": {
        "user_data": "function(doc, req) { ... }"
    }
}
Explanation
According to a current snapshot of the source code, it seems the replicator is trying to fetch the design document (_design/emr) from the source database, simply because the filter function is defined there (emr/user_data).
If you specify the filter function in another design document, the replicator should download that document instead before executing the replication. So you cannot entirely avoid downloading a design document, but you can choose which one gets fetched.
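For example, the small design document could be created on the source database with a plain HTTP PUT (the function body below just mirrors the user_data filter from the question; adjust host and names as needed):
curl -X PUT http://x.x.x.x:5984/emr/_design/small-design-doc \
     -H 'Content-Type: application/json' \
     -d '{"filters": {"user_data": "function(doc, req) { return doc.collection == \"visits\" || doc.collection == \"patients\" || doc.collection == \"reports\"; }"}}'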
Great question by the way. And very thoroughly investigated!

Related

mod_pagespeed many "Unrecognized Script" messages in the console

I am running mod_pagespeed (Apache) on my server and it is functioning great, apart from many "info level" messages in the admin console log. The site is currently scoring 94/89 on GTMetrix and I'm at the point of trying to squeeze that last bit of juice from this wonderful module.
These messages refer to unrecognised scripts that mod_pagespeed was attempting to process. In most cases they contain JSON or text/html, e.g.:
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:271] Unrecognized script:'<script type='application/ld+json'></script> 271...295'
[Wed, 29 Jan 2020 13:17:10 GMT] [Info] [16645] [https://www.example.com/comment/:88] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 88...88'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:47] Unrecognized script:'<script type="text/html" id="wpb-modifications"></script> 47...47'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com/comment/:11] Unrecognized script:'<script type='application/ld+json'></script> 11...33'
[Wed, 29 Jan 2020 13:13:01 GMT] [Info] [14352] [https://www.example.com//:11] Unrecognized script:'<script type='application/ld+json' class='yoast-schema-graph yoast-schema-graph--main'></script> 11...11'
Does anyone here know of any way to configure Pagespeed to be able to recognise this format?
I've already fixed some issues with unrecognised image links within some tags:
ModPagespeedUrlValuedAttribute img data-lazy-src image
ModPagespeedUrlValuedAttribute a data-tolb-src image
And those assets are now being rewritten/compressed.
My main problem is that I don't know how to get Pagespeed to recognise these script tags.
Is there a way? (Is it even necessary for these types of script tags?)
Many thanks in advance.

Live streaming on azure with FMLE

I am trying to stream to the new Azure live channel.
I have followed the guide on http://azure.microsoft.com/blog/2014/09/18/azure-media-services-rtmp-support-and-live-encoders/
this is my FMLE setup: http://i.stack.imgur.com/U8rlk.png
but for some reason I am getting disconnected by the Azure service. I am using Flash Media Live Encoder 3.2, and this is what the log looks like:
Thu Oct 09 2014 16:33:30 : Video Encoding Started
Thu Oct 09 2014 16:33:39 : Primary - Network Status: NetConnection.Connect.Closed status
Thu Oct 09 2014 16:33:45 : Primary - Disconnected
Thu Oct 09 2014 16:33:48 : Primary - Re-establishing connection, attempt 1
Thu Oct 09 2014 16:33:48 : Primary - Reconnected
Thu Oct 09 2014 16:33:48 : Primary - Network Command: onBWDone
Thu Oct 09 2014 16:33:48 : Primary - Stream[mystream9] Status: Success
Thu Oct 09 2014 16:33:48 : Primary - Stream[mystream9] Status: NetStream.Publish.Start
Thu Oct 09 2014 16:33:56 : Primary - Network Status: NetConnection.Connect.Closed status
Thu Oct 09 2014 16:33:58 : Video Encoding Stopped
Thu Oct 09 2014 16:33:58 : Session Stopped
Thu Oct 09 2014 16:34:01 : Primary - Disconnected
Thu Oct 09 2014 16:34:01 : Audio source does not support the selected sample rate and/or channels. Re-sampling the audio to desired setting.
Thu Oct 09 2014 16:34:02 : Primary - Re-establishing connection, attempt 1
Thu Oct 09 2014 16:34:02 : Primary - Reconnected
Thu Oct 09 2014 16:34:05 : Primary - Network Status: NetConnection.Connect.Closed status
Thu Oct 09 2014 16:34:05 : Primary - Disconnected
Edit:
Oh well, I decided not to go with Azure live streaming. Reasons are many. It only lets you output MPEG-DASH, which frankly is not ready for the world as of now; there are few players that support it out of the box. Because of the Smooth Streaming mess from Microsoft, we encoder users are forced to limit our encoding options. A lot of encoding software today doesn't have all the options required to stream, such as frame alignment, group-of-pictures settings, etc. It's kind of sad that I can stream with h264 but not x264, which is open source and better.

Couchdb Logging

Due to application requirements, I have an externally accessible CouchDB instance. I would like to see what IP addresses are attempting to authenticate with my database. By checking the couchdb.log file, I can see failed authentication attempts. They look similar to this.
[Mon, 29 Sep 2014 13:43:32 GMT] [info] [<0.28472.7>] 127.0.0.1 - - GET /offline_master/ 401
However, no matter where I connect from, it seems that the IP address that is logged is always 127.0.0.1. Am I misunderstanding how this works? I would really like to see the IP address that is attempting to connect.
The 127.0.0.1 is the address CouchDB is bound to. It's there because you can set up CouchDB to respond differently depending on which host name is being used.
The only way to get the client IP address is by turning the logging level to "debug". You can do this on the configuration page in Futon.
You get records like this (the client IP is on the first line):
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] 'GET' / {1,1} from "192.168.1.52"
Headers: [{'Accept',"*/*"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.30.0"}]
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] OAuth Params: []
[Tue, 30 Sep 2014 00:14:27 GMT] [info] [<0.451.4>] 127.0.0.1 - - GET / 200
Be careful with this. The debug logs are extremely verbose. It doesn't take long to fill up a hard drive.
It is possible to set log levels by module. The module you need to set is couch_httpd. Set the default for the rest to "error" or "fatal".
See: 3.6.2 Per module logging
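For example, in local.ini (a sketch for CouchDB 1.x; the section and option names follow the per-module logging docs referenced above):
[log]
level = error

[log_level_by_module]
couch_httpd = debug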

Custom logging in couch.log from couchapp?

Is it possible to write to the CouchDB server log (the one defined by default.ini or local.ini in [log]) from a couchapp, but from somewhere other than a view?
If that's not possible, maybe there's a workaround which would allow logging successful or unsuccessful authentication attempts in the CouchDB server log? I'd like to process this server side, and would like to avoid logging all httpd activity and grepping for user login patterns, which doesn't seem to be easy or pretty...
Cheers,
Jun
A year later, I find that it is in fact possible to log from views (or lists, or any JavaScript design-doc function) using the log() function: http://docs.couchdb.org/en/1.6.1/query-server/javascript.html#log
log(message)
Log a message to the CouchDB log (at the INFO level).
Arguments:
message – Message to be logged
function(doc){
    log('Processing doc ' + doc['_id']);
    emit(doc['_id'], null);
}
After the map function has run, the following line can be found in CouchDB logs (e.g. at /var/log/couchdb/couch.log):
[Sat, 03 Nov 2012 17:38:02 GMT] [info] [<0.7543.0>] OS Process #Port<0.3289> Log :: Processing doc 8d300b86622d67953d102165dbe99467
Who would have guessed :)
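The same log() helper is also available in the other query-server functions (shows, lists, updates), so a hypothetical update handler could, for example, record who is calling it:
// hypothetical design-doc snippet – an update handler that logs the caller
{
    "updates": {
        "log_caller": "function(doc, req) { log('update called by ' + (req.userCtx ? req.userCtx.name : 'anonymous')); return [null, 'logged']; }"
    }
}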
I'm pretty sure you can't write to couch.log from a view, it's a sandboxed system.
Getting a record of connections to the server is possible though. Here's a dump from my couch.log, with an HTTP error in there:
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.160.0>] Opening index for db: test idx: _design/ivet sig: "f6b64ef8593e23cac644c13b895b7607"
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.121.0>] 127.0.0.1 - - GET /test/_design/ivet/_view/medicationWHP/foobar?include_docs=true 200
[Sat, 13 Sep 2014 08:18:57 GMT] [info] [<0.121.0>] 127.0.0.1 - - GET /test/_design/ivet/_view/medicationWHP/foobar?include_docs=true 500
[Sat, 13 Sep 2014 08:18:57 GMT] [error] [<0.121.0>] httpd 500 error response:
{"error":"json_encode","reason":"{bad_term,{key,null}}"}
[Sat, 13 Sep 2014 08:19:05 GMT] [info] [<0.36.0>] Apache CouchDB has started on http://127.0.0.1:5984/
You can see it has the VERB PATH CODE format for each line, so you can filter that for whatever you need. (Unauthorized is 401) You can also access the log through /_log. Details on that are here:
http://docs.couchdb.org/en/latest/api/server/common.html#log
To get all that information, you'll need to have the log level set to info. You can do this at the config screen in Futon.
To do it server-side, you'd probably need to use Node.js or something like that. Just have it consume the /_log endpoint and filter each line by the HTTP response code.
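A minimal sketch of that idea in Node.js, assuming CouchDB 1.x on localhost with admin credentials (the URL, credentials and polling interval are illustrative):
// poll_log.js -- tail the last chunk of /_log and report 401 responses
var http = require('http');

function checkLog() {
  http.get('http://admin:secret@127.0.0.1:5984/_log?bytes=10000', function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      body.split('\n').forEach(function (line) {
        // access-log lines end with "METHOD PATH STATUS"; keep only the 401s
        if (/ (GET|PUT|POST|DELETE|HEAD|COPY) \S+ 401\s*$/.test(line)) {
          console.log('Failed auth attempt:', line);
        }
      });
    });
  });
}

checkLog();
setInterval(checkLog, 60 * 1000); // poll once a minute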

log4j - DailyRollingFileAppender, override subAppend()

I am new to log4j's DailyRollingFileAppender class. I would like to use it to perform daily rotation of the log file and, at the same time, manually modify the log file every time an event is logged.
For example, I would like to increment the "TOTAL COUNT:" value inside the log file by one each time. How can I go about doing that?
Example of the log content:
07 Oct 2011 16:57:51 [INFO ] - Failed
07 Oct 2011 16:57:51 [WARN ] - Failed
07 Oct 2011 16:57:51 [ERROR] - Successful
07 Oct 2011 16:57:51 [FATAL] - Failed
07 Oct 2011 16:57:52 [DEBUG] - Successful
07 Oct 2011 16:57:52 [INFO ] - Failed
07 Oct 2011 16:57:52 [WARN ] - Failed
07 Oct 2011 16:57:52 [ERROR] - Successful
07 Oct 2011 16:57:52 [FATAL] - Failed
07 Oct 2011 16:57:53 [DEBUG] - Successful
07 Oct 2011 16:57:53 [INFO ] - Failed
07 Oct 2011 16:57:53 [WARN ] - Failed
07 Oct 2011 16:57:53 [ERROR] - Successful
07 Oct 2011 16:57:53 [FATAL] - Failed
07 Oct 2011 16:57:54 [DEBUG] - Successful
TOTAL COUNT: 15
It seems overriding subAppend() in the DailyRollingFileAppender is the way to go. You'll also have to be cautious when calling super.subAppend(), as the WriterAppender implements it like this:
protected void subAppend(LoggingEvent event) {
    this.qw.write(this.layout.format(event));
    // ...
}
and you don't want the layout applied to your "TOTAL COUNT" line.
In my subAppend() I'd copy exactly what DailyRollingFileAppender does, adding:
- line-counting logic, if not present elsewhere,
- a condition checking whether a special line should be printed, then formatting and printing it directly to qw.
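A rough sketch of that shape, assuming log4j 1.2 and that the running total is simply appended on close() rather than rewritten in place on every event (the class and field names are made up for illustration):
import org.apache.log4j.DailyRollingFileAppender;
import org.apache.log4j.Layout;
import org.apache.log4j.spi.LoggingEvent;

public class CountingDailyAppender extends DailyRollingFileAppender {
    private long totalCount = 0;

    @Override
    protected void subAppend(LoggingEvent event) {
        totalCount++;
        super.subAppend(event); // writes the formatted event as usual
    }

    @Override
    public synchronized void close() {
        // write the special line directly to the QuietWriter, bypassing the layout
        if (this.qw != null) {
            this.qw.write("TOTAL COUNT: " + totalCount + Layout.LINE_SEP);
        }
        super.close();
    }
}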
