CouchDB log() does not work in map or reduce function? - couchdb

The docs say there is a log method to write a message to the log file at INFO level.
I tried it, but it does not work. (CouchDB 1.6.1)
First I start monitoring the log file:
tail -f couch.log
I can see the log file being appended to, and other INFO messages appear, like:
[Tue, 06 Jan 2015 08:16:10 GMT] [info] [<0.321.0>] 192.168.1.43 - - GET /test/ 200
[Tue, 06 Jan 2015 08:16:10 GMT] [info] [<0.323.0>] 192.168.1.45 - - GET /test/ 200
I tried log in views (both temporary and permanent views), but the message never appears while the other INFO messages keep being appended. The view responds correctly. I also tried adding new documents and then triggering the view, still nothing.
function(doc) {
  log('LOG NEVER APPEARS');
  emit(null, doc);
}
Does anybody know what the reason is?

What probably happened is that you created the view and the log function wrote its message at that time. By the time you tailed the log, the index had already been built and the view code was not run again. Try adding some data and calling the view again, or change the view, save it, and call the view again; you should then see the logged message.
This happens because the view code is only run when the underlying data in CouchDB changes. Once a view is created, its index grows by appending new data to the existing index. If the data has not changed, the code inside the view will not be run.
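In other words, to see the log() output you have to make CouchDB run the map function again, for example by inserting a document and then querying the view. A minimal sketch with Node.js (assuming Node 18+ with a global fetch, a database named test, and the map function above saved as the view by_doc in a design document _design/demo; adjust the names to your setup):

// Sketch: insert a new document, then query the view so that the map
// function (and its log() call) runs for the new document.
async function triggerView() {
  // 1. Add a document; this is what makes the view index stale.
  await fetch('http://127.0.0.1:5984/test', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ created_at: new Date().toISOString() })
  });

  // 2. Query the view; CouchDB updates the index, runs the map function,
  //    and the 'LOG NEVER APPEARS' line should now show up in couch.log.
  const res = await fetch('http://127.0.0.1:5984/test/_design/demo/_view/by_doc');
  console.log(await res.json());
}

triggerView();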

Related

Assignment files do not get deleted via course reset after cron execution in Moodle

When I try to reset the assignments of a course, all data appears to be deleted on the front end. I tested this with a single file I uploaded myself to a test assignment. But when checking disk usage with
du moodledata/filedir
the same usage remains. I made sure the cron task was executed, which printed
...
Cron script completed correctly
Cron completed at 17:40:03. Memory used 32.8MB.
Execution took 0.810698 seconds
The files are also not in moodledata/trashdir, which is probably the reason why the cron task does not clean them up.
Removing the file with
moosh file-hash-delete <hash>
seemed to work. I identified the hash by comparing disk usage before and after the upload and checking which hash in the folder accounted for the size of the file I had uploaded.
The hash was not in the mdl_files table in MySQL, but the draft of it was. I found this out via
moosh file-check
and I also checked it with phpMyAdmin, which listed the draft of the file alongside other files.
Logs for resetting the course show the following:
Core System, course reset finished, The reset of the course with id '4' has ended.
Core System, deadline updated, The user with id '2' updated the event 'test ist zur Bewertung fällig.' with id '4'.
Core System, deadline updated, The user with id '2' updated the event 'test ist fällig.' with id '3'.
Core System, course reset begin, The user with id '2' started the reset of the course with id '4'.
(note that I translated some of the messages, because my setup is in German).
Unfortunately I have to run this Moodle instance on a hoster with extremely little disk storage (hence the backup/deletion requirement).
Some background info:
Moodle - version 3.8.2+ stable, dbtype set to mariadb
MariaDB - version 10.3.19
Machine: CentOS Linux 7
UPDATE: It seems that after some days (I checked today, ~4 days later) the files have been deleted. I don't know why this happened only after so many days, even though I manually triggered the cron job (which apparently does not delete the files by itself). It would be nice to know where this timer is set and which script finally deletes the files.
On the course reset page, if you scroll down, there is a drop-down for Assignments.
Did you check the box for Delete all submissions?
In the code, $data->reset_assign_submissions will delete the files:
public function reset_userdata($data) {
    global $CFG, $DB;
    $componentstr = get_string('modulenameplural', 'assign');
    $status = array();
    $fs = get_file_storage();
    if (!empty($data->reset_assign_submissions)) {
        // ...

Proper way to configure Deletion Bolt for Stormcrawler

So I'm trying to turn on the Deletion Bolt on my StormCrawler instances so they can clean up the indexes as the URLs for our sites change and pages go away.
For reference, I am on 1.13. (Our systems people have not upgraded us to ELK v7 yet.)
Having never attempted to modify es-crawler.flux before, I'm looking for some help to let me know whether I am doing this correctly.
I added a bolt:
- id: "deleter"
className: "com.digitalpebble.stormcrawler.elasticsearch.bolt.DeletionBolt"
parallelism: 1
and then added the stream:
- from: "status"
to: "deleter"
grouping:
type: FIELDS
args: ["url"]
streamId: "deletion"
Is that the correct way to do this? I don't want to accidentally delete everything in my index by putting in the wrong info. 🤣
Yes, to answer my own question, adding the two above items to their respective places in the es-crawler.flux DOES in fact cause the crawler to delete docs.
In order to test this, I created a directory on one of our servers with a few files in it: index.html, test1.html, test2.html, and test3.html. index.html had links to the three test HTML files. I crawled them, having first limited the crawler to ONLY crawl that specific directory. I also modified the fetch settings to re-crawl fetched docs after 3 minutes and to re-crawl fetch-error docs after 5 minutes.
All 4 docs showed up in the status index as FETCHED and the content in the content index.
I then renamed test3.html to test4.html and changed the link in index.html. The crawler picked up the change, changed the status of test3.html to FETCH_ERROR, and added test4.html to the indexes.
After 5 minutes it crawled test3.html again, keeping the FETCH_ERROR status.
After another 5 minutes, it crawled it again, changing the status to ERROR and deleting the test3.html doc from the content index.
So that worked great. In our production indexes, we have a bunch of docs that have gone from FETCH_ERROR status to ERROR status, but because deletions were not enabled, the actual content was not deleted and is still showing up in searches. Using my test pages, here's how I dealt with that:
I disabled deletions (removing the two items above from es-crawler.flux), renamed test2.html to test5.html, and modified the link in index.html. The crawler went through the three crawls with FETCH_ERROR and set the status to ERROR, but did not delete the doc from the content index.
I re-enabled deletions and let the crawler run for a while, but soon realized that when the crawler sets the status to ERROR, it also sets nextFetchDate to 12/31/2099.
So I went into the Elasticsearch index and ran the following query to reset the status and the date to something just ahead of the current date/time:
POST /www-test-status/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source?.status != null) {
        ctx._source.remove('metadata.error%2Ecause');
        ctx._source.remove('status');
        ctx._source.put('status', 'FETCH_ERROR');
        ctx._source.remove('nextFetchDate');
        ctx._source.put('nextFetchDate', '2019-10-09T15:01:33.000Z');
      }
    """,
    "lang": "painless"
  },
  "query": {
    "match": {
      "status": "ERROR"
    }
  }
}
The crawler then picked up the docs the next time it came around and deleted them from the content index when they went back to ERROR status.
Not sure if that's the completely proper way to do it, but it has worked for me.

Where to see the log of console.log in CouchDB?

With CouchDB's _utils (the Fauxton utility) I created a view as follows:
function (doc) {
  console.log(doc.biodatas.length);
  if (doc.biodatas.length > 0) {
    for (var idx in doc.biodatas) {
      console.log(idx);
      emit(idx.Name, idx.Surname, idx.City, idx.Phone);
    }
  }
}
I created this view to display the Name, Surname, City and Phone info of the documents as rows, but I did not see any result; "no document found" was all I got, even though I have inserted data into the document. To debug this and find out what is wrong, I added console.log calls in the code, but nothing is displayed in the browser's developer console. I also opened the couchdb.log file under CouchDB\var\log, but there are no such messages, just the server's request status records, and the file is really too big to search through for logs. Is there another way to store the log records of a specific view in a different file?
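For reference, inside a CouchDB map function the logging helper is log() (it writes to couch.log at INFO level) rather than console.log, and emit() takes exactly one key and one value. A corrected sketch of the view above, assuming each entry of doc.biodatas is an object with Name, Surname, City and Phone fields:

function (doc) {
  if (doc.biodatas && doc.biodatas.length > 0) {
    // This shows up in couch.log, not in the browser console.
    log('biodatas length: ' + doc.biodatas.length);
    for (var i = 0; i < doc.biodatas.length; i++) {
      var b = doc.biodatas[i];
      // One key (the name) and one value (the remaining fields as an object).
      emit(b.Name, { surname: b.Surname, city: b.City, phone: b.Phone });
    }
  }
}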

How to manage multiline events based on a random field Logstash

I've been facing a problem related to multi-line events lately, and I need a bit of your help with this. My syslog server is sending multi-line events: a single event gathers several lines, and the indicator that a particular line belongs to a multi-line event is a random number that identifies a user connection session. Here is a custom generated log file:
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490010:5: 1ec2b273:Username 'cjones'
Feb 16 17:29:04 slot1/APM-LTM warning apd[5515]: 01490106:4: 1ec2b273: AD module: authentication with 'cjones' failed: Preauthentication failed, principal name: cjones#GEEKO.COM. Invalid user credentials. (-1765328360)
Feb 16 17:10:04 slot1/APM-LTM notice apd[5515]: 01490010:5: d8b5a591: Username 'gbridget'
Feb 16 17:10:04 slot1/APM-LTM err apd[5515]: 01490107:3: d8b5a591: AD module: authentication with 'gbridget' failed: Clients credentials have been revoked, principal name: gbridget#GEEKO.COM. User account is locked (-1765328366)
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490005:5: 1ec2b273: Following rule 'fallback' from item 'AD Auth' to ending 'Deny'
Feb 16 17:29:04 slot1/APM-LTM notice apd[5515]: 01490102:5: 1ec2b273: Access policy result: Logon_Deny
Above are the lines related to two different connections, identified by the following user sessions: d8b5a591 (user gbridget) and 1ec2b273 (user cjones). The user sessions are the only indicators that tie those lines to two different events, and the event lines are interleaved.
The problem is that I am at a loss as to how to explain the above to the grok filter with the multiline plugin, given that the latter offers only a few options. In particular, the notion of a "previous" and "next" line cannot be applied here, so the multiline options "pattern" and "what" cannot be used, since the related events are not necessarily consecutive.
I would really appreciate it if someone could shed some light on this and tell me whether it is at least feasible or not.
I don't see those as multi-line events, but as related events. I would load them into Elasticsearch as 6 different documents and then query as needed. If there are specific queries you're trying to perform against this data, you might ask questions about how to perform them across multiple documents.
One alternative would be to use the session id as the document id; then you could update the initial document when new information came in. Using your own document ids is generally not recommended (for performance reasons, IIRC), and updating a document involves deleting the old one and inserting a new one, which is also not good for performance.
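For example, once the lines are indexed as separate documents, pulling everything that belongs to one session is a single query on the session field. A minimal sketch in Node.js (assuming Node 18+ with a global fetch, an index named syslog-*, and that the session id was extracted by the grok filter into a field called session_id; these names are placeholders for illustration):

// Sketch: fetch all log events that belong to one user connection session.
async function eventsForSession(sessionId) {
  const res = await fetch('http://localhost:9200/syslog-*/_search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      size: 100,
      sort: [{ '@timestamp': 'asc' }],
      query: { match: { session_id: sessionId } }
    })
  });
  const body = await res.json();
  return body.hits.hits.map(function (hit) { return hit._source; });
}

// Usage: all lines for the cjones connection from the example above.
eventsForSession('1ec2b273').then(function (events) {
  events.forEach(function (e) { console.log(e.message); });
});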

CouchDB: bulk_docs returning incorrect status code

I'm working on syncing a PouchDB database (with Angular) with a CouchDB database.
When replication is in progress, the code issues a POST request for a bulk update to http://127.0.0.1:5984/testdb/_bulk_docs.
I have a validation rule on the database to reject unauthorized writes, and it generates a forbidden error. Therefore, the server responds with JSON such as [{"id":"0951db944e729c981ad3964c22002d55","rev":"8-ccdcb52743cae43c5870113f09f2e25a","error":"forbidden","reason":"Not Authorized"}]
According to the docs (at the end of the page), the above response should generate a 417 Expectation Failed status code. However, it currently generates a 201 Created status code.
Because of the incorrect response code, the client (PouchDB) reports all records as synced, but the updates are not written to the server (CouchDB).
Is there a config option to change this status code?
For reference, my validate_doc_update function is as follows.
function(newDoc, oldDoc, userCtx) {
  if (!userCtx) throw({forbidden: 'Need a user to update'});
  if ((userCtx.roles.indexOf('_admin') == -1) && (userCtx.roles.indexOf('backend:manager') == -1)) {
    throw({forbidden: "Not Authorized"});
  }
}
The 417 Expectation Failed status code is only returned when the all_or_nothing parameter is set to true. By default this parameter is false.
The default bulk update transaction mode in CouchDB is non-atomic, which only guarantees that some of the documents will be saved. If a document is not saved, the API returns an error object like the one you got, along with the documents that were in fact saved successfully. So 201 seems to be the correct response.
You then have to walk through the response to find which documents failed and update them manually.
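As a minimal sketch of that (assuming Node 18+ with a global fetch and a CouchDB database at http://127.0.0.1:5984/testdb; the names are placeholders), each entry in the _bulk_docs response is either {id, rev} on success or {id, error, reason} on failure:

// Sketch: post a bulk update and collect the documents that were rejected.
async function bulkSaveAndReport(docs) {
  const res = await fetch('http://127.0.0.1:5984/testdb/_bulk_docs', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ docs: docs })
  });
  const results = await res.json();

  // Entries with an "error" field were rejected (e.g. by validate_doc_update).
  const failed = results.filter(function (r) { return r.error; });
  failed.forEach(function (r) {
    console.log('Rejected:', r.id, r.error, r.reason);
    // Handle the rejected document here: fix it and retry, or surface the error.
  });
  return failed;
}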
In all_or_nothing mode, however, success is returned only if all the documents have been updated.
For syncing you can also use the _replicate endpoint, which has many features that bulk update does not have.

Resources