Elasticsearch index/cache not clearing/stuck


I'm using https://github.com/firebase/flashlight to index data for searches
However, this morning I deleted the whole firebase index, so it should be empty. This has worked before, but it seems that when the nodejs app.js crashes, the cache can get "stuck", and I still see old search results from my nodejs app.
I've tried:
http://localhost:9200/_cache/clear
and
http://localhost:9200/_flush
http://localhost:9200/firebase/_flush
They all report success, but I still get old results, seemingly out of nowhere.
I can also see in the console that it refreshes every 60 seconds, and deleting the whole firebase has worked before without problems...
I even saw a message housekeeping: found 60 orphans (removing them now) in the console so it should be refreshed by now...
I tried restarting elasticsearch as well as the whole Linux/Debian server...
In the config.js I have two indexes:
exports.paths = [
  {
    path: "tags",
    index: "firebase",
    type: "tag",
    filter: function(data) { return data.name !== 'system'; }
  },
  {
    path: "tracks",
    index: "firebase",
    type: "track",
    filter: function(data) { return data.name !== 'system'; }
  }
];
And strangely enough, I have no problem whatsoever when using the 'track' store, instead of using the 'tag' one...
What am I missing here?
// Update !
So, I just deleted the firebase tracks index while the nodejs script was running and the script crashed... Same problem, different index. So the crashing script must cause it... so, how do I clear this stuck cache?

So I fixed it by simply doing:
curl -XDELETE localhost:9200/firebase
Thanks to: https://github.com/elasticsearch/elasticsearch/issues/7541#issuecomment-54724302
I'm guessing Elasticsearch is not aware (and has not been told) that its current index is stale; perhaps the Flashlight script I'm using is not informing it about what the index should contain? Since this is only necessary when the node script crashes after you suddenly delete your whole firebase index, it should be catchable somehow, but I'm happy that I can at least fix it like this. Rebuilding the index is not a big task right now, but in the future it might be.

A wild guess: maybe you are not sending the requests correctly. You said you have tried the following links:
http://localhost:9200/_cache/clear
http://localhost:9200/_flush
http://localhost:9200/firebase/_flush
If you are opening the URLs in a browser, that won't clear anything, because the browser sends a GET. You have to POST them. It is unclear from your question which you did, since both GET and POST return the same kind of result (showing total, successful and failed). Try this from the command line using curl:
curl -XPOST 'http://localhost:9200/_cache/clear'
curl -XPOST 'http://localhost:9200/_flush'
Or create an AJAX request with jQuery, or use Fiddler.
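As a rough illustration of the point about POST, the same calls can also be sent from Node.js. This is a minimal sketch, assuming Node.js 18 or later (for the global fetch) and Elasticsearch listening on localhost:9200 as in the question; buildEsCall and runEsCall are hypothetical helper names, not part of Flashlight or of any Elasticsearch client.

```javascript
// Sketch: send the maintenance calls as POST requests from Node.js.
// Assumes Node.js 18+ (global fetch) and Elasticsearch on localhost:9200.
const ES_BASE = "http://localhost:9200"; // address taken from the question

// Build the URL and fetch options for one maintenance call.
function buildEsCall(path, method = "POST") {
  return { url: new URL(path, ES_BASE).toString(), options: { method } };
}

// Send the call and return the HTTP status plus the parsed JSON body.
async function runEsCall(path, method = "POST") {
  const { url, options } = buildEsCall(path, method);
  const res = await fetch(url, options);
  return { status: res.status, body: await res.json() };
}

// Usage (needs a running Elasticsearch, so it is left commented out):
// runEsCall("/_cache/clear").then(console.log);
// runEsCall("/firebase/_flush").then(console.log);
```

Keeping the request building separate from the sending makes it easy to check that the method really is POST before anything goes over the wire.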

Try to optimize your indices by sending the following POST request to your Elasticsearch server:
curl -XPOST 'http://localhost:9200/_optimize?max_num_segments=1&wait_for_merge=true'
It makes Lucene really delete the deleted documents from disk and merge the index segments.


Strapi & react-admin : I'd like to set 'Content-Range' header dynamically when any fetchAll query fires

I'm still a novice web developer, so please bear with me if I miss something fundamental !
I'm creating a backoffice for a Strapi backend, using react-admin.
React-admin library uses a 'data provider' to link itself with an API. Luckily someone already wrote a data provider for Strapi. I had no problem with step 1 and 2 of this README, and I can authenticate to Strapi within my React app.
I now want to fetch and display my Strapi data, starting with Users. In order to do that, quoting Step 3 of this readme : 'In controllers I need to set the Content-Range header with the total number of results to build the pagination'.
So far I tried to do this in my User controller, with no success.
What I try to achieve:
First, I'd like it to simply work with the ctx.set('Content-Range', ...) hard-coded in the controller like aforementioned Step 3.
Second, I think it would be very dirty to copy/paste this logic into every controller (not to mention any future controllers), instead of having some callback function dynamically append the Content-Range header to any fetchAll request. Ultimately that's what I aim for, because with ~40 Strapi objects to administrate already and plenty more to come, it has to scale.
Technical infos
node -v: 11.13.0
npm -v: 6.7.0
strapi version: 3.0.0-alpha.25.2
uname -r output: Linux 4.14.106-97.85.amzn2.x86_64
DB: mySQL v2.16
So far I've tried accessing the count() method of User model like aforementioned step3, but my controller doesn't look like the example as I'm working with users-permissions plugin.
This is the action I've tried to edit (located in project/plugins/users-permissions/controllers/User.js)
find: async (ctx) => {
let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
data.reduce((acc, user) => {
acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
return acc;
}, []);
// Send 200 `ok`
ctx.send(data);
},
From what I've gathered on Strapi documentation (here and also here), context is a sort of wrapper object. I only worked with Express-generated APIs before, so I understood this snippet as 'use fetchAll method of the User model object, with ctx.query as an argument', but I had no luck logging this ctx.query. And as I can't log stuff, I'm kinda blocked.
In my exploration, I naively tried to log the full ctx object and work from there:
// Send 200 `ok`
ctx.send(data);
strapi.log.info(ctx.query, ' were query');
strapi.log.info(ctx.request, 'were request');
strapi.log.info(ctx.response, 'were response');
strapi.log.info(ctx.res, 'were res');
strapi.log.info(ctx.req, 'were req');
strapi.log.info(ctx, 'is full context')
},
Unfortunately, I fear I'm missing something obvious, as it gives me no output at all. Making a fetchAll request from my React app with these log calls prints this in my terminal:
[2019-09-19T12:43:03.409Z] info were query
[2019-09-19T12:43:03.410Z] info were request
[2019-09-19T12:43:03.418Z] info were response
[2019-09-19T12:43:03.419Z] info were res
[2019-09-19T12:43:03.419Z] info were req
[2019-09-19T12:43:03.419Z] info is full context
[2019-09-19T12:43:03.435Z] debug GET /users?_sort=id:DESC&_start=0&_limit=10& (74 ms)
While in my frontend I get the good ol' The Content-Range header is missing in the HTTP Response message I'm trying to solve.
After writing this wall of text I realize the logging issue is separate from my original problem, but if I were able to at least log ctx properly, maybe I'd be able to find the solution myself.
Trying to summarize:
Actual problem is, how do I set my Content-Range properly in my strapi controller ? (partially answered cf. edit 3)
Collateral problem n°1: Can't even log ctx object (cf. edit 2)
Collateral problem n°2: Once I figure out the actual problem, is it feasible to address it dynamically (basically some callback function for index/fetchAll routes, in which the model is a variable, on which I'd call the appropriate count() method, and finally append the result to my response header)? I'm not asking for the code here, just if you think it's feasible and/or know a more elegant way.
Thank you for reading through, and excuse me if it was confusing; I wasn't sure which info would be relevant, so I thought the more the better.
/edit1: forgot to mention, in my controller I also tried to log strapi.plugins['users-permissions'].services.user object to see if it actually has a count() method but got no luck with that either. Also tried the original snippet (Step 3 of aforementioned README), but failed as expected as afaik I don't see the User model being imported anywhere (the only import in User.js being lodash)
/edit2: About the logs, my bad, I just misunderstood the documentation. I now do:
ctx.send(data);
strapi.log.info('ctx should be : ', {ctx});
strapi.log.info('ctx.req = ', {...ctx.req});
strapi.log.info('ctx.res = ', {...ctx.res});
strapi.log.info('ctx.request = ', {...ctx.request});
strapi.log.info('ctx.response = ', {...ctx.response});
Ctx logs this way; it also seems that it needs the spread operator to display nested objects ({ctx.req} crashes the server, {...ctx.req} is okay). Cool, because it narrows the question to what's interesting.
/edit3: As expected, having logs helps big time. I've managed to display my users (although in the dirty way). Couldn't find any count() method, but watching the data object that is passed to ctx.send(), it's equivalent to your typical 'res.data' i.e a pure JSON with my user list. So a simple .length did the trick:
let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
data.reduce((acc, user) => {
acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
return acc;
}, []);
ctx.set('Content-Range', data.length) // <-- it did the trick
// Send 200 `ok`
ctx.send(data);
Now starting to work on the hard part: the dynamic callback function that will do that for any index/fetchAll call. Will update once I figure it out

I'm using React Admin and Strapi together and installed ra-strapi-provider.
It's a little tedious to paste the Content-Range header into all of my controllers, so I searched for a better solution. Then I found the middleware concept and created one that fits my needs. It's probably not the best solution, but it does its job well:
const _ = require("lodash");
module.exports = strapi => {
  return {
    // can also be async
    initialize() {
      strapi.app.use(async (ctx, next) => {
        await next();
        if (_.isArray(ctx.response.body))
          ctx.set("Content-Range", ctx.response.body.length);
      });
    }
  };
};
I hope it helps
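To see what such a middleware does without starting Strapi, its body can be pulled out into a plain function and run against a stand-in context. This is only a sketch: makeFakeCtx is a hypothetical stub that mimics the two things the middleware touches on a Koa context (response.body and set), and contentRangeFor and demo are illustrative names, not Strapi APIs.

```javascript
// Value to put in the Content-Range header, or undefined for non-array bodies.
function contentRangeFor(body) {
  return Array.isArray(body) ? String(body.length) : undefined;
}

// Same logic as the middleware body, written as a standalone function.
async function contentRangeMiddleware(ctx, next) {
  await next(); // let the controller fill ctx.response.body first
  const value = contentRangeFor(ctx.response.body);
  if (value !== undefined) ctx.set("Content-Range", value);
}

// Hypothetical stand-in for a Koa context, for illustration only.
function makeFakeCtx() {
  const headers = {};
  return {
    response: { body: undefined },
    headers,
    set(name, value) { headers[name] = value; },
  };
}

// Simulate one request whose controller returns three users.
async function demo() {
  const ctx = makeFakeCtx();
  await contentRangeMiddleware(ctx, async () => {
    ctx.response.body = [{ id: 1 }, { id: 2 }, { id: 3 }];
  });
  return ctx.headers["Content-Range"];
}
```

Note that the length is taken from the returned page, so when the request carries a _limit the header reflects the page size rather than the total number of records.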

For people still landing on this page:
Strapi has been updated from #alpha to #beta. Take care, as some of the code in my OP is no longer valid; some of their documentation is also not up to date.
I failed to find a "clever" way to solve this problem; in the end I copy/pasted the ctx.set('Content-Range', data.length) bit in all relevant controllers and it just worked.
If somebody comes up with a clever solution for that problem I'll happily accept their answer. With the current Strapi version I don't think it's doable with policies or lifecycle callbacks.
The "quick & easy fix" is still to customize each relevant Strapi controller.
With strapi#beta you don't have direct access to the controller's code: you'll first need to "rewrite" one with the help of this doc. Then add the ctx.set('Content-Range', data.length) bit. Test it properly with RA; then, for the other controllers, you'll just have to create the folder, name the file, copy/paste your code and "Search & Replace" the model name.
The "longer & cleaner fix" would be to dive into the react-admin source code and refactor it so the lack of a "Content-Range" header doesn't break pagination.
You'll then have to maintain your own react-admin fork, so make sure you're already committed to this library and have A LOT of tables to manage through it (so many that customizing every Strapi controller would be too tedious).
Before forking RA, please remember all the stuff you can do with the Strapi backoffice alone (including embedding your custom React app into it) and ensure it will be worth the trouble.

Clear "pending_update_count" in Telegram Bot

I want to clear all pending_update_count in my bot!
The output of the command below (obviously I replaced the real API token with xxx):
https://api.telegram.org/botxxxxxxxxxxxxxxxx/getWebhookInfo
is this:
{
  "ok": true,
  "result": {
    "url": "",
    "has_custom_certificate": false,
    "pending_update_count": 5154
  }
}
As you can see, I have 5154 unread updates so far! (I'm pretty sure these pending updates are errors, because no one uses this bot. It's just a test bot.)
By the way, this pending_update_count number is increasing very fast!
While I was writing this post the number increased by 51 and reached 5205!
I just want to clear these pending updates.
I'm pretty sure this bot has been stuck in an infinite loop!
Is there any way to get rid of it?
P.S:
I also cleared the webhook url. But nothing changed!
UPDATE:
The output of getWebhookInfo is this :
{
  "ok": true,
  "result": {
    "url": "https://somewhere.com/telegram/webhook",
    "has_custom_certificate": false,
    "pending_update_count": 23,
    "last_error_date": 1482910173,
    "last_error_message": "Wrong response from the webhook: 500 Internal Server Error",
    "max_connections": 40
  }
}
Why do I get Wrong response from the webhook: 500 Internal Server Error?

I think you have two options:
Set a webhook that does nothing except answer 200 OK to Telegram's servers. Telegram will send all updates to this URL and the queue will be cleared.
Disable the webhook, then fetch the updates with the getUpdates method, and after that turn the webhook on again.
Update:
The problem is with the webhook on your side. You can try to emulate Telegram's POST request to your URL.
It can be something like this:
{"message_id":1,"from":{"id":1,"first_name":"FirstName","last_name":"LastName","username":"username"},"chat":{"id":1,"first_name":"FirstName","last_name":"LastName","username":"username","type":"private"},"date":1460957457,"text":"test message"}
You can send this text as a POST request body, with Postman for example, and then debug your backend.

For anyone looking at this in 2020 and beyond, the Telegram API now supports clearing the pending messages via a drop_pending_updates parameter in both setWebhook and deleteWebhook, as per the API documentation.
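A small sketch of how such a request URL could be put together in Node.js. The token is a placeholder and buildDeleteWebhookUrl is a hypothetical helper name; only the deleteWebhook method and its drop_pending_updates parameter come from the Bot API.

```javascript
// Sketch: build a deleteWebhook URL that also discards the queued updates.
// The token passed in below is a placeholder, not a real bot token.
function buildDeleteWebhookUrl(token) {
  const url = new URL(`https://api.telegram.org/bot${token}/deleteWebhook`);
  url.searchParams.set("drop_pending_updates", "true");
  return url.toString();
}

console.log(buildDeleteWebhookUrl("xxxxxxxxxxxxxxxx"));
// prints https://api.telegram.org/botxxxxxxxxxxxxxxxx/deleteWebhook?drop_pending_updates=true
```

The resulting URL can be requested with curl or any HTTP client; afterwards getWebhookInfo should show whether the pending count has gone down.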

Just add return 1; at the end of your hook method.
Update:
Commonly this happens because of slow database queries.

I solved it like this:
POST tg.api/bottoken/setWebhook with an empty "url"
POST tg.api/bottoken/getUpdates
POST tg.api/bottoken/getUpdates with "offset" set to the last update_id that appeared before
Repeat this several times
POST tg.api/bottoken/getWebhookInfo
Check whether everything is gone.
POST tg.api/bottoken/setWebhook with "url" filled in
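The repeated getUpdates step can be sketched as a loop. In the Bot API, passing an offset one higher than the highest update_id received so far marks the earlier updates as confirmed, so the sketch below uses last update_id + 1. Here getUpdates is a hypothetical helper you would write yourself (it must return a Promise of an array of updates for a given offset); nextOffset and drainUpdates are illustrative names.

```javascript
// Offset that confirms every update in `updates`: highest update_id plus one.
function nextOffset(updates, currentOffset = 0) {
  return updates.reduce((max, u) => Math.max(max, u.update_id + 1), currentOffset);
}

// Keep fetching until an empty batch comes back; returns how many were dropped.
// `getUpdates(offset)` is a hypothetical wrapper around the getUpdates method.
async function drainUpdates(getUpdates) {
  let offset = 0;
  let dropped = 0;
  for (;;) {
    const batch = await getUpdates(offset);
    if (batch.length === 0) return dropped;
    dropped += batch.length;
    offset = nextOffset(batch, offset);
  }
}
```

The final call, the one that returns an empty batch, is also the one that confirms the last updates, so the loop should not stop before it.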

If you are using a webhook, you can follow these steps.
In your web browser, enter the following URL with your bot's token in place of <token>:
https://api.telegram.org/bot<token>/getWebhookInfo
You will get a result like this on your screen
{"ok":true,"result":{"url":"url_value",...}}
In the displayed result, copy the entire url_value without quotes and put it into this second URL:
https://api.telegram.org/bot<token>/setWebhook?url=url_value&drop_pending_updates=True
Enter the second URL with the right token and url_value in your web browser, then press ENTER
Done!

I solved it by changing the file access permissions (setting the file's permissions to 755)
and, second, increasing the memory limit in the php.ini file.

A quick & dirty way is to get a temporary webhook here: https://webhook.site/ and
set your webhook to that (it will answer with an HTTP 200 code every time, resetting your pending messages to zero)

I faced the same issue with my Telegram bot after a user edited an existing message. My bot kept receiving updates with editedMessage, but update.hasMessage() returned false. As a result the number of pending updates increased rapidly and my bot got stuck.
I solved this issue by handling the case where the message is missing and sending a 200 code:
public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
update = MAPPER.readValue(event.getBody(), Update.class);
if (!update.hasMessage()) {
return new APIGatewayProxyResponseEvent()
.withStatusCode(200) // -> !!!!!! return code 200
.withBody("message is missing")
.withIsBase64Encoded(false);
}
... ... ...
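The same idea in JavaScript, as a minimal sketch: answer 200 for any update that carries no message (for example an edited message), so the update does not stay in the pending queue. handleUpdate and processMessage are hypothetical names; only the message field of the update object is taken from the Bot API.

```javascript
// Sketch: always acknowledge updates the bot does not handle.
// `processMessage` is a hypothetical callback for ordinary messages.
function handleUpdate(update, processMessage) {
  if (!update || !update.message) {
    // e.g. an edited_message update: acknowledge it and do nothing else
    return { statusCode: 200, body: "message is missing" };
  }
  processMessage(update.message);
  return { statusCode: 200, body: "ok" };
}
```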

Sails.js / Waterline.js - find() not returning array

Problem: I'm getting unexpected output from code that previously worked.
Code Problem:
sails.models.user.find().then(function (users){...});
is currently returning { id: 1 }
but should return an array of User objects like [{id:x, name:y},...]
Code Alterations:
sails.models.user.find().exec(function (err, users){...}); does not contain an error and returns the same as using .then() like above.
sails.models.user.findOne(1).then(function (users){...}); correctly returns a User like {id:x, name:y}.
sails.models.venue.find().then(function (venues){...}); returns an array of venues, just as substituting any other class besides User.
Note:
This code was previously working (it's a pretty simple line), and the only changes I made between it working and not working were running npm install (but it was previously working on Heroku, which runs the install itself, so I don't think that was the problem) and changing the schema of User to add a few columns (I did this by deleting the User table in the DB, updating the Sails User model, and lifting the app in create mode, so the table exactly matches the model). Neither of these should cause a problem, but we all know how "should" and coding don't mix :P
How do I fix this? And why did this happen? Thanks :)

Realized other code was calling the package sails-mock-models which was doing its job. Totally forgot about that code. Problem solved.

Sails.js is Deleting Data from pg database

Something strange is happening with my app: I am using Sails.js with the official PostgreSQL driver and my data gets deleted. I don't have any pattern or list of specific events that delete the data, but I have the following observations.
A few days back I was writing a function to destroy data, and when I executed that function it gave me an error. I fixed the error and ran my web app again, and whoa, the data from one of my tables was all gone.
Yesterday I wrote a function and tried to make an HTTP call to it, but it was giving me a 500 server error. I started debugging it, and after executing my program 3 to 4 times with this error, part of the data was deleted from one of my database tables. It later turned out that the error was a typo in the URL.
If any of you have had a similar experience, please let me know how to fix it, or at least help me reproduce this issue.
EDIT
I activated the logs and waited for it to happen again; it did, and here is the log from Sails.js
In the logs I saw that it's talking about the alter.js sync strategy, but I have selected the safe strategy.

It has happened to me quite a few times: when lifting the app, it is in the process of making changes to the db and it fails, sometimes due to an ORM timeout.
What Sails does when it's lifting and needs to update the data structure is controlled in config/models.js by migrate: 'alter'. This is usually commented out, in which case you get a prompt for what to do 1... 2... 3... (writing off the top of my head, I don't remember the actual messages) and a warning about using alter on a production system.
Changing config/orm.js to have this
// config/orm.js
module.exports.orm = {
  _hookTimeout: 60000 // I used 60 seconds as my new timeout
};
And, for reasons I don't know, changing config/pubsub.js
// config/pubsub.js
module.exports.pubsub = {
  _hookTimeout: 60000 // I used 60 seconds as my new timeout
};
has helped me avoid data loss.

Resolve MongoDB reference

I am currently building a chatting app with nodejs and mongoDB.
Basically I have two collections to maintain in the db.
user = {
  _id: ObjectId("1234"),
  account: "stan123"
}
thread = {
  _user: ObjectId("1234"),
  messages: [
    {
      body: "hi",
      _user: ObjectId("1234")
    },
    {
      body: "second msg",
      _user: ObjectId("1234")
    }
  ]
}
I am planning to pass the thread model with all resolved info (user) to the client side, so that I can construct my widget with it.
I searched for solutions for this. Some suggest making extra calls from the client side to get the data.
However, I am worried that when the number of messages grows, there will be a considerable number of HTTP calls that might hurt site speed.
I know some drivers can resolve DBRefs automatically and make the code clean.
However, following
http://docs.mongodb.org/manual/applications/database-references/
I decided to just use the id to maintain the reference, to keep it as simple as possible.
My plan is to resolve all references on the server side. My current approach is to get the length of the message array first.
Then I loop through the message array and make a separate query to resolve the user info for each message.
In each query callback, do a messageToResolve++ and check if(messageToResolve >= thread.messages.length).
If the condition is met, send the resolved model to the client and end the response.
This is not a case where I would consider embedding, because it would be painful when you need to update user data.
(messages are embedded because they exist only when the thread exists)
I am not sure if it's a good way to do it.
Does anyone have a better solution?
Sorry if I didn't explain my problem and solution clearly enough.
And thanks in advance.
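One possible alternative to a query per message, as a sketch rather than a definitive answer: collect the distinct user ids referenced by the thread, fetch them in a single query, and attach the results in memory. findUsersByIds is a hypothetical helper (for example a wrapper around a find with { _id: { $in: ids } } on the user collection); collectUserIds, attachUsers and resolveThread are illustrative names.

```javascript
// Collect the distinct user ids referenced by a thread, as strings.
// The query helper is expected to convert them back to ObjectId if needed.
function collectUserIds(thread) {
  const ids = new Set([String(thread._user)]);
  for (const msg of thread.messages) ids.add(String(msg._user));
  return [...ids];
}

// Return a copy of the thread with each `_user` id replaced by the user document
// (or null when no matching user was found).
function attachUsers(thread, users) {
  const byId = new Map(users.map((u) => [String(u._id), u]));
  const lookup = (id) => byId.get(String(id)) || null;
  return {
    ...thread,
    _user: lookup(thread._user),
    messages: thread.messages.map((m) => ({ ...m, _user: lookup(m._user) })),
  };
}

// One query for all users, then resolve in memory.
// `findUsersByIds(ids)` is hypothetical and must return a Promise of user documents.
async function resolveThread(thread, findUsersByIds) {
  const users = await findUsersByIds(collectUserIds(thread));
  return attachUsers(thread, users);
}
```

With this shape the number of queries stays constant as the message array grows, and the counter in each callback is no longer needed.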
