ElasticSearch intermittent blank responses (cluster status: yellow)

I am using ElasticSearch (based on Lucene) for search. Every now and then over the past month or so since switching, users have gotten no results on a query. If they refresh, the results are populated. Looking at the logs I don't see any errors, so I am assuming that ElasticSearch just can't access the index on occasion.
This isn't when I am indexing large amounts of documents either. Is there something I can add to the settings that will help me better debug this? Would it help to tweak the configuration or add another cluster or node?
If I check my Admin Cluster Health, it is a yellow status. Would resolving this help my issue?
{
    "cluster_name" : "xxxxx",
    "status" : "yellow",
    "timed_out" : false,
    "number_of_nodes" : 1,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 5,
    "active_shards" : 5,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 5
}


Aggregate time series data

I am using MongoDB and mongoose to store metrics data. Each document holds an array of metrics and references the project it is stored for and the metric type.
The schema for this looks like this:
exports.metricReportSchema = new Schema({
    metrics: [{
        metric: {
            type: mongoose.Schema.Types.ObjectId,
            ref: 'metricSchema',
            required: true
        },
        value: {
            type: String,
            required: true
        }
    }],
    project: {
        type: mongoose.Schema.Types.ObjectId,
        ref: 'projectSchema',
        required: true
    },
    reportDate: Date
});
And the actual document looks like the following:
db.metricreports.findOne()
{
    "_id" : ObjectId("58a60e8459dd3d12ef8c5d51"),
    "reportDate" : ISODate("2017-02-16T20:41:40.657Z"),
    "project" : ObjectId("58a20f5f04ef5789d3ef8faa"),
    "metrics" : [
        {
            "metric" : ObjectId("58a20f5f04ef5789d3ef8fb7"),
            "value" : "781",
            "_id" : ObjectId("58a60e8459dd3d12ef8c5d52")
        },
        {
            "metric" : ObjectId("58a21106fc2aef8a10ded196"),
            "value" : "566",
            "_id" : ObjectId("58a60e8459dd3d12ef8c5d53")
        },
        {
            "metric" : ObjectId("58a2141bded78e8ad8384f97"),
            "value" : "501",
            "_id" : ObjectId("58a60e8459dd3d12ef8c5d54")
        },
        {
            "metric" : ObjectId("58a2141bded78e8ad8384f94"),
            "value" : "44",
            "_id" : ObjectId("58a60e8459dd3d12ef8c5d55")
        },
        {
            "metric" : ObjectId("58a2141bded78e8ad8384f93"),
            "value" : "645",
            "_id" : ObjectId("58a60e8459dd3d12ef8c5d56")
        }
    ],
    "__v" : 0
}
Over time, there are multiple documents of this kind that store slices of data for multiple metrics. It is very convenient for selecting and displaying static reports on metrics for multiple projects and whatnot.
Now, this becomes a little complex when I try to build a time series report for an individual metric of a project.
Basically, what I would need to do is to scan multiple metricReport documents and extract individual single metrics' data from all available reports over time. Let's say I have 10 metricReports that each contain data for 10 different metrics and I only want to extract one, this could probably look like this:
{
    "_id": "...",
    "project": "...",
    "metric": "...",
    "data": {
        "2016-02-02": "22",
        "2016-02-03": "453",
        ...
    }
}
I could not find a way to do this with out-of-the-box MongoDB querying and filtering functionality and wanted to ask for advice:
Is my approach of storing multiple metrics in a single document reasonable? Would I be better off keeping metrics as individual documents and then "merging" them somehow?
Is there a way to achieve what I need without doing this in Node.js? (I assume that would not be very fast: grabbing the documents, iterating them to build a new structure alongside, and pushing it out.)
Is there a better way to do this? Virtual models or something in mongoose that could help? I understand that MongoDB may not be the right choice for time series data, but it's not the only part of the functionality, the MongoDB/mongoose combination seems to be serving the other purposes nicely, and I don't want to change the technology mid-way.
Yes, but keep in mind that documents have a limited size (16 MB IIRC), so if your data is unbounded, this structure will not work as your "metrics" array will grow past that.
Ultimately yes, even if you can't figure out a decent filter query, Mongo has MapReduce which will allow you to do what you want, though it won't be easy. I'd use Node for this.
There's no silver bullet here. Mongo is excellent if you need to aggregate data and store it as arbitrary JSON (i.e. to be consumed by an app), and not so great at doing complex joins/views of data. Any joins at the application level will be slow. If you want performance, you'll have to aggregate your reports into individual documents, and save & serve them. If you have live data, this will be more difficult as you'll need to handle updates.
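The Node-side reshaping discussed above can be sketched in plain JavaScript, assuming the metricReport documents have already been fetched (the function and variable names here are illustrative, not part of any API):

```javascript
// Collapse an array of metricReport documents into a per-metric time series,
// keyed by the date portion of each reportDate.
function extractMetricSeries(reports, projectId, metricId) {
  const data = {};
  for (const report of reports) {
    if (String(report.project) !== String(projectId)) continue;
    for (const entry of report.metrics) {
      if (String(entry.metric) === String(metricId)) {
        data[report.reportDate.toISOString().slice(0, 10)] = entry.value;
      }
    }
  }
  return { project: projectId, metric: metricId, data };
}

// Two reports, each carrying two metrics:
const reports = [
  { project: "p1", reportDate: new Date("2016-02-02T20:41:40Z"),
    metrics: [{ metric: "m1", value: "22" }, { metric: "m2", value: "7" }] },
  { project: "p1", reportDate: new Date("2016-02-03T20:41:40Z"),
    metrics: [{ metric: "m1", value: "453" }, { metric: "m2", value: "9" }] },
];

const series = extractMetricSeries(reports, "p1", "m1");
// series → { project: "p1", metric: "m1",
//            data: { "2016-02-02": "22", "2016-02-03": "453" } }
```

For large collections an aggregation pipeline ($unwind on metrics, $match on the metric id, then $group) would push this work into MongoDB instead of Node; the sketch above only shows the shape of the result.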

Storing a complex Query within MongoDb Document [duplicate]

This is the case: a webshop in which I want to configure which items should be listed in the shop based on a set of parameters.
I want this to be configurable, because that allows me to experiment with different parameters and change their values easily.
I have a Product collection that I want to query based on multiple parameters.
A couple of these are found here:
within product:
"delivery" : {
"maximum_delivery_days" : 30,
"average_delivery_days" : 10,
"source" : 1,
"filling_rate" : 85,
"stock" : 0
}
but also other parameters exist.
An example of such query to decide whether or not to include a product could be:
"$or" : [
{
"delivery.stock" : 1
},
{
"$or" : [
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 60
}
},
{
"delivery.filling_rate" : {
"$gt" : 90
}
}
]
},
{
"$and" : [
{
"delivery.maximum_delivery_days" : {
"$lt" : 40
}
},
{
"delivery.filling_rate" : {
"$gt" : 80
}
}
]
},
{
"$and" : [
{
"delivery.delivery_days" : {
"$lt" : 25
}
},
{
"delivery.filling_rate" : {
"$gt" : 70
}
}
]
}
]
}
]
Now to make this configurable, I need to be able to handle boolean logic, parameters and values.
So, since such a query is itself JSON, I got the idea to store it in Mongo and have my Java app retrieve it.
Next thing is using it in the filter (e.g. find, or whatever) and work on the corresponding selection of products.
The advantage of this approach is that I can actually analyse the data and the effectiveness of the query outside of my program.
I would store it by name in the database. E.g.
{
    "name": "query1",
    "query": { the thing printed above starting with "$or"... }
}
using:
db.queries.insert({
    "name" : "query1",
    "query": { the thing printed above starting with "$or"... }
})
Which results in:
2016-03-27T14:43:37.265+0200 E QUERY Error: field names cannot start with $ [$or]
at Error (<anonymous>)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:161:19)
at DBCollection._validateForStorage (src/mongo/shell/collection.js:165:18)
at insert (src/mongo/shell/bulk_api.js:646:20)
at DBCollection.insert (src/mongo/shell/collection.js:243:18)
at (shell):1:12 at src/mongo/shell/collection.js:161
But I CAN store it using Robomongo, though not always. Obviously I am doing something wrong, but I have NO IDEA what it is.
If it fails, and I create a brand new collection and try again, it succeeds. Weird stuff that goes beyond what I can comprehend.
But when I try updating values in the "query", changes are not going through. Never. Not even sometimes.
I can however create a new object and discard the previous one. So, the workaround is there.
db.queries.update(
    { "name": "query1" },
    { "$set": {
        ... update goes here ...
    } }
)
doing this results in:
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 52,
        "errmsg" : "The dollar ($) prefixed field '$or' in 'action.$or' is not valid for storage."
    }
})
This seems pretty close to the other message above.
Needless to say, I am pretty clueless about what is going on here, so I hope some of the wizards here are able to shed some light on the matter.
I think the error message contains the important info you need to consider:
QUERY Error: field names cannot start with $
Since you are trying to store a query (or part of one) in a document, you'll end up with attribute names that contain mongo operator keywords (such as $or, $ne, $gt). The mongo documentation actually references this exact scenario - emphasis added
Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)...
I wouldn't trust 3rd party applications such as Robomongo in these instances. I suggest debugging/testing this issue directly in the mongo shell.
My suggestion would be to store an escaped version of the query in your document as to not interfere with reserved operator keywords. You can use the available JSON.stringify(my_obj); to encode your partial query into a string and then parse/decode it when you choose to retrieve it later on: JSON.parse(escaped_query_string_from_db)
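A minimal sketch of that stringify/parse round trip (the query shape is taken from the question; nothing here touches the database):

```javascript
// A query fragment with $-prefixed keys, as in the question:
const query = {
  $or: [
    { "delivery.stock": 1 },
    { $and: [{ "delivery.maximum_delivery_days": { $lt: 60 } },
             { "delivery.filling_rate": { $gt: 90 } }] }
  ]
};

// Encode before storing: the document field then holds a plain string,
// so MongoDB's field-name restrictions no longer apply.
const stored = JSON.stringify(query);

// Decode after reading the document back:
const restored = JSON.parse(stored);
// restored is deep-equal to the original query object
```

The stored document would then look like { "name": "query1", "query": "<that string>" }.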
Your approach of storing the query as a JSON object in MongoDB is not viable.
You could potentially store your query logic and fields in MongoDB, but you have to have an external app build the query with the proper MongoDB syntax.
MongoDB queries contain operators, and some of those have special characters in them.
There are rules for MongoDB field names, and these rules do not allow such special characters.
Look here: https://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names
The probable reason you can sometimes successfully create the doc using Robomongo is because Robomongo is transforming your query into a string and properly escaping the special characters as it sends it to MongoDB.
This also explains why your attempt to update them never works. You tried to create a document, but instead created something that is a string object, so your update conditions are probably not retrieving any docs.
I see two problems with your approach.
In following query
db.queries.insert({
    "name" : "query1",
    "query": { the thing printed above starting with "$or"... }
})
Valid JSON expects key/value pairs; here in "query" you are storing an object without a key. You have two options: either store the query as text or create another key inside the curly braces.
The second problem is that you are storing query values without wrapping them in quotes. All string values must be wrapped in quotes.
so your final document should appear as
db.queries.insert({
    "name" : "query1",
    "query": 'the thing printed above starting with "$or"... '
})
Now try, it should work.
Obviously my attempt to store a query in Mongo the way I did was foolish, as became clear from the answers from both @bigdatakid and @lix. So what I finally did was this: I altered the naming of the fields to comply with the Mongo requirements.
E.g. instead of $or I used _$or etc., and instead of using a . inside a name I used a #. Both of these I am replacing in my Java code.
This way I can still easily try and test the queries outside of my program. In my Java program I just change the names back and use the query, using just two lines of code. It simply works now. Thanks guys for the suggestions you made.
String documentAsString = query.toJson().replaceAll("_\\$", "\\$").replaceAll("#", ".");
Object q = JSON.parse(documentAsString);
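The same renaming trick can be sketched in plain JavaScript (renameKeys, encode, and decode are illustrative helpers, not library functions):

```javascript
// Recursively apply a key-renaming function to a nested query object.
function renameKeys(obj, fn) {
  if (Array.isArray(obj)) return obj.map((v) => renameKeys(v, fn));
  if (obj !== null && typeof obj === "object") {
    const out = {};
    for (const [k, v] of Object.entries(obj)) out[fn(k)] = renameKeys(v, fn);
    return out;
  }
  return obj;
}

// Before storing: $ becomes _$ and . becomes #, so the keys are legal.
const encode = (k) => k.replace(/^\$/, "_$").replace(/\./g, "#");
// After reading: undo both substitutions before using the query.
const decode = (k) => k.replace(/^_\$/, "$").replace(/#/g, ".");

const query = { $or: [{ "delivery.stock": 1 },
                      { "delivery.filling_rate": { $gt: 80 } }] };
const storable = renameKeys(query, encode);
// storable → { "_$or": [ { "delivery#stock": 1 },
//                        { "delivery#filling_rate": { "_$gt": 80 } } ] }
const roundTrip = renameKeys(storable, decode);
// roundTrip is deep-equal to the original query
```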

Update document in array MongoDB

I know this question has been asked in various ways before but I've read them all and I can't get it to work with the below:
Here is a doc example:
"_id" : ObjectId("583659c5be5f0e6f70c95633"),
"firstName" : "da",
"lastName" : "ksk",
"email" : "papap#aol.com",
"surveyResults" : [
{
"mouseTracking" : [ ],
"result" : "White",
"question" : "What color website do you prefer?",
"questionNumber" : "0"
},
{
"mouseTracking" : [ ],
"result" : "Laptop",
"question" : "What device do you use most to browse the web?",
"questionNumber" : "1"
}]
Here is what I am trying to run in mongoose and having no luck:
db.results.update({_id:userIdVar, surveyResults: {$elemMatch: {questionNumber: questionNumVar}}},{$set:{"surveyResults.$.result" : newResultVar}});
I have variables being used and I know they have the proper values to match what is in the places I need to update. Is there something I am missing??
I added a few levels of console logging throughout the conditionals and before and after the update and saw that all of the paths were being traveled. Then I added the line:
mongoose.set('debug', true);
This logs all mongoose interactions with the db. I saw that the command I pasted wasn't even running! Turns out I had accidentally left off the
.exec()
at the end, so the command never executed. After hours of checking and triple-checking my syntax and variable types, it was just this.
In the process I did learn more about the $ operator, and you should read up on it if you don't know its place in mongoose/mongodb.
My guess is a type mismatch in the element match. In your example data questionNumber is a string, so if you compare it with an integer questionNumVar it will not match; convert it to a string first.
You can try:
db.results.update({_id: userIdVar, surveyResults: {$elemMatch: {questionNumber: questionNumVar.toString()}}}, {$set: {"surveyResults.$.result": newResultVar}}).exec();
Then it should work.
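The type mismatch described here can be seen without a database; MongoDB's equality matching is type-sensitive, much like strict comparison in JavaScript:

```javascript
const storedQuestionNumber = "1";  // questionNumber is a string in the document
const questionNumVar = 1;          // but the app variable is a number

// A number never strictly equals a string, so the $elemMatch finds nothing:
const matchesAsNumber = storedQuestionNumber === questionNumVar;            // false
// Converting the variable first makes the types line up:
const matchesAsString = storedQuestionNumber === questionNumVar.toString(); // true
```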

Mongodb: Dynamic query object in collection.find()

I am working on a Node.js + MongoDB application. The application inserts some records into MongoDB. For example, let's take the simple record below:
{
    "name": "Sachin",
    "age": 11,
    "class": 5,
    "percentage": 78,
    "rating": 5
}
Now end user can set different rule for which they want to get the notification/alert when a specific condition is satisfied. For example we can have a rule like:
1) Rule1: Generate notification/alert if "percentage" is less than 40
In order to achieve this, I am using replication and a tailable cursor, so whenever a new record gets added to the collection I get a record in the tailable cursor.
coll = db.collection('oplog.rs');
options = {
    tailable: true,
    awaitdata: true,
    numberOfRetries: -1
};
var qcond = {'o.data.percentage': {$gt: 40}};
coll.find(qcond, options, function(err, cur) {
    cur.each(function(err, doc) {
        //Perform some operations on received document like
        //adding it to other collection or generating alert
    }); //cur.each
}); //find
Everything works fine till this point.
Now the problem starts when the end user wants to add another rule at runtime, say:
2) Rule2: Generate notification/alert if "rating" is greater than 8
Now I would like to consider this condition/rule as well when querying the tailable cursor. But the current cursor is already in a waiting state based on the conditions given as per Rule1 only.
Is there any way to update the query conditions dynamically so that I can include conditions for Rule2 as well?
I tried searching but couldn't find a way to achieve this.
Does anyone have any suggestion/pointers to tackle this situation?
No. You can't modify a cursor once it's open on the server. You'll need to terminate the cursor and reopen it to cover both conditions, or open a second cursor to cover the second condition.
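Since the open cursor can't be changed, one workable pattern is to keep the rules as data, rebuild the filter whenever a rule is added, and reopen the cursor with it. A sketch of the filter-building step (field names follow the question; buildFilter is an illustrative helper):

```javascript
// Each rule is an ordinary query fragment; an alert fires if ANY rule matches.
const rules = [
  { "o.data.percentage": { $lt: 40 } }, // Rule1
  { "o.data.rating":     { $gt: 8 } },  // Rule2, added at runtime
];

function buildFilter(activeRules) {
  // A single rule can be used as-is; several are combined with $or.
  return activeRules.length === 1 ? activeRules[0] : { $or: activeRules };
}

const filter = buildFilter(rules);
// filter → { $or: [ { "o.data.percentage": { $lt: 40 } },
//                   { "o.data.rating": { $gt: 8 } } ] }
// After a rule change: close the old cursor, then reopen with
// coll.find(buildFilter(rules), options, ...) as in the question.
```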

Query and Insert with a single command

I have the following documents:
{
    "_id": "538584aad48c6cdc3f07a2b3",
    "startTime": "2014-06-12T21:30:00.000Z",
    "endTime": "2014-06-12T22:00:00.000Z"
},
{
    "_id": "538584b1d48c6cdc3f07a2b4",
    "startTime": "2014-06-12T22:30:00.000Z",
    "endTime": "2014-06-12T23:00:00.000Z"
}
All of them have startTime and endTime values. I need to maintain the invariant that no two date spans in the collection overlap.
Let's say I add the following document with the following dates:
db.collection.insert({
    "startTime": "2014-06-12T19:30:00.000Z",
    "endTime": "2014-06-12T21:00:00.000Z"
});
This date span insert should fail because it overlaps with an existing interval.
My questions are:
How to check for date span overlap?
How to check and insert with a single query?
EDIT: To prevent a duplicate, I asked there and started a bounty. I need to perform the update operation with a single query, as described here: How to query and update document by using single query?
The query is not as complicated as it may look at first - the query to find all documents which "overlap" the range you are given is:
db.test.find({ "startTime" : { "$lt" : new_end_time },
               "endTime"   : { "$gt" : new_start_time } })
This will match any document with starting date earlier than our end date and end date greater than our start time. If you visualize the ranges as being points on a line:
-----|*********|----------|****|-----------|******||********|---
     s1        e1         s2   e2          s3    e3s4       e4
the sX-eX pairs represent existing ranges. If you take a new s5-e5 you can see that if we eliminate pairs that start after our end date (they can't overlap us) and then we eliminate all pairs that end before our start date, if we have nothing left, then we are good to insert.
That condition is: does the union of all documents with end date $lte our start and all documents with start date $gte our end include every document already in the collection? Our query flips this around to make sure that no documents satisfy the opposite of this condition.
On the performance front, it's unfortunate that you are storing your dates as strings only. If you stored them as timestamps (or any number, really) you could make this query utilize indexes better. As it is, for performance you would want to have an index on { "startTime":1, "endTime":1 }.
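The core of that find() is a two-comparison overlap predicate. A plain-JavaScript sketch (it works on the question's ISO-8601 strings too, since they compare lexicographically in date order):

```javascript
// Two intervals overlap iff each one starts before the other ends.
// Strict inequalities mean spans that merely touch at an endpoint don't count.
function overlaps(aStart, aEnd, bStart, bEnd) {
  return aStart < bEnd && aEnd > bStart;
}

// Against the existing 21:30-22:00 span from the question:
overlaps("2014-06-12T20:00:00.000Z", "2014-06-12T21:45:00.000Z",
         "2014-06-12T21:30:00.000Z", "2014-06-12T22:00:00.000Z"); // true
overlaps("2014-06-12T22:05:00.000Z", "2014-06-12T22:20:00.000Z",
         "2014-06-12T21:30:00.000Z", "2014-06-12T22:00:00.000Z"); // false
```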
It's simple to find whether the range you want to insert overlaps any existing ranges, but to your second question:
How to check and insert with a single query?
There is no proper way to do it with an insert, since inserts do not take a query (i.e. they are not conditional).
However, you can use an update with the upsert option: it inserts if the condition doesn't match anything, but if it does match, it will try to update the matched document!
So the trick is to make the update a no-op and set the fields you need on upsert only. Since 2.4 there is a $setOnInsert operator for update. The full thing would look something like this:
db.test.update(
    { startTime: { "$lt" : new_end_time }, "endTime" : { "$gt" : new_start_time } },
    { $setOnInsert: { startTime: new_start_time, endTime: new_end_time } },
    { upsert: 1 }
)
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 1,
    "nModified" : 0,
    "_id" : ObjectId("538e0f6e7110dddea4383938")
})
db.test.update(
    { startTime: { "$lt" : new_end_time }, "endTime" : { "$gt" : new_start_time } },
    { $setOnInsert: { startTime: new_start_time, endTime: new_end_time } },
    { upsert: 1 }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 0 })
I just ran the same "update" twice. The first time, there were no overlapping documents, so the update performed an "upsert", which you can see in the WriteResult it returned.
When I ran it a second time, it would overlap (itself, of course) so it tried to update the matched document, but noticed there was no work to do. You can see the returned nMatched is 1 but nothing was inserted or modified.
This query should return all documents that overlap in some way with the new startTime/endTime values.
db.test.find({"$or":[
{"$and":[{"startTime":{"$lte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time has an old startTime in the middle
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$lte":"new_end_time"}}]},
{"$and":[{"startTime":{"$gte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time sorounds and old time
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$lte":"new_end_time"}}]},
{"$and":[{"startTime":{"$gte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //an old time has the new endTime in the middle
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$gte":"new_end_time"}}]},
{"$and":[{"startTime":{"$lte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time is within an old time
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$gte":"new_end_time"}}]}
]})
You want to run both queries at the same time, which means you want synchronous behavior in your code. Visit this question, it may help with your answer:
Synchronous database queries with Node.js
