I installed ArangoDB Enterprise Edition for evaluation in single-server mode, and I'm using the following AQL script to insert a billion random documents into my collection for testing purposes.
FOR i IN 1..1000000000
  INSERT {
    title: RANDOM_TOKEN(32),
    description: RANDOM_TOKEN(32),
    by: RANDOM_TOKEN(32),
    url: CONCAT("http://www.", RANDOM_TOKEN(32), ".com"),
    tags: [RANDOM_TOKEN(10), RANDOM_TOKEN(10), RANDOM_TOKEN(10)],
    likes: FLOOR(RAND() * 51),
    comments: [
      {
        user: RANDOM_TOKEN(24),
        message: RANDOM_TOKEN(100),
        dateCreated: DATE_ISO8601(946681801000 + FLOOR(RAND() * 1000000000000)),
        likes: FLOOR(RAND() * 51)
      }
    ]
  } IN myCollection2
With some simple calculations, it seems this query would take about 12 hours on my machine. However, I have no longer than 3 hours from the moment I installed the server before the evaluation license expires, so I'm wondering if there is any way to accelerate this query, or even run it in multiple threads.
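For what it's worth, this is the direction I was considering with the arangojs driver (untested sketch; the server URL, worker count, and the trimmed-down document are placeholders):

const { Database } = require("arangojs");

// Untested sketch: split the inserts into disjoint ranges and run them
// as concurrent AQL queries. URL and worker count are placeholders.
const db = new Database({ url: "http://localhost:8529" });

const TOTAL = 1000000000;
const WORKERS = 8;
const CHUNK = TOTAL / WORKERS;

const insertChunk = (start, end) =>
  db.query({
    query: `
      FOR i IN @start..@end
        INSERT {
          title: RANDOM_TOKEN(32),
          likes: FLOOR(RAND() * 51)
          /* ...remaining attributes as in the script above... */
        } IN myCollection2`,
    bindVars: { start, end },
  });

// Launch all chunks concurrently and wait for them to finish.
Promise.all(
  Array.from({ length: WORKERS }, (_, w) =>
    insertChunk(w * CHUNK + 1, (w + 1) * CHUNK)
  )
).then(() => console.log("all chunks inserted"));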
I'm using ArangoDB version 3.9 and have a document collection named myCollection2. Each document in this collection has a 'likes' attribute, which holds a float value. The documents are dummy data for now; I used the following query to create them.
FOR i IN 1..998700000
  INSERT {
    title: RANDOM_TOKEN(32),
    description: RANDOM_TOKEN(32),
    by: RANDOM_TOKEN(32),
    url: CONCAT("http://www.", RANDOM_TOKEN(32), ".com"),
    tags: [RANDOM_TOKEN(10), RANDOM_TOKEN(10), RANDOM_TOKEN(10)],
    likes: FLOOR(RAND() * 51),
    comments: [
      {
        user: RANDOM_TOKEN(24),
        message: RANDOM_TOKEN(100),
        dateCreated: DATE_ISO8601(946681801000 + FLOOR(RAND() * 1000000000000)),
        likes: FLOOR(RAND() * 51)
      }
    ]
  } IN myCollection2
Then I added a persistent index to the collection on the likes attribute and used the query below to find documents with a given value.
FOR s IN myCollection2
  FILTER s.likes == 29.130405590990936
  RETURN s
Knowing that the value 29.130405590990936 actually exists in some documents, the above query takes about ~8 ms, which is great. However, with some other value that doesn't actually exist, say for example 10, the query takes almost an hour, which is crazy. Am I missing something here?
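In case it helps, this is how the execution plans for both values can be compared in arangosh (sketch; an index scan should appear as an IndexNode, a full collection scan as an EnumerateCollectionNode):

// arangosh sketch: compare plans for an existing and a non-existing value.
db._explain(`
  FOR s IN myCollection2
    FILTER s.likes == 29.130405590990936
    RETURN s
`);

db._explain(`
  FOR s IN myCollection2
    FILTER s.likes == 10
    RETURN s
`);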
I have been using Meilisearch for a couple of months and recently upgraded to 0.26.0. For some reason, when I try to create an index using the Node package today, nothing seems to happen.
I can successfully use the createIndex method like below:
client.createIndex("movies")
and the return value of the method shows the task:
{
  uid: 26858,
  indexUid: 'movies',
  status: 'enqueued',
  type: 'indexCreation',
  enqueuedAt: '2022-04-08T15:15:06.325108519Z'
}
However, when I look up this task it seems that it has not been started:
{
  uid: 26858,
  indexUid: 'movies',
  status: 'enqueued',
  type: 'indexCreation',
  details: { primaryKey: null },
  duration: null,
  enqueuedAt: '2022-04-08T15:15:06.325108519Z',
  startedAt: null,
  finishedAt: null
}
And indeed I can't find the index using the getIndexes method.
Strangely, I created an index without issue just a few days ago.
Any idea what the issue might be or how I could debug this?
Most of Meilisearch's asynchronous operations belong to a category called "tasks", and creating an index is one of them.
Depending on the queue size and the server's processing power, it may take a while for recently created tasks to be processed.
You can find more information in the asynchronous operations section of the documentation.
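If you just want to block until the task has been processed, meilisearch-js also exposes a polling helper; a minimal sketch, assuming your client version provides waitForTask (the timeout values are placeholders):

// Sketch: enqueue the index creation, then poll the task until it
// succeeds or fails. Timeout and interval values are placeholders.
const task = await client.createIndex("movies");
const finished = await client.waitForTask(task.uid, {
  timeOutMs: 60000, // give up after one minute
  intervalMs: 500,  // poll every 500 ms
});
console.log(finished.status); // "succeeded" or "failed"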
We are facing a timeout issue with our MongoDB updates. Our collection currently contains around 300 thousand documents. When we try to update a record via the UI, the server times out and the UI is stuck in limbo.
Lead.updateOne({
  _id: body.CandidateID
}, {
  $set: {
    ingestionStatus: 'SUBMITTED',
    program: body.program,
    participant: body.participant,
    promotion: body.promotion,
    addressMeta: body.addressMeta,
    CreatedByID: body.CreatedByID,
    entryPerson: body.entryPerson,
    lastEnteredOn: body.lastEnteredOn,
    zipcode: body.zipcode,
    state: body.state,
    readableAddress: body.readableAddress,
    promotionId: body.promotionId,
    programId: body.programId,
    phone1: body.phone1,
    personId: body.personId,
    lastName: body.lastName,
    hasSignature: body.hasSignature,
    firstName: body.firstName,
    city: body.city,
    email: body.email,
    addressVerified: body.addressVerified,
    address: body.address,
    accountId: body.accountId
  }
});
This is how we update a single record. We are using mLab and Heroku in our stack. Looking for advice on how to speed this up considerably.
Thank you.
If your indexes look fine, then you could try rebuilding the indexes on this collection.
For example, rebuild the lead collection indexes from the mongo command line:
db.lead.reIndex();
Reference:
https://docs.mongodb.com/v3.2/tutorial/manage-indexes/
https://docs.mongodb.com/manual/reference/command/repairDatabase/
If you are not already doing this, try building your indexes in the background.
Index builds can block write operations on your database, so you don't want to build indexes in the foreground on large collections during peak usage. You can build indexes in the background by specifying background: true when creating them:
db.collection.createIndex({ a:1 }, { background: true })
This will ultimately take longer to complete, but it will not block operations and will have less of an impact on performance.
1) Shard the Lead collection using _id as the shard key.
2) Check that the memory MongoDB uses for indexes is less than the memory available on the MongoDB server.
Have you tried what this answer suggests? Namely, updating with no write-concern?
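For reference, a minimal sketch of what an unacknowledged update could look like with Mongoose, assuming a driver version that accepts a writeConcern option (with w: 0 the server does not confirm the write, so failures go unnoticed):

// Sketch: fire-and-forget update; w: 0 skips write acknowledgment,
// so errors will not be reported back to the application.
Lead.updateOne(
  { _id: body.CandidateID },
  { $set: { ingestionStatus: 'SUBMITTED' /* ...remaining fields as above... */ } },
  { writeConcern: { w: 0 } }
);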
I'm coding up a Node.js server using ExpressJS and MongoDB with MongooseJS.
I have a collection called measurements. I want every document in this collection to be kept in the DB for only 24 hours; every document older than 24 hours should be deleted.
To achieve this goal I've used both npm's mongoose-ttl package and MongoDB's built-in TTL.
The code for the mongoose-ttl case:
const mongoose = require('mongoose');
const { Schema } = mongoose;
const ttl = require('mongoose-ttl');

const weatherMeasurementsSchema = new Schema({
  parentRoomId: { type: Schema.Types.ObjectId, ref: "Room" },
  temperature: Number,
  humidity: Number,
  createdAt: Date
});

weatherMeasurementsSchema.plugin(ttl, { ttl: "1m", interval: "1m" });
The code for the pure MongoDB case:
db.measurements.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 60})
For testing purposes I set the ttl and interval fields to 1 minute only, so I could gather results sooner.
Both mongoose-ttl and MongoDB's TTL delete the data properly, but after they do, my server crashes with the error: Error: socket hang up
What could be the cause of this?
EDIT:
I investigated this further, and the only additional thing I have come upon is that the error is thrown after querying the DB for that collection.
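For context, here is a simplified sketch of the kind of route that triggers it (route path and model name are placeholders), with explicit error handling so a rejected query cannot crash the process:

// Sketch: Express route with explicit error handling, so a rejected
// query cannot bring the whole server down. Names are placeholders.
app.get('/measurements', async (req, res) => {
  try {
    const docs = await Measurement.find({}).lean();
    res.json(docs);
  } catch (err) {
    console.error('measurements query failed:', err);
    res.status(500).json({ error: err.message });
  }
});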
I created a default sails.js install with all default options on top of a simple MySQL database. Currently, I have a table with 91 records.
I have an Ionic front end that loads the list from that table, but it displays only 30 records.
I used Postman to hit the sails.js URL (http://localhost:1337/list), and that also shows 30 records returned.
While it's a duplicate effort, I hit the URL (http://localhost:1337/list) directly in a browser, and it still returns 30 records.
Is there a default to how many records sails.js will return?
If so, how do I remove this default so sails.js will return all the results rather than a subset? I don't want to paginate, I want the full list.
PS I have checked the sails.js model I created to verify I don't have any funky limiting stuff and it's uber barebones.
I have no custom code and the entire sails.js install is the default minus the db connection, the skeleton controller, and the models.
Here's my model:
module.exports = {
  identity: 'List',
  attributes: {
    id: {
      type: 'integer',
      primaryKey: true
    },
    name: {
      type: 'string',
      unique: true,
      required: true
    },
    user: {
      type: 'integer'
    }
  }
};
You are using blueprints for the find action.
Open the config/blueprints.js file and check the comments.
There you will find:
/****************************************************************************
* *
* The default number of records to show in the response from a "find" *
* action. Doubles as the default size of populated arrays if populate is *
* true. *
* *
****************************************************************************/
// defaultLimit: 30
Change it as you prefer.
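For example, to raise the cap, uncomment that setting and pick a value that covers your table (sketch; the value 100 is arbitrary):

// config/blueprints.js (sketch): raise the default "find" limit.
// The value 100 is arbitrary; pick one that covers your table.
module.exports.blueprints = {
  defaultLimit: 100
};

Blueprint find requests also accept a limit query parameter per request, e.g. http://localhost:1337/list?limit=100.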