My main aim is to perform multiple Sphinx queries at once.
They can be on different models/tables, or on some common ones.
The final result should be grouped query-wise.
This seems to be supported in Sphinx with multi-queries: http://sphinxsearch.com/docs/2.0.7/multi-queries.html
Using Thinking Sphinx with a Rails application, is there any way I can use this functionality?
(FYI, my TS version is 2.0.11; however, I would like to know if it can be done with version 3.x if not with 2.x.)
In Thinking Sphinx v1/v2, it's not particularly elegant, but here's the deal:
bundle = ThinkingSphinx::BundledSearch.new
bundle.search 'foo'
bundle.search 'bar', :classes => [Article]
bundle.search 'baz', :classes => [User, Article], :with => {:active => true}
# as soon as you call `searches` on the bundle, the group of queries is sent
# through to Sphinx.
foo_search, bar_search, baz_search = bundle.searches
With Thinking Sphinx v3, it's a bit different:
batch = ThinkingSphinx::BatchedSearch.new
foo_search = ThinkingSphinx.search 'foo'
bar_search = Article.search 'bar'
baz_search = ThinkingSphinx.search 'baz', :classes => [User, Article],
:with => {:active => true}
batch.searches += [foo_search, bar_search, baz_search]
batch.populate
# Use each of your search results objects now as you normally would.
# If you use any of them to access results before the batch.populate call,
# then that will be a separate call to Sphinx.
As an aside from all of this: if you're going to stick with the v2 releases for the moment, I'd highly recommend upgrading to Thinking Sphinx v2.1.0. It still uses the old syntax, but it adds a connection pool, so even if you're not batching all of these queries together, the socket setup overhead is minimised as much as possible.
I tracked down this package. Generally it's pretty nice, but it seems to lack support for Projection Expressions. What is your tool of choice for DynamoDB in Node/TypeScript?
I'm not a fan of the data mappers listed here, because they tend to wrap the table data or are abandoned as projects.
If TypeScript is an option, we use https://github.com/shiftcode/dynamo-easy. It also does not support Projection Expressions, but the underlying params can always be accessed and manipulated, so adding an unsupported feature is easy.
import { DynamoStore } from '@shiftcoders/dynamo-easy'

const queryRequest = new DynamoStore(PersonModel)
  .query()
  .wherePartitionKey('2018-01')
  .whereSortKey().beginsWith('a')
  .limit(1)

const queryParams = queryRequest.params
queryParams.ProjectionExpression = 'projectionExpression'
// also add expression attribute names if required
queryParams.ExpressionAttributeNames = { '#someExpressionAttributeName': 'someExpressionAttributeName' }

// you can also use new DynamoDB().query(queryParams), but we just use the preconfigured wrapped client
queryRequest.dynamoDBWrapper.makeRequest('query', queryParams)
  .then(r => console.log('first found item with projection expression:', r))
Full disclosure: I am one of the authors of this library.
We use dynogels, which is still maintained to date.
https://github.com/clarkie/dynogels
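For reference, a rough sketch of what a dynogels model and query look like. The table, model and attribute names here are made up, and the .attributes() call is roughly how you restrict which fields come back (which is what a projection expression does):

var dynogels = require('dynogels');
var Joi = require('joi');

dynogels.AWS.config.update({ region: 'us-east-1' });

// hypothetical model -- adjust keys and schema to your own table
var Person = dynogels.define('Person', {
  hashKey: 'id',
  timestamps: true,
  schema: {
    id: Joi.string(),
    name: Joi.string(),
    email: Joi.string().email()
  }
});

// query by hash key and only fetch the selected attributes
Person.query('some-id')
  .attributes(['name', 'email'])
  .limit(10)
  .exec(function (err, result) {
    if (err) return console.error(err);
    result.Items.forEach(function (item) {
      console.log(item.get('name'), item.get('email'));
    });
  });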
If you need a GUI to construct your query, try using the "DynamoDB Visual Query Builder" I've built: https://dynobase.dev/dynamodb-query-builder/
I'm having some trouble managing i18n in my database.
For now I have just two languages available in my application, but in order to be scalable I would like to do it the "best" way.
I could have duplicated all fields, like description_fr and description_en, but I was not comfortable with that at all. What I've done for now is an external table, call it content, whose structure is like this:
id_ref => the id of the referenced entity (e.g. 2)
type => the table name (e.g. university)
field => the field of that specific table (e.g. description)
lang => the language (fr, en, es…)
content => and finally the appropriate content.
It may be important to mention that I use Sequelize as my ORM, so I can use useful hooks such as afterFind, afterCreate and afterUpdate. Each time I want to find a resource, for example, my hook retrieves all the content for that resource after the find and fills my object with the right values. It works, but I'm not in love with it.
But I have some problems with this approach:
It increases the number of requests to the database considerably: if I select 50 rows, for example, I have to make 50 more requests, and that's just for one particular model. If I have nested models, it grows even faster…
It's also complicated to fetch data by i18n'd content; for example, finding a university with a specific name is complicated.
And it's a lot of work for updates, etc.
So I wonder if it would be a good idea to save the data as JSON directly in the table concerned. Something like:
{
  "fr": { "name": "Ma super université" },
  "en": { "name": "My kick ass university" }
}
And keep using Sequelize hooks to build and insert the proper data into my object (see the sketch below).
What do you think ?
How do you manage this ?
EDIT
I use a MySQL database.
It concerns around 20 fields (across models).
I have to fall back to my default_lang if there is no content set (e.g. event.description in French will be the same as the English one if there is no French content).
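To make the JSON idea above more concrete, here is a minimal sketch of how it could look with Sequelize 3.x and MySQL: the translations live in a serialized-JSON TEXT column and the default-language fallback happens in an instance method. All names (University, name_i18n, localizedName) are illustrative only, not a drop-in solution.

var Sequelize = require('sequelize');
var sequelize = new Sequelize('mydb', 'user', 'password', { dialect: 'mysql' });

var DEFAULT_LANG = 'en';

var University = sequelize.define('University', {
  // all translations of the "name" field live in one serialized JSON column
  name_i18n: {
    type: Sequelize.TEXT,
    get: function () {
      return JSON.parse(this.getDataValue('name_i18n') || '{}');
    },
    set: function (value) {
      this.setDataValue('name_i18n', JSON.stringify(value));
    }
  }
}, {
  instanceMethods: {
    // return the requested language, falling back to the default language
    localizedName: function (lang) {
      var translations = this.name_i18n;
      return translations[lang] || translations[DEFAULT_LANG];
    }
  }
});

// usage
University.create({
  name_i18n: { fr: 'Ma super université', en: 'My kick ass university' }
}).then(function (university) {
  console.log(university.localizedName('fr')); // "Ma super université"
  console.log(university.localizedName('es')); // falls back to the English value
});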
I used the npm package sequelize-i18n. It worked pretty well for me with Sequelize 3.23.2; unfortunately, it does not seem to support Sequelize 4.x yet.
In MongoDB 2.6, text search is supposedly production-ready and we can now use skip. I'd like to use text search and skip for pagination in my app, but I'm not yet sure how to implement it.
Right now I'm using Mongoose and the mongoose-text-search plugin, but I don't believe either of them supports skip in MongoDB's text search, so I guess I'll need to use the native MongoClient...
My app connects to the database via Mongoose using:
// Bootstrap db connection
var db = mongoose.connect(config.db, function (e) {
  if (e) throw e; // abort on connection error
});
Now, how can I use the native MongoClient to execute a full text search for my Products model with a skip parameter? Here is what I had using Mongoose and mongoose-text-search, but there is no way to add skip:
var Product = mongoose.model('Product');

var query = req.query.query;
var skip = req.query.skip;

var options = {
  project: '-created', // do not include the `created` property
  filter: filter,      // casts queries based on schema
  limit: 20,
  language: 'english',
  lean: true
};

Product.textSearch(query, options, function (err, response) {
});
The main difference introduced in MongoDB 2.6 is that you can issue a "text search" query using the standard .find() interface, so the old textSearch methods no longer need to be used. This is how modifiers such as limit and skip can be applied.
But keep in mind that, as of writing, Mongoose depends on an earlier version of the MongoDB node driver that was released before MongoDB 2.6. Since Mongoose actually wraps the main methods and does some syntax checking of its own, it is likely (though untried by me) that using the Mongoose methods will currently fail.
So what you will need to do is get the underlying driver method for .find(), and also now use the $text operator instead:
Product.collection.find(
{ "$text": { "$search": "term" } },
{ "sort": { "score": { "$meta": "textScore" } }, "skip": 25, "limit": 25 },
function(err,docs) {
// processing here
});
Also note that the $text operator does not sort the results by "score" for relevance by default; that is requested through the "sort" option using the new $meta operator, which is also introduced in MongoDB 2.6.
So alter your skip and limit values and you have paging on text search results, with a cursor. Just be wary of large result sets, as skip and limit are not really efficient ways to move through a large cursor. It is better to have another key on which you can range-match, even though that is counter-intuitive to "relevance matching".
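As a rough illustration of that last point, range-matching on a secondary field instead of skipping lets each page continue from where the previous one stopped. The created field and lastSeenCreated variable below are hypothetical:

// sketch: page through text-search matches by range on a `created` field,
// where lastSeenCreated is taken from the last document of the previous page
Product.collection.find(
  { "$text": { "$search": "term" }, "created": { "$lt": lastSeenCreated } },
  { "sort": { "created": -1 }, "limit": 25 },
  function (err, docs) {
    // processing here
  }
);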
So the text search facilities are a bit "better" but not "perfect". As always, if you really need more features and/or more performance, look to an external solution.
Feel free to try a similar operation with the Mongoose implementation of .find() as well, but from past experience I have reservations that there is generally some masking and checking going on there, hence the description of usage with the "native" node driver.
I noticed that Kohana 3 ORM runs a "SHOW FULL COLUMNS" for each of my models when I start using them:
SHOW FULL COLUMNS FROM `mytable`
This query might take a few clock cycles to execute (in the Kohana profiler it's actually the slowest of all queries ran in my current app).
Is there a way to help Kohana 3 ORM speed up by disabling this behaviour and explicitly defining the columns in my models instead?
biakaveron answered my question with a comment, so I can't accept it as the correct answer.
Taken from Wouter's answer on the official Kohana forums (which biakaveron pointed to), this is the correct answer:
It's very easy. $table_columns is a big array with a lot of info, but actually only very little of this info is used in ORM.
This will do:
protected $_table_columns = array(
'id' => array('type'=>'int'),
'name' => array('type'=>'string'),
'allowNull' => array('type'=>'string','null'=>TRUE),
'created' => array('type'=>'int')
);
There isn't too much overhead when that query gets executed, though you can cache the columns / skip the process by defining them manually (if that is really what you want, override $_table_columns in your models, though I don't see how much time you can save by doing it; it's worth trying).
I proposed a caching alternative for list_columns() but it got denied as it really isn't that much of a bottleneck: http://dev.kohanaframework.org/issues/2848
Do not forget the underscore:
protected $_table_columns = array(
'id' => array('type'=>'int'),
'name' => array('type'=>'string'),
'allowNull' => array('type'=>'string','null'=>TRUE),
'created' => array('type'=>'int')
);
This will give you the full column info as an array:
var_export($ORM->list_columns());
I'm not sure how the Kohana team concluded that SHOW FULL COLUMNS runs as fast as reading from cache in all cases. The MySQL query cache is a bottleneck for our workload, so we had to turn it off.
https://blogs.oracle.com/dlutz/entry/mysql_query_cache_sizing
Proof that SHOW FULL COLUMNS is the most frequently run query:
https://www.dropbox.com/s/zn0pbiogt774ne4/Screenshot%202015-02-17%2018.56.21.png?dl=0
Proof of the temp tables on disk, from the New Relic MySQL plugin:
https://www.dropbox.com/s/cwo09sy9qxboeds/Screenshot%202015-02-17%2019.00.19.png?dl=0
And the top offending queries (> 100 ms), sorted by query count:
https://www.dropbox.com/s/a1kpmkef4jd8uvt/Screenshot%202015-02-17%2018.55.38.png?dl=0
Is there a way to generate random UUIDs like the ones used in CouchDB, but with Node.js?
There are different ways to generate UUIDs. If you are already using CouchDB, you can just ask CouchDB for some like this:
http://127.0.0.1:5984/_uuids?count=10
CouchDB has three different UUID generation algorithms. You can specify which one CouchDB uses in the CouchDB configuration as uuids/algorithm. There could be benefits to asking CouchDB for UUIDs: specifically, if you are using the "sequential" generation algorithm, the UUIDs you get from CouchDB will fall into that sequence.
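For example, fetching a batch of UUIDs from that endpoint with nothing but Node's built-in http module looks roughly like this (the response body is a JSON object containing a uuids array):

// sketch: ask a local CouchDB for 10 UUIDs via the _uuids endpoint
var http = require('http');

http.get('http://127.0.0.1:5984/_uuids?count=10', function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    console.log(JSON.parse(body).uuids); // array of 10 UUID strings
  });
});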
If you want to do it in node.js without relying on CouchDB, then you'll need a UUID function written in JavaScript. node-uuid is a JavaScript implementation that generates "Version 4" (random) or "Version 1" (timestamp-based) UUIDs. It works with node.js or hosted in a browser: https://github.com/broofa/node-uuid
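A minimal node-uuid sketch (the package has since been superseded by the uuid package on npm, but its classic API looks like this):

var uuid = require('node-uuid');

console.log(uuid.v4()); // random, "Version 4" UUID
console.log(uuid.v1()); // timestamp-based, "Version 1" UUID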
If you're on Linux, there is also a JavaScript wrapper for libuuid called uuidjs. There is a performance comparison to node-uuid in the node-uuid README.
If you want to do something, and it doesn't look like it's supported in node.js, be sure to check the modules available for npm.
I had the same question and found that simply passing null for the CouchDB id in the insert statement also did the trick:
var newdoc = {
"foo":"bar",
"type": "my_couch_doctype"
};
mycouchdb.insert(newdoc, null /* <- let couchdb generate for you. */, function(err, body){
});