Unaccent in Sequelize - node.js

I'm currently working on a project that uses ExpressJS, PostgreSQL and Sequelize as the ORM. I developed a search function that runs a query to find items by name:
models.foo.findAll({
where: {
$or: [
{name: {$ilike: keywords}},
{searchMatches: {$contains: [keywords]}}
]
},
order: [['name', 'ASC']]
})
This works fine, but if the name contains a special character (like á, é, í, ó or ú), this query won't find it.
Is there a way to make the query match names with special characters in a meaningful sense? For example, if I search for "potato", the results "The potato", "Da potátos" and "We are the pótatóes" should come out, but not "We eat pátatos" (since á != o).

This can now be done without a completely raw query, using Sequelize's built-in functions:
models.foo.findAll({
where: Sequelize.where(
Sequelize.fn('unaccent', Sequelize.col('name')), {
[Op.iLike]:`%${keywords}%`
}),
order: [['name', 'ASC']]
})
Then ordering, associations etc. all work still as normal :).
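Note that the query above unaccents the column but not the search term, so a term typed with accents may still fail to match. One option is to normalize the term in application code before building the pattern. The following is a minimal sketch, assuming Node.js; stripAccents, toLikePattern and matches are hypothetical helper names, and stripping Unicode combining marks only approximates what PostgreSQL's unaccent does.

```javascript
// Hypothetical helper: remove combining diacritical marks after NFD decomposition.
// This approximates, but is not identical to, PostgreSQL's unaccent() rules.
function stripAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: build an ILIKE pattern, keeping % and _ in user input literal.
function toLikePattern(keywords) {
  const escaped = stripAccents(keywords).replace(/[\\%_]/g, '\\$&');
  return `%${escaped}%`;
}

// The same comparison in plain JavaScript, using the names from the question.
function matches(name, keywords) {
  return stripAccents(name).toLowerCase().includes(stripAccents(keywords).toLowerCase());
}

console.log(toLikePattern('potáto'));
console.log(matches('We are the pótatóes', 'potato'));
console.log(matches('We eat pátatos', 'potato'));
```

The result of toLikePattern(keywords) could then be used in place of the interpolated string in the Op.iLike condition.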

I finally found a valid solution. First I created the unaccent extension:
create extension unaccent;
Then I just used a raw query (I couldn't figure out how to build the query using Sequelize's way) like this:
models.sequelize.query(
`SELECT
*
FROM
"Foos"
WHERE
unaccent("name") ilike unaccent(:keywords)
OR "searchMatches" @> ARRAY[unaccent(:keywords)]::VARCHAR(255)[]
ORDER BY
"name" ASC`, {
// bind the search term instead of interpolating it into the SQL string
replacements: {keywords: keywords},
model: models.Foo})
And it works!

A dictionary might be what you are looking for. It can be used to map synonyms and to exclude common words from indexes (e.g. "a" and "the" in English text), amongst other things.
https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html

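In my case I solved this using Sequelize.literal and COLLATE, like this:
where: Sequelize.literal(`name COLLATE Latin1_general_CI_AI like '%${keywords}%' COLLATE Latin1_general_CI_AI`)
That way, the comparison ignores accents on both sides. Note that Latin1_general_CI_AI is a SQL Server collation name, so this applies to that dialect rather than to PostgreSQL.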

Related

Search string value inside an array of objects inside an object of the jsonb column- TypeORM and Nest.js

The problem I am facing is as follows:
Search value: 'cooking'
JSON object:
data: {
skills: {
items: [ { name: 'cooking' }, ... ]
}
}
Expected result: Should find all the "skill items" that contain 'cooking' inside their name, using TypeORM and Nest.js.
The current code does not support search on the backend, and I should implement this. I want to use TypeORM features, rather than handling it with JavaScript.
Current code: (returns data based on the userId)
const allItems = this.dataRepository.find({ where: [{ user: { id: userId } }] })
I investigated the PostgreSQL documentation regarding the PostgreSQL functions and even though I understand how to create a raw SQL query, I am struggling to convert this to the TypeORM equivalent.
Note: I researched many StackOverflow questions before creating this one, but do let me know if I missed the right one. I will be glad to investigate.
Can you help me figure out the way to query this with TypeORM?
UPDATE
Let's consider the simple raw query:
SELECT *
FROM table1 t
WHERE t.data->'skills' @> '{"items":[{ "name": "cooking"}]}';
This query returns any row whose items array contains an element with exactly this name, in this case "cooking".
That's fine, and it can be executed as a raw query, but it is not easy to maintain, and it does not support pattern matching or wildcards (I couldn't find a way to do that; if you know how, please share!). Still, this solution is good enough when you only need exact matches. I'll keep this question updated with new findings.
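For pattern matching inside the array, one possible approach is to unnest the items with jsonb_array_elements and apply ILIKE to each name. The sketch below only builds the SQL fragment and its parameters; skillContainment and skillSearch are hypothetical helper names, and the alias t and column data follow the raw query above. With TypeORM, the fragment and parameters could presumably be passed to a query builder's where(sql, params) call, which accepts :name placeholders.

```javascript
// Hypothetical helper: right-hand side of a jsonb containment (@>) check for an
// exact skill name. Binding it as a parameter avoids hand-written JSON strings.
function skillContainment(name) {
  return JSON.stringify({ items: [{ name }] });
}

// Hypothetical helper: SQL fragment plus parameters for a partial,
// case-insensitive match on any item name.
function skillSearch(term) {
  const escaped = term.replace(/[\\%_]/g, '\\$&'); // keep % and _ literal
  return {
    sql: `EXISTS (
      SELECT 1
      FROM jsonb_array_elements(t.data->'skills'->'items') AS item
      WHERE item->>'name' ILIKE :pattern
    )`,
    params: { pattern: `%${escaped}%` },
  };
}

console.log(skillContainment('cooking'));
console.log(skillSearch('cook').params);
```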
Use Like in the where clause:
servicePoint = await this.servicePointAddressRepository.find({
where: [{ ...isActive, name: Like("%"+key+"%"), serviceExecutive:{id: userId} },
{ ...isActive, servicePointId: Like("%"+key+"%")},
{ ...isActive, branchCode: Like("%"+key+"%")},
],
skip: (page - 1) * limit,
take: limit,
order: { updatedAt: "DESC" },
relations:["serviceExecutive","address"]
});
This may help you! I'm matching with key here.

Usage of TSVECTOR and to_tsquery to filter records in Sequelize

I've been trying to get full-text search to work for a while now without any success. The current documentation has this example:
[Op.match]: Sequelize.fn('to_tsquery', 'fat & rat') // match text search for strings 'fat' and 'rat' (PG only)
So I've built the following query:
Title.findAll({
where: {
keywords: {
[Op.match]: Sequelize.fn('to_tsquery', 'test')
}
}
})
And keywords is defined as a TSVECTOR field.
keywords: {
type: DataTypes.TSVECTOR,
},
It seems to generate the query properly, but I'm not getting the expected results. This is the query generated by Sequelize:
Executing (default): SELECT "id" FROM "Tests" AS "Test" WHERE "Test"."keywords" @@ to_tsquery('test');
And I know that there are multiple records in the database that have 'test' in their vector, such as the following one:
{
"id": 3,
"keywords": "'keyword' 'this' 'test' 'is' 'a'",
}
so I'm unsure as to what's going on. What would be the proper way to search for matches based on a TSVECTOR field?
It's funny, but these days I am also working on the same thing and getting the same problem.
I think part of the solution is here (How to implement PostgresQL tsvector for full-text search using Sequelize?), but I haven't been able to get it to work yet.
If you find examples, I'm interested. Otherwise as soon as I find the solution that works 100% I will update this answer.
What I also noticed is that when I add data (seeds) from Sequelize, it doesn't add the lexeme position numbers after the data of the field in question. Do you see the same behavior?
One last thing: did you create the index?
CREATE INDEX tsv_idx ON data USING gin(column);

Storing and querying PostgreSQL database entities with multiple related entities?

I'm designing a PostgreSQL database that will be queried by a Node API using Sequelize. Currently, I have a table called recipes that has columns called ingredients and instructions. For a given recipe, those columns are stored as arrays of strings like {Tomatoes, Onions}.
That method of storage worked fine for simply fetching and rendering a recipe on the client side. But it wasn't working well for fuzzy search querying because, using Sequelize all I could do was ingredients: { [Op.contains] : [query] }. So if a user typed tomatoes there was no way to write a "fuzzy" search query that would return a recipe with an ingredient Tomatoes.
And then I read this in the PostgreSQL documentation:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Now I'm considering storing ingredients and instructions as separate tables, but I have a couple of questions.
1) As a recipe can have multiple ingredients related to it, should I just use a foreign key for each ingredient and the Sequelize hasMany relationship? That seems correct to me, except that I'm now potentially duplicating common ingredients each time a new recipe is created that uses that ingredient.
2) What would be the best way to write a fuzzy search query so that a user could search the main columns of the recipes table (e.g. title, description) and additionally apply their query to the instructions and ingredients tables?
Essentially I'd like to end up with a fuzzy search query applied to the three tables that looks something like this...
const recipes = await req.context.models.Recipe.findAll({
where: {
[Op.or]: [
{ title: { [Op.iLike]: '%' + query + '%' } },
{ description: { [Op.iLike]: '%' + query + '%' } },
{ ingredients: { ingredient: { [Op.iLike]: '%' + query + '%' } } },
{ instructions: { instruction: { [Op.iLike]: '%' + query + '%' } } }
]
}
});
Thanks!
I have done this. I happen to use GraphQL in my Node layer with Sequelize, and I have filter objects that do this type of thing. You'll just need some include statements in your Recipe.findAll, after the initial where clause in which you decide whether you are searching title, description, or both. I sent my search params in with prefixes that I could strip off and that told me which Sequelize ops to use on them, then ran my args through a utility method to create my where clause, but I know there are many ways to skin that cat. I just did not want to clutter up my resolvers with tons of hardcoded ops and conditional clauses. Your include might look something like this:
include: [{
model: models.Ingredient,
as: 'Ingredients',
// if the association is many-to-many, add a `through` option here
// specifying the join table and keys where necessary
where: { /* condition built from your search param */ },
}, {
model: models.Instruction,
as: 'Instructions',
where: { /* condition built from your search param */ },
}],
There is good documentation on multiple and nested includes in the Sequelize docs, but from what I see above you have a fairly good understanding of what you need to do. To simplify things a bit, I'd start by searching only the fields on the recipe itself (title, description) and get that working before you add the includes; then it will be a little clearer how you want to form your where clauses.
Alternatively, you can skip the includes, declare associations in your models, call them through their getters, and pass the where clauses to those. I do that as well, and again it is well documented now. Sequelize has really upped their game.
Recipie.associate = function (models) {
models.Recipie.belongsToMany(models.Ingredient, { as: 'Ingredients', through: "recipie_ingredient", foreignKey: 'recipie_id'});
};
Now you have a getter for Ingredients, and if you declare belongsToMany pointing back at Recipie in the Ingredient model, you'll have a getter there as well. You can pass your search string to that via a where clause and get all recipes that have a given ingredient or list of ingredients. Clear as mud?

Mongoose partial field search without RegEx

Let's say I have this schema:
var mongoose = require("mongoose")
var userSchema = new mongoose.Schema({
name: {type: String},
// other fields
}, { collation: { locale: "en_US", strength: 1 } });
I use collation so that the search is case-insensitive
Then let's say I have a document with name "Dave"
{
name: "Dave",
// other fields
}
Then I search for it without writing the whole word:
var userList = await User
.find({name: "da"})
.exec();
How can I make this work without using a regular expression, which is quite slow? I have tried creating an index and then searching with the $text operator, but I don't know how to make it search only a specific field within the document.
I believe using REGEX is your best solution. What you are trying to do is literally what regex is designed for. Yeah it's slow, but any other option you try to implement will probably be slower.
Creating a text index, and using the $text is only designed to match full words so you cannot use this method.
If you are truly desperate, and really don't want to use regex you can try something else... Try creating and storing an object, with each possible substring in the document. Object lookup is O(1) time, which means it will be faster, but the tradeoff is you are storing an absurd amount of data in the database. If this is ok with you, then give 'er a try.
Let's use Dave for example. The object you store could look something like this:
{
"d": 1,
"da": 1,
"dav": 1,
"dave": 1
}
We can store this object in a field called substrings. Then when we do the database lookup, it's as simple as:
User.find({ 'substrings.da': { $exists: true }})
But please consider using regex... It's so much simpler, so much cleaner and it's designed for exactly what you want.
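As a rough illustration of the lookup object described above, here is a minimal sketch that generates it from a name. buildPrefixes is a hypothetical helper name. Note that the example object only covers prefixes ("d", "da", ...), not every substring, and that names containing a dot would clash with MongoDB's dot notation for field paths.

```javascript
// Build the lookup object from the answer: one key per prefix of the lowercased name.
// This covers prefixes only ("d", "da", ...), not inner substrings such as "av".
function buildPrefixes(name) {
  const lower = name.toLowerCase();
  const prefixes = {};
  for (let end = 1; end <= lower.length; end++) {
    prefixes[lower.slice(0, end)] = 1;
  }
  return prefixes;
}

console.log(buildPrefixes('Dave'));
```

The returned object would be stored in the substrings field when the document is saved.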

Find and update case insensitive data in MongoDB [duplicate]

Example:
> db.stuff.save({"foo":"bar"});
> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0
You could use a regex.
In your example that would be:
db.stuff.find( { foo: /^bar$/i } );
I must say, though, maybe you could just downcase (or upcase) the value on the way in rather than incurring the extra cost every time you find it. Obviously this won't work for people's names and such, but it may for use cases like tags.
UPDATE:
The original answer is now obsolete. Mongodb now supports advanced full text searching, with many features.
ORIGINAL ANSWER:
It should be noted that searching with regex's case insensitive /i means that mongodb cannot search by index, so queries against large datasets can take a long time.
Even with small datasets, it's not very efficient. You take a far bigger cpu hit than your query warrants, which could become an issue if you are trying to achieve scale.
As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table that has a username which is mixed case, but the id is an uppercase copy of the username. This ensures case-sensitive duplication is impossible (having both "Foo" and "foo" will not be allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search for username.
If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an extraneous indexer like Apache Lucene is the best option in that case.
Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.
I personally emailed one of the founders to please get this working, and he made it happen! It was an issue on JIRA since 2009, and many have requested the feature. Here's how it works:
A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:
db.cities.createIndex(
{ city: 1 },
{
collation: {
locale: 'en',
strength: 2
}
}
);
You can also specify a default collation per collection when you create them:
db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
In either case, in order to use the case-insensitive index, you need to specify the same collation in the find operation that was used when creating the index or the collection:
db.cities.find(
{ city: 'new york' }
).collation(
{ locale: 'en', strength: 2 }
);
This will return "New York", "new york", "New york" etc.
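To get a feel for what the strength levels mean without a database, the same ICU comparison rules can be tried in Node.js through Intl.Collator. This is only an analogy, assuming a Node.js build with ICU support: sensitivity 'accent' roughly corresponds to strength 2 (case ignored, diacritics significant) and sensitivity 'base' to strength 1 (both ignored).

```javascript
// sensitivity 'accent' ignores case but not diacritics (comparable to strength 2);
// sensitivity 'base' ignores both case and diacritics (comparable to strength 1).
const caseInsensitive = new Intl.Collator('en', { sensitivity: 'accent' });
const baseOnly = new Intl.Collator('en', { sensitivity: 'base' });

// compare() returns 0 when the collator considers two strings equal.
const sameCity = caseInsensitive.compare('New York', 'new york') === 0;
const accentDiffers = caseInsensitive.compare('cafe', 'café') !== 0;
const accentIgnored = baseOnly.compare('cafe', 'café') === 0;

console.log({ sameCity, accentDiffers, accentIgnored });
```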
Other notes
The answers suggesting to use full-text search are wrong in this case (and potentially dangerous). The question was about making a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search query, which would also match stemmed words of bill, such as Bills, billed etc.
The answers suggesting to use regular expressions are slow, because even with indexes, the documentation states:
"Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."
$regex answers also run the risk of user input injection.
If you need to create the regexp from a variable, this is a much better way to do it: https://stackoverflow.com/a/10728069/309514
You can then do something like:
var string = "SomeStringToFind";
var regex = new RegExp(["^", string, "$"].join(""), "i");
// Creates a regex of: /^SomeStringToFind$/i
db.stuff.find( { foo: regex } );
This has the benefit of being more programmatic, and you can get a performance boost by compiling it ahead of time if you're reusing it a lot.
Keep in mind that the previous example:
db.stuff.find( { foo: /bar/i } );
will cause every entry containing bar to match the query (bar1, barxyz, openbar), which could be very dangerous for a username search in an auth function.
You may need to make it match only the search term by using the appropriate regexp syntax as:
db.stuff.find( { foo: /^bar$/i } );
See http://www.regular-expressions.info/ for syntax help on regular expressions
db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});
db.zipcodes.find({city : "NEW YORK"}); // Case-sensitive
db.zipcodes.find({city : /NEW york/i}); // Note the 'i' flag for case-insensitivity
TL;DR
The correct way to do this in MongoDB:
Do not use RegExp.
Go natural and use MongoDB's built-in text indexing and search.
Step 1 :
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
Step 2 :
Create a text index on whichever field you want to search; the $text operator requires one:
db.articles.createIndex( { subject: "text" } )
step 3 :
db.articles.find( { $text: { $search: "coffee",$caseSensitive :true } } ) //FOR SENSITIVITY
db.articles.find( { $text: { $search: "coffee",$caseSensitive :false } } ) //FOR INSENSITIVITY
One very important thing to keep in mind when using a Regex based query - When you are doing this for a login system, escape every single character you are searching for, and don't forget the ^ and $ operators. Lodash has a nice function for this, should you be using it already:
db.stuff.find({foo: {$regex: '^' + _.escapeRegExp(bar) + '$', $options: 'i'}})
Why? Imagine a user entering .* as his username. That would match all usernames, enabling a login by just guessing any user's password.
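The difference can be seen without a database. The sketch below uses an inline escapeRegExp as a dependency-free stand-in for lodash's _.escapeRegExp; the list of usernames is made up for illustration.

```javascript
// Escape characters that have special meaning in a regular expression.
// Inline stand-in for lodash's _.escapeRegExp, to keep the example dependency-free.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const usernames = ['alice', 'bob', 'carol']; // made-up data
const input = '.*'; // hostile "username"

// Unescaped: the input is interpreted as a pattern and matches every username.
const unsafe = new RegExp('^' + input + '$', 'i');
// Escaped: the input is treated as literal text and matches none of them.
const safe = new RegExp('^' + escapeRegExp(input) + '$', 'i');

console.log(usernames.filter((u) => unsafe.test(u)));
console.log(usernames.filter((u) => safe.test(u)));
```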
Suppose you want to search "column" in "Table" and you want case insensitive search. The best and efficient way is:
//create empty JSON Object
mycolumn = {};
//check if column has valid value
if(column) {
mycolumn.column = {$regex: new RegExp(column), $options: "i"};
}
Table.find(mycolumn);
It just adds your search value as RegEx and searches in with insensitive criteria set with "i" as option.
Mongo (current version 2.0.0) doesn't allow case-insensitive searches against indexed fields - see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.
For searching a variable and escaping it:
const escapeStringRegexp = require('escape-string-regexp')
const name = 'foo'
db.stuff.find({name: new RegExp('^' + escapeStringRegexp(name) + '$', 'i')})
Escaping the variable protects the query against attacks with '.*' or other regex.
escape-string-regexp
The best method is in your language of choice, when creating a model wrapper for your objects, have your save() method iterate through a set of fields that you will be searching on that are also indexed; those set of fields should have lowercase counterparts that are then used for searching.
Every time the object is saved again, the lowercase properties are then checked and updated with any changes to the main properties. This will make it so you can search efficiently, but hide the extra work needed to update the lc fields each time.
The lower case fields could be a key:value object store or just the field name with a prefixed lc_. I use the second one to simplify querying (deep object querying can be confusing at times).
Note: you want to index the lc_ fields, not the main fields they are based off of.
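A minimal sketch of that idea, assuming plain objects and the lc_ prefix convention described above; withLowercaseFields is a hypothetical helper name that a save() wrapper could call before writing the document.

```javascript
// Hypothetical helper: copy each searchable string field into a lowercase "lc_" counterpart.
function withLowercaseFields(doc, searchableFields) {
  const result = { ...doc };
  for (const field of searchableFields) {
    if (typeof doc[field] === 'string') {
      result[`lc_${field}`] = doc[field].toLowerCase();
    }
  }
  return result;
}

const saved = withLowercaseFields({ username: 'FooBar', age: 30 }, ['username']);
console.log(saved);
```

A lookup would then lowercase the input the same way and query the lc_ field, for example { lc_username: input.toLowerCase() }.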
Using Mongoose this worked for me:
var find = function(username, next){
User.find({'username': {$regex: new RegExp('^' + username, 'i')}}, function(err, res){
if(err) throw err;
next(null, res);
});
}
If you're using MongoDB Compass:
Go to the collection, in the filter type -> {Fieldname: /string/i}
For Node.js using Mongoose:
Model.find({FieldName: {$regex: "stringToSearch", $options: "i"}})
The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.
Here's the official document on the aggregation command operator: https://docs.mongodb.com/manual/reference/operator/aggregation/strcasecmp/#exp._S_strcasecmp .
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case insensitive collation. International Components for Unicode
/* strength: CollationStrength.Secondary
* Secondary level of comparison. Collation performs comparisons up to secondary
* differences, such as diacritics. That is, collation performs comparisons of
* base characters (primary differences) and diacritics (secondary differences).
* Differences between base characters takes precedence over secondary
* differences.
*/
db.users.createIndex( { name: 1 }, { collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
I'm surprised nobody has warned about the risk of regex injection when using /^bar$/i if bar is a password or an account id search (e.g. bar => .*@myhackeddomain.com). So here comes my bet: use the \Q \E regex special chars provided in Perl!
db.stuff.find( { foo: /^\Qbar\E$/i } );
You should escape \ chars in the bar variable with \\ to avoid the \E exploit again, e.g. when bar = '\E.*@myhackeddomain.com\Q'
Another option is to use a regex escape char strategy like the one described here Javascript equivalent of Perl's \Q ... \E or quotemeta()
Use RegExp.
If none of the other options work for you, RegExp is a good option. It makes the string match case-insensitive.
var username = new RegExp("^" + "John" + "$", "i");
Use username in your queries, and you're done.
I hope it will work for you too. All the Best.
If there are special characters in the query, a simple regex will not work. You will need to escape those special characters.
The following helper function can help without installing any third-party library:
const escapeSpecialChars = (str) => {
return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
And your query will be like this:
db.collection.find({ field: { $regex: escapeSpecialChars(query), $options: "i" }})
Hope it will help!
Using a filter works for me in C#.
string s = "searchTerm";
var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
var listSorted = collection.Find(filter).ToList();
var list = collection.Find(filter).ToList();
It may even use the index because I believe the methods are called after the return happens but I haven't tested this out yet.
This also avoids a problem of
var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());
that mongodb will think p.Title.ToLower() is a property and won't map properly.
I had faced a similar issue and this is what worked for me:
const flavorExists = await Flavors.findOne({
'flavor.name': { $regex: flavorName, $options: 'i' },
});
Yes it is possible
You can use the $expr like that:
$expr: {
$eq: [
{ $toLower: '$STRING_KEY' },
{ $toLower: 'VALUE' }
]
}
Please do not use a regex, because it may cause a lot of problems, especially if you use a string coming from the end user.
I've created a simple Func for the case insensitive regex, which I use in my filter.
private Func<string, BsonRegularExpression> CaseInsensitiveCompare = (field) =>
BsonRegularExpression.Create(new Regex(field, RegexOptions.IgnoreCase));
Then you simply filter on a field as follows.
db.stuff.find({"foo": CaseInsensitiveCompare("bar")}).count();
These have been tested for string searches
{'_id': /.*CM.*/} ||find _id where _id contains ->CM
{'_id': /^CM/} ||find _id where _id starts ->CM
{'_id': /CM$/} ||find _id where _id ends ->CM
{'_id': /.*UcM075237.*/i} ||find _id where _id contains ->UcM075237, ignore upper/lower case
{'_id': /^UcM075237/i} ||find _id where _id starts ->UcM075237, ignore upper/lower case
{'_id': /UcM075237$/i} ||find _id where _id ends ->UcM075237, ignore upper/lower case
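When the search term comes from a variable, the same three shapes can be built programmatically. This is a sketch under the assumption that the term should be matched literally; buildPattern and its mode names are hypothetical. A "contains" pattern needs no anchors, and the surrounding .* in the examples above is optional.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper. mode: 'contains' | 'startsWith' | 'endsWith'
function buildPattern(term, mode, ignoreCase = false) {
  const body = escapeRegExp(term);
  const source =
    mode === 'startsWith' ? '^' + body :
    mode === 'endsWith' ? body + '$' :
    body; // 'contains' needs no anchors
  return new RegExp(source, ignoreCase ? 'i' : '');
}

console.log(buildPattern('CM', 'startsWith'));
console.log(buildPattern('UcM075237', 'endsWith', true));
```

The resulting RegExp can be used as the value in a filter such as {'_id': pattern}.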
For anyone using Golang who wants case-insensitive search with MongoDB and the globalsign mgo library:
collation := &mgo.Collation{
Locale: "en",
Strength: 2,
}
err := collection.Find(query).Collation(collation)
As you can see in mongo docs - since version 3.2 $text index is case-insensitive by default: https://docs.mongodb.com/manual/core/index-text/#text-index-case-insensitivity
Create a text index and use $text operator in your query.

Resources