search records that have special characters in mongoDB [duplicate] - node.js

I just want to create a regular expression out of any possible string.
var usersString = "Hello?!*`~World()[]";
var expression = new RegExp(RegExp.escape(usersString))
var matches = "Hello".match(expression);
Is there a built-in method for that? If not, what do people use? Ruby has RegExp.escape. I don't feel like I should need to write my own; there has got to be something standard out there.

The function linked in another answer is insufficient. It fails to escape ^ or $ (start and end of string), or -, which in a character group is used for ranges.
Use this function:
function escapeRegex(string) {
return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&');
}
While it may seem unnecessary at first glance, escaping - (as well as ^) makes the function suitable for escaping characters to be inserted into a character class as well as the body of the regex.
Escaping / makes the function suitable for escaping characters to be used in a JavaScript regex literal for later evaluation.
As there is no downside to escaping either of them, it makes sense to escape to cover wider use cases.
And yes, it is a disappointing failing that this is not part of standard JavaScript.
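As a quick check of the two use cases mentioned above (the body of a regex and the inside of a character class), here is a minimal sketch. The sample strings and variable names are made up for illustration; only escapeRegex comes from the answer.

```javascript
// Same function as in the answer above.
function escapeRegex(string) {
  return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Use 1: the whole pattern is the user's string, matched literally.
const usersString = 'Hello?!*`~World()[]';
const literal = new RegExp(escapeRegex(usersString));
const foundLiteral = literal.test('He said: Hello?!*`~World()[] again');
const foundPrefix = literal.test('Hello'); // only part of the literal is present

// Use 2: the escaped text is placed inside a character class.
const anyOf = new RegExp('[' + escapeRegex('a-z^]') + ']', 'g');
const classMatches = 'a-m^z]'.match(anyOf);

console.log(foundLiteral, foundPrefix, classMatches);
```

Because - and ^ are escaped, the class matches only the five listed characters and does not turn a-z into a range.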

For anyone using Lodash, since v3.0.0 a _.escapeRegExp function is built-in:
_.escapeRegExp('[lodash](https://lodash.com/)');
// → '\[lodash\]\(https:\/\/lodash\.com\/\)'
And, in the event that you don't want to require the full Lodash library, you may require just that function!

Most of the expressions here solve single specific use cases.
That's okay, but I prefer an "always works" approach.
function regExpEscape(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
This will "fully escape" a literal string for any of the following uses in regular expressions:
Insertion in a regular expression. E.g. new RegExp(regExpEscape(str))
Insertion in a character class. E.g. new RegExp('[' + regExpEscape(str) + ']')
Insertion in integer count specifier. E.g. new RegExp('x{1,' + regExpEscape(str) + '}')
Execution in non-JavaScript regular expression engines.
Special Characters Covered:
-: Creates a character range in a character class.
[ / ]: Starts / ends a character class.
{ / }: Starts / ends a numeration specifier.
( / ): Starts / ends a group.
* / + / ?: Specifies repetition type.
.: Matches any character.
\: Escapes characters, and starts entities.
^: Specifies start of matching zone, and negates matching in a character class.
$: Specifies end of matching zone.
|: Specifies alternation.
#: Specifies comment in free spacing mode.
\s: Ignored in free spacing mode.
,: Separates values in numeration specifier.
/: Starts or ends expression.
:: Completes special group types, and part of Perl-style character classes.
!: Negates zero-width group.
< / =: Part of zero-width group specifications.
Notes:
/ is not strictly necessary in any flavor of regular expression. However, it protects in case someone (shudder) does eval("/" + pattern + "/");.
, ensures that if the string is meant to be an integer in the numerical specifier, it will properly cause a RegExp compiling error instead of silently compiling wrong.
#, and \s do not need to be escaped in JavaScript, but do in many other flavors. They are escaped here in case the regular expression will later be passed to another program.
If you also need to future-proof the regular expression against potential additions to the JavaScript regex engine capabilities, I recommend using the more paranoid:
function regExpEscapeFuture(literal_string) {
return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}
This function escapes every character except those explicitly guaranteed not to be used for syntax in future regular expression flavors.
For the truly sanitation-keen, consider this edge case:
var s = '';
new RegExp('(choice1|choice2|' + regExpEscape(s) + ')');
This should compile fine in JavaScript, but will not in some other flavors. If intending to pass to another flavor, the null case of s === '' should be independently checked, like so:
var s = '';
new RegExp('(choice1|choice2' + (s ? '|' + regExpEscape(s) : '') + ')');
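The empty-string guard can also be folded into a small helper. This is only a sketch: alternation is a hypothetical name, not part of the answer, and it assumes that dropping empty choices is the behaviour you want.

```javascript
// Same escaping function as defined in this answer.
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

// Hypothetical helper: joins literal choices into one group, skipping empty ones
// so that no empty alternative such as "(a|b|)" is ever produced.
function alternation(choices) {
  const parts = choices.filter(c => c !== '').map(regExpEscape);
  return '(' + parts.join('|') + ')';
}

const source = alternation(['choice1', 'choice2', '']);
const re = new RegExp('^' + source + '$');

console.log(source, re.test('choice2'), re.test(''));
```

If every choice is empty the helper still returns "()", so callers that target stricter regex flavors may want to reject that case separately.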

Mozilla Developer Network's Guide to Regular Expressions provides this escaping function:
function escapeRegExp(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

In jQuery UI's autocomplete widget (version 1.9.1) they use a slightly different regular expression (line 6753), here's the regular expression combined with bobince's approach.
RegExp.escape = function( value ) {
return value.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, "\\$&");
}

There is an ES7 proposal for RegExp.escape at https://github.com/benjamingr/RexExp.escape/, with a polyfill available at https://github.com/ljharb/regexp.escape.
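Since some newer JavaScript engines may expose a native RegExp.escape while older ones do not, one cautious option is to feature-detect it and fall back to a hand-written function. The sketch below assumes the MDN-style function quoted earlier on this page as the fallback; escapeFallback and escapeForRegExp are illustrative names.

```javascript
// Fallback taken from the MDN-style function quoted earlier on this page.
function escapeFallback(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Prefer a native RegExp.escape when the engine provides one (assumption:
// not every runtime does, hence the typeof check).
const escapeForRegExp =
  typeof RegExp.escape === 'function' ? RegExp.escape : escapeFallback;

const re = new RegExp('^' + escapeForRegExp('a.b') + '$');

console.log(re.test('a.b'), re.test('axb'));
```

The exact escaped text may differ between a native implementation and the fallback, so compare behaviour (what the resulting RegExp matches) rather than the escaped string itself.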

Nothing should prevent you from just escaping every non-alphanumeric character:
usersString.replace(/(?=\W)/g, '\\');
You lose a certain degree of readability when doing re.toString() but you win a great deal of simplicity (and security).
According to ECMA-262, regular expression "syntax characters" are always non-alphanumeric, so the result is secure, and special escape sequences (\d, \w, \n) are always alphanumeric, so no false control escapes will be produced.
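To make the lookahead trick concrete, here is a small sketch with an invented sample string. It assumes the resulting pattern is compiled without the u or v flag, since those modes reject backslashes in front of characters such as a space.

```javascript
// Insert a backslash before every non-word character
// (anything outside [A-Za-z0-9_]).
function escapeNonWord(text) {
  return text.replace(/(?=\W)/g, '\\');
}

const escaped = escapeNonWord('a.b*c (d)');
const re = new RegExp(escaped); // no u/v flag, see the note above

console.log(escaped); // a\.b\*c\ \(d\)
console.log(re.test('xa.b*c (d)y'), re.test('aXb*c (d)'));
```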

Here is an example based on the rejected ES proposal. It checks whether the property already exists, in case TC39 backtracks on its decision.
Code:
if (!Object.prototype.hasOwnProperty.call(RegExp, 'escape')) {
RegExp.escape = function(string) {
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Escaping
// https://github.com/benjamingr/RegExp.escape/issues/37
return string.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
};
}
Code Minified:
Object.prototype.hasOwnProperty.call(RegExp,"escape")||(RegExp.escape=function(e){return e.replace(/[.*+\-?^${}()|[\]\\]/g,"\\$&")});
// ...
var assert = require('assert');
var str = 'hello. how are you?';
var regex = new RegExp(RegExp.escape(str), 'g');
assert.equal(String(regex), '/hello\\. how are you\\?/g'); // backslashes must be doubled inside a string literal
There is also an npm module at:
https://www.npmjs.com/package/regexp.escape
One can install this and use it as so:
npm install regexp.escape
or
yarn add regexp.escape
var escape = require('regexp.escape');
var assert = require('assert');
var str = 'hello. how are you?';
var regex = new RegExp(escape(str), 'g');
assert.equal(String(regex), '/hello\\. how are you\\?/g'); // backslashes must be doubled inside a string literal
The GitHub and npm pages also describe how to use the shim/polyfill. That logic is based on return RegExp.escape || implementation;, where implementation contains the regexp used above.
The npm module is an extra dependency, but it also makes it easier for an external contributor to identify logical parts added to the code. ¯\(ツ)/¯

Another (much safer) approach is to escape all the characters (and not just a few special ones that we currently know) using the unicode escape format \u{code}:
function escapeRegExp(text) {
return Array.from(text)
.map(char => `\\u{${char.codePointAt(0).toString(16)}}`) // codePointAt keeps characters outside the BMP intact
.join('');
}
console.log(escapeRegExp('a.b')); // '\u{61}\u{2e}\u{62}'
Please note that you need to pass the u flag for this method to work:
var expression = new RegExp(escapeRegExp(usersString), 'u');
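A minimal sketch of this approach with a character outside the Basic Multilingual Plane. It uses codePointAt so that each element produced by Array.from (one full code point) maps to a single \u{...} escape; escapeRegExpUnicode and the sample string are illustrative.

```javascript
function escapeRegExpUnicode(text) {
  // Array.from iterates by code point, so codePointAt(0) sees the whole character.
  return Array.from(text)
    .map(char => `\\u{${char.codePointAt(0).toString(16)}}`)
    .join('');
}

const escaped = escapeRegExpUnicode('a.😀');
// The u flag is required for the \u{...} syntax.
const re = new RegExp('^' + escaped + '$', 'u');

console.log(escaped); // \u{61}\u{2e}\u{1f600}
console.log(re.test('a.😀'), re.test('ab😀'));
```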

This is a shorter version.
RegExp.escape = function(s) {
return s.replace(/[$-\/?[-^{|}]/g, '\\$&');
}
This includes the non-meta characters of %, &, ', and ,, but the JavaScript RegExp specification allows this.

XRegExp has an escape function:
XRegExp.escape('Escaped? <.>');
// -> 'Escaped\?\ <\.>'
More on: http://xregexp.com/api/#escape

escapeRegExp = function(str) {
if (str == null) return '';
return String(str).replace(/([.*+?^=!:${}()|[\]\/\\])/g, '\\$1');
};

Rather than only escaping characters which will cause issues in your regular expression (e.g.: a blacklist), consider using a whitelist instead. This way each character is considered tainted unless it matches.
For this example, assume the following expression:
RegExp.escape('be || ! be');
This whitelists letters, digits, underscores, and whitespace:
RegExp.escape = function (string) {
return string.replace(/([^\w\d\s])/gi, '\\$1');
}
Returns:
"be \|\| \! be"
This may escape characters which do not need to be escaped, but this doesn't hinder your expression (maybe some minor time penalties - but it's worth it for safety).

The functions in the other answers are overkill for escaping entire regular expressions (they may be useful for escaping parts of regular expressions that will later be concatenated into bigger regexps).
If you escape an entire regexp and are done with it, quoting the metacharacters that are either standalone (., ?, +, *, ^, $, |, \) or start something ((, [, {) is all you need:
String.prototype.regexEscape = function regexEscape() {
return this.replace(/[.?+*^$|({[\\]/g, '\\$&');
};
And yes, it's disappointing that JavaScript doesn't have a function like this built-in.

I borrowed bobince's answer above and created a tagged template function for creating a RegExp where part of the value is escaped and part isn't.
regex-escaped.js
RegExp.escape = text => text.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g, '\\$&');
RegExp.escaped = flags =>
function (regexStrings, ...escaped) {
const source = regexStrings
.map((s, i) =>
// escaped[i] will be undefined for the last value of s
escaped[i] === undefined
? s
: s + RegExp.escape(escaped[i].toString())
)
.join('');
return new RegExp(source, flags);
};
function capitalizeFirstUserInputCaseInsensitiveMatch(text, userInput) {
const [, before, match, after ] =
RegExp.escaped('i')`^((?:(?!${userInput}).)*)(${userInput})?(.*)$`.exec(text);
return `${before}${match.toUpperCase()}${after}`;
}
const text = 'hello (world)';
const userInput = 'lo (wor';
console.log(capitalizeFirstUserInputCaseInsensitiveMatch(text, userInput));
For our TypeScript fans...
global.d.ts
interface RegExpConstructor {
/** Escapes a string so that it can be used as a literal within a `RegExp`. */
escape(text: string): string;
/**
* Returns a tagged template function that creates `RegExp` with its template values escaped.
*
* This can be useful when using a `RegExp` to search with user input.
*
* @param flags The flags to apply to the `RegExp`.
*
* @example
*
* function capitalizeFirstUserInputCaseInsensitiveMatch(text: string, userInput: string) {
* const [, before, match, after ] =
* RegExp.escaped('i')`^((?:(?!${userInput}).)*)(${userInput})?(.*)$`.exec(text);
*
* return `${before}${match.toUpperCase()}${after}`;
* }
*/
escaped(flags?: string): (regexStrings: TemplateStringsArray, ...escapedVals: Array<string | number>) => RegExp;
}

There have only ever been, and only ever will be, 12 metacharacters that need to be escaped
for a string to be treated as a literal.
It doesn't matter what is done with the escaped string afterwards, whether it is inserted into a balanced regex wrapper or appended to another pattern.
Do a string replace using this:
var escaped_string = oldstring.replace(/[\\^$.|?*+()[{]/g, '\\$&');

This one is the permanent solution.
function regExpEscapeFuture(literal_string) {
return literal_string.replace(/[^A-Za-z0-9_]/g, '\\$&');
}

Just published a regex escape gist based on the RegExp.escape shim which was in turn based on the rejected RegExp.escape proposal. Looks roughly equivalent to the accepted answer except it doesn't escape - characters, which seems to be actually fine according to my manual testing.
Current gist at the time of writing this:
const syntaxChars = /[\^$\\.*+?()[\]{}|]/g
/**
* Escapes all special regex characters in a given string
* so that it can be passed to `new RegExp(escaped, ...)` to match all given
* characters literally.
*
* inspired by https://github.com/es-shims/regexp.escape/blob/master/implementation.js
*
* @param {string} s
*/
export function escape(s) {
return s.replace(syntaxChars, '\\$&')
}

Related

Implementing String Interpolation in Flex/Bison

I'm currently writing an interpreter for a language I have designed.
The lexer/parser (GLR) is written in Flex/Bison and the main interpreter in D - and everything working flawlessly so far.
The thing is I want to also add string interpolation, that is identify string literals that contain a specific pattern (e.g. "[some expression]") and convert the included expression. I think this should be done at parser level, from within the corresponding Grammar action.
My idea is converting/treating the interpolated string as what it would look like with simple concatenation (as it works right now).
E.g.
print "this is the [result]. yay!"
to
print "this is the " + result + ". yay!"
However, I'm a bit confused as to how I could do that in Bison: basically, how do I tell it to re-parse a specific string (while constructing the main AST)?
Any ideas?
You could reparse the string, if you really wanted to, by generating a reentrant parser. You would probably want a reentrant scanner, as well, although I suppose you could kludge something together with a default scanner, using flex's buffer stack. Indeed, it's worth learning how to build reentrant parsers and scanners on the general principle of avoiding unnecessary globals, whether or not you need them for this particular purpose.
But you don't really need to reparse anything; you can do the entire parse in one pass. You just need enough smarts in your scanner so that it knows about nested interpolations.
The basic idea is to let the scanner split the string literal with interpolations into a series of tokens, which can easily be assembled into an appropriate AST by the parser. Since the scanner may return more than one token out of a single string literal, we'll need to introduce a start condition to keep track of whether the scan is currently inside a string literal or not. And since interpolations can, presumably, be nested we'll use flex's optional start condition stack, enabled with %option stack, to keep track of the nested contexts.
So here's a rough sketch.
As mentioned, the scanner has extra start conditions: SC_PROGRAM, the default, which is in effect while the scanner is scanning regular program text, and SC_STRING, in effect while the scanner is scanning a string. SC_PROGRAM is only needed because flex does not provide an official interface to check whether the start condition stack is empty; aside from nesting, it is identical to the INITIAL top-level start condition. The start condition stack is used to keep track of interpolation markers ([ and ] in this example), and it is needed because an interpolated expression might use brackets (as array subscripts, for example) or might even include a nested interpolated string. Since SC_PROGRAM is, with one exception, identical to INITIAL, we'll make it an inclusive rule.
%option stack
%s SC_PROGRAM
%x SC_STRING
%%
Since we're using a separate start condition to analyse string literals, we can also normalise escape sequences as we parse. Not all applications will want to do this, but it's pretty common. But since that's not really the point of this answer, I've left out most of the details. More interesting is the way that embedded interpolation expressions are handled, particularly deeply nested ones.
The end result will be to turn string literals into a series of tokens, possibly representing a nested structure. In order to avoid actually parsing in the scanner, we don't make any attempt to create AST nodes or otherwise rewrite the string literal; instead, we just pass the quote characters themselves through to the parser, delimiting the sequence of string literal pieces:
["] { yy_push_state(SC_STRING); return '"'; }
<SC_STRING>["] { yy_pop_state(); return '"'; }
A very similar set of rules is used for interpolation markers:
<*>"[" { yy_push_state(SC_PROGRAM); return '['; }
<INITIAL>"]" { return ']'; }
<*>"]" { yy_pop_state(); return ']'; }
The second rule above avoids popping the start condition stack if it is empty (as it will be in the INITIAL state). It's not necessary to issue an error message in the scanner; we can just pass the unmatched close bracket through to the parser, which will then do whatever error recovery seems necessary.
To finish off the SC_STRING state, we need to return tokens for pieces of the string, possibly including escape sequences:
<SC_STRING>{
[^[\\"]+ { yylval.str = strdup(yytext); return T_STRING; }
\\n { yylval.chr = '\n'; return T_CHAR; }
\\t { yylval.chr = '\t'; return T_CHAR; }
/* ... Etc. */
\\x[[:xdigit:]]{2} { yylval.chr = strtoul(yytext + 2, NULL, 16); /* skip the leading \x */
return T_CHAR; }
\\. { yylval.chr = yytext[1]; return T_CHAR; }
}
Returning escaped characters like that to the parser is probably not the best strategy; normally I would use an internal scanner buffer to accumulate the entire string. But it was simple for illustrative purposes. (Some error handling is omitted here; there are various corner cases, including newline handling and the annoying case where the last character in the program is a backslash inside an unterminated string literal.)
In the parser, we just need to insert a concatenation node for interpolated strings. The only complication is that we don't want to insert such a node for the common case of a string literal without any interpolations, so we use two syntax productions, one for a string with exactly one contained piece, and one for a string with two or more pieces:
string : '"' piece '"' { $$ = $2; }
| '"' piece piece_list '"' { $$ = make_concat_node(
prepend_to_list($2, $3));
}
piece : T_STRING { $$ = make_literal_node($1); }
| '[' expr ']' { $$ = $2; }
piece_list
: piece { $$ = new_list($1); }
| piece_list piece { $$ = append_to_list($1, $2); }
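To illustrate the piece-splitting idea outside of flex, here is a rough JavaScript sketch. It is not the scanner above: a depth counter stands in for the start-condition stack, escape sequences and nested string literals are ignored, and splitPieces and toConcatenation are hypothetical names.

```javascript
// Splits the body of a string literal into literal and expression pieces.
// A depth counter stands in for the scanner's start-condition stack.
function splitPieces(body) {
  const pieces = [];
  let literal = '';
  let expr = '';
  let depth = 0;
  for (const ch of body) {
    if (depth === 0) {
      if (ch === '[') {
        if (literal) pieces.push({ type: 'string', text: literal });
        literal = '';
        depth = 1;
      } else {
        literal += ch;
      }
    } else if (ch === ']' && depth === 1) {
      // Closing bracket of the interpolation itself.
      pieces.push({ type: 'expr', text: expr });
      expr = '';
      depth = 0;
    } else {
      // Brackets inside the expression (e.g. array subscripts) only change depth.
      if (ch === '[') depth++;
      if (ch === ']') depth--;
      expr += ch;
    }
  }
  if (depth !== 0) throw new SyntaxError('unterminated interpolation');
  if (literal) pieces.push({ type: 'string', text: literal });
  return pieces;
}

// Renders the pieces as the concatenation form shown in the question.
function toConcatenation(body) {
  return splitPieces(body)
    .map(p => (p.type === 'string' ? JSON.stringify(p.text) : p.text))
    .join(' + ');
}

console.log(toConcatenation('this is the [result]. yay!'));
console.log(toConcatenation('x[a[0]]y'));
```

In the real design the parser receives these pieces as tokens and builds a concatenation node; the string rendering here is only for display.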

Parameterized query using node-postgres isn't passed between quotes

I'm using the npm module pg with Node.js, and have the following query:
query = "SELECT * FROM territories WHERE '($1, -122.26), (47.57, -122.36)'::box \
@> point(nwlat, nwlng) ORDER BY nwlat, nwlng;"
client.query(query, [lat+.05], callback);
When I run this, I get the following error:
invalid input syntax for type box: "($1, -122.26), (47.57, -122.36)"
But when I replace $1 with a decimal literal, like 47.67, it executes normally. What am I doing incorrectly?
Your problem is that this:
'$1'
doesn't have a placeholder in it, it is a string literal that just happens to contain some characters that look like a numbered placeholder. So this:
'($1, -122.26), (47.57, -122.36)'
doesn't have a placeholder either, that's just a string literal that happens to contain the characters $ and 1. Consider the difference between this:
let x = 6;
let y = 'x'; // String that contains the name of a variable.
and this:
let x = 6;
let y = x; // The actual variable itself.
in JavaScript, same idea.
You can build your box string using string concatenation:
WHERE ('(' || $1 || ', -122.26), (47.57, -122.36)')::box
but that's not very pretty. A cleaner solution would be to bypass strings and casting altogether by using the point and box functions:
WHERE box(point($1, -122.26), point(47.57, -122.36))
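Put together, the query from the question might look like the sketch below. It only builds the { text, values } config object that node-postgres' client.query() accepts, so it runs without a database; territoriesInBoxQuery is a hypothetical name, the sample latitude is made up, and the containment operator is assumed to be @>.

```javascript
// Builds a query config object ({ text, values }) for client.query().
// The fixed coordinates are the ones from the question.
function territoriesInBoxQuery(lat) {
  return {
    text:
      'SELECT * FROM territories ' +
      'WHERE box(point($1, -122.26), point(47.57, -122.36)) @> point(nwlat, nwlng) ' +
      'ORDER BY nwlat, nwlng',
    values: [lat + 0.05],
  };
}

const config = territoriesInBoxQuery(47.62);
console.log(config.text);
console.log(config.values);

// With a connected client: client.query(config, callback);
```

Because $1 now sits outside any quoted literal, the server treats it as a real placeholder.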
Extending the answer by @mu is too short:
With pg-promise query formatting you would get exactly what you expect ;)

AS3: Get all substrings from a string in a specified array

This is not a duplicate because all the other questions were not in AS3.
Here is my problem: I am trying to find some substrings that are in the "storage" string, that are in another string. I need to do this because my game server is sending the client random messages that contain one of the strings in the "storage" string. The strings sent from the server will always begin with: "AA_".
My code:
private var storage:String = "AA_word1:AA_word2:AA_word3:AA_example1:AA_example2";
if(test.indexOf("AA_") >= 0) {
//i dont even know if this is right...
}
}
If there is a better way to do this, please let me know!
Why not just use String.split():
var storage:String = 'AA_word1:AA_word2:AA_word3:AA_example1:AA_example2';
var a:Array = storage.split('AA_');
// gives : ,word1:,word2:,word3:,example1:,example2
// remove the 1st ","
a.shift();
trace(a); // gives : word1:,word2:,word3:,example1:,example2
Hope that can help.
Regular Expressions are the right tool for this job:
function splitStorage(storage:String):Array {
    var re:RegExp = /AA_([\w]+):?/gi;
    // Execute the regexp until it
    // stops returning results.
    var strings:Array = [];
    // exec() returns a result array (or null), not a String.
    var result:Object;
    while ((result = re.exec(storage)) != null) {
        strings.push(result[1]);
    }
    return strings;
}
The important part of this is the regular expression itself: /AA_([\w]+):?/gi
This says: find a match starting with AA_, followed by one or more word characters (letters, digits, or underscore), which we capture with ([\w]+), optionally followed by a colon.
The match is then made global and case insensitive with /gi.
If you need to capture more than just letters and numbers - like this: "AA_word1 has spaces and [special-characters]:" - then add those characters to the character set inside the capture group.
e.g. ([-,.\[\]\s\w]+) will also match hyphen, comma, full-stop, square brackets, whitespace and alphanumeric characters.
Also you could do it with just one line, with a more advanced regular expression:
var storage:String = 'AA_word1:AA_word2:AA_word3:AA_example1:AA_example2';
const a:Array = storage.match(/(?<=AA_)\w+(?=:|$)/g);
This means: one or more word characters, preceded by "AA_" and followed by ":" or the end of the string. (Note that "AA_" and ":" won't be included in the resulting match.)
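The answers here are ActionScript 3, but the same two patterns can be cross-checked in JavaScript, which has a very similar regex syntax. This is a sketch under that assumption, not AS3 code; the variable names are illustrative.

```javascript
const storage = 'AA_word1:AA_word2:AA_word3:AA_example1:AA_example2';

// Lookbehind/lookahead variant: only the part after "AA_" is returned.
const viaLookaround = storage.match(/(?<=AA_)\w+(?=:|$)/g);

// Capture-group variant for engines without lookbehind support.
const viaCapture = Array.from(storage.matchAll(/AA_(\w+):?/g), m => m[1]);

console.log(viaLookaround);
console.log(viaCapture);
```

Both variants yield the five names without the AA_ prefix or the trailing colon.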

Templates escaping in Kotlin multiline strings

If I want to use $ sign in multiline strings, how do I escape it?
val condition = """ ... $eq ... """
$eq is parsed as a reference to a variable. How to escape $, so that it will not be recognized as reference to variable? (Kotlin M13)
From the documentation
A raw string is delimited by a triple quote ("""), contains no
escaping and can contain newlines and any other character
You would need to use a standard string with newlines
" ...\n \$eq \n ... "
or you could use the literal representation
""" ... ${'$'}eq ... """
Funny, but that works:
val eq = "\$eq"
print("""... $eq ...""") // just like you asked :D
Actually, if eq is a number (a price, or sth), then you probably want to calculate it separately, and an additional external calculation as I suggested won't hurt.
In the case where you know ahead of time what $-variables you want (like when querying Mongo, as it looks like you might be doing), you can create a little helper object that defines those variables. You also get some protection against accidentally misspelling one of your operators, which is neat.
object MongoString {
inline operator fun invoke(callback: MongoString.() -> String) = callback()
val eq = "\$eq"
val lt = "\$lt"
// ... and all the other operators ...
}
fun test() {
val query = MongoString { """{"foo": {$lt: 10}}""" }
}
I wrote simple versions for update and query strings for mongo here: https://gist.github.com/Yona-Appletree/29be816ca74a0d93cdf9e6f5e23dda15

Get indexOf special characters in ActionScript3

In ActionScript3 I wanted to get the text between 2 quotes from some HTML using an input index value, where I would simply increase the 2nd quote character's value by 1. This would be very simple; however, I have now noticed that indexOf does not seem to work correctly with quotes and other special characters.
So my question is if you have some HTML style text like this:
var MyText:String = '<div style="text-align:center;line-height:150%"><a href="http://www.website.com/page.htm">';
How can I correctly get the index of a quote " or other special character?
Currently I try this:
MyText.indexOf('"',1)
but after 0 it always returns the wrong index value.
Also, a quick additional question: is there a better way than using ' ' to store strings with characters like " inside, so that other ' characters etc. won't cause problems?
Edit -
This is the function I created (usage: GetQuote(MyText,0), etc.):
// GetQuote Function (Gets the content between quotes at a set index value)
function GetQuote(Input:String, Index:Number):String {
return String(Input.substr(Input.indexOf('"', Index), Input.indexOf('"', Index + 1)));
}
The return for GetQuote(MyText,0) is "text-align, yet I need text-align:center;line-height:150% instead.
First off, the index of the first quote is 11, and both MyText.indexOf('"') and MyText.indexOf('"',1) return the right value (the latter also works because you don't actually have a quote at the beginning of your string).
When you need to use a single quote inside another one, or a double quote inside another one, you need to escape the inner one(s) using backslashes. So to catch a single quote you would use '\''.
There are several ways of stripping a value from a string. You can use the RegExp class or use standard String functions like indexOf, substr etc.
Now what exactly would you like the result to become? Your question is not obvious.
EDIT:
Using the RegExp class is much easier:
var myText:String = '<div style="text-align:center;line-height:150%"><a href="http://www.website.com/page.htm">';
function getQuote(input:String, index:int=0):String {
// I declared the default index as the first one
var matches:Array = [];
// create an array for the matched results
var rx:RegExp = /"(\\"|[^"])*"/g;
// create a RegExp rule to catch all grouped chars
// rule also includes escaped quotes
input.replace(rx,function(a:*) {
// if it's "etc." we want etc. only so...
matches.push(a.substr(1,a.length-2));
});
// above method does not replace anything actually.
// it just cycles in the input value and pushes
// captured values into the matches array.
return (index >= matches.length || index < 0) ? '' : matches[index];
}
trace('Index 0 -->',getQuote(myText))
trace('Index 1 -->',getQuote(myText,1))
trace('Index 2 -->',getQuote(myText,2))
trace('Index -1 -->',getQuote(myText,-1))
Outputs:
Index 0 --> text-align:center;line-height:150%
Index 1 --> http://www.website.com/page.htm
Index 2 -->
Index -1 -->
