How to compare a string in nodejs? - node.js

I have a string stored in the DB, like below.
let text = "Hi {#var#}, ghkk{#Var#}"
From the user we will get a value like below.
let text2 = "Hi xya, ghkkhoww"
There can be up to 10 variables. I want to compare text and text2: if some value is present in place of each {#var#}, the strings count as equal; otherwise they do not. Can anybody please give me an idea of how to compare these?

There are a couple of ways; here are a few:
String(var1).includes(var2) | true if var1 contains var2
String(var1).indexOf(var2) | returns the index of var2 in var1, or -1 if it is not found
String(var1).search(var2) | returns the index of the first match of var2, which is treated as a regular expression
String(var1).localeCompare(var2) | ordering comparison: negative, 0, or positive depending on how var1 sorts relative to var2
var1 === var2 | strict equality: true if both are strings with exactly the same characters
var1 == var2 | loose equality: same as === for two strings, but converts types when the operands differ, so prefer ===
Note that JavaScript strings have no equals or compareTo methods; those belong to Java.
As you can see, it's completely dependent on the use case. This is a standard problem in CS, but the common rule of thumb would be to use ===, indexOf, and includes. Sidebar: you could also use regular expressions, commonly known as regex, which suit a template with placeholders like the one in the question. They're pretty powerful, with a small learning curve, but you can easily get an expression wrong and create an edge case where your code lets scenarios sneak by, which can be difficult to find if you don't know what you're looking for or how it's affecting your code.
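For the template in the question, a regular expression can be built from the stored string. The following is only a sketch under some assumptions: the helper names (escapeRegExp, templateToRegExp, matchesTemplate) are made up for illustration, the placeholder is matched case-insensitively so that {#var#} and {#Var#} both count, and every placeholder must be filled by at least one character.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored RegExp from a template. Each {#var#} placeholder
// (matched case-insensitively) becomes a group that needs at least one character.
function templateToRegExp(template) {
  const parts = template.split(/\{#var#\}/i);
  return new RegExp("^" + parts.map(escapeRegExp).join("(.+?)") + "$", "s");
}

// Hypothetical helper: does text fit the template?
function matchesTemplate(template, text) {
  return templateToRegExp(template).test(text);
}

console.log(matchesTemplate("Hi {#var#}, ghkk{#Var#}", "Hi xya, ghkkhoww")); // true
console.log(matchesTemplate("Hi {#var#}, ghkk{#Var#}", "Hi , ghkkhoww"));    // false
```

Escaping the literal parts matters: without it, a "." or "?" in the stored text would be read as a regex operator, which is exactly the kind of edge case that is hard to spot later.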
Resources:
https://www.w3schools.com/js/js_string_methods.asp

Related

String matching without using builtin functions

I want to search for a query (a string) in a subject (another string).
The query may appear in whole or in parts, but will not be rearranged. For instance, if the query is 'da', and the subject is 'dura', it is still a match.
I am not allowed to use string functions like strfind or find.
The constraints make this actually quite straightforward with a single loop. Imagine you have two indices initially pointing at the first character of both strings, now compare them - if they don't match, increment the subject index and try again. If they do, increment both. If you've reached the end of the query at that point, you've found it. The actual implementation should be simple enough, and I don't want to do all the work for you ;)
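For readers who want to check their own attempt, here is one way the two-index idea could look. It is a sketch in JavaScript rather than MATLAB, and the function name isSubsequence is made up for illustration; it only uses indexing and comparison, no search functions.

```javascript
// Returns true if every character of query appears in subject,
// in the same order, though not necessarily adjacent.
function isSubsequence(query, subject) {
  let q = 0; // index into query
  for (let s = 0; s < subject.length && q < query.length; s++) {
    if (subject[s] === query[q]) {
      q++; // matched: advance the query index as well
    }
  }
  return q === query.length; // found only if the whole query was consumed
}

console.log(isSubsequence("da", "dura"));                   // true
console.log(isSubsequence("ten", "this is a test string")); // true
console.log(isSubsequence("ad", "dura"));                   // false
```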
If this is homework, I suggest you look at the explanation which precedes the code and then try for yourself, before looking at the actual code.
The code below looks for all occurrences of chars of the query string within the subject string (variables m; and related ii, jj). It then tests all possible orders of those occurrences (variable test). An order is "acceptable" if it contains all desired chars (cond1) in increasing positions (cond2). The result (variable result) is affirmative if there is at least one acceptable order.
subject = 'this is a test string';
query = 'ten';
m = bsxfun(@eq, subject.', query);
% m: test if each char of query equals each char of subject
[ii, jj] = find(m);
jj = jj.'; % jj: which char of query is found within subject...
ii = ii.'; % ii: ... and at which position
test = nchoosek(1:numel(jj),numel(query)).'; % test all possible orders
cond1 = all(jj(test) == repmat((1:numel(query)).',1,size(test,2)));
% cond1: for each order, are all chars of query found in subject?
cond2 = all(diff(ii(test))>0);
% cond2: for each order, are the found chars in increasing positions?
result = any(cond1 & cond2); % final result: 1 or 0
The code could be improved with a better approach to test, i.e. one that does not try every combination produced by nchoosek.
Matlab allows you to view the source of built-in functions, so you could always try reading the code to see how the Matlab developers did it (although it will probably be very complex). (thanks Luis for the correction)
Finding a string in another string is a basic computer science problem. You can read up on it in any number of resources, such as Wikipedia.
Your requirement of non-rearranging partial matches recalls the bioinformatics problem of mapping splice variants to a genomic sequence.
You may solve your problem by using a sequence alignment algorithm such as Smith-Waterman, modified to work with all English characters and not just DNA bases.
Is this question actually from bioinformatics? If so, you should tag it as such.

intval($var) == strval($var) returns true

The given example is really simple, so I don't think it needs much explanation.
I couldn't find anything in the docs that explains this behaviour, and I've already found a couple of workarounds, so you don't need to bother finding them (thanks in advance though).
I'd just really like someone to explain this; it doesn't make any sense to me:
// comma separated IDs to later use in SQL statement
$var = '10,20,30,40,743,102394';
$multi_intval = intval($var); // same with (int) $var
$multi_string = strval($var); // same with (string) $var
var_dump($multi_intval, $multi_string, $multi_intval == $multi_string);
// result
int(10) string(22) "10,20,30,40,743,102394" bool(true)
How is 10 equal to a string of length 22?
I just ran across this while looking for another answer, so even though it is old I'll give an answer in case someone else comes across it.
From the PHP docs: "If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number."
Because of this, your comparison intval($var) == strval($var) is changed to something like intval($var) == intval(strval($var)), which is of course equal (I don't know what the language uses internally to convert the string to an integer; the above is just a visual representation). If you really need to know whether they are identical, use ===. Note that this describes PHP 7 and earlier: since PHP 8, an integer compared with a non-numeric string such as this one is compared as a string, so the same == returns false.

MATLAB string handling

I want to calculate the frequency of each word in a string. For that I need to turn string into an array (matrix) of words.
For example take "Hello world, can I ask you on a date?" and turn it into
['Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?']
Then I can go over each entry and count every appearance of a particular word.
Is there a way to make an array (matrix) of words in MATLAB, instead of array of just chars?
Here is a little simpler regexp:
words = regexp(s,'\w+','match');
\w here means any symbol that can appear in words (including underscore).
Notice that the last question mark will not be included. Do you need it for counting words actually?
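The same \w+ pattern carries over to other languages. As a rough sketch of the full task (split into words, then count), here is how it might look in JavaScript; the function name wordFrequencies is made up for illustration, and the counting is case-sensitive.

```javascript
// Split a sentence into words with \w+ and count how often each one occurs.
function wordFrequencies(s) {
  const words = s.match(/\w+/g) || []; // match returns null when nothing matches
  const counts = new Map();
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }
  return counts;
}

const counts = wordFrequencies("Hello world, can I ask you on a date?");
console.log([...counts.keys()]); // the nine words, without punctuation
```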
Regular expressions
s = 'Hello world, can I ask you on a date?'
slist = regexp(s, '[^ ]*', 'match')
yields
slist =
'Hello' 'world,' 'can' 'I' 'ask' 'you' 'on' 'a' 'date?'
Another way to do it is like this:
s = cell(java.lang.String('Hello world, can I ask you on a date?').split('[^\w]+'));
I.e. by creating a Java String object and using its methods to do the work, then converting back to a cell array of strings. Not necessarily the best way to do a job this simple, but Java has a rich library of string handling methods & classes that can come in handy.
Matlab's ability to switch into Java at the drop of a hat can come in handy sometimes - for example, when parsing & writing XML.

Make string manipulation more convenient in Mathematica

With Mathematica I always feel that strings are "second class citizens." Compared to a language such as PERL, one must juggle a lot of code to accomplish the same task.
The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax, and uses clumsy names like: StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.
In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if that would not break stacks of code. Nevertheless it precludes certain terse string syntax, wherein the expression 5 "a" + "b" would be rendered "aaaaab" for example.
What is the best way to make string manipulation more convenient in Mathematica?
Ideas that come to mind, either alone or in combination, are:
1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.
This was the original topic of my question to which Sasha replied. It was seen as inadvisable.
2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> "RegEx"
3. Create new infix syntax for string functions, and possibly new string operations.
4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)
5. A variation of (4): expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"] so that the characters can be seen by Part, Take, etc.
6. Call another language such as Perl from within Mathematica to handle string processing.
7. Create new string functions that conglomerate frequently used sequences of operations.
I think the reason these operations have String* names is that they have tiny differences compared to their list counterparts. Specifically compare Cases to StringCases.
Now, the way to achieve what you want is to do it like this:
Begin["StringOverload`"];
{Drop, Cases, Take, Reverse};
Unprotect[String];
ToStringHead[Drop] = StringDrop;
ToStringHead[Take] = StringTake;
ToStringHead[Cases] = StringCases;
ToStringHead[Reverse] = StringReverse;
String /:
HoldPattern[(h : Drop | Cases | Take | Reverse)[s_String, rest__]] :=
With[{head = ToStringHead[h]}, head[s, rest]]
RemoveOverloading[] :=
UpValues[String] =
DeleteCases[UpValues[String],
x_ /; ! FreeQ[Unevaluated[x], (Drop | Cases | Take | Reverse)]]
End[];
You can load this with Get or Needs, and remove the overloading with RemoveOverloading[] called with the correct context.
In[21]:= Cases["this is a sentence", RegularExpression["\\s\\w\\w\\s"]]
Out[21]= {" is "}
In[22]:= Take["This is dangerous", -9]
Out[22]= "dangerous"
In[23]:= Drop["This is dangerous", -9]
Out[23]= "This is "
I do not think doing this is the right way to go, though. You might consider introducing shorter symbols in some context which would automatically evaluate to String* symbols.

How to parse a string (by a "new" markup) with R?

I want to use R to do string parsing that (I think) is like a simplistic HTML parsing.
For example, let's say we have the following two variables:
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."
Say that I want to parse "Seq" according to "Str", using the legend here:
Seq: GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.
     |     |  |              | |               |     |               ||     |
     +-----+  +--------------+ +---------------+     +---------------++-----+
        |          Stem 1           Stem 2                Stem 3         |
        |                                                                |
        +----------------------------------------------------------------+
                                      Stem 0
Assume that we always have 4 stems (0 to 3), but that the number of letters before and after each of them can vary.
The output should be something like the following list structure:
list(
"Stem 0 opening" = "GCCTCGA",
"before Stem 1" = "TA",
"Stem 1" = list(opening = "GCTC",
inside = "AGTTGGGA",
closing = "GAGC"
),
"between Stem 1 and 2" = "G",
"Stem 2" = list(opening = "TACGA",
inside = "CTGAAGA",
closing = "TCGTA"
),
"between Stem 2 and 3" = "AGGtC",
"Stem 3" = list(opening = "ACCAG",
inside = "TTCGATC",
closing = "CTGGT"
),
"After Stem 3" = "",
"Stem 0 closing" = "TCGGGGC"
)
I don't have any experience with programming a parser, and would like advice on what strategy to use when programming something like this (and any recommended R commands to use).
What I was thinking of is to first get rid of the "Stem 0", then go through the inner string with a recursive function (let's call it "seperate.stem") that each time will split the string into:
1. before stem
2. opening stem
3. inside stem
4. closing stem
5. after stem
The "after stem" part would then be passed recursively to the same function ("seperate.stem").
The thing is that I am not sure how to do this without using a loop.
Any advice will be most welcome.
Update: someone sent me a bunch of questions; here they are.
Q: Does each sequence have the same number of ">>>>" for the opening sequence as it does for "<<<<" on the ending sequence?
A: Yes
Q: Does the parsing always start with a partial stem 0 as your example shows?
A: No. Sometimes it will start with a few "."
Q: Is there a way of making sure you have the right sequences when you start?
A: I am not sure I understand what you mean.
Q: Is there a chance of error in the middle of the string that you have to restart from?
A: Sadly, yes. In which case, I'll need to ignore one of the inner stems...
Q: How long are these strings that you want to parse?
A: Each string has between 60 to 150 characters (and I have tens of thousands of them...)
Q: Is each one a self contained sequence like you show in your example, or do they go on for thousands of characters?
A: each sequence is self contained.
Q: Is there always at least one '.' between stems?
A: No.
Q: A full set of rules as to how the parsing should be done would be useful.
A: I agree. But since I don't have even a basic idea on how to start coding this, I thought first to have some help on the beginning and try to tweak with the other cases that will come up before turning back for help.
Q: Do you have the BNF syntax for parsing?
A: No. Your e-mail is the first time I came across it (http://en.wikipedia.org/wiki/Backus–Naur_Form).
You can simplify the task by using run length encoding.
First, convert Str to be a vector of individual characters, then call rle.
split_Str <- strsplit(Str, "")[[1]]
rle_Str <- rle(split_Str)
Run Length Encoding
lengths: int [1:14] 7 2 4 8 4 1 5 7 5 5 ...
values : chr [1:14] ">" "." ">" "." "<" "." ">" "." "<" "." ">" "." "<" "."
Now you just need to parse rle_Str$values, which is perhaps simpler. For instance, an inner stem will always look like ">" "." "<".
I think the main thing that you need to think about is the structure of the data. Does a "." always have to come between ">" and "<", or is it optional? Can you have a "." at the start? Do you need to be able to generalise to stems within stems within stems, or even more complex structures?
Once you have this solved, constructing your list output should be straightforward.
Also, don't worry about using loops, they are in the language because they are useful. Get the thing working first, then worry about speed optimisations (if you really have to) afterwards.
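To make the run-length idea concrete outside R, here is a small sketch in JavaScript that encodes Str and cuts Seq into pieces lined up with the runs. The function names rle and segments are made up for illustration, and it does not attempt the stem labelling itself. Note that the final run of "<" covers both the closing of Stem 3 and the closing of Stem 0, so that piece still needs to be split by matching lengths with the opening runs.

```javascript
// Run-length encode a string: returns [{ value, length }, ...].
function rle(str) {
  const runs = [];
  for (const ch of str) {
    const last = runs[runs.length - 1];
    if (last && last.value === ch) {
      last.length += 1; // same symbol: extend the current run
    } else {
      runs.push({ value: ch, length: 1 }); // new symbol: start a new run
    }
  }
  return runs;
}

// Cut seq into pieces that line up with the runs of str.
function segments(seq, str) {
  let pos = 0;
  return rle(str).map(({ value, length }) => {
    const piece = { symbol: value, text: seq.slice(pos, pos + length) };
    pos += length;
    return piece;
  });
}

const Seq = "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA";
const Str = ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.";

// The first pieces are the Stem 0 opening, the gap before Stem 1,
// and then the opening, inside and closing of Stem 1.
console.log(segments(Seq, Str).slice(0, 5));
```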

Resources