Is my string match algorithm fast?


I came up with this algorithm for exact string matching after trying, and terribly failing, to understand the ones already on the internet :P Can anyone please tell me whether it is fast or slow compared to existing algorithms?
#include <iostream>
#include <string>
using namespace std;
int main(){
int i=0;
int j=0;
int foo=0;
string str;
string needle;
cin>>str;
cin>>needle;
int * arr = NULL;
int * arr2 = NULL;
int * match = NULL;
arr = new int [str.length() + 1]; // +1 leaves room for the sentinel index stored after Step 1
int x=0, y = 0, z = 0;
int n=0; char a, b;
cout<<"\nStep 1: ";
for(i=0;i<str.length();i++){
if(str[i]==needle[0]){
arr[x]=i; x++;
cout<<i<<" ";
}
}
arr[x]=str.length(); x++; cout<<"\nStep 2: ";
if(x){
arr2 = new int [x];
for(i=0;i<x-1;i++){
if(arr[i+1]-arr[i]>=needle.length()){
arr2[y]=arr[i]; y++;
cout<<arr[i]<<" ";
}
}
delete[]arr; cout<<"\nStep 3: ";
if(y){
match = new int [y];
for(i=0;i<y;i++){
n=arr2[i];
for(j=0;j<needle.length(); j++){
a=str[n+j]; b=needle[j];
if(a==b){
foo=1;
}
else{
foo=0; break;
}
}
if(foo){
match[z]=n; z++;
cout<<n<<" ";
}
}
delete[]arr2;
if(z){
cout<<"\n\nMatches: ";
for(i=0;i<z;i++){
cout<<match[i]<<" ";
}
}
}
}
return 0;
}

"Can anyone please tell me if this is efficient enough ..."
No, because you haven't described the context in which the method is going to be used. In some cases, it will definitely be efficient enough. In others, probably not.
This is not intended to be a facetious answer. There is a real point:
Efficiency is rarely a goal in its own right, and in most cases the efficiency of a particular section of code has little real significance in real-world programming.
It is generally better to write simple, correct code, and only worry about the micro-level efficiency when you have measured the performance and profiled the application to identify the hotspots.
Now if you are interested in the performance of string search algorithms for its own sake, then I suggest you start by looking at this Wikipedia article, which summarizes (and links to) a number of advanced string search algorithms.
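One way to act on the "measure first" advice is to time the hand-written search against a simple standard-library baseline. The sketch below is an assumption added for illustration, not part of the original answer: find_all is a hypothetical helper built on std::string::find that returns every match position, including overlapping ones.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collects the start index of
// every occurrence of `needle` in `haystack`, overlapping ones included.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle) {
    std::vector<std::size_t> hits;
    if (needle.empty()) return hits;  // avoid reporting a match at every index
    std::size_t pos = haystack.find(needle);
    while (pos != std::string::npos) {
        hits.push_back(pos);
        pos = haystack.find(needle, pos + 1);  // step by 1 to keep overlaps
    }
    return hits;
}
```

Running both this and the algorithm from the question on the same inputs gives a correctness check as well as a timing baseline.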

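A note on this line:
arr = new int [str.length()]
This compiles in standard C++, because the size passed to new[] may be a runtime value; the same pattern also works in Java. What standard C++ does not allow is a stack array with a runtime size, such as int arr[str.length()].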

Related

Constant-time string comparison function

To compare two strings, I currently use strcmp or one of its variants. However, because strcmp takes longer if more characters match, it is vulnerable to timing attacks. Is there a constant-time string comparison function in the standard library on Windows?

I don't think either Windows or Visual Studio has such a function.
At least for something simple like strcmp you can whip something up yourself.
If you only care about equality:
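int strctcmp(const char*a, const char*b)
{
    int r = 0;
    for (; *a && *b; ++a, ++b)
    {
        r |= *a != *b;
    }
    // One string may have ended before the other; a leftover character means they differ.
    r |= *a != *b;
    return r;
}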
If you need sortable results and you need to process all of the longest string:
int strctcmp(const char*a, const char*b)
{
int r = 0, c;
for (;;)
{
c = *a - *b;
if (!r) r = c;
if (!*a && !*b) break;
if (*a) ++a;
if (*b) ++b;
}
return r;
}
These are not perfect timing-wise, but they should be more than good enough for anything network-based.
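For C++ callers, the equality-only idea can be sketched over std::string as follows. This is an illustrative assumption, not a vetted cryptographic routine: ct_equal is a hypothetical name, the length comparison still reveals whether the lengths match, and an optimizing compiler is free to change the timing behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: equality check that does not stop at the first
// mismatching character. The loop runs over the shorter of the two lengths.
bool ct_equal(const std::string& a, const std::string& b) {
    // Record a length mismatch up front instead of returning early.
    unsigned diff = (a.size() == b.size()) ? 0u : 1u;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        // XOR is zero only for equal bytes; OR accumulates any difference.
        diff |= static_cast<unsigned char>(a[i]) ^
                static_cast<unsigned char>(b[i]);
    }
    return diff == 0;
}
```

Accumulating differences with OR, rather than branching on each character, is what keeps the work per character independent of where the first mismatch occurs.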

How do I delete a word with Recursion and count the times it deletes?

I've completed about half of my assignment where I have to count the "chickens" in a string, remove the chickens, and return the amount of times I have to remove them.
public static int countChickens(String word)
{
int val = word.indexOf("chicken");
int count = 0;
if(val > -1){
count++;
word = word.substring(val + 1);
//I'm aware the following line doesn't work. It's my best guess.
//word.remove.indexOf("chicken");
val = word.indexOf("chicken");
}
return count;
}
As is, the program counts the correct amount of chickens in the word itself. (Sending it "afunchickenhaschickenfun" returns 2.) However, I need it to be able to return 2 if I send it something like "chichickencken" because it removed the first chicken, and then the second chicken came into play. How do I do the remove part?

Not tested and written in pseudocode, but it should give you a better idea of a way to approach this.
static int numberOfChickens = 0;
public static void CountAndReplaceChicken(String word)
{
    int initCheck = word.indexOf("chicken");
    if (initCheck > -1)
    {
        // Cut out the first "chicken"; the pieces on either side may join into a new one.
        word = word.substring(0, initCheck) + word.substring(initCheck + "chicken".length());
        numberOfChickens++;
        int recursionCheck = word.indexOf("chicken");
        if (recursionCheck > -1)
            CountAndReplaceChicken(word);
    }
}

Okay, the teacher showed us how to do it a few days later. If I understood David Lee's code right, this is just a simplified way of what he did.
public static int countChickens(String word)
{
int val = word.indexOf("chicken");
if(val > -1){
return 1 + countChickens(word.substring(0, val) + word.substring(val + 7));
}
return 0;
}
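For readers following the C++ threads on this page, the same recursion can be sketched in C++. The port is an assumption added for illustration (the thread itself is Java); it keeps the method name and replaces the literal 7 with the length of the target word.

```cpp
#include <cstddef>
#include <string>

// C++ translation of the recursive Java method above.
int countChickens(const std::string& word) {
    static const std::string target = "chicken";
    const std::size_t val = word.find(target);
    if (val == std::string::npos) return 0;  // base case: nothing left to remove
    // Cut the match out and rejoin the two halves, so that fragments on
    // either side can form a new "chicken" for the next call to find.
    return 1 + countChickens(word.substr(0, val) +
                             word.substr(val + target.size()));
}
```

With this version, countChickens("chichickencken") removes the inner word first, which leaves "chicken" behind for the recursive call.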

Longest Common Substring non-DP solution with O(m*n)

The definition of the problem is:
Given two strings, find the longest common substring.
Return the length of it.
I was solving this problem and I think I solved it with O(m*n) time complexity. However I don't know why when I look up the solution, it's all talking about the optimal solution being dynamic programming - http://www.geeksforgeeks.org/longest-common-substring/
Here's my solution, you can test it here: http://www.lintcode.com/en/problem/longest-common-substring/
int longestCommonSubstring(string &A, string &B) {
int ans = 0;
for (int i=0; i<A.length(); i++) {
int counter = 0;
int k = i;
for (int j=0; j<B.length() && k <A.length(); j++) {
if (A[k]!=B[j]) {
counter = 0;
k = i;
} else {
k++;
counter++;
ans = max(ans, counter);
}
}
}
return ans;
}
My idea is simple, start from the first position of string A and see what's the longest substring I can match with string B, then start from the second position of string A and see what's the longest substring I can match....
Is there something wrong with my solution? Or is it not O(m*n) complexity?

Good news: your algorithm is O(mn). Bad news: it doesn't work correctly.
Your inner loop is wrong: it's intended to find the longest initial substring of A[i:] in B, but it works like this:
j = 0
While j < len(B)
Match as much of A[i:] against B[j:]. Call it s.
Remember s if it's the longest so far found.
j += len(s)
This fails to find the longest match. For example, when A = "XXY" and B = "XXXY" and i=0 it'll find "XX" as the longest match instead of the complete match "XXY".
Here's a runnable version of your code (lightly transcribed into C) that shows the faulty result:
#include <string.h>
#include <stdio.h>
int lcs(const char* A, const char* B) {
int al = strlen(A);
int bl = strlen(B);
int ans = 0;
for (int i=0; i<al; i++) {
int counter = 0;
int k = i;
for (int j=0; j<bl && k<al; j++) {
if (A[k]!=B[j]) {
counter = 0;
k = i;
} else {
k++;
counter++;
if (counter >= ans) ans = counter;
}
}
}
return ans;
}
int main(int argc, char**argv) {
printf("%d\n", lcs("XXY", "XXXY"));
return 0;
}
Running this program outputs "2".
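For comparison, here is a minimal sketch of the standard dynamic programming approach that the question refers to, written in C++. The function name longestCommonSubstringDP is an assumption; the technique is the usual "longest common suffix" table, kept to two rows to save memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int longestCommonSubstringDP(const std::string& A, const std::string& B) {
    // prev[j]: length of the longest common suffix of A[0..i-1) and B[0..j).
    // cur[j]:  the same for A[0..i) and B[0..j).
    std::vector<int> prev(B.size() + 1, 0), cur(B.size() + 1, 0);
    int ans = 0;
    for (std::size_t i = 1; i <= A.size(); ++i) {
        for (std::size_t j = 1; j <= B.size(); ++j) {
            // Extend the diagonal run on a match, otherwise reset it to 0.
            cur[j] = (A[i - 1] == B[j - 1]) ? prev[j - 1] + 1 : 0;
            ans = std::max(ans, cur[j]);
        }
        std::swap(prev, cur);  // cur[0] stays 0; the rest is overwritten next round
    }
    return ans;
}
```

Because every pair (i, j) is examined, the match starting at B[1] is not skipped, and longestCommonSubstringDP("XXY", "XXXY") returns 3 rather than 2. The running time is still O(m*n).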

Your solution has O(nm) complexity, and if you compare its structure to the provided algorithm, it is exactly the same; however, yours does not memoize.
One advantage of the dynamic programming algorithm in the link is that, within the same complexity class, it can recall different substring lengths in O(1); otherwise, it looks good to me.
This kind of thing happens from time to time, because storing subproblem solutions does not always give a better running time on the first call; it can land in the same complexity class instead (e.g. try computing the nth Fibonacci number with a dynamic programming solution and compare that to a tail-recursive solution). Note that in this case, as in yours, once the array has been filled the first time, it is faster to return an answer on each successive call.

Given length and number of digits, we have to find minimum and maximum number that can be made?

As the question states, we are given a positive integer M and a non-negative integer S. We have to find the smallest and the largest of the numbers that have length M and sum of digits S.
Constraints:
(S>=0 and S<=900)
(M>=1 and M<=100)
I thought about it and came to the conclusion that it must be dynamic programming. However, I failed to build the DP state.
This is what I thought:
dp[i][j] = first 'i' digits having sum 'j'
And I tried to write a program. This is how it looks:
/*
*** PATIENCE ABOVE PERFECTION ***
"When in doubt, use brute force. :D"
-Founder of alloj.wordpress.com
*/
#include<bits/stdc++.h>
using namespace std;
#define pb push_back
#define mp make_pair
#define nline cout<<"\n"
#define fast ios_base::sync_with_stdio(false),cin.tie(0)
#define ull unsigned long long int
#define ll long long int
#define pii pair<int,int>
#define MAXX 100009
#define fr(a,b,i) for(int i=a;i<b;i++)
vector<int>G[MAXX];
int main()
{
int m,s;
cin>>m>>s;
int dp[m+1][s+1];
fr(1,m+1,i)
fr(1,s+1,j)
fr(0,10,k)
dp[i][j]=min(dp[i-1][j-k]+k,dp[i][j]); //Tried for Minimum
cout<<dp[m][s]<<endl;
return 0;
}
Please guide me about this DP state and tell me what the time complexity of the program will be. This is my first try at DP.

A DP solution:
#include<iostream>
using namespace std;
int dp[102][902][2] ;
void print_ans(int m , int s , int flag){
if(m==0)
return ;
cout<<dp[m][s][flag];
if(dp[m][s][flag]!=-1)
print_ans(m-1 , s-dp[m][s][flag] , flag );
return ;
}
int main(){
//freopen("problem.in","r",stdin);
//freopen("out.txt","w",stdout);
//int t;
//cin>>t;
//while(t--){
int m , s ;
cin>>m>>s;
if(s==0){
cout<<(m==1?"0 0":"-1 -1");
return 0;
}
for(int i = 0 ; i <=m ; i++){
for(int j=0 ; j<=s ;j++){
dp[i][j][0]=-1;
dp[i][j][1]=-1;
}
}
for(int i = 0 ; i < 10 ; i++){
dp[1][i][0]=i;
dp[1][i][1]=i;
}
for(int i = 2 ; i<=m ; i++){
for(int j = 0 ; j<=s ; j++){
int flag = -1;
int f = -1;
for(int k = 0 ; k <= 9 ; k++){
if(i==m&&k==0)
continue;
if( j>=k && flag==-1 && dp[i-1][j-k][0]!=-1)
flag = k;
}
for(int k = 9 ; k >=0 ;k--){
if(i==m&&k==0)
continue;
if( j>=k && f==-1 && dp[i-1][j-k][1]!=-1)
f = k;
}
dp[i][j][0]=flag;
dp[i][j][1]=f;
}
}
if(m!=0){
print_ans(m , s , 0);
cout<<" ";
print_ans(m,s,1);
}
else
cout<<"-1 -1";
cout<<endl;
// }
}

The DP state is (i, j). You can think of it as the parameters of a mathematical function defined by a recurrence, that is, in terms of smaller problems (hence subproblems).
More precisely, the state is the set of parameters needed to identify a subproblem uniquely, so that we always know what we are computing.
Take your question as the example. To define the problem we need the number of digits used so far plus the sum formed with those digits. (Note: you are effectively accumulating the sum while traversing the digits.)
I think that is enough for the state part.
Now, the running time of dynamic programming is simple to work out.
First, let us see how many subproblems exist in a problem. You need to fill in each and every state, i.e. you have to cover all the unique subproblems smaller than or equal to the whole problem. Which problem is smaller than another is determined by the recurrence relation.
For example, the Fibonacci sequence:
F(n)=F(n-1)+F(n-2)
Note that the base case is always the smallest subproblem.
Here, to compute F(n) we have to compute F(n-1) and F(n-2), and this continues until n=1, where the base case is returned.
Hence the total number of subproblems is all the problems between the base case and the current problem.
In bottom-up DP, we process each and every state, in order of size, between the base case and the full problem.
This tells us that the running time should be
O(Number of Subproblems * Time per each subproblem).
So how many subproblems exist in your solution? DP[0][0] to DP[M][S], and for every subproblem you are running a loop of 10:
O( M*S (Subproblems ) * 10 )
Chop that constant off, which gives O(M*S). But the per-subproblem factor is not necessarily a constant in every problem!
Here is some code that you might want to look at. Feel free to ask anything!
#include<bits/stdc++.h>
using namespace std;
bool DP[10][101];    // rows 0..9, because the loops below index DP[9][...]
int Number[10][101];
int main()
{
DP[0][0]=true; // It is possible to form 0 using NULL digits!!
int N=9,S=100,i,j,k;
for(i=1;i<=9;++i)
for(j=0;j<=100;++j)
{
if(DP[i-1][j])
{
for(k=0;k<=9;++k)
if(j+k<=100)
{
DP[i][j+k]=true;
Number[i][j+k]=Number[i-1][j]*10+k;
}
}
}
cout<<Number[9][81]<<"\n";
return 0;
}
You should use backtracking instead of storing the numbers directly, because your constraints are high (a number with up to 100 digits does not fit in an int).
DP[i][j] represents whether it is possible to form a digit sum of j using exactly i digits.
Number[i][j] is just my laziness, to avoid typing out the backtracking (sleepy, it's already 3 A.M.).
I am trying to add every possible digit to extend the state.
It is essentially a forward style of DP. You can read more about it at Topcoder.
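The backtracking variant mentioned above could look roughly like the sketch below. It is an assumption added for illustration, not the answerer's code: minMaxNumber and the can table are hypothetical names, the table only records feasibility, and the digits are rebuilt from it one position at a time, so numbers with up to 100 digits are returned as strings.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns {smallest, largest} numbers with M digits and digit sum S,
// or {"-1", "-1"} when no such number exists.
std::pair<std::string, std::string> minMaxNumber(int M, int S) {
    // can[i][j]: some string of i digits (leading zeros allowed) sums to j.
    std::vector<std::vector<bool>> can(M + 1, std::vector<bool>(S + 1, false));
    can[0][0] = true;
    for (int i = 1; i <= M; ++i)
        for (int j = 0; j <= S; ++j)
            for (int k = 0; k <= 9 && k <= j; ++k)
                if (can[i - 1][j - k]) { can[i][j] = true; break; }

    // Backtrack: at each position take the first digit (ascending for the
    // smallest number, descending for the largest) that keeps the rest feasible.
    auto build = [&](bool smallest) -> std::string {
        std::string out;
        int rest = S;
        for (int pos = 0; pos < M; ++pos) {
            const int left = M - pos - 1;  // digits still to place afterwards
            int chosen = -1;
            for (int t = 0; t <= 9; ++t) {
                const int k = smallest ? t : 9 - t;
                if (pos == 0 && k == 0 && M > 1) continue;  // no leading zero
                if (k <= rest && can[left][rest - k]) { chosen = k; break; }
            }
            if (chosen < 0) return "-1";
            out.push_back(static_cast<char>('0' + chosen));
            rest -= chosen;
        }
        return out;
    };
    return {build(true), build(false)};
}
```

Filling the table costs O(M*S*10), matching the analysis above, and the reconstruction adds only O(M*10) on top. For M = 2 and S = 15 the sketch returns "69" and "96".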

Convert For loop into Parallel.For loop

public void DoSomething(byte[] array, byte[] array2, int start, int counter)
{
int length = array.Length;
int index = 0;
while (count >= needleLen)
{
index = Array.IndexOf(array, array2[0], start, count - length + 1);
int i = 0;
int p = 0;
for (i = 0, p = index; i < length; i++, p++)
{
if (array[p] != array2[i])
{
break;
}
}

Given that your for loop appears to be using a loop body dependent on ordering, it's most likely not a candidate for parallelization.
However, you aren't showing the "work" involved here, so it's difficult to tell what it's doing. Since the loop relies on both i and p, and it appears that they would vary independently, it's unlikely to be rewritten using a simple Parallel.For without reworking or rethinking your algorithm.
In order for a loop body to be a good candidate for parallelization, it typically needs to be order independent, and have no ordering constraints. The fact that you're basing your loop on two independent variables suggests that these requirements are not valid in this algorithm.
