Find the word in the stream?

Find the word in the stream? - string

Given an infinite stream of characters and a list L of strings, create a function that calls an external API when a word in L is recognized during the processing of the stream.
Example:
L = ["ok","test","one","try","trying"]
stream = a,b,c,o,k,d,e,f,t,r,y,i,n,g.............
The call to external API will happen when 'k' is encountered, again when the 'y' is encountered, and again at 'g'.
My idea:
Create trie out of the list and navigate the nodes as you read from stream in linear time. But there would be a bug if you just do simple trie search.
Assume you have words "abxyz" and "xyw" and your input is "abxyw".In this case you can't recognize "xyw" with trie.
So search should be modified as below:
let's take above use case "abxyw". We start the search and we find we have all the element till 'x'. Moment you get 'x' you have two options:
Check if the current element is equal to the head of trie and if it is equal to head of trie then call recursive search.
Continue till the end of current word. In this case for your given input it will return false but for the recursive search we started in point 1, it will return true.
Below is my modified search but I think it has bugs and can be improved. Any suggestions?
#define SIZE 26
struct tri{
int complete;
struct tri *child[SIZE];
};
void insert(char *c, struct tri **t)
{
struct tri *current = *t;
while(*c != '\0')
{
int i;
int letter = *c - 'a';
if(current->child[letter] == NULL) {
current->child[letter] = malloc(sizeof(*current));
memset(current->child[letter], 0, sizeof(struct tri));
}
current = current->child[letter];
c++;
}
current->complete = 1;
}
struct tri *t;
int flag = 0;
int found(char *c, struct tri *tt)
{
struct tri *current = tt;
if (current == NULL)
return 0;
while(*c != '\0')
{
int i;
int letter = *c - 'a';
/* if this is the first char then recurse from begining*/
if (t->child[letter] != NULL)
flag = found(c+1, t->child[letter]);
if (flag == 1)
return 1;
if(!flag && current->child[letter] == NULL) {
return 0;
}
current = current->child[letter];
c++;
}
return current->complete;
}
int main()
{
int i;
t = malloc(sizeof(*t));
t->complete = 0;
memset(t, 0, sizeof(struct tri));
insert("weathez", &t);
insert("eather", &t);
insert("weather", &t);
(1 ==found("weather", t))?printf("found\n"):printf("not found\n");
return 0;
}

What you want to do is exactly what Aho-Corasick algorithm does.
You can take a look at my Aho-Corasick implementation. It's contest-oriented, so maybe not focused on readability but I think it's quite clear:
typedef vector<int> VI;
struct Node {
int size;
Node *fail, *output;
VI id;
map<char, Node*> next;
};
typedef pair<Node*, Node*> P;
typedef map<char, Node*> MCP;
Node* root;
inline void init() {
root = new Node;
root->size = 0;
root->output = root->fail = NULL;
}
Node* add(string& s, int u, int c = 0, Node* p = root) {
if (p == NULL) {
p = new Node;
p->size = c;
p->fail = p->output = NULL;
}
if (c == s.size()) p->id.push_back(u);
else {
if (not p->next.count(s[c])) p->next[s[c]] = NULL;
p->next[s[c]] = add(s, u, c + 1, p->next[s[c]]);
}
return p;
}
void fill_fail_output() {
queue<pair<char, P> > Q;
for (MCP::iterator it=root->next.begin();
it!=root->next.end();++it)
Q.push(pair<char, P> (it->first, P(root, it->second)));
while (not Q.empty()) {
Node *pare = Q.front().second.first;
Node *fill = Q.front().second.second;
char c = Q.front().first; Q.pop();
while (pare != root && !pare->fail->next.count(c))
pare=pare->fail;
if (pare == root) fill->fail = root;
else fill->fail = pare->fail->next[c];
if (fill->fail->id.size() != 0)
fill->output = fill->fail;
else fill->output = fill->fail->output;
for (MCP::iterator it=fill->next.begin();
it!=fill->next.end();++it)
Q.push(pair<char,P>(it->first,P(fill,it->second)));
}
}
void match(int c, VI& id) {
for (int i = 0; i < id.size(); ++i) {
cout << "Matching of pattern " << id[i];
cout << " ended at " << c << endl;
}
}
void search(string& s) {
int i = 0, j = 0;
Node *p = root, *q;
while (j < s.size()) {
while (p->next.count(s[j])) {
p = p->next[s[j++]];
if (p->id.size() != 0) match(j - 1, p->id);
q = p->output;
while (q != NULL) {
match(j - 1, q->id);
q = q->output;
}
}
if (p != root) {
p = p->fail;
i = j - p->size;
}
else i = ++j;
}
}
void erase(Node* p = root) {
for (MCP::iterator it = p->next.begin();
it != p->next.end(); ++it)
erase(it->second);
delete p;
}
int main() {
init();
int n;
cin >> n;
for (int i = 0; i < n; ++i) {
string s;
cin >> s;
add(s, i);
}
fill_fail_output();
string text;
cin >> text;
search(text);
erase(root);
}

Related

define dynamic array inside a struct without causing a segmentation fault

I'm new to c++ and I have to work with dynamic arrays:| I need to define a few dynamic arrays inside my header file and access them from main. The problem is that when I'm defining the arrays inside the structs, I get segmentation faults and don't know how to fix them.
Below is my header file.
I'm aware that there are lots of problems with my code and I could use your help. I'm really stuck here.
Thanks in advance.
I expect this code to work fine with dynamic arrays just like it does with vectors. But it's not.
`
#pragma once
#include <iostream>
#include <functional>
#include <algorithm>
#define children_link_Size 149
#define row 15
#define col 100
using namespace std;
int itemListIndex = 0;
int freqIdx = 0;
int childIdx = 0;
int linkIdx = 0;
namespace std {
template <typename T> T* begin(std::pair<T*, T*> const& p)
{
return p.first;
}
template <typename T> T* end(std::pair<T*, T*> const& p)
{
return p.second;
}
}
struct Node
{
int itemValue{};
int order{ 0 };
int freq{ 0 };
Node* parent{ nullptr };
Node* children{};
Node* links{};
Node() {
children = new Node[children_link_Size];
links = new Node[children_link_Size];
}
explicit Node(int const& p_value, int p_order = 0) :itemValue(p_value), order(p_order)
{
++freq;
cout << " + " << itemValue << " (" << order << ")" << endl;
}
bool operator ==(Node const& p_node) const
{
return itemValue == p_node.itemValue;
}
~Node() {
delete[] children;
delete[] links;
}
};
/*struct mySet {
Node* OrderedItems;
mySet()
{
OrderedItems = new Node[uniqueSize];
}
void insert(Node item)
{
for (int i = 0; i < uniqueSize; i++)
{
for (int j = 0; j < i; j++)
{
if (item == OrderedItems[j])
{
break;
}
else
{
OrderedItems[i] = item;
}
}
}
}
~mySet()
{
delete[] OrderedItems;
}
};*/
struct ItemSupport
{
explicit ItemSupport(int p_minSup) { ItemSupport::minSup = p_minSup; }
Node* Itemset = new (nothrow) Node[row];
Node* OrderedItems = new (nothrow) Node[row];
ItemSupport& operator<<(int const& p_itemValue)
{
static int order = 0;
auto inode = find_if(Itemset, Itemset + row, [&p_itemValue](Node const& p_node)
{
return p_node.itemValue == p_itemValue;
});
if (inode == Itemset + row)
{
Node node(p_itemValue, order);
Itemset[itemListIndex] = node;
itemListIndex++;
++order;
}
else
{
auto& node = (*inode);
++node.freq;
}
return *this;
}
friend ostream& operator<<(ostream& p_os, ItemSupport const& p_itemSupport)
{
ItemSupport* NoDe = {0};
if (NoDe)
{
NoDe->OrderedItems = p_itemSupport.getFrequentItems();
for (Node node : std::make_pair(NoDe->OrderedItems, NoDe->OrderedItems + row))
{
p_os << node.itemValue << ": support " << node.freq << ", order " << node.order << endl;
}
return p_os;
}
}
Node* getItem(int const& p_itemValue)
{
auto inode = find_if(Itemset, Itemset + row, [&p_itemValue](Node const& p_node)
{
return p_node.itemValue == p_itemValue;
});
if (inode != Itemset + row)
{
Node* node = const_cast<Node*>(&(*inode));
return node;
}
return nullptr;
}
static int getMinSup()
{
return minSup;
}
static int minSup;
private:
Node* getFrequentItems() const
{
int j = 0;
for (int i = 0; i < row; i++)
{
if (Itemset[i].freq >= minSup)
OrderedItems[j++] = Itemset[i];
}
return OrderedItems;
}
Node* getUnfrequentItems() const
{
int j = 0;
for (int i = 0; i < row; i++)
{
if (Itemset[i].freq <= minSup)
OrderedItems[j++] = Itemset[i];
}
return OrderedItems;
}
/*~ItemSupport() {
//delete[] Itemset;
//delete[] OrderedItems;
}*/
};
int ItemSupport::minSup = 0;
struct FP_Tree
{
explicit FP_Tree(ItemSupport& p_itemSupport, const int& p_rootValue = int()) :_headItemSupport(p_itemSupport)
{
_root = new Node(p_rootValue);
}
//void construct(Transaction const& p_itemValues)
void construct(int *p_itemValues)
{
// A. Order items into transaction
ItemSupport* ordered = {0};
for (int const& itemValue : std::make_pair(p_itemValues, p_itemValues + row))
{
Node* pNode = _headItemSupport.getItem(itemValue);
if (pNode && pNode->freq >= ItemSupport::getMinSup())
{
if (ordered)
{
ordered->OrderedItems[freqIdx] = *pNode;
freqIdx++;
}
}
}
// B. Create FP_TREE
Node* actualNode = _root;
bool here = true;
string tab;
if (ordered)
{
for (Node const& node : std::make_pair(ordered->OrderedItems, ordered->OrderedItems + row))
{
tab += "\t-";
auto it = actualNode->children;
if (here)
{
auto it = find_if(actualNode->children,
actualNode->children + children_link_Size,
[&node](Node const& nodeTmp) {
return node == (nodeTmp);
});
here &= it != actualNode->children + children_link_Size;
}
if (here)
{
actualNode = it;
++actualNode->freq;
}
else
{
Node* pNode = new Node(node.itemValue);
actualNode->children[childIdx++] = *pNode;
pNode->parent = actualNode;
Node* pNodeHead = _headItemSupport.getItem(node.itemValue);
pNodeHead->links[linkIdx++] = *pNode;
actualNode = pNode;
delete pNode;
}
//cout << tab << actualNode->_itemValue << "(" << actualNode->_freq << ")" << endl;
}
}
//cout << endl;
}
ItemSupport& headItemSupport() const
{
return _headItemSupport;
}
public:
Node* root() const
{
return _root;
}
private:
ItemSupport& _headItemSupport;
Node* _root;
};
`

I expect this code to work fine with dynamic arrays just like it does with vectors
Since you are using C++, do use std::vector -- it will save you a lot of trouble and there is no reason not to.

Hi, I wrote code in C++ fot linear search but it displays the index of searched elemet twice with second index value as random garbage value

Here is the code.what i did is implement linear search on some elements of the array and the push searched elements in stack,afterwards I print the popped elements from stack and print them.But in search function it displays two index values.
using namespace std;
int searched[10];
int stack[100], n=100, top=-1;
void push(int val) {
if(top>=n-1)
cout<<"Stack Overflow"<<endl;
else {
top++;
stack[top]=val;
}
}
void pop() {
if(top<=-1)
cout<<"Stack Underflow"<<endl;
else {
cout<<"The popped element is "<< stack[top] <<endl;
top--;
}
}
void display() {
if(top>=0) {
cout<<"Stack elements are:";
for(int i=top; i>=0; i--)
cout<<stack[i]<<" ";
cout<<endl;
} else
cout<<"Stack is empty";
}
int search(int arr[], int n, int x)
{
int i;
for (i = 0; i < n; i++)
if (arr[i] == x)
cout<<"The element is found at the index"<<i<<"\n\n";
return x;
}
int main(void)
{
int arr[15] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 };
int x = 0;
for(int i=0;i<8;i++)
{
x++;
int result = search(arr, n, x);
cout << "searched Element is " << result<<"\t\t";
push(result);
pop();
}
return 0;
}```

There are two issues that lead to this confusing result.
First, if I am not mistaken the search function, which was written like this:
int search(int arr[], int n, int x)
{
int i;
for (i = 0; i < n; i++)
if (arr[i] == x)
cout<<"The element is found at the index"<<i<<"\n\n";
return x;
}
is parsed similarly to the following:
int search(int arr[], int n, int x)
{
int i;
for (i = 0; i < n; i++) {
if (arr[i] == x) {
cout<<"The element is found at the index"<<i<<"\n\n";
}
}
return x;
}
Presumably you meant this:
int search(int arr[], int n, int x)
{
int i;
for (i = 0; i < n; i++) {
if (arr[i] == x) {
cout<<"The element is found at the index"<<i<<"\n\n";
return x;
}
}
}
Second, since the global n starts at the value 100, the search loop runs off the end of the length 15 array, into other memory. This is probably undefined behavior.

PSET5 Speller Huge Number of Misspelled Words

I have been running the solution but is getting a high number of misspelled words.
WORDS MISSPELLED: 15904 as compared to staff's WORDS MISSPELLED: 955
Other than that, the word count is accurate and the runtime is alright.
I suspect the problem might come from the check / load function but I am not sure what caused it.
Some other code implemented a "to lowercase" in the check function, but I thought the (strcasecmp) would have done the job of comparing between strings, disregarding the case.
Check Function
bool check(const char *word)
{
int hashInt = hash(word);
if (table[hashInt] == NULL)
{
return 1;
}
node *cursor = table[hashInt];
while (cursor != NULL)
{
int i = strcasecmp(cursor -> word, word);
if (i == 0)
{
return 0;
break;
}
cursor = cursor -> next;
}
return false;
}
Load Function
bool load(const char *dictionary)
{
FILE *file = fopen(dictionary, "r");
if (file == NULL)
{
printf("error opening file");
return 1;
}
char word [LENGTH + 1];
while (fscanf(file, "%s\n", word) != EOF)
{
int hashInt = hash(word);
node *n = malloc(sizeof(node));
if (n == NULL)
{
unload();
return 1;
}
if (table[hashInt] == NULL)
{
table[hashInt] = n;
}
else
{
n -> next = table[hashInt];
table[hashInt] = n;
}
strcpy(n -> word, word);
wordLoaded++;
}
fclose(file);
return true;
}
Hash Function
unsigned int hash(const char *word)
{
unsigned int hash = 0;
for (int i = 0, n = strlen(word); i < n; i++)
hash = (hash << 2) ^ word[i];
return hash % N;
return 0;
}

From the spec:
dictionary is assumed to be a file containing a list of lowercase
words
and
Your implementation of check must be case-insensitive.
This int i = strcasecmp(cursor -> word, word); looks like it would fulfill the requirement as long as this int hashInt = hash(word); was also case insensitive. Alas, it is not; hash("A") and hash("a") will return different values.

A little trouble with creating a single linked list in C++

I am going to create a single linked list and construct a function (Locate()) that returns the address of the element.But in the end, I didn't see the result of this function. I tried it. This function should be run, but the result is different from what I expected.
use vs2019 on WIndows10，a student:)
#include<iostream>
using namespace std;
struct Node { //Node
int data;
Node* link;
Node(int item, Node* l = NULL)
{
data = item;
link = l;
}
Node(Node* l = NULL)
{
data = 0;
link = l;
}
};
class Link :public Node { //Link
private:
Node* first;
public:
Link(Node* l = NULL)
{
first = l;
}
Link(int d, Node* l = NULL)
{
first = new Node(d);
}
Node* Locate(int i);
};
Node* Link::Locate(int i) //Locate()
{
if (i < 0)
{
cerr << "wrong operation when locating" << endl;
exit(1);
}
int count = 0;
Node* current = first;
while (count < i && current->link != NULL)
{
current = current->link;
count++;
}
return current;
}
int main()
{
Link a;
Node* b = new Node(1);
Node* c = new Node(2);
a.link = b;
b->link = c;
cout << a.data << ' ' << b->data << ' '<<c->data<<endl;
cout << a.Locate(1) << endl;
return 0;
}
Will not output the result of this function 'Locate()' being called

Locate() accesses first->link. At that time, first is a null pointer. Whereupon your program exhibits undefined behavior; in practice, it most likely crashes.

When I re-modify the List constructor, its(Locate()) output is normal and the expected result is obtained.
The modified constructors are as follows:
Link()
{
first = new Node;
}
Link(int d)
{
first = new Node(d);
}
Node* Locate(int i);

Memory allocation and deallocation

Here is the entire program, please help me, I've tried everything to find out what exactly is going with the memory. The problem is everything runs perfectly, but there are some extra characters printed with output.
Here is the .h file:
class MyString
{
public:
MyString();
MyString(const char *message);
MyString(const MyString &source);
~MyString();
const void Print() const;
const int Length() const;
MyString& operator()(const int index, const char b);
char& operator()(const int i);
MyString& operator=(const MyString& rhs);
bool operator==(const MyString& other) const;
bool operator!=(const MyString& other) const;
const MyString operator+(const MyString& rhs) const;
MyString& operator+=(const MyString& rhs);
friend ostream& operator<<(ostream& output, const MyString& rhs);
const int Find(const MyString& other);
MyString Substring(int start, int length);
private:
char *String;
int Size;
};
istream& operator>>(istream& input, MyString& rhs);
The .cpp file:
MyString::MyString()
{
char temp[] = "Hello World";
int counter(0);
while(temp[counter] != '\0')
{
counter++;
}
Size = counter;
String = new char [Size];
for(int i=0; i < Size; i++)
String[i] = temp[i];
}
//alternate constructor that allows for setting of the inital value of the string
MyString::MyString(const char *message)
{
int counter(0);
while(message[counter] != '\0')
{
counter++;
}
Size = counter;
String = new char [Size];
for(int i=0; i < Size; i++)
String[i] = message[i];
}
//copy constructor
MyString::MyString(const MyString &source)
{
int counter(0);
while(source.String[counter] != '\0')
{
counter++;
}
Size = counter+1;
String = new char[Size];
for(int i = 0; i <= Size; i++)
String[i] = source.String[i];
}
//Deconstructor
MyString::~MyString()
{
delete [] String;
}
//Length() method that reports the length of the string
const int MyString::Length() const
{
int counter(0);
while(String[counter] != '\0')
{
counter ++;
}
return (counter);
}
/*Parenthesis operator should be overloaded to replace the Set and Get functions of your previous assignment. Note that both instances should issue exit(1) upon violation of the string array bounaries.
*/
MyString& MyString::operator()(const int index, const char b)
{
if(String[index] == '\0')
{
exit(1);
}
else
{
String[index] = b;
}
}
char& MyString::operator()(const int i)
{
if(String[i] == '\0')
{
exit(1);
}
else
{
return String[i];
}
}
/*Assignment operator (=) which will copy the source string into the destination string. Note that size of the destination needs to be adjusted to be the same as the source.
*/
MyString& MyString::operator=(const MyString& rhs)
{
if(this != &rhs)
{
delete [] String;
String = new char[rhs.Size];
Size = rhs.Size;
for(int i = 0; i < rhs.Size+1 ; i++)
{
String[i] = rhs.String[i];
}
}
return *this;
}
/*Logical comparison operator (==) that returns true iff the two strings are identical in size and contents.
*/
bool MyString::operator==(const MyString& other)const
{
if(other.Size == this->Size) {
for(int i = 0; i < this->Size+1; i++)
{
if(&other == this)
return true;
}
}
else
return false;
}
//Negated logical comparison operator (!=) that returns boolean negation of 2
bool MyString::operator!=(const MyString& other) const
{
return !(*this == other);
}
//Addition operator (+) that concatenates two strings
const MyString MyString::operator+(const MyString& rhs) const
{
char* tmp = new char[Size + rhs.Size +1];
for(int i = 0; i < Size; i++)
{
tmp[i] = String[i];
}
for(int i = 0; i < rhs.Size+1; i++) {
tmp[i+Size] = rhs.String[i];
}
MyString result;
delete [] result.String;
result.String = tmp;
result.Size = Size+rhs.Size;
return result;
}
/*Addition/Assigment operator (+=) used in the following fashion: String1 += String2 to operate as String1 = String1 + String2
*/
MyString& MyString::operator+=(const MyString& rhs)
{
char* tmp = new char[Size + rhs.Size + 1];
for(int i = 0; i < Size; i++) {
tmp[i] = String[i];
}
for(int i = 0; i < rhs.Size+1; i++)
{
tmp[i+Size] = rhs.String[i];
}
delete [] String;
String = tmp;
Size += rhs.Size;
return *this;
}
istream& operator>>(istream& input, MyString& rhs)
{
char* t;
int size(256);
t = new char[size];
input.getline(t,size);
rhs = MyString(t);
delete [] t;
return input;
}
ostream& operator<<(ostream& output, const MyString& rhs)
{
if(rhs.String != '\0')
{
output << rhs.String;
}
else
{
output<<"No String to output\n";
}
return output;
}
/*MyString::Find that finds a string in a larger string and returns the starting location of the substring. Note that your string location starts from 0 and ends at length -1. If the string is not found, a value of -1 will be returned
*/
const int MyString::Find(const MyString& other)
{
int nfound = -1;
if(other.Size > Size)
{
return nfound;
}
int i = 0, j = 0;
for(i = 0; i < Size; i++) {
for(j = 0; j < other.Size; j++) {
if( ((i+j) >= Size) || (String[i+j] != other.String[j]) )
{
break;
}
}
if(j == other.Size)
{
return i;
}
}
return nfound;
}
/*MyString::Substring(start, length). This method returns a substring of the original string that contains the same characters as the original string starting at location start and is as long as length.
*/
MyString MyString::Substring(int start, int length)
{
char* leo = new char[length+1];
for(int i = start; i < start + length+1; ++i)
{
leo[i-start] = String[i];
}
MyString sub;
delete [] sub.String; sub.String = leo; sub.Size = Size;
return sub;
}
//Print() method that prints the string
const void MyString::Print() const
{
for(int i=0; i < Size; i++)
{
cout<<String[i];
}
cout<<endl;
}
The main.cpp file:
int main (int argc, char **argv)
{
MyString String1;
const MyString ConstString("Target string"); //Test of alternate constructor
MyString SearchString; //Test of default constructor that should set "Hello World".
MyString TargetString (String1); //Test of copy constructor
cout << "Please enter two strings. ";
cout << "Each string needs to be shorter than 256 characters or terminated by /\n." << endl;
cout << "The first string will be searched to see whether it contains exactly the second string. " << endl;
cin >> SearchString >> TargetString; // Test of cascaded string-extraction operator
if(SearchString.Find(TargetString) == -1) {
cout << TargetString << " is not in " << SearchString << endl;
}
else {
cout << TargetString << " is in " << SearchString << endl;
cout << "Details of the hit: " << endl;
cout << "Starting position of the hit: " << SearchString.Find(TargetString) << endl;
cout << "The matching substring is: " << SearchString.Substring(SearchString.Find(TargetString), TargetString.Length()-1)<<"\n";
}
return 0;
}
Running the program you get this:
Please enter two strings. Each string needs to be shorter than 256 characters or terminated by /
.
The first string will be searched to see whether it contains exactly the second string.
firstly
real
realt World is not in firstly
Please Help!!

try adding a '\0' at the end of your strings in your MyString::MyString(const char *message) constructor

#Sam's answer is correct. I'm going to add on to it to help you learn what's happening.
C and C++ strings are really character arrays that follow a convention that the string is terminated with \0, sometimes called NUL (not null), which is a character where all bits are 0.
Your code gets the first part right in that it creates an array of characters. However, you do not apply the convention that the string must be NUL terminated.
You then pass a string that does not follow the NUL termination convention to cout, which does follow that convention. In other words, it runs through the string, printing each character to stdout, until it happens across the character \0 in memory. It's actually fairly lucky that it terminates. If there were not a \0 in the character array it is outputing, it would just keep on going until reaching a memory address that does not belong to your program and failing with a segmentation fault.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Find the word in the stream? - string

Related

define dynamic array inside a struct without causing a segmentation fault

Hi, I wrote code in C++ fot linear search but it displays the index of searched elemet twice with second index value as random garbage value

PSET5 Speller Huge Number of Misspelled Words

A little trouble with creating a single linked list in C++

Memory allocation and deallocation

Categories

Resources