search lines of a phrase

I'm trying to program a grep method to search a text file with a provided user input word. The code will then search each line of the text file, searching for the user input word. If found, it will display the line with the word in the correct location surrounded by <> symbols.
If I use the first line of this post as an example, it would be:
user input: "a"
output:
Line 1: I'm trying to program < a > grep method to search < a > text file with < a > provided user input word. The code will
I've already written most of it, but it needs to account for wrapped phrases and for multiple matches in the same line, as in my example. What should the approach be?
public static void grep(String[] phrases, Scanner words3) {
Scanner in = new Scanner(System.in);
int line = 1;
boolean found = false;
String grep;
System.out.print("\n\n\nPLease enter something to grep: ");
grep = in.nextLine();
System.out.print("\n");
while(words3.hasNext() == true) { //check file for remaining lines of text and searches each for user input string
String currentLine = words3.nextLine();
if(currentLine.indexOf(grep) == -1) { //change so it accounts for the grep as a wrapped phrase and for multiple matches in the same line
line++;
continue;
}
else {
System.out.print("Line " + line + " ");
int l = 0;
while(l < currentLine.length()) {
if(l == currentLine.indexOf(grep)) {
System.out.print("<" + grep + ">" );
l += grep.length() - 1;
}
else {
System.out.print(currentLine.charAt(l));
}
l++;
}
System.out.print("\n");
line++;
found = true;
}
}
if(found == false) {
System.out.print("[The word <" + grep + "> was not found]\n");
}
}
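One possible approach to the multiple-matches part, as a minimal sketch rather than a definitive answer: use the two-argument form of indexOf to resume the search after each hit, and build the output with a StringBuilder. The class name GrepSketch and the method highlight are hypothetical names for illustration. The sketch only handles matches within a single line; a phrase wrapped across two lines would need the lines joined before searching, which is not shown here.

```java
import java.util.List;

class GrepSketch {
    // Wraps every occurrence of phrase in line with < and >.
    // Returns the line unchanged if the phrase is empty or not present.
    static String highlight(String line, String phrase) {
        if (phrase.isEmpty()) {
            return line; // nothing to search for
        }
        StringBuilder out = new StringBuilder();
        int from = 0; // position where the next search starts
        int hit;
        while ((hit = line.indexOf(phrase, from)) != -1) {
            // copy the text before the match, then the marked match
            out.append(line, from, hit).append('<').append(phrase).append('>');
            from = hit + phrase.length();
        }
        out.append(line.substring(from)); // remainder after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a grep method, then grep again", "nothing here");
        String phrase = "grep";
        int lineNo = 1;
        for (String line : lines) {
            if (line.contains(phrase)) {
                System.out.println("Line " + lineNo + ": " + highlight(line, phrase));
            }
            lineNo++;
        }
    }
}
```

Because the search restarts at the end of the previous match, overlapping matches are not reported twice.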

Related

charAt not working for strings from a file

I'm using the charAt function to find the first and second letters in a string that was read from a file, but after getting the first character with charAt(0), the charAt(1) call throws an exception saying the string is too short when I know it is not. Here is the code.
while(inputFile.hasNext()){
//read file first line
String line = inputFile.nextLine();
//if the first 2 letters of the line the scanner is reading are the same
//as the search letters print the line and add one to the linesPrinted count
String lineOne = String.valueOf(line.charAt(0));
String lineTwo = String.valueOf(line.charAt(1));
String searchOne = String.valueOf(search.charAt(0));
String searchTwo = String.valueOf(search.charAt(1));
if (lineOne.compareToIgnoreCase(searchOne) == 0 && lineTwo.compareToIgnoreCase(searchTwo) == 0){
System.out.println(line);
linesPrinted++;
}
}
I've tried checking that the string isn't being changed after the charAt(0) call by printing it, and I know it isn't. I've also run the program with no problems after removing just that line, so I am sure this is what's causing the problem.

The only functional change needed would be to change hasNext to hasNextLine.
As one might encounter a line shorter than 2 characters, say an empty line at the end of the file, check the length.
while (inputFile.hasNextLine()) {
// read file next line
String line = inputFile.nextLine();
if (line.length() < 2) {
continue;
}
// if the first 2 letters of the line the scanner is reading are the same
// as the search letters print the line and add one to the linesPrinted count
String lineOne = line.substring(0, 1);
String lineTwo = line.substring(1, 2);
String searchOne = search.substring(0, 1);
String searchTwo = search.substring(1, 2);
if (lineOne.equalsIgnoreCase(searchOne) && lineTwo.equalsIgnoreCase(searchTwo)) {
System.out.println(line);
linesPrinted++;
}
}
There is a problem with special characters and other languages and scripts: a Unicode code point (symbol, character) can be more than one Java char.
while (inputFile.hasNextLine()) {
// read file next line
String line = inputFile.nextLine();
if (line.length() < 2) {
continue;
}
// if the first 2 letters of the line the scanner is reading are the same
// as the search letters print the line and add one to the linesPrinted count
int[] lineStart = line.codePoints().limit(2).toArray();
int[] searchStart = search.codePoints().limit(2).toArray();
String lineKey = new String(lineStart, 0, lineStart.length);
String searchKey = new String(searchStart, 0, searchStart.length);
if (lineKey.equalsIgnoreCase(searchKey)) {
System.out.println(line);
linesPrinted++;
}
}
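To make the difference between chars and code points concrete, here is a small sketch. The class name CodePointSketch and the helper firstCodePoints are hypothetical; the example string is an assumption chosen because U+1F600 lies outside the Basic Multilingual Plane and is therefore stored as two Java chars.

```java
class CodePointSketch {
    // Returns the first n code points of s as a String (fewer if s is shorter).
    static String firstCodePoints(String s, int n) {
        int[] cps = s.codePoints().limit(n).toArray();
        return new String(cps, 0, cps.length);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00b"; // U+1F600 (a surrogate pair) followed by 'b'
        System.out.println(s.length());                      // number of chars
        System.out.println(s.codePointCount(0, s.length())); // number of code points
        System.out.println(firstCodePoints(s, 1).length());  // chars used by the first code point
    }
}
```

Note that length() counts chars, so a check such as line.length() < 2 does not guarantee two code points; codePointCount is the code-point-aware alternative.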

How to do many small changes in big string in fastest way. Visual C++

I have been trying to deal with this problem for over a month. I need to do many replacements (over 10 million) in one big string (String ^), and I need to do it fast. My approach gave correct results, but the program ran for over 30 minutes.
Problem:
I have a table of changes to do: [strWas1, strWillBe1, strWas2, strWillBe2, ..., strWas10^7, strWillBe10^7]. Also I have one big string which can contain some of strWasN but it also can contain something-elsestrWas1 and I don't want to change it because "something-elsestrWas1" is not "strWas1".
For example String is:
"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS,
Dogs, DoGs, 33DoGs00"
Now I need to change every "dogs" that is not adjacent to letters ("dogs" is strWas1) to "cats" ("cats" is strWillBe1). The result should be:
"I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats,
cats, cats, 33cats00"
My last try was:
array<String^>^ strArray = gcnew array<String^>(9999999);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
//...
strArray[9999998] = gcnew String("whatReplace");
strArray[9999999] = gcnew String("newText");
bool found = false;
int index;
bool doThis = true;
String ^ notAllowed = u8"aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźżAĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻёйцукенгшщзхъфывапролджэячсмитьбюЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ";
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 9999999; i+=2) {
while (found = text->Contains(strArray[i])) {
index = text->IndexOf(strArray[i]);
MessageBox::Show(index.ToString());
doThis = true;
if (index == 0) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(strArray[i]->Length, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else if (text->Length - index - strArray[i]->Length) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(index-1, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else {
for (int j = 0; j < notAllowed->Length; j++) {
if ((text->Substring(index - 1, 1) == notAllowed->Substring(j, 1)) || (text->Substring(index+strArray[i]->Length,1)== notAllowed->Substring(j, 1))) doThis = false;
}
}
if (doThis) {
text = text->Substring(0, index) + strArray[i + 1] + text->Substring(index + strArray[i]->Length, text->Length - index - strArray[i]->Length);
}
}
}
But this runs endlessly.
New version (thanks to Vlad Feinstein):
array<String^>^ strArray = gcnew array<String^>(10);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
strArray[2] = gcnew String("dogs");
strArray[3] = gcnew String("cats");
strArray[4] = gcnew String("dogs");
strArray[5] = gcnew String("cats");
strArray[6] = gcnew String("dogs");
strArray[7] = gcnew String("cats");
strArray[8] = gcnew String("dogs");
strArray[9] = gcnew String("cats");
bool found = false;
int index;
bool doThis = true;
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 10; i += 2)
{
int index = 0;
while ((index = text->ToLower()->IndexOf(strArray[i]->ToLower(), index)) != -1)
{
doThis = true;
// is there one more char?
if (index + strArray[i]->Length < text->Length)
{
if (Char::IsLetter(text[index+strArray[i]->Length]))
doThis = false;
}
// is there previous char?
if (index > 0)
{
if (Char::IsLetter(text[index - 1]))
doThis = false;
}
if (doThis)
text = text->Substring(0, index) + strArray[i + 1] +
text->Substring(index + strArray[i]->Length);
Debug::WriteLine(text);
index++;
}
}
Of course, this is still not a very fast version. David Yaw wrote the fast version.

There's a much better way to do this, rather than blindly checking each of one million replacement strings. Let .Net hash the strings, and have it do the checking that way.
If we receive the find & replace strings as a Dictionary, we can use .Net's hash lookups to find the strings that we need to replace.
If we step through each character in the string, it might be the beginning of a 5-character 'search for' string, or the beginning of a 4-character 'search for' string, etc, or it might not be part of a 'search for' string at all, in which case it'll get copied to the output directly. If we do find a 'search for' string, we'll write the replacement to the output, and mark the appropriate number of input characters as consumed.
Based on your description, it appears you want a case-insensitive comparison when searching for strings. You can use case-sensitive or -insensitive, just specify whatever you like when you construct the Dictionary.
String^ BigFindReplace(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
// First, get the lengths of all the 'search for' strings in the replacement pairs.
SortedSet<int> searchForLengths;
for each (String^ searchFor in replacementPairs->Keys)
{
searchForLengths.Add(searchFor->Length);
}
// Searching for an empty string isn't valid: remove length zero, if it's there.
searchForLengths.Remove(0);
StringBuilder result;
// Step through the input string. For each character:
// A) See if the character is the beginning of one of the 'search for' strings.
// If so, then insert the 'replace with' string into the output buffer.
// Skip over this character and the rest of the 'search for' string that we found.
// B) If it's not the beginning of a 'search for' string, copy it to the output buffer.
for(int i = 0; i < originalString->Length; i++)
{
bool foundSomething = false;
int foundSomethingLength = 0;
for each (int len in searchForLengths.Reverse())
{
if (i > (originalString->Length - len))
{
// If we're on the last 4 characters of the string, we can ignore
// all the 'search for' strings that are 5 characters or longer.
continue;
}
String^ substr = originalString->Substring(i, len);
String^ replaceWith;
if (replacementPairs->TryGetValue(substr, replaceWith))
{
// We found the section of the input string that we're looking at in our
// 'search for' list! Insert the 'replace with' into the output buffer.
result.Append(replaceWith);
foundSomething = true;
foundSomethingLength = len;
break; // don't try to find more 'search for' strings.
}
}
if(foundSomething)
{
// We found & already inserted the replacement text. Just increment
// the loop counter to skip over the rest of the characters of the
// found 'search for' text.
i += (foundSomethingLength - 1); // "-1" because the for loop has its own "+1".
}
else
{
// We didn't find any of the 'search for' strings,
// so this is a character that just gets copied.
result.Append(originalString[i]);
}
}
return result.ToString();
}
My test app:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplace(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplace(text2, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
Output:
I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00
I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00
Edit: Whole words only
OK, so we need to substitute whole words only. To do that, I wrote a helper method to split a string into words & nonwords. (This is different from the built-in String::Split method: String::Split doesn't return the delimiters, and we need them here.)
Once we have an array of strings, where every string is either a word or a bunch of non-word characters (e.g., separators, whitespace, etc.), then we can run each of those through the Dictionary. Because we're doing a whole word at a time, not just a single letter at a time, this is more efficient.
array<String^>^ SplitIntoWords(String^ input)
{
List<String^> result;
StringBuilder currentWord;
bool currentIsWord = false;
for each (System::Char c in input)
{
// Words are made up of letters. Word separators are made up of
// everything else (numbers, whitespace, punctuation, etc.)
bool nextCharIsWord = Char::IsLetter(c);
if(nextCharIsWord != currentIsWord)
{
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
currentIsWord = nextCharIsWord;
}
currentWord.Append(c);
}
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
return result.ToArray();
}
String^ BigFindReplaceWords(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
StringBuilder result;
// First, separate the input string into an array of words & non-words.
array<String^>^ asWords = SplitIntoWords(originalString);
// Go through each word & non-word that came out of the split. If a word or
// non-word is in the replacement list, add the replacement to the output.
// Otherwise, add the word/nonword to the output.
for each (String^ word in asWords)
{
String^ replaceWith;
if (replacementPairs->TryGetValue(word, replaceWith))
{
result.Append(replaceWith);
}
else
{
result.Append(word);
}
}
return result.ToString();
}
My test app:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
array<String^>^ words = SplitIntoWords(text);
for (int i = 0; i < words->Length; i++)
{
Debug::WriteLine("words[{0}] = '{1}'", i, words[i]);
}
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplaceWords(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplaceWords(text2, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
Results:
words[0] = 'I'
words[1] = ' '
words[2] = 'have'
words[3] = ' '
words[4] = 'two'
words[5] = ' '
words[6] = 'dogs'
words[7] = ', '
words[8] = 'three'
words[9] = ' '
words[10] = 'notdogs'
words[11] = ', '
words[12] = 'also'
words[13] = ' '
words[14] = 'dogsikong'
words[15] = ', 5'
words[16] = 'dogs'
words[17] = ', -'
words[18] = 'dogs'
words[19] = '. '
words[20] = 'DOGS'
words[21] = ', '
words[22] = 'Dogs'
words[23] = ', '
words[24] = 'DoGs'
words[25] = ', 33'
words[26] = 'DoGs'
words[27] = '00'
I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats, cats, cats, 33cats00
I have two cats, three notpigs, also miceikong, 5cats, -cats. cats, cats, cats, 33cats00
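For comparison, the same whole-word idea can be sketched in Java. This is an illustration under stated assumptions, not a port of the answer's code: it swaps the hand-written splitter for a regular expression (\p{L}+ matches runs of Unicode letters, similar in spirit to Char::IsLetter), and it uses a TreeMap with String.CASE_INSENSITIVE_ORDER for case-insensitive lookup, which is a tree lookup rather than a hash lookup and is not culture-aware. The names WordReplaceSketch and replaceWords are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordReplaceSketch {
    // A "word" is a maximal run of Unicode letters; everything else is a separator.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Replaces each whole word found in pairs; separators and unknown words are copied as-is.
    static String replaceWords(String text, Map<String, String> pairs) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0; // end of the previous word
        while (m.find()) {
            out.append(text, last, m.start()); // separator before this word
            String word = m.group();
            out.append(pairs.getOrDefault(word, word));
            last = m.end();
        }
        out.append(text.substring(last)); // trailing separator, if any
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        pairs.put("dogs", "cats");
        String text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
        System.out.println(replaceWords(text, pairs));
    }
}
```

Each word is looked up once, so the cost grows with the length of the text rather than with the number of replacement pairs times the text length.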

There are a lot of issues in your code that may cause problems, but the main logical error is:
while (found = text->Contains(strArray[i]))
should be
while (found == text->Contains(strArray[i]))
Since == is the comparison operator while = is the assignment operator, you are always assigning, and therefore your while loop is an infinite loop.

Hm... No?
while (found == text->Contains(strArray[i]))
is a comparison. But I didn't compute found beforehand, so I compute found in the while condition and check whether it is true. That is allowed.
while (found = text->Contains(strArray[i]))
is exactly the same as:
found = text->Contains(strArray[i])
while (found==true)
At least in normal C++ this works, and I don't have a problem with it here either.

Пётр Васильевич, a few suggestions:
Replace Substring(x, 1) with the character indexer (text[x]).
Throw away your notAllowed string and use .NET's Char.IsLetter Method, or at least break out of your for() loops when you set doThis = false;
If you need a substring from index to the end of string, you don't need to calculate the length; just use a form with one parameter: public string Substring(int startIndex)
Don't use text->Contains(); you need to call text->IndexOf() anyway, just compare that index to -1.
10 million words??? There aren't that many in English and Russian combined!
Use the two-parameter form of the String.IndexOf method (String, Int32) to specify where to start the search (from the position of the previously found word), to avoid searching the beginning of the string over and over. Something like this:
for (int i = 0; i < 9999999; i += 2)
{
int index = 0;
while ((index = text->IndexOf(strArray[i], index)) != -1)
{
doThis = true;
// is there one more char?
if (index + strArray[i]->Length < text->Length)
{
if (Char::IsLetter(text[index + strArray[i]->Length]))
doThis = false;
}
// is there previous char?
if (index > 0)
{
if (Char::IsLetter(text[index - 1]))
doThis = false;
}
if (doThis)
{
text = text->Substring(0, index) + strArray[i + 1] +
text->Substring(index + strArray[i]->Length);
// continue searching after the inserted replacement
index += strArray[i + 1]->Length;
}
else
index++; // skip this match, otherwise IndexOf finds it again forever
}
}
In your while() loop, collect the indexes of the found strings into an array, then do all replacements of the same word in one pass. That is particularly useful if there are multiple occurrences of the same word in the text.

How to split a string into multiple strings if spaces are detected (GM:Studio)

I made a console program, but the problem is that it doesn't allow parameters to be inserted. So I'm wondering how would I split a single string into multiple strings to achieve what I need. E.g.: text="msg Hello" would be split into textA="msg" and textB="Hello"
This is the main console code so far (just to show the idea):
if (keyboard_check_pressed(vk_enter)) {
text_console_c = asset_get_index("scr_local_"+string(keyboard_string));
if (text_console_c > -1) {
text_console+= "> "+keyboard_string+"#";
script_execute(text_console_c);
text_console_c = -1;
}
else if (keyboard_string = "") {
text_console+= ">#";
}
else {
text_console+= "> Unknown command: "+keyboard_string+"#";
};
keyboard_string = "";
}

I can't recommend splitting a string by iterating over each character: when you split a very long string, it takes a long time and can freeze the thread. Game Maker is single-threaded for now.
This code is much faster.
string_split
var str = argument[0] //string to split
var delimiter = argument[1] // delimiter
var letDelimiter = false // append delimiter to each part
if(argument_count == 3)
letDelimiter = argument[2]
var list = ds_list_create()
var d_at = string_pos(delimiter, str)
while(d_at > 0) {
var part = string_delete(str, d_at , string_length(str)) //text before the delimiter
if(letDelimiter)
part = part + delimiter
//remove the part and the whole delimiter, which may be longer than one character
str = string_delete(str, 1, d_at + string_length(delimiter) - 1)
d_at = string_pos(delimiter, str)
ds_list_add(list, part)
}
if(str != "")//last string without delimiter (or input with no delimiter at all), need to add too
ds_list_add(list, str)
return list;
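Don't forget to call ds_list_destroy after you have iterated over all the strings.
For example:
var splited = string_split("first part|second part", '|')
for (var i = 0; i < ds_list_size(splited); i++) {
var part = ds_list_find_value(splited, i)
//do something with each string
}
ds_list_destroy(splited)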

Something like this may help. I haven't tested it, but if you can follow what is going on, it's a good place to start.
text = "msg Hello"
counter = 0
allStrings[0] = ""
// string_char_at is 1-based, so start at 1
for (i = 1; i <= string_length(text); i++)
{
if (string_char_at(text, i) == " ")
{
// a space starts the next entry
counter++
allStrings[counter] = ""
} else {
allStrings[counter] += string_char_at(text, i)
}
}
allStrings is an array containing each of the separate strings. Whenever a " " is seen, the next index of allStrings starts having its characters filled in; every other character is appended to the current entry.
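The same command/argument split can be sketched compactly in a language with a built-in split. The following Java sketch is for illustration only and is not GameMaker code; CommandSplitSketch and splitCommand are hypothetical names. It assumes the first whitespace-separated token is the command and everything after it is a single argument string.

```java
class CommandSplitSketch {
    // Splits "msg Hello world" into ["msg", "Hello world"].
    // A bare command such as "msg" yields a single-element array.
    static String[] splitCommand(String input) {
        // limit 2: split at the first run of whitespace only
        return input.trim().split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitCommand("msg Hello");
        System.out.println("command = " + parts[0]);
        if (parts.length > 1) {
            System.out.println("argument = " + parts[1]);
        }
    }
}
```

Limiting the split to two parts keeps spaces inside the argument intact, which matters for commands like msg that take free text.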

how to read the content of file and store it in formatted array in c#

I am trying to read text from file, checking its content and then storing it in an array of string.
FileStream fs = new FileStream(pathToFiles, FileMode.Open);
StreamReader sr = new StreamReader(fs);
do{
line=sr.ReadLine();
if (line == "databases")
{
j = 0;
while ((ch = sr.Read()) != '}')
{
admin_databases[j] = sr.ReadLine();
j++;
}
}
else if (line == "table_name")
{
j = 0;
while ((ch = sr.Read()) != '}')
{
admin_table_name[j] = sr.ReadLine();
j++;
}
}
else
{
Response.Write(line+" ");
}
} while (line !=null);
The text is read using the ReadLine() method, but when checking its content, i.e.
if(line=="databases")
it shows a null string, so nothing gets stored in the array.
What mistake am I making here?

As an answer to the last comments under the main post:
According to what you say, we're back to the whitespace theory!
A few tips:
Replace Response.Write(line+" "); with Response.Write("'" + line+"'"); just to see the exact captured values.
Check your file content.
You could also replace your == operators with more specific comparisons: String.Compare with the case-insensitive parameter, or String.StartsWith() / Contains(), etc., rather than exact comparison.
You could also clean up your input string with "Trim()", etc.

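Sorry, but we don't know what is in your file. Maybe the problem is stray whitespace.
Also look at this line of code:
while ((ch = sr.Read()) != '}')
The single '=' is fine here: it assigns the character to ch and then compares it with '}'. But each sr.Read() consumes the first character of the next line, so the sr.ReadLine() that follows returns that line without its first character.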

Get context for search string in text in C#

Given a string text which contains newlines, there is a search keyword which matches an item within the text.
How do I implement the following in C#:
searchIdx = search index (starting with 0, then 1, etc. for each successive call to GetSearchContext). Initially start with 0.
contentsTxt = string data to search in
searchTxt = keyword to search for in contentsTxt
numLines = number of lines to return surrounding the searchTxt found (ie. 1 = the line the searchTxt is found on, 2 = the line the searchTxt is found on, 3 = the line above the searchTxt is found on, the line the searchTxt is found on, and the line below the searchTxt is found on)
returns the "context" based on the parameters
string GetSearchContext(int searchIdx, string contentsTxt, string searchTxt, int numLines);
If there's a better function interface to accomplish this feel free to suggest that as well.
I tried the following but doesn't seem to work properly all the time:
private string GetSearchContext(string contentValue, string search, int numLines)
{
int searchIdx = contentValue.IndexOf(search);
int startIdx = 0;
int lastIdx = 0;
while (startIdx != -1 && (startIdx = contentValue.IndexOf('\n', startIdx+1)) < searchIdx)
{
lastIdx = startIdx;
}
startIdx = lastIdx;
if (startIdx < 0)
startIdx = 0;
int endIdx = searchIdx;
int lineCnt = 0;
while (endIdx != -1 && lineCnt++ < numLines)
{
endIdx = contentValue.IndexOf('\n', endIdx + 1);
}
if (endIdx == -1 || endIdx > contentValue.Length - 1)
endIdx = contentValue.Length - 1;
string lines = contentValue.Substring(startIdx, endIdx - startIdx + 1);
if (lines[0] == '\n')
lines = lines.Substring(1);
if (lines[lines.Length - 1] == '\n')
{
lines = lines.Substring(0, lines.Length - 1);
}
if (lines[lines.Length - 1] == '\r')
{
lines = lines.Substring(0, lines.Length - 1);
}
return lines;
}

It's not actually a homework question. I'm trying to build a personal search engine. I just now figured out why it didn't always work: the search was case-sensitive.
I just needed to add StringComparison.CurrentCultureIgnoreCase and voilà, it worked! I feel dumb for not thinking of that before posting.
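Since the question also invited a different function interface, here is one possible shape, sketched in Java rather than C# and under several assumptions: the hypothetical method context takes separate before and after line counts instead of a single numLines, it returns only the first match, it lowercases both strings with Locale.ROOT (which is not the same as a culture-aware comparison), and it does not find a keyword that spans a line break.

```java
import java.util.Arrays;
import java.util.Locale;

class SearchContextSketch {
    // Returns the line holding the first case-insensitive match of keyword,
    // plus up to `before` lines above and `after` lines below, joined by '\n'.
    // Returns an empty string if the keyword does not occur.
    static String context(String text, String keyword, int before, int after) {
        String[] lines = text.split("\\R", -1); // \R matches \n, \r\n and other line breaks
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].toLowerCase(Locale.ROOT).contains(needle)) {
                int from = Math.max(0, i - before);
                int to = Math.min(lines.length - 1, i + after);
                return String.join("\n", Arrays.copyOfRange(lines, from, to + 1));
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String text = "alpha\r\nBeta gamma\ndelta\nepsilon";
        System.out.println(context(text, "GAMMA", 1, 1));
    }
}
```

Working on an array of lines avoids the index arithmetic around '\n' and '\r' that the original attempt has to do by hand.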
