Case insensitive string search in golang

How do I search through a file for a word in a case-insensitive manner?
For example, if I'm searching for UpdaTe and the file contains update, the search should pick it up and count it as a match.

strings.EqualFold() can check if two strings are equal, while ignoring case. It even works with Unicode. See http://golang.org/pkg/strings/#EqualFold for more info.
http://play.golang.org/p/KDdIi8c3Ar
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.EqualFold("HELLO", "hello"))
fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}
Both return true.

Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.
Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
You can see it in action here.
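The same case-normalizing idea can be used to count matches rather than just detect one. A small sketch, assuming that non-overlapping substring matches are what should be counted; countFold is a hypothetical name, and like the function above it relies on simple case mapping rather than language-aware comparison:

```go
package main

import (
	"fmt"
	"strings"
)

// countFold counts non-overlapping occurrences of substr in s after
// lower-casing both strings (hypothetical helper, not locale-aware).
func countFold(s, substr string) int {
	return strings.Count(strings.ToLower(s), strings.ToLower(substr))
}

func main() {
	// "updated" also counts, because this is a substring search.
	fmt.Println(countFold("Update, UPDATE, updated", "UpdaTe")) // 3
}
```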

Do not use strings.Contains unless you need exact matching rather than language-correct string searches.
None of the current answers are correct unless you are only searching ASCII text, which covers only the minority of languages (like English) that are written without diaereses, umlauts, or other Unicode glyph modifiers (the more "correct" way to define it, as mentioned by #snap). The standard search phrase for this is "searching non-ASCII characters".
For proper language-aware searching you need to use http://golang.org/x/text/search.
func SearchForString(str string, substr string) (int, int) {
m := search.New(language.English, search.IgnoreCase)
return m.IndexString(str, substr)
}
start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
fmt.Println("found at", start, end)
}
Or if you just want the starting index:
func SearchForStringIndex(str string, substr string) (int, bool) {
m := search.New(language.English, search.IgnoreCase)
start, _ := m.IndexString(str, substr)
if start == -1 {
return 0, false
}
return start, true
}
index, found := SearchForStringIndex("foobar", "bar")
if found {
fmt.Println("match starts at", index)
}
Search the language.Tag structs here to find the language you wish to search with or use language.Und if you are not sure.
Update
There seems to be some confusion so this following example should help clarify things.
package main
import (
"fmt"
"strings"
"golang.org/x/text/language"
"golang.org/x/text/search"
)
var s = `Æ`
var s2 = `Ä`
func main() {
m := search.New(language.Finnish, search.IgnoreDiacritics)
fmt.Println(m.IndexString(s, s2))
fmt.Println(CaseInsensitiveContains(s, s2))
}
// CaseInsensitiveContains in string
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}

If your file is large, you can use regexp and bufio:
// Create a regex: `(?i)update` matches any string containing "update", case-insensitively
reg := regexp.MustCompile("(?i)update")
f, err := os.Open("test.txt")
if err != nil {
log.Fatal(err)
}
defer f.Close()
// Do the match operation.
// MatchReader reads from the file until it finds a match or reaches the end.
// Using bufio here avoids loading the entire file into memory.
println(reg.MatchReader(bufio.NewReader(f)))
About bufio
The bufio package implements a buffered reader that may be useful both
for its efficiency with many small reads and because of the additional
reading methods it provides.

Related

How can I iterate over each 2 consecutive characters in a string in go?

I have a string like this:
package main
import "fmt"
func main() {
some := "p1k4"
for i, j := range some {
fmt.Println()
}
}
I want to take each two consecutive characters in the string and print them. The output should look like p1, 1k, k4, 4p.
I have tried and am still having trouble finding the answer. How should I write the code in Go to get the output I want?

Go stores strings in memory as their UTF-8 encoded byte sequence. This maps ASCII characters one-to-one to bytes, but characters outside of that range map to multiple bytes.
So I would advise to use the for range loop over a string, which ranges over the runes (characters) of the string, properly decoding multi-byte runes. This has the advantage that it does not require allocation (unlike converting the string to []rune). You may also print the pairs using fmt.Printf("%c%c", char1, char2), which also will not require allocation (unlike converting runes back to string and concatenating them).
To learn more about strings, characters and runes in Go, read blog post: Strings, bytes, runes and characters in Go
Since the loop only returns the "current" rune in the iteration (but not the previous or the next rune), use another variable to store the previous (and first) runes so you have access to them when printing.
Let's write a function that prints the pairs as you want:
func printPairs(s string) {
var first, prev rune
for i, r := range s {
if i == 0 {
first, prev = r, r
continue
}
fmt.Printf("%c%c, ", prev, r)
prev = r
}
// Print last pair: prev is the last rune
fmt.Printf("%c%c\n", prev, first)
}
Testing it with your input and with another string that has multi-byte runes:
printPairs("p1k4")
printPairs("Go-世界")
Output will be (try it on the Go Playground):
p1, 1k, k4, 4p
Go, o-, -世, 世界, 界G

package main
import (
"fmt"
)
func main() {
str := "12345"
for i := 0; i < len(str); i++ {
fmt.Println(string(str[i]) + string(str[(i+1)%len(str)]))
}
}

This is a simple for loop over your string with the first character appended at the back:
package main
import "fmt"
func main() {
some := "p1k4"
ns := some + string(some[0])
for i := 0; i < len(ns)-1; i++ {
fmt.Println(ns[i:i+2])
}
}
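The two loops above index the string by byte, so they are only safe for ASCII input. A rune-based variant of the same wrap-around idea might look like the sketch below. It allocates a []rune, unlike the range-based printPairs earlier, and pairs is a hypothetical helper name:

```go
package main

import "fmt"

// pairs returns each rune of s joined with its successor, wrapping
// from the last rune back to the first (hypothetical helper).
func pairs(s string) []string {
	rs := []rune(s) // decode UTF-8 once so indexing is per character
	out := make([]string, 0, len(rs))
	for i := range rs {
		next := rs[(i+1)%len(rs)]
		out = append(out, string([]rune{rs[i], next}))
	}
	return out
}

func main() {
	fmt.Println(pairs("p1k4")) // [p1 1k k4 4p]
	fmt.Println(pairs("世界"))   // [世界 界世]
}
```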

Split a string at the last occurrence of the separator in golang

I am trying to split a string by the last occurrence of a separator (/) in golang
Example, I have a string "a/b/c/d", after performing the split, I would like an array of string as below
[
"a/b/c",
"a/b",
"a"
]
I tried exploring strings package but couldn't find any function that does this
func main() {
fmt.Printf("%q\n", strings.Split("a/b/c/d/e", "/"))
}
May I know a way to handle this?

To split a string only at the last occurrence of the separator, use strings.LastIndex:
package main
import (
"fmt"
"strings"
)
func main() {
x := "a_ab_daqe_sd_ew"
lastInd := strings.LastIndex(x, "_")
fmt.Println(x[:lastInd]) // o/p: a_ab_daqe_sd
fmt.Println(x[lastInd+1:]) // o/p: ew
}
Note: strings.LastIndex returns -1 if the substring passed ("_" in the example above) is not found. In that case x[:lastInd] panics, so check for -1 before slicing.
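To make the not-found case explicit, the slicing can be wrapped in a small function. This is a sketch, with splitLast being a hypothetical name and the choice to return the whole string when the separator is missing being an assumption about the desired behavior:

```go
package main

import (
	"fmt"
	"strings"
)

// splitLast splits s around the last occurrence of sep. If sep is not
// present it returns s, "", false (hypothetical helper).
func splitLast(s, sep string) (before, after string, found bool) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s, "", false
	}
	return s[:i], s[i+len(sep):], true
}

func main() {
	before, after, ok := splitLast("a_ab_daqe_sd_ew", "_")
	fmt.Println(before, after, ok) // a_ab_daqe_sd ew true

	before, after, ok = splitLast("nosep", "_")
	fmt.Println(before, after, ok)
}
```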

Since this is for path operations, and it looks like you don't want the trailing path separator, then path.Dir does what you're looking for:
fmt.Println(path.Dir("a/b/c/d/e"))
// a/b/c/d
If this is specifically for filesystem paths, you will want to use the filepath package instead, to properly handle multiple path separators.

Here's a simple function that uses filepath.Dir(string) to build a list of all ancestor directories of a given filepath:
func main() {
fmt.Printf("OK: %#v\n", parentsOf("a/b/c/d"))
// OK: []string{"a/b/c", "a/b", "a"}
}
func parentsOf(s string) []string {
dirs := []string{}
for {
parent := filepath.Dir(s)
if parent == "." || parent == "/" {
break
}
dirs = append(dirs, parent)
s = parent
}
return dirs
}
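If the input is not a filesystem path and the separator is arbitrary, the same list can be built with strings.LastIndex alone. A sketch, assuming a non-empty separator; prefixes is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// prefixes repeatedly cuts s at the last occurrence of sep and collects
// what remains each time (hypothetical helper).
func prefixes(s, sep string) []string {
	var out []string
	if sep == "" {
		return out // an empty separator would never shorten s
	}
	for {
		i := strings.LastIndex(s, sep)
		if i < 0 {
			return out
		}
		s = s[:i]
		out = append(out, s)
	}
}

func main() {
	fmt.Printf("%q\n", prefixes("a/b/c/d", "/")) // ["a/b/c" "a/b" "a"]
}
```

Unlike filepath.Dir, this does no path cleaning, so a leading separator produces an empty final element.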

Go: how to check if a string contains multiple substrings?

strings.Contains(str_to_check, substr) takes only one argument as the substring to check, how do I check multiple substrings without using strings.Contains() repeatedly?
eg. strings.Contains(str_to_check, substr1, substr2)

Yes, you can do this without calling strings.Contains() multiple times.
If you know the substrings in advance, the easiest way to check this is with a regular expression. And if the string to check is long and you have quite a few substrings, it can be faster than calling strings.Contains multiple times.
Example https://play.golang.org/p/7PokxbOOo7:
package main
import (
"fmt"
"regexp"
)
var re = regexp.MustCompile(`first|second|third`)
func main() {
fmt.Println(re.MatchString("This is the first example"))
fmt.Println(re.MatchString("This is the second example after first"))
fmt.Println(re.MatchString("This is the third example"))
fmt.Println(re.MatchString("This is the forth example"))
}
Output:
true
true
true
false
If the substrings to check are dynamic, it may be a bit more difficult to create the regex, as you need to escape special characters, and regex compilation is not fast, so strings.Contains() may be better in this case. It's best to benchmark if your code is performance critical.
Another good option could be to write your own scanner that can leverage common prefixes in substrings (if any) using prefix tree.
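For the dynamic case, the escaping mentioned above can be done with regexp.QuoteMeta before joining the alternatives. A minimal sketch; anyOf is a hypothetical helper name, and it reports whether any (not all) of the substrings occur:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anyOf compiles a regexp that matches any of the given substrings
// literally (hypothetical helper). QuoteMeta escapes metacharacters.
func anyOf(subs ...string) (*regexp.Regexp, error) {
	quoted := make([]string, len(subs))
	for i, s := range subs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return regexp.Compile(strings.Join(quoted, "|"))
}

func main() {
	re, err := anyOf("a.b", "c+")
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	fmt.Println(re.MatchString("xa.by")) // true: contains the literal "a.b"
	fmt.Println(re.MatchString("aXb"))   // false: "." is not a wildcard here
	fmt.Println(re.MatchString("ccc"))   // false: "+" is not a quantifier here
}
```

Note that with no substrings at all the pattern is empty and matches every input, so callers may want to handle that case separately.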

You can write your own utility function using strings.Contains() that can work for multiple sub-strings.
Here's an example that returns Boolean (true/false) in case of complete / partial match and the total number of matches:
package main
import (
"fmt"
"strings"
)
func checkSubstrings(str string, subs ...string) (bool, int) {
matches := 0
isCompleteMatch := true
fmt.Printf("String: \"%s\", Substrings: %s\n", str, subs)
for _, sub := range subs {
if strings.Contains(str, sub) {
matches += 1
} else {
isCompleteMatch = false
}
}
return isCompleteMatch, matches
}
func main() {
isCompleteMatch1, matches1 := checkSubstrings("Hello abc, xyz, abc", "abc", "xyz")
fmt.Printf("Test 1: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch1, matches1)
fmt.Println()
isCompleteMatch2, matches2 := checkSubstrings("Hello abc, abc", "abc", "xyz")
fmt.Printf("Test 2: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch2, matches2)
}
Output:
String: "Hello abc, xyz, abc", Substrings: [abc xyz]
Test 1: { isCompleteMatch: true, Matches: 2 }
String: "Hello abc, abc", Substrings: [abc xyz]
Test 2: { isCompleteMatch: false, Matches: 1 }
Here's the live example: https://play.golang.org/p/Xka0KfBrRD

Another solution would be using a combination of regexp and suffixarray. From the documentation:
Package suffixarray implements substring search in logarithmic time using an in-memory suffix array.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
fmt.Println(contains("first secondthird", "first", "second", "third"))
fmt.Println(contains("first secondthird", "first", "10th"))
}
func contains(str string, subStrs ...string) bool {
if len(subStrs) == 0 {
return true
}
r := regexp.MustCompile(strings.Join(subStrs, "|"))
index := suffixarray.New([]byte(str))
res := index.FindAllIndex(r, -1)
exists := make(map[string]int)
for _, v := range subStrs {
exists[v] = 1
}
for _, pair := range res {
s := str[pair[0]:pair[1]]
exists[s] = exists[s] + 1
}
for _, v := range exists {
if v == 1 {
return false
}
}
return true
}

[H]ow do I check multiple substrings without using strings.Contains() repeatedly?
Not at all. You have to call Contains repeatedly.

Go: Retrieve a string from between two characters or other strings

Let's say for example that I have one string, like this:
<h1>Hello World!</h1>
What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!

If the string looks like whatever;START;extract;END;whatever you can use this which will get the string in between:
// GetStringInBetween Returns empty string if no start string found
func GetStringInBetween(str string, start string, end string) (result string) {
s := strings.Index(str, start)
if s == -1 {
return
}
s += len(start)
e := strings.Index(str[s:], end)
if e == -1 {
return
}
e += s // e was relative to str[s:]; convert it to an index into str
return str[s:e]
}
What happens here is that it finds the first index of START, adds the length of the START string, and returns everything from there up to the first occurrence of END after it.

There are lots of ways to split strings in all programming languages.
Since I don't know what you are especially asking for I provide a sample way to get the output
you want from your sample.
package main
import "strings"
import "fmt"
func main() {
initial := "<h1>Hello World!</h1>"
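out := strings.TrimPrefix(strings.TrimSuffix(initial, "</h1>"), "<h1>")
fmt.Println(out)
}
In the above code you trim <h1> from the left of the string and </h1> from the right. TrimPrefix and TrimSuffix are used rather than TrimLeft and TrimRight because the latter treat their second argument as a set of characters, so they would also strip a leading "h" or a trailing "1" from the text itself.
As I said, there are hundreds of ways to split specific strings and this is only a sample to get you started.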
Hope it helps, Good luck with Golang :)
DB

I improved on Jan Kardaš's answer.
Now you can search with start and end strings that are longer than 1 character.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string,found bool) {
s := strings.Index(str, startS)
if s == -1 {
return result,false
}
newS := str[s+len(startS):]
e := strings.Index(newS, endS)
if e == -1 {
return result,false
}
result = newS[:e]
return result,true
}
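On Go 1.18 or later, the same start/end extraction can also be written with strings.Cut, which avoids the index arithmetic. This is a sketch of that alternative, not the answer's own code; between is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// between returns the text after the first start marker and before the
// next end marker (hypothetical helper; requires Go 1.18+ for strings.Cut).
func between(s, start, end string) (string, bool) {
	_, rest, ok := strings.Cut(s, start)
	if !ok {
		return "", false
	}
	inner, _, ok := strings.Cut(rest, end)
	if !ok {
		return "", false
	}
	return inner, true
}

func main() {
	out, ok := between("<h1>Hello World!</h1>", "<h1>", "</h1>")
	fmt.Println(out, ok) // Hello World! true
}
```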

Here is my answer using regex. Not sure why no one suggested this safest approach
package main
import (
"fmt"
"regexp"
)
func main() {
content := "<h1>Hello World!</h1>"
re := regexp.MustCompile(`<h1>(.*)</h1>`)
match := re.FindStringSubmatch(content)
if len(match) > 1 {
fmt.Println("match found -", match[1])
} else {
fmt.Println("match not found")
}
}
Playground - https://play.golang.org/p/Yc61x1cbZOJ

In the strings pkg you can use the Replacer to great effect.
r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))

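// findInString returns the bytes between the first start marker and the
// next end marker. It needs the "errors" and "strings" packages.
func findInString(str, start, end string) ([]byte, error) {
	var match []byte
	index := strings.Index(str, start)
	if index == -1 {
		return match, errors.New("Not found")
	}
	index += len(start)
	// Walk forward until the remaining text begins with end.
	for index < len(str) {
		if strings.HasPrefix(str[index:], end) {
			return match, nil
		}
		match = append(match, str[index])
		index++
	}
	// Reached the end of str without seeing the end marker.
	return nil, errors.New("Not found")
}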

Read up on the strings package. Have a look into the SplitAfter function which can do something like this:
var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")
That should produce a slice something like: "[", "this][", "is my][", "string]". Using further functions for Trimming you should get your solution. Best of luck.
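As an alternative to SplitAfter followed by trimming, strings.FieldsFunc can split on both bracket characters in one pass. A sketch, assuming the brackets never appear inside the values themselves; bracketed is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// bracketed splits s on '[' and ']' and returns the non-empty pieces
// in between (hypothetical helper).
func bracketed(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return r == '[' || r == ']'
	})
}

func main() {
	fmt.Printf("%q\n", bracketed("[this][is my][string]")) // ["this" "is my" "string"]
}
```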

func Split(str, before, after string) string {
a := strings.SplitAfterN(str, before, 2)
b := strings.SplitAfterN(a[len(a)-1], after, 2)
if 1 == len(b) {
return b[0]
}
return b[0][0:len(b[0])-len(after)]
}
The first call of SplitAfterN splits the original string into a slice of 2 parts divided at the first occurrence of the before string, or produces a slice containing 1 part equal to the original string if before is not found.
The second call of SplitAfterN uses a[len(a)-1] as input, as it is the last item of a: either the text following before or the original string str. That input is split into a slice of 2 parts divided at the first occurrence of the after string, or into a slice containing 1 part equal to the input.
If after was not found, then we can simply return b[0], as it is equal to a[len(a)-1].
If after is found, it is included at the end of b[0], so you have to trim it via b[0][0:len(b[0])-len(after)].
All string comparisons are case sensitive.

strings.Split in Go

The file names.txt consists of many names in the form of:
"KELLEE","JOSLYN","JASON","INGER","INDIRA","GLINDA","GLENNIS"
Does anyone know how to split the string so that it is individual names separated by commas?
KELLEE,JOSLYN,JASON,INGER,INDIRA,GLINDA,GLENNIS
The following code splits by comma and leaves the quotes around each name. What is the escape character needed to split out the "? Can it be done in one Split statement, splitting on "," and leaving a comma to separate the names?
package main
import "fmt"
import "io/ioutil"
import "strings"
func main() {
fData, err := ioutil.ReadFile("names.txt") // read in the external file
if err != nil {
fmt.Println("Err is ", err) // print any error
}
strbuffer := string(fData) // convert read in file to a string
arr := strings.Split(strbuffer, ",")
fmt.Println(arr)
}
By the way, this is part of Project Euler problem # 22. http://projecteuler.net/problem=22

Jeremy's answer is basically correct and does exactly what you have asked for. But the format of your "names.txt" file is actually a well-known one called CSV (comma-separated values). Luckily, Go comes with an encoding/csv package (which is part of the standard library) for decoding and encoding such formats easily. Compared with your and Jeremy's solution, this package also gives exact error messages if the format is invalid, supports multi-line records, and does proper unquoting of quoted strings.
The basic usage looks like this:
package main
import (
"encoding/csv"
"fmt"
"io"
"os"
)
func main() {
file, err := os.Open("names.txt")
if err != nil {
fmt.Println("Error:", err)
return
}
defer file.Close()
reader := csv.NewReader(file)
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
fmt.Println("Error:", err)
return
}
fmt.Println(record) // record has the type []string
}
}
There is also a ReadAll method that might make your program even shorter, assuming that the whole file fits into the memory.
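A ReadAll-based version might look like the sketch below. parseNames is a hypothetical helper, the sample data is a shortened stand-in for names.txt, and strings.NewReader stands in for the opened file:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseNames reads all CSV records from data and flattens them into a
// single list of names (hypothetical helper).
func parseNames(data string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	var names []string
	for _, rec := range records {
		names = append(names, rec...)
	}
	return names, nil
}

func main() {
	names, err := parseNames(`"KELLEE","JOSLYN","JASON"`)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(strings.Join(names, ",")) // KELLEE,JOSLYN,JASON
}
```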
Update: dystroy has just pointed out that your file has only one line anyway. The CSV reader works well for that too, but the following, less general solution should also be sufficient:
for {
if n, _ := fmt.Fscanf(file, "%q,", &name); n != 1 {
break
}
fmt.Println("name:", name)
}

Split doesn't remove characters from the substrings. Your split is fine; you just need to process the slice afterwards with strings.Trim(val, "\"").
for i, val := range arr {
arr[i] = strings.Trim(val, "\"")
}
Now arr will have the leading and trailing "s removed.
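On the question of doing it with a single Split call: one possibility is to trim the outermost quotes first and then split on the three-character sequence `","`. This is a sketch that assumes every name is quoted and no name contains a quote or comma; splitQuoted is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// splitQuoted strips surrounding whitespace and the outermost quotes,
// then splits on the `","` sequence between names (hypothetical helper).
func splitQuoted(s string) []string {
	s = strings.Trim(strings.TrimSpace(s), `"`)
	return strings.Split(s, `","`)
}

func main() {
	names := splitQuoted("\"KELLEE\",\"JOSLYN\",\"JASON\"\n")
	fmt.Println(strings.Join(names, ",")) // KELLEE,JOSLYN,JASON
}
```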
