Go: how to check if a string contains multiple substrings?

strings.Contains(str_to_check, substr) takes only one argument as the substring to check, how do I check multiple substrings without using strings.Contains() repeatedly?
eg. strings.Contains(str_to_check, substr1, substr2)

Yes, you can do this without calling strings.Contains() multiple times.
If you know the substrings in advance, the easiest way to check is with a regular expression. If the string to check is long and you have quite a few substrings, this can also be faster than calling strings.Contains multiple times.
Example https://play.golang.org/p/7PokxbOOo7:
package main
import (
"fmt"
"regexp"
)
var re = regexp.MustCompile(`first|second|third`)
func main() {
fmt.Println(re.MatchString("This is the first example"))
fmt.Println(re.MatchString("This is the second example after first"))
fmt.Println(re.MatchString("This is the third example"))
fmt.Println(re.MatchString("This is the fourth example"))
}
Output:
true
true
true
false
If the substrings to check are dynamic, building the regex takes a bit more work: you need to escape special characters (regexp.QuoteMeta does this), and regex compilation is not fast, so strings.Contains() may be the better choice in this case. If your code is performance critical, benchmark both.
Another good option could be to write your own scanner that leverages common prefixes in the substrings (if any) using a prefix tree.
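For the dynamic case, the escaping step can be sketched as follows. This is a minimal illustration, not a tuned implementation; buildAnyMatcher is a hypothetical helper name, and the sample substrings are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildAnyMatcher (hypothetical helper) compiles one regexp that matches
// when any of subs occurs. regexp.QuoteMeta escapes metacharacters so that
// every substring is matched literally. Note that an empty subs slice
// yields an empty pattern, which matches every string.
func buildAnyMatcher(subs []string) (*regexp.Regexp, error) {
	quoted := make([]string, len(subs))
	for i, s := range subs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return regexp.Compile(strings.Join(quoted, "|"))
}

func main() {
	re, err := buildAnyMatcher([]string{"a.b", "1+1", "(x)"})
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("price: 1+1 items")) // true
	fmt.Println(re.MatchString("aXb and 11"))       // false
}
```

Compile the matcher once and reuse it; recompiling on every call is what makes the regex approach slow for dynamic input.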

You can write your own utility function using strings.Contains() that can work for multiple sub-strings.
Here's an example that returns a boolean (true only if every substring is found) together with the number of substrings that matched:
package main
import (
"fmt"
"strings"
)
func checkSubstrings(str string, subs ...string) (bool, int) {
matches := 0
isCompleteMatch := true
fmt.Printf("String: \"%s\", Substrings: %s\n", str, subs)
for _, sub := range subs {
if strings.Contains(str, sub) {
matches += 1
} else {
isCompleteMatch = false
}
}
return isCompleteMatch, matches
}
func main() {
isCompleteMatch1, matches1 := checkSubstrings("Hello abc, xyz, abc", "abc", "xyz")
fmt.Printf("Test 1: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch1, matches1)
fmt.Println()
isCompleteMatch2, matches2 := checkSubstrings("Hello abc, abc", "abc", "xyz")
fmt.Printf("Test 2: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch2, matches2)
}
Output:
String: "Hello abc, xyz, abc", Substrings: [abc xyz]
Test 1: { isCompleteMatch: true, Matches: 2 }
String: "Hello abc, abc", Substrings: [abc xyz]
Test 2: { isCompleteMatch: false, Matches: 1 }
Here's the live example: https://play.golang.org/p/Xka0KfBrRD

Another solution would be using a combination of regexp and suffixarray. From the documentation:
Package suffixarray implements substring search in logarithmic time using an in-memory suffix array.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
fmt.Println(contains("first secondthird", "first", "second", "third"))
fmt.Println(contains("first secondthird", "first", "10th"))
}
func contains(str string, subStrs ...string) bool {
if len(subStrs) == 0 {
return true
}
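// Escape each substring so regexp metacharacters are matched literally.
quoted := make([]string, len(subStrs))
for i, sub := range subStrs {
quoted[i] = regexp.QuoteMeta(sub)
}
r := regexp.MustCompile(strings.Join(quoted, "|"))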
index := suffixarray.New([]byte(str))
res := index.FindAllIndex(r, -1)
exists := make(map[string]int)
for _, v := range subStrs {
exists[v] = 1
}
for _, pair := range res {
s := str[pair[0]:pair[1]]
exists[s] = exists[s] + 1
}
for _, v := range exists {
if v == 1 {
return false
}
}
return true
}

[H]ow do I check multiple substrings without using strings.Contains() repeatedly?
Not at all. You have to call Contains repeatedly.

Related

Split a string at the last occurrence of the separator in golang

I am trying to split a string by the last occurrence of a separator (/) in golang
Example, I have a string "a/b/c/d", after performing the split, I would like an array of string as below
[
"a/b/c",
"a/b",
"a"
]
I tried exploring strings package but couldn't find any function that does this
func main() {
fmt.Printf("%q\n", strings.Split("a/b/c/d/e", "/"))
}
May I know a way to handle this?
To split a string only at the last occurrence of a separator, use strings.LastIndex:
package main
import (
"fmt"
"strings"
)
func main() {
x := "a_ab_daqe_sd_ew"
lastInd := strings.LastIndex(x, "_")
fmt.Println(x[:lastInd]) // o/p: a_ab_daqe_sd
fmt.Println(x[lastInd+1:]) // o/p: ew
}
Note that strings.LastIndex returns -1 if the substring passed ("_" in the example above) is not found, so check for that before slicing.
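A minimal sketch that handles the -1 case explicitly. splitLast is a hypothetical helper name, not a standard library function.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLast (hypothetical helper) splits s around the last occurrence of sep.
// ok is false when sep does not occur in s; before is then s itself.
func splitLast(s, sep string) (before, after string, ok bool) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s, "", false
	}
	return s[:i], s[i+len(sep):], true
}

func main() {
	before, after, ok := splitLast("a_ab_daqe_sd_ew", "_")
	fmt.Println(before, after, ok) // a_ab_daqe_sd ew true

	_, _, ok = splitLast("nounderscore", "_")
	fmt.Println(ok) // false
}
```

Using len(sep) instead of a hard-coded +1 keeps the function correct for separators longer than one byte.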
Since this is for path operations, and it looks like you don't want the trailing path separator, path.Dir does what you're looking for:
fmt.Println(path.Dir("a/b/c/d/e"))
// a/b/c/d
If this is specifically for filesystem paths, you will want to use the filepath package instead, so that OS-specific path separators are handled properly.
Here's a simple function that uses filepath.Dir(string) to build a list of all ancestor directories of a given filepath:
func main() {
fmt.Printf("OK: %#v\n", parentsOf("a/b/c/d"))
// OK: []string{"a/b/c", "a/b", "a"}
}
func parentsOf(s string) []string {
dirs := []string{}
for {
parent := filepath.Dir(s)
// parent == s means Dir made no progress (e.g. a Windows volume root), so stop.
if parent == "." || parent == "/" || parent == s {
break
}
dirs = append(dirs, parent)
s = parent
}
return dirs
}

Counting the occurrence of one or more substrings in a string

I know that to count the occurrences of one substring I can use strings.Count(s, substr). What if I want to count the number of occurrences of substring1 OR substring2? Is there a more elegant way than writing another line with strings.Count()?
Use a regular expression:
https://play.golang.org/p/xMsHIYKtkQ
aORb := regexp.MustCompile("A|B")
matches := aORb.FindAllStringIndex("A B C B A", -1)
fmt.Println(len(matches))
Another way to do substring matching is with the suffixarray package. Here is an example that counts the matches of a pattern; use an alternation such as a|b in the regexp to match multiple patterns:
package main
import (
"fmt"
"index/suffixarray"
"regexp"
)
func main() {
r := regexp.MustCompile("an")
index := suffixarray.New([]byte("banana"))
results := index.FindAllIndex(r, -1)
fmt.Println(len(results))
}
You can also match a single substring with the Lookup function.
If you want to count the number of matches in a large string, without allocating space for all the indices just to get the length and then throwing them away, you can use Regexp.FindStringIndex in a loop to match against successive substrings:
func countMatches(s string, re *regexp.Regexp) int {
total := 0
for start := 0; start < len(s); {
remaining := s[start:] // slicing the string is cheap
loc := re.FindStringIndex(remaining)
if loc == nil {
break
}
// loc[0] is the start index of the match,
// loc[1] is the end index (exclusive)
start += loc[1]
total++
}
return total
}
func main() {
s := "abracadabra"
fmt.Println(countMatches(s, regexp.MustCompile(`a|b`)))
}

Go: Retrieve a string from between two characters or other strings

Let's say for example that I have one string, like this:
<h1>Hello World!</h1>
What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!
If the string looks like whatever;START;extract;END;whatever, you can use this to get the string in between:
// GetStringInBetween returns an empty string if the start or end string is not found.
func GetStringInBetween(str string, start string, end string) (result string) {
s := strings.Index(str, start)
if s == -1 {
return
}
s += len(start)
e := strings.Index(str[s:], end)
if e == -1 {
return
}
// e is relative to str[s:], so add s to get the index within str.
e += s
return str[s:e]
}
What happens here: it finds the first index of START, adds the length of the START string, and returns everything from there up to the first occurrence of END that follows.
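The same idea can be sketched with strings.Cut, assuming Go 1.18 or newer (where strings.Cut was added). The function name between is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// between (hypothetical helper) returns the text between the first
// occurrence of start and the first occurrence of end that follows it.
// The boolean is false if either delimiter is missing.
func between(s, start, end string) (string, bool) {
	_, rest, ok := strings.Cut(s, start)
	if !ok {
		return "", false
	}
	inner, _, ok := strings.Cut(rest, end)
	if !ok {
		return "", false
	}
	return inner, true
}

func main() {
	fmt.Println(between("<h1>Hello World!</h1>", "<h1>", "</h1>"))                // Hello World! true
	fmt.Println(between("whatever;START;extract;END;whatever", "START;", ";END")) // extract true
}
```

Returning a separate boolean lets the caller tell "nothing found" apart from an empty string between the delimiters.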
There are lots of ways to split strings in all programming languages.
Since I don't know exactly what you are asking for, here is one way to get the output you want from your sample.
package main
import "strings"
import "fmt"
func main() {
initial := "<h1>Hello World!</h1>"
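out := strings.TrimSuffix(strings.TrimPrefix(initial, "<h1>"), "</h1>")
fmt.Println(out)
}
In the above code you trim the prefix <h1> from the left of the string and the suffix </h1> from the right. strings.TrimLeft and strings.TrimRight are not the right tools here: they take a set of characters to strip, not a prefix or suffix.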
As I said there are hundreds of ways to split specific strings and this is only a sample to get you started.
Hope it helps, Good luck with Golang :)
I improved Jan Kardaš's answer.
Now you can find a string whose start and end delimiters are longer than one character, and the function also reports whether a match was found.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string, found bool) {
	s := strings.Index(str, startS)
	if s == -1 {
		return result, false
	}
	newS := str[s+len(startS):]
	e := strings.Index(newS, endS)
	if e == -1 {
		return result, false
	}
	result = newS[:e]
	return result, true
}
Here is my answer using regex. Not sure why no one suggested this safest approach
package main
import (
"fmt"
"regexp"
)
func main() {
content := "<h1>Hello World!</h1>"
re := regexp.MustCompile(`<h1>(.*?)</h1>`) // non-greedy, so the match stops at the first </h1>
match := re.FindStringSubmatch(content)
if len(match) > 1 {
fmt.Println("match found -", match[1])
} else {
fmt.Println("match not found")
}
}
Playground - https://play.golang.org/p/Yc61x1cbZOJ
In the strings pkg you can use the Replacer to great effect.
r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))
func findInString(str, start, end string) ([]byte, error) {
	var match []byte
	index := strings.Index(str, start)
	if index == -1 {
		return match, errors.New("start not found")
	}
	index += len(start)
	// Collect bytes until the remainder of str begins with end.
	for index < len(str) {
		if strings.HasPrefix(str[index:], end) {
			return match, nil
		}
		match = append(match, str[index])
		index++
	}
	return nil, errors.New("end not found")
}
Read up on the strings package. Have a look into the SplitAfter function which can do something like this:
var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")
That should produce a slice something like: "[", "this][", "is my][", "string]". Using further functions for Trimming you should get your solution. Best of luck.
func Split(str, before, after string) string {
a := strings.SplitAfterN(str, before, 2)
b := strings.SplitAfterN(a[len(a)-1], after, 2)
if 1 == len(b) {
return b[0]
}
return b[0][0:len(b[0])-len(after)]
}
The first call of SplitAfterN splits the original string into a slice of 2 parts divided by the first occurrence of the before string, or produces a slice containing 1 part equal to the original string.
The second call of SplitAfterN uses a[len(a)-1], the last item of a, as input: either the text that follows before, or the original string str. That input is split into a slice of 2 parts divided by the first occurrence of the after string, or produces a slice containing 1 part equal to the input.
If after was not found, we can simply return b[0], as it is equal to a[len(a)-1].
If after is found, it is included at the end of b[0], so you have to trim it via b[0][0:len(b[0])-len(after)].
All comparisons are case sensitive.

Case insensitive string search in golang

How do I search through a file for a word in a case insensitive manner?
For example
If I'm searching for UpdaTe in the file, if the file contains update, the search should pick it and count it as a match.
strings.EqualFold() can check if two strings are equal, while ignoring case. It even works with Unicode. See http://golang.org/pkg/strings/#EqualFold for more info.
http://play.golang.org/p/KDdIi8c3Ar
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.EqualFold("HELLO", "hello"))
fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}
Both return true.
Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.
Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
Do not use strings.Contains unless you need exact matching rather than language-correct string searches.
None of the current answers are correct unless you are only searching ASCII characters, which covers only the minority of languages (like English) that lack diaereses, umlauts, or other Unicode glyph modifiers (the more "correct" way to define it, as mentioned by @snap). The standard search phrase for this is "searching non-ASCII characters".
For proper support for language-aware searching you need to use http://golang.org/x/text/search.
func SearchForString(str string, substr string) (int, int) {
	m := search.New(language.English, search.IgnoreCase)
	return m.IndexString(str, substr)
}
start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
	fmt.Println("found at", start, end)
}
Or if you just want the starting index:
func SearchForStringIndex(str string, substr string) (int, bool) {
m := search.New(language.English, search.IgnoreCase)
start, _ := m.IndexString(str, substr)
if start == -1 {
return 0, false
}
return start, true
}
index, found := SearchForStringIndex("foobar", "bar")
if found {
	fmt.Println("match starts at", index)
}
Look through the language.Tag values in golang.org/x/text/language to find the language you wish to search with, or use language.Und if you are not sure.
Update
There seems to be some confusion, so the following example should help clarify things.
package main
import (
"fmt"
"strings"
"golang.org/x/text/language"
"golang.org/x/text/search"
)
var s = `Æ`
var s2 = `Ä`
func main() {
m := search.New(language.Finnish, search.IgnoreDiacritics)
fmt.Println(m.IndexString(s, s2))
fmt.Println(CaseInsensitiveContains(s, s2))
}
// CaseInsensitiveContains in string
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
If your file is large, you can use regexp and bufio:
// The regex `(?i)update` matches any string containing "update", case-insensitively.
reg := regexp.MustCompile("(?i)update")
f, err := os.Open("test.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
// MatchReader scans the file until it finds a match.
// Wrapping the file in bufio avoids loading the entire file into memory.
fmt.Println(reg.MatchReader(bufio.NewReader(f)))
About bufio
The bufio package implements a buffered reader that may be useful both
for its efficiency with many small reads and because of the additional
reading methods it provides.

How to convert an int value to string in Go?

i := 123
s := string(i)
s is "{" (the character with code point 123), but what I want is "123"
Please tell me how can I get "123".
And in Java, I can do in this way:
String s = "ab" + "c" // s is "abc"
how can I concat two strings in Go?
Use the strconv package's Itoa function.
For example:
package main
import (
"strconv"
"fmt"
)
func main() {
t := strconv.Itoa(123)
fmt.Println(t)
}
You can concat strings simply by +'ing them, or by using the Join function of the strings package.
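Both concatenation styles in one small sketch. joinInts is a hypothetical helper that combines strconv.Itoa with strings.Join.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinInts (hypothetical helper) formats each int with strconv.Itoa and
// joins the results with sep.
func joinInts(nums []int, sep string) string {
	parts := make([]string, len(nums))
	for i, n := range nums {
		parts[i] = strconv.Itoa(n)
	}
	return strings.Join(parts, sep)
}

func main() {
	s := "ab" + "c" // plain concatenation with +
	fmt.Println(s)  // abc

	fmt.Println(joinInts([]int{1, 2, 3}, "-")) // 1-2-3
}
```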
fmt.Sprintf("%v", value)
If you know the specific type of the value, use the corresponding formatting verb, for example %d for int.
More info: see the fmt package documentation.
fmt.Sprintf, strconv.Itoa and strconv.FormatInt will all do the job. But Sprintf uses the reflect package and allocates one more object, so it is the less efficient choice.
It is interesting to note that strconv.Itoa is shorthand for
func FormatInt(i int64, base int) string
with base 10
For Example:
strconv.Itoa(123)
is equivalent to
strconv.FormatInt(int64(123), 10)
For floating-point values, you can use fmt.Sprintf or strconv.FormatFloat
For example
package main
import (
"fmt"
)
func main() {
val := 14.7
s := fmt.Sprintf("%f", val)
fmt.Println(s)
}
In this case both strconv and fmt.Sprintf do the same job, but the strconv package's Itoa function is the better choice, because fmt.Sprintf allocates one more object during conversion.
Check the benchmark here: https://gist.github.com/evalphobia/caee1602969a640a4530
See https://play.golang.org/p/hlaz_rMa0D for an example.
Converting int64:
n := int64(32)
str := strconv.FormatInt(n, 10)
fmt.Println(str)
// Prints "32"
Another option:
package main
import "fmt"
func main() {
n := 123
s := fmt.Sprint(n)
fmt.Println(s == "123")
}
https://golang.org/pkg/fmt#Sprint
OK, most of the answers have shown you something good.
Let me give you this:
// ToString Change arg to string
func ToString(arg interface{}, timeFormat ...string) string {
if len(timeFormat) > 1 {
log.SetFlags(log.Llongfile | log.LstdFlags)
log.Println(errors.New(fmt.Sprintf("timeFormat's length should be one")))
}
var tmp = reflect.Indirect(reflect.ValueOf(arg)).Interface()
switch v := tmp.(type) {
case int:
return strconv.Itoa(v)
case int8:
return strconv.FormatInt(int64(v), 10)
case int16:
return strconv.FormatInt(int64(v), 10)
case int32:
return strconv.FormatInt(int64(v), 10)
case int64:
return strconv.FormatInt(v, 10)
case string:
return v
case float32:
return strconv.FormatFloat(float64(v), 'f', -1, 32)
case float64:
return strconv.FormatFloat(v, 'f', -1, 64)
case time.Time:
if len(timeFormat) == 1 {
return v.Format(timeFormat[0])
}
return v.Format("2006-01-02 15:04:05")
case jsoncrack.Time:
if len(timeFormat) == 1 {
return v.Time().Format(timeFormat[0])
}
return v.Time().Format("2006-01-02 15:04:05")
case fmt.Stringer:
return v.String()
case reflect.Value:
return ToString(v.Interface(), timeFormat...)
default:
return ""
}
}
package main
import (
"fmt"
"strconv"
)
func main() {
// First question: how to convert an int to a string?
intValue := 123
// Keep the result in a separate variable:
strValue := strconv.Itoa(intValue)
fmt.Println(strValue)
// Second question: how to concatenate two strings?
firstStr := "ab"
secondStr := "c"
s := firstStr + secondStr
fmt.Println(s)
}

Resources