Go: Retrieve a string from between two characters or other strings

Let's say for example that I have one string, like this:
<h1>Hello World!</h1>
What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!

If the string looks like whatever;START;extract;END;whatever you can use this which will get the string in between:
// GetStringInBetween returns the text between start and end,
// or an empty string if either of them is not found.
func GetStringInBetween(str string, start string, end string) (result string) {
s := strings.Index(str, start)
if s == -1 {
return
}
s += len(start)
e := strings.Index(str[s:], end)
if e == -1 {
return
}
e += s // e was relative to str[s:]; turn it into an index into str
return str[s:e]
}
What happens here is that it finds the first index of START, adds the length of the START string, and returns everything from there up to the first index of END that follows.

There are lots of ways to split strings in every programming language.
Since I don't know exactly what you are asking for, here is one sample way to get the output
you want from your sample.
package main
import "strings"
import "fmt"
func main() {
initial := "<h1>Hello World!</h1>"
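// TrimLeft/TrimRight treat their argument as a set of characters, so use TrimPrefix/TrimSuffix.
out := strings.TrimPrefix(strings.TrimSuffix(initial, "</h1>"), "<h1>")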
fmt.Println(out)
}
In the above code you trim <h1> from the left of the string and </h1> from the right.
As I said there are hundreds of ways to split specific strings and this is only a sample to get you started.
Hope it helps, Good luck with Golang :)

I improved Jan Kardaš's answer.
Now you can find the string with more than 1 character at the start and end, and the function also reports whether a match was found.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string,found bool) {
s := strings.Index(str, startS)
if s == -1 {
return result,false
}
newS := str[s+len(startS):]
e := strings.Index(newS, endS)
if e == -1 {
return result,false
}
result = newS[:e]
return result,true
}
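For comparison, the same idea can be sketched with strings.Cut, which is available in the standard library from Go 1.18. The helper name between is hypothetical and not part of any answer above; it assumes you want the text between the first start and the first end that follows it.

```go
package main

import (
	"fmt"
	"strings"
)

// between is a hypothetical helper: it returns the text between the first
// occurrence of start and the first occurrence of end that follows it.
// The boolean result is false if either delimiter is missing.
func between(s, start, end string) (string, bool) {
	// Drop everything up to and including start.
	_, rest, ok := strings.Cut(s, start)
	if !ok {
		return "", false
	}
	// Keep everything before end.
	inner, _, ok := strings.Cut(rest, end)
	if !ok {
		return "", false
	}
	return inner, true
}

func main() {
	fmt.Println(between("<h1>Hello World!</h1>", "<h1>", "</h1>"))
}
```

Returning a separate found flag lets the caller tell "no match" apart from a match that happens to be empty, such as <h1></h1>.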

Here is my answer using regex. Not sure why no one suggested this safest approach
package main
import (
"fmt"
"regexp"
)
func main() {
content := "<h1>Hello World!</h1>"
re := regexp.MustCompile(`<h1>(.*)</h1>`)
match := re.FindStringSubmatch(content)
if len(match) > 1 {
fmt.Println("match found -", match[1])
} else {
fmt.Println("match not found")
}
}
Playground - https://play.golang.org/p/Yc61x1cbZOJ

In the strings pkg you can use the Replacer to great effect.
r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))
Go play!

func findInString(str, start, end string) ([]byte, error) {
var match []byte
index := strings.Index(str, start)
if index == -1 {
return match, errors.New("Not found")
}
index += len(start)
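for {
// Stop with an error if end never appears after start.
if index >= len(str) {
return nil, errors.New("Not found")
}
// Check whether end begins at the current position.
if strings.HasPrefix(str[index:], end) {
break
}
match = append(match, str[index])
index++
}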
return match, nil
}

Read up on the strings package. Have a look into the SplitAfter function which can do something like this:
var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")
That should produce a slice something like: "[", "this][", "is my][", "string]". Using further functions for Trimming you should get your solution. Best of luck.
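One way the SplitAfter-plus-trimming idea could be finished is sketched below. The helper name bracketed is hypothetical, and the sketch assumes the brackets are not nested.

```go
package main

import (
	"fmt"
	"strings"
)

// bracketed is a hypothetical helper that returns the text inside every
// [...] group of s. It assumes the brackets are not nested.
func bracketed(s string) []string {
	var out []string
	for _, part := range strings.SplitAfter(s, "[") {
		// Parts look like "[", "this][", "is my][", "string]".
		// Remove the opening bracket of the next group, if present.
		part = strings.TrimSuffix(part, "[")
		// Only parts that end with "]" hold the content of a group.
		if !strings.HasSuffix(part, "]") {
			continue
		}
		out = append(out, strings.TrimSuffix(part, "]"))
	}
	return out
}

func main() {
	fmt.Printf("%q\n", bracketed("[this][is my][string]"))
}
```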

func Split(str, before, after string) string {
a := strings.SplitAfterN(str, before, 2)
b := strings.SplitAfterN(a[len(a)-1], after, 2)
if 1 == len(b) {
return b[0]
}
return b[0][0:len(b[0])-len(after)]
}
The first call to SplitAfterN splits the original string into a slice of 2 parts divided by the first occurrence of before, or it produces a slice containing 1 part equal to the original string.
The second call to SplitAfterN uses a[len(a)-1] as input, as it is "the last item of slice a": either the text following before, or the original string str. That input is split into a slice of 2 parts divided by the first occurrence of after, or it produces a slice containing 1 part equal to the input.
If after was not found, then we can simply return b[0], as it is equal to a[len(a)-1].
If after is found, it is included at the end of b[0], so you have to trim it via b[0][0:len(b[0])-len(after)].
all strings are case sensitive

Related

Replace a string at the beginning and end only

I want to replace first a and last a but not the a in the abcd.
Currently, it returns hello hellobcd hello but I would like to have hello abcd hello
The reason I use ReplaceAll is that I don't necessarily know the position or the number of occurrences of a. But if a is part of a combined string, I don't want to replace it; I just want to leave it as it is. What would be a solution in Go to solve this problem?
package main
import (
"fmt"
"strings"
)
func main() {
item := "hello"
test := "a abcd a"
s := "a"
item = strings.ReplaceAll(test, s, item)
fmt.Println(item)
}
Output:
hello hellobcd hello
Playground:
https://play.golang.org/p/D1VzvipblKu
You could use a regular expression:
package main
import (
"fmt"
"regexp"
)
func main() {
regexForA := regexp.MustCompile(`\ba\b`)
test := "a abcd a"
output := regexForA.ReplaceAllLiteralString(test,`hello`)
fmt.Println(output)
}
Output:
hello abcd hello
Playground link to run this example: https://play.golang.org/p/Cs99TrQDUtK
You can split the problem into two: first take the string from the start and do a plain replace with a limit of one. That will give you:
hello abcd a
Then you reverse the string and do the same thing. You can for sure do it a lot more optimized, but that will get the job done and is still quite readable:
func Reverse(s string) (result string) {
for _,v := range s {
result = string(v) + result
}
return
}
func main() {
item := "hello"
test := "a abcd a"
s := "a"
result := strings.Replace(test, s, item, 1)
result = Reverse(strings.Replace(Reverse(result), s, Reverse(item), 1))
fmt.Println(result)
}
prints
hello abcd hello
Note you need to reverse the first input string, your replacement and then the result to get it back into the right order.
playground link: https://play.golang.org/p/tuzxehhnEDu
If I were doing this, I'd just split the string (strings.Fields) and then replace based on string comparison:
fields := strings.Fields(test)
for i, field := range fields {
if field == "a" {
fields[i] = "hello"
}
}
fmt.Println(strings.Join(fields, " "))
I wrote this on my phone and never tested it, but you should get the idea :)

Go: how to check if a string contains multiple substrings?

strings.Contains(str_to_check, substr) takes only one argument as the substring to check, how do I check multiple substrings without using strings.Contains() repeatedly?
eg. strings.Contains(str_to_check, substr1, substr2)
Yes, you can do this without calling strings.Contains() multiple times.
If you know the substrings in advance, the easiest way to check this is with a regular expression. And if the string to check is long and you have quite a few substrings, it can be faster than calling strings.Contains multiple times.
Example https://play.golang.org/p/7PokxbOOo7:
package main
import (
"fmt"
"regexp"
)
var re = regexp.MustCompile(`first|second|third`)
func main() {
fmt.Println(re.MatchString("This is the first example"))
fmt.Println(re.MatchString("This is the second example after first"))
fmt.Println(re.MatchString("This is the third example"))
fmt.Println(re.MatchString("This is the forth example"))
}
Output:
true
true
true
false
If the substrings to check are dynamic, it may be a bit more difficult to create the regex, as you need to escape special characters, and regex compilation is not fast, so strings.Contains() may be better in this case. It's best to test if your code is performance critical.
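For the dynamic case, escaping can be delegated to regexp.QuoteMeta from the standard library. The following is only a sketch; the helper name anyOf is hypothetical, and it assumes at least one substring is passed (with none, the resulting empty pattern matches every string).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anyOf is a hypothetical helper that builds a regexp matching any of the
// given substrings literally. QuoteMeta escapes characters such as '.' and
// '+' so they lose their special meaning.
func anyOf(subs ...string) *regexp.Regexp {
	quoted := make([]string, len(subs))
	for i, s := range subs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

func main() {
	re := anyOf("a.b", "1+1")
	fmt.Println(re.MatchString("what is 1+1?")) // true: contains the literal "1+1"
	fmt.Println(re.MatchString("axb"))          // false: '.' is matched literally
}
```

Compiling once and reusing the returned *regexp.Regexp keeps the compilation cost out of the hot path.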
Another good option could be to write your own scanner that can leverage common prefixes in substrings (if any) using prefix tree.
You can write your own utility function using strings.Contains() that can work for multiple sub-strings.
Here's an example that returns Boolean (true/false) in case of complete / partial match and the total number of matches:
package main
import (
"fmt"
"strings"
)
func checkSubstrings(str string, subs ...string) (bool, int) {
matches := 0
isCompleteMatch := true
fmt.Printf("String: \"%s\", Substrings: %s\n", str, subs)
for _, sub := range subs {
if strings.Contains(str, sub) {
matches += 1
} else {
isCompleteMatch = false
}
}
return isCompleteMatch, matches
}
func main() {
isCompleteMatch1, matches1 := checkSubstrings("Hello abc, xyz, abc", "abc", "xyz")
fmt.Printf("Test 1: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch1, matches1)
fmt.Println()
isCompleteMatch2, matches2 := checkSubstrings("Hello abc, abc", "abc", "xyz")
fmt.Printf("Test 2: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch2, matches2)
}
Output:
String: "Hello abc, xyz, abc", Substrings: [abc xyz]
Test 1: { isCompleteMatch: true, Matches: 2 }
String: "Hello abc, abc", Substrings: [abc xyz]
Test 2: { isCompleteMatch: false, Matches: 1 }
Here's the live example: https://play.golang.org/p/Xka0KfBrRD
Another solution would be using a combination of regexp and suffixarray. From the documentation:
Package suffixarray implements substring search in logarithmic time using an in-memory suffix array.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
fmt.Println(contains("first secondthird", "first", "second", "third"))
fmt.Println(contains("first secondthird", "first", "10th"))
}
func contains(str string, subStrs ...string) bool {
if len(subStrs) == 0 {
return true
}
r := regexp.MustCompile(strings.Join(subStrs, "|"))
index := suffixarray.New([]byte(str))
res := index.FindAllIndex(r, -1)
exists := make(map[string]int)
for _, v := range subStrs {
exists[v] = 1
}
for _, pair := range res {
s := str[pair[0]:pair[1]]
exists[s] = exists[s] + 1
}
for _, v := range exists {
if v == 1 {
return false
}
}
return true
}
(In Go Playground)
[H]ow do I check multiple substrings without using strings.Contains() repeatedly?
Not at all. You have to call Contains repeatedly.

Counting the occurrence of one or more substrings in a string

I know that for counting the occurrences of one substring I can use strings.Count(s, substr). What if I want to count the number of occurrences of substring1 OR substring2? Is there a more elegant way than writing another line with strings.Count()?
Use a regular expression:
https://play.golang.org/p/xMsHIYKtkQ
aORb := regexp.MustCompile("A|B")
matches := aORb.FindAllStringIndex("A B C B A", -1)
fmt.Println(len(matches))
Another way to do substring matching is with the suffixarray package. Here is an example of matching multiple patterns:
package main
import (
"fmt"
"index/suffixarray"
"regexp"
)
func main() {
r := regexp.MustCompile("an")
index := suffixarray.New([]byte("banana"))
results := index.FindAllIndex(r, -1)
fmt.Println(len(results))
}
You can also match a single substring with the Lookup function.
If you want to count the number of matches in a large string, without allocating space for all the indices just to get the length and then throwing them away, you can use Regexp.FindStringIndex in a loop to match against successive substrings:
func countMatches(s string, re *regexp.Regexp) int {
total := 0
for start := 0; start < len(s); {
remaining := s[start:] // slicing the string is cheap
loc := re.FindStringIndex(remaining)
if loc == nil {
break
}
// loc[0] is the start index of the match,
// loc[1] is the end index (exclusive)
start += loc[1]
total++
}
return total
}
func main() {
s := "abracadabra"
fmt.Println(countMatches(s, regexp.MustCompile(`a|b`)))
}
runnable example at Go Playground

Case insensitive string search in golang

How do I search through a file for a word in a case insensitive manner?
For example
If I'm searching for UpdaTe in the file, if the file contains update, the search should pick it and count it as a match.
strings.EqualFold() can check if two strings are equal, while ignoring case. It even works with Unicode. See http://golang.org/pkg/strings/#EqualFold for more info.
http://play.golang.org/p/KDdIi8c3Ar
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.EqualFold("HELLO", "hello"))
fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}
Both return true.
Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.
Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
You can see it in action here.
Do not use strings.Contains unless you need exact matching rather than language-correct string searches
None of the current answers are correct unless you are only searching ASCII characters, which covers only the minority of languages (like English) that lack diaereses, umlauts, or other Unicode glyph modifiers (the more "correct" way to define it, as mentioned by @snap). The standard Google phrase is "searching non-ASCII characters".
For proper support for language searching you need to use http://golang.org/x/text/search.
func SearchForString(str string, substr string) (int, int) {
m := search.New(language.English, search.IgnoreCase)
return m.IndexString(str, substr)
}
start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
fmt.Println("found at", start, end)
}
Or if you just want the starting index:
func SearchForStringIndex(str string, substr string) (int, bool) {
m := search.New(language.English, search.IgnoreCase)
start, _ := m.IndexString(str, substr)
if start == -1 {
return 0, false
}
return start, true
}
index, found := SearchForStringIndex("foobar", "bar")
if found {
fmt.Println("match starts at", index)
}
Search the language.Tag structs here to find the language you wish to search with or use language.Und if you are not sure.
Update
There seems to be some confusion so this following example should help clarify things.
package main
import (
"fmt"
"strings"
"golang.org/x/text/language"
"golang.org/x/text/search"
)
var s = `Æ`
var s2 = `Ä`
func main() {
m := search.New(language.Finnish, search.IgnoreDiacritics)
fmt.Println(m.IndexString(s, s2))
fmt.Println(CaseInsensitiveContains(s, s2))
}
// CaseInsensitiveContains in string
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
If your file is large, you can use regexp and bufio:
// Create a regex: `(?i)update` matches any string containing "update", case-insensitively.
reg := regexp.MustCompile("(?i)update")
f, err := os.Open("test.txt")
if err != nil {
log.Fatal(err)
}
defer f.Close()
// Do the match operation.
// MatchReader scans the file until it finds a match.
// Using bufio here avoids loading the entire file into memory.
println(reg.MatchReader(bufio.NewReader(f)))
About bufio
The bufio package implements a buffered reader that may be useful both
for its efficiency with many small reads and because of the additional
reading methods it provides.

Position in characters of a substring in Go

How can I know the position of a substring in a string, in characters (or runes) instead of bytes?
strings.Index(s, sub) will give the position in bytes. When using Unicode, it doesn't match the position in runes: http://play.golang.org/p/DnlFjPaD2j
func main() {
s := "áéíóúÁÉÍÓÚ"
fmt.Println(strings.Index(s, "ÍÓ"))
}
Result: 14. Expected: 7
Of course, I could convert s and sub to []rune and look for the subslice manually, but is there a better way to do it?
Related to this, to get the first n characters of a string I'm doing this: string([]rune(s)[:n]). Is it the best way?
You can do it like this, after importing the unicode/utf8 package:
func main() {
s := "áéíóúÁÉÍÓÚ"
i := strings.Index(s, "ÍÓ")
fmt.Println(utf8.RuneCountInString(s[:i]))
}
http://play.golang.org/p/Etszu3rbY3
Another option:
package main
import "strings"
func runeIndex(s, substr string) int {
n := strings.Index(s, substr)
if n == -1 { return -1 }
r := []rune(s[:n])
return len(r)
}
func main() {
n := runeIndex("áéíóúÁÉÍÓÚ", "ÍÓ")
println(n == 7)
}
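For the related question about taking the first n characters, one possible alternative to string([]rune(s)[:n]) is to range over the string, which yields the byte offset of each rune, and slice at the offset of rune n. This avoids allocating a []rune for the whole string. The helper name firstN is hypothetical.

```go
package main

import "fmt"

// firstN is a hypothetical helper that returns the first n runes of s
// without converting the whole string to []rune.
func firstN(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte offset of each rune.
	for i := range s {
		if count == n {
			return s[:i] // i is the byte offset where rune number n starts
		}
		count++
	}
	// s has n runes or fewer.
	return s
}

func main() {
	fmt.Println(firstN("áéíóúÁÉÍÓÚ", 3)) // prints: áéí
}
```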

Resources