Golang Determine if String contains a String (with wildcards) - string

With Go, how would you determine if a string contains a certain string that includes wildcards? Example:
We're looking for t*e*s*t (each * can stand for any characters, of any length).
Input True: ttttteeeeeeeesttttttt
Input False: tset

Use the regexp package by converting the * in your pattern to the .* of regular expressions.
// wildCardToRegexp converts a wildcard pattern to a regular expression pattern.
func wildCardToRegexp(pattern string) string {
var result strings.Builder
for i, literal := range strings.Split(pattern, "*") {
// Replace * with .*
if i > 0 {
result.WriteString(".*")
}
// Quote any regular expression meta characters in the
// literal text.
result.WriteString(regexp.QuoteMeta(literal))
}
return result.String()
}
Use it like this:
func match(pattern string, value string) bool {
result, _ := regexp.MatchString(wildCardToRegexp(pattern), value)
return result
}
Run it on the Go PlayGround.
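As a rough, self-contained sketch of the approach above, the following program checks the two inputs from the question. The helper compileWildcard is a hypothetical addition (not part of the answer) that compiles the pattern once, which may be preferable when the same pattern is tested against many values and which surfaces the error instead of discarding it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildCardToRegexp converts a wildcard pattern to a regular expression,
// quoting everything except '*'.
func wildCardToRegexp(pattern string) string {
	var result strings.Builder
	for i, literal := range strings.Split(pattern, "*") {
		if i > 0 {
			result.WriteString(".*")
		}
		result.WriteString(regexp.QuoteMeta(literal))
	}
	return result.String()
}

// compileWildcard is a hypothetical helper: it compiles the pattern once so
// the resulting *regexp.Regexp can be reused for many inputs.
func compileWildcard(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(wildCardToRegexp(pattern))
}

func main() {
	re, err := compileWildcard("t*e*s*t")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(re.MatchString("ttttteeeeeeeesttttttt")) // true
	fmt.Println(re.MatchString("tset"))                  // false
}
```

Because the expression is not anchored, this behaves as a "contains" test: the value may have extra text before and after the matched part.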

Good piece of code. I would offer one minor change. It seems to me that if you're using wildcards, then the absence of wildcards should mean an exact match. To accomplish this, I anchor the expression and use an early return:
func wildCardToRegexp(pattern string) string {
components := strings.Split(pattern, "*")
if len(components) == 1 {
// if len is 1, there are no *'s, return exact match pattern
return "^" + regexp.QuoteMeta(pattern) + "$"
}
var result strings.Builder
for i, literal := range components {
// Replace * with .*
if i > 0 {
result.WriteString(".*")
}
// Quote any regular expression meta characters in the
// literal text.
result.WriteString(regexp.QuoteMeta(literal))
}
return "^" + result.String() + "$"
}
Run it on the Go Playground

Related

How to correctly process a string with escapes in Go?

I am creating a program, which is processing and calculating sizes of open-source repositories and libraries, and saving the data to database for further analysis.
I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1
Then I parse that into a format /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 which is a valid path in my filesystem, where I've downloaded that particular Go Library.
After that, I am passing that path to the gocloc program (https://github.com/hhatto/gocloc)
And parse the result.
But the issue is, when I am saving that string /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 into a variable, Go actually adds another escape to the string I am saving so it's actually /home/username/dev/glass/tmp/pkg/mod/github.com/\\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 in memory. (fmt.Println - for example removes that)
Problem is, when I am passing that string as an argument to os/exec, which runs gocloc and that path string, it runs command with two escapes - and that's not a valid path.
Is there any way to work around this? One idea I have is to just create a shell script for what I want to do.
This is the function which parses github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 to the format github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1. After that is saved into a variable, the variable has one more escape than it should have.
func parseUrlToVendorDownloadFormat(input string) string {
// Split the input string on the first space character
parts := strings.SplitN(input, " ", 2)
if len(parts) != 2 {
return ""
}
// Split the package name on the '/' character
packageNameParts := strings.Split(parts[0], "/")
// Add the '\!' prefix and lowercase each part of the package name
for i, part := range packageNameParts {
if hasUppercase(part) {
packageNameParts[i] = "\\!" + strings.ToLower(part)
}
}
// Join the modified package name parts with '/' characters
packageName := strings.Join(packageNameParts, "/")
return strings.ReplaceAll(packageName+"#"+parts[1], `\\!`, `\!`)
}
After, string is parsed to a format: /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1
that is passed to this function:
// Alternative goCloc - command.
func linesOfCode(dir string) (int, error) {
// Run the `gocloc` command in the specified directory and get the output
cmd := exec.Command("gocloc", dir)
output, err := cmd.Output()
if err != nil {
return 0, err
}
lines, err := parseTotalLines(string(output))
if err != nil {
return 0, err
}
return lines, nil
}
Which uses this parse function:
// Parse from the GoCloc response.
func parseTotalLines(input string) (int, error) {
// Split the input string into lines
lines := strings.Split(input, "\n")
// Find the line containing the "TOTAL" row
var totalLine string
for _, line := range lines {
if strings.Contains(line, "TOTAL") {
totalLine = line
break
}
}
// If the "TOTAL" line was not found, return an error
if totalLine == "" {
return 0, fmt.Errorf("could not find TOTAL line in input")
}
// Split the "TOTAL" line into fields
fields := strings.Fields(totalLine)
// If the "TOTAL" line doesn't have enough fields, return an error
if len(fields) < 4 {
return 0, fmt.Errorf("invalid TOTAL line: not enough fields")
}
// Get the fourth field (the code column)
codeStr := fields[3]
// Remove any commas from the code column
codeStr = strings.Replace(codeStr, ",", "", -1)
// Parse the code column as an integer
code, err := strconv.Atoi(codeStr)
if err != nil {
return 0, err
}
return code, nil
}
What I've tried:
Use gocloc as a library, didn't get it to work.
Use single quotes instead of escapes, didn't get it to work, but I think there might be something.
One way to get around this, might be to create separate shell script and pass the dir to that as an argument, and get rid of the escapes there, I don't know ...
If you want to look at the full source code: https://github.com/haapjari/glass. More specifically, see the function enrichWithLibraryData() in https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/plugin.go and the utility functions in https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/utils.go (the examples above).
Any ideas? How to proceed? Thanks in advance!
I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Your parser seems to have an error. I would expect Azure to become !azure, with no backslash:
github.com/!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Go Modules Reference
To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
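A minimal sketch of the case-encoding rule quoted above, assuming only ASCII letters need handling. The function name escapeModulePath is hypothetical; the golang.org/x/mod/module package offers EscapePath, which also validates the path and is probably the better choice in real code. Note too that exec.Command passes each argument to the program directly, without going through a shell, so no backslash is needed in front of the exclamation mark.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeModulePath is a hypothetical helper that applies the case-encoding
// rule: every uppercase ASCII letter becomes '!' plus its lowercase form.
func escapeModulePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escapeModulePath("github.com/Azure/go-ansiterm"))
	// github.com/!azure/go-ansiterm
	fmt.Println(escapeModulePath("example.com/M"))
	// example.com/!m
}
```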

How to replace string in Golang?

I want to replace string except first and last alphabet.
For example:
handsome -> h******e
한국어 -> 한*어
This is my code:
var final = string([]rune(username)[:1])
for i := 0; i < len([]rune(username)); i++ {
if i >1 {
final = final + "*"
}
}
If you convert the string to []rune, you can modify that slice and convert it back to string in the end:
func blur(s string) string {
rs := []rune(s)
for i := 1; i < len(rs)-1; i++ {
rs[i] = '*'
}
return string(rs)
}
Testing it:
fmt.Println(blur("handsome"))
fmt.Println(blur("한국어"))
Output (try it on the Go Playground):
h******e
한*어
Note that this blur() function also works with strings that have fewer than 3 characters, in which case nothing will be blurred.

How to match by regexp 3 and 4 bytes UTF-8

I just want to find a 3-byte character in Go using regexp.
But it panics with
regexp: Compile(\x{E29AA4}): error parsing regexp: invalid escape sequence: \x{E29AA4
func get_words_from(text string) []string {
words := regexp.MustCompile(`\x{E29AA4}`)
return words.FindAllString(text, -1)
}
func main() {
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(get_words_from(text))
}
You can try on playground
Decode the UTF-8 byte sequence E2 9A A4 with e.g. utf8.DecodeRune() and use the resulting rune in the regexp:
func get_words_from(text string) []string {
r, _ := utf8.DecodeRune([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(string(r))
return words.FindAllString(text, -1)
}
You may also simply convert the byte slice to string (which interprets it as UTF-8 encoded bytes):
func get_words_from2(text string) []string {
s := string([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(s)
return words.FindAllString(text, -1)
}
Or use the equivalent unicode code point (which is 0x26a4) in the regexp string:
func get_words_from3(text string) []string {
words := regexp.MustCompile("\u26a4")
return words.FindAllString(text, -1)
}
Note that "\u26a4" is an interpreted string literal and will be unescaped by the Go compiler (not the regexp package).
All examples return (try the examples on the Go Playground):
[⚤ ⚤]
To filter out all runes that have 3 or more bytes in UTF-8, you may use a for range and utf8.RuneLen():
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(text)
var out []rune
for _, r := range text {
if utf8.RuneLen(r) < 3 {
out = append(out, r)
}
}
fmt.Println(string(out))
This outputs (try it on the Go Playground):
One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./ авt𒀅hr𓀋ee!
One,ВАПОЛтлдоtwo ыаплды ыапю.ы./ авthree!
Or use strings.Map(), where you return -1 for such runes, which then will be left out in the result:
out := strings.Map(func(r rune) rune {
if utf8.RuneLen(r) < 3 {
return r
}
return -1
}, text)
fmt.Println(string(out))
This outputs the same. Try this one on the Go Playground.
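I also found that the character ⚤ can be matched with "\xE2\x9A\xA4" when the pattern is written as an interpreted (double-quoted) string literal, because the Go compiler turns those escapes into the UTF-8 bytes of ⚤ before the regexp package sees them. Inside the regexp syntax itself, \x takes a code point, so the regexp escape for ⚤ is \x{26A4}; \x{E29AA4} is rejected because it is above the largest code point, 0x10FFFF.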
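The difference between byte escapes handled by the compiler and code point escapes handled by the regexp package can be checked with a small sketch. The helpers matches and wideRunes are hypothetical names introduced for illustration; the character class assumes that runes needing 3 or 4 bytes in UTF-8 are exactly those from U+0800 upward.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it reports whether pattern compiles and
// matches somewhere in s.
func matches(pattern, s string) bool {
	ok, err := regexp.MatchString(pattern, s)
	return err == nil && ok
}

// wideRunes is a hypothetical helper returning every rune of s that needs
// 3 or 4 bytes in UTF-8 (code points U+0800 and above).
func wideRunes(s string) []string {
	return regexp.MustCompile(`[\x{800}-\x{10FFFF}]`).FindAllString(s, -1)
}

func main() {
	text := "a⚤b"

	// Code point escape in a raw string, interpreted by the regexp package.
	fmt.Println(matches(`\x{26A4}`, text)) // true
	// Byte escapes in an interpreted string: the compiler produces the
	// UTF-8 encoding of ⚤ before regexp sees the pattern.
	fmt.Println(matches("\xE2\x9A\xA4", text)) // true
	// The same escapes in a raw string: regexp reads three separate code
	// points (U+00E2, U+009A, U+00A4), so ⚤ is not matched.
	fmt.Println(matches(`\xE2\x9A\xA4`, text)) // false
	// The escape from the question does not compile at all.
	fmt.Println(matches(`\x{E29AA4}`, text)) // false

	fmt.Println(wideRunes("é⚤x")) // [⚤]
}
```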

Swift remove ONLY trailing spaces from string

Many examples on SO trim both sides, leading and trailing. My request is only about the trailing side.
My input text is: " keep my left side "
Desired output: " keep my left side"
Of course this command will remove both ends:
let cleansed = messageText.trimmingCharacters(in: .whitespacesAndNewlines)
Which won't work for me.
How can I do it?
A quite simple solution is a regular expression: the pattern is one or more (+) whitespace characters (\s) at the end of the string ($).
let string = " keep my left side "
let cleansed = string.replacingOccurrences(of: "\\s+$",
with: "",
options: .regularExpression)
You can use the rangeOfCharacter function on the string with a character set. This extension then uses recursion if there are multiple spaces to trim. It is efficient as long as there are usually only a small number of trailing spaces.
extension String {
func trailingTrim(_ characterSet : CharacterSet) -> String {
if let range = rangeOfCharacter(from: characterSet, options: [.anchored, .backwards]) {
return self.substring(to: range.lowerBound).trailingTrim(characterSet)
}
return self
}
}
"1234 ".trailingTrim(.whitespaces)
returns
"1234"
Building on vadian's answer I found for Swift 3 at the time of writing that I had to include a range parameter. So:
func trailingTrim(with string : String) -> String {
let start = string.startIndex
let end = string.endIndex
let range: Range<String.Index> = Range<String.Index>(start: start, end: end)
let cleansed:String = string.stringByReplacingOccurrencesOfString("\\s+$",
withString: "",
options: .RegularExpressionSearch,
range: range)
return cleansed
}
Simple. No regular expressions needed.
extension String {
func trimRight() -> String {
let c = reversed().drop(while: { $0.isWhitespace }).reversed()
return String(c)
}
}

Go: Retrieve a string from between two characters or other strings

Let's say for example that I have one string, like this:
<h1>Hello World!</h1>
What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!
If the string looks like whatever;START;extract;END;whatever you can use this which will get the string in between:
// GetStringInBetween returns the empty string if the start or end string is not found.
func GetStringInBetween(str string, start string, end string) (result string) {
s := strings.Index(str, start)
if s == -1 {
return
}
s += len(start)
e := strings.Index(str[s:], end)
if e == -1 {
return
}
e += s
return str[s:e]
}
What happens here: it finds the first index of START, adds the length of START, and returns everything from there up to the first index of END that follows.
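The same idea can be sketched with strings.Cut, which is available from Go 1.18. The helper name between is hypothetical; it reports through a second return value whether both markers were found, so an empty result is not ambiguous.

```go
package main

import (
	"fmt"
	"strings"
)

// between is a hypothetical helper: it returns the text between the first
// occurrence of start and the next occurrence of end after it.
func between(s, start, end string) (string, bool) {
	_, rest, ok := strings.Cut(s, start)
	if !ok {
		return "", false
	}
	mid, _, ok := strings.Cut(rest, end)
	if !ok {
		return "", false
	}
	return mid, true
}

func main() {
	fmt.Println(between("<h1>Hello World!</h1>", "<h1>", "</h1>"))
	// Hello World! true
	fmt.Println(between("whatever;START;extract;END;whatever", ";START;", ";END;"))
	// extract true
}
```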
There are lots of ways to split strings in all programming languages.
Since I don't know what you are especially asking for I provide a sample way to get the output
you want from your sample.
package main
import "strings"
import "fmt"
func main() {
initial := "<h1>Hello World!</h1>"
out := strings.TrimSuffix(strings.TrimPrefix(initial, "<h1>"), "</h1>")
fmt.Println(out)
}
In the above code you trim <h1> from the left of the string and </h1> from the right.
As I said there are hundreds of ways to split specific strings and this is only a sample to get you started.
Hope it helps, Good luck with Golang :)
DB
I improved Jan Kardaš's answer.
Now you can use start and end strings that are longer than one character, and a second return value reports whether a match was found.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string,found bool) {
s := strings.Index(str, startS)
if s == -1 {
return result,false
}
newS := str[s+len(startS):]
e := strings.Index(newS, endS)
if e == -1 {
return result,false
}
result = newS[:e]
return result,true
}
Here is my answer using regex. Not sure why no one suggested this safest approach
package main
import (
"fmt"
"regexp"
)
func main() {
content := "<h1>Hello World!</h1>"
re := regexp.MustCompile(`<h1>(.*)</h1>`)
match := re.FindStringSubmatch(content)
if len(match) > 1 {
fmt.Println("match found -", match[1])
} else {
fmt.Println("match not found")
}
}
Playground - https://play.golang.org/p/Yc61x1cbZOJ
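One thing to keep in mind with this pattern is that (.*) is greedy, so with several headings on one line it captures everything from the first <h1> to the last </h1>. A sketch of a non-greedy variant follows; the helper name headings is hypothetical. Since . does not match newlines by default, a heading that spans lines would additionally need the (?s) flag.

```go
package main

import (
	"fmt"
	"regexp"
)

// h1 uses a non-greedy group so each match stops at the nearest </h1>.
var h1 = regexp.MustCompile(`<h1>(.*?)</h1>`)

// headings is a hypothetical helper returning the contents of every
// <h1>...</h1> pair found in content.
func headings(content string) []string {
	var out []string
	for _, m := range h1.FindAllStringSubmatch(content, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	content := "<h1>A</h1><h1>B</h1>"

	greedy := regexp.MustCompile(`<h1>(.*)</h1>`)
	fmt.Println(greedy.FindStringSubmatch(content)[1]) // A</h1><h1>B

	fmt.Println(headings(content)) // [A B]
}
```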
In the strings package you can use the Replacer to great effect.
r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))
Go play!
func findInString(str, start, end string) ([]byte, error) {
var match []byte
index := strings.Index(str, start)
if index == -1 {
return match, errors.New("start not found")
}
index += len(start)
// Collect bytes until the rest of str begins with end.
for index < len(str) {
if strings.HasPrefix(str[index:], end) {
return match, nil
}
match = append(match, str[index])
index++
}
// Reached the end of str without seeing end.
return nil, errors.New("end not found")
}
Read up on the strings package. Have a look into the SplitAfter function which can do something like this:
var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")
That should produce a slice something like: "[", "this][", "is my][", "string]". Using further functions for Trimming you should get your solution. Best of luck.
func Split(str, before, after string) string {
a := strings.SplitAfterN(str, before, 2)
b := strings.SplitAfterN(a[len(a)-1], after, 2)
if 1 == len(b) {
return b[0]
}
return b[0][0:len(b[0])-len(after)]
}
The first call of SplitAfterN splits the original string into a slice of 2 parts at the first occurrence of the before string, or produces a slice containing 1 part equal to the original string.
The second call of SplitAfterN uses a[len(a)-1] as input, as it is "the last item of slice a": either the text that follows before, or the original string str. That input is split into a slice of 2 parts at the first occurrence of the after string, or it produces a slice containing 1 part equal to the input.
If after was not found, we can simply return b[0], as it is equal to a[len(a)-1].
If after is found, it is included at the end of b[0], so you have to trim it via b[0][0:len(b[0])-len(after)].
All string comparisons are case sensitive.
