How to correctly process a string with escapes in Go?


I am creating a program, which is processing and calculating sizes of open-source repositories and libraries, and saving the data to database for further analysis.
I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1
Then I parse that into a format /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 which is a valid path in my filesystem, where I've downloaded that particular Go Library.
After that, I am passing that path to the gocloc -program (https://github.com/hhatto/gocloc)
And parse the result.
But the issue is that when I save the string /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 into a variable, Go seems to add another escape, so in memory it is actually /home/username/dev/glass/tmp/pkg/mod/github.com/\\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1. (fmt.Println, for example, removes it.)
The problem is that when I pass that string as an argument to os/exec, which runs gocloc with that path, the command runs with two escapes, and that is not a valid path.
Is there any way to work around this? One idea is to just create a shell script for what I want to do.
This is the function that parses github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 to the format github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1. After the result is saved into a variable, the variable has one more escape than it should have.
func parseUrlToVendorDownloadFormat(input string) string {
// Split the input string on the first space character
parts := strings.SplitN(input, " ", 2)
if len(parts) != 2 {
return ""
}
// Split the package name on the '/' character
packageNameParts := strings.Split(parts[0], "/")
// Add the '\!' prefix and lowercase each part of the package name
for i, part := range packageNameParts {
if hasUppercase(part) {
packageNameParts[i] = "\\!" + strings.ToLower(part)
}
}
// Join the modified package name parts with '/' characters
packageName := strings.Join(packageNameParts, "/")
return strings.ReplaceAll(packageName+"#"+parts[1], `\\!`, `\!`)
}
After that, the string is parsed to the format /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1,
which is passed to this function:
// Alternative goCloc - command.
func linesOfCode(dir string) (int, error) {
// Run the `gocloc` command in the specified directory and get the output
cmd := exec.Command("gocloc", dir)
output, err := cmd.Output()
if err != nil {
return 0, err
}
lines, err := parseTotalLines(string(output))
if err != nil {
return 0, err
}
return lines, nil
}
Which uses this parse function:
// Parse from the GoCloc response.
func parseTotalLines(input string) (int, error) {
// Split the input string into lines
lines := strings.Split(input, "\n")
// Find the line containing the "TOTAL" row
var totalLine string
for _, line := range lines {
if strings.Contains(line, "TOTAL") {
totalLine = line
break
}
}
// If the "TOTAL" line was not found, return an error
if totalLine == "" {
return 0, fmt.Errorf("could not find TOTAL line in input")
}
// Split the "TOTAL" line into fields
fields := strings.Fields(totalLine)
// If the "TOTAL" line doesn't have enough fields, return an error
if len(fields) < 4 {
return 0, fmt.Errorf("invalid TOTAL line: not enough fields")
}
// Get the fourth field (the code column)
codeStr := fields[3]
// Remove any commas from the code column
codeStr = strings.Replace(codeStr, ",", "", -1)
// Parse the code column as an integer
code, err := strconv.Atoi(codeStr)
if err != nil {
return 0, err
}
return code, nil
}
What I've tried:
Use gocloc as a library, didn't get it to work.
Use single quotes instead of escapes, didn't get it to work, but I think there might be something.
One way to get around this, might be to create separate shell script and pass the dir to that as an argument, and get rid of the escapes there, I don't know ...
If you want to see all the source code: https://github.com/haapjari/glass. More specifically, the relevant files are https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/plugin.go (the function enrichWithLibraryData()) and the utility functions in https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/utils.go (the examples above).
Any ideas? How to proceed? Thanks in advance!

I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Your parser seems to have an error. I would expect Azure to become !azure:
github.com/!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Go Modules Reference
To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
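
As a rough illustration of the rule quoted above, the encoding can be sketched with the standard library alone. The helper name escapePath is hypothetical and the sketch does no validation of the path; the golang.org/x/mod/module package provides EscapePath for real use. Note that exec.Command does not go through a shell, so the path needs no backslash before the exclamation mark.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath is a hypothetical helper that case-encodes a module path:
// every ASCII uppercase letter becomes '!' followed by its lowercase form.
// It performs no validation of the input.
func escapePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escapePath("github.com/Azure/go-ansiterm"))
	// github.com/!azure/go-ansiterm
}
```

The resulting string can be joined onto the module cache directory and passed directly as an argument to exec.Command.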

Related

Go - Is it possible to convert a raw string literal to an interpreted string literal?

Is it possible to convert raw string literals to interpreted string literals in Go? (See language specification)
I have a raw string literal, but I want to print out to the console what one would get with an interpreted string literal—that is, text output formatted using escape sequences.
For example, printing this raw string literal gives
s := `\033[1mString in bold.\033[0m`
println(s) // \033[1mString in bold.\033[0m
but I want the same result one would get with
s := "\033[1mString in bold.\033[0m"
println(s) // String in bold. (In bold)
For context, I am trying to print the contents of a text file that is formatted with escape sequences using
f, _ := ioutil.ReadFile("file.txt")
println(string(f))
but the output is in the former way.

Use strconv.Unquote():
s := `\033[1mString in bold.\033[0m`
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
panic(err)
}
fmt.Println("normal", s2)
This will output:
normal String in bold.
Note that the string value passed to strconv.Unquote() must contain wrapping double quotes or backticks, and since the source s does not contain the wrapping quotes, I pre- and suffixed those like this:
`"` + s + `"`
See related questions:
How do I make raw unicode encoded content readable?
Golang convert integer to unicode character
How to transform Go string literal code to its value?
How to convert escape characters in HTML tags?
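
For file contents, which usually span several lines, the same idea can be applied line by line, because strconv.Unquote rejects a raw newline inside a double-quoted literal. The following is only a sketch: the helper name interpretEscapes is made up for illustration, and it assumes the text contains no unescaped double quotes (those make Unquote return an error).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// interpretEscapes treats every line of text as the body of a
// double-quoted Go string literal and expands its escape sequences.
// Hypothetical helper: a line containing an unescaped double quote
// is reported as an error rather than repaired.
func interpretEscapes(text string) (string, error) {
	lines := strings.Split(text, "\n")
	for i, line := range lines {
		out, err := strconv.Unquote(`"` + line + `"`)
		if err != nil {
			return "", fmt.Errorf("line %d: %w", i+1, err)
		}
		lines[i] = out
	}
	return strings.Join(lines, "\n"), nil
}

func main() {
	out, err := interpretEscapes(`\033[1mString in bold.\033[0m`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```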

First, if you do not have a raw string, you need to write the value out as it is, without changing things like "\n" and other escapes. In case you want to write it as bytes:
s := "\033[1mString in bold.\033[0m"
rune := string([]rune(s))
b := []byte(rune)
f, err := os.OpenFile("filename.txt", os.O_RDWR, 0644)
if err != nil {
return err
}
if _, err := f.Write(b); err != nil {
return err
}
With this approach the bytes do not change, so a checksum such as SHA-256 stays the same, and you get them back unchanged when you read the file again.
Second, for reading the data and printing:
bytes, err := os.ReadFile("filename.txt")
if err != nil {
return err
}
s = strconv.Quote(string(bytes))
fmt.Println(s)
and you will get "\x1b[1mString in bold.\x1b[0m"

Decode base64 with white space

I have a base64 encoded string I'm trying to decode with Go. The string contains white spaces which should be ignored.
A sample code I'm trying:
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
out, err := base64.URLEncoding.DecodeString(s)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(out))
This code returns:
illegal base64 data at input byte 93
After changing the string padding, and using StdEncoding instead of URLEncoding:
s = strings.Replace(s, "%3D", "=", -1)
out, err := base64.StdEncoding.DecodeString(s)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(out))
The output will be:
{"threeDSServerTransID":"13fe71d4-d10d-4220-a216-2006d1dd4cb8","acsTrc���������������������������������������������������������������������������nsID":"d7c45f99-9478-44a6-b1f2-100000003366","messageType":"CReq","messageVersion":"2.1.0","challengeWindowSize":"02"}
How can I decode the string correctly?

What you have is most likely "cut off" from a URL, and it is in URL-encoded form. So to get a Base64 string, you have to first decode it, you may use url.PathUnescape() for this.
Once you have the unescaped string, you may decode it using the base64.StdEncoding encoder. Note that just because it is / was part of a URL, that doesn't make it a base64 string that used the alphabet of the URL-safe version of Base64.
Also the + signs in the middle of it are really just "junk". They shouldn't be there in the first place, so double-check how you get your input, but now that they are there, you have to remove them. For that, you may use strings.Replace().
Final code to decode your invalid input:
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
s = strings.Replace(s, "+", "", -1)
var err error
if s, err = url.PathUnescape(s); err != nil {
panic(err)
}
out, err := base64.StdEncoding.DecodeString(s)
if err != nil {
panic(err)
}
fmt.Println(string(out))
Complete output (try it on the Go Playground):
{"threeDSServerTransID":"13fe71d4-d10d-4220-a216-2006d1dd4cb8",
"acsTransID":"d7c45f99-9478-44a6-b1f2-100000003366","messageType":"CReq",
"messageVersion":"2.1.0","challengeWindowSize":"02"}
Note that the + sign is a valid symbol in the alphabet of the standard Base64, and you can even decode the Base64 without removing the + symbols, but then you get junk data remaining in the JSON keys in the result.
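
The order of operations (URL-unescape first, Base64-decode second) can be tried on a small, well-formed input. This is a minimal sketch with a made-up helper name, decodeFromURL. It uses url.PathUnescape rather than url.QueryUnescape because the latter turns every + into a space, which would corrupt standard Base64 data.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

// decodeFromURL is a hypothetical helper: it undoes the URL escaping
// first and only then decodes the result as standard Base64.
func decodeFromURL(s string) (string, error) {
	// "%3D" becomes "="; PathUnescape leaves "+" untouched.
	unescaped, err := url.PathUnescape(s)
	if err != nil {
		return "", err
	}
	out, err := base64.StdEncoding.DecodeString(unescaped)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := decodeFromURL("aGVsbG8%3D") // Base64 of "hello", URL-escaped
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```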

The input string has three problems
First the + signs in the middle of it
Second there is garbage (a URL-encoded =, written as %3D) at the end
Third the string appears to not be valid Base64
To remove the plus signs in the middle, find the index of the start and finish and make a new string
To remove the garbage at the end, terminate the string earlier ( at index 249 of the fixed string)
There is a further problem with the string at index 148 of the fixed string, which I would guess is due to bad data
But the code fragment below shows how to overcome the first two things
package main
import (
"fmt"
"encoding/base64"
"strings"
)
func main() {
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
a := strings.Index(s, "+")
b := strings.LastIndex(s, "+") + 1
fixed := s[0:a] + s[b:249]
out, err := base64.StdEncoding.DecodeString(fixed)
if err != nil {
fmt.Println(err)
fmt.Println(fixed)
}
fmt.Println(a, b)
fmt.Println(string(out))
}

Go: Retrieve a string from between two characters or other strings

Let's say for example that I have one string, like this:
<h1>Hello World!</h1>
What Go code would be able to extract Hello World! from that string? I'm still relatively new to Go. Any help is greatly appreciated!

If the string looks like whatever;START;extract;END;whatever you can use this which will get the string in between:
// GetStringInBetween Returns empty string if no start string found
func GetStringInBetween(str string, start string, end string) (result string) {
s := strings.Index(str, start)
if s == -1 {
return
}
s += len(start)
e := strings.Index(str[s:], end)
if e == -1 {
return
}
// e is relative to str[s:], so add s to get the absolute end index
e += s
return str[s:e]
}
What happens here is it will find first index of START, adds length of START string and returns all that exists from there until first index of END.
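
The same "find START, then find the first END after it" logic can also be sketched with strings.Cut, assuming Go 1.18 or newer. The function name between is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// between returns the text between the first occurrence of start and
// the first occurrence of end that follows it. The boolean is false
// when either marker is missing. Hypothetical helper, requires Go 1.18+.
func between(s, start, end string) (string, bool) {
	_, rest, found := strings.Cut(s, start)
	if !found {
		return "", false
	}
	mid, _, found := strings.Cut(rest, end)
	if !found {
		return "", false
	}
	return mid, true
}

func main() {
	fmt.Println(between("<h1>Hello World!</h1>", "<h1>", "</h1>"))
	fmt.Println(between("whatever;START;extract;END;whatever", "START;", ";END"))
}
```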

There are lots of ways to split strings in all programming languages.
Since I don't know what you are especially asking for I provide a sample way to get the output
you want from your sample.
package main
import "strings"
import "fmt"
func main() {
initial := "<h1>Hello World!</h1>"
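// TrimPrefix/TrimSuffix remove an exact string; TrimLeft/TrimRight would treat the argument as a set of characters
out := strings.TrimSuffix(strings.TrimPrefix(initial, "<h1>"), "</h1>")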
fmt.Println(out)
}
In the above code you trim <h1> from the left of the string and </h1> from the right.
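
One detail is worth illustrating when trimming tags: strings.TrimLeft and strings.TrimRight take a set of characters (a cutset), while strings.TrimPrefix and strings.TrimSuffix remove one exact string. The sketch below shows the difference on content that happens to start with a character from the tag; the helper name stripTag is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTag is a hypothetical helper that removes one exact opening and
// closing tag, such as <h1> and </h1>, from around the content.
func stripTag(s, tag string) string {
	s = strings.TrimPrefix(s, "<"+tag+">")
	return strings.TrimSuffix(s, "</"+tag+">")
}

func main() {
	s := "<h1>hello</h1>"
	// The cutset {<, h, 1, >} also swallows the leading "h" of "hello".
	fmt.Println(strings.TrimLeft(s, "<h1>")) // ello</h1>
	// The prefix is removed exactly once and nothing more.
	fmt.Println(strings.TrimPrefix(s, "<h1>")) // hello</h1>
	fmt.Println(stripTag(s, "h1"))             // hello
}
```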
As I said there are hundreds of ways to split specific strings and this is only a sample to get you started.
Hope it helps, Good luck with Golang :)
DB

I improved Jan Kardaš's answer.
Now you can find a string between start and end markers that are longer than one character.
func GetStringInBetweenTwoString(str string, startS string, endS string) (result string,found bool) {
s := strings.Index(str, startS)
if s == -1 {
return result,false
}
newS := str[s+len(startS):]
e := strings.Index(newS, endS)
if e == -1 {
return result,false
}
result = newS[:e]
return result,true
}

Here is my answer using regex. I'm not sure why no one has suggested this approach, which seems the safest to me.
package main
import (
"fmt"
"regexp"
)
func main() {
content := "<h1>Hello World!</h1>"
re := regexp.MustCompile(`<h1>(.*)</h1>`)
match := re.FindStringSubmatch(content)
if len(match) > 1 {
fmt.Println("match found -", match[1])
} else {
fmt.Println("match not found")
}
}
Playground - https://play.golang.org/p/Yc61x1cbZOJ

In the strings package you can use the Replacer to great effect.
r := strings.NewReplacer("<h1>", "", "</h1>", "")
fmt.Println(r.Replace("<h1>Hello World!</h1>"))
Go play!

func findInString(str, start, end string) ([]byte, error) {
var match []byte
index := strings.Index(str, start)
if index == -1 {
return match, errors.New("Not found")
}
index += len(start)
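// Walk the string until the end marker starts at the current position
for index < len(str) {
if strings.HasPrefix(str[index:], end) {
return match, nil
}
match = append(match, str[index])
index++
}
// The end marker never appeared after the start marker
return nil, errors.New("Not found")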
}

Read up on the strings package. Have a look into the SplitAfter function which can do something like this:
var sample = "[this][is my][string]"
t := strings.SplitAfter(sample, "[")
That should produce a slice something like: "[", "this][", "is my][", "string]". Using further functions for Trimming you should get your solution. Best of luck.

func Split(str, before, after string) string {
a := strings.SplitAfterN(str, before, 2)
b := strings.SplitAfterN(a[len(a)-1], after, 2)
if 1 == len(b) {
return b[0]
}
return b[0][0:len(b[0])-len(after)]
}
The first call of SplitAfterN splits the original string into an array of 2 parts divided by the first occurrence of the before string, or it produces an array containing 1 part equal to the original string if before is not found.
The second call of SplitAfterN uses a[len(a)-1] as input, as it is "the last item of array a": either the text that follows before, or the original string str. That input is split into an array of 2 parts divided by the first occurrence of the after string, or into an array containing 1 part equal to the input.
If after was not found, we can simply return b[0], as it is equal to a[len(a)-1].
If after is found, it is included at the end of b[0], so you have to trim it via b[0][0:len(b[0])-len(after)].
All string comparisons are case sensitive.

Case insensitive string search in golang

How do I search through a file for a word in a case insensitive manner?
For example
If I'm searching for UpdaTe in the file, if the file contains update, the search should pick it and count it as a match.

strings.EqualFold() can check if two strings are equal, while ignoring case. It even works with Unicode. See http://golang.org/pkg/strings/#EqualFold for more info.
http://play.golang.org/p/KDdIi8c3Ar
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.EqualFold("HELLO", "hello"))
fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}
Both return true.
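
Since EqualFold compares whole strings rather than searching inside one, a word count over text needs the text split into words first. The following is a minimal sketch of that idea. The function name countWord and the rule that a word is a run of letters and digits are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countWord counts the words in text that equal word, ignoring case.
// Hypothetical helper: a "word" here is any run of letters and digits.
func countWord(text, word string) int {
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	n := 0
	for _, w := range words {
		if strings.EqualFold(w, word) {
			n++
		}
	}
	return n
}

func main() {
	text := "Update now. Please update; UPDATED later, uPdAtE!"
	fmt.Println(countWord(text, "UpdaTe")) // 3
}
```

"UPDATED" is not counted because only whole words are compared.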

Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.
Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}
You can see it in action here.

Do not use strings.Contains unless you need exact matching rather than language-correct string searches
None of the current answers are correct unless you are only searching ASCII characters, which covers only a minority of languages (like English) that lack diaereses, umlauts, or other Unicode glyph modifiers (the more "correct" way to define it, as mentioned by @snap). The standard Google phrase is "searching non-ASCII characters".
For proper support for language searching you need to use http://golang.org/x/text/search.
func SearchForString(str string, substr string) (int, int) {
m := search.New(language.English, search.IgnoreCase)
return m.IndexString(str, substr)
}
start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
fmt.Println("found at", start, end)
}
Or if you just want the starting index:
func SearchForStringIndex(str string, substr string) (int, bool) {
m := search.New(language.English, search.IgnoreCase)
start, _ := m.IndexString(str, substr)
if start == -1 {
return 0, false
}
return start, true
}
index, found := SearchForStringIndex("foobar", "bar")
if found {
fmt.Println("match starts at", index)
}
Search the language.Tag structs here to find the language you wish to search with or use language.Und if you are not sure.
Update
There seems to be some confusion so this following example should help clarify things.
package main
import (
"fmt"
"strings"
"golang.org/x/text/language"
"golang.org/x/text/search"
)
var s = `Æ`
var s2 = `Ä`
func main() {
m := search.New(language.Finnish, search.IgnoreDiacritics)
fmt.Println(m.IndexString(s, s2))
fmt.Println(CaseInsensitiveContains(s, s2))
}
// CaseInsensitiveContains in string
func CaseInsensitiveContains(s, substr string) bool {
s, substr = strings.ToUpper(s), strings.ToUpper(substr)
return strings.Contains(s, substr)
}

If your file is large, you can use regexp and bufio:
// The regex `(?i)update` matches any string that contains "update", ignoring case
reg := regexp.MustCompile("(?i)update")
f, err := os.Open("test.txt")
if err != nil {
log.Fatal(err)
}
defer f.Close()
// Do the match operation.
// MatchReader scans the file until it finds a match.
// Using bufio here avoids loading the entire file into memory.
println(reg.MatchReader(bufio.NewReader(f)))
About bufio
The bufio package implements a buffered reader that may be useful both
for its efficiency with many small reads and because of the additional
reading methods it provides.

strings.Split in Go

The file names.txt consists of many names in the form of:
"KELLEE","JOSLYN","JASON","INGER","INDIRA","GLINDA","GLENNIS"
Does anyone know how to split the string so that it is individual names separated by commas?
KELLEE,JOSLYN,JASON,INGER,INDIRA,GLINDA,GLENNIS
The following code splits by comma and leaves quotes around the names. What is the escape character to split out the "? Can it be done in one Split statement, splitting out "," and leaving a comma to separate?
package main
import "fmt"
import "io/ioutil"
import "strings"
func main() {
fData, err := ioutil.ReadFile("names.txt") // read in the external file
if err != nil {
fmt.Println("Err is ", err) // print any error
}
strbuffer := string(fData) // convert read in file to a string
arr := strings.Split(strbuffer, ",")
fmt.Println(arr)
}
By the way, this is part of Project Euler problem # 22. http://projecteuler.net/problem=22

Jeremy's answer is basically correct and does exactly what you have asked for. But the format of your "names.txt" file is actually well known and is called CSV (comma separated values). Luckily, Go comes with an encoding/csv package (which is part of the standard library) for decoding and encoding such formats easily. Compared with your and Jeremy's solution, this package also gives exact error messages if the format is invalid, supports multi-line records, and does proper unquoting of quoted strings.
The basic usage looks like this:
package main
import (
"encoding/csv"
"fmt"
"io"
"os"
)
func main() {
file, err := os.Open("names.txt")
if err != nil {
fmt.Println("Error:", err)
return
}
defer file.Close()
reader := csv.NewReader(file)
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
fmt.Println("Error:", err)
return
}
fmt.Println(record) // record has the type []string
}
}
There is also a ReadAll method that might make your program even shorter, assuming that the whole file fits into the memory.
Update: dystroy has just pointed out that your file has only one line anyway. The CSV reader works well for that too, but the following, less general solution should also be sufficient:
var name string
for {
if n, _ := fmt.Fscanf(file, "%q,", &name); n != 1 {
break
}
fmt.Println("name:", name)
}

Split doesn't remove characters from the substrings. Your split is fine; you just need to process the slice afterwards with strings.Trim(val, "\"").
for i, val := range arr {
arr[i] = strings.Trim(val, "\"")
}
Now arr will have the leading and trailing "s removed.
