I want to remove the last 4 characters from a string, so "test.txt" becomes "test".
package main
import (
"fmt"
"strings"
)
func main() {
file := "test.txt"
fmt.Print(strings.TrimSuffix(file, "."))
}
This will safely remove any dot-extension - and will be tolerant if no extension is found:
func removeExtension(fpath string) string {
ext := filepath.Ext(fpath)
return strings.TrimSuffix(fpath, ext)
}
Table tests:
/www/main.js -> '/www/main'
/tmp/test.txt -> '/tmp/test'
/tmp/test2.text -> '/tmp/test2'
/tmp/test3.verylongext -> '/tmp/test3'
/user/bob.smith/has.many.dots.exe -> '/user/bob.smith/has.many.dots'
/tmp/zeroext. -> '/tmp/zeroext'
/tmp/noext -> '/tmp/noext'
-> ''
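The table above can be reproduced with a small self-contained program. This is only a minimal sketch: it assumes the removeExtension function from this answer, adds the imports it needs (path/filepath and strings), and uses an illustrative subset of the inputs.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// removeExtension strips the extension reported by filepath.Ext, if any.
func removeExtension(fpath string) string {
	ext := filepath.Ext(fpath)
	return strings.TrimSuffix(fpath, ext)
}

func main() {
	// Illustrative inputs taken from the table tests above.
	cases := []string{"/www/main.js", "/tmp/zeroext.", "/tmp/noext", ""}
	for _, c := range cases {
		fmt.Printf("%s -> '%s'\n", c, removeExtension(c))
	}
}
```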
Though there is already an accepted answer, I want to share some slice tricks for string manipulation.
Remove last n characters from a string
As the title says, to remove the last 4 characters from a string you can slice it, which is a very common use of slices, i.e.,
file := "test.txt"
fmt.Println(file[:len(file)-4]) // you can replace 4 with any n
Output:
test
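One caveat with the slice expression: file[:len(file)-4] panics when the string is shorter than 4 bytes, and it counts bytes rather than characters. A minimal sketch of a more defensive helper follows; dropLastRunes is a hypothetical name chosen for illustration, not a standard library function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// dropLastRunes removes up to n runes from the end of s.
// It stops early (returning "") if s has fewer than n runes.
func dropLastRunes(s string, n int) string {
	for ; n > 0 && len(s) > 0; n-- {
		// size is the byte width of the last rune (1 for invalid bytes).
		_, size := utf8.DecodeLastRuneInString(s)
		s = s[:len(s)-size]
	}
	return s
}

func main() {
	fmt.Println(dropLastRunes("test.txt", 4)) // test
	fmt.Println(dropLastRunes("ab", 4) == "") // true
}
```

For plain ASCII input of known length, the one-line slice above is simpler; the helper mainly matters when the input length is not under your control.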
Remove file extensions:
From your problem description, it looks like you are trying to trim the file extension suffix (i.e., .txt) from the string.
For this, I would prefer @colminator's answer from above, which is
file := "test.txt"
fmt.Println(strings.TrimSuffix(file, filepath.Ext(file)))
You can use this to remove everything after the last ".":
package main
import (
"fmt"
"strings"
)
func main() {
sampleInput := []string{
"/www/main.js",
"/tmp/test.txt",
"/tmp/test2.text",
"/tmp/test3.verylongext",
"/user/bob.smith/has.many.dots.exe",
"/tmp/zeroext.",
"/tmp/noext",
"",
"tldr",
}
for _, str := range sampleInput {
fmt.Println(removeExtn(str))
}
}
func removeExtn(input string) string {
if len(input) > 0 {
if i := strings.LastIndex(input, "."); i > 0 {
input = input[:i]
}
}
return input
}
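One difference worth knowing: strings.LastIndex looks at the whole string, while filepath.Ext only considers the final path element. The sketch below uses a condensed copy of removeExtn from this answer and a hypothetical input that has a dot in a directory name but no file extension; removeExtByExt is an illustrative name for the filepath.Ext variant.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// removeExtn cuts at the last "." anywhere in the string.
func removeExtn(input string) string {
	if i := strings.LastIndex(input, "."); i > 0 {
		input = input[:i]
	}
	return input
}

// removeExtByExt cuts only the extension of the final path element.
func removeExtByExt(input string) string {
	return strings.TrimSuffix(input, filepath.Ext(input))
}

func main() {
	p := "/user/bob.smith/noext"
	fmt.Println(removeExtn(p))     // /user/bob
	fmt.Println(removeExtByExt(p)) // /user/bob.smith/noext
}
```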
I am trying to split a string by the last occurrence of a separator (/) in Go.
For example, given the string "a/b/c/d", I would like an array of strings like this after performing the split:
[
"a/b/c",
"a/b",
"a"
]
I tried exploring the strings package but couldn't find any function that does this:
func main() {
fmt.Printf("%q\n", strings.Split("a/b/c/d/e", "/"))
}
May I know a way to handle this?
To split a string only at the last occurrence of a separator, use strings.LastIndex:
package main
import (
"fmt"
"strings"
)
func main() {
x := "a_ab_daqe_sd_ew"
lastInd := strings.LastIndex(x, "_")
fmt.Println(x[:lastInd]) // o/p: a_ab_daqe_sd
fmt.Println(x[lastInd+1:]) // o/p: ew
}
Note that strings.LastIndex returns -1 if the substring passed ("_" in the example above) is not found, in which case x[:lastInd] would panic.
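To handle the not-found case, the index can be checked before slicing. A minimal sketch, where splitLast is a hypothetical helper name (not part of the strings package):

```go
package main

import (
	"fmt"
	"strings"
)

// splitLast splits s around the last occurrence of sep.
// If sep is not present, it returns s, "" and false.
func splitLast(s, sep string) (before, after string, found bool) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s, "", false
	}
	// Skip len(sep) bytes so multi-byte separators also work.
	return s[:i], s[i+len(sep):], true
}

func main() {
	before, after, found := splitLast("a_ab_daqe_sd_ew", "_")
	fmt.Println(before, after, found) // a_ab_daqe_sd ew true

	before, after, found = splitLast("nosep", "_")
	fmt.Printf("%q %q %v\n", before, after, found) // "nosep" "" false
}
```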
Since this is for path operations, and it looks like you don't want the trailing path separator, then path.Dir does what you're looking for:
fmt.Println(path.Dir("a/b/c/d/e"))
// a/b/c/d
If this is specifically for filesystem paths, you will want to use the filepath package instead, to properly handle multiple path separators.
Here's a simple function that uses filepath.Dir(string) to build a list of all ancestor directories of a given filepath:
func main() {
fmt.Printf("OK: %#v\n", parentsOf("a/b/c/d"))
// OK: []string{"a/b/c", "a/b", "a"}
}
func parentsOf(s string) []string {
dirs := []string{}
for {
parent := filepath.Dir(s)
if parent == "." || parent == "/" {
break
}
dirs = append(dirs, parent)
s = parent
}
return dirs
}
strings.Contains(str_to_check, substr) takes only one substring to check. How do I check multiple substrings without calling strings.Contains() repeatedly?
e.g. strings.Contains(str_to_check, substr1, substr2)
Yes, you can do this without calling strings.Contains() multiple times.
If you know the substrings in advance, the easiest way to check this is with a regular expression. If the string to check is long and you have quite a few substrings, this can also be faster than calling strings.Contains multiple times.
Example https://play.golang.org/p/7PokxbOOo7:
package main
import (
"fmt"
"regexp"
)
var re = regexp.MustCompile(`first|second|third`)
func main() {
fmt.Println(re.MatchString("This is the first example"))
fmt.Println(re.MatchString("This is the second example after first"))
fmt.Println(re.MatchString("This is the third example"))
fmt.Println(re.MatchString("This is the forth example"))
}
Output:
true
true
true
false
If the substrings to check are dynamic, building the regex is a bit more involved because you need to escape special characters, and regex compilation is not fast, so strings.Contains() may be the better choice in this case. If your code is performance-critical, benchmark both.
Another good option could be to write your own scanner that uses a prefix tree to exploit common prefixes in the substrings (if any).
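For the dynamic case, the escaping step can be handled by regexp.QuoteMeta. The following is a rough sketch rather than a tuned implementation; containsAny is a hypothetical helper name, and it recompiles the pattern on every call, which is exactly the cost mentioned above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsAny reports whether s contains at least one of subs.
// Each substring is escaped so that it is matched literally.
func containsAny(s string, subs ...string) (bool, error) {
	if len(subs) == 0 {
		return false, nil
	}
	quoted := make([]string, len(subs))
	for i, sub := range subs {
		quoted[i] = regexp.QuoteMeta(sub)
	}
	re, err := regexp.Compile(strings.Join(quoted, "|"))
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	ok, err := containsAny("costs $5 (approx.)", "$5", "a+b")
	fmt.Println(ok, err) // true <nil>

	// "h.llo" is matched literally, so it does not match "hello".
	ok, err = containsAny("hello", "h.llo", "x+")
	fmt.Println(ok, err) // false <nil>
}
```

If the same set of substrings is checked against many strings, compiling the pattern once and reusing the *regexp.Regexp would avoid the repeated compilation.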
You can write your own utility function using strings.Contains() that can work for multiple sub-strings.
Here's an example that returns Boolean (true/false) in case of complete / partial match and the total number of matches:
package main
import (
"fmt"
"strings"
)
func checkSubstrings(str string, subs ...string) (bool, int) {
matches := 0
isCompleteMatch := true
fmt.Printf("String: \"%s\", Substrings: %s\n", str, subs)
for _, sub := range subs {
if strings.Contains(str, sub) {
matches += 1
} else {
isCompleteMatch = false
}
}
return isCompleteMatch, matches
}
func main() {
isCompleteMatch1, matches1 := checkSubstrings("Hello abc, xyz, abc", "abc", "xyz")
fmt.Printf("Test 1: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch1, matches1)
fmt.Println()
isCompleteMatch2, matches2 := checkSubstrings("Hello abc, abc", "abc", "xyz")
fmt.Printf("Test 2: { isCompleteMatch: %t, Matches: %d }\n", isCompleteMatch2, matches2)
}
Output:
String: "Hello abc, xyz, abc", Substrings: [abc xyz]
Test 1: { isCompleteMatch: true, Matches: 2 }
String: "Hello abc, abc", Substrings: [abc xyz]
Test 2: { isCompleteMatch: false, Matches: 1 }
Here's the live example: https://play.golang.org/p/Xka0KfBrRD
Another solution would be using a combination of regexp and suffixarray. From the documentation:
Package suffixarray implements substring search in logarithmic time using an in-memory suffix array.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
fmt.Println(contains("first secondthird", "first", "second", "third"))
fmt.Println(contains("first secondthird", "first", "10th"))
}
func contains(str string, subStrs ...string) bool {
if len(subStrs) == 0 {
return true
}
r := regexp.MustCompile(strings.Join(subStrs, "|"))
index := suffixarray.New([]byte(str))
res := index.FindAllIndex(r, -1)
exists := make(map[string]int)
for _, v := range subStrs {
exists[v] = 1
}
for _, pair := range res {
s := str[pair[0]:pair[1]]
exists[s] = exists[s] + 1
}
for _, v := range exists {
if v == 1 {
return false
}
}
return true
}
[H]ow do I check multiple substrings without using strings.Contains() repeatedly?
Not at all. You have to call Contains repeatedly.
I am trying to count "characters" in Go. That is, if a string contains one printable "glyph", or "composed character" (or what someone would ordinarily think of as a character), I want it to count 1. For example, the string "Hello, 世🖖🏿🖖界", should count 11, since there are 11 characters, and a human would look at this and say there are 11 glyphs.
utf8.RuneCountInString() works well in most cases, including ASCII, accents, Asian characters and even emojis. However, as I understand it, runes correspond to code points, not characters. When I try to use basic emojis it works, but when I use emojis that have different skin tones, I get the wrong count: https://play.golang.org/p/aFIGsB6MsO
From what I read here and here the following should work, but I still don't seem to be getting the right results (it over-counts):
func CountCharactersInString(str string) int {
var ia norm.Iter
ia.InitString(norm.NFC, str)
nc := 0
for !ia.Done() {
nc = nc + 1
ia.Next()
}
return nc
}
This doesn't work either:
func GraphemeCountInString(str string) int {
re := regexp.MustCompile("\\PM\\pM*|.")
return len(re.FindAllString(str, -1))
}
I am looking for something similar to this in Objective C:
+ (NSInteger)countCharactersInString:(NSString *) string {
// --- Calculate the number of characters entered by user and update character count label
NSInteger count = 0;
NSUInteger index = 0;
while (index < string.length) {
NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
count++;
index += range.length;
}
return count;
}
The straightforward way is to use utf8.RuneCountInString() from the standard library:
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
str := "Hello, 世🖖🖖界"
fmt.Println("counts =", utf8.RuneCountInString(str))
}
I wrote a package that allows you to do this: https://github.com/rivo/uniseg. It breaks strings according to the rules specified in Unicode Standard Annex #29 which is what you are looking for. Here is how you would use it in your case:
package main
import (
"fmt"
"github.com/rivo/uniseg"
)
func main() {
fmt.Println(uniseg.GraphemeClusterCount("Hello, 世🖖🏿🖖界"))
}
This will print 11 as you expect.
Have you tried strings.Count?
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.Count("Hello, 世🖖🖖界", "🖖")) // Returns 2
}
See the example in the API documentation:
https://golang.org/pkg/unicode/utf8/#example_DecodeLastRuneInString
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
str := "Hello, 世🖖界"
count := 0
for len(str) > 0 {
r, size := utf8.DecodeLastRuneInString(str)
count++
fmt.Printf("%c %v\n", r, size)
str = str[:len(str)-size]
}
fmt.Println("count:", count)
}
I think the easiest way to do this would be like this:
package main
import "fmt"
func main() {
str := "Hello, 世🖖🖖界"
var counter int
for range str {
counter++
}
fmt.Println(counter)
}
This one prints 11
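Note that ranging over a string, like utf8.RuneCountInString, counts runes (code points) rather than composed characters, so it has the limitation described in the question. A small sketch that shows this with an emoji plus skin tone modifier, written with escape sequences so the code points are explicit; countByRange is just an illustrative name for the loop above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countByRange counts runes by ranging over the string.
func countByRange(s string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	// U+1F596 (hand emoji) followed by U+1F3FF (skin tone modifier):
	// one visible glyph, but two code points.
	toned := "\U0001F596\U0001F3FF"
	fmt.Println(countByRange(toned), utf8.RuneCountInString(toned)) // 2 2
}
```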
If I have a multi line string like
this is a line

this is another line
what is the best way to remove the empty line? I could make it work by splitting, iterating, and doing a condition check, but is there a better way?
Similar to ΔλЛ's answer, it can be done with strings.Replace:
func Replace(s, old, new string, n int) string
Replace returns a copy of the string s with the first n non-overlapping instances of old replaced by new. If old is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string. If n < 0, there is no limit on the number of replacements.
package main
import (
"fmt"
"strings"
)
func main() {
var s = `line 1
line 2
line 3`
s = strings.Replace(s, "\n\n", "\n", -1)
fmt.Println(s)
}
https://play.golang.org/p/lu5UI74SLo
Assuming that you want to have the same string with empty lines removed as an output, I would use regular expressions:
package main
import (
"fmt"
"regexp"
)
func main() {
var s = `line 1
line 2
line 3`
regex, err := regexp.Compile("\n\n")
if err != nil {
return
}
s = regex.ReplaceAllString(s, "\n")
fmt.Println(s)
}
A more generic approach might look something like this:
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
s := `
####
####
####
####
`
fmt.Println(regexp.MustCompile(`[\t\r\n]+`).ReplaceAllString(strings.TrimSpace(s), "\n"))
}
https://play.golang.org/p/uWyHfUIDw-o
How can I know the position of a substring in a string, in characters (or runes) instead of bytes?
strings.Index(s, sub) will give the position in bytes. When using Unicode, it doesn't match the position in runes: http://play.golang.org/p/DnlFjPaD2j
func main() {
s := "áéíóúÁÉÍÓÚ"
fmt.Println(strings.Index(s, "ÍÓ"))
}
Result: 14. Expected: 7
Of course, I could convert s and sub to []rune and look for the subslice manually, but is there a better way to do it?
Related to this, to get the first n characters of a string I'm doing this: string([]rune(s)[:n]). Is it the best way?
You can do it like this, after importing the unicode/utf8 package:
func main() {
s := "áéíóúÁÉÍÓÚ"
i := strings.Index(s, "ÍÓ")
fmt.Println(utf8.RuneCountInString(s[:i]))
}
http://play.golang.org/p/Etszu3rbY3
Another option:
package main
import "strings"
func runeIndex(s, substr string) int {
n := strings.Index(s, substr)
if n == -1 {
return -1
}
r := []rune(s[:n])
return len(r)
}
func main() {
n := runeIndex("áéíóúÁÉÍÓÚ", "ÍÓ")
println(n == 7)
}
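On the related question of taking the first n characters: string([]rune(s)[:n]) works, but it converts the whole string and does not guard against n being larger than the number of runes. One possible alternative is to walk the string with range and slice at the byte offset of the n-th rune. This is only a sketch, and firstNRunes is a hypothetical helper name.

```go
package main

import "fmt"

// firstNRunes returns the first n runes of s, or all of s if it is shorter.
func firstNRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// In a range loop over a string, i is the byte offset of each rune.
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

func main() {
	fmt.Println(firstNRunes("áéíóúÁÉÍÓÚ", 7)) // áéíóúÁÉ
	fmt.Println(firstNRunes("abc", 10))        // abc
}
```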