Golang: print a string array in a unique way

I want a function func format(s []string) string such that for two string slices s1 and s2, if reflect.DeepEqual(s1, s2) == false, then format(s1) != format(s2).
If I simply use fmt.Sprint, the slices ["a", "b", "c"] and ["a b", "c"] are both printed as [a b c], which is undesirable; there is also the problem of string([]byte{'4', 0, '2'}) having the same visible representation as "42".

Use a format verb that shows the data structure, like %#v. In this case %q works well too because the primitive types are all strings.
fmt.Printf("%#v\n", []string{"a", "b", "c"})
fmt.Printf("%#v\n", []string{"a b", "c"})
// prints
// []string{"a", "b", "c"}
// []string{"a b", "c"}
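Here is a minimal sketch of the requested format function built on %#v. One caveat: reflect.DeepEqual treats a nil slice and an empty non-nil slice as different. %q prints both as [], while %#v prints []string(nil) and []string{}, so %#v seems the safer choice for this exact requirement. Because every element is quoted with escapes, embedded spaces and NUL bytes also stay visible.

```go
package main

import "fmt"

// format renders a []string in Go syntax (%#v). Each element is
// double-quoted with escapes, so element boundaries, spaces and control
// bytes such as NUL are visible, and nil differs from an empty slice.
func format(s []string) string {
	return fmt.Sprintf("%#v", s)
}

func main() {
	fmt.Println(format([]string{"a", "b", "c"})) // []string{"a", "b", "c"}
	fmt.Println(format([]string{"a b", "c"}))    // []string{"a b", "c"}
	fmt.Println(format([]string{"4\x002"}))      // []string{"4\x002"}
	fmt.Println(format([]string{"42"}))          // []string{"42"}
	fmt.Println(format(nil))                     // []string(nil)
	fmt.Println(format([]string{}))              // []string{}
}
```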

You may use:
func format(s1, s2 []string) string {
if reflect.DeepEqual(s1, s2) {
return "%v\n"
}
return "%q\n"
}
Like this working sample (The Go Playground):
package main
import (
"fmt"
"reflect"
)
func main() {
s1, s2 := []string{"a", "b", "c"}, []string{"a b", "c"}
frmat := format(s1, s2)
fmt.Printf(frmat, s1) // ["a" "b" "c"]
fmt.Printf(frmat, s2) // ["a b" "c"]
s2 = []string{"a", "b", "c"}
frmat = format(s1, s2)
fmt.Printf(frmat, s1) // [a b c]
fmt.Printf(frmat, s2) // [a b c]
}
func format(s1, s2 []string) string {
if reflect.DeepEqual(s1, s2) {
return "%v\n"
}
return "%q\n"
}
output:
["a" "b" "c"]
["a b" "c"]
[a b c]
[a b c]


Check that all words are contained in a sentence in Golang

How can I match all words in the sentence?
words: ["test", "test noti", "result alarm", "alarm test"]
sentence: "alarm result test"
I expected something like this
[o] test
[x] test noti
[o] result alarm
[o] alarm test
I tried splitting by words:
var words []string
words = append(words, "test", "test noti", "result alarm", "alarm test")
sentence := "alarm result test"
for i := 0; i < len(words); i++ {
log.Info(strings.Split(words[i], " "))
}
Take a look at Go's strings package.
It contains necessary functions to achieve your goal.
As an example:
package main
import (
"fmt"
"strings"
)
const s = "alarm result test"
var words = []string{"test", "test noti", "result alarm", "alarm test"}
func main() {
for _, w := range words {
var c bool
for _, substr := range strings.Split(w, " ") {
c = strings.Contains(s, substr)
if !c {
break
}
}
fmt.Printf("%t %s \n", c, w)
}
}
https://go.dev/play/p/PhGLePCwhho
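strings.Contains matches substrings, so with the approach above a word such as "test" would also count as present in a sentence containing "testing". If whole-word matching is wanted, one possible sketch (allWordsIn is a hypothetical helper name) splits both strings with strings.Fields and checks each word against a set built from the sentence:

```go
package main

import (
	"fmt"
	"strings"
)

// allWordsIn reports whether every whitespace-separated word of phrase
// occurs as a whole word in sentence (hypothetical helper).
func allWordsIn(sentence, phrase string) bool {
	// Build a set of the sentence's words.
	present := make(map[string]bool)
	for _, w := range strings.Fields(sentence) {
		present[w] = true
	}
	// Every word of the phrase must be in the set.
	for _, w := range strings.Fields(phrase) {
		if !present[w] {
			return false
		}
	}
	return true
}

func main() {
	const s = "alarm result test"
	words := []string{"test", "test noti", "result alarm", "alarm test"}
	for _, w := range words {
		fmt.Printf("%t %s\n", allWordsIn(s, w), w)
	}
}
```

For the question's input this prints true for "test", "result alarm" and "alarm test", and false for "test noti". If many phrases are checked against the same sentence, the set could be built once outside the function.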

String Permutations of Different Lengths

I have been trying to wrap my head around something and can't seem to find an answer. I know how to get all the permutations of a string as it is fairly easy. What I want to try and do is get all the permutations of the string in different sizes. For example:
Given "ABCD" and a lower limit of 3 chars I would want to get back ABC, ABD, ACB, ACD, ADB, ADC, ... , ABCD, ACBD, ADBC, .. etc.
I'm not quite sure how to accomplish that. I have it in my head that it is something that could be very complicated or very simple. Any help pointing me in a direction is appreciated. Thanks.
If you've already got the full-length permutations, you can drop stuff off of the front or back, and insert the result into a set.
XCTAssertEqual(
Permutations(["A", "B", "C"]).reduce( into: Set() ) { set, permutation in
permutation.indices.forEach {
set.insert( permutation.dropLast($0) )
}
},
[ ["A", "B", "C"],
["A", "C", "B"],
["B", "C", "A"],
["B", "A", "C"],
["C", "A", "B"],
["C", "B", "A"],
["B", "C"],
["C", "B"],
["C", "A"],
["A", "C"],
["A", "B"],
["B", "A"],
["A"],
["B"],
["C"]
]
)
public struct Permutations<Sequence: Swift.Sequence>: Swift.Sequence, IteratorProtocol {
public typealias Array = [Sequence.Element]
private let array: Array
private var iteration = 0
public init(_ sequence: Sequence) {
array = Array(sequence)
}
public mutating func next() -> Array? {
guard iteration < array.count.factorial!
else { return nil }
defer { iteration += 1 }
return array.indices.reduce(into: array) { permutation, index in
let shift =
iteration / (array.count - 1 - index).factorial!
% (array.count - index)
permutation.replaceSubrange(
index...,
with: permutation.dropFirst(index).shifted(by: shift)
)
}
}
}
public extension Collection where SubSequence: RangeReplaceableCollection {
func shifted(by shift: Int) -> SubSequence {
let drops =
shift > 0
? (shift, count - shift)
: (count + shift, -shift)
return dropFirst(drops.0) + dropLast(drops.1)
}
}
public extension BinaryInteger where Stride: SignedInteger {
/// - Note: `nil` for negative numbers
var factorial: Self? {
switch self {
case ..<0:
return nil
case 0...1:
return 1
default:
return (2...self).reduce(1, *)
}
}
}
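The Swift answer derives the shorter arrangements by trimming full-length permutations. As an alternative technique, here is a Go sketch that generates the arrangements of each length directly by backtracking, which makes the question's lower limit an explicit parameter. permutationsOfLengths is a hypothetical name, and the sketch assumes the characters of the input are distinct (repeated characters would yield repeated results).

```go
package main

import "fmt"

// permutationsOfLengths returns every ordered arrangement of k characters
// of s, for each k from minLen up to len([]rune(s)) (hypothetical helper).
func permutationsOfLengths(s string, minLen int) []string {
	chars := []rune(s)
	used := make([]bool, len(chars))
	buf := make([]rune, 0, len(chars))
	var out []string

	// build extends buf with one unused character at a time until it holds k runes.
	var build func(k int)
	build = func(k int) {
		if len(buf) == k {
			out = append(out, string(buf))
			return
		}
		for i, c := range chars {
			if used[i] {
				continue
			}
			used[i] = true
			buf = append(buf, c)
			build(k)
			buf = buf[:len(buf)-1] // backtrack
			used[i] = false
		}
	}

	for k := minLen; k <= len(chars); k++ {
		build(k)
	}
	return out
}

func main() {
	perms := permutationsOfLengths("ABCD", 3)
	fmt.Println(len(perms)) // 48 (24 of length 3 plus 24 of length 4)
	fmt.Println(perms[:6])  // [ABC ABD ACB ACD ADB ADC]
}
```

The number of results is the sum of n!/(n-k)! over the requested lengths, so it grows very quickly with the input length.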

How can I define a custom alphabet order for comparing and sorting strings in Go?

Please read to the bottom before marking this as a duplicate.
I would like to be able to sort an array of strings (or a slice of structs based on one string value) alphabetically, but based on a custom alphabet or Unicode letters.
Most of the time people advise using a collator that supports different pre-defined locales/alphabets (see this answer for Java), but what can be done for rare languages/alphabets that are not available in these locale bundles?
The language I would like to use is not available in the list of languages supported and usable by Go's collate package, so I need to be able to define a custom alphabet, or order of Unicode characters/runes, for sorting.
Others suggest translating the strings into an English/ASCII-sortable alphabet first, and then sorting that. That's what a similar question suggests in this solution done in JavaScript or this solution in Ruby. But surely there must be a more efficient way to do this with Go.
Is it possible to create a Collator in Go that uses a custom alphabet/character set? Is that what func NewFromTable is for?
It seems that I should be able to use the Reorder function, but it looks like this is not yet implemented in the package. The source code shows this:
func Reorder(s ...string) Option {
// TODO: need fractional weights to implement this.
panic("TODO: implement")
}
Note beforehand:
The following solution has been cleaned up and optimized, and published as a reusable library here: github.com/icza/abcsort.
Using abcsort, custom-sorting a string slice (using a custom alphabet) is as simple as:
sorter := abcsort.New("bac")
ss := []string{"abc", "bac", "cba", "CCC"}
sorter.Strings(ss)
fmt.Println(ss)
// Output: [CCC bac abc cba]
Custom-sorting a slice of structs by one of the struct fields looks like this:
type Person struct {
Name string
Age int
}
ps := []Person{{Name: "alice", Age: 21}, {Name: "bob", Age: 12}}
sorter.Slice(ps, func(i int) string { return ps[i].Name })
fmt.Println(ps)
// Output: [{bob 12} {alice 21}]
Original answer follows:
We can implement custom sorting that uses a custom alphabet. We just need to create the appropriate less(i, j int) bool function, and the sort package will do the rest.
Question is how to create such a less() function?
Let's start by defining the custom alphabet. A convenient way is to create a string that contains the letters of the custom alphabet, enumerated (ordered) from lowest to highest. For example:
const alphabet = "bca"
Let's create a map from this alphabet, which will tell the weight or order of each letter of our custom alphabet:
var weights = map[rune]int{}
func init() {
for i, r := range alphabet {
weights[r] = i
}
}
(Note: i in the above loop is the byte index, not the rune index, but since both are monotone increasing, both will do just fine for rune weight.)
Now we can create our less() function. To have "acceptable" performance, we should avoid converting the input string values to byte or rune slices. To do that, we can use the utf8.DecodeRuneInString() function, which decodes the first rune of a string.
So we do the comparison rune by rune. If both runes are letters of the custom alphabet, we may use their weights to tell how they compare to each other. If at least one of the runes is not from our custom alphabet, we fall back to simple numeric rune comparison.
If the 2 runes at the beginning of the 2 input strings are equal, we proceed to the next runes in each input string. We may do this by slicing the input strings: slicing does not make a copy, it just returns a new string header that points to the data of the original strings.
All right, now let's see the implementation of this less() function:
func less(s1, s2 string) bool {
for {
switch e1, e2 := len(s1) == 0, len(s2) == 0; {
case e1 && e2:
return false // Both empty, they are equal (not less)
case !e1 && e2:
return false // s1 not empty but s2 is: s1 is greater (not less)
case e1 && !e2:
return true // s1 empty but s2 is not: s1 is less
}
r1, size1 := utf8.DecodeRuneInString(s1)
r2, size2 := utf8.DecodeRuneInString(s2)
// Check if both are custom, in which case we use custom order:
custom := false
if w1, ok1 := weights[r1]; ok1 {
if w2, ok2 := weights[r2]; ok2 {
custom = true
if w1 != w2 {
return w1 < w2
}
}
}
if !custom {
// Fallback to numeric rune comparison:
if r1 != r2 {
return r1 < r2
}
}
s1, s2 = s1[size1:], s2[size2:]
}
}
Let's see some trivial tests of this less() function:
pairs := [][2]string{
{"b", "c"},
{"c", "a"},
{"b", "a"},
{"a", "b"},
{"bca", "bac"},
}
for _, pair := range pairs {
fmt.Printf("\"%s\" < \"%s\" ? %t\n", pair[0], pair[1], less(pair[0], pair[1]))
}
Output (try it on the Go Playground):
"b" < "c" ? true
"c" < "a" ? true
"b" < "a" ? true
"a" < "b" ? false
"bca" < "bac" ? true
And now let's test this less() function in an actual sorting:
ss := []string{
"abc",
"abca",
"abcb",
"abcc",
"bca",
"cba",
"bac",
}
sort.Slice(ss, func(i int, j int) bool {
return less(ss[i], ss[j])
})
fmt.Println(ss)
Output (try it on the Go Playground):
[bca bac cba abc abcb abcc abca]
Again, if performance is important to you, you should not use sort.Slice() as that has to use reflection under the hood, but rather create your own slice type that implements sort.Interface, and in your implementation you can tell how to do it without using reflection.
This is how it could look:
type CustStrSlice []string
func (c CustStrSlice) Len() int { return len(c) }
func (c CustStrSlice) Less(i, j int) bool { return less(c[i], c[j]) }
func (c CustStrSlice) Swap(i, j int) { c[i], c[j] = c[j], c[i] }
When you want to sort a string slice using the custom alphabet, simply convert your slice to CustStrSlice, so it can be passed directly to sort.Sort() (this type conversion does not make a copy of the slice or its elements, it just changes the type information):
ss := []string{
"abc",
"abca",
"abcb",
"abcc",
"bca",
"cba",
"bac",
}
sort.Sort(CustStrSlice(ss))
fmt.Println(ss)
Output of the above is again (try it on the Go Playground):
[bca bac cba abc abcb abcc abca]
Some things to note:
The default string comparison compares strings byte-wise. That is, if the input strings contain invalid UTF-8 sequences, the actual bytes will still be used.
Our solution is different in this regard, as we decode runes (we have to because we use a custom alphabet in which we allow runes that are not necessarily mapped to bytes 1-to-1 in UTF-8 encoding). This means if the input is not a valid UTF-8 sequence, the behavior might not be consistent with the default ordering. But if your inputs are valid UTF-8 sequences, this will do what you expect it to do.
One last note:
We've seen how a string slice can be custom-sorted. If we have a slice of structs (or a slice of pointers to structs), the sorting algorithm (the less() function) may be the same, but when comparing elements of the slice, we have to compare fields of the elements, not the struct elements themselves.
So let's say we have the following struct:
type Person struct {
Name string
Age int
}
func (p *Person) String() string { return fmt.Sprint(*p) }
(The String() method is added so we'll see the actual contents of the structs, not just their addresses...)
And let's say we want to apply our custom sorting on a slice of type []*Person, using the Name field of the Person elements. So we simply define this custom type:
type PersonSlice []*Person
func (p PersonSlice) Len() int { return len(p) }
func (p PersonSlice) Less(i, j int) bool { return less(p[i].Name, p[j].Name) }
func (p PersonSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
And that's all. The rest is the same, for example:
ps := []*Person{
{Name: "abc"},
{Name: "abca"},
{Name: "abcb"},
{Name: "abcc"},
{Name: "bca"},
{Name: "cba"},
{Name: "bac"},
}
sort.Sort(PersonSlice(ps))
fmt.Println(ps)
Output (try it on the Go Playground):
[{bca 0} {bac 0} {cba 0} {abc 0} {abcb 0} {abcc 0} {abca 0}]
Using table_test.go [1] as a starting point, I came up with the following. The real work is being done by Builder.Add [2]:
package main
import (
"golang.org/x/text/collate"
"golang.org/x/text/collate/build"
)
type entry struct {
r rune
w int
}
func newCollator(ents []entry) (*collate.Collator, error) {
b := build.NewBuilder()
for _, ent := range ents {
err := b.Add([]rune{ent.r}, [][]int{{ent.w}}, nil)
if err != nil { return nil, err }
}
t, err := b.Build()
if err != nil { return nil, err }
return collate.NewFromTable(t), nil
}
Result:
package main
import "fmt"
func main() {
a := []entry{
{'a', 3}, {'b', 2}, {'c', 1},
}
c, err := newCollator(a)
if err != nil {
panic(err)
}
x := []string{"alfa", "bravo", "charlie"}
c.SortStrings(x)
fmt.Println(x) // [charlie bravo alfa]
}
[1] https://github.com/golang/text/blob/3115f89c/collate/table_test.go
[2] https://pkg.go.dev/golang.org/x/text/collate/build#Builder.Add

Performance comparison: stringr str_replace_all

Just wondering if there will be performance differences in running str_replace_all.
For example:
text <- c("a","b", "c")
str_replace_all(text, c("a", "b", "c"), c("d", "e", "f"))
and
str_replace_all(text, "a", "d")
str_replace_all(text, "b", "e")
str_replace_all(text, "c", "f")
Both get me the same result but I was wondering what would be faster if I was doing the same procedure for close to 200,000 documents and if each document file was longer?
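A single str_replace_all call can be expected to perform better: with separate calls, str_replace_all runs three times and the result has to be reassigned to text after each replacement, which adds overhead.
Here is a test with 3 functions: f1 uses three separate calls with reassignment, f2 uses a single call with vectors of patterns and replacements, and f3 is a nested ("chained") version of f1. Note that f2 gives the same result here only because str_replace_all is vectorized over string, pattern and replacement, so pattern i is applied to element i of text; to apply every replacement to every element (for example, to every document), pass a named vector such as c("a" = "d", "b" = "e", "c" = "f") instead.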
> library(microbenchmark)
> text <- c("a", "b", "c")
> f1 <- function(text) { text=str_replace_all(text, "a", "d"); text = str_replace_all(text, "b", "e"); text=str_replace_all(text, "c", "f"); return(text) }
> f1(text)
[1] "d" "e" "f"
> f2 <- function(text) { return(str_replace_all(text, c("a", "b", "c"), c("d", "e", "f"))) }
> f2(text)
[1] "d" "e" "f"
> f3 <- function(text) { return(str_replace_all(str_replace_all(str_replace_all(text, "c", "f"), "b", "e"), "a", "d")) }
> f3(text)
[1] "d" "e" "f"
> test <- microbenchmark( f1(text), f2(text), f3(text), times = 50000 )
> test
Unit: microseconds
expr min lq mean median uq max neval
f1(text) 225.788 233.335 257.2998 239.673 262.313 25071.76 50000
f2(text) 182.321 187.755 207.1858 191.980 210.393 24844.76 50000
f3(text) 224.581 231.825 255.2167 237.863 259.898 24531.74 50000
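With times = 50000, each function was run 50,000 times. f2 has the lowest median, as well as the lowest lower-quartile (lq) and upper-quartile (uq) values, which indicates that a single str_replace_all call is the fastest here. The timing distributions can also be plotted with autoplot(test) (autoplot comes from the ggplot2 package).
Finally, if you only need to replace literal strings, it is best to use stri_replace_all_fixed from the stringi package (with vectors of patterns, replacements are paired element-wise by default; vectorize_all = FALSE applies every pattern to every string). Here is the same benchmark with it: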
> library(stringi)
> f1 <- function(text) { text=stri_replace_all_fixed(text, "a", "d"); text = stri_replace_all_fixed(text, "b", "e"); text=stri_replace_all_fixed(text, "c", "f"); return(text) }
> f2 <- function(text) { return(stri_replace_all_fixed(text, c("a", "b", "c"), c("d", "e", "f"))) }
> f3 <- function(text) { return(stri_replace_all_fixed(stri_replace_all_fixed(stri_replace_all_fixed(text, "c", "f"), "b", "e"), "a", "d")) }
> test <- microbenchmark( f1(text), f2(text), f3(text), times = 50000 )
> test
Unit: microseconds
expr min lq mean median uq max neval cld
f1(text) 7.547 7.849 9.197490 8.151 8.453 1008.800 50000 b
f2(text) 3.321 3.623 4.420453 3.925 3.925 2053.821 50000 a
f3(text) 7.245 7.547 9.802766 7.849 8.151 50816.654 50000 b

Golang Alphabetic representation of a number

Is there an easy way to convert a number to a letter?
For example,
3 => "C" and 23 => "W"?
For simplicity, range checks are omitted from the solutions below.
They all can be tried on the Go Playground.
Number -> rune
Simply add the number to the constant 'A' - 1: adding 1 gives 'A', adding 2 gives 'B', and so on:
func toChar(i int) rune {
return rune('A' - 1 + i)
}
Testing it:
for _, i := range []int{1, 2, 23, 26} {
fmt.Printf("%d %q\n", i, toChar(i))
}
Output:
1 'A'
2 'B'
23 'W'
26 'Z'
Number -> string
Or if you want it as a string:
func toCharStr(i int) string {
return string(rune('A' - 1 + i)) // rune(...) avoids the go vet warning about string(int) conversions
}
Output:
1 "A"
2 "B"
23 "W"
26 "Z"
This last one (converting a number to string) is documented in the Spec: Conversions to and from a string type:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
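That rule is why such a conversion differs from formatting a number in decimal. Since Go 1.15, go vet also warns about string(x) where x is an integer type other than rune or byte, which is why spelling out rune(n) is the clearer form. A small sketch, with hypothetical helper names asLetter and asDigits:

```go
package main

import (
	"fmt"
	"strconv"
)

// asLetter converts a code point to the string holding that character.
func asLetter(n int) string { return string(rune(n)) }

// asDigits formats n as decimal digits.
func asDigits(n int) string { return strconv.Itoa(n) }

func main() {
	fmt.Println(asLetter(65), asDigits(65)) // A 65
}
```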
Number -> string (cached)
If you need to do this a lot of times, it is profitable to store the strings in an array for example, and just return the string from that:
var arr = [...]string{"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"}
func toCharStrArr(i int) string {
return arr[i-1]
}
Note: a slice (instead of the array) would also be fine.
Note #2: you may improve this if you add a dummy first character so you don't have to subtract 1 from i:
var arr = [...]string{".", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"}
func toCharStrArr(i int) string { return arr[i] }
Number -> string (slicing a string constant)
Another interesting solution:
const abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
func toCharStrConst(i int) string {
return abc[i-1 : i]
}
Slicing a string is efficient: the new string will share the backing array (it can be done because strings are immutable).
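The solutions above deliberately omit the range check. One way to add it, sketched here as a hypothetical toCharStrChecked based on toCharStrConst, is to reject numbers outside 1..26 with an error instead of panicking on an out-of-range slice index:

```go
package main

import "fmt"

const abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// toCharStrChecked maps 1..26 to "A".."Z" and returns an error for any
// other number (hypothetical range-checked variant of toCharStrConst).
func toCharStrChecked(i int) (string, error) {
	if i < 1 || i > len(abc) {
		return "", fmt.Errorf("number %d out of range [1, %d]", i, len(abc))
	}
	return abc[i-1 : i], nil
}

func main() {
	for _, i := range []int{1, 23, 27} {
		s, err := toCharStrChecked(i)
		fmt.Printf("%d %q %v\n", i, s, err)
	}
}
```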
If you need a string rather than a rune, and more than one letter (e.g. for Excel column names):
package main
import (
"fmt"
)
func IntToLetters(number int32) (letters string) {
number--
if firstLetter := number / 26; firstLetter > 0 {
letters += IntToLetters(firstLetter)
letters += string('A' + number%26)
} else {
letters += string('A' + number)
}
return
}
func main() {
fmt.Println(IntToLetters(1))// print A
fmt.Println(IntToLetters(26))// print Z
fmt.Println(IntToLetters(27))// print AA
fmt.Println(IntToLetters(1999))// print BXW
}
Try it here: https://play.golang.org/p/GAWebM_QCKi
I also made a package with this: https://github.com/arturwwl/gointtoletters
The simplest solution would be:
func stringValueOf(i int) string {
var foo = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
return string(foo[i-1])
}
Hope this will help you to solve your problem. Happy Coding!!
