Please read to the bottom before marking this as duplicate
I would like to be able to sort an array of strings (or a slice of structs based on one string value) alphabetically, but based on a custom alphabet or unicode letters.
Most times people advise using a collator that supports different pre-defined locales/alphabets. (See this answer for Java), but what can be done for rare languages/alphabets that are not available in these locale bundles?
The language I would like to use is not available in the list of languages supported and usable by Golangs's collate, so I need to be able to define a custom alphabet, or order of Unicode characters/runes for sorting.
Others suggest translate the strings into an english/ASCII sortable alphabet first, and then sort that. That's what's been suggested by a similar question in this solution done in Javascript or this solution in Ruby. But surely there must be a more efficient way to do this with Go.
Is it possible to create a Collator in Go that uses a custom alphabet/character set? Is that what func NewFromTable is for?
It seems that I should be able to use the Reorder function but it looks like this is not yet implemented in the language? The source code shows this:
func Reorder(s ...string) Option {
// TODO: need fractional weights to implement this.
panic("TODO: implement")
}
How can I define a custom alphabet order for comparing and sorting strings in go?
Note beforehand:
The following solution has been cleaned up and optimized, and published as a reusable library here: github.com/icza/abcsort.
Using abcsort, custom-sorting a string slice (using a custom alphabet) is as simple as:
sorter := abcsort.New("bac")
ss := []string{"abc", "bac", "cba", "CCC"}
sorter.Strings(ss)
fmt.Println(ss)
// Output: [CCC bac abc cba]
Custom-sorting a slice of structs by one of the struct field is like:
type Person struct {
Name string
Age int
}
ps := []Person{{Name: "alice", Age: 21}, {Name: "bob", Age: 12}}
sorter.Slice(ps, func(i int) string { return ps[i].Name })
fmt.Println(ps)
// Output: [{bob 12} {alice 21}]
Original answer follows:
We can implement custom sorting that uses a custom alphabet. We just need to create the appropriate less(i, j int) bool function, and the sort package will do the rest.
Question is how to create such a less() function?
Let's start by defining the custom alphabet. Convenient way is to create a string that contains the letters of the custom alphabet, enumerated (ordered) from smallest to highest. For example:
const alphabet = "bca"
Let's create a map from this alphabet, which will tell the weight or order of each letter of our custom alphabet:
var weights = map[rune]int{}
func init() {
for i, r := range alphabet {
weights[r] = i
}
}
(Note: i in the above loop is the byte index, not the rune index, but since both are monotone increasing, both will do just fine for rune weight.)
Now we can create our less() function. To have "acceptable" performance, we should avoid converting the input string values to byte or rune slices. To do that, we can call aid from the utf8.DecodeRuneInString() function which decodes the first rune of a string.
So we do the comparison rune-by-rune. If both runes are letters of the custom alphabet, we may use their weights to tell how they compare to each other. If at least one of the runes are not from our custom alphabet, we will fallback to simple numeric rune comparisons.
If 2 runes at the beginning of the 2 input strings are equal, we proceed to the next runes in each input string. We may do this my slicing the input strings: slicing them does not make a copy, it just returns a new string header that points to the data of the original strings.
All right, now let's see the implementation of this less() function:
func less(s1, s2 string) bool {
for {
switch e1, e2 := len(s1) == 0, len(s2) == 0; {
case e1 && e2:
return false // Both empty, they are equal (not less)
case !e1 && e2:
return false // s1 not empty but s2 is: s1 is greater (not less)
case e1 && !e2:
return true // s1 empty but s2 is not: s1 is less
}
r1, size1 := utf8.DecodeRuneInString(s1)
r2, size2 := utf8.DecodeRuneInString(s2)
// Check if both are custom, in which case we use custom order:
custom := false
if w1, ok1 := weights[r1]; ok1 {
if w2, ok2 := weights[r2]; ok2 {
custom = true
if w1 != w2 {
return w1 < w2
}
}
}
if !custom {
// Fallback to numeric rune comparison:
if r1 != r2 {
return r1 < r2
}
}
s1, s2 = s1[size1:], s2[size2:]
}
}
Let's see some trivial tests of this less() function:
pairs := [][2]string{
{"b", "c"},
{"c", "a"},
{"b", "a"},
{"a", "b"},
{"bca", "bac"},
}
for _, pair := range pairs {
fmt.Printf("\"%s\" < \"%s\" ? %t\n", pair[0], pair[1], less(pair[0], pair[1]))
}
Output (try it on the Go Playground):
"b" < "c" ? true
"c" < "a" ? true
"b" < "a" ? true
"a" < "b" ? false
"bca" < "bac" ? true
And now let's test this less() function in an actual sorting:
ss := []string{
"abc",
"abca",
"abcb",
"abcc",
"bca",
"cba",
"bac",
}
sort.Slice(ss, func(i int, j int) bool {
return less(ss[i], ss[j])
})
fmt.Println(ss)
Output (try it on the Go Playground):
[bca bac cba abc abcb abcc abca]
Again, if performance is important to you, you should not use sort.Slice() as that has to use reflection under the hood, but rather create your own slice type that implements sort.Interface, and in your implementation you can tell how to do it without using reflection.
This is how it could look like:
type CustStrSlice []string
func (c CustStrSlice) Len() int { return len(c) }
func (c CustStrSlice) Less(i, j int) bool { return less(c[i], c[j]) }
func (c CustStrSlice) Swap(i, j int) { c[i], c[j] = c[j], c[i] }
When you want to sort a string slice using the custom alphabet, simply convert your slice to CustStrSlice, so it can be passed directly to sort.Sort() (this type conversion does not make a copy of the slice or its elements, it just changes the type information):
ss := []string{
"abc",
"abca",
"abcb",
"abcc",
"bca",
"cba",
"bac",
}
sort.Sort(CustStrSlice(ss))
fmt.Println(ss)
Output of the above is again (try it on the Go Playground):
[bca bac cba abc abcb abcc abca]
Some things to note:
The default string comparison compares strings byte-wise. That is, if the input strings contain invalid UTF-8 sequences, the actual bytes will still be used.
Our solution is different in this regard, as we decode runes (we have to because we use a custom alphabet in which we allow runes that are not necessarily mapped to bytes 1-to-1 in UTF-8 encoding). This means if the input is not a valid UTF-8 sequence, the behavior might not be consistent with the default ordering. But if your inputs are valid UTF-8 sequences, this will do what you expect it to do.
One last note:
We've seen how a string slice could be custom-sorted. If we have a slice of structs (or a slice of pointers of structs), the sorting algorithm (the less() function) may be the same, but when comparing elements of the slice, we have to compare fields of the elements, not the struct elements themselves.
So let's say we have the following struct:
type Person struct {
Name string
Age int
}
func (p *Person) String() string { return fmt.Sprint(*p) }
(The String() method is added so we'll see the actual contents of the structs, not just their addresses...)
And let's say we want to apply our custom sorting on a slice of type []*Person, using the Name field of the Person elements. So we simply define this custom type:
type PersonSlice []*Person
func (p PersonSlice) Len() int { return len(p) }
func (p PersonSlice) Less(i, j int) bool { return less(p[i].Name, p[j].Name) }
func (p PersonSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
And that's all. The rest is the same, for example:
ps := []*Person{
{Name: "abc"},
{Name: "abca"},
{Name: "abcb"},
{Name: "abcc"},
{Name: "bca"},
{Name: "cba"},
{Name: "bac"},
}
sort.Sort(PersonSlice(ps))
fmt.Println(ps)
Output (try it on the Go Playground):
[{bca 0} {bac 0} {cba 0} {abc 0} {abcb 0} {abcc 0} {abca 0}]
Using table_test.go [1] as a starting point, I came up with the following. The
real work is being done by Builder.Add [2]:
package main
import (
"golang.org/x/text/collate"
"golang.org/x/text/collate/build"
)
type entry struct {
r rune
w int
}
func newCollator(ents []entry) (*collate.Collator, error) {
b := build.NewBuilder()
for _, ent := range ents {
err := b.Add([]rune{ent.r}, [][]int{{ent.w}}, nil)
if err != nil { return nil, err }
}
t, err := b.Build()
if err != nil { return nil, err }
return collate.NewFromTable(t), nil
}
Result:
package main
import "fmt"
func main() {
a := []entry{
{'a', 3}, {'b', 2}, {'c', 1},
}
c, err := newCollator(a)
if err != nil {
panic(err)
}
x := []string{"alfa", "bravo", "charlie"}
c.SortStrings(x)
fmt.Println(x) // [charlie bravo alfa]
}
https://github.com/golang/text/blob/3115f89c/collate/table_test.go
https://pkg.go.dev/golang.org/x/text/collate/build#Builder.Add
I am having a problem figuring out how to reference elements of a sub structure.
See: http://play.golang.org/p/pamS_ZY01s
Given something like following.... How do you reference data in the room struct? I have tried fmt.Println(*n.Homes[0].Rooms[0].Size), but that does not work.
Begin Code example
package main
import (
"fmt"
)
type Neighborhood struct {
Name string
Homes *[]Home
}
type Home struct {
Color string
Rooms *[]Room
}
type Room struct {
Size string
}
func main() {
var n Neighborhood
var h1 Home
var r1 Room
n.Name = "Mountain Village"
h1.Color = "Blue"
r1.Size = "200 sq feet"
// Initiaize Array of Homes
homeslice := make([]Home, 0)
n.Homes = &homeslice
roomslice := make([]Room, 0)
h1.Rooms = &roomslice
*h1.Rooms = append(*h1.Rooms, r1)
*n.Homes = append(*n.Homes, h1)
fmt.Println(n)
fmt.Println(*n.Homes)
}
First, *[]Home is really wasteful. A slice is a three worded struct under the hood, one of them being itself a pointer to an array. You are introducing a double indirection there. This article on data structures in Go is very useful.
Now, because of this indirection, you need to put the dereference operator * in every pointer-to-slice expression. Like this:
fmt.Println((*(*n.Homes)[0].Rooms)[0].Size)
But, really, just take out the pointers.
Why does map have different behavior on Go?
All types in Go are copied by value: string, intxx, uintxx, floatxx, struct, [...]array, []slice except for map[key]value
package main
import "fmt"
type test1 map[string]int
func (t test1) DoSomething() { // doesn't need to use pointer
t["yay"] = 1
}
type test2 []int
func (t* test2) DoSomething() { // must use pointer so changes would effect
*t = append(*t,1)
}
type test3 struct{
a string
b int
}
func (t* test3) DoSomething() { // must use pointer so changes would effect
t.a = "aaa"
t.b = 123
}
func main() {
t1 := test1{}
u1 := t1
u1.DoSomething()
fmt.Println("u1",u1)
fmt.Println("t1",t1)
t2 := test2{}
u2 := t2
u2.DoSomething()
fmt.Println("u2",u2)
fmt.Println("t2",t2)
t3 := test3{}
u3 := t3
u3.DoSomething()
fmt.Println("u3",u3)
fmt.Println("t3",t3)
}
And passing variable as function's parameter/argument is equal to assignment with :=
package main
import "fmt"
type test1 map[string]int
func DoSomething1(t test1) { // doesn't need to use pointer
t["yay"] = 1
}
type test2 []int
func DoSomething2(t *test2) { // must use pointer so changes would effect
*t = append(*t,1)
}
type test3 struct{
a string
b int
}
func DoSomething3(t *test3) { // must use pointer so changes would effect
t.a = "aaa"
t.b = 123
}
func main() {
t1 := test1{}
DoSomething1(t1)
fmt.Println("t1",t1)
t2 := test2{}
DoSomething2(&t2)
fmt.Println("t2",t2)
t3 := test3{}
DoSomething3(&t3)
fmt.Println("t3",t3)
}
Map values are pointers. Some other types (slice, string, channel, function) are, similarly, implemented with pointers. Interestingly, the linked FAQ entry says,
Early on, maps and channels were syntactically pointers and it was impossible to declare or use a non-pointer instance. ... Eventually we decided that the strict separation of pointers and values made the language harder to use.
"Go passes by value" means variables passed as regular function args won't be modified by the called function. That doesn't change that some built-in types can contain pointers (just like your own structs can).
Python is similar: f(x) won't change an integer x passed to it, but it can append to a list x because Python lists are implemented with a pointer internally. C++, by contrast, has actual pass-by-reference available: f(x) can change the caller's int x if f is declared to have a reference parameter (void f(int& x)).
(I also wrote some general info on pointers vs. values in Go in an answer to another question, if that helps.)
I'm currently working with a bit of code at the moment, that involves a var with type []interface{}
It has values within it, that I could easily access like so:
//given that args is of type []interface{}
name := args[0]
age := args[1] //ect...
This is fine, but I'd like to be able to use the strings Join function, and it would typically error due to it requiring type []string and not type []interface{}.
What would be the most appropriate solution to be able to use the Join function, I'd guess maybe some sort on conversion?
You need to construct a new array of type []string in order to use strings.Join:
import "fmt"
import "strings"
func main() {
s1 := []interface{}{"a", "b", "c"}
s2 := make([]string, len(s1))
for i, s := range s1 {
s2[i] = s.(string)
}
fmt.Println(strings.Join(s2, ", "))
}
See the related Golang FAQ entry: can I convert a []T to an []interface{}?
I've a struct called item
type Item struct {
Limit int
Skip int
Fields string
}
item := Item {
Limit: 3,
Skip: 5,
Fields: "Valuie",
}
how could I get the field name, value and join it into a string.
something like:
item := Item {
Limit: 3,
Skip: 5,
Fields: "Valuie",
}
to a string something like
"Limit=3&Skip=5&Fields=Valuie"
And I've try reflections to get convert interface to field value map so far. Am I going the right way? Cause I think there might have some better solutions. And thanks!
m, _ = reflections.Items(data)
for k, v := range m {
fmt.Printf("%s : %s\n", k, v)
}
I've got
Limit : %!s(int=3)
Skip : %!s(int=5)
Fields : Valuie
You can use %v instead of %s. %s will assume a string, something that can be converted to a string (i.e. byte array) or an object with a String() method.
Using %v will check the type and display it correctly.
Example of the String() method call with %s with your example: http://play.golang.org/p/bxE91IaVKj
There's no need to use reflection for this. Just write out the code for the types you have.
func (i *Item) URLValues() string {
uv := url.Values()
uv.Set("Limit", strconv.Itoa(i.Limit))
uv.Set("Skip", strconv.Itoa(i.Skip))
uv.Set("Fields", i.Fields)
return uv.Encode()
}
This code is simple, readable, and you don't need to think to write it. Unless you have a lot of types that you're going to be converting to values then I'd not even think about the magic reflection-based solution to this problem.
For any struct you can use reflection and url.Values from the net/url package:
i := Item{1, 2, "foo"}
v := url.Values{}
ri := reflect.ValueOf(i)
ri.FieldByNameFunc(func(name string) bool {
v.Set(name, fmt.Sprintf("%v", ri.FieldByName(name).Interface()))
return false
})
fmt.Println(v.Encode())
Example on play.
Of course, this code does not handle nested data structures or slices so you would need to extend
the code if you use other data structures to make it more general. However, this example should
get you started.
Take a look at go-querystring. It converts a struct into URL query parameters (your expected output).
type Item struct {
Limit int `limit:"limit"`
Skip int `url:"skip"`
Fields string `url:"fields"`
}
item := Item {
Limit: 3,
Skip: 5,
Fields: "Valuie",
}
v, _ := query.Values(item)
fmt.Print(v.Encode())
// will output: "limit=3&skip=5&fields=Valuie"