I want to split a string on a regular expression, but preserve the matches.
I have tried splitting the string on a regex, but it throws away the matches. I have also tried using this, but I am not very good at translating code from language to language, let alone C#.
re := regexp.MustCompile(`\d`)
array := re.Split("ab1cd2ef3", -1)
I need the value of array to be ["ab", "1", "cd", "2", "ef", "3"], but the value of array is ["ab", "cd", "ef"]. No errors.
The kind of regex support in the link you pointed to is not available in Go's regexp package. You can read the related discussion.
What you want to achieve (as per the sample given) can be done using regex to match digits or non-digits.
package main
import (
"fmt"
"regexp"
)
func main() {
str := "ab1cd2ef3"
r := regexp.MustCompile(`(\d|[^\d]+)`)
fmt.Println(r.FindAllStringSubmatch(str, -1))
}
Playground: https://play.golang.org/p/L-ElvkDky53
Output:
[[ab ab] [1 1] [cd cd] [2 2] [ef ef] [3 3]]
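Because the pattern wraps everything in one capture group, FindAllStringSubmatch reports every piece twice. If a flat slice is what you are after, a minimal sketch is to drop the group and call FindAllString instead. The helper name splitKeep and the variable digitOrRun are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// digitOrRun matches either a single digit or a run of non-digits.
var digitOrRun = regexp.MustCompile(`\d|\D+`)

// splitKeep is a hypothetical helper name. It returns the pieces of s in
// order, with each digit kept as its own element.
func splitKeep(s string) []string {
	return digitOrRun.FindAllString(s, -1)
}

func main() {
	fmt.Printf("%q\n", splitKeep("ab1cd2ef3")) // ["ab" "1" "cd" "2" "ef" "3"]
}
```

This only fits cases like the sample, where the "separator" and the "rest" can both be written as alternatives of one pattern.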
I don't think this is possible with the current regexp package, but Split could easily be extended to behave this way.
This should work for your case:
func Split(re *regexp.Regexp, s string, n int) []string {
if n == 0 {
return nil
}
matches := re.FindAllStringIndex(s, n)
strings := make([]string, 0, len(matches))
beg := 0
end := 0
for _, match := range matches {
if n > 0 && len(strings) >= n-1 {
break
}
end = match[0]
if match[1] != 0 {
strings = append(strings, s[beg:end])
}
beg = match[1]
// This also appends the current match
strings = append(strings, s[match[0]:match[1]])
}
if end != len(s) {
strings = append(strings, s[beg:])
}
return strings
}
A dumb solution: insert a separator into the string around each match, then split on that separator. This assumes the separator never occurs in the input.
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
re := regexp.MustCompile(`\d+`)
input := "ab1cd2ef3"
sep := "|"
indexes := re.FindAllStringIndex(input, -1)
fmt.Println(indexes)
move := 0
for _, v := range indexes {
p1 := v[0] + move
p2 := v[1] + move
input = input[:p1] + sep + input[p1:p2] + sep + input[p2:]
move += 2
}
result := strings.Split(input, sep)
fmt.Println(result)
}
You can use a bufio.Scanner:
package main
import (
"bufio"
"strings"
)
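func digit(data []byte, eof bool) (int, []byte, error) {
for i, b := range data {
if '0' <= b && b <= '9' {
if i > 0 {
// Emit the non-digit text that precedes the digit.
return i, data[:i], nil
}
// Emit the digit itself.
return 1, data[:1], nil
}
}
// No digit found: at EOF, emit whatever is left as the final token.
if eof && len(data) > 0 {
return len(data), data, nil
}
return 0, nil, nil
}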
func main() {
s := bufio.NewScanner(strings.NewReader("ab1cd2ef3"))
s.Split(digit)
for s.Scan() {
println(s.Text())
}
}
https://golang.org/pkg/bufio#Scanner.Split
I was wondering if there is any way I could easily split a string at spaces, except when the space is inside quotation marks?
For example, changing
Foo bar random "letters lol" stuff
into
Foo, bar, random, "letters lol", stuff
Think about it. You have a string in comma separated values (CSV) file format, RFC4180, except that your separator, outside quote pairs, is a space (instead of a comma). For example,
package main
import (
"encoding/csv"
"fmt"
"strings"
)
func main() {
s := `Foo bar random "letters lol" stuff`
fmt.Printf("String:\n%q\n", s)
// Split string
r := csv.NewReader(strings.NewReader(s))
r.Comma = ' ' // space
fields, err := r.Read()
if err != nil {
fmt.Println(err)
return
}
fmt.Printf("\nFields:\n")
for _, field := range fields {
fmt.Printf("%q\n", field)
}
}
Playground: https://play.golang.org/p/Ed4IV97L7H
Output:
String:
"Foo bar random \"letters lol\" stuff"
Fields:
"Foo"
"bar"
"random"
"letters lol"
"stuff"
Using strings.FieldsFunc try this:
package main
import (
"fmt"
"strings"
)
func main() {
s := `Foo bar random "letters lol" stuff`
quoted := false
a := strings.FieldsFunc(s, func(r rune) bool {
if r == '"' {
quoted = !quoted
}
return !quoted && r == ' '
})
out := strings.Join(a, ", ")
fmt.Println(out) // Foo, bar, random, "letters lol", stuff
}
Using a simple strings.Builder and ranging over the string, keeping or dropping the " characters as you prefer, try this:
package main
import (
"fmt"
"strings"
)
func main() {
s := `Foo bar random "letters lol" stuff`
a := []string{}
sb := &strings.Builder{}
quoted := false
for _, r := range s {
if r == '"' {
quoted = !quoted
sb.WriteRune(r) // keep '"' otherwise comment this line
} else if !quoted && r == ' ' {
a = append(a, sb.String())
sb.Reset()
} else {
sb.WriteRune(r)
}
}
if sb.Len() > 0 {
a = append(a, sb.String())
}
out := strings.Join(a, ", ")
fmt.Println(out) // Foo, bar, random, "letters lol", stuff
// not keep '"': // Foo, bar, random, letters lol, stuff
}
Using scanner.Scanner, try this:
package main
import (
"fmt"
"strings"
"text/scanner"
)
func main() {
var s scanner.Scanner
s.Init(strings.NewReader(`Foo bar random "letters lol" stuff`))
slice := make([]string, 0, 5)
tok := s.Scan()
for tok != scanner.EOF {
slice = append(slice, s.TokenText())
tok = s.Scan()
}
out := strings.Join(slice, ", ")
fmt.Println(out) // Foo, bar, random, "letters lol", stuff
}
Using csv.NewReader which removes " itself, try this:
package main
import (
"encoding/csv"
"fmt"
"log"
"strings"
)
func main() {
s := `Foo bar random "letters lol" stuff`
r := csv.NewReader(strings.NewReader(s))
r.Comma = ' '
record, err := r.Read()
if err != nil {
log.Fatal(err)
}
out := strings.Join(record, ", ")
fmt.Println(out) // Foo, bar, random, letters lol, stuff
}
Using regexp, try this:
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
s := `Foo bar random "letters lol" stuff`
r := regexp.MustCompile(`[^\s"]+|"([^"]*)"`)
a := r.FindAllString(s, -1)
out := strings.Join(a, ", ")
fmt.Println(out) // Foo, bar, random, "letters lol", stuff
}
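The pattern above contains a capture group that FindAllString never uses. If the quotes should be dropped, one possible sketch is to read that group through FindAllStringSubmatchIndex. The names fieldRe and fields are made up for this illustration, and like the pattern above it does not handle escaped quotes.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldRe is the same pattern as above: a bare word, or a quoted field whose
// content is captured by group 1.
var fieldRe = regexp.MustCompile(`[^\s"]+|"([^"]*)"`)

// fields is a hypothetical helper: quoted fields are returned without their
// surrounding quotes, bare words are returned unchanged.
func fields(s string) []string {
	var out []string
	for _, loc := range fieldRe.FindAllStringSubmatchIndex(s, -1) {
		if loc[2] >= 0 {
			// Group 1 took part in the match: this was a quoted field.
			out = append(out, s[loc[2]:loc[3]])
		} else {
			out = append(out, s[loc[0]:loc[1]])
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", fields(`Foo bar random "letters lol" stuff`))
	// ["Foo" "bar" "random" "letters lol" "stuff"]
}
```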
You could use regex
This (go playground) will cover all use cases for multiple words inside quotes and multiple quoted entries in your array:
package main
import (
"fmt"
"regexp"
)
func main() {
s := `Foo bar random "letters lol" stuff "also will" work on "multiple quoted stuff"`
r := regexp.MustCompile(`[^\s"']+|"([^"]*)"|'([^']*)'`)
arr := r.FindAllString(s, -1)
fmt.Println("your array: ", arr)
}
Output will be:
your array:  [Foo bar random "letters lol" stuff "also will" work on "multiple quoted stuff"]
If you want to learn more about regex here is a great SO answer with super handy resources at the end - Learning Regular Expressions
Hope this helps
How can we reverse a simple string in Go?
In Go 1, rune is a built-in type.
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Russ Cox, on the golang-nuts mailing list, suggests
package main
import "fmt"
func main() {
input := "The quick brown 狐 jumped over the lazy 犬"
// Get Unicode code points.
n := 0
rune := make([]rune, len(input))
for _, r := range input {
rune[n] = r
n++
}
rune = rune[0:n]
// Reverse
for i := 0; i < n/2; i++ {
rune[i], rune[n-1-i] = rune[n-1-i], rune[i]
}
// Convert back to UTF-8.
output := string(rune)
fmt.Println(output)
}
This works, without all the mucking about with functions:
func Reverse(s string) (result string) {
for _,v := range s {
result = string(v) + result
}
return
}
From Go example projects: golang/example/stringutil/reverse.go, by Andrew Gerrand
/*
Copyright 2014 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Reverse returns its argument string reversed rune-wise left to right.
func Reverse(s string) string {
r := []rune(s)
for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
r[i], r[j] = r[j], r[i]
}
return string(r)
}
Go Playground for reversing a string
After reversing string "bròwn", the correct result should be "nwòrb", not "nẁorb".
Note the grave above the letter o.
For preserving Unicode combining characters such as "as⃝df̅" with reverse result "f̅ds⃝a",
please refer to another code listed below:
http://rosettacode.org/wiki/Reverse_a_string#Go
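To make the problem concrete, here is a small sketch that reverses rune by rune and prints the result with escapes. It assumes the ò is written as an o followed by U+0300 (combining grave accent); the function name reverseRunes is only for this illustration.

```go
package main

import "fmt"

// reverseRunes reverses s one code point at a time, with no special
// handling of combining marks.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	s := "bro\u0300wn" // 'o' followed by U+0300 COMBINING GRAVE ACCENT
	fmt.Printf("%+q\n", reverseRunes(s)) // "nw\u0300orb"
}
```

After the reversal the mark follows the w instead of the o, which is why the accent shows up on the wrong letter.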
This works on Unicode strings by relying on two things:
range over a string enumerates its Unicode code points (runes)
a string can be constructed from a rune slice, where each element is one code point.
So here it goes:
// Requires: import "unicode/utf8"
func reverse(s string) string {
o := make([]rune, utf8.RuneCountInString(s))
i := len(o)
for _, c := range s {
i--
o[i] = c
}
return string(o)
}
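The first point is easy to check: range walks a string by code point, not by byte. A tiny sketch follows; runeOffsets is a hypothetical helper that just collects the byte offset range reports for each rune.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that records the byte offset at which
// each code point of s starts, as reported by range.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

func main() {
	s := "a狐b"
	// Byte length, rune count, and the byte offset of each rune.
	fmt.Println(len(s), len([]rune(s)), runeOffsets(s)) // 5 3 [0 1 4]
}
```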
There are too many answers here. Some of them are clear duplicates. But even among the remaining ones, it is hard to select the best solution.
So I went through the answers, threw away those that do not work for Unicode, and removed duplicates. I benchmarked the survivors to find the fastest. Here are the results with attribution (if you notice an answer that I missed but that is worth adding, feel free to modify the benchmark):
Benchmark_rmuller-4 100000 19246 ns/op
Benchmark_peterSO-4 50000 28068 ns/op
Benchmark_russ-4 50000 30007 ns/op
Benchmark_ivan-4 50000 33694 ns/op
Benchmark_yazu-4 50000 33372 ns/op
Benchmark_yuku-4 50000 37556 ns/op
Benchmark_simon-4 3000 426201 ns/op
So here is the fastest method by rmuller:
func Reverse(s string) string {
size := len(s)
buf := make([]byte, size)
for start := 0; start < size; {
r, n := utf8.DecodeRuneInString(s[start:])
start += n
utf8.EncodeRune(buf[size-start:], r)
}
return string(buf)
}
For some reason I can't add a benchmark, so you can copy it from PlayGround (you can't run tests there). Rename it and run go test -bench=.
I noticed this question when Simon posted his solution which, since strings are immutable, is very inefficient. The other proposed solutions are also flawed; they don't work or they are inefficient.
Here's an efficient solution that works, except when the string is not valid UTF-8 or the string contains combining characters.
package main
import "fmt"
func Reverse(s string) string {
n := len(s)
runes := make([]rune, n)
for _, rune := range s {
n--
runes[n] = rune
}
return string(runes[n:])
}
func main() {
fmt.Println(Reverse(Reverse("Hello, 世界")))
fmt.Println(Reverse(Reverse("The quick brown 狐 jumped over the lazy 犬")))
}
I wrote the following Reverse function which respects UTF8 encoding and combined characters:
// Reverse reverses the input while respecting UTF8 encoding and combined characters
func Reverse(text string) string {
textRunes := []rune(text)
textRunesLength := len(textRunes)
if textRunesLength <= 1 {
return text
}
i, j := 0, 0
for i < textRunesLength && j < textRunesLength {
j = i + 1
for j < textRunesLength && isMark(textRunes[j]) {
j++
}
if isMark(textRunes[j-1]) {
// Reverses Combined Characters
reverse(textRunes[i:j], j-i)
}
i = j
}
// Reverses the entire array
reverse(textRunes, textRunesLength)
return string(textRunes)
}
func reverse(runes []rune, length int) {
for i, j := 0, length-1; i < length/2; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
}
// isMark determines whether the rune is a marker
func isMark(r rune) bool {
return unicode.Is(unicode.Mn, r) || unicode.Is(unicode.Me, r) || unicode.Is(unicode.Mc, r)
}
I did my best to make it as efficient and readable as possible. The idea is simple: traverse the runes looking for combined characters, then reverse each combined character's runes in place. Once we have covered them all, reverse the runes of the entire string, also in place.
Say we would like to reverse the string bròwn. The ò is represented by two runes, one for the o and one for the combining mark \u0300 that represents the grave accent.
For simplicity, let's represent the string like this: bro'wn. The first thing we do is look for combined characters and reverse them. So now we have the string br'own. Finally, we reverse the entire string and end up with nwo'rb. This is rendered as nwòrb.
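The walk-through can be replayed with a plain apostrophe standing in for the combining mark. This is only a sketch of the two passes, not the library code: twoPass and its mark parameter are invented for the illustration.

```go
package main

import "fmt"

// reverseRunes reverses r in place.
func reverseRunes(r []rune) {
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
}

// twoPass is a hypothetical helper that treats the rune mark as a stand-in
// for a combining character and returns the string after each pass.
func twoPass(s string, mark rune) (afterFirst, afterSecond string) {
	r := []rune(s)
	for i := 0; i < len(r); {
		j := i + 1
		for j < len(r) && r[j] == mark {
			j++ // extend the cluster over the marks that follow the base
		}
		reverseRunes(r[i:j]) // pass 1: flip each base+marks cluster
		i = j
	}
	afterFirst = string(r)
	reverseRunes(r) // pass 2: flip the whole slice
	return afterFirst, string(r)
}

func main() {
	a, b := twoPass("bro'wn", '\'')
	fmt.Println(a, b) // br'own nwo'rb
}
```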
You can find it here https://github.com/shomali11/util if you would like to use it.
Here are some test cases to show a couple of different scenarios:
func TestReverse(t *testing.T) {
assert.Equal(t, Reverse(""), "")
assert.Equal(t, Reverse("X"), "X")
assert.Equal(t, Reverse("b\u0301"), "b\u0301")
assert.Equal(t, Reverse("😎⚽"), "⚽😎")
assert.Equal(t, Reverse("Les Mise\u0301rables"), "selbare\u0301siM seL")
assert.Equal(t, Reverse("ab\u0301cde"), "edcb\u0301a")
assert.Equal(t, Reverse("This `\xc5` is an invalid UTF8 character"), "retcarahc 8FTU dilavni na si `�` sihT")
assert.Equal(t, Reverse("The quick bròwn 狐 jumped over the lazy 犬"), "犬 yzal eht revo depmuj 狐 nwòrb kciuq ehT")
}
//Reverse reverses string using strings.Builder. It's about 3 times faster
//than the one with using a string concatenation
func Reverse(in string) string {
var sb strings.Builder
runes := []rune(in)
for i := len(runes) - 1; 0 <= i; i-- {
sb.WriteRune(runes[i])
}
return sb.String()
}
//Reverse reverses string using string concatenation
func Reverse(in string) (out string) {
for _, r := range in {
out = string(r) + out
}
return
}
BenchmarkReverseStringConcatenation-8 1000000 1571 ns/op 176 B/op 29 allocs/op
BenchmarkReverseStringsBuilder-8 3000000 499 ns/op 56 B/op 6 allocs/op
Using strings.Builder is about 3 times faster than using string concatenation
Here is a quite different, I would say more functional, approach that is not listed among the other answers:
func reverse(s string) (ret string) {
for _, v := range s {
defer func(r rune) { ret += string(r) }(v)
}
return
}
This is the fastest implementation
func Reverse(s string) string {
size := len(s)
buf := make([]byte, size)
for start := 0; start < size; {
r, n := utf8.DecodeRuneInString(s[start:])
start += n
utf8.EncodeRune(buf[size-start:], r)
}
return string(buf)
}
const (
s = "The quick brown 狐 jumped over the lazy 犬"
reverse = "犬 yzal eht revo depmuj 狐 nworb kciuq ehT"
)
func TestReverse(t *testing.T) {
if Reverse(s) != reverse {
t.Error(s)
}
}
func BenchmarkReverse(b *testing.B) {
for i := 0; i < b.N; i++ {
Reverse(s)
}
}
A simple stroke with rune:
func ReverseString(s string) string {
runes := []rune(s)
size := len(runes)
for i := 0; i < size/2; i++ {
runes[size-i-1], runes[i] = runes[i], runes[size-i-1]
}
return string(runes)
}
func main() {
fmt.Println(ReverseString("Abcdefg 汉语 The God"))
}
Output: doG ehT 语汉 gfedcbA
You could also import an existing implementation:
import "4d63.com/strrev"
Then:
strrev.Reverse("abåd") // returns "dåba"
Or to reverse a string including unicode combining characters:
strrev.ReverseCombining("abc\u0301\u031dd") // returns "d\u0301\u031dcba"
These implementations support correct ordering of Unicode multibyte and combining characters when reversed.
Note: Built-in string reverse functions in many programming languages do not preserve combining, and identifying combining characters requires significantly more execution time.
func ReverseString(str string) string {
output :=""
for _, char := range str {
output = string(char) + output
}
return output
}
// "Luizpa" -> "apziuL"
// "123日本語" -> "語本日321"
// "⚽😎" -> "😎⚽"
// "´a´b´c´" -> "´c´b´a´"
This code preserves sequences of combining characters intact, and
should work with invalid UTF-8 input too.
package stringutil
import "golang.org/x/text/unicode/norm" // formerly code.google.com/p/go.text/unicode/norm
func Reverse(s string) string {
bound := make([]int, 0, len(s) + 1)
var iter norm.Iter
iter.InitString(norm.NFD, s)
bound = append(bound, 0)
for !iter.Done() {
iter.Next()
bound = append(bound, iter.Pos())
}
bound = append(bound, len(s))
out := make([]byte, 0, len(s))
for i := len(bound) - 2; i >= 0; i-- {
out = append(out, s[bound[i]:bound[i+1]]...)
}
return string(out)
}
It could be a little more efficient if the unicode/norm primitives
allowed iterating through the boundaries of a string without
allocating. See also https://code.google.com/p/go/issues/detail?id=9055 .
If you need to handle grapheme clusters, use the unicode or regexp package.
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
It's assuredly not the most memory efficient solution, but for a "simple" UTF-8 safe solution the following will get the job done and not break runes.
It's in my opinion the most readable and understandable on the page.
func reverseStr(str string) (out string) {
for _, s := range str {
out = string(s) + out
}
return
}
The following two methods run faster than the fastest solution that preserves combining characters, though I may be missing something in my benchmark setup.
//input string s
bs := []byte(s)
var rs string
for len(bs) > 0 {
r, size := utf8.DecodeLastRune(bs)
rs += fmt.Sprintf("%c", r)
bs = bs[:len(bs)-size]
} // rs has reversed string
Second method inspired by this
//input string s
bs := []byte(s)
cs := make([]byte, len(bs))
b1 := 0
for len(bs) > 0 {
r, size := utf8.DecodeLastRune(bs)
d := make([]byte, size)
_ = utf8.EncodeRune(d, r)
b1 += copy(cs[b1:], d)
bs = bs[:len(bs) - size]
} // cs has reversed bytes
NOTE: This answer is from 2009, so there are probably better solutions out there by now.
Looks a bit 'roundabout', and probably not very efficient, but illustrates how the Reader interface can be used to read from strings. IntVectors also seem very suitable as buffers when working with utf8 strings.
It would be even shorter when leaving out the 'size' part, and insertion into the vector by Insert, but I guess that would be less efficient, as the whole vector then needs to be pushed back by one each time a new rune is added.
This solution definitely works with utf8 characters.
package main
import "container/vector";
import "fmt";
import "utf8";
import "bytes";
import "bufio";
func
main() {
toReverse := "Smørrebrød";
fmt.Println(toReverse);
fmt.Println(reverse(toReverse));
}
func
reverse(str string) string {
size := utf8.RuneCountInString(str);
output := vector.NewIntVector(size);
input := bufio.NewReader(bytes.NewBufferString(str));
for i := 1; i <= size; i++ {
rune, _, _ := input.ReadRune();
output.Set(size - i, rune);
}
return string(output.Data());
}
func Reverse(s string) string {
r := []rune(s)
var output strings.Builder
for i := len(r) - 1; i >= 0; i-- {
output.WriteString(string(r[i]))
}
return output.String()
}
Simple, Sweet and Performant
func reverseStr(str string) string {
strSlice := []rune(str) //converting to slice of runes
length := len(strSlice)
for i := 0; i < (length / 2); i++ {
strSlice[i], strSlice[length-i-1] = strSlice[length-i-1], strSlice[i]
}
return string(strSlice) //converting back to string
}
Reversing a string by word is a similar process. First, we convert the string into an array of strings where each entry is a word. Next, we apply the normal reverse loop to that array. Finally, we smush the results back together into a string that we can return to the caller.
package main
import (
"fmt"
"strings"
)
func reverse_words(s string) string {
words := strings.Fields(s)
for i, j := 0, len(words)-1; i < j; i, j = i+1, j-1 {
words[i], words[j] = words[j], words[i]
}
return strings.Join(words, " ")
}
func main() {
fmt.Println(reverse_words("one two three"))
}
Another hack is to use built-in language features, for example, defer:
package main
import "fmt"
func main() {
var name string
fmt.Scanln(&name)
for _, char := range []rune(name) {
defer fmt.Printf("%c", char) // <-- LIFO does it all for you
}
}
For simple strings it is possible to use a construction like this:
func Reverse(str string) string {
if str != "" {
return Reverse(str[1:]) + str[:1]
}
return ""
}
For Unicode strings it might look like this:
func RecursiveReverse(str string) string {
if str == "" {
return ""
}
runes := []rune(str)
return RecursiveReverse(string(runes[1:])) + string(runes[0])
}
A version which I think works on unicode. It is built on the utf8.Rune functions:
func Reverse(s string) string {
b := make([]byte, len(s));
for i, j := len(s)-1, 0; i >= 0; i-- {
if utf8.RuneStart(s[i]) {
rune, size := utf8.DecodeRuneInString(s[i:len(s)]);
utf8.EncodeRune(b[j:j+size], rune); // destination slice first, then the rune
j += size;
}
}
return string(b);
}
rune is a type, so use it. Moreover, Go doesn't use semicolons.
func reverse(s string) string {
l := len(s)
m := make([]rune, l)
for _, c := range s {
l--
m[l] = c
}
return string(m[l:]) // skip the unused leading slots left when s has multi-byte runes
}
func main() {
str := "the quick brown 狐 jumped over the lazy 犬"
fmt.Printf("reverse(%s): [%s]\n", str, reverse(str))
}
Try the code below:
package main
import "fmt"
func reverse(s string) string {
chars := []rune(s)
for i, j := 0, len(chars)-1; i < j; i, j = i+1, j-1 {
chars[i], chars[j] = chars[j], chars[i]
}
return string(chars)
}
func main() {
fmt.Printf("%v\n", reverse("abcdefg"))
}
for more info check http://golangcookbook.com/chapters/strings/reverse/
and http://www.dotnetperls.com/reverse-string-go
func reverseString(someString string) string {
runeString := []rune(someString)
var reverseString string
for i := len(runeString)-1; i >= 0; i -- {
reverseString += string(runeString[i])
}
return reverseString
}
Strings are immutable in Go, so unlike in C, an in-place reverse is not possible.
In C, you can do something like:
void reverseString(char *str) {
int length = strlen(str);
for(int i = 0, j = length-1; i < length/2; i++, j--)
{
char tmp = str[i];
str[i] = str[j];
str[j] = tmp;
}
}
In Go, the following converts the input to a byte slice, reverses the bytes, and converts the result back to a string before returning. It works only for strings without multi-byte (non-ASCII) characters.
package main
import "fmt"
func main() {
s := "test123 4"
fmt.Println(reverseString(s))
}
func reverseString(s string) string {
a := []byte(s)
for i, j := 0, len(s)-1; i < j; i++ {
a[i], a[j] = a[j], a[i]
j--
}
return string(a)
}
Here is yet another solution:
func ReverseStr(s string) string {
chars := []rune(s)
rev := make([]rune, 0, len(chars))
for i := len(chars) - 1; i >= 0; i-- {
rev = append(rev, chars[i])
}
return string(rev)
}
However, yazu's solution above is more elegant since he reverses the []rune slice in place.