Check if a string starts with a decimal digit? - string

It looks like the following works. Is it a good approach?
var thestr = "192.168.0.1"
if thestr[0] >= '0' && thestr[0] <= '9' {
	//...
}

Your solution is completely fine.
But note that strings in Go are stored as read-only byte slices holding the UTF-8 encoding of the text, and indexing a string indexes its bytes, not its runes (characters). Since each decimal digit ('0'..'9') is encoded as exactly one byte in UTF-8, testing the first byte is fine in this case. You should first check that the string is not empty, though (len(s) > 0 or s != ""), because indexing s[0] on an empty string panics.
Here are some other alternatives, try all on the Go Playground:
1) Testing the byte range:
This is your solution, probably the fastest one:
s := "12asdf"
fmt.Println(s[0] >= '0' && s[0] <= '9')
2) Using fmt.Sscanf():
Note: this also succeeds if the string starts with a negative number (e.g. it accepts "-12asf"); decide whether that is a problem for you.
i := 0
n, err := fmt.Sscanf(s, "%d", &i)
fmt.Println(n > 0, err == nil) // Both n and err can be used to test
3) Using unicode.IsDigit():
fmt.Println(unicode.IsDigit(rune(s[0])))
4) Using regexp:
I would probably never use this as this is by far the slowest, but here it is:
r := regexp.MustCompile(`^\d`)
fmt.Println(r.FindString(s) != "")
Or:
r := regexp.MustCompile(`^\d.*`)
fmt.Println(r.MatchString(s))
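The byte-range test and the empty-string check mentioned at the top of this answer can be combined into one helper. This is only a minimal sketch; the name startsWithDigit is made up for illustration and does not come from any library:

```go
package main

import "fmt"

// startsWithDigit (hypothetical helper) reports whether s begins with an
// ASCII decimal digit. The length check comes first so that an empty
// string returns false instead of panicking on s[0].
func startsWithDigit(s string) bool {
	return len(s) > 0 && s[0] >= '0' && s[0] <= '9'
}

func main() {
	for _, s := range []string{"192.168.0.1", "abc", "", "-12asdf"} {
		fmt.Printf("%q -> %v\n", s, startsWithDigit(s))
	}
}
```

Unlike the fmt.Sscanf() variant, this rejects a leading minus sign, because only the first byte is inspected.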

Please do not use regexps for that simple task :)
What I would change in this case:
add check for empty string before checking for the first rune
I would rephrase it as "starts with a digit" as the number semantic is too broad. .5e-45 is a number, but probably it is not what you want. 0's semantic is also not simple: https://math.stackexchange.com/questions/238737/why-do-some-people-state-that-zero-is-not-a-number

Since you are comparing by character and no non-digit characters fall between '0' and '9', I think your solution is OK, but it does not account for the characters that follow.
For example, if thestr was "192.something.invalid" it's no longer an IP.
I'd recommend using a regular expression to check the IP.
something like
\b(?:\d{1,3}\.){3}\d{1,3}\b
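As a rough sketch, the pattern above can be used from Go as follows. Note that it only checks the dotted-quad shape, so it also matches out-of-range values such as 999.999.999.999. For comparison, the sketch includes a stricter check based on net.ParseIP from the standard library; that is a different technique from the one this answer proposes. The names looksLikeIPv4 and isIPv4 are hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// ipShape is the pattern from the answer; it checks shape only, not ranges.
var ipShape = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)

// looksLikeIPv4 (hypothetical) reports whether s contains a dotted quad.
func looksLikeIPv4(s string) bool {
	return ipShape.MatchString(s)
}

// isIPv4 (hypothetical) is a stricter alternative: the whole string must
// parse as an IP address that can be represented as IPv4.
func isIPv4(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() != nil
}

func main() {
	for _, s := range []string{"192.168.0.1", "192.something.invalid", "999.999.999.999"} {
		fmt.Printf("%-22s regexp=%v ParseIP=%v\n", s, looksLikeIPv4(s), isIPv4(s))
	}
}
```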

Related

How to create a string of arbitrary length

I want to create a dummy string of a given length to do a performance test. For example, I want to first test with a 1 KB string and then maybe a 10 KB string, etc. I don't care which character (or rune?) it gets filled with. I understand that a string in Go is backed by a byte array. So, I want the final string to be backed by a byte array of 1 KB in size (if I give 1024 as the argument).
For example, I tried the brute force code below:
...
oneKBPayload := createPayload(1024, 'A')
...
//I don't mind even if the char argument is removed and 'A' is used for example
func createPayload(len int, char rune) string {
	payload := make([]byte, len)
	for i := 0; i < len; i++ {
		payload = append(payload, byte(char))
	}
	return string(payload[:])
}
and it produced a result of (for 10 length)
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000AAAAAAAAAA"
I realize that it has something to do with the encoding. But how to fix this so that I create any string which is backed by a byte array of the given length so that when I write it over the network, I generate the intended payload.
Your createPayload() creates a byte slice of the given length, which is filled with zeros by default (the zero value). You then append len more bytes to this slice, so the result is twice the length you intended, and that's why you see the zeros followed by the 'A' characters when printed. (byte(char) keeps only the low 8 bits of the rune, so this is only meaningful for runes below 128.)
If you change it to:
payload := make([]byte, 0, len)
Then the result will be what you want.
But easier would be to simply use strings.Repeat() which repeats a given string value n times. Repeat a one-rune (or more specifically a one-byte) string value n times, and you get what you want:
s := strings.Repeat("A", 10)
fmt.Println(len(s), s)
This will output (try it on the Go Playground):
10 AAAAAAAAAA
If you don't care about the content of the string only about its length, then simply convert a byte slice like this:
s := string(make([]byte, 1024))
fmt.Println(len(s))
Or alternatively like this:
s2 := string([]byte{1023: 0})
fmt.Println(len(s2))
Both print 1024. Try them on the Go Playground.
If you do care about the content and you already have a byte slice allocated, this is how you can efficiently fill it: Is there analog of memset in go?
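Putting the fix together, a corrected version of the loop might look like the sketch below. The fill parameter is changed to a byte here, which is an assumption made for simplicity and not part of the original question. bytes.Repeat from the standard library is shown as a cross-check:

```go
package main

import (
	"bytes"
	"fmt"
)

// createPayload returns a string of exactly n bytes, each equal to fill.
func createPayload(n int, fill byte) string {
	payload := make([]byte, 0, n) // length 0, capacity n: append starts at index 0
	for i := 0; i < n; i++ {
		payload = append(payload, fill)
	}
	return string(payload)
}

func main() {
	p := createPayload(1024, 'A')
	fmt.Println(len(p), p[:10])

	// The same payload built with bytes.Repeat.
	q := string(bytes.Repeat([]byte{'A'}, 1024))
	fmt.Println(p == q)
}
```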

Easy way to get a sub-string/sub-slice of up to N characters/elements in Go

In Python I can slice a string to get a sub-string of up to N characters and if the string is too short it will simply return the rest of the string, e.g.
"mystring"[:100] # Returns "mystring"
What's the easiest way to do the same in Go? Trying the same thing panics:
"mystring"[:100] // panic: runtime error: slice bounds out of range
Of course, I can write it all manually:
func Substring(s string, startIndex int, count int) string {
	maxCount := len(s) - startIndex
	if count > maxCount {
		count = maxCount
	}
	// The upper bound of a slice expression is an index, not a length.
	return s[startIndex : startIndex+count]
}
fmt.Println(Substring("mystring", 0, n))
But that's rather a lot of work for something so simple and (I would have thought) common. What's more, I don't know how to generalise this function to slices of other types, since Go doesn't support generics. I'm hoping there is a better way. Even math.Min() doesn't easily work here, because it expects and returns float64.
Note that while a function remains the recommended solution (even if it has to be implemented separately for each slice type), the byte-based version above does not work well with strings that contain multi-byte characters.
fmt.Println(Substring("世界mystring", 0, 5)) would actually print 世�� instead of 世界mys.
See "Code points, characters, and runes": a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.
And in Go, a "code point" is a rune (as seen here).
Using runes is more robust (again, in the case of strings):
func SubstringRunes(s string, startIndex int, count int) string {
	runes := []rune(s)
	length := len(runes)
	maxCount := length - startIndex
	if count > maxCount {
		count = maxCount
	}
	// The upper bound of a slice expression is an index, not a length.
	return string(runes[startIndex : startIndex+count])
}
See it in action in this playground.
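The question also asks how to generalise this to slices of other types. Here is a minimal sketch, assuming Go 1.18 or later (the release that added type parameters). firstN is a hypothetical name, not a standard library function:

```go
package main

import "fmt"

// firstN (hypothetical helper) returns at most the first n elements of s.
// It clamps n to the valid range instead of panicking.
func firstN[T any](s []T, n int) []T {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

func main() {
	fmt.Println(firstN([]int{1, 2, 3}, 100))
	// For strings, convert to []rune first so multi-byte characters stay whole.
	fmt.Println(string(firstN([]rune("世界mystring"), 5)))
}
```

The returned slice shares its backing array with the input, as any slice expression does.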

Comparing not equal length strings in Go

When I compare the following not equal length strings in Go, the result of comparison is not right. Can someone help?
i := "1206410694"
j := "128000000"
fmt.Println("result is", i >= j, i, j )
The output is:
result is false 1206410694 128000000
The reason is probably because Go does char by char comparison starting with the most significant char. In my case these strings represent numbers so i is larger than j. So just wonder if someone can help with explaining how not equal length strings are compared in go.
"The reason is probably because Go does char by char comparison starting with the most significant char."
This is correct. Strings are compared lexicographically, byte by byte; length only matters when one string is a prefix of the other.
If they represent numbers, then you should compare them as numbers. Parse / convert them to int before comparing:
ii, _ := strconv.Atoi(i)
ij, _ := strconv.Atoi(j)
Edit: And yes, @JimB is totally right. If you are not 100% sure that the conversion will succeed, please do not ignore the errors.
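A minimal sketch of the numeric comparison with the conversion errors checked and not discarded. The helper name numericGE is made up for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// numericGE (hypothetical helper) parses a and b as base-10 integers and
// reports whether a >= b. A parse failure is returned to the caller.
func numericGE(a, b string) (bool, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", a, err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return false, fmt.Errorf("parsing %q: %w", b, err)
	}
	return x >= y, nil
}

func main() {
	i, j := "1206410694", "128000000"

	// String comparison is byte-wise: the strings differ at index 2,
	// where '0' < '8', so i >= j is false.
	fmt.Println("as strings:", i >= j)

	ge, err := numericGE(i, j)
	fmt.Println("as numbers:", ge, err)
}
```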

Int32.TryParse equivalent for String?

I have been doing some homework. The task was to implement System.Int32.TryParse to check if an entered value is a number or something random. I was wondering if there is a built-in method that checks if something entered is a letter and NOT a number. I tried searching google and MSDN in the string type, but with no luck so far. I did write my own implementation, but I was curious.
Tnx
The easiest way to check whether a character is a number is probably to check whether it is in range from '0' to '9'. This works because of the way characters are encoded - the digits are encoded as a sub-range of the char values.
let str = "123"
let firstIsNumber =
    str.[0] >= '0' && str.[0] <= '9'
This gives you slightly different behavior than Char.IsDigit, because Char.IsDigit also returns true for things that are digits in other alphabets, say ႐႑႒႓႔႕႖႗႘႙. I suspect you do not plan to parse those :-).

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) //returns true if its safe and false if its not.
Everything after this point is just what I have thought of or tried while attempting to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe(世界) //true or false?
Would it return true or false? I was assuming that all binary safe strings should only use 1 byte per character. So iterating over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
	runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
	fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
	w = width
}
and returning false whenever the width was greater than 1. These are just some ideas I had in case there wasn't already a library for something like this, but I wasn't sure.
Binary safety has nothing to do with how wide a character is; it is mostly about checking for non-printable characters, such as null bytes.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
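To illustrate the definition quoted above: a Go string can hold arbitrary bytes, including NUL and bytes that are not valid UTF-8, and converting between []byte and string leaves them unchanged. The sketch below uses a made-up helper, roundTrip, purely for demonstration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// roundTrip (hypothetical helper) converts raw bytes to a string and back.
// The conversions copy the bytes without interpreting or altering them.
func roundTrip(b []byte) []byte {
	return []byte(string(b))
}

func main() {
	// A NUL byte and 0xff, which can never appear in valid UTF-8.
	raw := []byte{'a', 0x00, 'b', 0xff}
	s := string(raw)

	fmt.Println("length in bytes:", len(s))
	fmt.Println("valid UTF-8:", utf8.ValidString(s))
	fmt.Println("round trip preserved bytes:", string(roundTrip(raw)) == s)
}
```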
I'm not sure what your goal is, since almost all languages handle UTF-8/UTF-16 just fine now. For your specific question, however, there's a rather simple solution:
package main

import (
	"fmt"
	"unicode"
)

// IsAsciiPrintable reports whether s consists only of printable ASCII
// characters, so it rejects tab, backspace, and any non-ASCII rune.
func IsAsciiPrintable(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII || !unicode.IsPrint(r) {
			return false
		}
	}
	return true
}

func main() {
	s := "世界" // assumed sample input; the original snippet does not show how s is defined
	fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
	fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
playground
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.
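As a small illustration of the documentation quoted above, the sketch below prints how a few sample runes are classified. The helper isPrintableASCII is hypothetical and simply applies the same per-rune condition that IsAsciiPrintable uses:

```go
package main

import (
	"fmt"
	"unicode"
)

// isPrintableASCII (hypothetical helper) reports whether r is in the ASCII
// range and is considered printable by unicode.IsPrint.
func isPrintableASCII(r rune) bool {
	return r <= unicode.MaxASCII && unicode.IsPrint(r)
}

func main() {
	// Sample runes: a letter, ASCII space, tab, NUL, DEL, and a CJK character.
	for _, r := range []rune{'A', ' ', '\t', 0x00, 0x7f, '世'} {
		fmt.Printf("%U IsPrint=%v printableASCII=%v\n",
			r, unicode.IsPrint(r), isPrintableASCII(r))
	}
}
```

A printable non-ASCII rune such as '世' passes unicode.IsPrint but is still rejected by the ASCII range check.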

Resources