How to check the value of a character in Go with UTF-8 strings?

I'm attempting to check if the first character in a string matches the following, note the UTF-8 quote characters:
c := t.Content[0]
if c != '.' && c != ',' && c != '?' && c != '“' && c != '”'{
This code does not work due to the special characters in the last two checks.
What is the correct way to do this?

Indexing a string indexes its bytes (in UTF-8 encoding - this is how Go stores strings in memory), but you want to test the first character.
So you should get the first rune and not the first byte. For efficiency you may use utf8.DecodeRuneInString() which only decodes the first rune. If you need all the runes of the string, you may use type conversion like all := []rune("I'm a string").
See this example:
for _, s := range []string{"asdf", ".asdf", "”asdf"} {
	// Decode only the first rune instead of converting the whole string.
	c, _ := utf8.DecodeRuneInString(s)
	if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
		fmt.Println("Ok:", s)
	} else {
		fmt.Println("Not ok:", s)
	}
}
Output (try it on the Go Playground):
Ok: asdf
Not ok: .asdf
Not ok: ”asdf

Adding to @icza's great answer: it's worth noting that while indexing a string yields bytes, ranging over a string yields runes. So the following also works:
for _, s := range []string{"asdf", ".asdf", "”asdf"} {
	for _, c := range s {
		if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
			fmt.Println("Ok:", s)
		} else {
			fmt.Println("Not ok:", s)
		}
		break // we break after the first character regardless
	}
}

Related

Why do I get a wrong result?

Below is my code for LeetCode 20. (Given a string s containing just the characters '(', ')', '{', '}', '[' and ']', determine if the input string is valid.
An input string is valid if:
Open brackets must be closed by the same type of brackets.
Open brackets must be closed in the correct order.)
When the input is "(])" I still get true. Can anyone tell me what is wrong with my code? Thanks!
class Solution {
    public boolean isValid(String s) {
        Stack<Character> stack = new Stack<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else {
                if (stack.empty()) {
                    return false;
                }
                if (c == ')' && stack.peek() == '(') {
                    stack.pop();
                }
                if (c == ']' && stack.peek() == '[') {
                    stack.pop();
                }
                if (c == '}' && stack.peek() == '{') {
                    stack.pop();
                }
            }
        }
        return stack.empty();
    }
}
On the second iteration of the for loop the character is ], which doesn't match the first conditional, so execution goes on to the else block. The stack is not empty, and none of the three remaining if statements match (the top of the stack is (, not [), so nothing happens and the loop simply moves on to the third iteration. There it sees ) with ( on top of the stack, pops it, and the method returns true because the stack is now empty. This is where the issue lies: a closing bracket that doesn't match the top of the stack is silently ignored. You need an additional else inside your else block that returns false for anything that doesn't match one of those ifs.
To fix this particular test, it is enough to handle the ] character: if you see it and the top of the stack is not [, return false. The same reasoning applies to ) and }.
Hopefully that helps, if not, let me know and I can try to clarify more.
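For illustration, here is a minimal sketch of the corrected logic, written in Go rather than the question's Java; isValid and the pairs map are names chosen for this sketch. The key difference from the code in the question is that a closing bracket that does not match the top of the stack causes an immediate false:

```go
package main

import "fmt"

// isValid reports whether every bracket in s is closed by a bracket of the
// same type and in the correct order.
func isValid(s string) bool {
	// pairs maps each closing bracket to the opening bracket it must match.
	pairs := map[rune]rune{')': '(', ']': '[', '}': '{'}
	var stack []rune
	for _, c := range s {
		switch c {
		case '(', '[', '{':
			stack = append(stack, c)
		default:
			// A closing bracket (or any other character) must match the
			// top of the stack; otherwise the input is invalid.
			if len(stack) == 0 || stack[len(stack)-1] != pairs[c] {
				return false
			}
			stack = stack[:len(stack)-1]
		}
	}
	return len(stack) == 0
}

func main() {
	fmt.Println(isValid("(])"))    // false
	fmt.Println(isValid("()[]{}")) // true
}
```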

Checking for non terminating block statements in a file

I am working on a project that requires me to find multi-line comments in a text file and also to detect comments that are never terminated. I read the file with get, compare each character to the comment delimiters (#| and |#), and use peek to check whether the next character completes a delimiter. Finding terminated comments works, but I can't figure out how to detect that a comment block has no terminator. Please help.
if (c == '#' && inFile.peek() == '|') {
    char next = '\0';
    multipleComment += c;
    while (inFile.get(c)) {
        next = inFile.peek();
        multipleComment += c;
        if (c == '\n')
            lineNumber++;
        if (c == '|' && next == '#')
        {
            multipleComment += next;
            tokenTypes.push_back(multipleComment);
            values.push_back("COMMENT");
            lineNumbers.push_back(lineNumber);
            multipleComment.clear();
            break;
        }
        else {
            values.push_back("UNDEFINED");
            tokenTypes.push_back(text);
            lineNumbers.push_back(lineNumber);
        }
    }
}
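One observation about the snippet above: the else branch runs for every character that is not the terminator, so an UNDEFINED token is recorded once per character inside the comment. Whether a block is unterminated can only be known once the input has been exhausted without finding |#. The following is a rough sketch of that idea in Go, not a drop-in replacement for the C++ code; scanBlockComment and its return values are hypothetical names, and it works on an in-memory string rather than a stream:

```go
package main

import (
	"fmt"
	"strings"
)

// scanBlockComment assumes src[start:] begins with the opener "#|".
// It returns the comment text, the index just past the comment, and whether
// the closing "|#" was found. If the input ends first, the rest of src is
// returned and terminated is false.
func scanBlockComment(src string, start int) (text string, next int, terminated bool) {
	// Search after the opener so its '|' cannot be reused as part of "|#".
	end := strings.Index(src[start+2:], "|#")
	if end < 0 {
		return src[start:], len(src), false
	}
	next = start + 2 + end + 2
	return src[start:next], next, true
}

func main() {
	for _, src := range []string{"#| a\nb |# rest", "#| never closed\n"} {
		text, _, ok := scanBlockComment(src, 0)
		kind := "COMMENT"
		if !ok {
			kind = "UNDEFINED" // decided once, after the whole input was searched
		}
		// Newlines inside the comment advance the line counter.
		fmt.Printf("%s %q spans %d newline(s)\n", kind, text, strings.Count(text, "\n"))
	}
}
```

In the stream-based loop from the question, the equivalent change would be to record the UNDEFINED token after the while loop, and only when the loop ended without hitting the break.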

Remove surrounding double or single quotes in Golang

Shouldn't strconv.Unquote handle both single and double quotes?
See also https://golang.org/src/strconv/quote.go - line 350
However following code returns a syntax error:
s, err := strconv.Unquote(`'test'`)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Println(s)
}
https://play.golang.org/p/TnprqhNdwD1
But double quotes work as expected:
s, err := strconv.Unquote(`"test"`)
if err != nil {
	fmt.Println(err)
} else {
	fmt.Println(s)
}
What am I missing?
There is no ready function for what you want in the standard library.
What you presented works, but we can make it simpler (and likely more efficient):
func trimQuotes(s string) string {
	if len(s) >= 2 {
		// Strip only when the first and last bytes are the same quote character.
		if c := s[len(s)-1]; s[0] == c && (c == '"' || c == '\'') {
			return s[1 : len(s)-1]
		}
	}
	return s
}
Testing it:
fmt.Println(trimQuotes(`'test'`))
fmt.Println(trimQuotes(`"test"`))
fmt.Println(trimQuotes(`"'test`))
Output (try it on the Go Playground):
test
test
"'test
strconv.Unquote does properly handle both single and double quotes, but it isn't intended to be used the way your code snippet invokes it. It's intended for cases where you are processing Go source code and come across a quoted literal. The single-quote case is valid only for a single character (a rune literal), not for a string. In your Go source files, if you try to use single quotes for a multi-character string literal, you'll get a compiler error similar to "illegal rune literal".
What you can do instead to remove quotes from the start and end of a string is use the strings.Trim function. Note that Trim removes every leading and trailing character found in the cutset, so it also strips repeated or unmatched quotes.
s := strings.Trim(`'test'`, `'"`)
fmt.Println(s)
Temp workaround:
func trimQuotes(s string) string {
	if len(s) >= 2 {
		switch {
		case s[0] == '"' && s[len(s)-1] == '"':
			return s[1 : len(s)-1]
		case s[0] == '\'' && s[len(s)-1] == '\'':
			return s[1 : len(s)-1]
		}
	}
	return s
}

How can I assign a new char into a string in Go?

I'm trying to alter an existing string in Go but I keep getting this error "cannot assign to new_str[i]"
package main
import "fmt"
func ToUpper(str string) string {
	new_str := str
	for i := 0; i < len(str); i++ {
		if str[i] >= 'a' && str[i] <= 'z' {
			chr := uint8(rune(str[i]) - 'a' + 'A')
			new_str[i] = chr
		}
	}
	return new_str
}
func main() {
	fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
This is my code. I want to change lowercase to uppercase, but I can't alter the string. Why? How can I alter it?
P.S I wish to use ONLY the fmt package!
Thanks in advance.
You can't: strings are immutable. From the Go Language Specification:
Strings are immutable: once created, it is impossible to change the contents of a string.
You can, however, convert it to a []byte slice and alter that:
func ToUpper(str string) string {
	new_str := []byte(str) // mutable copy of the string's bytes
	for i := 0; i < len(str); i++ {
		if str[i] >= 'a' && str[i] <= 'z' {
			chr := uint8(rune(str[i]) - 'a' + 'A')
			new_str[i] = chr
		}
	}
	return string(new_str)
}
Working sample: http://play.golang.org/p/uZ_Gui7cYl
Use range and avoid unnecessary conversions and allocations. Strings are immutable. For example,
package main
import "fmt"
func ToUpper(s string) string {
	var b []byte
	for i, c := range s {
		if c >= 'a' && c <= 'z' {
			// Allocate the byte slice only if something needs to change.
			if b == nil {
				b = []byte(s)
			}
			b[i] = byte('A' + rune(c) - 'a')
		}
	}
	if b == nil {
		return s
	}
	return string(b)
}
func main() {
	fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
Output:
CDSRGGDH7865FXGH
In Go, strings are immutable. Here is one very bad way of doing what you want:
package main
import "fmt"
func ToUpper(str string) string {
	new_str := ""
	for i := 0; i < len(str); i++ {
		chr := str[i]
		if chr >= 'a' && chr <= 'z' {
			chr = chr - 'a' + 'A'
		}
		new_str += string(chr)
	}
	return new_str
}
func main() {
	fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
This is bad because:
You are treating each byte of the string as a character. What if it contains multi-byte UTF-8? Using range str is the way to go.
Appending to strings is slow (lots of allocations). A bytes.Buffer would be a good idea.
There is already a very good library routine to do this: strings.ToUpper.
It is worth exploring the line new_str += string(chr) a bit more. Strings are immutable, so this makes a new string with chr on the end; it doesn't extend the old string. This is wildly inefficient for long strings, because the total memory allocated grows with the square of the string length.
Next time just use strings.ToUpper!

String Matching: Matching words with or without spaces

I want to find a way by which I can map "b m w" to "bmw" and "ali baba" to "alibaba" in both the following examples.
"b m w shops" and "bmw"
I need to determine whether I can write "b m w" as "bmw"
I thought of this approach:
Remove spaces from the original string. This gives "bmwshops". Then find the longest common substring of "bmwshops" and "bmw".
Second example:
"ali baba and 40 thieves" and "alibaba and 40 thieves"
The above approach does not work in this case.
Is there any standard algorithm that could be used?
It sounds like you're asking this question: "How do I determine if string A can be made equal to string B by removing (some) spaces?".
What you can do is iterate over both strings, advancing within both whenever they have the same character, otherwise advancing along the first when it has a space, and returning false otherwise. Like this:
static bool IsEqualToAfterRemovingSpacesFromOne(this string a, string b) {
    return a.IsEqualToAfterRemovingSpacesFromFirst(b)
        || b.IsEqualToAfterRemovingSpacesFromFirst(a);
}
static bool IsEqualToAfterRemovingSpacesFromFirst(this string a, string b) {
    var i = 0;
    var j = 0;
    while (i < a.Length && j < b.Length) {
        if (a[i] == b[j]) {
            i += 1;
            j += 1;
        } else if (a[i] == ' ') {
            i += 1;
        } else {
            return false;
        }
    }
    // Skip spaces left over at the end of a (e.g. "bmw " vs "bmw").
    while (i < a.Length && a[i] == ' ') {
        i += 1;
    }
    return i == a.Length && j == b.Length;
}
The above is just an ever-so-slightly modified string comparison. If you want to extend this to 'largest common substring', then take a largest common substring algorithm and do the same sort of thing: whenever you would have failed due to a space in the first string, just skip past it.
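The same two-pointer idea can be sketched in Go; the function names below are chosen for this example and are not from the answer. The sketch compares bytes, which is safe for UTF-8 input because a space is a single ASCII byte, and it also skips spaces left over at the end of the first string:

```go
package main

import "fmt"

// equalAfterRemovingSpacesFromFirst reports whether a can be made equal to b
// by deleting some of the spaces in a.
func equalAfterRemovingSpacesFromFirst(a, b string) bool {
	i, j := 0, 0
	for i < len(a) {
		switch {
		case j < len(b) && a[i] == b[j]:
			// Same byte in both strings: advance both.
			i++
			j++
		case a[i] == ' ':
			// Extra space in a: skip it.
			i++
		default:
			return false
		}
	}
	return j == len(b)
}

// equalAfterRemovingSpacesFromOne tries the comparison in both directions.
func equalAfterRemovingSpacesFromOne(a, b string) bool {
	return equalAfterRemovingSpacesFromFirst(a, b) ||
		equalAfterRemovingSpacesFromFirst(b, a)
}

func main() {
	fmt.Println(equalAfterRemovingSpacesFromOne("b m w", "bmw")) // true
	fmt.Println(equalAfterRemovingSpacesFromOne(
		"ali baba and 40 thieves", "alibaba and 40 thieves")) // true
	fmt.Println(equalAfterRemovingSpacesFromOne("b m w shops", "bmw")) // false
}
```

The last call returns false because this is a whole-string comparison; matching "bmw" inside "b m w shops" needs the substring extension described above.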
Did you look at suffix arrays (http://en.wikipedia.org/wiki/Suffix_array), or at Jon Bentley's Programming Pearls?
Note: you have to write code to handle spaces.
