Golang: Issues replacing newlines in a string from a text file

I've been trying to read a file into a string and then split that string by line into multiple strings:
absPath, _ := filepath.Abs("../Go/input.txt")
data, err := ioutil.ReadFile(absPath)
if err != nil {
	panic(err)
}
input := string(data)
The input.txt is read as:
a
strong little bird
with a very
big heart
went
to school one day and
forgot his food at
home
However,
re = regexp.MustCompile("\\n")
input = re.ReplaceAllString(input, " ")
turns the text into a mangled mess of:
homeot his food atand
I'm not sure how replacing newlines can go so badly wrong that the text appears to invert itself.

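I guess that you are running the code on Windows. Observe that if you print out the length of the resulting string, it will show something over 100 characters. The reason is that Windows uses not only newlines (\n) but also carriage returns (\r), so a line break on Windows is actually \r\n, not \n. Replacing only the \n leaves the \r characters in the string, and when a terminal prints a \r it moves the cursor back to the start of the line, so each line overwrites the previous one on screen. That is why the text looks mangled. To properly filter them out of your string, use: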
re := regexp.MustCompile(`\r?\n`)
input = re.ReplaceAllString(input, " ")
The backticks make sure that you don't need to escape the backslashes in the regular expression. I put a question mark after the carriage return to make it optional, so that your code works on other platforms as well.
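To see the difference, here is a minimal sketch that runs both patterns over a Windows-style string. The helper name joinLines and the sample strings are made up for illustration; %q is used so that any leftover \r is visible in the output.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineBreak matches a Unix (\n) or a Windows (\r\n) line ending.
var lineBreak = regexp.MustCompile(`\r?\n`)

// joinLines is a hypothetical helper that replaces each line ending with one space.
func joinLines(s string) string {
	return lineBreak.ReplaceAllString(s, " ")
}

func main() {
	windows := "went\r\nto school one day and\r\nforgot his food at\r\nhome"
	unix := "went\nto school one day and\nforgot his food at\nhome"

	// Replacing only \n leaves the carriage returns behind; %q makes them visible.
	onlyLF := regexp.MustCompile(`\n`).ReplaceAllString(windows, " ")
	fmt.Printf("%q\n", onlyLF)

	// With \r?\n both inputs collapse to the same single-line string.
	fmt.Printf("%q\n", joinLines(windows))
	fmt.Println(joinLines(windows) == joinLines(unix)) // true
}
```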

I do not think that you need to use a regex for such an easy task. This can be achieved with just
absPath, _ := filepath.Abs("../Go/input.txt")
data, _ := ioutil.ReadFile(absPath)
input := string(data)
input = strings.Replace(input, "\n", "", -1) // Replace returns a new string, so assign the result
example of removing \n
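If the file may come from Windows, a regex-free version also has to deal with \r\n. One possible sketch, assuming you want a space between the lines as in the question, uses strings.NewReplacer; the variable name newlineToSpace and the sample input are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// newlineToSpace lists "\r\n" before "\n" so that a Windows line ending
// becomes a single space instead of leaving a stray "\r" behind.
var newlineToSpace = strings.NewReplacer("\r\n", " ", "\n", " ")

func main() {
	input := "a\r\nstrong little bird\nwith a very\r\nbig heart"
	fmt.Printf("%q\n", newlineToSpace.Replace(input))
	// "a strong little bird with a very big heart"
}
```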

Related

Creating a substring in go creates a new kind of symbol

I am comparing strings and there is the following:
Please note that the quotation marks in front of NEW are different.
Now when calling my function like this:
my_func(a[18:], b[18:])
The resulting strings are surprisingly:
What do I have to do to cut this weird symbol away and why is it behaving like this?
Because that type of quote is a multibyte character, and you are splitting the string in the middle of a character. What you could do is convert to a []rune and then convert back:
https://play.golang.org/p/pw42sEwRTZd
s := "H界llo"
fmt.Println(s[1:3]) // ��
fmt.Println(string([]rune(s)[1:3])) // 界l
Another option is the utf8string package:
package main
import "golang.org/x/exp/utf8string"
func main() {
	s := utf8string.NewString(` 'Not Available') “NEW CREDIT" FROM customers;`)
	t := s.Slice(18, s.RuneCount())
	println(t == `“NEW CREDIT" FROM customers;`)
}
https://pkg.go.dev/golang.org/x/exp/utf8string
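If you only need to drop a number of leading characters, a third possibility is to range over the string, which yields the byte offset at which each rune starts, and slice at that offset. This is only a sketch using the standard library; skipRunes is a hypothetical helper name.

```go
package main

import "fmt"

// skipRunes returns s with its first n runes removed.
// Hypothetical helper: it slices at a rune boundary, never inside a character.
func skipRunes(s string, n int) string {
	for i := range s { // i is the byte offset where each rune starts
		if n == 0 {
			return s[i:]
		}
		n--
	}
	return "" // s has n or fewer runes
}

func main() {
	s := "H界llo"
	fmt.Println(skipRunes(s, 1)) // 界llo
	fmt.Println(skipRunes(s, 2)) // llo
}
```

Unlike the []rune conversion, this does not copy the whole string into a new slice first.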

GO flag pkg reading option string containing escaped runes like "\u00FC" won't read

The test program below works as desired when it uses the DEFAULT string, which contains code points like \u00FC,
and also when that type of code point is hard-coded as a string within the program. Passing the same string from the command line, like prog.exe -input="ABC\u00FC", does NOT work. I assumed it was an OS interaction, so
I tried other quoting, even wrapping the value like "(ABC\u00FC)" and trimming the parens inside the func, but that was no good either.
Is the "for _, runeRead := range []rune" incorrect for escaped values?
package main
import (
	"fmt"
	"flag"
	"os"
)
var input string
var m = make(map[rune]struct{})
func init() {
	flag.StringVar(&input, "input", "A7\u00FC", "string of runes")
	m['A'] = struct{}{}
	m['\u00FC'] = struct{}{}
	m['7'] = struct{}{}
}
func main() {
	flag.Parse()
	ck(input) // cmd line - with default OK
	ck("A\u00FC") // hard code - OK
}
func ck(in string) {
	for _, runeRead := range []rune(in) {
		fmt.Printf("DEBUG: Testing rune: %v %v\n", string(runeRead), byte(runeRead))
		if _, ok := m[runeRead]; !ok {
			fmt.Printf("\nERROR: Invalid entry <%v>, in string <%s>.\n", string(runeRead), in)
			os.Exit(9)
		}
	}
}
The solution needs to work on Windows and Linux.
https://ss64.com/nt/syntax-esc.html
^ Escape character.
Adding the escape character before a command symbol allows it to be treated as ordinary text.
When piping or redirecting any of these characters you should prefix with the escape character: & \ < > ^ |
e.g. ^\ ^& ^| ^> ^< ^^
So you should do
prog.exe -input="ABC^\u00FC"
in case it helps others
It apparently is that different OSs and/or shells (in my case bash) handle the "\u" of the Unicode escape differently, and Go itself only interprets \u00FC inside string literals at compile time, so a command-line argument can arrive as the literal characters \u00FC. In bash, at the command line, the user could enter
$' the characters ' to protect the \u. It was suggested that WITHIN the program, if a string had the same issue, strconv.Quote could have been a solution.
Since I wanted an OS/shell independent solution for non-computer savvy users, I did a slightly more involved workaround.
I tell users who need to enter a Unicode character in the \u format to use %FC instead of \u00FC. I parse the string from the command line, e.g. ABC%FC%F6123, with regexp, and inside my Go code I replace each %xx with the Unicode rune I had originally expected to get. With a few lines of code the user input is now OS agnostic.
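The original code for this workaround was not posted, so the following is only a sketch of how such a replacement might look. The names percentHex and decodePercentRunes are hypothetical, and the sketch assumes exactly two hex digits after the percent sign, which limits it to code points U+0000 through U+00FF.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// percentHex matches a percent sign followed by exactly two hex digits, e.g. %FC.
var percentHex = regexp.MustCompile(`%[0-9A-Fa-f]{2}`)

// decodePercentRunes replaces every %XX with the rune whose code point is 0xXX.
// Hypothetical helper; this is not URL percent-decoding, which works on bytes.
func decodePercentRunes(s string) string {
	return percentHex.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.ParseUint(m[1:], 16, 32)
		if err != nil {
			return m // leave the text untouched if it cannot be parsed
		}
		return string(rune(n))
	})
}

func main() {
	fmt.Println(decodePercentRunes("ABC%FC%F6123")) // ABCüö123
}
```

The decoded string can then be passed to ck() unchanged, since it contains the real runes rather than the escape text.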

Why does golang bytes.Buffer behave in such way?

I recently faced a problem where I'm writing to a bytes.Buffer using a writer. But when I call String() on that bytes.Buffer, I get unexpected output (an extra pair of double quotes is added). Can you please help me understand it?
Here is a code snippet of my problem! I just need help understanding why each word is surrounded by a double quote.
func main() {
	var csvBuffer bytes.Buffer
	wr := csv.NewWriter(&csvBuffer)
	data := []string{`{"agent":"python-requests/2.19.1","api":"/packing-slip/7123"}`}
	err := wr.Write(data)
	if err != nil {
		fmt.Println("WARNING: unable to write ", err)
	}
	wr.Flush()
	fmt.Println(csvBuffer.String())
}
Output:
"{""agent"":""python-requests/2.19.1"",""api"":""/packing-slip/7123""}"
In CSV, a double quote (") inside a field is escaped as two double quotes. That's what you see.
You encode a single string value which contains double quotes, so each of them is replaced with two double quotes.
When decoded, the result will contain single double quotes again, of course:
r := csv.NewReader(&csvBuffer)
rec, err := r.Read()
fmt.Println(rec, err)
Output:
[{"agent":"python-requests/2.19.1","api":"/packing-slip/7123"}] <nil>
Quoting from package doc of encoding/csv:
Within a quoted-field a quote character followed by a second quote character is considered a single quote.
"the ""word"" is true","a ""quoted-field"""
results in
{`the "word" is true`, `a "quoted-field"`}
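The round trip described in the package doc can be checked with a small sketch. The helper names encodeField and decodeField are made up for illustration; only the standard encoding/csv reader and writer are used.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeField writes a single-field record and returns the raw CSV line.
func encodeField(field string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{field}); err != nil {
		return "", err
	}
	w.Flush()
	return buf.String(), w.Error()
}

// decodeField reads the first field of the first record back out.
func decodeField(line string) (string, error) {
	rec, err := csv.NewReader(bytes.NewBufferString(line)).Read()
	if err != nil {
		return "", err
	}
	return rec[0], nil
}

func main() {
	original := `the "word" is true`
	line, err := encodeField(original)
	if err != nil {
		panic(err)
	}
	fmt.Print(line) // "the ""word"" is true"

	back, err := decodeField(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(back == original) // true
}
```

The doubled quotes exist only in the encoded text; once the line goes through a CSV reader the original value comes back unchanged.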
In CSV, the following are equivalent:
one,two
and
"one","two"
Now, if a value contained a bare double quote, that quote would be read as the end of the value. CSV handles this by replacing each double quote with two of them. The value one"1 is encoded as one""1 in CSV, e.g.:
"one""1","two""2"

Send strings terminated with [CR][LF]

My app sends NMEA strings terminated with [CR]+[LF].
The NMEA standard specifies this format (example is heading info from a gyro compass): '$HEHDT,2.0,T*2D[CR][LF]'.
At the receiving end the string is discarded as incomplete.
How do I append and send these characters?
Sending is straightforward with only a few lines of code (Object is Cp1tx: TIdUDPServer;):
...
Cp1tx.Active:= true;
Cp1tx.Broadcast(InStr,8051,'',IndyTextEncoding_8Bit);
Cp1tx.Active:= false;
...
Btw, I am using Delphi 10.1 Berlin.
Assuming that InStr is the string you want to send, it would be:
Cp1tx.Broadcast(InStr + #13#10, 8051, '', IndyTextEncoding_8Bit);
There are different ways to express CRLF:
Instr := '$HEHDT,2.0,T*2D'#13#10;
Instr := '$HEHDT,2.0,T*2D'#$D#$A;
// CR and LF are defined in the IdGlobal unit
Instr := '$HEHDT,2.0,T*2D'+CR+LF;
// EOL is defined in the IdGlobal unit
Instr := '$HEHDT,2.0,T*2D'+EOL;
Thanks to all of you.
I think I made a fool of myself. It runs ok now no matter how I add the CRLF chars.
A historical comment:
CRLF (and in that order!) was invented for use in the old, mechanical telex machines powered by a 1/2 HP motor. It took time to move the carriage back to the left position. That's why CR is sent first and then LF, so all the mechanics have time to align and get ready to print the first character on the new line.
Novice telex operators learned it the hard way. Sending LF and CR and then typing text trapped the carriage on its way to the left, the type arms tangled and often the drive axle jammed or broke. Remember this was high speed transmission at an astonishing 50 baud! I spent endless hours in my service repairing broken telex machines. Well, things are different and better nowadays, but we still stick to the old CRLF convention.
When I need to send CR+LF often I declare a Const and refer to it as needed.
Const
CRLF = #13+#10;
{ To use this do the following }
MyString := 'This string ends with a Carriage Return / Line Feed'+CRLF;
You can also add Carriage Return / Line Feed using Chr(13)+Chr(10);
For example;
MyString := 'This string also ends with a CRLF' + Chr(13) + Chr(10)
+ 'But it could equally end with an Escape Code' + Chr(27); // or #27
I have edited my answer because it was pointed out I had the CR LF in the wrong order.

Convert underscores to spaces in Matlab string?

So say I have a string with some underscores like hi_there.
Is there a way to auto-convert that string into "hi there"?
(the original string, by the way, is a variable name that I'm converting into a plot title).
Surprising that no-one has yet mentioned strrep:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
which should be the official way to do simple string replacements. For such a simple case, regexprep is overkill: yes, regular expressions are Swiss Army knives that can do everything possible, but they come with a long manual. The string indexing shown by AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'
Try this Matlab code for a string variable 's'
s(s=='_') = ' ';
If you ever have to do anything more complicated, say doing a replacement of multiple variable length strings,
s(s == '_') = ' ' will be a huge pain. If your replacement needs ever get more complicated consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case @AndreasH.'s solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings?
regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
Will take the first argument string, and replace instances of the second argument with the third. In this case it replaces all underscores with a space.
In Matlab strings are vectors, so performing simple string manipulations can be achieved using standard operators e.g. replacing _ with whitespace.
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with whitespace
=> text = variable name
I know this was already answered; however, in my case I was looking for a way to correct plot titles so that I could include a filename (which could have underscores). So, I wanted to print them with the underscores NOT displaying as subscripts. Using the great info above, rather than substituting a space, I escaped the underscore in the substitution.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))

Resources