How to convert strings to array of byte and back - string

I must write strings to a binary MIDI file. The standard requires one to know the length of the string in bytes. As I want to write for mobile as well, I cannot use AnsiString, which was a good way to ensure that the string was a one-byte string. That simplified things. I tested the following code:
TByte = array of Byte;
function TForm3.convertSB (arg: string): TByte;
var
i: Int32;
begin
Label1.Text := IntToStr (SizeOf (Char));
for i := Low (arg) to High (arg) do
begin
label1.Text := label1.Text + ' ' + IntToStr (Ord (arg [i]));
end;
end; // convert SB //
convertSB ('MThd');
It returns 2 77 84 104 100 (as label text) in Windows as well as Android. Does this mean that Delphi treats strings by default as UTF-8? This would greatly simplify things but I couldn't find it in the help. And what is the best way to convert this to an array of bytes? Read each character and test whether it is 1, 2 or 4 bytes and allocate this space in the array? For converting back to a character: just read the array of bytes until a byte is encountered < 128?

Delphi strings are encoded internally as UTF-16. There was a big clue in the fact that SizeOf(Char) is 2.
The reason that all your characters had ordinal in the ASCII range is that UTF-16 extends ASCII in the sense that characters 0 to 127, in the ASCII range, have the same ordinal value in UTF-16. And all your characters are ASCII characters.
That said, you do not need to worry about the internal storage. You simply convert between string and byte array using the TEncoding class. For instance, to convert to UTF-8 you write:
bytes := TEncoding.UTF8.GetBytes(str);
And in the opposite direction:
str := TEncoding.UTF8.GetString(bytes);
The class supports many other encodings, as described in the documentation. It's not clear from the question which encoding you need to use. Hopefully you can work the rest out from here.
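For comparison with the Go threads further down, here is a minimal sketch of the same two ideas in Go: UTF-16 code units match ASCII values for ASCII text, and the byte length depends on the encoding you choose. The helper names utf16Units and utf8Bytes are made up for this illustration; this is not Delphi code and does not stand in for TEncoding.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Units returns the UTF-16 code units of s. This corresponds to what
// Ord(arg[i]) reports for each element of a UTF-16 string.
func utf16Units(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

// utf8Bytes returns the UTF-8 encoding of s. A Go string already holds
// its bytes in UTF-8 here, so the conversion is a plain copy.
func utf8Bytes(s string) []byte {
	return []byte(s)
}

func main() {
	// ASCII text: code units and bytes have the same values.
	fmt.Println(utf16Units("MThd")) // [77 84 104 100]
	fmt.Println(utf8Bytes("MThd"))  // [77 84 104 100]

	// Outside ASCII the encodings diverge: U+00E9 is one UTF-16 code unit
	// but two UTF-8 bytes.
	fmt.Println(utf16Units("\u00e9")) // [233]
	fmt.Println(utf8Bytes("\u00e9"))  // [195 169]

	// Converting the bytes back yields the original string.
	fmt.Println(string(utf8Bytes("MThd"))) // MThd
}
```

The length to write into the file is the length of the encoded byte slice, not the number of characters in the string.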

Related

How do I convert a string of escape sequences to bytes?

I am creating a TCP server that needs to process some data. I have a net.Conn instance "connection" from which I will read said data. The lower part of the snippet produces an error noting that it cannot use the 'esc' value as a byte value.
const(
esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(esc)
Clearly, some conversion is needed but when I try
const(
esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(byte(esc))
The compiler makes note that I cannot convert esc to byte. Is it due to the fact that I declared "\a\n" as a const value on the package level? Or is there something else overall associated with how I'm framing the bytes to be read?
You can't convert esc to byte because you can't convert strings into single bytes. You can convert a string into a byte slice ([]byte).
The bufio.Reader only supports single-byte delimiters. For multi-byte delimiters, you should use a bufio.Scanner with a custom split function instead.
Perhaps a modified version of https://stackoverflow.com/a/37531472/1205448

How to convert AnsiStr to Bytes and vice versa?

Hope somebody can help me with this problem - I have to read data from an interface that is specified like this:
<Each message consists of a 2-byte length (in network byte order) followed by that many bytes of data. The end of the msg series is indicated by an empty message (length of 0).>
Using TDataPortTCP I can read the buffer with Dataport.Peek(size) and pull the data from the buffer with Dataport.Pull(size) - both methods provide the result as AnsiStr
I imagine that something like this should work, but I have no idea how to convert AnsiStr to Bytes and vice versa:
while DataPortTCP.Peek(2) > ZeroBytes do
begin
LengthInBytes := DataPortTCP.Pull(2) ;
sContent := DataPortTCP.Pull(LengthInByte) ;
end;
How do I declare / get / convert ZeroBytes and LengthInBytes and how do I have to deal with Endianess ?
Unfortunately I know nothing about TBytes and what I read so far did only lead to more confusion ;-)
I would be very grateful if someone could point me into the right direction.
To check whether a string is not empty, you can compare to ''.
To retrieve the byte value from a string, you can access the character and apply the ord function.
Network order is big endian, meaning that the first byte is of higher order. Hence you need to shift it to the left using shl. In this case, there are two bytes, so the first one is shifted 8 bits, which amounts to multiplying it by 256.
var
LengthInBytesStr: String;
LengthInBytes: Word;
begin
...
while DataPortTCP.Peek(2) <> '' do
begin
LengthInBytesStr := DataPortTCP.Pull(2);
LengthInBytes := (ord(LengthInBytesStr[1]) shl 8)
+ ord(LengthInBytesStr[2]);
sContent := DataPortTCP.Pull(LengthInBytes);
end;

How to detect when bytes can't be converted to string in Go?

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
There are two places in the language that Go does do UTF-8 decoding of strings for you.
when you do for i, r := range s the r is a Unicode code point as a value of type rune
when you do the conversion []rune(s), Go decodes the whole string to runes.
(Note that rune is an alias for int32, not a completely different type.)
In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you can't accept the U+FFFD replacement and need to throw an error on mis-encoded input.
Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is utf8.RuneError and returned by functions in utf8.
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
package main
import "fmt"
func main() {
a := []byte{0xff}
s := string(a)
fmt.Println(s)
for _, r := range s {
fmt.Println(r)
}
rs := []rune(s)
fmt.Println(rs)
}
Output will look different in different environments, but in the Playground it looks like
�
65533
[65533]
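If you do need to reject mis-encoded input rather than accept U+FFFD, a check with utf8.Valid before the conversion is enough. A minimal sketch, where toValidString and errInvalidUTF8 are names made up for this example:

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

var errInvalidUTF8 = errors.New("input is not valid UTF-8")

// toValidString converts b to a string, but rejects input that is not
// valid UTF-8 instead of letting later decoding substitute U+FFFD.
func toValidString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

func main() {
	inputs := [][]byte{[]byte("日本語"), {0xff}}
	for _, in := range inputs {
		s, err := toValidString(in)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

For data that is already a string, utf8.ValidString performs the same check without a conversion.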

Indexing string as chars

The elements of strings have type byte and may be accessed using the
usual indexing operations.
How can I get element of string as char ?
"some"[1] -> "o"
The simplest solution is to convert it to a slice of runes:
var runes = []rune("someString")
Note that when you iterate on a string, you don't need the conversion. See this example from Effective Go:
for pos, char := range "日本語" {
fmt.Printf("character %c starts at byte position %d\n", char, pos)
}
This prints
character 日 starts at byte position 0
character 本 starts at byte position 3
character 語 starts at byte position 6
Go strings are usually, but not necessarily, UTF-8 encoded. When they are Unicode strings, the term "char[acter]" is pretty complex, and there is no general/unique bijection between runes (code points) and Unicode characters.
Anyway one can easily work with code points (runes) in a slice and use indexes into it using a conversion:
package main
import "fmt"
func main() {
utf8 := "Hello, 世界"
runes := []rune(utf8)
fmt.Printf("utf8:% 02x\nrunes: %#v\n", []byte(utf8), runes)
}
Also here: http://play.golang.org/p/qWVSA-n93o
Note: Often the desire to access Unicode "characters" by index is a design mistake. Most textual data is processed sequentially.
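When indexing by code point really is what you want, the rune-slice conversion can be wrapped in a small helper. A minimal sketch, with runeAt being a hypothetical name for this example:

```go
package main

import "fmt"

// runeAt returns the i-th code point of s as a string.
// It reports false when i is out of range.
func runeAt(s string, i int) (string, bool) {
	runes := []rune(s) // decodes the whole string on every call
	if i < 0 || i >= len(runes) {
		return "", false
	}
	return string(runes[i]), true
}

func main() {
	c, _ := runeAt("some", 1)
	fmt.Println(c) // o

	c, _ = runeAt("日本語", 1)
	fmt.Println(c) // 本

	// Plain indexing yields a byte, here the second byte of 日.
	fmt.Println("日本語"[1]) // 151
}
```

Because each call decodes the full string, convert to []rune once and reuse the slice if you index repeatedly.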
Another option is the package utf8string:
package main
import "golang.org/x/exp/utf8string"
func main() {
s := utf8string.NewString("🧡💛💚💙💜")
t := s.At(2)
println(t == '💚')
}
https://pkg.go.dev/golang.org/x/exp/utf8string

how can delphi 'string' literals be more than 255?

I'm working in Delphi 7 and, while working with strings, I came across this:
For a string of default length, that is, declared simply as string, max size is always 255. A ShortString is never allowed to grow to more than 255 characters.
on delphi strings
Once I had to do something like this in my Delphi code (it was for a really big query):
var
sMyStringOF256characters : string;
ilength : integer;
begin
sMyStringOF256characters:='ThisStringisofLength256,ThisStringisofLength256,.....'
//length of sMyStringOF256characters is 256
end;
i get this error
[Error] u_home.pas(38): String literals may have at most 255 elements.
but when i try this
var
iCounter : integer;
myExtremlyLongString : string;
begin
myExtremlyLongString:='';
Label1.Caption:='';
for iCounter:=0 to 2500 do
begin
myExtremlyLongString:=myExtremlyLongString+inttostr(iCounter);
Label1.Caption:=myExtremlyLongString;
end;
Label2.Caption:=inttostr(length(myExtremlyLongString));
end;
and the result is
As you can see the length of myExtremlyLongString is 8894 characters.
why did not delphi give any error saying the length is beyond 255 for myExtremlyLongString?
EDIT
i used
SetLength(sMyStringOF256characters,300);
but it doesn't work.
why did not delphi give any error saying the length is beyond 255 for
myExtremlyLongString?
You have your answer a bit down in the text in section Long String (AnsiString).
In current versions of Delphi, the string type is simply an alias for
AnsiString,
So string is not limited to 255 characters but a string literal is. That means that you can build a string that is longer than 255 characters but you can not have a string value in code that is longer than 255 characters. You need to split them if you want that.
sMyString:='ThisStringisofLength255'+'ThisStringisofLength255';
Split it up into:
sMyStringOF256characters :=
'ThisStringis' +
'ofLength256' +
'And ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'ManyManyManyManyManyManyManyManyManyManyManyManyMany' +
'CharactersCharactersCharactersCharactersCharactersCharactersCharactersCharacters';
Back in old DOS/Turbo Pascal days, "strings" were indeed limited to 255 characters. In large part because the 1st byte contained the string length, and a byte can only have a value between 0 and 255.
That is no longer an issue in contemporary versions of Delphi.
"ShortString" is the type for the old DOS/Pascal string type.
"LongString" has been the default string type for a long time (including the Borland Delphi 2006 I currently use for most production work). LongStrings (aka "AnsiStrings") hold 8-bit characters, and are limited only by available memory.
Recent versions of Delphi (Delphi 2009 and higher, including the new Delphi XE2) all now default to the UTF-16 "UnicodeString" type (not "WideString", which is a separate COM-compatible type). UnicodeStrings, like AnsiStrings, are also effectively "unlimited" in maximum length.
This article explains in more detail:
http://delphi.about.com/od/beginners/l/aa071800a.htm
The difference is that in your first code example you are putting the string as part of your code - literal string. That has a limitation on how many characters it will allow.
In your second code example you are generating it dynamically and not putting it as one big literal string.
String type in Delphi (unlike shortstring that can only be up to 255) can be as big as your memory.
You could try using the StringBuilder class:
procedure TestStringBuilder;
var
I: Integer;
StringBuilder: TStringBuilder;
begin
StringBuilder := TStringBuilder.Create;
try
for I := 1 to 10 do
begin
StringBuilder.Append('a string ');
StringBuilder.Append(66); //add an integer
StringBuilder.Append(sLineBreak); //add new line
end;
OutputWriteLine('Final string builder length: ' + IntToStr(StringBuilder.Length));
finally
StringBuilder.Free;
end;
end;
If you need a really long string in Delphi, you can load it from another resource, such as a .txt file or a plain-text file with any extension. I'm using this approach and it works. You can even build array-like tables by using the line numbers of the plain text. In Delphi code itself, you can only do what @arjen van der Spek and others describe.
For me, files with text formatted as variables, like this:
sometext:string=
'txt...'+
'txt...'+
'txt...';
are bad for future editing.
pros: you can use any long text.
cons: text code is open, anybody can read it opening file in notepad etc.
