ReadStr() and WriteStr() in Delphi

I have some code that uses ReadStr and WriteStr for what I presume is writing a string to a binary file.
The explanation for WriteStr in the documentation states that it will write raw data in the shape of an AnsiString to the object's stream, which makes sense. But then ReadStr says that it reads a character. So are they not the opposite of each other?
Let's say I have
pName: String[80];
and I use WriteStr on it, what does it actually write? Since WriteStr expects AnsiString, does it cast pName to be such? In that case, does it not write the "Length" field into the stream because an AnsiString pointer points to the first element and not the length field? I was also looking and it seems String == AnsiString these days, but my question about the length field still remains the same.
If, let's say, it doesn't write the Length field into the file, does it still write the NULL at the end of the data? As such, can I find where the string ends by looking for a '\0'? Does ReadStr read until the NULL character?
Thank you kindly :)

In your pre-Unicode version of Delphi, WriteStr and ReadStr write and read an AnsiString value. The writing code writes the length, and then the string content. The reading code reads the length, allocates the string, and then fills it with the content.
This has the potential of involving a truncation when you assign the result of ReadStr to your 80 character short string.
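To make the length-then-content layout concrete, here is a minimal sketch in Python rather than Delphi. The 4-byte little-endian length field, the latin-1 encoding standing in for an ANSI code page, and the names write_str and read_str are all assumptions for illustration; the actual on-disk format of your Delphi version may differ.

```python
import io
import struct


def write_str(stream, text: str) -> None:
    """Write a length prefix, then the raw bytes of the string (no terminator)."""
    data = text.encode("latin-1")  # assumption: one byte per character
    stream.write(struct.pack("<I", len(data)))  # assumption: 4-byte length field
    stream.write(data)


def read_str(stream) -> str:
    """Read the length prefix, then exactly that many bytes of content."""
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length).decode("latin-1")


buf = io.BytesIO()
write_str(buf, "A" * 100)
buf.seek(0)

full = read_str(buf)      # the reader never searches for a '\0'
short_name = full[:80]    # what survives assignment to an 80-character short string
```

With a layout like this the reader relies only on the stored length, so scanning the file for a terminator would not be a reliable way to find the end of the string.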

Related

Is an empty string `[]byte("")` serialized as `\0`

Our code base is Go and C++. I'm a C++ programmer, so I can sort of follow Go, but I don't understand it well.
We are an embedded shop and the Go folks are serializing a string and sending it on an I2C buffer. It appears an empty Go string is appended to the I2C transaction as \0, instead of "nothing".
Everything I've seen in Go documentation has only described how to test for it (i.e. str == "" or len(str) > 0), but doesn't appear to describe how it is serialized.
As a C++ programmer, \0 makes sense, because it is the null terminator for strings or even simply NULL, which would make sense to store in a variable. Can someone please confirm or deny this?
The Go Language does not specify how Go values are serialized. The encoders in the Go standard library serialize empty strings in different ways. The gob encoder omits empty strings. The JSON encoder writes empty strings as "" (unless told to omit the empty string).
In memory string values do not have a null terminator.
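As a rough illustration of why the wire format depends on the encoder and not on the string itself, here is a sketch in Python with three hypothetical encoders. None of them is the code from your I2C project or from the Go standard library; they only show that an empty string can come out as a single \0 under two different conventions, or as something else entirely.

```python
import json


def encode_nul_terminated(text: str) -> bytes:
    """C-style convention: content followed by a terminating zero byte."""
    return text.encode("utf-8") + b"\x00"


def encode_length_prefixed(text: str) -> bytes:
    """One-byte length followed by the content (hypothetical format)."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def encode_json(text: str) -> bytes:
    """JSON convention: a quoted string literal."""
    return json.dumps(text).encode("utf-8")


empty_nul = encode_nul_terminated("")    # b'\x00'
empty_len = encode_length_prefixed("")   # b'\x00'
empty_json = encode_json("")             # b'""'
```

So a lone \0 on the bus could be a terminator or a zero length byte; only the serializing code can tell you which.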

Why is Julia giving me StringIndex error?

I'm getting a StringIndex error for one particular string out of 10,000 which I am processing. I don't really know what the issue is with this string. I think it is probably a special character issue.
If I println the string then assign it to txt then pass txt to the function, I don't get an error. I am a little baffled.
I am sorry, I can't post the string as it is protected content and even if I did copying and pasting the string somehow removes the source of error. Any suggestions?
Just to expand. The details of how String is represented in Julia are explained in the Julia manual.
You can use eachindex to get an iterator of valid indices into a String. The reason why it is an iterator is that you cannot efficiently (i.e. in O(1) time) find the index of the i-th character in the string. However, you can use the isascii function on a String to check if it consists only of ASCII characters (in which case byte and character indices are the same).
Also, if you need to get to some specific character in a string, you usually need more than one character, in which case the first, last and chop functions are useful (actually last(first(s, n)) gives you the character at position n, although it is not the most efficient way; iterating eachindex will allocate less).
In Julia Strings are indexed by bytes rather than characters. You should use for c in str rather than trying to index manually.
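The byte-versus-character distinction can be illustrated outside Julia too. Below is a small Python sketch (not Julia code) that computes where each character starts in the UTF-8 encoding of a string; char_start_offsets is a made-up helper, and the offsets are 0-based, whereas Julia's indices are 1-based.

```python
def char_start_offsets(text: str) -> list:
    """Return the 0-based byte offset at which each character starts in UTF-8."""
    offsets = []
    position = 0
    for ch in text:
        offsets.append(position)
        position += len(ch.encode("utf-8"))  # 1 to 4 bytes per character
    return offsets


s = "aé€b"                         # 1-byte, 2-byte, 3-byte and 1-byte characters
offsets = char_start_offsets(s)    # [0, 1, 3, 6]
only_ascii = s.isascii()           # False, so byte and character positions differ
```

Any byte position not in that list falls in the middle of a character, which is the situation in which Julia raises a StringIndexError.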

How do I change my data from bytes to strings in this function?

I have some code for a networking program:
def dataReceived(self, data):
    print(f"Received quote: {data}")
    self.transport.loseConnection()
This function is printing
Received quote: b'\x00&C:\\Users\\.pycharm2016.3\\config\x00&C:\\users\\pycharm\\system\x00\x03--'
How would I change my code to fix this?
I think I know that what is happening is what is being printed out is the code in bytes and it needs to be converted to a string, but do I do this on the server or client side of the program?
When I write
print(f"receivedquote: {data}".decode('utf-8')
that does not do the trick. I get a lot of errors. How can I ask this question better to find a solution?
Decode the actual data:
data = data.decode('utf-8')
At this point, data is a Python Unicode string which you can print, search, slice, etc.
(Your data doesn't particularly look like UTF-8 but I'm not going to second-guess that; it's not clearly not UTF-8, either.)
It's generally a good idea to convert to Unicode immediately after receiving a byte string, and have the rest of your program operate on strings. If you need to encode back to bytes, similarly do that only at the perimeter, just before the data leaves your program. (Ned Batchelder calls this "Unicode sandwiching.")
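A minimal sketch of that decode-at-the-edge pattern, assuming the peer really sends UTF-8. The helper names handle_received and prepare_outgoing are made up for illustration, the sample bytes are a shortened stand-in for the output in the question, and errors="replace" is an optional choice added here so that a stray invalid byte does not raise.

```python
def handle_received(data: bytes) -> str:
    """Decode once, at the point where bytes enter the program."""
    text = data.decode("utf-8", errors="replace")  # assumption: peer sends UTF-8
    return f"Received quote: {text}"


def prepare_outgoing(text: str) -> bytes:
    """Encode once, just before the data leaves the program."""
    return text.encode("utf-8")


raw = b"\x00&C:\\Users\\config\x00\x03--"  # shortened sample, not the real payload
message = handle_received(raw)             # a str; control characters stay in it
outgoing = prepare_outgoing("café")
```

Inside dataReceived you would call something like handle_received(data) and print the result, on whichever side receives the bytes.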

Is it safe to cast binary data from a byte array to a string and back in golang?

Maybe a stupid question, but if I have some arbitrary binary data, can I cast it to string and back to byte array without corrupting it?
Is []byte(string(byte_array)) always the same as byte_array?
The expression []byte(string(byte_slice)) evaluates to a slice with the same length and contents as byte_slice. The capacity of the two slices may be different.
Although some language features assume that strings contain valid UTF-8 encoded text, a string can contain arbitrary bytes.

What is the difference between binary safe strings and binary unsafe strings?

I was reading redis manifesto[1] and it seems redis accepts only binary safe strings as keys but I don't know the difference between the two. Can anyone explain with an example?
[1] http://oldblog.antirez.com/post/redis-manifesto.html
According to Redis documentation, simple Redis strings have syntax "+redis_response\r\n" whereas bulk Redis strings have syntax "$str_len\r\nbinary_safe_string\r\n".
In other words, a binary-safe string in Redis can contain any data, from something as simple as "foo" to arbitrary binary data up to 512 MB, such as a JPEG image. A binary-safe string has its length encoded with it and is not terminated by any particular character, unlike a NUL-terminated string in C, which ends with '\0'.
HTH,
Swanand
I'm not familiar with the system in question, but the term "binary safe string" might be used either to describe certain string-storage types or to describe particular string instances. In a binary-safe string type, a string of length N may be used to encapsulate any sequence of N values in the range either 0-255 or 0-65535 (for 8- or 16-bit types, respectively). A binary-safe string instance might be one whose representation may be subdivided into uniformly-sized pieces, with each piece representing one character, as distinct from a string instance in which different characters require different amounts of storage space.
Some string types (which are not binary safe) will use variable-length representations for certain characters, and will behave oddly if asked to act upon e.g. a string which contains the code for "first half of a multi-part character" followed by something other than a "second half of multi-part character". Further, some code which works with strings will assume that the Nth character will be stored in either the Nth byte or the Nth pair of bytes, and will malfunction if given a string in which, e.g. the 8th character is stored in the 12th and 13th pairs of bytes.
Looking only briefly at the link provided, I would guess that it's saying that Redis does not expect to only work with strings that use different numbers of bytes to hold different characters, though I'm not quite clear whether it's assuming that a string type will be able to handle any possible sequence of bytes, or whether it's assuming that any string instance which it's given may be safely regarded as a sequence of bytes. I think the fundamental concepts of interest, though, are (1) some string types use variable-length encodings and others do not; (2) even in types that use variable-length encodings, a useful subset of string instances will consist only of fixed-length characters.
Binary-safe means that a string can contain any character, while a binary-unsafe string cannot; a C string, for example, cannot contain '\0'. In C, '\0' marks the end of a string, so the characters before an embedded '\0' and the characters after it will be considered two different strings.
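To tie the answers together, here is a small Python sketch of the bulk string syntax quoted above ("$str_len\r\nbinary_safe_string\r\n") next to a C-style view of the same bytes. The functions encode_bulk, decode_bulk and c_style_view are hypothetical helpers written for this example, not part of any Redis client library.

```python
def encode_bulk(payload: bytes) -> bytes:
    """Frame a payload as '$<length>\\r\\n<payload>\\r\\n'."""
    return b"$" + str(len(payload)).encode("ascii") + b"\r\n" + payload + b"\r\n"


def decode_bulk(message: bytes) -> bytes:
    """Recover the payload using the declared length, not a terminator search."""
    if not message.startswith(b"$"):
        raise ValueError("not a bulk string")
    header_end = message.index(b"\r\n")
    length = int(message[1:header_end])
    start = header_end + 2
    payload = message[start:start + length]
    if len(payload) != length or message[start + length:start + length + 2] != b"\r\n":
        raise ValueError("truncated or malformed bulk string")
    return payload


def c_style_view(payload: bytes) -> bytes:
    """What code that stops at the first zero byte would see."""
    return payload.split(b"\x00", 1)[0]


payload = b"foo\x00bar\r\nbaz"           # contains both a zero byte and a CRLF
wire = encode_bulk(payload)
round_trip = decode_bulk(wire)           # identical to payload
truncated = c_style_view(payload)        # only b"foo"
```

Because the length is sent up front, the embedded '\0' and "\r\n" survive the round trip, while the terminator-based view loses everything after the first zero byte.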
