libspotify and const char * lifetimes / encoding - spotify

Are the various libspotify APIs that return const char* returning caller owned strings or callee owned strings?
The normal convention, as far as I know, is that const char* means the callee owns it and the caller can use it but not necessarily rely on its lifetime and is not expected to free it.
Is this the pattern Spotify follows?
Also, I saw some mention in the api.h file that the strings are UTF-8 encoded. I assume this is true for all APIs, not just the one or two that explicitly mention it?

1) const char * returns are owned by libSpotify unless stated otherwise. You don't need to free() them, and if you want them to stick around you should copy them - for example, a playlist name's const char * will be freed by libSpotify when the playlist's name changes. The "Add your own locks" section of the libSpotify FAQ discusses this a little bit.
2) All strings are UTF-8.
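The copy-if-you-keep-it rule from point 1 can be sketched in C roughly as follows. This is a minimal illustration, not libspotify code: `fake_playlist_name` and `library_buffer` are hypothetical stand-ins for any call that returns a library-owned const char *, and `dup_borrowed` is an illustrative helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for storage owned by the library. The library
 * is free to overwrite or release it later, e.g. on a rename. */
static char library_buffer[64] = "Road Trip";

/* Hypothetical stand-in for an API call returning a borrowed string. */
const char *fake_playlist_name(void) {
    return library_buffer;
}

/* Copy a borrowed, NUL-terminated string into memory the caller owns.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *dup_borrowed(const char *borrowed) {
    assert(borrowed != NULL);
    size_t n = strlen(borrowed) + 1;   /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, borrowed, n);
    }
    return copy;
}
```

The copy stays valid after the library changes or frees its own buffer, which is the situation described for a playlist rename. The caller releases the copy with free() when done.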

Related

Are extended attributes name and value guaranteed to be UTF-8 encoded

I am trying to implement Rusty wrappers for the extended attribute syscalls. If names and values are guaranteed to be UTF-8 encoded, I can use the type String; otherwise I have to use OsString.
I googled a lot, but only found these two pages:
freedesktop: CommonExtendedAttributes says that:
Attribute strings should be in UTF-8 encoding.
macOS man page for setxattr(2) says that: The extended attribute names are simple NULL-terminated UTF-8 strings
This seems to tell us that the name is guaranteed to be UTF-8 encoded on macOS.
I would like to know about as many platforms as possible, since I am trying to cover them all in my implementation.
No, in Linux they are absolutely not guaranteed to be in UTF-8. Attribute values are not even guaranteed to be strings at all. They are just arrays of bytes with no constraints on them.
int setxattr(const char *path, const char *name,
const void *value, size_t size, int flags);
const void *value, size_t size is not a good way to pass a string to a function; it is how you pass a block of bytes. const char *name is a string parameter, and attribute names are indeed strings, but they are null-terminated byte strings with no guaranteed encoding.
Freedesktop recommendations are just that, recommendations. They don't prevent anyone from creating any attribute they want.
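To make the "array of bytes" point concrete, here is a hedged, Linux-only sketch of reading an attribute value as a buffer plus length. `read_xattr_bytes` is a hypothetical helper name; the only system call used is getxattr from <sys/xattr.h>, whose signature differs on macOS.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read an extended attribute value as raw bytes (Linux).
 * On success returns 0 and stores a malloc'd buffer and its length.
 * The buffer is NOT NUL-terminated and may hold any byte values.
 * On failure returns -1 (errno is set by getxattr or malloc) and
 * leaves *out and *out_len untouched. */
int read_xattr_bytes(const char *path, const char *name,
                     unsigned char **out, size_t *out_len) {
    assert(out != NULL && out_len != NULL);

    /* A size of 0 asks the kernel how large the value currently is. */
    ssize_t need = getxattr(path, name, NULL, 0);
    if (need < 0) {
        return -1;
    }

    /* Allocate at least one byte so an empty value still gets a buffer. */
    unsigned char *buf = malloc(need > 0 ? (size_t)need : 1);
    if (buf == NULL) {
        return -1;
    }

    /* The value can change between the two calls; if it grew, this
     * call fails with ERANGE and the caller may simply retry. */
    ssize_t got = getxattr(path, name, buf, (size_t)need);
    if (got < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    *out_len = (size_t)got;
    return 0;
}
```

A wrapper in another language would expose the result as a byte vector rather than a text string, and would decode it as UTF-8 only when the caller asks for that.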

How do I cast the struct to a char pointer?

Effectively, I want to make an SPI interface where I can change bits 18-22 and bits 1-16 separately (I want a one-hot address on bits 1-16 and a binary-coded decimal on bits 18-22). Here's how I intend to implement the struct:
struct spi_out
{
    unsigned int BCDADDR : 4;
    unsigned int OHADDR : 16;
    /// Some other spi bit addresses making up the rest of the 3 bytes
};
So here's my problem:
I want to be able to access the BCD address and set it directly, e.g. spi_out.bcd = 5 // in order to access the 6th cell. But I also want to use an operator function to format the bits the way I need them, since I need the fields in the order I declared them. I can't figure out a simple way of doing this: I don't want to put a LUT inside an operator function, but I need to be able to cast the string of bits to a char pointer so that the 3 bytes of information can be fed to the hardware abstraction function HAL_SPI_Transmit(). I know the data is kept as 3 bytes, so I don't see why I can't access them as such.
Okay, so I have come to appreciate that my question was worded in an annoyingly confusing way, but I have found an answer to my own question: use the union keyword. A union lets me define a type that can be treated either as a struct, to access the individual bit-fields, or as an array of 4 chars. I did not realize this existed, but here is the Stack Overflow question where I found my answer:
accessing bit fields in structure
sorry guys
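A minimal sketch of the union approach in C follows. The field widths and the names `spi_fields`, `spi_word` and `spi_word_make` are assumptions made for illustration, not the asker's actual layout. Bit-field ordering and padding are implementation-defined, so the resulting byte image has to be checked against the target compiler and the device's expected wire format. Reading the bytes member after writing the bit-fields is permitted in C; C++ treats this differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed field widths, for illustration only. */
struct spi_fields {
    unsigned int ohaddr  : 16;  /* one-hot address */
    unsigned int bcdaddr : 5;   /* binary-coded address */
    unsigned int other   : 3;   /* placeholder for the remaining bits */
};

/* The same storage viewed either as named bit-fields or as raw bytes. */
union spi_word {
    struct spi_fields bits;
    uint8_t bytes[sizeof(struct spi_fields)];
};

/* Build a zeroed word with the two address fields set. */
union spi_word spi_word_make(unsigned ohaddr, unsigned bcdaddr) {
    /* Reject values that do not fit the assumed field widths. */
    assert(ohaddr <= 0xFFFFu && bcdaddr <= 0x1Fu);

    union spi_word w;
    memset(&w, 0, sizeof w);    /* clear padding and unused fields */
    w.bits.ohaddr = ohaddr;
    w.bits.bcdaddr = bcdaddr;
    return w;
}
```

After building a word, w.bytes is a plain byte array that can be handed to a transmit routine, while w.bits.bcdaddr and w.bits.ohaddr remain individually assignable. If the byte order on the wire must be fixed regardless of compiler, packing with explicit shifts and masks is the more portable alternative.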

Why do some struct types let us set members that can only be a certain value?

I was reading up on some Vulkan struct types. There are many examples, but the one I will use is VkInstanceCreateInfo. The documentation states:
The VkInstanceCreateInfo structure is defined as:
typedef struct VkInstanceCreateInfo {
    VkStructureType             sType;
    const void*                 pNext;
    VkInstanceCreateFlags       flags;
    const VkApplicationInfo*    pApplicationInfo;
    uint32_t                    enabledLayerCount;
    const char* const*          ppEnabledLayerNames;
    uint32_t                    enabledExtensionCount;
    const char* const*          ppEnabledExtensionNames;
} VkInstanceCreateInfo;
Then below in the options we see:
sType is the type of this structure
sType must be VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Note: I realise this is not something specific to the Vulkan API.
Update: I'm not just talking specifically about Vulkan, but about all parameters that can only be a certain value.
The design allows structures to be chained together so that extensions can create additional parameters to existing calls without interfering with the original API structures and without interfering with each other.
Nearly every struct in Vulkan has sType as its first member and pNext as its second member. That means that if you have a void* and all you know is that it is some kind of Vulkan API struct, you can safely read the first 32 bits as a VkStructureType, and then read the pNext pointer that follows (32 or 64 bits wide, depending on the platform) to find out whether there's another structure in the chain.
So for instance, there's a VkMemoryAllocateInfo structure for allocating memory that has (aside from sType and pNext) the size of the allocation and the memory type index it should come from. But what if I want to use the "dedicated allocation" extension? Then I also need to fill out a VkMemoryDedicatedAllocateInfo structure with extra information. But I still need to call the same vkAllocateMemory function that only takes a VkMemoryAllocateInfo... so where do I put the VkMemoryDedicatedAllocateInfo structure I filled out? I put a pointer to it in the pNext field of VkMemoryAllocateInfo.
Maybe I also want to share this memory with some OpenGL code. There's an extension that lets you do that, but you need to fill out a VkExportMemoryAllocateInfo structure and pass it in during the allocation as well. Well, I can do that by putting it in the pNext field of my VkMemoryDedicatedAllocateInfo structure. I can create a chain of structures like that as long as I want.
Here's the really important part. Since all structures have sType as their first field, an extension can navigate along this chain of structures and find the ones it cares about without knowing anything about the structures other than that they always start with sType and pNext.
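The chain walk described above can be sketched in plain C without the Vulkan headers. Everything here is a stand-in: the `StructType` tags, `BaseHeader`, the three info structs and `find_in_chain` are illustrative names, not Vulkan definitions. The sketch embeds a shared header as the first member of each struct, whereas the real headers repeat the sType and pNext fields in every struct.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in tags that play the role of VkStructureType values. */
typedef enum {
    TYPE_ALLOC_INFO = 1,
    TYPE_DEDICATED_INFO = 2,
    TYPE_EXPORT_INFO = 3
} StructType;

/* Common prefix shared by every struct that can appear in a chain. */
typedef struct BaseHeader {
    StructType  sType;
    const void *pNext;
} BaseHeader;

/* Illustrative payload structs; each starts with the common header. */
typedef struct { BaseHeader base; unsigned long long size; unsigned typeIndex; } AllocInfo;
typedef struct { BaseHeader base; int imageHandle; } DedicatedInfo;
typedef struct { BaseHeader base; unsigned handleTypes; } ExportInfo;

/* Walk the chain starting at head and return the first struct whose
 * tag matches, or NULL. Only the header is inspected, so the walker
 * needs no knowledge of the payload types it skips over. */
const void *find_in_chain(const void *head, StructType wanted) {
    assert(wanted >= TYPE_ALLOC_INFO);
    for (const BaseHeader *p = head; p != NULL; p = p->pNext) {
        if (p->sType == wanted) {
            return p;
        }
    }
    return NULL;
}
```

An extension that only cares about ExportInfo would call find_in_chain on whatever struct it was handed and cast the result once the tag has confirmed the type.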
All of this means that Vulkan can be extended in ways that alter the behavior of existing functions, but without changing the function itself, or the structures that are passed to it.
You might ask why all of the core structures have sType and pNext, even though you're passing them to functions with typed pointers, rather than void pointers. The reason is consistency, and because you never know when an existing structure might be needed as part of the chain for some new extension.
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Because C isn't C++. There's no way to declare a structure in C and say that this portion of the structure will always have this value. In C++ you can, by declaring something as const and providing the initial default value. In fact, one of the things I like about the Vulkan C++ bindings is that you can basically forget about sType forever. If you're using extensions you still need to populate pNext as appropriate.
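Since C has no default member values, one common workaround is a small helper that returns a struct with the tag already filled in. The sketch below uses hypothetical stand-in types (`DemoStructType`, `DemoCreateInfo`, `demo_create_info_init`); it is not something the Vulkan C headers provide.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a structure-type tag. */
typedef enum { DEMO_TYPE_INSTANCE_CREATE_INFO = 1 } DemoStructType;

/* Stand-in for a tagged, chainable create-info struct. */
typedef struct {
    DemoStructType sType;
    const void    *pNext;
    unsigned       flags;
} DemoCreateInfo;

/* Return a struct with the tag preset. The designated initializer
 * zero-initializes every member that is not named, so pNext is NULL
 * and flags is 0. */
static inline DemoCreateInfo demo_create_info_init(void) {
    DemoCreateInfo info = { .sType = DEMO_TYPE_INSTANCE_CREATE_INFO };
    assert(info.pNext == NULL);
    return info;
}
```

Nothing stops a caller from overwriting sType afterwards, which is the difference from the C++ approach: the helper is a convention, not a guarantee enforced by the type.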

Is an explicit NUL-byte necessary at the end of a bytearray for cython to be able to convert it to a null-terminated C-string

When converting a bytearray-object (or a bytes-object for that matter) to a C-string, the cython-documentation recommends using the following:
cdef char * cstr = py_bytearray
There is no overhead, as cstr points to the buffer of the bytearray-object.
However, C-strings are null-terminated and thus in order to be able to pass cstr to a C-function it must also be null-terminated. The cython-documentation doesn't provide any information, whether the resulting C-strings are null-terminated.
It is possible to add a NUL-byte explicitly to the bytearray-object, e.g. by using b'text\x00' instead of just b'text'. Yet this is cumbersome and easy to forget, and there is at least experimental evidence that the explicit NUL-byte is not needed:
%%cython
from libc.stdio cimport printf
def printit(py_bytearray):
    cdef char *ptr = py_bytearray
    printf("%s\n", ptr)
And now
printit(bytearray(b'text'))
prints the desired "text" to stdout (which, in the case of an IPython notebook, is obviously not the output shown in the browser).
But is this a lucky coincidence or is there a guarantee, that the buffer of a bytearray-object (or a bytes-object) is null-terminated?
I think it's safe (at least in Python 3), however I'd be a bit wary.
Cython uses the C-API function PyByteArray_AsString. The Python3 documentation for it says "The returned array always has an extra null byte appended." The Python2 version does not have that note so it's difficult to be sure if it's safe.
Practically speaking, I think Python deals with this by always over-allocating bytearrays by one byte and NUL-terminating them (see source code for one example of where this is done).
The only reason to be a bit cautious is that it's perfectly acceptable for bytearrays (and Python strings for that matter) to contain a 0 byte within the string, so it isn't a good indicator of where the end is. Therefore, you should really be using their len anyway. (This is a weak argument though, especially since you're probably the one initializing them, so you know if this should be true)
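The embedded-zero caveat is easy to see on the C side. The sketch below uses a hypothetical helper, `has_embedded_nul`, that checks a buffer of known length before it is handed to an API that expects a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first len bytes of buf contain a zero byte, meaning
 * strlen() and printf("%s") would stop before the real end of the data.
 * Return 0 otherwise. */
int has_embedded_nul(const char *buf, size_t len) {
    assert(buf != NULL);
    return memchr(buf, '\0', len) != NULL;
}
```

For a 5-byte payload such as "ab\0cd", strlen reports 2 while the real length is 5, so any C function that relies on the terminator sees only "ab". Passing the length alongside the pointer avoids the ambiguity.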
(My initial version of this answer had something about _PyByteArray_empty_string. @ead pointed out in the comments that I was mistaken about this, so it has been edited out.)

How can I get a Path from a raw C string (CStr or *const u8)?

What's the most direct way to use a C string as Rust's Path?
I've got const char * from FFI and need to use it as a filesystem path in Rust.
I'd rather not enforce UTF-8 on the path, so converting through str/String is undesirable.
It should work on Windows at least for ASCII paths.
To clarify: I'm just replacing an existing C implementation that passes the path to fopen with a Rust stdlib implementation. It's not my problem whether it's a valid path or encoded properly for a given filesystem, as long as it's not worse than fopen (and I know fopen basically doesn't work on Windows).
Here's what I've learned:
Path/OsStr always use WTF-8 on Windows, and are an encoding-ignorant bag of bytes on Unix.
They never ever store any paths using any "wide" encoding like UTF-16 or UCS-2. The Windows-only masquerade of OsStr is to hide the WTF-8 encoding, nothing more.
It is extremely unlikely to ever change, because the standard library API supports creation of Path and OsStr from UTF-8 &str without any allocation or mutation of memory (i.e. as_ref() is supported, and its strict API doesn't leave room to implement it as anything other than a pointer cast).
Unix-only zero-copy version (it doesn't even depend on any implementation details):
use std::ffi::{CStr, OsStr};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
// SAFETY: the pointer must be non-null and point to a valid NUL-terminated string.
let slice = unsafe { CStr::from_ptr(c_null_terminated_string_ptr_here) };
let osstr = OsStr::from_bytes(slice.to_bytes());
let path: &Path = osstr.as_ref();
On Windows, converting only valid UTF-8 is the best Rust can do without a charade of creating WTF-8 OsString from code units:
…
let str = ::std::str::from_utf8(slice.to_bytes()).expect("keep your surrogates paired");
let path: &Path = str.as_ref();
Safely and portably? Insofar as I'm aware, there isn't a way. My advice is to demand UTF-8 and just pray it never breaks.
The problem is that the only thing you can really say about a "C string" is that it's NUL-terminated. You can't really say anything meaningful about how it's encoded. At least, not with any real certainty.
Unsafely and/or non-portably? If you're running on Linux (and possibly other modern *NIXen), you can maybe use OsStrExt to do the conversion. This only works assuming the C string was a valid path in the first place. If it came from some string processing code that wasn't using the same encoding as the filesystem (which these days is generally "arbitrary bytes that look like UTF-8 but might not be")... well, you'll have to convert it yourself, first.
On Windows? Hahahaha. This depends on where the string came from. C strings embedded in an executable can be in a variety of encodings depending on how the code was compiled. If it came from the OS itself, it could be in one of two different encodings: the thread's OEM codepage, or the thread's ANSI codepage. I never worked out how to check which it's set to. If it came from the console, it would be in whatever the console's input encoding was set to when you received it... assuming it wasn't piped in from something else that was using a different encoding (hi there, PowerShell!). All of the above require you to roll your own transcoding code, since Rust itself avoids this by never, ever using non-Unicode APIs on Windows.
Oh, and don't forget that there is no 8-bit encoding that can properly store Windows paths, since Windows paths are "arbitrary 16-bit words that look like UTF-16 but might not be". [1]
... so, like I said: demand UTF-8 and just pray it never breaks, because trying to do it "correctly" leads to madness.
[1]: I should clarify: there is such an encoding: WTF-8, which is what Rust uses for OsStr and OsString on Windows. The catch is that nothing else on Windows uses this, so it's never going to be how a C string is encoded.
