A common problem Substrate developers might run into: developing a custom pallet that stores a mapping in storage using common types such as String. For example:
#[derive(Encode, Decode, Clone, Default, RuntimeDebug)]
pub struct ClusterMetadata {
    ip_address: String,
    namespace: String,
    whitelisted_ips: String,
}
On building the runtime, you get this error for every String:
   |
21 |     ip_address: String,
   |                 ^^^^^^ not found in this scope
Why isn't String in scope? And what about other std Rust types?
The error here is not really related to no_std. You probably just need to import the String type, and then you will see the real errors that come from using strings in the runtime.
The real issue you will run into is that String is not encodable by Parity SCALE Codec, and encodability is a requirement for any storage item (and for most other types you want to use) in the runtime.
So the question is "Why does SCALE not encode String"?
This is by choice. In general, String is a surprisingly complex type. The Rust book spends a whole section on the complexities of the type.
As such, it can easily become a footgun in the runtime environment, where people may use Strings incorrectly.
Furthermore, it is generally bad practice to store Strings in runtime storage. I think we can easily agree that minimizing storage usage in the runtime is a best practice, so you should only put into storage the items you need to reach consensus and perform state transitions in your runtime. Most often, String data would be used for metadata, and storing that kind of data on-chain is not best practice.
If you look more closely at Substrate, you will find that we break this best practice more than once, but this is a decision we explicitly make, having the information at hand to be able to correctly evaluate the cost/benefit.
All of this combined is why Strings are not treated as a first class object in the runtime. Instead, we ask users to encode strings into bytes, and then work with that byte array instead.
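Here is a minimal sketch of that approach. It assumes std-only Rust so it runs outside a runtime. The struct from the question holds Vec<u8> fields, and the caller fills them from strings.

In an actual pallet you would keep the codec derives (Encode, Decode, etc.) from the question, and SCALE handles Vec<u8> out of the box. Those derives are left out here. The from_strs constructor is a hypothetical helper.

```rust
// Store text as raw UTF-8 bytes instead of String. In a pallet you would
// also derive the codec traits (Encode, Decode, ...) on this struct; they
// are omitted so this sketch compiles with std alone.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct ClusterMetadata {
    ip_address: Vec<u8>,
    namespace: Vec<u8>,
    whitelisted_ips: Vec<u8>,
}

impl ClusterMetadata {
    // Hypothetical constructor: strings are turned into bytes before
    // they reach the runtime.
    pub fn from_strs(ip_address: &str, namespace: &str, whitelisted_ips: &str) -> Self {
        ClusterMetadata {
            ip_address: ip_address.as_bytes().to_vec(),
            namespace: namespace.as_bytes().to_vec(),
            whitelisted_ips: whitelisted_ips.as_bytes().to_vec(),
        }
    }
}

fn main() {
    let meta = ClusterMetadata::from_strs("10.0.0.1", "prod", "10.0.0.2,10.0.0.3");
    // Decoding back to text is typically only needed off-chain (e.g. in a UI);
    // from_utf8 returns an error if the bytes are not valid UTF-8.
    let ip = String::from_utf8(meta.ip_address.clone()).expect("valid UTF-8");
    println!(
        "ip_address: {} bytes ({}), namespace: {} bytes, whitelist: {} bytes",
        meta.ip_address.len(),
        ip,
        meta.namespace.len(),
        meta.whitelisted_ips.len()
    );
}
```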
I have a smart contract written in Rust. In it there's a function that accepts an Option<u128> as one of its arguments.
How can I pass that argument from the client side, e.g. from JS, Python, or any other language, or even via curl? I can't find a description of this anywhere.
The representation of Option<T> in JSON/JS is T | null.
Assume you're using near-sdk, which serializes the parameters as a JSON object or sequence, and that the only parameter is named value. Then this Option<u128> would be represented like this:
{"value":80}
or
{"value":null}
or
[9]
or
[null]
How you produce that JSON in each language is up to you.
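Keep in mind, though, that many languages cannot deserialize 128-bit integers into native numbers, so you will lose precision or overflow. For example, a JS Number can only represent integers exactly up to 2^53-1 (Number.MAX_SAFE_INTEGER). 64-bit integer types top out at 2^64-1 (unsigned) or 2^63-1 (signed), while a u128 can reach 2^128-1. Python is an exception: its built-in int is arbitrary-precision, so its json module parses large integers without loss. In the other languages, values in this larger range will not survive a round trip.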
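To see the precision problem concretely: JS numbers are IEEE-754 doubles, the same representation as Rust's f64. Casting a u128 through f64 therefore approximates what a JS client does when it parses a large JSON number. This is a small Rust sketch, and survives_js_number is a hypothetical name.

```rust
// JavaScript numbers are IEEE-754 doubles, which Rust calls f64.
// Casting a u128 to f64 and back models a JS client parsing the value.
fn survives_js_number(n: u128) -> bool {
    (n as f64) as u128 == n
}

fn main() {
    let max_safe = (1u128 << 53) - 1; // Number.MAX_SAFE_INTEGER
    let too_big = (1u128 << 53) + 1;

    println!("{} survives: {}", max_safe, survives_js_number(max_safe));
    println!("{} survives: {}", too_big, survives_js_number(too_big));
    // The cast rounds to the nearest representable double.
    println!("{} becomes {}", too_big, too_big as f64 as u128);
}
```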
The common workaround for integers this large is to use a big-integer implementation in each language. These are commonly serialized as strings rather than numbers, a convention used widely in the ecosystem and in NEAR tooling. Just check which format each side uses and pick whatever fits your use case.
I'd recommend using big integers on the client side, serializing them as strings, and using a wrapper type like U128 from near-sdk in the contract. That said, nothing at the JSON level prevents you from serializing and deserializing these big integers as numbers if you'd prefer to do that.
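Here is a std-only Rust sketch of the string convention, assuming a method whose single parameter is named value. encode_value_arg and decode_u128 are hypothetical helpers. Real client or contract code would use serde_json or near-sdk's U128 rather than hand-built JSON.

```rust
// Build the JSON argument for a method whose only parameter is
// `value: Option<...>`, encoding the number as a string (the U128 convention).
fn encode_value_arg(value: Option<u128>) -> String {
    match value {
        Some(n) => format!(r#"{{"value":"{}"}}"#, n),
        None => r#"{"value":null}"#.to_string(),
    }
}

// Parse the string form back; u128's FromStr covers the full 0..=u128::MAX range.
fn decode_u128(s: &str) -> Result<u128, std::num::ParseIntError> {
    s.parse::<u128>()
}

fn main() {
    println!("{}", encode_value_arg(Some(u128::MAX)));
    println!("{}", encode_value_arg(None));
    assert_eq!(
        decode_u128("340282366920938463463374607431768211455"),
        Ok(u128::MAX)
    );
}
```

Sending the number as a string sidesteps the precision limits of client-side number types entirely. Each side only needs a big-integer parser.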
Edit: This was assuming that the parameters are serialized as JSON. If you decided to use a different serialization protocol, let me know in the comments, and I can explain what the representation looks like for Option types.
I am having a hard time understanding the enum custom type in Rust. Broadly, The Book describes an enum as a custom data type that has different variants. How should I think about these variants? Are they sub-types, or are they specific values that the enum type can take?
Looking online I see examples like:
enum Day {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}
In the case above, the variants are the possible values of the Day type. But in The Book we see examples like:
struct Ipv4Addr {
    // --snip--
}

struct Ipv6Addr {
    // --snip--
}

enum IpAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}
To me, it seems like IpAddr::V4 is a sub-type rather than a specific value, but a variable is really of type IpAddr and can have a value IpAddr::V4.
Does it make sense to make the distinction I mention above? What is the correct interpretation of an Enum?
Rust is a statically and strongly typed language. It is also very fast. In many cases it is more efficient to use the stack instead of the heap. However, when you use the stack, Rust must know the size of the data at compile time. That's not a problem for simple fixed-size types like i16, u128, etc. It also isn't a problem for tuples, structs, or arrays, because they have a fixed data structure with a known size.
However, sometimes you need to use different data types depending on some runtime condition or state. In languages like Java, .NET, JS, Python, PHP, etc., you will be using the heap in such situations (one way or another). In Rust you also have ways to use the heap, but that's often suboptimal. Enums in Rust let you give each variant its own fields with custom data types. That is very flexible and, in many cases, faster than solutions that use the heap.
Note that in languages like Java, you would often end up creating a hierarchy of classes to achieve what you can do in Rust with enums. Both approaches have their pros and cons. But if you come from a language like Java, you should keep that in mind.
A good example might be to think about how you would represent JSON in your language of choice. If the JSON has a fixed data structure, you can use standard structs in Rust, classes in Java, etc. But what if you don't know the structure of a JSON object in advance? In most modern languages the parser would create some sort of (Linked)HashMap that contains strings for the keys and object instances (integers, strings, lists, maps, etc.) for the values. Compare that to serde_json's Value enum. Another example is the mysql crate's Value enum. It is not for JSON, but it is conceptually similar in that you can read data of different types.
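Here is a simplified sketch of the idea behind serde_json's Value: one enum whose variants cover every JSON shape, with each variant carrying its own data. JsonValue and render are hypothetical names. A few simplifications apply:

- Objects are kept as a Vec of pairs instead of a map.
- Strings are quoted with Rust's Debug formatting rather than full JSON escaping.
- The String and Vec contents still live on the heap.

The point is only that a single type can hold any JSON value.

```rust
// A hypothetical, simplified stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

// Turn a JsonValue back into compact JSON-like text by matching on the variant.
fn render(v: &JsonValue) -> String {
    match v {
        JsonValue::Null => "null".to_string(),
        JsonValue::Bool(b) => b.to_string(),
        JsonValue::Number(n) => n.to_string(),
        // Debug formatting adds quotes and escapes; close to, but not exactly, JSON escaping.
        JsonValue::Str(s) => format!("{:?}", s),
        JsonValue::Array(items) => {
            let parts: Vec<String> = items.iter().map(render).collect();
            format!("[{}]", parts.join(","))
        }
        JsonValue::Object(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, val)| format!("{:?}:{}", k, render(val)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

fn main() {
    let doc = JsonValue::Object(vec![
        ("name".to_string(), JsonValue::Str("Ann".to_string())),
        (
            "tags".to_string(),
            JsonValue::Array(vec![JsonValue::Bool(true), JsonValue::Null, JsonValue::Number(1.5)]),
        ),
    ]);
    println!("{}", render(&doc)); // {"name":"Ann","tags":[true,null,1.5]}
}
```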
It might also be useful to understand how Rust allocates memory for enums. At compile time, it determines which of the variants needs the most memory. Say variant A needs 12 bytes, variant B needs 16 bytes, and variant C needs 4 bytes. Rust will then reserve 16 bytes for the associated data of every instance of the enum, because that's the minimum size all variants fit in. It also needs room for the discriminant that records which variant is active, unless the discriminant can be packed into unused bit patterns of the data.
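A quick way to check this is std::mem::size_of. The sketch below mirrors the 12/16/4-byte example with byte arrays, and Payload is a hypothetical name.

The exact layout of a default (repr(Rust)) enum is not guaranteed, so the sketch only asserts a lower bound. The Option<Box<u8>> case shows the documented null-pointer optimization, where the discriminant fits into a niche and costs no extra space.

```rust
use std::mem::size_of;

// Variants carrying 12, 16 and 4 bytes of data, like A, B and C in the text.
#[allow(dead_code)]
enum Payload {
    A([u8; 12]),
    B([u8; 16]),
    C([u8; 4]),
}

fn main() {
    // Every Payload reserves room for its largest variant (16 bytes) plus a
    // discriminant; repr(Rust) layout is unspecified, so only a bound is checked.
    println!("size_of::<Payload>() = {}", size_of::<Payload>());
    assert!(size_of::<Payload>() >= 16);

    // Box is never null, so Option<Box<u8>> stores None as the null pointer
    // and needs no separate discriminant (a documented guarantee).
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
}
```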
It is reasonable to see Day as a C-style enum. It lists all possible values of the type and has a numeric discriminant to identify each.
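For fieldless enums like Day, the discriminants are assigned implicitly starting at 0, and casting with as exposes them. A small sketch:

```rust
// Fieldless ("C-style") enum: implicit discriminants start at 0.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Day {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

fn main() {
    let today = Day::Wednesday;
    // `as` exposes the numeric discriminant of a fieldless enum.
    println!("{:?} = {}", today, today as u8); // Wednesday = 2
    println!("{:?} = {}", Day::Sunday, Day::Sunday as u8); // Sunday = 6
}
```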
The IpAddr type is a tagged union: each value is a tag (a number, as in the C-style enum) followed by the data you give in parentheses. IpAddr::V4 is not really a subtype; it is a variant of IpAddr.
Once identified by its tag (which match and such do for you) you can use the values inside.
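Here is a sketch of the tagged-union view. It uses simplified, hypothetical Ipv4Addr and Ipv6Addr structs in place of the --snip-- bodies; std::net has real types with those names. The match inspects the tag and binds the data stored in that variant.

```rust
// Hypothetical, simplified address structs (not std::net's types).
struct Ipv4Addr {
    octets: [u8; 4],
}

struct Ipv6Addr {
    segments: [u16; 8],
}

enum IpAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}

fn describe(addr: &IpAddr) -> String {
    // match checks the tag, then gives access to that variant's data.
    match addr {
        IpAddr::V4(v4) => {
            let [a, b, c, d] = v4.octets;
            format!("V4 {}.{}.{}.{}", a, b, c, d)
        }
        IpAddr::V6(v6) => format!("V6 starting with {:x}", v6.segments[0]),
    }
}

fn main() {
    let home = IpAddr::V4(Ipv4Addr { octets: [127, 0, 0, 1] });
    let link_local = IpAddr::V6(Ipv6Addr { segments: [0xfe80, 0, 0, 0, 0, 0, 0, 1] });
    // Both values have the same type, IpAddr; only the tag and payload differ.
    println!("{}", describe(&home));
    println!("{}", describe(&link_local));
}
```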
I am working on a new type of database in Go. One of the things I would like to do is have a distributed disk so that I can distribute queries over multiple machines (think Pi-type architectures). This means building my own structures on raw disk.
My challenge is that I can't find a Go package that will let me write N bytes from a pointer to a structure. All the IO packages limit access to []byte slices.
That's nice for protection, but if I have to buffer everything through a byte array via some form of encoding, it will slow down access to a specific object.
Does anyone have an idea how to do raw IO? Or am I going to have to handle GOBs as my unit of IO and suffer the penalty for encoding/decoding?
Big warning first: don't do it. It is neither safe nor portable.
For a given struct, you can determine its in-memory size (unsafe.Sizeof gives it as a compile-time constant), then unsafely cast the struct to a byte array using the unsafe package.
e.g.: (*[unsafe.Sizeof(mystruct)]byte)(unsafe.Pointer(&mystruct))[:]
This will give you something C-ish with absolutely no safety guarantees or portability.
I'll quote the Go spec:
A package using unsafe must be vetted manually for type safety and may not be portable.
You can find a lot more details in this Go and Memory layout post, including all the steps you need to unsafely treat structs as just bytes.
Overall, it's fascinating to examine how Go functions on a low level, but this is absolutely the wrong thing to do in your case. Any real data infrastructure will need storage logic way more complicated than just dumping in-memory structs to disk anyway.
In general, you cannot do raw IO of a Go struct (i.e. memdump). This is because many things in Go contain pointers, and the actual data is not contiguous in memory.
For example, a struct like this:
type Person struct {
	Name string
}
contains a string, which in turn contains a pointer to the bytes of the string. A raw memdump would only dump the pointer.
The solution is serialization. This is never free, although some implementations do a pretty good job.
The closest to what you are describing is something like go-memdump, but I wouldn't recommend it for production.
Otherwise, I recommend looking at a performant serialization technique. (Go's gob encoding is not the best.)
...Or am I going to have to handle GOBs as my unit of IO and suffer the penalty for encoding/decoding?
Just use GOBs.
Premature optimization is the root of all evil.
The Rust documentation suggests using &str whenever possible and String only when it's not. Is that always the case? For example, I'm building a client for the REST API of a web service, and I have an entity:
struct User<'a> {
    id: &'a str,   // or String?
    name: &'a str, // or String?
    // ...
}
So is it better to use &str or String in general and in this particular case?
In Rust everything related to a decision whether to use a reference or not stems from the basic concepts of ownership and borrowing and their applications. When you design your data structures, there is no clean rule: it wholly depends on your exact use case.
For example, if your data structure is intended to provide a view into some other data structure (like iterators do), then it makes sense to use references and slices as its fields. If, on the other hand, your structure is a DTO, it is more natural to make it own all of its data.
I believe the suggestion to use &str where possible applies more to function definitions, and there it is indeed natural. If your functions accept &str instead of String, callers can use them easily and at no cost whether they have a String or a &str. If your functions accept String, a caller who has a &str is forced to allocate a new String. Even a caller who has a String but doesn't want to give up ownership still needs to clone it.
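Here is a small sketch of that trade-off; greet and greet_owned are hypothetical functions. The &str version accepts both kinds of string for free, while the String version makes callers allocate or clone.

```rust
// Accepting &str: callers can pass a &String (deref coercion) or a literal,
// with no allocation or clone on their side.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

// Accepting String: callers holding a &str must allocate, and callers who
// want to keep their String must clone it.
fn greet_owned(name: String) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let owned = String::from("Ferris");
    let borrowed: &str = "Corro";

    println!("{}", greet(&owned)); // &String coerces to &str
    println!("{}", greet(borrowed));

    println!("{}", greet_owned(borrowed.to_string())); // allocation needed
    println!("{}", greet_owned(owned.clone())); // clone needed to keep `owned`
    println!("still have {}", owned);
}
```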
But of course there are exceptions: sometimes you do want to transfer ownership into a function. Some types and traits, like Option or Read, provide a way to turn an owned value into a borrowed one (Option::as_ref() and Read::by_ref()), which is sometimes useful. There is also the Cow type, which "abstracts" over ownership, letting you pass a borrowed value that will be cloned only if necessary. Sometimes there is a trait that abstracts over various types, owned as well as borrowed, and lets the caller pass values of different types. In current Rust this is usually AsRef<str> or Into<String>; in pre-1.0 Rust, BytesContainer played this role.
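Here is a sketch of two of those tools. Cow lets a function return borrowed data when nothing changes and owned data only when it must allocate. Option::as_ref lets you look inside an owned Option without moving it. tabs_to_spaces is a hypothetical helper.

```rust
use std::borrow::Cow;

// Hypothetical helper: returns the input unchanged (borrowed, no allocation)
// unless it contains tabs, in which case it builds a new owned String.
fn tabs_to_spaces(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', " "))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = tabs_to_spaces("no tabs here");
    let fixed = tabs_to_spaces("a\tb");
    println!("{:?} / {:?}", clean, fixed);

    // Option::as_ref turns &Option<String> into Option<&String>, so we can
    // inspect the value without moving it out of `maybe_name`.
    let maybe_name: Option<String> = Some("Ferris".to_string());
    let len = maybe_name.as_ref().map(|s| s.len());
    println!("{:?} is still owned here; len = {:?}", maybe_name, len);
}
```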
What I wanted to stress, again, is that there is no fixed rule; it wholly depends on the concrete task you're working on. You should use common sense and the ownership/borrowing concepts when you architect your data structures.
In your particular case whether to use String or &str depends on what you will actually do with User objects - just "REST API client" is unfortunately too vague. It depends on your architecture. If these objects are used solely to perform an HTTP request, but the data is actually stored in some other source, then you would likely want to use &strs. If, on the other hand, User objects are used across your entire program, then it makes sense to make them own the data with Strings.
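One way to get both is sketched below. It assumes responses arrive as a simple "id,name" string, a made-up format standing in for a real response body.

- A borrowed UserRef<'a> view serves short-lived use while the buffer is alive.
- An owned User holds data that must outlive the buffer.

Both structs and to_owned_user are hypothetical.

```rust
// Owned version: suitable when User values are used across the whole program.
#[derive(Debug, Clone)]
struct User {
    id: String,
    name: String,
}

// Borrowed version: a view into data owned elsewhere (e.g. a response buffer);
// the lifetime 'a ties it to that data.
#[derive(Debug)]
struct UserRef<'a> {
    id: &'a str,
    name: &'a str,
}

impl<'a> UserRef<'a> {
    // Copy the borrowed fields into an owned User.
    fn to_owned_user(&self) -> User {
        User {
            id: self.id.to_string(),
            name: self.name.to_string(),
        }
    }
}

fn main() {
    let raw = String::from("42,Ada"); // hypothetical "id,name" payload
    let (id, name) = raw.split_once(',').expect("expected id,name");
    let view = UserRef { id, name };
    let user = view.to_owned_user();
    drop(raw); // `user` owns its strings, so it outlives `raw`; `view` could not
    println!("{:?}", user);
}
```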
I once read that immutable strings have an advantage related to memory performance.
Can anybody explain this to me? I can't find it on the Internet.
Immutability (for strings or other types) can have numerous advantages:
It makes it easier to reason about the code, since you can make assumptions about variables and arguments that you can't otherwise make.
It simplifies multithreaded programming since reading from a type that cannot change is always safe to do concurrently.
It allows for a reduction of memory usage by allowing identical values to be combined together and referenced from multiple locations. Both Java and C# perform string interning to reduce the memory cost of literal strings embedded in code.
It simplifies the design and implementation of certain algorithms (such as those employing backtracking or value-space partitioning) because previously computed state can be reused later.
Immutability is a foundational principle in many functional programming languages - it allows code to be viewed as a series of transformations from one representation to another, rather than a sequence of mutations.
Immutable strings also help avoid the temptation of using strings as buffers. Many defects in C/C++ programs relate to buffer overruns caused by using naked character arrays to compose or modify string values. Treating strings as immutable types encourages using types better suited for buffer manipulation (see StringBuilder in .NET or Java).
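The interning point can be sketched in Rust, which has no built-in string pool. A hypothetical Interner hands out Rc<str> handles so identical text shares one allocation. Sharing is only sound because a str behind a shared Rc cannot be mutated.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// A minimal string interner: identical strings share one immutable allocation.
#[derive(Default)]
struct Interner {
    pool: HashSet<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Rc<str> {
        // Rc<str>: Borrow<str>, so the set can be queried with a plain &str.
        if let Some(existing) = self.pool.get(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.pool.insert(Rc::clone(&fresh));
        fresh
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("Test");
    let b = interner.intern("Test");
    let c = interner.intern("Other");
    // a and b point at the same bytes; c has its own allocation.
    println!("a/b shared: {}", Rc::ptr_eq(&a, &b));
    println!("a/c shared: {}", Rc::ptr_eq(&a, &c));
}
```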
Consider the alternative. Java has no const qualifier. If String objects were mutable, then any method to which you pass a reference to a string could have the side-effect of modifying the string. Immutable strings eliminate the need for defensive copies, and reduce the risk of program error.
Immutable strings are cheap to copy, because you don't need to copy all the data - just copy a reference or pointer to the data.
Immutable classes of any kind are easier to work with across multiple threads; the only synchronization needed is for destruction.
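Both points show up in Rust with Arc<str>. Cloning the handle copies a pointer and bumps a reference count rather than copying the text. Because the str behind it is immutable, many threads can read it without a lock. total_len_across_threads is a hypothetical helper.

```rust
use std::sync::Arc;
use std::thread;

// Spawn `n` threads that each read the same immutable string and return
// its length; the results are summed. No Mutex is needed because nobody
// can mutate the str behind the Arc.
fn total_len_across_threads(shared: &Arc<str>, n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            // Cloning an Arc copies a pointer and bumps a reference count;
            // the string bytes themselves are not copied.
            let s = Arc::clone(shared);
            thread::spawn(move || s.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let shared: Arc<str> = Arc::from("immutable and shareable");
    println!("total: {}", total_len_across_threads(&shared, 4));
    // The threads' clones were dropped when they finished.
    println!("refs after join: {}", Arc::strong_count(&shared));
}
```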
Perhaps my answer is outdated, but someone may still find new information here.
Why Java's String is immutable, and why that is good:
you can share a string between threads and be sure none of them will change the string and confuse another thread
you don't need a lock: several threads can work with an immutable string without conflicts
if you just received a string, you can be sure no one will change its value afterwards
many duplicate strings can point to a single instance, just one copy, which saves memory (RAM)
you can take a substring without copying, by creating a pointer into the existing string's characters; this is why Java's substring operation was so fast (since Java 7u6, substring copies the characters instead)
immutable strings (objects) are much better suited for use as keys in hash tables
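The substring point can be illustrated with Rust's &str, which is a (pointer, length) view. Slicing a String copies no bytes. The borrow checker also keeps the original from being mutated while the slice is alive, which is the same guarantee immutability provides in Java. suffix_from is a hypothetical helper.

```rust
// Returns a view into `s` starting at byte offset `start`; nothing is copied.
// (Panics if `start` is out of range or not on a char boundary.)
fn suffix_from(s: &str, start: usize) -> &str {
    &s[start..]
}

fn main() {
    let text = String::from("immutable strings");
    let word = suffix_from(&text, 10);
    // Same underlying buffer: the slice's pointer is just an offset into `text`.
    let shares = word.as_ptr() == text.as_ptr().wrapping_add(10);
    println!("{} (shares buffer: {})", word, shares);
    // text.push('!'); // would not compile while `word` is still used below
    println!("{}", word.len());
}
```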
a) Imagine a string pool without immutable strings. It would not be possible at all, because in a string pool one string object/literal, e.g. "Test", is referenced by many reference variables. If any one of them changed the value, all the others would automatically be affected. For example, let's say:
String A = "Test"; String B = "Test";
Now suppose B called toUpperCase() and that changed the shared object into "TEST": A would also become "TEST", which is not desirable.
b) Another reason String is immutable in Java is so it can cache its hash code. Because it is immutable, a Java String computes its hash code once and does not recalculate it every time hashCode() is called, which makes it very fast as a HashMap key.
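Rust's str does not cache its hash, but the idea in b) can be sketched with a hypothetical FrozenStr wrapper. Since the text never changes, its hash is computed once at construction. The Hash impl then only feeds that cached u64 to the map's hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical wrapper mimicking Java's cached String.hashCode: because the
// text can never change, its hash can be computed once and reused.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FrozenStr {
    text: Box<str>,
    cached_hash: u64,
}

impl FrozenStr {
    fn new(text: &str) -> Self {
        // DefaultHasher::new() is deterministic within a program run,
        // so equal texts always get equal cached hashes.
        let mut h = DefaultHasher::new();
        text.hash(&mut h);
        FrozenStr { text: text.into(), cached_hash: h.finish() }
    }
}

impl Hash for FrozenStr {
    // Feed only the precomputed value, instead of rehashing every byte.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.cached_hash);
    }
}

fn main() {
    let key = FrozenStr::new("Test");
    let mut counts: HashMap<FrozenStr, u32> = HashMap::new();
    counts.insert(key.clone(), 1);
    // Hashing `key` for this lookup writes one cached u64 instead of the text.
    println!("{:?}", counts.get(&key));
}
```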
Think of various strings sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. The efficiency of this sharing outweighs the inefficiency of editing strings by extracting substrings and concatenating.
Fundamentally, if one object or method wishes to pass information to another, there are a few ways it can do it:
1. It may give a reference to a mutable object which contains the information, and which the recipient promises never to modify.
2. It may give a reference to an object which contains the data, but whose content it doesn't care about.
3. It may store the information into a mutable object the intended data recipient knows about (generally one supplied by that data recipient).
4. It may return a reference to an immutable object containing the information.
Of these methods, #4 is by far the easiest. In many cases, mutable objects are easier to work with than immutable ones, but there's no easy way to share with "untrusted" code the information that's in a mutable object without having to first copy the information to something else. By contrast, information held in an immutable object to which one holds a reference may easily be shared by simply sharing a copy of that reference.