Get value of nth char in string in Rust

How do I get the value of a character at position n in a string?
For example, if I had the string "Hello, world!", how would I get the value of the first character?

It's as simple as s.chars().nth(n), which returns an Option<char> (None if n is past the end).
However, beware that, as the docs say:
It’s important to remember that char represents a Unicode Scalar Value, and might not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. This functionality is not provided by Rust’s standard library, check crates.io instead.
See How to iterate over Unicode grapheme clusters in Rust?.
For the first character specifically, you can use s.chars().next().
If your string is ASCII-only, you can use as_bytes(): s.as_bytes()[n]. This gives you a u8 rather than a char and panics if n is out of bounds. I would not recommend it, as it breaks as soon as the string contains non-ASCII text (though it is faster: O(1) vs O(n)).
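A minimal sketch of the options above, in case it helps. The helper names nth_char and nth_ascii_byte are made up for this example; only chars(), nth(), next(), as_bytes() and get() come from the standard library.

```rust
/// Returns the n-th Unicode scalar value (0-based), or None if n is out of range.
/// Walks the string from the start, so it is O(n).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Returns the n-th byte (0-based), or None if n is out of range.
/// This only matches the n-th character when the string is ASCII-only.
fn nth_ascii_byte(s: &str, n: usize) -> Option<u8> {
    s.as_bytes().get(n).copied()
}

fn main() {
    let s = "Hello, world!";
    // First character: next() on the chars() iterator.
    assert_eq!(s.chars().next(), Some('H'));
    assert_eq!(nth_char(s, 7), Some('w'));
    assert_eq!(nth_char(s, 13), None); // the string has 13 chars, indices 0..=12

    // With non-ASCII text, byte indexing and char indexing disagree.
    let t = "héllo"; // 'é' takes two bytes in UTF-8
    assert_eq!(nth_char(t, 2), Some('l'));
    assert_eq!(nth_ascii_byte(t, 2), Some(0xA9)); // second byte of 'é', not 'l'
}
```

Using get(n) instead of [n] in the byte version avoids the panic on an out-of-range index, at the cost of returning an Option.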

Related

Why is Julia giving me StringIndex error?

I'm getting a StringIndex error for one particular string out of 10,000 which I am processing. I don't really know what the issue is with this string. I think it is probably a special character issue.
If I println the string then assign it to txt then pass txt to the function, I don't get an error. I am a little baffled.
I am sorry, I can't post the string as it is protected content, and even if I did, copying and pasting the string somehow removes the source of the error. Any suggestions?
Just to expand. The details of how String is represented in Julia are explained in the Julia manual.
You can use eachindex to get an iterator of valid indices into a String. It is an iterator because you cannot efficiently (i.e. in O(1) time) find the index of the i-th character in a string. However, you can use the isascii function on a String to check whether it consists only of ASCII characters (in which case byte and character indices are the same).
Also, if you need a specific character in a string, you usually need more than one character, in which case the first, last and chop functions are useful (in fact, last(first(s, n)) gives you the character at position n, although it is not the most efficient way; iterating over eachindex will allocate less).
In Julia, Strings are indexed by bytes rather than characters. You should use for c in str rather than trying to index manually.
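Rust strings are also UTF-8 and indexed by byte offset, so the same pitfall can be sketched there (this is an analogy in Rust, not Julia code, and Rust offsets are 0-based where Julia's are 1-based). char_indices plays roughly the role of eachindex, and is_ascii the role of isascii; valid_indices is a hypothetical helper name.

```rust
/// Collects the byte offsets at which a character starts,
/// i.e. the only offsets that are safe to slice at.
fn valid_indices(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}

fn main() {
    // 'a' is 1 byte, 'é' is 2 bytes, '€' is 3 bytes, 'b' is 1 byte.
    let s = "aé€b";
    assert_eq!(valid_indices(s), vec![0, 1, 3, 6]);

    // Offset 2 falls in the middle of 'é', so it is not a valid index.
    assert!(!s.is_char_boundary(2));
    assert!(s.get(2..).is_none());
    assert_eq!(s.get(3..), Some("€b"));

    // For ASCII-only strings, byte and character indices coincide.
    assert!(!s.is_ascii());
    assert!("abc".is_ascii());

    // Iterating over characters avoids manual indexing altogether.
    for c in s.chars() {
        println!("{}", c);
    }
}
```

A single string out of 10,000 failing is consistent with this picture: one multi-byte character is enough to make a computed index land inside a character.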

What is the difference between binary safe strings and binary unsafe strings?

I was reading redis manifesto[1] and it seems redis accepts only binary safe strings as keys but I don't know the difference between the two. Can anyone explain with an example?
[1] http://oldblog.antirez.com/post/redis-manifesto.html
According to Redis documentation, simple Redis strings have syntax "+redis_response\r\n" whereas bulk Redis strings have syntax "$str_len\r\nbinary_safe_string\r\n".
In other words, a binary-safe string in Redis can contain any data, from something as simple as "foo" to arbitrary binary data up to 512 MB, say a JPEG image. A binary-safe string has its length encoded in it and does not end with any particular terminator character, unlike a NUL-terminated string in C, which ends with '\0'.
HTH,
Swanand
I'm not familiar with the system in question, but the term "binary safe string" might be used either to describe certain string-storage types or to describe particular string instances. In a binary-safe string type, a string of length N may be used to encapsulate any sequence of N values in the range either 0-255 or 0-65535 (for 8- or 16-bit types, respectively). A binary-safe string instance might be one whose representation may be subdivided into uniformly-sized pieces, with each piece representing one character, as distinct from a string instance in which different characters require different amounts of storage space.
Some string types (which are not binary safe) will use variable-length representations for certain characters, and will behave oddly if asked to act upon e.g. a string which contains the code for "first half of a multi-part character" followed by something other than a "second half of multi-part character". Further, some code which works with strings will assume that the Nth character will be stored in either the Nth byte or the Nth pair of bytes, and will malfunction if given a string in which, e.g. the 8th character is stored in the 12th and 13th pairs of bytes.
Looking only briefly at the link provided, I would guess that it's saying that Redis does not expect to only work with strings that use different numbers of bytes to hold different characters, though I'm not quite clear whether it's assuming that a string type will be able to handle any possible sequence of bytes, or whether it's assuming that any string instance which it's given may be safely regarded as a sequence of bytes. I think the fundamental concepts of interest, though, are (1) some string types use variable-length encodings and others do not; (2) even in types that use variable-length encodings, a useful subset of string instances will consist only of fixed-length characters.
Binary-safe means that a string can contain any character, while a binary-unsafe string cannot; in C, for example, a string cannot contain '\0'. '\0' marks the end of a string, which means the characters before '\0' and the characters after it are treated as two different strings.
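To make the difference concrete, here is a small sketch in Rust. It assumes the bulk-string layout quoted above ("$str_len\r\nbinary_safe_string\r\n"); encode_bulk, decode_bulk and c_style_read are hypothetical helpers written for this example, not part of any Redis client.

```rust
/// Encodes a payload as "$<len>\r\n<bytes>\r\n".
/// The length prefix means the payload may contain any byte, including 0.
fn encode_bulk(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("${}\r\n", payload.len()).into_bytes();
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}

/// Decodes one frame by trusting the length prefix rather than scanning
/// for a terminator. Returns None if the frame is malformed.
fn decode_bulk(frame: &[u8]) -> Option<&[u8]> {
    if frame.first() != Some(&b'$') {
        return None;
    }
    let header_end = frame.windows(2).position(|w| w == b"\r\n")?;
    let len: usize = std::str::from_utf8(&frame[1..header_end]).ok()?.parse().ok()?;
    let start = header_end + 2;
    let end = start.checked_add(len)?;
    if !frame.get(end..)?.starts_with(b"\r\n") {
        return None;
    }
    frame.get(start..end)
}

/// Mimics how a C string is read: everything up to the first 0 byte.
fn c_style_read(bytes: &[u8]) -> &[u8] {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    &bytes[..end]
}

fn main() {
    let payload: &[u8] = b"foo\0bar";
    let frame = encode_bulk(payload);
    // The length-prefixed form round-trips the embedded 0 byte.
    assert_eq!(decode_bulk(&frame), Some(payload));
    // A terminator-based read silently loses everything after it.
    assert_eq!(c_style_read(payload), &b"foo"[..]);
}
```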

Reversing string in ocaml

I have this function for reversing strings in OCaml; however, it says that I have my types wrong. I am unsure as to why or what I can do :(
Any tips on debugging would also be greatly appreciated!
let reverse s =
  let rec helper i =
    if i >= String.length s then "" else (helper (i+1))^(s.[i])
  in
  helper 0
Error: This expression has type char but an expression was expected of type
string
Thank you
Your implementation does not have the expected (linear) time and space complexity: it is quadratic in both time and space, so it is hardly a correct implementation of the requested feature.
String concatenation sa^sb allocates a new string of size length sa + length sb, and fills it with the two strings; this means that both its time and space complexity are linear in the sum of the lengths. When you iterate this operation once per character, you get an algorithm of quadratic complexity (the total size of memory allocated, and the total number of characters copied, will be 1+2+3+...+n).
To correctly implement this algorithm, you could either:
allocate a string of the expected size, and mutate it in place with the content of the input string, reversed
create a string list made of reversed size-one strings, then use String.concat to concatenate all of them at once (which allocates the result and copies the strings only once)
use the Buffer module which is meant to accumulate characters or strings iteratively without exhibiting a quadratic behavior (it uses a dynamic resizing policy that makes addition of a char amortized constant time)
The first approach is both the simplest and the fastest, but the other two become more interesting in more complex applications where you want to concatenate strings but it is less straightforward to know up front what the final size will be.
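The complexity argument is not specific to OCaml. As a rough illustration in Rust (a translation of the idea, not of the OCaml code; both function names are made up), the first function rebuilds the whole string at every step like repeated ^, while the second sizes one buffer up front and appends to it, in the spirit of the first and third options above:

```rust
/// Quadratic: every iteration allocates a fresh string and copies
/// everything accumulated so far, like repeated concatenation.
fn reverse_quadratic(s: &str) -> String {
    let mut acc = String::new();
    for c in s.chars() {
        acc = format!("{}{}", c, acc);
    }
    acc
}

/// Linear: one buffer with enough capacity, filled from the end of
/// the input; each push is constant time because no regrowth is needed.
fn reverse_linear(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars().rev() {
        out.push(c);
    }
    out
}

fn main() {
    assert_eq!(reverse_quadratic("hello"), "olleh");
    assert_eq!(reverse_linear("hello"), "olleh");
    assert_eq!(reverse_linear(""), "");
}
```

Both return the same result; they differ only in how much copying happens along the way.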
The error message is pretty clear, I think. The expression s.[i] represents a character (the ith character of the string). But the ^ operator requires strings as its arguments.
To get past the problem you can use String.make 1 s.[i]. This expression gives a 1-character string containing the single character s.[i].
Handling strings recursively in OCaml isn't as nice as it could be, because there's no nice way to destructure a string (break it into parts). The equivalent code to reverse a list looks a lot prettier. For what it's worth :-)
You can also use a third-party library to do so. Batteries (http://batteries.forge.ocamlcore.org/) already implements a function for reversing strings.

caesar cipher check in ocaml

I want to implement a check function that, given two strings s1 and s2, will check whether s2 is the Caesar cipher of s1 or not. The interface needs to look like string -> string -> bool.
The problem is that I am not allowed to use any string functions other than String.length, so how can I solve it? I am not permitted to use any lists, arrays or iteration, only recursion and pattern matching.
Please help me. Also, can you tell me how I can write a substring function in OCaml, other than the module function, under the above restrictions?
My guess is that you are probably allowed to use s.[i] to get the ith character of string s. This is the same as String.get, but the instructor may not think of it in those terms. Without some way of getting the individual characters of the string, I believe that this is impossible. You should probably double check with your instructor to be sure, but I would be surprised if he had meant for you to be unable to separate a string into characters (which is something that you cannot do with pattern matching alone in OCaml).
Once you can get individual characters, the way to do it should be pretty clear (you do not need substring to traverse each string recursively).
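For what it's worth, here is the shape of that recursion, sketched in Rust rather than OCaml. It rests on assumptions the question does not state: both strings contain only lowercase ASCII letters, and "Caesar cipher" means one fixed shift modulo 26 applied to every letter. All names are hypothetical, and the input validation uses an iterator for brevity, which the assignment itself would not allow.

```rust
/// Shift that maps letter a to letter b, modulo 26.
/// Assumes both are lowercase ASCII letters.
fn shift(a: u8, b: u8) -> u8 {
    (b + 26 - a) % 26
}

/// Recursively checks that every position from i onward uses shift k.
/// Only indexing and length are used on the strings.
fn same_shift_from(s1: &[u8], s2: &[u8], i: usize, k: u8) -> bool {
    if i >= s1.len() {
        true
    } else {
        shift(s1[i], s2[i]) == k && same_shift_from(s1, s2, i + 1, k)
    }
}

/// True if s2 is s1 with every letter shifted by the same amount.
fn is_caesar(s1: &str, s2: &str) -> bool {
    let (a, b) = (s1.as_bytes(), s2.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    // Reject anything outside a-z so that shift() cannot underflow.
    if !a.iter().chain(b).all(|c| c.is_ascii_lowercase()) {
        return false;
    }
    // The shift of the first pair fixes k; the rest must agree with it.
    a.is_empty() || same_shift_from(a, b, 1, shift(a[0], b[0]))
}

fn main() {
    assert!(is_caesar("abc", "bcd"));
    assert!(is_caesar("xyz", "abc")); // wraps around the alphabet
    assert!(!is_caesar("abc", "bce"));
    assert!(!is_caesar("abc", "ab"));
}
```

The same structure carries over to OCaml with s.[i] and String.length: fix the shift from the first pair of characters, then recurse on the index.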
If you still want to write substring, creating it would be complex since you don't have access to String.create or other similar functions. But you can write your own version of String.create using recursion, one character string literals (like "x"), the ability to set a character in a string to another (like s.[0] <- c), and string concatenation (s1 ^ s2). Again, of course, all of this is assuming that those operators are allowed to be used.

padding in generalized suffix tree and implementation resource

The Wikipedia page says to use unique terminator strings $0, $1, …, $n-1 for a tree with n strings s1, ..., sn.
My question is: how do I deal with situations in which string i+1 has a literal suffix $i? For example, my first string s1 is example$0. What is the clever way of doing this?
Also, the implementations of suffix trees I found are mostly for a single string, not for the generalized version. Given an implementation for a single string, how can one easily extend it?
Thank you!
1st question: if you're using Unicode, you may use PUA codes (http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters) which are not assigned in your environment. Starting at U+E000 would do. If you're using 8-bit ASCII, use a byte code which you know is not in your strings (\003, end of text, sounds appropriate) instead of that '$'.
2nd question: just start over, only starting with the current tree instead of an empty one. The unique terminators guarantee that you'll never find yourself trying to split a leaf node.
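A possible sketch of the terminator suggestion in Rust, assuming the inputs are Unicode strings and that the Private Use Area block U+E000..U+F8FF is free in your environment. with_terminators is a hypothetical helper; it only prepares the input and does not build the tree.

```rust
const PUA_START: u32 = 0xE000;
const PUA_END: u32 = 0xF8FF;

/// Appends a distinct private-use character to each string and joins them.
/// Returns None if an input already contains a character from the block,
/// or if there are more strings than available terminators.
fn with_terminators(strings: &[&str]) -> Option<String> {
    let mut out = String::new();
    for (i, s) in strings.iter().enumerate() {
        // A literal private-use character in the input would clash
        // with a terminator, so refuse such input outright.
        if s.chars().any(|c| (PUA_START..=PUA_END).contains(&(c as u32))) {
            return None;
        }
        if i > (PUA_END - PUA_START) as usize {
            return None;
        }
        let terminator = char::from_u32(PUA_START + i as u32)?;
        out.push_str(s);
        out.push(terminator);
    }
    Some(out)
}

fn main() {
    // A literal "$0" in the text is harmless, because the terminators
    // are no longer spelled with '$'.
    let joined = with_terminators(&["example$0", "banana"]).unwrap();
    assert_eq!(joined, "example$0\u{E000}banana\u{E001}");
}
```

Checking the inputs up front is what makes the terminators genuinely unique; without that check the scheme has the same weakness as the literal '$' version.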

Resources