Accessing a character in a borrowed string by index - rust

How would you access an element in a borrowed string by index?
Straightforward in Python:
my_string_list = list(my_string)
print my_string_list[0]
print my_string[0] # same as above
Rust (attempt 1):
let my_string_vec = vec![my_string]; // doesn't work
println!("{}", my_string_vec[0]); // prints the entire `my_string`
Rust (attempt 2):
let my_string_vec = my_string.as_bytes(); // returns a &[u8]
println!("{}", my_string_vec[0]); // prints the byte as a number, not as a character
My end goal is to stick it into a loop like this:
for pos in 0..my_string_vec.len() {
while shift <= pos && my_string_vec[pos] != my_string_vec[pos-shift] {
shift += shifts[pos-shift];
}
shifts[pos+1] = shift;
}
for ch in my_string_vec {
let pos = 0; // simulate some runtime index
if my_other_string_vec[pos] != ch {
...
}
}
I think it's possible to use my_string_vec.as_bytes()[pos] and my_string_vec.as_bytes()[pos-shift] in my condition statement, but I feel that this has a bad code smell.

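You can use chars().nth(index) to access a specific character. (Older code used char_at(index), but that method was never stabilized and has since been removed.) If you want to iterate over the characters in a string, you can use the chars() method, which yields an iterator over the characters in the string.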
The reason it was specifically not made possible to use indexing syntax is, IIRC, because indexing syntax would give the impression that it was like accessing a character in your typical C-string-like string, where accessing a character at a given index is a constant time operation (i.e. just accessing a single byte in an array). Strings in Rust, on the other hand, are Unicode and a single character may not necessarily consist of just one byte, making a specific character access a linear time operation, so it was decided to make that performance difference explicit and clear.
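A minimal sketch of the difference between character access and byte access. The helper names `nth_char` and `nth_byte` are made up for illustration and are not standard-library functions:

```rust
/// Returns the character at `index`, counted in Unicode scalar values.
/// This walks the string from the start, so it is O(n), not O(1).
fn nth_char(s: &str, index: usize) -> Option<char> {
    s.chars().nth(index)
}

/// Returns the raw byte at `index`. This is O(1), but a byte is only a
/// whole character when the text at that position is ASCII.
fn nth_byte(s: &str, index: usize) -> Option<u8> {
    s.as_bytes().get(index).copied()
}
```

For the string "héllo", `nth_char(s, 1)` yields `Some('é')`, while `nth_byte(s, 1)` yields `Some(0xC3)`, which is only the first of the two UTF-8 bytes that encode 'é'.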
As far as I know, there is no method available for swapping characters in a string (see this question). Note that this wouldn't have been possible anyway via an immutably borrowed string, since such a string isn't yours to modify. You would most likely have to use a String, or perhaps a &mut str if you're strictly swapping, but I'm not too familiar with Unicode's intricacies.
I recommend instead you build up a String the way you want it, that way you don't have to worry about the mutability of the borrowed string. You'd refer/look into the borrowed string, and write into the output/build-up string accordingly based on your logic.
So this:
for pos in 0..my_string_vec.len() {
while shift <= pos && my_string_vec[pos] != my_string_vec[pos-shift] {
shift += shifts[pos-shift];
}
shifts[pos+1] = shift;
}
Might become something like this (not tested; not clear what your logic is for):
for (pos, ch) in my_string.chars().enumerate() {
    // chars().nth() walks the string from the start on every call
    while shift <= pos && Some(ch) != my_string.chars().nth(pos - shift) {
        // assuming shifts is a Vec; not clear in question
        shift += shifts[pos - shift];
    }
    shifts.push(shift);
}
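If the loop needs repeated random access by character position, one option is to collect the characters into a `Vec<char>` once and index into that. The sketch below assumes `shifts` starts as a vector of ones with one more entry than the string has characters and that `shift` starts at 1. The question does not state either initial value, so both are guesses, and the function name `shift_table` is made up:

```rust
/// Builds the shift table from the question's loop.
/// ASSUMPTION: `shifts` is initialised to all ones and `shift` starts at 1;
/// the original question does not show these initial values.
fn shift_table(s: &str) -> Vec<usize> {
    // Collect once so that chars[pos] is a constant-time lookup.
    let chars: Vec<char> = s.chars().collect();
    let mut shifts = vec![1usize; chars.len() + 1];
    let mut shift = 1usize;
    for pos in 0..chars.len() {
        while shift <= pos && chars[pos] != chars[pos - shift] {
            shift += shifts[pos - shift];
        }
        shifts[pos + 1] = shift;
    }
    shifts
}
```

The trade-off is one extra allocation of four bytes per character in exchange for O(1) indexing inside the loop.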
Your last for loop:
for ch in my_string_vec {
let pos = 0; // simulate some runtime index
if my_other_string_vec[pos] != ch {
...
}
}
That kind of seems like you want to compare a given character in string A with the corresponding character (in the same position) of string B. For this I would recommend zipping the chars iterator of the first with the second, something like:
for (left, right) in my_string.chars().zip(my_other_string.chars()) {
if left != right {
}
}
Note that zip() stops iterating as soon as either iterator stops, meaning that if the strings are not the same length, then it'll only go as far as the shortest string.
If you need access to the "character index" information, you could add .enumerate() to that, so the above would change to:
for (index, (left, right)) in my_string.chars().zip(my_other_string.chars()).enumerate()
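As a small illustration of zip() combined with enumerate(), the sketch below returns the first character position at which two strings differ. The function name `first_mismatch` is hypothetical:

```rust
/// Returns the character index and the two differing characters at the
/// first position where `left` and `right` disagree. Returns None if the
/// shorter string is a prefix of the longer one (zip stops at the shorter).
fn first_mismatch(left: &str, right: &str) -> Option<(usize, char, char)> {
    left.chars()
        .zip(right.chars())
        .enumerate()
        .find(|&(_, (l, r))| l != r)
        .map(|(index, (l, r))| (index, l, r))
}
```

Because zip() stops at the shorter input, a length difference alone is not reported as a mismatch; compare the character counts separately if that matters.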

Related

How to get a substring of a &str based on character index?

I am trying to write a program that takes a list of words and then, if the word has an even length, prints the two middle letters. If the word has an odd length, it prints the single middle letter.
I can find the index of the middle letter(s), but I do not know how to use that index to print the corresponding letters of the word.
fn middle(wds: &[&str]) {
    for word in wds {
        let index = word.chars().count() / 2;
        match word.chars().count() % 2 {
            0 => println!("Even word found"),
            _ => println!("odd word found"),
        }
    }
}
fn main(){
let wordlist = ["Some","Words","to","test","testing","elephant","absolute"];
middle(&wordlist);
}
You can use slices for this, specifically &str slices. Note the &.
These links might be helpful:
https://riptutorial.com/rust/example/4146/string-slicing
https://doc.rust-lang.org/book/ch04-03-slices.html
fn main() {
let s = "elephant";
let mid = s.len() / 2;
let sliced = &s[mid - 1..mid + 1];
println!("{}", sliced);
}
Hey, after posting I found two different ways of doing it. Having two separate approaches in my head was confusing me and stopping me from finding the exact answer.
// I fixed printing the middle letter of an odd-length word with
word.chars().nth(index).unwrap()
// and for even-length words I used
&word[index-1..index+1]
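Note that slice ranges such as `&word[index-1..index+1]` are byte offsets, so they only line up with character positions for ASCII words. A sketch that works on character positions for both the odd and even case follows; the function name `middle_letters` is made up for illustration:

```rust
/// Returns the middle character of an odd-length word, or the two middle
/// characters of an even-length word, counting in characters, not bytes.
fn middle_letters(word: &str) -> String {
    let count = word.chars().count();
    if count == 0 {
        return String::new();
    }
    let (start, len) = if count % 2 == 0 {
        (count / 2 - 1, 2)
    } else {
        (count / 2, 1)
    };
    word.chars().skip(start).take(len).collect()
}
```

This allocates a small String per word; for ASCII-only input the byte-slice version above it avoids that allocation.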

Easy way to get a sub-string/sub-slice of up to N characters/elements in Go

In Python I can slice a string to get a sub-string of up to N characters and if the string is too short it will simply return the rest of the string, e.g.
"mystring"[:100] # Returns "mystring"
What's the easiest way to do the same in Go? Trying the same thing panics:
"mystring"[:100] // panic: runtime error: slice bounds out of range
Of course, I can write it all manually:
func Substring(s string, startIndex int, count int) string {
maxCount := len(s) - startIndex
if count > maxCount {
count = maxCount
}
return s[startIndex : startIndex+count]
}
fmt.Println(Substring("mystring", 0, n))
But that's rather a lot of work for something so simple and (I would have thought) common. What's more, I don't know how to generalise this function to slices of other types, since Go doesn't support generics. I'm hoping there is a better way. Even Math.Min() doesn't easily work here, because it expects and returns float64.
Note that while a helper function remains the recommended solution (even if it has to be implemented separately for slices of different types), a byte-based version does not work well with strings.
fmt.Println(Substring("世界mystring", 0, 5)) would actually print 世�� instead of 世界mys.
See "Code points, characters, and runes": a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.
And in Go, a "code point" is a rune (as seen here).
Using rune would be more robust (again, in case of strings)
func SubstringRunes(s string, startIndex int, count int) string {
runes := []rune(s)
length := len(runes)
maxCount := length - startIndex
if count > maxCount {
count = maxCount
}
return string(runes[startIndex : startIndex+count])
}
See it in action in this playground.
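For comparison with the Rust question at the top of this page, the same "up to N characters" idea can be sketched in Rust without copying, by finding the byte offset of the N-th character. The function name `prefix_chars` is hypothetical:

```rust
/// Returns the longest prefix of `s` that contains at most `n` characters.
/// The result borrows from `s`, and the cut always falls on a character
/// boundary, so multi-byte characters are never split.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // Byte offset where character number n starts: cut just before it.
        Some((byte_index, _)) => &s[..byte_index],
        // Fewer than n + 1 characters: the whole string qualifies.
        None => s,
    }
}
```

Like the rune-based Go version, this counts code points, so a character built from several code points (for example a base letter plus a combining mark) can still be split.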

Check if a substring exists at the beginning, middle and end of a string while allowing intersections

It sounds easy: you can simply iterate and check them. The problem here is optimization: don't do any needless checks, create needless objects, or perform needless operations.
The algorithm will be tested against a huge set of test cases to verify its efficiency.
Examples:
"aaaa" contains "aa" at the beginning, middle and end.
"baabaabaaaabbaab" contains "baab" at the beginning, middle and end. See the intersection.
And one more thing I forgot to say:
You are not given the substring to check for, you need to find if such a substring exists, if it doesn't return false, if it does return true.
Find the longest substring satisfying those conditions and return it, or print it (your choice).
A simple Boolean function, right?
Update:
The substring needs to be at least 2 characters shorter than the main string.
Sorry, the "aaa" example was my mistake; I have fixed it.
You can solve it with KMP, a string matching algorithm. Use it to generate an array fail[]:
fail[i] = max {k < i | S[1:k] == S[i-k+1:i]}
Then you can enumerate all possible values in the chain starting at fail[n] (fail[n], fail[ fail[n] ], fail[ fail[fail[n]] ], ...) and check whether each one also occurs in the middle.
The complexity is O(n).
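A sketch of that idea in Rust, using a 0-indexed prefix function in place of the 1-indexed fail[] above. The function names are made up, and the shortcut of checking only the first two candidates in the chain is my reading of the approach rather than something the answer spells out: the longest border qualifies if its length appears anywhere in the table before the last position, and otherwise the next border in the chain, if any, always occurs strictly inside the string.

```rust
/// pi[i] = length of the longest proper prefix of s[..=i] that is also a
/// suffix of s[..=i] (the KMP failure function, 0-indexed).
fn prefix_function(s: &[char]) -> Vec<usize> {
    let mut pi = vec![0; s.len()];
    for i in 1..s.len() {
        let mut k = pi[i - 1];
        while k > 0 && s[i] != s[k] {
            k = pi[k - 1];
        }
        if s[i] == s[k] {
            k += 1;
        }
        pi[i] = k;
    }
    pi
}

/// Longest substring that is a prefix, a suffix, and also occurs somewhere
/// strictly inside `input` (overlaps allowed). None if there is no such one.
fn longest_begin_middle_end(input: &str) -> Option<String> {
    let s: Vec<char> = input.chars().collect();
    let n = s.len();
    if n < 3 {
        return None;
    }
    let pi = prefix_function(&s);
    let longest_border = pi[n - 1];
    if longest_border == 0 {
        return None;
    }
    let k = if pi[..n - 1].contains(&longest_border) {
        longest_border
    } else {
        // The next border is a suffix of the longest border, so it ends
        // before the last character and starts after the first one.
        pi[longest_border - 1]
    };
    if k == 0 { None } else { Some(s[..k].iter().collect()) }
}
```

Any match found this way starts after the first character and ends before the last one, so it is automatically at least two characters shorter than the input.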
Let's jump the shark:
function the_best_match_at_the_beginning_the_middle_and_the_end( s ){
print( s );
return true;
}
This is one of those "you might do significantly better in terms of theoretical complexity, but in reality, a linear operation is always faster" answers:
Assume in is your input string, pattern is what you're looking for, and you can read or look up C-standard-library-style functions such as strncmp. Let l_in be the number of characters in the input and l_pattern the number of characters in the pattern.
Simply check the start explicitly (strncmp(in, pattern, l_pattern)), then use a bog-standard linear search from the second letter on (strstr(in+1, pattern)):
If strstr didn't find anything, there's neither a middle match nor an end match.
If the match is at the end (the result of strstr is in+l_in-l_pattern), you've got no middle match.
If the match is not at the end, you've got a middle match. Manually check (strncmp(in+l_in-l_pattern, pattern, l_pattern)) for the end match.
Why is this faster? Because modern computers are heavily optimized for searching through data linearly; see Bjarne "C++" Stroustrup's "why you should avoid linked lists". Simply put, letting your CPU run over a contiguous block of memory prefetched into a CPU cache is much, much faster than being "clever" about avoiding a few duplicate checks.
One clean way to approach this is to just check all substrings in the input from the beginning. Compare each substring to see that it exists at the end, and then check to see if it exists in the middle. For the middle check, you can compare against the input string with its first and last characters removed.
public boolean subStrings(String input) {
    if (input == null || input.equals("")) {
        return false;
    }
    if (input.length() == 1) {
        System.out.println(input + " is a match!");
        return true;
    }
    boolean foundIt = false;
    String longestMatch = "";
    // Try every proper prefix; later (longer) matches overwrite earlier ones.
    for (int i = 1; i < input.length(); ++i) {
        String substring = input.substring(0, i);
        boolean endMatch = input.substring(input.length() - i, input.length()).equals(substring);
        // Middle check: search the input with its first and last characters removed.
        boolean midMatch = input.substring(1, input.length() - 1).contains(substring);
        if (endMatch && midMatch) {
            longestMatch = substring;
            foundIt = true;
        }
    }
    if (foundIt) {
        System.out.println(longestMatch + " is a match!");
        return true;
    }
    else {
        return false;
    }
}
subStrings("baabaabaaaabbaab");
Output:
baab is a match!

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) // returns true if it's safe and false if it's not
Everything after this is just what I have thought about or attempted in order to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe(世界) //true or false?
Would it return true or false? I was assuming that all binary safe string should only use 1 byte. So iterating over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
w = width
}
and returning false whenever the width was greater than 1. These are just some ideas I had in case there wasn't already a library for something like this, but I wasn't sure.
Binary safety has nothing to do with how wide a character is. It's mainly about checking for non-printable characters, more or less, such as null bytes.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
I'm not sure what your goal is, almost all languages handle utf8/16 just fine now, however for your specific question there's a rather simple solution:
package main

import (
    "fmt"
    "unicode"
)

// IsAsciiPrintable checks if s is ASCII and printable, i.e. it doesn't include tab, backspace, etc.
func IsAsciiPrintable(s string) bool {
    for _, r := range s {
        if r > unicode.MaxASCII || !unicode.IsPrint(r) {
            return false
        }
    }
    return true
}

func main() {
    // The original snippet did not show how s was defined; the sample from the question is assumed here.
    s := "世界"
    fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
    fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
playground
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.
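For readers coming from the Rust question at the top of this page, a rough Rust counterpart of the check above might look like this. It is a sketch, not a port: the function name `is_ascii_printable` is made up, and it relies on the fact that within ASCII the printable range is the space character plus U+0021 through U+007E.

```rust
/// Returns true if every character in `s` is printable ASCII:
/// either the space character or an ASCII graphic character.
/// Tabs, newlines, other control characters, and non-ASCII text fail.
fn is_ascii_printable(s: &str) -> bool {
    s.chars().all(|c| c == ' ' || c.is_ascii_graphic())
}
```

An empty string passes the check, since there is no character in it that fails.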

Given a char array with words in it, find all 'a' characters and replace with xyz. Modify the input array, do not create a copy of the array

This is an interview question. If we want to replace each 'a' with 'xyz', does that mean we have to make room for two extra characters per occurrence, or can I increase the size at that particular index? What is the most efficient approach in terms of running time?
Thanks!
To do this efficiently, you want to scan forwards through the string counting the 'a's, then scan backwards copying the characters directly to their correct final position while injecting 'xyz's as necessary:
move a pointer through the string from the start until you find the end
as it moves, count the number of 'a's
while the a_count is not 0, move the pointer backwards through the string
if the pointer isn't on an 'a', copy the character under the pointer to an address a_count * 2 characters forwards of the pointer
else (your pointer is on an 'a'), copy 'xyz' to address pointer + --a_count * 2
This approach is O(N) where N is the number of characters in the string.
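The count-then-copy-backwards steps above can be sketched in Rust on a byte buffer. This is an illustration under assumptions: the function name `replace_in_place` is made up, the input is treated as bytes, and growing the `Vec` with `resize` stands in for "the array already has enough spare room", although it may reallocate behind the scenes.

```rust
/// Replaces every `search` byte in `buf` with `replacement`, reusing the
/// same buffer. Pass 1 counts the hits; pass 2 copies from the back so
/// that no unread byte is overwritten.
fn replace_in_place(buf: &mut Vec<u8>, search: u8, replacement: &[u8]) {
    if replacement.is_empty() {
        // Shrinking case: a forward compaction is enough.
        buf.retain(|&b| b != search);
        return;
    }
    let old_len = buf.len();
    let hits = buf.iter().filter(|&&b| b == search).count();
    let new_len = old_len - hits + hits * replacement.len();
    buf.resize(new_len, 0);

    let mut write = new_len; // one past the next slot to fill
    for read in (0..old_len).rev() {
        let b = buf[read];
        if b == search {
            write -= replacement.len();
            buf[write..write + replacement.len()].copy_from_slice(replacement);
        } else {
            write -= 1;
            buf[write] = b;
        }
    }
}
```

The write position never falls below the read position, which is why the backward pass is safe; both passes together are linear in the length of the result.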
Complexity will be O(strlen(in)), optimization is going to be focused on the constant.
I've come up with a simple, relatively efficient implementation. Note that since we are told to build the result in-place, we MUST assume that the original buffer is large enough to accommodate the growth.
Going to test it on ideone.
Done: http://ideone.com/nAgej
This is quite simple. Just needs 2 passes and is O(n).
Count # of 'a' in string
Using length of original string and # of 'a', compute the limits of the modified string
Do a reverse pass on the string and replace 'a' by 'xyz'
Here is a C implementation of this. Note that I have put the replacement string in the variable replace, so you can change it to anything else of any length; just make sure that the array has enough space.
#include <stdio.h>
#include <string.h>

int main() {
    char arr[1024];
    char search='a', *replace="xyz";
    fgets(arr,1023/strlen(replace),stdin);
    if (arr[strlen(arr)-1]=='\n')
        arr[strlen(arr)-1]='\0';
    puts(arr);
    int i, j, k, offset, target;
    for (i=offset=0; i<strlen(arr); ++i)
        if (arr[i]==search)
            offset+=strlen(replace)-1;
    target = strlen(arr)+offset;
    for (i=strlen(arr); i>=0; --i) {
        if (arr[i]!=search)
            arr[target--] = arr[i];
        else {
            target -= strlen(replace);
            strncpy(&arr[target+1],replace,strlen(replace));
        }
    }
    puts(arr);
    return 0;
}

Resources