How can I extract a string prefix that comes before a given character? - rust

Is there a way of extracting a prefix defined by a character without using split? For instance, given "some-text|more-text" how can I extract "some-text" which is the string that comes before "|"?

If your issue with split is that it may also split the rest of the string, consider str.splitn(2, '|'), which splits into at most two parts: you get the string up to the first | and then the rest of the string, even if it contains other | characters.
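A minimal sketch of the splitn approach, next to str::split_once (in the standard library since Rust 1.52), which hands back both halves directly. The helper name prefix_splitn is made up for illustration.

```rust
/// Returns the text before the first `sep`, or the whole string if `sep` is absent.
/// (`prefix_splitn` is a hypothetical helper name, not a standard function.)
fn prefix_splitn(s: &str, sep: char) -> &str {
    // `splitn(2, ..)` yields at most two pieces; the first one always exists.
    s.splitn(2, sep).next().unwrap_or(s)
}

fn main() {
    assert_eq!(prefix_splitn("some-text|more-text|tail", '|'), "some-text");
    assert_eq!(prefix_splitn("no-separator", '|'), "no-separator");

    // `split_once` gives both halves, or `None` when the separator is missing.
    assert_eq!(
        "some-text|more-text|tail".split_once('|'),
        Some(("some-text", "more-text|tail"))
    );
    assert_eq!("no-separator".split_once('|'), None);
}
```

Which one fits depends on whether a missing separator should count as "the whole string is the prefix" (splitn) or as an error case (split_once).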

Split may be the best way. But if, for whatever reason, you want to do it without .split(), here are a couple of alternatives.
You can use the .chars() iterator, chained with .take_while(), then produce a String with .collect().
// pfx will be "foo".
let pfx = "foo|bar".chars()
    .take_while(|&ch| ch != '|')
    .collect::<String>();
Another way, which is pretty efficient but fallible (find returns None when the string contains no '|', so the unwrap below would panic), is to create a slice of the original str:
let s = "foo|bar";
let pfx = &s[..s.find('|').unwrap()];
// or - if used in a function/closure that returns an `Option`.
let pfx = &s[..s.find('|')?];
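If panicking on a missing separator is not acceptable, the same slice idea can be wrapped so the caller decides what to do. This is only a sketch; prefix_before is a made-up name.

```rust
/// Returns the slice before the first `sep`, or `None` if `sep` does not occur.
/// (`prefix_before` is a hypothetical helper name.)
fn prefix_before(s: &str, sep: char) -> Option<&str> {
    // `find` returns a byte offset that lies on a char boundary,
    // so slicing with it is safe even for non-ASCII text.
    s.find(sep).map(|idx| &s[..idx])
}

fn main() {
    assert_eq!(prefix_before("foo|bar", '|'), Some("foo"));
    assert_eq!(prefix_before("foobar", '|'), None);
    assert_eq!(prefix_before("héllo|wörld", '|'), Some("héllo"));

    // Fall back to the whole string when there is no separator.
    let s = "foobar";
    let pfx = prefix_before(s, '|').unwrap_or(s);
    assert_eq!(pfx, "foobar");
}
```

No allocation happens here: the returned &str borrows from the input.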

As @Kornel points out, you can use the split method:
let raw_text = "some-text|more-text";
let extracted_text = raw_text.split("|").next().unwrap();

Related

Need to extract the last word in a Rust string

I am doing some processing of a string in Rust, and I need to be able to extract the last set of characters from that string. In other words, given a string like the following:
some|not|necessarily|long|name
I need to be able to get the last part of that string, namely "name" and put it into another String or a &str, in a manner like:
let last = call_some_function("some|not|necessarily|long|name");
so that last becomes equal to "name".
Is there a way to do this? Is there a string function that will allow this to be done easily? If not (after looking at the documentation, I doubt that there is), how would one do this in Rust?
While the answer from @effect is correct, it is neither the most idiomatic nor the most performant way to do it. It'll walk the entire string and match all of the |s to reach the last one. You could improve on that, but there is a method of str that does exactly what you want: rsplit_once().
let (_, name) = s.rsplit_once('|').unwrap();
// Or
// let name = s.rsplit_once('|').unwrap().1;
//
// You can also use a multichar separator:
// let (_, name) = s.rsplit_once("|").unwrap();
// But in the case of a single character, a `char` type is likely to be more performant.
You can use the split() method (defined on str and available on String through deref), which returns an iterator over the substrings separated by the given separator, and then use the Iterator::last() method to return the last element of that iterator, like so:
let s = String::from("some|not|necessarily|long|name");
let last = s.split('|').last().unwrap();
assert_eq!(last, "name");
Please also note that split is available directly on string slices (&str), so you don't need a std::string::String:
let s = "some|not|necessarily|long|name";
let last = s.split('|').last().unwrap();
assert_eq!(last, "name");
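For comparison, a small sketch using str::rsplit, which searches from the end of the string, so taking its first item avoids visiting the earlier separators. The name last_field is an assumption for illustration.

```rust
/// Returns the text after the last `sep`, or the whole string if `sep` is absent.
/// (`last_field` is a hypothetical helper name.)
fn last_field(s: &str, sep: char) -> &str {
    // `rsplit` searches from the end, so the first item it yields is the last field.
    s.rsplit(sep).next().unwrap_or(s)
}

fn main() {
    assert_eq!(last_field("some|not|necessarily|long|name", '|'), "name");
    assert_eq!(last_field("name", '|'), "name");
    assert_eq!(last_field("trailing|", '|'), "");

    // `rsplit_once` reports a missing separator as `None` instead.
    assert_eq!("name".rsplit_once('|'), None);
}
```

The difference from rsplit_once shows up when there is no separator: rsplit yields the whole string, while rsplit_once returns None.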

Reading a comma-separated string (not text file) in Matlab

I want to read a string in Matlab (not an external text file) which has numerical values separated by commas, such as
a = {'1,2,3'}
and I'd like to store it in a vector as numbers. Is there any function which does that? I only find processes and functions used to do that with text files.
I think you're looking for sscanf
A = sscanf(str,formatSpec) reads data from str, converts it according
to the format specified by formatSpec, and returns the results in an
array. str is either a character array or a string scalar.
You can try the str2num function:
vec = str2num('1,2,3')
If you have to use the cell a, per your example, it would be: vec=str2num(a{1})
There are some security warnings in the documentation to consider so be cognizant of how your code is being employed.
Another, more flexible, option is textscan. It can handle strings as well as file handles.
Here's an example:
cellResult = textscan('1,2,3', '%f','delimiter',',');
vec = cellResult{1};
I would use the eval function to "evaluate" the vector. Given that structure, I would also use cell2mat to get the '1,2,3' text (this can be approached by other methods too).
% Generate the variable "a" that contains the "vector"
a = {'1,2,3'};
% Generate the vector using the eval function
myVector = eval(['[' cell2mat(a) ']']);
Let me know if this solution works for you

Rust split vector of bytes by specific bytes

I have a file containing information that I want to load in the application. The file has some header information as strings, then multiple entries, each terminated by ';'. Some entries are used for different types, so their length is variable, but all values are separated by ','.
Example:
\Some heading
\Some other heading
I,003f,3f3d00ed,"Some string",00ef,
0032,20f3
;
Y,02d1,0000,0000,"Name of element",
00000007,0,
00000000,0,
;
Y,02d1,0000,0000,"Name of element",30f0,2d0f,02sd,
00000007,0,
00000000,0,
;
I is one type of element
Y is another type of element
What I want to achieve is, to bring the elements into different structs to work with. Most of the values are numbers but some are strings.
What I was able to achieve is:
Import the file as Vec<u8>
Put it in a string (can't do that directly, because there may be UTF-8 problems in elements I'm not interested in)
Split it to a Vec<&str> by ';'
Pass the strings to functions depending on their type
Split it to a Vec by '\n'
Split it to a Vec by ','
Reading out the data I need and interpret from the strings (str::from_str_radix for example)
Build the struct and return it
This does not seem to be the way to go, since I start with bytes, allocate them as a string, and then allocate again to turn most of the values into numbers.
So my question is:
Can I split the Vec<u8> into multiple vectors separated by ';' (byte 59), split these further by '\n' and split this further by ','.
I assume it would be more performant to apply the bytes directly to the correct data-type. Or is my concern wrong?
Can I split the Vec into multiple vectors separated by ';' (byte 59), split these further by '\n' and split this further by ','.
Usually that is not going to work if the other bytes may appear in other places, like embedded in the strings.
Then there is also the question of how the strings are encoded, whether there are escape sequences, etc.
I assume it would be more performant to apply the bytes directly to the correct data-type. Or is my concern wrong?
Reading the entire file into memory and then performing several copies from one Vec to another Vec and another and so on is going to be slower than a single pass with a state machine of some kind. Not to mention it will make working with files bigger than memory extremely slow or impossible.
I wouldn't worry about performance until you have a working algorithm, in particular if you have to work with an undocumented, non-trivial, third-party format and you are not experienced at reading binary formats.
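To illustrate the single-pass idea, here is a rough sketch of a tiny state machine that cuts records at ';' but ignores any ';' found between double quotes. The format rules are guesses based on the sample (in particular, it assumes quoted strings contain no escaped quotes), and split_records is a made-up name.

```rust
/// Splits `data` into records terminated by `;`, ignoring any `;` that sits
/// between double quotes. Assumes quoted strings contain no escaped quotes.
/// (`split_records` is a hypothetical helper; the format rules are guesses.)
fn split_records(data: &[u8]) -> Vec<&[u8]> {
    let mut records = Vec::new();
    let mut start = 0;
    let mut in_quotes = false;
    for (i, &b) in data.iter().enumerate() {
        match b {
            b'"' => in_quotes = !in_quotes,
            b';' if !in_quotes => {
                records.push(&data[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    // Keep whatever follows the last terminator, unless it is only whitespace.
    if data[start..].iter().any(|b| !b.is_ascii_whitespace()) {
        records.push(&data[start..]);
    }
    records
}

fn main() {
    let data = b"I,003f,\"a;b\",00ef\n;Y,02d1,\"Name\"\n;\n";
    let records = split_records(data);
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], &b"I,003f,\"a;b\",00ef\n"[..]);
    assert_eq!(records[1], &b"Y,02d1,\"Name\"\n"[..]);
}
```

The returned slices borrow from the input, so nothing is copied; the same loop could be adapted to read from a buffered reader if the file does not fit in memory.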
This is not exactly an answer to this question, but I got directed here and didn't find a better one.
The function below takes a byte slice and a separator (another byte slice) and splits the first by the second.
fn split_bytes<'a>(bs: &'a [u8], pred: &[u8]) -> Vec<&'a [u8]> {
    // An empty separator cannot match anything, so return the input unsplit.
    if pred.is_empty() {
        return vec![bs];
    }
    let mut res: Vec<&[u8]> = Vec::new();
    let mut start = 0; // start of the piece currently being collected
    let mut i = 0;
    while i + pred.len() <= bs.len() {
        if bs[i..].starts_with(pred) {
            // Found a separator: emit the piece before it and skip past it.
            res.push(&bs[start..i]);
            i += pred.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // The piece after the last separator (or the whole input if none matched).
    res.push(&bs[start..]);
    res
}
Here is a demo link in the Rust playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ff653e2be80d4f73542e26dc37c46f13
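When the separator is a single byte, the standard library's slice::split already does this without a custom function. The sketch below also parses a hex field straight from the borrowed bytes; parse_hex is a made-up helper, and the sample record is adapted from the question.

```rust
use std::str;

/// Parses an ASCII hex field such as `003f` straight from bytes.
/// Returns `None` for non-UTF-8 or non-hex input. (`parse_hex` is a hypothetical helper.)
fn parse_hex(field: &[u8]) -> Option<u32> {
    // `from_utf8` only validates and borrows; it does not copy the bytes.
    let text = str::from_utf8(field).ok()?;
    u32::from_str_radix(text.trim(), 16).ok()
}

fn main() {
    let record: &[u8] = b"I,003f,3f3d00ed,00ef";

    // `slice::split` borrows sub-slices of the input; nothing is copied.
    let fields: Vec<&[u8]> = record.split(|&b| b == b',').collect();
    assert_eq!(fields.len(), 4);
    assert_eq!(fields[0], &b"I"[..]);
    assert_eq!(parse_hex(fields[1]), Some(0x003f));
    assert_eq!(parse_hex(fields[2]), Some(0x3f3d_00ed));
    assert_eq!(parse_hex(b"02sd"), None);
}
```

Validating each field individually means a non-UTF-8 byte in an unrelated field does not prevent parsing the ones you care about.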

Remove part of string (regular expressions)

I am a beginner in programming. I have a string for example "test:1" and "test:2". And I want to remove ":1" and ":2" (including :). How can I do it using regular expression?
Hi andrew, it's pretty easy. Think of a string as a sequence of chars (letters). Strings are immutable, so you cannot shrink one in place, but if the part you want to delete is always at the end of the string and always the same length, you can take everything except the last two characters (in Groovy):
var exampleString = 'test:1';
exampleString = exampleString.substring(0, exampleString.length() - 2);
That's it: exampleString now holds a new string without the last two characters (letters) of the original.
If you can't be sure it's always at the end, or how many chars to delete, you'd have to use szymon's version.
There are at least a few ways to do it with Groovy. If you want to stick to regular expressions, you can pass the expression ^([^:]+) (which matches all characters from the beginning of the string up to, but not including, the first :) to the StringGroovyMethods.find(regexp) method, e.g.
def str = "test:1".find(/^([^:]+)/)
assert str == 'test'
Alternatively you can use good old String.split(String delimiter) method:
def str = "test:1".split(':')[0]
assert str == 'test'

How to wrap a raw string literal without inserting newlines into the raw string?

I have a raw string literal which is very long. Is it possible to split this across multiple lines without adding newline characters to the string?
file.write(r#"This is an example of a line which is well over 100 characters in length. Id like to know if its possible to wrap it! Now some characters to justify using a raw string \foo\bar\baz :)"#)
In Python and C for example, you can simply write this as multiple string literals.
# "some string"
(r"some "
r"string")
Is it possible to do something similar in Rust?
While raw string literals don't support this, it can be achieved using the concat! macro:
let a = concat!(
r#"some very "#,
r#"long string "#,
r#"split over lines"#);
let b = r#"some very long string split over lines"#;
assert_eq!(a, b);
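As a side note, regular (non-raw) string literals do support wrapping: a backslash at the end of a line skips the newline and the leading whitespace of the next line. The sketch below shows that, plus mixing raw and regular pieces inside concat! so that only the parts containing backslashes need the raw form. The strings are made up for illustration.

```rust
fn main() {
    // A trailing backslash in a regular string skips the newline and the
    // leading whitespace of the next line.
    let a = "some very \
             long string \
             split over lines";
    assert_eq!(a, "some very long string split over lines");

    // Raw and regular literals can be mixed inside `concat!`, so only the
    // pieces that contain backslashes need the raw form.
    let b = concat!(
        "a regular piece, then a raw one: ",
        r#"\foo\bar\baz"#,
    );
    assert_eq!(b, r#"a regular piece, then a raw one: \foo\bar\baz"#);
}
```

The trade-off with the first form is that any literal backslashes in the text would have to be escaped as \\, which is what the raw string was avoiding in the first place.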
It is possible with indoc.
The indoc!() macro takes a multiline string literal and un-indents it at compile time so the leftmost non-space character is in the first column.
let testing = indoc! {"
    def hello():
        print('Hello, world!')

    hello()
"};
let expected = "def hello():\n    print('Hello, world!')\n\nhello()\n";
assert_eq!(testing, expected);
PS: I really think we could use an AI that recommends good crates to Rust users.

Resources