use serde_json::json; // 1.0.66
use std::str;
fn main() {
let input = "{\"a\": \"b\\u001fc\"}";
let bytes = input.as_bytes();
let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
for (_k, v) in json.as_object().unwrap() {
let vec = serde_json::to_vec(v).unwrap();
let utf8_str = str::from_utf8(&vec).unwrap();
println!("value: {}", v);
println!("utf8_str: {}", utf8_str);
println!("bytes: {:?}", vec);
}
}
How can the value of object key "a" be transformed into the following string?
b\u{1f}c
I've tried serde_json and str::from_utf8, but I always get "b\u001fc" as the result. The escape sequence is not interpreted. How can this be solved?
The problem is this line:
let vec = serde_json::to_vec(v).unwrap();
From the serde_json docs on to_vec():
Serialize the given data structure as a JSON byte vector.
You are deserializing from JSON, getting the values of the object, serializing them back to JSON and printing that. You don't want to serialize back to JSON, you want to print the "raw" string, so something like this does what you want:
fn main() {
let input = "{\"a\": \"b\\u001fc\"}";
let bytes = input.as_bytes();
let json: serde_json::Value = serde_json::from_slice(bytes).unwrap();
for (_k, v) in json.as_object().unwrap() {
let string = v.as_str().unwrap();
println!("bytes: {:?}", string);
}
}
I think things are closer to working than you think. Your problem is not that the escape sequence isn't being interpreted correctly, but rather that serde_json::to_vec(v) essentially re-encodes v (which is serde_json::value::Value::String) into a vector of JSON-encoded bytes. This means that it picks up the surrounding quote characters (byte 34) and turns the escape sequence into a literal ['\\', 'u', ...] — because that's how it would look in JSON.
If you want to get the string value out, you can do this:
for (_k, v) in json.as_object().unwrap() {
if let serde_json::value::Value::String(s) = v {
println!("{:?}", s);
}
}
This prints "b\u{1f}c", the Rust string you want.
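The `\u{1f}` in that output comes from Rust's Debug formatting (`{:?}`), not from the string itself, which holds three characters. A minimal std-only sketch of the difference; the helper names `display_repr` and `debug_repr` are made up for illustration:

```rust
// Hypothetical helpers, only here to contrast the two format specifiers.

/// Formats the string with `{}`: the raw characters, control byte included.
fn display_repr(s: &str) -> String {
    format!("{}", s)
}

/// Formats the string with `{:?}`: quoted, with non-printable characters escaped.
fn debug_repr(s: &str) -> String {
    format!("{:?}", s)
}
```

With `s = "b\u{1f}c"`, `debug_repr(s)` yields the quoted, escaped form, while `display_repr(s)` returns the three characters unchanged.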
I am trying to collect vector of string to string with separator $.
let v = [String::from("bump"), String::from("sage"),String::from("lol"), String::from(" kek ")];
let s: String = v.into_iter().map(|x| x.push_str("$")).collect();
println!("{:?}",s );
The code above does not work, but this:
let v = [String::from("hello"), String::from("world"),String::from("shit"), String::from(" +15 ")];
let s: String = v.into_iter().collect();
println!("{:?}",s );
is working. How do I solve this problem?
Your code isn't working because push_str() does not return the string.
So your map() function maps from String to (), because you don't return x from it.
Further, x is not mutable, so you cannot call push_str() on it. You have to declare it mut.
This is your code, minimally modified so that it works:
fn main(){
let v = [String::from("bump"), String::from("sage"),String::from("lol"), String::from(" kek ")];
let s: String = v.into_iter().map(|mut x| {x.push_str("$"); x}).collect();
println!("{:?}",s );
}
"bump$sage$lol$ kek $"
Further, if you only push a single character, do push('$') instead.
You will notice, however, that there is a $ at the end of the string. Your use case is perfect for reduce(), so I'd use @Aleksander's answer instead.
You can use Iterator::reduce. Note that it puts separators only between items (and not at the end of the string, as in Petterrabit's answer), and it reuses the allocation of the first string (which results in slightly better memory efficiency).
fn main() {
let v = [
String::from("bump"),
String::from("sage"),
String::from("lol"),
String::from(" kek "),
];
let s: String = v
.into_iter()
.reduce(|mut acc, x| {
acc.push('$');
acc.push_str(&x);
acc
})
.unwrap_or_default();
println!("{}", s);
}
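The same reduce() pattern can be wrapped in a small helper, which makes the edge cases easier to see. This is only a sketch; the name `join_with` is an assumption for illustration, not something from the standard library:

```rust
/// Joins owned strings with `sep` between items, reusing the first
/// string's allocation as the accumulator.
fn join_with<I>(items: I, sep: char) -> String
where
    I: IntoIterator<Item = String>,
{
    items
        .into_iter()
        .reduce(|mut acc, item| {
            acc.push(sep);
            acc.push_str(&item);
            acc
        })
        // reduce() returns None for an empty iterator; fall back to "".
        .unwrap_or_default()
}
```

An empty input gives an empty string, and a single item comes back without any separator.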
You must return strings from your map.
Note that push_str doesn't return anything.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=50534d454a46093299ce38682733c86a
fn main() {
let v = [
String::from("bump"),
String::from("sage"),
String::from("lol"),
String::from(" kek "),
];
let s: String = v
.iter()
.map(|x| {
let mut x = x.to_owned();
x.push_str("$");
x
})
.collect();
println!("{}", s);
}
EDIT
If your real use case is more complex and you must use an iterator and a map, you should prefer the answers above, which are better (there is no need to own the strings returned from the map, because you collect them into a new string anyway).
But that said if the only purpose is to join your Vec with a separator you should simply do
fn main() {
let v = [
String::from("bump"),
String::from("sage"),
String::from("lol"),
String::from(" kek "),
];
println!("{}", v.join("$"));
}
I was wondering how to convert a styled string into a vector. Say I had a String with the value:
"[x, y]"
How could I turn it into a vector that has x as the first object and y as the second object?
Thanks!
Sure, but the elements can't be references. As mentioned by @prog-fh, that isn't possible in Rust: once compiled, variable names may not be stored, and the compiler may even have removed some variables during optimization.
You can, however, do something similar to Python's ast.literal_eval using serde with Rusty Object Notation (RON, a serialization format made to resemble Rust data structures). It isn't perfect, but it is an option. It does require that you know which types you are trying to parse.
use ron::from_str;
let input = "[37.6, 24.3, 89.023]";
let parsed: Vec<f32> = from_str(input).unwrap();
On the other hand, if @mcarton is correct and you want something like vec!["x", "y"], you could parse it manually like so:
fn parse(input: &str) -> Option<Vec<String>> {
let mut part = String::new();
let mut collected = Vec::new();
let mut char_iter = input.chars();
if char_iter.next() != Some('[') {
return None
}
loop {
match char_iter.next()? {
']' => {
// only push a pending element, so "[]" yields an empty Vec
if !part.is_empty() {
collected.push(part);
}
return Some(collected)
}
',' | ' ' => {
if !part.is_empty() {
collected.push(part);
part = String::new();
}
}
x => part.push(x),
}
}
}
println!("{:?}", parse("[a, b, foo]"));
Or you could also use a regex to break it up instead, but you can look into how that works yourself.
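As a std-only alternative to both the hand-written loop and a regex, the brackets can be stripped and the rest split on commas. This is a sketch under the assumption that elements never contain commas or nested brackets; `parse_list` is a made-up name:

```rust
use std::str::FromStr;

/// Parses "[a, b, c]" into a Vec<T>. Returns None if the brackets are
/// missing or any element fails to parse as T.
fn parse_list<T: FromStr>(input: &str) -> Option<Vec<T>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    inner
        .split(',')
        .map(str::trim)
        // skip empty pieces, so "[]" gives an empty Vec
        .filter(|part| !part.is_empty())
        .map(|part| part.parse::<T>().ok())
        // collecting Option<T> items stops at the first None
        .collect()
}
```

Because it is generic over FromStr, the same function covers the Vec<String> case and the numeric case, e.g. `parse_list::<f32>("[37.6, 24.3, 89.023]")`.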
I want a function that can take two arguments (string, number of letters to crop off front) and return the same string except with the letters before character x gone.
If I write
let mut example = "stringofletters";
CropLetters(example, 3);
println!("{}", example);
then the output should be:
ingofletters
Is there any way I can do this?
In many uses it would make sense to simply return a slice of the input, avoiding any copy. Converting @Shepmaster's solution to use immutable slices:
fn crop_letters(s: &str, pos: usize) -> &str {
match s.char_indices().nth(pos) {
Some((pos, _)) => &s[pos..],
None => "",
}
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_letters(example, 3);
println!("{}", cropped);
}
Advantages over the mutating version are:
No copy is needed. You can call cropped.to_string() if you want a newly allocated result; but you don't have to.
It works with static string slices as well as mutable String etc.
The disadvantage is that if you really do have a mutable string you want to modify, it would be slightly less efficient as you'd need to allocate a new String.
Issues with your original code:
Functions use snake_case, types and traits use CamelCase.
"foo" is a string literal of type &str. These may not be changed. You will need something that has been heap-allocated, such as a String.
The call crop_letters(stringofletters, 3) would transfer ownership of stringofletters to the function, which means you wouldn't be able to use the variable anymore. You must pass in a mutable reference (&mut).
Rust strings are not ASCII, they are UTF-8. You need to figure out how many bytes each character requires. char_indices is a good tool here.
You need to handle the case of when the string is shorter than 3 characters.
Once you have the byte position of the new beginning of the string, you can use drain to move a chunk of bytes out of the string. We just drop these bytes and let the String move over the remaining bytes.
fn crop_letters(s: &mut String, pos: usize) {
match s.char_indices().nth(pos) {
Some((pos, _)) => {
s.drain(..pos);
}
None => {
s.clear();
}
}
}
fn main() {
let mut example = String::from("stringofletters");
crop_letters(&mut example, 3);
assert_eq!("ingofletters", example);
}
See Chris Emerson's answer if you don't actually need to modify the original String.
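To make the point about characters versus bytes concrete, here is a small sketch of what char_indices provides. The helper name `byte_offset_of_char` is an assumption for illustration:

```rust
/// Returns the byte offset at which the `n`-th character (0-based) starts,
/// or None if the string has `n` or fewer characters.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}
```

For ASCII input the character index and byte offset coincide, but in "ЖФ1" each Cyrillic letter takes two bytes, so character 1 starts at byte 2 and character 2 at byte 4.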
I found this answer which I don't consider really idiomatic:
fn crop_with_allocation(string: &str, len: usize) -> String {
string.chars().skip(len).collect()
}
fn crop_without_allocation(string: &str, len: usize) -> &str {
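// length check in bytes; without it, slicing past the end would panic
if string.len() < len {
return "";
}
// note: `len` is a byte offset here, so this panics if it is not on a char boundary
&string[len..]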
}
fn main() {
let example = "stringofletters"; // works with a String if you take a reference
let cropped = crop_with_allocation(example, 3);
println!("{}", cropped);
let cropped = crop_without_allocation(example, 3);
println!("{}", cropped);
}
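Note that the non-allocating variant above slices by byte offset, so it can panic on non-ASCII input. If a byte offset is really what you have, str::get is a checked alternative to indexing. A minimal sketch; `crop_bytes_checked` is a hypothetical name:

```rust
/// Drops the first `len` bytes. Returns None instead of panicking when
/// `len` is past the end or falls inside a multi-byte character.
fn crop_bytes_checked(s: &str, len: usize) -> Option<&str> {
    s.get(len..)
}
```

This keeps the zero-copy behavior and leaves it to the caller to decide what an invalid offset should mean.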
My version:
fn crop_str(s: &str, n: usize) -> &str {
let mut it = s.chars();
for _ in 0..n {
it.next();
}
it.as_str()
}
#[test]
fn test_crop_str() {
assert_eq!(crop_str("123", 1), "23");
assert_eq!(crop_str("ЖФ1", 1), "Ф1");
assert_eq!(crop_str("ЖФ1", 2), "1");
}
So I have gone through 90% of the tutorial on Rust and I think I mostly have a grasp on the syntax. I'm attempting to start writing code with it. I'm currently using the rustc_serialize library to parse JSON from stdin, and I'm not getting the results I expect. I have the following JSON file called message.txt with the following content:
{"text": "hello world"}
Here is the Rust code to accept stdin and parse out the text field:
extern crate rustc_serialize;
use std::io::{self, Read};
use rustc_serialize::json::Json;
fn main() {
// provide a buffer for stdin
let mut buffer = String::new();
let _ = io::stdin().read_to_string(&mut buffer);
// parse the json
let message = match Json::from_str(&mut buffer) {
Ok(m) => m,
Err(_) => panic!("Stdin provided invalid JSON")
};
// get the message object and "text" field string
let message_object = message.as_object().unwrap();
let message_string = message_object.get("text").unwrap();
println!("{}", message_string);
println!("{}", &message_string.to_string()[0..4]);
}
The code above outputs:
"hello world"
"hel
I'm currently outputting the byte slice to make sure the quote wasn't something that was added by print. According to the docs message_string shouldn't have quotes around it.
If I print out the data using the example from the documentation then it prints the value of "text" without quotes:
for (key, value) in message_object.iter() {
println!("{}: {}", key, match *value {
Json::U64(v) => format!("{} (u64)", v),
Json::String(ref v) => format!("{} (string)", v),
_ => format!("other")
});
}
Output:
text: hello world (string)
I'm a newbie to Rust so I probably just don't understand the string manipulation parts of Rust all that well.
The problem is that message_string isn't what you think it is. I discovered that when I tried to use len on the "string", which didn't work (I assume that's why you have a to_string when you are slicing). Let's make the compiler tell us what it is:
let () = message_string;
Has the error:
error: mismatched types:
expected `&rustc_serialize::json::Json`,
found `()`
It's a Json! We need to convert that enumerated type into a string-like thing:
let message_object = message.as_object().unwrap();
let message_json = message_object.get("text").unwrap();
let message_string = message_json.as_string().unwrap();
Ultimately, I'd argue that Display (which allows the {} format string) should not have been implemented for this type, as Display means format in an end-user-focused manner. It's probably too late to change that decision now though.
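The effect can be reproduced without the crate. The sketch below uses a made-up `MiniJson` enum as a stand-in (it is not the rustc_serialize API) whose Display impl writes JSON text, which is why the quotes show up and why slicing the to_string() result starts with a quote character:

```rust
use std::fmt;

// Hypothetical stand-in for a JSON value type, for illustration only.
enum MiniJson {
    String(String),
    U64(u64),
}

impl MiniJson {
    /// Returns the inner string, without any JSON quoting.
    fn as_string(&self) -> Option<&str> {
        match self {
            MiniJson::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

impl fmt::Display for MiniJson {
    /// Writes the value as JSON text, so strings get surrounding quotes.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MiniJson::String(s) => write!(f, "\"{}\"", s),
            MiniJson::U64(n) => write!(f, "{}", n),
        }
    }
}
```

Here `to_string()` on a string value includes the quotes, while `as_string()` hands back the bare text.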
I know that unwrap is great for quick prototyping, but I'd be remiss in not showing a slightly more idiomatic way of doing this:
fn main() {
let mut buffer = String::new();
io::stdin().read_to_string(&mut buffer).expect("Could not read from stdin");
let message = Json::from_str(&buffer).expect("Stdin provided invalid JSON");
let message_string = message.as_object().and_then(|obj| {
obj.get("text").and_then(|json| {
json.as_string()
})
}).expect("The `text` key was missing or not a string");
println!("{}", message_string);
}
Ignoring the Result from read_to_string is worse than panicking. ^_^
Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in form of a Cow (Clone On Write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String, which causes a heap allocation to store the previously borrowed value. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the existing allocation. You can prevent that by using the following very fragile and unsafe code inside your for loop. It manipulates the bytes of the UTF-8 string directly and relies on the fact that, once the $ sign is removed, "world" and "Earth" have exactly the same number of bytes.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find returns an offset relative to last_pos
last_pos = p + 5; // continue after the 5 bytes of "Earth"
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
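If the goal is only to keep an existing String allocation where possible, a safe alternative to the unsafe block is String::replace_range, which handles patterns and replacements of different lengths. This swaps in a different technique from the byte manipulation above; `replace_in_place` is a made-up helper name:

```rust
/// Replaces every occurrence of `from` with `to` inside `s`, editing the
/// String in place instead of building a new one.
fn replace_in_place(s: &mut String, from: &str, to: &str) {
    if from.is_empty() {
        return; // an empty pattern would match everywhere
    }
    let mut start = 0;
    // find() returns an offset relative to `start`
    while let Some(offset) = s[start..].find(from) {
        let pos = start + offset;
        s.replace_range(pos..pos + from.len(), to);
        // continue after the text just inserted, so it is never re-scanned
        start = pos + to.len();
    }
}
```

On a Cow it could be called as `replace_in_place(t.to_mut(), "$world", "Earth")`, which still allocates once for borrowed elements but edits owned ones in place.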
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
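Applied to the variable-substitution case from the question, a Cow<'a, str> lets untouched tokens stay borrowed from the input while only substituted tokens allocate. A minimal sketch; the function names `substitute` and `substitute_all` are assumptions for illustration:

```rust
use std::borrow::Cow;

/// Returns the token unchanged (borrowed) unless it contains `var`,
/// in which case an owned, substituted String is returned.
fn substitute<'a>(token: &'a str, var: &str, value: &str) -> Cow<'a, str> {
    if token.contains(var) {
        Cow::Owned(token.replace(var, value))
    } else {
        Cow::Borrowed(token)
    }
}

/// Splits `text` on whitespace and substitutes `var` in each token.
fn substitute_all<'a>(text: &'a str, var: &str, value: &str) -> Vec<Cow<'a, str>> {
    text.split_whitespace()
        .map(|token| substitute(token, var, value))
        .collect()
}
```

Calling `substitute_all("Hello there $world!", "$world", "Earth")` borrows the first two tokens from the input and allocates only for the last one.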