How to check if a string contains whitespace?

How do I check if a string contains any whitespace in Rust?
For example, these should all return true:
"Hello, world!"
"Hello\n"
"This\tis\ta\ttab"

You can pass char::is_whitespace to .contains():
assert!("Hello, world!".contains(char::is_whitespace));
assert!("Hello\n".contains(char::is_whitespace));
assert!("This\tis\ta\ttab".contains(char::is_whitespace));
char::is_whitespace returns true if the character has the Unicode White_Space property.
Alternatively, you can use char::is_ascii_whitespace if you only want to match ASCII whitespace (space, horizontal tab, newline, form feed, or carriage return):
// This has a non-breaking space, which is not ASCII.
let string = "Hello,\u{A0}Rust!";
// Thus, it's *not* ASCII whitespace.
// (is_ascii_whitespace takes &self, so it is called through a closure.)
assert!(!string.contains(|c: char| c.is_ascii_whitespace()));
// but it *is* Unicode whitespace.
assert!(string.contains(char::is_whitespace));

As someone mentioned, if you do not need to deal with Unicode, it may be faster
to explicitly name the characters you care about:
fn main() {
    let a = vec!["false", "true space", "true newline\n", "true\ttab"];
    let a2: &[char] = &[' ', '\n', '\t'];
    for s in a.iter() {
        let b = s.contains(a2);
        println!("{}", b);
    }
}
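The three approaches above can be put side by side in small helpers. This is a minimal sketch: the function names and the LISTED constant are hypothetical, chosen only for illustration. Because char::is_ascii_whitespace takes &self, it is called through a closure rather than passed directly.

```rust
/// Explicit set of characters to look for (hypothetical choice; note that
/// carriage return and form feed are deliberately not listed here).
const LISTED: &[char] = &[' ', '\n', '\t'];

/// True if `s` contains any character with the Unicode White_Space property.
fn has_unicode_whitespace(s: &str) -> bool {
    s.contains(char::is_whitespace)
}

/// True if `s` contains ASCII whitespace (space, tab, LF, FF, or CR).
fn has_ascii_whitespace(s: &str) -> bool {
    s.contains(|c: char| c.is_ascii_whitespace())
}

/// True if `s` contains one of the characters named in `LISTED`.
fn has_listed_whitespace(s: &str) -> bool {
    s.contains(LISTED)
}
```

With these definitions, "Hello,\u{A0}Rust!" counts as containing whitespace only for the Unicode check, and "carriage\rreturn" is missed by the explicit list because '\r' is not in it.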

Related

How to split by an unknown number of tabs, spaces, and newlines in Rust?

I want to achieve something very similar to strings.Fields in Go, where I get every run of consecutive characters in a line that are not \t, space, or \n.
For example
this is a \n special \t\t word
will return
[this, is, a, special, word]
Is that possible in Rust?
The split function only takes an explicit pattern.
For example
a \t\t\t b \t\t\t\t c
with
for s in line.split("\t\t\t") {
    println!("{}", s);
}
will return
a
b
\t
c
The split_whitespace method defined on str in the standard library will do what you want.
The example from the documentation is pretty clear:
let mut iter = " Mary had\ta\u{2009}little \n\t lamb".split_whitespace();
assert_eq!(Some("Mary"), iter.next());
assert_eq!(Some("had"), iter.next());
assert_eq!(Some("a"), iter.next());
assert_eq!(Some("little"), iter.next());
assert_eq!(Some("lamb"), iter.next());
assert_eq!(None, iter.next());
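Applied to the example from the question, this could look like the sketch below. The helper names fields and naive_fields are hypothetical; naive_fields is included only to show why splitting on every single whitespace character is not equivalent, since it yields empty strings between consecutive separators.

```rust
/// Splits `line` on runs of Unicode whitespace, similar in spirit to
/// Go's strings.Fields. Leading and trailing whitespace produce no items.
fn fields(line: &str) -> Vec<&str> {
    line.split_whitespace().collect()
}

/// Splits on each individual whitespace character instead of on runs,
/// so adjacent separators produce empty strings.
fn naive_fields(line: &str) -> Vec<&str> {
    line.split(char::is_whitespace).collect()
}
```

The returned slices borrow from the input, so no string data is copied.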

How to wrap a raw string literal without inserting newlines into the raw string?

I have a raw string literal which is very long. Is it possible to split this across multiple lines without adding newline characters to the string?
file.write(r#"This is an example of a line which is well over 100 characters in length. Id like to know if its possible to wrap it! Now some characters to justify using a raw string \foo\bar\baz :)"#)
In Python and C for example, you can simply write this as multiple string literals.
# "some string"
(r"some "
r"string")
Is it possible to do something similar in Rust?
While raw string literals don't support this, it can be achieved using the concat! macro:
let a = concat!(
r#"some very "#,
r#"long string "#,
r#"split over lines"#);
let b = r#"some very long string split over lines"#;
assert_eq!(a, b);
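concat! also accepts a mix of raw and regular literals, which can help when only part of a long string needs escapes. A small sketch, with a hypothetical function name and made-up text:

```rust
/// Builds one &'static str at compile time from raw and regular pieces.
fn long_message() -> &'static str {
    concat!(
        r"raw part with \foo\bar\baz ",    // backslashes kept verbatim
        "and a regular part with a tab:\t", // escapes processed here
        r#"and "quotes""#,                  // quotes without escaping
    )
}
```

No newline is inserted between the pieces; any separating space has to be written inside one of the literals.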
It is possible with indoc.
The indoc!() macro takes a multiline string literal and un-indents it at compile time so the leftmost non-space character is in the first column.
let testing = indoc! {"
    def hello():
        print('Hello, world!')

    hello()
"};
let expected = "def hello():\n    print('Hello, world!')\n\nhello()\n";
assert_eq!(testing, expected);
P.S. I really think we could use an AI that recommends good crates to Rust users.

Swift 2.0 Escaping string for new line (String Encoding)

I am trying to escape the newline character (\n) in a string.
For example, say the string is:
First Line Of String
second Line of String
Third Line of String
Now if I use a String extension like this:
func escapeString() -> String{
    newString = self.stringByRemovingPercentEncoding
    return newString
}
This extension does not give me newString as
First Line Of String\nSecond Line Of String\nThird Line Of String
I need the above string as a JSON string to pass to a server, i.e. I have to string-encode it.
Swift 5
You can use JSONEncoder to escape characters such as \n, \, \t, \r, and " instead of manually replacing them in your string, e.g.:
import Foundation

extension String {
    var escaped: String {
        if let data = try? JSONEncoder().encode(self),
           let json = String(data: data, encoding: .utf8) {
            // Drop exactly one leading and one trailing quote. Trimming a
            // character set would also eat an escaped quote at the end.
            return String(json.dropFirst().dropLast())
        }
        return self
    }
}
let str = "new line - \n, quote - \", url - https://google.com"
print("Original: \(str)")
print("Escaped: \(str.escaped)")
Outputs:
Original: new line -
, quote - ", url - https://google.com
Escaped: new line - \n, quote - \", url - https:\/\/google.com
stringByRemovingPercentEncoding is for percent encoding as used in URLs, as you might expect from the name. (If not from that, maybe from reading the docs, the pertinent part of which even shows up in Xcode code completion.) That is, it takes a string like "some%20text%20with%20spaces" and turns it into "some text with spaces".
If you want to do a different kind of character substitution, you'll need to do it yourself. But that can still be a one-liner:
extension String {
    var withEscapedNewlines: String {
        return self.stringByReplacingOccurrencesOfString("\n", withString: "\\n")
    }
}
Note the first argument to self.stringByReplacingOccurrencesOfString is an escape code passed to the Swift compiler, so the actual value of the argument is the newline character (ASCII/UTF-8 0x0A). The second argument escapes the backslash (in the text passed to the Swift compiler), so the actual value of the argument is the text \n. (In Swift 3 and later, this method is spelled replacingOccurrences(of:with:).)

What is the syntax for a multiline string literal?

I'm having a hard time figuring out how string syntax works in Rust. Specifically, I'm trying to figure out how to make a multiple line string.
All string literals can be broken across several lines; for example:
let string = "line one
line two";
is a two line string, the same as "line one\nline two" (of course one can use the \n newline escape directly too). If you wish to just break a string across multiple lines for formatting reasons you can escape the newline and leading whitespace with a \; for example:
let string = "one line \
              written over \
              several";
is the same as "one line written over several".
If you want linebreaks in the string you can add them before the \:
let string = "multiple\n\
              lines\n\
              with\n\
              indentation";
It's the same as "multiple\nlines\nwith\nindentation";
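The equivalences claimed above can be checked directly. In this sketch the constant names are hypothetical; each constant uses one of the three forms, and the indentation on the continuation lines is there only to show that it does not end up in the string.

```rust
// A literal line break inside the quotes becomes a newline in the string.
const TWO_LINES: &str = "line one
line two";

// A trailing backslash drops the line break and the next line's leading whitespace.
const CONTINUED: &str = "one line \
    written over \
    several";

// An explicit \n before the backslash keeps a newline but still drops the indentation.
const WITH_NEWLINES: &str = "multiple\n\
    lines\n\
    with\n\
    indentation";
```

Comparing each constant with its single-line spelling, for example assert_eq!(CONTINUED, "one line written over several"), is a quick way to confirm the behavior.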
In case you want to do something a bit longer, which may or may not include quotes, backslashes, etc., use the raw string literal notation:
let shader = r#"
#version 330
in vec4 v_color;
out vec4 color;
void main() {
color = v_color;
};
"#;
If you have sequences of double quotes and hash symbols within your string, you can denote an arbitrary number of hashes as a delimiter:
let crazy_raw_string = r###"
My fingers #"
can#"#t stop "#"" hitting
hash##"#
"###;
Printing the shader string above outputs:
#version 330
in vec4 v_color;
out vec4 color;
void main() {
color = v_color;
};
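To see how the hash delimiters relate to ordinary escaped strings, a few raw literals can be compared with their escaped spellings. The constant names and contents below are made up for illustration.

```rust
// One hash is enough when the body contains plain double quotes.
const QUOTED: &str = r#"He said "hi""#;

// Two hashes are needed because the body contains the sequence "# itself.
const WITH_HASH_QUOTE: &str = r##"foo #"# bar"##;

// No hashes are needed when there are no double quotes; \n stays as two characters.
const NO_ESCAPES: &str = r"\n is not a newline here";
```

The number of hashes only has to exceed the longest run of hashes that directly follows a double quote inside the body.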
Huon's answer is correct but if the indentation bothers you, consider using Indoc which is a procedural macro for indented multi-line strings. It stands for "indented document." It provides a macro called indoc!() that takes a multiline string literal and un-indents it so the leftmost non-space character is in the first column.
let s = indoc! {"
    line one
    line two
"};
The result is "line one\nline two\n".
Whitespace is preserved relative to the leftmost non-space character in the document, so the following has line two indented 3 spaces relative to line one:
let s = indoc! {"
    line one
       line two
"};
The result is "line one\n   line two\n".
If you want fine-grained control over spaces in multiline strings with line breaks without using an external crate, you can do the following. The example is taken from my own project.
use std::fmt::{self, Display, Formatter};

impl Display for OCPRecData {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "OCPRecData {{\n\
            \x20   msg: {:?}\n\
            \x20   device_name: {:?}\n\
            \x20   parent_device_name: {:?}\n\
        }}", self.msg, self.device_name, self.parent_device_name)
    }
}
Results in
OCPRecData {
    msg: Some("Hello World")
    device_name: None
    parent_device_name: None
}
\n\ at the end of each code line inserts a line break at that position and discards the source line break and the leading whitespace of the next code line
\x20 (hex; 32 in decimal) is an ASCII space and an indicator for the first space to be preserved in this line of the string
\x20\x20\x20\x20 and \x20 followed by three literal spaces have the same effect
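The effect of \x20 can be checked in isolation. In the sketch below (constant names and contents are hypothetical), the first constant marks the start of the preserved indentation with \x20, while the second relies on literal spaces only, which the backslash continuation discards.

```rust
// \x20 is an escape, not literal whitespace, so the continuation stops
// skipping there: the escaped space plus three literal spaces are kept.
const INDENTED: &str = "Record {\n\
    \x20   name: example\n\
    }";

// Without \x20 all leading spaces on the continuation line are skipped.
const FLATTENED: &str = "Record {\n\
        name: example\n\
    }";
```

INDENTED keeps a four-space indent on its middle line, whereas FLATTENED ends up with no indentation at all.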
In case you want to indent multiline text in your code:
let s = "first line\n\
         second line\n\
         third line";
println!("Multiline text goes next:\n{}", s);
The result will be the following:
Multiline text goes next:
first line
second line
third line

How do I write a multi-line string in Rust? [duplicate]

Is it possible to write something like:
fn main() {
    let my_string: &str = "Testing for new lines \
                           might work like this?";
}
If I'm reading the language reference correctly, then it looks like that should work. The language ref states that \n etc. are supported (as common escapes, for inserting line breaks into your string), along with "additional escapes" including LF, CR, and HT.
Another way to do this is to use a raw string literal:
Raw string literals do not process any escapes. They start with the
character U+0072 (r), followed by zero or more of the character U+0023
(#) and a U+0022 (double-quote) character. The raw string body can
contain any sequence of Unicode characters and is terminated only by
another U+0022 (double-quote) character, followed by the same number
of U+0023 (#) characters that preceded the opening U+0022
(double-quote) character.
All Unicode characters contained in the raw string body represent
themselves, the characters U+0022 (double-quote) (except when followed
by at least as many U+0023 (#) characters as were used to start the
raw string literal) or U+005C (\) do not have any special meaning.
Examples for string literals:
"foo"; r"foo"; // foo
"\"foo\""; r#""foo""#; // "foo"
"foo #\"# bar";
r##"foo #"# bar"##; // foo #"# bar
"\x52"; "R"; r"R"; // R
"\\x52"; r"\x52"; // \x52
If you'd like to avoid having newline characters and extra spaces, you can use the concat! macro. It concatenates string literals at compile time.
let my_string = concat!(
"Testing for new lines ",
"might work like this?",
);
assert_eq!(my_string, "Testing for new lines might work like this?");
The accepted answer with the backslash also removes the extra spaces.
Every string literal in Rust can span multiple lines.
But if you have indents in your text like:
fn my_func() {
    const MY_CONST: &str = "\
    Hi!
    This is a multiline text!
    ";
}
you will get unnecessary spaces. To remove them you can use the indoc! macro from the indoc crate, which strips the common indentation: https://github.com/dtolnay/indoc
There are two ways of writing multi-line strings in Rust that have different results. You should choose between them with care depending on what you are trying to accomplish.
Method 1: Literal line break
If a string literal starting with " contains a literal line break, the compiler keeps that line break in the string as a single newline (\n). Whitespace on either side of the break is not trimmed: trailing spaces at the end of the line and any indentation at the start of the next line both become part of the string.
Example:
fn test() {
    println!("{}", "hello
world");
}
The output of the above is hello, a line break, then world. Any blank-space characters typed after hello or before world would also appear in the output.
Method 2: Backslash line break
Here the line break does not end up in the string. All the whitespace before a literal \ at the end of the line is preserved, while the line break itself and all leading whitespace on the next line are discarded.
Example:
fn test() {
    println!("{}", "hello \
        world");
}
In this example, the output is hello world, with exactly the whitespace that precedes the \.
Additionally, as mentioned in another answer, Rust has raw string literals, but they do not enter into this discussion, because Rust (unlike some other languages that need to resort to raw strings for this) supports literal line breaks in quoted content without restrictions, as we can see above.
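How each kind of line break treats surrounding whitespace can be verified with two constants. This is a sketch with hypothetical names; the second line of each literal is indented by four spaces on purpose.

```rust
// Literal line break: the newline and the next line's indentation are kept.
const LITERAL_BREAK: &str = "hello
    world";

// Backslash line break: the two spaces before the backslash are kept, while
// the line break and the next line's indentation are dropped.
const BACKSLASH_BREAK: &str = "hello  \
    world";
```

Here LITERAL_BREAK is "hello\n    world" and BACKSLASH_BREAK is "hello  world".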
