get console width of char - rust

In rust, how can I get the console ("terminal") width of a character (char)?
I want the "column width" as displayed in a console. Typically this is found by counting characters in the string: s.chars().count(). However, some Unicode characters are more than one column wide. Some are "full width" and others are visually large characters that require multiple console columns.
A few examples
fn main() {
    // ASCII 'A'
    let c: char = '\u{41}';
    println!("{:?} len_utf8 {} len_utf16 {} as u128 0x{:08X}", c, c.len_utf8(), c.len_utf16(), c as u128);
    // full-width A
    let c: char = '\u{FF21}';
    println!("{:?} len_utf8 {} len_utf16 {} as u128 0x{:08X}", c, c.len_utf8(), c.len_utf16(), c as u128);
    // visually wide char NEW MOON WITH FACE
    let c: char = '\u{1F31A}';
    println!("{:?} len_utf8 {} len_utf16 {} as u128 0x{:08X}", c, c.len_utf8(), c.len_utf16(), c as u128);
}
(rust playground)
Prints
'A' len_utf8 1 len_utf16 1 as u128 0x00000041
'A' len_utf8 3 len_utf16 1 as u128 0x0000FF21
'🌚' len_utf8 4 len_utf16 2 as u128 0x0001F31A
Character 'LATIN CAPITAL LETTER A' A (U+0041) is displayed one console column wide.
Character 'FULLWIDTH LATIN CAPITAL LETTER A' A (U+FF21) is displayed two console columns wide.
Character 'NEW MOON WITH FACE' 🌚 (U+1F31A) is displayed three console columns wide.
In Rust, how can I find a char's console column width?

Use the unicode_width crate. Docs: https://docs.rs/unicode-width/latest/unicode_width/
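A minimal sketch of its use (assuming the unicode-width crate is added as a Cargo dependency; UnicodeWidthChar is the extension trait it provides for char):

```rust
use unicode_width::UnicodeWidthChar;

fn main() {
    for c in ['\u{41}', '\u{FF21}', '\u{1F31A}'] {
        // width() returns the display width in columns,
        // or None for control characters.
        println!("{:?} width {:?}", c, c.width());
    }
}
```

Note that the crate reports a width of 2 (not 3) for U+1F31A; how many columns an emoji actually occupies can still vary between terminals.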


Rust signed modulo unsigned -> unsigned

In (stable) Rust, is there a relatively straightforward method of implementing the following function?
fn mod_euclid(val: i128, modulo: u128) -> u128;
Note the types! That is, 'standard' euclidean modulus (result is always in the range of [0, mod)), avoiding spurious overflow/underflow in the intermediate calculation. Some test cases:
// don't-care, just no panic or UB.
// Mild preference for treating this as though it was mod=1<<128 instead of 0.
assert_dc!(mod_euclid(i128::MAX, 0));
assert_dc!(mod_euclid( 0, 0));
assert_dc!(mod_euclid(i128::MIN, 0));
assert_eq!(mod_euclid( 1, 10), 1);
assert_eq!(mod_euclid( -1, 10), 9);
assert_eq!(mod_euclid( 11, 10), 1);
assert_eq!(mod_euclid( -11, 10), 9);
assert_eq!(mod_euclid(i128::MAX, 1), 0);
assert_eq!(mod_euclid( 0, 1), 0);
assert_eq!(mod_euclid(i128::MIN, 1), 0);
assert_eq!(mod_euclid(i128::MAX, u128::MAX), i128::MAX as u128);
assert_eq!(mod_euclid( 0, u128::MAX), 0);
assert_eq!(mod_euclid(i128::MIN, u128::MAX), i128::MAX as u128);
For signed%signed->signed, or unsigned%unsigned->unsigned, this is relatively straightforward. However, I can't find a good way of calculating signed % unsigned -> unsigned without converting one of the arguments - and as the last example illustrates, this may overflow or underflow no matter which direction you choose.
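To make the conversion problem concrete, a small illustration (not from the question) of why neither cast direction is safe:

```rust
fn main() {
    // Converting the modulus to i128 fails for moduli above i128::MAX:
    assert!(i128::try_from(u128::MAX).is_err());

    // Converting a negative value to u128 wraps around instead:
    let a: i128 = -1;
    assert_eq!(a as u128, u128::MAX);
}
```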
As far as I can tell, there is no such function in the standard library, but it's not very difficult to write one yourself:
fn mod_euclid(a: i128, b: u128) -> u128 {
    if a >= 0 {
        (a as u128) % b
    } else {
        let r = (!a as u128) % b;
        b - r - 1
    }
}
Playground link
How it works:
If a is non-negative then it's straightforward - just use the unsigned remainder operator.
Otherwise, the bitwise complement !a is non-negative (because the sign bit is flipped), and numerically equal to -a - 1. This means r is equivalent to b - a - 1 modulo b, and hence b - r - 1 is equivalent to a modulo b. Conveniently, b - r - 1 is in the expected range 0..b.
Maybe a little more straightforward: use rem_euclid where possible, and otherwise return the positive value equivalent to a:
pub fn mod_euclid(a: i128, b: u128) -> u128 {
    const UPPER: u128 = i128::MAX as u128;
    match b {
        1..=UPPER => a.rem_euclid(b as i128) as u128,
        _ if a >= 0 => a as u128,
        // a is negative and b >= 2^127 >= |a|, so a + b is already the
        // euclidean remainder; wrapping_add_signed computes b + a
        // without overflowing.
        _ => b.wrapping_add_signed(a),
    }
}
(The parser didn't like a cast in the match pattern, hence the UPPER constant.)
Playground
Results in slightly fewer instructions and jumps on x86_64 as well.
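As a sanity check, the first implementation can be exercised against the question's defined (non-zero-modulus) cases:

```rust
fn mod_euclid(a: i128, b: u128) -> u128 {
    if a >= 0 {
        (a as u128) % b
    } else {
        // !a == -a - 1 is non-negative, so the cast is lossless.
        let r = (!a as u128) % b;
        b - r - 1
    }
}

fn main() {
    assert_eq!(mod_euclid(1, 10), 1);
    assert_eq!(mod_euclid(-1, 10), 9);
    assert_eq!(mod_euclid(11, 10), 1);
    assert_eq!(mod_euclid(-11, 10), 9);
    assert_eq!(mod_euclid(i128::MAX, 1), 0);
    assert_eq!(mod_euclid(0, 1), 0);
    assert_eq!(mod_euclid(i128::MIN, 1), 0);
    assert_eq!(mod_euclid(i128::MAX, u128::MAX), i128::MAX as u128);
    assert_eq!(mod_euclid(0, u128::MAX), 0);
    assert_eq!(mod_euclid(i128::MIN, u128::MAX), i128::MAX as u128);
    println!("all cases pass");
}
```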

0th index value in Vector is blank, but only the 0th index of a second vector

I'm currently taking a class where we are learning Rust. I am running into a really weird issue and I'm not sure what is causing it.
So I have two vectors, and I'm looping through them to print their contents.
when I do the following, it prints out fine,
for index in 0..=delimited_record.capacity()-1 {
    let cell = delimited_record.get(index).unwrap();
    println!("{}", cell);
}
the result is as follows
99
87.4
76
mike
73
95.5
100
gary
however, the following small change makes the 0th index of the second vector come out blank
for index in 0..=delimited_record.capacity()-1 {
    let cell = delimited_record.get(index).unwrap();
    print!("{}", cell);
    println!(" - {}", index);
}
the result is as follows
99 - 0
87.4 - 1
76 - 2
mike - 3
- 0
95.5 - 1
100 - 2
gary - 3
The thing is, if I try to do anything besides a regular print, it comes back blank. These values are all strings, and what I'm trying to do is convert them into floats (not the names, just the numbers), but it keeps crashing whenever I try to parse the second array at the 0th index, and it seems that's because it's blank for some reason.
Does anyone know what is causing this?
Edit
Here is the code in its entirety. There's a lot of stuff commented out since I was trying to find out what was causing the issue.
use std::{fs, ptr::null};

fn main() {
    println!("Hello, world!");
    println!("TEST");
    let sg = StudentGrades{
        name: String::from("mike"),
        grades: vec![100.0,80.0,55.0,33.9,77.0]
        //grades: vec!['A','B','C','A']
    };
    let mut cg = CourseGrades{
        student_records: Vec::new()
    };
    cg.from_file("texttest.txt".to_owned());
}

struct StudentGrades{
    name: String,
    grades: Vec<f32>
}

impl StudentGrades{
    fn average(&self) -> f32 {
        let mut sum = 0.0;
        for grade in self.grades.iter(){
            sum += grade;
        }
        return sum / self.grades.capacity() as f32;
    }
    fn grade(&self) -> char {
        let score = self.average();
        let scure_truncated = score as i64 / 1;
        match scure_truncated{
            0..=59 => return 'F',
            60..=69 => return 'D',
            70..=79 => return 'C',
            80..=89 => return 'B',
            _ => return 'A',
        }
    }
}

struct CourseGrades{
    student_records: Vec<StudentGrades>
}

impl CourseGrades{
    fn from_file(&mut self, file_path: String){
        let mut contents = fs::read_to_string(file_path)
            .expect("Should have been able to read the file");
        contents = contents.replace(" ","");
        let student_rows: Vec<&str> = contents.rsplit('\n').collect();
        for student_record in student_rows.iter(){
            let mut student_grades: Vec<f32> = Vec::new();
            let delimited_record: Vec<&str> = student_record.rsplit(",").collect();
            //println!("{}",delimited_record.get(0).unwrap());
            //delimited_record.iter().for_each(|x| println!("{}",x));
            for index in 0..=delimited_record.len()-1{
                //println!("{}",delimited_record.get(index).unwrap().parse::<f32>().unwrap_or_default());
                //println!("{}",delimited_record.get(0).unwrap().parse::<i32>().unwrap_or_default());
                //student_grades.push(delimited_record.get(index).unwrap().parse::<f32>().unwrap());
                let cell = delimited_record.get(index).unwrap();
                print!("{}",cell);
                println!(" - {}",index);
                //println!(" - {}", index < delimited_record.capacity()-1);
                if index < delimited_record.len(){
                    //let grade = cell.parse::<f32>().unwrap();
                    //student_grades.push(cell.parse::<f32>().unwrap());
                    //student_grades.push(delimited_record.get(index).unwrap().parse::<f32>().unwrap());
                }
                else{
                    /*self.student_records.push(StudentGrades{
                        name:cell.parse::<String>().unwrap(),
                        grades:student_grades.to_owned()
                    });*/
                }
            }
        }
    }
}
the texttest.txt file is in the root of the project folder and is just a text file with the following contents
gary, 100, 95.5, 73
mike, 76, 87.4, 99
if I embed it directly, it works just fine, which makes me think there may be something weird when it reads the file
Your file has CR LF (\r\n, Windows-style) line endings, but you’re only splitting on \n. Your grade ends up ending with a CR. This is invisible when you println! it on its own (since the result is CR LF), but if you print something after it on the same line, the CR has returned the cursor to the start of the line and the second thing being printed will write over it.
With println!, the stream of characters is '7' '3' '\r' '\n' '9' …, and the screen after each one is:

'7'   '3'   '\r'   '\n'   '9' …
 7     73    73     73     73
                            9

The CR only returns the cursor to column 0, and the LF immediately moves to the next line, so nothing is overwritten. With print! followed by println!(" - {}", index), the stream is '7' '3' '\r' ' ' '-' …:

'7'   '3'   '\r'   ' '    '-' …
 7     73    73     " 3"   " -"

After the CR, the space and the '-' overwrite the '7' and the '3', so the line ends up reading " - 0" instead of "73 - 0".
One way to fix this is to strip all whitespace from every cell, including CR:
let cell = delimited_record.get(index).unwrap().trim();
And for debugging, consider formatting with "{:?}" instead of "{}", which will show a literal with invisible characters escaped instead of writing out whatever the string contains directly.
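A minimal reproduction of the stray CR, using a hard-coded string in place of the file contents:

```rust
fn main() {
    // Windows-style line endings leave a trailing '\r' on each line
    // when splitting on '\n' alone.
    let contents = "gary, 100, 95.5, 73\r\nmike, 76, 87.4, 99";
    let first_line = contents.split('\n').next().unwrap();
    let last_cell = first_line.rsplit(',').next().unwrap();
    assert_eq!(last_cell, " 73\r");     // the CR is still there
    assert_eq!(last_cell.trim(), "73"); // trim() strips it (and the space)
    println!("{:?}", last_cell);        // debug formatting makes the CR visible
}
```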

Format number by rounding up

I have a number which I want to print with a fixed precision, rounded up. I know I can use {:.3} to round it to three decimal places.
assert_eq!("0.012", format!("{:.3}", 0.0123456))
Is there a simple way to "ceil" it instead?
assert_eq!("0.0124", format!("{:magic}", 0.012301))
assert_eq!("0.0124", format!("{:magic}", 0.012399))
assert_eq!("0.0124", format!("{:magic}", 0.0124))
I can do something like
let x = format!("{:.3}", (((y * 1000.0).ceil() + 0.5) as i64) as f64 / 1000.0)
which is pretty unreadable. It also would give me 3 digits after the decimal point, not three digits of precision, so I need to figure out the scale of the number, probably with something like -log10(y) as i64.
In case it's not clear, I want a string to show the user, not an f64.
More examples
assert_eq!("1.24e-42", format!("{:magic}", 1.234e-42))
assert_eq!("1240", format!("{:magic}", 1234.5)) // "1240." also works
If the f64 representing 0.123 is slightly larger than the real number 0.123, displaying "0.124" is acceptable.
The two requirements are:
The string, when converted back to an f64, is greater than or equal to the original f64 (so 0.123 -> "0.124" is acceptable)
The string has 3 significant digits (although dropping trailing zeros is acceptable, so 0.5 -> "0.5" and 0.5 -> "0.500" both work)
In case it comes up, the input number will always be positive.
This is harder than it seems because there is no way to tell the formatting machinery to change the rounding strategy. Also, format precision works on the number of digits after the decimal point, not on the number of significant digits. (AFAIK there is no equivalent to printf's "%.3g", and even if there were, it wouldn't round up.)
You can use a decimal arithmetic crate such as rust_decimal to do the heavy-lifting - something like:
use rust_decimal::prelude::*;
pub fn fmtup(n: f64, ndigits: u32) -> String {
let d = Decimal::from_f64_retain(n).unwrap();
d.round_sf_with_strategy(ndigits, RoundingStrategy::AwayFromZero)
.unwrap()
.normalize()
.to_string()
}
EDIT: The answer originally included a manual implementation of the rounding due to issues in rust_decimal which have since been fixed. As of Oct 24 2021 the above snippet using rust_decimal is the recommended solution. The only exception is if you need to handle numbers that are very large or very close to zero (such as 1.234e-42 or 1.234e42), which are approximated to zero or rejected by rust_decimal.
To manually round to significant digits, one can scale the number until it has the desired number of digits before the decimal point, and then round it up. In case of 3 digits, scaling would multiply or divide it by 10 until it falls between 100 and 1000. After rounding the number, format the resulting whole number as string, and insert the . at the position determined by the amount of scaling done in the first step.
To avoid inexactness of floating-point division by ten, the number can be first converted to a fraction, and then all operations can proceed on the fraction. Here is an implementation that uses the ubiquitous num crate to provide fractions:
use num::{rational::BigRational, FromPrimitive};

/// Format `n` to `ndigits` significant digits, rounding away from zero.
pub fn fmtup(n: f64, ndigits: i32) -> String {
    // Pass 0 (which we can't scale), infinities and NaN to f64::to_string()
    if n == 0.0 || !n.is_finite() {
        return n.to_string();
    }
    // Handle negative numbers the easy way.
    if n < 0.0 {
        return format!("-{}", fmtup(-n, ndigits));
    }
    // Convert the input to a fraction. From this point onward, we are only
    // doing exact arithmetic.
    let mut n = BigRational::from_float(n).unwrap();
    // Scale N so its whole part is ndigits long, meaning truncating it will
    // result in an integer ndigits long. If ndigits is 3, we'd want N to be in
    // the [100, 1000) range, so that e.g. 0.012345 would be scaled to 123.45,
    // and then rounded up to 124.
    let mut scale = 0i16;
    let ten = BigRational::from_u8(10).unwrap();
    let lower_bound = ten.pow(ndigits - 1);
    if n < lower_bound {
        while n < lower_bound {
            n *= &ten;
            scale -= 1;
        }
    } else {
        let upper_bound = lower_bound * &ten;
        while n >= upper_bound {
            n /= &ten;
            scale += 1;
        }
    }
    // Round N up
    n = n.ceil();
    // Format the number as integer and place the decimal point at the right
    // position. Multiply N with 10**scale, i.e. append zeros if SCALE is
    // positive, otherwise insert the point inside or before the number.
    let mut s = n.to_string();
    if scale > 0 {
        s.extend(std::iter::repeat('0').take(scale as _));
    } else if scale < 0 {
        // Find where to place the decimal point in the string.
        let point_pos = s.len() as i16 + scale;
        if point_pos <= 0 {
            // Negative position means before beginning of the string, so we
            // have to pad with zeros. E.g. s == "123" and point_pos == -2
            // means we want "0.00123", and with point_pos == 0 we'd want
            // "0.123".
            let mut pad = "0.".to_string();
            pad.extend(std::iter::repeat('0').take(-point_pos as _));
            pad.push_str(&s);
            s = pad;
            // Trim trailing zeros after decimal point. E.g. 0.25 gets scaled
            // to 250 and then ends up "0.250".
            s.truncate(s.trim_end_matches('0').len());
        } else {
            // Insert the decimal point in the middle of the string. E.g.
            // s == "123" and point_pos == 1 would result in "1.23".
            let point_pos = point_pos as usize;
            if s.as_bytes()[point_pos..].iter().all(|&digit| digit == b'0') {
                // If only zeros follow the decimal point, e.g. "10.000", omit
                // those digits instead of placing the decimal point.
                s.truncate(point_pos);
            } else {
                s.insert(point_pos, '.');
            }
        }
    }
    s
}
Playground
Here are some test cases:
fn main() {
    let fmt3up = |n| fmtup(n, 3);
    assert_eq!("12400", fmt3up(12301.));
    assert_eq!("1240", fmt3up(1234.5));
    assert_eq!("124", fmt3up(123.01));
    assert_eq!("1000", fmt3up(1000.));
    assert_eq!("999", fmt3up(999.));
    assert_eq!("1010", fmt3up(1001.));
    assert_eq!("100", fmt3up(100.));
    assert_eq!("10", fmt3up(10.));
    assert_eq!("99", fmt3up(99.));
    assert_eq!("101", fmt3up(101.));
    assert_eq!("0.25", fmt3up(0.25));
    assert_eq!("12400", fmt3up(12301.0));
    assert_eq!("0.0124", fmt3up(0.0123)); // because 0.0123 is slightly above 123/10_000
    assert_eq!("0.0124", fmt3up(0.012301));
    assert_eq!("0.00124", fmt3up(0.0012301));
    assert_eq!("0.0124", fmt3up(0.012399));
    assert_eq!("0.0124", fmt3up(0.0124));
    assert_eq!("0.124", fmt3up(0.12301));
    assert_eq!("1.24", fmt3up(1.2301));
    assert_eq!("1.24", fmt3up(1.234));
}
Note that this will display 1.234e-42 as 0.00000000000000000000000000000000000000000124, but an improvement to switch to exponential notation should be fairly straightforward.

Advent of Code 2015: day 5, part 2 unknown false positives

I'm working through the Advent of Code 2015 problems in order to practise my Rust skills.
Here is the problem description:
Realizing the error of his ways, Santa has switched to a better model of determining whether a string is naughty or nice. None of the old rules apply, as they are all clearly ridiculous.
Now, a nice string is one with all of the following properties:
It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.
For example:
qjhvhtzxzqqjkmpb is nice because it has a pair that appears twice (qj) and a letter that repeats with exactly one letter between them (zxz).
xxyxx is nice because it has a pair that appears twice and a letter that repeats with one between, even though the letters used by each rule overlap.
uurcxstgmygtbstg is naughty because it has a pair (tg) but no repeat with a single letter between them.
ieodomkazucvgmuy is naughty because it has a repeating letter with one between (odo), but no pair that appears twice.
How many strings are nice under these new rules?
This is what I've managed to come up with so far:
pub fn part2(strings: &[String]) -> usize {
    strings.iter().filter(|x| is_nice(x)).count()
    /* for s in [
        String::from("qjhvhtzxzqqjkmpb"),
        String::from("xxyxx"),
        String::from("uurcxstgmygtbstg"),
        String::from("ieodomkazucvgmuy"),
        String::from("aaa"),
    ]
    .iter()
    {
        is_nice(s);
    }
    0 */
}

fn is_nice(s: &String) -> bool {
    let repeat = has_repeat(s);
    let pair = has_pair(s);
    /* println!(
        "s = {}: repeat = {}, pair = {}, nice = {}",
        s,
        repeat,
        pair,
        repeat && pair
    ); */
    repeat && pair
}

fn has_repeat(s: &String) -> bool {
    for (c1, c2) in s.chars().zip(s.chars().skip(2)) {
        if c1 == c2 {
            return true;
        }
    }
    false
}

fn has_pair(s: &String) -> bool {
    // Generate all possible pairs
    let mut pairs = Vec::new();
    for (c1, c2) in s.chars().zip(s.chars().skip(1)) {
        pairs.push((c1, c2));
    }
    // Look for overlap
    for (value1, value2) in pairs.iter().zip(pairs.iter().skip(1)) {
        if value1 == value2 {
            // Overlap has occurred
            return false;
        }
    }
    // Look for matching pair
    for value in pairs.iter() {
        if pairs.iter().filter(|x| *x == value).count() >= 2 {
            //println!("Repeat pair: {:?}", value);
            return true;
        }
    }
    // No pair found
    false
}
However despite getting the expected results for the commented-out test values, my result when running on the actual puzzle input does not compare with community verified regex-based implementations. I can't seem to see where the problem is despite having thoroughly tested each function with known test values.
I would rather not use regex if at all possible.
I think has_pair has a bug:
In the word aaabbaa, there is an overlapping aa at the beginning (aaa), but you are not allowed to return false right away, because there is another, non-overlapping, aa at the end of the word.
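One way to fix it is to reject only overlapping occurrences rather than the whole string: for each pair, look for the same pair starting at least two positions later. A sketch of such a rewrite (hypothetical, not the asker's code):

```rust
fn has_pair(s: &str) -> bool {
    let bytes = s.as_bytes();
    // For each two-byte window, look for the same pair starting at least
    // two positions later, so the two occurrences cannot overlap.
    (0..bytes.len().saturating_sub(1)).any(|i| {
        let pair = &bytes[i..i + 2];
        bytes[i + 2..].windows(2).any(|w| w == pair)
    })
}

fn main() {
    assert!(has_pair("qjhvhtzxzqqjkmpb")); // qj appears twice
    assert!(has_pair("xxyxx"));            // xx appears twice, non-overlapping
    assert!(!has_pair("aaa"));             // only the overlapping aa
    assert!(has_pair("aaabbaa"));          // aa at the start and at the end
}
```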

Is it possible to write literal byte strings with values larger than 127 in rust?

I've noticed that in Rust, we can't use the byte escape notation for values larger than 127, that is
let x = "\x01\x17\x7f"
is fine since all chars are < 128, but
let x = "\x01\x17\x80"
will fail since \x80 = 128.
Is there any way to still write string-like objects in that format?
Above 127 you enter the realm of Unicode and must use the \u{codepoint} escape sequence:
let x = "\u{80}";
Note however that 0x80 by itself isn't a valid byte in a UTF-8 string, so this turns out as two bytes:
let x = "\u{80}";
for b in x.bytes() {
    println!("{:X}", b);
}
prints
C2
80
If you instead need the value 0x80, you can't use a string and must use a byte slice:
fn main() {
    let x = b"\x80";
    for b in x {
        println!("{:X}", b);
    }
}
prints
80
