I am writing a macro to parse some structured text into tuples, line by line. Most parts work now, but I am stuck at forming a tuple by extracting/converting Strings from a vector.
// Reading Tuple from a line
// Example : read_tuple( "1 ab 3".lines()
// ,(i32, String, i32))
// Expected : (1, "ab", 3)
// Note: you cannot use str
macro_rules! read_tuple {
(
$lines :ident , ( $( $t :ty ),* )
)
=> {{
let l = ($lines).next().unwrap();
let ws = l.trim().split(" ").collect::<Vec<_>>();
let s : ( $($t),* ) = (
// for w in ws {
// let p = w.parse().unwrap();
// ( p) ,
// }
ws[0].parse().unwrap(),
ws[1].parse().unwrap(),
//...
ws[2].parse().unwrap(),
// Or is there any way to automatically generate these statements?
);
s
}}
}
fn main() {
let mut _x = "1 ab 3".lines();
let a = read_tuple!( _x, (i32, String, i32));
print!("{:?}",a);
}
How can I iterate through ws and return the tuple within this macro?
A tuple is a heterogeneous collection; each element may be of a different type. In your example the elements are of different types, so each parse call needs to produce a different type. Pure runtime iteration is therefore ruled out; you do need all the ws[N].parse().unwrap() statements expanded.
Sadly there is not at present any way of writing out the current iteration of a $(…)* (though it could be simulated with a compiler plugin). There is, however, a way that one can get around that: blending run- and compile-time iteration. We use iterators to pull out the strings, and the macro iteration expansion (ensuring that $t is mentioned inside the $(…) so it knows what to repeat over) to produce the right number of the same lines. This also means we can avoid using an intermediate vector as we are using the iterator directly, so we win all round.
macro_rules! read_tuple {
(
$lines:ident, ($($t:ty),*)
) => {{
let l = $lines.next().unwrap();
let mut ws = l.trim().split(" ");
(
$(ws.next().unwrap().parse::<$t>().unwrap(),)*
)
}}
}
A minor thing to note is how I changed ),* to ,)*; this means that you will get (), (1,), (1, 2,), (1, 2, 3,), &c. instead of (), (1), (1, 2), (1, 2, 3)—the key difference being that a single-element tuple will work (though you’ll still sadly be writing read_tuple!(lines, (T))).
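As a quick check of the macro above, here is a minimal usage sketch. The helper names parse_line and parse_single and the sample inputs are made up for illustration; the macro body is the one from the answer. Note that it splits on a single space, so it assumes fields are separated by exactly one space.

```rust
macro_rules! read_tuple {
    ($lines:ident, ($($t:ty),*)) => {{
        let l = $lines.next().unwrap();
        let mut ws = l.trim().split(" ");
        (
            // One parse call is expanded per type in the list.
            $(ws.next().unwrap().parse::<$t>().unwrap(),)*
        )
    }}
}

// Hypothetical helper: parses the first line of `text` as (i32, String, i32).
fn parse_line(text: &str) -> (i32, String, i32) {
    let mut lines = text.lines();
    read_tuple!(lines, (i32, String, i32))
}

// Hypothetical helper: the trailing comma in the expansion makes
// a one-element tuple work as well.
fn parse_single(text: &str) -> (i32,) {
    let mut lines = text.lines();
    read_tuple!(lines, (i32))
}

fn main() {
    println!("{:?}", parse_line("1 ab 3"));
    println!("{:?}", parse_single("42"));
}
```

Both helpers panic on malformed input because of the unwrap calls, which matches the behaviour of the original macro.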
I have an OLS fitting function that returns an OMatrix. The return type should always be a one-dimensional vector of coefficients.
use std::f64::NAN;
use nalgebra::{DMatrix, Dynamic, MatrixSlice, OMatrix, RowVector};
fn ols(
x: MatrixSlice<f64, Dynamic, Dynamic>,
y: MatrixSlice<f64, Dynamic, Dynamic>,
) -> OMatrix<f64, Dynamic, Dynamic> {
(x.transpose() * x).pseudo_inverse(0.00001).unwrap() * x.transpose() * y
}
The output of ols will always have the same number of elements, equal to the number of columns of the input x (I'm not sure how to change the return signature to represent this; I'm new to Rust).
The output of ols should then be copied to a single row of an output matrix out. I am trying to do this with the set_row function, but I get the error expected struct 'Const', found struct 'Dynamic'.
fn my_func(
x: &DMatrix<f64>, // data matrix
y: &DMatrix<f64>, // target matrix, actually a matrix with only 1 column
) -> DMatrix<f64> {
let nrows = x.shape().0;
let ncols = x.shape().1;
// initialize out matrix to all NAN's
let mut out = DMatrix::from_element(nrows, ncols, NAN);
let i: usize = 100;
let tmp_x: MatrixSlice<f64, Dynamic, Dynamic> = x.slice((i, 0), (50, ncols));
let tmp_y: MatrixSlice<f64, Dynamic, Dynamic> = y.slice((i, 0), (50, 1));
// the next two lines are where I need help
let ols_coefs = ols(tmp_x, tmp_y);
out.set_row(i, &ols_coefs); // error occurs here
return out;
}
I suspect I need to convert the type of the output of ols somehow, but I am not sure how.
I've been rewriting some performance-sensitive parts of my code in AArch64 NEON. For some things, like population count, I've managed to get a 12x speedup. But for some algorithms I'm having trouble.
The high-level problem is quickly adding a list of newline-separated strings to a hashset. Assuming the hashset functionality is optimal (I am looking into it next), I first need to scan for the strings in the buffer.
I have tried various techniques, but my intuition tells me that I can create a list of pointers to each newline and then insert the slices into the hashset afterwards.
The fundamental problem is that I can't work out an efficient way to load a vector, compare against the newline, and spit out a list of pointers to the newlines; that is, the output has variable length, depending on how many newlines were found in the input vector.
Here is my approach:
fn read_file7(mut buffer: Vec<u8>, needle: u8) -> Result<HashSet<Vec<u8>>, Error>
{
let mut set = HashSet::new();
let mut chunk_offset: usize = 0;
let special_finder_big = [
0x80u8, 0x40u8, 0x20u8, 0x10u8, 0x08u8, 0x04u8, 0x02u8, 0x01u8, // high
0x80u8, 0x40u8, 0x20u8, 0x10u8, 0x08u8, 0x04u8, 0x02u8, 0x01u8, // low
];
let mut next_start: usize = 0;
let needle_vector = unsafe { vdupq_n_u8(needle) };
let special_finder_big = unsafe { vld1q_u8(special_finder_big.as_ptr()) };
let mut line_counter = 0;
// we process 16 chars at a time
for chunk in buffer.chunks(16) {
unsafe {
let src = vld1q_u8(chunk.as_ptr());
let out = vceqq_u8(src, needle_vector);
let anded = vandq_u8(out, special_finder_big);
// each of these is a bitset of each matching character
let vadded = vaddv_u8(vget_low_u8(anded));
let vadded2 = vaddv_u8(vget_high_u8(anded));
let list = [vadded2, vadded];
// combine bitsets into one big one!
let mut num = std::mem::transmute::<[u8; 2], u16>(list);
// while our bitset has bits left, find the set bits
while num > 0 {
let mut xor = 0x8000u16; // only set the highest bit
let clz = (num).leading_zeros() as usize;
set.get_or_insert_owned(&buffer[(next_start)..(chunk_offset + clz)]);
// println!("found '{}' at {} | clz is {} ", needle.escape_ascii(), start_offset + clz, clz);
// println!("string is '{}'", input[(next_start)..(start_offset + clz)].escape_ascii());
xor = xor >> clz;
num = num ^ xor;
next_start = chunk_offset + clz + 1;
//println!("new num {:032b}", num);
line_counter += 1;
}
}
chunk_offset += 16;
}
// get the remaining
set.get_or_insert_owned(&buffer[(next_start)..]);
println!(
"line_counter: {} unique elements {}",
line_counter,
set.len()
);
Ok(set)
}
If I unroll this to do 64 bytes at a time, on a big input it will be slightly faster than memchr, but not by much.
Any tips would be appreciated.
I've shown this to a colleague, who has come up with better intrinsics code than I would. Here's his suggestion. It has not been compiled, so some pseudo-code pieces need finishing off, but something along the lines below should be much faster and work:
let mut line_counter = 0;
for chunk in buffer.chunks(32) { // Read 32 bytes at a time
unsafe {
let src1 = vld1q_u8(chunk.as_ptr());
let src2 = vld1q_u8(chunk.as_ptr().add(16));
let out1 = vceqq_u8(src1, needle_vector);
let out2 = vceqq_u8(src2, needle_vector);
// We slot these next to each other in the same vector.
// In this case the bottom 64-bits of the vector will tell you
// if there are any needle values inside the first vector and
// the top 64-bits tell you if you have any needle values in the
// second vector.
let combined = vpmaxq_u8(out1, out2);
// Now we get another maxp which compresses this information into
// a single 64-bit value, where the bottom 32-bits tell us about
// src1 and the top 32-bit about src2.
let combined = vpmaxq_u8(combined, combined);
let remapped = vreinterpretq_u64_u8 (combined);
let val = vgetq_lane_u64 (remapped, 0);
if val == 0 { // most chunks won't have a new-line
... // If val is 0 that means no match was found in either vector, adjust offset and continue.
}
if val & 0xFFFF_FFFF != 0 { // bottom 32 bits
... // there must be a match in src1. use below code in a function
}
if val & 0xFFFF_FFFF_0000_0000 != 0 { // top 32 bits
... // there must be a match in src2. use below code in a function
}
...
}
}
Now that we know which vector to look in, we should find the index within that vector.
As an example, let's assume matchvec is the vector we found above (so either out1 or out2).
To find the first index:
// We create a mask of repeating 0xf00f chunks. When we fill an entire vector
// with it we get a pattern where every byte is 0xf0 or 0x0f. We'll use this
// to find the index of the matches.
let mask = unsafe { vreinterpretq_u8_u16(vdupq_n_u16(0xf00f)) };
// We first clear the bits we don't want, which leaves each pair of adjacent
// 8-bit entries with 4 bits of free space, alternating.
let masked = vandq_u8(matchvec, mask);
// Which means when we do a pairwise addition
// we are sure that no overflow will ever happen. The entries slot next to each other
// and a non-zero 4-bit entry indicates a match at that element.
// We've also compressed the values into the lower 64-bits again.
let compressed = vpaddq_u8(masked, masked);
let val = vgetq_lane_u64(vreinterpretq_u64_u8(compressed), 0);
// val now holds one 4-bit entry per input byte, with element 0 in the lowest
// 4 bits (little-endian lane order), so the first match is the lowest set bit.
// This moves val out of the SIMD register file. If that transfer is costly, it's best to
// do the bit count on the lower 64-bits of compressed on the SIMD side and transfer the result.
let pos = val.trailing_zeros() as usize;
// So just shift pos right by 2 to get the actual index.
let pos = pos >> 2;
pos will now contain the index of the first needle value.
If you were processing out2, remember to add 16 to the result.
To find all the indices we can run through the bitmask without using clz; this way we avoid the repeated register file transfers.
// set masked and compressed as above
let masked = vandq_u8 (matchvec, mask);
let compressed = vpaddq_u8 (masked, masked);
let mut val = vgetq_lane_u64(vreinterpretq_u64_u8(compressed), 0);
let mut idx = current_offset;
while val != 0 {
if val & 0xf != 0 {
// entry found at idx.
}
idx += 1;
val >>= 4;
}
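To make the 4-bits-per-byte layout easier to check without NEON hardware, here is a portable scalar sketch of the same idea. It is only a model: nibble_mask stands in for the compare, mask and pairwise-add steps, and all three function names are made up for illustration. It assumes element 0 of the vector lands in the lowest 4 bits of the 64-bit value, as it does with little-endian lane order.

```rust
/// Scalar stand-in for the SIMD steps: builds a 64-bit value with one
/// 4-bit entry per input byte, element 0 in the lowest 4 bits.
fn nibble_mask(chunk: &[u8; 16], needle: u8) -> u64 {
    let mut val = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == needle {
            val |= 0xf_u64 << (4 * i);
        }
    }
    val
}

/// Index of the first match inside the chunk, if any.
fn first_match(val: u64) -> Option<usize> {
    if val == 0 {
        None
    } else {
        // Four bits per element, so divide the bit position by 4.
        Some((val.trailing_zeros() >> 2) as usize)
    }
}

/// All match positions, offset by `base` (the chunk's position in the buffer).
fn all_matches(mut val: u64, base: usize) -> Vec<usize> {
    let mut found = Vec::new();
    let mut idx = base;
    while val != 0 {
        if val & 0xf != 0 {
            found.push(idx);
        }
        idx += 1;
        val >>= 4;
    }
    found
}

fn main() {
    let chunk = *b"ab\ncd\n0123456789";
    let val = nibble_mask(&chunk, b'\n');
    println!("{:#018x}", val);
    println!("{:?}", first_match(val));
    println!("{:?}", all_matches(val, 0));
}
```

A model like this can also serve as a reference when testing the intrinsics version against random buffers.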
Let's assume, we want a bunch of constants, associating each square of a chess board with its coordinates, so we can use those constants in our Rust code.
One such definition could be:
#[allow(dead_code)]
const A1: (usize,usize) = (0, 0);
and there would be 64 of them.
Now, as an Emacs user, I could generate the source code easily, for example with:
(dolist (col '(?A ?B ?C ?D ?E ?F ?G ?H))
(dolist (row '(?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8))
(insert "#[allow(dead_code)]")
(end-of-line)
(newline-and-indent)
(insert "const " col row ": (usize,usize) = ("
(format "%d" (- col ?A))
", "
(format "%d" (- row ?1))
");")
(end-of-line)
(newline-and-indent)))
The drawback is that my file just grew by 128 exceptionally boring lines.
In Common Lisp, I would solve this by defining a macro, for example:
(defmacro defconst-square-names ()
(labels ((square-name (row col)
(intern
(format nil "+~C~D+"
(code-char (+ (char-code #\A) col))
(+ row 1))))
(one-square (row col)
`(defconstant ,(square-name row col)
(cons ,row ,col))))
`(eval-when (:compile-toplevel :load-toplevel :execute)
,@(loop
for col below 8
appending
(loop for row below 8
collecting (one-square row col))))))
(defconst-square-names) ;; nicer packaging of those 64 boring lines...
Now, of course, the questions arise:
Is the Rust macro system able to accomplish this?
Can someone show such a macro?
I read that you need to put such a Rust macro into a separate crate or whatnot?!
UPDATE
@aedm pointed me, in a comment, to the seq-macro crate, which led to my first attempt to get it done. Unfortunately, from skimming various Rust documents about macros, I still don't know how to define and call compile-time functions from within such a macro:
fn const_name(index:usize) -> String {
format!("{}{}",
char::from_u32('A' as u32
+ (index as u32 % 8)).unwrap()
, index / 8)
}
seq!(index in 0..64 {
#[allow(dead_code)]
const $crate::const_name(index) : (usize,usize) = ($(index / 8), $(index %8));
});
In my Common Lisp solution, I just defined local functions within the macro to get such things done. What is the Rust way?
Here's one way to do it only with macro_rules! ("macros by example") and the paste crate (to construct the identifiers). It's not especially elegant, but it is fairly short and doesn't require you to write a proc-macro crate.
It needs to be invoked with all of the involved symbols since macro_rules! can't do arithmetic. (Maybe seq-macro would help some with that, but I'm not familiar with it.)
use paste::paste;
macro_rules! board {
// For each column, call column!() passing the details of that column
// and all of the rows. (This can't be done in one macro because macro
// repetition works like "zip", not like "cartesian product".)
( ($($cols:ident $colnos:literal),*), $rows:tt ) => {
$( column!($cols, $colnos, $rows); )*
};
}
/// Helper for board!
macro_rules! column {
( $col:ident, $colno:literal, ($($rows:literal),*) ) => {
$(
paste! {
// [< >] are special brackets that tell the `paste!` macro to
// paste together all the pieces appearing within them into
// a single identifier.
#[allow(dead_code)]
const [< $col $rows >]: (usize, usize) = ($colno, $rows - 1);
}
)*
};
}
board!((A 0, B 1, C 2, D 3, E 4, F 5, G 6, H 7), (1, 2, 3, 4, 5, 6, 7, 8));
fn main() {
dbg!(A1, A8, H1, H8);
}
In Perl there is tie. Python supports various protocols so that objects can behave like, e.g., a dictionary. Is there something similar in Raku?
That is, can I define an object that behaves like a Hash, so that writing $myobject<key> ends up in a routine that I can specify myself?
Perl has the Hash feature baked into the language.
So to extend it so that an object behaves like a Hash you needed to tell the runtime to do something different.
That is not the case for Raku.
A Hash in Raku is just another object.
The Hash indexing operation is just another operator that can be overloaded the same way you can overload other operators.
So you can create your own object that has the same features as a Hash, or even just inherit from it.
class Foo is Hash {
}
class Bar does Associative {
# delegate method calls to a Hash object
has %!hash handles Hash;
}
The reason to have does Associative is so that you can use it as the type to back an associative variable. (Hash already does Associative so you would inherit that too.)
my %f is Foo;
my %b is Bar;
To find out which methods you can write to implement Hash indexing operations you could look at the methods that Hash implements.
Since we know that methods that automatically get called are uppercase, we only need to look at them.
Hash.^methods.map(*.name).grep(/^<:Lu + [-]>+$/)
# (STORE BIND-KEY WHICH AT-KEY ASSIGN-KEY DELETE-KEY
# DUMP BUILDALL ASSIGN-KEY EXISTS-KEY AT-KEY STORE ACCEPTS BUILDALL)
It should be fairly obvious that the methods ending with -KEY are the ones we would want to write. (The other ones are mostly just object artifacts.)
You currently don't have to write any of them to make your object type Associative.
If you don't write a particular method, that feature won't work.
class Point does Associative {
has Real ($.x, $.y);
multi method AT-KEY ( 'x' ){ $!x }
multi method AT-KEY ( 'y' ){ $!y }
multi method ASSIGN-KEY ( 'x', Real $new-value ){ $!x = $new-value }
multi method ASSIGN-KEY ( 'y', Real $new-value ){ $!y = $new-value }
multi method EXISTS-KEY ( 'x' --> True ){}
multi method EXISTS-KEY ( 'y' --> True ){}
multi method EXISTS-KEY ( Any --> False ){}
}
my %p is Point;
%p<x> = 1;
%p<y> = 2;
say %p.x; # 1
say %p.y; # 2
Note that the above has a few limitations.
You can't assign to more than one attribute at a time.
%p< x y > = 1,2;
You can't assign the values in the declaration.
my %p is Point = 1,2;
my %p is Point = x => 1, y => 2;
In the multi-assignment, the method that gets called is AT-KEY, so to make it work those methods must be marked as raw or rw:
class Point does Associative {
…
multi method AT-KEY ( 'x' ) is rw { $!x }
multi method AT-KEY ( 'y' ) is rw { $!y }
…
}
…
%p<x y> = 1,2;
That takes care of multi assignment, but that still leaves the initialization in the declaration.
If you declared an attribute as is required the only way to write it would be:
my %p := Point.new( x => 1, y => 2 );
If you didn't do that you could implement STORE.
class Point does Associative {
…
method STORE ( \list ) {
($!x,$!y) = list.Hash<x y>
}
}
my %p is Point = x => 1, y => 2;
That also makes it so that you can also assign to it later.
%p = x => 3, y => 4;
Which is possibly not what you wanted.
We can fix that though.
Just make it so that there has to be an :INITIALIZE argument.
class Point does Associative {
…
method STORE ( \list, :INITIALIZE($) is required ) {
($!x,$!y) = list.Hash<x y>
}
}
my %p is Point = x => 1, y => 2;
# %p = x => 3, y => 4; # ERROR
In the case of Point we might want to be able to declare it with a list of two elements:
my %p is Point = 1,2;
Or by name:
my %p is Point = x => 1, y => 2;
To do that we can change how STORE works.
We'll just look at the first value in the list and check if it is Associative.
If it is we will assume all of the arguments are also Associative.
Otherwise we will assume that it is a list of two values, x and y.
class Point does Associative {
…
method STORE ( \list, :INITIALIZE($) is required ) {
if list.head ~~ Associative {
($!x,$!y) = list.Hash<x y>
} else {
($!x,$!y) = list
}
}
}
my %a is Point = x => 1, y => 2;
my %b is Point = 1,2;
In Raku, the <> syntax seems to be a postcircumfix operator that can be overloaded via the multi methods AT-KEY and EXISTS-KEY, as described in https://docs.raku.org/language/subscripts#Methods_to_implement_for_associative_subscripting
Can I define an object that behaves like a hash? That is: if I write $myobject<key>, do I end up in a function that I can specify myself?
The short answer is: no, there is not in core Raku. But there is a module that makes it easy to do, requiring you to define only 5 methods to get the full functionality of a "real" Hash: Hash::Agnostic
The longer answer is: read the other answers to this question :-)
I'm working through the Advent of Code 2015 problems in order to practise my Rust skills.
Here is the problem description:
Realizing the error of his ways, Santa has switched to a better model of determining whether a string is naughty or nice. None of the old rules apply, as they are all clearly ridiculous.
Now, a nice string is one with all of the following properties:
It contains a pair of any two letters that appears at least twice in the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like aaa (aa, but it overlaps).
It contains at least one letter which repeats with exactly one letter between them, like xyx, abcdefeghi (efe), or even aaa.
For example:
qjhvhtzxzqqjkmpb is nice because it has a pair that appears twice (qj) and a letter that repeats with exactly one letter between them (zxz).
xxyxx is nice because it has a pair that appears twice and a letter that repeats with one between, even though the letters used by each rule overlap.
uurcxstgmygtbstg is naughty because it has a pair (tg) but no repeat with a single letter between them.
ieodomkazucvgmuy is naughty because it has a repeating letter with one between (odo), but no pair that appears twice.
How many strings are nice under these new rules?
This is what I've managed to come up with so far:
pub fn part2(strings: &[String]) -> usize {
strings.iter().filter(|x| is_nice(x)).count()
/* for s in [
String::from("qjhvhtzxzqqjkmpb"),
String::from("xxyxx"),
String::from("uurcxstgmygtbstg"),
String::from("ieodomkazucvgmuy"),
String::from("aaa"),
]
.iter()
{
is_nice(s);
}
0 */
}
fn is_nice(s: &String) -> bool {
let repeat = has_repeat(s);
let pair = has_pair(s);
/* println!(
"s = {}: repeat = {}, pair = {}, nice = {}",
s,
repeat,
pair,
repeat && pair
); */
repeat && pair
}
fn has_repeat(s: &String) -> bool {
for (c1, c2) in s.chars().zip(s.chars().skip(2)) {
if c1 == c2 {
return true;
}
}
false
}
fn has_pair(s: &String) -> bool {
// Generate all possible pairs
let mut pairs = Vec::new();
for (c1, c2) in s.chars().zip(s.chars().skip(1)) {
pairs.push((c1, c2));
}
// Look for overlap
for (value1, value2) in pairs.iter().zip(pairs.iter().skip(1)) {
if value1 == value2 {
// Overlap has occurred
return false;
}
}
// Look for matching pair
for value in pairs.iter() {
if pairs.iter().filter(|x| *x == value).count() >= 2 {
//println!("Repeat pair: {:?}", value);
return true;
}
}
// No pair found
false
}
However despite getting the expected results for the commented-out test values, my result when running on the actual puzzle input does not compare with community verified regex-based implementations. I can't seem to see where the problem is despite having thoroughly tested each function with known test values.
I would rather not use regex if at all possible.
I think has_pair has a bug:
In the word aaabbaa, we have an overlapping aa (the aaa at the beginning), but I think you are not allowed to return false right away, because there is another, non-overlapping aa at the end of the word.
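One possible way to fix this, as a sketch rather than the only approach: instead of rejecting the whole string when two adjacent pairs are equal, look for any pair that occurs again at least two positions later, which rules out overlap by construction. The function keeps the name has_pair from the question but takes &str instead of &String; that signature change is my own choice.

```rust
/// Returns true if some pair of adjacent letters appears at least twice
/// in `s` without the two occurrences overlapping.
fn has_pair(s: &str) -> bool {
    let chars: Vec<char> = s.chars().collect();
    // All adjacent pairs, in order of their starting position.
    let pairs: Vec<(char, char)> = chars.windows(2).map(|w| (w[0], w[1])).collect();
    // A pair starting at i overlaps only with the pair starting at i + 1,
    // so any equal pair starting at i + 2 or later is a valid match.
    pairs
        .iter()
        .enumerate()
        .any(|(i, p)| pairs.iter().skip(i + 2).any(|q| q == p))
}

fn main() {
    for s in ["xyxy", "aabcdefgaa", "aaa", "aaabbaa"] {
        println!("{}: {}", s, has_pair(s));
    }
}
```

This is quadratic in the length of the string, which should be fine for short puzzle inputs.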