Rust polars: unexpected behaviour of when().then().otherwise() in groupby-agg context

I have a complicated mapping logic which I seek to execute within groupby context. The code compiles and doesn't panic, but results are incorrect. I know the logic implementation is correct. Hence, I wonder if when-then-otherwise is supposed to be used within groupby at all?
use polars::prelude::*;
use polars::df;
fn main() {
let df = df! [
"Region" => ["EU", "EU", "EU", "EU", "EU"],
"MonthCCY" => ["APRUSD", "MAYUSD", "JUNEUR", "JULUSD", "APRUSD"],
"values" => [1, 2, 3, 4, 5],
].unwrap();
let df = df.lazy()
.groupby_stable([col("MonthCCY")])
.agg( [
month_weight().alias("Weight"),
]
);
}
pub fn month_weight() -> Expr {
when(col("Region").eq(lit("EU")))
.then(
// First, If MonthCCY is JUNEUR or AUGEUR - apply 0.05
when(col("MonthCCY").map( |s|{
Ok( s.utf8()?
.contains("JUNEUR|AUGEUR")?
.into_series() )
}
, GetOutput::from_type(DataType::Boolean)
))
.then(lit::<f64>(0.05))
.otherwise(
// Second, If MonthCCY is JANEUR - apply 0.0225
when(col("MonthCCY").map( |s|{
Ok( s.utf8()?
.contains("JANEUR")?
.into_series() )
}
, GetOutput::from_type(DataType::Boolean)
))
.then(lit::<f64>(0.0225))
.otherwise(
// Third, If MonthCCY starts with JUL or FEB (eg FEBUSD or FEBEUR)- apply 0.15
when(col("MonthCCY").apply( |s|{
let x = s.utf8()?
.str_slice(0, Some(3))?;
let y = x.contains("JUL|FEB")?
.into_series();
Ok(y)
}
, GetOutput::from_type(DataType::Boolean)
))
.then(lit::<f64>(0.15))
//Finally, if none of the above matched, apply 0.2
.otherwise(lit::<f64>(0.20))
)
)
).otherwise(lit::<f64>(0.0))
}
The result I am getting is:
┌──────────┬─────────────┐
│ MonthCCY ┆ Weight │
│ --- ┆ --- │
│ str ┆ list [f64] │
╞══════════╪═════════════╡
│ APRUSD ┆ [0.2, 0.15] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ MAYUSD ┆ [0.2] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ JUNEUR ┆ [0.05] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ JULUSD ┆ [0.2] │
└──────────┴─────────────┘
Clearly, I would expect JULUSD to be [0.15] and APRUSD to be [0.2, 0.2].
Is my expectation of how when().then().otherwise() works within groupby wrong?
I am on Windows11, rustc 1.60.

Yep, you're doing the groupby and the mapping in the wrong order. month_weight() is not an aggregation expression but a simple mapping expression.
As it is, each group of the DataFrame is aggregated into a series that ultimately derives from the order of the data in the original frame.
You first want to create a Weight column with values given by the mapping you specify in month_weight(), and then you want to aggregate this column into a list for each group.
So, what you want is the following:
let df = df
.lazy()
.with_column(month_weight().alias("Weight")) // create new column first
.groupby_stable([col("MonthCCY")]) // then group
.agg([col("Weight").list()]); // then collect into a list per group
println!("{:?}", df.collect().unwrap());
Output:
shape: (4, 2)
┌──────────┬────────────┐
│ MonthCCY ┆ Weight │
│ --- ┆ --- │
│ str ┆ list [f64] │
╞══════════╪════════════╡
│ APRUSD ┆ [0.2, 0.2] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ MAYUSD ┆ [0.2] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ JUNEUR ┆ [0.05] │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ JULUSD ┆ [0.15] │
└──────────┴────────────┘
Also, as an aside, when().then() can be chained indefinitely; you don't need to nest them. Just as you can write a chained if ... else if ... else if ... else, you can write when().then().when().then() ... .otherwise(), which is a lot simpler than nesting each additional condition.
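To make the "map first, then group" order and the flat condition chain concrete, here is a minimal sketch in plain Python (not polars). The function name month_weight mirrors the question; the rows list and the groups dictionary are illustrative assumptions, not part of the original code.

```python
import re
from collections import defaultdict

def month_weight(region: str, month_ccy: str) -> float:
    """Row-level mapping, written as one flat chain of conditions."""
    if region != "EU":
        return 0.0
    if re.search(r"JUNEUR|AUGEUR", month_ccy):
        return 0.05
    if "JANEUR" in month_ccy:
        return 0.0225
    if month_ccy[:3] in ("JUL", "FEB"):
        return 0.15
    return 0.20

# Same (Region, MonthCCY) pairs as the DataFrame in the question.
rows = [
    ("EU", "APRUSD"), ("EU", "MAYUSD"), ("EU", "JUNEUR"),
    ("EU", "JULUSD"), ("EU", "APRUSD"),
]

# Step 1: compute the weight for every row (the with_column step).
weighted = [(ccy, month_weight(region, ccy)) for region, ccy in rows]

# Step 2: group afterwards, keeping first-seen order (the groupby_stable step).
groups = defaultdict(list)
for ccy, weight in weighted:
    groups[ccy].append(weight)

print(dict(groups))
```

Because the weight is computed per row before any grouping happens, the grouping step only collects values and cannot change them.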

Related

An argument definition must end with a newline

I have a dynamic block in aws_cloudfront_distribution which is the following:
dynamic "ordered_cache_behavior" {
for_each = var.ordered_cache_behaviors
content {
path_pattern = ordered_cache_behavior.value.path_pattern
allowed_methods = ordered_cache_behavior.value.allowed_methods
cached_methods = ordered_cache_behavior.value.cached_methods
target_origin_id = var.origin_id
cache_policy_id = var.cache_policy_ids["${var.policy_prefix}${ordered_cache_behavior.value.cache_policy_name}"]
origin_request_policy_id = ordered_cache_behavior.value.path_pattern = "/" ? var.origin_request_policy_ids["whitelist_policy"] : null
dynamic "lambda_function_association" {
for_each = var.enable_auth ? var.default_cache_behavior.lambda_function_association : ordered_cache_behavior.value.lambda_function_association
content {
event_type = lambda_function_association.value.event_type
include_body = lambda_function_association.value.include_body
lambda_arn = lambda_function_association.value.lambda_arn != "" ? lambda_function_association.value.lambda_arn : local.lambda_mapping[lambda_function_association.value.event_type]
}
}
compress = ordered_cache_behavior.value.compress
viewer_protocol_policy = ordered_cache_behavior.value.viewer_protocol_policy
}
I got:
│ Error: Missing newline after argument
│
│ on main.tf line 111, in resource "aws_cloudfront_distribution" "web_distribution":
│ 111: origin_request_policy_id = ordered_cache_behavior.value.path_pattern = "/" ? var.origin_request_policy_ids["origin_1H_1D_plp_pdp_whitelist_whitelist_none"] : null
│
│ An argument definition must end with a newline.
I still don't know why I am getting this error. What I basically want to do is set origin_request_policy_id depending on whether the path pattern is "/".
Am I missing something?
Terraform's parser is rejecting what you wrote here because you used the = symbol in the middle of an expression. Terraform doesn't understand what you intended and so it's guessing that the expression ends after ordered_cache_behavior.value.path_pattern and then complaining that there isn't a newline at that point.
However, I think what you really intended to do here was test for equality between ordered_cache_behavior.value.path_pattern and "/". The operator for equality test is == rather than =, so you can add the extra equals character to make this valid:
origin_request_policy_id = ordered_cache_behavior.value.path_pattern == "/" ? var.origin_request_policy_ids["whitelist_policy"] : null
Because this expression is long and has multiple complex parts, I might suggest rewriting it to be a multi-line expression like this for readability, but of course that's subjective and optional:
origin_request_policy_id = (
ordered_cache_behavior.value.path_pattern == "/" ?
var.origin_request_policy_ids["whitelist_policy"] :
null
)
I think you can't assign something inside a definition on the same line:
origin_request_policy_id = ordered_cache_behavior.value.path_pattern = "/"...

Reverse Integer python

I am trying to solve the Leetcode "Reverse Integer" challenge in Python.
I took a look at the solution they supplied.
Their answer was written in Java.
I do not know why they use the following test-condition:
if (rev > Integer.MAX_VALUE/10 || (rev == Integer.MAX_VALUE / 10 && pop > 7)) return 0;
if (rev < Integer.MIN_VALUE/10 || (rev == Integer.MIN_VALUE / 10 && pop < -8)) return 0;
I wrote my python code without that same test condition.
It failed when I tested a negative number.
class Solution:
    def reverse(self, x: int) -> int:
        rev = 0
        while( x != 0 ):
            pop = x % 10
            x //= 10
            rev = rev * 10 +pop
        print(rev)
        return rev
I do not understand why that particular test-condition exists in their code.
The goal is to reverse the order of digits in an integer.
Some examples are shown below:
Example 1:
Input: x = 123
Output: 321
Example 2:
Input: x = -123
Output: -321
Example 3:
Input: x = 120
Output: 21
Example 4:
Input: x = 0
Output: 0
class Solution {
    public int reverse(int x) {
        int rev = 0;
        while (x != 0) {
            int pop = x % 10;
            x /= 10;
            if (rev > Integer.MAX_VALUE/10 || (rev == Integer.MAX_VALUE / 10 && pop > 7)) return 0;
            if (rev < Integer.MIN_VALUE/10 || (rev == Integer.MIN_VALUE / 10 && pop < -8)) return 0;
            rev = rev * 10 + pop;
        }
        return rev;
    }
}
Branch conditions are sometimes easier to understand when they are drawn as tree diagrams instead of one-line formulas.
The first condition can be written as a tree like this:
├── OR
│ ├── (rev > Integer.MAX_VALUE/10)
│ └── AND
│ ├── rev == Integer.MAX_VALUE / 10
│ └── pop > 7
Imagine that rev is a container, like a cookie jar or a cardboard box.
There is a maximum number of cookies which can be crammed into the cookie jar.
When rev == Integer.MAX_VALUE it means that the cookie jar is full.
The condition rev == Integer.MAX_VALUE / 10 represents the idea that the cookie jar can fit only one more cookie.
Instead of stuffing cookies into cookie jars, we are stuffing digits (such as 5) into an integer.
The integer is the container.
Note that appending a zero to an integer in base 10 is the same as multiplying it by 10. For example, 25 becomes 250.
Suppose that the largest integer allowed is 2,147,483,647.
Note that 214,748,364 is equal to 2,147,483,647 / 10.
When you divide by 10 using integer division, the fractional part of the result is discarded.
When rev is 214,748,364, you can append any digit to the right-most end, as long as that digit is less than or equal to 7.
Old Element   New Element   New Container   Status
214748364     0             2147483640      VALID
214748364     1             2147483641      VALID
214748364     2             2147483642      VALID
214748364     3             2147483643      VALID
214748364     4             2147483644      VALID
214748364     5             2147483645      VALID
214748364     6             2147483646      VALID
214748364     7             2147483647      VALID
214748364     8             2147483648      TOO BIG
214748364     9             2147483649      TOO BIG
The "Status" column in the table above shows whether the new container value still fits within the largest integer allowed.
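The table can be reproduced with a few lines of Python. This is only a sketch: INT_MAX is written out by hand as an assumption matching Java's Integer.MAX_VALUE, since Python integers have no fixed upper limit.

```python
INT_MAX = 2**31 - 1  # 2,147,483,647, same value as Java's Integer.MAX_VALUE

rev = INT_MAX // 10  # 214,748,364: the container with room for one more digit
rows = []
for digit in range(10):
    candidate = rev * 10 + digit  # append the digit on the right
    status = "VALID" if candidate <= INT_MAX else "TOO BIG"
    rows.append((rev, digit, candidate, status))
    print(rev, digit, candidate, status)
```

Digits 0 through 7 stay within the limit, while 8 and 9 exceed it, which is where the constant 7 in the Java condition comes from.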
In their code, when they wrote (pop > 7) it probably should have been (pop > Integer.MAX_VALUE % 10)
Suppose that the largest integer allowed is 2,147,483,647
Then, 2,147,483,647 % 10 is equal to 7.
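A minimal Python sketch of the same check follows, assuming the 32-bit limits used by the Java solution. Python integers never overflow, so the limits are written out by hand. The sign is handled separately because Python's % and // round toward negative infinity (-123 % 10 is 7 and -123 // 10 is -13), unlike Java's operators. The names INT_MAX, INT_MIN and the standalone reverse function are illustrative.

```python
INT_MAX = 2**31 - 1   # 2,147,483,647
INT_MIN = -2**31      # -2,147,483,648

def reverse(x: int) -> int:
    """Reverse the decimal digits of x; return 0 if the result leaves the 32-bit range."""
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Largest magnitude allowed for this sign: 2147483647 or 2147483648.
    limit = INT_MAX if sign > 0 else -INT_MIN
    rev = 0
    while x != 0:
        pop = x % 10
        x //= 10
        # Same idea as the Java check: would rev * 10 + pop exceed the limit?
        if rev > limit // 10 or (rev == limit // 10 and pop > limit % 10):
            return 0
        rev = rev * 10 + pop
    return sign * rev

print(reverse(123), reverse(-123), reverse(120), reverse(0))
```

Using limit % 10 instead of a hard-coded 7 or 8 keeps the check tied to the limit itself.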
There are many, many different ways to reverse the digits of a number.
One solution in Python, which handles the sign separately and ignores the 32-bit overflow check, is shown below:
x = (-1 if x < 0 else 1) * int(str(abs(x))[::-1])

"No instance found" when using seq

I'm puzzled by the fact that Alloy reports No instance found for this model using seq:
one sig Const {
T: seq (seq Int)
}
fact const_facts {
Const.T = {
0 -> {0->1 + 1->9} +
1 -> {0->3 + 1->15}
}
}
run {} for 20 but 6 Int, 8 seq
While the following model, where I simply replaced each seq with Int ->, has an instance as one would expect:
one sig Const {
T: Int -> (Int -> Int)
}
fact const_facts {
Const.T = {
0 -> {0->1 + 1->9} +
1 -> {0->3 + 1->15}
}
}
run {} for 20 but 6 Int
It's especially confusing to me since https://alloytools.org/quickguide/seq.html seems to imply that seq X and Int -> X are the same thing type-wise.
Any thoughts?
The Const.T you create has an arity of 3, while a seq must have an arity of 2.
┌──────────┬──────┐
│this/Const│T │
├──────────┼─┬─┬──┤
│Const⁰ │0│0│1 │
│ │ ├─┼──┤
│ │ │1│9 │
│ ├─┼─┼──┤
│ │1│0│3 │
│ │ ├─┼──┤
│ │ │1│15│
└──────────┴─┴─┴──┘
The predicates and functions for seq assume an arity of 2. I.e. a seq is not like an object, it is a convention for functions and predicates that take an arity 2 tupleset where the first column is an integer.
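The arity mismatch can be illustrated outside Alloy by modelling a relation as a set of equal-length tuples. This is a plain Python sketch, not Alloy; the helper arity and the variable names are hypothetical.

```python
# Const.T from the model, as shown in the table above: a set of 3-tuples.
T = {(0, 0, 1), (0, 1, 9), (1, 0, 3), (1, 1, 15)}

# A single sequence such as {0->1 + 1->9} is binary: index -> element.
inner = {(0, 1), (1, 9)}

def arity(relation):
    """Number of columns in a relation modelled as a set of equal-length tuples."""
    lengths = {len(t) for t in relation}
    assert len(lengths) == 1, "all tuples in a relation must have the same length"
    return lengths.pop()

print(arity(T))      # 3
print(arity(inner))  # 2
```

Nesting one binary relation inside another by index yields three columns, which is why the value no longer fits the two-column shape the seq functions and predicates expect.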

How can I extract a verb definition as data? [duplicate]

In the console, typing a single verb without parameters will print its content:
tolower
3 : 0
x=. I. 26 > n=. ((65+i.26){a.) i. t=. ,y
($y) $ ((x{n) { (97+i.26){a.) x}t
)
That's nice for development, but it can't be used at run time. Is there a way to do that dynamically? Is there a verb that can return the contents of another verb?
For example:
showverb 'tolower'
or
showverb tolower
You can use its representation. For example the boxed representation (5!:2) of tolower is:
(5!:2) <'tolower'
┌─┬─┬────────────────────────────────────────┐
│3│:│x=. I. 26 > n=. ((65+i.26){a.) i. t=. ,y│
│ │ │($y) $ ((x{n) { (97+i.26){a.) x}t │
└─┴─┴────────────────────────────────────────┘
Its linear representation (5!:5) is:
(5!:5) <'tolower'
3 : 0
x=. I. 26 > n=. ((65+i.26){a.) i. t=. ,y
($y) $ ((x{n) { (97+i.26){a.) x}t
)
