List of structs in (Rust) Polars column - rust

Let's say I have a Polars column of type list[list[str]]:
Foos
---
list[list[str]]
[["a", "b"], ["c", "d"], ["e", "f"]]
[["g", "h"], ["i", "j"], ["k", "l"]]
[["m", "n"], ["o", "p"], ["q", "r"]]
...
and a struct Foo:
struct Foo<'a> {
    f1: &'a str,
    f2: &'a str,
}
How can I obtain a Series list[Foo]?
Foos
---
list[Foo]
[Foo { f1: "a", f2: "b" }, Foo { f1: "c", f2: "d" }, Foo { f1: "e", f2: "f" }]
[Foo { f1: "g", f2: "h" }, Foo { f1: "i", f2: "j" }, Foo { f1: "k", f2: "l" }]
[Foo { f1: "m", f2: "n" }, Foo { f1: "o", f2: "p" }, Foo { f1: "q", f2: "r" }]
I've tried with:
ChunkedArray<ObjectType<T>>
StructArray<Struct> with fields defined as:
let fields = vec![
polars::prelude::ArrowField::new("first_name", polars::prelude::ArrowDataType::Utf8, false),
polars::prelude::ArrowField::new("last_name", polars::prelude::ArrowDataType::Utf8, false),
];
to no avail.
Is this at all possible?


How do I filter a Polars DataFrame by checking whether a column's value is contained in a vector?

I have a dataframe with a column "ID" of type UInt32, and a vector named ids. I want to return a dataframe containing only the rows whose "ID" value is contained in the vector ids.
MINIMAL WANTED EXAMPLE
use polars::df;
use polars::prelude::*;
fn filter_by_id(table: &DataFrame, ids: Vec<u32>) -> DataFrame {
    df!{
        "ID" => &[1, 3, 5],
        "VALUE" => &["B", "D", "F"]
    }.unwrap()
}
fn main() {
    let table = df!{
        "ID" => &[0, 1, 2, 3, 4, 5],
        "VALUE" => &["A", "B", "C", "D", "E", "F"]
    }.unwrap();
    let ids = vec![1, 3, 5];
    let filtered_table = filter_by_id(&table, ids);
    println!("{:?}", table);
    println!("{:?}", filtered_table);
}
ID  VALUE
0   A
1   B
2   C
3   D
4   E
5   F
filter vector = [1, 3, 5]
wanted output =
ID  VALUE
1   B
3   D
5   F
Polars mostly operates on Series and Expr types, so by converting your Vec to a Series you can accomplish this task relatively easily.
use polars::df;
use polars::prelude::*;
fn main() {
    let table = df!{
        "ID" => &[0, 1, 2, 3, 4, 5],
        "VALUE" => &["A", "B", "C", "D", "E", "F"]
    }.unwrap();
    let ids = vec![1, 3, 5];
    // convert the vec to `Series`
    let ids_series = Series::new("ID", ids);
    // create a filter expression
    let filter_expr = col("ID").is_in(lit(ids_series));
    // filter the dataframe on the expression
    let filtered = table.lazy().filter(filter_expr).collect().unwrap();
    println!("{:?}", filtered);
}
Note: you will need to add the features lazy and is_in
cargo add polars --features lazy,is_in
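To make the semantics of the is_in filter concrete, here is a minimal sketch in plain Rust with no Polars involved. It is only an illustration of "keep the rows whose ID is a member of the set", not how Polars implements it; the tuple-based row representation and this std-only filter_by_id helper are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Keep only the rows whose ID appears in `ids`.
/// Hypothetical std-only helper; rows are modelled as (ID, VALUE) tuples.
fn filter_by_id<'a>(rows: &[(u32, &'a str)], ids: &[u32]) -> Vec<(u32, &'a str)> {
    // Build the set once so each membership check is O(1) on average.
    let wanted: HashSet<u32> = ids.iter().copied().collect();
    rows.iter()
        .copied()
        .filter(|(id, _)| wanted.contains(id))
        .collect()
}

fn main() {
    let table = [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"), (5, "F")];
    let filtered = filter_by_id(&table, &[1, 3, 5]);
    println!("{:?}", filtered);
}
```

The row order of the input is preserved, which matches the wanted output in the question.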

Return list of all dictionary keys [duplicate]

Is there a way to return a list of all dictionary keys in the provided input (at all levels)?
The keys should be ordered by level, for example:
{
"A": 1,
"B": [{
"B1": 1,
"B2": 1
}, {
"B3": 1,
"B4": 1
}],
"C": {
"C1": 1,
"C2": 1
}
}
I have tried using dict.keys() and dict.items() but did not get the desired output.
The output should be like this ["A", "B", "C", "B1", "B2", "B3", "B4", "C1", "C2"].
Any help is appreciated.
I would do a recursive function:
d = {
"A": 1,
"B": [{
"B1": 1,
"B2": 1
}, {
"B3": 1,
"B4": 1
}],
"C": {
"C1": 1,
"C2": 1
}
}
def flatten(d, lst=None):
    if not lst:
        lst = []
    if isinstance(d, list):
        for item in d:
            lst = flatten(item, lst)
    elif isinstance(d, dict):
        for k in d.keys():
            lst.append(k)
        for v in d.values():
            lst = flatten(v, lst)
    return lst
print(flatten(d)) # Output: ['A', 'B', 'C', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2']
EDIT: The code above assumes you're using Python 3.7+, where dicts preserve insertion order by default. If you're running an older version, you can use collections.OrderedDict when initializing d to get the same behavior.
I think I found it:
#In order to convert nested list to single list
import more_itertools
a={"A": 1,
"B": [{
"B1": 1,
"B2": 1
}, {
"B3": 1,
"B4": 1
}],
"C": {
"C1": 1,
"C2": 1
}
}
dic_key = []
for i in a:
    dic_key.append(i)
for i in a:
    if type(a[i]) is list:
        dic_key += list(map(lambda x: list(x.keys()), a[i]))
    elif type(a[i]) is dict:
        dic_key += list(map(lambda x: x, a[i]))
print(list(more_itertools.collapse(dic_key)))

Groovy: Remove duplicates from a list of maps by multiple values

Having a list of maps as this
def listOfMaps =
[
["car": "A", "color": "A", "motor": "A", "anything": "meh"],
["car": "A", "color": "A", "motor": "A", "anything": "doesn't matter"],
["car": "A", "color": "A", "motor": "B", "anything": "Anything"],
["car": "A", "color": "B", "motor": "A", "anything": "Anything"]
]
How can I detect duplicates by car, color and motor? If more than one map has the same car, color and motor values, the check should return true. In this case it should return true, since the first and second maps share the same car, color and motor values; the values themselves could be anything as long as they match.
Groovy has a handy Collection.unique(boolean,closure) method that allows you to create a new list by removing the duplicates from an input list based on the comparator defined in a closure. In your case, you could define a closure that firstly compares car field, then color, and lastly - motor. Any element that duplicates values for all these fields will be filtered out.
Consider the following example:
def listOfMaps = [
["car": "A", "color": "A", "motor": "A", "anything": "meh"],
["car": "A", "color": "A", "motor": "A", "anything": "doesn't matter"],
["car": "A", "color": "A", "motor": "B", "anything": "Anything"],
["car": "A", "color": "B", "motor": "A", "anything": "Anything"]
]
// false parameter below means that the input list is not modified
def filtered = listOfMaps.unique(false) { a, b ->
a.car <=> b.car ?:
a.color <=> b.color ?:
a.motor <=> b.motor
}
println filtered
boolean hasDuplicates = listOfMaps.size() > filtered.size()
assert hasDuplicates
Output:
[[car:A, color:A, motor:A, anything:meh], [car:A, color:A, motor:B, anything:Anything], [car:A, color:B, motor:A, anything:Anything]]
I'm not sure I have understood the question correctly, but I came up with the following code snippet:
def listOfMaps = [
["car": "A", "color": "A", "motor": "A", "anything": "meh"],
["car": "A", "color": "A", "motor": "A", "anything": "doesn't matter"],
["car": "A", "color": "A", "motor": "B", "anything": "Anything"],
["car": "A", "color": "B", "motor": "A", "anything": "Anything"]
]
static def findDuplicatesByKeys(List<Map<String, String>> maps, List<String> keys) {
    Map<String, List<Map<String, String>>> aggregationKeyToMaps = [:].withDefault { key -> [] }
    maps.each { singleMap ->
        def aggregationKey = keys.collect { key -> singleMap[key] }.join('-')
        aggregationKeyToMaps.get(aggregationKey).add(singleMap)
    }
    aggregationKeyToMaps
}
findDuplicatesByKeys(listOfMaps, ['car', 'color', 'motor'])
Basically, it iterates over the list of maps and groups them by the values of the provided keys. The result is a map from each aggregation key to a list of maps, something like:
def aggregatedMaps = [
"A-A-A": [
["car": "A", "color": "A", "motor": "A", "anything": "meh"],
["car": "A", "color": "A", "motor": "A", "anything": "doesn't matter"]
],
"A-A-B": [
["car": "A", "color": "A", "motor": "B", "anything": "Anything"]
],
"A-B-A": [
["car": "A", "color": "B", "motor": "A", "anything": "Anything"]
]
]
You can grab .values() for example and apply needed removals (you haven't specified which duplicate should be removed) and finally flatten the list. Hope that's helpful.
You can group the maps by the appropriate fields and then check whether at least one group has more than one element:
boolean result = listOfMaps
    .groupBy { [car: it.car, color: it.color, motor: it.motor] }
    .any { it.value.size() > 1 }

Is there a simple way to generate the lowercase and uppercase English alphabet in Rust?

This is what I'm doing so far:
fn main() {
    let a = (0..58).map(|c| ((c + 'A' as u8) as char).to_string())
        .filter(|s| !String::from("[\\]^_`").contains(s) )
        .collect::<Vec<_>>();
    println!("{:?}", a);
}
Output is:
["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
Also no crates if possible.
Older versions of Rust could not iterate over a range of chars directly (inclusive char ranges such as 'a'..='z' became iterable in Rust 1.45), so with a little casting we can do this:
let alphabet = (b'A'..=b'z') // Start as u8
.map(|c| c as char) // Convert all to chars
.filter(|c| c.is_alphabetic()) // Filter only alphabetic chars
.collect::<Vec<_>>(); // Collect as Vec<char>
or, combining the map and filter into filter_map
let alphabet = (b'A'..=b'z') // Start as u8
.filter_map(|c| {
let c = c as char; // Convert to char
if c.is_alphabetic() { Some(c) } else { None } // Filter only alphabetic chars
})
.collect::<Vec<_>>();
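As a variation on the byte-range approach above, here is a small sketch that stays in u8 for the filtering step and only converts to char at the end. It assumes you only care about ASCII letters; the function name ascii_letters is made up for this example.

```rust
/// ASCII letters 'A'..='Z' followed by 'a'..='z', built from one byte range.
/// `ascii_letters` is a hypothetical helper name used only for illustration.
fn ascii_letters() -> Vec<char> {
    (b'A'..=b'z')
        // Drops the six punctuation bytes that sit between 'Z' and 'a'.
        .filter(u8::is_ascii_alphabetic)
        // u8 -> char conversion without an `as` cast.
        .map(char::from)
        .collect()
}

fn main() {
    println!("{:?}", ascii_letters());
}
```

Using u8::is_ascii_alphabetic keeps the check explicitly ASCII-only, whereas char::is_alphabetic is the broader Unicode test; for this byte range both select the same 52 letters.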
There are many options; you can do the following:
fn main() {
let alphabet = String::from_utf8(
(b'a'..=b'z').chain(b'A'..=b'Z').collect()
).unwrap();
println!("{}", alphabet);
}
This way you don't need to remember the ASCII numbers.
You can convert an int to a char in a given base. Here is the code for 'a' to 'z':
use std::char;
fn main() {
let alphabet = (10..36).map(|i| char::from_digit(i, 36).unwrap()).collect::<Vec<_>>();
println!("{:?}", alphabet);
}
Output:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
For the whole solution, you can create another one with uppercase, and concatenate the two.
Example of the whole solution:
use std::char;
fn main() {
let mut lowercase = (10..36).map(|i| char::from_digit(i, 36).unwrap()).collect::<Vec<_>>();
let mut alphabet = lowercase.iter().map(|c| c.to_uppercase().next().unwrap()).collect::<Vec<_>>();
alphabet.append(&mut lowercase);
println!("{:?}", alphabet);
}
That being said, I think it is easier to just write the vector with literals.
Unless you for some reason (say, an assignment) need to actually generate the characters, the simplest, shortest code is of course just a literal:
fn main() {
let alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
println!("{:?}", alpha);
}
If you need the chars individually:
fn main() {
let chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".chars();
// If you for some reason need a Vec:
println!("{:?}", chars.collect::<Vec<_>>());
}
for alphabet in 'a'..='z' {
println!("{}", alphabet );
}
This prints every lowercase letter, one per line.
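Building on the char range above, the following sketch chains an uppercase and a lowercase range to get the whole alphabet the question asks for. It assumes Rust 1.45 or newer, where inclusive char ranges are iterable; the function name alphabet is chosen only for this example.

```rust
/// Uppercase then lowercase English letters, collected into a String.
/// Assumes Rust 1.45+ so that `'A'..='Z'` can be iterated directly.
fn alphabet() -> String {
    ('A'..='Z').chain('a'..='z').collect()
}

fn main() {
    let letters = alphabet();
    println!("{}", letters);

    // The same iterator can be collected into a Vec<char> if needed.
    let as_vec: Vec<char> = letters.chars().collect();
    println!("{:?}", as_vec);
}
```

No casting, filtering, or external crate is needed with this approach.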

Random letters returns blank

I'm new to Elixir and trying to get a random letter from a function.
I'm trying to define a function that returns a random letter between a and z.
For some reason this sometimes returns a blank character.
Why?
defp random_letter do
  "abcdefghijklmnopqrstuvwxyz"
  |> String.split("")
  |> Enum.random
end
def process do
  Enum.each(1..12, fn(_number) ->
    IO.puts random_letter()
  end)
end
Output:
g
m
s
v
r
o
m
x
e
j
w
String.split("abcdefghijklmnopqrstuvwxyz", "")
returns
["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", ""]
Look at the last element in the list and you get your answer :)
To avoid that, you can use trim option like this:
String.split("abcdefghijklmnopqrstuvwxyz", "", trim: true)
When you want to split the string, here are two alternatives. The second one is used when you have Unicode strings.
iex(1)> String.codepoints("abcdefghijklmnopqrstuvwxyz")
["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p",
"q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
iex(2)> String.graphemes("abcdefghijklmnopqrstuvwxyz")
["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p",
"q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
Or you can use
iex(1)> <<Enum.random(?a..?z)>>
"m"
