I have a .npy file which contains tuples of one string (|S8) and 6 float values. I want to read the .npy in Rust as a vector of tuples.
I tried the npyz crate:
use npyz;

fn main() {
    read_depl_npy();
}

fn read_depl_npy() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("test-data/test.npy")?;
    let npy = npyz::NpyFile::new(&bytes[..])?;
    println!("{:?}", npy.into_vec::<Depl>());
    Ok(())
}

#[derive(npyz::Deserialize, Debug)]
struct Depl {
    name: String,
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    e: f64,
    f: f64,
}
I get this error:
the trait `Deserialize` is not implemented for `String`
Are there any other solutions, or modifications to this code, that I could use?
I tried to open the .npy in Python and send it to Rust using pyo3 by creating a Python module with Rust, but it's too inefficient. I wonder if calling Python in Rust code would do any better.
Update:
I'm not in control of the data in the .npy file. It is given to me as is. Here are the first few items in the file:
0000 = {void} (b'N1 ', 1.15423654e-05, nan, -8.63531175e-06, 0.00234345, -1.93959406e-05, nan)
0001 = {void} (b'N10 ', nan, 0.00014046, 9.3921465e-07, nan, -1.36987648e-05, -0.00021798)
0002 = {void} (b'N100 ', -2.95802408e-06, 5.02222077e-05, 5.02908617e-07, 0.00222162, nan, 0.00015162)
0003 = {void} (b'N1000 ', 1.11732508e-06, 0.00018788, nan, 0.00098555, -6.56358132e-06, -0.00021724)
0004 = {void} (b'N1001 ', -1.07967489e-06, 0.0001863, -3.29593367e-07, 0.00098565, nan, -0.00021703)
0005 = {void} (b'N1002 ', nan, 0.00018486, -4.39249772e-07, 0.00098573, -6.54476282e-06, -0.00021686)
0006 = {void} (b'N1003 ', -1.01067021e-06, 0.00018347, 8.16061298e-07, 0.00098576, -6.56198811e-06, -0.00021675)
0007 = {void} (b'N1004 ', 1.03888923e-18, 0.00016245, -2.65077262e-06, 0.0016541, -1.13024989e-05, -0.00022285)
0008 = {void} (b'N1005 ', 2.02031333e-18, 0.00016073, -1.84684389e-06, 0.00165515, -1.13003433e-05, -0.00022227)
And the data type:
[('name', 'S8'), ('a', '<f8'), ('b', '<f8'), ('c', '<f8'), ('d', '<f8'), ('e', '<f8'), ('f', '<f8')]
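Since the file is produced by numpy anyway, one workaround for the pyo3 route mentioned above is to preprocess it in Python into shapes that are trivial to read from Rust. A minimal sketch (the output file names are illustrative, not part of any API):

import numpy as np

arr = np.load('test-data/test.npy')

# The six float fields as a plain (n, 6) little-endian f64 array, which
# plain-array readers handle without needing a structured dtype.
np.save('test-data/test_floats.npy', np.stack([arr[c] for c in 'abcdef'], axis=1))

# The names as one fixed-width byte string per line.
with open('test-data/test_names.txt', 'wb') as fh:
    fh.write(b'\n'.join(arr['name']))

Whether this is worthwhile depends on how often the file changes; it does avoid crossing the Python/Rust boundary per record.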
I have a PySpark DataFrame with a map column as below:
root
|-- id: long (nullable = true)
|-- map_col: map (nullable = true)
| |-- key: string
| |-- value: double (valueContainsNull = true)
The map_col has keys which need to be converted based on a dictionary. For example, the dictionary might be:
mapping = {'a': '1', 'b': '2', 'c': '5', 'd': '8' }
So, the DataFrame needs to change from:
[Row(id=123, map_col={'a': 0.0, 'b': -42.19}),
Row(id=456, map_col={'a': 13.25, 'c': -19.6, 'd': 15.6})]
to the following:
[Row(id=123, map_col={'1': 0.0, '2': -42.19}),
Row(id=456, map_col={'1': 13.25, '5': -19.6, '8': 15.6})]
I see that transform_keys is an option if I could write out the dictionary, but it's too large and dynamically generated earlier in the workflow. I think an explode/pivot could also work (a sketch of that route follows below), but it seems non-performant?
Any ideas?
Edit: added a bit to show that the size of the map in map_col is not uniform.
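For reference, the explode route mentioned in the question could look roughly like the sketch below (assuming Spark 2.4+ for map_from_entries, with data_sdf and mapping as described above):

from itertools import chain
from pyspark.sql import functions as F

# Turn the Python dict into a map expression so the lookup runs on the JVM.
mapping_expr = F.create_map([F.lit(x) for x in chain(*mapping.items())])

remapped = (
    data_sdf
    .select('id', F.explode('map_col'))  # yields `key` and `value` columns
    # Swap in the mapped key where one exists; keep the original otherwise.
    .withColumn('key', F.coalesce(mapping_expr[F.col('key')], F.col('key')))
    .groupBy('id')
    .agg(F.map_from_entries(F.collect_list(F.struct('key', 'value'))).alias('map_col'))
)

The groupBy does introduce a shuffle, which is likely the non-performant part anticipated above.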
An approach using RDD transformation:
def updateKey(theDict, mapDict):
    """
    update theDict's keys using mapDict
    """
    updDict = []
    for item in theDict.items():
        updDict.append((mapDict[item[0]] if item[0] in mapDict else item[0], item[1]))
    return dict(updDict)

data_sdf.rdd. \
    map(lambda r: (r[0], r[1], updateKey(r[1], mapping))). \
    toDF(['id', 'map_col', 'new_map_col']). \
    show(truncate=False)
# +---+-----------------------------------+-----------------------------------+
# |id |map_col |new_map_col |
# +---+-----------------------------------+-----------------------------------+
# |123|{a -> 0.0, b -> -42.19, e -> 12.12}|{1 -> 0.0, 2 -> -42.19, e -> 12.12}|
# |456|{a -> 13.25, c -> -19.6, d -> 15.6}|{8 -> 15.6, 1 -> 13.25, 5 -> -19.6}|
# +---+-----------------------------------+-----------------------------------+
P.S. I added a new key ('e') to map_col's first row to show what happens when no mapping is available.
transform_keys can take a lambda, as shown in the example; it's not limited to an expr. However, the lambda or Python callable must be built from functions defined in pyspark.sql.functions, Column methods, or a Scala UDF, so a Python UDF that refers to the mapping dictionary object isn't currently possible with this mechanism. We can, however, make use of the when function to apply the mapping, by unrolling the mapping's key-value pairs into chained when conditions. The example below illustrates the idea:
from typing import Dict, Callable
from functools import reduce
from pyspark.sql.functions import Column, when, transform_keys
from pyspark.sql import SparkSession

def apply_mapping(mapping: Dict[str, str]) -> Callable[[Column, Column], Column]:
    def convert_mapping_into_when_conditions(key: Column, _: Column) -> Column:
        # Unroll the mapping into chained when() conditions, iterating over a
        # copy of the items so the caller's dict is not mutated.
        items = list(mapping.items())
        initial_key, initial_value = items[0]
        initial_condition = when(key == initial_key, initial_value)
        return reduce(lambda cond, kv: cond.when(key == kv[0], kv[1]), items[1:], initial_condition)
    return convert_mapping_into_when_conditions

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Temp") \
        .getOrCreate()
    df = spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})], ("id", "data"))
    mapping = {'foo': 'a', 'bar': 'b'}
    df.select(
        transform_keys("data", apply_mapping(mapping)).alias("data_transformed")
    ).show(truncate=False)
The output of the above is:
+---------------------+
|data_transformed |
+---------------------+
|{b -> 2.0, a -> -2.0}|
+---------------------+
which demonstrates the defined mapping (foo -> a, bar -> b) was successfully applied to the column. The apply_mapping function should be generic enough to copy and utilize in your own pipeline.
Another way:
Use itertools to create a map expression to inject into PySpark's transform_keys function. upper is applied just in case. Code below:
from itertools import chain
from pyspark.sql.functions import create_map, lit, transform_keys, upper

m_expr1 = create_map([lit(x) for x in chain(*mapping.items())])

new = df.withColumn('new_map_col', transform_keys("map_col", lambda k, _: upper(m_expr1[k])))
new.show(truncate=False)
+---+-----------------------------------+-----------------------------------+
|id |map_col |new_map_col |
+---+-----------------------------------+-----------------------------------+
|123|{a -> 0.0, b -> -42.19} |{1 -> 0.0, 2 -> -42.19} |
|456|{a -> 13.25, c -> -19.6, d -> 15.6}|{1 -> 13.25, 5 -> -19.6, 8 -> 15.6}|
+---+-----------------------------------+-----------------------------------+
I want to operate with arrays of arrays of values but I cannot find the right type when destructuring the lists.
The values of my arrays are not all of the same type, and I am not looking to pass an array as a parameter. I am trying to destructure arrays, not objects.
I have the following non-compiling snippet of TS:

const bar = (n: number, s: string, p: string): string => n === 1 ? s : p;
const f = (args: [number, string, string]): string => bar(...args)

const r = [
  [1, 'a', 'b'],
  [2, 'c', 'd'],
].map(f);
which throws the error:
Argument of type '(args: [number, string, string]) => string' is not assignable to parameter of type '(value: (string | number)[], index: number, array: (string | number)[][]) => string'.
  Types of parameters 'args' and 'value' are incompatible.
    Type '(string | number)[]' is not assignable to type '[number, string, string]'.
      Target requires 3 element(s) but source may have fewer.
This is with TS v4.3.5 in the official editor. The resulting JS does work with Node:
➜ ~ node
Welcome to Node.js v14.17.0.
Type ".help" for more information.
> const bar = (n, s, p) => n === 1 ? s : p;
undefined
> const f = (args) => bar(...args);
undefined
> const r = [
... [1, 'a', 'b'],
... [2, 'c', 'd'],
... ].map(f);
undefined
> r
[ 'a', 'd' ]
Is it possible to generalize this?
Background
The problem is that TypeScript unfortunately will not infer the tuple types you expect for the input data.
Consider the following types:
const t = [1, 'a', 'b']
// type of t: (string | number)[]
const t2 = [1, 'a', 'b'] as const
// type of t2: readonly [1, 'a', 'b']
The first assignment will infer an array of the union of the given element types. In the second assignment we use the as const addition to force the compiler to infer the most concrete type. As you see, it actually now infers literal types (e.g. 'a' instead of string).
This may also not always be what we want. You can of course just cast the elements to the correct types. However, I myself often use a helper function that does nothing except infer the types I most often want:
function tuple<T extends unknown[]>(...tup: T): T {
  return tup;
}
const t3 = tuple(1, 'a', 'b')
// type of t3: [number, string, string]
Solution 1
Let's apply this knowledge to your example:
const input = [
  [1, 'a', 'b'],
  [2, 'c', 'd'],
]
// type of input: (string | number)[][]

const input2 = [
  tuple(1, 'a', 'b'),
  tuple(2, 'c', 'd')
]
// type of input2: [number, string, string][]
The last type is exactly what you need for your input type of f.
Combining everything, we get:
const bar = (n: number, s: string, p: string): string => n === 1 ? s : p;
const f = (args: [number, string, string]): string => bar(...args)

function tuple<T extends unknown[]>(...tup: T): T {
  return tup;
}

const r = [
  tuple(1, 'a', 'b'),
  tuple(2, 'c', 'd'),
].map(f);
// type of r: string[]
Solution 2
If you don't want to refine the input type, you can also create a helper for the .map function that will infer the correct types:
const bar = (n: number, s: string, p: string): string => (n === 1 ? s : p);
const f = (args: [number, string, string]): string => bar(...args);

function mapValues<T, R>(values: T[], f: (value: T) => R): R[] {
  return values.map(f);
}

const r = mapValues(
  [
    [1, "a", "b"],
    [2, "c", "d"]
  ],
  f
);
The result should be the same.
Solution 3
If you really do not like or cannot use helper functions, the easiest solution I can come up with is using as const. You will need to slightly change the type of f because the arrays will become readonly:
const bar = (n: number, s: string, p: string): string => (n === 1 ? s : p);
const f = (args: readonly [number, string, string]): string => bar(...args);

const r = ([
  [1, "a", "b"],
  [2, "c", "d"]
] as const).map(f);
Your f function is expecting its first argument to be a tuple: [number, string, string].
However, the items in your array, r, have the type (string | number)[]. So an array of either strings or numbers.
So, you would need to tell TypeScript that the array items are tuples. For example:
const r: [number, string, string][] = [
  [1, 'a', 'b'],
  [2, 'c', 'd'],
];

const result = r.map(f);
I want to have a nice (mypy --strict and pythonic) way to turn an untyped dict (from json.loads()) into a TypedDict. My current approach looks like this:
from typing import Any, Mapping, TypedDict

class BackupData(TypedDict, total=False):
    archive_name: str
    archive_size: int
    transfer_size: int
    transfer_time: float
    error: str

def to_backup_data(data: Mapping[str, Any]) -> BackupData:
    result = BackupData()
    if 'archive_name' in data:
        result['archive_name'] = str(data['archive_name'])
    if 'archive_size' in data:
        result['archive_size'] = int(data['archive_size'])
    if 'transfer_size' in data:
        result['transfer_size'] = int(data['transfer_size'])
    if 'transfer_time' in data:
        result['transfer_time'] = float(data['transfer_time'])
    if 'error' in data:
        result['error'] = str(data['error'])
    return result
i.e. I have a TypedDict with optional keys and want a TypedDict instance.
The code above is redundant and non-functional (in terms of functional programming) because I have to write the names four times and the types twice, and result has to be mutable.
Sadly TypedDict can't have methods, otherwise I could write something like
backup_data = BackupData.from(json.loads({...}))
Is there something I'm missing regarding TypedDict? Can this be written in a nice, non-redundant way?
When you use a TypedDict, all of the declared field types are stored in the __annotations__ attribute.
For your example:
BackupData.__annotations__
returns:
{'archive_name': <class 'str'>, 'archive_size': <class 'int'>, 'transfer_size': <class 'int'>, 'transfer_time': <class 'float'>, 'error': <class 'str'>}
Now we can use that dictionary to iterate over the data and use the values for type casting:
def to_backup_data(data: Mapping[str, Any]) -> BackupData:
    result = BackupData()
    for key, key_type in BackupData.__annotations__.items():
        if key not in data:
            raise ValueError(f"Key: {key} is not available in data.")
        result[key] = key_type(data[key])
    return result
Note that I raise an error when a key is not available in the data; this can be changed at your discretion.
With the following test code:
data = dict(
    archive_name="my archive",
    archive_size="50",
    transfer_size="100",
    transfer_time="2.3",
    error=None,
)
result = to_backup_data(data)

for key, value in result.items():
    print(f"Key: {key.ljust(15)}, type: {str(type(value)).ljust(15)}, value: {value!r}")
The result will be:
Key: archive_name , type: <class 'str'> , value: 'my archive'
Key: archive_size , type: <class 'int'> , value: 50
Key: transfer_size , type: <class 'int'> , value: 100
Key: transfer_time , type: <class 'float'>, value: 2.3
Key: error , type: <class 'str'> , value: 'None'
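One gotcha is visible in the last row: error=None was coerced by str() into the string 'None'. If missing or None values should be skipped instead, a variant along the same __annotations__ lines could look like this (a sketch; mypy --strict still flags dynamic TypedDict keys, hence the ignore):

def to_backup_data_lenient(data: Mapping[str, Any]) -> BackupData:
    result = BackupData()
    for key, key_type in BackupData.__annotations__.items():
        value = data.get(key)
        if value is not None:
            # TypedDict assignment requires a literal key, so mypy needs an
            # explicit ignore for this dynamic access.
            result[key] = key_type(value)  # type: ignore[literal-required]
    return result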
I want to update the weight for a key after it's drawn. How can I do that?
use rand::distributions::WeightedIndex;
use rand::prelude::*;

fn main() {
    let mut rng = thread_rng();
    let items = [('a', 0), ('b', 3), ('c', 7)];
    let dist2 = WeightedIndex::new(items.iter().map(|item| item.1)).unwrap();
    for _ in 0..10 {
        // 0% chance to print 'a', 30% chance to print 'b', 70% chance to print 'c'
        println!("{}", items[dist2.sample(&mut rng)].0);
        // dist2.update_weights(new_weights: &[(usize, &X)])
    }
}
That is, when 'b' is drawn, I want to set the weight for 'b' to zero so that it's no longer drawn, or set it to some other weight.
https://docs.rs/rand/0.7.3/rand/distributions/weighted/struct.WeightedIndex.html
use rand::distributions::WeightedIndex;
use rand::prelude::*;

fn main() {
    let mut rng = thread_rng();
    let items = [('a', 8), ('b', 5), ('c', 1)];
    let mut dist2 = WeightedIndex::new(items.iter().map(|item| item.1)).unwrap();
    for _ in 0..3 {
        let index = dist2.sample(&mut rng);
        println!("{}", index);
        println!("{:?}", items[index].0);
        // Zero out the drawn item's weight so it cannot be drawn again. Once
        // every weight is zero (the last iteration here), update_weights
        // returns Err(WeightedError::AllWeightsZero), so the result is
        // ignored rather than unwrapped.
        let _ = dist2.update_weights(&[(index, &0)]);
    }
}
You just need to call the update_weights function on the same distribution - it will mutate the existing distribution, so there is no need to reassign. Note that the return value of update_weights is just a Result<(), WeightedError>. An Ok(()) indicates that the mutation of the distribution was successful.
Here is my code:
use rand::distributions::WeightedIndex;
use rand::prelude::*;

fn main() {
    let mut rng = thread_rng();
    let items = [('a', 8), ('b', 5), ('c', 1)];
    let mut dist2 = WeightedIndex::new(items.iter().map(|item| item.1)).unwrap();
    for _ in 0..3 {
        // Draw an index according to the current weights.
        let index = dist2.sample(&mut rng);
        println!("{}", index);
        println!("{:?}", items[index].0);
        // Ignore the Err(AllWeightsZero) returned once every weight is zero.
        let _d = dist2.update_weights(&[(index, &0)]);
    }
}
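For comparison, the same zero-the-drawn-weight pattern sketched in Python with random.choices (illustrative only, unrelated to the rand crate's API):

import random

weights = {'a': 8, 'b': 5, 'c': 1}
for _ in range(3):
    # Draw one key according to the current weights...
    key = random.choices(list(weights), weights=list(weights.values()), k=1)[0]
    print(key)
    # ...then zero its weight so it cannot be drawn again.
    weights[key] = 0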
I have JSON in the below format:

{
    '166, 175': 't2',
    '479': 't3'
}
I want to convert this to a map:
166: 't2'
175: 't2'
479: 't3'
src = {
    '166, 175': 't2',
    '479': 't3'
}

res = {}
for k, v in src.items():
    for i in k.split(', '):
        res[int(i)] = v

print(res)
You can use a dictionary comprehension here (data being the source dict from above):

{
    int(k): v
    for ks, v in data.items()
    for k in ks.split(',')
}

Note that int() tolerates the leading space left by splitting on ',' alone.
For the sample data, this gives us:
>>> {
... int(k): v
... for ks, v in data.items()
... for k in ks.split(',')
... }
{166: 't2', 175: 't2', 479: 't3'}
A bit complicated, though:

from functools import reduce

src = {
    '166, 175': 't2',
    '479': 't3'
}
output = dict(reduce(lambda a, b: a + b,
                     map(lambda k: [(int(i), src[k]) for i in k.split(', ')], src)))