Pytorch Custom Dataset with multiple return datatypes - pytorch

I am writing a custom torch Dataset for my problem. For each sample, I would like to return an integer, a list of 10 floats and a boolean. What is the most efficient way to deal with this?
So in the __getitem__() function should I return each element separately like return int, list, bool or for example packed in a tuple return (int, list, bool)?

Returning "separately" will default to returning a tuple.
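As a minimal sketch of that behaviour, here is a plain Python class standing in for torch.utils.data.Dataset so that it runs without PyTorch installed. The class name ToyDataset and the sample values are made up for illustration.

```python
class ToyDataset:
    """Stand-in for a torch Dataset; only __len__ and __getitem__ matter here."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        label = idx                    # an integer
        features = [float(idx)] * 10   # a list of 10 floats
        flag = idx % 2 == 0            # a boolean
        # No parentheses needed: Python packs the three values into one tuple.
        return label, features, flag


sample = ToyDataset(4)[2]
print(type(sample).__name__, len(sample))  # tuple 3
```

So `return a, b, c` and `return (a, b, c)` hand back the same object type, and the caller can unpack either one with `label, features, flag = dataset[i]`.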

Related

How to convert elements type in numpy array from string to int

I have a numpy array which looks like this
arr = np.array([['1','2','3','4','5','6']])
For the most common way to convert, loops are required like this.
for a in arr:
    for b in a:
        int(b)
However, I would like to convert all elements in the array without loops. How could I do that?
You can set the element type when you create the array:
arr = np.array(['1','2','3','4','5','6'], int)
Or, if you want to convert the elements of an array that already exists, use the astype() method, which returns a new array:
arr.astype('int')
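A short sketch of the astype() route applied to the 2-D array from the question, assuming NumPy is installed. The variable name converted is only for illustration.

```python
import numpy as np

arr = np.array([['1', '2', '3', '4', '5', '6']])

# astype() returns a new array; arr itself still holds strings.
converted = arr.astype(int)

print(converted.tolist())   # [[1, 2, 3, 4, 5, 6]]
print(arr.dtype.kind)       # U (still a unicode string array)
```

If you want to keep only the converted values, rebind the name: `arr = arr.astype(int)`.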

Check if value exists in python xarray dataset

I'm cutting up xarrays into small cubes of data for a machine learning process and am trying to filter out cubes with no-data values in them.
I want to keep the memory footprint small and have assigned an unlikely value of -999 to no-data values. This is done to keep things int16 instead of requiring a larger type for nan
Question: What is the best way to check if -999 exists in an xarray.Dataset?
Here is what I have:
(dataset == -999).any()
will yield:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    var_a    bool True
    var_b    bool True
    var_c    bool False
after which I would likely have to select something like var_a. My code would end up looking like this:
def is_clean(dataset):
    return (dataset == -999).any().var_a is True
Maybe I'm still fresh when it comes to Xarrays, but I can't find a nicer way to do this in the docs. What bit of structural knowledge about xarrays am I missing that keeps me from being ok with my current solution? Any hints?
Expressions on xarray objects generally return new xarray objects of the same type. This means (dataset.var_a == -999).any() results in a scalar xarray.DataArray object.
Like scalar NumPy arrays, scalar DataArray objects can be unboxed by calling builtin types such as bool() or float() on them. This happens implicitly inside the condition of an if statement, for example. Also like NumPy arrays, you can unbox a scalar DataArray of any dtype with the .item() method.
To check every data variable in a Dataset, you'll either need to iterate over the Dataset using dictionary-like access, e.g.,
def is_clean(dataset):
    return all((v != -999).all() for v in dataset.data_vars.values())
Or you could convert the whole Dataset into a single DataArray by calling .to_array(), e.g.,
def is_clean(dataset):
    return bool((dataset.to_array() != -999).all())
To avoid excess memory usage, you might convert to an array after reducing, which is a little longer but not too bad:
def is_clean(dataset):
    return bool((dataset != -999).all().to_array().all())

How to construct an array with multiple possible lengths using immutability and functional programming practices?

We're in the process of converting our imperative brains to a mostly-functional paradigm. This function is giving me trouble. I want to construct an array that EITHER contains two pairs or three pairs, depending on a condition (whether refreshToken is null). How can I do this cleanly using a FP paradigm? Of course with imperative code and mutation, I would just conditionally .push() the extra value onto the end which looks quite clean.
Is this an example of the "local mutation is ok" FP caveat?
(We're using ReadonlyArray in TypeScript to enforce immutability, which makes this somewhat more ugly.)
const itemsToSet = [
    [JWT_KEY, jwt],
    [JWT_EXPIRES_KEY, tokenExpireDate.toString()],
    [REFRESH_TOKEN_KEY, refreshToken /*could be null*/]]
    .filter(item => item[1] != null) as ReadonlyArray<ReadonlyArray<string>>;
AsyncStorage.multiSet(itemsToSet.map(roArray => [...roArray]));
What's wrong with itemsToSet as given in the OP? It looks functional to me, but it may be because of my lack of knowledge of TypeScript.
In Haskell, there's no null, but if we use Maybe for the second element, I think that itemsToSet could be translated to this:
itemsToSet :: [(String, String)]
itemsToSet = foldr folder [] values
  where
    values = [
      (jwt_key, jwt),
      (jwt_expires_key, tokenExpireDate),
      (refresh_token_key, refreshToken)]
    folder (key, Just value) acc = (key, value) : acc
    folder _ acc = acc
Here, jwt, tokenExpireDate, and refreshToken are all of the type Maybe String.
itemsToSet performs a right fold over values, pattern-matching the Maybe String elements against Just and (implicitly) Nothing. If it's a Just value, it conses the (key, value) pair onto the accumulator acc. If not, folder just returns acc.
foldr traverses the values list from right to left, building up the accumulator as it visits each element. The initial accumulator value is the empty list [].
You don't need 'local mutation' in functional programming. In general, you can refactor from 'local mutation' to proper functional style by using recursion and introducing an accumulator value.
While foldr is a built-in function, you could implement it yourself using recursion.
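To illustrate the recursion-plus-accumulator idea outside Haskell, here is a rough Python sketch of a hand-written foldr applied to the same filtering problem. The key strings and sample values are invented, and None plays the role of Nothing.

```python
def foldr(f, acc, xs):
    """Right fold: f(x0, f(x1, ... f(xn, acc)))."""
    if not xs:
        return acc
    head, *tail = xs
    return f(head, foldr(f, acc, tail))


def folder(pair, acc):
    key, value = pair
    # Build a new list instead of mutating acc; skip pairs with no value.
    return [(key, value)] + acc if value is not None else acc


values = [
    ("jwt", "abc"),
    ("jwt_expires", "2030-01-01"),
    ("refresh_token", None),  # could be None
]
items_to_set = foldr(folder, [], values)
print(items_to_set)  # [('jwt', 'abc'), ('jwt_expires', '2030-01-01')]
```

Nothing is mutated here: each step returns either the accumulator it was given or a new list built from it.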
In Haskell, I'd just create an array with three elements and, depending on the condition, pass it on either as-is or pass on just a slice of two elements. Thanks to laziness, no computation effort will be spent on the third element unless it's actually needed. In TypeScript, you probably will get the cost of computing the third element even if it's not needed, but perhaps that doesn't matter.
Alternatively, if you don't need the structure to be an actual array (for String elements, performance probably isn't that critical, and the O(n) direct-access cost isn't an issue if the length is limited to three elements), I'd use a singly-linked list instead. Create the list with two elements and, depending on the condition, append the third. This does not require any mutation: the 3-element list simply contains the unchanged 2-element list as a substructure.
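A small Python sketch of the linked-list idea. It assumes the order of the pairs does not matter, so the optional third pair is attached at the front, which is what lets the 2-element list be shared untouched. Node, with_refresh and to_pairs are hypothetical names.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable cons cell: one (key, value) pair plus the rest of the list."""
    head: tuple
    tail: Optional["Node"] = None


def with_refresh(base, refresh_token):
    # A new cell pointing at base; base itself is never modified or copied.
    return base if refresh_token is None else Node(("refresh_token", refresh_token), base)


def to_pairs(node):
    """Recursively flatten the linked list into a Python list of pairs."""
    return [] if node is None else [node.head] + to_pairs(node.tail)


base = Node(("jwt", "abc"), Node(("jwt_expires", "2030-01-01")))
items = with_refresh(base, "r1")

print(len(to_pairs(base)), len(to_pairs(items)))  # 2 3
print(items.tail is base)                         # True
```

The `items.tail is base` check shows the sharing: the longer list is one extra cell in front of the original two.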
Based on the description, I don't think arrays are the best solution simply because you know ahead of time that they contain either 2 values or 3 values depending on some condition. As such, I would model the problem as follows:
type alias Pair = (String, String)
type TokenState
    = WithoutRefresh (Pair, Pair)
    | WithRefresh (Pair, Pair, Pair)
itemsToTokenState : String -> Date -> Maybe String -> TokenState
itemsToTokenState jwtKey jwtExpiry maybeRefreshToken =
    case maybeRefreshToken of
        Just refreshToken ->
            WithRefresh (("JWT_KEY", jwtKey), ("JWT_EXPIRES_KEY", toString jwtExpiry), ("REFRESH_TOKEN_KEY", refreshToken))
        Nothing ->
            WithoutRefresh (("JWT_KEY", jwtKey), ("JWT_EXPIRES_KEY", toString jwtExpiry))
This way you are leveraging the type system more effectively, and the model could be improved further by returning something more ergonomic than tuples.

Scala method that returns the first string found in the alphabet/dictionary

For example this code
val stringTuple = ("BLACK", "GRAY", "WHITE")
firstInAlphabet(stringTuple)
Should return "BLACK". How would you define firstInAlphabet?
Personally I prefer simple and fast implementations over complicated ones that would cover a lot of cases.
t.productIterator.map(_.asInstanceOf[String]).min
productIterator converts the elements of the tuple to an iterator. This loses the type information, so we have to cast the elements, and then we use min to find the first.
If you have non-String elements in your tuple this version should do the trick:
t.productIterator.map(_.toString).min
Instead of casting each element to String, it converts each element to a String.

How to write a function max_and_min that accepts a tuple containing integer elements as an argument?

Functions can only return a single value but sometimes, we may want functions to return multiple values. Tuples can come in handy in such cases. We can create a tuple containing multiple values and return the tuple instead of a single value.
Write a function max_and_min that accepts a tuple containing integer elements as an argument and returns the largest and smallest integer within the tuple. The return value should be a tuple containing the largest and smallest value, in that order.
for example: max_and_min((1, 2, 3, 4, 5)) = (5, 1)
I am told to use an iteration to loop through each value of the tuple parameter to find the maximum and minimum values. Also, I must use Python 3.x.
How do I do this? I am really clueless. Thanks for your help!
def max_and_min(values):
    # Write your code here
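One option is to pass a variable number of arguments to the function instead of a single tuple. In Python, the * notation collects the arguments passed at invocation into a tuple, so this version is called as max_and_min(1, 2, 3, 4, 5) or max_and_min(*values) rather than max_and_min(values):
def max_and_min(*arg):
    return (max(arg), min(arg))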
Note that the Python 3 min and max functions themselves accept a variable number of arguments.
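Since the exercise asks for a function that takes the tuple itself and finds the values by iterating, here is one possible sketch of that version. It assumes the tuple is non-empty.

```python
def max_and_min(values):
    """Return (largest, smallest) from a non-empty tuple of integers."""
    largest = smallest = values[0]
    for value in values[1:]:
        if value > largest:
            largest = value
        elif value < smallest:
            smallest = value
    return (largest, smallest)


print(max_and_min((1, 2, 3, 4, 5)))  # (5, 1)
```

The elif is safe because a value larger than the current maximum cannot also be smaller than the current minimum.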
