Return value of a trait - rust

I want to implement a trait for shared behaviour among a set of models.
In particular, I need to be sure that every model implementing my trait provides a function that returns a polars DataFrame with (at least) some fixed columns.
For the sake of clarity, suppose we have a float column with values in (0,1) and a trait for clustering. Values less than 0.5 are labelled "low" and all others "high". An algorithm that works only on the label needs to be sure it can find that column, and I want the trait to guarantee that the column will be there.
Is it possible to specify the list of (mandatory) columns I expect for structs implementing this trait?
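One possible approach, sketched here under loud assumptions: a Rust trait cannot constrain which columns a returned data frame contains at compile time, but it can declare the required names as an associated constant and check them at runtime in a provided method. The `Frame` and `Column` types below are hypothetical stand-ins built from the standard library, not the polars `DataFrame`; the names `Model`, `REQUIRED_COLUMNS`, `run_unchecked`, `run`, and `ThresholdClustering` are likewise invented for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for a data frame column (hypothetical; not a polars type).
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float(Vec<f64>),
    Text(Vec<String>),
}

/// Stand-in for a data frame: column name -> column values.
pub type Frame = HashMap<String, Column>;

/// Shared behaviour for models. Implementors declare the columns they
/// promise and provide `run_unchecked`; callers use `run`, which
/// verifies that promise at runtime.
pub trait Model {
    /// Columns that every output of this model must contain.
    const REQUIRED_COLUMNS: &'static [&'static str];

    /// Model-specific computation.
    fn run_unchecked(&self, input: &Frame) -> Frame;

    /// Runs the model and checks that all required columns are present.
    fn run(&self, input: &Frame) -> Result<Frame, String> {
        let out = self.run_unchecked(input);
        for name in Self::REQUIRED_COLUMNS {
            if !out.contains_key(*name) {
                return Err(format!("missing required column `{}`", name));
            }
        }
        Ok(out)
    }
}

/// Example model: labels values below 0.5 as "low", otherwise "high".
pub struct ThresholdClustering;

impl Model for ThresholdClustering {
    const REQUIRED_COLUMNS: &'static [&'static str] = &["label"];

    fn run_unchecked(&self, input: &Frame) -> Frame {
        let mut out = input.clone();
        // Only adds "label" when a float column named "value" exists.
        if let Some(Column::Float(values)) = input.get("value") {
            let labels = values
                .iter()
                .map(|v| {
                    let label = if *v < 0.5 { "low" } else { "high" };
                    label.to_string()
                })
                .collect();
            out.insert("label".to_string(), Column::Text(labels));
        }
        out
    }
}
```

Calling `ThresholdClustering.run(&input)` returns `Err` when the output lacks a `label` column, so downstream code that only reads the label can rely on `Ok` results. With polars the same check would presumably be made against the frame's column names, but that part is not shown here.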

Related

How to Append matrices together in nalgebra Rust?

Suppose I have the matrices [[1,0],[0,1]] and [[2,0],[0,2]]. I want to combine these two matrices into [[1,0,2,0],[0,1,0,2]]. I cannot find a constructor in the documentation that will help me solve this problem: https://www.nalgebra.org/docs/user_guide/vectors_and_matrices/#matrix-construction
I also tried first declaring an empty dynamic matrix (DMatrix) and then appending the rows using insert_row(), but it seems that insert_row() can only insert rows that are filled with a single constant value.
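To make the desired result concrete, here is a minimal sketch of the operation itself (side-by-side concatenation) in plain Rust. It deliberately does not use nalgebra: matrices are represented as row-major `Vec<Vec<i32>>`, and the function name `hstack` is hypothetical.

```rust
/// Concatenates two row-major matrices side by side (column-wise).
/// Returns `None` if the two matrices have different row counts.
pub fn hstack(a: &[Vec<i32>], b: &[Vec<i32>]) -> Option<Vec<Vec<i32>>> {
    if a.len() != b.len() {
        return None;
    }
    let rows = a
        .iter()
        .zip(b.iter())
        .map(|(row_a, row_b)| {
            // Each output row is the row of `a` followed by the row of `b`.
            let mut row = row_a.clone();
            row.extend_from_slice(row_b);
            row
        })
        .collect();
    Some(rows)
}
```

For the two matrices in the question, `hstack` yields `[[1,0,2,0],[0,1,0,2]]`. In nalgebra the same idea would amount to building a matrix with the combined column count and filling it from both inputs.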

Is there a way to call the pcfcross function on groups of marks?

I'm using the pcfcross function to estimate the pair correlation functions (PCFs) between pairs of cell types, indicated by marks. I would now like to expand my analysis to include measuring the PCFs between cell types and groups of cell types. Is there a way to use the pcfcross function on a group of marks?
Alternatively, is there a way to change the marks of a group of marks to a single mark?
You can collapse several levels of a factor to a single level, using the spatstat function mergeLevels. This will group several types of points into a single type.
However, this may not give you any useful new information. The pair correlation function is a second-order summary, so the pair correlation for the grouped data can be calculated from the pair correlations for the un-grouped data. (See Chapter 7 of the spatstat book).

How to add NER tags to features

I have a set of training sentences for which I computed some float features. In each sentence, two entities are identified. They are either of type 'PERSON', 'ORGANIZATION', 'LOCATION', or 'OTHER'. I would like to add these types to my feature matrix (which stores float variables).
My question is: is there a recommended way to add these entity types?
I can think of two ways for now:
either adding TWO columns, one for each entity, filled with entity type ids (e.g. 0 to 3 or 1 to 4),
or adding EIGHT columns, one for each entity type and each entity, and filling them with 0's and 1's.
Best!
I would recommend that you use something that can easily be normalized and that is in the same range as the rest of your data.
So if all your float values are between -1 and 1, I would keep the values from your named entity recognition in the same range.
Depending on what you prefer or what gives you the best result, you could either assign four values in the same range as the rest of your floats or use a binary encoding with more columns.
Finally, the second suggestion (adding EIGHT columns, one for each entity type and each entity, and filling them with 0's and 1's) worked fine!
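The eight-column variant can be sketched as follows. This is an illustrative example only: the `EntityType` enum, `one_hot`, and `with_entity_features` are hypothetical names, and the feature row is assumed to be a plain slice of `f64`.

```rust
/// The four entity types from the question.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EntityType {
    Person,
    Organization,
    Location,
    Other,
}

impl EntityType {
    /// One-hot encoding: 1.0 in the slot of this type, 0.0 elsewhere.
    pub fn one_hot(self) -> [f64; 4] {
        let mut v = [0.0; 4];
        v[self as usize] = 1.0;
        v
    }
}

/// Appends the eight indicator columns (four per entity) to a row of
/// existing float features.
pub fn with_entity_features(
    features: &[f64],
    first: EntityType,
    second: EntityType,
) -> Vec<f64> {
    let mut row = features.to_vec();
    row.extend_from_slice(&first.one_hot());
    row.extend_from_slice(&second.one_hot());
    row
}
```

Each row grows by exactly eight values, all 0.0 or 1.0, so the new columns stay within the range of float features that lie between -1 and 1.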

How to assure single value is passed to formula or function?

Trying to structure things up, I use named ranges often. When these are vectors, I feel uncertain whether the formula or function the vector is passed to will pick a single value (that of the row or column of the calling cell), or use the entire vector.
What's the easiest way of inferring whether a formula will try to pick a single value or not out of a vector argument?
Using the VALUE() function I get the single value, but it is long and makes the formulas harder to read. Is there a shorter formulation or a more elegant way?
I think the attached picture illustrates my question:
You can force implicit intersection for a function by preceding its arguments with +:
=MAX(+Vec1_,+Vec2_)
AFAIK there is no easy way of telling (apart from testing or guessing) which arguments of which functions have been set up to handle implicit intersection and which have not, but you can make a pretty good guess by thinking about how the function is supposed to work:
VLOOKUP(lookupVal,LookupRange, colnum)
lookupVal and colnum are expected to be single values and so will do implicit intersection, but LookupRange is expected to be multiple values and so will not.

Defining Data Structures/ Types In Haskell

How would it be possible to define a data structure in Haskell such that certain constraints/rules apply to the elements of the structure, AND to reflect this in the type?
For example, if I have a type made up of a list of another type, say
r = [x | x <- input, rule1, rule2, rule3].
In this case, the type of r is a list of elements of (type of x). But by saying this, we lose the rules. So how would it be possible to retain this extra information in the type definition?
To make my question more concrete, take the sudoku case. The sudoku grid is a list of rows, each of which is a list of cells. But as we all know, there are constraints on the values and on how often each may occur. When one expresses the types, these constraints don't show up in the definition of the type of the grid and row.
Or is this not possible?
thanks.
In the sudoku example, create a data type that has multiple constructors, each representing a 'rule' or semantic property.
I.e.
data SodokuType = NotValidatedRow | InvalidRow | ValidRow
Now in some validation function you would return an InvalidRow where you detect a violation of the sudoku rules, and a ValidRow where you detect a successful row (or column or square etc.). This allows you to pattern match as well.
The problem you're having is that you're not using types, you're using values. You're defining a list of values, while the list does not say anything about the values it contains.
Note that the example I used is probably not very useful as it does not contain any information about the row's position or anything like that, but you can define it yourself as you'd like.
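The same idea, a validation function that returns a tagged result, can be sketched in Rust rather than Haskell. The names `RowStatus` and `validate_row` are hypothetical, and the rule checked (each digit 1 to 9 exactly once in a row of nine cells) is an assumption about what "valid" means here.

```rust
/// Mirrors the three constructors of the data type above.
#[derive(Debug, PartialEq)]
pub enum RowStatus {
    NotValidated,
    Invalid,
    Valid,
}

/// A row is considered valid when it holds each digit 1..=9 exactly once.
pub fn validate_row(row: &[u8]) -> RowStatus {
    if row.len() != 9 {
        return RowStatus::Invalid;
    }
    // seen[d] records whether digit d has already appeared in the row.
    let mut seen = [false; 10];
    for &cell in row {
        if !(1..=9).contains(&cell) || seen[cell as usize] {
            return RowStatus::Invalid;
        }
        seen[cell as usize] = true;
    }
    RowStatus::Valid
}
```

A caller can then `match` on the returned `RowStatus`, which corresponds to the pattern matching mentioned above. Note that this still checks the rules at runtime; the constraint itself is not encoded in the type of the row.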
