Does the concept of coupling two different items of data by storing them in a single variable have a name? - coupling

For example, if I have a 64-bit variable and store two 32-bit items of data in it, perhaps for the purposes of SIMD processing, is there a name to describe the logical coupling of those two items of data?
A colleague of mine suggests "Hybrid Coupling", is this a widely used term?
To clarify: we're after a higher level concept than specific implementations. Say, for example, in a C-like language we have these two structs:
struct CoupledData
{
uint64 x_and_y; // x is stored in the top 4 bytes, y in the bottom 4
}
struct UncoupledData
{
uint32 x;
uint32 y;
}
Regardless of the reasons for doing so, there is an implicit coupling between the x and y data members in CoupledData that doesn't exist in UncoupledData. Is there a term which describes this coupling between x and y?

I've always called it "packing." You pack R, G, B, and alpha bytes into a long.
pixel=(a<<24)+(r<<16)+(g<<8)+b;
Here's a reference for SIMD instructions in particular (from MMX/SSE):
ADDPS: Add Packed Single-Precision FP
Values
CMPccPS: Packed Single-Precision
FP Compare

In the c world its called a union
A "union declaration" specifies a set of variable values and, optionally, a tag naming the union. The variable values are called "members" of the union and can have different types. Unions are similar to "variant records" in other languages
A variable with union type stores one of the values defined by that type. The same rules govern structure and union declarations. Unions can also have bit fields.
In his example you would have this psuedocode:
union 32_64 {
64_bit_specifier sixty_four;
32_bit_specifier thirty_two;
}

Related

Octave: Differences between struct and cell array

Sparked by this question and posted comments/answers, I came up with another question:
What features are available in Cell arrays that are not in Structures, and viceversa, in Octave?
I could gather the following:
In Cell arrays:
Can operate on full "columns" (fields in the structure lingo) at once.
In Structures:
Have named fields.
I think the best way to answer this, rather than address how they are similar, is to point out how they differ.
Also, since you seem to be drawing equivalents to (and perhaps confusing) concepts from other languages, it may be instructive to point out similarities to constructs from other popular languages, namely R and python.
In all the above languages, there exists the concept of
an "array": a rectangular collection of elements of the same type, which can be one or more dimensions, and typically guaranteed to occupy a contiguous area in memory
a "list": a collection of elements which can be of different types, does not have to be rectangular (i.e. can be 'jagged'), typically only 1D (but can be multidimensional, or contain nested lists), and its elements are not guaranteed to occupy a contiguous area in memory
a "dict": a collection of elements which are like a list, but augmented by the fact that they can be accessed by a 'key', rather than just by index.
a "table" (?): a horizontal concatenation of equal-sized columns, each identified by a 'column header'
Octave
In octave, the closest to the "array" concept is the 'object' array, (where 'object' could be anything, but is typically numerical) e,g, [1,2:3,4].
The closest to a "list" concept is the cell array, e.g. { [1,2], true; 'John' }. To index a cell array and obtain the contents of a cell at a particular index, you use {}. However, octave cell-arrays are slightly different, in that they can also be thought of as 'object arrays' where the object elements are 'cells', which you could think of as references to their contained objects. This means you can construct a cell-array and index it with () as a normal array, returning a sub-array of cells (i.e. another cell-array). Also, a cell can contain another cell-array as its contents (i.e. cell-arrays can be nested).
The closest to a "dict" concept is the struct. This allows you to create an object which can have 'fields', such that for each field you can assign value.
Python
By contrast, in python you don't have arrays. You only have lists and dicts. In order to get array functionality you need to rely on external modules (such as numpy) which take a list as an argument to convert to an array type. Python lists are also always 1D (but can be nested).
Python dicts effectively behave the same way as octave structs. There are some tiny conceptual differences, but they're effectively equivalent constructs.
R
R is probably the bit that's causing the most confusion, because R allows you to allocate names to elements of both arrays and lists, and allows you to access both using either an index or the allocated name.
But, R still has a vector type, e.g. c(1,2,3), which despite the fact that it can also be given names, e.g. c( a=1, b=2, c=3 ), it still requires that elements need to be of the same type, otherwise R will convert to the least common denominator. (e.g. c(1, '2') will convert both elements to strings).
Then, you have lists, which are basically something like 'lists' and 'dicts' combined. If you have list(1, 2, 3), you have 'list' functionality, and if you have list(a=1, b=2, c=3) you have 'dict' functionality. If you access a list element using the [] operator, the output is expressed as another list (in a similar way to how cellarrays in octave can be indexed with () ), whereas if you index a list using the [[]] operator, you get the 'contents' only (similar to if you index a cell-array in octave with {} ).
"Tables": dataframes vs dicts vs structs
Now, in R, you also have dataframes. This is effectively a list with names (i.e. 'dict') where all elements are vectors of the same length (but can be different types), e.g. data.frame( list( a=1:3, b=c('one', 'two', 'three') ) ); note that expressing this as a data.frame rather than plain list simply results in different behaviour (e.g. printing), but otherwise the underlying object is the same (which you can confirm by typing unclass(df).
In python, we can note that a pandas dataframe behaves the same way (i.e. a pandas dataframe is initialized via a dict whose keys contain values that are equally sized vectors).
Therefore since a dataframe is basically a list of equal vectors, the easiest way to have dataframe functionality in octave is to create a struct whose fields are equal sized vectors. Or, if you don't care about fieldnames and are happy to access your contained arrays by "column index", then you can create a cell array and store in each cell your equally-sized numerical 'data' arrays.
Do cells have "columns" in the way implied in the question?
No. If you want to do vectorised operations, you cannot do it across cell-array columns. You need to performed vectorised operations on arrays.
So actually, if what you're looking for is the equivalent of a dataframe, where each "column" represents a numerical vector, the equivalent of that is a struct, where you assign a numerical vector to a field.
In other words the equivalent of dataframes in the various languages are:
Python: pandas.DataFrame( { 'col1': [1,2], 'col2': [3,4] })
R: data.frame( list( col1=c(1,2), col2=c(3,4) ) )
octave: struct( 'col1', [1,2], 'col2', [3,4] )
Having said that, you may prefer a more 'tabular' output. You can either write your own function for this, or try the dataframe package from octave forge, which provides a class for just that.
As an example here's one snippet you could easily convert to a function, and improve on to add all sorts of bells and whistles like colour etc.
fprintf( '%4s %5s %5s\n', '', fieldnames(S){:} ), for i = 1 : length(S.col1), fprintf( '%4d %5.3f %5.3f\n', i, num2cell( structfun(#(x) x(i), S) ){:} ), end
col1 col2
1 1.000 3.000
2 2.000 4.000

Choco Solver ICF constraint to define the standard deviation of an IntVar array within limit

Say, I am having an IntVar array
int n = 10;
IntVar[] x = VariableFactory.boundedArray("x", n, 0, 100, solver);
I need to define a constraint that restricts the standard deviation(can be a number with decimal points) of this array less than a predefined real number, say 3.45.
The deviation constraint is not (yet) implemented in choco. My company can implement it and add it to the library for you if you wish. Contact us to get a commercial offer (https://www.cosling.com/#contact).
Otherwise, you may encode the deviation as a continuous constraint (as in this example https://github.com/chocoteam/choco-solver/blob/master/choco-samples/src/main/java/org/chocosolver/samples/real/SmallSantaClaude.java) but it requires to install Ibex solver, with the jni bridge (http://www.ibex-lib.org/doc/java-install.html).
Best,
Jean-Guillaume Fages
https://www.cosling.com/

Alloy - set difference leading to vars and clauses, set union does not

I'm curious as to when evaluation sets in, apparently certain operators are rather transformed into clauses than evaluated:
abstract sig Element {}
one sig A,B,C extend Element {}
one sig Test {
test: set Element
}
pred Test1 { Test.test = A+B }
pred Test2 { Test.test = Element-C }
and run it for Test1 and Test2 respectively will give different number of vars/clauses, specifically:
Test1: 0 vars, 0 primary vars, 0 clauses
Test2: 5 vars, 3 primary vars, 4 clauses
So although Element is abstract and all its members and their cardinalities are known, the difference seems not to be computed in advance, while the sum is. I don't want to make any assumptions, so I'm interested in why that is. Is the + operator special?
To give some context, I tried to limit the domain of a relation and found, that using only + seems to be more efficient, even when the sets are completely known in advance.
To give some context, I tried to limit the domain of a relation and found, that using only + seems to be more efficient, even when the sets are completely known in advance.
That is pretty much the right conclusion. The reason is the fact that the Alloy Analyzer tries to infer relation bounds from certain Alloy idioms. It uses a conservative approximation that is always sound for set union and product, but not for set difference. That's why for Test1 in the example above the Alloy Analyzer infers a fixed bound for the test relation (this/Test.test: [[[A$0], [B$0]]]) so no solver needs to be invoked; for Test2, the bound for the test relation cannot be shrunk so is set to be the most permissive (this/Test.test: [[], [[A$0], [B$0], [C$0]]]), thus a solver needs to be invoked to find a solution satisfying the constraints given the bounds.

How do I define a collection of values with units?

I want to define a collection of data arrays and associate each with units. For instance, one member of the collection could be Vector Float to which I somehow associate the units meters. Another member could be Vector Int to which I associate the units counts. Another could be Vector Double to which I associate the units centimeters. My use case is that for each member of the collection, I want to:
Display the precision of the data (i.e. Int vs Float vs Double vs Int64)
Display the numeric values according to the precision (i.e. [1.0, 3.0, 4.0] or [1, 3, 4] as appropriate).
Display the significance of the values, including units (i.e. Length meters, Count, Length centimeters).
Each data array is read from a file that defines the array of values, the precision, and the units, so none of these are available at compile time.
How can I define the type for such a collection, say, as an IntMap? Can I do so using existing units related libraries or do I need to roll my own? I am open to using any GHC extensions needed to get the job done (such as GADTs).
I have encountered the following issues in trying to formulate a solution.
Vector Float, Vector Double, and Vector Int are all different types, so I cannot put them into the same collection without existential quantification such as in the answer to this question. However, if I use existential quantification using the Num class as a constraint, I remove the precision information.
The units in the dimensional package are at the type level, so arrays with quantity Length cannot be in the same collection as arrays with quantity Time. Again, this might be solved with existential quantification, and again, destroying the needed information. Other units packages I've found treat units on the type level as well.
One possible solution is to take the units and precision information out of the type level and into the data level:
data Quantity = Length LengthUnit
| Time TimeUnit
| Count
| Angle AngleUnit
etc
data TimeUnit = Sec | MSec | Hour | Day
data AngleUnit = Deg | Rad | etc
data LengthUnit = Meter | Centimeter | Mile
etc
data Precision = Int | Float | Double
data DataArray quant prec = forall e . Num e => DataArray Quantity Precision (Vector e)
(This simple units system could be improved by adding prefixes, conversion factors, etc, like units packages usually do. The example above is just for illustrative purposes.)
This presents a number of problems as well. For one, it's easy to define a DataArray that has inconsistent type such as DataArray Angle Int (Vector Float). It also smells of reinventing the wheel, both in implementing a whole new units system and redundantly defining the precision of the data.
I feel like I am overcomplicating something that should be simple. What is a type strategy that achieves my usage goals?

Is there a language with constrainable types?

Is there a typed programming language where I can constrain types like the following two examples?
A Probability is a floating point number with minimum value 0.0 and maximum value 1.0.
type Probability subtype of float
where
max_value = 0.0
min_value = 1.0
A Discrete Probability Distribution is a map, where: the keys should all be the same type, the values are all Probabilities, and the sum of the values = 1.0.
type DPD<K> subtype of map<K, Probability>
where
sum(values) = 1.0
As far as I understand, this is not possible with Haskell or Agda.
What you want is called refinement types.
It's possible to define Probability in Agda: Prob.agda
The probability mass function type, with sum condition is defined at line 264.
There are languages with more direct refinement types than in Agda, for example ATS
You can do this in Haskell with Liquid Haskell which extends Haskell with refinement types. The predicates are managed by an SMT solver at compile time which means that the proofs are fully automatic but the logic you can use is limited by what the SMT solver handles. (Happily, modern SMT solvers are reasonably versatile!)
One problem is that I don't think Liquid Haskell currently supports floats. If it doesn't though, it should be possible to rectify because there are theories of floating point numbers for SMT solvers. You could also pretend floating point numbers were actually rational (or even use Rational in Haskell!). With this in mind, your first type could look like this:
{p : Float | p >= 0 && p <= 1}
Your second type would be a bit harder to encode, especially because maps are an abstract type that's hard to reason about. If you used a list of pairs instead of a map, you could write a "measure" like this:
measure total :: [(a, Float)] -> Float
total [] = 0
total ((_, p):ps) = p + probDist ps
(You might want to wrap [] in a newtype too.)
Now you can use total in a refinement to constrain a list:
{dist: [(a, Float)] | total dist == 1}
The neat trick with Liquid Haskell is that all the reasoning is automated for you at compile time, in return for using a somewhat constrained logic. (Measures like total are also very constrained in how they can be written—it's a small subset of Haskell with rules like "exactly one case per constructor".) This means that refinement types in this style are less powerful but much easier to use than full-on dependent types, making them more practical.
Perl6 has a notion of "type subsets" which can add arbitrary conditions to create a "sub type."
For your question specifically:
subset Probability of Real where 0 .. 1;
and
role DPD[::T] {
has Map[T, Probability] $.map
where [+](.values) == 1; # calls `.values` on Map
}
(note: in current implementations, the "where" part is checked at run-time, but since "real types" are checked at compile-time (that includes your classes), and since there are pure annotations (is pure) inside the std (which is mostly perl6) (those are also on operators like *, etc), it's only a matter of effort put into it (and it shouldn't be much more).
More generally:
# (%% is the "divisible by", which we can negate, becoming "!%%")
subset Even of Int where * %% 2; # * creates a closure around its expression
subset Odd of Int where -> $n { $n !%% 2 } # using a real "closure" ("pointy block")
Then you can check if a number matches with the Smart Matching operator ~~:
say 4 ~~ Even; # True
say 4 ~~ Odd; # False
say 5 ~~ Odd; # True
And, thanks to multi subs (or multi whatever, really – multi methods or others), we can dispatch based on that:
multi say-parity(Odd $n) { say "Number $n is odd" }
multi say-parity(Even) { say "This number is even" } # we don't name the argument, we just put its type
#Also, the last semicolon in a block is optional
Nimrod is a new language that supports this concept. They are called Subranges. Here is an example. You can learn more about the language here link
type
TSubrange = range[0..5]
For the first part, yes, that would be Pascal, which has integer subranges.
The Whiley language supports something very much like what you are saying. For example:
type natural is (int x) where x >= 0
type probability is (real x) where 0.0 <= x && x <= 1.0
These types can also be implemented as pre-/post-conditions like so:
function abs(int x) => (int r)
ensures r >= 0:
//
if x >= 0:
return x
else:
return -x
The language is very expressive. These invariants and pre-/post-conditions are verified statically using an SMT solver. This handles examples like the above very well, but currently struggles with more complex examples involving arrays and loop invariants.
For anyone interested, I thought I'd add an example of how you might solve this in Nim as of 2019.
The first part of the questions is straightfoward, since in the interval since since this question was asked, Nim has gained the ability to generate subrange types on floats (as well as ordinal and enum types). The code below defines two new float subranges types, Probability and ProbOne.
The second part of the question is more tricky -- defining a type with constrains on a function of it's fields. My proposed solution doesn't directly define such a type but instead uses a macro (makePmf) to tie the creation of a constant Table[T,Probability] object to the ability to create a valid ProbOne object (thus ensuring that the PMF is valid). The makePmf macro is evaluated at compile time, ensuring that you can't create an invalid PMF table.
Note that I'm a relative newcomer to Nim so this may not be the most idiomatic way to write this macro:
import macros, tables
type
Probability = range[0.0 .. 1.0]
ProbOne = range[1.0..1.0]
macro makePmf(name: untyped, tbl: untyped): untyped =
## Construct a Table[T, Probability] ensuring
## Sum(Probabilities) == 1.0
# helper templates
template asTable(tc: untyped): untyped =
tc.toTable
template asProb(f: float): untyped =
Probability(f)
# ensure that passed value is already is already
# a table constructor
tbl.expectKind nnkTableConstr
var
totprob: Probability = 0.0
fval: float
newtbl = newTree(nnkTableConstr)
# create Table[T, Probability]
for child in tbl:
child.expectKind nnkExprColonExpr
child[1].expectKind nnkFloatLit
fval = floatVal(child[1])
totprob += Probability(fval)
newtbl.add(newColonExpr(child[0], getAst(asProb(fval))))
# this serves as the check that probs sum to 1.0
discard ProbOne(totprob)
result = newStmtList(newConstStmt(name, getAst(asTable(newtbl))))
makePmf(uniformpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25})
# this static block will show that the macro was evaluated at compile time
static:
echo uniformpmf
# the following invalid PMF won't compile
# makePmf(invalidpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.15})
Note: A cool benefit of using a macro is that nimsuggest (as integrated into VS Code) will even highlight attempts to create an invalid Pmf table.
Modula 3 has subrange types. (Subranges of ordinals.) So for your Example 1, if you're willing to map probability to an integer range of some precision, you could use this:
TYPE PROBABILITY = [0..100]
Add significant digits as necessary.
Ref: More about subrange ordinals here.

Resources