How can I test this (interaction) pattern - statistics

Hi, I have some troubles understanding which analysis is suitable to test this expected pattern.
The idea here is that in Condition 1, the difference between A and B is higher, but small between C and IC. In Condition 2, the difference between C and IC should be higher, but lower between A and B. Ideally, I would like to test this via a three-way ANOVA (2x2x2), but as the graphs are parallel in both plots, it seems that there would not be a significant interaction. Does anyone have an idea? Thanks a lot in advance

The proposed model seems to be something like:
RT ~ Condition * GroupAB * GroupCIC
Based on the plots provided, this should produce output with:
no meaningful 3-way interaction, nor a meaningful 2-way interaction between GroupAB and GroupCIC since the lines are parallel in both conditions.
A meaningful Condition:GroupCIC interaction, since the lines are further apart in condition 2 that condition 1
A meaningful Condition:GroupAB interaction, since the slopes of the lines are different between the two conditions.

Related

Why are OpenMP Reduction Clauses Non-deterministic for Statically Scheduled Loops?

I have been working on a multi-GPU project where I have had problems with obtaining non-deterministic results. I was surprised when it turned out that I obtained non-deterministic results due to a reduction clause executed on the CPU.
In the book Using OpenMP - The Next Step it is written that
"[...] the order in which threads combine their value to construct the
value for the shared result is non-deterministic."
Maybe I just don't understand how the reduction clauses are implemented. Does it mean that if I use schedule(monotonic:static) in combination with a reduction clause each thread will execute its chunk of the iterations in a deterministic order, but that the order in which the partial results are combined at the end of the parallel region is non-deterministic?
Does it mean that if I use schedule(monotonic:static) in combination
with a reduction clause each thread will execute its chunk of the
iterations in a deterministic order, but that the order in which the
partial results are combined at the end of the parallel region is
non-deterministic?
It is known that the end result is non-determinist, detailed information can be found in:
What Every Computer Scientist Should Know about Floating Point Arithmetic. For instance:
Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).
Now regarding the order in which the threads perform the reduction action, as far as I know, the OpenMP standard does not enforce any order, or requires that the order has to be deterministic. Hence, this is an implementation detail that is left up to the compiler that is implementing the OpenMP standard to decide, and consequently, it is something that your code should not reply upon.
Programming language semantics usually declares that a+b+c+d is evaluated as ((a+b)+c)+d. This is not parallel, so an OpenMP reduction is probably evaluated as (a+b)+(c+d). And so on for larger numbers of summands.
So you immediately have that, because of the non-associativity of floating point arithmetic, the result may be subtly different from the sequential value.
But more importantly, the exact value will depend on precisely how the combination is done. Is it a+(b+c) (on 2 threads) or (a+b)+c? So the result is at least "indeterministic" in the sense that you can not reconstruct how it was formed. It could probably even be done in two different ways, if you run the same reduction twice. That's what I would call "non-deterministic", but look in the standard for the exact definition of the term.
By the way, if you want to get some idea of how OpenMP actually does it, write your own reduction operator, and let each invocation print out what it computes. Here is a decent illustration: https://victoreijkhout.github.io/pcse/omp-reduction.html#Initialvalueforreductions
By the way, the standard actually doesn't use the word "non-deterministic" for this case. The following passage explains the issue:
Furthermore, using different numbers of threads may result in
different numeric results because of changes in the association of
numeric operations. For example, a serial addition reduction may have
a different pattern of addition associations than a parallel
reduction.

partial functions vs input verification

I really love using total functions. That said, sometimes I'm not sure what the best approach is for guaranteeing that. Lets say that I'm writing a function similar to chunksOf from the split package, where I want to split up a list into sublists of a given size. Now I'd really rather say that the input for sublist size needs to be a positive int (so excluding 0). As I see it I have several options:
1) all-out: make a newtype for PositiveInt, hide the constructor, and only expose safe functions for creating a PositiveInt (perhaps returning a Maybe or some union of Positive | Negative | Zero or what have you). This seems like it could be a huge hassle.
2) what the split package does: just return an infinite list of size-0 sublists if the size <= 0. This seems like you risk bugs not getting caught, and worse: those bugs just infinitely hanging your program with no indication of what went wrong.
3) what most other languages do: error when the input is <= 0. I really prefer total functions though...
4) return an Either or Maybe to cover the case that the input might have been <= 0. Similar to #1, it seems like using this could just be a hassle.
This seems similar to this post, but this has more to do with error conditions than just being as precise about types as possible. I'm looking for thoughts on how to decide what the best approach for a case like this is. I'm probably most inclined towards doing #1, and just dealing with the added overhead, but I'm concerned that I'll be kicking myself down the road. Is this a decision that needs to be made on a case-by-case basis, or is there a general strategy that consistently works best?

Transportation problem to minimize the cost using genetic algorithm

I am new to Genetic Algorithm and Here is a simple part of what i am working on
There are factories (1,2,3) and they can server any of the following customers(ABC) and the transportation costs are given in the table below. There are some fixed cost for A,B,C (2,4,1)
A B C
1 5 2 3
2 2 4 6
3 8 5 5
How to solve the transportation problem to minimize the cost using a genetic algorithm
First of all, you should understand what is a genetic algorithm and why we call it like that. Because we act like a single cell organism and making cross overs and mutations to reach a better state.
So, you need to implement your chromosome first. In your situation, let's take a side, customers or factories. Let's take customers. Your solution will look like
1 -> A
2 -> B
3 -> C
So, your example chromosome is "ABC". Then create another chromosome ("BCA" for example)
Now you need a fitting function which you wish to minimize/maximize.
This function will calculate your chromosomes' breeding chance. In your situation, that'll be the total cost.
Write a function that calculates the cost for given factory and given customer.
Now, what you're going to do is,
Pick 2 chromosomes weighted randomly. (Weights are calculated by fitting function)
Pick an index from 2 chromosomes and create new chromosomes via using their switched parts.
If new chromosomes have invalid parts (Such as "ABA" in your situation), make a fixing move (Make one of "A"s, "C" for example). We call it a "mutation".
Add your new chromosome to the chromosome set if it wasn't there before.
Go to first process again.
You'll do this for some iterations. You may have thousands of chromosomes. When you think "it's enough", stop the process and sort the chromosome set ascending/descending. First chromosome will be your result.
I'm aware that makes the process time/chromosome dependent. I'm aware you may or may not find an optimum (fittest according to biology) chromosome if you do not run it enough. But that's called genetic algorithm. Even your first run and second run may or may not produce the same results and that's fine.
Just for your situation, possible chromosome set is very small, so I guarantee that you will find an optimum in a second or two. Because the entire chromosome set is ["ABC", "BCA", "CAB", "BAC", "CBA", "ACB"] for you.
In summary, you need 3 informations for applying a genetic algorithm:
How should my chromosome be? (And initial chromosome set)
What is my fitting function?
How to make cross-overs in my chromosomes?
There are some other things to care about this problem:
Without mutation, genetical algorithm can stuck to a local optimum. It still can be used for optimization problems with constraints.
Even if a chromosome exists with a very low chance to be picked for cross-over, you shouldn't sort and truncate the chromosome set till the end of iterations. Otherwise, you may stuck at a local extremum or worse, you may get an ordinary solution candidate instead of global optimum.
To fasten your process, pick non-similar initial chromosomes. Without enough mutation rate, finding global optimum could be a real pain.
As mentioned in nejdetckenobi's answer, in this case the solution search space is too small, i.e. only 8 feasible solutions ["ABC", "BCA", "CAB", "BAC", "CBA", "ACB"]. I assume this is only a simplified version of your problem, and your problem actually contains more factories and customers (but the numbers of factories and customers are equal). In this case, you can just make use of special mutation and crossover to avoid infeasible solution with repeating customers, e.g. ["ABA", 'CCB', etc.].
For mutation, I suggest to use a swap mutation, i.e. randomly pick two customers, swap their corresponding factory (position):
ABC mutate to ACB
ABC mutate to CBA

Number of assembly instruction execution orders

I am working on a presentation on multithreading and I want to demonstrate how instructions can increase in a factorially large way.
Consider the trivial program
a++;
b++;
c++;
In a single-threaded program the three assembly instructions (read, add one, write) that make up the ++ operation only has one order (read a, add one to a, write a to memory, read b,...)
In a program with just three threads executing these three lines in parallel there are many more configurations. The compiler can optimize and re-write these instruction in any order with the constraint that 'read', 'add one' and 'write' occur in order for a b and c. How many valid orders are there?
Initial thoughts:
(3+3+3)!* 1/(3!+3!+3!)=20160
where (3+3+3)! is the total number of permutations without constraint and 1/(3!+3!+3!) is the proportion of permutations that have the correct order.
This might be more of a elaborate comment but...
In the single thread version the compiler can reorder those additions with out the change in the output. c++ compilers are allowed to do so. So there is 3! possibilities for the single thread. And that is assuming the ++ is atomic.
When you go into multithreading the sense of order of operations loses its meaning, depending on architecture it can be done in precisely at the same time. In fact you do not even have threads. E.g. SSE instructions.
What you are trying to count is executing 3 additions where load->inc->store are not atomic, on a single thread. IMO, the way to impose order on the total of 9 elements would be similar to yours, but the factor would be (3!*3!*3!).
1st you take 9! then you impose order on 3 elements by dividing it by 3!, and then repeat process 2 more times. However I get the feeling that this factor is too big.
I would ask a mathematician that's good with combinatorics. The equivalent question is, having NxM coloured balls. N is the number of variables, M is the number of atomic operations you need to execute on each. What is the number of different orders for the balls. The colour is the variable. Because you know that 1st of a colour must be the load, 2nd ++ and 3rd store. So you get M=3 balls for each of N=3 colours. Maybe this representation would be better for a pure mathematician.
EDIT: Well apparently according to wikipedia on permutations of multisets my initial guess was right. Still I would check myself.

A reverse inference engine (find a random X for which foo(X) is true)

I am aware that languages like Prolog allow you to write things like the following:
mortal(X) :- man(X). % All men are mortal
man(socrates). % Socrates is a man
?- mortal(socrates). % Is Socrates mortal?
yes
What I want is something like this, but backwards. Suppose I have this:
mortal(X) :- man(X).
man(socrates).
man(plato).
man(aristotle).
I then ask it to give me a random X for which mortal(X) is true (thus it should give me one of 'socrates', 'plato', or 'aristotle' according to some random seed).
My questions are:
Does this sort of reverse inference have a name?
Are there any languages or libraries that support it?
EDIT
As somebody below pointed out, you can simply ask mortal(X) and it will return all X, from which you can simply pick a random one from the list. What if, however, that list would be very large, perhaps in the billions? Obviously in that case it wouldn't do to generate every possible result before picking one.
To see how this would be a practical problem, imagine a simple grammar that generated a random sentence of the form "adjective1 noun1 adverb transitive_verb adjective2 noun2". If the lists of adjectives, nouns, verbs, etc. are very large, you can see how the combinatorial explosion is a problem. If each list had 1000 words, you'd have 1000^6 possible sentences.
Instead of the deep-first search of Prolog, a randomized deep-first search strategy could be easyly implemented. All that is required is to randomize the program flow at choice points so that every time a disjunction is reached a random pole on the search tree (= prolog program) is selected instead of the first.
Though, note that this approach does not guarantees that all the solutions will be equally probable. To guarantee that, it is required to known in advance how many solutions will be generated by every pole to weight the randomization accordingly.
I've never used Prolog or anything similar, but judging by what Wikipedia says on the subject, asking
?- mortal(X).
should list everything for which mortal is true. After that, just pick one of the results.
So to answer your questions,
I'd go with "a query with a variable in it"
From what I can tell, Prolog itself should support it quite fine.
I dont think that you can calculate the nth solution directly but you can calculate the n first solutions (n randomly picked) and pick the last. Of course this would be problematic if n=10^(big_number)...
You could also do something like
mortal(ID,X) :- man(ID,X).
man(X):- random(1,4,ID), man(ID,X).
man(1,socrates).
man(2,plato).
man(3,aristotle).
but the problem is that if not every man was mortal, for example if only 1 out of 1000000 was mortal you would have to search a lot. It would be like searching for solutions for an equation by trying random numbers till you find one.
You could develop some sort of heuristic to find a solution close to the number but that may affect (negatively) the randomness.
I suspect that there is no way to do it more efficiently: you either have to calculate the set of solutions and pick one or pick one member of the superset of all solutions till you find one solution. But don't take my word for it xd

Resources