How to find k in contrasts, where comparing one group against multiple groups in a sample - statistics

I am having trouble understanding how to conduct a contrast that compares the mean of one group to the average of the means of several other groups.
My understanding so far is that, with g groups in total, the contrast is C = mu1 - (mu2 + mu3 + ... + mug)/(g - 1).
C must then be divided by its standard error to obtain the t-value.
I am having trouble obtaining the standard error. For example, when comparing one group in the sample to 3 other groups, treating the 3 other groups as a single group, C = mu1 - (mu2 + mu3 + mu4)/3.
I know that pooled standard deviation * SQRT(sum(k^2/n)) is one approach, but I don't know how to decide on k.
Any help would be greatly appreciated, thanks.
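For reference, the k in that formula are normally the contrast coefficients themselves, i.e. the weights that multiply each group mean in C. For mu1 against the average of three other groups they would be 1, -1/3, -1/3, -1/3. A minimal sketch, assuming equal group variances; the data values and the helper name contrast_t are made up for illustration:

```python
import math

def contrast_t(groups, weights):
    """t statistic for a linear contrast of group means, using the pooled SD."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    means = [sum(g) / len(g) for g in groups]
    ns = [len(g) for g in groups]
    # Pooled variance = within-group sum of squares / (N - number of groups).
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df = sum(ns) - len(groups)
    pooled_sd = math.sqrt(ss_within / df)
    c = sum(k * m for k, m in zip(weights, means))
    se = pooled_sd * math.sqrt(sum(k ** 2 / n for k, n in zip(weights, ns)))
    return c, se, c / se, df

# Hypothetical data: four groups of three observations each.
groups = [[10, 12, 14], [6, 8, 10], [7, 9, 11], [5, 7, 9]]
weights = [1, -1 / 3, -1 / 3, -1 / 3]  # the k values: group 1 vs. mean of groups 2-4
c, se, t, df = contrast_t(groups, weights)
print(c, se, t, df)
```

The t-value is compared against a t distribution with N - g degrees of freedom (df in the sketch).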

Related

How can I test this (interaction) pattern

Hi, I am having trouble working out which analysis is suitable for testing this expected pattern.
The idea is that in Condition 1 the difference between A and B is large, but the difference between C and IC is small. In Condition 2 the difference between C and IC should be large, but the difference between A and B small. Ideally, I would like to test this with a three-way ANOVA (2x2x2), but since the lines are parallel in both plots, it seems there would be no significant interaction. Does anyone have an idea? Thanks a lot in advance.
The proposed model seems to be something like:
RT ~ Condition * GroupAB * GroupCIC
Based on the plots provided, this should produce output with:
No meaningful 3-way interaction, and no meaningful 2-way interaction between GroupAB and GroupCIC, since the lines are parallel in both conditions.
A meaningful Condition:GroupCIC interaction, since the lines are further apart in Condition 2 than in Condition 1.
A meaningful Condition:GroupAB interaction, since the slopes of the lines differ between the two conditions.
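To see why these three points follow from parallel lines, each interaction can be written as a difference of differences of cell means. A minimal sketch with made-up cell means; the numbers are hypothetical, chosen only to mimic the described pattern, and are not the poster's data:

```python
from itertools import product

# Hypothetical cell means: additive within each condition, so the lines are parallel.
base = 500
ab_effect = {1: 100, 2: 20}   # B minus A, per condition
cic_effect = {1: 10, 2: 80}   # IC minus C, per condition

means = {
    (cond, ab, cic): base + ab_effect[cond] * (ab == "B") + cic_effect[cond] * (cic == "IC")
    for cond, ab, cic in product((1, 2), "AB", ("C", "IC"))
}

def ab_by_cic(cond):
    """GroupAB x GroupCIC interaction within one condition (difference of differences)."""
    return ((means[cond, "B", "C"] - means[cond, "A", "C"])
            - (means[cond, "B", "IC"] - means[cond, "A", "IC"]))

def simple_effect(cond, factor):
    """Effect of one factor in one condition, averaged over the other factor."""
    if factor == "AB":
        return sum(means[cond, "B", c] - means[cond, "A", c] for c in ("C", "IC")) / 2
    return sum(means[cond, g, "IC"] - means[cond, g, "C"] for g in "AB") / 2

three_way = ab_by_cic(1) - ab_by_cic(2)
cond_by_ab = simple_effect(1, "AB") - simple_effect(2, "AB")
cond_by_cic = simple_effect(1, "CIC") - simple_effect(2, "CIC")
print(three_way, cond_by_ab, cond_by_cic)
```

With parallel lines the GroupAB x GroupCIC and 3-way contrasts are zero, while both interactions with Condition are not.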

DEAP cooperative coevolution

I don't quite understand the example of cooperative coevolution described in the documentation for DEAP.
What is the target_set that appears when evaluating individual fitness?
Why is the line for updating fitness
ind.fitness.values = toolbox.evaluate([ind] + r, target_set)
rather than
ind.fitness.values = toolbox.evaluate([ind])
?
As I understand it, an individual from a given species can only be evaluated in the context of individuals from all the other species.
The individuals that will "help" in the evaluation of other species are the representatives.
In the first generation, no evaluations have been made so the representatives are chosen randomly. After the evaluation of a certain species, its representative is chosen as the fittest one.
To answer your question, I would implement the evaluation function such that it receives a list of individuals, each one from a different species and as they say "possibly some other arguments". Since the individual from the species being currently evaluated will always be in the first index of the list in [ind] + r, I don't see a clear reason to send the target_set variable as well (moreover, they did not set it in their code).
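A minimal sketch of that idea in plain Python, without DEAP. The joint fitness in evaluate, the bit-list individuals, and the helper name coevolve_step are all hypothetical and are not part of DEAP's API:

```python
import random

def evaluate(team):
    """Hypothetical joint fitness: total number of 1-bits across the whole team."""
    return sum(sum(ind) for ind in team)

def coevolve_step(species, representatives):
    """Score each individual together with the other species' representatives."""
    new_reps = []
    for i, pop in enumerate(species):
        others = representatives[:i] + representatives[i + 1:]
        # The individual under evaluation goes first, as in [ind] + r.
        scored = [(evaluate([ind] + others), ind) for ind in pop]
        # The fittest individual becomes the species' next representative.
        new_reps.append(max(scored, key=lambda pair: pair[0])[1])
    return new_reps

rng = random.Random(0)
species = [[[rng.randint(0, 1) for _ in range(5)] for _ in range(4)] for _ in range(3)]
# First generation: nothing has been evaluated yet, so representatives are random.
representatives = [rng.choice(pop) for pop in species]
representatives = coevolve_step(species, representatives)
print(representatives)
```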

Transportation problem to minimize the cost using genetic algorithm

I am new to genetic algorithms, and here is a simplified part of what I am working on.
There are factories (1, 2, 3), each of which can serve any of the customers (A, B, C); the transportation costs are given in the table below. There are also fixed costs for A, B, C (2, 4, 1).
   A  B  C
1  5  2  3
2  2  4  6
3  8  5  5
How can I solve this transportation problem to minimize the cost using a genetic algorithm?
First of all, you should understand what a genetic algorithm is and why it is called that: it imitates a population of simple organisms, using crossover and mutation to reach a better state.
So, you need to design your chromosome first. In your situation, pick one side to encode, customers or factories. Let's take customers. Your solution will look like
1 -> A
2 -> B
3 -> C
So, your example chromosome is "ABC". Then create another chromosome ("BCA" for example)
Now you need a fitness function, which you wish to minimize or maximize.
This function determines each chromosome's chance of breeding. In your situation, it is the total cost.
Write a function that calculates the cost for a given factory and a given customer.
Now, what you're going to do is:
Pick 2 chromosomes at random, weighted by the fitness function.
Pick a crossover index and create new chromosomes by swapping the parents' parts at that index.
If a new chromosome has invalid parts (such as "ABA" in your situation), make a fixing move (for example, change one of the "A"s to "C"). We call it a "mutation".
Add the new chromosome to the chromosome set if it is not there already.
Go back to the first step.
You'll do this for some number of iterations. You may end up with thousands of chromosomes. When you think "it's enough", stop the process and sort the chromosome set in ascending or descending order of fitness. The first chromosome will be your result.
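The loop above might be sketched like this in Python. It is only one possible reading of the steps: the cost table comes from the question, while the function names, the inverse-cost weighting, the repair rule, and the iteration count are assumptions. The fixed costs are left out on the assumption that each customer is served exactly once, so they would add the same constant to every chromosome.

```python
import random

# Transport costs from the question: COST[factory][customer].
COST = {1: {"A": 5, "B": 2, "C": 3},
        2: {"A": 2, "B": 4, "C": 6},
        3: {"A": 8, "B": 5, "C": 5}}
CUSTOMERS = "ABC"

def total_cost(chrom):
    """Position i of the chromosome is the customer served by factory i + 1."""
    return sum(COST[f][c] for f, c in enumerate(chrom, start=1))

def repair(chrom):
    """Replace repeated customers with the missing ones so the chromosome is valid."""
    missing = [c for c in CUSTOMERS if c not in chrom]
    seen, fixed = set(), []
    for c in chrom:
        if c in seen:
            c = missing.pop()
        seen.add(c)
        fixed.append(c)
    return "".join(fixed)

def run_ga(iterations=200, seed=0):
    rng = random.Random(seed)
    pool = ["ABC", "BCA"]  # initial chromosome set from the answer
    for _ in range(iterations):
        # Lower cost gives a higher chance of being picked as a parent.
        weights = [1 / total_cost(c) for c in pool]
        p1, p2 = rng.choices(pool, weights=weights, k=2)
        cut = rng.randrange(1, len(CUSTOMERS))
        for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
            child = repair(child)
            if child not in pool:
                pool.append(child)
    return min(pool, key=total_cost)

best = run_ga()
print(best, total_cost(best))
```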
I'm aware that this makes the result depend on the run time and on the initial chromosomes, and that you may not find an optimum (the fittest, in biological terms) chromosome if you do not run it long enough. But that is how a genetic algorithm works. Two runs may not even produce the same result, and that's fine.
In your situation the possible chromosome set is very small, so I guarantee that you will find an optimum in a second or two, because the entire chromosome set is ["ABC", "BCA", "CAB", "BAC", "CBA", "ACB"].
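With a set that small, the result can also be checked by brute force. A minimal sketch, using the cost table from the question and assuming that position i of the chromosome is the customer served by factory i + 1 (the fixed costs are left out because, under this encoding, they add the same constant to every chromosome):

```python
from itertools import permutations

# Transport costs from the question: COST[factory][customer].
COST = {1: {"A": 5, "B": 2, "C": 3},
        2: {"A": 2, "B": 4, "C": 6},
        3: {"A": 8, "B": 5, "C": 5}}

def total_cost(chrom):
    """Sum the cost of each factory serving the customer at its position."""
    return sum(COST[f][c] for f, c in enumerate(chrom, start=1))

# All six valid chromosomes, cheapest first.
ranked = sorted(("".join(p) for p in permutations("ABC")), key=total_cost)
for chrom in ranked:
    print(chrom, total_cost(chrom))
```

Under these assumptions the cheapest assignment is "BAC" (factory 1 serves B, factory 2 serves A, factory 3 serves C), with a transport cost of 9.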
In summary, you need 3 pieces of information to apply a genetic algorithm:
What should my chromosome look like? (And what is the initial chromosome set?)
What is my fitness function?
How do I perform crossover on my chromosomes?
There are some other things to keep in mind with this problem:
Without mutation, a genetic algorithm can get stuck in a local optimum. It can still be used for optimization problems with constraints.
Even if a chromosome has a very low chance of being picked for crossover, you shouldn't sort and truncate the chromosome set until the end of the iterations. Otherwise, you may get stuck at a local extremum or, worse, end up with an ordinary candidate solution instead of the global optimum.
To speed up the process, pick dissimilar initial chromosomes. Without a high enough mutation rate, finding the global optimum can be a real pain.
As mentioned in nejdetckenobi's answer, in this case the solution search space is very small, i.e. only 6 feasible solutions ["ABC", "BCA", "CAB", "BAC", "CBA", "ACB"]. I assume this is only a simplified version of your problem, and that your real problem contains more factories and customers (but with equal numbers of factories and customers). In that case, you can use specialised mutation and crossover operators to avoid infeasible solutions with repeated customers, e.g. ["ABA", "CCB", etc.].
For mutation, I suggest using a swap mutation, i.e. randomly pick two customers and swap their corresponding factories (positions):
ABC mutate to ACB
ABC mutate to CBA
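A swap mutation like the one in these two examples could look as follows. This is a minimal sketch; the function name swap_mutation and the optional rng parameter are my own choices:

```python
import random

def swap_mutation(chrom, rng=random):
    """Swap the customers at two distinct random positions.

    The result is always a permutation of the input, so no repair is needed.
    """
    i, j = rng.sample(range(len(chrom)), 2)
    genes = list(chrom)
    genes[i], genes[j] = genes[j], genes[i]
    return "".join(genes)

mutant = swap_mutation("ABC", random.Random(1))
print(mutant)
```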

constraint to avoid generating duplicates in this search task

I have to solve the following optimization problem:
Given a set of elements (E1,E2,E3,E4,E5,E6) create an arbitrary set of sequences e.g.
seq1:E1,E4,E3
seq2:E2
seq3:E6,E5
and given a function f that gives a value for every pair of elements e.g.
f(E1,E4) = 5
f(E4,E3) = 2
f(E6,E5) = 3
...
in addition it also gives a value for the pair of an element combined with some special element T, e.g.
f(T,E2) = 10
f(E2,T) = 3
f(E5,T) = 1
f(T,E6) = 2
f(T,E1) = 4
f(E3,T) = 2
...
The utility function that must be optimized is the following:
The utility of a sequence set is the sum of the utility of all sequences.
The utility of a sequence A1,A2,A3,...,AN is equal to
f(T,A1)+f(A1,A2)+f(A2,A3)+...+f(AN,T)
for our example set of sequences above this leads to
seq1: f(T,E1)+f(E1,E4)+f(E4,E3)+f(E3,T) = 4+5+2+2=13
seq2: f(T,E2)+f(E2,T) =10+3=13
seq3: f(T,E6)+f(E6,E5)+f(E5,T) =2+3+1=6
Utility(set) = 13+13+6=32
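For concreteness, the same calculation as a small Python sketch. The dictionary f holds only the example values listed above, and the function names are mine:

```python
T = "T"  # the special element that starts and ends every sequence

# Only the pair values given in the example.
f = {
    ("E1", "E4"): 5, ("E4", "E3"): 2, ("E6", "E5"): 3,
    (T, "E2"): 10, ("E2", T): 3, ("E5", T): 1,
    (T, "E6"): 2, (T, "E1"): 4, ("E3", T): 2,
}

def sequence_utility(seq):
    """f(T,A1) + f(A1,A2) + ... + f(AN,T) for one sequence."""
    path = [T, *seq, T]
    return sum(f[a, b] for a, b in zip(path, path[1:]))

def set_utility(seqs):
    """Utility of a set of sequences: the sum over its sequences."""
    return sum(sequence_utility(s) for s in seqs)

example = [["E1", "E4", "E3"], ["E2"], ["E6", "E5"]]
print(set_utility(example))  # 13 + 13 + 6 = 32
```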
I am trying to solve a larger version of this problem (around 1000 elements rather than 6) using A* with a heuristic, starting from zero sequences and adding elements step by step, either to existing sequences or as a new sequence, until we obtain a set of sequences containing all elements.
The problem I run into is that, while generating possible solutions, I end up with duplicates. For the example above, all of the following combinations are generated:
seq1:E1,E4,E3
seq2:E2
seq3:E6,E5
+
seq1:E1,E4,E3
seq2:E6,E5
seq3:E2
+
seq1:E2
seq2:E1,E4,E3
seq3:E6,E5
+
seq1:E2
seq2:E6,E5
seq3:E1,E4,E3
+
seq1:E6,E5
seq2:E2
seq3:E1,E4,E3
+
seq1:E6,E5
seq2:E1,E4,E3
seq3:E2
which all have equal utility, since the order of the sequences does not matter.
These are all permutations of the 3 sequences. Since the number of sequences is arbitrary, there can be as many sequences as elements, and a factorial (!) number of duplicates is generated.
One way to solve such a problem is to keep track of already visited states and not revisit them. However, storing all visited states requires a huge amount of memory, and comparing two states can be quite an expensive operation, so I was wondering whether there is a way to avoid generating the duplicates in the first place.
THE QUESTION:
Is there a way to construct all these sequences step by step, constraining how elements are added so that only combinations of sequences are generated rather than all variations of sequences (or so that the number of duplicates is at least limited)?
As an example, the only way I have found to limit the number of 'duplicates' generated is to require that an element Ei always be in a seqj with j<=i. With two elements E1, E2, only
seq1:E1
seq2:E2
would be generated, and not
seq1:E2
seq2:E1
I was wondering whether there was any such constraint that would prevent duplicates from being generated at all, without failing to generate all combinations of sets.
Well, it is simple. Only allow sets whose sequences are sorted by their first member; that is, from the example above, only
seq1:E1,E4,E3
seq2:E2
seq3:E6,E5
would be allowed. This is very easy to enforce: never add a new sequence whose first member is less than the first member of its predecessor.
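A minimal sketch of this rule as a brute-force enumerator. It is not the A* search from the question, only a demonstration that the constraint yields each set of sequences exactly once. Elements are represented by their integer indices (1 for E1, and so on) so that "less than" is well defined; the function name is hypothetical:

```python
from itertools import permutations

def sequence_sets(elements, prev_first=None):
    """Yield every set of sequences once, with sequences ordered by first member."""
    if not elements:
        yield []
        return
    for size in range(1, len(elements) + 1):
        for seq in permutations(elements, size):
            if prev_first is not None and seq[0] <= prev_first:
                continue  # this ordering is a duplicate of a set generated elsewhere
            rest = [e for e in elements if e not in seq]
            for tail in sequence_sets(rest, seq[0]):
                yield [list(seq)] + tail

all_sets = list(sequence_sets([1, 2, 3]))
print(len(all_sets))
```

For three elements this produces 13 sets; dropping the check on the first member would produce 24 ordered variations of the same 13 sets.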

Matching Based on Arbitrary Categories and Similarity Measures

I have a database of customers, each of whom has certain attributes and a customer type. The collection of attributes can vary (although the attributes do come from a finite set), and when I look at a new customer of unknown type with given attributes, I would like to determine which type s/he belongs to. For example, say I already have these customers in the DB:
Customer | Type | Attributes
1        | A    | 44,32,5,'X'
2        | A    | 3,32,66,'A'
3        | B    | 6,32,'A','B'
4        | C    | 47,31,2,'H'
5        | C    | 14,32,2,'O'
6        | C    | 2,'C'
7        | A    | 44
When I receive a new customer who has attributes, for example, 3,32,2, I would like to determine which type this customer belongs to, and the code should report its confidence (as percentage) of this match.
What is the best method to use here? Something statistical, a method based on an affinity matrix of some kind, or a recommendation-engine-style approach based on Pearson correlation coefficients? Sample pseudocode would be most welcome, but any and all ideas are fine.
Thanks,
The way to solve this problem is using Naive Bayes.
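A minimal sketch of what that could look like on the example table, in plain Python. Treating each attribute as an independent token, using add-one (Laplace) smoothing, and the function names train and classify are all assumptions of this sketch, not something given in the question:

```python
import math
from collections import Counter, defaultdict

# (type, attributes) rows from the question's table.
customers = [
    ("A", [44, 32, 5, "X"]),
    ("A", [3, 32, 66, "A"]),
    ("B", [6, 32, "A", "B"]),
    ("C", [47, 31, 2, "H"]),
    ("C", [14, 32, 2, "O"]),
    ("C", [2, "C"]),
    ("A", [44]),
]

def train(rows):
    """Count how often each type and each (type, attribute) pair occurs."""
    type_counts = Counter(t for t, _ in rows)
    attr_counts = defaultdict(Counter)
    for t, attrs in rows:
        attr_counts[t].update(attrs)
    vocab = {a for _, attrs in rows for a in attrs}
    return type_counts, attr_counts, vocab

def classify(attrs, model):
    """Return the posterior probability of each type given the attributes."""
    type_counts, attr_counts, vocab = model
    n = sum(type_counts.values())
    log_scores = {}
    for t, count in type_counts.items():
        total = sum(attr_counts[t].values())
        score = math.log(count / n)  # prior P(type)
        for a in attrs:
            # Add-one smoothing so an unseen attribute does not zero the score.
            score += math.log((attr_counts[t][a] + 1) / (total + len(vocab)))
        log_scores[t] = score
    # Normalise the scores so they sum to 1.
    top = max(log_scores.values())
    unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

posterior = classify([3, 32, 2], train(customers))
best = max(posterior, key=posterior.get)
print(best, f"{posterior[best]:.0%}")
```

The normalised posterior can be reported as the confidence percentage, although with this few customers it should be read as a rough ranking rather than a calibrated probability.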
