Hamming distance [closed] - hamming-distance

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
My work is in genetics and I'm using the Hamming distance (in Matlab) to calculate the genetic distance between genotypes of a virus.
For example: Type 1 has structure 01234 and Type 2 has structure 21304 etc. Obviously there are many genotypes present. Because the genotypes have the same length, I thought using the Hamming distance would be fine.
My question is this: How can I order the genotypes based on the Hamming distance. Another way of putting this: how can I sort the genotypes into clusters based on the Hamming distance between them?
Thanks

You can use severel methodes to cluster such data.
Based on the distance matrix you can use UPGMA or neighbor joining
Single linkage or complete linkage are also distance based cluster methodes.

Related

Suggestions for question answering system NLP [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I am trying to build a question answering system where I have a set of predefined questions and their answers. For any given question from the user I have to find if the similar question already exists in the predefined questions and send answers. If it doesn't exist it has to reply a generic response. Any ideas on how to implement this using NLP would be really helpful.
Thanks in advance!!
As you have already mentioned in the question, this calls for a solution that computes text similarity. In this case question-question similarity. You have got a bunch of questions and for an incoming query/question, a similarity score has to be computed with every available question in hand. From a previous answer of mine, to do simple sentence similarity,
Convert the sentences into as suitable representation
Compute some of distance metric between the two representations and figure out the closest match
To achieve 1, you can consider converting every word in a sentence to corresponding vectors. There are libraries/algorithms like fasttext that provide vector mapping. A vector representation of the entire sentence is obtained by taking an average over all word vectors. Use cosine similarity to compute a score between the query and each question in the available list.

Seating arrangement problem in a circular table [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 months ago.
Improve this question
N people sit around a circular table. You have to find the probability that two particular people won't be sitting together.
The input will have the number N and the output should have the probability printed as a float type number rounded off to four decimal places.
Here's the link for the derived formula
You can find the step by step derivation over there
Here's the simple python implementation as per the thread
n = 5
result = (n-3)/(n-1)
print(result)
n= int(input())
import math
print(round(1-math.factorial(n-2)*math.factorial(2)/math.factorial(n-1),4))

Graphic to check for complete separation [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I need to check for complete separation. I am using SPSS and need to know what steps I have to take to get the grahpic on this site. Can someone help me?
SPSS does not provide that probability curve (SAS and Stata can do that). However, plotting the 1/0 outcome against the continuous predictor, and observe how the two horizontal data lines overlap may be enough to give you some hint.
If you have enough data, you can also first separate your data by different groups (for example, 10 equal groups split by your continuous predictors), and the compute each group's mean (aka probability of "yes" to outcome), and join the points. That line should approximate the curve in the illustration you provide.

Determining the maximum distance of two points in a list [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I have a list of points (x/y each in one column) and need to determine the maximum distance of any combination of pairs of points.
I'm only interested in the distance not the pair of points itself.
Right now I use a rough upper boundary estimation by calculating the length of the vector
(max(x)-min(x), max(y)-min(y))
You could try using CTRL+SHIFT+ENTER:
=MAX((x-TRANSPOSE(x))^2+(y-TRANSPOSE(y))^2)^0.5
Also see: http://newtonexcelbach.wordpress.com/2010/11/27/maximum-distance-between-two-points/

What does taking the logarithm of a variable mean? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
Question with regards to taking the logarithm of a variable (Statistics Question)
Say you have a bar graph displaying data for an example "Cost of Computer Orders by the Population" and you are trying to analyze the data and find a distribution. The information does not indicate anything so you take the logarithm of the variable and the graph then resembles a normal distribution. I know that the normal distribution basically means the mean, but what does taking the logarithm of the information indicate?
It seems that you are describing the lognormal distribution: a random variable is said to come from a lognormal distribution if its logarithm is distributed normally.
In practice, this can describe processes where the value cannot go below zero, and most of the population is close to the left (right skewness). For example: salaries, home prices, bone fractures, number of girlfriends all could be reasonably modeled with a log normal distribution.
For example: say that on average young adults have had 2.5 girlfriends. A few have never had one; you cannot have "negative number" of girlfriends, and a few bastards have had 25. However, most young adults will have had between, say, one and three.
if you display the values of x as their log(x) then the line in the diagramm is a straight line, when the values grow exponential. This is a stastistically trick for a fast check if values grow exponentionally.

Resources