Balanced Incomplete Latin Square Design - randomized-algorithm

Can anyone help me create a Balanced Incomplete Latin Square Design (also called a Youden square design), where the rows and columns are balanced?
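One possible starting point, offered as a minimal sketch rather than a general solution: a Youden square can be built by cyclically developing a difference set. The function name `cyclic_youden_square` and the choice of the difference set {0, 1, 3} modulo 7 are assumptions made for illustration; they give a 3 x 7 design in which every row contains each of the 7 treatments once and every pair of treatments shares exactly one column.

```python
from itertools import combinations

def cyclic_youden_square(difference_set, v):
    """Build a k x v array whose entry (i, j) is (d_i + j) mod v.

    If difference_set is a (v, k, lambda) difference set modulo v, each row
    is a permutation of the v treatments and the columns form a balanced
    incomplete block design.
    """
    return [[(d + j) % v for j in range(v)] for d in difference_set]

def pair_counts(square, v):
    """Count, for every pair of treatments, how many columns contain both."""
    columns = [set(col) for col in zip(*square)]
    return {
        pair: sum(1 for col in columns if pair[0] in col and pair[1] in col)
        for pair in combinations(range(v), 2)
    }

# Illustrative parameters: 7 treatments, 3 rows, difference set {0, 1, 3}.
square = cyclic_youden_square([0, 1, 3], 7)
counts = pair_counts(square, 7)
```

To randomize such a design, one would typically shuffle the rows, the columns, and the treatment labels independently; none of these permutations breaks the balance.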


Finding the relationship between two variables for RE Investment Analysis

I've been struggling with how to attack this problem for the better part of a week. I'll give some quick background to the situation. Basically, I'm trying to figure out a formula to find the average value of below-ground square footage (basement) and above-ground square footage independently within specified areas. This way, I can divide the two averages to determine a ratio of below-ground sf value to above-ground sf value. This will help me make certain adjustments for a few different real estate investment analytics. It is commonly known that a square foot in a basement is not as valuable as a square foot above the ground. The national average is roughly half, meaning a square foot in the basement is worth about half as much as a square foot on the main floor or upstairs.
If I have a spreadsheet with columns for the sold price of every home in an area, the above-ground square footage, and the below-ground square footage, is there a way to properly isolate above-ground square footage from below-ground square footage to figure out, on average, what a square foot in the basement is worth relative to a square foot on the main floor/upstairs?
I've tried a lot of different approaches. I thought I solved it a few different times until realizing upon testing that I was finding solutions for different things... not what I wanted. I tried creating a system of linear equations... but realized quickly that there was no solution that way. Then I also tried to run regressions... but in all honesty, some of that went over my head. I know there has to be a way to figure this out, but I'm looking for any assistance that I can at this point. Any suggestions or resources would be much appreciated, thanks!
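For readers in the same situation, here is a minimal sketch of the regression idea mentioned above, assuming a linear model price = b0 + b_above * above_sf + b_below * below_sf fitted by ordinary least squares. The function name `basement_value_ratio` and the sample homes are hypothetical, and the data is synthetic (generated without noise from known coefficients) purely to show the mechanics. The ratio b_below / b_above is then the estimated value of a basement square foot relative to an above-ground one.

```python
def basement_value_ratio(prices, above_sf, below_sf):
    """Ordinary least squares with an intercept and two predictors.

    Returns (b_above, b_below, b_below / b_above). The two slopes are
    obtained from the centered 2x2 normal equations via Cramer's rule.
    """
    n = len(prices)
    mean_p = sum(prices) / n
    mean_a = sum(above_sf) / n
    mean_b = sum(below_sf) / n
    a = [x - mean_a for x in above_sf]
    b = [x - mean_b for x in below_sf]
    p = [x - mean_p for x in prices]
    s_aa = sum(x * x for x in a)
    s_bb = sum(x * x for x in b)
    s_ab = sum(x * y for x, y in zip(a, b))
    s_ap = sum(x * y for x, y in zip(a, p))
    s_bp = sum(x * y for x, y in zip(b, p))
    det = s_aa * s_bb - s_ab * s_ab  # zero if the two columns are collinear
    b_above = (s_ap * s_bb - s_bp * s_ab) / det
    b_below = (s_bp * s_aa - s_ap * s_ab) / det
    return b_above, b_below, b_below / b_above

# Synthetic, noise-free illustration: $150 per above-ground sf, $75 per basement sf.
homes = [(1200, 0), (1500, 600), (2000, 800), (1800, 0), (2400, 1000), (1000, 500)]
above = [h[0] for h in homes]
below = [h[1] for h in homes]
prices = [50_000 + 150 * a + 75 * b for a, b in homes]
b_above, b_below, ratio = basement_value_ratio(prices, above, below)
```

With real sales data the estimates will be noisy, and other drivers of price (lot size, age, location) would normally be added as further columns before trusting the ratio.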

fast calculation of the intersection area of a triangle and the unit square

In my current project I need to calculate the intersection area of triangles and the unit squares in an infinite grid.
For every triangle (given by three pairs of floating point numbers) I need to know the area (in the interval (0,1]) it has in common with every square it intersects.
Right now I convert both (the triangle and the square) to polygons and use Sutherland-Hodgman polygon clipping to calculate the intersection polygon, which I then use to calculate its area.
This approach has turned out to be a performance bottleneck in my application. I guess a more specialized (analytical) algorithm would be much faster. Is there a standard solution for this problem, or do you have any ideas? I only need the areas, not the shapes of the intersections.
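For reference, the baseline described in the question can be sketched as follows. This is an illustrative reimplementation, not the asker's code: the names `clip_axis`, `shoelace_area`, and `triangle_cell_area` are hypothetical, and the clipper is specialized to axis-aligned cell boundaries, which already removes the general line-intersection work of a generic Sutherland-Hodgman routine.

```python
def clip_axis(poly, axis, bound, keep_greater):
    """One Sutherland-Hodgman pass against an axis-aligned half-plane.

    axis is 0 for x or 1 for y; points with coordinate >= bound are kept
    when keep_greater is True, points with coordinate <= bound otherwise.
    """
    out = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        p_in = p[axis] >= bound if keep_greater else p[axis] <= bound
        q_in = q[axis] >= bound if keep_greater else q[axis] <= bound
        if p_in:
            out.append(p)
        if p_in != q_in:  # the edge crosses the boundary line
            t = (bound - p[axis]) / (q[axis] - p[axis])
            pt = [p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])]
            pt[axis] = bound  # snap exactly onto the grid line
            out.append(tuple(pt))
    return out

def shoelace_area(poly):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def triangle_cell_area(tri, ix, iy):
    """Area shared by triangle tri and the unit cell [ix, ix+1] x [iy, iy+1]."""
    poly = list(tri)
    for axis, bound, keep_greater in (
        (0, ix, True), (0, ix + 1, False), (1, iy, True), (1, iy + 1, False)
    ):
        poly = clip_axis(poly, axis, bound, keep_greater)
        if not poly:
            return 0.0
    return shoelace_area(poly)

tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
areas = {cell: triangle_cell_area(tri, *cell) for cell in [(0, 0), (1, 0), (0, 1), (1, 1)]}
```

A quick sanity check for any faster replacement is that the per-cell areas of a triangle sum to the triangle's own area.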
Your polygons are convex. There are algorithms for convex polygons that are faster than the general ones. I've used O'Rourke's algorithm with success (code from his book here; I believe a good description exists). Note that some values may be precomputed for your squares.
If your polygons do not always intersect, you can first check whether they intersect at all using the separating axis method.
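A minimal sketch of such a separating-axis pre-check for a triangle against a grid cell is shown below. The function name `triangle_intersects_cell` is hypothetical. The candidate axes are the two grid directions plus the three triangle edge normals, and shapes that merely touch are reported as intersecting, so a True result can still correspond to zero shared area.

```python
def triangle_intersects_cell(tri, ix, iy):
    """Separating axis test: triangle vs. the unit cell with lower-left corner (ix, iy)."""
    square = [(ix, iy), (ix + 1, iy), (ix + 1, iy + 1), (ix, iy + 1)]
    # The square contributes the x and y axes; each triangle edge contributes its normal.
    axes = [(1.0, 0.0), (0.0, 1.0)]
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        axes.append((y0 - y1, x1 - x0))
    for ax, ay in axes:
        tri_proj = [ax * x + ay * y for x, y in tri]
        sq_proj = [ax * x + ay * y for x, y in square]
        # A gap between the two projected intervals proves the shapes are disjoint.
        if max(tri_proj) < min(sq_proj) or max(sq_proj) < min(tri_proj):
            return False
    return True

tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
hits = [triangle_intersects_cell(tri, ix, iy) for ix, iy in [(0, 0), (1, 1), (2, 2), (5, 0)]]
```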
Another option to try is the Liang-Barsky algorithm, clipping every triangle edge against the square.
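As a sketch of what that clipping step looks like, here is a compact Liang-Barsky routine for one segment against an axis-aligned rectangle. The function name `liang_barsky` and its argument order are assumptions for illustration. It returns the clipped segment, or None when the segment misses the rectangle.

```python
def liang_barsky(p0, p1, xmin, ymin, xmax, ymax):
    """Clip segment p0 -> p1 to the rectangle [xmin, xmax] x [ymin, ymax]."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parametric range of the segment still inside
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this boundary and outside it
            continue
        r = q / p
        if p < 0:  # entering through this boundary
            if r > t1:
                return None
            t0 = max(t0, r)
        else:  # leaving through this boundary
            if r < t0:
                return None
            t1 = min(t1, r)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)

inside = liang_barsky((-1.0, 0.5), (2.0, 0.5), 0.0, 0.0, 1.0, 1.0)
missed = liang_barsky((2.0, 2.0), (3.0, 3.0), 0.0, 0.0, 1.0, 1.0)
```

Clipping the three edges gives only the part of the triangle's boundary inside the square. To get an area, the square's corners that lie inside the triangle still have to be added to the resulting polygon.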
Edit: You can quickly find all intersections of the triangle edges with the grid using the algorithm of Amanatides and Woo (example in the grid traversal section here).
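The traversal idea can be sketched as below, in the style of Amanatides and Woo but restricted to a 2D grid of unit cells. The function name `cells_on_segment` is hypothetical, and the fixed step count (the number of grid lines crossed) is an added safeguard against floating-point drift rather than part of the published algorithm.

```python
import math

def cells_on_segment(p0, p1):
    """Return the unit grid cells (ix, iy) visited by segment p0 -> p1, in order."""
    (x0, y0), (x1, y1) = p0, p1
    ix, iy = math.floor(x0), math.floor(y0)
    end_x, end_y = math.floor(x1), math.floor(y1)
    dx, dy = x1 - x0, y1 - y0
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Parameter t at which the segment first crosses a vertical / horizontal grid line.
    t_max_x = ((ix + (step_x > 0)) - x0) / dx if dx != 0 else math.inf
    t_max_y = ((iy + (step_y > 0)) - y0) / dy if dy != 0 else math.inf
    # Increase in t between two consecutive vertical / horizontal grid lines.
    t_delta_x = abs(1.0 / dx) if dx != 0 else math.inf
    t_delta_y = abs(1.0 / dy) if dy != 0 else math.inf
    cells = [(ix, iy)]
    # Exactly one step per grid line between the start cell and the end cell.
    for _ in range(abs(end_x - ix) + abs(end_y - iy)):
        if iy == end_y or (ix != end_x and t_max_x < t_max_y):
            ix += step_x
            t_max_x += t_delta_x
        else:
            iy += step_y
            t_max_y += t_delta_y
        cells.append((ix, iy))
    return cells

path = cells_on_segment((0.5, 0.5), (2.5, 1.5))
```

The values of `t_max_x` and `t_max_y` at each step are also the parameters of the crossing points, so the intersection coordinates come almost for free.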
To handle this task with high performance, I suggest a modification of Vatti line sweep clipping.
Starting from the vertex of your triangle with the minimal Y, perform these steps:
1. Sort the vertices by Y coordinate.
2. Step Y up to MIN(nextVertex.Y, nextGridBottom).
3. Calculate the points of intersection of the grid with the edges.
4. Collect the current trapezoid.
5. Repeat from step 2 until you reach the vertex with the highest Y coordinate.
6. Split the trapezoids by X coordinate if required.
Here is an example of trapezoidalization in the X direction.
It illustrates the main idea of the line sweep algorithm. Good luck.
You do not mention what precision you are looking for. If you need an analytical method, disregard this answer, but if you just want to do antialiasing, I suggest the scanline edge-flag algorithm by Kiia Kallio. I have used it a few times; it is quite fast and can be set up for very high precision. I have a Java implementation if you are interested.
You can take advantage of the regular pattern of squares.
I'm assuming the reason this is a bottleneck is because you have to wait while your algorithm finds all squares intersecting any of the triangles and computes all the areas of intersection. So we'll compute all the areas, but in batches for each triangle in order to get the most information from the fewest calculations.
First, as explained by others, for each edge of the triangle, you can find the sequence of squares that edge passes through, as well as the points at which it crosses each vertical or horizontal edge of a square.
Do this for all three sides, keeping a list of all the squares you encounter, but keep only one copy of each square. It may be useful to store the squares in multiple lists, so that all squares on a given row are all kept in the same list.
When you've found all squares the triangle's edges pass through, if two of those squares were on the same row, any squares between those two that are not in the list are completely inside the triangle, so 100% of each of those squares is covered.
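The row-filling observation can be sketched as follows, assuming the set of squares crossed by the triangle's edges has already been collected as integer `(ix, iy)` pairs. The function name `interior_cells` is hypothetical. The sketch relies on the triangle being convex, so that each grid row meets it in a single interval.

```python
from collections import defaultdict

def interior_cells(boundary_cells):
    """Cells lying strictly between boundary cells on the same row.

    boundary_cells: iterable of (ix, iy) cells that the triangle's edges pass
    through. Every returned cell is fully covered by the triangle (area 1).
    """
    rows = defaultdict(set)
    for ix, iy in boundary_cells:
        rows[iy].add(ix)
    inside = []
    for iy, xs in rows.items():
        for ix in range(min(xs) + 1, max(xs)):
            if ix not in xs:
                inside.append((ix, iy))
    return sorted(inside)

# Hand-made boundary set used only to show the call.
covered = interior_cells({(0, 0), (3, 0), (0, 1), (1, 1)})
```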
For the other squares, the calculation of area can depend on how many vertices of the triangle are in the square (0, 1, 2, or 3) and where the edges of the triangle intersect the sides of the square. You can summarize all the cases in a few pencil-and-paper drawings, and come up with calculations for each one. For example, when an edge of the triangle crosses two sides of the square, with one corner of the square on the "outside" side of the edge, that corner is one angle of a small triangle "cut off" by that edge of the larger triangle; use the points of intersection on the square's sides to compute the area of the small triangle and deduct it from the area of the square. If two corners instead of one are "outside", you have a trapezoid whose two base lengths are found from the points of intersection, and whose height is the width of the square; deduct its area from the square. If three corners are outside, deduct the entire area of the square and then add the area of the small triangle.
One vertex of the large triangle inside the square, three corners of the square outside that angle: draw a line from the remaining corner to the triangle's vertex, so you have two small triangles, deduct the entire square and add those triangles' areas. Two corners of the square outside the angle, draw lines to the vertex to get three small triangles, etc.
I'm phrasing this so that you always assume you start with the entire area of the square and reduce the area by some amount depending on how the edge of the triangle intersects the square. That way, in the case where the edges of the triangle intersect the square more than twice (such as when one edge cuts across one corner of the square and another edge cuts across a different corner), you can just deduct the area cut off by the first edge, then deduct the area cut off by the second edge.
This will be a considerable number of special cases, though you can take advantage of symmetry; for example, you don't have to write the complete calculation for "cut off a triangle in one corner" four times.
You'll write a lot more code than if you just took someone's convex-polygon library off the shelf, and you will want to test the living daylights out of it to make sure you didn't forget to code any cases, but once you get it working, it shouldn't take much more effort to make it reasonably fast.

Is there a standard metric for sorted text?

Given a range of numbers, say from [80,240], it is easy to determine how much of that range lies within [100,105]: (105-100)/(240-80) = 5/160 = .03125. Easy.
So now, how much of a Merriam-Webster dictionary lies between umbrella and velvet? Even if we assume uniform distribution of text across the corpus, is there a standard metric for text?
I don't think there is a standard for that. If you had all entries from Merriam-Webster in a sorted array, you could use the first and last positions as the bounds, so you would have a set going from 1 to n. Then you could take the positions of "umbrella" and "velvet", call them x and y, and calculate your fraction as (y - x + 1) / n.
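The position-based fraction can be sketched as below, assuming the entries are available as a sorted Python list. The function name `fraction_between` and the ten-word list are made up for illustration and are not dictionary data. Binary search keeps each position lookup at O(log n).

```python
from bisect import bisect_left

def fraction_between(sorted_words, first, last):
    """Share of the word list lying between first and last, both inclusive.

    Implements (y - x + 1) / n, where x and y are the 1-based positions of
    first and last in sorted_words.
    """
    i = bisect_left(sorted_words, first)
    j = bisect_left(sorted_words, last)
    n = len(sorted_words)
    if i >= n or sorted_words[i] != first or j >= n or sorted_words[j] != last:
        raise ValueError("both words must be present in the list")
    x, y = i + 1, j + 1  # convert to 1-based positions
    return (y - x + 1) / n

# Toy stand-in for the dictionary's headword list.
words = ["apple", "banana", "cherry", "tulip", "umbrella",
         "under", "vase", "velvet", "zebra", "zoo"]
share = fraction_between(words, "umbrella", "velvet")
```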
That works if you treat words as elements of an ordered set, so that they behave like real numbers. You are basically dividing the distance between two numbers in a set by the distance between the boundaries of the set. Some forms of algebra deal with them differently: when calculating the Levenshtein distance between any two given words, for example, each word is seen as a vector with as many dimensions as it has characters.
You could define the boundaries of your n-dimensional space by using the longest word in Merriam-Webster (hint: it's "pneumonoultramicroscopicsilicovolcanoconiosis", so your space would have 45 dimensions). However, when considering any A-B pair of words, a third word C of intermediate length may or may not be between those, depending on the operations involved in the transformation from A to B.
You'd have to check every word with a length between that of A and B to see whether it is part of the range between A and B. So it's not a matter of simple calculation, and I don't know if this would even be feasible with a regular computer nowadays. And that's just considering Merriam-Webster's close to half a million entries.

Find contour of 2D unorganized pointcloud

I have a set of 2D points, unorganized, and I want to find the "contour" of this set (not the convex hull). I can't use alpha shapes because I have a speed objective (less than 10ms on an average computer).
My first approach was to compute a grid and find the outline squares (squares which have an empty square as a neighbor). I think this efficiently reduced my number of points (from 22000 to roughly 3000). But I still need to refine this new set.
My question is: how do I find the real outline points among my green points?
After a weekend full of reflection, I may have found a convenient solution.
So we need a grid, we need to fill it with our points, no difficulty here.
We have to decide which squares are considered "Contour". Our criterion is: at least one empty neighbor and at least 3 non-empty neighbors.
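The criterion above can be sketched as follows. The function name `contour_cells` is hypothetical, and the use of the 8-cell neighborhood is an assumption, since the post does not say whether diagonal neighbors count.

```python
import math

def contour_cells(points, cell_size):
    """Grid cells with at least one empty and at least three occupied neighbors.

    Assumes the 8-neighborhood (diagonals included).
    """
    occupied = {(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points}
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    contour = set()
    for cx, cy in occupied:
        filled = sum((cx + dx, cy + dy) in occupied for dx, dy in offsets)
        # fewer than 8 filled neighbors means at least one neighbor is empty
        if 3 <= filled < 8:
            contour.add((cx, cy))
    return contour

# One point at the center of each cell of a 5 x 5 block.
block = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
ring = contour_cells(block, 1.0)
```

For this 5 x 5 block the result is the outer ring of 16 cells: the corners have exactly 3 occupied neighbors, the other border cells have 5, and the 9 inner cells are rejected because all 8 of their neighbors are occupied.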
We lack connectivity information, so we choose a "Contour" square which has 2 "Contour" neighbors or fewer, and then pick one of its neighbors. From that, we can start the expansion: we just circle around the current square to find the next "Contour" square, knowing the previous "Contour" squares. Our contour criterion prevents us from reaching a dead end.
We now have vectors of connected squares, and normally, if our shape doesn't have a hole, only one vector of connected squares!
Now for each square, we need to find the best point for the contour. We select the one which is farthest from the barycenter of our plane. It works for most shapes. Another technique is to compute the barycenter of the empty neighbors of the selected square and choose the point nearest to it.
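A minimal sketch of the first selection rule follows, under the assumption that "the barycenter of our plane" means the mean of all input points; the post does not define it more precisely. The function name `farthest_point_per_cell` is hypothetical.

```python
import math

def farthest_point_per_cell(points, cell_size, cells):
    """For each cell in cells, keep the point farthest from the global barycenter.

    The barycenter is taken to be the mean of all points (an assumption).
    Returns a dict mapping (ix, iy) to the chosen (x, y) point.
    """
    bx = sum(x for x, _ in points) / len(points)
    by = sum(y for _, y in points) / len(points)
    best = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in cells:
            continue
        dist2 = (x - bx) ** 2 + (y - by) ** 2  # squared distance is enough to compare
        if cell not in best or dist2 > best[cell][0]:
            best[cell] = (dist2, (x, y))
    return {cell: point for cell, (_, point) in best.items()}

pts = [(0.1, 0.1), (0.4, 0.4), (1.5, 1.5), (2.6, 2.6), (2.9, 2.9)]
chosen = farthest_point_per_cell(pts, 1.0, {(0, 0), (2, 2)})
```

This is a single pass over the points, which fits the stated time budget better than any per-cell sorting would.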
The red points are the contour of the green ones. The technique used is the plane barycenter one.
For a set of 28000 points, this technique takes 8 ms. CGAL's alpha shapes would take an average of 125 ms for 28000 points.
PS: I hope I made myself clear, English is not my mother tongue :s
You really should use alpha shapes. Maybe use only the green points as input to the alpha shape algorithm.

How to calculate probabilities from confusion matrices? need denominator, chars matrices

This paper contains confusion matrices for spelling errors in a noisy channel. It describes how to correct the errors based on conditional properties.
The conditional probability computation is on page 2, left column. In footnote 4, page 2, left column, the authors say: "The chars matrices can be easily replicated, and are therefore omitted from the appendix." I cannot figure out how they can be replicated!
How do I replicate them? Do I need the original corpus? Or did the authors mean they could be recomputed from the material in the paper itself?
Looking at the paper, you just need to calculate them using a corpus, either the same one or one relevant to your application.
In replicating the matrices, note that they implicitly define two different chars matrices: a vector and an n-by-n matrix. For each character x, the vector chars contains a count of the number of times the character x occurred in the corpus. For each character sequence xy, the matrix chars contains a count of the number of times that sequence occurred in the corpus.
chars[x] represents a look-up of x in the vector; chars[x,y] represents a look-up of the sequence xy in the matrix. Note that chars[x] = the sum over chars[x,y] for each value of y.
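Counting both structures from a corpus can be sketched as below. The function name `char_counts` and the toy string are illustrative only, and the paper's exact tokenization and alphabet handling are not reproduced here. Note that the identity chars[x] = sum over y of chars[x,y] holds for every character except the very last one in the corpus, which has no successor.

```python
from collections import Counter

def char_counts(corpus):
    """Return (chars_vector, chars_matrix) as Counters.

    chars_vector[x] counts occurrences of character x;
    chars_matrix[(x, y)] counts occurrences of the sequence xy.
    """
    chars_vector = Counter(corpus)
    chars_matrix = Counter(zip(corpus, corpus[1:]))
    return chars_vector, chars_matrix

def row_sum(chars_matrix, x):
    """Sum of chars[x, y] over all y."""
    return sum(count for (first, _), count in chars_matrix.items() if first == x)

vector, matrix = char_counts("banana")
```

For "banana", `vector["n"]` and `row_sum(matrix, "n")` are both 2, while `vector["a"]` is 3 and `row_sum(matrix, "a")` is 2 because the final "a" is not followed by anything.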
Note that their counts are all based on the 1988 AP Newswire corpus (available from the LDC). If you can't use their exact corpus, I don't think it would be unreasonable to use another text from the same genre (i.e. another newswire corpus) and scale your counts such that they fit the original data. That is, the frequency of a given character shouldn't vary too much from one text to another if they're similar enough, so if you've got a corpus of 22 million words of newswire, you could count characters in that text and then double them to approximate their original counts.
