Bipartite Graph (undirected) - colors

I am taking an input, e.g.
4
1 3
1 2
2 4
First line is number of nodes, lines after that are the edges. I have to attempt to color the graph, and if I can't, I need to list a cycle in the graph that is causing the error.
This is fine so far, except one of the graphs contains 1,000,000 nodes. Everytime I try to use it I get a Stack Overflow error, even though I streamlined it more, and raised the max heap size of eclipse to 1024m.
I'm not asking for code, just asking if I am doing something blatantly wrong to keep getting errors.

Maybe you could optimize your cycle detection algorithm.
This might help you: http://en.wikipedia.org/wiki/Cycle_detection
Other than that, a million nodes plus adjacency matrix might as well be too much to handle at once, so perhaps there's a way to just load parts of the graph at a time.

If it's a bipartite graph you can always colour it with only two colours (e.g. colour the first partition white, and the second partition black).
Usually when one thinks of colouring a graph he should specify the number of colours. You can always colour a graph by assigning to each node a different colour. Another example: for planar graphs only four colour are required. However, for most graphs the [chromatic number] of a graph is [NP].
[NP] http://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems
[chromatic number] http://en.wikipedia.org/wiki/Graph_coloring

Well, this IS an NP complete problem (longest cycle), so this sort of thing IS likely to happen.
I'd probably just bump the heap again, and let it run...
EDIT: Never mind. got my reduction backwards.

Related

Fitting multiple curves to one data set

I have a data set that I receive from an outside source, and have no real control over.
The data, when plotted, shows two clumps of points with several sparse, irrelevant points. Here is a sample plot:
There is a clump of points on the left, clustered around (1, 16). This clump is actually part of a set of points that lies on (or near to) a line stretching from (1, 17.5) to (2.4, 13).
There is also an apparent curve from (1.75, 18) to (2.75, 12.5).
Finally, there are some sparse points above the second curve, around (2.5, 17).
Visually, it's not difficult to separate these groups of points. However, I need to separate these points within the data file into three groups, which I'll call Line, Curve, and Other (the Curve group is the one I actually need). I'd like to write a program that can do this reasonably well without needing to visually see the plot.
Now, I'm going to add a couple items that make this much worse. This is only a sample set of data. While the shapes of the curve and line are relatively constant from one data set to the next, the positions are not. These regions can (and do) shift, both horizontally and vertically. The only real constant is that there's a negative-slope line from the top-left to the bottom-right of the plot, an almost curve from the top-center to the bottom-right, and most of the sparse points are in the top-right corner, above the curve.
I'm on Linux, and I'm out of ideas. I can tell you the approaches that I've tried, though they have not done well.
First, I cleaned up the data set and sorted it in ascending order by x-coordinate. I thought that maybe the points were sorted in some sort of a logical way that would allow me to 'head' or 'tail' the data to achieve the desired result, but this was not the case.
I can write a code in anything (Python, Fortran, C, etc.) that removes a point if it's not within X distance of the previous point. This would be just fine, except that the scattering of the points is such that two points very near each other in x, are separated by an appreciable distance in y. It also doesn't help that the Line and Curve draw near one another for larger x-values.
I can fit a curve to a partial data set. When I sort the data by x-coordinate, for example, I can choose to only plot the first 30 points, or the last 200, or some set of 40 in the middle somewhere. That's not a problem. But the Line points tuck underneath the Curve points, which causes a problem.
If the Line points were fairly constant (which they're not), I could rotate my plot by some angle so that the Line is vertical and I can just look at the points to the right of that line, then rotate back. This may the best way to go about doing this, but in order to do that, I need to be able to isolate the linear points, which is more or less the essence of the problem.
The other idea that seems plausible to me, is to try to identify point density and split the data into separate files by those parameters. I think this is the best candidate for this problem, since it is independent of point location. However, I'm not sure how to go about doing this, especially because the Line and Curve do come quite close together for larger x-values (In the sample plot, it's x-values greater than about 2).
I know this does not exactly fall in with the request of a MWE, but I don't know how I'd go about providing a more classical MWE. If there's something else I can provide that would help, please ask. Thank you in advance.

D3 - Difference between basis and linear interpolation in SVG line

I implemented a multi-series line chart like the one given here by M. Bostock and ran into a curious issue which I cannot explain myself. When I choose linear interpolation and set my scales and axis everything is correct and values are well-aligned.
But when I change my interpolation to basis, without any modification of my axis and scales, values between the lines and the axis are incorrect.
What is happening here? With the monotone setting I can achieve pretty much the same effect as the basis interpolation but without the syncing problem between lines and axis. Still I would like to understand what is happening.
The basis interpolation is implementing a beta spline, which people like to use as an interpolation function precisely because it smooths out extreme peaks. This is useful when you are modeling something you expect to vary smoothly but only have sharp, infrequently sampled data. A consequence of this is that resulting line will not connect all data points, changing the appearance of extreme values.
In your case, the sharp peaks are the interesting features, the exception to the typically 0 baseline value. When you use a spline interpolation, you are smoothing over these peaks.
Here is a fun demo to play with the different types of line interpoations:
http://bl.ocks.org/mbostock/4342190
You can drag the data around so they resemble a sharp peak like yours, even click to add new points. Then, switch to a basis interpolation and watch the peak get averaged out.

AI search - build a rectangle from 12 tetris shapes how many states are possible?

You have 12 shapes:
which you can make each out of five identical squares.
You need to combine the 12 pieces to one rectangle.
You can form four different rectangles:
2339 solutions (6x10), 2 solutions (3x20), 368 solutions (4x15), 1010 solutions (5x12).
I need to build the 3X20 rectangle:
My question what is the maximum number of states (i.e., the branching factor) that is possible?
My half way calculation:
The way I see it, there are 4 operations on each shape: turn 90/180/270 degrees and mirroring (turning it upside down).
Then, you have to put the shape on the board, somewhere on the 3X20 board.
Illegal states will be one that the shape doesn't fit in the board, but they are still states.
For the first move, you can chose each shape in 4 ways which is 4X12 ways, and then you need to multiply in the number of positions the shape can be in, and that is the number of states you have. But how can I calculate the number of positions?
Please help me with this calculation it is very important, it is not some kind of homework which I'm trying to avoid.
I think there is no easy & 'intelligent' way to list solutions (or states) to pentomino puzzles. You have to try all possibilities. Recursive programming or backtracking is the way to do it. You should check this solution that also has java source code available. Hopefully that points you to the right direction.
There is also a python solution that is perhaps more readable.

Recognizing line segments from a sequence of points

Given an input of 2D points, I would like to segment them in lines. So if you draw a zig-zag style line, each of the segments should be recognized as a line. Usually, I would use OpenCV's
cvHoughLines or a similar approach (PCA with an outlier remover), but in this case the program is not allowed to make "false-positive" errors. If the user draws a line and it's not recognized - it's ok, but if the user draws a curcle and it comes out as a square - it's not ok. So I have an upper bound on the error - but if it's a long line and some of the points have a greater distance from the approximated line, it's ok again. Summed up:
-line detection
-no false positives
-bounded, dynamically adjusting error
Oh, and the points are drawn in sequence, just like hand drawing.
At least it does not have to be fast. It's for a sketching tool. Anyone has an idea?
This has the same difficulty as voice and gesture recognition. In other words, you can never be 100% sure that you've found all the corners/junctions, and among those you've found you can never be 100% sure they are correct. The reason you can't be absolutely sure is because of ambiguity. The user might have made a single stroke, intending to create two lines that meet at a right angle. But if they did it quickly, the 'corner' might have been quite round, so it wouldn't be detected.
So you will never be able to avoid false positives. The best you can do is mitigate them by exploring several possible segmentations, and using contextual information to decide which is the most likely.
There are lots of papers on sketch segmentation every year. This seems like a very basic thing to solve, but it is still an open topic. The one I use is out of Texas A&M, called MergeCF. It is nicely summarized in this paper: http://srlweb.cs.tamu.edu/srlng_media/content/objects/object-1246390659-1e1d2af6b25a2ba175670f9cb2e989fe/mergeCF-sbim09-fin.pdf.
Basically, you find the areas that have high curvature (higher than some fraction of the mean curvature) and slow speed (so you need timestamps). Combining curvature and speed improves the initial fit quite a lot. That will give you clusters of points, which you reduce to a single point in some way (e.g. the one closest to the middle of the cluster, or the one with the highest curvature, etc.). This is an 'over fit' of the stroke, however. The next stage of the algorithm is to iteratively pick the smallest segment, and see what would happen if it is merged with one of its neighboring segments. If merging doesn't increase the overall error too much, you remove the point separating the two segments. Rinse, repeat, until you're done.
It has been a while since I've looked at the new segmenters, but I don't think there have been any breakthroughs.
In my implementation I use curvature median rather than mean in my initial threshold, which seems to give me better results. My heavily modified implementation is here, which is definitely not a self-contained thing, but it might give you some insight. http://code.google.com/p/pen-ui/source/browse/trunk/thesis-code/src/org/six11/sf/CornerFinder.java

Graphviz DOT arrange Nodes in circles, layout too "compact"

I'm halfway there please see the edit
OK here's my problem, I'm generating a graph of a python module, including all the files with their functions/methods/classes.
I want to arrange it so, that nodes gather in circles around their parent nodes, currently everything is on one gargantuan horizontal row, which makes the thing >50k pixels wide and also let's the svg converter fail(only renders about the half of the graph).
I went through the docs but couldn't find anything that seems to do the trick.
So the question is:
Is there a simple way to do this or do I have to layout the whole thing by myself? :/
EDIT:
Thanks to Andrews comment I've got the right layout, the only problem now is that it's a bit to "compact"... so the question now is, how to fix this?
i've mentioned all of the most significant parameters that influence your current layout and then suggested values for those parameters. Still, i suspect you can get the layout that you want just from applying a couple of these suggestions.
reduce the edge weight, eg, [weight=0.5]; this will make the
edges longer, causing the tight
clusters you currently see in your
graph to 'fan out'.
get rid of the node borders, node_A
[color=none; shape=plaintext];
especially for oval-shaped nodes, a
substantial fraction of the total
node space is 'unused' (ie, not used
to display the node label).
explicitly set the font size for
the nodes (the node borders are
enlarged so that they surround the
node text, which means that the font
size and amount of text for a given
node has a significant effect on its
size); [fontsize=11] should be large
enough to be legible yet also reduce
the 'cluttered' appearance (the
default size is 14).
increase minimum separation between
nodes, via 'nodesep'; eg, nodesep=2.0; this will
directly address your objection
regarding your graph being "too
compact." ('nodesep' and 'ranksep'
probably affect how dot draws a graph
more than any other parameters for
node, edge, or graph. In your case,
it looks like you have only two ranks
of nodes; 'ranksep' sets the minimum
distance between nodes of different
ranks--it looks like all of the nodes
that comprise your graph are of the
same rank (except for few top level
nodes in the centers).
explicitly set total graph size, eg,
size="7.75,10.25" (ensures that your
graph fits on an 8.5 x 11 page and
that it occupies the entire space)
And one purely aesthetic suggestion
that at most will only help your
graph appear less cluttered: the
default fontcolor for both edges and
nodes is black. The majority of the
ink on your graph is from those two
structures (particularly if you
remove the node borders), so i would
for instance set either the node
(text) fontcolor or the edge
fontcolor to "blue" to help the eye
distinguish the two sets of graph
structures.
If it is too compact, you will want to mess with the edge length. You have a couple options depending on the graph layout:
If your layout is sfdp or fdp, tweak the graph property K. Default is 0.3.
For neato (or fdp), tweak the edge property len. Default is 1.0 for neato and 0.3 for fdp.
For dot you can use the edge property minlen which is the minimum edge length. Default is 1.
You might also want to mess with the graph property model which determines clustering behavior. Specifically, try subset. I believe this handles len for you:
http://www.graphviz.org/doc/info/attrs.html#d:model
Also, you can remove overlaps all together with scaling techniques: http://www.graphviz.org/doc/info/attrs.html#d:overlap
I have around 500 nodes and used doug's recommendation.
This is my sample code that works (in python):
f = Digraph('companies',filename='companies.gv',
edge_attr={'weight':'1',
'fontsize':'11',
'fontcolor':'blue',
'len':'4'},
graph_attr={'fixedsize':'false',
'bgcolor':'transparent'},
node_attr={'fontsize':'11',
'shape':'plaintext',
'color':'none',
'fontcolor':'black'})
f.attr(layout="neato")
f.attr(nodesep='3')
f.attr(ranksep='3')
f.attr(size='5000,5000')

Resources