Distributing uneven data evenly: mapping close data distantly - statistics

I have data points that are too close together, and I am using a linear scale to plot them. But on a linear scale the mapped values are just as close, so the points overlap and cluster. Is there another scale or method that maps close data distantly?
By close data I mean data like [1, 1.1, 1.2, 1.3, 1.4, ...]
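One commonly used option (a sketch, not the only possible answer) is a rank transform: each value is replaced by its rank, so neighbouring values come out evenly spaced no matter how tightly they cluster, at the cost of distorting the true distances between them. A minimal example using scipy.stats.rankdata:

import numpy as np
from scipy.stats import rankdata

data = np.array([1, 1.1, 1.2, 1.3, 1.4, 5, 20])
ranks = rankdata(data)  # equally spaced positions: [1. 2. 3. 4. 5. 6. 7.]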

Related

How to approximate low-res 3D density map to smooth models?

3D density maps can of course be plotted as heatmaps, but what about when the data itself is homogeneous (near 0) except for a small part (a 2D cross-section, for example)? That part should give a letter 'E' shape as the 2D "model". The original data is not saved as a point cloud, however.
A naive approach would be to keep the pixels above a certain threshold and then smooth the border. However, this does not take into account that the border pixels have only small (partial) values.
Another would be to use some point-cloud-based algorithm that comes with modeling software, but then the point cloud's probability function would still be discontinuous at pixel borders, and would not account for the fact that only one side has signal.
Is there any tested solution to this (the example is 2D; the actual case is many 2D slices that compose a low-res 3D density map)? I was thinking of giving border pixels an area proportional to their signal value, with the border defined from the gradient. Any suggestions?
I was thinking of model visualization results similar to this (which seems to be based on an established point-cloud algorithm).
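One well-tested technique for this kind of data (offered as a sketch, not necessarily what the asker settled on) is isosurface extraction with marching cubes: the surface position is linearly interpolated between voxels, so border voxels with weak signal pull the surface proportionally inward instead of cutting on pixel borders. With scikit-image:

import numpy as np
from skimage import measure

density = np.zeros((20, 20, 20))
density[5:15, 5:15, 5:15] = 1.0  # toy blob standing in for the real density map
# verts/faces define a smooth triangle mesh at the chosen iso-value
verts, faces, normals, values = measure.marching_cubes(density, level=0.5)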

Networkx (or Graphviz) layout with fixed y positions

Are there any layout algorithms in networkx (or that I can call in Graphviz) that allow me to fix the Y position of the nodes in a DAG to a potentially different floating-point value for each node, but spread out the X positions in some reasonable way (ideally attempting to minimise edge lengths or crossings, although I suspect this might not be possible)? I can only find layouts that require nodes to be on discrete layers.
Added: below is an example of the sort of graph topology I have, plotted using nx.kamada_kawai_layout. The thing is that these nodes have a "time" value (not shown here), which I want to plot on the Y axis. The edges are directed in time, so that a parent node (e.g. 54 here) is always older than its children (here 52 and 53). So I want to lay this out with the Y position given by the node "time", and the X position such that crossings are minimised, in as much as that's possible (I know this is NP-hard in general, but the layout below is actually doing a pretty good job).
p.s. Usually all the leaf nodes, e.g. 2, 3, 7 here, are at time 0, so they should be laid out at the bottom of the final layout.
p.p.s. Essentially what I would like to do is to imagine this as a spring diagram, "pick up" the root node (54) in the plot above and place it at the top of the page, with the topology dangling down, and then adjust the Y positions of the children to their internal "time" values.
Edit 2: Thanks to @sroush below, I can get a decent layout with the dot Graphviz engine:
import matplotlib.pyplot as plt
import networkx as nx

A = nx.nx_agraph.to_agraph(G)  # G is the DAG built earlier; ts.samples() lists the leaf (sample) nodes
fig = plt.figure(1, figsize=(10, 10))
A.add_subgraph(ts.samples(), rank="same", name="cluster")  # dot's same-rank attribute is "rank"
A.layout(prog="dot")
pos = {n: [float(x) for x in A.get_node(n).attr["pos"].split(",")] for n in G.nodes()}
nx.draw_networkx(G, pos, with_labels=True)
But I then want to reposition the nodes slightly, so that instead of the ranked times (the numbers) they use their actual, floating-point times, like this:
import numpy as np

true_times = nx.get_node_attributes(G, 'time')
reposition = {node_id: np.array([pos[node_id][0], true_times[node_id]]) for node_id in true_times}
nx.draw_networkx(G, reposition, with_labels=True)
As you can see, that squashes the nodes together rather a lot. Is there any way to increase the horizontal spread of those nodes so that they don't bump into one another? I could perhaps cluster some onto the same layer and iterate, but that seems quite expensive.
The Graphviz dot engine can get you pretty close. This is usually described as a "timeline" problem. Here is a graph, part of the Graphviz source, that seems to do what you want: https://www.flickr.com/photos/kentbye/1155560169
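As a naive post-processing sketch (my own addition, not from the thread): keep dot's left-to-right ordering but enforce a minimum horizontal gap between nodes whose true times are close, so they no longer overlap. This assumes the pos/reposition dicts built above; min_gap and band are made-up tuning values.

def spread_x(pos, min_gap=20.0, band=0.5):
    # Nodes whose y values differ by less than `band` are pushed at least
    # `min_gap` apart horizontally, scanning in (y, x) order.
    order = sorted(pos, key=lambda n: (pos[n][1], pos[n][0]))
    out = {n: list(pos[n]) for n in order}
    for prev, n in zip(order, order[1:]):
        if abs(out[n][1] - out[prev][1]) < band:
            out[n][0] = max(out[n][0], out[prev][0] + min_gap)
    return out

nx.draw_networkx(G, spread_x(reposition), with_labels=True)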

How to do clustering on a set of paraboloids and on a set of planes?

I am performing cluster analysis in two parts: part (a) is clustering on a set of paraboloids, and part (b) on a set of planes. The parts are separate, but both start from one set of images; on every image I detected points to which I fit (a) a paraboloid and (b) a plane. Having obtained the equations of the fitted surfaces, I now have two data sets: for (a), an array of arrays of size 6 (the 6 coefficients of each paraboloid's equation), and for (b), an array of arrays of size 3 (the 3 coefficients of each plane's equation).
I want to cluster both groups based on the similarity of the (a) paraboloids and (b) planes, but I am not sure which features of the surfaces are suitable for clustering.
For (b), I have tried using the angle between the fitted plane and the plane z = 0, so just one feature per object in the sample.
I have also tried simply treating the 3 (or 6) coefficients as separate variables, but I believe that this way I am not using the fact that the coefficients are connected with each other.
I would be really grateful to hear whether there is a better choice of features than merely the set of coefficients. Also, I am performing hierarchical (agglomerative) clustering.
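To make the tilt-angle feature for (b) concrete (a sketch with hypothetical variable names, assuming planes were fitted as z = a*x + b*y + c):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

planes = np.array([[0.1, 0.0, 2.0], [0.9, 0.1, 1.0], [0.0, 0.05, 5.0]])  # rows of (a, b, c)
normals = np.column_stack([planes[:, 0], planes[:, 1], -np.ones(len(planes))])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
# the angle between each plane and z = 0 is the angle between their unit normals
angles = np.arccos(np.clip(np.abs(normals[:, 2]), 0.0, 1.0)).reshape(-1, 1)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(angles)

Note that the offset coefficient c drops out of the angle entirely, which is one way the coefficients are "connected": features built from the surface geometry, rather than the raw coefficients, tend to respect such dependencies.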

how to correlate noise data of sklearn-DBSCAN result with other clusters?

I am using sklearn's DBSCAN to cluster my text data.
I used GoogleNews-vectors-negative300.bin to create a 300-dimensional sentence vector for each document, giving a matrix of size 10000x300.
When I passed this matrix to DBSCAN with a few possible values of eps (0.2 to 3) and min_samples (5 to 100), leaving the other parameters at their defaults, I got between 200 and 10 clusters.
In every case, the noise points amount to approximately 75-80% of my data.
Is there any way to reduce the noise, or other parameters (distances) I could use to reduce it?
I also checked two vectors whose Euclidean distance is 0.6, yet they end up in different clusters; how can I bring them into the same cluster?
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # assumed; `scaler` was not defined in the original snippet
X_scaled = scaler.fit_transform(sentence_vectors)  # sentence_vectors: the 10000x300 matrix
ep = 0.3
min_sam = 10
for itr in range(1, 11):
    dbscan = DBSCAN(eps=ep, min_samples=min_sam * itr)
    clusters = dbscan.fit_predict(X_scaled)
If you want two points at distance 0.6 to be in the same cluster, then you may need to use a larger epsilon (which is a distance threshold). With eps above 0.6 (and the min_samples density requirement met), they should end up in the same cluster.
Since word2vec is trained with dot products, it would likely make more sense to use the dot product as similarity and/or cosine distance.
But in general I doubt you'll be able to get good results. The way sentence vectors are built by averaging word2vec vectors kills too much signal and adds too much noise, and since the data is high-dimensional, all that noise is a problem.
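A minimal sketch of the cosine suggestion above (the eps and min_samples values here are placeholders, not recommendations):

from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

X_unit = normalize(sentence_vectors)  # L2-normalise rows so cosine distance is well-behaved
db = DBSCAN(eps=0.3, min_samples=10, metric="cosine", algorithm="brute")  # brute force supports cosine
labels = db.fit_predict(X_unit)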

computing the matrix that turns one set of coordinates into another

I am playing with some models for the game Glest.
These models are made up of one or more meshes; each mesh is made up of many frames, which describe the position of each vertex for each frame of animation. In the model shown below, the position of each vertex of each wheel in each frame is stored in an array.
These models have been exported from 3D tools like Blender. Someone somewhere has the originals.
But I am wondering: for a simple animation such as a wheel turning, how can you compute the transforms (the steps of rotate, scale and translate), or the matrix that, when applied to the previous frame, will result in the new frame?
(Obviously not all frames will have such transforms, because they may distort the models and such.)
Also, how can you detect mirroring and other opportunities to reduce the amount of vertex data by applying a matrix and rendering the same vertices again?
Running speed won't be a problem, even if it's measured in minutes.
First off, some assumptions:
You're dealing with 3D affine transformations (a linear transformation plus a translation).
You have the vertices for each frame in your animation.
You can associate at least 4 vertices in one frame with 4 vertices in the next frame.
Then you can take those 4 vertices as 4D column vectors (appending a 1 as each vector's 4th element) in the original space and concatenate them to create a 4x4 matrix, called X. Do the same for their corresponding vectors in the transformed space and call the result Y, which will also be a 4x4 matrix. A little linear algebra provides a method to find the 4x4 matrix A that, when applied to X, gives you Y. Thus:
AX = Y
A = YX⁻¹
Using this to get the rotation and scaling out is not trivial. However, the rightmost column of A will contain the translation of the object between the successive frames.
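A quick numerical sketch of that recipe (function and variable names are made up for illustration):

import numpy as np

def affine_from_points(p, q):
    # p, q: four corresponding 3D points, shape (4, 3); they must be in
    # general position (not all coplanar) so that X is invertible.
    X = np.vstack([np.asarray(p, dtype=float).T, np.ones((1, 4))])  # 4x4, columns are homogeneous points
    Y = np.vstack([np.asarray(q, dtype=float).T, np.ones((1, 4))])
    return Y @ np.linalg.inv(X)  # A such that A @ X == Y

# Example: a pure translation by (1, 2, 3)
p = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
q = [[1, 2, 3], [2, 2, 3], [1, 3, 3], [1, 2, 4]]
A = affine_from_points(p, q)
print(A[:3, 3])  # -> [1. 2. 3.], the translation in the rightmost column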
