MASTER_ADDR & MASTER_PORT in PyTorch DDP - pytorch

I am new to PyTorch DDP and using it for the first time. According to the documentation (https://pytorch.org/docs/stable/distributed.html), MASTER_PORT should be a free port on the rank 0 machine, and MASTER_ADDR the address of the rank 0 node. These parameters are required for rendezvous. My question is: how can I find these two parameters on a Mac or any other server?
Thanks in advance!
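For a single-machine setup, MASTER_ADDR can simply be localhost; for a free port, a common trick is to ask the OS for one by binding to port 0. A minimal sketch (the find_free_port helper below is just an illustration, not part of PyTorch):

```python
import socket

def find_free_port():
    """Bind to port 0 so the OS assigns an unused port, then report it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

# MASTER_ADDR is the IP or hostname of the rank-0 node; on a single
# machine, "127.0.0.1" (localhost) works.
master_addr = "127.0.0.1"
master_port = find_free_port()
print(master_addr, master_port)
```

These values would then be exported as the MASTER_ADDR and MASTER_PORT environment variables (or encoded in the init_method URL passed to torch.distributed.init_process_group) before launching the worker processes.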

Related

Analyze networks which are not scale free

I am trying to analyze a graph constructed with networkx that has around 7000 nodes. When I plot the degree distribution, there are nodes that are far away from the fitted power law, as shown in the attached plot. This means the network is not scale-free (to my understanding). I am trying to analyze this network using various parameters such as degree, clustering coefficient, betweenness centrality, and many others. Is analyzing such networks with these parameters acceptable? I have tried to find examples of analyzing networks that are not scale-free, but no luck so far. Any suggestions and pointers to such examples would be really great. In addition, some differences in network characteristics between scale-free and non-scale-free networks would be very helpful. Thanks in advance.
1. What type of model did you construct? Did you use data from a file?
2. What do you want to check?
Models such as Watts-Strogatz (https://en.wikipedia.org/wiki/Watts%E2%80%93Strogatz_model) are also not scale-free:
'They do not account for the formation of hubs. Formally, the degree
distribution of ER graphs converges to a Poisson distribution, rather
than a power law observed in many real-world, scale-free
networks.[3]'
WS is a 'small-world' network, characterized by a high clustering coefficient. Why do you think you can't analyze it?
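To illustrate: nothing stops you from computing those same parameters on a non-scale-free graph. A quick sketch with networkx, using a Watts-Strogatz graph as a stand-in for a non-scale-free network:

```python
import networkx as nx
from collections import Counter

# A small-world (Watts-Strogatz) graph: not scale-free, but perfectly
# analyzable with the usual network measures.
G = nx.watts_strogatz_graph(n=1000, k=6, p=0.1, seed=42)

# Degree distribution: narrow and roughly bell-shaped around k=6,
# rather than a heavy-tailed power law.
degree_counts = Counter(d for _, d in G.degree())

# Clustering coefficient: high, which is the hallmark of small-world graphs.
avg_clustering = nx.average_clustering(G)
print(sorted(degree_counts.items()))
print(avg_clustering)
```

The same goes for betweenness centrality (nx.betweenness_centrality) and other measures; whether they are informative depends on your research question, not on the network being scale-free.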

What is a wireless channel, and what is the practical difference between them?

I was studying the differences between 2.4 GHz and 5 GHz, and I could understand the whole concept of speed, range, frequency, etc.
But I still can't understand what a channel is. I found some definitions, but they don't make sense to me: "a wireless channel is a way to fine-tune and alter the frequency". Could someone explain, please?
The channels are just an agreed way to refer to different regions within the portion of bandwidth allocated to a particular Wi-Fi band.
For example, each channel in the 2.4 GHz spectrum is 5 MHz from the next one, or more accurately, the centres of the channels are that distance apart; see the diagram at Wikipedia (https://en.wikipedia.org/wiki/List_of_WLAN_channels).
It's important to note that Wi-Fi needs a certain range on each side of a channel's centre frequency (the channel number, again, is simply a shorthand for a specific frequency within the band). From the diagram it's easy to see how channels can 'overlap', which is also a common term in Wi-Fi.
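As a concrete illustration of the channel-to-frequency mapping: in the 2.4 GHz band, channels 1-13 have centre frequencies of 2407 + 5 × n MHz (channel 14, allowed only in some regions, sits apart at 2484 MHz). A small sketch:

```python
def channel_center_mhz(channel):
    """Centre frequency for 2.4 GHz Wi-Fi channels 1-13, spaced 5 MHz
    apart starting at 2412 MHz. (Channel 14 is a special case at
    2484 MHz and is omitted here.)"""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 handled here")
    return 2407 + 5 * channel

print(channel_center_mhz(1))   # 2412 MHz
print(channel_center_mhz(6))   # 2437 MHz
print(channel_center_mhz(11))  # 2462 MHz
```

Since a 20 MHz-wide transmission extends about 10 MHz either side of the centre, channels 1, 6, and 11 (25 MHz apart) are the classic non-overlapping choices in the 2.4 GHz band.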

Dataset sorting and add data in machine learning

I have a dataset with an auto-incrementing ID and a random number sequence:
ID,Rnumber
1,500
2,799
3,683
4,237
5,974
6,654
7,778
8,423
9,389
I'm trying to create a rank from the highest to the smallest value and categorize the rows of the dataset into groups based on that rank.
Example: ranks 1-150 are placed in group 1,
ranks 151-300 in group 2, and so on.
What is the easiest way to do this using Azure Machine Learning?
I realize this may be easy, but since my knowledge of the subject is limited to a general understanding of its functions and usage, there are still a lot of possibilities and functions to explore.
So I'm asking this specific question here to get a head start!
Any help appreciated!
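One possible approach, assuming you use an Execute Python Script module (or any pandas-capable step) in Azure ML, is to compute the rank and derive the group by integer division. This is only a sketch of the ranking logic, not a full Azure ML pipeline:

```python
import pandas as pd

# The sample data from the question; in Azure ML this would be the
# DataFrame passed into the Execute Python Script module.
df = pd.DataFrame({"ID": range(1, 10),
                   "Rnumber": [500, 799, 683, 237, 974, 654, 778, 423, 389]})

# Rank 1 = highest Rnumber; method="first" breaks ties by row order.
df["Rank"] = df["Rnumber"].rank(ascending=False, method="first").astype(int)

# Ranks 1-150 -> group 1, ranks 151-300 -> group 2, and so on.
df["Group"] = (df["Rank"] - 1) // 150 + 1
print(df.sort_values("Rank"))
```

With only nine rows, every rank falls in 1-150, so all rows land in group 1; with a larger dataset the same two lines produce the 150-wide buckets described above.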

K-means metrics

I have read through the scikit-learn documentation and Googled to no avail. I have 2000 data sets, clustered as the picture shows. Some of the clusters, as shown, are wrong, here the red cluster. I need a metric to validate all 2000 cluster sets. Almost every metric in scikit-learn requires the ground-truth class labels, which I do not think I have, or can have for that matter. I have the hourly traffic flow for 30 days and I am clustering it using k-means. The lines are the cluster centers. What should I do? Am I even on the right track?! The horizontal axis is the hour, 0 to 23, and the vertical axis is the traffic flow, so the data points represent the traffic flow in each hour over the 30 days, and k=3.
To my knowledge, scikit-learn has no methods for internal evaluation apart from the silhouette coefficient; we can implement the Davies-Bouldin index and the Dunn index for such problems. This article provides good metrics for k-means:
http://www.iaeng.org/publication/IMECS2012/IMECS2012_pp471-476.pdf
Both the silhouette coefficient and the Calinski-Harabasz index are implemented in scikit-learn nowadays and will help you evaluate your clustering results when there is no ground truth.
More details here:
http://scikit-learn.org/stable/modules/clustering.html
And here:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_samples.html#sklearn.metrics.silhouette_samples
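As a minimal sketch of how the silhouette score can be used without ground-truth labels (the synthetic 30×24 "traffic" matrix below is just a stand-in for one of your day-profile data sets):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Stand-in for one data set: 30 days x 24 hourly flow values,
# drawn from three well-separated traffic levels.
X = np.vstack([rng.normal(loc, 1.0, size=(10, 24)) for loc in (0, 5, 10)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
score = silhouette_score(X, km.labels_)
print(score)  # in [-1, 1]; closer to 1 = better-separated clusters
```

Running this per data set gives one number each; cluster sets with a low (or negative) silhouette score are the candidates for being "wrong", like the red cluster in the picture.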
Did you look at agglomerative clustering, and in particular the subsection "Varying the metric"?
http://scikit-learn.org/stable/modules/clustering.html#varying-the-metric
To me it seems very similar to what you are trying to do.

How to use weka ClassifierSubsetEval

I am using Weka on a dataset with ~9000 attributes. I want to run attribute selection on the dataset and tried the ClassifierSubsetEval attribute selection filter, varying the classifiers and search methods used.
I am not a machine learner per se, so I do a lot with trial and error.
What I am wondering about:
When I use ClassifierSubsetEval with, for example, NaiveBayes in combination with GeneticSearch at standard settings, I get a selection of about 3000 attributes. However, if I use the same classifier with BestFirst forward search (at standard settings as well as with the number of non-improving nodes increased to 100), I always get about 25 attributes.
1) Why is the difference so huge? Is the attribute selection with BestFirst getting stuck in a local optimum?
2) How can I make GeneticSearch stricter? 3000 attributes still seems like a lot.
3) Are there any Classifiers that work especially well with specific search methods? I often see NaiveBayes mentioned together with GeneticSearch.
4) In which cases is it better to use WrapperSubsetEval and why?
Thanks to anyone willing to help or show me where to look for answers!
