can we use multiple RegisterTransitCallback in ortools? - python-3.x

Does anyone know if we can use multiple RegisterTransitCallback calls for different dimensions in an OR-Tools VRP?
What I see in my code suggests that only the last registered callback takes effect for all of the added dimensions.

It is common to have one callback for time and one for distance; this is shown in the VRPTW examples.
So yes.
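A minimal sketch with made-up distance and time matrices, showing two independently registered callbacks feeding two dimensions:

```python
from ortools.constraint_solver import pywrapcp

# Tiny made-up instance: 4 nodes, 1 vehicle, depot at node 0.
distance_matrix = [[0, 2, 4, 3], [2, 0, 5, 6], [4, 5, 0, 1], [3, 6, 1, 0]]
time_matrix     = [[0, 1, 2, 2], [1, 0, 3, 4], [2, 3, 0, 1], [2, 4, 1, 0]]

manager = pywrapcp.RoutingIndexManager(len(distance_matrix), 1, 0)
routing = pywrapcp.RoutingModel(manager)

def distance_callback(from_index, to_index):
    return distance_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

def time_callback(from_index, to_index):
    return time_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

# Each call to RegisterTransitCallback returns its own index; keep both.
distance_cb = routing.RegisterTransitCallback(distance_callback)
time_cb = routing.RegisterTransitCallback(time_callback)

routing.SetArcCostEvaluatorOfAllVehicles(distance_cb)
routing.AddDimension(distance_cb, 0, 100, True, 'Distance')  # distance dimension
routing.AddDimension(time_cb, 10, 100, False, 'Time')        # time dimension (slack 10)

solution = routing.SolveWithParameters(pywrapcp.DefaultRoutingSearchParameters())
```

If only the last callback seems to take effect, check that each AddDimension call is passed its own callback index, rather than a single variable that gets overwritten before the dimensions are added.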

Related

How to handle various units in a single attribute / feature using Pandas?

I have a dataset I am cleaning, where one attribute (feature) holds values in various units. For example, some of the values are:
1 kg; 6 LB; 900 gms; 32 oz; etc.
If I use the standard scaler it will not be fair, since the values are in different units and cannot be treated as-is.
Please suggest how to handle such data.
I would recommend converting all the values to the same unit first. For example, convert everything to kg (or whatever unit suits you best), and then apply the standard scaler.
Thanks all. I did some research and found that I need to convert the various units into standard units following international norms, i.e. SI units: https://www.nist.gov/pml/weights-and-measures/metric-si/si-units. The same suggestion was given by #sharmajee499.
Moving ahead with this approach. It will take a fair amount of manual code, but there seems to be no direct, short and easy way.
Please do post if you have a better solution.
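For illustration, a minimal sketch of the convert-then-scale approach; the conversion table only covers the units shown in the question and would need extending for a real dataset:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Conversion factors to kilograms for the units seen in the question.
TO_KG = {'kg': 1.0, 'lb': 0.453592, 'gms': 0.001, 'g': 0.001, 'oz': 0.0283495}

def to_kilograms(value):
    """Parse strings like '6 LB' or '900 gms' into a float in kg."""
    amount, unit = value.strip().split()
    return float(amount) * TO_KG[unit.lower()]

df = pd.DataFrame({'weight': ['1 kg', '6 LB', '900 gms', '32 oz']})
df['weight_kg'] = df['weight'].apply(to_kilograms)
df['weight_scaled'] = StandardScaler().fit_transform(df[['weight_kg']]).ravel()
print(df)
```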

If I interrupt sklearn grid_search.fit() before completion can I access the current .best_score_, .best_params_?

If I interrupt grid_search.fit() before completion, will I lose everything it has done so far?
I got a little carried away with my grid search and provided an obscenely large search space. I can already see scores that I'm happy with, but my stdout doesn't display which params led to those scores.
I've searched the docs: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html
And there is a discussion from a couple of years ago about adding a feature for parallel search here: https://sourceforge.net/p/scikit-learn/mailman/message/31036457/
But nothing definitive. My search has been running for ~48 hours, so I don't want to lose what's been discovered, but I also don't want to continue.
Thanks!
welcome to SO!
To my understanding, there aren't any intermediate variables returned by the grid_search function, only the resulting grid and its scores (see grid_search.py for more information).
So if you cancel it, you might lose the work that's been done so far.
But a bit of advice: 48 hours is a long time (obviously this depends on the rows, columns and number of hyperparameters being tuned). You might want to start with a broader grid search first and then refine your parameter search from that (see the sketch below).
That will benefit you two ways:
Run time might end up being much shorter (see caveats above), meaning you don't have to wait as long or risk losing results.
You might find that your model's prediction score is only impacted by one or two hyperparameters, letting you keep the other searches broad and focus your efforts on the parameters that influence your prediction accuracy most.
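A minimal coarse-then-fine sketch, using the current sklearn.model_selection API (the sklearn.grid_search module linked above is deprecated). The fine-grid values here assume the coarse pass picked C=1 and gamma=0.1; centre them on your own coarse result:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Pass 1: coarse, logarithmic grid to locate the promising region cheaply.
coarse = GridSearchCV(SVC(), {'C': [0.01, 1, 100], 'gamma': [0.001, 0.1, 10]}, cv=3)
coarse.fit(X, y)
print('coarse best:', coarse.best_params_, coarse.best_score_)

# Pass 2: finer grid around the coarse winner.
fine = GridSearchCV(SVC(), {'C': [0.3, 1, 3], 'gamma': [0.03, 0.1, 0.3]}, cv=3)
fine.fit(X, y)
print('fine best:', fine.best_params_, fine.best_score_)
```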
Hopefully by the time I've written this response your grid search has completed!!

Test multiple algorithms in one experiment

Is there any way to test multiple algorithms at once, rather than running each one separately and then checking the result? There are a lot of times when I don't really know which one to use, so I would like to test several and get the results (error rates) fairly quickly in Azure Machine Learning Studio.
You could connect the scores of multiple algorithms to an 'Evaluate Model' module to evaluate the algorithms against each other.
Hope this helps.
The module you are looking for is the one called "Cross-Validate Model". It splits whatever comes in from the input port (the dataset) into 10 folds, then in turn reserves one fold as the "answer" and trains a model on the other nine, returning a set of accuracy statistics measured against the held-out fold. The column to look at is "Mean absolute error", which is the average error of the trained models. You can connect whatever algorithm you want to one of the ports, and you will then see the result for that algorithm in particular by right-clicking the port that gives the score.
After that you can assess which algorithm did best. And as a pro tip: you could use Filter Based Feature Selection to see which columns had a significant impact on the result.
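Azure ML Studio is a drag-and-drop tool, but for reference, the same compare-algorithms-by-cross-validation idea expressed in code (sketched here with scikit-learn on a built-in toy dataset) looks like this:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Compare several algorithms under the same 10-fold cross-validation.
models = {
    'logistic regression': LogisticRegression(max_iter=1000),
    'decision tree': DecisionTreeClassifier(),
    'random forest': RandomForestClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f'{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})')
```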
You can check section 6.2.4 of hands-on-lab at GitHub https://github.com/Azure-Readiness/hol-azure-machine-learning/blob/master/006-lab-model-evaluation.md which focuses on the evaluation of multiple algorithms etc.

Close-Enough TSP implementation

I'm looking for a solution to a Close-Enough Traveling Salesman Problem (CETSP), where I have a set of nodes and the tour only needs to pass within a certain distance of each one, as close to optimally as possible. I've found a couple of sources on approaches to this TSP variant, but was unable to find a solver or an algorithm that I could easily use.
Do you have any suggestions for how I can go about getting a solution to my CETSP problem, whether by running an implementation of it myself or by using an existing solver?
You can try using UFFLP. They have an example where you can find the exact coordinates the salesman should pass through, given a predetermined sequence. So you can generate thousands of sequences and choose the best one (just a simple heuristic).
Have a look at http://www.gapso.com.br/en/ufflp-en/
You will find useful information.
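A minimal sketch of that generate-many-sequences heuristic, with made-up coordinates. Instead of UFFLP's exact coordinate optimization, it uses a simple greedy "touch point" rule for each sequence (move just far enough to enter each node's disk):

```python
import random
import math

# Hypothetical instance: node centres and a common "close enough" radius.
nodes = [(0, 0), (10, 2), (6, 9), (2, 7), (9, 8)]
RADIUS = 1.5
START = (0, -5)  # depot; the tour starts and ends here

def touch_point(prev, centre, r):
    """Nearest point to `prev` on the disk of radius r around `centre`."""
    dx, dy = centre[0] - prev[0], centre[1] - prev[1]
    d = math.hypot(dx, dy)
    if d <= r:           # already close enough; stay put
        return prev
    t = (d - r) / d      # move just far enough to touch the disk
    return (prev[0] + t * dx, prev[1] + t * dy)

def tour_length(order):
    pos, total = START, 0.0
    for i in order:
        nxt = touch_point(pos, nodes[i], RADIUS)
        total += math.dist(pos, nxt)
        pos = nxt
    return total + math.dist(pos, START)

# Sample many random sequences and keep the shortest tour found.
best_order, best_len = None, float('inf')
for _ in range(10000):
    order = random.sample(range(len(nodes)), len(nodes))
    length = tour_length(order)
    if length < best_len:
        best_order, best_len = order, length

print(best_order, round(best_len, 2))
```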

Comparing audio recordings

I have 5 recorded wav files. I want to compare the new incoming recordings with these files and determine which one it resembles most.
In the final product I need to implement it in C++ on Linux, but now I am experimenting in Matlab. I can see FFT plots very easily. But I don't know how to compare them.
How can I compute the similarity of two FFT plots?
Edit: There is only speech in the recordings. Actually, I am trying to identify the response of answering machines of a few telecom companies. It's enough to distinguish two messages "this person can not be reached at the moment" and "this number is not used anymore"
This depends a lot on your definition of "resembles most"; depending on your use case it can mean many things. If you just want to compare the bare spectra of the whole files, you can simply correlate the values returned by the two FFTs.
However, spectra tend to change a lot when the files are warped in time. To deal with this, you need to do a windowed FFT and compare the spectra window by window. That comparison then defines the distance function you can use in a dynamic time warping (DTW) algorithm.
If you need perceptual resemblance, an FFT probably does not give you what you need; MFCCs of the recordings are most likely much closer to the problem. Again, you might need to calculate windowed MFCCs instead of MFCCs of the whole recording (see the sketch at the end of this answer).
If you have musical recordings, you need completely different approaches again. There is a blog post that describes how Shazam works, so you might be able to find it on Google. Or if you want real musical similarity, have a look at this book.
EDIT:
The best solution for the problem specified above would be the one described here (the "Shazam algorithm" mentioned above). This is however a bit complicated to implement, and an easier solution might do well enough.
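A minimal sketch of the windowed-MFCC-plus-DTW idea using librosa (the file names are hypothetical):

```python
import librosa

references = ['msg1.wav', 'msg2.wav']  # hypothetical reference recordings

def mfcc_of(path, sr=8000):
    """Load a file and return its windowed MFCC matrix (13 x frames)."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

incoming = mfcc_of('incoming.wav')

# DTW aligns the two MFCC sequences despite timing differences;
# the accumulated cost at the end of the path measures dissimilarity.
costs = {}
for name in references:
    D, _ = librosa.sequence.dtw(X=incoming, Y=mfcc_of(name), metric='euclidean')
    costs[name] = D[-1, -1]

print('closest match:', min(costs, key=costs.get))
```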
If you know there are only 5 possible incoming files, I would suggest first trying something as easy as the Euclidean distance between the two signals (in the time or Fourier domain). It is likely to give you good results.
Edit: To cope with different possible start offsets, try cross-correlating the incoming recording with each reference and see which file gives the higher peak.
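A minimal sketch of that cross-correlation comparison using scipy (file names are hypothetical; recordings assumed mono and at a common sample rate):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

references = ['msg1.wav', 'msg2.wav', 'msg3.wav', 'msg4.wav', 'msg5.wav']
_, incoming = wavfile.read('incoming.wav')
incoming = incoming.astype(np.float64)
incoming /= np.linalg.norm(incoming)  # normalise so peak heights are comparable

best_file, best_peak = None, -np.inf
for name in references:
    _, ref = wavfile.read(name)
    ref = ref.astype(np.float64)
    ref /= np.linalg.norm(ref)
    # Cross-correlation handles an unknown time offset between recordings:
    # the peak value is highest where the two signals line up best.
    peak = correlate(incoming, ref, mode='full').max()
    if peak > best_peak:
        best_file, best_peak = name, peak

print('closest match:', best_file)
```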
I suggest you compute a simple sound parameter such as the fundamental frequency. There are several methods of getting this value; I tried autocorrelation and the cepstrum, and for voice signals they worked fine. With such a function working, you can run a time analysis and compare the two signals (the base signal you compare against, and the incoming signal you would like to match) over a given interval. Comparing several intervals by this criterion can tell you which base sample matches best.
Of course everything depends on what you mean by "resembles most". To refine the comparison you can introduce other parameters such as volume, noise, clicks, pitch...
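For illustration, a minimal sketch of the autocorrelation approach to the fundamental frequency, comparing per-frame F0 tracks of two recordings (file names hypothetical; mono signals assumed):

```python
import numpy as np
from scipy.io import wavfile

def f0_track(signal, sr, frame_len=2048, hop=1024, fmin=60, fmax=400):
    """Per-frame fundamental frequency via the autocorrelation peak."""
    track = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len].astype(np.float64)
        frame -= frame.mean()
        ac = np.correlate(frame, frame, mode='full')[frame_len - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)  # plausible pitch lags
        track.append(sr / (lo + np.argmax(ac[lo:hi])))
    return np.array(track)

sr_a, incoming = wavfile.read('incoming.wav')
sr_b, base = wavfile.read('msg1.wav')
a, b = f0_track(incoming, sr_a), f0_track(base, sr_b)
n = min(len(a), len(b))
print('mean |F0 difference| (Hz):', np.abs(a[:n] - b[:n]).mean())
```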
