Grid Search Applicable for TFF and FL?

I'm currently doing research with TFF on image classification (Federated Learning for Image Classification) using EMNIST.
I'm looking at hyperparameters for the model, namely the learning rate and optimizer. Is grid search a good approach here? In a real-world scenario, would you simply sample clients/devices from the overall domain? If so, and I wanted to do a grid search, would I have to fix my client samples first? In that case, does it make sense to do the grid search?
What would be a typical real-world way of selecting parameters, i.e. is this more of a heuristic approach?
Colin . . .

I think there is still a lot of open research in these areas for Federated Learning.
Page 6 of https://arxiv.org/abs/1912.04977 describes a cross-device and a cross-silo setting for federated learning.
In cross-device settings, the population is generally very large (hundreds of thousands or millions) and participants are generally only seen once during the entire training process. In this setting, https://arxiv.org/abs/2003.00295 demonstrates that hyper-parameters such as client learning rate play an outsized role in determining speed of model convergence and final model accuracy. To demonstrate that finding, we first performed a large coarse grid search to identify promising regions of hyper-parameter space, and then ran finer grids in those regions. However, this can be expensive depending on the compute resources available for simulation, because the training process must be run to completion to understand these effects.
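As a rough illustration of the coarse-then-fine procedure, here is a minimal Python sketch. The function simulated_final_loss is a hypothetical stand-in for a complete federated training run, and the grid bounds and the location of its optimum are made up for the example; none of this is taken from the paper.

```python
import itertools
import math


def simulated_final_loss(client_lr, server_lr):
    """Hypothetical stand-in for a full federated training run.

    A real study would train to completion and return a validation metric.
    This toy surface has its minimum at client_lr=0.02, server_lr=1.0.
    """
    return math.log10(client_lr / 0.02) ** 2 + math.log10(server_lr / 1.0) ** 2


def log_grid(low_exp, high_exp, num):
    """Return `num` values evenly spaced in log10 between the two exponents."""
    step = (high_exp - low_exp) / (num - 1)
    return [10 ** (low_exp + i * step) for i in range(num)]


def best_on_grid(client_lrs, server_lrs):
    """Evaluate every (client_lr, server_lr) pair and return the best one."""
    return min(
        itertools.product(client_lrs, server_lrs),
        key=lambda pair: simulated_final_loss(*pair),
    )


def refine(center, num=9, half_width=0.5):
    """Finer log grid spanning half a decade either side of `center`."""
    exponent = math.log10(center)
    return log_grid(exponent - half_width, exponent + half_width, num)


# Stage 1: coarse grid, one point per decade from 1e-3 to 1e1.
coarse_client, coarse_server = best_on_grid(log_grid(-3, 1, 5), log_grid(-3, 1, 5))

# Stage 2: finer grid around the best coarse point.
fine_client, fine_server = best_on_grid(refine(coarse_client), refine(coarse_server))

print("coarse:", coarse_client, coarse_server)
print("fine:  ", fine_client, fine_server)
```

Each call to simulated_final_loss corresponds to one full training run in practice, which is why the total number of grid points matters so much for cost.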
It might be possible to view federated learning as very large mini-batch SGD. In fact the FedSGD algorithm in https://arxiv.org/abs/1602.05629 is exactly this. In this regime, re-using theory from centralized model training may be fruitful.
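To make the mini-batch view concrete, here is a small sketch with made-up data and a one-parameter linear model (both are assumptions for illustration). It checks that averaging per-client full-batch gradients, weighted by each client's example count, gives the same gradient as one large batch over the pooled data.

```python
def mse_gradient(w, examples):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)


# Hypothetical toy data: each inner list is one client's local dataset of (x, y).
clients = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9)],
    [(0.5, 1.2), (1.5, 2.8), (2.5, 5.1)],
]

w = 0.0
total = sum(len(c) for c in clients)

# FedSGD-style step: each client sends one full-batch gradient,
# and the server averages them weighted by local example count.
federated_grad = sum(len(c) / total * mse_gradient(w, c) for c in clients)

# Centralized view: one gradient over the union of all client data.
pooled = [example for c in clients for example in c]
central_grad = mse_gradient(w, pooled)

print(federated_grad, central_grad)
```

The equivalence only holds when every client takes a single gradient step at the shared model; once clients run several local steps, as in FedAvg, the two views diverge.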
Finally, https://arxiv.org/abs/1902.01046 describes a system used at Google for federated learning and includes a short discussion of hyper-parameter exploration.

Related

What is the best way of architecture selection for deep neural networks

I am training a deep network with the Adam optimizer. For a single hidden layer, I used to compute statistics (correlation coefficient, MSE, etc.) and plot them to select the optimal number of hidden nodes. Is there any methodology for selecting the optimal structure for deeper networks (apart from trial and error)?
Hyperparameter optimization is still an open problem, and it is done mostly by trial and error, especially in the case of neural networks.
Except for some general guidelines such as monitoring the performance (under/over-fitting), there is not much you can do (if you have a better solution, you can easily market it these days; just search Google for paid hyperparameter optimization services and you will see the field growing by the day).
That being said, you don't need to perform the search manually (manually changing the structure of a neural network, training, and repeating). ML libraries (such as Keras and many others) offer some kind of hyperparameter grid search where you can set up values/ranges for the hyperparameters you want to optimize and let the grid search find the optimal solution (optimal within the range of values you defined, so it may not be optimal at all if incorrectly configured).
In the case of Keras, you can use Keras Tuner.
Of course, running the tuner can be pretty expensive (depending on the complexity of the network).
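To illustrate the caveat about ranges, here is a library-free sketch of a grid search over network structure. The search space and the validation_error function are hypothetical placeholders for real training runs; this is not Keras Tuner's API. The toy score keeps improving as units grow, so the winner lands on the edge of the grid, which is a hint that the range was set too narrowly.

```python
import itertools

# Hypothetical search space; the names and values are illustrative only.
search_space = {
    "hidden_layers": [1, 2, 3],
    "units": [16, 32, 64],
}


def validation_error(hidden_layers, units):
    """Hypothetical stand-in for training a network and measuring validation error.

    This toy score keeps improving with more units, so the true optimum
    lies outside the range of `units` values listed above.
    """
    return (hidden_layers - 2) ** 2 + 100.0 / units


def grid_search(space, score):
    """Try every combination in `space` and return the lowest-scoring one."""
    names = list(space)
    candidates = (
        dict(zip(names, values)) for values in itertools.product(*space.values())
    )
    return min(candidates, key=lambda params: score(**params))


best = grid_search(search_space, validation_error)

# A winner on the edge of its range hints that the range should be widened.
on_edge = [
    name
    for name, values in search_space.items()
    if best[name] in (min(values), max(values))
]

print(best, on_edge)
```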

What type of CNN will be suitable for underwater image processing?

The primary objective (my assigned work) is to do an image segmentation for the underwater images using a convolutional neural network. The camera shots taken from the underwater structure will have poor image quality due to severe noise and bad light exposure. In order to achieve higher classification accuracy, I want to do an automatic image enhancement for the images (see the attached file). So, I want to know, which CNN architecture will be best to do both tasks. Please kindly suggest any possible solutions to achieve the objective.
What do you need to segment? It'd be nice to see some labels of the segmentation.
You may not need to enhance the images: if your whole dataset has the same amount of noise, the network should generalize properly.
Regarding CNN architectures, it depends on the constraints you have on processing power and accuracy. If that is not a constraint, go with something like Mask R-CNN; check that repo as a good starting point.
Be mindful that it's a fairly complex architecture, so inference times might be a bit too high (but real time is doable depending on your GPU).
Other, simpler architectures are FCNs (Fully Convolutional Networks), which are basically your CNN, but with the fully connected layers replaced by fully convolutional layers.
The advantage of FCNs is that they are really easy to implement and modify, since you can go from simple architectures (FCN-AlexNet) to more complex and more accurate ones (FCN-VGG, FCN-ResNet).
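To show what swapping a fully connected layer for a convolutional one means, here is a tiny pure-Python sketch with a single channel and made-up weights (all values are assumptions for illustration). A dense unit over a 2x2 feature map is the same computation as a 2x2 convolution, and the convolutional form also accepts larger inputs, where it returns a map of scores instead of a single number.

```python
def dense(feature_map, weights, bias):
    """Fully connected unit: flatten the feature map and take a weighted sum."""
    flat_x = [v for row in feature_map for v in row]
    flat_w = [v for row in weights for v in row]
    return sum(x * w for x, w in zip(flat_x, flat_w)) + bias


def conv2d_valid(image, kernel, bias):
    """Single-channel 'valid' convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


weights = [[0.5, -1.0], [2.0, 0.25]]  # hypothetical trained weights
bias = 0.1

# Same weights, same input size: the convolution reproduces the dense output.
small = [[1.0, 2.0], [3.0, 4.0]]
dense_out = dense(small, weights, bias)
conv_out = conv2d_valid(small, weights, bias)

# Larger input: the dense unit cannot accept it, but the convolution
# slides over it and returns one score per location.
large = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]
score_map = conv2d_valid(large, weights, bias)

print(dense_out, conv_out, score_map)
```

That per-location output is what makes the fully convolutional form a natural fit for segmentation.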
Also, I don't think you mention a framework. There are many to choose from, and the choice depends on your familiarity with languages; most of them can be used from Python:
TensorFlow
Pytorch
MXNet
But if you are a beginner, try starting with a GUI-based one. NVIDIA DIGITS is a great starting point and really easy to configure; it's based on Caffe, so it's fairly fast when deploying and can easily be integrated with accelerators like TensorRT.

Agriculture commodity price predictions using machine learning

I want to create a web application that uses machine learning to predict the price of agricultural commodities 2-3 months in advance.
Is it really feasible or not?
If yes, then please provide some rough idea about which tools and technologies I can use to implement it.
First of all, study math, more precisely, statistics and differential algebra.
Then, use any open-source (or closed-source) neural network library you can find. Even MATLAB would help, as it has a good set of examples (I think it has some similar prediction models; at least, I remember creating a model for predicting election results in Poland).
Decide on your training and input data. Research how news and the global situation influence commodity prices. Research how existing bots predict prices for the next 1-2 minutes. Also consider using the history of predictions from certain individuals; I think Reuters has some API for this. By this I mean you'll have to integrate natural language processors, too.
Train your model, test it, improve it for quite a long time.
Finally, deploy a boring front-end and monetize it.
If you don't want to implement ML, you can also use Kalman filters.
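For a sense of what the Kalman filter route looks like, here is a minimal scalar sketch. It assumes a random-walk model for the underlying price level, and the prices and noise variances are invented for the example; a real commodity model would need those variances estimated from data and probably trend or seasonal terms as well.

```python
def kalman_filter_1d(observations, process_var, measurement_var,
                     initial_estimate, initial_var):
    """Scalar Kalman filter for a random-walk (local level) model.

    State:        level[t] = level[t-1] + process noise
    Observation:  price[t] = level[t] + measurement noise
    Returns the filtered level after each observation.
    """
    estimate, variance = initial_estimate, initial_var
    filtered = []
    for z in observations:
        # Predict: a random walk keeps the estimate and grows the uncertainty.
        variance += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        filtered.append(estimate)
    return filtered


# Hypothetical monthly prices; not real market data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]
levels = kalman_filter_1d(
    prices,
    process_var=1.0,
    measurement_var=4.0,
    initial_estimate=prices[0],
    initial_var=1.0,
)

# Under a random-walk model, the forecast for any future month
# is simply the last filtered level.
forecast = levels[-1]
print(levels, forecast)
```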

When and why would you want to use a Probability Density Function?

A wannabe data scientist here, trying to understand when and why a data scientist would use a Probability Density Function (PDF).
A scenario and a few pointers for learning about this and other such functions, like the CDF and PMF, would be really helpful. Do you know of any book that discusses these functions from a practical standpoint?
Why?
Probability theory is very important for modern data-science and machine-learning applications, because (in a lot of cases) it allows one to "open up a black box" and shed some light into the model's inner workings, and with luck find necessary ingredients to transform a poor model into a great model. Without it, a data scientist's work is very much restricted in what they are able to do.
A PDF is a fundamental building block of probability theory, absolutely necessary for any sort of probabilistic reasoning, along with expectation, variance, prior and posterior, and so on.
Some examples here on StackOverflow, from my own experience, where a practical issue boils down to understanding data distribution:
Which loss-function is better than MSE in temperature prediction?
Binary Image Classification with CNN - best practices for choosing “negative” dataset?
How do neural networks account for outliers?
When?
The questions above provide some examples; here are a few more if you're interested, and the list is by no means complete:
What is the 'fundamental' idea of machine learning for estimating parameters?
Role of Bias in Neural Networks
How to find probability distribution and parameters for real data? (Python 3)
I personally try to find probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc), because this way I can move from blind guessing to making reasonable decisions.
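As one small example of such a probabilistic interpretation, minimizing squared error is the same as maximizing likelihood under a Gaussian density, and minimizing absolute error is the same as maximizing likelihood under a Laplace density. The sketch below uses made-up temperature readings with one outlier (the data and grid resolution are assumptions) to show that the Gaussian fit is pulled toward the outlier, while the Laplace fit stays at the median.

```python
import math
import statistics

# Hypothetical temperature readings with one outlier.
data = [20.1, 19.8, 20.4, 20.0, 35.0]


def gaussian_nll(mu, xs, sigma=1.0):
    """Negative log-likelihood of xs under a Normal(mu, sigma) density."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


def laplace_nll(mu, xs, b=1.0):
    """Negative log-likelihood of xs under a Laplace(mu, b) density."""
    return sum(math.log(2 * b) + abs(x - mu) / b for x in xs)


# Scan candidate predictions on a 0.01 grid between the smallest and largest reading.
num_points = int((max(data) - min(data)) / 0.01) + 1
candidates = [min(data) + i * 0.01 for i in range(num_points)]

best_gaussian = min(candidates, key=lambda mu: gaussian_nll(mu, data))
best_laplace = min(candidates, key=lambda mu: laplace_nll(mu, data))

mean, median = statistics.mean(data), statistics.median(data)
print("Gaussian fit:", best_gaussian, "mean:", mean)
print("Laplace fit: ", best_laplace, "median:", median)
```

So choosing a loss function amounts to choosing which density you believe the errors follow, which is a more principled question than trying losses at random.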
Reading
This is very opinion-based, but at least a few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R, and Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start; there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.

News Article Categorization (Subject / Entity Analysis via NLP?); Preferably in Node.js

Objective: a node.js function that can be passed a news article (title, text, tags, etc.) and will return a category for that article ("Technology", "Fashion", "Food", etc.)
I'm not picky about exactly what categories are returned, as long as the list of possible results is finite and reasonable (10-50).
There are Web APIs that do this (eg, alchemy), but I'd prefer not to incur the extra cost (both in terms of external HTTP requests and also $$) if possible.
I've had a look at the node module "natural". I'm a bit new to NLP, but it seems like maybe I could achieve this by training a BayesClassifier on a reasonable word list. Does this seem like a good/logical approach? Can you think of anything better?
I don't know if you are still looking for an answer, but let me put in my two cents for anyone who happens to come back to this question.
Having worked in NLP, I would suggest you look into the following approach to solve the problem.
Don't look for a single-package solution. There are great packages out there for lots of things, no doubt. But when it comes to active research areas like NLP, ML and optimization, the tools tend to be at least 3 or 4 iterations behind what's there in academia.
Coming to the core problem. What you want to achieve is text classification.
The simplest way to achieve this would be an SVM multiclass classifier.
Simplest yes, but also with very very (see the double stress) reasonable classification accuracy, runtime performance and ease of use.
The thing you would need to work on is the feature set used to represent your news article/text/tags. You could use a bag-of-words model and add named entities as additional features. You can also use the article's location/time as features (though for a simple category classification this might not give you much improvement).
The bottom line is that SVMs work great, they have multiple implementations, and at runtime you don't really need much ML machinery.
Feature engineering, on the other hand, is very task-specific. But given a basic set of features and good labelled data, you can train a very decent classifier.
Here are some resources for you:
http://svmlight.joachims.org/
SVM multiclass is what you would be interested in.
And here is a tutorial by SVM zen himself!
http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
I don't know about the stability of this, but from the code it's a binary SVM classifier, which means that if you have a known set of N tags you want to classify the text into, you will have to train N binary SVM classifiers, one for each of the N category tags.
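Here is a minimal sketch of that "N binary classifiers" setup with bag-of-words features. It is written in Python rather than Node.js, the tiny corpus is made up, and the binary SVM is trained with plain subgradient descent on the hinge loss rather than with SVMlight's solver, so treat it as an illustration of the idea and not a production classifier.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def train_binary_svm(features, labels, epochs=50, lr=0.1, reg=0.01):
    """Linear SVM trained by subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. Returns (weights, bias).
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # L2 regularization shrinks the weights on every step.
            weights = [w - lr * reg * w for w in weights]
            if margin < 1:
                # Hinge loss is active: push the boundary away from this example.
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def train_one_vs_rest(features, categories):
    """Train one binary SVM per category (that category vs. all others)."""
    return {
        category: train_binary_svm(
            features, [1 if c == category else -1 for c in categories]
        )
        for category in sorted(set(categories))
    }


def classify(models, x):
    """Pick the category whose binary SVM gives the highest score."""
    def score(category):
        weights, bias = models[category]
        return sum(w * v for w, v in zip(weights, x)) + bias
    return max(models, key=score)


# Hypothetical, tiny labelled corpus; a real classifier needs far more data.
articles = [
    ("new phone chip released by startup", "Technology"),
    ("software update improves phone battery", "Technology"),
    ("designer shows spring dress collection", "Fashion"),
    ("runway dress trends this spring", "Fashion"),
    ("chef shares pasta recipe", "Food"),
    ("restaurant serves fresh pasta dish", "Food"),
]
vocabulary = sorted({word for text, _ in articles for word in text.lower().split()})
features = [bag_of_words(text, vocabulary) for text, _ in articles]
models = train_one_vs_rest(features, [category for _, category in articles])

query = bag_of_words("startup released phone software update", vocabulary)
prediction = classify(models, query)
print(prediction)
```

At prediction time each model is just a weight vector and a bias, which is why so little machinery is needed once training is done.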
Hope this helps.
