Should Azure Face API's Large Face List be trained every time a new face is added to it? - azure

I use Microsoft Cognitive Services Face API for a face recognition project, where users keep adding faces over a period of time. Previously, the faces were stored in a "Face List". I am shifting the faces to a "Large Face List" now. However, it requires a training call, which "Face Lists" did not require.
I am unable to find any documentation that says whether we have to train it once, or train it every time a face is added.

This is not stated in the REST reference for the Face API, but it is stated at the very beginning of the Face API documentation itself:
To enable Face search performance for Identification and FindSimilar
in large scale, introduce a Train operation to preprocess the
LargeFaceList and LargePersonGroup. The training time varies from
seconds to about half an hour based on the actual capacity. During the
training period, it's possible to perform Identification and
FindSimilar if a successful training operating was done before. The
drawback is that the new added persons and faces don't appear in the
result until a new post migration to large-scale training is completed.
This means you need to train again whenever faces have been added, because newly added faces do not appear in results until a training run has completed. LargeFaceList is meant for large-scale use (up to 1,000,000 faces), so if you don't need that capacity, you might want to stay with FaceList (up to 1,000 faces), which doesn't require training at all.
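The add-then-train cycle described above can be sketched as follows. This is a minimal sketch, not the official client: the URL paths reflect my understanding of the v1.0 REST reference and should be checked against it, and `http_post` and `http_get` are hypothetical callables that you would implement around your HTTP client (sending the `Ocp-Apim-Subscription-Key` header and returning parsed JSON).

```python
import time


def train_url(endpoint, list_id):
    """URL of the Train operation for a Large Face List (path assumed from the v1.0 REST reference)."""
    return f"{endpoint.rstrip('/')}/face/v1.0/largefacelists/{list_id}/train"


def training_status_url(endpoint, list_id):
    """URL of the training-status operation for a Large Face List (path assumed likewise)."""
    return f"{endpoint.rstrip('/')}/face/v1.0/largefacelists/{list_id}/training"


def train_and_wait(endpoint, list_id, http_post, http_get, poll_seconds=1.0, max_polls=60):
    """Start training, then poll until the service reports a final status.

    http_post(url) and http_get(url) are hypothetical helpers supplied by the caller;
    http_get must return the decoded JSON body as a dict with a "status" key.
    """
    http_post(train_url(endpoint, list_id))
    for _ in range(max_polls):
        status = http_get(training_status_url(endpoint, list_id))["status"]
        if status in ("succeeded", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")
```

Since training time grows with the size of the list, it may be cheaper to add a batch of faces and call `train_and_wait` once afterwards, rather than after every single face.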


Azure Custom Vision model retrain

We use the Azure Custom Vision service to detect products on store shelves. This works pretty well, but we can't understand why each subsequent training iteration makes the predictions worse. Does the service create a model from scratch in each iteration, or does it retrain the same model?
Whether you're using the classifier or the object detector, each time you train, you create a new iteration with its own updated performance metrics.
That means that each iteration in a project is independent, built on different sets of training images.
To maintain high performance, don't delete existing images from the previous iteration before retraining: retraining creates a new model based on the images you currently have tagged.
This is also stated in the Custom Vision documentation for both the classifier and the object detector.

Is grid search applicable to TFF and FL?

I'm currently experimenting with TFF and image classification (the Federated Learning for Image Classification tutorial, on EMNIST).
I'm looking at hyperparameters for the model: learning rate and optimizer. Is grid search a good approach here? In a real-world scenario, would you simply sample clients/devices from the overall domain? If so, would I have to fix my client samples first before doing a grid search, and in that case does a grid search make sense?
What would be a typical real-world way of selecting parameters? Is it more of a heuristic approach?
Colin . . .
I think there is still a lot of open research in these areas for Federated Learning.
Page 6 of [link] describes a cross-device and a cross-silo setting for federated learning.
In cross-device settings, the population is generally very large (hundreds of thousands or millions) and participants are generally seen only once during the entire training process. In this setting, [link] demonstrates that hyper-parameters such as the client learning rate play an outsized role in determining the speed of model convergence and the final model accuracy. To demonstrate that finding, we first performed a large coarse grid search to identify promising regions of the hyper-parameter space, and then ran finer grids in those regions. However, this can be expensive depending on the compute resources available for simulation, because the training process must be run to completion to understand these effects.
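The coarse-then-fine grid search described above could look roughly like this. It is a minimal sketch under stated assumptions: `evaluate` is a hypothetical callable that would run one full federated simulation and return a final score (higher is better), the server learning rate is included only as an illustrative second hyper-parameter, and `toy_score` is a made-up stand-in so the example runs without TFF.

```python
import itertools
import math


def log_grid(low, high, num):
    """Return `num` values spaced evenly on a log10 scale between low and high."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


def grid_search(evaluate, client_lrs, server_lrs):
    """Score every (client_lr, server_lr) pair and return (best_score, best_pair)."""
    scored = [(evaluate(c, s), (c, s))
              for c, s in itertools.product(client_lrs, server_lrs)]
    return max(scored, key=lambda item: item[0])


def coarse_to_fine(evaluate, low=1e-4, high=1.0, coarse_points=5, fine_points=5):
    """Run a coarse log-spaced grid, then a finer grid around the coarse winner."""
    coarse = log_grid(low, high, coarse_points)
    coarse_best = grid_search(evaluate, coarse, coarse)
    ratio = coarse[1] / coarse[0]  # spacing between neighbouring coarse points
    best_c, best_s = coarse_best[1]
    fine_best = grid_search(
        evaluate,
        log_grid(best_c / ratio, best_c * ratio, fine_points),
        log_grid(best_s / ratio, best_s * ratio, fine_points),
    )
    return max(coarse_best, fine_best, key=lambda item: item[0])


def toy_score(client_lr, server_lr):
    """Hypothetical stand-in for a full training run; peaks at 10**-1.5 and 10**-0.5."""
    return -((math.log10(client_lr) + 1.5) ** 2 + (math.log10(server_lr) + 0.5) ** 2)


best_score, (best_client_lr, best_server_lr) = coarse_to_fine(toy_score)
```

With 5 coarse and 5 fine points per axis this already costs 50 full training runs, which is the expense the paragraph above warns about.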
It might be possible to view federated learning as very large mini-batch SGD. In fact, the FedSGD algorithm in [link] is exactly this. In this regime, re-using theory from centralized model training may be fruitful.
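A small numeric check of the mini-batch view: if every client computes the mean gradient on its own data and the server averages those gradients weighted by client data size, the result equals the gradient over the pooled data. The model (a single scalar weight with squared loss) and the data below are made up for illustration, and the sketch assumes all clients participate in the round.

```python
def mean_gradient(w, data):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over (x, y) pairs, for a scalar weight w."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def fedsgd_gradient(w, clients):
    """Average the per-client gradients, weighting each client by its share of the data."""
    total = sum(len(data) for data in clients)
    return sum(len(data) / total * mean_gradient(w, data) for data in clients)


# Three hypothetical clients with different amounts of local data.
clients = [
    [(1.0, 2.0), (2.0, 3.9)],
    [(3.0, 6.1)],
    [(4.0, 8.2), (5.0, 9.8), (6.0, 12.1)],
]
pooled = [pair for data in clients for pair in data]

w = 0.5
federated = fedsgd_gradient(w, clients)
centralized = mean_gradient(w, pooled)
```

Here `federated` and `centralized` agree up to floating-point rounding, so one such round is one large-batch SGD step.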
Finally, [link] describes a system used at Google for federated learning, and includes a short discussion of hyper-parameter exploration.

predicting next event from averaging sequences

I am pretty new to ML, so I am having some difficulty understanding how I could use Spark's machine learning libraries with time series data that represents a sequence of events.
I have a table that contains this info:
StepN#, element_id, Session_id
Here StepN# is the position at which each element appears in the sequence, element_id is the element that was clicked, and Session_id is the user session in which this happened.
The table consists of multiple sessions, each with multiple sequence elements, i.e. one session spans multiple rows. Each session also has the same starting and ending point.
My objective is to train a model that uses the observed element sequences to predict the element most likely to be clicked next. That is, I need to predict the next event given the previous events.
(In other words, I need to average users' click behavior for a specific workflow so that the model can predict the most relevant next click based on that average.)
From the papers and examples I find online, I understand that this makes sense when a single sequence of events is used as the input for training the model.
In my case, though, I have multiple sessions/instances of events (all starting at the same point), and I would like to train an averaging model. I find it challenging to understand how that could be approached using, for example, an HMM in Spark. Is there any practical example or tutorial that covers this case?
Thank you for spending the time to read my post. Any ideas would be appreciated!
This can also be solved with frequent pattern mining.
With this approach, you find items that frequently occur together. In the first step, the model learns which patterns are frequent. Then, in the prediction step, the model looks at the events seen so far and predicts the events that most commonly go with them.
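To make the learn-then-predict idea concrete, here is a deliberately simpler stand-in than a real frequent pattern miner: it counts, across all sessions, which element most often directly follows each element. It is plain Python rather than Spark, and the rows are made-up examples in the (StepN#, element_id, Session_id) layout from the question.

```python
from collections import Counter, defaultdict


def build_transitions(rows):
    """Count how often each element is directly followed by each other element.

    rows: iterable of (step_number, element_id, session_id) tuples.
    """
    sessions = defaultdict(list)
    for step, element, session in rows:
        sessions[session].append((step, element))

    transitions = defaultdict(Counter)
    for events in sessions.values():
        ordered = [element for _, element in sorted(events)]  # order by step number
        for current, following in zip(ordered, ordered[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, element):
    """Return the most frequent follower of `element`, or None if it was never followed."""
    followers = transitions.get(element)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical click data: three sessions that all start at "home" and end at "checkout".
rows = [
    (1, "home", "s1"), (2, "search", "s1"), (3, "cart", "s1"), (4, "checkout", "s1"),
    (1, "home", "s2"), (2, "search", "s2"), (3, "product", "s2"), (4, "cart", "s2"),
    (5, "checkout", "s2"),
    (1, "home", "s3"), (2, "search", "s3"), (3, "product", "s3"), (4, "checkout", "s3"),
]
transitions = build_transitions(rows)
```

This only looks one step back. Conditioning on longer prefixes is where a sequential pattern miner would come in.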

What is an appropriate training set size for sentiment analysis?

I'm looking to use some tweets about measles and the MMR vaccine to see how sentiment about vaccination changes over time. I plan on creating the training set from the corpus of data I currently have (unless someone has a recommendation on where I can get similar data).
I would like to classify a tweet as either: Pro-vaccine, Anti-Vaccine, or Neither (these would be factual tweets about outbreaks).
So the question is: how big is big enough? I want to avoid problems of overfitting (so I'll do a train/test split), but as I include more and more tweets, the number of features that need to be learned increases dramatically.
I was thinking 1000 tweets (333 of each). Any input is appreciated here, and if you could recommend some resources, that would be great too.
More is always better. 1000 tweets for a 3-way split seems quite ambitious; I would consider even 1000 per class quite low for a 3-way split on tweets. Label as many as you can within a feasible amount of time.
Also, it might be worth taking a cascaded approach (especially with so little data): first label a set as vaccine vs non-vaccine, and then, within the vaccine subset, label pro vs anti.
In my experience, modelling a catch-all "neutral" class that contains everything not explicitly "pro" or "anti" is quite difficult because there is so much noise. Especially with simpler models such as Naive Bayes, I have found the cascaded approach to work quite well.
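A minimal sketch of such a cascade, using a tiny hand-written multinomial Naive Bayes so that it runs without extra libraries. The class names, the six example tweets, and the mapping of the first stage onto the question's "Neither" class are my assumptions for illustration. In practice you would use a proper library and far more data.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_cascade(texts, labels):
    """Stage 1 separates stance-bearing tweets from 'neither'; stage 2 separates pro from anti."""
    stage1_labels = ["neither" if l == "neither" else "stance" for l in labels]
    stage1 = TinyNaiveBayes().fit(texts, stage1_labels)
    stance = [(t, l) for t, l in zip(texts, labels) if l != "neither"]
    stage2 = TinyNaiveBayes().fit([t for t, _ in stance], [l for _, l in stance])
    return stage1, stage2


def cascade_predict(stage1, stage2, text):
    """Only tweets that pass stage 1 are sent on to the pro/anti classifier."""
    if stage1.predict(text) == "neither":
        return "neither"
    return stage2.predict(text)


# Made-up training examples, two per class.
texts = [
    "vaccines save lives get your shot",
    "so glad my kids got the mmr shot",
    "vaccines are dangerous do not trust them",
    "refuse the mmr shot it is dangerous",
    "measles outbreak reported in county",
    "health officials confirm new measles cases",
]
labels = ["pro", "pro", "anti", "anti", "neither", "neither"]
stage1, stage2 = train_cascade(texts, labels)
```

The second classifier never sees the noisy catch-all class, which is the point of the cascade.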

Incremental training of ALS model

I'm trying to find out if it is possible to have "incremental training" on data using MLlib in Apache Spark.
My platform is Prediction IO, and it's basically a wrapper for Spark (MLlib), HBase, ElasticSearch and some other Restful parts.
In my app data "events" are inserted in real-time, but to get updated prediction results I need to "pio train" and "pio deploy". This takes some time and the server goes offline during the redeploy.
I'm trying to figure out if I can do incremental training during the "predict" phase, but cannot find an answer.
I imagine you are using Spark MLlib's ALS model, which performs matrix factorization. The result of the model is two matrices: a user-features matrix and an item-features matrix.
Assume we receive a stream of ratings (or transactions, in the implicit case). A truly online (100%) update of this model would update both matrices for every new rating by triggering a full retrain of the ALS model on the entire data set plus the new rating. The limitation is that running the full ALS model is computationally expensive, and the incoming stream may be frequent, so it would trigger a full retrain too often.
Knowing this, we can look for alternatives. A single rating should not change the matrices much, and there are incremental optimization approaches, for example SGD. There is an interesting (still experimental) library written for the explicit-ratings case which does incremental updates for each batch of a DStream: [link]
The idea behind an incremental approach such as SGD is that, for a minimization problem, a sufficiently small step along the negative gradient moves toward a minimum of the error function. So even if we apply an update for a single new rating, touching only this user's row of the user-feature matrix and only this item's row of the item-feature matrix, the step still moves toward a minimum. It is an approximation, but it moves in the right direction.
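The single-rating update described above can be sketched as follows. This is plain Python on two small vectors, not Spark or PredictionIO code. The learning rate, regularization strength, and example values are assumptions chosen for illustration, and the loss is the usual regularized squared error 0.5 * (r - u·v)^2 + 0.5 * reg * (|u|^2 + |v|^2).

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sgd_update(user_vec, item_vec, rating, lr=0.01, reg=0.05):
    """One SGD step on the factors of a single (user, item, rating) observation.

    Only this user's and this item's vectors change; every other row of the
    two factor matrices is left untouched.
    """
    err = rating - dot(user_vec, item_vec)
    # The gradient of the loss w.r.t. u is -(err * v) + reg * u; step against it.
    new_user = [u + lr * (err * v - reg * u) for u, v in zip(user_vec, item_vec)]
    new_item = [v + lr * (err * u - reg * v) for u, v in zip(user_vec, item_vec)]
    return new_user, new_item


# Hypothetical current factors for one user and one item, plus a new rating of 4.
user_vec, item_vec, rating = [0.1, 0.2], [0.3, 0.4], 4.0
error_before = abs(rating - dot(user_vec, item_vec))
user_vec, item_vec = sgd_update(user_vec, item_vec, rating)
error_after = abs(rating - dot(user_vec, item_vec))
```

For this example `error_after` is smaller than `error_before`. With a learning rate that is too large, a single step can overshoot instead.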
The other problem comes from Spark itself and its distributed nature. Ideally the updates would be applied sequentially, one per incoming rating, but Spark treats the incoming stream as batches, each distributed as an RDD, so the updates are applied to a whole batch with no guarantee of ordering.
In more detail: if you are using Prediction.IO, for example, you can do offline training with the built-in train and deploy functions. If you want online updates, you will have to access both matrices for each batch of the stream, run the SGD updates, and then have the new model deployed. This functionality is not in Prediction.IO, so you would have to build it yourself.
Interesting notes for SGD updates:
To update your model near-online (I write "near" because, let's face it, a true online update is impossible), you can use the fold-in technique, e.g.:
Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems.
Or you can look at the code of:
Oryx - a framework built on the Lambda Architecture paradigm. It should support updates with fold-in of new users/items.
This is part of my answer to a similar question, where both problems (near-online training and handling new users/items) were mixed.
