I am trying to develop a real-time face recognition model using face-api.js with Node and Express. I have tried to recognize faces using the code below.
https://gist.github.com/Dannybrown2710/5e30c063da936ee947d8fff474f9f57b
But it seems to take about a minute to train the model and recognize a face. Is there any way to improve recognition speed and get real-time results, as in the browser?
The code is loading and training the models on each request; why not load them once at startup, like so:
let loadModelsPromise = loadModels();

async function detectFace(image) {
  await loadModelsPromise;
  ...
}
This way the first call may be delayed; however, each subsequent call should be very quick. I'm presuming that the uploaded face is not added to the training data.
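With face-api.js on Node, that could look roughly like the sketch below. The model set, the ./models path, and the detection chain are assumptions on my part, since they may differ from the gist:

// Sketch: load the models from disk once at startup and reuse the same promise on every request.
const faceapi = require('face-api.js');
const canvas = require('canvas');

const { Canvas, Image, ImageData } = canvas;
faceapi.env.monkeyPatch({ Canvas, Image, ImageData });

const loadModelsPromise = Promise.all([
  faceapi.nets.ssdMobilenetv1.loadFromDisk('./models'),
  faceapi.nets.faceLandmark68Net.loadFromDisk('./models'),
  faceapi.nets.faceRecognitionNet.loadFromDisk('./models'),
]);

async function detectFace(imagePath) {
  await loadModelsPromise; // resolves instantly on every call after the first
  const img = await canvas.loadImage(imagePath);
  return faceapi
    .detectSingleFace(img)
    .withFaceLandmarks()
    .withFaceDescriptor();
}

Any matching against known faces (e.g. a faceapi.FaceMatcher built from stored descriptors) can then happen per request without touching the model files again.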
I have a backend running on Strapi v3 (Node.js); however, the answer doesn't need to be Strapi-specific, Node.js-specific is good enough.
We are currently running in one region and are expanding to a second. A function such as calculateFee() will then have a different implementation depending on the region the app is running in.
We don't want to branch inside the function itself, something like if (region == regionA) { doA() } else { doB() }, since that is very hard to maintain and to extend to yet another region or more.
What I'm thinking of is having a regionA.js containing region A's implementation of calculateFee(), and a regionB.js containing region B's. Then at bootstrap we pre-load the current region's implementation of the function, and the rest of the app behaves as usual without changes, roughly as sketched below.
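This is a rough sketch of what I mean (the file layout, the REGION environment variable, and the fee numbers are purely illustrative):

// regions/regionA.js -- region A's implementation
module.exports = {
  calculateFee(order) {
    return order.amount * 0.02; // made-up rule for illustration
  },
};

// region-loader.js -- picks one implementation once at bootstrap
const region = process.env.REGION || 'regionA';
module.exports = require(`./regions/${region}.js`);

// anywhere else in the app
const { calculateFee } = require('./region-loader');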
However, I'm not very experienced with Node.js or Strapi and am stuck on how to actually wire this up. Many thanks in advance!
I have used the code below to perform text detection on an image.
const client = new vision.ImageAnnotatorClient({ credentials });
const [result] = await client.textDetection(path);
console.log(result.textAnnotations);
The intended result is BBTPB9999Q, but the program output is BBTPB9999C.
Is there any way to improve the accuracy?
In my experience the quality of the picture matters, so I think improving the quality of the analyzed image is the only way to improve detection results for strings like the one in your example (serial numbers and the like). At one point I had the idea of converting images beforehand, e.g. making them monochrome, but I never got test results worth implementing. You may still try this, for example along the lines of the sketch below.
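A quick preprocessing pass before sending the image could look like this. It is only a sketch and assumes the sharp library, which is not part of your current setup; the parameters are guesses to experiment with:

const sharp = require('sharp');

// Grayscale + upscale + contrast normalisation before OCR; purely experimental.
async function preprocess(inputPath, outputPath) {
  await sharp(inputPath)
    .grayscale()
    .resize({ width: 2000, withoutEnlargement: false })
    .normalise()
    .toFile(outputPath);
  return outputPath;
}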
At the moment, in Google Cloud Vision, the only way to improve the result for the same picture is to specify the language; however, this will only work when you want to detect text in an already known language.
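Building on the code from the question, the hint goes into the request's imageContext (a sketch; 'en' is just an example value, and credentials/path are the same as in your snippet):

const vision = require('@google-cloud/vision');

const client = new vision.ImageAnnotatorClient({ credentials });

const [result] = await client.textDetection({
  image: { source: { filename: path } },
  imageContext: { languageHints: ['en'] }, // hint the expected language(s)
});
console.log(result.textAnnotations);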
I am working on a scraper that is growing bigger and bigger and I'm worried about making the wrong design choices.
I have never done more than short scripts in Python, and I'm at a loss as to how to design a project of larger proportions.
The scraper retrieves data from different, but similar themed websites, so an implementation for each site is needed.
The desired raw text of each website is then put through a parser which searches for the required values.
After retrieving the values, they should be stored in a normalized (3NF) database.
In its final evolution the scraper should run on a cloud service and check all the different sites periodically for new data. Speed and performance are not of the highest importance, but they are desirable. Most importantly, the required data should be retrieved without unnecessary duplication of code.
I'm using the Selenium webdriver and have the driver object implemented as a singleton, so all the requests are done by the same driver object. The website text is then part of the state of that object.
All the other functionality is currently modelled as functions, everything in one file. To add another website to the project, I first copied the script and just changed the retrieval part. Since it soon occurred to me that that's pretty clumsy, I wanted to ask for design recommendations.
Would you rather implement a Retriever base class and inherit from it for every website, or is there an even better way to go?
Many thanks for any ideas!
I want to use the model to predict scores in a map lambda function in PySpark.
from keras.models import load_model
import numpy as np
from pyspark.sql import Row

def inference(user_embed, item_embed):
    feats = user_embed + item_embed
    dnn_model = load_model("best_model.h5")  # the model is (re)loaded on every call
    infer = dnn_model.predict(np.array([feats]), verbose=0, steps=1)
    return infer

iu_score = iu.map(lambda x: Row(userid=x.userid, entryid=x.entryid, score=inference(x.user_embed, x.item_embed)))
The job runs extremely slowly and gets stuck at the final stage shortly after the code starts running.
[Stage 119:==================================================>(4048 + 2) / 4050]
In htop, only 2 of the 80 cores are under full load; the other cores seem to be idle.
So what should I do to make the model predict in parallel? The iu RDD has 300 million records, so efficiency is important for me.
Thanks.
I have set verbose=1 and the predict log appears, but it seems that the predictions run one by one instead of in parallel.
While writing this answer I researched a little and found this question interesting.
First, if efficiency is really important, invest a little time in recoding the whole thing without Keras. You can still use TensorFlow's high-level API (Models) and, with a little effort, extract the parameters and assign them to the new model. Quite apart from how hard it is to see through all the massive wrapper implementations in the framework (is TensorFlow not a rich enough framework on its own?), you will most likely run into backward-compatibility problems when upgrading. Really not recommended for production.
Having said that, can you inspect what exactly the problem is? For instance, are you using GPUs? Maybe they are overloaded? Can you wrap the whole thing so it does not exceed some capacity, and use a prioritizing system (a simple queue is enough if there are no priorities)? You can also check whether you really terminate TensorFlow's sessions, or whether the same machine is running many models that interfere with one another. Many issues can cause this phenomenon; it would be great to have more details.
Regarding the parallel computation: you didn't implement anything that actually opens a thread or a process for these models, so I suspect that PySpark just can't handle the whole thing on its own. Maybe the implementation (honestly, I didn't read the whole PySpark documentation) assumes that the dispatched function runs fast enough and doesn't distribute it as it should. PySpark is essentially a sophisticated implementation of map-reduce principles: the dispatched function plays the role of a mapping function in a single step, which can be problematic in your case. Even though it is passed as a lambda expression, you should inspect more carefully which instances are slow and on which machines they are running.
I strongly recommend you do as follows:
Go to the official TensorFlow Serving deployment docs and read how to really deploy a model. There is a gRPC protocol for communicating with the deployed models, and also a RESTful API. Then, from your PySpark code, you can wrap the calls and connect to the served model. You can create a pool of as many model servers as you want, manage it from PySpark, distribute the computation over the network, and from there the sky and the CPUs/GPUs/TPUs are the limit (I'm still skeptical about the sky).
It would be great to get an update from you about the results :) You made me curious.
I wish you the best with this issue. Great question.
I am relatively new to Node.js and I am trying to get more familiar with it by writing a simple module. The module's purpose is to take an id, scrape a website, and return an array of objects with the data.
The data on the website is scattered across pages, where every page is accessed by a different index number in the URI. I've defined a function that takes the id and page_number, scrapes the website via http.request() for that page_number, and on the end event passes the data to another function that applies some regular expressions to extract the data in a structured way.
In order for the module to have complete functionality, all the available page_nums of the website should be scraped.
Is it OK, by Node.js style/philosophy, to use a standard for() loop to call the scraping function for every page, aggregate the results of each call, and then return them all at once from the exported function?
EDIT
I figured out a solution based on help from #node.js on freenode. You can find the working code at http://github.com/attheodo/katina_node
Thank you all for the comments.
The common method, if you don't want to bother with one of the libraries mentioned by #ControlAltDel, is to set a counter equal to the number of pages. As each page is processed (asynchronously, so you don't know in what order, nor do you care), you decrement the counter. When the counter hits zero, you know you've processed all pages and can move on to the next part of the process.
The problem you will probably encounter is recombining all of the aggregated results. There are several libraries out there that can help, including Async and Step. Or you can use a promises library like Fibers.Promise, but the latter is not really in keeping with Node philosophy and requires direct changes/additions to the Node executable.
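A minimal sketch of that counter approach (scrapePage stands in for the http.request-based function described in the question; error handling is simplified):

function scrapeAllPages(id, pageCount, done) {
  const results = [];
  let remaining = pageCount;

  for (let page = 1; page <= pageCount; page++) {
    scrapePage(id, page, (err, data) => {
      if (err) return done(err);     // simplified: a real version should guard against calling done twice
      results[page - 1] = data;      // keep results in page order
      if (--remaining === 0) {
        done(null, results);         // all pages processed, hand back the aggregate
      }
    });
  }
}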
With the helpful comments from #node.js on Freenode I managed to find a solution by sequentially calling the scraping function and attaching callbacks, as Node.js philosophy requires.
You can find the code here: https://github.com/attheodo/katina_node/blob/master/lib/katina.js
The code block of interest lies between lines 87 and 114.
Thank you all