I have a bit of a complicated process in Power Automate. I'm trying to parse user-uploaded screenshots and categorize the extracted data into different variables. At first, an obvious choice seemed to be building and training an AI model, but the issue is that the data in the screenshots can vary (i.e. some images contain more rows, some don't contain the relevant data at all, and the data can be located in different regions of the screenshot).
Some examples of images a user can upload are as follows: (i) Samsung 1 Metrics, (ii) Samsung 2 Metrics, (iii) iPhone Metrics.
My attempt was to perform OCR on the uploaded screenshot and then do string parsing, so I put together the following flow (Flow Diagram), with the substring parsing done as shown here:
Substring parsing
Basically, I'm performing OCR on the screenshot and then searching for a substring that corresponds to the values I'm interested in. I'm unsure whether this is the best way to do this, as it isn't dynamic (i.e. I have to offset the substring index by a fixed number of characters). Any advice is greatly appreciated.
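For reference, the kind of parsing I'd like instead of fixed offsets would look something like this in plain Python (the label names below are made-up stand-ins for whatever actually appears in the screenshots; inside the flow itself the same idea would be expressed with indexOf/substring expressions):

```python
import re

# Raw OCR output from the screenshot. The labels below are hypothetical
# stand-ins for whatever labels actually appear in the uploaded images.
ocr_text = """
Screen time 4h 12m
Battery usage 37%
Data used 1.8 GB
"""

def extract_value(text, label):
    """Return whatever follows `label` on the same line, or None if the label is absent."""
    match = re.search(rf"{re.escape(label)}\s*[:\-]?\s*(.+)", text, re.IGNORECASE)
    return match.group(1).strip() if match else None

screen_time = extract_value(ocr_text, "Screen time")    # "4h 12m"
battery     = extract_value(ocr_text, "Battery usage")  # "37%"
print(screen_time, battery)
```

Anchoring on the label text rather than a character offset means extra rows or shifted layouts don't break the extraction, as long as the label itself is recognized by the OCR step.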
I believe you should be able to train a custom Form Processing model to extract the information you need. You can use two different collections in your training dataset so the model can recognize both the Samsung and iPhone layouts.
All you'll need is 5 samples for each collection and you should be good to go.
When performing image co-registration of multiple subjects, how should we select the reference image?
Can a randomly selected image from one dataset be the reference image for an image from the other dataset?
If we do that, should all the images belonging to the reference image dataset be co-registered with the reference image as well?
I couldn't find any material in this area. Could someone please advise?
I'm not sure exactly what you mean by the term "dataset", but I will assume you are asking about co-registering multiple images from different patients (i.e. multiple 3D images per subject).
To answer your questions:
If there are no obvious choices about which image is best, then a random choice is fine. If you have e.g. a CT and an MRI for each subject, then co-registration using the CT images is likely going to give you better results because of intrinsic image characteristics (e.g. less distortion, image value linked to physical quantity).
I suppose that depends on what you want to do, but if it is important to have all imaging data in the same co-registered reference space then yes.
Another option is to try and generate an average image, and then use that as a reference to register other images to. Without more information about what you are trying to achieve it's hard to give any more specific advice.
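In case a concrete starting point helps, here is a rough sketch of registering one subject's volume onto a chosen reference with SimpleITK; the file names are placeholders and the metric/optimizer settings would need tuning for real data:

```python
import SimpleITK as sitk

# Placeholder file names: one image chosen as the reference (fixed) and
# another subject's image to be registered onto it (moving).
fixed = sitk.ReadImage("reference_subject.nii.gz", sitk.sitkFloat32)
moving = sitk.ReadImage("other_subject.nii.gz", sitk.sitkFloat32)

# Rough initial alignment based on image geometry.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)

registration = sitk.ImageRegistrationMethod()
registration.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration.SetOptimizerAsRegularStepGradientDescent(
    learningRate=2.0, minStep=1e-4, numberOfIterations=200)
registration.SetInitialTransform(initial, inPlace=False)
registration.SetInterpolator(sitk.sitkLinear)

transform = registration.Execute(fixed, moving)

# Resample the moving image into the reference space.
aligned = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0,
                        moving.GetPixelID())
sitk.WriteImage(aligned, "other_subject_in_reference_space.nii.gz")
```

The same call pattern works whether the reference is a randomly chosen subject or an average template image; only the fixed image changes.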
Azure Form Recognizer does not provide the capability of training with fewer than 5 documents in a group. For this reason, I have to replicate the documents available in the group and change their values so I can create a set of 5 documents. The files are in PNG/JPG format. Any help would be great in finding the right tool to do this.
Note: The layout changes by group
Form Recognizer needs 5 documents to train a model, and it's best to use real documents so the model fits the production scenario. If you need to create 5 PNG/JPG files from a single file, you can use an image editor, or even PowerPoint, to place new text over the existing values as an additional layer (in PowerPoint, as text boxes with a white background) and then save them as new images. It's an old-school way of creating sample documents, but since only 5 are needed for training and 1 or 2 for testing, it shouldn't be too complex.
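If you'd rather script it than edit by hand, a small Pillow script can do the same trick of painting a white box over an existing value and writing a new one on top. The paths, coordinates and font below are placeholders you would adjust per document:

```python
from PIL import Image, ImageDraw, ImageFont

# Placeholder paths and coordinates; adjust to the actual document layout.
img = Image.open("original_document.png").convert("RGB")
draw = ImageDraw.Draw(img)

# Cover the existing value with a white rectangle ...
value_box = (420, 310, 560, 340)  # (left, top, right, bottom)
draw.rectangle(value_box, fill="white")

# ... and write a replacement value on top of it.
font = ImageFont.truetype("arial.ttf", 20)  # any font file available locally
draw.text((425, 315), "1,234.56", fill="black", font=font)

img.save("synthetic_document_01.png")
```

Running it a few times with different replacement values gives you the 5 variants the training set requires.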
Is there a way, using Azure Cognitive Services, to compare two images, each containing handwritten signatures or stamps, and come back with a confidence level that the signatures in the scanned images are similar or that the stamp is the right stamp?
Hint:
1- We are "not" looking for a solution to convert printed or handwritten text in an image to a machine-readable format (like OCR, for example).
2- We want to compare two images together and come back with a confidence level that they are similar. For example, in face recognition the Face API is able to take two different images of a person's face and identify that it is the same person. The only difference in our scenario is that instead of dealing with faces we want to deal with handwritten signatures and stamps.
This is not an available feature in Azure Cognitive Services. You'll need to work with a custom computer vision model.
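As one possible starting point for a custom model, a classical computer-vision approach is local feature matching. The sketch below compares two signature (or stamp) crops with ORB descriptors in OpenCV and turns the match quality into a rough similarity score; the file names and the distance threshold of 60 are assumptions, and this is nowhere near a production signature verifier:

```python
import cv2

# Placeholder file names for the two scanned signature/stamp crops.
img1 = cv2.imread("signature_reference.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("signature_scanned.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors on both crops.
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-check to drop weak matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Keep only relatively close matches (threshold of 60 chosen arbitrarily here).
good = [m for m in matches if m.distance < 60]

# Crude similarity score: fraction of keypoints that found a good match.
similarity = len(good) / max(min(len(kp1), len(kp2)), 1)
print(f"similarity ~ {similarity:.2f}")
```

For serious signature verification you would more likely train a Siamese-style model on labeled genuine/forged pairs, but feature matching is a cheap baseline to validate the idea.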
Brief introduction:
I'm trying to extract certain pieces of text from an image that contains a lot of text.
Just thinking about it, there should be at least two ways to handle this problem:
One way is to first segment the image by text areas: for example, train a neural network with a bunch of sample images that contain the sample texts, let the trained model locate the corresponding text areas in the real image, crop those areas out of the image and save them, and then use, for instance, pytesseract to convert each cropped image to a string.
The other way is to reverse the process: first convert the image into strings, then train the neural network on real sample texts, and let the trained model find the corresponding texts among the strings converted from the images.
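To make that second route concrete, a minimal sketch of it might look like the following, assuming pytesseract/Tesseract are installed and using a naive keyword filter as a stand-in for a trained model:

```python
from PIL import Image
import pytesseract

# Placeholder screenshot path; the Tesseract binary must be installed on the system.
text = pytesseract.image_to_string(Image.open("forum_page_01.png"))

# Naive stand-in for a trained text classifier: keep paragraphs that
# mention any of a few hypothetical topic keywords.
keywords = {"election", "policy", "government"}
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
political = [p for p in paragraphs if any(k in p.lower() for k in keywords)]

for p in political:
    print(p, "\n---")
```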
So, my questions are listed below:
Can this problem be solved without training a neural network? Would that be more efficient than a NN, in terms of the time taken to run the program and the accuracy of the results?
Of the two methods I wrote above, which one is better, in terms of the time taken to run the program and the accuracy of the results?
Any other suggestions from experience?
Additional background information if needed:
So, I have a number of groups of screenshots of different web pages, each of which has a lot of text on it. I want to extract certain paragraphs from that large volume of text. The paragraphs I want to extract express similar things but appear in different contexts.
For example, on a large mixed online forum platform, many comments are made on different things: some on mountain landscapes, some on politics, some on science, etc. Since such a platform cannot have only one page, there must be hundreds of pages where countless users post their comments. Now I want to extract the comments on politics specifically from the entire forum, i.e. from all the pages the platform has. So I would use Python + Selenium to scrape the pages and save the screenshots. Now we come back to the questions asked above: what to do now?
Update:
A thought just occurred to me. A NN trained on images that contain text probably cannot give a very accurate location of the wanted text, as the NN might only be looking at arrangements of pixels rather than the words, or even the meaning, that make up the sentences or paragraphs. So maybe the second method, text processing (e.g. NLP), is better in this case?
So, you decided not to parse the text, but to save it as an image and then detect the text from that image.
Text -> Image -> Text
That is the worst-case scenario for parsing web pages.
When dealing with OCR you must expect many problems, such as:
High CPU consumption;
Different fonts;
Hidden elements (like 'See full text');
And the main one - you can't OCR with 100% accuracy.
If you try to create a universal parser that crawls only the required text from any given page without any "garbage", that is an almost utopian idea.
As far as I know, the closest thing is "HTML readability" technology (browsers like Safari and Firefox use it for their reader modes). But I can't say how well it will work with forums; forums are a very particular page format.
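Since you already drive the pages with Selenium, it is usually far cheaper to take the rendered HTML and parse the text directly rather than OCR-ing screenshots. A rough sketch with BeautifulSoup (the URL, CSS selector and keywords are made up; adjust them to the real forum's markup):

```python
from selenium import webdriver
from bs4 import BeautifulSoup

# Hypothetical forum URL and CSS class; adjust to the real site's markup.
driver = webdriver.Chrome()
driver.get("https://example.com/forum/page/1")
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

# Pull comment text straight out of the DOM instead of a screenshot.
comments = [c.get_text(" ", strip=True) for c in soup.select("div.comment-body")]

# Naive keyword filter standing in for a proper topic classifier.
keywords = {"election", "policy", "government"}
political = [c for c in comments if any(k in c.lower() for k in keywords)]
print(len(political), "comments kept")
```

This sidesteps OCR accuracy, fonts and CPU cost entirely; the hard remaining part is the topic filtering, which is a plain NLP/text-classification problem.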
I have a trivial understanding of NLP so please keep things basic.
I would like to run some PDFs at work through a keyword extractor/classifier and build a taxonomy - in the hope of delivering some business intelligence.
For example, given a few thousand PDFs to mine, I would like to determine the markets they apply to (we serve about 5 major industries, each with several minor industries). Each industry and sub-industry has a specific market, and in most cases those deal with OEMs, which in turn deal with models, which further subdivide into component parts, etc.
I would love to crunch these PDFs into a semi-structured (more a graph actually) output like:
Aerospace
    Manufacturing
    Repair
    PT Support
        M250
            C20
            C18
    Distribution
Can text classifiers do that? Is this too specific? How do you teach a system like this that C18 is a "model" from "manufacturer" Rolls Royce in the M250 series, and that "PT SUPPORT" is a sub-component?
I could build this data manually, but it would take forever...
Is there a way I could use a text classifier framework to build something more efficiently than with regex and Python?
Just looking for ideas at this point... I watched a few tutorials on R and Python libraries, but they didn't sound quite like what I am looking for.
OK, let's break your problem into small sub-problems first. I would break the task down as:
Read the PDFs and extract data and metadata from them - take a look at the Apache Tika library.
Any classifier needs training data to be effective - create training data for the text classifier.
Then apply any suitable classification algorithm.
You can also have a look at the Carrot2 clustering engine; it will automatically analyse the data and group the PDFs into different categories.
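To sketch steps 2-3, assuming the PDF text has already been extracted (e.g. with Apache Tika) and that you can hand-label a modest set of examples, a basic scikit-learn pipeline would look roughly like this. The documents and labels below are invented placeholders, and note that a flat classifier like this only assigns labels; building the actual hierarchy/graph would be a separate step:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented placeholder training data: extracted PDF text plus hand-made labels.
docs = [
    "overhaul manual for the M250 turboshaft engine ...",
    "distribution agreement covering spare parts logistics ...",
]
labels = ["Aerospace/Repair", "Aerospace/Distribution"]

# TF-IDF features over unigrams/bigrams feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                      LogisticRegression(max_iter=1000))
model.fit(docs, labels)

print(model.predict(["inspection and repair of C20 gearbox components"]))
```

In practice you would need at least a few dozen labeled documents per category for the predictions to mean anything; the unsupervised Carrot2 route is useful precisely when that labeling effort is too expensive.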