Detect Page Layout using Machine Learning

I am trying to build an application that tags pages, something like HTML pages.
Is there a way to use ML to detect and identify the layout of a page
(headers, tables, footers),
or should it be done using only AI algorithms?
Can anyone help me with this?
Best regards

I used Faster R-CNN, following this tutorial:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
It worked fine for me. I also tried another algorithm, RLSA, to improve the results; you can refer to this as well:
https://pypi.org/project/pythonRLSA/
I can't share a sample of my output because it relates to my business work, but the results are very promising.
Hope this helps.
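
For readers unfamiliar with RLSA (the Run Length Smoothing Algorithm), here is a minimal sketch of the horizontal pass. It does not use the pythonRLSA package linked above. The function name, the 1 = ink / 0 = background convention, and the `max_gap` parameter are assumptions made for illustration. The idea is that short background gaps between ink pixels are filled in, so characters merge into word, line, or block shapes that are easier to detect as layout regions.

```python
def rlsa_horizontal(rows, max_gap):
    """Horizontal run-length smoothing on a binary image.

    rows: list of rows, each a list of 0 (background) or 1 (ink).
    Background runs of length <= max_gap that lie between two ink
    pixels on the same row are filled with ink.
    """
    smoothed = []
    for row in rows:
        new_row = list(row)
        last_ink = None  # index of the most recent ink pixel on this row
        for x, pixel in enumerate(row):
            if pixel == 1:
                gap = x - last_ink - 1 if last_ink is not None else 0
                if 0 < gap <= max_gap:
                    for k in range(last_ink + 1, x):
                        new_row[k] = 1
                last_ink = x
        smoothed.append(new_row)
    return smoothed


# One row: a gap of 2 is filled, a gap of 4 is left alone.
image = [[1, 0, 0, 1, 0, 0, 0, 0, 1, 0]]
result = rlsa_horizontal(image, max_gap=2)
```

A vertical pass can be obtained by transposing the image, applying the same function, and transposing back.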

Related

Does Office 365 image search work? If so, how?

According to Microsoft ("Image Analysis" in https://techcommunity.microsoft.com/t5/Microsoft-SharePoint-Blog/Enrich-your-SharePoint-Content-with-Intelligence-and-Automation/ba-p/194174, from May 21, 2018), we should be able to search for text within images.
Is this working for you/anyone? If so, I would like to know what you had to do to get it to work.
I have a SharePoint modern team site with PNG images that contain clearly readable text...but search will not find anything. I have requested re-indexing.
I have had a Microsoft Support request (#10638094) open since June 27 with this question/issue, and no one--even after escalation--has been able to answer it.
Based on the article above, it appears that "MediaService" column(s) should be added to the library to support this; however, I can find no such columns in the environment (using PnP export to review).
Naomi Moneypenny and Kathrine Hammervold highlighted this functionality at Ignite 2017 (https://channel9.msdn.com/Events/Ignite/Microsoft-Ignite-Orlando-2017/BRK2181, about 27:00), but it doesn't seem to be available/working (at least not for me).
August 24: So, after research and digging yet further, I have an escalated support ticket at Microsoft (#10638094, unsolved), and there are conversations at https://techcommunity.microsoft.com/t5/Intelligent-Search-Discovery/Search-for-words-in-your-images-in-Office-365/ba-p/135703, https://techcommunity.microsoft.com/t5/Microsoft-SharePoint-Blog/Enrich-your-SharePoint-Content-with-Intelligence-and-Automation/bc-p/236625, and "Does Office 365 image search work? If so, how?". I have yet to hear of this functionality working for anyone. I will keep digging, and I will certainly post if I hear anything.

After some digging, it seems that, officially, this was already released at the end of 2017. However, there is no related documentation or official guide for this text-in-image search function.
There are two ways I can think of to perform text-in-image search:
Perform OCR on the image yourself before uploading it, and embed the text in the image metadata.
Use a supported image type, such as TIF (IIRC), for which text in the image is recognized.
In your case, you can upload the image and add another list/library column that contains the text, applying it to the image as metadata.
OneDrive, on the other hand, also has this function. For example, search for something like "cat" and it *should* pull up most of the pictures you have of cats. It is more likely using tags as labels for the image than reading the picture itself.
Also, I believe OneNote indexes recognizable text and handwriting. Maybe this can point you in the right direction.
*Microsoft Azure's Computer Vision offers a service to recognize text in images. Maybe this can help.

"Is this working for you/anyone?" Yes, I responded to this post elsewhere and see it posted here, as well. Unfortunately, I cannot tell you HOW to get it to work or to verify that it is correctly configured. I can only suggest a test for you to see if it is working for you, as it works for me. I have not tested every way in which it could or should work. I have only discovered it working with PNGs I inserted into Wiki Pages in SharePoint Online. Those PNGs are generated using Snag-It to take Screen Captures and I do not see where Snag-It would be doing any OCR on the image to embed anything, etc. OCR is not even in the Snag-It help file, so I believe the PNG files are just simple PNGs. I insert them into the SharePoint Wiki page, which uploads them to the Site Assets library. And, when I search for a word in the image, the image is returned as a result - not the Wiki page. So, suggest you try a simple test of just inserting a PNG with text in it into a Wiki Page and give the index a bit of time to run to see if it works for you.

It seems like the functionality has matured recently. I have been testing it more thoroughly, and I have documented the results in my blog at http://www.collaboration-foundry.com/SharePointImageAnalysis.
Bottom line: It works for me in OneDrive and SharePoint (modern and classic), but I've only seen it work on the out-of-the-box Document content type--which limits custom solutions somewhat.
It's cool functionality when it works. Looking forward to seeing Microsoft build on this.
John

Google Docs: Table of content page numbers

We are currently building an application on the Google Cloud Platform that generates reports in Google Docs. For these reports, it is really important to have a table of contents ... with page numbers. I know this has been a feature request for a few years, and there are add-ons (Paragraph Styles +, which didn't work for us) that provide this, but we are considering building it ourselves. If anybody has a suggestion on how we could start, it would be a great help!
Thanks,

Best bet is to file a feature request on the product forums.
Currently, the only way to do that level of manipulation of a doc to provide a custom TOC is to use Apps Script. It provides enough access to the document structure to build and insert a basic table of contents, but I'm not sure there's enough to do paging correctly (unless you force a page break on every page...). There's no method to answer the question "what page is this element on?"
Hacks like writing to a DOCX and converting don't work, because TOCs are recognized for what they are and show up without page numbers.
Of course you could write a DOCX or PDF with the TOC as you'd like and upload as a blob rather than as a Google Doc. They can still be viewed in Drive and such.
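
If you go the route of generating the file yourself, the TOC is just text you lay out before writing the DOCX or PDF. Below is a minimal sketch of that layout step only, assuming your own report generator already knows the page on which each heading starts. The function name, the `(level, title, page)` tuple shape, and the `width` parameter are hypothetical, and writing the actual DOCX/PDF is not shown.

```python
def format_toc(entries, width=40):
    """Render TOC lines with dot leaders and right-aligned page numbers.

    entries: iterable of (level, title, page) tuples, where level 1 is a
    top-level heading and deeper levels are indented by two spaces each.
    """
    lines = []
    for level, title, page in entries:
        label = "  " * (level - 1) + title
        page_text = str(page)
        # Always leave at least two dots, even for very long titles.
        dots = "." * max(2, width - len(label) - len(page_text) - 2)
        lines.append(f"{label} {dots} {page_text}")
    return "\n".join(lines)


toc = format_toc(
    [(1, "Summary", 1), (1, "Results", 3), (2, "By region", 4)],
    width=30,
)
print(toc)
```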

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Anyone know how to extract data from a webpage using Import.io where the data is loaded into the page via Ajax?
I am unable to extract data from below mentioned pages.
There is no issue in first page data extraction, but how do I move on to extract data from second page?
URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>

The data on that page is deployed using an interesting mix of technologies; it relies heavily on server-side code and JavaScript. That type of page can be a challenge; however, there are always methods to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to take off the redirect=true from the URLs before it would work - just an FYI.
Other times, stores don't have such a URL, which is a bit of a pain, and their URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. I imagine a larger-scale workaround would be to create a dataset/API based on the categories you are interested in and then to filter that larger dataset down (Python or CSV style) by seller name. That would probably work!
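
As a rough sketch of that filtering step, assuming the larger dataset has been exported as CSV: the column names (`title`, `seller`, `price`) and the sample rows below are made up for illustration, and a real export will have its own headers.

```python
import csv
import io


def filter_by_seller(csv_text, seller_name, column="seller"):
    """Return the rows whose seller column matches seller_name.

    Matching ignores case and surrounding whitespace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = seller_name.strip().lower()
    return [
        row for row in reader
        if (row.get(column) or "").strip().lower() == wanted
    ]


# Hypothetical category-level export.
sample = """title,seller,price
USB cable,Acme Store,4.99
Phone case,Other Shop,7.50
HDMI adapter,acme store,9.25
"""

rows = filter_by_seller(sample, "Acme Store")
```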

I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Putting islandora results in a view or block

I am new to islandora. I have it integrated with solr and it works fine but the results only show on the page with the following URL:
somesite.com/islandora/search
This is fine but I would like the results to show on a different page inside a view or a block.
Is this actually possible?
After reading the documentation, it doesn't seem like it is. But I believe this should be a rather standard requirement.
I have tried using URL Aliases, with no luck.
Has anyone done this??

If you have integrated Apache Solr with site search, there is a module named Search API Page Block that can help you show results in a block, but it will only work for the titles of nodes.
As it explains -
Currently, this module is only useful when placing search result
blocks on node pages, since it currently only uses $node->title as the
search keywords. Future development can lead to other uses, including
using Taxonomy terms, Context module, or custom fields to set the
keywords.

Which tools to build a complete interactive mapping application/web application?

I want to build a web application, and I am looking at the tools I will have to use.
I want to use a real-time map.
I'm thinking about:
TileMill: get .png files to constitute the background of my maps,
or get data from a website as shp files to build layers for this in Mapnik.
Mapnik: build layers with the data I want to add to my map.
Mapnik: put the layers together and generate a map.
TileStache: generate tiles for my application.
OpenLayers: display my map with tiles in a browser.
Once my map is displayed, I'd like to add interactivity. For example, when you hover over a line or a circle (a town or an event), it gives you the attributes of that object.
But the lines and circles will be integrated directly into the Mapnik map, so I need to add some JavaScript to make it dynamic and open a pop-up. How do I do this? Using the OpenLayers JavaScript libraries or node.js?
What is your advice on the question and on the way I want to use these tools?
Thanks a lot!

I'm in a similar situation, so I don't know the answer, but from what I've been able to figure out I think you're on the right track.
I started off using the Mapbox approach, which simplifies things as long as your data is static. You use TileMill not only to generate your PNG tiles (once you've used Carto to do some nice styling) but also to import your data sets.
TileMill can export your TileJSON and UTFGrid files with the PNG tiles all packaged up and ready to use. Mapbox will then host all that stuff for you, and you can use their mapbox.js library (an extension of Leaflet) to bring it all together in the browser, with full interactivity. Opening popups would be something you'd do in JavaScript in the browser, and if you mean infoWindows (the overlay window that's associated with a map point), then that would be a call to the Leaflet API.
If you're happy to create your layers and import your data offline this approach seems to be really simple and powerful; Mapbox will even render out tiles using multiple layers overlaid - so for example you can see your circles on top of a satellite image, merged into a single PNG.
The problem really comes in when your data needs to be live and you can't therefore prepare it all ahead of time in TileMill. I'm still trying to figure this all out but it does seem as though a combination of TileStache and Mapnik would be able to serve you up the TileJSON, GeoJSON and UTFGrid files you'd need as well as the tiles themselves, in the way you've outlined in the question.
You might also want PostGIS and GeoDjango or similar behind the scenes in order to hold and manage your live data, respectively.
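
To make the live-data part a little more concrete, here is a minimal sketch of turning point records into a GeoJSON FeatureCollection that a browser library could load and attach popups to. The record fields (`name`, `town`, `lat`, `lon`) and the function name are assumptions, and in a real stack the records would come from your database rather than a hard-coded list.

```python
import json


def to_feature_collection(records):
    """Build a GeoJSON FeatureCollection from point records.

    Each record is a dict with "lat" and "lon" keys; every other key is
    copied into the feature's properties (useful as popup attributes).
    """
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [rec["lon"], rec["lat"]],
            },
            "properties": {
                k: v for k, v in rec.items() if k not in ("lat", "lon")
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical live record.
events = [{"name": "Concert", "town": "Lyon", "lat": 45.76, "lon": 4.84}]
collection = to_feature_collection(events)
geojson_text = json.dumps(collection)
```

The attributes stay in `properties`, so the browser side only has to read them when a feature is hovered or clicked.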
As I said, I'm still trying to actually get my full stack working so I can't vouch for this 100% but if your data is gathered upfront then I'd definitely recommend the TileMill route for simplicity's sake.
I hope that's a help!
