Image scraping from web pages for character recognition using Python

I have to gather a dataset of 10,000 captcha images from a website
http://elegalix.allahabadhighcourt.in/elegalix/StartWebSearch.do
You can find the captcha images under the Judgment Date tab. I have to scrape 10,000 images from this website using Python 3.0 or above, and every time you click Judgment Date a new captcha image is generated. I have to download each image and save it as (6-digit number).jpg/.png.
Any idea how I can do that? Please keep in mind that I only get a new image after clicking Judgment Date/Case Number/Judge Name/Title/Counsel Name, not by refreshing the page. Can anyone help me with the code?
Thanks in advance.
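A minimal sketch of the download-and-save loop, using only the standard library. It assumes you have already found the URL that actually serves the captcha image (for example by watching the browser's network tab while clicking Judgment Date); `CAPTCHA_URL` below is a placeholder, not the site's real endpoint. All requests share one cookie jar because captchas are often tied to a server session; whether that matters for this site is an assumption. Files are named with a zero-padded 6-digit counter (000001.png, 000002.png, ...); if you meant the captcha's own digits as the file name, those have to come from manual labeling or OCR. Duplicate downloads are skipped by hashing the image bytes.

```python
import hashlib
import http.cookiejar
import os
import time
import urllib.request

# Placeholder: replace with the real image URL found in the page's HTML or the
# browser's network tab. This path is hypothetical.
CAPTCHA_URL = "http://elegalix.allahabadhighcourt.in/elegalix/PATH_TO_CAPTCHA_IMAGE"


def make_fetcher(url, timeout=10):
    """Return a callable that downloads one image as bytes.

    One cookie jar is shared by every request, so all downloads happen in the
    same server session (assumed to matter for this site's captchas).
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    def fetch():
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()

    return fetch


def collect_captchas(fetch, count, out_dir, ext="png", delay=1.0, max_attempts=None):
    """Save up to `count` distinct images as 000001.<ext>, 000002.<ext>, ..."""
    os.makedirs(out_dir, exist_ok=True)
    limit = max_attempts if max_attempts is not None else 2 * count
    seen = set()  # SHA-256 digests of images already saved
    paths = []
    attempts = 0
    while len(paths) < count and attempts < limit:
        attempts += 1
        data = fetch()
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            path = os.path.join(out_dir, f"{len(paths) + 1:06d}.{ext}")
            with open(path, "wb") as fh:
                fh.write(data)
            paths.append(path)
        if delay:
            time.sleep(delay)  # pause between requests to be gentle with the server
    return paths


# Usage (CAPTCHA_URL must be replaced first):
# paths = collect_captchas(make_fetcher(CAPTCHA_URL), 10_000, "captchas")
```

Keeping the fetch step as a separate callable means that, if the site only issues a new captcha after the tab's own request, you can swap in a fetcher that requests that page first and then the image, without touching the saving logic.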

Related

Text from Image API

I am building an API that extracts text from images using Tesseract.js and Node.js. I want to add a feature that tells the user what percentage of the image is occupied by text. I'd be very grateful if anyone could guide me on how to do this.
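One possible definition of "percentage of text" is the fraction of the image area covered by the word bounding boxes the OCR engine reports. The sketch below assumes you can already get those boxes from the recognition result as `(x0, y0, x1, y1)` pixel tuples (check your Tesseract.js version's docs for which output options include word boxes). It is written in Python, like the rest of this page, and the logic ports directly to JavaScript. It takes the union of the boxes, so overlapping boxes are not counted twice.

```python
def text_coverage(boxes, width, height):
    """Fraction (0.0 to 1.0) of a width x height image covered by the union of boxes.

    Each box is (x0, y0, x1, y1) in pixels, with x1 > x0 and y1 > y0.
    """
    # Clip every box to the image bounds and drop boxes that end up empty.
    clipped = []
    for x0, y0, x1, y1 in boxes:
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x1 > x0 and y1 > y0:
            clipped.append((x0, y0, x1, y1))
    if not clipped:
        return 0.0

    # Coordinate compression: the box edges split the plane into grid cells,
    # and each cell is either fully inside some box or fully outside all of them.
    xs = sorted({x for b in clipped for x in (b[0], b[2])})
    ys = sorted({y for b in clipped for y in (b[1], b[3])})
    covered = 0
    for cx0, cx1 in zip(xs, xs[1:]):
        for cy0, cy1 in zip(ys, ys[1:]):
            if any(b[0] <= cx0 and cx1 <= b[2] and b[1] <= cy0 and cy1 <= b[3]
                   for b in clipped):
                covered += (cx1 - cx0) * (cy1 - cy0)
    return covered / (width * height)
```

Multiply the result by 100 for a percentage. The grid approach is simple rather than fast; it is fine for the few hundred word boxes a typical image produces.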

Chrome Web Store promotional tile Image has been rejected

I have tried (about a dozen times now) to add promotional tiles to my extension's web store listing.
I get this rejection every single time:
"This small tile image has been rejected due to the following reasons:
Text is too small
Too much detail
Please review the guidelines, upload a new image and republish."
For a while I thought the problem was the text, but my last attempt had no text at all and was still rejected. I also don't think the text rule is strictly enforced, since every single tile on the front page has the extension's name on it.
Here is the last one I tried (rejected instantly this time, so most likely by an automated check?): https://i.imgur.com/B2Qh7qO.png
Another one I tried a few days ago: https://i.imgur.com/WMcmF3O.png
Any advice would be appreciated.

Chrome Web Store Developer Support got back to me with this response:
"I've checked your item and your promotional image is now fixed"
So it seems to have been a bug somewhere in their system. If anyone else runs into this, don't do what I did and spend months tweaking your promotional images over and over; just contact them.
EDIT: For some reason the Developer Support contact form is extremely hard to find. Here it is: https://support.google.com/chrome_webstore/contact/developer_support?hl=en
The follow-up support emails came from these addresses: cws-developer-support#google.com and developer-support#google.com

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Does anyone know how to extract data with Import.io from a web page where the data is loaded via Ajax?
I am unable to extract data from the page below.
Extracting the data on the first page works fine, but how do I move on to extract the data on the second page?
The URL is:
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>

The data on that page is delivered using an interesting mix of technologies; it relies heavily on server-side code and JavaScript. That type of page can be a challenge; however, there are usually ways to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
That page is very easy to extract data from, even using the Magic algorithm: https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to remove redirect=true from the URLs before it would work, just an FYI.
Other stores don't have such a URL, which is a bit of a pain, and their URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. A larger-scale workaround would be to create a dataset/API based on the categories you are interested in and then filter that larger dataset down by seller name (in Python or in a CSV). That would probably work!
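The "filter the larger dataset down by seller name" step could look like this with Python's `csv` module. It assumes the exported dataset is a CSV file with a header row and a column holding the seller name; `seller` is a hypothetical default column name, so pass whatever your export actually calls it.

```python
import csv


def filter_by_seller(in_path, out_path, seller, column="seller"):
    """Copy rows whose `column` equals `seller` (case-insensitive) to out_path.

    Returns the number of rows kept. The header row is preserved.
    """
    target = seller.strip().casefold()
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None:
            raise ValueError(f"{in_path} is empty (no header row)")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Normalize whitespace and case so " Acme " matches "ACME".
            if (row.get(column) or "").strip().casefold() == target:
                writer.writerow(row)
                kept += 1
    return kept
```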

I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Displaying a range of images from Instagram

How can we display our images from Instagram, say, 10 per page?
As of now, I see no way to retrieve info about only the images we need rather than the full list of all our images (with https://api.instagram.com/v1/users/self/feed?access_token=ACCESS-TOKEN).
Thus every time we display a page of images, we need to download the full list of images from Instagram, and this seems slow.
Any ideas?

Store the returned URLs in a database; then you can check periodically for new images and update accordingly. Display the images you want from your DB: this way you have full control over which images, and how many, to display, AND it's a lot faster than going through the API.
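A minimal sketch of that approach with Python's built-in `sqlite3` module. The table layout and the `id`/`url`/`created_time` keys are assumptions for illustration, not the exact shape of the Instagram API response; map whatever fields your API client returns onto them. The periodic sync inserts only images not already stored, and each page is then a `LIMIT`/`OFFSET` query against the local table instead of a full API download.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) images table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id TEXT PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " created_time INTEGER NOT NULL)"
    )


def store_images(conn, images):
    """Insert images from a periodic sync; rows with an existing id are ignored."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO images (id, url, created_time) VALUES (?, ?, ?)",
            [(img["id"], img["url"], img["created_time"]) for img in images],
        )


def get_page(conn, page, per_page=10):
    """Return the image URLs for a 1-based page number, newest first."""
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT url FROM images ORDER BY created_time DESC LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return [row[0] for row in rows]
```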

How do I create a plain-text version of an HTML email?

We send EDMs and build them manually. This time, however, the EDM is just one big image with some text at the bottom (the image is like a Christmas card). It goes straight to spam, but there is no reason for us to add more text at the bottom. One suggestion I've read is to add a text-only version.
But how?
Are there other ways to lower the spam score?

You have not said how you are creating your email, and that will affect any answer you get about adding a text-only version. You could upload your image to a website and send a text-only email with a link to it. How you do that depends on your authoring tool.
As for lowering your spam score, have a look at http://www.mailingcheck.com, a free service that lets you test the spam score of your email.
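If the email is assembled in code, a text-only version is just a `text/plain` part sent alongside the HTML in a `multipart/alternative` message: clients that render HTML show the HTML part, and the rest fall back to the text. A minimal sketch with Python's standard `email` package; the addresses, subject, and content in the usage lines are placeholders, and whether adding the text part lowers your particular spam score is not guaranteed.

```python
from email.message import EmailMessage


def build_edm(subject, sender, recipient, plain_text, html):
    """Build a multipart/alternative email: text/plain part first, text/html second."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(plain_text)                # plain-text fallback
    msg.add_alternative(html, subtype="html")  # shown by HTML-capable clients
    return msg


# Usage with placeholder values; send it with smtplib.SMTP(...).send_message(msg).
msg = build_edm(
    "Season's greetings",
    "news@example.com",
    "friend@example.com",
    "Happy holidays! View our card online: https://example.com/card",
    "<html><body><img src='https://example.com/card.png' alt='Holiday card'></body></html>",
)
```

Putting the link to the hosted image in the plain-text part combines this with the suggestion above of a text-only email that links to the image.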
