Python - make a search and retrieve a set amount of images from a search engine - python-3.x

I would like to get images from a search engine, to run some automated tests without the need to go online and pick them by hand.
I found an old example from 5 years ago (ajax.googleapis.com/ajax/services/search/images), which sadly does not work anymore. What is the current method to do so in Python3? Ideally I would like to be able to pass a string with the search name, and retrieve a set amount of images, at full size.
I don't really mind which search engine is used; I just want to be sure that it is supported for the time being. Also I would like to avoid Selenium; I am planning to run this without any UI nor using the browser, all from terminal.

Have you heard of pixabay? There is a nice python wrapper for working with it as well.

Found a pretty good solution using BeautifulSoup.
It does not work on Google, since I get 403, but when faking the header in the request, is possible to get sometimes, data. I will have to experiment with different other sites.
So far the workflow is to search in the browser so I can get the url to pass to beautifulsoup. Once I get the url in code, I replaced the query part with a variable, so I can pass it programmatically.
Then I parse the output of beautifulsoup to extract the links to the images, and retrieve them using requests.
I wish there was a public API to get also parameters like picture size and such, but I found nothing that works currently.

Related

Instagram - Obtaining data in realtime

I am trying to get the recent post from a particular location. using this url.
https://api.instagram.com/v1/media/search?lat=34.0500&lng=-118.2500&distance=50&MAX_ID=max_id&access_token=XXXX
So when I use this URL for the first time, I get 20 results. I obtain the max ID from the list of 20 results and modify my url .
But when I use the modified URL, I obtain the same result as the first one.
How do I go about solving this?
Contrary to what I thought, the media search endpoint doesn't return a pagination object. Sorry. It also doesn't support the min_id/max_id parameters, which is why you are having problems..
If you want to get different data you are going to have to use the time based request parameter MIN_TIMESTAMP. However it looks like that parameter doesn't work for that endpoint either (though the documentation says it is supported). Indeed, a quick search on the internet reveals it might be a long standing bug with the api.

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Anyone know how to extract data from a webpage using Import.io where the data is loaded into the page via Ajax?
I am unable to extract data from below mentioned pages.
There is no issue in first page data extraction, but how do I move on to extract data from second page?
URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>
The data on that page is deployed using an interesting mix of technologies; it relies heavily on server side code and Javascript. That type of page can be a challenge, however, there are always methods to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to take off the redirect=true from the URLs before it would work - just an FYI.
Other times some stores don't have such a URL, its a bit of a pain, and there URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. I imagine a larger scale workaround would be to create a dataset/API based on a the categories you are interested in and then to filter that larger dataset down (python or CSV style) by seller name. That would probably work!
I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Search Algorithm for a web application that needs to look for a specific value

I'm developing a webapp that will need to download the html form a website and then iterate through the code and try to find a specific but ever changing value (in our case it will be the price for the product).
For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of html from the page (that has the price) and then from then on, every time we need to fetch the price we would try to search for those lines and find the price.
Now, I believe this is a horrible and slow way of doing this and since there are no rules and the html can be totally different from one website to another (even the same website might change) I couldn't find a better way.
One improvement that I thought about was to iterate through the first time and record the line at which we find the code. Once found, the subsequent times we would then start from a few lines before the expected location and start the search. Any Thoughts on how I can improve on this?
I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.
I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.
This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to workout CSS selectors (which can map to jQuery selectors) that targeted the elements that contained the values I was looking for. In my case I was able to narrow down the full document to just the elements that contained what I was looking for but if you couldn't get exactly what you where after you could combine this with some extra lactic like test to see if it looks like a price (via regex) or test what it is next to.

Python3 free (no sign up) currency conversion

After looking, I have been unable to find a python3 module or method of pulling exchange rate data from the internet into my program.
Any ideas?
Thanks
Well, I'm not sure what's the location of your data source and in what format you need the data, but from your description I can make the following recomandation:
Use a service like:
https://openexchangerates.org/
and afterwards parse the response with a JSON parser (like the official one):
http://docs.python.org/3.3/library/json.html
and Voila you have the currency.
UPDATE
If you really have time and as the time passes patience to modify it, you can take whatever site you want with a visible exchange rate and write a HTML parser:
http://www.diveintopython.net/html_processing/extracting_data.html
This solution gives you freedom (basically you can query whatever you want) but if a site changes... well you have to update your code
UPDATE 2
You can use a very simple trick and a very simple html from google. Try to call the following link:
http://www.google.com/finance/converter?a=1&from=BGN&to=AED
and you will get a response for free, without any key, but be aware that this is not fair play especially when you are polling the service many times a minute!

can you have "variables" in text in google sites?

Sorry, this is a bad question. I don't even know what the title should be. I'm a total noob at making websites so this might be easy to find but I just don't know the terminology to search for. I cannot find anything about how to do this...
What I want to do is have something like references/variables that I can use in a block of text and it will automatically get replaced with whatever value should be there. Best way I can think of to describe it would be if I was using the site as a design doc for a game or something, I would be able to type in [Title] or something similar on any page and when it loads that text would be replaced with whatever my Title is. That way If I ever change titles, names, classes, races, places, items, etc... they would only have to be changed in 1 place and the change would be reflected everywhere.
I notice if I add a link to a page it will automatically use the Title of that page as the text of the link. That is almost exactly what I want. Except when I change the Title of the other page the text of the link remains as the original text. It doesn't get updated to the new Title and that is not at all what I want.
Also, I want to do this in Google Sites and as simply as possible. I don't really want to use a database. I was hoping Google Sites would have some kind of funcionality for this.
I don't believe this is possible (on Google Sites) and likely you need to consider a hosted solution.
Quoting the answer from this relevant post:
You should consider hosting your solution using Google's App Engine
instead of Google Sites. You can set it up so it uses PHP (see link
below), you can configure it to use your domain name and you get
enough CPU, disk and bandwidth allowance to serve around five million
page views for free each month, if you are serving more than that,
their prices are extremely competitive.
Google App Engine:
http://code.google.com/appengine/docs/whatisgoogleappengine.html How
to setup PHP using Google App Engine: http://blog.caucho.com/?p=187
Also I'm not sure how your PHP skills are but if you're unfamiliar with it then this should help to get you started.

Resources