Wikipedia read-only bot API?

I have written my own read-only bot (in Objective-C) that simply curls target pages every six seconds; it works fine, but it takes days when I need to read thousands of pages. So I'd like to switch to the bot API and set maxlag=5, so my requests go through quickly whenever server load is low. Can you point me to an appropriate description of how to write such a simple read-only bot? I have found only complicated descriptions of how to write, e.g., editing bots. Thanks!

See https://www.mediawiki.org/wiki/API:Revisions.
Specify pages by pageids or titles. If your pages have something in common (a template, image, or category), you can use a generator.
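For concreteness, here is a minimal sketch of such a read-only bot in Python with the requests library; the User-Agent contact string is a placeholder you should replace. It batches up to 50 titles per request (500 with a bot flag) and honors maxlag by backing off when the server reports replication lag:

```python
import time
import requests

API = "https://en.wikipedia.org/w/api.php"  # adjust for your target wiki
HEADERS = {"User-Agent": "MyReadOnlyBot/1.0 (you@example.com)"}  # placeholder contact

def fetch_revisions(titles):
    """Fetch the current wikitext of up to 50 pages in a single request."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(titles),  # batch many pages per round trip
        "maxlag": 5,                 # ask the server to refuse us when lagged
    }
    while True:
        r = requests.get(API, params=params, headers=HEADERS)
        data = r.json()
        if data.get("error", {}).get("code") == "maxlag":
            # Replication lag exceeded 5s; wait as instructed, then retry.
            time.sleep(int(r.headers.get("Retry-After", 5)))
            continue
        return data["query"]["pages"]
```

Because each request carries a whole batch of titles, reading thousands of pages takes dozens of requests rather than thousands of six-second curls.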

Related

Having several GoogleResponses in a row without user input or interaction

I am working on a cooking recipe app for Google Home and I need a way to string several GoogleResponses (SimpleResponse, etc.) together without requiring user interaction between them.
I have searched for other answers pertaining to this, and while I have found a few similar questions to mine, the replies tend to be along the lines of "the system was designed for dialogues so what would be the point?".
I fully understand this point of view, however because of the nature and behaviour requirements of the app that I am developing I find myself in need of this particular possibility.
The recipes are divided into steps (revolutionary, I know..) and there is roughly a 1 to 1 correspondence between steps and GoogleResponses.
To give an example of how a typical recipe unfolds it is usually like this (this is a simplification of course):
main content -> question -> main content -> question -> etc..
With each instance of "main content" being a step of the recipe and each "question" requiring user input.
If it were always like this, there would be no problem: I could just bundle each "main content -> question" section into one GoogleResponse and be done.
However there are often times where the recipe flows more like:
main content -> main content -> main content -> question
With each "main content" being a step in the recipe, it does not make sense in this context to bundle them together into the same response (there is a system for the user to move back and forth between steps).
I was originally using MediaResponses for the "main content" sections, as those do not require user input to move on to the next step, but for various reasons I won't go into here (this is already getting quite long), the project manager has decided that MediaResponses should not be used in this project.
The short answer is the one you already encountered - trying to make conversational actions not-so-conversational doesn't work very well. However, there are a few things you can look into.
Recipe Structured Data
Since you're working on a recipe action, specifically, it may be worthwhile to use the standard recipe support that comes with the Assistant.
On the upside - people will be familiar with it, and you don't need to write much code; you just provide markup on a webpage.
On the downside - if you have other requirements for how you want the interaction to go, it isn't that flexible. (For example, if you're asking questions at some of the recipe points, or if you want to offer measurement adjustments based on number of people to serve.)
Misuse the "No Input" event
You can configure dynamic reprompts so you get an event if the user doesn't say anything after a few seconds. If they want to move ahead sooner, they can ask for the next step explicitly, or you can catch the actions_intent_NO_INPUT event in Dialogflow and advance yourself (a sketch follows the caveats below).
There are a few downsides here:
Not all devices support no-input. Mobile devices, for example, won't generate this event.
This may only be valid for two no-input events in a row. On the third event, the Assistant may automatically close the conversation. (The documentation is unclear on this, and the exact behavior has changed over time.)
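For illustration, here is a minimal Dialogflow fulfillment sketch in Python (Flask) that advances a step when the no-input event fires. The intent name "no_input" is an assumption: it stands for whichever intent you attach to the actions_intent_NO_INPUT event, and the in-memory progress dict stands in for real session storage:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
steps = ["Preheat the oven to 180C.", "Mix the flour and sugar.", "Fold in the eggs."]
progress = {}  # session id -> current step index; use real storage in production

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json()
    session = body["session"]
    intent = body["queryResult"]["intent"]["displayName"]
    i = progress.get(session, 0)
    # "no_input" is whatever Dialogflow intent you attached to the
    # actions_intent_NO_INPUT event; the name itself is an assumption.
    if intent == "no_input" and i + 1 < len(steps):
        i += 1  # the user said nothing, so advance automatically
        progress[session] = i
    return jsonify({"fulfillmentText": steps[i]})
```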
Media Response
You don't say why Media Response "shouldn't be used", but this is one of the only ways to trigger an event when speaking is completed.
There are several downsides, however:
There are a number of bugs with Media Response around quitting
On devices with screens, there is a media player. Since the media itself is incidental to what you're doing, having the player doesn't make sense
It isn't supported on all surfaces
Interactive Canvas
A similar approach, however, would be to use the Interactive Canvas. This gives you an HTML page with JavaScript that you control, including being able to generate responses to the server as if the user spoke them (or as if they touched a suggestion chip). You can also listen to events for when the generated speech has finished.
There are, however, a number of downsides which probably prevent you from using this right now:
The biggest is that the Interactive Canvas can only be used for games right now. (But this seems to be a policy decision, rather than a technical one. So perhaps it will be lifted in the future.)
It does not work on smart speakers - only some devices with screens.
Combining the above approaches
One way to get around the device limitations of the Interactive Canvas and the poor visuals that accompany Media Response might be to mix the two: for devices that support the Interactive Canvas, use that; if not, try using Media Response. (You may even wish to consider the no-input reprompt for some platforms.) See the capability check sketched below.
But this still won't work on all devices, and still has the limitation that Interactive Canvas is only for games right now.
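One way to branch per device, sketched here against the Dialogflow v2 webhook payload: the capability names are real Actions on Google surface capabilities, but the fallback order is just one possible policy:

```python
def pick_strategy(body):
    """Choose a delivery mechanism based on the requesting device's capabilities.

    `body` is the Dialogflow webhook request JSON; Actions on Google reports
    the device's capabilities under originalDetectIntentRequest.payload.surface.
    """
    caps = {c["name"] for c in body.get("originalDetectIntentRequest", {})
                                   .get("payload", {})
                                   .get("surface", {})
                                   .get("capabilities", [])}
    if "actions.capability.INTERACTIVE_CANVAS" in caps:
        return "interactive_canvas"
    if "actions.capability.MEDIA_RESPONSE_AUDIO" in caps:
        return "media_response"
    return "no_input_reprompt"  # last resort on plain speakers
```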
Summary
There is no single, clear way to handle this... and this isn't a feature they are likely to add, given the conversational nature of the platform. However, there may be some workarounds that work for your scenario.

Design Pattern recommendations - Python Selenium multi-page webscraper w/ Parser and Database

I am working on a scraper that is growing bigger and bigger and I'm worried about making the wrong design choices.
I have never done more than short scripts in python and I'm at a loss knowing how to design a project with bigger proportions.
The scraper retrieves data from different, but similar themed websites, so an implementation for each site is needed.
The desired raw text of each website is then put through a parser which searches for the required values.
After retrieving the values they should be stored in a 3N-Database.
In its final evolution the scraper should run on a cloud service and check all the different sites periodically for new data. Speed and performance are not of the highest importance, but they are desirable. Most importantly, the required data should be retrieved without unnecessary duplication of code.
I'm using the Selenium webdriver and have the driver object implemented as a singleton, so all the requests are done by the same driver object. The website text is then part of the state of that object.
All the other functionality is currently modelled as functions, everything in one file. To add another website to the project, I first copied the script and just changed the retrieval part. As it soon occurred to me that that's pretty stupid, I wanted to ask for design recommendations.
Would you implement a Retriever mother class and inherit from it for every website, or is there an even better way to go?
Many thanks for any ideas!
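To make the "Retriever mother class" idea concrete, here is one possible minimal sketch in Python; the class and site names are made up, and the shared driver mirrors the singleton described above:

```python
from abc import ABC, abstractmethod
from selenium import webdriver

class Retriever(ABC):
    """Base class holding the shared Selenium plumbing; one subclass per site."""

    _driver = None  # shared driver instance (the "singleton" from the question)

    @classmethod
    def driver(cls):
        if Retriever._driver is None:
            Retriever._driver = webdriver.Chrome()
        return Retriever._driver

    @abstractmethod
    def urls(self):
        """Yield the URLs this site should be scraped from."""

    @abstractmethod
    def parse(self, page_source):
        """Extract the required values from the raw page text."""

    def run(self):
        for url in self.urls():
            self.driver().get(url)
            yield self.parse(self.driver().page_source)

class ExampleSiteRetriever(Retriever):  # hypothetical site
    def urls(self):
        yield "https://example.com/data"

    def parse(self, page_source):
        return {"raw_length": len(page_source)}  # real parsing goes here
```

With this shape, each new site only implements urls() and parse(), so the Selenium plumbing is written once instead of copied per script.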

Instagram API media/popular

What query parameters can we use with media/popular? Can we localize it by country or geolocation?
Also, is there a way to get the discovery feature's results with the API?
This API is no longer supported.
Ref: https://www.instagram.com/developer/endpoints/media/
I was recently struggling with the same problem and came to the conclusion that there is no way other than the hard one.
If you want location-based popular images, you must go with the locations endpoint.
https://api.instagram.com/v1/locations/214413140/media/recent
This link brings up recent media from a given location, the key being the location ID. Your job is then to follow the simple pagination API and merge the returned arrays into one big JSON collection. The $response['pagination']['next_max_id'] field drives pagination, so you simply send each subsequent request with the max_id from the previous response.
https://api.instagram.com/v1/locations/214413140/media/recent?max_id=1093665959941411696
The end result will depend on how much information you gathered. In the end you just need to sort the array by like count, and you're ready to do whatever you were going to do.
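Sketched in Python for concreteness (the endpoint is deprecated, as noted above, and the access token is a placeholder), the merge-and-sort loop looks roughly like this:

```python
import requests

ACCESS_TOKEN = "..."  # your Instagram API token (placeholder)
BASE = "https://api.instagram.com/v1/locations/214413140/media/recent"

def popular_at_location(pages=10):
    """Merge several pages of recent media, then sort by like count."""
    media, max_id = [], None
    for _ in range(pages):
        params = {"access_token": ACCESS_TOKEN}
        if max_id:
            params["max_id"] = max_id
        resp = requests.get(BASE, params=params).json()
        media.extend(resp.get("data", []))
        # 'next_max_id' drives pagination, as described above
        max_id = resp.get("pagination", {}).get("next_max_id")
        if not max_id:
            break
    return sorted(media, key=lambda m: m["likes"]["count"], reverse=True)
```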
Of course, an important part is to save the images locally rather than regenerating the list every time a user opens the webpage, not only because of the generation time but because of the limited number of requests per hour.
Hope someone comes up with a better solution, or that the Instagram API will finally support media/popular by location.

How can I scrape data from a website?

I want to scrape only four data items for each and every product on the following page, which is an infinite scroll-down page:
name of the product
price of the product
href of the product
img src of the product.
All the data will be stored in a single CSV file.
How can I do this? Any ideas?
I am not sure of this method: get the original source code, where you can get all of the website's info, including the photo links and any text.
This is usually considered a bad idea. If you write code to scrape a website for its content, what happens when they change their markup? Or what happens when they realize you're scraping (stealing) their original content and ban your server's IP address, or even your IP range? It's a losing battle, so unless you have permission from them, I wouldn't recommend trying. It may work for a little while, but probably not for long. It's generally considered poor form to do something like this, so personally I wouldn't encourage anyone to teach someone how to scrape a website for its content.
Furthermore, it says very clearly in their Terms of Use not to do exactly that:
You agree not to access (or attempt to access) the Website and the materials or Services by any means other than through the interface that is provided by Snapdeal. You shall not use any deep-link, robot, spider or other automatic device, program, algorithm or methodology, or any similar or equivalent manual process, to access, acquire, copy or monitor any portion of the Website or Content (as defined below), or in any way reproduce or circumvent the navigational structure or presentation of the Website, materials or any Content, to obtain or attempt to obtain any materials, documents or information through any means not specifically made available through the Website.

Instagram synced many images from a tag with Real time, what to do with deleted images

Using the Instagram API, I subscribed to a tag with the Real time feature. I sync media that match my project's criteria, then save those to DB. When users visit my website, I display these images from my DB (and not from instagram API).
From time to time, I see broken links showing up in the images. I identified that the source of the problem is that those images have now been deleted.
What's a good way to handle this?
Probably the best option would be not to attempt to duplicate the Instagram DB (or part of it) at all. Depending on the usage of your project and what sort of tags you're subscribing to, it could get pretty large pretty quickly.
Short of that, doing a quick HTTP request to the image URL (and checking the response code) before deciding whether to display it would do the job.
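A minimal sketch of that check in Python with requests; the 405 fallback is an assumption for CDNs that reject HEAD requests:

```python
import requests

def image_still_exists(url, timeout=3):
    """Cheap liveness check before rendering a stored Instagram image URL."""
    try:
        # HEAD avoids downloading the image body; some servers only answer GET,
        # in which case we fall back to a streamed GET.
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code == 405:  # method not allowed
            resp = requests.get(url, timeout=timeout, stream=True)
        return resp.status_code == 200
    except requests.RequestException:
        return False
```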
@Steve Crawford is on the right track.
The problem with your solution is that you are duplicating volatile data that you:
a) can't control, and
b) don't receive notifications about.
I would think the better method would be to track the metadata of the images you are interested in (like the author, URL, date, etc.) and then display them only if they are still available.
If you are going to cache data, you also need a way to invalidate your cache. So another option would be to duplicate the data as you already do, but also run a background job to ensure that the data is still valid and remove the entries that aren't.
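A sketch of such a background job, assuming a hypothetical data layer with all_images() and delete_image() methods, and reusing the liveness check sketched earlier:

```python
def prune_deleted_images(db):
    """Background job: drop rows whose image URL no longer resolves.

    `db` is a stand-in for your data layer; run this periodically (cron,
    Celery beat, etc.) so the cache invalidates itself over time.
    """
    for row in db.all_images():               # hypothetical accessor
        if not image_still_exists(row.url):   # check sketched above
            db.delete_image(row.id)           # hypothetical mutator
```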
