pyramid + ZODB and pagination

I have images stored in my Pyramid ZODB app, and a view that lists all of them, but that makes rendering the page far too slow when there are a lot of images. I would therefore like to paginate them, say 15 images per page.
I've looked at quite a few webhelpers examples on the web, but they are all for SQL. Is there a way to use webhelpers with ZODB, or is there some other pagination library for ZODB?
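For what it's worth, webhelpers.paginate is not SQL-specific: its Page class only needs a sequence that supports len() and slicing, so it can wrap ZODB data directly. A minimal sketch, assuming the images live in a BTree container reachable as request.root['images'] (the container name and view wiring are illustrative, not from the question):

```python
# Minimal sketch: paginating a ZODB container with webhelpers.
# `request.root['images']` is an assumed layout -- adjust to your app.
from webhelpers.paginate import Page

def images_view(request):
    images = request.root['images']              # hypothetical OOBTree
    current = int(request.params.get('page', 1))
    # Page() only needs len() and slicing, so a list of the BTree's
    # values is enough -- no SQL involved anywhere.
    page = Page(list(images.values()), page=current, items_per_page=15)
    return {'page': page}    # iterate `page` in the template;
                             # page.pager() renders navigation links
```

Materialising all values into a list is fine for moderate collections; for very large BTrees you would want a lazier wrapper, but the Page interface stays the same.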

Related

Load items while scrolling - Angular 4 and nodejs

I want to create a page with articles. I do not want to load all the articles at once, though, because there are a lot of them and they have images. I want to build something like Facebook or 9gag have, where scrolling automatically appends more items.
Can anyone point me in the right direction on how to approach this?
Should I request all of the articles' JSON from the server at once, or request them as I scroll?
You should load results as they are needed; the mechanism is generally called infinite scroll.
For Angular 4 you can look at https://github.com/orizens/ngx-infinite-scroll (I haven't tried it myself, but it looks like it will fit your needs).
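Whichever client library you use, the server side is just a paged endpoint that the scroll handler calls repeatedly. The question's backend is Node, but the contract is identical in any stack; here is a sketch in Python with Flask (the route name, page size, and in-memory article list are all illustrative):

```python
# Sketch of the server half of infinite scroll: hand out one page of
# articles per request instead of the whole list. Flask stands in for
# the asker's Node server; the paging contract is the same there.
from flask import Flask, jsonify, request

app = Flask(__name__)
ARTICLES = [{'id': i, 'title': f'Article {i}'} for i in range(200)]
PAGE_SIZE = 20

@app.route('/articles')
def articles():
    offset = int(request.args.get('offset', 0))
    page = ARTICLES[offset:offset + PAGE_SIZE]
    # The client appends `items`, then requests again with `nextOffset`
    # once the user scrolls near the bottom (exactly the event
    # ngx-infinite-scroll fires).
    return jsonify(items=page,
                   nextOffset=offset + PAGE_SIZE,
                   hasMore=offset + PAGE_SIZE < len(ARTICLES))
```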

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Does anyone know how to extract data with Import.io from a web page where the data is loaded via Ajax?
I am unable to extract data from the pages mentioned below.
There is no issue extracting data from the first page, but how do I move on to extract data from the second page?
The URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>
The data on that page is deployed using an interesting mix of technologies; it relies heavily on server-side code and JavaScript. That type of page can be a challenge; however, there are usually ways to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to remove redirect=true from the URLs before it would work, just an FYI.
Other stores don't have such a URL, which is a bit of a pain, and their URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so feel free to get in touch. I imagine a larger-scale workaround would be to create a dataset/API based on the categories you are interested in and then filter that larger dataset down (in Python or CSV style) by seller name. That would probably work!
I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Search result: How to show only pages, not different content items?

We are using Liferay as a classic CMS, meaning that we compose pages using web content articles. There is an issue with Liferay's internal search for which I have not yet been able to find a proper answer:
Because web content articles are pretty much just building blocks for pages, we don't want the search to show them as distinct items. The user should only get a list of pages that contain their search keywords, covering all the articles placed on each page.
At the moment we can see two different approaches and both come with certain problems we could not solve yet:
Idea 1
We modify the journal indexer and try to obtain all URLs of the pages (how?) on which the article has been placed. Then we add them to the document to be indexed. In the search results we can then access the URLs and collect them. Finally, we make sure every URL is shown only once.
Idea 2
At some point Liferay renders the entire page before sending it to the browser. If we could somehow hook an indexer in there, we could index the whole page and then limit the search to these special "page documents". Getting the fully rendered page would be the main issue, because we would either have to run a crawler to trigger this indexing regularly, or find a way to trigger page rendering from within an indexer, or something like that.
I have been carrying this problem around for quite a while now and still could not find an idea good enough to spend time trying it out. If anyone of you has some input on those two ideas or maybe an entirely different approach, I would be extremely grateful.
I'll answer my own question, since by now we have found a suitable solution to our problem:
In addition to the default search portlet, Liferay also ships a "Web Content Search Portlet". It seems to have been part of Liferay for quite a while, but it's somewhat hard to find because there is hardly any documentation for it (I only found the Liferay wiki page, which says next to nothing). It searches only within web content articles and shows links to the pages they appear on, rather than a link to an isolated view of the article. It has far fewer configuration options than the default search portlet, however; pretty much all it lets you change is whether articles actually have to be placed on at least one page to show up in the results.
So there is no need for any kind of custom indexer or any other "hack": all we need to do is use the correct portlet. We will only have to write a hook that changes the appearance of the results page.
What you ask is interesting, but your ideas are headed in the wrong direction.
Idea 2 in particular won't work, because you cannot do indexing work while a page is being rendered; think of the performance impact alone.
In Liferay, pages and assets are not directly linked: pages contain portlets, and portlets display assets (web content and more).
Liferay's indexing scans the content of the assets, not the rendered output of the assets. Think about permissions: the same page can display different content depending on which user is looking.
Bye

Given a URL retrieve the largest image on that page with Node

I'm looking to build a feature into an Angular.js web app that allows a user to paste a URL from an eCommerce site like Amazon or Zappos and retrieve the main product image from that page. My plan is to post the URL to my Express API and handle the image retrieval on the server.
My initial plan was to download the raw HTML, parse it with htmlparser, select all the image elements with soupselect, and retrieve their src attributes. Ideally I would like a solution that works across any site, rather than hardcoding values for a particular retailer's site (using specific known CSS class names). One assumption I made is that the largest image on the page is likely the main product image; following this logic, I decided to sort the images by file size. My idea was to make an HTTP HEAD request for each image's src URL and determine its size from the content-length header. So far this approach has worked well, but I would really like to avoid making so many HTTP requests, even if they are only HEAD requests.
I feel there is a better way of doing this. Would it be easier to use something like PhantomJS to load the entire page and parse it that way? I was trying to make this as quick as possible, hence avoiding downloading all of the images. Does anyone have any suggestions?
I would think the best image to use isn't the one with the largest file size, but the one that is displayed largest on the page. PhantomJS might be able to help you determine that: load the page, but instruct PhantomJS not to load images, then pick the image element whose calculated dimensions are biggest. This will only work if the page uses CSS or width and height attributes on the img elements to give them dimensions.
Alternatively, you could send the image URLs back to the client, and have the client fetch the images and figure out which is biggest. That limits the number of requests your server has to make, and it allows the user to quickly pick a different image if the largest isn't the best.
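As a rough illustration of the dimension heuristic (in Python for brevity; the asker's Node server would apply the same logic), here is a sketch that works only when the markup carries explicit width/height attributes, which is exactly the limitation noted above; the sample tags are made up:

```python
# Sketch: pick the <img> with the largest declared width x height,
# avoiding any extra HTTP requests. Pages that size images purely via
# CSS need a real renderer (e.g. PhantomJS) instead.
from html.parser import HTMLParser

class BiggestImage(HTMLParser):
    def __init__(self):
        super().__init__()
        self.best_src, self.best_area = None, -1

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        a = dict(attrs)
        try:
            area = int(a.get('width', 0)) * int(a.get('height', 0))
        except (TypeError, ValueError):
            return                        # missing/non-numeric dimensions
        if a.get('src') and area > self.best_area:
            self.best_src, self.best_area = a['src'], area

parser = BiggestImage()
parser.feed('<img src="hero.jpg" width="600" height="400">'
            '<img src="thumb.jpg" width="100" height="100">')
print(parser.best_src)                    # -> hero.jpg
```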

Displaying a range of images from Instagram

How can we display our images from Instagram, say, 10 per page?
As of now, I see no way to retrieve info about only the images we need rather than the full list of all our images (via https://api.instagram.com/v1/users/self/feed?access_token=ACCESS-TOKEN).
Thus every time we display a page of images, we need to download the full list from Instagram, which seems slow.
Any ideas?
Store the URLs returned in a database; then you can check periodically for new images and update accordingly. Now display the images you want from your DB. This way you have full control over which and how many images to display, AND it's a lot faster than going through the API...
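A minimal sketch of that cache-then-paginate idea, assuming the feed response carries the media list under "data" with per-item image URLs (as the v1 API did), and using sqlite as a stand-in for "a database". Run sync_feed() periodically; page() then serves any slice without touching Instagram at all:

```python
# Sketch: cache Instagram media URLs locally, then paginate from the
# cache. Table layout and field paths are assumptions about the v1 API.
import sqlite3
import requests

FEED_URL = 'https://api.instagram.com/v1/users/self/feed'

db = sqlite3.connect('images.db')
db.execute('CREATE TABLE IF NOT EXISTS images (id TEXT PRIMARY KEY, url TEXT)')

def sync_feed(access_token):
    """Fetch the feed once and remember any media we haven't seen yet."""
    feed = requests.get(FEED_URL, params={'access_token': access_token}).json()
    for item in feed.get('data', []):
        db.execute('INSERT OR IGNORE INTO images VALUES (?, ?)',
                   (item['id'], item['images']['standard_resolution']['url']))
    db.commit()

def page(number, per_page=10):
    """Return one page of cached image URLs -- no API call involved."""
    offset = (number - 1) * per_page
    rows = db.execute('SELECT url FROM images LIMIT ? OFFSET ?',
                      (per_page, offset)).fetchall()
    return [url for (url,) in rows]
```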
