Given a URL retrieve the largest image on that page with Node - node.js

I'm looking to build a feature into an Angular.js web app that allows a user to paste a URL to an eCommerce site like Amazon or Zappos and retrieve the main product image from that page. My plan is to post the URL to my Express API and handle the image retrieval on the server.
My initial plan was to download the raw HTML, parse it with htmlparser, select all the HTML image elements with soupselect, and retrieve their src attributes. Ideally I would like to implement a solution that works across any site, rather than hardcoding values for a particular retailer's site (such as specific known CSS class names). One of my assumptions was that the largest image on the page would likely be the main product image, so I decided to try sorting the images by file size. My idea was to make an HTTP HEAD request to the src URL of each image and read its size from the Content-Length header. So far this approach has worked well, but I would really like to avoid making so many HTTP requests, even if they are only HEAD requests.
I feel there is a better way of doing this. Would it be easier to use something like PhantomJS to load the entire page and parse it that way? I am trying to make this work as quickly as possible, which is why I want to avoid downloading all of the images. Does anyone have any suggestions?

I would think the best image to use isn't the one with the largest file size, but the one that is displayed largest on the page. PhantomJS might be able to help you determine that. Load the page, but instruct PhantomJS not to load images. Then pick the image element whose computed dimensions are biggest. This will only work if the page uses CSS or width and height attributes on the img to give it dimensions.
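As a rough illustration of the "pick the image with the biggest dimensions" idea, here is a minimal sketch that only looks at the width and height attributes in the raw HTML, so it needs no headless browser. It is not the PhantomJS approach itself: sizes set through CSS are invisible to it. The function names (parseAttributes, largestDeclaredImage) and the example URLs are made up for this sketch, and the regex-based parsing is a simplification; a real HTML parser would be more robust.

```javascript
// Hypothetical helper: collect the attributes of one tag string into an object.
function parseAttributes(tag) {
  const attrs = {};
  const re = /([a-zA-Z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4];
  }
  return attrs;
}

// Return the <img> with the largest declared width * height, or null.
// Assumption: only width/height attributes are considered, not CSS.
function largestDeclaredImage(html, baseUrl) {
  const tags = html.match(/<img\b[^>]*>/gi) || [];
  let best = null;
  for (const tag of tags) {
    const attrs = parseAttributes(tag);
    const width = Number.parseInt(attrs.width, 10);
    const height = Number.parseInt(attrs.height, 10);
    if (!attrs.src || Number.isNaN(width) || Number.isNaN(height)) continue;
    let src;
    try {
      src = new URL(attrs.src, baseUrl).href; // resolve relative URLs
    } catch (err) {
      continue; // skip src values that cannot be resolved
    }
    const area = width * height;
    if (best === null || area > best.area) {
      best = { src, width, height, area };
    }
  }
  return best;
}

// Usage with a made-up page fragment.
const sampleHtml =
  '<img src="/logo.png" width="50" height="20">' +
  '<img src="/product.jpg" width="600" height="400">';
console.log(largestDeclaredImage(sampleHtml, 'https://shop.example/item/1'));
```

Images that declare no dimensions are skipped, so on pages that size everything through CSS this returns null and you would have to fall back to another strategy.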
Alternatively, you could send the image URLs back to the client, and have the client fetch the images and figure out which is biggest. That limits the number of requests your server has to make, and it allows the user to quickly pick a different image if the largest isn't the best.

Related

Web - Redirect website url that is being used by a request

I know that the title isn't very clear, but my problem is kind of complicated.
Let's say I am on a website (say, example.com), and the website is taking images from another website (images.com).
I want to change something so that instead of images.com, example.com takes images from myimages.com.
Still too complicated? I will break it down more. This is what is currently happening:
example.com takes images from images.com (I don't know what request it uses)
This is what I want:
example.com takes images from myimages.com
If you still don't understand, please comment below!
If example.com is database driven, simply download the database and do a find for "images.com" and replace with "myimages.com", then reupload the database.
If it is static HTML, repeat by downloading the HTML from example.com, using something like Notepad++ to find in files, then replace all in exactly the same way.
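If you would rather script the replacement than do it by hand in an editor, a minimal sketch could look like the following. It is a narrower variation of the plain find-and-replace described above: it only touches the src of img tags whose hostname is exactly the old host, which avoids accidentally rewriting other text that happens to contain "images.com". The function name rewriteImageHost is made up for this sketch, and it assumes quoted src attributes.

```javascript
// Hypothetical helper: point every <img src> hosted on `fromHost` at `toHost`.
function rewriteImageHost(html, fromHost, toHost) {
  const imgSrc = /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi;
  return html.replace(imgSrc, (match, prefix, quote, src) => {
    let url;
    try {
      url = new URL(src);
    } catch (err) {
      return match; // relative or invalid URL: leave it untouched
    }
    if (url.hostname !== fromHost) return match; // different host: leave it
    url.hostname = toHost;
    return prefix + quote + url.href + quote;
  });
}

// Usage with a made-up snippet of static HTML.
const page = '<img src="http://images.com/cat.png" alt="cat">';
console.log(rewriteImageHost(page, 'images.com', 'myimages.com'));
```

Note that the hostname comparison is exact, so www.images.com would need its own call.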
As a sidenote, you should provide as much information as possible as to which platform, codebase and type of request you mean. It will generate higher quality answers.
Most of the time, if this is done on the server side, it is done through a REST GET call; see https://www.tutorialspoint.com/restful/.
If it is done in client-side code (HTML, CSS, JavaScript), you will need to find where that reference is defined in the code and change it. In HTML, image tags have a src attribute that tells the page where to grab the image from.
If you go to the example shown here: https://www.w3schools.com/tags/tag_img.asp
and change the src attribute of the image tag as I do below, then click run, it will show the Stack Overflow image located at https://i.stack.imgur.com/oURrw.png.
<img src="https://i.stack.imgur.com/oURrw.png" alt="Smiley face"
height="42">

Load items while scrolling - Angular 4 and nodejs

I want to create a page with articles. I do not want to load all articles at once though (because there are a lot and they have images). I want to make something like what Facebook or 9gag has: when you scroll, it automatically appends items.
Can anyone point me in the right direction on how to approach this?
Should I request the articles JSON all at once (from the server) or should I request them as I scroll?
You should load results as they are needed; the mechanism is generally called infinite scroll.
For Angular 4 you can look at https://github.com/orizens/ngx-infinite-scroll (I haven't tried it myself, but it looks like it will fit your needs).
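On the Node side, "load results as they are needed" usually comes down to serving the articles in slices. Here is a minimal sketch of that, assuming an in-memory array standing in for your database query; the function name getArticlePage and the offset/limit parameters are made up for this example.

```javascript
// Hypothetical helper: return one slice of articles plus what the client
// needs in order to request the next slice.
function getArticlePage(articles, offset, limit) {
  const start = Number.isInteger(offset) && offset >= 0 ? offset : 0;
  const items = articles.slice(start, start + limit);
  const nextOffset = start + items.length;
  return { items, nextOffset, hasMore: nextOffset < articles.length };
}

// Usage: the client asks for the first 10, then sends nextOffset back
// each time the user scrolls near the bottom, until hasMore is false.
const allArticles = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));
let page = getArticlePage(allArticles, 0, 10);
while (page.hasMore) {
  page = getArticlePage(allArticles, page.nextOffset, 10);
}
console.log(page.items.length);
```

In a real app the slice would be a LIMIT/OFFSET (or cursor) query, and the scroll library would trigger the request for the next page.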

Obfuscate Images in EaselJS

Is there any way to protect your sprites in EaselJS?
Currently it is too easy to download the sprites.
In Chrome, just go to console -> Resources.
I did some research before posting this and found this topic.
That could be very nice. Also, we don't need to save the slices in JSON like he said if we have a shuffle seed.
But I didn't find anything in Node.js (back end) to do this image shuffle.
I tried Node GM, but it looks too complicated to place one image on top of another with (w, h, x, y, offsetX, offsetY).
I know there will always be a way to "hack" the resources, but I would like to at least make it somewhat difficult.
One of the simple approaches is to encode images to base64, store them as part of Javascript and decode at runtime. See:
Convert and insert Base64 data to Canvas in Javascript
But obviously this will increase download size, since Base64 output is roughly a third larger than the binary data it encodes.
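For what it's worth, the encoding step is short in Node. The sketch below is only an illustration of the Base64 idea: the helper names toDataUri and fromDataUri are made up, and the eight bytes used are just the PNG file signature standing in for a real sprite that you would normally read from disk at build time.

```javascript
// Hypothetical helper: turn raw image bytes into a data URI string that can
// be embedded in a JavaScript file.
function toDataUri(bytes, mimeType) {
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}

// Hypothetical helper: recover the raw bytes from such a data URI.
function fromDataUri(dataUri) {
  const comma = dataUri.indexOf(',');
  return Buffer.from(dataUri.slice(comma + 1), 'base64');
}

// Usage: the PNG signature stands in for the contents of a sprite file.
const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const embedded = toDataUri(pngSignature, 'image/png');
console.log(embedded);
console.log(fromDataUri(embedded).length);
```

In the browser you would not need to decode by hand; assigning the data URI to an Image's src before handing it to EaselJS should be enough. Keep in mind this is obfuscation only: anyone who reads the script can decode the string.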
Personally, I would not go this route for "normal" applications or games, unless it is really justified or imposed on me as an external requirement. For example, one can easily extract assets from an Android APK, but this does not seem to be an area of concern for most developers.
The user's browser downloads those images whether you want it or not. Otherwise, they wouldn't be able to display them.
At any given time, any user can just right click on any image on the site and click SAVE AS, you can't stop it, and you shouldn't try.
If you don't want people downloading your work, don't put it on the public facing internet.

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Does anyone know how to extract data from a webpage using Import.io where the data is loaded into the page via Ajax?
I am unable to extract data from the page mentioned below.
There is no issue extracting data from the first page, but how do I move on to extract data from the second page?
The URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>
The data on that page is delivered using an interesting mix of technologies; it relies heavily on server-side code and JavaScript. That type of page can be a challenge; however, there are always methods to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to take off the redirect=true from the URLs before it would work - just an FYI.
Other times, some stores don't have such a URL; it's a bit of a pain, and their URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. I imagine a larger-scale workaround would be to create a dataset/API based on the categories you are interested in and then filter that larger dataset down (Python or CSV style) by seller name. That would probably work!
I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Displaying a range of images from Instagram

How can we display our images from Instagram, say 10 per page?
As of now, I see no way to retrieve info about only the images we need rather than the full list of all our images (with https://api.instagram.com/v1/users/self/feed?access_token=ACCESS-TOKEN).
Thus every time we display a page with images, we need to download the full list of images from Instagram, and this seems slow.
Any ideas?
Store the URLs returned in a database; then you can check periodically for new images and update accordingly. Now display the images you want from your DB. This way you have full control over which and how many images to display, AND it's a lot faster than going through the API...
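To make the idea concrete, here is a minimal sketch with an in-memory Map standing in for the database. Everything about the shape is an assumption for illustration: the ImageStore class, and the id, url and createdTime fields, are made-up names, not the actual structure of the Instagram API response.

```javascript
// Hypothetical in-memory stand-in for the database table of stored images.
class ImageStore {
  constructor() {
    this.byId = new Map();
  }

  // Store only images we have not seen before; returns how many were new.
  merge(fetched) {
    let added = 0;
    for (const image of fetched) {
      if (!this.byId.has(image.id)) {
        this.byId.set(image.id, image);
        added += 1;
      }
    }
    return added;
  }

  // Return one page of images, newest first (pageNumber starts at 1).
  page(pageNumber, perPage = 10) {
    const sorted = [...this.byId.values()].sort(
      (a, b) => b.createdTime - a.createdTime
    );
    const start = (pageNumber - 1) * perPage;
    return sorted.slice(start, start + perPage);
  }
}

// Usage: a periodic job would call merge() with whatever the API returned,
// while page views are served from the store only.
const store = new ImageStore();
store.merge([
  { id: 'a', url: 'https://example.com/a.jpg', createdTime: 1 },
  { id: 'b', url: 'https://example.com/b.jpg', createdTime: 2 },
]);
console.log(store.page(1).map((image) => image.url));
```

With a real database the merge would be an insert that ignores existing ids, and page() would be an ordered query with a limit and offset.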
