Programmatic Website Screen Shots - web

I'm checking into the available methods for programmatically taking screenshots of an arbitrary website. For example,
Facebook:
http://clicktoverify.truste.com/pvr.php?page=validate&url=www.facebook.com&sealid=102
Salesforce.com:
http://clicktoverify.truste.com/pvr.php?page=validate&companyName=Salesforce.com,%20Inc.&sealid=102&internal=true
On that page you'll see they have a screenshot of the referenced site.
What are my options for getting these kinds of screenshots in an automated fashion?
I'm primarily working with PHP, but am open to all suggestions.
Thanks!

The truth is that the process is far from trivial: to capture a page you first have to render it the way a browser would (HTML, CSS, JavaScript, images). You can build this yourself, but it is difficult.
That said, there are many web services that do exactly this. One example is http://www.thumbalizr.com/, although they are by no means the only one.

Related

Is it possible to create an app for a site without an API?

I would like to create an app for a myBB forum, so that the forum will look nicer and much cleaner on an iPhone or Android.
Is it possible without an API? It isn't my site either.
Everything is possible; it's just a matter of resources...
Technically, you can write an app for anything on the web, but:
An API tells you how to interact with the site, so you don't have to reverse engineer every page and post, or the format of every response to a GET or POST request. Reverse engineering can take a long time, and you will almost certainly miss some possible results (error pages, failed authentication, ...);
An API is usually fairly stable, and developers generally update it with care so as not to break existing applications. Without an API, there is no guarantee that your app will survive the next upgrade of the forum software;
A web API generally defines an output format that is easy to parse: many APIs output XML or JSON, which can be processed with standard libraries. Without an API, the output is plain HTML, which may be difficult to reorganize in order to present the results differently.
So, yes, you can definitely write an app for a myBB forum, but it may require a fair amount of work.
You can. It's called screen scraping, and it is what was done before XML, the semantic web, SOAP, web services and then JSON APIs tried to solve the problem better.
In screen scraping, you grab the site's HTML, parse it, get the data you want out of it, then do what you need with that data. It's more work, and breaks each time the site's layout changes, hence the history of improvements to it.
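As a rough illustration of the grab, parse, extract steps described above, here is a minimal sketch using only Python's standard library. The markup in SAMPLE_HTML and the "subject" class are made up for the example; a real myBB page will look different, and in practice the HTML would be downloaded from the site rather than hard-coded.

```python
from html.parser import HTMLParser

# Hypothetical forum markup, standing in for a downloaded page.
# A real myBB page will use different tags and class names.
SAMPLE_HTML = """
<table>
  <tr><td><a class="subject" href="thread-1.html">Welcome to the forum</a></td></tr>
  <tr><td><a class="subject" href="thread-2.html">Rules &amp; guidelines</a></td></tr>
  <tr><td><a class="nav" href="index.html">Home</a></td></tr>
</table>
"""


class ThreadTitleParser(HTMLParser):
    """Collects (href, title) pairs from <a class="subject"> elements."""

    def __init__(self):
        super().__init__()
        self.threads = []
        self._href = None   # href of the thread link we are currently inside
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "subject":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.threads.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_threads(html):
    """Parse the page and return the data we care about."""
    parser = ThreadTitleParser()
    parser.feed(html)
    parser.close()
    return parser.threads


for href, title in scrape_threads(SAMPLE_HTML):
    print(href, "->", title)
```

Note how tightly the parser is coupled to the page layout: if the site renames the class or restructures the table, the scraper silently returns nothing, which is exactly the fragility mentioned above.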
You mention the site in question is not yours. Many sites do not regard screen scraping as fair use, so check the site's terms and conditions to confirm that you can legally create an app from the data posted there.
You could consider using HTML5. Do you think that would be workable for your app?

Integrating PowerShell in SharePoint

For some time I have been looking at the possibility to integrate PowerShell as a scripting engine in SharePoint but I haven't found the right solution yet.
My main objective is to enable event triggers in e.g. a list to call and execute a PowerShell script (by filename) on the local server. This would give me a lot of flexibility compared to using an ordinary event handler written in Visual Studio, but the question is whether it is possible and whether I have overlooked some serious security issues.
Since every unique idea that I have come up with over the years has already been invented by somebody else, I might have missed an existing product/project, so any links to such projects will be appreciated. Thanks.
In the spirit of "already being invented by somebody else", check out http://www.codeplex.com/iLoveSharePoint for some very interesting uses of PowerShell inside SharePoint. Some great code samples and documentation. Haven't tried myself yet, but seems interesting.
I see what you're trying to achieve, but there's something that just doesn't "feel right" about a user indirectly running script code on your server.
The key difference is that the script can be run by anyone logging into the server. Event handlers can only be run by SharePoint. Strict validation of any inputs would be essential. You should also ensure the script is signed so tampered scripts won't execute.
Also, scripts by their nature aren't really designed for enterprise solutions. There is less opportunity for best practices such as good software architecture, design patterns, source control, code analysis, unit testing, and reuse of code. It's also messy/difficult to share code with a common code base that contains web parts, controls, entities, etc.
Finally, introducing PowerShell means another technology to be maintained in the mix we already have with SharePoint. This might be OK if you are comfortable with it.
Depending on how much customisation has already been done or is planned for the future some of the points above may not matter. Be sure to think about how this idea would feel if implemented 6, 12 and 24 months down the track.

Will linking to scripts or images change page rank?

If I create some useful services, such as free dynamic images or useful JavaScript files, for example http://www.mysite.com/ipaddress.jpg
and other websites then use this image or script in their web pages, does it count towards my link popularity (as an inbound link), and does it increase my website's rank from Google's point of view?
Like Emiswelt said, no one knows Google's secret algorithms; ask an SEO.
However, most conventional crawlers only follow anchor tag hrefs.
If you search for filetype:.jpg on Google, you will not get any image results; only actual pages are shown. Images and JavaScript files cannot have any outbound links, and thus PageRank is probably not applicable to them.
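To make the point about anchor hrefs concrete, here is a small sketch of how a naive crawler might collect the links it follows. It is not how Google or any particular search engine works (that is not public); the class name, the sample page and the example.org base URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorLinkCollector(HTMLParser):
    """Records only <a href="..."> targets, as a conventional crawler would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # <img src> and <script src> are embedded, not followed
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, href))


def followed_links(html, base_url):
    collector = AnchorLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# A hypothetical third-party page that embeds the image and script from the question.
PAGE = """
<p>My IP badge: <img src="http://www.mysite.com/ipaddress.jpg"></p>
<script src="http://www.mysite.com/widget.js"></script>
<p>Provided by <a href="http://www.mysite.com/">mysite</a>, see <a href="/about.html">about</a>.</p>
"""

print(followed_links(PAGE, "http://www.example.org/page.html"))
```

In this sketch the embedded image and script never show up as links; only the explicit anchor back to the site does. That suggests that if you want an embed to count as a link, shipping it together with a visible anchor is the safer bet, though whether any given search engine credits it is still unknown.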
As others have said, the actual system is top secret, but all the major search vendors support things that help describe images better, in particular RDF (Resource Description Framework). It is probably your best bet for providing information about your images in the most useful way, which will hopefully give them an appropriate page rank.
The PageRank value reflects the importance of individual web pages and not that of a whole web site.
Nobody knows this exactly; Google's algorithms are a big secret...
It probably does not boost your page rank (I sincerely hope not!), but there is no way to know for sure. Besides, if it boosts it today, there's no guarantee it will still boost it tomorrow.
If you are thinking of doing clever things just to boost your site's ranking, I'd think again. If you are too blatant about it and Google catches you, your site might get zero ranked. And as a user of Google, I think that is a good thing.

Are there any building blocks for a search engine that will scrape other sites?

I want to build a search service for one particular thing. The data is freely available out there, via free classified services and a host of other sites.
Are there any building blocks, e.g. open-source crawlers, that I could customize rather than building from scratch?
Any advice on building such a product? Not just technical, but any privacy/legal things that I might need to take into consideration.
E.g. do I need to 'give credit' where the results are from and put a link to the original - if I get them from many places?
Edit: By the way, I am using GWT with JS for the front-end, haven't decided on the language for the back-end. Either PHP or Python. Thoughts?
There are a few building blocks in Python you can use.
BeautifulSoup [http://www.crummy.com/software/BeautifulSoup/] for parsing HTML. It can handle bad markup too, and its API is very easy to use; for me it is way better than any DOM-like tool. My friend used it to scrape his old phpBB forum with success. It has pretty good docs.
mechanize [http://wwwsearch.sourceforge.net/mechanize/] is an HTTP client library that simulates a web browser. It handles cookies, filling in forms and so on. It is also easy to use, but it helps if you understand how HTTP works.
http://dev.scrapy.org/: this is a relatively new thing, a whole scraping framework based on Twisted. I haven't played with it much.
I use the first two for my needs; for example, it takes about 20 lines of code to build an automated testing tool for a 3-stage poll, including simulated waits for a user entering data and so on.
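To show how such building blocks fit together, here is a minimal sketch of the core crawl loop, written with the standard library only rather than the libraries above. The crawl function, its fetch parameter and the FAKE_SITE pages are all hypothetical; the in-memory "site" just keeps the example runnable without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the raw href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of one host.

    fetch(url) must return the page's HTML, or None if it is unavailable.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)
            # Stay on the starting host and never queue a URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Made-up classified-ads site, used in place of real HTTP requests.
FAKE_SITE = {
    "http://ads.example/": '<a href="/item/1">Bike</a> <a href="/item/2">Sofa</a>',
    "http://ads.example/item/1": '<a href="/">Home</a> <a href="http://other.example/">Partner</a>',
    "http://ads.example/item/2": '<a href="/item/1">Bike</a>',
}

pages = crawl("http://ads.example/", FAKE_SITE.get)
print(list(pages))
```

For real use, fetch would wrap an HTTP client (for example urllib.request.urlopen, or mechanize if you need cookies and forms), and a production crawler would also need politeness delays and robots.txt handling, which is the kind of thing a framework like Scrapy is meant to provide.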
I made a screen-scraper in Ruby that took like five minutes. Apparently this dude has it down to 60 seconds! I'm not sure if Ruby is as scalable or fast as what you're looking for, but I've never seen a faster route to a proof-of-concept or a prototype.
The secret is a library called "hpricot", which was built for exactly this purpose.
I don't know anything about PHP or Python or what's available for those development systems/languages.
Good luck!

Resource for developing a website

Can anyone recommend resources to learn how to develop websites, as opposed to web applications?
I am looking to develop a website for a consulting company to be precise. I would be more interested in best practices for creating the layout of a website (user appeal, eye candy, not an eye sore)
Thanks
-M
It really depends upon the language you want to use, your current skill sets, who's going to maintain the site, what OS the site will be hosted on etc etc.
I suspect you need to narrow down your question.
What do you mean by web site rather than web application? Are you talking about the dynamic nature of the content or something else?
update
If you're looking for discussions on design of websites (visual design, UX etc) then I'm a great fan of Smashing Magazine.
http://www.smashingmagazine.com/
It doesn't often speak about MS technologies (ASP.NET etc) but it's a great place to see discussions and papers on "what makes a great website". Some recent examples:
http://www.smashingmagazine.com/2009/05/15/optimizing-conversion-rates-its-all-about-usability/
http://www.smashingmagazine.com/2009/05/14/non-profit-website-design-examples-and-best-practices/
Subscribe to their RSS feed and see what those colouring-in people get up to.
Here's your first port of call.
Unless you're artistically inclined, I recommend purchasing or contracting the template design to someone who is skilled in this area.
For $60 a year, you can have unlimited downloads and unlimited use of all the templates at the following site:
http://www.dreamtemplate.com/
There are many more here:
http://www.templatemonster.com/website-templates.php
http://www.w3schools.com/
For purely informational sites, HTML and CSS will probably be plenty, though I think I would recommend using WordPress if you're just trying to put content on the internet.
If you speak German or French, http://www.selfhtml.org is quite a good resource.
Otherwise, I would recommend http://www.w3schools.com/ or http://htmldog.com/. Both are very good as they really go deeply into the matter and tell about standards from the beginning.
sitepoint.com
Their best content is packaged in their books, but their articles are good, too. Covers design best-practices and web standards, but also has good tips on the business of web design and managing clients.
You may want to look at the A List Apart website.
It is simply the best I have seen for this.
Since I have just been reminded of them, I would also use:
http://www.webmonkey.com/
http://w3schools.com/
http://www.w3schools.com/ is a good start.

Resources