Is it possible to create an app for a site without an API?

I would like to create an app for a myBB forum, so the forum will look nicer and much cleaner on an iPhone or Android.
Is it possible without an API? It isn't my site, either.

Everything is possible; it's just a matter of resources.
Technically, you can write an app for anything on the web, but:
An API tells you how you can do things with the site, without having to reverse engineer every page, post, and so on, along with the format of every output resulting from POST/GET operations. Reverse engineering may take a long time, and you will surely not come across all possible results (error pages, bad authentication...).
An API is quite stable and is always updated with great care by the developers so as not to break existing applications. Without an API, there is no guarantee that your app will not break with the next release of the forum when it is upgraded.
A web API generally defines an output format which is easily parseable: many APIs output XML or JSON, which can be processed with standard libraries. Without an API, the output format is plain HTML, which may be difficult to restructure in order to show the results in a different format.
So, yes, you can definitely write an app for a myBB forum, but it may require a fair amount of work.
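For contrast, when a site does expose a JSON API, consuming it from Python is about this short. This is only a sketch: the endpoint and field names are hypothetical, since the forum in question has no such API.

    # With a JSON API, a standard library call does the parsing for you.
    import requests

    # Hypothetical endpoint; stock myBB does not ship one like this.
    threads = requests.get("https://example-forum.com/api/threads").json()
    for thread in threads:
        print(thread["title"])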

You can; it's called screen scraping, and it's what was done before XML, the semantic web, SOAP, web services, and then JSON APIs tried to solve the problem better.
In screen scraping, you grab the site's HTML, parse it, get the data you want out of it, then do what you need with that data. It's more work, and breaks each time the site's layout changes, hence the history of improvements to it.
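As a rough Python sketch of that flow, assuming the requests and BeautifulSoup libraries (the URL and CSS classes are hypothetical stand-ins; inspect the real forum's markup to find the right selectors):

    # Screen-scraping sketch: fetch a forum index page and pull out thread
    # titles. The URL and class names are assumptions, not a documented
    # interface -- which is exactly where the fragility comes from.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example-forum.com/index.php")
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    # myBB-style templates often mark thread links with classes like
    # "subject_new"/"subject_old"; verify against the actual HTML.
    for link in soup.select("a.subject_new, a.subject_old"):
        print(link.get_text(strip=True), "->", link.get("href"))

If the forum's next upgrade renames those classes, the scraper silently returns nothing, which is exactly the breakage described above.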
You mention the site in question is not yours. Many sites do not regard screen scraping as fair use, so check with the site's terms and conditions that you can legally create an app from the data posted there.

You could consider using HTML5 ... do you think that would be doable for such an app?


Hybrid App Development, Database-Driven Content

I've been doing a lot of research, and perhaps just need a few dots connected.
I have an idea for a mobile app/website that contains lists of local eating/drinking establishments along with the deals/specials they offer each day. The idea is to create an app that people can refer to in order to save money on a night out.
I'm familiar enough with HTML/CSS/JS to create a functioning website, but when it comes to the backend I'm a little confused. Editing the markup to reflect changes (e.g. a new deal starting or a new establishment opening) is a bit cumbersome. I know I want a database with my information ready to be displayed on my page. Does this mean that I need to develop my own API for everything, and then make sure it integrates with the host I end up choosing?
I feel like I'm missing something that should make it obvious what the next step is. Can anyone offer any advice?
The short answer is yes, you are exactly right.
The long answer is that that is definitely one way to do it. But for large projects, using plain JS on the client can get quite cumbersome. Usually the first level is something like AJAX; it's a great way to start, and you can go a long way with just AJAX. This is actually where most people start when using plain JavaScript to make API calls. The next level is a framework like Angular. That will of course do more for you than just help handle API calls, and it requires a larger investment in learning.
So that is all client side...
Now for the server-side part... When you publish a website, you are dealing with server-side content. Your static content is served up from the server, but it's always the same static content; it only becomes dynamic on the client once all the JavaScript gets parsed and executed.
The API is another server-side component. But instead of being static like your pages (a bunch of files just sitting there), it is an actual application on the server. It takes a command via an API request, does its thinking, and then dynamically spits out a response object to the requester, which in this case will be the JS on your site.
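To make that concrete, here is a minimal sketch of such a server-side API in Python with Flask; the framework, route, and data are illustrative assumptions, not the only way to do it:

    # Minimal API sketch: the client-side JS calls GET /api/deals?day=friday
    # and renders the JSON it gets back. In a real app the deals would come
    # from a database, not an in-memory dict.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    DEALS = {  # stand-in for a database table
        "friday": [{"venue": "Example Pub", "deal": "2-for-1 drafts"}],
        "saturday": [{"venue": "Example Grill", "deal": "half-price wings"}],
    }

    @app.route("/api/deals")
    def deals():
        day = request.args.get("day", "friday")
        return jsonify(DEALS.get(day, []))

    if __name__ == "__main__":
        app.run()

With something like this in place, adding a new deal means changing a database row, not editing markup; the page's JS just re-fetches /api/deals and re-renders.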
Now, if you don't like the idea of learning to make your own API, there are services out there that will host an API for you and give you a GUI to build it. I can't recommend one because I have never used one, but I do work with businesses that do, and they love the fact that they don't have to hire a dev to make their APIs. The downside is that they are tied to that service and limited to the functionality it offers. That's not a big limitation, as these services are quite powerful, but if you are going to be managing complex data sets it would probably be better to learn to make your own API.
Hope that clears things up a bit for you!

Is there a benefit to embedding microformat information in the HTML for a web app?

Is there a benefit to implementing microformats (or itemscope) in the HTML for a web app? So far it only looks like it is useful for SEO, and my web app is behind a login screen, so I will not have to worry about that. Are there any plugins or browsers which automatically process the information?
Or as unor proposed, is there a benefit in adding structured data to non-public sites?
If you generalize the question like unor proposed:
Is there a benefit in adding structured data to non-public sites?
It has a definite advantage! For someone with a visual disability, it is easier to navigate through sites if microformats are implemented. If there is a possibility that someone with a screen reader will use the application, it is worth the effort. Not to mention that it is a one-time task with a long-term positive effect. I think it is a good thing to proactively strive to serve all kinds of users.
Answer to the original question: browsers do not have to process that information, but some advanced technology could use it in the browser, such as preloading. There was research at my university about which pages to preload, and it used this feature. (E.g., in the processor's free time, a browser plugin preloaded the home page to enhance the browsing experience.)

"Sandbox" Google Analytics for security

By including Google Analytics in a website (specifically the JavaScript version), isn't it true that you are giving Google complete access to all your cookies and site information (i.e., it could be a security hole)?
Can this be mitigated by putting Google in an iframe that is sandboxed? Or maybe only passing Google the necessary information (e.g., browser type, screen resolution, etc.)?
How can someone get the most out of Google Analytics without leaving the entire site open?
Or perhaps passing the data through my own server and then uploading it to Google?
You can create a scriptless implementation via the measurement protocol (for Universal Analytics enabled properties). This not only avoids any security issues with the script (although I'd rather trust Google on that), it also means you have more control over what data is submitted to Google's servers.
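As a sketch, a server-side pageview hit via the Universal Analytics measurement protocol looks roughly like this in Python; the tracking ID is a placeholder, and only the fields you explicitly include ever reach Google:

    # Server-side hit to the Universal Analytics measurement protocol.
    # No Google script runs in the browser at all.
    import uuid
    import requests

    payload = {
        "v": "1",                  # protocol version
        "tid": "UA-XXXXX-Y",       # placeholder property/tracking ID
        "cid": str(uuid.uuid4()),  # anonymous client ID you manage yourself
        "t": "pageview",           # hit type
        "dp": "/some/page",        # document path being reported
    }
    requests.post("https://www.google-analytics.com/collect", data=payload)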
A script run on your site can read cookies on your site, yes. And that data can be sent back to Google, yes. That is why you shouldn't store sensitive information in cookies. You shouldn't do this even if you don't use Google Analytics, even if you don't use ANY code except your own. Browsers and browser addons can also read that stuff, and you definitely cannot control that. Again: never store sensitive information in cookies.
As far as access to "site information" goes: JavaScript can be used to read the content on your pages, know the URLs of pages, etc. In other words, anything you serve up on a web page. Anything that is not behind a wall (e.g. a login barrier) is surely up for grabs, but crawlers will look at that stuff anyway. Stuff behind walls can still be grabbed automatically, depending on what a bot actually has to do to get past those walls (e.g. simple registration/login barriers are pretty easy to get past).
This is also why you should never display sensitive information even in the content of your site, e.g. credit card numbers, passwords, etc. That's why virtually every site that handles even remotely sensitive information shows a mask (e.g. "****") instead of actual values.
Google Analytics does not actively do these things, but you're right: there's nothing stopping them from doing it, and you've already given them the right to do it by using their script.
And you are right: the safest way to control what Google can actually see is to send server-side requests to them, and also to put all your content behind barriers that cannot be easily crawled or scraped. The strongest barrier is one that involves having to pay for access. People are ingenious about making crawlers and bots that get past all sorts of forms and "human" checks, and you're fighting a losing battle on that count, but nothing stops a bot faster than requiring someone to give you money to access your stuff. Of course, this also means you'd have to make everybody pay for access...
Anyway, if you're that paranoid about this stuff, why use GA at all? Use something you host yourself (e.g. Piwik). This won't solve for crawlers/bots, obviously, but it will solve for worries about GA grabbing more than you want it to.

SPA Architecture questions

This post is intended to start a deeper discussion on Single Page Applications for the web. There are questions that do not seem to have a clear answer in most resources on the subject.
Here they are, as they stand in my mind:
Authorization and authentication.
With the entire web app being on the client, it may make calls to the server in any of its functions, even ones the user does not have rights to. The fact that the user cannot see a menu does not preclude that person from invoking JavaScript functions. This is easily handled in an MVC app, for example, by using controllers that validate user rights to a specific function, based on a cookie for instance. However, some SPA apps just use a single controller with Breeze or Web API, which makes server-side authorization impossible.
Memory management on the client
For small sample apps this is not an issue, but imagine an app with hundreds of screens, or an app with a single screen that pulls thousands of records over the course of one day. With persistent caching, one could imagine large memory issues, especially on under-powered devices with little RAM, like phones or tablets. How can a group of developers go the SPA route without a clear vision of how to handle memory management?
Three Tier deployment
Some IT departments will never allow applications with a connection string to a database located on front-end web servers. Every SPA demo I have seen is structured exactly like that, including those using Breeze or Web API, for that matter.
Unobtrusive validation.
It would require developers to use MVC partial views and controllers instead of just HTML files, which seems to fly in the face of SPA concepts, even though it provides a very robust way to easily incorporate validation, and the UI to support it, into web applications.
Exposing primary integer based keys in the url.
This is a no-no per OWASP.
As a result, SPA applications "seem" to target areas with few security requirements and small feature sets. What do you think?
Thanks.
@Sergey - I think this is just too broad a question for StackOverflow. S.O. isn't a discussion forum; it's a place to go for specific answers. So while your questions are potentially valid, I don't think you should hold out much hope for deep, substantive responses here.
May I add, in the friendliest possible way, that your sweeping, unsupported, and negative statements make you look like a troll. You're not a troll are you Sergey?
On the chance that you are in fact authentically concerned, I offer a few quick reactions, particularly as they pertain to Breeze.
Authorization. In Web API you can authorize at the method level. The ApiController base class has a User property that returns the IPrincipal. So whether you have one controller or many (and you can have many in Breeze if you want), the granularity is method level, not just class level.
Memory management. Desktop developers have coped with this concern for years. It may cause you some astonishment if you've always developed traditional web apps where process lifetimes are brief. But long-running processes are not news to those of us who built large apps in desktop technologies such as WinForms, WPF, and Silverlight. The issues and solutions are much the same in the land of HTML and JavaScript.
Layers on the backend. You've been looking at demos too long. Yes, most demos dump everything into one project running on one server. We assume you know how to refactor the server to meet scaling, performance, and security requirements for your environment. Our demos are concerned mostly with front-end SPA development. We do dabble at the service boundary to show how data flows through a service API, through an ORM, and on to the database. We thought it sufficient to identify these distinct layers and leave as an exercise for the reader the comparatively trivial matter of moving these layers to different tiers. We may have to revisit that assumption someday. But does anyone seriously believe that there are significant obstacles to distributing layers/responsibilities across server-side tiers? Really? Like what?
Unobtrusive validation. When most people start using the word "unobtrusive" in connection with HTML, they are usually making a point about keeping JavaScript out of the HTML. Perhaps that's what you mean too, in which case SPA developers everywhere agree ... and that's why there are numerous "unobtrusive validation" libraries available. HTML5 validation, jQuery validation, and Knockout validation come to mind. All of them are in the SPA developer's toolkit, and none of them "require developers to use MVC partial views and controllers". What gives you the impression that a SPA would need any server-side resources of any kind to implement validation with JavaScript-free HTML markup?
Ids as security risk. Really? This is bogus. The key value is no more a security risk than any other data value. Millions of applications - not just SPAs - communicate key values to the client, both in the URL and in the body. It's standard in REST APIs. It's standard in OData. And you want to dismiss them all by saying that they "target areas with few security requirements and small feature sets"? Good luck with that. I think you'll have to do better than rest your case on a link to a relatively obscure organization's entire web site.
I have built some SPA applications, ranging from small to large (over 100 scripts and views). Only a handful of them had every view accessible to the public; the rest went through a strict access structure. It was simple to return a 401 Unauthorized from the server and have the client handle the 401 by redirecting to the login screen. Mr. Ward and Mr. Papa put it right: get out of demo mode and try to find solutions to the issues you come across. I have watched John Papa's SPA course on Pluralsight and gone through numerous articles and applications on Breeze, and I have to tell you, none of my applications use Breeze to do queries from the client side, because YOU DON'T NEED TO!!
Moreover, I have extended what I have learnt and come up with my own ways of solving problems. This is not an answer to all your queries, but I can offer a short comment: no technique is perfect, and there is no ONE way to do everything. My server side is locked down where it needs to be locked down, my routes on the client side are locked down (if using Durandal, take a look at guardRoute), my scripts are minified, and my images are sprited (if there is a word like that). All in all, SPA is a great technique; you just have to find solutions to the quirks!
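As a sketch of that 401 pattern on the server side, assuming a Python/Flask backend (the route, token scheme, and role names are placeholders for whatever auth you actually use):

    # Per-request authorization sketch: every /api/* call is checked on the
    # server, so hiding a menu item on the client is never the real gate.
    from flask import Flask, abort, jsonify, request

    app = Flask(__name__)

    def current_user_roles():
        # Placeholder: in a real app, derive roles from a session or token.
        token = request.headers.get("Authorization", "")
        return {"admin"} if token == "secret-admin-token" else set()

    @app.route("/api/reports")
    def reports():
        if "admin" not in current_user_roles():
            abort(401)  # the client catches this and redirects to login
        return jsonify([{"id": 1, "title": "Q1 summary"}])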

Are there any building blocks for a search engine that will scrape other sites?

I want to build a search service for one particular thing. The data is freely available out there, via free classified services and a host of other sites.
Are there any building blocks, e.g. open-source crawlers that I would customize - rather than build from scratch, that I can use?
Any advice on building such a product? Not just technical, but any privacy/legal things that I might need to take into consideration.
E.g. do I need to 'give credit' where the results are from and put a link to the original - if I get them from many places?
Edit: By the way, I am using GWT with JS for the front-end, haven't decided on the language for the back-end. Either PHP or Python. Thoughts?
There are a few building blocks in Python you can use:
BeautifulSoup [http://www.crummy.com/software/BeautifulSoup/] for parsing HTML. It can handle bad markup too, and its API is very easy, way better than any DOM-like tool for me. A friend of mine used it to scrape his old phpBB forum with success. It has pretty good docs.
mechanize [http://wwwsearch.sourceforge.net/mechanize/] is a browser-simulating HTTP client library. It handles cookies, form filling, and so on. Also easy to use, but it helps if you understand how HTTP works.
http://dev.scrapy.org/ -- this is a relatively new thing: a whole scraping framework based on Twisted. I haven't played with it much.
I use the first two for my needs; for example, it took about 20 lines of code to build an automatic testing tool for a three-stage poll, complete with simulated waits for the user entering data and so on.
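A single stage of such a poll tester might look like this with mechanize; the URL, form index, and field name are made up for illustration:

    # mechanize sketch: walk one stage of a form-based poll as a browser would.
    import time
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)         # we are deliberately scripting this site
    br.open("http://example.com/poll")  # hypothetical poll URL

    br.select_form(nr=0)                # first form on the page
    br["answer"] = "42"                 # hypothetical text-field name
    time.sleep(2)                       # simulate a user pausing before submit
    response = br.submit()
    print(response.code, response.geturl())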
I made a screen-scraper in Ruby that took like five minutes. Apparently this dude has it down to 60 seconds! I'm not sure if Ruby is as scalable or fast as what you're looking for, but I've never seen a faster route to a proof-of-concept or a prototype.
The secret is a library called "hpricot", which was built for exactly this purpose.
I don't know anything about PHP or Python or what's available for those development systems/languages.
Good luck!
