How to write a simple browser with DOM access? - linux

What is an easy way to write a browser application on Linux (Crunchbang/Debian)?
I need to write an application for some DOM editing and automation. My preferred way of doing this is to have my own browser object (like WebKit's WebView or System.Windows.Forms.WebBrowser) and access the DOM from there. I tried both (with Mono), but I ran into two problems:
WebView does not implement DOM access (ref)
System.Windows.Forms.WebBrowser does not work (ref)
This means Mono is not very suitable for this purpose.
What is your preferred way for accessing web pages, reading the DOM and automating navigation?

It would probably be easier to use PhantomJS.
It uses WebKit to render web pages and then makes the DOM available to your script:
http://phantomjs.org
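As a rough illustration of what that looks like, here is a minimal sketch of a PhantomJS script that loads a page and reads a couple of values from its DOM. The helper name `inspectPage` and the URL `http://example.com/` are made up for the example; `require('webpage').create()`, `page.open`, `page.evaluate` and `phantom.exit` are the PhantomJS calls it relies on. The last block only runs when the script is started with the `phantomjs` binary.

```javascript
// Hypothetical helper: opens `url` in a PhantomJS page object and passes a
// small summary of the DOM (title and number of links) to `done`.
function inspectPage(page, url, done) {
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('failed to load ' + url));
      return;
    }
    // The function given to evaluate() runs inside the loaded page, so it can
    // touch `document`; only JSON-serializable values come back out.
    var info = page.evaluate(function () {
      return {
        title: document.title,
        links: document.querySelectorAll('a').length
      };
    });
    done(null, info);
  });
}

// Only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  inspectPage(page, 'http://example.com/', function (err, info) {
    console.log(err ? err.message : JSON.stringify(info));
    phantom.exit(err ? 1 : 0);
  });
}
```

Saved as, say, inspect.js, it would be run with `phantomjs inspect.js`. Editing the DOM works the same way: make the changes inside the function passed to `page.evaluate`.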

Related

Detect Google Polymer browser compatibility

I am writing a project using Polymer. Right now, if a browser doesn't support it, most of the page is just blank. Is there a way to quickly check whether the user's browser supports Polymer and, if not, prompt them to use a modern browser?
You can use http://caniuse.com/ to look up the technologies that the framework depends on. In this case I think Shadow DOM and Custom Elements are the main things Polymer uses, and they are what keep the framework from working in every browser.
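If you would rather check at runtime than look things up, a feature test along these lines might do. This is only a sketch: `supportsWebComponents` is a made-up helper name, and it only looks for native Custom Elements and Shadow DOM (both the current and the older, prefixed-era API names). Polymer is normally shipped with polyfills, so a browser that fails this check may still run your page; treat the result as a hint rather than a verdict.

```javascript
// Hypothetical helper: returns true when the given window-like object exposes
// native Custom Elements and Shadow DOM APIs.
function supportsWebComponents(win) {
  var doc = win.document || {};
  var proto = (win.Element && win.Element.prototype) || {};
  // Custom Elements: current API, or the older document.registerElement.
  var hasCustomElements = 'customElements' in win || 'registerElement' in doc;
  // Shadow DOM: current attachShadow, or the older createShadowRoot.
  var hasShadowDom = 'attachShadow' in proto || 'createShadowRoot' in proto;
  return hasCustomElements && hasShadowDom;
}

// Browser-only usage: warn the visitor when the check fails.
if (typeof window !== 'undefined' && !supportsWebComponents(window)) {
  document.body.insertAdjacentHTML(
    'afterbegin',
    '<p>This page needs a modern browser to display properly.</p>'
  );
}
```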

Porting to WinJS without modifying libraries

Is there a way to port existing HTML5 apps to WinJS without modifying libraries? Going through each library and wrapping innerHTML calls with execUnsafeLocalFunction is such a dirty hack.
Could I load the base page as unsafe somehow? Loading everything in an iframe just displays an empty white box.
You should be able to make things work with an iframe if you put it in the right context with ms-appx-web:
<iframe src="ms-appx-web:///path/to/iframe.html"></iframe>
The downside is you won't be able to get at WinRT APIs from there. Unfortunately there isn't a way to selectively disable parts of SafeHTML. While you could globally disable it by doing some hacks to replace HTMLElement.innerHTML so that it calls execUnsafeLocalFunction implicitly, that would also open you up to script injection attacks which could cause your app to be rejected from the store.
Another option is to actually navigate the root page into the web context:
window.location.href = "ms-appx-web:///path/to/iframe.html";
I believe you then may be able to have an invisible iframe back to the local context (ms-appx) and use the postMessage API to communicate across the boundary as needed.
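To give an idea of the postMessage side, here is a minimal sketch of a receiver that only accepts messages from one expected origin. `makeMessageHandler` is a made-up helper, and the origin string `ms-appx-web://your-package-name` is a placeholder you would replace with your own package's origin; I have not tested this inside an actual app.

```javascript
// Hypothetical helper: builds a "message" event handler that ignores anything
// not sent from `expectedOrigin` and forwards accepted payloads to `onMessage`.
function makeMessageHandler(expectedOrigin, onMessage) {
  return function (event) {
    if (event.origin !== expectedOrigin) {
      return false; // dropped: message came from an unexpected origin
    }
    onMessage(event.data);
    return true;
  };
}

// Browser-only wiring, e.g. inside the invisible local-context iframe.
// The origin below is a placeholder, not a real package name.
if (typeof window !== 'undefined') {
  window.addEventListener(
    'message',
    makeMessageHandler('ms-appx-web://your-package-name', function (data) {
      console.log('received from web context:', data);
    })
  );
}
```

The sending side would call `postMessage(data, targetOrigin)` on the iframe's `contentWindow`, with the target origin set to the receiving frame's origin rather than `"*"`.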

See text in cross-frame scripting in IE

I know it's a security feature in IE not to allow scripts to interact across frame tags/objects, but is there a way around this?
I am using a program to interact with the IE COM interface.
For example, if I had a frame tag and wanted to use $tagobject.innerhtml to retrieve the HTML inside the frame tag, what is the best way of going about this?
Native code running at full trust (e.g. an IE plugin) can interact with any document loaded in the browser. You're correct to note that interacting with subframes that are from an origin different than the top-level page is blocked by security policy when accessed in the "simple" way. But native code can circumvent that.
This support article shows how to enumerate the contents of all frames, regardless of their origin. The basic idea is that you cast the top-level document to an IOleContainer and then enumerate its children, which bypasses the security check.

Is it possible to use a script to block adware recursive links in a browser window?

There are client-side solutions for nasty adware and their recursive links, but is it possible to use a script in the HTML to prevent the links from displaying in the browser of a user who has adware on their machine and is visiting my web site?
I am NOT a programmer. I am a designer, and I know just enough to create problems that send me to forums like this.
I doubt it. Malware like that injects links and creates popups by manipulating the internals of the browser.

How should I go about rendering a webpage without using a browser?

Basically, I am currently doing some research, and I am interested in finding out how I could render web pages without a browser: I have some algorithms that I would like to run to calculate the visual aspect of each block of DOM nodes for each page.
What you're asking for, basically, is a browser rendering engine, otherwise known as a layout engine. For example, Firefox uses the Gecko layout engine to render pages. Theoretically, you could adopt this engine for whatever project you're working on, saving you a lot of time.
The Gecko engine is used in more projects than just Firefox, and since it's open source, you could easily get the source code and try to throw it in an application.
Wikipedia has a nice list of layout engines, so there are other alternatives to Gecko, like GtkHTML.
Basically, you want to create the data structures a browser internally creates so that it knows how to render the page.
Check out the Firefox source.
I suspect it's rather complex.
