How to detect page language/locale in a Chrome extension content script?


I would like my Chrome extension content script to detect the language or locale of the page's content (not the browser's language/locale). I assume there is a method for this in the Chrome extension API, but should I be using standard JavaScript libraries instead?

This is the Chrome extension method: chrome.tabs.detectLanguage(...). From the description:
Detects the primary language of the content in a tab.
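A minimal sketch of how this could be wired up. Note that chrome.tabs is not exposed to content scripts, so the sketch assumes the call lives in the background page and answers a message sent by the content script. The "detect-language" message type and the registerLanguageDetection name are made up for illustration.

```javascript
// Background page side. The "detect-language" message type is a hypothetical name.
function registerLanguageDetection(chromeApi) {
  chromeApi.runtime.onMessage.addListener(function (message, sender, sendResponse) {
    if (!message || message.type !== "detect-language" || !sender.tab) {
      return false; // not our message; no asynchronous response will follow
    }
    chromeApi.tabs.detectLanguage(sender.tab.id, function (language) {
      // language is a code such as "en"; "und" means the language is undetermined
      sendResponse({ language: language });
    });
    return true; // keep the message channel open for the asynchronous callback
  });
}

// Only register when actually running inside an extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage && chrome.tabs) {
  registerLanguageDetection(chrome);
}
```

The content script would then ask for the result with something like chrome.runtime.sendMessage({type: "detect-language"}, function (response) { ... }) and read response.language.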

You could use standard JavaScript DOM functions to look for a lang attribute on the root html element (or possibly the body element). But keep in mind that a page might not be entirely in one language, so different elements of the page may be marked up with different lang attributes.
Also, if you want to support XHTML, I'd suggest looking at the xml:lang attribute as well.
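As a rough sketch of the DOM approach described above, something like the following could run in a content script. The function name getDeclaredLanguage is an assumption for illustration, and it only reports what the page declares, which may be missing or wrong.

```javascript
// Returns the language declared on <html> or <body>, or null if none is declared.
// Checks both "lang" and "xml:lang" (the latter for XHTML pages).
function getDeclaredLanguage(doc) {
  var elements = [doc.documentElement, doc.body];
  var attributeNames = ["lang", "xml:lang"];
  for (var i = 0; i < elements.length; i++) {
    var element = elements[i];
    if (!element) {
      continue; // e.g. no <body> yet
    }
    for (var j = 0; j < attributeNames.length; j++) {
      var value = element.getAttribute(attributeNames[j]);
      if (value && value.trim()) {
        return value.trim();
      }
    }
  }
  return null;
}
```

In a content script this would be called as getDeclaredLanguage(document). It deliberately ignores lang attributes on nested elements, so mixed-language pages need extra handling.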

Related

In Chrome extensions, why use a background page with HTML?

I understand that the background page of a Chrome extension is never displayed. It makes sense to me that a background page should contain only scripts. In what situations would HTML markup ever be needed?
At https://developer.chrome.com/extensions/background_pages there is an example with an HTML background page, but I haven't been able to get it to work (perhaps because I am not sure what it should be doing).
Are there any examples of simple Chrome extensions which demonstrate how HTML markup can be useful in a background page?

Historical reasons
The background page is, technically, a whole separate document - except it's not rendered in an actual tab.
For simplicity's sake, perhaps, extensions started with requiring a full HTML page for the background page through the background_page manifest property. That was the only form.
But, as evidenced by your question, most of the time it's not clear what the page can actually be used for except for holding scripts. That made the entire page little more than boilerplate.
That's why, when Chrome introduced "manifest_version": 2 in 2012 as a big facelift to extensions, they added an alternative format: the background.scripts array. This offloads the boilerplate to Chrome, which then creates a background page document for you, succinctly called _generated_background_page.html.
Today, this is the preferred method, though background.page is still available.
Practical reasons
With all the above said, you still sometimes want to have actual elements in your background page's document.
<script> for dynamically adding scripts to the background page (as long as they conform to extension CSP).
Among other things, since you can't include external scripts through the background.scripts array, you need to create a <script> element for those you whitelist for the purpose.
<canvas> for preparing image data for use elsewhere, for example in Browser Action icons.
<audio> for producing sounds.
<textarea> for (old-school) working with clipboard (don't actually do this).
<iframe> for embedding an external page into the background page, which can sometimes help extracting dynamic data.
...and possibly more.
It's debatable which boilerplate is "better": creating the elements in advance as a document, or using document.createElement and its friends as needed.
In any case, a background page is always a page, whether provided by you or autogenerated by Chrome. You can use all the DOM functions you want.
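To illustrate the document.createElement route mentioned above, here is a minimal sketch of adding a script element to the background page at runtime. The function name addBackgroundScript is hypothetical, and the URL passed in is assumed to be allowed by the extension's CSP.

```javascript
// Appends a <script> element to the given document and returns it.
// Works the same whether the background page was hand-written or generated by Chrome.
function addBackgroundScript(doc, url) {
  var script = doc.createElement("script");
  script.src = url;
  // Fall back to the root element if the document has no <head>.
  (doc.head || doc.documentElement).appendChild(script);
  return script;
}
```

From a background script this would be called as addBackgroundScript(document, someUrl); the same pattern applies to the other elements listed above, such as audio or canvas.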

My two cents:
Take Google Mail Checker as an example. It declares a canvas in background.html:
<canvas id="canvas" width="19" height="19"></canvas>
Then it can manipulate the canvas in background.js and call chrome.browserAction.setIcon({imageData: canvasContext.getImageData(...)}) to change the browser action icon.
I know we could create the canvas dynamically in background.js; however, when doing something that involves DOM elements, writing the HTML directly seems easier.
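A hedged sketch of what the background.js side might look like. This is not Google Mail Checker's actual code; the drawCountIcon name, the colors, and the count of 3 are made up for illustration.

```javascript
// Draws a number onto the canvas and returns the pixels as ImageData.
function drawCountIcon(canvas, count) {
  var context = canvas.getContext("2d");
  context.clearRect(0, 0, canvas.width, canvas.height);
  context.fillStyle = "#d00"; // illustrative background color
  context.fillRect(0, 0, canvas.width, canvas.height);
  context.fillStyle = "#fff";
  context.font = "bold 11px sans-serif";
  context.textAlign = "center";
  context.textBaseline = "middle";
  context.fillText(String(count), canvas.width / 2, canvas.height / 2);
  return context.getImageData(0, 0, canvas.width, canvas.height);
}

// Only runs inside an extension background page that declares <canvas id="canvas">.
if (typeof chrome !== "undefined" && chrome.browserAction && typeof document !== "undefined") {
  var iconCanvas = document.getElementById("canvas");
  if (iconCanvas) {
    // 3 is a placeholder value, e.g. an unread count.
    chrome.browserAction.setIcon({ imageData: drawCountIcon(iconCanvas, 3) });
  }
}
```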

Why can the background page be an html file?

In manifest.json, we specify our background page and can use either an HTML or a JS file for it. Since it is only a script that executes, what sense does it make to have an HTML file for it?
I mean, where would any UI be shown anyway?
Similarly the devtools_page property has to be an html file. What sense does that make?

It will not be shown anywhere (that's the essence of "background"), but some elements on it make sense.
You can have an <audio> tag, and if you play it, it will be heard.
You can have an <iframe> with some other page loaded invisibly.
...and so on.
As for devtools_page, it would actually be visible in the interface (as an extra panel in the DevTools).
It is possible that devtools_page must be an HTML file just for legacy reasons: it was not updated when manifest version 2 rolled out with changes to how background pages are specified. Still, the same arguments as above apply.

background_page is a legacy feature from the initial support of extensions in Chrome. background.scripts was added in Chrome 18. I can't speak for Google's original intentions, but I'd guess that in the original design using a page felt more natural and would be less likely to confuse developers. Once they realized how many background pages were just being used to load JavaScript, it made sense to explicitly support that.

Easy way to get hyperlink info from rendered web page

I'd like to do this programmatically:
Given a page URL, I need to get all links on the page. What's important is that at least 3 pieces of link info must be obtained: anchor text, href attribute value, and absolute position of the link on the page.
The Java CSSBox library is an option, but it's not fully implemented yet (the href attribute value cannot be obtained at the same time, and some extra mapping must be done with an additional library such as Jsoup). What's more, the CSSBox library renders a page really slowly.
It seems that JavaScript has all the functions needed, but we would have to inject the JavaScript code into the page and write a driver to take advantage of existing browsers. Scripting languages such as Python and Ruby have support for this as well. It is hard for me to work out which tool is the handiest.
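For reference, the injected JavaScript mentioned above could look roughly like the sketch below. The collectLinks name and the shape of the returned objects are assumptions for illustration; how the script gets injected and how the result is returned depends on the driver used.

```javascript
// Collects anchor text, href, and absolute page position for every link.
// Must run inside a rendered page, since positions come from layout.
function collectLinks(doc, win) {
  var anchors = doc.querySelectorAll("a[href]");
  var links = [];
  for (var i = 0; i < anchors.length; i++) {
    var anchor = anchors[i];
    var rect = anchor.getBoundingClientRect(); // viewport-relative box
    links.push({
      text: anchor.textContent.trim(),
      href: anchor.getAttribute("href"), // attribute value as written
      resolvedUrl: anchor.href,          // absolute URL resolved by the browser
      left: rect.left + win.pageXOffset, // add scroll offset for page coordinates
      top: rect.top + win.pageYOffset,
      width: rect.width,
      height: rect.height
    });
  }
  return links;
}
```

In a browser this would be called as collectLinks(document, window). Links hidden by CSS still appear in the result, typically with a zero-sized box.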

Does PHP's DOM manipulation library help you? http://www.php.net/manual/en/book.dom.php

How does browser detect embedded web content from a HTML page?

Once a browser gets the main HTML page, how does it know which parts are embedded content that should be requested from the web server, and which are only external links? Is it based on the type of tags, e.g ?
If so, could someone give me a reference of what these tags are?
Thanks.

The HTML5 spec defines the element category "Embedded content":
Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.
It lists the following elements:
audio
canvas
embed
iframe
img
math
object
svg
video
Elements like link or script (both in the Metadata category) can also refer to other resources that user agents (browsers, screen-readers, …) are free to link to, include, or otherwise handle as they see fit. For example, browsers like Firefox or Chromium will (by default) load and "apply" CSS that is linked via a link element with the rel value stylesheet. Browsers like Lynx or w3m won't do that. They simply ignore that link.
For link, HTML5 states which link types "are links to resources that are to be used to augment the current document, generally automatically processed by the user agent":
Two categories of links can be created using the link element: Links to external resources and hyperlinks. The link types section defines whether a particular link type is an external resource or a hyperlink.
Maybe also consider the style attribute (for inline CSS), which could include a background-image url.
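As a rough illustration of the tag-based view described above, the sketch below lists the URLs a typical graphical browser would likely fetch for a document. The collectResourceUrls name and the selector list are assumptions, and the list is deliberately not exhaustive: it leaves out source elements, srcset, and URLs inside CSS, for example.

```javascript
// Returns the URLs referenced by common resource-loading elements.
function collectResourceUrls(doc) {
  var rules = [
    { selector: "img[src], script[src], iframe[src], embed[src], audio[src], video[src]", attribute: "src" },
    { selector: "object[data]", attribute: "data" },
    { selector: "link[rel~='stylesheet'][href]", attribute: "href" }
  ];
  var urls = [];
  for (var i = 0; i < rules.length; i++) {
    var elements = doc.querySelectorAll(rules[i].selector);
    for (var j = 0; j < elements.length; j++) {
      urls.push(elements[j].getAttribute(rules[i].attribute));
    }
  }
  return urls;
}
```

Plain hyperlinks (a elements) are intentionally absent from the selectors: they are only followed when the user activates them.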

Yes, the tags help the browser identify the resources to load. After retrieving the content, the browser determines what to do with it based on the Content-Type header in the response.

Is it possible to create custom XUL elements from XPCOM or NPAPI?

I was wondering if it is possible to create a new XUL component via any available API, such as XPCOM or NPAPI, so we can use it in our XUL files.
Let's say I wanted to clone the code of XUL's vbox component and add a few modifications to it, so we could use our custom XUL component just like this:
<window>
<myvbox mycustomarg1="customValue"> Some content... </myvbox>
</window>
I know what XBL is and what it is used for, and it doesn't fit our needs.
Any suggestion of how to achieve that?
Edit:
We need to create a browser component in Firefox as child of another browser object. The problem is some websites detect this child browser as iframe and we want to avoid this.
Thanks.

If the point is preventing a webpage loaded into a frame from messing with your XUL document then you should use <browser type="content"> - this establishes a security boundary between chrome and content which (among other things) prevents the content document from accessing its parent frame. It is important however that your XUL document itself is loaded as chrome and not content (by either being on top level or inside <browser type="chrome">). See https://developer.mozilla.org/en/XUL/Attribute/browser.type for documentation.
