I am trying to create a Chrome extension that scrapes data from my webpage and then displays it in the browser action window. I wanted to use a background page for this because, if I understand extensions correctly, it is the only element capable of working non-stop without needing a visible tab.
The problem is that the script I wrote for background.js doesn't work properly. My background.js:
var location = window.location.href = 'http://localhost/index.php';
console.log(location);
manifest.json:
"background": {
"scripts": ["src/background/background.js"]
},
The output I get is chrome-extension://some_random_text/_generated_background_page.html.
Is it possible to use background pages to navigate to my webpage, then fill in some forms and scrape data for later use?
This is an old question, but I recently wanted to do exactly the same.
So I'll provide an answer for others who are interested.
Setting window.location still does not work in Chrome 52.
There is a workaround though. You can first fetch the web page with fetch(), and then use document.write to set the content.
This works fine, and you can then query the document and do everything you want with it.
Here is an example. (Note that I'm using the fetch API, arrow functions and let, which all work fine now in Chrome 52.)
fetch("http://cnn.com").then((resp) => {
return resp.text();
}).then((html) => {
document.open("text/html");
document.write(html);
document.close();
// IMPORTANT: need to use setTimeout because chrome takes a little
// while to update the document.
setTimeout(function() {
let allLinks = document.querySelectorAll('a');
// Do something with the links.
}, 250);
});
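A variation on the example above, offered as a sketch rather than a drop-in replacement: instead of overwriting the background page with document.write, the fetched HTML can be parsed into a detached document with DOMParser, which also avoids waiting with setTimeout. This assumes a background page where DOMParser and fetch are available. The names parseHtml, collectLinks and scrapeLinks are made up for illustration.

```javascript
// Parse an HTML string into a detached document (browser / background page only).
function parseHtml(html) {
  return new DOMParser().parseFromString(html, "text/html");
}

// Return the raw href attribute of every <a> element in a document-like object.
function collectLinks(doc) {
  return Array.from(doc.querySelectorAll("a"), (a) => a.getAttribute("href"));
}

// Fetch a page and list its links without touching the current document.
async function scrapeLinks(url) {
  const resp = await fetch(url);
  const html = await resp.text();
  return collectLinks(parseHtml(html));
}
```

In the background page this could be called as scrapeLinks("http://localhost/index.php").then((links) => console.log(links)). Note that scripts in the fetched page are not executed this way, so content that the page builds with JavaScript will be missing.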
A Chrome extension has two main parts: the extension process and the browser itself. The background page runs in the extension process. It does not have direct access to your webpages or information about them.
To have scripts working non-stop on your webpages, you will need to use content scripts.
You can then communicate between your content script and your background page using messages:
contentScript.js
// Read the current page URL. (Assigning to window.location.href here would navigate the page.)
var pageUrl = window.location.href;
chrome.runtime.sendMessage({location: pageUrl}, function(response) {});
background.js
chrome.runtime.onMessage.addListener(
function(request, sender, sendResponse) {
console.log(request.location);
});
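As a slightly fuller sketch of the background side, the listener can be written as a named function that validates the message and acknowledges it through sendResponse. The function name handleLocationMessage and the shape of the reply ({received: true}) are assumptions made for this example; only chrome.runtime.onMessage and chrome.runtime.sendMessage come from the extension API.

```javascript
// background.js (sketch): handle messages sent by the content script.
function handleLocationMessage(request, sender, sendResponse) {
  if (request && typeof request.location === "string") {
    console.log("Content script reported:", request.location);
    sendResponse({ received: true });
  } else {
    // Unknown message shape: reply so the sender's callback still fires.
    sendResponse({ received: false });
  }
}

// Register the listener only where the extension API exists.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(handleLocationMessage);
}

// contentScript.js would then send, for example:
// chrome.runtime.sendMessage({ location: window.location.href }, function (response) {
//   console.log(response.received);
// });
```

Keeping the handler separate from the registration makes it easy to call directly while debugging.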
Recently, I integrated Node and PhantomJS using phantomjs-node. I opened a page that contains an iframe element. I can get the hyperlink element inside the iframe, but executing a click on it fails.
Is there a way to do this? Can anyone help me?
Example:
page.open(url);
...
page.evaluate(function(res){
var childDoc = $(window.frames["iframe"].document),
submit = childDoc.find("[id='btnSave']"),
cf = submit.text();//succeed return text
submit.click()//failed
return cf;
},function(res){
console.log("result="+res);//result=submit
spage.render("test.png");//no submit the form
ph.exit();
});
You can't execute stuff in an iframe. You can only read from it. You even created a new document from the iframe, which will only contain the textual representation of the iframe, but it is in no way linked to the original iframe.
You would need to use page.switchToFrame to switch to the frame to execute stuff on the frame without copying it first.
It looks like switchToFrame is not implemented in phantomjs-node. You could try node-phantom.
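For plain PhantomJS (not phantomjs-node), the switchToFrame approach might look like the sketch below. The helper name clickInFrame and its parameters are hypothetical. It assumes that page is a PhantomJS WebPage, that the frame can be addressed by name, and that dispatching a synthetic mouse event is enough to trigger the button.

```javascript
// `page` is assumed to be a PhantomJS WebPage (or an object with the same methods).
function clickInFrame(page, frameName, selector) {
  // Treat an explicit `false` as "no such child frame".
  if (page.switchToFrame(frameName) === false) {
    return null;
  }
  // From here on, evaluate() runs inside the iframe's document.
  var text = page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    if (!el) { return null; }
    // Dispatch a synthetic click instead of relying on el.click().
    var ev = document.createEvent("MouseEvent");
    ev.initMouseEvent("click", true, true, window, 1, 0, 0, 0, 0,
                      false, false, false, false, 0, null);
    el.dispatchEvent(ev);
    return el.textContent;
  }, selector);
  // Go back to the top-level frame for any later evaluate() calls.
  page.switchToMainFrame();
  return text;
}
```

With the question's page this would be called as clickInFrame(page, "iframe", "#btnSave") once the page has finished loading.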
If the iframe is on the same domain, you can try the following:
submit = $("iframe").contents().find("[id='btnSave']")
cf = submit.text();
submit.click()
If the iframe is not from the same domain, you will need to create the page with web security turned off:
phantom.create('--web-security=false', function(page){...});
I've started with PhantomJS (btw I'm in love). I'm trying to make the headless browser go to my PHP admin panel, log in with a username and password, and then get some text from a div tag on the page it redirects to after login.
So far I have managed to fill in the fields, create a click event, and even access the div tag in the DOM and read its innerText.
The only missing part for me is what to do after PhantomJS clicks a button (the login button in this case), which logs me in and changes the page content. I can't find how to handle what happens after the .click() event.
This is the code I have so far (by the way, it's a good way to start with...)
var page = new WebPage();
page.open("the url comes here",
function(status){
if(status != "success"){console.log('fail loading the page');}
page.evaluate(function(){
var arr = document.getElementsByName("formname");
arr[0].elements["username"].value="username here";
arr[0].elements["password"].value="password here";
arr[0].elements["submit"].click();
return;
});
phantom.exit()
});
The code I want to run on the page that loads afterwards is:
console.log(window.frames[1].document.getElementById('status').innerHTML)
So the only question remaining is how to handle the redirect and launch the script on the other page.
Thanks,
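You need to set up a new callback for the page load:
page.onLoadFinished = function(status){
// window.frames must be read inside the page context, hence page.evaluate
console.log(page.evaluate(function(){
return window.frames[1].document.getElementById('status').innerHTML;
}));
};
This should be assigned right before the page.evaluate call that triggers .click(). phantom.exit() should also move into the callback, otherwise the script exits before the new page has loaded.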
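Putting the question and this answer together, the whole flow might look like the following sketch. The function name loginAndReadStatus and the done callback are hypothetical. The form name, field names and frame index are taken from the question. Reading the frame goes through page.evaluate because only that code runs inside the loaded page.

```javascript
// `page` is assumed to be a PhantomJS WebPage that has already opened the login page.
function loginAndReadStatus(page, username, password, done) {
  var handled = false;
  // Install the load handler first so the navigation caused by the click is not missed.
  page.onLoadFinished = function (status) {
    if (handled) { return; } // ignore any later loads
    handled = true;
    if (status !== "success") {
      done(new Error("page failed to load after login"));
      return;
    }
    // Runs inside the newly loaded page.
    var text = page.evaluate(function () {
      return window.frames[1].document.getElementById("status").innerHTML;
    });
    done(null, text);
  };
  // Fill in the form and click; extra arguments are passed into the page function.
  page.evaluate(function (user, pass) {
    var form = document.getElementsByName("formname")[0];
    form.elements["username"].value = user;
    form.elements["password"].value = pass;
    form.elements["submit"].click();
  }, username, password);
}
```

Inside the page.open callback this could be used as loginAndReadStatus(page, "username here", "password here", function (err, text) { console.log(err || text); phantom.exit(); }), so that the script only exits once the second page has been read.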
How do you prevent Firefox and Safari from caching iframe content?
I have a simple webpage with an iframe to a page on a different site. Both the outer page and the inner page have HTTP response headers to prevent caching. When I click the "back" button in the browser, the outer page works properly, but no matter what, the browser always retrieves a cache of the iframed page. IE works just fine, but Firefox and Safari are giving me trouble.
My webpage looks something like this:
<html>
<head><!-- stuff --></head>
<body>
<!-- stuff -->
<iframe src="webpage2.html?var=xxx" />
<!-- stuff -->
</body>
</html>
The var variable always changes. Although the URL of the iframe has changed (and thus, the browser should be making a new request to that page), the browser just fetches the cached content.
I've examined the HTTP requests and responses going back and forth, and I noticed that even if the outer page contains <iframe src="webpage2.html?var=222" />, the browser will still fetch webpage2.html?var=111.
Here's what I've tried so far:
Changing iframe URL with random var value
Adding Expires, Cache-Control, and Pragma headers to outer webpage
Adding Expires, Cache-Control, and Pragma headers to inner webpage
I'm unable to do any JavaScript tricks because I'm blocked by the same-origin policy.
I'm running out of ideas. Does anyone know how to stop the browser from caching the iframed content?
Update
I installed Fiddler2 as Daniel suggested to perform another test, and unfortunately, I am still getting the same results.
This is the test I performed:
Outer page generates random number using Math.random() in JSP.
Outer page displays a random number on the webpage.
Outer page calls iframe, passing in a random number.
Inner page displays a random number.
With this test, I'm able to see exactly which pages are updating, and which pages are cached.
Visual Test
For a quick test, I load the page, navigate to another page, and then press "back." Here are the results:
Original Page:
Outer Page: 0.21300034290246206
Inner Page: 0.21300034290246206
Leaving page, then hitting back:
Outer page: 0.4470929019483644
Inner page: 0.21300034290246206
This shows that the inner page is being cached, even though the outer page is calling it with a different GET parameter in the URL. For some reason, the browser is ignoring the fact that the iframe is requesting a new URL; it simply loads the old one.
Fiddler Test
Sure enough, Fiddler confirms the same thing.
(I load the page.)
Outer page is called. HTML:
0.21300034290246206
<iframe src="http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206" />
http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206 is called.
(I navigate away from the page and then hit back.)
Outer page is called. HTML:
0.4470929019483644
<iframe src="http://ipv4.fiddler:1416/page1.aspx?var=0.4470929019483644" />
http://ipv4.fiddler:1416/page1.aspx?var=0.21300034290246206 is called.
Well, from this test, it looks as though the web browser isn't caching the page, but it's caching the URL of the iframe and then making a new request on that cached URL. However, I'm still stumped as to how to solve this issue.
Does anyone have any ideas on how to stop the web browser from caching iframe URLs?
This is a bug in Firefox:
https://bugzilla.mozilla.org/show_bug.cgi?id=356558
Try this workaround:
<iframe src="webpage2.html?var=xxx" id="theframe"></iframe>
<script>
var _theframe = document.getElementById("theframe");
_theframe.contentWindow.location.href = _theframe.src;
</script>
I have been able to work around this bug by setting a unique name attribute on the iframe - for whatever reason, this seems to bust the cache. You can use whatever dynamic data you have as the name attribute - or simply the current ms or ns time in whatever templating language you're using. This is a nicer solution than those above because it does not directly require JS.
In my particular case, the iframe is being built via JS (but you could do the same via PHP, Ruby, whatever), so I simply use Date.now():
return '<iframe src="' + src + '" name="' + Date.now() + '"></iframe>';
This fixes the bug in my testing; probably because the window.name in the inner window changes.
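A slightly more defensive sketch of the same idea is shown here. The helper names escapeAttr and iframeHtml are made up for this example. It escapes the attribute values so that a src containing & or quotes still produces valid markup, and it closes the element with an explicit </iframe>, since iframe is not a void element.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build iframe markup with a caller-supplied name (e.g. Date.now()).
function iframeHtml(src, name) {
  return '<iframe src="' + escapeAttr(src) + '" name="' + escapeAttr(name) + '"></iframe>';
}
```

For example, iframeHtml(src, Date.now()) gives the iframe a different name on every page build.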
As you said, the issue here is not iframe content caching, but iframe url caching.
As of September 2018, it seems the issue still occurs in Chrome but not in Firefox.
I've tried many things (adding a changing GET parameter, clearing the iframe url in onbeforeunload, detecting a "reload from cache" using a cookie, setting up various response headers) and here are the only two solutions that worked for me:
1- Easy way: create your iframe dynamically from javascript
For example:
const iframe = document.createElement('iframe')
iframe.id = ...
...
iframe.src = myIFrameUrl
document.body.appendChild(iframe)
2- Convoluted way
Server-side, disable content caching for the content you serve for the iframe OR for the parent page (either will do).
AND
Set the iframe url from javascript with an additional changing search param, like this:
const url = myIFrameUrl + '?timestamp=' + new Date().getTime()
document.getElementById('my-iframe-id').src = url
(simplified version, beware of other search params)
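To deal with the "other search params" caveat, the URL can be built with the standard URL and URLSearchParams APIs instead of string concatenation. This is only a sketch: the function name withCacheBust is made up, and it assumes the iframe URL is absolute.

```javascript
// Return `rawUrl` with a `timestamp` search param set, keeping any existing
// params and the hash intact. `stamp` defaults to the current time in ms.
function withCacheBust(rawUrl, stamp) {
  const url = new URL(rawUrl); // throws on relative URLs
  const value = stamp === undefined ? Date.now() : stamp;
  url.searchParams.set("timestamp", String(value));
  return url.toString();
}
```

Usage would then be document.getElementById('my-iframe-id').src = withCacheBust(myIFrameUrl). Because set() replaces an existing timestamp param, calling it repeatedly does not make the URL grow.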
After trying everything else (except using a proxy for the iframe content), I found a way to prevent iframe content caching, from the same domain:
Use .htaccess and a rewrite rule and change the iframe src attribute.
RewriteRule test/([0-9]+)/([a-zA-Z0-9]+).html$ /test/index.php?idEntity=$1&token=$2 [QSA]
The way I use this, the iframe's URL ends up looking like this: example.com/test/54/e3116491e90e05700880bf8b269a8cc7.html
Here the token is a randomly generated value. This URL prevents iframe caching since the token is never the same, and the iframe thinks it's a totally different webpage since a single refresh loads a totally different URL:
example.com/test/54/e3116491e90e05700880bf8b269a8cc7.html
example.com/test/54/d2cc21be7cdcb5a1f989272706de1913.html
both lead to the same page.
You can access your hidden url parameters with $_SERVER["QUERY_STRING"]
To get the iframe to always load fresh content, add the current Unix timestamp to the end of the GET parameters. The browser then sees it as a 'different' request and will seek new content.
In Javascript, it might look like:
frames['my_iframe'].location.href='load_iframe_content.php?group_ID=' + group_ID + '&timestamp=' + timestamp;
I found this problem in the latest Chrome as well as the latest Safari on the Mac OS X as of Mar 17, 2016. None of the fixes above worked for me, including assigning src to empty and then back to some site, or adding in some randomly-named "name" parameter, or adding in a random number on the end of the URL after the hash, or assigning the content window href to the src after assigning the src.
In my case, it was because I was using Javascript to update the IFRAME, and only switching the hash in the URL.
The workaround in my case was that I created an interim URL that had a 0 second meta redirect to that other page. It happens so fast that I hardly notice the screen flash. Plus, I made the background color of the interim page the same as the other page, and so you notice it even less.
It is a bug in Firefox 3.5.
Have a look..
https://bugzilla.mozilla.org/show_bug.cgi?id=279048
I set iframe src attribute later in my app. To get rid of the cached content inside iframe at the start of the application I simply do:
myIframe.src = "";
... somewhere in the beginning of js code (for instance in jquery $() handler)
Thanks to
http://www.freshsupercool.com/2008/07/10/firefox-caching-iframe-data/
I also had this problem in 2016 with iOS Safari. What seemed to work for me was giving the iframe src a GET parameter with a value, like this:
<iframe width="60%" src="../other/url?cachebust=1" allowfullscreen></iframe>
I also ran into this issue. After trying different browsers and a ton of trial and error, I came up with this solution, which works well in my case:
import { defineComponent } from 'vue'
import { v4 as uuid } from 'uuid'
export default defineComponent({
setup() {
return () => (
// append a uuid after `?` to prevent browsers from caching it
<iframe src={`https://www.example.com?${uuid()}`} frameborder='0' />
)
},
})
If you want to get really crazy, you could implement the page name as a dynamic URL that always resolves to the same page, rather than using the querystring option.
Assuming you're in an office, check whether there's any caching going on at a network level. Believe me, it's a possibility. Your IT folks will be able to tell you if there's any network infrastructure around HTTP caching, although since this only happens for the iframe it's unlikely.
Have you installed Fiddler2?
It will let you see exactly what is being requested, what is being sent back, etc. It doesn't sound plausible that the browser would really hit its cache for different URLs.
Make the URL of the iframe point to a page on your site which acts as a proxy to retrieve and return the actual contents of the iframe. Now you are no longer bound by the same-origin policy (EDIT: does not prevent the iframe caching issue).
I am looking for a way to detect the browser extension I am building from my website, because I need to alert my users in case they are viewing my site without it. I have been able to do this in Firefox, but is there a way to do this in Google Chrome? Even a hack to get this going is fine.
Sure. Create a content script specific to your site in the extension, and make it add an invisible marker to the DOM, e.g.:
$('body').append('<div style="display: none;" class="extension_enabled" />');
In the page, set a short timeout to check for this after the document is fully loaded, e.g.:
$(function() {
setTimeout(function() {
if ($('.extension_enabled').length > 0) {
alert('Installed!');
} else {
alert('Not installed.');
}
}, 500);
});
NOTE: Code in jQuery format for simplicity. You can do it with raw javascript, of course.
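Without jQuery, a similar marker can be set as an attribute on the root element, which avoids adding an extra node. This is a variation on the approach above rather than the same code. The attribute name data-my-extension and both function names are made up for this sketch.

```javascript
// Hypothetical marker attribute shared by the extension and the site.
var MARKER_ATTR = "data-my-extension";

// Called from the extension's content script: mark the page.
function markExtensionPresent(doc) {
  doc.documentElement.setAttribute(MARKER_ATTR, "1");
}

// Called from the website's own script: check for the mark.
function isExtensionPresent(doc) {
  return doc.documentElement.getAttribute(MARKER_ATTR) === "1";
}
```

The content script would call markExtensionPresent(document), and the page would call isExtensionPresent(document) inside the same kind of short timeout as in the jQuery version. This works because content scripts share the page's DOM even though they do not share its JavaScript variables.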
The official Google Chrome Extensions Developers' Guide has an item covering exactly this.