Is it possible for Apache Nutch to download a file from a URL after a button click and index it?
Example - Suppose the URL is http://example.com
The file is downloaded and saved after clicking a button on http://example.com. How can we do this in Apache Nutch?
This really depends on how the button is implemented. If the download action is just a link directly to the file, it should work fine. If the download happens through a JavaScript event or in a form with a <button> element, Nutch will not detect it. If the link is generated by JavaScript, the protocol-selenium plugin might help.
EDIT
Since the download is triggered by something like the onclick event, your best bet is to use protocol-interactiveselenium (https://github.com/apache/nutch/blob/master/src/plugin/protocol-interactiveselenium/README.md) and perhaps implement a custom handler if you need one. I haven't tested this personally, but it should work.
Related
I am trying to crawl the listings on this website via scrapy: https://www.hipflat.com/search/rent/condo_y/TH.BM_r1/any_r2/any_p/any_b/any_a/any_w/any_i/100.560155,13.737171_c/16_z/list_v
However, I am stuck on the navigation. The "next page" links show up at the bottom of the page, but as far as I can see they call an external service (Algolia) via a JavaScript query.
What would be the easiest way to make the navigation crawlable via scrapy?
The next-page link is present in the page. You can get it with response.css("[rel='next']::attr(href)").get(), which returns the URL of the next page. You can then simply issue a GET request for it using response.follow(url=, callback=), passing the extracted URL and your parse method.
Is it possible to load my own JavaScript code into the browser console (Mozilla Firefox) on every page I am redirected to? I want to create a script robot that performs a series of actions on a website. For example:
Click button: "Add new advert"
... (a redirect happens here)
Write to input[name='title'] advert title
Write to input[name='description'] advert description
Click button: "Add now!"
... (a redirect happens here)
Click button: "Logout".
So I want to load my own JavaScript code into the console and have it reloaded automatically after every page redirect. Is it possible?
Thanks.
Yes, it is possible. What you would do is create your own Google Chrome extension that links to a JS file to run. Here are some instructions on how to make such an extension.
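As a rough sketch of what that JS file could look like: the script below assumes the extension lists it under "content_scripts" in its manifest.json (with a "matches" pattern for the target site), so the browser runs it again after every redirect. The button selectors (#add-new-advert, #add-now, #logout), the advert text and the robotStep storage key are hypothetical; only the two input[name=...] selectors come from the question. Progress is kept in sessionStorage so the script knows which step it is on after each page load.

```javascript
// robot.js: meant to be listed under "content_scripts" in the extension's
// manifest.json so the browser runs it again on every page load.
// Every selector except the two input[name=...] ones is hypothetical.
const STEPS = [
  (doc) => click(doc, "#add-new-advert"),
  (doc) => {
    setValue(doc, "input[name='title']", "My advert title");
    setValue(doc, "input[name='description']", "My advert description");
    click(doc, "#add-now");
  },
  (doc) => click(doc, "#logout"),
];

function click(doc, selector) {
  const element = doc.querySelector(selector);
  if (element) element.click();
}

function setValue(doc, selector, text) {
  const element = doc.querySelector(selector);
  if (element) element.value = text;
}

// Runs the step whose index is saved in storage, then advances the index.
// Returns false once every step has been done.
function runNextStep(doc, storage) {
  const index = Number(storage.getItem("robotStep") || 0);
  if (index >= STEPS.length) return false;
  // Save progress before acting, because a click may navigate away at once.
  storage.setItem("robotStep", String(index + 1));
  STEPS[index](doc);
  return true;
}

// Only start automatically inside a browser page.
if (typeof document !== "undefined") {
  runNextStep(document, window.sessionStorage);
}
```

Note that this sketch advances to the next step even if an element was not found, and it never resets robotStep, so a real robot would probably want to check the page before moving on.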
I'm trying to have a controller action in my app that, when a user clicks a button, attaches the requested file and redirects, all in one controller action.
I'm having trouble doing so, though: if I place a res.redirect after my res.attachment(); res.send(), the file won't send, and if I leave out the redirect, the file sends but no redirect occurs. How should I handle this?
Based on your stated goal, this isn't something you would handle on the back end, since a single HTTP response can carry either the redirect or the file, not both. Instead, just have your download button redirect to the new page, and add code to that page to automatically start the download. Lots of suggestions for how to auto-start the download can be found in this SO question.
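One way to auto-start the download on the new page, as a minimal sketch: the script loads the file in a hidden iframe once the page has finished loading. FILE_URL is a hypothetical route, assumed to be handled by the existing res.attachment() / res.send() action; because that response is sent as an attachment, the browser should save the file while the visible page stays put.

```javascript
// Script for the page the download button redirects to.
// FILE_URL is a hypothetical route whose handler calls res.attachment() and res.send().
const FILE_URL = "/files/report";

// Loads the file in a hidden iframe; with an attachment response the browser
// saves the file and the visible page stays where it is.
function startDownload(doc, url) {
  const frame = doc.createElement("iframe");
  frame.style.display = "none";
  frame.src = url;
  doc.body.appendChild(frame);
  return frame;
}

// Only hook into the load event when running inside a browser page.
if (typeof window !== "undefined") {
  window.addEventListener("load", () => startDownload(document, FILE_URL));
}
```

The redirect and the file then travel in two separate requests, which is why the controller action no longer has to do both.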
I have a viewPanel on an XPage that links to documents in different databases. I am able to click the links to open the documents. After submitting a document, I want to redirect back to the previous viewPanel. The problem is that I can't open the viewPanel again, and the URL history (context.getHistoryUrl) is blank when redirecting across different databases. Any tips on how to get the URL history?
If you are executing context.redirectToPage to redirect the user, you could change this code so that, once your update has finished, the onComplete event runs and performs a client-side redirect to the correct URL. To compute the correct URL you can use the code you gave in the answer to my comment.
More info here: http://xpag.es/?192E
I am trying to do URL rewriting in SharePoint. What I have done so far works fine, but when I click SharePoint's default controls, such as edit page, approve, or any other links, they point to the old URL and not the new one, so I get a 404 Not Found.
Does anyone have an idea how to solve this in SharePoint? I have seen Scott's posts about postbacks, but there he covers postbacks for controls you create yourself in ASP.NET by adding a form browser file. What about the existing controls in SharePoint? Do I need to add something to the master page?
Any help would really be appreciated.
You could try overriding SharePoint's default postback handler (a JavaScript function), either with an HttpModule or by creating a control that replaces the old URL with the new one and generates a postback function override that uses that "translated" URL. Then add the control to the master page.
Not too sure if this is possible. My guess is that you might run into some request validation issues when you do this.
EDIT:
Read Scott Guthrie's article about this subject: article
How do you do your rewriting, e.g. what are your rules?
Mine does it in such a way that the normal URLs still work. Page editing is done through the normal URLs, so you don't get problems like this.