WWW::Mechanize::Chrome somehow loses the node when I call another method such as ->find_all_links():
No node with given id found
-32000 at /usr/local/share/perl/5.20.2/Chrome/DevToolsProtocol/Target.pm line 490
Node 447 has gone away in the meantime, could not resolve at /usr/local/share/perl/5.20.2/WWW/Mechanize/Chrome/Node.pm line 206.
No node with given id found
When I do $mech->get($url) again before fetching all the links it works, but I don't want to re-get the URL every time we call a method, and we need to submit forms later on anyway.
my $mech = WWW::Mechanize::Chrome->new(
    headless         => 1,
    launch_exe       => '/usr/bin/google-chrome',
    launch_arg       => [ "--headless", "--no-sandbox" ],
    separate_session => 1,
);

parse_content();

sub parse_content {
    $mech->get($url);
    my $content = $mech->content;
    # parse some content
}

# $mech->get($url);
# fetching the links would work when getting the URL again,
# but this is not the way it should work
my @links = $mech->find_all_links();
Can anybody help?
We are building a search for our Moodle site with Algolia's search product.
To update the index on Algolia's side we want to use an automated process (uploading the index data).
I decided to start with the Python client (Python v3.9.x). Since we are working inside a corporate network, we are behind firewalls and there is a problem accessing Algolia's servers.
I'm testing with two methods:
Searching within already existing indices: index.search
Updating all indices: index.replace_all_objects
Error message: 'Unreachable hosts'
Here are my code snippets:
# Search:
import json

import constants  # local module holding the Algolia credentials
from algoliasearch.configs import SearchConfig          # v2 Python client
from algoliasearch.search_client import SearchClient

searchAppID = constants.ALGOLIA_APP_ID
searchKeyHash = constants.ALGOLIA_SEARCH_KEY

config = SearchConfig(searchAppID, searchKeyHash)
config.connect_timeout = 2
config.read_timeout = 5
config.write_timeout = 30

client = SearchClient.create_with_config(config)
index = client.init_index('myIndex')
result = index.search('SearchString')
# function to update an index from a JSON file:
def replace_IdxObj(client, index, fileName):
    try:
        if index.exists():
            with open(fileName, encoding="utf8") as f:
                data = json.load(f)
            result = index.replace_all_objects(
                data, {'autoGenerateObjectIDIfNotExist': True}
            )
            return result
        else:
            print('No index found! Cancel operation... \n')
            return None
    except Exception as err:
        print("Error: " + str(err))
When I test my code outside of the company's network, it all works.
I also have an SSL certificate that can be used to work with external resources, so I used it in the following test (run from the company's network, behind the firewall):
response = requests.get('https://google.com/', verify=("C:\\Program Files\\Common Files\\SSL\\certs\\cert.cer"))
print(response)
This test works just fine (I'm getting a 200 response)!
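Since requests accepts the bundle, I assume the same bundle could be handed to whatever HTTP layer the Algolia client uses. A minimal sketch of what I have in mind, assuming the v2 Python client does its HTTP through the requests library (the app ID and key below are placeholders):

import os

# Assumption: the Algolia v2 Python client sends its traffic through the
# requests library, and requests reads REQUESTS_CA_BUNDLE from the
# environment when verifying TLS connections.
os.environ['REQUESTS_CA_BUNDLE'] = 'C:\\Program Files\\Common Files\\SSL\\certs\\cert.cer'

from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourAppID', 'YourSearchKey')  # placeholder credentials
index = client.init_index('myIndex')
result = index.search('SearchString')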
Please advise: is there any way to make the Python process used by Algolia's Python client pick up the certificate that I have? Any alternatives?
Many thanks!
I am currently trying to perform some operations using promises in a loop, but I ended up with huge memory leaks.
My problem is exactly the one pointed out in this article, but unlike the author I am writing in coffee-script (yes, with a hyphen, which means CoffeeScript 1.12 and not the latest version). Thus I am not able to use the "await" keyword (this is a casual guess, since each time I try to use it I get an "await is not defined" error).
This is my original code (with memory leaks):
recursiveFunction: (next = _.noop) ->
  _data = @getSomeData()
  functionWithPromise(_data).then (_enrichedData) =>
    @doStuffWithEnrichedData(_enrichedData)
    @recursiveFunction()
  .catch (_err) =>
    @log.error _err.message
    @recursiveFunction()
So according to the article I linked, I would have to do something like this:
recursiveFunction: (next = _.noop) ->
  _data = @getSomeData()
  _enrichedData = await functionWithPromise(_data)
  @recursiveFunction()
But then again, I am stuck because I can't use the "await" keyword. What would be the best approach then?
EDIT:
Here is my real original code. What I am trying to achieve is a face-detection application. This function is located in a lib, and I am using the "Service" variable to expose variables between libs. In order to get frames from the webcam, I am using opencv4nodejs.
faceapi = require('face-api.js')
tfjs = require('@tensorflow/tfjs-node')
cv = require('opencv4nodejs')
(...)

# Analyse the new frame
analyseFrame: (next = _.noop) ->
  # Skip if not capturing
  return unless Service.isCapturing
  # get frame
  _frame = Service.videoCapture.getFrame()
  # get frame date
  @currentFrameTime = Date.now()
  # clear old faces in history
  @refreshFaceHistory(@currentFrameTime)
  # convert frame to a tensor
  try
    _data = new Uint8Array(_frame.cvtColor(cv.COLOR_BGR2RGB).getData().buffer)
    _tensorFrame = tfjs.tensor3d(_data, [_frame.rows, _frame.cols, 3])
  catch _err
    @log.error "Error instantiating tensor !!!"
    @log.error _err.message
  # find faces on frames
  faceapi.detectAllFaces(_tensorFrame, @faceDetectionOptions).then (_detectedFaces) =>
    @log.debug _detectedFaces
    # fill face history with detected faces
    _detectedFaces = @fillFacesHistory(_detectedFaces)
    # draw boxes on image
    Service.videoCapture.drawFaceBoxes(_frame, _detectedFaces)
    # Get partial time
    Service.frameDuration = Date.now() - @currentFrameTime
    # write latency on image
    Service.videoCapture.writeLatency(_frame, Service.frameDuration)
    # show image
    Service.faceRecoUtils.showImage(_frame)
    # Call next
    _delayNextFrame = Math.max(0, 1000 / @options.fps - Service.frameDuration)
    setTimeout =>
      # console.log "Next frame : #{_delayNextFrame}ms - TOTAL : #{_frameDuration}ms"
      @analyseFrame()
    , _delayNextFrame
The solution was to dispose of the tensor copy sent to detectAllFaces:
faceapi.detectAllFaces(_tensorFrame, @faceDetectionOptions).then (_detectedFaces) =>
  (...)
  _tensorFrame.dispose()
  (...)
I am just exploring Scrapy with Splash, and I am trying to scrape all the product (pants) data, with product id, name and price, from the e-commerce site gap.
But I don't see all of the dynamic product data loaded when I look at the page in the Splash web UI (only 16 items load for every request, and I have no clue why).
I tried the following options, but with no luck:
Increasing the wait time up to 20 seconds
Starting Docker with "--disable-private-mode"
Using a lua_script for page scrolling
Setting the viewport to the full page with splash:set_viewport_full()
lua_script2 = """ function main(splash)
local num_scrolls = 10
local scroll_delay = 2.0
local scroll_to = splash:jsfunc("window.scrollTo")
local get_body_height = splash:jsfunc(
"function() {return document.body.scrollHeight;}"
)
assert(splash:go(splash.args.url))
splash:wait(splash.args.wait)
for _ = 1, num_scrolls do
scroll_to(0, get_body_height())
splash:wait(scroll_delay)
end
return splash:html()
end"""
yield SplashRequest(
    url,
    self.parse_product_contents,
    endpoint='execute',
    args={
        'lua_source': lua_script2,
        'wait': 5,
    }
)
Can anyone please shed some light on this behavior?
p.s.: I am using the Scrapy framework, and I am able to parse the product information (item id, name and price) from render.html (but render.html only has information for 16 items).
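For reference, a stripped-down sketch of the kind of callback wired to the SplashRequest above; the CSS selectors and attribute names are made-up placeholders, not the real gap.com markup:

# Sketch only: the selectors below are hypothetical placeholders.
def parse_product_contents(self, response):
    products = response.css('div.product-card')
    self.logger.info('rendered product tiles: %d', len(products))
    for product in products:
        yield {
            'itemid': product.attrib.get('data-pid'),
            'name': product.css('.product-card__name::text').get(),
            'price': product.css('.product-price::text').get(),
        }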
I updated the script to the one below:
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 2.0
    splash:set_viewport_size(1980, 8020)
    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    -- splash:set_viewport_full()
    splash:wait(10)
    splash:runjs("jQuery('span.icon-x').click();")
    splash:wait(1)
    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    splash:wait(30)
    return {
        png = splash:png(),
        html = splash:html(),
        har = splash:har()
    }
end
I ran it in my local Splash; the PNG doesn't come out right, but the HTML has the last product.
The only issue was that the page won't scroll while the email-subscribe popup is present, so I added code to close it.
I have arbitrary JSON that is sensibly laid out like this:
[
  {
    "id": 100,
    "name": "Buckeye, AZ",
    "status": "OPEN",
    "address": {
      "street": "416 S Watson RD",
      "city": "Buckeye"
      ...
    }
  }
]
I've written a node.js script like this as a proof of concept (the reason I'm using node is that the JS API seems better supported than REST or Ruby for this; I could be wrong):
http = require('http')
Firebase = require('firebase')

all_sites_url = "http://supercharge.info/service/supercharge/allSites"
firebase_url = "https://tesla-supercharger.firebaseio.com/"

http.get(all_sites_url, (res) ->
  body = ""
  res.on "data", (chunk) ->
    body += chunk
    return
  res.on "end", ->
    response = JSON.parse(body)
    all_sites = response
    send_to_firebase(response)
    return
  return
).on "error", (e) ->
  console.log "Got error: ", e
  return

send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  for charger in response
    console.log charger
    new_child = firebase_ref.push()
    new_child.set {id: charger.id, data: charger}, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        console.log "Data saved successfully"
The result is a unique id generated by Firebase, with an id child and a data child beneath it. The data child has the expected information like name, status, etc.
What I'd prefer is to generate a key-value pair. E.g., for an id of 100:
- 100
  - name
  - address
    - street
    - city
etc. So my first question is how to accomplish this, or whether it is even sensible.
After the first time around, this data (call it the data from the external server) will be there, and a mobile app will have added some fields that are not present in the server's data. The next time I fetch data from the external server, I want to update the things that have changed that the server would know about, like status. I don't want to tamper with things that only the mobile devices would know about, like remote_observations.
I know I'm seeming a bit dense here, but I'm trying to put together a sensible data model that will be updatable from that server using a CRON job and incrementally updatable from a bunch of mobile devices.
Any help is much appreciated.
UPDATE: I have found that this works for getting the structure I want:
send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  for charger in response
    firebase_ref.child(charger.id).update charger, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        responses_pending += 1
        console.log "Data saved successfully : #{responses_pending} pending"

  firebase_ref.on 'value', ->
    console.log "value received rp=#{responses_pending}"
    process.exit() if (responses_pending -= 1) < 1
So the code I settled on is this:
http = require('http')
Firebase = require('firebase')

firebase_url = '/path/to/your/firebase'

# code to get JSON of the form:
# {
#   "id": 100,
#   "name": "Buckeye, AZ",
#   "status": "OPEN",
#   "address": {
#     "street": "416 S Watson RD",
#     "city": "Buckeye",
#     "state": "AZ",
#     "zip": "85326",
#     "country": "USA"
#   },
#   ... etc.
# }

# Asynchronous get of JSON hash from some server or other.
get_my_fine_JSON().on 'complete', (response) ->
  send_to_firebase(response)

send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  length = response.length
  for charger in response
    firebase_ref.child(charger.id).update charger, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        console.log "Data saved successfully"
        process.exit() if (length -= 1) is 0
Discussion:
The idea was to have a Firebase structure like this:
- 100
  - address
    - street: "123 Main Street"
etc.
That's reason 1 why id is pulled up to be the primary key. Reason 2 is so that I can uniquely identify an object pulled off the external server as the "same" one in my Firebase and apply any updates necessary.
Epiphany 1: Update is more like upsert. If the key is there, whatever hash you supply replaces matching values. If it's not there, then Firebase happily adds it. Which is way cool because it covers both the push and patch cases.
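(As an aside, and not something taken from the code above: the same upsert behavior can be seen through Firebase's REST API, where a PATCH writes only the keys named in the body and leaves siblings, such as fields the mobile apps added, untouched. A quick Python sketch, with the top-level child layout assumed to match the structure above:)

import json
import requests

firebase_url = 'https://tesla-supercharger.firebaseio.com'
charger = {'id': 100, 'name': 'Buckeye, AZ', 'status': 'OPEN'}

# PATCH updates only the named children under /100; anything else stored
# there (e.g. remote_observations) is left alone.
resp = requests.patch(
    '%s/%s.json' % (firebase_url, charger['id']),
    data=json.dumps(charger),
)
print(resp.status_code, resp.json())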
Epiphany 2: This process will hang waiting for events if nothing tells it to stop. That's why the countdown index, length, is decremented until the code has upserted (for lack of a better term) each item.
Observation 1: Doing this in node.js is super fast compared with REST using Python or Ruby. And this upsert stuff is wicked cool if I'm understanding it right.
Observation 2: There isn't a ton of wisdom out there as of this writing regarding writing node shell scripts to do this kind of stuff. Maybe it's a good idea, maybe a bad one. I don't know.
Observation 3: Because of the asynchronous nature of node and the Firebase Javascript API (both GOOD THINGs), terminating a process before the last bit is done can be tricky because your process has to hang on just long enough to complete its last request/response with Firebase. This is, as mentioned before, done in the completion handler of the update. Otherwise we wouldn't necessarily be complete when the process exited.
Caveat 1: Related to observation 2, this could be a bad idea, but I haven't been able to find resources that speak to the problem.
Caveat 2: This could be a horrid abuse or misunderstanding of the Firebase update API. I am reporting observed behavior in the limited case of my specific data. YMMV.
Caveat 3: I'm hoping the process lifetime is as I suggest it is in observation 3.
A note to the decaffeinated: The Javascript for this is so trivially different that it shouldn't be too tough to translate. Or go to js2coffee and paste the Coffeescript into the right pane to get real Javascript in the left pane that you can tune.
We are currently using watir-webdriver (0.6.2) with firefox to run acceptance tests.
Our tests take a long time to run, and often fail with timeout errors.
We wanted to decrease the timeout time, for them to fail faster.
We tried:
browser = Watir::Browser.new("firefox")
browser.driver.manage.timeouts.implicit_wait = 3
However, we are still experiencing 30s timeouts.
We couldn't find any documentation or question regarding this issue. Does anyone know how to configure Watir waiting timeouts properly?
It depends exactly what you mean by 'timeout'. AFAIK there are three different definitions of timeout commonly discussed when talking about Watir-Webdriver:
How long does the browser wait for a page to load?
How long does Watir-Webdriver explicitly wait before considering an element 'not present' or 'not visible' when told to wait via the '.when_present' function
How long does Watir-Webdriver implicitly wait for an object to appear before considering an element 'not present' or 'not visible' (when not waiting via an explicit call; see #2)
#1: Page load
Justin Ko is right that you can set the page load timeout as described, if that is what you want to modify, though it looks like the canonical way to do it is to set the client timeout before creating the browser and pass it in on creation:
client = Selenium::WebDriver::Remote::Http::Default.new
client.timeout = 180 # seconds – default is 60
b = Watir::Browser.new :firefox, :http_client => client
- Alistair Scott, 'How do I change the page load Timeouts in Watir-Webdriver'
#2: Explicit timeout
But I think @p0deje is right in saying you are experiencing explicit timeouts, though it's not possible to say for sure without seeing your code. In the example below I experienced the explicit declaration overriding the implicit one (I am unsure if that's intentional):
b = Watir::Browser.new :firefox
b.driver.manage.timeouts.implicit_wait = 3

puts Time.now #=> 2013-11-14 16:24:12 +0000
begin
  b.link(:id => 'someIdThatIsNotThere').when_present.click
rescue => e
  puts e #=> timed out after 30 seconds, waiting for {:id=>"someIdThatIsNotThere", :tag_name=>"a"} to become present
end
puts Time.now #=> 2013-11-14 16:24:43 +0000
Watir-Webdriver will wait 30 seconds before failing by default, thanks to 'when_present'. Alternatively you can say 'when_present(10)' to alter the default and wait 10 seconds (Watir-Webdriver > Watir::Wait#when_present). I cannot divine any way to do this globally; unless you find such a thing - and please tell me if you do - it must be done on each call. :(

Edit: Fellow answerer Justin Ko gave me the answer as to how to do what I described above.

Edit 2: @jarib added this to Watir, per @justinko in the linked answer: "Update: This monkey patch has been merged into watir-webdriver and so will no longer be needed in watir-webdriver v0.6.5. You will be able to set the timeout using: Watir.default_timeout = 90"
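(For comparison outside the Ruby world: the same explicit-wait idea exists in Selenium's Python bindings, where the 10 below plays the role of when_present(10). Illustrative only, not Watir code:)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
# Explicit wait: retry for up to 10 seconds, then raise TimeoutException.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'someIdThatIsNotThere'))
).click()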
#3 Implicit timeout
The code you provided sets the time Watir-Webdriver will implicitly wait for any element to become present, without you explicitly saying so:
b = Watir::Browser.new :firefox
b.driver.manage.timeouts.implicit_wait = 3

puts Time.now #=> 2013-11-14 16:28:33 +0000
begin
  b.link(:id => 'someIdThatIsNotThere').when_present.click
rescue => e
  puts e #=> unable to locate element, using {:id=>"someIdThatIsNotThere", :tag_name=>"a"}
end
puts Time.now #=> 2013-11-14 16:28:39 +0000
The implicit_wait is the amount of time selenium-webdriver tries to find an element before timing out. The default is 0 seconds. By changing it to "3", you are actually increasing the amount of time that it will wait.
I am guessing that you actually want to change the timeout for waiting for the page to load (rather than for finding an element). This can be done with:
browser.driver.manage.timeouts.page_load = 3
For example, we can say to only wait 0 seconds when loading Google:
require 'watir-webdriver'
browser = Watir::Browser.new :firefox
browser.driver.manage.timeouts.page_load = 0
browser.goto 'www.google.ca'
#=> Timed out waiting for page load. (Selenium::WebDriver::Error::TimeOutError)
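(Again for comparison, the matching knobs in Selenium's Python bindings; illustrative only, not Watir:)

from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(3)            # how long element lookup retries (seconds)
driver.set_page_load_timeout(0)      # give page loads no time at all
driver.get('http://www.google.ca')   # raises TimeoutException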
Update: Since Watir 6.5, the default timeout is configurable using
Watir.default_timeout = 3
We experienced the same issue and chose to override Watir methods involving timeouts, namely
Watir::Wait.until { ... }
Watir::Wait.while { ... }
object.when_present.set
object.wait_until_present
object.wait_while_present
Here is the code; you can put it in your spec_helper.rb if you are using RSpec:
# method wrapping technique adapted from https://stackoverflow.com/a/4471202/177665
def override_timeout(method_name, new_timeout = 3)
  if singleton_methods.include?(method_name)
    old_method = singleton_class.instance_method(method_name)
    define_singleton_method(method_name) do |timeout = new_timeout, *args, &block|
      old_method.bind(self).(timeout, *args, &block)
    end
  else
    old_method = instance_method(method_name)
    define_method(method_name) do |timeout = new_timeout, *args, &block|
      old_method.bind(self).(timeout, *args, &block)
    end
  end
end
# override default Watir timeout from 30 seconds to 3 seconds
module Watir
  module Wait
    override_timeout(:until)
    override_timeout(:while)
  end

  module EventuallyPresent
    override_timeout(:when_present, 5) # 5 secs here
    override_timeout(:wait_until_present)
    override_timeout(:wait_while_present)
  end
end
We used the answer from https://stackoverflow.com/a/4471202/177665 to get it working.