LinkDb: adding segment & SolrIndexer takes lots of time - nutch

Below is the command that I am running to index pages.
bin/nutch crawl bin/urls -solr http://localhost:8983/solr/ -dir crawl -depth 2 -topN 15
The fetching happens pretty quickly, but the LinkDb: adding segments and SolrIndexer steps take a lot of time, and the time increases each time I run the above command. My requirement is to index pages as fast as possible, because links disappear pretty quickly (within 2 minutes). What should I do to bring this time down to a very small figure?
If I only wanted to index the URL and title of each page, would that improve the indexing speed?
Thanks

If you have a static seed list then you can delete the "crawl" folder each time you want to run Nutch; it would save you a lot of time!
Every time you run Nutch your segments grow, so LinkDb is going to take more and more time.
You could also create a thread and hand this part of the job off to it, but then you have to handle the segmenting yourself.
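For example, assuming your seed list in bin/urls is static and you don't need to keep data from previous runs, a minimal sketch of a fresh-crawl run would be:
rm -rf crawl
bin/nutch crawl bin/urls -solr http://localhost:8983/solr/ -dir crawl -depth 2 -topN 15
Starting each run from an empty crawl directory keeps the segments, and therefore the LinkDb and SolrIndexer steps, from growing over time.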

Related

How to increase gif generation speed

I have a web page with an IMG tag and a link to my-server.com/animationID.gif.
When someone opens a web page, my server generates a new GIF animation which appears on the web page.
I'm using the gifencoder node package to generate dynamic 60-frame animations.
The animation updates every second, so I don't really see a good way to cache it...
It takes 1-3 seconds to generate the animation which is very slow.
A few years ago I used services like countdownmail and mailtimers, which generate 60-frame countdown timers. Somehow, they manage to generate them very fast, in less than 0.5-1 second.
After some debugging, it seems that the addFrame method takes most of the time (and it's called 60 times).
encoder.addFrame(ctx);
Is there a way to increase the generation speed or cache the animation?

Baseline Scan ZAP (OWASP) on a defined list of urls

Is it possible to define a list of URLs that the ZAP baseline (https://www.zaproxy.org/docs/docker/baseline-scan/) scan should scan? The default behaviour is that it runs for one minute. I only want 20 defined URLs to be scanned.
At the moment I use the docker container with the following parameters:
docker run -t owasp/zap2docker-stable zap-baseline.py -t https://www.example.com
It will run for up to one minute (by default). If your app has only 20 URLs then it will hopefully find them much faster than that. If it takes 2 seconds to find them, then that's how long it will take. The passive scanning will take a bit longer, but hopefully not too long.
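If you want to see or adjust that time limit explicitly, zap-baseline.py has a -m option for the number of minutes to spider; the value below is just an illustration, so check zap-baseline.py -h for the exact options your ZAP version supports:
docker run -t owasp/zap2docker-stable zap-baseline.py -t https://www.example.com -m 1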

FabricJS ApplyFilter is slow on large images

Taking a very basic stock example such as the redify filter, with a large image (1200x1024), I was trying to determine why it takes what I think is too long. After some investigating, I find that the delay occurs in fabricjs::ApplyFilter, where replacement.src = canvasEl.toDataURL('image/png'); (line 17933 in 1.6.2). That takes a long time, even compared to the complete pixel run-through done by the filter.
Is there some way around this? Can I do something differently to speed up the process? TIA

Watir Measure Page Performance

I've found this gem: http://watirwebdriver.com/page-performance/
But I can't seem to understand what this measures:
browser.performance.summary[:response_time]/1000
Does it start measuring from the second I open the browser?
Watir::Browser.new :chrome
or from the last Watir-webdriver command written?
And how can I set when it starts the timer?
I've tried several scripts but I keep getting 0 seconds; that's why I'm not sure.
From what I have read (I have not actually used it on a project), the response_time is the time from starting navigation to the end of the page loading - see Tim's (the gem's author) answer in a previous question. The graphical image on Tim's blog helps to understand the different values - http://90kts.com/2011/04/19/watir-webdriver-performance-gem-released/.
The gem is for getting performance results of a single response, rather than the overall usage of a browser during a script. So there is no need to start/stop the timer.
If you are getting 0 seconds, it likely means that the response_time is less than 1000 milliseconds (i.e. in Ruby, the integer division 999/1000 gives 0). To make sure you are getting something non-zero, try doing:
browser.performance.summary[:response_time]/1000.0
Dividing by 1000.0 will ensure that you get the decimal values (eg 0.013).

MODX getChunk alternative

I'm looking for a MODX getChunk() alternative, mostly because it seems to be really slow when it is called many times.
When I use it once in a snippet I can hardly notice its speed, but when it's used in a loop every second matters.
I'm outputting ~1300 images, 100 per page, as part of a gallery, and it takes:
6-7 seconds when the output is placed in a chunk $output .= $modx->getChunk('chunkname');
2-3 seconds when the output is plain HTML
Does anyone know faster alternative to output the result of image query using chunk?
What does your chunk look like?
You might consider abandoning the getChunk() call and just inlining your HTML:
$output = '';
foreach ($images as $img) {
    $output .= '<li><img src="'.$img['path'].'" alt="'.$img['name'].'" /></li>';
}
return $output;
Yeah yeah, it's bad practice, but when faced with the alternative taking more than twice as long it's not a bad optimisation.
There's another solution at more of an architectural level - 1300 images is a huge number to load on one page!
Depending on your design, why not load the first 20-30 and implement some kind of infinite scrolling, loading in the rest by Ajax (in batches of 20 or so) when the user begins to scroll.
That's going to take the load off your server, save bandwidth, provide a faster user experience, and get around the slow getChunk() call.
