Baseline Scan ZAP (OWASP) on a defined list of URLs - security

Is it possible to define a list of URLs that the ZAP baseline (https://www.zaproxy.org/docs/docker/baseline-scan/) scan should scan? The default behaviour is that it runs for one minute. I only want 20 defined URLs to be scanned.
At the moment I use the Docker container with the following parameters:
docker run -t owasp/zap2docker-stable zap-baseline.py -t https://www.example.com

It will run for up to one minute by default. If your app has only 20 URLs, it will hopefully find them much faster than that; if it takes 2 seconds to find them, then that's how long that part will take. The passive scanning will take a bit longer, but hopefully not too long.
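If you want explicit control over how long the spider phase runs, zap-baseline.py also accepts a -m flag for the number of minutes to spider (default 1). A minimal sketch, reusing the command from the question:
# -m caps the spider time in minutes; for ~20 URLs the default of 1 is usually plenty:
docker run -t owasp/zap2docker-stable zap-baseline.py -t https://www.example.com -m 1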

Related

gsutil copy with multithread doesn't finish copying all files

We have around 650 GB of data on google compute engine.
We need to move them to a Coldline bucket in Cloud Storage, so the best option we could find is to copy them with gsutil in parallel mode.
The files range from a few kilobytes to 10 MB max, and there are a few million of them.
The command we used is
gsutil -m cp -r userFiles/ gs://removed-websites/
On first run it copied around 200Gb and stopped with error
| [972.2k/972.2k files][207.9 GiB/207.9 GiB] 100% Done 29.4 MiB/s ETA 00:00:00
Operation completed over 972.2k objects/207.9 GiB.
CommandException: 1 file/object could not be transferred.
On second run it finished almost at the same place, and stopped again.
How can we copy these files successfully ?
Also the buckets that have the partial data are not being removed after deleting them. Console just says preparing to delete, and nothing happens, we waited more than 4 hours, any way to remove those buckets ?
Answering your first question, I can propose several options, all of them based on splitting the data and uploading it in small portions.
You can try distributed upload from several machines.
https://cloud.google.com/storage/docs/gsutil/commands/cp#copying-tofrom-subdirectories-distributing-transfers-across-machines
In this case you split the data into safe chunks, like 50 GB, and upload them from several machines in parallel. But it requires extra machines, which isn't actually necessary.
You can still do such a split upload on a single machine, but then you need some splitting mechanism that uploads the files chunk by chunk rather than all at once. In that case, if something fails, you only need to re-upload that chunk. In addition, you get better accuracy and can localize the point of failure if something goes wrong.
Regarding how you can delete them: same technique as for the upload. Divide the data into chunks and delete them chunk by chunk, as sketched below. Or you can try to remove the whole project, if that suits your situation.
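A rough sketch of chunked deletion (the object prefixes here are hypothetical), followed by removing the then-empty bucket:
# Remove the copied objects one prefix at a time, so each step is small and retryable:
gsutil -m rm -r gs://removed-websites/userFiles/00*
# ... repeat for the remaining prefixes, then delete the now-empty bucket:
gsutil rb gs://removed-websites/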
Update 1
So, I checked the gsutil interface and it supports glob syntax. You can match, say, 200 folders with a glob and launch the command 150 times (this will upload 200 x 150 = 30,000 folders).
You can use this approach and combine it with the -m option, so it is partly what your script did, but it might work faster. This works for folder names and file names as well.
If you provide examples of the folder names and file names, it would be easier to propose an appropriate glob pattern.
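A sketch of what such a chunked, glob-based upload could look like, assuming userFiles/ contains numbered top-level folders (the layout here is hypothetical):
# Each invocation copies one chunk of folders, so a failure only forces a re-run of that chunk:
gsutil -m cp -r userFiles/00* gs://removed-websites/
gsutil -m cp -r userFiles/01* gs://removed-websites/
# ... repeat for the remaining prefixes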
It could be that you are affected by gsutil issue 464. This happens when you run multiple gsutil instances concurrently with the -m option. Apparently these instances share a state directory, which causes weird behavior.
One of the workarounds is to add parameters: -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24.
E.g.:
gsutil -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24 -m cp -r gs://my-bucket .
I've just run into the same issue, and it turns out it's caused by the cp command running into an uncopyable file (in my case, a broken symlink) and aborting.
The problem is, if you're running a massively parallel copy with -m, the broken file may not be immediately obvious. To figure out which one it is, try a dry-run rsync -n instead:
gsutil -m rsync -n -r userFiles/ gs://removed-websites/
This will clearly flag the broken file and abort, and you can fix or delete it and try again. Alternatively, if you're not interested in symlinks, just use the -e option and they'll be ignored entirely.
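For example, assuming the symlinks can safely be skipped, the -e flag excludes them from the copy:
# -e tells gsutil cp to skip symbolic links entirely:
gsutil -m cp -e -r userFiles/ gs://removed-websites/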

LinkDb: adding segment & SolrIndexer takes lots of time

Below is the command that I am running to index pages.
bin/nutch crawl bin/urls -solr http://localhost:8983/solr/ -dir crawl -depth 2 -topN 15
The fetching happens pretty quickly, but the LinkDb: adding segments and SolrIndexer steps take a lot of time, and the time increases as I run the above command repeatedly. My requirement is to index pages as fast as possible because links disappear pretty quickly (within 2 minutes). I want to decrease this time to a very small figure; what should I do to make this possible?
If I only wanted to index the URL and title of each page, would that improve indexing speed?
Thanks
If you have a static seed list, you can delete the "crawl" folder each time you want to run Nutch; it will save you a lot of time.
Every time you run Nutch your segments grow, so LinkDb takes more and more time.
You can also run this part of the job in a separate thread, but then you have to handle the segmenting yourself.
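A minimal sketch of the first suggestion, assuming the previous segments really can be thrown away before each run:
# Wipe the crawl directory so LinkDb and SolrIndexer only see the fresh segments,
# then re-run the crawl command from the question:
rm -rf crawl
bin/nutch crawl bin/urls -solr http://localhost:8983/solr/ -dir crawl -depth 2 -topN 15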

Benchmarking Node.JS server

I've written a Node.JS server which I would like to benchmark. It has the following components that I would like to benchmark separately:
- socket.io: how many continuous connections can it accept and process (where is the saturation point)
- redis: the same as above
- express: don't want to benchmark it
I know there is some (though not a lot of) documentation about this on the internet, but I don't want to reinvent the wheel, and I don't want to spend countless hours trying a solution that turns out to be wrong for the job.
This is why I'm asking here: what should I use to get a number/graph (whatever) of how many simultaneous connections the server can handle without being bogged down? It would also be nice to monitor the CPU, memory and swap of the process (yes, I know I can use countless techniques or write my own script, but maybe something like that already exists).
I'm not looking for an answer where you just paste a link to some solution I already know exists; I would like an answer from someone with actual experience who can make a point or two and point me in the right direction.
Thank you
You can use ApacheBench (ab) to test the load that your server can take - see the man page.
Some nice tutorials:
nixcraft/howto-performance-benchmarks-a-web-server
petefreitag/Using Apache Bench for Simple Load Testing
Usage :
$ ab -k -n 1000 -c 100 www.yourserver.com
-k - keep alive
-n N - will send N requests to the server
-c X - will send X requests concurrently
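To watch CPU and memory of the Node process while ab is running, a minimal sketch (this assumes sysstat's pidstat is available and the server process is literally named "node"):
# Report CPU (-u) and memory (-r) for every process named "node", every 5 seconds:
pidstat -ru -p "$(pgrep -d, -x node)" 5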

Website Benchmarking using ab

I am trying my hand at various benchmarking tools for the website I am working on and have found Apache Bench (ab) to be an excellent tool for load testing. It is a command line tool and is very easy to use, apparently. However I have a doubt about two of its basic flags. The site I was reading says:
Suppose we want to see how fast Yahoo can handle 100 requests, with a maximum of 10 requests running concurrently:
ab -n 100 -c 10 http://www.yahoo.com/
and the explanation for the flags states:
Usage: ab [options] [http[s]://]hostname[:port]/path
Options are:
-n requests Number of requests to perform
-c concurrency Number of multiple requests to make
I guess I am just not able to wrap my head around number of requests to perform and number of multiple requests to make. What happens when I give them both together like in the example above?
Can anyone give me a simpler explanation of what these two flags do together?
In your example ab will create 10 connections to yahoo.com and request a page using each of them simultaneously.
If you omit -c 10, ab will create only one connection and start the next request only when the first completes (when the whole main page has been downloaded).
If we pretend that the server's response time does not depend on the number of requests it is handling simultaneously, your example will complete 10 times faster than without -c 10.
Also: What is concurrent request (-c) in Apache Benchmark?
-n 100 -c 10 means "issue 100 requests, 10 at a time."
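As a quick illustration of how the two flags interact (timings are hypothetical and assume a constant per-request response time):
# one request at a time: roughly 100 x response time in total
ab -n 100 -c 1 http://www.yahoo.com/
# 10 requests in flight at once: roughly 10 x response time in total
ab -n 100 -c 10 http://www.yahoo.com/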

Linux display average CPU load for last week

On a Linux box, I need to display the average CPU utilisation per hour for the last week. Is that information logged somewhere? Or do I need to write a script that wakes up every 15 minutes to copy /proc/loadavg to a logfile?
EDIT: I'm not allowed to use any tools other than those that come with Linux.
You might want to check out sar (see its man page); it fits your use case nicely.
System Activity Reporter (SAR) - capture important system performance metrics at periodic intervals.
Example from IBM Developer Works Article:
Add an entry to your root crontab
# Collect measurements at 10-minute intervals
0,10,20,30,40,50 * * * * /usr/lib/sa/sa1
# Create daily reports and purge old files
0 0 * * * /usr/lib/sa/sa2 -A
Then you can simply query this information using a sar command (display all of today's info):
root ~ # sar -A
Or just for a certain days log file:
root ~ # sar -f /var/log/sa/sa16
You can usually find it in the sysstat package for your Linux distro.
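Once collection is running, you can pull roughly hourly CPU averages out of a given day's file; a sketch, assuming sysstat's default log location /var/log/sa/saDD:
# CPU utilization (-u) from the 16th's log, with records selected at roughly hourly intervals (-i 3600):
sar -u -i 3600 -f /var/log/sa/sa16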
As far as I know it's not stored anywhere... It's a trivial thing to write, anyway. Just add something like
cat /proc/loadavg >> /var/log/loads
to your crontab.
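A minimal crontab sketch (the log path is just an example; note that % has to be escaped in crontab entries):
# Append a timestamped load-average sample every 15 minutes:
*/15 * * * * echo "$(date '+\%F \%T') $(cat /proc/loadavg)" >> /var/log/loads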
Note that there are monitoring tools (like Munin) which can do this kind of thing for you, and generate pretty graphs of it to boot... they might be overkill for your situation though.
I would recommend looking at Multi Router Traffic Grapher (MRTG).
Using snmpd to read the load average, it will automatically calculate averages at any time interval and length, along with nice charts for analysis.
Someone has already posted a CPU usage example.
