How to store output for every xargs instance separately - linux

cat domains.txt | xargs -P10 -I % ffuf -u %/FUZZ -w wordlist.txt -o output.json
Ffuf is used for directory and file bruteforcing, while domains.txt contains valid HTTP and HTTPS URLs like http://example.com, http://example2.com. I used xargs to speed up the process by running 10 parallel instances. The problem is that I am unable to store the output of each instance separately: output.json is overwritten by every running instance. Is there anything I can do to make output.json unique for every instance so that all the data gets saved separately? I tried ffuf/$(date '+%s').json instead, but it didn't work either.

Sure. Just name your output file using the domain. E.g.:
xargs -P10 -I % ffuf -u %/FUZZ -w wordlist.txt -o output-%.json < domains.txt
(I dropped cat because it was unnecessary.)
I missed the fact that your domains.txt file is actually a list of URLs rather than a list of domain names. I think the easiest fix is just to simplify domains.txt to be just domain names, but you could also try something like:
xargs -P10 -I % sh -c 'domain="%"; ffuf -u %/FUZZ -w wordlist.txt -o output-${domain##*/}.json' < domains.txt
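For reference, here is a quick illustration of what that ${domain##*/} expansion does, assuming the URLs in domains.txt have the form scheme://host with no trailing path:
domain="http://example.com"
echo "${domain##*/}"    # strips everything up to the last "/", prints: example.com
So the output file for that URL would be output-example.com.json.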

cat domains.txt | xargs -P10 -I % sh -c "ping % > output.json.%"
Like this, your "%" can become part of the file name. (I changed your command to ping for my testing.)
So maybe something more like this:
cat domains.txt | xargs -P10 -I % sh -c "ffuf -u %/FUZZ -w wordlist.txt -o output.json.%"
I would replace your ffuf command with the following script, and call this from the xargs command. It just strips the invalid file-name characters out of the URL, replaces them with a dot, and then runs the command:
#!/usr/bin/bash
URL=$1
# Replace the "://" in the URL with a dot so it is safe to use in a file name
FILE=$(echo "$URL" | sed 's/:\/\//\./g')
ffuf -u "${URL}/FUZZ" -w wordlist.txt -o "output-${FILE}.json"
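For example, if the script above were saved as run-ffuf.sh (a hypothetical name) and made executable, the xargs call might look like:
xargs -P10 -I % ./run-ffuf.sh % < domains.txt
Each parallel instance then writes to its own output-<sanitized-url>.json file.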

Related

how to use wget spider to identify broken urls from a list of urls and save broken ones

I am trying to write a shell script to identify broken URLs from a list of URLs.
Here is a sample of input_url.csv:
https://www.google.com/
https://www.nbc.com
https://www.google.com.hksjkhkh/
https://www.google.co.jp/
https://www.google.ca/
Here is what I have which works:
wget --spider -nd -nv -H --max-redirect 0 -o run.log -i input_url.csv
and this gives me '2019-09-03 19:48:37 URL: https://www.nbc.com 200 OK' for valid URLs, and '0 redirections exceeded.' for broken ones.
What I want is to save only the broken links into my output file.
Sample expected output:
https://www.google.com.hksjkhkh/
I think I would go with:
<input_url.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' --
You can use the -P <count> option of xargs to run <count> processes in parallel.
xargs runs the command sh -c '....' -- for each line of the input file, appending that line as an argument to the script.
Then sh inside runs wget ... "$1". The || checks if the return status is nonzero, which means failure. On wget failure, echo "$1" is executed.
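If you want the broken URLs written to a file rather than to the terminal, the whole pipeline can simply be redirected; broken_urls.txt below is just a name chosen for illustration:
<input_url.csv xargs -n1 -P10 sh -c 'wget --spider --quiet "$1" || echo "$1"' -- > broken_urls.txt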
You could also filter the output of wget -nd -nv and then regex the output, like this:
wget --spider -nd -nv -H --max-redirect 0 -i input 2>&1 | grep -v '200 OK' | grep 'unable' | sed 's/.* .//; s/.$//'
but this doesn't look very extensible, isn't parallel (so it's probably slower), and probably isn't worth the hassle.

ssh tail with nested ls and head cannot access

I am trying to execute the following command:
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
ls: cannot access /var/log/alert_ARCDB.log: No such file or directory
tail: cannot follow `-' by name
Notice the error returned. When I log in via ssh separately and then execute
tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )
I see the following:
# ls -t /var/log/alert_ARCDB.log | head -n1
/var/log/alert_ARCDB.log
Why is that happening, and how do I fix it? I am trying to do this in one line as I don't want to create a script file.
Thanks a lot.
Shell parameter expansion happens before command execution.
Here's a simple example. If I type...
ls "$HOME"
...the shell replaces $HOME with the path to my home directory first, then runs something like ls /home/larsks. The ls command has no idea that the command line originally had $HOME.
If we look at your command...
$ ssh root@10.10.10.50 "tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )"
...we see that you're in exactly the same situation. The $(ls -t ...) expression is expanded before ssh is executed. In other words, that command is running on your local system.
You can inhibit the shell expansion on your local system by using single quotes. For example, running:
echo '$HOME'
Will produce:
$HOME
So you can run:
ssh root@10.10.10.50 'tail -F -n 1 $(ls -t /var/log/alert_ARCDB.log | head -n1 )'
But there's another problem here. If /var/log/alert_ARCDB.log is a file, your command makes no sense: calling ls -t on a single file just gives you back that same file name, so the command substitution buys you nothing.
If alert_ARCDB.log is a directory, you have a different problem. The result of ls /some/directory is a list of filenames without any directory prefix. If I run something like:
ls -t /tmp
I will get output like
file1
file2
If I do this:
tail $(ls -t /tmp | head -1)
I end up with a command that looks like:
tail file1
And that will fail, because there is no file1 in my current directory.
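As a side note (this is not from the original answer), one way around that is to let a glob produce full paths instead of listing the directory; /some/directory below is a placeholder for the real log directory:
tail -F -n 1 "$(ls -t /some/directory/* | head -n1)"
Because the glob expands to full paths, tail receives a path it can actually open regardless of the current directory.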
One approach would be to pipe the commands you want to perform to ssh. One simple way to achieve that is to first create a function that will echo the commands you want executed:
remote_commands()
{
  echo 'cd /var/log/alert_ARCDB.log'
  echo 'tail -F -n 1 "$(ls -t | head -n1 )"'
}
The cd will allow you to use the relative path listed by ls. The single quotes make sure that everything will be sent as-is to the remote shell, with no local expansion occurring.
Then you can do
ssh root@10.10.10.50 bash < <(remote_commands)
This assumes alert_ARCDB.log is a directory (or else I am not sure why you would want to add head -n1 after that).
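An equivalent approach, if you prefer not to define a function, is to feed the remote shell a quoted here-document; the quoted 'EOF' delimiter plays the same role as the single quotes above, preventing any local expansion (a sketch under the same assumption that alert_ARCDB.log is a directory):
ssh root@10.10.10.50 bash <<'EOF'
cd /var/log/alert_ARCDB.log
tail -F -n 1 "$(ls -t | head -n1)"
EOF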

Bash grep command finding the same file 5 times

I'm building a little bash script to run another bash script that's found in multiple directories. Here's the code:
cd /home/mainuser/CaseStudies/
grep -R -o --include="Auto.sh" [\w] | wc -l
When I execute just that part, it finds the same file 5 times in each folder. So instead of getting 49 results, I get 245. I've written a recursive bash script before and I used it as a template for this problem:
grep -R -o --include=*.class [\w] | wc -l
This code has always worked perfectly, without any duplication. I've tried running the first command with and without the quotes, and I've tried -r as well. I've read through the bash documentation and I can't work out why I'm getting this duplication, or how to prevent it. Any thoughts on how to get around this?
As a separate but related question: is there a way to launch Auto.sh inside each directory, so that the output of Auto.sh is dumped into that directory, without having to place Auto.sh in each folder? That would probably be much more efficient than what I'm currently doing, and it would also probably fix my current duplication problem.
This is the code for Auto.sh:
#!/bin/bash
index=1
cd /home/mainuser/CaseStudies/
grep -R -o --include=*.class [\w] | wc -l
grep -R -o --include=*.class [\w] |awk '{print $3}' > out.txt
while read LINE; do
echo 'Path '$LINE > 'Outputs/ClassOut'$index'.txt'
javap -c $LINE >> 'Outputs/ClassOut'$index'.txt'
index=$((index+1))
done <out.txt
Preferably, I would like it to dump only the javap output for the application it's currently looking at. Since those .class files could be in any number of sub-directories, I'm not sure how to make them all dump into the top folder without executing a modified Auto.sh in the top directory of each application.
OK, so to fix the duplicate matches:
grep -R -o --include="Auto.sh" [\w] | wc -l
Should be:
grep -R -l --include=Auto.sh '\w' | wc -l
The reason this was happening was that, with -o, grep printed every instance of the letter w in Auto.sh, and that letter occurred 5 times in the file; with -l, each matching file is listed only once.
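To illustrate the difference (the paths are hypothetical):
grep -R -o --include=Auto.sh '\w' .    # one line per individual match, so the same file repeats
grep -R -l --include=Auto.sh '\w' .    # one line per matching file, e.g. ./project1/Auto.sh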
However, the overall fix, which doesn't require placing Auto.sh in every directory, is something like this:
MAIN_DIR=/home/mainuser/CaseStudies/
cd "$MAIN_DIR"
ls -d */ > DirectoryList.txt
while read LINE; do
  cd "$LINE"
  mkdir ProjectOutputs
  bash /home/mainuser/Auto.sh
  cd "$MAIN_DIR"
done <DirectoryList.txt
That calls this Auto.sh code:
index=1
grep -R -o --include=*.class '\w' | wc -l
grep -R -o --include=*.class '\w' | awk '{print $3}' > ProjectOutputs.txt
while read LINE; do
echo 'Path '$LINE > 'ProjectOutputs/ClassOut'$index'.txt'
javap -c $LINE >> 'ProjectOutputs/ClassOut'$index'.txt'
index=$((index+1))
done <ProjectOutputs.txt
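As a side note, the outer directory loop above can also be written without the DirectoryList.txt temporary file by iterating over the glob directly and using a subshell, so there is no need to cd back each time (a sketch using the same paths as above):
#!/bin/bash
MAIN_DIR=/home/mainuser/CaseStudies/
cd "$MAIN_DIR" || exit 1
for dir in */; do
  (
    cd "$dir" || exit 1
    mkdir -p ProjectOutputs
    bash /home/mainuser/Auto.sh
  )
done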
Thanks again for everyone's help!

wget and bash error: bash: line 0: fg: no job control

I am trying to run a series of commands in parallel through xargs. I created a null-separated list of commands in a file cmd_list.txt and then attempted to run them in parallel with 6 threads as follows:
cat cmd_list.txt | xargs -0 -P 6 -I % bash -c %
However, I get the following error:
bash: line 0: fg: no job control
I've narrowed down the problem to be related to the length of the individual commands in the command list. Here's an example artificially-long command to download an image:
mkdir a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8
wget --no-check-certificate --no-verbose -O a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg
Just running the wget command on its own, without the file list and without xargs, works fine. However, running this command at the bash command prompt (again, without the file list) fails with the no job control error:
echo "wget --no-check-certificate --no-verbose -O a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg" | xargs -I % bash -c %
If I leave out the long folder name and therefore shorten the command, it works fine:
echo "wget --no-check-certificate --no-verbose -O /tmp/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg" | xargs -I % bash -c %
xargs has a -s (size) parameter that can change the maximum command-line length, but I tried increasing it to preposterous sizes (e.g., 16000) without any effect. I thought that the problem might be related to the length of the string passed to bash -c, but the following command also works without trouble:
bash -c "wget --no-check-certificate --no-verbose -O a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg"
I understand that there are other options to run commands in parallel, such as the parallel command (https://stackoverflow.com/a/6497852/1410871), but I'm still very interested in fixing my setup or at least figuring out where it's going wrong.
I'm on Mac OS X 10.10.1 (Yosemite).
It looks like the solution is to avoid the -I parameter for xargs which, per the OS X xargs man page, has a 255-byte limit on the replacement string. Instead, the -J parameter is available, which does not have a 255-byte limit.
So my command would look like:
echo "wget --no-check-certificate --no-verbose -O a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg" | xargs -J % bash -c %
However, in the above command, only the portion of the replacement string before the first whitespace is passed to bash, so bash tries to execute:
wget
which obviously results in an error. My solution is to ensure that xargs interprets the commands as null-delimited instead of whitespace-delimited using the -0 parameter, like so:
echo "wget --no-check-certificate --no-verbose -O a-very-long-folder-de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8de090952623b4865c2c34bd6330f8a423ed05ed8/blah.jpg http://d4u3lqifjlxra.cloudfront.net/uploads/example/file/48/accordion.jpg" | xargs -0 -J % bash -c %
and finally, this works!
Thank you to @CharlesDuffy who provided most of this insight. And no thank you to my OS X version of xargs for its poor handling of replacement strings that exceed the 255-byte limit.
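For completeness, a null-separated command list like the one described in the question can be built with printf, which emits one NUL-terminated record per argument (a sketch; the commands shown are placeholders):
printf '%s\0' \
  'mkdir some-long-folder-name' \
  'wget --no-verbose -O some-long-folder-name/blah.jpg http://example.com/accordion.jpg' \
  > cmd_list.txt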
I suspect it's the percent symbol, and your top shell complaining.
cat cmd_list.txt | xargs -0 -P 6 -I % bash -c %
Percent is a metacharacter for job control, e.g. "fg %2" or "kill %4".
Try escaping the percents with a backslash to signal to the top shell that it should not try to interpret the percent, and xargs should be handed a literal percent character.
cat cmd_list.txt | xargs -0 -P 6 -I \% bash -c \%

Bash - Command call ported to variable with another variable inside

I believe this is a simple syntax issue on my part, but I have been unable to find another example similar to what I'm trying to do. I have a variable taking in a specific disk location, and I need to use that location in an hdparm/grep command to pull out the max LBA.
targetDrive=$1 #/dev/sdb
maxLBA=$(hdparm -I /dev/sdb |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*') #this works perfect
maxLBA=$(hdparm -I $1 |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*') #this fails
I have also tried
maxLBA=$(hdparm -I 1 |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*')
maxLBA=$(hdparm -I "$1" |grep LBA48 |grep -P -o '(?<=:\s)[^\s]*')
Thanks for the help
I think this is the solution to your problem. I did basically the same as you, but changed the way I pipe the results into one another:
grep with regular expression to find the line containing LBA48
cut to retrieve the second field when the resulting string is split on the colon ":"
tr to remove the spaces from the result
Here is my resulting bash script.
#!/bin/bash
target_drive=$1
max_lba=$(sudo hdparm -I "$target_drive" | grep -P -o ".+LBA48.+:.+(\d+)" | cut -d: -f2 | tr -d ' ')
echo "Drive: $target_drive MAX LBA48: $max_lba"
