How to script bulk nslookups - Linux

I have a list of several million domain names and I want to see if they are available or not.
I tried pywhois first but am getting rate limited. As I don't need an authoritative answer, I thought I would just use nslookup. I am having trouble scripting this though.
Basically, what I want to do is: if the domain is registered, echo it. What I'm getting is grep: find”: No such file or directory. I think it's something easy and I've just been looking at this for too long...
#!/bin/bash
START_TIME=$SECONDS
for DOMAIN in `cat ./domains.txt`;
do
if ! nslookup $DOMAIN | grep -v “can’t find”; then
echo $DOMAIN
fi
done
echo ELAPSED_TIME=$(($SECONDS - $START_TIME))

If you have millions to check, you may like to use GNU Parallel to get the job done faster. Like this, if you want to run, say, 32 lookups at a time in parallel:
parallel -j 32 nslookup < domains.txt | grep "^Name"
If you want to fiddle with the output of nslookup, the easiest way is probably to declare a little function called lkup(), tell GNU Parallel about it, and then use that, like this:
#!/bin/bash
lkup() {
    # Echo the domain only when nslookup's output does NOT contain the
    # "can't find" error, i.e. the name resolved and is presumably registered.
    # grep -q sets the exit status without printing anything.
    if ! nslookup "$1" | grep -q "can't find"; then
        echo "$1"
    fi
}
# Make lkup() function visible to GNU parallel
export -f lkup
# Check the domains in parallel
parallel -j 32 lkup < domains.txt
If the order of the lookups is important to you, you can add the -k flag to parallel to keep the order.
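For example, to keep the output in the same order as domains.txt:
parallel -k -j 32 lkup < domains.txt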

The error is because you have curly quotes in your script, which are not the proper way to quote command line elements. As a result, they're being treated as part of a filename. Change them to straight quotes. While you're at it, use grep -q, which tests for the error string via its exit status instead of printing lines; with -v the test would succeed for every domain, since nslookup always prints some lines without the error:
if ! nslookup "$DOMAIN" | grep -q "can't find"; then
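Putting the pieces together, the corrected script might look like this (a sketch: while read replaces the for/cat loop, which would word-split on whitespace):
#!/bin/bash
START_TIME=$SECONDS
while IFS= read -r DOMAIN; do
    # succeed when the "can't find" error is absent, i.e. the name resolved
    if ! nslookup "$DOMAIN" | grep -q "can't find"; then
        echo "$DOMAIN"
    fi
done < ./domains.txt
echo ELAPSED_TIME=$(($SECONDS - START_TIME))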

Related

Using Bash to cURL a website and grep for keywords

I'm trying to write a script that will do a few things in the following order:
cURL websites from a list of URLs contained within a "url_list.txt" (newline-delimited) file.
For each website in the list, I want to grep that website's content for keywords contained within a "keywords.txt" (newline-delimited) file.
I want to finish by printing to the terminal in the following format (or something similar):
$URL (that contained match) : $keyword (that made the match)
It needs to be able to run in Ubuntu (GNU grep, etc.)
It does not need to be cURL and grep, as long as the functionality is there.
So far I've got:
#!/bin/bash
keywords=$(cat ./keywords.txt)
urllist=$(cat ./url_list.txt)
for url in $urllist; do
content="$(curl -L -s "$url" | grep -iF "$keywords" /dev/null)"
echo "$content"
done
But for some reason, no matter what I try to tweak or change, it keeps failing to one degree or another.
How can I go about accomplishing this task?
Thanks
Here's how I would do it:
#!/bin/bash
keywords="$(<./keywords.txt)"
while IFS= read -r url; do
    curl -L -s "$url" | grep -ioF "$keywords" |
        while IFS= read -r keyword; do
            echo "$url: $keyword"
        done
done < ./url_list.txt
What did I change:
I used $(<./keywords.txt) to read the keywords.txt. This does not rely on an external program (cat in your original script).
I changed the for loop that loops over the url list into a while loop. This guarantees that we use Θ(1) memory (i.e. we don't have to load the entire url list into memory).
I removed /dev/null from grep. grepping /dev/null alone is meaningless, since grep will find nothing there. Instead, I invoke grep with no file arguments so that it filters its stdin (which happens to be the output of curl in this case).
I added the -o flag for grep so that it outputs only the matched keyword.
I removed the subshell where you were capturing the output of curl. Instead, I run the command directly and feed its output to a while loop. This is necessary because we might get more than one keyword match per url.
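One detail that makes this work: grep -F treats each line of its pattern argument as a separate fixed-string pattern, so passing the whole keywords file in one variable searches for every keyword at once. A quick demonstration (hypothetical file contents, for illustration only):
$ printf 'ERROR\nlogin\n' > keywords.txt
$ keywords="$(<keywords.txt)"
$ printf 'Login page\nnothing here\nerror 404\n' | grep -ioF "$keywords"
Login
error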

Mail output with Bash Script

SSH from Host A to a few hosts (only one listed below right now) using the SSH key I generated, then go to a specific file and grep for a specific word with a date of yesterday. Then I want to email this output to myself.
It is sending an email but it is giving me the command as opposed to the output from the command.
#!/bin/bash
HOST="XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXX"
DATE=$(date -d "yesterday")
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
COUNT=$(echo "$INVALID" | wc -c)
for x in $HOSTS
do
ssh BLA@"$x" $COUNT
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
This isn't properly an attempt to answer your question, but I think you should be aware of some fundamental problems with your code.
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
This assigns a simple string to the variable INVALID. Because of quoting issues, s/$/.\n/g is not quoted at all, and will probably be mangled by the shell. (You cannot nest single quotes -- the first single-quoted string extends from the first quote to the next one, and then WORD is outside of any quotes, followed by the next single-quoted string, etc.)
If your intent is to execute this as a command at this point, you are looking for a command substitution; with the multiple layers of uselessness peeled off, perhaps something like
INVALID=$(sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx)
which looks for a line matching WORD and $DATE and prints the match with a dot appended at the end -- I believe that's what your code boils down to, but without further insights into what this code is supposed to do, it's impossible to tell if this is what you actually need.
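A quick sanity check of that sed program, with hypothetical file contents and date:
$ printf 'WORD seen Mon Jan 01\nWORD seen another day\nno match here\n' > /tmp/sample
$ DATE="Mon Jan 01"
$ sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /tmp/sample
WORD seen Mon Jan 01.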
COUNT=$(echo "$INVALID" | wc -c)
This assigns a number to $COUNT. With your static definition of INVALID, the number will always be 62; but I guess that's not actually what you want here.
for x in $HOSTS
do
ssh BLA@"$x" $COUNT
This attempts to execute that number as a command on a number of remote hosts (except the loop is over HOSTS and the variable containing the hosts is named just HOST). This cannot possibly be useful, unless you have a battery of commands named as natural numbers which do something useful on these remote hosts; but I think it's safe to assume that that is not what is supposed to be going on here (and if it was, it would absolutely be necessary to explain this in your question).
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
So EMAILTEXT is either an empty string or the value of INVALID. You assigned it to be a static string above, which is probably the source of your immediate question. But even if it was somehow assigned to a command on the local host, why do you need to visit remote hosts and execute something there? Or is your intent actually to execute the command on each remote host and obtain the output?
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
Piping into echo makes no sense at all, because it does not read its standard input. You should probably just have a newline after done; though a possibly more useful arrangement would be to have your loop produce output which we then pipe to mail.
Purely speculatively, perhaps something like the following is what you actually want.
for host in $HOSTS; do
ssh BLA@"$host" sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx |
grep . || echo INVALID
done | mail XXXXXXXXXXX.com
If you want to check that there is strictly more than one line of output (which is what the -gt 1 suggests) then this may need to be a little bit more complicated.
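For instance (still speculative, reusing the placeholders above), you could capture each host's output and count its lines before deciding whether to report it:
for host in $HOSTS; do
    out=$(ssh BLA@"$host" sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx)
    # report only hosts that produced strictly more than one matching line
    if [ "$(printf '%s\n' "$out" | wc -l)" -gt 1 ]; then
        printf '%s:\n%s\n' "$host" "$out"
    fi
done | mail XXXXXXXXXXX.com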
Your command substitution is not working. You should read up on how it works but here are the problem lines:
COUNT=$(echo "$INVALID" | wc -c)
[...]
ssh BLA@"$x" $COUNT
should be:
COUNT_CMD="'${INVALID} | wc -c'"
[...]
COUNT=$(ssh BLA@"$x" $COUNT_CMD)
This inserts the value of $INVALID into the string, and puts the whole thing in single quotes. The single quotes are necessary for the ssh call so the pipes aren't evaluated in the script but on the remote host. (COUNT is changed to COUNT_CMD for readability/clarity.)
EDIT:
I misread the question and have corrected my answer.

Is it possible to pipe the output of a command from a server to a local machine?

I have a series of functionally identical servers provided by my school that run various OS and hardware configurations. For the most part, I can use 5 of these interchangeably. Unfortunately, other students tend to bunch up on some machines and it's a pain to find one that isn't bogged down.
What I want to do is ssh into a machine, run the command:
w | wc -l
to get a rough estimate of the load on that server, and use that information to select the least impacted one. A sort of client-side load balancer.
Is there a way to do this or achieve the same result?
I'd put this in your .bashrc file:
function choose_host(){
    hosts="host1 ... hostn"
    for host in $hosts
    do
        # print "<load> <host>" for every candidate
        echo "$(ssh "$host" 'w | wc -l') $host"
    done | sort -n | head -1 | awk '{print $2}'
}
function ssh_host(){
    ssh "$(choose_host)"
}
choose_host should give you the one you're looking for. This is absolutely overkill but I was feeling playful :D
sort -n orders the output numerically by the result of w | wc -l (a plain sort would compare the counts as strings, putting 10 before 9), then head -1 takes the first line and awk prints just the hostname!
You can call ssh_host and it should log you in automatically.
You can use the pdsh command from your desktop, which runs the specified command on the set of machines you specify and returns the results. This way you can find out which one is least loaded, without sshing into every single machine to run w | wc -l yourself.
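A minimal sketch, with placeholder host names (-w takes a comma-separated list of targets):
# run the load check everywhere at once; pdsh prefixes every output line with its host name
pdsh -w host1,host2,host3 'w | wc -l'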
Yes. See e.g.:
ssh me@host "ls /etc | sort" | wc -l
The part inside "" is done remotely. The part afterwards is local.
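Applied to the question, the quoting decides which side does the counting (me and host are placeholders):
ssh me@host 'w | wc -l'    # both w and wc run remotely; one number comes back
ssh me@host w | wc -l      # w runs remotely, wc -l counts the lines locally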

Grep filtering output from a process after it has already started?

Normally when one wants to look at specific output lines from running something, one can do something like:
./a.out | grep IHaveThisString
but what if IHaveThisString is something which changes every time, so you need to first run it, watch the output to catch what IHaveThisString is on that particular run, and then grep for it? I could just dump to a file and grep it later, but is it possible to background the process and then bring it back to the foreground, now piped to some grep? Something akin to:
./a.out
Ctrl-Z
fg | grep NowIKnowThisString
just wondering..
No, it is only in your screen buffer if you didn't save it in some other way.
Short form: You can do this, but you need to know that you need to do it ahead-of-time; it's not something that can be put into place interactively after-the-fact.
Write your script to determine what the string is. We'd need a more detailed example of the output format to give a better example of usage, but here's one for the trivial case where the entire first line is the filter target:
run_my_command | { read string_to_filter_for; fgrep -e "$string_to_filter_for"; }
Replace the read string_to_filter_for with as many commands as necessary to read enough input to determine what the target string is; this could be a loop if necessary.
For instance, let's say that the output contains the following:
Session id: foobar
and thereafter, you want to grep for lines containing foobar.
...then you can pipe through the following script:
re='Session id: (.*)'
while read; do
    if [[ $REPLY =~ $re ]]; then
        target=${BASH_REMATCH[1]}
        break
    else
        # if you want to print the preamble; leave this out otherwise
        printf '%s\n' "$REPLY"
    fi
done
[[ $target ]] && grep -F -e "$target"
If you want to manually specify the filter target, this can be done by having the loop check for a file being created with filter contents, and using that when starting up grep afterwards.
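A sketch of that variant, assuming a hypothetical trigger file named filter in the current directory:
target=
while read; do
    # once someone creates ./filter, use its contents as the target
    if [[ -e filter ]]; then
        target=$(<filter)
        break
    fi
    printf '%s\n' "$REPLY"
done
[[ $target ]] && grep -F -e "$target"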
What you need is a little bit strange, but you can do it this way:
go into a script session first;
use the shell as usual;
start and interrupt your program;
then run grep over the typescript file.
Example:
$ script
$ ./a.out
Ctrl-Z
$ fg
$ grep NowIKnowThisString typescript
You could use a stream editor such as sed instead of grep. Here's an example of what I mean:
$ cat list
Name to look for: Mike
Dora 1
John 2
Mike 3
Helen 4
Here we find the name to look for in the first line and want to grep for it. Now, piping the file to sed:
$ cat list | sed -ne '1{s/Name to look for: //;h}' \
> -e ':r;n;G;/^.*\(.\+\).*\n\1$/P;s/\n.*//;br'
Mike 3
Note: sed itself can take a file as a parameter, but since you're working with a command's output rather than a text file, this is how you'd use it.
Of course, you'd need to modify the command for your case.

Question on grep

Out of many results returned by grepping a particular pattern, if I want to use all the results one after the other in my script, how can I go about it? For example, I grep for .der in a certificate folder, which returns many results. I want to use each and every .der certificate listed by the grep command. How can I use one file after the other out of the grep result?
Are you actually grepping content, or just filenames? If it's file names, you'd be better off using the find command:
find /path/to/folder -name "*.der" -exec some other commands {} ";"
It should be quicker in general.
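For instance, if the goal is to run something against each certificate (openssl here is my assumption about what "some other commands" might stand for):
# print the subject of every .der certificate under the folder
find /path/to/folder -name "*.der" -exec openssl x509 -inform DER -noout -subject -in {} \;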
One way is to use grep -l. This ensures you only get each file once; -l prints the name of each matching file only, not the matches themselves.
Then, you can loop on the results:
for file in `grep -l ....`
do
    # work on $file
done
Also note that if you have spaces in your filenames, there is a ton of possible issues. See Looping through files with spaces in the names on the Unix&Linux stackexchange.
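If spaces or other odd characters are a real possibility, a NUL-delimited pipeline sidesteps the issue entirely (a sketch using GNU grep's -Z flag, which pairs with -l; the path is a placeholder):
grep -lZ 'pattern' /path/to/certs/* |
while IFS= read -r -d '' file; do
    # work on "$file" -- always quoted
    echo "$file"
done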
You can use the output as part of a for loop, something like:
for cert in $(grep '\.der' *) ; do
    echo ${cert} # or something else
done
Of course, if those der things are actually files (and you're using ls | grep to get them), you can directly use the files:
for cert in *.der ; do
    echo ${cert} # or something else
done
In both cases, you may need to watch out for arguments with embedded spaces.
