Multiple-process curl command for URLs to output to individual files - linux

I am attempting to curl multiple URLs in a bash command. Eventually I will be curling a large number of URLs, so I am using xargs to run multiple processes and speed things up.
My file consists of x number of URLs:
https://someurl.com
https://someotherurl.com
My issue comes when attempting to output the results to separate files named after the URLs I curl.
The bash command I have is:
xargs -P 5 -n 1 -I% curl -k -L % -o % < urls.txt
When I run this I get 'Failed to create file https://someotherurl.com'

You cannot create a file with / in the filename. You could do it this way:
#!/bin/bash
while IFS= read -r line
do
    echo "LINE: $line"
    if [[ "$line" != "" ]]
    then
        filename="${line#https://}"
        echo "FILENAME: $filename"
        # YOUR CURL COMMAND HERE, USING $filename
    fi
done < urls.txt
it ignores empty lines
variable substitution is used to remove the https:// part of each URL
this will allow you to create the file
Note: if your URLs contain sub-directories, they must be removed as well.
Ex: say you want to fetch https://www.exemple.com/some/sub/dir
The script I suggested here would try to create a file named "www.exemple.com/some/sub/dir". In this case, you could replace the / with _ using tr.
The script would become:
#!/bin/bash
while IFS= read -r line
do
    echo "LINE: $line"
    if [[ "$line" != "" ]]
    then
        filename=$(echo "$line" | tr '/' '_')
        filename2=${filename#https:__}
        echo "FILENAME: $filename2"
        # YOUR CURL COMMAND HERE, USING $filename2
    fi
done < urls.txt
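For the example URL above, the two steps would produce the following (a quick illustration, not part of the original answer):
echo "https://www.exemple.com/some/sub/dir" | tr '/' '_'
# prints: https:__www.exemple.com_some_sub_dir
After stripping the https:__ prefix, the resulting filename is www.exemple.com_some_sub_dir.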

Because your question is ambiguous, I would assume:
You have a file urls.txt that contains URLs separated by LF.
You want to download all URLs by curl and use each URL as its filename.
Unfortunately, that's not possible because URLs contain invalid characters like the slash /. As an alternative for this case, I would suggest you use Base64 URL-safe mode (RFC 3548) to encode each URL before saving it to a file.
After applying this requirement, your script would become:
seq 100 | xargs -I# echo 'https://example.com?#' > urls.txt
xargs -P0 -L1 sh -c 'curl -SskL0 -o $(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_) "$1"' sh < urls.txt
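If you would rather keep the xargs -P parallelism from the question together with the simpler tr-based filenames from the first answer, a minimal sketch could look like this (it assumes one URL per line with no spaces, and that stripping the scheme and replacing / with _ gives acceptable filenames):
xargs -P 5 -n 1 sh -c '
    url=$1
    # strip the https:// prefix and replace remaining slashes with underscores
    filename=$(printf "%s" "${url#https://}" | tr "/" "_")
    curl -k -L -o "$filename" "$url"
' sh < urls.txt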

Related

How can I move/group specific folders in bash?

I have a folder structure like the following:
2020-123-1
2020-123-2
2020-123-3
2020-124-1
2020-124-2
...
I need to create folders from the first 2 numbers and omit whatever follows the second dash (-). Then I need to put the prior folders under the newly created ones with the correct name.
2020-123
->2020-123-1
->2020-123-2
->2020-123-3
2020-124
->2020-124-1
->2020-124-2
I tried to write a script in bash like this:
ls -d */ > folder.txt
cut -f1,2 -d"-" folder.txt |cut -f1 -d"/" |sort|uniq > mainfolder.txt
while read line; do mkdir $line ; done < mainfolder.txt
while read line; do mv $(cut -f1,2 -d"-" $line) $line/ ; done < folder.txt
I couldn't make the last line work, I know it has issues.
Actually, you don't have to parse the directory names and build the hierarchy yourself. You can make use of the -p option of mkdir; an awk one-liner will do the job:
awk -F'-' '{top=$1 FS $2;printf "mkdir -p %s; mv %s %s\n",top, $0, top}' dir.txt
The output with your example:
mkdir -p 2020-123; mv 2020-123-1 2020-123
mkdir -p 2020-123; mv 2020-123-2 2020-123
mkdir -p 2020-123; mv 2020-123-3 2020-123
mkdir -p 2020-124; mv 2020-124-1 2020-124
mkdir -p 2020-124; mv 2020-124-2 2020-124
Note
This one-liner just prints the commands without executing them; pipe the output to sh if everything looks fine. Examine the output commands and change the printf format/values as needed.
I didn't quote the filenames, since your example doesn't contain any special chars. Quote them if your real names do.
So the final script is as follows:
ls -d */ | cut -f1 -d"/" > folder.txt
awk -F'-' '{top=$1 FS $2;printf "mkdir -p %s; mv %s %s\n",top, $0, top}' folder.txt |sh
In pure bash:
#!/bin/bash
for src in *-*-*; do
    destdir=${src%-*}
    [[ -d $destdir ]] || mkdir "$destdir" || exit
    # This just prints out the command that will be called.
    # Remove the "echo" in the actual script after making sure it will run as intended
    echo mv "$src" "$destdir"
done
In the script above it is assumed that each name to be moved contains exactly two dashes. If a name can contain more than two dashes, then the destdir=${src%-*} line should be replaced with these two lines:
suffix=${src#*-*-}
destdir=${src%"-$suffix"}
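As a quick illustration of the difference between the two forms (using a made-up name with three dashes):
src=2020-123-1-a
echo "${src%-*}"           # prints 2020-123-1 (only the part after the last dash is stripped)
suffix=${src#*-*-}         # suffix is now 1-a
echo "${src%"-$suffix"}"   # prints 2020-123 (everything after the second dash is stripped)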
For detailed information read the "shell parameter expansion" section in bash reference.
Additionally, a good article to read is: Why you shouldn't parse the output of ls

How to run all the scripts found by find

I'm trying to find all the init scripts created for websphere.
I know all the scripts end up with -init, so the first part of the code is:
find /etc/rc.d/init.d -name "*-init"
Also, I need all the scripts that run on a specific path, so the second part would be
| grep -i "/opt/ibm"
Finally, I need help with the last part. Having found the scripts, I need to run them with the stop argument.
find /etc/rc.d/init.d -name "*-init" | grep -i "/opt/ibm" | <<run script found with stop argument>>
How can I run the command found with find?
Use a loop so that we are a little more careful while executing them:
#!/bin/bash
shopt -s globstar
for file in /etc/rc.d/init.d/**/*-init; do      # grab all -init scripts
    script=$(readlink -f "$file")               # grab the actual file in case of a symlink
    [[ -f $script ]] || continue                # skip if not a regular file
    [[ $file = */opt/ibm/* ]] || continue       # not "/opt/ibm/", skip
    printf '%s\n' "Executing script '$script'"
    "$script" stop; exit_code=$?
    printf '%s\n' "Script '$script' finished with exit_code $exit_code"
done
If you omit the 'find' and use grep directly you could do something like this:
grep -i "/opt/ibm" /etc/rc.d/init.d/* | sed 's/:.*/ stop/g' | sort -u | bash
it uses grep directly, which adds the filename to the output: filename:matched line
since you only need the filename and not the match, use sed to replace the ':' and the rest of the line with ' stop' (note the space before stop)
use sort -u to make sure each script is executed only once
pipe the result into a shell
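If you prefer to stay closer to the find pipeline from the question, a minimal sketch could be the following (it assumes the grep is meant to select scripts whose contents mention /opt/ibm, as in the answer above, and that no path contains spaces):
find /etc/rc.d/init.d -name "*-init" -print0 \
    | xargs -0 grep -li "/opt/ibm" \
    | while read -r script; do
          "$script" stop   # run each matching init script with the stop argument
      done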

sed with variable as argument in bash script

I am trying to write a bash script to scan for authorized_keys files and remove the keys of a couple previous employees if found. I am having one heck of a time figuring out the escaping for the sed command at the end. I am using commas instead of / since / can show up in the ssh-key. Any help would be appreciated
#!/bin/bash
declare -A keys
keys["employee1"]='AAAAB3NzaC1yc2EAAAABJQAAAIEAxoZ7ZdpJkL98n8cSTkFBwaAeSNK0m/tOWtF1mu5NAzMM/+1SDO6rJH/ruyyqBJo9s+AHWZLGRHfXT2XBg2SRaUnubAKp0w6qNIbej0MsA/ifAs8AfVGdj0pUPLtKpo6XVZkB8vEZSIQ+xNk1n5HJrGJnFGWKWeY3z1/KOLxcLHU='
keys["employee2"]='AAAAB3NzaC1yc2EAAAABIwAAAQEAwHYNAVhb319OBVXPhYF8cSTkFBwaAekr7UcKjfLPCHMpz19W0L/C0g+75Hn8COxOQILDUhIPhYHXOduQjGD/6NXgJDWxgyT00Azg5BREUnBd58WqZPlEvTZYlAgmdMIbnWPPGdJwzqKH/k7/STK6vTKxL6rxBo4lSNK0m/tOWtF1mu5NAzMM/+1SDO6rJH/ruyyqBJo9s+NIbej0MsA/ifAs8AfAkfO2JjgeQpJMyZ7B02XVN5iSLAyC3Cb5FXIjJuk4LPhcApuVyszH2lgve0r5bt/nFgVujJTvJTHPlGrqkYDcDJVUtfbjoLqGPrnpijp6rGIC7aFDDe7bk0ygHYMXDFWcjJBerfLGUWTYWFFLY3bfiO/h/9oEycmQHyB2co4a0IyyDnaYn9OY6xsRRATVlk4Q=='
files=`find / -name authorized_keys`
echo "Checking Authorized_Keys files on: " `hostname`
echo ""
echo "Located files: "
for file in $files; do
    echo " $file"
done
echo ""
for file in $files; do
    for key in "${!keys[@]}"; do
        if grep -q ${keys[$key]} $file; then
            echo " *** Removing $key from $file"
            sed "s,${keys[$key]},d" $file
        fi
    done
done
You've made it a bit complicated I think.
You can do this using grep -vf and process substitution:
# array to hold the value you want to remove
keys=(
'AAAAB3NzaC1yc2EAAAABJQAAAIEAxoZ7ZdpJkL98n8cSTkFBwaAeSNK0m/tOWtF1mu5NAzMM/+1SDO6rJH/ruyyqBJo9s+AHWZLGRHfXT2XBg2SRaUnubAKp0w6qNIbej0MsA/ifAs8AfVGdj0pUPLtKpo6XVZkB8vEZSIQ+xNk1n5HJrGJnFGWKWeY3z1/KOLxcLHU='
'AAAAB3NzaC1yc2EAAAABIwAAAQEAwHYNAVhb319OBVXPhYF8cSTkFBwaAekr7UcKjfLPCHMpz19W0L/C0g+75Hn8COxOQILDUhIPhYHXOduQjGD/6NXgJDWxgyT00Azg5BREUnBd58WqZPlEvTZYlAgmdMIbnWPPGdJwzqKH/k7/STK6vTKxL6rxBo4lSNK0m/tOWtF1mu5NAzMM/+1SDO6rJH/ruyyqBJo9s+NIbej0MsA/ifAs8AfAkfO2JjgeQpJMyZ7B02XVN5iSLAyC3Cb5FXIjJuk4LPhcApuVyszH2lgve0r5bt/nFgVujJTvJTHPlGrqkYDcDJVUtfbjoLqGPrnpijp6rGIC7aFDDe7bk0ygHYMXDFWcjJBerfLGUWTYWFFLY3bfiO/h/9oEycmQHyB2co4a0IyyDnaYn9OY6xsRRATVlk4Q=='
)
while IFS= read -d '' -r file; do
    grep -vf <(printf "%s\n" "${keys[@]}") "$file" > "$file.tmp"
    mv "$file.tmp" "$file"
done < <(find / -name authorized_keys -print0)
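As a quick illustration of what the grep -vf part does (hypothetical file contents, just to show the filtering):
# two fake authorized_keys lines, one containing a pattern we want removed
printf '%s\n' "ssh-rsa AAAAkeepme user@host" "ssh-rsa AAAAdropme old@host" > demo_keys
grep -vf <(printf '%s\n' AAAAdropme) demo_keys
# prints only: ssh-rsa AAAAkeepme user@host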
In your case it's easy, you just need to use a character that does not appear in base64 output as the delimiter, e.g. |:
sed "\|${keys[$key]}|d" "$file"
Explanation in the sed manual:
\%regexp%
(The % may be replaced by any other single character.)
This also matches the regular expression regexp, but allows one to use a different delimiter than /.
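Note that, like the sed in the question, this only prints the filtered result to stdout. To actually modify each file you would also need in-place editing inside the question's loop, e.g. (a sketch, assuming GNU sed's -i option):
# delete every line containing the key, editing the file in place
sed -i "\|${keys[$key]}|d" "$file"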

Bash script to download graphic files from website

I'm trying to write a bash script in Linux (Debian) that will be used for downloading graphic files from a website given by the user at start-up. I'm not sure if my code is correct, but the first problem is that when I try to run my script with a website, e.g. http://www.bbc.com/, an error shows: http://www.bbc.com/ : invalid identifier. I even tried a simple website that has only a few JPG files. My next problem is to find out how to download files from a .txt file where the image Internet addresses are included.
#!/bin/bash
# $1 - URL $2 - new catalog name
read $1 $2
url=$1
fold=$2
mkdir -p $fold
if [$# -ne 3];
then
echo "Wrong command"
exit -1
fi
curl $url | grep -o -e "<img src=\".*\"+>" > img_list.txt |wc -l img_list.txt | lin=${% *}
baseurl=$(echo $url | grep -o "https?://[a-z.]*"")
curl -s $url | egrep -o "<img src\=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/.*/\1/g' > url_list.txt
sed -i "s|^/|$baseurl/|" url_list.txt
cd $fold;
what can I do next?
To download every image from the webpage I would use:
mech-dump --absolute --images http://example.com | xargs -n1 curl -O
but this requires the mech-dump command from the WWW::Mechanize package to be installed.
Using the list file
while read -r url folder
do
    mkdir -p "$folder" || exit 1
    (cd "$folder" && mech-dump --absolute --images "$url" | xargs -n1 curl -O)
done < list.txt
(assuming that no URL or folder name contains a space).
an error shows: http://www.bbc.com/ : invalid identifier
Your use of read is wrong; change
read $1 $2
url=$1
fold=$2
to
read url fold
or decide to pass the arguments on the command line and drop the read $1 $2 line entirely.
Also, each operand in [ ] must be separated from the brackets by whitespace; change
if [$# -ne 3];
to
if [ -z "$fold" ]
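Putting those fixes together, the start of the script might look like this (a sketch only, assuming the URL and the new catalog name are passed as two command-line arguments):
#!/bin/bash
# $1 - URL, $2 - new catalog name
if [ $# -ne 2 ]; then
    echo "Wrong command"
    exit 1
fi
url=$1
fold=$2
mkdir -p "$fold"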

Bash | curl | curls 2 URLs then stops

I am trying to write a simple bash script that will use a list from a text document and curl each URL on the list in order to see what the contents of each URL are. It allows me to curl 2 sites and creates the text documents for the rest, however it only downloads the first 2. I have already managed to write the script that pulls their IPs and places them in a separate file using the grep command. At first I tried:
#!/bin/bash
for var in `cat host.txt`; do
curl -s $var >> /tmp/ping/html/$var.html
done
I have tried with and without the silent switch. I then tried the following:
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
wait
done
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would try to do them all at the same time, again stopping after 2.
#!/bin/bash
for var in `head -2 host.txt`; do
curl $var >> /tmp/ping/html/$var.html
done
wait
for var in `head -4 host.txt | tail -2`; do
curl $var >> /tmp/ping/html/$var.html
done
This would do the same. I am new to bash scripting and only know some of the basics; any help would be appreciated.
Start with the simple: verify that you are in fact iterating over the entire list:
# This is the recommended way to iterate over the file. See
# http://mywiki.wooledge.org/BashFAQ/001
while read -r var; do
    echo "$var"
done < host.txt
Then add in the call to curl, checking its exit status
while read -r var; do
    echo "$var"
    curl "$var" >> /tmp/ping/html/$var.html || echo "curl failed: $?"
done < host.txt
You redirect into a file named after $var, which can result in an invalid filename because of the slashes in the URL. Additionally, I would quote the URL. For example, it works with the basename of the URL.
#!/bin/bash
for var in `cat host.txt`; do
    name=$(basename $var)
    curl -v -s "$var" -o "/tmp/ping/html/$name.html"
done
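For instance, with a hypothetical URL, basename yields:
basename "http://example.com/some/page.html"   # prints: page.html
basename "http://example.com"                  # prints: example.com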
You may also want to skip blank lines and comments (#):
#!/bin/bash
file="host.txt"
curl="curl"
while read -r line
do
    [[ $line = \#* ]] || [[ -z "${line}" ]] && continue
    filename=$(basename $line)
    $curl -s "$line" >> "/tmp/ping/html/$filename.html"
done < "$file"
