I'm working on a bash script where I use aria2c with --max-concurrent-downloads to download files in parallel from a list passed via --input-file.
Like this:
08/05 11:45:05 [NOTICE] Downloading 5 item(s)
[#25487f 3.4MiB/112MiB(3%)][#b99e01 9.5MiB/39MiB(23%)][#e0ce70 3.2MiB/909MiB(0%)][#06633c 2.9MiB/800MiB(0%)][#a450d5 2.1MiB/17GiB(0%)]
The problem is that I have to wait for all 5 files to finish downloading before the loop continues and downloads the next 5 files.
And if one of the files is huge, there is a bottleneck:
[#a450d5 14GiB/17GiB(83%) CN:1 DL:7.8MiB ETA:6m19s]
So is there any way to begin a new download as soon as one finishes, so that there are always 5 files downloading?
This is the code:
#!/bin/bash
#before this there is more logic, but it is not relevant here.
#/mnt/k/dl/${NAME}/${NAME}.txt is a list of IDs like this:
#29356176 29356744 29360752 29488484 29488703 29507184 29567654 29576218 29658504
#read the whitespace-separated IDs into an array (avoids eval)
VID_IDS=($(cat /mnt/k/dl/${NAME}/${NAME}.txt))
n=5
#loop every 5 IDs from the .txt list
for (( i = 0; i < ${#VID_IDS[@]}; i += n )); do
#delete --input-file for aria on every loop
rm -f /mnt/k/dl/${NAME}/aria_${NAME}.txt
TEMP=("${VID_IDS[@]:i:n}")
VIDSGROUP=${TEMP[*]}
for VIDEO_ID in $VIDSGROUP; do
#Logic to get a valid download link: $VIDEO_LINK. Not relevant here.
#build the --input-file for aria
echo "$VIDEO_LINK" >> /mnt/k/dl/${NAME}/aria_${NAME}.txt
done
DOWNLOADDIR="/mnt/k/dl/${NAME}"
ARIAFILELIST="/mnt/k/dl/${NAME}/aria_${NAME}.txt"
aria2c --check-certificate=false \
--max-file-not-found=10 \
--retry-wait=5 \
--max-tries=10 \
--max-connection-per-server=5 \
--max-concurrent-downloads=5 \
--allow-overwrite=false \
-c \
--dir="$DOWNLOADDIR" \
--input-file="$ARIAFILELIST"
done
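For what it's worth, here is a minimal sketch of one way around the batching (an assumption on my part, not tested against your setup): resolve all the links up front into a single input file and run aria2c once. With --max-concurrent-downloads=5 and more entries in the input file than that, aria2c keeps 5 transfers active and starts the next queued URL as soon as one finishes, so a single huge file no longer stalls the whole batch. get_video_link below is a hypothetical stand-in for your link-resolving logic.
#!/bin/bash
# Sketch: build one input file with every link, then invoke aria2c once.
DOWNLOADDIR="/mnt/k/dl/${NAME}"
ARIAFILELIST="/mnt/k/dl/${NAME}/aria_${NAME}.txt"
rm -f "$ARIAFILELIST"
for VIDEO_ID in "${VID_IDS[@]}"; do
    VIDEO_LINK=$(get_video_link "$VIDEO_ID")   # hypothetical helper for your link logic
    echo "$VIDEO_LINK" >> "$ARIAFILELIST"
done
# One invocation: aria2c keeps 5 downloads running and pulls the next queued
# URL from the input file whenever one of them completes.
aria2c --check-certificate=false \
    --max-file-not-found=10 \
    --retry-wait=5 \
    --max-tries=10 \
    --max-connection-per-server=5 \
    --max-concurrent-downloads=5 \
    --allow-overwrite=false \
    -c \
    --dir="$DOWNLOADDIR" \
    --input-file="$ARIAFILELIST"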
I use a Linux bash script to create posts on my WordPress website with wp-cli.
This is an example of the script I use:
#! /bin/bash
while IFS="|" read -r rec_column1 rec_column2 rec_column3 rec_column4
do
cd /posts/ &&
wp post create \
--post_type=post \
--post_title="$rec_column1" \
--post_status=publish \
--post_category=2 \
--url=http://127.0.0.1/ \
--meta_input="{\"word\":\"$rec_column2\"}" \
--tags_input="$rec_column3" \
--post_content="$rec_column4"
done < <(tail -n +2 ./files/$1) > ./logs/$1.log
And this is an example of the CSV file that I take the data from:
title|word|tags|content
Lorem|ipsum|dolor|sit
amet|consectetur|adipiscing|elit
Maecenas|sed|condimentum|est
in|fermentum|justo|Aenean
The issue I'm facing is that if the post creation gets interrupted for some reason, I then have to look at the log file to find the ID of the last post created, go to the WordPress site, find that post, locate the matching line in the CSV file, delete that line and every line above it, and start over.
I would like to improve that process so that, after a post is created, the line it came from is deleted from the CSV file. That way, the next time the post creation gets interrupted, I will be able to continue from the place it stopped.
For example, if the process was stopped after the first data line, the new CSV file would look like this:
title|word|tags|content
amet|consectetur|adipiscing|elit
Maecenas|sed|condimentum|est
in|fermentum|justo|Aenean
I'd appreciate any suggestions or code modifications on how to make this work the way I want.
In the while loop, store the current line number in a variable, $LINE_NUMBER.
Create a function delete_lines() that deletes the already-processed lines from the input file:
delete_lines() {
sed -i "1,${LINE_NUMBER}d" ./files/$FILE_NAME
exit
}
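To illustrate what that sed call does (a throwaway example, not part of the script, using the sample CSV from the question saved as /tmp/example.csv): it deletes the already-processed data lines while keeping the header.
$ LINE_NUMBER=3
$ sed -i "2,${LINE_NUMBER}d" /tmp/example.csv   # drop data lines 2 and 3
$ cat /tmp/example.csv
title|word|tags|content
Maecenas|sed|condimentum|est
in|fermentum|justo|Aenean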
Finally, set up a trap which calls this function on error or script interruption (the function is also called once at the end to handle normal completion):
trap delete_lines ERR INT TERM
Whole solution:
#!/bin/bash
LINE_NUMBER=1
FILE_NAME="./files/$1"
delete_lines() {
if [[ "$LINE_NUMBER" -gt "1" ]]; then
sed -i "2,${LINE_NUMBER}d" "$FILE_NAME"
fi
exit
}
trap delete_lines ERR INT TERM
while IFS="|" read -r rec_column1 rec_column2 rec_column3 rec_column4
do
cd /posts/ &&
wp post create \
--post_type=post \
--post_title="$rec_column1" \
--post_status=publish \
--post_category=2 \
--url=http://127.0.0.1/ \
--meta_input="{\"word\":\"$rec_column2\"}" \
--tags_input="$rec_column3" \
--post_content="$rec_column4"
LINE_NUMBER=$((LINE_NUMBER+1))
done < <(tail -n +2 "$FILE_NAME") > "./logs/$1.log"
delete_lines
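For what it's worth, a possible invocation (the script and CSV names below are placeholders): keep the CSV in ./files/ and pass just its file name as the first argument, since the script prefixes it with ./files/ and writes its log under ./logs/.
$ ls files/
example.csv
$ ./create_posts.sh example.csv   # hypothetical script name
$ cat logs/example.csv.log        # wp-cli output for each created post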
An alternative is to let awk prepend the real file line number to each record, so the loop always knows which line it is working on:
FILE_NAME="$1"
FILES_DIR="/path/to/files"
LOGS_DIR="/path/to/logs"
LAST_DONE=1
delete_lines() {
#delete every data line whose post was created successfully, keeping the header
if [[ "$LAST_DONE" -gt 1 ]]; then
sed -i "2,${LAST_DONE}d" "${FILES_DIR}/${FILE_NAME}"
fi
exit
}
#ERR/INT/TERM cover interruptions; delete_lines is also called explicitly at the end
trap delete_lines ERR INT TERM
while IFS="|" read -r LINE_NUMBER rec_column1 rec_column2 rec_column3 rec_column4
do
cd /posts/ &&
wp post create \
--post_type=post \
--post_title="$rec_column1" \
--post_status=publish \
--post_category=2 \
--url=http://127.0.0.1/ \
--meta_input="{\"word\":\"$rec_column2\"}" \
--tags_input="$rec_column3" \
--post_content="$rec_column4" &&
LAST_DONE=$LINE_NUMBER #only mark the line as done after wp succeeds
done < <(awk 'NR!=1 { print NR "|" $0 }' "${FILES_DIR}/${FILE_NAME}") > "${LOGS_DIR}/${FILE_NAME}.log"
delete_lines
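For reference, assuming the sample CSV from the question is saved as files/example.csv (a hypothetical name), the awk preprocessing feeds lines like this into the loop:
$ awk 'NR!=1 { print NR "|" $0 }' files/example.csv
2|Lorem|ipsum|dolor|sit
3|amet|consectetur|adipiscing|elit
4|Maecenas|sed|condimentum|est
5|in|fermentum|justo|Aenean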
I am working on a bash script and I have a list of IPs that I want to add, one by one, to a curl command.
For example given list on a file named list.txt
8.8.8.8
10.10.10.10
136.34.24.22
192.168.10.32
I want to insert each value into the curl command:
curl -k -u $user:$password "https://logservice/jobs" --data-urlencode 'search=search index=test $ARRAYVALUE | head 1' > output.txt
Where $ARRAYVALUE is the IP address to be used in the command.
I would appreciate any hint.
Thanks
If I understood correctly, you want to:
map each line of a "list.txt" to an item of an array
loop over the newly created array inserting items one by one into your command invocation
Consider this heavily commented snippet. Look especially at mapfile and at how the variable is used in the curl invocation, surrounded by double quotes.
#!/bin/bash
# declare a (non-associative) array
# each item is indexed numerically, starting from 0
declare -a ips
#put proper values here
user="userName"
password="password"
# put file into array, one line per array item
mapfile -t ips < list.txt
# counter used to access items with given index in an array
ii=0
# ${#ips[@]} returns the array length
# -lt makes a "less than" check
# while loops as long as the condition is true
while [ ${ii} -lt ${#ips[@]} ] ; do
# ${ips[$ii]} accesses the array item with the given (${ii}) index
# be sure to use __double__ quotes around the variable, otherwise it will not be expanded (the value will not be inserted) but treated as a literal string
curl -k -u "$user:$password" "https://logservice/jobs" --data-urlencode "search=search index=test ${ips[$ii]} | head -1" >> output.txt # append so results for earlier IPs are kept
# increase counter to avoid infinite loop
# and access the next item in an array
((ii++))
done
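As a side note on the design: once mapfile has filled the array, the index counter is optional; bash can iterate over the items directly. A minimal variation under the same assumptions:
# loop directly over the array items; no counter needed
for ip in "${ips[@]}"; do
curl -k -u "$user:$password" "https://logservice/jobs" --data-urlencode "search=search index=test ${ip} | head -1" >> output.txt
done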
You may read about mapfile in GNU Bash reference: Built-ins.
You may read about creating and accessing arrays in GNU Bash reference: Arrays
Check this great post about quotes in bash.
I hope you found this answer helpful.
I believe you need something like this:
#!/bin/bash
function FN()
{
filename=$1
declare -a IPs_ARRAY
i=0
user=$2
password=$3
while read ip
do
IPs_ARRAY[$i]=$ip
echo ${IPs_ARRAY[$i]}
# Uncomment to run the actual curl call (note the double quotes so the array element expands):
#curl -k -u "$user:$password" "https://logservice/jobs" --data-urlencode "search=search index=test ${IPs_ARRAY[$i]} | head -1" > output.txt
(( i++ ))
done < "$filename"
}
#############
### MAIN ###
###########
read -p "Enter username: " username
read -p "Enter password: " password
# Call your function
filename="list.txt"
FN "$filename" "$username" "$password"
I have added a bit to my explanation. Conceptually, I am running a script that processes a file in a loop, calling shell scripts that use the line content as an input parameter. (FYI: the "a" scripts kick off an execution and the "b" scripts monitor that execution.)
I need 1a and 1b to run first, in parallel, for the first two $param values.
Next, 2a and 2b need to run serially for those $param values once step 1 is complete.
3a and 3b kick off once 2a and 2b are complete (it doesn't matter whether they run serially or in parallel).
The loop then continues with the next 2 lines from the input .txt.
I can't get the second step to run serially; everything runs in parallel. What I need is the following:
cat filename | while read line
do
export param=$line
./script1a.sh "$param" > process.log && ./script1b.sh > monitor.log &&
##wait for processes to finish, running 2 in parallel in script1.sh
./script2a.sh "$param" > process2.log && ./script2b.sh > monitor2.log &&
##run each of the 2 in serial for script2.sh
./script3a.sh && ./script3b.sh
I tried adding in wait, and tried an if statement containing script2a.sh and script2b.sh that would run in serial, but to no avail.
if (( ++i % 2 == 0 )); then wait; fi
done
#only run two lines at a time, then cycle back through loop
How on earth can I get script2.sh to run serially after script1 has run in parallel?
Locking!
If you want to parallelize script1 and script3, but need all invocations of script2 to be serialized, continue to use:
./script1.sh && ./script2.sh && ./script3.sh &
...but modify script2 to grab a lock before it does anything else:
#!/bin/bash
exec 3>.lock2
flock -x 3
# ... continue with script2's business here.
Note that you must not delete the .lock2 file used here, at risk of allowing multiple processes to think they hold the lock concurrently.
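To make the serialization visible, here is a small self-contained sketch (hypothetical commands, same flock pattern as above): both jobs start in parallel, but the second critical section only begins after the first releases the lock.
#!/bin/bash
# Both subshells lock the same file, so the flock'd sections run one at a time
# even though the jobs themselves were started in parallel.
(
exec 3>.lock2
flock -x 3
echo "job A holds the lock"; sleep 2; echo "job A done"
) &
(
exec 3>.lock2
flock -x 3
echo "job B holds the lock"; sleep 2; echo "job B done"
) &
wait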
You are not showing us how the lines you read from the file are being consumed.
If I understand your question correctly, you want to run script1 on two lines of filename, each in parallel, and then serially run script2 when both are done?
while read first; do
echo "$first" | ./script1.sh &
read second
echo "$second" | ./script1.sh &
wait
script2.sh & # optionally don't background here?
script3.sh
done <filename &
The while loop contains two read statements, so each iteration reads two lines from filename and feeds each to a separate instance of script1. Then we wait until both are done before we run script2. I background script2 so that script3 can start while it runs, and I background the whole while loop; but you probably don't need to background the entire job by default (development will be much easier if you write it as a regular foreground job, then, when it works, background the whole thing when you start it if you need to).
I can think of a number of variations on this depending on how you actually want your data to flow; here is an update in response to your recently updated question.
export param # is this really necessary?
while read param; do
# First instance
./script1a.sh "$param" > process.lg &&
./script2b.sh > monitor.log &
# Second instance
read param
./script2a.sh "$param" > process2.log && ./script2b.sh > minitor2.log &
# Wait for both to finish
wait
./script3a.sh && ./script3b.sh
done <filename
If this still doesn't help, maybe you should post a third question where you actually explain what you want...
I am not 100% sure what you mean by your question, but now I think you mean something like this in your inner loop:
(
# run script1 and script2 in parallel
script1 &
s1pid=$!
# start no more than one script2 using GNU Parallel as a mutex
sem --fg script2
# when they are both done...
wait $s1pid
# run script3
script3
) & # and do that lot in parallel with previous/next loop iteration
@tripleee I put together the following, if you're interested (note: I changed some variables for the post, so sorry if there are inconsistencies anywhere... also, the exports have their reasons. I think there is a better way than exporting, but for now it works):
cat input.txt | while read first; do
export step=${first//\"/}
export stepem=EM_${step//,/_}
export steptd=TD_${step//,/_}
export stepeg=EG_${step//,/_}
echo "$step" | $directory"/ws_client.sh" processOptions "$appName" "$step" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder""/""$stepem""_ProcessID.log" &&
$dir_model"/check_ status.sh" "$Folder" "$stepem" > "$Folder""/""$stepem""_Monitor.log" &
read second
export step2=${second//\"/}
export stepem2=ExecuteModel_${step2//,/_}
export steptd2=TransferData_${step2//,/_}
export stepeg2=ExecuteGeneology_${step2//,/_}
echo "$step2" | $directory"/ws_client.sh" processOptions "$appName" "$step2" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder""/""$stepem2""_ProcessID.log" &&
$dir _model"/check _status.sh" "$Folder" "$stepem2" > "$Folder""/""$stepem2""_Monitor.log" &
wait
$directory"/ws_client.sh" processOptions "$appName" "$step" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder""/""$steptd""_ProcessID.log" &&
$dir _model"/check_status.sh" "$Folder" "$steptd" > "$Folder""/""$steptd""_Monitor.log" &&
$directory"/ws_client.sh" processOptions "$appName" "$step2" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder""/""$steptd2""_ProcessID.log" &&
$dir _model"/check _status.sh" "$Folder" "$steptd2" > "$Folder""/""$steptd2""_Monitor.log" &
wait
$directory"/ws_client.sh" processPaths "$appName" "$step" "$layers" "$genPath_01" > "$appLogFolder""/""$stepeg""_ProcessID.log" &&
$dir _model"/check _status.sh" "$Folder" "$stepeg" > "$Folder""/""$stepeg""_Monitor.log" &&
$directory"/ws_client.sh" processPaths "$appName" "$step2" "$layers" "$genPath_01" > "$appLogFolder""/""$stepeg2""_ProcessID.log" &&
$dir_model"/check _status.sh" "$Folder" "$stepeg2" > "$Folder""/""$stepeg2""_Monitor.log" &
wait
if (( ++i % 2 == 0))
then
echo "Waiting..."
wait
fi
done
I understand your question like this:
You have a list of models. These models need to be run. After they are run, their results have to be transferred. The simple solution is:
run_model model1
transfer_result model1
run_model model2
transfer_result model2
But to make this go faster, we want to parallelize parts. Unfortunately transfer_result cannot be parallelized.
run_model model1
run_model model2
transfer_result model1
transfer_result model2
model1 and model2 are read from a text file. run_model can be run in parallel, and you would like 2 of those running in parallel. transfer_result can only be run one at a time, and you can only transfer a result when it has been computed.
This can be done like this:
cat models.txt | parallel -j2 'run_model {} && sem --id transfer transfer_result {}'
run_model {} && sem --id transfer transfer_result {} will run one model and, if it succeeds, transfer its result. A transfer will only start if no other transfer is running.
parallel -j2 will run two of these jobs in parallel.
If a transfer takes less time than computing a model, you should get no surprises: a transfer will at most be swapped with the next transfer. If a transfer takes longer than running a model, you might see the results transferred completely out of order (e.g. you might see the transfer of job 10 before the transfer of job 2). But they will all be transferred eventually.
You can see the execution sequence exemplified with this:
seq 10 | parallel -uj2 'echo ran model {} && sem --id transfer "sleep .{};echo transferred {}"'
This solution is better than the wait based solution because you can run model3 while model1+2 is being transferred.
Problem statement:
Below is a script that someone else wrote before leaving the company, so I don't know whom I should ask about it. That is why I am posting here to find a solution.
What the script does is take the gzipped data for a particular date (20121017) from a particular folder (/data/ds/real/EXPORT_v1x0), decompress it, and load it into an HDFS directory (hdfs://ares-nn/apps/tech/ds/new/).
date=20121017
groups=(0 '1[0-3]' '1[^0-3]' '[^01]')
for shard in 0 1 2 3 4 5 6 7 8 9 10 11; do
for piece in 0 1 2 3; do
group=${groups[$piece]}
if ls -l /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz; then
gzip -dc /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz | \
hadoop jar /export/home/ds/lib/HadoopUtil.jar com.host.hadoop.platform.util.WriteToHDFS -z -u \
hdfs://ares-nn/apps/tech/ds/new/$date/EXPORT-part-$shard-$piece
sleep 15
fi
done
done
During the migration to HDFS I found out that this file has a problem:
hdfs://ares-nn/apps/tech/ds/new/20121017/EXPORT-part-8-3
So is there any way, by tweaking the above script, to find out which files under /data/ds/real/EXPORT_v1x0 ended up being converted into hdfs://ares-nn/apps/tech/ds/new/20121017/EXPORT-part-8-3, the file with the problem?
Any thoughts?
Update:
Something like this?
date=20121017
groups=(0 '1[0-3]' '1[^0-3]' '[^01]')
for shard in 0 1 2 3 4 5 6 7 8 9 10 11; do
for piece in 0 1 2 3; do
group=${groups[$piece]}
if ls -l /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz; then
[ "$date/EXPORT-part-$shard-$piece" == "20121017/EXPORT-part-8-3" ] && {
echo /data/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz
}
fi
done
done
A few sample file names from the /data/real/EXPORT folder:
/data/real/EXPORT_v1x0_20121017_4_T_115600_115800.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_235600_235800.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_115800_120000.dat.gz
/data/real/EXPORT_v1x0_20121017_4_T_235800_000000.dat.gz
And a few sample lines of output I got after making the changes:
/data/real/EXPORT_v1x0_20121017_0_T_0*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_1[0-3]*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_1[^0-3]*.dat.gz: No such file or directory
/data/real/EXPORT_v1x0_20121017_0_T_[^01]*.dat.gz: No such file or directory
In this case, replace the whole gzip pipeline with:
[ "$date/EXPORT-part-$shard-$piece" == "20121017/EXPORT-part-8-3" ] && {
echo /data/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz
}
That should do the trick.
Edit: remove the sleep to speed up the loop!
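If it helps, a small variation on the same idea (a sketch, untested) prints the full mapping of source files to HDFS part names instead of only the one problem part:
date=20121017
groups=(0 '1[0-3]' '1[^0-3]' '[^01]')
for shard in 0 1 2 3 4 5 6 7 8 9 10 11; do
for piece in 0 1 2 3; do
group=${groups[$piece]}
# print the HDFS part name next to every matching source file, if any exist
for f in /data/ds/real/EXPORT_v1x0_${date}_${shard}_T_${group}*.dat.gz; do
[ -e "$f" ] && echo "$date/EXPORT-part-$shard-$piece <- $f"
done
done
done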
I have many data files in this format:
-1597.5421
-1909.6982
-1991.8743
-2033.5744
But I would like to merge them all into one data file, with each original data file taking up one row with spaces in between, so I can import it into Excel.
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321
And so on. Each file is named ALL.ene and every directory in my working directory contains it. Can someone give me a quick fix? Thanks!
Edit: each file has 11 entries; those above were just examples.
for i in */ALL.ene
do
# the command substitution is deliberately unquoted: word splitting collapses
# the file's newlines into single spaces, so each file becomes one output line
echo $(<"$i")
done > result.txt
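Assuming two directories run1/ and run2/ (hypothetical names), each containing an ALL.ene with the values from the question, result.txt would look like:
$ cat result.txt
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321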
Assumptions:
I assume all your data files are of this format:
<something1><newline>
<something2><newline>
<something3><newline>
So for example, if the last newline is missing, the following script will miss the field corresponding to <something3>.
Usage: ./merge.bash -o <output file> <input file list or glob>
The script appends to any existing output files from previous runs. It also does not make any assumptions to how many fields of data every input file has. It blindly puts every line into a line in the output file separated by spaces.
#!/bin/bash
# set -o xtrace # uncomment to debug
declare output
[[ $1 =~ ^-o$ ]] && output="$2" && shift 2 || { \
echo "The first argument should always be -o <output>";
exit 1; }
declare -a files=("${@}") row
for file in "${files[@]}";
do
while read data; do
row+=("$data")
done < "$file"
echo "${row[#]}" >> "$output"
row=()
done
Example:
$ cat data1
-1597.5421
-1909.6982
-1991.8743
-2033.5744
$ cat data2
-1789.3324
-1234.5678
-9876.5433
-9999.4321
$ ./merge.bash -o test data{1,2}
$ cat test
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321
This is what coreutils paste is good at: -s serializes each input file onto its own output line, and -d' ' joins the fields with spaces instead of the default tab. Try:
paste -s -d' ' */ALL.ene
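For instance, on the data1/data2 sample files shown in the previous answer, this gives:
$ paste -s -d' ' data1 data2
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321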