Download multiple files, with different final names - linux

OK, what I need is fairly simple.
I want to download LOTS of different files (from a specific server) via cURL, and I want to save each of them under a specific new filename on disk.
Is there an existing way (a parameter, or whatever) to achieve that? How would you go about it?
(If there were an option to list all the URL-filename pairs in a text file, one per line, and have cURL process it, that would be ideal.)
E.g.
http://www.somedomain.com/some-image-1.png --> new-image-1.png
http://www.somedomain.com/another-image.png --> new-image-2.png
...

OK, I just figured out a way to do it myself.
1) Create a text file with pairs of URL (what to download) and filename (how to save it to disk), separated by a comma (,), one pair per line, and save it as input.txt.
2) Use the following simple Bash script:
while read -r line; do
    IFS=',' read -ra PART <<< "$line"
    curl "${PART[0]}" -o "${PART[1]}"
done < input.txt
*Haven't thoroughly tested it yet, but I think it should work.
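For what it's worth, curl can also read URL-filename pairs natively from a config file, which avoids the shell loop entirely. A sketch (the file name urls.txt is just an example): list url/output pairs, one pair per download, then hand the file to curl with -K. A single invocation also lets curl reuse its connection to the server across all the downloads.

url = "http://www.somedomain.com/some-image-1.png"
output = "new-image-1.png"
url = "http://www.somedomain.com/another-image.png"
output = "new-image-2.png"

Then run:

curl -K urls.txt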

Related

How can I run two bash scripts simultaneously and without repetition of the same action?

I'm trying to write a script that automatically runs a data analysis program. The data analysis takes a file, analyzes it, and puts all the outputs into a folder. The program can be run in two terminals simultaneously (each analyzing a different subject file).
I wrote a script that does all the inputs automatically. However, I can only usefully run one instance of it: if I run the script in both terminals simultaneously, it will analyze the same subject twice (useless).
Currently, my script looks like:
for name in `ls [file_directory]`
do
    [Data analysis commands]
done
If you run this in two terminals, both will start from the top of the directory containing all the data files. This is a problem, so I tried to check for duplicates, but the checks weren't very effective.
I tried a name comparison with the if command. That didn't work, because all the output files except one had unique names: the loop would compare against the first output folder at the top of the directory and conclude the name was different, even though an output folder further down had the same name. It looked something like this:
for name in `ls <file_directory>`
do
    for output in `ls <output_directory>`
    do
        if [ "$name" == "$output" ]
        then
            echo "This file has already been analyzed."
        else
            <data analysis commands>
        fi
    done
done
I thought this was the right method, but apparently not: I would need to check against all of the names before making a decision, rather than deciding after each one-by-one comparison, which is what this does.
Then I tried moving completed data files with the mv command. That didn't work either, because "name" in the for statement had already captured all the file names up front, so the loop walked the whole list regardless of what was in the folder at that moment. I remember reading that a for loop expands its word list once rather than in "real time", so it makes sense that this didn't work.
My thought was to look for some modification to that if statement so it does all the name checks before I make a decision (how?).
Also, are there any other commands I could possibly be missing that I could try?
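One pattern worth sketching here (the paths are placeholders, and the locks/ directory is a hypothetical addition): mkdir is atomic, so each terminal can try to create a per-file lock directory and only analyze the files it successfully claims. Run the same loop in both terminals; whichever one grabs a lock first does the work, and the other moves on.

mkdir -p locks
for name in /path/to/file_directory/*
do
    # mkdir succeeds for exactly one caller, so only one terminal
    # can claim each file
    if mkdir "locks/$(basename "$name").lock" 2>/dev/null
    then
        data_analysis_command "$name"    # placeholder for the real analysis
    fi
done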
One pattern I use often is the split command:
ls <file_directory> > file_list
split -d -l 10 file_list file_list_part
This will create files like file_list_part00 through file_list_partnn.
You can then feed these file names to your script:
for file_part in file_list_part*
do
    for file_name in `cat "$file_part"`
    do
        data_analysis_command "$file_name"
    done
done
Never use "ls" in a "for" (http://mywiki.wooledge.org/ParsingLs)
I think you should use a FIFO (see mkfifo).
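To sketch that FIFO idea (the paths and the analysis command are placeholders): start a reader loop in each terminal first, then feed the list of file names into the pipe once; both readers pull from the same queue, so each name goes to only one of them. Note that two concurrent readers of one pipe can, in principle, split a line between them, so treat this as an illustration of the idea rather than production code.

# one-time setup
mkfifo /tmp/work_queue

# start this worker loop in EACH terminal first:
while read -r name
do
    data_analysis_command "$name"    # placeholder for the real analysis
done < /tmp/work_queue

# then feed the queue once, from a third terminal:
ls /path/to/file_directory > /tmp/work_queue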
As a follow-on from the comments, you can install GNU Parallel with homebrew:
brew install parallel
Then your command becomes:
parallel analyse ::: *.dat
and it will process all your files in parallel using as many CPU cores as you have in your Mac. You can also add in:
parallel --dry-run analyse ::: *.dat
to get it to show you the commands it would run without actually running anything.
You can also add --eta (estimated time of arrival) for an estimate of when the jobs will be done, and -j 8 if you want to run, say, 8 jobs at a time. Of course, if you specifically want the 2 jobs at a time you asked for, use -j 2.
You can also have GNU Parallel simply distribute jobs and data to any other machines you may have available via ssh access.
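As a sketch of that last point (the host name is hypothetical, and GNU Parallel must be installed on the remote machine as well): -S takes a list of ssh logins, a bare : in that list means "also use the local machine", and --transfer copies each input file to the remote host before its job runs. Whatever analyse prints comes back to your terminal just as with local jobs.

parallel -S user@otherhost,: --transfer analyse ::: *.dat

If analyse writes result files instead of printing to stdout, look at --return and --cleanup (or the combined shorthand --trc) to bring those files back and tidy up afterwards.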

Interactive quiz in Bash (Multiple Q's)

I'm teaching an introductory Linux course and have abandoned the paper-based multiple-choice quizzes and have created interactive quizzes in Bash. My quiz script is functional, but kind of quick-and-dirty, and now I'm in the improvement phase and looking for suggestions.
First off, I'm not looking to automate the grading, which certainly simplifies things.
Currently, I have a different script file for each quiz, and the questions are hard-coded. That's obviously terrible, so I created a .txt file holding the questions, delimited by lines with "question 01" etc. I can loop through and use sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/p", but this prints the delimiter lines. I can pipe through sed "/^q/d" or head -n-1|tail -n+2 to get rid of them, but is there a better way?
Second issue: For questions where the answer is an actual command, I'm printing a [user]$ prompt, but for short-answer, I'm using a >. In my text file, for each question, the last line is the prompt to use. Initially, I was thinking I could store the question in a variable and |tail -1 it to get the prompt, but duh, when you store it, it strips newlines. I want the cursor to immediately follow the prompt, so I either need to pass it to read -p or strip the final newline from the output. (Or create some marker in the file to differentiate between the $ and > prompts.) One thought I had was to store each question in a separate file and just cat it to display it, making sure there was no newline at the end. That might be kind of a pain to maintain, but it would solve both problems. Thoughts?
Now to how I'm actually running the quiz. This is a Fedora 20 box, and I tried copying bash and setuid-ing it to me so that it would be able to read the quiz script that the students couldn't normally read, but I couldn't get that to work. After some trial and error, I ended up copying touch and setuid-ing it to me, then using that to create their answer file in a "submit" directory with an ACL so new files have o=w so they can write to their answer file (in the quiz with >> echo) but not read it back or access the directory. The only major loophole I see with this is that they can delete their file by name and start the quiz over with no record of having done so. Since I'm not doing any automatic grading, I'm not terribly concerned with the students being able to read the script file, although if I'm storing the questions separately, I suppose I could make a copy of cat and setuid it to read in files that they can't access.
Also, I realize that Bash is not the best choice for this, and learning the required simple input/output for Python or something better would not take much effort. Perhaps that's my next step.
1) You could use
sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/ { //!p }"
Here // repeats the last attempted pattern, which is the opening pattern in the first line of the range and the closing pattern for the rest.
...by the way, if you really want to do this with sed, you better be damn sure that i is a number, or you'll run into code injection problems.
2) You can store multiline command output in a variable without problems; command substitution only strips trailing newlines, not the ones in the middle. You just have to make sure you quote the variable from then on, to keep the shell from word-splitting it. For example,
QUESTION=$(sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/ { //!p }" questions.txt)
echo -n "$QUESTION" # <-- the double quotes are important here.
The -n option to echo tells echo to not append a newline at the end, which should take care of your prompt problem.
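Putting those together for your prompt problem, a sketch (GNU head/tail assumed, which is fine on Fedora): keep the last line of the stored question as the prompt and hand it to read -p, which leaves the cursor right after it.

QUESTION=$(sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/ { //!p }" questions.txt)
prompt=$(printf '%s\n' "$QUESTION" | tail -n 1)    # the last line is the prompt
printf '%s\n' "$QUESTION" | head -n -1             # print everything except the prompt
read -rp "$prompt" answer                          # cursor sits right after the prompt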
3) Yes, well, hackery breeds more hackery. If you want to lock this down, the first order of business would be to not give students a shell on the test machine. You could put your script behind inetd and have the students fill it out with telnet or something, I suppose, but...really, why bash? If it were me, I'd knock something together with a web server and one of the several gazillion php web quiz frameworks. Although I also have to wonder why it's a problem if students can see the questions and the answers they gave. It's not like all students use the same account and can see each other's answers, is it? (is it?) Don't store an answer key on the same machine and you shouldn't have a problem.

Download batch files from a website using linux

I want to download some files (roughly 1000-2000 zip files) from a website.
I could sit around and add each file one after another, so please give me a program, script, or some other method to automate the downloads.
The website I am talking about has download links like
sitename.com/sometetx/date/12345/folder/12345_zip.zip
The date can be taken care of. The main concern is the number 12345 before and after "folder": both change together, e.g.
sitename.com/sometetx/date/23456/folder/23456_zip.zip
sitename.com/sometetx/date/54321/folder/54321_zip.zip
I tried using curl with
sitename.com/sometetx/date/[12345-54321]/folder/[12345-54321]_zip.zip
but that produces far too many combinations: it keeps the left 12345 fixed while it scans the right range from 12345 to 54321, then increments the left number by 1 and repeats the whole scan.
I also tried wget in a Bash loop. Here I have one variable in two places, and in the loop the right-hand occurrence followed by "_" is ignored by the program.
Please help me; I don't know much about Linux or programming. Thanks!
To keep the shell from reading $i_zip as a single variable name, put the loop variable in quotes (or write ${i}), like this:
for ((i=10000; i < 99999; i++)); do
    wget sitename.com/sometetx/date/$i/folder/"$i"_zip.zip
done
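If you would rather stay with curl, a sketch along the same lines (the range matches your example): with 1000-2000 files from one server, it is worth generating a curl config file and fetching everything in a single invocation, which lets curl reuse its connection.

for ((i=12345; i<=54321; i++)); do
    printf 'url = "sitename.com/sometetx/date/%s/folder/%s_zip.zip"\n' "$i" "$i"
    printf 'output = "%s_zip.zip"\n' "$i"
done > batch.txt
curl -f -K batch.txt

The -f (--fail) flag keeps curl from saving server error pages as zip files when a number in the range doesn't exist.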

Script command to manipulate binary file (on linux)

I am looking for a mechanism to manipulate my EEPROM image with a unique device ID. I'd like to do this in a makefile, so that the device would automatically obtain a new ID, have it written into the data image, and then be flashed. In pseudocode:
wget http://my.centralized.uid.service/new >new.id
binedit binary.image -write 0xE6 new.id
flash binary.image into device
So first we get an ID into a separate file, then we overwrite part of the image (from a given offset) with the contents of that ID file. Then we flash. But how do I do the second part? I looked up bvi, which seems to have some scripting abilities, but I did not fully understand it, and to be honest, vi has always given me the creeps.
Thanks for help beforehand!
(Full disclosure: I made the initial vote to close as a duplicate. This answer is adapted from the referenced question.)
Use dd with the notrunc option:
offset=$(( 0xe6 ))            # 0xE6 = 230 bytes into the image
length=$( wc -c < new.id )    # number of bytes to overwrite
# bs=1 makes seek and count byte-granular; conv=notrunc keeps dd from truncating the image
dd bs=1 if=new.id of=binary.image count=$length seek=$offset conv=notrunc
You may want to try this on a copy first, just to make sure it works properly.
If you know the offset in the file where the replacement should start, you can use the split command to cut the initial file at that offset. The cat command can then be used to join the required pieces back together.
Another useful tool when working with binary files is od, which lets you examine them in a human-readable format.
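A sketch of that cut-and-join idea, using head and tail for the byte-exact cuts (file names are placeholders; 0xE6 is 230 in decimal):

idlen=$( wc -c < new.id )
head -c 230 binary.image > head.bin                        # bytes before the ID field
tail -c +$(( 230 + idlen + 1 )) binary.image > tail.bin    # the rest (tail -c +N starts at byte N, 1-based)
cat head.bin new.id tail.bin > patched.image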
I would perhaps use something like Perl; see its documentation, in particular the section labelled "Updating a Random-Access File".

Piping SVG file into ImageMagick

Sorry if this belongs on Server Fault.
I'm wondering what the proper way is to use an SVG (XML) string as standard input for a convert msvg:- jpeg:- 2>&1 command (using Linux).
Currently I'm just saving a temp file to use as input, but the data originates from an API in my case, so feeding the string directly to the command would obviously be most efficient.
I appreciate everyone's help. Thanks!
This should work:
convert - output.jpg
Example:
convert logo: logo.svg
cat logo.svg | convert - logo.jpg
Explanation:
The example's first line creates an SVG file and writes it to disk. This is only a preparatory step, so that the second line has something to read.
The second line is a pipeline of two commands: cat streams the bytes of the file to stdout (standard output), and convert reads them from stdin (standard input).
The - character is how you tell convert to read its input data not from a file on disk, but from stdin. So convert reads the SVG from its stdin and writes its JPEG output to the file logo.jpg.
My first command/line is similar to your step described as "currently I'm just saving a temp file to use as input". My second command/line does not use your API (I don't have access to it, do I?), but it demonstrates a method for feeding a string directly to the command.
So the most important lesson is this: wherever convert would usually read input from a file, and where you would write the file's name on the command line, you can replace the filename with - to tell convert to read from stdin. (But you need to make sure there is actually something offered on convert's standard input that it can digest...)
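For the API case specifically, a sketch (the URL and the variable name are hypothetical): anything that writes the SVG string to stdout can feed convert directly, with no temp file. The msvg:- prefix from your own command works here too, forcing the input to be treated as SVG.

# from an HTTP API
curl -s 'https://api.example.com/diagram.svg' | convert msvg:- jpeg:- > out.jpg

# or from a shell variable holding the SVG text
printf '%s' "$svg_string" | convert msvg:- jpeg:diagram.jpg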
