bash script for creating and then downloading the links - linux

Hellooo.
So I want to make a script for my girlfriend that uses an external file to append words to a URL, then downloads each resulting link in turn.
The awkward thing is she doesn't want to tell me too much (I suspect the result of using the script will be for my benefit :P), so I'm not certain about the function, kind of guessing.
The aim is for the script to contain a base URL. The script will iterate through an external file that contains a list of words, append each word to the link, and then download that link, repeating for every word in the file.
Can someone help me out a bit with this? I'm a bit new to scripting.
Should I set up an external file to hold the base url and then refer to that as well?
I'm thinking something along the lines of:
url=$(grep * url.txt)
for i in $(cat file.txt);
do
>> $url
wget $url
done
What and how much do I need to change and add?
Thanks for any help.

I have a file named source which has the below content in it:
which-2.16.tar.gz
which-2.17.tar.gz
which-2.21.tar.gz
I wrote a script named downloader with the below content:
#!/bin/bash
url="http://ftp.gnu.org/gnu/which" #source url
while read -r line
do
wget "$url/$line" #download url = source url + file name from the file
done <source #feeding filenames from the source file.
Running downloader will download the files listed in the source file from the site given in url. Voila!!
I guess you could employ a similar concept.
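Adapted to the original question, a minimal sketch might look like this (assuming, as in the question, that the base URL sits alone on the first line of url.txt and that file.txt lists one word per line; both file names and the exact URL layout are guesses):
#!/bin/bash
url=$(head -n 1 url.txt)   # read the base URL from its own file
while read -r word
do
    wget "$url/$word"      # append each word to the base URL and download it
done < file.txt            # one word per line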

Related

How to export many rrd files to xml files

I have a set of old rrd files that I need to convert to xml files and then back to rrd files on a new server. To create xml files from the whole directory, I have used the following simple bash script.
#!/bin/bash
cd /varcacti/rra
for i in ./traffic_in*;
do
/usr/local/rrdtool/bin/rrdtool dump $i /home/newuser/rrd_dump_xml/$i.xml;
done
This produces a set of xml files, but the extension is not what I actually need. It gives
traffic_in_1111.rrd >>> traffic_in_1111.rrd.xml
but I need traffic_in_1111.rrd >>> traffic_in_1111.xml. Can someone help me modify my code?
You want
/home/newuser/rrd_dump_xml/"${i%.rrd}".xml
#..........................^^^^^^^^^^^
which removes the rrd extension from the end of the string.
You might want to be more specific with the files you're looping over:
for i in ./traffic_in*.rrd
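Putting both suggestions together, the whole loop would look roughly like this (paths copied from the question; the leading ./ is dropped so it does not show up in the output file name; untested sketch):
#!/bin/bash
cd /varcacti/rra
for i in traffic_in*.rrd; do
    /usr/local/rrdtool/bin/rrdtool dump "$i" /home/newuser/rrd_dump_xml/"${i%.rrd}".xml
done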

WGET - how to download embedded pdf's that have a download button from a text file URL list? Is it possible?

Happy New Years!
I wanted to see if anybody has ever successfully downloaded embedded pdf files from multiple URLs contained in a .txt file for a website?
For instance;
I tried several combinations of wget -i urlist.txt (which downloads all the html files perfectly); however, it doesn't also grab each html file's embedded PDF, which has a ?xxxxx slug on the end of the .pdf.
The exact example of this obstacle is the following:
I have placed all 2 pages of links from this dataset into a url.txt:
https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/
1 example URL within this dataset:
https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/70147-9.html
The embedded pdf link is the following:
https://cases.justia.com/washington/court-of-appeals-division-i/2014-70147-9.pdf?ts=1419887549
The .pdf files are actually named like "2014-70147-9.pdf?ts=1419887549", i.e. .pdf?ts=xxxxxxxxxx, and each one is different.
The URL list contains 795 links. Does anyone have a successful method to download every .html in my urls.txt while also downloading the matching .pdf?ts=xxxxxxxxxx file to go with each .html?
Thank you!
~ Brandon
Try using the following:
wget --level 1 --recursive --span-hosts --accept-regex 'https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/.*html|https://cases.justia.com/washington/court-of-appeals-division-i/.*.pdf.*' --input-file=urllist.txt
Details about the options --level, --recursive, --span-hosts, --accept-regex, and --input-file can be found in wget documentation at https://www.gnu.org/software/wget/manual/html_node/index.html.
You will also need to know how regular expressions work. You can start at https://www.grymoire.com/Unix/Regular.html
You are looking for a web-scraper. Be careful to not break any rules if you ever use one.
You could also process the content you have received through wget using some string manipulation in a bash script.
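For example, here is a rough sketch of that string-manipulation approach (assuming the pages have already been saved as .html files and that each one contains the absolute cases.justia.com PDF link somewhere in its source):
for page in *.html; do
    # pull the PDF URL(s) out of the saved page and fetch them
    grep -oE 'https://cases\.justia\.com/[^"]+\.pdf\?[^"]*' "$page" |
    while read -r pdf; do
        wget "$pdf"
    done
done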

Detect a new file, and send it with mpack

I have a very specific question. I am using Debian.
I have an FTP folder where an app will upload a pdf-file, and the file will be stored in ftpfolder/EMAIL_ADDRESS, and the name of the file will be CURRENT_DATE_AND_TIME.
What I want to do is, whenever a new file is uploaded into any of the EMAIL_ADDRESS folders, send the file with mpack. As you might have guessed, I want the mail sent to the address given by the folder name, with the file attached.
So to break it down I need to:
Detect whenever a new file is uploaded
Extract the address from the foldername
Extract the filename, and attach it with mpack
Send it
I am stumped on how to approach this problem, so any suggestions will be greatly appreciated!
How about a cron job that launches a script doing all the stuff you need and then archives what it has processed into another folder?
#!/usr/bin/env bash
cd ftpfolder
for email in *; do
    # the folder name is the recipient address; attach everything inside it
    mpack -s "New PDF file uploaded" "$email"/* "$email"
    mv "$email" /archivefolder
done
Pros:
simplicity
Cons:
you have to have write permissions to move files
messing up with the original files
Note that the above script assumes only one file appears in the folder between the cron executions. If you cannot assure that (i.e. expect more than one file within a minute) you might have to loop over the folder contents.
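If you would rather react immediately instead of polling with cron, inotifywait (from the inotify-tools package) can watch the FTP folder and fire as each upload finishes; a minimal sketch, assuming the folder layout described in the question:
#!/usr/bin/env bash
# watch ftpfolder recursively and mail each file once it has been fully written
inotifywait -m -r -e close_write --format '%w%f' ftpfolder |
while read -r file; do
    email=$(basename "$(dirname "$file")")   # the parent folder name is the address
    mpack -s "New PDF file uploaded" "$file" "$email"
done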

Exploiting and Correcting Path Traversal Vulnerability

I have a Java Web App running on Tomcat on which I'm supposed to exploit Path traversal vulnerability. There is a section (in the App) at which I can upload a .zip file, which gets extracted in the server's /tmp directory. The content of the .zip file is not being checked, so basically I could put anything in it. I tried putting a .jsp file in it and it extracts perfectly. My problem is that I don't know how to reach this file as a "normal" user from browser. I tried entering ../../../tmp/somepage.jsp in the address bar, but Tomcat just strips the ../ and gives me http://localhost:8080/tmp/ resource not available.
Ideal would be if I could somehow encode ../ in the path of somepage.jsp so that it gets extracted into the web root directory of the Web App. Is this possible? Are there maybe any escape sequences that would translate to ../ after extracting?
Any ideas would be highly appreciated.
Note: This is a school project in a Security course where I'm supposed to locate vulnerabilities and correct them. Not trying to harm anyone...
Sorry about the downvotes. Security is very important, and should be taught.
Do you pass in the file name to be used?
The check that the server does is probably something like "if the location starts with /tmp then allow it". So what you want to do is pass something like /tmp/../home/webapp/ instead.
Another idea would be to see if you could craft a zip file that would result in the contents being moved up - like if you set "../" in the filename inside the zip, what would happen? You might need to manually modify things if your zip tools don't allow it.
To protect against this kind of vulnerability you are looking for something like this:
String somedirectory = "c:/fixed_directory/";
String file = request.getParameter("file");
if (file.indexOf(".") > -1)
{
    // if it contains a ., disallow
    out.print("stop trying to hack");
    return;
}
else
{
    // load the specified file and print it to the screen
    loadfile(somedirectory + file + ".txt");
    ///.....
}
If you just were to pass the variable "file" to your loadfile function without checking, then someone could make a link to load any file they want. See https://www.owasp.org/index.php/Path_Traversal
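A stricter variant of the same idea is to canonicalize the requested path and then check that it still lies under the fixed directory; sketched here in shell terms (assuming GNU realpath, with base and file standing in for the fixed directory and the user-supplied name):
base="/fixed_directory"                  # hypothetical fixed directory
target=$(realpath -m -- "$base/$file")   # resolve any ../ segments
case "$target" in
    "$base"/*) cat -- "$target" ;;       # still inside the base directory: serve it
    *) echo "path traversal attempt blocked" ;;
esac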

Linux: WGET - scheme missing using -i option

I am trying to download multiple files from yahoo finance using wget.
To do that I used a python script to generate a text file with all the URLs that I need.
When downloading a single file (a csv file) using the following code:
wget ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv
everything goes OK!
However, when the -i option is added so that the URL is read from the file instead of passed directly, I get the error:
Invalid URL ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv: Scheme missing
The file that contains the URLs is a text file with a single URL on each line. The URLs are exactly like the one in the first example, but with some different parameters.
Is there a way to correct this?
Thanks a lot for reading!!
To solve the problem I added double-quotes on the links and a web protocol. For example:
"http://ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv"
