I'm currently trying to work out a way to use wget to find a string within the HTML files of a site without downloading the site locally. For example, I am looking for any passwords that may be stored in an HTML file; how would I use wget to find that?
At present, the only way I have been able to search is to do the following:
Download the site:
wget -r *sitename*
Use find to locate the text in the files:
find . -type f -exec grep -H "stringvalue" {} \;
I have tried merging the commands but cannot get it to work.
Thanks to Barmar for the assist.
This will download the content of the required HTML file to standard output, without saving it or the rest of the site. From there you can grep the output for the data you need:
wget -nv -O - *webaddress* | grep -e *string*
I'm trying to find all ZIP files in a specific folder, extract them using GUNZIP, and pipe the output to GREP to search within HTML files contained in these ZIP files.
I managed to do so with unzip (unzip -p), but many of the servers I will eventually run this search on over SSH don't have zip/unzip installed, so I'm limited to gunzip, which seems to be installed by default on these old Linux servers.
Is there a way to pipe the output of gunzip extraction (of more than 1 file following a find -exec command) to grep, in a way that will allow searching inside these HTML files (not in their file names, but within)?
This is how I've tried to do it so far, without success:
find /home/osboxes/project/ZIPs/*.zip -exec gunzip -l {} \;|grep 'pattern'
UNZIP has a -p option that can pipe the output and I get the needed result with it, but it seems that GUNZIP doesn't...
Can you think of a way to help me make it work?
Appreciated
gunzip -c writes the output to standard output; the original file is not affected. zcat also works, and is equivalent to gunzip -c.
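Putting that together with the find loop from the question, here is a minimal self-contained sketch (a throwaway gzip file stands in for the real archives; note that gunzip can read a .zip only if it contains a single member):

```shell
# Build a throwaway gzipped HTML file as a stand-in for the real archives.
tmp=$(mktemp -d)
printf '<html>password=hunter2</html>\n' > "$tmp/page.html"
gzip "$tmp/page.html"                        # creates page.html.gz

# Decompress each archive to stdout and search inside the content,
# printing the archive name on a match (grep alone can't name the
# source file when it reads from a pipe).
find "$tmp" -type f -name '*.gz' | while read -r f; do
    gunzip -c "$f" | grep -q 'password' && echo "match in $f"
done

rm -rf "$tmp"
```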
The problem is that I have a directory full of HTML files. However, when I open the folder in Firefox it is difficult to navigate, because the folder also contains all of the associated HTML asset folders.
I tested using ln -s to link just the HTML files into a separate viewing directory, and it worked.
Now my problem is trying to set up these ln -s across hundreds of files but I cannot figure out how to do this. I thought that the best way would be to use xargs on ls output but I cannot seem to get the syntax to work.
I believe that my problem is that I need to pass two sets of arguments to ln -s, but I cannot get it to work.
I have tried many different variations of the below but can't get the syntax to work. I've also tried using GNU Parallel but still can't get the syntax right.
ls Downloads (filenames) | grep html | xargs ln -s ~\Downloads\(filenames) ~\ViewingDirectory\(filename)
Any help would be appreciated. Thank you.
You misunderstand the use of xargs, and you parse the output of ls, which is generally considered a bad idea.
A better solution would be:
for f in ~/Downloads/*.html ; do
    b=$(basename "$f")
    ln -s "$f" ~/ViewingDirectory/"$b"
done
If you insist on using xargs, you could do it as follows for example:
find ~/Downloads/ -type f -name '*.html' \
    | xargs -I{} sh -c 'ln -s "$1" ~/ViewingDirectory/"$(basename "$1")"' sh {}
Now, with xargs you could run the ln calls in parallel by using the -P flag:
find ~/Downloads/ -type f -name '*.html' \
    | xargs -P"$(nproc)" -I{} sh -c 'ln -s "$1" ~/ViewingDirectory/"$(basename "$1")"' sh {}
where nproc returns the number of processing units available.
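Worth noting for this particular task: ln accepts multiple sources when the last argument is a directory, so a single call can replace the loop entirely. A sketch using scratch directories in place of ~/Downloads and ~/ViewingDirectory (it works here because the glob expands to full paths, so the symlink targets resolve correctly):

```shell
# Scratch directories standing in for ~/Downloads and ~/ViewingDirectory.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/a.html" "$src/b.html" "$src/notes.txt"

# One ln call: every file matching the glob gets a symlink in $dst.
ln -s "$src"/*.html "$dst"/

ls "$dst"    # a.html  b.html
rm -rf "$src" "$dst"
```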
I can find all the .tgz files within a folder and then extract only the PDF, EPUB and MOBI files from each archive, if present:
find '/home/pi/Downloads/complete/' -type f -name "*.tgz" | while read -r i ; do tar -xvzf "$i" -C /home/pi/Downloads/complete/ebook/ --strip=1 --wildcards --no-anchored '*.pdf' '*.mobi' '*.epub'; done
This works perfectly when at least one pdf, mobi or epub is present in the archive. However, whenever a given archive contains no pdf / epub / mobi, it returns an error as shown below.
tar: *.pdf: Not found in archive
tar: *.mobi: Not found in archive
tar: Exiting with failure status due to previous errors
How do I prevent this error? I believe there should be a way to provide multiple wildcards with an 'OR' operator, as is available in other scripting languages.
tar isn't a scripting language.
To hide the error message, just redirect the stderr of tar to a bit bucket:
tar ... 2> /dev/null
Note that you might miss other errors, though.
The safe way would be to list the files first, select the ones to extract, and only do that if there were any.
tar --list -f ...tgz | grep '\.\(pdf\|mobi\|epub\)$'
Thanks to @choroba, the code below works perfectly with no error reported. Posting it as an answer so that others have better visibility of the final working code.
find '/home/pi/Downloads/complete/' -type f -name "*.tgz" | while read -r i ; do
    tar --list -f "$i" | grep '\.\(pdf\|mobi\|epub\)$' | while read -r line ; do
        tar -kxvzf "$i" -C "/home/pi/Downloads/complete/ebook/" --strip=1 "$line"
    done
done
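The same list-then-extract pattern can be exercised end-to-end on a throwaway archive (all paths here are scratch stand-ins for the ones above):

```shell
# Build a scratch .tgz containing one wanted and one unwanted file.
tmp=$(mktemp -d)
mkdir "$tmp/book" "$tmp/out"
touch "$tmp/book/guide.pdf" "$tmp/book/readme.txt"
tar -czf "$tmp/archive.tgz" -C "$tmp" book

# List the members, keep only the wanted extensions, extract just those.
tar -tzf "$tmp/archive.tgz" | grep '\.\(pdf\|mobi\|epub\)$' | while read -r entry; do
    tar -xzf "$tmp/archive.tgz" -C "$tmp/out" --strip=1 "$entry"
done

ls "$tmp/out"    # guide.pdf
rm -rf "$tmp"
```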
I have an application zip file created using Play Framework. It creates the zip file with the name A-1.0.zip. This zip file contains a directory named A-1.0 (1.0 changes according to the version).
I want to extract the zip file and rename the folder from A-1.0 to A, so that my application's init.d script finds the directory to start the application. This should be done dynamically using a shell script.
Is there a way I can extract all the zip files into an A folder instead of extracting into A-1.0 and renaming? Please help!
The following is what I tried....
unzip A-1.0.zip -d ~/A
(I know that it is very dumb of me to do this !!)
This extracted the file into ~/A/A-1.0/[contents]
I need to extract all the [contents] into ~/A instead of ~/A/A-1.0/. I don't know how to do this from the command line.
My init.d script runs ~/A/bin/A -Dhttp.port=6565 -Dconfig.file=~/A/conf/application.conf to start the Play! application.
To make this script work, I currently extract everything into A-1.0/ and then rename it with mv ~/A-1.0 ~/A manually.
I didn't find any specific unzip option to perform this automatically, but I managed to achieve this goal by creating a temporary symbolic link in order to redirect the extracted files:
ln -s A A-1.0
unzip A-1.0.zip
rm A-1.0
From the unzip man page it boils down to:
unzip A-1.0.zip 'A-1.0/*' -d /the/output/dir
^ ^
| |
| +- files to extract (note the quotes: unzip shall parse the wildcard instead of sh)
+- The archive
EDIT: This answer does not preserve subdirectories. It works fine if one doesn't have or need the subdirectory structure.
I found that you can combine the answer from @géza-török with the -j option mentioned by @david-c-rankin (in a comment below the question). This leads to unzip -j A-1.0.zip 'A-1.0/*' -d /the/output/dir, which processes only the files inside A-1.0/ and outputs them straight into the given output directory.
Source: https://linux.die.net/man/1/unzip (look at -j)
I was looking to unzip all .zip files in the current directory into directories named after the zip files (even after renaming them).
The following is not elegant, but it works:
cd to/the/dir
for z in *.zip; do
    d="${z%.zip}"
    unzip "$z" -d "$d"
    # flatten: move the contents of each top-level folder up, then remove it
    for inner in "$d"/*/; do
        mv "$inner"* "$d"/
        rm -rf "$inner"
    done
done
DISCLAIMER: I am not asking how to fix WordPress, I am asking how to use find for a specific task.
I host a blog on the WordPress platform. I've realized that I've been infected, and now all my PHP files show a dirty header that breaks my installation. Fortunately, this is easy to clean: I just need to remove the first line of each PHP file, which contains the string neeuczbkme, and substitute it with a normal PHP header, <?php.
First I created a file with the proper header string <?php and saved it as ~/fix.
My plan is to grep -v each PHP file, write the result to ~/tmp, then cat ~/fix ~/tmp and overwrite the original PHP file. If I do this manually it works very well, but when I try to implement this idea using find I cannot get it to work.
I am trying
find plugins -iname "*.php" -exec grep -v neeuczbkme {} > ~/tmp \; -exec cat ~/fix ~/tmp > {} \;
This doesn't work... any ideas?
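One likely reason it fails: the outer shell performs both redirections once, before find ever runs, so the -exec actions never write where you expect. A sketch that moves the redirections inside a per-file sh -c may work; it is demonstrated here on a scratch tree standing in for the real plugins directory, with fix/tmpfile names mirroring the question:

```shell
# Scratch tree standing in for the infected blog.
work=$(mktemp -d)
mkdir "$work/plugins"
printf 'bad neeuczbkme line\necho "hello";\n' > "$work/plugins/infected.php"
printf '<?php\n' > "$work/fix"

# The redirections now live inside sh -c, so they happen once per file.
find "$work/plugins" -iname '*.php' -exec sh -c '
    work=$1; f=$2
    grep -v neeuczbkme "$f" > "$work/tmpfile" &&
    cat "$work/fix" "$work/tmpfile" > "$f"
' sh "$work" {} \;

head -n1 "$work/plugins/infected.php"    # → <?php
rm -rf "$work"
```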