search through files and put into array - linux

cat `find . -name '*.css'`
This will output the contents of every CSS file found. I now want to do two things:
1) How do I add *.js to this as well, so that I look inside all CSS and JavaScript files?
2) I want to find any CSS or image files referenced within those CSS or JS files and push them into an array. So I guess: look for .png, .jpg, .gif, .tif, or .css, and put everything before that, back to the preceding double or single quote, into an array. I want an array because this command will go into a shell script, and after I have the names of all the files I need, I will loop through them and download those files later.
Any help would be appreciated.

Extra hackery, in case someone needs it:
find ./ -name "*.css" | xargs grep -o -h -E '[A-Za-z0-9:./_-]+\.(png|jpg|gif|tif|css)'| sed -e 's/\.\./{{url here}}/g'|xargs wget
will download every missing resource

Do the command:
find ./ -name "*.css" -or -name "*.js" > fileNames.txt
Then read each line of fileNames.txt in the loop and download them.
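That loop might look something like this (a sketch only; it assumes each line of fileNames.txt is something wget can fetch, e.g. after rewriting local paths to URLs):
while IFS= read -r f; do
    # download each file listed in fileNames.txt
    wget "$f"
done < fileNames.txt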
Or if you are going to use wget to download the images you could do:
find ./ -name "*.css" -or -name "*.js" | xargs grep '*.png' | xargs wget
May need a little refinement like a cut after the grep but you get the idea

1) Simple answer: you can add the names of all .js files to your cat command by instructing find to match more files:
cat `find . -name '*.css' -or -name '*.js'`
2) a text-searching tool such as grep is probably what you're after:
find . -name '*.css' -or -name '*.js' | xargs grep -o -h -E '[A-Za-z0-9:./_-]+\.(png|jpg|gif|tif|css)'
Note: my grep pattern isn't universal or perfect, but it's a starting example. It matches any string made of alphanumerics, colons, dots, slashes, underscores, or hyphens, followed by any one of the given extensions.
The -o option causes grep to output only the parts of the .css/.js files that match the pattern (i.e. only the apparent filenames).
If you want to download them, you could append | xargs wget -v to the command, which would instruct wget to fetch all those filenames.
NOTE: this won't work for relative filenames; some other magic will be required (i.e. you'll have to resolve them with respect to the grepped file's location). Perhaps some extra hackery, such as sed or awk.
Also: How often do you see references to TIFFs in your CSS/JS?
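To get the matches into an actual array, as the question asks, one possible sketch (it assumes bash 4+ for mapfile and leaves out the relative-path handling mentioned above):
# collect every matched asset path into the "assets" array, de-duplicated
mapfile -t assets < <(find . -name '*.css' -or -name '*.js' \
    | xargs grep -o -h -E '[A-Za-z0-9:./_-]+\.(png|jpg|gif|tif|css)' \
    | sort -u)
# later: loop through the array and download each file
for asset in "${assets[@]}"; do
    wget "$asset"
done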

Related

Bash - Find directory containing specific log files

I've created a script to quickly analyze some logs and automatically provide advice for solving problems based on the errors found.
All works as expected.
However, it appears that the folder structure containing these logs can change (it depends on the system configuration), and then my script no longer works.
I would like a way to find the directory containing specific files, such as the logs or the appinfo.txt file.
Once obtained, I could use it as a variable and finally solve my problem.
Here is an example:
AppLogDirectory=$(Your_Special_Command_You_Will_HelpMe_To_Find)
grep -i "Error" $AppLogDirectory/esl*.log
The log format is: ESL.randomValue.log
Files analyzed: appinfo.txt, system.txt, etc.
As suggested in the comment section, I've edited my original post with more detail to clarify the context. Below is an example:
Log files (esl.xxx.tt.ss.log) can be in a random directory, like:
/var/log/ApplicationName/logs/
/opt/ApplicationName/logs/
/var/data/ApplicationName/Extended/logs/
Because of the random directory, I need a way to print the directory names of the files that match the esl*.log pattern (without the esl filename).
Use find and pass the output to xargs with grep, like so, which runs grep on multiple files and prints the output together with the file name where the pattern was found:
find /path/to/files /another/path/to/other/files \( -name 'appinfo.txt' -o -name 'system.txt' -o -name 'esl*.log' \) -print0 | xargs -0 grep -i 'Error'
Or simply use -exec ... {} +, which gives the same effect without the need for xargs:
find /path/to/files /another/path/to/other/files \( -name 'appinfo.txt' -o -name 'system.txt' -o -name 'esl*.log' \) -exec grep -i 'Error' {} +
To find the directories which contain the files that contain the desired pattern, use grep -l to print file names only (not the lines that match), and pipe the results to xargs dirname to print the directory names. If you need the unique dir names, pipe it further to sort -u:
find /path/to/files /another/path/to/other/files \( -name 'appinfo.txt' -o -name 'system.txt' -o -name 'esl*.log' \) -exec grep -il 'Error' {} + | xargs dirname | sort -u
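With the directory layout from the question, that prints just the matching directories, for example:
/opt/ApplicationName/logs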
SEE ALSO:
GNU find manual
To search for files based on their contents
xargs
Solution found thanks to you. Thank you again!
# Ask for the extracted tar.gz folder
read -p "Where did you extract the tar.gz file? " r1
# Directory path where the esl files are located
logpath=`find "$r1" -name "esl*.log" | xargs dirname | sort -u`
# Search for a value (here "Error") in all esl*.log files
grep 'Error' $logpath/esl*.log | awk '{print $8}'

How can I update the contents of a file by replacing strings using the grep and find commands

I am finding XML files under a particular subdirectory that contain the word "responsible". The search works fine, as shown below.
find . -name '*.xml' -exec grep -H 'responsible' {} \;
./dir1/d1.xml<responsible><></responsible>
./dir2/d2.xml<responsible><SYSTEM></responsible>
./dir3/d3.xml<responsible><SYSTEM></responsible>
... and so on.
Is there a way I can replace all occurrences of SYSTEM with a blank one?
The result I am looking for is:
./dir1/d1.xml<responsible><></responsible>
./dir2/d2.xml<responsible><></responsible>
./dir3/d3.xml<responsible><></responsible>
I would use "perl pie" (perl -p -i -e):
perl -p -i -e 's/<responsible><SYSTEM><\/responsible>/<responsible><><\/responsible>/' `find ./ -name '*.xml'`
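If the filenames may contain spaces, a variant using find -exec avoids the backtick word-splitting (same substitution, just a sketch):
find ./ -name '*.xml' -exec perl -p -i -e 's/<responsible><SYSTEM><\/responsible>/<responsible><><\/responsible>/' {} +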

bash: filter out the files NOT to search

I have created a script that searches for the specified keywords in specified directories:
find $directory -type f -name "*.properties" -exec grep -Fi "$keyword" {} +
The problem I faced is that $directory contains two types of files, sample files and config files (e.g. config vs. sample.config), where sample.config is only an example, so I'm not interested in including it in the search.
The question is: how do I exclude these 'sample.*' files from my results?
To exclude the sample.config files from the question, add ! -name sample.config to the find command, for example:
find $(<$SRC) -type f -name "*.properties" ! -name sample.config -exec grep -Fi "$keyword" --color {} +
However, *.properties can't match sample.config, so this will not change the result.
Probably one command can search for $keyword in all four of your file types while excluding sample.*:
msr -rp dir1,dir2,dirN -f "\.(properties|pl|xml|ini)$" --nf "^sample\." -it "keyword"
Use -PAC or -P -A -C to remove color, line numbers, etc. and get plain results.
Use -l to just list the file paths and show the distribution: count + percentage.
msr.gcc* is a single executable tool to search/replace in files or pipes, from my open project https://github.com/qualiu/msr (tools directory), with cross-platform and OS-bit versions. Built-in docs such as https://qualiu.github.io/msr/usage-by-running/msr-CentOS-7.html include a vivid demo, a performance comparison with findstr and grep, tests, etc.; just see the home page.
Using the suggestion of @Nahuel, I've modified it a bit and it started working for me as:
find $(<$SRC) -type f -name "*.properties" ! -name "sample.*" -exec grep -Fi "$keyword" --color {} +

Piping find results into grep for fast directory exclusion

I am successfully using find to create a list of all files in the current subdirectory, excluding those in the subdirectory "cache." Here's my first bit of code:
find . -wholename './cach*' -prune -o -print
I now wish to pipe this into a grep command. It seems like that should be simple:
find . -wholename './cach*' -prune -o -print | xargs grep -r -R -i "samson"
... but this is returning results that are mostly from the cache directory. I've tried removing the xargs reference, but that does what you'd expect, running the grep on text of the file names, rather than on the files themselves. My goal is to find "samson" in any files that aren't cached content.
I'll probably get around this issue by just using doubled greps in this instance, but I'm very curious about why this one-liner behaves this way. I'd love to hear thoughts on a way to modify it while still using these two commands (as there are speed advantages to doing it this way).
(This is in CentOS 5, btw.)
The wholename match may be the reason why it's still including "cache" files. If you're executing the find command in the directory that contains the "cache" folder, it should work. If not, try changing it to -name '*cache*' instead.
Also, you do not need the -r or -R for your grep; those tell it to recurse through directories, but you're testing individual files.
You can update your command using the piped version, or use a single command:
find . -name '*cache*' -prune -o -print0 | xargs -0 grep -il "samson"
or
find . -name '*cache*' -prune -o -exec grep -iq "samson" {} \; -print
Note, the -l in the first command tells grep to "list the file" and not the line(s) that match. The -q in the second does the same; it tells grep to respond quietly so find will then just print the filename.
You've told grep itself to recurse (twice! -r and -R are nearly synonyms; -R additionally follows symlinks). Since one of the arguments you're passing is . (the top directory), grep is searching in every file (some of them twice, or even more if they're in subdirectories).
If you're going to use find and grep, do this:
find . -path './cach*' -prune -o -print0 | xargs -0 grep -i "samson"
Using -print0 and -0 makes your script work even with file names that contain spaces or punctuation characters.
However, you probably don't need to bother with find here, since GNU grep is capable of excluding directories:
grep -R --exclude-dir='cach*' -i "samson" .
(This also excludes ./deeply/nested/directory/cache. If you only want to exclude cache directories at the toplevel, use find as you did.)
Use the -exec option on find instead of piping them to another command. From there you can use grep "samson" {} \; to look for samson in each file listed.
For example:
find . -wholename './cach*' -prune -o -exec grep "samson" "{}" +

How to list specific type of files in recursive directories in shell?

How can we find specific types of files, i.e. .doc and .pdf files, present in nested directories?
command I tried:
$ ls -R | grep .doc
but if there is a file name like alok.doc.txt, the command will display that too, which is obviously not what I want. What command should I use instead?
If you are more comfortable with ls and grep, you can do what you want using a regular expression in the grep command (the ending '$' character indicates that .doc must be at the end of the line; that excludes "file.doc.txt"):
ls -R | grep "\.doc$"
More information about using grep with regular expressions is in the man page.
The ls command's output is mainly intended to be read by humans. For advanced querying and automated processing, you should use the more powerful find command:
find /path -type f \( -iname "*.doc" -o -iname "*.pdf" \)
Or, if you have bash 4.0+:
#!/bin/bash
shopt -s globstar   # let ** match files in subdirectories recursively
shopt -s nullglob   # expand to nothing (not the literal pattern) when there are no matches
for file in **/*.{pdf,doc}
do
    echo "$file"
done
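If you'd rather collect the matches than print them, the same globs can fill an array (still bash 4+; a sketch):
files=( **/*.{pdf,doc} )
echo "found ${#files[@]} matching files"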
find . | grep "\.doc$"
This will show the path as well.
Some other methods that can be used (note that these only match the current directory, not nested ones):
echo *.{pdf,docx,jpeg}
stat -c %n * | grep 'pdf\|docx\|jpeg'
We had a similar question. We wanted a list - with paths - of all the config files in the etc directory. This worked:
find /etc -type f \( -iname "*.conf" \)
It gives a nice list of all the .conf file with their path. Output looks like:
/etc/conf/server.conf
But we wanted to DO something with ALL those files, like grep them to find a word or setting in all the files. So we used:
find /etc -type f \( -iname "*.conf" \) -print0 | xargs -0 grep -Hi "ServerName"
to grep ALL the config files in /etc that contain a setting like "ServerName". Output looks like:
/etc/conf/server.conf: ServerName "default-118_11_170_172"
Hope you find it useful.
Similarly, if you prefer using the wildcard character * (not quite like the regex suggestions), you can just use ls with both the -l flag, to list one file per line (like grep), and the -R flag like you had. Then you can specify the files you want to search for with *.doc.
I.e. either
ls -l -R *.doc
or, if you want it to list the files on fewer lines:
ls -R *.doc
If you have files with extensions that don't match the file type, you could use the file utility.
find $PWD -type f -exec file -N \{\} \; | grep "PDF document" | awk -F: '{print $1}'
Instead of $PWD you can use the directory you want to start the search in. file even prints out the PDF version.
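If your file build supports --mime-type (GNU file does), a variant whose output is a bit easier to filter (a sketch; like the awk above, it assumes filenames contain no colons):
find . -type f -exec file --mime-type {} + | grep 'application/pdf' | awk -F: '{print $1}'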
