Linux shell script - find all files and run a command on each one of them - linux

I'm trying to port a windows batch file to a linux shell script (bash).
Here is the part that is giving me some headache:
for /r %%G in (*Metadata*.xml) DO (
java -jar %SAXON%\saxon9he.jar -o:%%~nG.csv "%%G" %WORKDIR%\transformBQT.xsl)
What this does is find all .xml files whose names contain the text Metadata and then run an XSLT transformation on each of them. The transformation takes 3 arguments:
-o is the output file (this will be a .csv with the same name as the .xml)
next is the target file
final argument is the .xsl file
I am thinking of using the following:
find /results/ -type f -name "*Metadata*.xml" -exec
java -jar $SAXON/saxon9he.jar -o:??? {} $WORKDIR/transformXMI.xsl
but this doesn't quite work as I don't know how to make the output file have the same name as the .xml (with .csv extension)
Any tips?

You could process the results from find line by line and transform <file>.xml into <file>.csv:
find /results/ -type f -name "*Metadata*.xml" | while read file; do java -jar $SAXON/saxon9he.jar -o:${file%.xml}.csv $file $WORKDIR/transformXMI.xsl; done
This simple approach fails if the file paths or names contain spaces.
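A null-delimited variant is safer when paths contain spaces (a sketch, assuming GNU find, Bash, and the same $SAXON/$WORKDIR variables as in the question):
find /results/ -type f -name "*Metadata*.xml" -print0 |
while IFS= read -r -d '' file; do
    # -print0 / read -d '' keep each path intact, even with spaces
    java -jar "$SAXON/saxon9he.jar" -o:"${file%.xml}.csv" "$file" "$WORKDIR/transformXMI.xsl"
done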

Per the docs, the string {} is replaced by the current file name being processed. With that, along with Bash's parameter expansion, you should be able to rename the output files to CSV.
find /results/ -type f -name "*Metadata*.xml" -exec
bash -c 'fname="{}"; java -jar $SAXON/saxon9he.jar -o:"${fname%.xml}.csv" "$fname" $WORKDIR/transformXMI.xsl' \;
The important bit here is the shell parameter that you create when you execute Bash: fname="{}" creates a new shell parameter containing the path of the current XML file. Once you have that, you can use parameter expansion: ${fname%.xml} strips off the .xml extension, which is then replaced with .csv.
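A variant of the same idea passes the path as a positional argument instead of substituting {} into the quoted script (a sketch; it assumes $SAXON and $WORKDIR are exported, or you can write the actual paths in their place):
find /results/ -type f -name "*Metadata*.xml" -exec bash -c '
    fname="$1"   # the path handed over by find
    java -jar "$SAXON/saxon9he.jar" -o:"${fname%.xml}.csv" "$fname" "$WORKDIR/transformXMI.xsl"
' _ {} \;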

Related

BASH loop through multiple files replacing content and filename

I'm trying to find all files with a .pux extension 1 level below my parent directory. Once the files are identified, I want to add a carriage return to each line inside each file (if possible, only if that line doesn't already have the CR). Finally I want to rename the extension to .pun ensuring there is no .pux left behind.
I've been trying different methods with this and my biggest problem is that I cannot develop or debug this code easily as I cannot access the command line directly. I can't access the Linux server that the script will run on. I can only call it from my application on my windows server (trust me, I'm thinking exactly what you are right now).
The Linux server is running BASH 3.2.57(2). I don't believe the Unix2Dos utility is installed, as I've tried using it in its most basic form with no success. I've confirmed my find command can successfully identify the files I need, as I have run it and checked my log file output.
#!/bin/bash
MYSCRIPTS=${0%/*}
PARENTDIR=/home/clnt/parent/
LOGFILE="$MYSCRIPTS"/PUX2PUN.log
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" > "$LOGFILE"
Logfile output:
/home/clnt/parent/z3y/prz3y.pux
/home/clnt/parent/wsl/prwsl.pux
However, when I try to build on this code and pipe those results into a while read loop, it doesn't appear to do anything.
#!/bin/bash
MYSCRIPTS=${0%/*}
PARENTDIR=/home/clnt/parent/
LOGFILE="$MYSCRIPTS"/PUX2PUN.log
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" -print0 | while IFS= read -r file; do
sed -i '/\r/! s/$/\r/' "${file}" &&
mv "${file}" "${file/%pux/pun}" >> "$LOGFILE"
done
I'm open to other methods if they are standard in my BASH version and safe. Below my parent folder there should be anywhere from 1-250 folders max, and each of those child folders can have up to one pr*.pux file each (* will match the folder name, as shown in my example output earlier). So we're not dealing with a ton of files.
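One likely culprit in the loop above is the mismatch between -print0, which separates names with NUL bytes, and a plain read, which expects newline-delimited input. A sketch of a fix, keeping your sed and rename logic (the printf line is an assumption about what you want logged, since mv prints nothing on success):
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" -print0 |
while IFS= read -r -d '' file; do     # -d '' matches find's -print0
    sed -i '/\r/! s/$/\r/' "$file" &&
    mv "$file" "${file%.pux}.pun" &&
    printf '%s\n' "${file%.pux}.pun" >> "$LOGFILE"
done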

Shell script to search a string and output lines for specific files in all directories

How can I search for a string in specific files in all directories and output results to a file?
I am using the code below, but how can I write a shell script for this?
find. -name *.txt | grep *%text* > result.xls
I don't quite get your point; your command line will mostly work as-is, once the missing space after find is added and the patterns are quoted, as below.
To make this into a script, just save the command line into a text file and add execute (x) permission.
Save the following content into a file, e.g. xxx.sh:
#!/bin/bash
find . -name '*.txt' | grep '*%text*' > result.xls
Add x permission to the file with the following command.
chmod +x xxx.sh
Run the script.
./xxx.sh
Check the output file result.xls.
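One caveat: the pipe greps the list of file names that find prints, not the contents of those files. If the goal is to search inside the .txt files, a sketch like the following may be closer to what you want (%text% here is just a stand-in for the string you are searching for):
#!/bin/bash
# print each matching line, prefixed with the file it came from
find . -name '*.txt' -exec grep -H '%text%' {} + > result.xls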

Find a zip file, print path and zip contents

I have a series of numbered sub-directories that may or may not contain zip files, and within those zip files are some single-line .txt files I need. Is it possible to use a combination of find and unzip -p to list the file path and the single line contents on the same output line? I'd like to save the results to a .txt and import it into excel to work with.
From the main directory I can successfully find and output the single line:
find . -name 'file.zip' -exec unzip -p {} file.txt \;
How can I prefix the find output (i.e. the file path) to the output of this unzip command? Ideally, I'd like each line of the text file to resemble:
./path/to/file1.zip "Single line of file1.txt file"
./path/to/file2.zip "Single line of file2.txt file"
and so on. Can anyone provide some suggestions? I'm not very experienced with the Linux command line beyond simple commands.
Thank you.
Put all the code you want to execute into a shell script, then use find's -exec feature to call that script, i.e.
cat finder.bash
#!/bin/bash
printf '%s : ' "$1"    # prints just the /path/to/file/file.zip passed in by find
unzip -p "$1" file.txt
For now, just get that to work; you can make it generic later so it can handle files other than file.txt.
Make the script executable
chmod 755 finder.bash
Call it from find. i.e.
find . -name 'file.zip' -exec /path/to/finder.bash {} \;
(I don't have an easy way to test this, so reply in comments with error msgs).
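If a separate script feels heavyweight, roughly the same thing can be done inline with a small shell snippet (also untested, same caveat as above):
find . -name 'file.zip' -exec sh -c '
    printf "%s " "$1"          # the zip path found by find, no trailing newline
    unzip -p "$1" file.txt     # the single-line contents follow on the same line
' _ {} \;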

why the file command in a for loop behaves so ridiculously

#!/bin/sh
file_list=`find . -type f`
IFS=$(echo) #this enables for loop to break on newline
for file_ in $file_list; do
file $file_
done
This shell script will, amazingly, report "File name too long". I guess that the script is feeding file three copies of $file_list!
But if I replace the file command with a simple echo, the script prints all the files in the current directory line by line, as expected.
It's not a good idea to iterate over the results of find. Use its -exec option instead:
find -type f -exec file {} \;
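If you do need a shell loop (say, to run several commands per file), a null-delimited read avoids both the IFS trick and problems with unusual file names (a sketch, assuming GNU find and Bash):
find . -type f -print0 |
while IFS= read -r -d '' f; do
    file "$f"
done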

Unix: traverse a directory

I need to traverse a directory, starting in one directory and going deeper into different sub-directories. However, I also need access to each individual file in order to modify it. Is there already a command to do this, or will I have to write a script? Could someone provide some code to help me with this task? Thanks.
The find command is just the tool for that. Its -exec flag or -print0 in combination with xargs -0 allows fine-grained control over what to do with each file.
Example: Replace all foo's by bar's in all files in /tmp and subdirectories.
find /tmp -type f -exec sed -i -e 's/foo/bar/g' '{}' ';'
for i in `find` ; do
    if [ -d "$i" ] ; then : ; fi    # do something with a directory
    if [ -f "$i" ] ; then : ; fi    # do something with a file etc.
done
This will return the whole tree (recursively) in the current directory in a list that the loop will go through.
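Note that iterating over unquoted find output splits on whitespace and expands globs. A safer sketch is to let find do the type test itself and hand each name to a small inline script (the echo bodies are placeholders):
find . -type d -exec sh -c 'echo "directory: $1"' _ {} \;   # do something with each directory
find . -type f -exec sh -c 'echo "file: $1"' _ {} \;        # do something with each file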
This can be easily achieved by mixing find, xargs, sed (or other file modification command).
For example:
$ find /path/to/base/dir -type f -name '*.properties' | xargs sed -i -e '/^#/d'
This will filter all files with file extension .properties.
The xargs command will feed the file path generated by find command into the sed command.
The sed command will delete all lines starting with # in the files (fed to it by xargs).
Combining commands in this way is very flexible.
For example, the find command has many different parameters, so you can filter by user name, file size, file path (e.g. under a /test/ subfolder), or file modification time.
Another dimension of flexibility is how and what to change in your files. For example, the sed command lets you modify a file by applying substitutions (specified via regular expressions). Similarly, you can use gzip to compress the file. And so on ...
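One caveat with the plain find | xargs pipe is that file names containing spaces get split into separate arguments; a null-delimited sketch of the same idea:
# -print0 and -0 keep each path intact even if it contains spaces
find /path/to/base/dir -type f -name '*.properties' -print0 | xargs -0 sed -i -e '/^#/d'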
You would usually use the find command. On Linux you have the GNU version, of course; it has many extra (and useful) options. Both the POSIX and GNU versions will let you execute a command (e.g. a shell script) on the files as they are found.
The exact details of how to make changes to the file depend on the change you want to make to the file. That is probably best scripted, with find running the script:
POSIX or GNU:
find . -type f -exec your_script '{}' +
This will run your script once for a group of files with those names provided as arguments. If you want to do it one file at a time, replace the + with ';' (or \;).
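Here your_script stands for whatever change you want to apply; a minimal, purely hypothetical skeleton that just loops over the names find hands it might look like this:
#!/bin/sh
# hypothetical your_script: find passes one or more file names as arguments
for f in "$@"; do
    printf 'would modify: %s\n' "$f"   # replace with the real edit, e.g. a sed call
done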
I am assuming SearchMe is the example directory name you need to traverse completely.
I am also assuming, since it was not specified, that the files you want to modify are all text files. Is this correct?
In such scenario I would suggest using the command:
find SearchMe -type f -exec vi {} \;
If you are not familiar with vi editor, just use another one (nano, emacs, kate, kwrite, gedit, etc.) and it should work as well.
Bash 4+
shopt -s globstar
for file in **
do
if [ -f "$file" ]; then
    :   # do some processing to your file here,
        # where the find command can't do it conveniently
fi
done
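By default ** skips dot files, and an unmatched pattern is left in place as a literal string; enabling dotglob and nullglob alongside globstar covers those cases (a sketch for Bash 4+):
shopt -s globstar nullglob dotglob   # recurse, drop unmatched patterns, include dot files
for file in **
do
    if [ -f "$file" ]; then
        printf 'processing %s\n' "$file"   # placeholder for the real work
    fi
done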
