Merge multiple files in multiple directories - bash - linux

We have a requirement to loop over multiple directories; in each directory there will be multiple text files matching the pattern File"n".txt, which need to be merged into one file, Final.txt.
We are using Bourne Shell scripting.
Example:
/staging/dusk/inbound/ --> Main Directory
Dir1
File1.txt,File2.txt,..sample1.doc,sample2.pdf,File*.txt --> merge the files matching File*.txt into Final.txt
Dir2
File1.txt,File2.txt,..attach1.txt,sample1.doc,File*.txt --> merge the files matching File*.txt into Final.txt
Dir3
File1.txt,File2.txt,File*.txt,..sample1,sample2*.txt --> merge the files matching File*.txt into Final.txt
Dir4
File1.txt,File2.txt,File*.txt,..temp.doc,attach.txt --> merge the files matching File*.txt into Final.txt
Dir5
File1.txt,File2.txt,File*.txt,..sample1,sample2*.txt --> merge the files matching File*.txt into Final.txt
Dir"n"
File1.txt,File2.txt,File3.txt,File*.txt,..attach1,attach*.txt --> merge the files matching File*.txt into Final.txt
The files in each directory can be merged using the command
cat *.txt > all.txt
But how do we loop over the directories?

To concatenate Fil*.txt from all immediate subdirectories of the current directory:
cat */Fil*.txt >Final.txt
To reach arbitrarily deeply nested subdirectories, some shells offer ** as an extension, but this is not POSIX sh-compatible.
cat **/Fil*.txt >Final.txt
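Note that in bash, ** is off by default and must be enabled with the globstar option first:
shopt -s globstar   # bash 4+: make ** match recursively
cat **/Fil*.txt >Final.txt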
If you actually want to loop over your directories and do something more complex with each, that's
for d in */; do
  : something with "$d"
done
Or similarly, for shells which support **, you can loop over all directories within directories:
for d in **/; do
  : something with "$d"
done
For example, : something with "$d" could be cat "$d"/Fil*.txt >"$d"/Final.txt to create a Final.txt in each directory, which contains only the Fil*.txt files in that directory.
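Putting the pieces together for the layout in the question, a minimal POSIX sh sketch (assuming the Dir"n" directories sit directly under /staging/dusk/inbound/) could be:
#!/bin/sh
# Merge each subdirectory's File*.txt files into a Final.txt in that directory
for d in /staging/dusk/inbound/*/; do
  cat "$d"File*.txt > "$d"Final.txt
done
If a directory contains no File*.txt, cat will report an error and leave an empty Final.txt; add a glob check first if that matters.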

You can do it in an ad-hoc way:
find /staging/dusk/inbound -type f -name '*.txt' | sort -n | awk '{ print "cat " $1 " >> all.txt" }' > merge.sh
chmod +x merge.sh
sh merge.sh
(Note the generated script will break on file names containing spaces.)

You can probably try using the find command:
find /staging/dusk/inbound/ -name "*.txt" -type f -exec cat {} \; >> MergedFile.txt
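A caveat worth adding: with the broad *.txt pattern, a previously created merge file inside the search tree would itself match and be folded back in on the next run. Narrowing the pattern to the question's file names avoids that; a sketch:
find /staging/dusk/inbound/ -type f -name 'File*.txt' -exec cat {} + > MergedFile.txt
Here -exec cat {} + batches many files into each cat invocation instead of running one cat per file.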

Related

Renaming folders and files in subdirectories using text file linux

I am trying to rename the files and directories using a text file separated by space.
The text file looks like this:
dir1-1 dir1_1
dir2-1 dir223_1
My command is as follows:
xargs -r -a files.txt -L1 mv
This command renames only the folders, from dir1-1 to dir1_1, dir2-1 to dir223_1, and so on, but it doesn't rename the files in the subdirectories. The files in the corresponding directories also carry the prefix of these directories.
Looking forward to your assistance.
Assuming you don't have special characters (space, tab, ...) in your file/dir names, try:
perl_script=$(
  # First generated line: strip the newline and remember the original path
  echo 'chop($_); $orig=$_;'
  # Then one substitution rule per "src tgt" pair in files.txt
  while read -r src tgt; do
    echo 'if (s{(.*)/'"$src"'([^/]*)}{$1/'"$tgt"'\2}) { print "$orig $_\n";next;}'
  done < files.txt)
find . -depth | perl -ne "$perl_script" | xargs -r -L1 echo mv
Remove echo once you see it does what you wanted.
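For the two sample lines in files.txt above, the generated $perl_script would contain:
chop($_); $orig=$_;
if (s{(.*)/dir1-1([^/]*)}{$1/dir1_1\2}) { print "$orig $_\n";next;}
if (s{(.*)/dir2-1([^/]*)}{$1/dir223_1\2}) { print "$orig $_\n";next;}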

Add name of each directory to files inside the corresponding directory in linux

I have a directory containing multiple directories. here is an example of the list of directories:
dir1_out
dir2_out
dir3_out
dir4_out
Each directory contains multiple files.
For example dir1_out contains the following files:
file1
file2
file3
In the same fashion other directories contain several folders.
I would like to add the name of each directory to file name in the corresponding directory.
I would like to have the following result in the first directory (dir1_out):
dir1_out.file1
dir1_out.file2
dir1_out.file3
Since I have around 50 directories, I would like to write a loop that takes the name of each directory and adds it to the beginning of all the files inside it.
Do you have any idea how I can do that in Linux?
A simple bash one-liner, if there aren't too many files, is:
for p in */*; do [ -f "$p" ] && mv -i "$p" "${p%/*}/${p/\//.}"; done
This uses parameter expansions to generate new filenames, after checking that we are trying to rename an actual file - See bash manpage descriptions of ${parameter%word} and ${parameter/pattern/string}
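To illustrate the two expansions on a sample path:
p='dir1_out/file1'
echo "${p%/*}"     # dir1_out       (shortest suffix matching /* removed)
echo "${p/\//.}"   # dir1_out.file1 (first / replaced with .)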
If there may be too many files to safely expand them all into a single list:
#!/bin/bash
# Limit find to files exactly one directory deep, then rename each
find . -mindepth 2 -maxdepth 2 -type f -print |
while IFS= read -r p; do
  p="${p#./}"
  mv -i "$p" "${p%/*}/${p/\//.}"
done

How to find/list the directories where a particular sub-directory is not present

I am writing a shell script to check whether the bin directory is present under each user's directory under /home. The bin directory can be present directly under the user directory or under a child directory of the user directory.
I mean, let's say I have a user amit under /home. The bin directory can then be present directly as /amit/bin or as /amit/jash/bin.
My requirement is a list of user directories where the bin directory is not present, either directly under the user directory or under a child directory of it. I tried the command:
find /home -type d ! -exec test -e '{}/bin' \; -print
but it is not working. However, when I replace the bin directory with some file, the command works fine. It looks like this command is meant for files. Is there a similar command for directories? Any help on this will be greatly appreciated.
You're on the right track. The catch is that your test of "does the following directory NOT exist in this target" can't be expressed within find's conditions in such a way as to return only the top-level directory. So you need to nest, one way or another.
One strategy would be to use a for loop in bash:
$ cd /home
$ mkdir foo bar baz one two
$ mkdir bar/bin baz/bin
$ for d in /home/*/; do find "$d" -type d -name bin | grep -q . || echo "$d"; done
/home/foo/
/home/one/
/home/two/
This uses pathname expansion (globbing) to generate the list of directories to test, and then checks for the existence of "bin". If that check fails (i.e. find outputs nothing), the directory is printed. Note the trailing slash on /home/*/, which ensures that you will only be searching within directories, rather than files that might accidentally exist in /home/.
Another possibility might be to use nested finds, if you don't want to depend on bash:
$ find /home/ -mindepth 1 -maxdepth 1 -type d -not -exec sh -c "find {}/ -type d -name bin -print | grep -q ." \; -print
/home/foo
/home/one
/home/two
This roughly duplicates the effect of the bash for loop above, but by nesting find within find -exec. It uses grep -q . to convert the output of find into an exit status that can be used as a condition for the outer find.
Note that since you're looking for a bin directory, we want to use test -d rather than test -e (which would also check for a bin file, which probably does not matter to you.)
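A tiny illustration of the difference (hypothetical paths):
$ mkdir -p amit/jash/bin
$ touch amit/bin                  # a FILE named bin
$ test -e amit/bin && echo yes    # yes: -e is true for any existing path
$ test -d amit/bin && echo yes    # no output: -d is true only for directories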
Another option is to use bash process redirection. On multiple lines for easier reading:
cd /home/
comm -3 \
  <(printf '%s\n' */ | sed 's|/.*||' | sort) \
  <(find */ -type d -name bin | cut -d/ -f1 | uniq)
This unfortunately requires you to change to the /home directory before running, because of the way it strips off subdirectories. You can of course collapse this into a big long one-liner if you feel so inclined.
This comm solution also has the risk of failing on directories with special characters in their names, like newlines.
One last option is bash-only but more than a one-liner. It involves subtracting the directories containing "bin" from the full list. It uses an associative array and globstar, so it depends on bash version 4.
#!/usr/bin/env bash
shopt -s globstar
# Go to our root
cd /home
# Declare an associative array
declare -A dirs=()
# Populate the array with our "full" list of home directories
for d in */; do dirs[${d%/}]=""; done
# Remove directories that contain a "bin" somewhere inside 'em
for d in **/bin; do unset dirs[${d%%/*}]; done
# Print the result in reproducible form
declare -p dirs
# Or print the result just as a list of words.
printf '%s\n' "${!dirs[@]}"
Note that we're storing directories in the array index, which (1) makes it easy for us to find and delete items, and (2) ensures unique entries, even if one user has multiple "bin" directories under their home.
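With the same foo/bar/baz example as before, the output would look like this (the order of associative-array keys is unspecified):
declare -A dirs=([two]="" [one]="" [foo]="" )
two
one
foo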
cd /home
find . -maxdepth 1 -type d ! -name . | sort > a
find . -type d -name bin | cut -d/ -f1,2 | sort > b
comm -23 a b
Here, I'm making two sorted lists. The first contains all the home directories, and the second contains the top parent of any bin subdirectory. Finally I output any items from the first list not present in the second.
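Run against the same example layout as above, a would contain ./bar, ./baz, ./foo, ./one and ./two (one per line), and b would contain ./bar and ./baz, so:
$ comm -23 a b
./foo
./one
./two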

I want to cat a file for a list of file names, then search a directory for each of the results and move any files it finds

I'm having a really hard time finding an answer for this because most people want to find a list of files, then do the xargs.
I have a file with a list of file names. I would like to go through the list of file names in that file and one by one search a single directory (not recursive) and any time it finds the file, move it to a sub folder.
cat test.txt | xargs find . -type f -iname
At this point, I get this error:
find: paths must precede expression: file1
Why don't you use something like:
# NOTE: "subfolder" is a placeholder for your actual destination directory
while IFS= read -r f; do
  test -e "$f" && mv -- "$f" subfolder/
done < test.txt
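For reference, the find: paths must precede expression error happens because xargs appends all the file names after -iname in a single invocation; -I runs one find per input line instead. A sketch, assuming one file name per line in test.txt:
xargs -I{} find . -maxdepth 1 -type f -iname {} < test.txt
The paths it prints can then be fed to mv in the same way.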

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN".
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text, I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
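Saved as mycommand and made executable, this gives exactly the interface asked for:
$ chmod +x mycommand
$ ./mycommand OPEN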
As pointed out by @BroSlow, if you are looking for directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt' | sed 's|STATUS\.txt$||'
where grep -r means recursive, -l means list only the matching files, and . is the directory location. You can pipe the output to sed to remove the file name.
You can then wrap this in a bash script where you pass in keywords such as 'OPEN' or 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt' | sed 's|STATUS\.txt$||'
Try something like this:
find . -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>"; exit 1; }
find . -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" anywhere inside your directory structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script splits each output line at the colon and prints the first part (the file path):
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
The dirname will then return only the directory path, not the file name, which I suppose is what you want.
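For the example above, the complete pipeline would therefore print:
dir_1/subdir_1
dir_2/subdir_2
dir_2/subdir_3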
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer: it does not output a sorted, unique directory list. At the end of the find command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
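A minimal Python sketch of that approach, using os.walk in place of the recipe behind the link, with plain printing where further actions would go:
#!/usr/bin/env python3
import os
import sys

# Keyword to search for inside each STATUS.txt (defaults to OPEN)
keyword = sys.argv[1] if len(sys.argv) > 1 else "OPEN"

matches = []
for root, dirs, files in os.walk("."):
    if "STATUS.txt" in files:
        with open(os.path.join(root, "STATUS.txt")) as fh:
            if keyword in fh.read():
                matches.append(root)

# Sorted output is the natural hook for the later PRIORITY.txt extension
for d in sorted(matches):
    print(d)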
