I am working on a bioinformatics workflow in which the tool in question, 'salmon', creates multiple directories, each containing a 'quant.sf' file. I want to find all 'lnc' entries within these files and save them as 'lnc.sf' in each directory.
I was previously running
cat quant.sf | grep 'lnc' > lnc.sf
in each directory individually, which seemed to solve my problem. Now I want to write a script that goes into each directory and generates an lnc.sf file.
I have tried doing
find . -name "quant.sf" | while read A
do
cat $A | grep 'lnc' > lnc.sf
done
But this just creates a single lnc.sf file in the current directory. Any help is highly appreciated.
Thank You!
If all your quant.sf files are at the same hierarchy level, the following should work, assuming a folder structure like month/day/quant.sf:
grep -h 'lnc' */*/quant.sf > lnc.sf
Otherwise, find the files; be aware of the pitfalls of find+read compared to -exec or xargs, quote variables so paths containing whitespace expand safely, drop the redundant cat process, and write each output file into the matching directory:
find . -name 'quant.sf' | while IFS= read -r A
do
grep 'lnc' "$A" > "${A%/*}/lnc.sf"
done
If you have GNU find + xargs, use -print0 combined with -0:
find . -name 'quant.sf' -print0 | xargs -0 -n1 sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' -
Or use find's -exec, which avoids problems with weird file names:
find . -name 'quant.sf' -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' - ';'
In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text; I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive, -l means only list the matching files, and '.' is the directory location. You can pipe it to sed to remove the file name.
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" recursively inside your dir structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script will split this output at the colon and print the first part of it (the file path).
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Building on the accepted answer: it does not output a sorted and unique directory list. At the end of the find command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
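If you would rather stay in the shell, the same idea (find the matching files first, then sort before acting on them) might look roughly like the sketch below, assuming each PRIORITY.txt holds a single number:
#!/bin/sh
# Sketch only: print directories whose STATUS.txt contains the keyword,
# ordered by the numeric value in each directory's PRIORITY.txt (0 if missing).
keyword="$1"
find . -name STATUS.txt -exec grep -lF "$keyword" {} + |
while IFS= read -r f; do
    dir=$(dirname "$f")
    prio=$(cat "$dir/PRIORITY.txt" 2>/dev/null || echo 0)
    printf '%s\t%s\n' "$prio" "$dir"
done | sort -n | cut -f2-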
So I have a huge folder full of subfolders with tons of files, and I add files to it all the time.
I need a subfolder in the root of that folder with a symlink of the last 10-20 files added so that I can quickly find the things I recently added. This is located on a NAS, but I have a linux box running Arch connected through NFS, so I assume the best way is to run a bash script with a find command followed by a loop of ln -sf, but I can't do it safely without help.
Something like this is required:
mkdir -p subfolder
find /dir/ -type f -printf '%T# %p\n' | sort -n | tail -n 10 | cut -d' ' -f2- | while IFS= read -r file ; do ln -s "$file" subfolder ; done
Which will create symlinks in subfolder pointing to the 10 most recently modified files in the directory tree rooted at /dir/
You could just create a shell function like:
recent() { ls -lt ${1+"$@"} | head -n 20; }
which will give you a listing of the 20 most recent items in the specified directories, or the current directory if no arguments are given.
Is it possible to copy a single file to multiple directories using the cp command ?
I tried the following, which did not work:
cp file1 /foo/ /bar/
cp file1 {/foo/,/bar}
I know it's possible using a for loop, or find. But is it possible using the gnu cp command?
You can't do this with cp alone but you can combine cp with xargs:
echo dir1 dir2 dir3 | xargs -n 1 cp file1
Will copy file1 to dir1, dir2, and dir3. xargs will call cp 3 times to do this; see the man page for xargs for details.
No, cp can copy multiple sources but will only copy to a single destination. You need to arrange to invoke cp multiple times - once per destination - for what you want to do; using, as you say, a loop or some other tool.
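For example, a minimal loop version (the directory names are just placeholders):
for dest in /foo/ /bar/; do
    cp file1 "$dest"
done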
Wildcards also work with Robert's code:
echo ./fs*/* | xargs -n 1 cp test
I would use cat and tee based on the answers I saw at https://superuser.com/questions/32630/parallel-file-copy-from-single-source-to-multiple-targets instead of cp.
For example:
cat inputfile | tee outfile1 outfile2 > /dev/null
As far as I can see, you can use the following:
ls | xargs -n 1 cp -i file.dat
The -i option of cp command means that you will be asked whether to overwrite a file in the current directory with the file.dat. Though it is not a completely automatic solution it worked out for me.
These answers all seem more complicated than the obvious:
for i in /foo /bar; do cp "$file1" "$i"; done
ls -db di*/subdir | xargs -n 1 cp File
-b is there in case there is a space in a directory name; otherwise it would be split into separate items by xargs (I had this problem with the echo version).
Not using cp per se, but...
This came up for me in the context of copying lots of Gopro footage off of a (slow) SD card to three (slow) USB drives. I wanted to read the data only once, because it took forever. And I wanted it recursive.
$ tar cf - src | tee >( cd dest1 ; tar xf - ) >( cd dest2 ; tar xf - ) | ( cd dest3 ; tar xf - )
(And you can add more of those >() sections if you want more outputs.)
I haven't benchmarked that, but it's definitely a lot faster than cp-in-a-loop (or a bunch of parallel cp invocations).
If you want to do it without a forked command:
tee <inputfile file2 file3 file4 ... >/dev/null
To copy with xargs into directories matched by wildcards on macOS, the only solution that worked for me with spaces in the directory names is:
find ./fs*/* -type d -print0 | xargs -0 -n 1 cp test
Where test is the file to copy and ./fs*/* are the directories to copy to.
The problem is that xargs treats spaces as argument separators; the solutions that change the delimiter character with -d or -E unfortunately do not work properly on macOS.
Essentially equivalent to the xargs answer, but in case you want parallel execution:
parallel -q cp file1 ::: /foo/ /bar/
So, for example, to copy file1 into all subdirectories of the current folder (recursively):
parallel -q cp file1 ::: `find -mindepth 1 -type d`
N.B.: This probably only conveys any noticeable speed gains for very specific use cases, e.g. if each target directory is a distinct disk.
It is also functionally similar to the '-P' argument for xargs.
No - you cannot.
I've found on multiple occasions that I could use this functionality so I've made my own tool to do this for me.
http://github.com/ddavison/branch
pretty simple -
branch myfile dir1 dir2 dir3
ls -d */ | xargs -iA cp file.txt A
Suppose you want to copy fileName.txt to all sub-directories within the present working directory.
Get all sub-directory names through ls and save them to a temporary file, say allFolders.txt:
ls > allFolders.txt
Print the list and pass it to the xargs command:
cat allFolders.txt | xargs -n 1 cp fileName.txt
Another way is to use cat and tee as follows:
cat <source file> | tee <destination file 1> | tee <destination file 2> [...] > <last destination file>
I think this would be pretty inefficient though, since the job would be split among several processes (one per destination) and the hard drive would be writing several files at once over different parts of the platter. However if you wanted to write a file out to several different drives, this method would probably be pretty efficient (as all copies could happen concurrently).
Using a bash script
#!/bin/bash
DESTINATIONPATH[0]="xxx/yyy"
DESTINATIONPATH[1]="aaa/bbb"
# ...
DESTINATIONPATH[5]="MainLine/USER"

for destination in "${DESTINATIONPATH[@]}"
do
    cp SourcePath/fileName.ext "$destination"
done
If you want to copy multiple folders to multiple folders, one can do something like this:
echo dir1 dir2 dir3 | xargs -n 1 cp -r /path/toyourdir/{subdir1,subdir2,subdir3}
If all your target directories match a path expression — like they're all subdirectories of path/to — then just use find in combination with cp like this:
find ./path/to/* -type d -exec cp [file name] {} \;
That's it.
If you need to be specific about which folders to copy the file into, you can combine find with one or more greps. For example, to replace any occurrences of favicon.ico in any subfolder you can use:
find . | grep favicon\.ico | xargs -n 1 cp -f /root/favicon.ico
This will copy to the immediate sub-directories; if you want to go deeper, adjust the -maxdepth parameter.
find . -mindepth 1 -maxdepth 1 -type d| xargs -n 1 cp -i index.html
If you don't want to copy to all directories, hopefully you can filter out the directories you are not interested in. For example, to copy to all folders starting with a:
find . -mindepth 1 -maxdepth 1 -type d| grep \/a |xargs -n 1 cp -i index.html
If copying to an arbitrary/disjoint set of directories, you'll need Robert Gamble's suggestion.
I like to copy a file into multiple directories as such:
cp file1 /foo/; cp file1 /bar/; cp file1 /foo2/; cp file1 /bar2/
And copying a directory into other directories:
cp -r dir1/ /foo/; cp -r dir1/ /bar/; cp -r dir1/ /foo2/; cp -r dir1/ /bar2/
I know it's like issuing several commands, but it works well for me when I want to type 1 line and walk away for a while.
For example, if you are in the parent directory of your destination folders you can do:
for i in $(ls); do cp sourcefile $i; done
I am writing a shell script that takes file paths as input.
For this reason, I need to generate recursive file listings with full paths. For example, the file bar has the path:
/home/ken/foo/bar
but, as far as I can see, both ls and find only give relative path listings:
./foo/bar (from the folder ken)
It seems like an obvious requirement, but I can't see anything in the find or ls man pages.
How can I generate a list of files in the shell including their absolute paths?
If you give find an absolute path to start with, it will print absolute paths. For instance, to find all .htaccess files in the current directory:
find "$(pwd)" -name .htaccess
or if your shell expands $PWD to the current directory:
find "$PWD" -name .htaccess
find simply prepends the path it was given to a relative path to the file from that path.
Greg Hewgill also suggested using pwd -P if you want to resolve symlinks in your current directory.
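For instance, a small variant of the command above:
# pwd -P prints the physical working directory, with symlinks resolved
find "$(pwd -P)" -name .htaccess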
readlink -f filename
gives the full absolute path. But if the file is a symlink, you'll get the final resolved name.
Use this for dirs (the / after ** is needed in bash to limit it to directories):
ls -d -1 "$PWD/"**/
this for files and directories directly under the current directory, whose names contain a .:
ls -d -1 "$PWD/"*.*
this for everything:
ls -d -1 "$PWD/"**/*
Taken from here
http://www.zsh.org/mla/users/2002/msg00033.html
In bash, ** is recursive if you enable shopt -s globstar.
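In a bash session that would look something like this:
shopt -s globstar          # enable recursive ** globbing (off by default in bash)
ls -d -1 "$PWD/"**/        # directories only
ls -d -1 "$PWD/"**/*       # everything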
You can use
find $PWD
in bash
ls -d "$PWD/"*
This looks only in the current directory. It quotes "$PWD" in case it contains spaces.
Command: ls -1 -d "$PWD/"*
This will give the absolute paths of the files, like below.
[root@kubenode1 ssl]# ls -1 -d "$PWD/"*
/etc/kubernetes/folder/file-test-config.txt
/etc/kubernetes/folder/file-test.txt
/etc/kubernetes/folder/file-client.txt
Try this:
find "$PWD"/
You get a list of absolute paths in the working directory.
You can do
ls -1 |xargs realpath
If you need to specify an absolute or relative path, you can do that as well:
ls -1 $FILEPATH |xargs realpath
$PWD is a good option from Matthew above. If you want find to print only files, you can also add the -type f option to search only for normal files. Other options are "d" for directories only, etc. So in your case it would be (if I want to search only for files with the .c extension):
find $PWD -type f -name "*.c"
or if you want all files:
find $PWD -type f
Note: Be careful making an alias for the above command: when the alias is defined with double quotes, $PWD gets expanded at definition time (e.g. to your home directory if the alias is set from .bashrc), not when you run it.
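A shell function sidesteps that, because its body is evaluated each time you call it (a minimal sketch; cfiles is a made-up name):
# cfiles is a hypothetical helper: list .c files under the directory you call it from
cfiles() { find "$PWD" -type f -name "*.c"; }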
If you give the find command an absolute path, it will spit the results out with an absolute path. So, from the ken directory, if you were to type:
find /home/ken/foo/ -name bar -print
(instead of the relative path find . -name bar -print)
You should get:
/home/ken/foo/bar
Therefore, if you want an ls -l and have it return the absolute path, you can just tell the find command to execute an ls -l on whatever it finds.
find /home/ken/foo -name bar -exec ls -l {} \;
NOTE: There is a space between {} and \;
You'll get something like this:
-rw-r--r-- 1 ken admin 181 Jan 27 15:49 /home/ken/foo/bar
If you aren't sure where the file is, you can always change the search location. As long as the search path starts with "/", you will get an absolute path in return. If you are searching a location (like /) where you are going to get a lot of permission denied errors, then I would recommend redirecting standard error so you can actually see the find results:
find / -name bar -exec ls -l {} \; 2> /dev/null
(2> is the syntax for the Bourne and Bash shells, but will not work with the C shell. It may work in other shells too, but I only know for sure that it works in Bourne and Bash.)
Just an alternative to
ls -d "$PWD/"*
to point out that * is shell expansion, so
echo "$PWD/"*
would do the same (the drawback is that you cannot use -1 to separate entries with newlines instead of spaces).
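If you want one entry per line while still relying only on shell expansion, printf can stand in for echo:
# prints each expanded path on its own line, even when names contain spaces
printf '%s\n' "$PWD/"*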
fd
Using fd (alternative to find), use the following syntax:
fd . foo -a
Where . is the search pattern and foo is the root directory.
E.g. to list all files in /etc recursively, run: fd . /etc -a.
-a, --absolute-path Show absolute instead of relative paths
If you need a list of all files in the current directory as well as sub-directories:
find $PWD -type f
If you need a list of files only in the current directory:
find $PWD -maxdepth 1 -type f
You might want to try this.
for name in /home/ken/foo/bar/*
do
echo "$name"
done
You can simply get the absolute paths using a for loop and echo, without find.
Find jar files recursively and print their absolute paths:
ls -R | grep "\.jar$" | xargs readlink -f
/opt/tool/dev/maven_repo/com/oracle/ojdbc/ojdbc8-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/ons-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/oraclepki-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/osdt_cert-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/osdt_core-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/simplefan-19.3.0.0.jar
/opt/tool/dev/maven_repo/com/oracle/ojdbc/ucp-19.3.0.0.jar
This works best if you want a dynamic solution that works well in a function
lfp ()
{
ls -1 "$1" | xargs -I{} echo "$(realpath "$1")/{}"
}
lspwd() { for i in "$@"; do ls -d -1 "$PWD/$i"; done; }
Here's an example that prints out the list without the leading ./ and that also demonstrates how to search for a file-name match. Hope this helps:
find . -type f -name "extr*" -exec echo `pwd`/{} \; | sed "s|\./||"
This worked for me. But it didn't list in alphabetical order.
find "$(pwd)" -maxdepth 1
This command lists alphabetically and also lists hidden files:
ls -d -1 "$PWD/".*; ls -d -1 "$PWD/"*;
stat
Absolute path of a single file:
stat -c %n "$PWD"/foo/bar
This will give the canonical path (will resolve symlinks): realpath FILENAME
If you want canonical path to the symlink itself, then: realpath -s FILENAME
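A quick illustration (hypothetical paths, with link being a symlink to /home/ken/foo/bar):
realpath link       # -> /home/ken/foo/bar   (target, fully resolved)
realpath -s link    # -> /path/to/cwd/link   (absolute path of the symlink itself)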
Most if not all of the suggested methods result in paths that cannot be used directly in some other terminal command if the path contains spaces. Ideally the results would have the spaces escaped with backslashes.
This works for me on macOS:
find / -iname "*SEARCH TERM spaces are okay*" -print 2>&1 | grep -v denied |grep -v permitted |sed -E 's/\ /\\ /g'
for p in <either relative or absolute path of the directory>/*; do
    realpath -s "$p"
done
Files can be listed recursively in many ways on Linux. Here I am sharing a one-liner to empty all log files (files only) under the /var/log/ directory, and a second one to check which log files have been written to most recently.
First:
find /var/log/ -type f #listing file recursively
Second:
for i in $(find $PWD -type f) ; do cat /dev/null > "$i" ; done #empty files recursively
Third:
ls -ltr $(find /var/log/ -type f ) # listing file used in recent
Note: for directory location you can also pass $PWD instead of /var/log.
If you don't have symbolic links, you could try
tree -ifL 1 [DIR]
-i makes tree print filenames on each line, without the tree structure.
-f makes tree print the full path of each file.
-L 1 prevents tree from recursing into subdirectories.
Write one small function
lsf() {
    ls "$(pwd)/$1"
}
Then you can use it like:
lsf test.sh
It gives the full path, like:
/home/testuser/Downloads/test.sh
I used the following to list the absolute paths of files in a directory into a txt file:
find "$PWD" -wholename '*.JPG' >test.txt
find / -print will do this
ls -1 | awk -v path="$PWD/" '{print path $0}'