List newest file, by type (.txt), after searching recursively, in a terminal - linux

I'm trying to get my terminal to return the latest .txt file, with path intact. I've been researching ls, grep, find, and tail, using the '|' functionality of passing results from one utility to the next. The end result would be to have a working path + result that I could pass my text editor.
I've been getting close with tests like this:
find . | grep '.txt$' | tail -1
..but I haven't had luck with grep returning the newest file - is there a flag I'm missing?
Trying to use find & ls isn't exactly working either:
find . -name "*.txt" | ls -lrth
..the ls returns the current directories instead of the results of my find query.
Please help!

You're so very close.
vi "$(find . -name '*.txt' -exec ls -t {} + | head -1)"

find /usr/share -name '*.txt' -printf '%T+ %p\n' | sort -r | head -1 | sed 's/^[^ ]* //'

If you have bash 4+:
ls -t ./**/*.txt | head -1
To edit the latest .txt file:
vim "$(ls -t ./**/*.txt | head -1)"
PS: this needs globstar enabled (shopt -s globstar), e.g. in your .bashrc or .profile.

You can use stat to print each file's modification time (seconds since the epoch) and name, then sort numerically; the newest file comes last.
find . -name "*.txt" -exec stat -c "%Y %n" {} \; | sort -n
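The timestamp-and-sort idea can be wrapped in a small function. This is only a sketch: it assumes GNU find (for -printf) and file names without embedded newlines, and newest_txt is a made-up name for illustration, not a standard tool.

```shell
# newest_txt DIR: print the most recently modified .txt file under DIR.
# Assumes GNU find (-printf) and no newlines in file names.
newest_txt() {
    find "${1:-.}" -type f -name '*.txt' -printf '%T@ %p\n' |
        sort -rn |
        head -n 1 |
        cut -d' ' -f2-
}
```

sort -rn puts the largest epoch timestamp first, and cut drops the timestamp while keeping any spaces in the path. Usage could look like vi "$(newest_txt .)".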

Related

LINUX Copy the name of the newest folder and paste it in a command [duplicate]

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directories themselves rather than their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
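To see the ls -td approach ignore a newer plain file, here is a throwaway demonstration. It assumes GNU touch (for -d with a date string); the directory names are invented for the example.

```shell
# Build a scratch "backups" directory with two subdirectories and one newer file.
demo=$(mktemp -d)
mkdir "$demo/2023-01" "$demo/2024-06"
touch -d '2023-01-15' "$demo/2023-01"
touch -d '2024-06-15' "$demo/2024-06"
touch "$demo/test.txt"    # newer than both directories, but not a directory

# The trailing */ restricts the glob to directories, so test.txt is never considered.
BACKUPDIR=$(ls -td "$demo"/*/ | head -1)
echo "$BACKUPDIR"
```

Note that the result keeps the trailing slash from the glob, and that parsing ls output still breaks on directory names containing newlines.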
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
[[ -L $file || ! -d $file ]] && continue
[[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written and removed from the directory resulting in the upper directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes that the directory only contains other directories and not files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get list of directories with modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat a pure shell solution may be used by replacing [[ bash extension with [ as in this answer.
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I run the line above from GitBash within Windows environment in file called ./something.bash.

Bash: How to tail then copy multiple files (eg using xargs)?

I've been trying various combinations of xargs and piping but I just can't get the right result. Previous questions don't quite cover exactly what I want to do:
I have a source directory somewhere, lets say /foo/source, with a mix of different files
I want to copy just the csv files found in source to a different destination, say /foo/dest
But I ALSO at the same time need to remove 232 header rows (eg using tail)
I've figured out that I need to pipe the results of find into xargs, which can then run commands on each find result. But I'm struggling to tail then copy. If I pipe tail into cp, cp does not seem to receive the file (missing file operand). Here's some examples of what I've tried so far:
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 | cp -t /foo/dest'
cp: missing file operand
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 {} | cp -t /foo/dest'
Result:
cp: failed to access '/foo/dest': No such file or directory ...
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 {} > /foo/dest/{}'
sh: /foo/dest/foo/source/0001.csv: No such file or directory ...
Any pointers would be really appreciated!
Thanks
Just use find with exec and copy the file name in a variable:
find your_dir -name "*.csv" -exec sh -c 'f="$1"; tail -n +5 "$f" > dest_dir/$(basename "$f")' -- {} \;
Here f="$1" makes $f hold the name of the file, with the full path (find substitutes it for {} and sh receives it as its first argument). Then, it is a matter of redirecting the output of tail into the file, stripping the path from it with basename.
Or, based on Random832's suggestion below in comments (thanks!):
find your_dir -name "*.csv" -exec sh -c 'tail -n +5 "$1" > dest_dir/$(basename "$1")' -- {} \;
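The same find -exec sh -c pattern can be batched with {} + so that one shell handles many files. The sketch below is an illustration under assumptions: strip_headers and its parameters are hypothetical names, and it relies on tail -n +N starting output at line N, so skipping K header rows means passing K+1 (skipping 232 rows would need +233).

```shell
# strip_headers SRC DEST SKIP: copy every *.csv under SRC into DEST,
# dropping the first SKIP lines of each file.
strip_headers() {
    src=$1
    dest=$2
    skip=$3
    mkdir -p "$dest"
    # The inner sh receives DEST and the start line first, then a batch of file names.
    find "$src" -type f -name '*.csv' -exec sh -c '
        dest=$1
        start=$2
        shift 2
        for f in "$@"; do
            tail -n "+$start" "$f" > "$dest/$(basename "$f")"
        done
    ' sh "$dest" "$((skip + 1))" {} +
}
```

Passing the file names as arguments, rather than pasting {} into the script text, keeps names with spaces or quotes from being reinterpreted by the shell. Files with the same base name in different subdirectories would overwrite each other in DEST.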
Your last command is close, but the problem is that {} is replaced with the full pathname, not just the filename. Use the basename command to extract the filename from it.
find /foo/source -name "*.csv" | xargs -I '{}' sh -c 'tail -n +232 {} > /foo/dest/$(basename {})'
As an alternative to find and xargs you could use a for loop, and as an alternative to tail you could use sed, consider this:
source=/foo/source
dest=/foo/dest
for csv in $source/*.csv; do sed '232,$ !d' $csv > $dest/$(basename $csv); done
Using GNU Parallel you would do:
find /foo/source -name "*.csv" | parallel tail -n +232 {} '>' /foo/dest/{/}

Is it possible to pipe the results of FIND to a COPY command CP?

Is it possible to pipe the results of find to a COPY command cp?
Like this:
find . -iname "*.SomeExt" | cp Destination Directory
Seeking, I always find this kind of formula such as from this post:
find . -name "*.pdf" -type f -exec cp {} ./pdfsfolder \;
This raises some questions:
Why cant you just use | pipe? isn't that what its for?
Why does everyone recommend the -exec
How do I know when to use that (exec) over pipe |?
There's a little-used option for cp: -t destination -- see the man page:
find . -iname "*.SomeExt" | xargs cp -t Directory
Good question!
why cant you just use | pipe? isn't that what its for?
You can pipe, of course, xargs is done for these cases:
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
Why does everyone recommend the -exec
The -exec is good because it provides more control of exactly what you are executing. Whenever you pipe there may be problems with corner cases: file names containing spaces or new lines, etc.
how do I know when to use that (exec) over pipe | ?
It is really up to you and there can be many cases. I would use -exec whenever the action to perform is simple. I am not a very good friend of xargs, I tend to prefer an approach in which the find output is provided to a while loop, such as:
while IFS= read -r result
do
# do things with "$result"
done < <(find ...)
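A filled-in version of that loop might look like the following. It is a bash-only sketch (process substitution and read -d '' are not POSIX sh), and copy_by_ext is a hypothetical helper name.

```shell
# copy_by_ext SRC EXT DEST: copy every file under SRC whose name ends in .EXT into DEST.
copy_by_ext() {
    local src=$1 ext=$2 dest=$3 file
    mkdir -p "$dest"
    # -print0 and read -d '' pass names NUL-delimited, so spaces and newlines survive.
    while IFS= read -r -d '' file; do
        cp -- "$file" "$dest"/
    done < <(find "$src" -type f -iname "*.$ext" -print0)
}
```

For example, copy_by_ext . SomeExt /path/to/Destination would mirror the question's command, provided the destination is not inside the tree being searched.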
You can use | like below:
find . -iname "*.SomeExt" | while IFS= read -r line
do
cp "$line" DestDir/
done
Answering your questions:
| can be used to solve this issue. But as seen above, it involves more code, and the pipeline runs find and the copy loop as separate processes.
Using -exec instead keeps everything inside a single find command.
Try this:
find . -iname "*.SomeExt" -print0 | xargs -0 cp -t Directory
# ........................^^^^^^^..........^^
In case there is whitespace in filenames.
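A quick way to see what -print0 and -0 change is to count how many arguments xargs hands over for a single file whose name contains a space. This is a scratch demonstration with an invented file name.

```shell
demo=$(mktemp -d)
touch "$demo/my report.SomeExt"

# Without -print0/-0, xargs splits the one name on the space into two arguments.
plain=$(cd "$demo" && find . -iname '*.SomeExt' | xargs -n 1 echo | wc -l)

# With NUL delimiters, the name arrives as a single argument.
safe=$(cd "$demo" && find . -iname '*.SomeExt' -print0 | xargs -0 -n 1 echo | wc -l)

echo "plain=$plain safe=$safe"
rm -rf "$demo"
```

With the plain pipe, cp would be asked to copy ./my and report.SomeExt, neither of which exists.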
I like the spirit of the response from @fedorqui-so-stop-harming, but it needed a tweak to work in my bash terminal.
In this version...
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
The cp command incorrectly takes Destination_Directory/ as the first argument. I needed to add a replacement string in order to get xargs to insert the argument in the right position for cp. I used a percent symbol for the replacement string, but you can use anything that doesn't conflict with the input from the pipe. This version works for me.
find . -iname "*.SomeExt" | xargs -I % cp % Destination_Directory/
This SOLVED my problem.
find . -type f | grep '\.pdf$' | while IFS= read -r line
do
cp "$line" REPLACE_WITH_TARGET_DIRECTORY
done
If there are spaces in the filenames, try:
find . -iname '*.ext' > list.txt
cat list.txt | awk 'BEGIN {a="'"'"'"}{print "cp "a$0a" Directory"}' > script.sh
sh script.sh
You can inspect list.txt and script.sh before sh script.sh. Remember to delete the list.txt and script.sh afterwards.
I had some files with parenthesis and wanted a progress bar, so replaced the cat line with:
cat list.txt | awk -v X='"' '{print "rsync -Pa "X$0X" /Volumes/Untitled/"}' > script.sh

Select files by extension using grep

I need to count all the .txt files in the current folder.
I tried ls | grep .txt but if my folder content is: a.txt btxt c.c it will select a.txt and btxt and I only want files that end with .txt. I tried various combinations of regexp but with no result.
find may be better than ls in this case since it is designed for handling file names:
find . -maxdepth 1 -name '*.txt' | wc -l
But if you are very cautious about possibly strange file names:
find . -maxdepth 1 -name '*.txt' -exec echo 1 \; | wc -l
For Grep, using the character '.' means: "any character"... so you'll need to escape the dot:
ls | grep -e "\.txt$"
Edit: in fact the -e option is not even necessary. This will do the trick (the $ anchors the match to the end of the name):
ls | grep "\.txt$"
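The effect of escaping the dot and anchoring the pattern can be checked on a fixed list of names, without touching the file system. The names below extend the example from the question; notes.txt.bak is an invented extra case.

```shell
names=$(printf '%s\n' a.txt btxt c.c notes.txt.bak)

# Unescaped dot: "any character followed by txt", anywhere in the name.
loose=$(printf '%s\n' "$names" | grep -c '.txt')

# Escaped dot plus $: a literal ".txt" at the very end of the name.
strict=$(printf '%s\n' "$names" | grep -c '\.txt$')

echo "loose=$loose strict=$strict"
```

The loose pattern also counts btxt and notes.txt.bak, while the strict one counts only a.txt.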
If all you need is number of files with extension '.txt' in current directory only, then this will also help.
ls -l *.txt | wc -l

Linux command: How to 'find' only text files?

After a few searches from Google, what I come up with is:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
which is very unhandy and outputs unneeded texts such as mime type information. Any better solutions? I have lots of images and other binary files in the same folder with a lot of text files that I need to search through.
I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells it to immediately ignore binary files, and the pattern . (any character) along with -q makes it match text files as soon as the first character is seen, so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!)
Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.
EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
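As a sketch, the command can be wrapped in a function and tried on a scratch directory. list_text_files is a hypothetical name; the behaviour shown assumes grep treats a file containing a NUL byte as binary, which GNU grep does.

```shell
# list_text_files DIR: print regular files under DIR that grep considers text.
list_text_files() {
    find "${1:-.}" -type f -exec grep -Iq . {} \; -print
}

demo=$(mktemp -d)
printf 'hello\n' > "$demo/note.txt"
printf 'a\0b\n'  > "$demo/blob.bin"   # contains a NUL byte, so grep -I skips it
: > "$demo/empty"                     # empty: there is no character for . to match
found=$(list_text_files "$demo")
echo "$found"
```

One consequence worth knowing is that empty files are not reported, because the pattern . needs at least one character to match.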
Based on this SO question :
grep -rIl "needle text" my_folder
Why is it unhandy? If you need to use it often, and don't want to type it every time just define a bash function for it:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
whenever you want.
EDIT to reflect OP's edit:
If you want to cut out the mime information, you could just add a further stage to the pipeline that filters it out. This should do the trick, by taking only what comes before the colon with cut -d':' -f1:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space safe. Putting this into a bash script makes it a bit easier.
This is space safe:
#!/bin/bash
if [ ! "$1" ] ; then
echo "Usage: $0 <search>";
exit
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -I% grep -Pil "$1" "%"
Another way of doing this:
find . | xargs file | grep "ASCII text"
If you want empty files too:
find . | xargs file | egrep "ASCII text|empty"
How about this:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter.
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter-out unneeded file types by adding more -e 'type' options to the last grep command.
EDIT:
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
Here's how I've done it ...
1 . make a small script to test if a file is plain text
istext:
#!/bin/bash
[[ "$(file -bi "$1")" == text/* ]]
2 . use find as before
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find, file, and grep.
find will check every file in the directory.
file will give us the filetype. In our case, we're looking for a return of 'ASCII text'
grep will look for the keyword 'ASCII' in the output from file
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./
The most important thing to understand is that everything after that first bit is going to be evaluated as either True or False. If True, the file name will get printed out. If not, then the command moves on.
-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It's like calling a function within a function.
file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking for the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for 'find' for more explanation if you need it by running man find.
| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the output of the whole find command (one line per file, in the form filename: filetype) and keeps only the lines that contain the string 'ASCII'.
Note that grep runs outside of find: find prints a line for every file, and grep then filters those lines down to the ASCII ones. Voila.
I have two issues with histumness' answer:
It only lists text files. It does not actually search them as requested. To actually search, use
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
It spawns a grep process for every file, which is very slow. A better solution is then
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
or simply
find . -type f -print0 | xargs -0 grep -I "needle text"
This only takes 0.2s compared to 4s for solution above (2.5GB data / 7700 files), i.e. 20x faster.
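If xargs is not wanted, find can do the same batching itself with -exec ... {} +, which also avoids one grep process per file. The following is a sketch, with search_text_files as a hypothetical name; -F (fixed string) and -l (list file names only) are choices made for the example, not part of the commands above.

```shell
# search_text_files DIR NEEDLE: list non-binary files under DIR containing NEEDLE.
search_text_files() {
    # {} + hands grep many file names per invocation, much like xargs does.
    find "$1" -type f -exec grep -IlF -- "$2" {} +
}
```

Something like search_text_files my_folder "needle text" would then print the matching file names, one per line.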
Also, nobody cited ag (the Silver Searcher) or ack-grep as alternatives. If one of these is available, they are much better alternatives:
ag -t "needle text" # Much faster than ack
ack -t "needle text" # or ack-grep
As a last note, beware of false positives (binary files taken as text files). I already had false positive using either grep/ag/ack, so better list the matched files first before editing the files.
Although it is an old question, I think the info below will add to the quality of the answers here.
When ignoring files with the executable bit set, I just use this command:
find . ! -perm -111
To keep it from recursively entering other directories:
find . -maxdepth 1 ! -perm -111
No need for pipes to mix lots of commands, just the powerful plain find command.
Disclaimer: it is not exactly what OP asked, because it doesn't check if the file is binary or not. It will, for example, filter out bash script files, that are text themselves but have the executable bit set.
That said, I hope this is useful to anyone.
I do it this way:
1) since there're too many files (~30k) to search thru, I generate the text file list daily for use via crontab using below command:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use below command to do the search:
findex "needle text"
HTH:)
I prefer xargs
find . -type f | xargs grep -I "needle text"
if your filenames are weird look up using the -0 options:
find . -type f -print0 | xargs -0 grep -I "needle text"
bash example to search for the text "eth0" in /etc in all text/ascii files
grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)
If you are interested in finding any file type by their magic bytes using the awesome file utility combined with power of find, this can come in handy:
$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
xargs -0 -I @@ bash -c 'file "$@" | grep ASCII &>/dev/null && echo "file is ASCII: $@"' -- @@
Output:
file is ASCII: ./text.txt
Legend: $ is the interactive shell prompt where we enter our commands
You can modify the part after && to call some other script or do some other stuff inline as well, i.e. if that file contains given string, cat the entire file or look for a secondary string in it.
Explanation:
find items that are files.
Make xargs feed each item as a line into a one-liner bash command/script.
file checks the type of file by magic bytes, grep checks if ASCII exists, and if so, your next command after && executes.
find prints results null separated, which is good for escaping filenames with spaces and meta-characters in them.
xargs, using the -0 option, reads them null separated; -I @@ takes each record and uses it as a positional parameter/argument to the bash script.
-- for bash ensures whatever comes after it is an argument even if it starts with -, like -c, which could otherwise be interpreted as a bash option.
If you need to find types other than ASCII, simply replace grep ASCII with other type, like grep "PDF document, version 1.4"
find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'
Use find command to list all files, use file command to verify they are text (not tar,key), finally use awk command to filter and print the result.
How about this
find . -type f|xargs grep "needle text"

Resources