How to copy all files by a single extension - Linux

I have a list of files: a.xxx, a.yyy, a.zzz. I need to copy all files selected by extension, for example:
ls *.xxx | xargs cp a.* dir
I wrote this code:
ls mysql/db/*.MYD | xargs -n1 basename | sed 's/\.MYD//g' | xargs -i cp mysql/db/{}.* new_folder
but I get the error:
cp: cannot stat 'mysql/db/ps_opc_social_customer.*'

The problem here is that the shell attempts to expand the * in the last command the instant you press return, before xargs substitutes the filename; since nothing matches the literal pattern mysql/db/{}.*, it is passed through unchanged, so cp receives the literal <file>.* string instead of an expanded list of files.
You need to gather all the files in one go, or spawn a new shell to do the glob expansion for you after the substitution:
ls mysql/db/*.MYD | xargs -n1 basename | sed 's/\.MYD//g' | xargs -i bash -c "cp mysql/db/{}.* new_folder"
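Alternatively, a plain loop avoids the quoting problem entirely. A minimal sketch, assuming the same mysql/db layout and an existing new_folder directory:
# For each .MYD file, strip the extension and copy every file sharing
# that basename; the glob expands inside the loop, after the name is known.
for f in mysql/db/*.MYD; do
    base=$(basename "$f" .MYD)
    cp mysql/db/"$base".* new_folder/
done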

linux merge only subset of lines from multiple files

I have the following folder structure:
/drive1/180204_somerandomtext/file.csv
/drive1/180504_somerandomtext/file.csv
/drive1/190101_somerandomtext/file.csv
/drive1/190305_somerandomtext/file.csv
...
Each file.csv has the same structure, but contains different data. From a file.csv I want to extract only a subset of lines using the following command:
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2
This command works: it prints the 5000 lines that follow the first line starting with 'Sample_ID,'.
I've extended this command:
grep -A5000 -m1 -e 'Sample_ID,' /drive1/180204_somerandomtext/file.csv | tail -n+2 | sed 's/^/180204_somerandomtext,/'
Using sed, I now prepend the pattern '180204_somerandomtext' to each line, which is the name of the folder containing that file.csv.
I'm now stuck at the following steps:
how to do this for all file.csv files in the subfolders of drive1
how to store this result in one large file called 'samples.csv'
I've tried something with xargs. It works for the grep part, but piping into sed doesn't work, because the {} placeholder is no longer substituted there:
find /drive1/ -maxdepth 1 -name '1*' | cut -d '/' -f2 | xargs -I {} grep -A5000 -m1 -e 'Sample_ID,' /drive1/{}/file.csv | sed 's/^/{},/'
I'm not a big fan of xargs; I find find -exec much clearer to use. Let me explain:
Imagine I would like to do something with a file file1.txt:
Command -sw1 param1 -sw2 param2.1 param2.2 file1.txt
Launch a command, and use switches sw1, sw2 with parameters param1, param2.1 and param2.2.
When I want to perform this for all file1.txt within a directory structure, I do the following:
find . -name "file1.txt" -exec Command -sw1 param1 -sw2 param2.1 param2.2 {} \;
So I just write the find command (with information on where and what to find), followed by -exec. After -exec I put the exact command, replacing the original filename with {}, and I end the whole thing with \;.
In your case, it would be something like:
find /drive1 -name file.csv -exec grep -A5000 -m1 -e 'Sample_ID,' {} \;
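That covers the extraction; to also prefix each line with its folder name and collect everything into samples.csv, a shell loop may be simpler. A sketch, assuming the dated folders all match /drive1/1*:
# Tag each extracted block with its folder name, collecting all
# output into one samples.csv.
for dir in /drive1/1*/; do
    name=$(basename "$dir")
    grep -A5000 -m1 -e 'Sample_ID,' "$dir/file.csv" | tail -n +2 | sed "s/^/$name,/"
done > samples.csv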

Using pipes with find command in linux

I would like to find files in my home directory whose names start with '~', sort them numerically, print the first five, and delete them using the find command and pipes in Linux. I have a bash script:
#!/bin/bash
find ~/ -name "~*" | sort -n | head -5 | tee | xargs rm
This works fine for deleting files, but I was expecting the tee command to print the deleted files to standard output. All this command does is delete the files; there is no output in the terminal. What should I add or change?
Thank you.
You could just use the verbose flag on rm and it will tell you what it's deleting:
find ~/ -name "~*" | sort -n | head -5 | xargs rm -v
Use man rm to see the docs
-v, --verbose
explain what is being done
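As for why tee printed nothing: with no file argument, tee merely copies stdin to stdout, and that stdout is consumed by xargs rather than the terminal. One way (a sketch) is to write a copy to /dev/tty so the names stay visible:
find ~/ -name "~*" | sort -n | head -5 | tee /dev/tty | xargs rm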
You can use rm -v to print each deleted filename:
find ~ -name '~*' -print0 | sort -zn | head -z -n 5 | xargs -0 rm -v
Also note the use of -print0 and the corresponding null-delimiter options in sort, head, and xargs, to handle filenames containing whitespace or glob characters.

How to pipe output from grep to cp?

I have a working grep command that selects files meeting a certain condition. How can I take the files selected by grep and pipe them into a cp command?
The following attempts have failed on the cp end:
grep -r "TWL" --exclude=*.csv* | cp ~/data/lidar/tmp-ajp2/
cp: missing destination file operand after '/home/ubuntu/data/lidar/tmp-ajp2/'
Try 'cp --help' for more information.
cp `grep -r "TWL" --exclude=*.csv*` ~/data/lidar/tmp-ajp2/
cp: invalid option -- '7'
grep -l -r "TWL" --exclude=*.csv* | xargs cp -t ~/data/lidar/tmp-ajp2/
Explanation:
grep -l option to output file names only
xargs to convert file list from the standard input to command line arguments
cp -t option to specify target directory (and avoid using placeholders)
You need xargs with the placeholder option:
grep -r "TWL" --exclude=*.csv* | xargs -I '{}' cp '{}' ~/data/lidar/tmp-ajp2/
Normally xargs appends its input after the command; with the placeholder ('{}' in this case) you can choose where it is inserted, even in multiple places.
This worked for me when searching for files with a specific date:
ls | grep '2018-08-22' | xargs -I '{}' cp '{}' ~/data/lidar/tmp-ajp2/
To copy a file into the directories that contain matches, use find's -printf '%h\n' to output each match's directory and xargs -i to place that argument in the cp command (after the pipe):
find ./ -name 'filename.*' -printf '%h\n' | xargs -i cp copyFile.txt {}
This copies copyFile.txt into every directory (under ./) containing a file matching "filename.*".
Or, listing the files under a given directory that match a pattern and copying them with -t:
grep -rl '/directory/' -e 'pattern' | xargs cp -t /directory
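If the matched paths may contain spaces, a null-delimited variant is safer; a sketch assuming GNU grep, xargs, and cp (-Z, -0, and -t are GNU extensions):
grep -rlZ "TWL" --exclude='*.csv*' . | xargs -0 cp -t ~/data/lidar/tmp-ajp2/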

pass output as an argument for cp in bash [duplicate]

I'm taking a Unix/Linux class and we have yet to learn variables or functions. We've just learned some basics like flags, pipelines, and redirecting or appending output to a file. On the lab assignment he wants us to find the largest files and copy them to a directory.
I can get the 5 largest files, but I don't know how to pass them to cp in one command:
ls -SF | grep -v / | head -5 | cp ? Directory
It would be:
cp `ls -SF | grep -v / | head -5` Directory
assuming that the pipeline is correct. The backticks substitute the output of the enclosed commands into the command line.
You can also make your tests:
cp `echo a b c` Directory
will copy all a, b, and c into Directory.
I would do:
cp $(ls -SF | grep -v / | head -5) Directory
xargs would probably be the best answer though.
ls -SF | grep -v / | head -5 | xargs -I{} cp "{}" Directory
Use backticks `like this` or the dollar sign $(like this) to perform command substitution. Basically this pastes the standard output of the enclosed command into the surrounding command and runs it. Find out more in the bash manpage under "Command Substitution."
Also, if you want to read one line at a time you can read individual lines out of a pipe stream using "while read" syntax:
ls | while read varname; do echo $varname; done
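Applied to the assignment, that might look like the sketch below; note that parsing ls is fragile with unusual filenames, and the -f test (standing in for grep -v /) can leave fewer than five files if directories rank among the largest entries:
ls -S | head -5 | while IFS= read -r f; do
    [ -f "$f" ] && cp "$f" Directory/   # copy regular files only
done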
If your cp has a "-t" flag (check the man page), that simplifies matters a bit:
ls -SF | grep -v / | head -5 | xargs cp -t DIRECTORY
The find command gives you more fine-grained control than the ls | grep pipeline you have. I'd code your question like this:
find . -maxdepth 1 -type f -printf "%p\t%s\n" |
sort -t $'\t' -k2 -nr |
head -n 5 |
cut -f 1 |
xargs echo cp -t DIRECTORY
The echo makes this a dry run that only prints the cp command; remove it to actually perform the copy.

Delete files with string found in file - Linux cli

I am trying to delete erroneous emails by finding the email address in the file, via the Linux CLI.
I can get the files with
find . | xargs grep -l email#example.com
But I cannot figure out how to delete them from there as the following code doesn't work.
rm -f | xargs find . | xargs grep -l email#example.com
Solution for your command:
grep -l email#example.com * | xargs rm
Or
for file in $(grep -l email#example.com *); do
    rm -i "$file"   # -i prompts before each delete
done
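If the filenames may contain spaces, a null-delimited loop is safer; a sketch assuming GNU grep's -Z option:
grep -lZ email#example.com * | while IFS= read -r -d '' file; do
    rm -i "$file" < /dev/tty   # answer the y/n prompt from the terminal, not the pipe
done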
For safety I normally pipe the output from find to something like awk and create a batch file with each line being "rm filename"
That way you can check it before actually running it and manually fix any odd edge cases that are difficult to do with a regex
find . | xargs grep -l email#example.com | awk '{print "rm "$1}' > doit.sh
vi doit.sh   # check for Murphy and his law
source doit.sh
You can use find's -exec and -delete; the file is only deleted if the grep command succeeds. grep -q is used here so it doesn't print anything; replace the -q with -l to see which files contained the string.
find . -exec grep -q 'email#example.com' '{}' \; -delete
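To preview what would be removed before committing (a sketch), swap -delete for -print:
find . -exec grep -q 'email#example.com' '{}' \; -print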
I liked Martin Beckett's solution but found that file names with spaces could trip it up (like who uses spaces in file names, pfft :D). Also I wanted to review what was matched so I move the matched files to a local folder instead of just deleting them with the 'rm' command:
# Make a folder in the current directory to put the matched files
$ mkdir -p './matched-files'
# Create a script to move files that match the grep
# NOTE: Remove "-name '*.txt'" to allow all file extensions to be searched.
# NOTE: Edit the grep argument 'something' to what you want to search for.
$ find . -name '*.txt' -print0 | xargs -0 grep -al 'something' | awk -F '\n' '{ print "mv \""$0"\" ./matched-files" }' > doit.sh
Or, because it's possible (in Linux; idk about other OSes) to have newlines in a file name, you can use this longer version (untested whether it works better; who puts newlines in filenames? pfft :D):
$ find . -name '*.txt' -print0 | xargs -0 grep -alZ 'something' | awk -F '\0' '{ for (x=1; x<NF; x++) print "mv \""$x"\" ./matched-files" }' > doit.sh
# Evaluate the file following the 'source' command as a list of commands executed in the current context:
$ source doit.sh
NOTE: I had issues where grep could not match inside files that had utf-16 encoding.
See here for a workaround. In case that website disappears, what you do is use grep's -a flag, which makes grep treat files as text, together with a regex pattern whose '.' matches the extra byte in each extended character. For example, to match Entité do this:
grep -a 'Entit.e'
and if that doesn't work then try this:
grep -a 'E.n.t.i.t.e'
Despite Martin's safe answer, if you're certain of what you want to delete, such as when writing a script, I've used this with greater success than any other one-liner suggested around here (note the xargs before grep, so grep searches the files themselves rather than the stream of names):
$ find . | xargs grep -l email#example.com | xargs -I {} rm -rf {}
But I'd rather find by name:
$ find . -iname '*something*' | xargs -I {} echo {}
rm -f `find . | xargs grep -li email#example.com`
does the job better. Use `...` to run the command that supplies the names of the files containing email#example.com (grep -l lists them, -i ignores case), and remove them with rm (-f forcibly, or -i interactively).
find . | xargs grep -l email#example.com
How to remove them (note the backticks, not single quotes, so the command's output is substituted):
rm -f `find . | xargs grep -l email#example.com`
Quick and efficient. Replace find_files_having_this_text with the text you want to search for:
grep -Ril 'find_files_having_this_text' . | xargs rm
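As with the answers above, filenames containing spaces will break the plain pipe; a null-delimited variant (a sketch assuming GNU grep and xargs) is safer:
grep -RilZ 'find_files_having_this_text' . | xargs -0 rm -f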
