Linux using grep to print the file name and first n characters - linux

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.

grep -l pattern *.txt |
while read line; do
echo -n "$line: ";
head -c $n "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.

You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed often provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient that others propsoed, which invoke multiple processes.

There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$2"
shift
grep -l "$pattern" "$#" |
while read file
do
echo "$file:" $(dd if="$file" count=${n}c)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that #litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.

Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head contains a --verbose [-v] switch. It also offers --null, to accomodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --

Related

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each output from a command as multiple argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will make it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
Anyway, for the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
So in other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; but xargs will specifically avoid this by instead copying only as many arguments as it can safely pass to cp at a time, and running cp multiple times if necessary instead.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
#!/bin/bash
for f in files; do
if grep -q PATTERN "$f"; then
echo cp -v "$f" "${f}.bac"
fi
done
files can be *.txt or *.text which basically means files ending in *.txt or *text or replace with something that you want/need, of course replace PATTERN with yours. Remove echo if you're satisfied with the output. For a recursive solution take a look at the bash shell option globstar

How to move files where the first line contains a string?

I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried to use head -1 file | at the beginning, but it did not work.
A change I would add to your script is -
for file in *.csv; do
head -1 "$file" | grep -l -Z -E '.*?FindMyRegex' | xargs -0 -I{} mv {} /home/destination/folder;
done
you can maybe try sed '1q' file.csv | grep ... to search the regexp only in the first line.
You don't need grep or find, as long as your files don't have embedded newlines.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv;
do sed -ns '1 { /yourPattern/F; q; }' $f;
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail syntactically if there are no hits.
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input - this is so the filenames aren't always the first one.
This does require GNU sed for the F.
gawk 'FNR==1{if($0~/PATTERN/)
printf "mv %s %s\n",FILENAME, "/target";nextfile}' /path/*.csv
First of all, in your regex: .*?FindMyRegex the .*? doesn't make any sense, they could be removed.
The above awk (gawk) one-liner will build up mv file target command lines for you. You can check them, if you are satisfied with them, pipe the output to |sh , the commands are gonna be executed.
replace PATTERN by your regex pattern, and /target by the real target dir.
The one-liner is assuming that the filenames don't contain special chars (space i.e.), if it is the case, add "s to the mv cmd.
using GNU awk to find the filenames, pipe the filenames into xargs
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"
Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the reqular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).

optimize xargs argument enumeration

Can this usage of xargs argument enumaration be optimized better?
The aim is to inject single argument in the middle of the actual command.
I do:
echo {1..3} | xargs -I{} sh -c 'for i in {};do echo line $i here;done'
or
echo {1..3} | for i in $(xargs -n1);do echo line $i here; done
I get:
line 1 here
line 2 here
line 3 here
which is what I need but I wondered if loop and temporary variable could be avoided?
You need to separate the input to xargs by newlines:
echo {1..3}$'\n' | xargs -I% echo line % here
For array expansions, you can use printf:
ar=({1..3})
printf '%s\n' "${ar[#]}" | xargs -I% echo line % here
(and if it's just for output, you can use it without xargs:
printf 'line %s here\n' "${ar[#]}"
)
Try without xargs. For most situations xargs is overkill.
Depending on what you really want you can choose a solution like
# Normally you want to avoid for and use while, but here you want the things splitted.
for i in $(echo {1 2 3} );do
echo line $i here;
done
# When you want 1 line turned into three, `tr` can help
echo {1..3} | tr " " "\n" | sed 's/.*/line & here/'
# printf will repeat itself when there are parameters left
printf "line %s here\n" $(echo {1..3})
# Using the printf feature you can avoid the echo
printf "line %s here\n" {1..3}
Maybe this?
echo {1..3} | tr " " "\n" | xargs -n1 sh -c ' echo "line $0 here"'
The tr replaces the spaces with newlines, so xargs sees three lines. I would not be surprised if there were a better (more efficient) solution, but this one is quite simple.
Please note I have modified my previous answer to remove the use of {}, which was suggested in the comments to eliminate a potential code injection vulnerability.
There is a not well known feature of GNU sed. You can add the e flag to the s command and then sed executes whatever is in the pattern space and replaces the pattern space with the output if that command.
If you are really only interested in the output of the echo commands, you might try this GNU sed example, which eliminates the temporary variable, the loop (and the xargs as well):
echo {1..3} | sed -r 's/([^ ])+/echo "line \1 here"\n/ge
it fetches one token (i.e. whatever is separated by the spaces)
replaces it with echo "line \1 here"\n command, with \1 replaced by the token
then executes echo
puts the output of the echo command back into pattern space
that means it outputs the result of the three echos
But an even better way to get the desired output is to skip the execution and do the transformation directly in sed, like this:
echo {1..3} | sed -r 's/([^ ])+ ?/line \1 here\n/g'

Shell script to get count of a variable from a single line output

How can I get the count of the # character from the following output. I had used tr command and extracted? I am curious to know what is the best way to do it? I mean other ways of doing the same thing.
{running_device,[test#01,test#02]},
My solution was:
echo '{running_device,[test#01,test#02]},' | tr ',' '\n' | grep '#' | wc -l
I think it is simpler to use:
echo '{running_device,[test#01,test#02]},' | tr -cd # | wc -c
This yields 2 for me (tested on Mac OS X 10.7.5). The -c option to tr means 'complement' (of the set of specified characters) and -d means 'delete', so that deletes every non-# character, and wc counts what's provided (no newline, so the line count is 0, but the character count is 2).
Nothing wrong with your approach. Here are a couple of other approaches:
echo $(echo {running_device,[test#01,test#02]}, |awk -F"#" '{print NF - 1}')
or
echo $((`echo {running_device,[test#01,test#02]} | sed 's+[^#]++g' | wc -c` - 1 ))
The only concern I would have is if you are running this command in a loop (e.g. once for every line in a large file). If that is the case, then execution time could be an issue as stringing together shell utilities incurs the overhead of launching processes which can be sloooow. If this is the case, then I would suggest writing a pure awk version to process the entire file.
Use GNU Grep to Avoid Character Translation
Here's another way to do this that I personally find more intuitive: extract just the matching characters with grep, then count grep's output lines. For example:
echo '{running_device,[test#01,test#02]},' |
fgrep --fixed-strings --only-matching # |
wc -l
yields 2 as the result.

Bash Sorting STDIN

I want to write a bash script that sorts the input by rules in different files. The first rule is to write all chars or strings in file1. The second rule is to write all numbers in file2. The third rule is to write all alphanumerical strings in file3. All specials chars must be ignored. Because I am not familiar with bash I don t know how to realize this.
Could someone help me?
Thanks,
Haniball
Thanks for the answers,
I wrote this script,
#!/bin/bash
inp=0 echo "Which filename for strings?"
read strg
touch $strg
echo "Which filename for nums?"
read nums
touch $nums
echo "Which filename for alphanumerics?"
read alphanums
touch $alphanums
while [ "$inp" != "quit" ]
do
echo "Input: "
read inp
echo $inp | grep -o '\<[a-zA-Z]+>' > $strg
echo $inp | grep -o '\<[0-9]>' > $nums
echo $inp | grep -o -E '\<[0-9]{2,}>' > $nums
done
After I ran it, it only writes string in the stringfile.
Greetings, Haniball
Sure can help. See here:
How To Ask Questions The Smart Way
Help Vampires: A Spotter’s Guide
cool site about the bash is here: http://wiki.bash-hackers.org/doku.php
for sorting try man sort
for pattern matching try man grep
other useful tools: man sed man awk man strings man tee
And it is always correct tag your homework as "homework" ;)
You can try something like:
<input_file strings -1 -a | tee chars_and_strings.txt |\
grep "^[A-Za-z0-9][A-Za-z0-9]*$" | tee alphanum.txt |\
grep "^[0-9][0-9]*$" > numonly.txt
The above is only for USA - no international (read unicode) chars, where things coming a little bit more complicated.
grep is sufficient (your question is a bit vague. If I got something wrong, let me know...)
Using the following input file:
this is a string containing words,
single digits as in 1 and 2 as well
as whole numbers 42 1066
all chars or strings
$ grep -o '\<[a-zA-Z]\+\>' sorting_input
this
is
a
string
containing
words
single
digits
as
in
and
as
well
all single digit numbers
$ grep -o '\<[0-9]\>' sorting_input
1
2
all multiple digit numbers
$ grep -o -E '\<[0-9]{2,}\>' sorting_input
42
1066
Redirect the output to a file, i.e. grep ... > file1
Bash really isn't the best language for this kind of task. While possible, ild highly recommend the use of perl, python, or tcl for this.
That said, you can write all of stdin from input to a temporary file with shell redirection. Then, use a command like grep to output matches to another file. It might look something like this.
#!/bin/bash
cat > temp
grep pattern1 > file1
grep pattern2 > file2
grep pattern3 > file3
rm -f temp
Then run it like this:
cat file_to_process | ./script.sh
I'll leave the specifics of the pattern matching to you.

Resources