Filtering file-list using grep

I am trying to list the files in a specific directory whose names do not match a certain pattern, for example all files not ending with abc.yml.
For this I am using the command:
ls | grep -v "*abc.yml"
However, I still see the files ending with abc.yml. What am I doing wrong here?

The asterisk has a different meaning in regular expressions: it means "zero or more of the preceding item". At the very start of a basic regular expression there is no preceding item, so grep matches it as a literal *. You can simply remove it, because grep matches the expression anywhere in the line rather than against the whole line. To anchor the match to the end of the line, add $. Also, . matches any character; use \. to match a literal dot:
ls | grep -v 'abc\.yml$'
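A minimal sketch of the fix, run in a throwaway directory; the file names are made up for illustration:

```shell
#!/usr/bin/env bash
# Throwaway directory with made-up file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch one_abc.yml two_abc.yml other.yml notes.txt

# \. is a literal dot and $ anchors the match to the end of each name.
# When its output is a pipe, ls prints one name per line.
kept=$(ls | grep -v 'abc\.yml$')
printf '%s\n' "$kept"   # notes.txt, then other.yml
```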
In some shells, you can use extended globbing to list the files without the need to pipe to grep. For example, in bash:
shopt -s extglob
ls !(*abc.yml)
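A sketch of the extglob approach in a bash script. In a script, shopt -s extglob has to run on an earlier line than the pattern that uses it, because bash parses a command before executing it. The file names are made up, and the matches are stored in an array instead of being passed to ls, only so the result is easy to inspect:

```shell
#!/usr/bin/env bash
shopt -s extglob          # enable !(...) and the other extended patterns

dir=$(mktemp -d)
cd "$dir" || exit 1
touch one_abc.yml two_abc.yml other.yml notes.txt   # made-up names

# !(*abc.yml) expands to every (non-hidden) name that does not match *abc.yml.
matches=( !(*abc.yml) )
printf '%s\n' "${matches[@]}"
```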

Related

Bash script - Get part of a line of text from another file

I'm quite new to bash scripting. I have a script where I want to extract part of the value of a particular line in a separate config file and then use that value as a variable in the script.
For example:
Line 75 in a file named config.cfg
"ssl_cert_location=/etc/ssl/certs/thecert.cer"
I want just the value at the end, "thecert.cer", to use in the script. I've tried awk and various uses of grep, but I can't quite get just the name of the certificate.
Any help would be appreciated. Thanks
These are some examples of the commands I ran:
awk -F "/" '{print $4}' config.cfg
grep -o *.cer config.cfg
Is it possible to extract the value on that line and then edit the output so it contains just the name of the certificate file?

This is a pure Bash version of the basic functionality of basename:
cert=${line##*/}
which removes everything up to and including the final slash. It presupposes that you've already read the line.
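One way to read the line first, sketched in bash. The config file here is a temporary sample standing in for config.cfg, and the loop stops at the first ssl_cert_location= line:

```shell
#!/usr/bin/env bash
# Sample config written to a temporary file (stand-in for config.cfg).
cfg=$(mktemp)
printf '%s\n' 'other_setting=1' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

cert=
while IFS= read -r line; do
  case $line in
    ssl_cert_location=*)
      cert=${line##*/}   # remove everything up to and including the final slash
      break
      ;;
  esac
done < "$cfg"

echo "$cert"   # thecert.cer
```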
Or, using sed:
cert=$(sed -n '75s/^.*\///p' filename)
or
cert=$(sed -n '/^ssl_cert_location=/s/^.*\///p' filename)
This gets the specified line based on the line number or the setting name and replaces everything up to and including the final slash with nothing. It ignores all other lines in the file (unless the setting is repeated in the case of the text match version). The text match version is better because it works no matter what line number the setting is on.
grep uses regular expressions (as does sed). The grep in your command appears to use a glob expression, which won't work. One way to use grep (GNU grep) is with its PCRE feature (Perl Compatible Regular Expressions):
cert=$(grep -Po '^ssl_cert_location=.*/\K.*' filename)
This works similarly to the sed command: \K drops everything matched before it from the output, leaving only the text after the final slash.
I have anchored the regular expressions to the beginning of the line. If there may be leading whitespace (the line may be indented), change the regex so it looks something like this:
^[[:space:]]*ssl_cert_location=
which works for both indented and unindented lines.
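A quick check of the whitespace-tolerant anchor with the sed version; the indented sample line is made up:

```shell
#!/usr/bin/env bash
cfg=$(mktemp)
# An indented setting line plus an unrelated line.
printf '%s\n' '    ssl_cert_location=/etc/ssl/certs/indented.cer' 'other=1' > "$cfg"

# [[:space:]]* allows optional leading spaces or tabs before the setting name.
cert=$(sed -n '/^[[:space:]]*ssl_cert_location=/s/^.*\///p' "$cfg")
echo "$cert"   # indented.cer
```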

There are many variants, but a simple one that comes to mind with grep is first getting the line, then matching only non-slashes at the end of the line:
<config.cfg grep '^ssl_cert_location=' | grep -o '[^/]*$'
Why didn't your grep command (grep -o *.cer config.cfg) work? Because *.cer is a shell glob pattern, and the shell expands it to matching file names before the grep process is even started. If there are no matching files, it is passed verbatim, but * in a regular expression is a quantifier that needs a preceding expression, and . in a regex means "match any single character". So what you wanted is probably grep -o '.*\.cer', but .* matches anything, including slashes.
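The difference is easy to see with a quoted pattern (quoting stops the shell from globbing it). The [^/]* variant is shown only for contrast: a bracket expression that excludes the slash cannot run past the last path separator. The sample line is made up:

```shell
#!/usr/bin/env bash
cfg=$(mktemp)
printf '%s\n' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

# .* is greedy and matches slashes too, so the whole line comes back.
greedy=$(grep -o '.*\.cer' "$cfg")
# [^/]* cannot cross a slash, so only the last path component matches.
name_only=$(grep -o '[^/]*\.cer' "$cfg")

echo "$greedy"      # ssl_cert_location=/etc/ssl/certs/thecert.cer
echo "$name_only"   # thecert.cer
```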
An awk solution would look like the following:
awk -F/ '/^ssl_cert_location=/{print $NF}' config.cfg
It uses "/" as the field separator, selects only lines starting with "ssl_cert_location=", and prints the last field ($NF) of each such line.
Or an equivalent sed solution, which matches the same line and then deletes everything including the last slash:
sed -n '/^ssl_cert_location=/s#^.*/##p' config.cfg
To store the output of any command in a variable, use command substitution:
var="$(command with arguments)"
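Putting the awk version and command substitution together; the temporary sample file stands in for config.cfg:

```shell
#!/usr/bin/env bash
cfg=$(mktemp)
printf '%s\n' 'foo=bar' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

# -F/ splits on slashes; $NF is the last field of the matching line.
cert="$(awk -F/ '/^ssl_cert_location=/{print $NF}' "$cfg")"
echo "Using certificate: $cert"
```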

Output the names of all files from file.txt, having the .conf extension

I need to output from a file file.txt the names of all files with the .conf extension.
grep .conf file.txt
But the output also includes a file called dconf and a file with the .config extension. How can I output everything else, but without these two?

The '.' has a special meaning in a regex: it matches any single character. If you want to match only the dot itself, you have to escape it with a backslash:
grep "\.conf" file.txt
The quotes make the shell pass the backslash through to grep; unquoted, the shell would remove it. (Single quotes work as well.)
To experiment with regular expressions, you can use an online regex tester.
Addendum:
From the comments: how can I also keep a file named xyz.config out of the list?
Answer: tell grep that the match must end at a word boundary, using \> (supported by GNU grep):
grep "\.conf\>" file.txt
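A side-by-side comparison on a made-up list (the file names are invented for illustration; \> is the GNU grep word-boundary operator):

```shell
#!/usr/bin/env bash
list=$(mktemp)
printf '%s\n' app.conf dconf app.config xyz.config > "$list"

grep '.conf' "$list"       # all four lines: . matches any character
grep '\.conf' "$list"      # app.conf, app.config, xyz.config: literal dot required
grep '\.conf\>' "$list"    # app.conf only: the match must end at a word boundary
```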

TL;DR: you should instead do:
grep "\.conf\>" file.txt
grep uses regular expressions. The . character in a regex is a metacharacter that means "match any one character." So your command means "match any string that contains one character followed by c, o, n, f in that order."
So your regular expression matches what you are looking for, but it also matches strings that have more characters after the match (your .config example) as well as anything followed by "conf" (your dconf example).
Instead, you want to tell grep that you are looking for a literal "." by escaping that character in your regular expression with a preceding backslash (\), and you want to describe how the name ends, which may be at a newline or simply at a space.

How to print the longest word in a file by using combination of grep and wc

I am trying to find the longest word in a text file.
I found the number of characters in the longest word in the file
by using the command
wc -L
I need to print the longest word by using this number and the grep command.

If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
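The command substitution builds the brace expression from the length of the longest line. Since no line is longer than that, the lines it matches are exactly the longest one(s). Note that wc -L measures lines, not words, so this prints the longest word only if each word is on its own line. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt.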
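A sketch of the grep/wc combination on a sample file with one word per line. The arithmetic expansion $(( )) is only a guard against wc implementations that pad the number with spaces:

```shell
#!/usr/bin/env bash
words=$(mktemp)
printf '%s\n' apple banana kiwi > "$words"   # sample data, one word per line

# Length of the longest line; $(( )) strips any padding around the number.
max=$(( $(wc -L < "$words") ))

# Lines with at least $max characters; since no line is longer, these are the longest.
longest=$(grep -E ".{$max}" "$words")
echo "$longest"   # banana
```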
Using an awk command that makes a single pass through the file may be faster.
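For comparison, a single-pass awk sketch (a different technique from the grep/wc combination). It checks every whitespace-separated field, so it also works with several words per line; on ties it keeps the first word found, and punctuation attached to a word counts toward its length:

```shell
#!/usr/bin/env bash
text=$(mktemp)
printf '%s\n' 'the quick brown fox' 'jumps over the extraordinarily lazy dog' > "$text"

# Track the longest field seen so far and print it at the end.
longest=$(awk '{
  for (i = 1; i <= NF; i++)
    if (length($i) > max) { max = length($i); word = $i }
} END { print word }' "$text")

echo "$longest"   # extraordinarily
```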

Why does grep work differently in Debian vs CentOs?

Take this simple shell function...
function search (){
grep -roiI --include $2 $1 . | sort -u
}
And then use it like this:
# search drop *.properties
In CentOS it works as desired, returning a list of grep results. In Debian, however, it interprets the special characters in "*.properties" as a regex, so the search does not work properly. Why does Debian interpret the special characters and CentOS not?

This sounds like different settings for the nullglob shell option, which controls what happens when you use a glob (a word with a wildcard) and no files match it. With nullglob turned on, an unmatched *.properties expands to an empty list; with nullglob turned off, it is passed on as the literal string *.properties. You can disable nullglob with shopt -u nullglob and turn it back on with shopt -s nullglob.
However, in this case you never want *.properties to be treated as a glob; you want the string passed directly to your function. So you should either escape the * (search drop \*.properties) or quote the string with double or single quotes (search drop '*.properties'). Similarly, in your search function you should enclose the $2 and $1 parameters in double quotes.
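A small demonstration of the two nullglob settings in bash, run in an empty temporary directory so that *.properties matches nothing:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
cd "$dir" || exit 1          # empty directory: no *.properties files

shopt -u nullglob
set -- *.properties
off_count=$# off_word=$1     # 1 word: the literal string *.properties

shopt -s nullglob
set -- *.properties
on_count=$#                  # 0 words: the unmatched glob disappears
shopt -u nullglob

echo "nullglob off: $off_count word(s) ($off_word); nullglob on: $on_count word(s)"
```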
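A sketch of the quoted version of the function. The test files are made up, and --include and -I are the GNU grep options already used in the question:

```shell
#!/usr/bin/env bash
# $1 = search term, $2 = file-name glob for --include (quoted so grep, not the shell, sees it)
search() {
  grep -roiI --include "$2" "$1" . | sort -u
}

dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'DROP TABLE users;\n' > schema.properties   # made-up sample files
printf 'drop me\n' > notes.txt

result=$(search drop '*.properties')   # quoting keeps *.properties away from globbing
echo "$result"   # ./schema.properties:DROP
```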

Maybe grep is not the issue. It can be a shell expansion problem.
From the bash manual:
Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of file names matching the pattern.

Extract file basename without path and extension in bash [duplicate]

This question already has answers here:
Extract filename and extension in Bash
(38 answers)
Closed 6 years ago.
Given file names like these:
/the/path/foo.txt
bar.txt
I hope to get:
foo
bar
Why doesn't this work?
#!/bin/bash
fullfile=$1
fname=$(basename $fullfile)
fbname=${fname%.*}
echo $fbname
What's the right way to do it?

You don't have to call the external basename command. Instead, you could use the following commands:
$ s=/the/path/foo.txt
$ echo "${s##*/}"
foo.txt
$ s=${s##*/}
$ echo "${s%.txt}"
foo
$ echo "${s%.*}"
foo
Note that this solution should work in all recent (post-2004) POSIX-compliant shells (e.g. bash, dash, ksh).
Source: Shell Command Language 2.6.2 Parameter Expansion
More on bash String Manipulations: http://tldp.org/LDP/LG/issue18/bash.html
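The two expansions can be wrapped in a small bash function; the name strip_path_and_ext is hypothetical, and ${s%.*} removes only the last extension:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a file name without its directory and last extension.
strip_path_and_ext() {
  local s=${1##*/}          # drop everything up to and including the last /
  printf '%s\n' "${s%.*}"   # drop the shortest trailing ".something"
}

strip_path_and_ext /the/path/foo.txt   # foo
strip_path_and_ext bar.txt             # bar
strip_path_and_ext archive.tar.gz      # archive.tar
```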

The basename command has two different invocations; in one, you specify just the path, in which case it gives you the last component, while in the other you also give a suffix that it will remove. So, you can simplify your example code by using the second invocation of basename. Also, be careful to correctly quote things:
fbname=$(basename "$1" .txt)
echo "$fbname"
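Both invocations side by side, using the paths from the question; note that the second form only works when the suffix is known in advance:

```shell
#!/usr/bin/env bash
# One-argument form: last path component only.
basename /the/path/foo.txt        # foo.txt
# Two-argument form: the given suffix is removed as well.
basename /the/path/foo.txt .txt   # foo
basename bar.txt .txt             # bar
```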

A combination of basename and cut works fine, even for a double extension like .tar.gz:
fbname=$(basename "$fullfile" | cut -d. -f1)
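It would be interesting to know whether this needs less processing than Bash parameter expansion; note, though, that it starts two external processes (basename and cut), while parameter expansion runs inside the shell.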
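A quick check of the cut approach. The second, made-up file name shows a caveat: cut -d. -f1 keeps only the text before the first dot, so names that contain other dots are shortened too:

```shell
#!/usr/bin/env bash
basename /tmp/archive.tar.gz | cut -d. -f1   # archive
basename /tmp/my.notes.txt | cut -d. -f1     # my (everything after the first dot is lost)
```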

Here are one-liners; each expands to the base name, so use it in an assignment (or as an argument to echo):
name=$(basename "${s%.*}")
name=$(basename "${s}" ".${s##*.}")
I needed this, the same as asked by bongbang and w4etwetewtwet.

Pure bash, no basename, no variable juggling. Set a string and echo:
p=/the/path/foo.txt
echo "${p//+(*\/|.*)}"
Output:
foo
Note: the bash extglob option must be "on". Interactive shells on Ubuntu usually have it on because bash-completion enables it, but scripts do not; if it's off, do:
shopt -s extglob
Walking through ${p//+(*\/|.*)}:
1. ${p starts the expansion of $p.
2. // substitutes every instance of the pattern that follows.
3. +( matches one or more of the patterns in the list in parentheses (up to item 7).
4. *\/ (1st pattern) matches anything up to and including a literal "/" character.
5. | (pattern separator) acts here like a logical OR.
6. .* (2nd pattern) matches a literal "." followed by anything; in a bash pattern the "." is just a period, not a regex dot.
7. ) ends the pattern list.
8. } ends the parameter expansion. With a string substitution there is usually another / here, followed by a replacement string. Since there is none, the matched text is replaced with nothing; that is, the matches are deleted.
Relevant man bash background:
pattern substitution:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
extended pattern matching:
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns

Here is another (more complex) way of getting either the path without its extension or the extension alone: use the rev command to reverse the string, cut at the first "." (the last one before reversing), then reverse the result again:
filename=$(rev <<< "$1" | cut -d"." -f2- | rev)
fileext=$(rev <<< "$1" | cut -d"." -f1 | rev)

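If you want to play nice with Windows file paths (under Cygwin), you can also try this:
fname=${fullfile##*[/\\]}
This strips everything up to the last forward slash or backslash, so it accounts for backslash separators when using Bash on Windows. (A | inside the brackets would be matched as a literal character, so it is not needed.)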
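A quick check of the bracket expression [/\\] (slash or backslash) with a made-up Windows-style path and a POSIX path:

```shell
#!/usr/bin/env bash
win='C:\Users\me\report.txt'   # made-up Windows-style path
posix=/home/me/report.txt

win_name=${win##*[/\\]}        # strip through the last / or \
posix_name=${posix##*[/\\]}

echo "$win_name"     # report.txt
echo "$posix_name"   # report.txt
```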

Just an alternative I came up with to extract an extension, based on the posts in this thread and on tools I was more familiar with:
ext="$(rev <<< "$(cut -f "1" -d "." <<< "$(rev <<< "file.docx")")")"
Note: Please advise on my use of quotes; it worked for me but I might be missing something on their proper use (I probably use too many).

Use the basename command. Its manpage is here: http://unixhelp.ed.ac.uk/CGI/man-cgi?basename
