How to use "Regular Expression" in ps?

How to use "Regular Expression" in ps? - linux

I am trying to use
ps -C chromi*
to see all chromium processes, but no success. How can I use regular expression in here?

How to use "Regular Expression" in ps?
You cannot, ps does not support regular expressions. The argument is parsed literally.
How to use "Regular Expression" in ps?
You can patch procps ps to support it, most probably (with yet another!) additional flag. The patch looks simple, basically another tree traversing parse_* function that uses regex.h instead of strncmp.
I doubt such patch would make it upstream - it's typical to use other tools, most notably pgrep or shell with a pipe and grep, to filter process by command line name. ps has to stay POSIX compatible, and has so many options already.
Note that regular expression is not "globbing". Consult man 7 glob vs man 7 regex. Regular expression chromi* would match chrom or chromiiiii - chrom followed by zero or more i.
Note that unquoted arguments with "trigger" characters undergo filename expansion (ls 'chromi*' vs ls chromi*). This is different than passing the literal argument when there exist files that match the pattern. If the intention is to pass the pattern to the tool, quote the argument to prevent filename expansion.

I think you are looking for pgrep:
pgrep -f chromium
This will print pids only, no further information.
With the help of xargs, this can be piped to ps again for detailed output:
pgrep -f chromium | xargs ps -o pid,cmd,user,etime -p

Related

Why does find -regex command, differ from find | grep?

The find command below outputs nothing, and does not find any "include" files or directories.
find -regex "*include*" 2>/dev/null
However piping the find command into grep -E seems to find most include files.
find ./ 2>/dev/null | grep -E "*include*"
I've left out the output since the first is blank and the second matches quite a few files.
I'm starting to need to dig through linux system files to find the answers I need (especially to find macro values). In order to do that I have been using find | grep -E to find the files that that should have the macro I am looking for.
Below is the line I tried today with find (my root directory is /), and nothing is output. I don't want to run the command as root so I pipe the errors out to /dev/null. I checked the errors for regex syntax errors but nothing. Its still looping through all directories since I still get a "find: /var/lib: Permission Denied" Error
find -regex "*include*" 2>/dev/null
However this seems to work and give me everything I want.
find ./ 2>/dev/null | grep -E "*include*"
So my main question is why does find -regex not output the same as find | grep -E ?

Regular expressions are not a language, but a general mathematical construct with many different notations and dialects thereof.
For simple patterns you can often get away with ignoring this fact since most dialects use very similar notation, but since you are specifying an ill defined pattern with a leading asterisk, you get into engine specific behavior.
grep -E uses the GNU implementation of POSIX ERE, and interprets your pattern as ()*includ(e)* and therefore matches includ followed by zero or more es. (POSIX says that the behavior of a leading asterisk is undefined).
find uses Emacs Regex, and interprets it as \*includ(e)* and therefore requires a literal asterisk in the filename.
If you want the same result from both, you can use find -regextype posix-egrep, or you can specify a regex that is equivalent in both such as .*include.* to match include as a substring.

As I understand your question you want to find files in Linux directories
You should use this library
yum install locate
If you use ubuntu
sudo apt-get install locate
Prepare library
sudo updatedb
Then start search
locate include

convert this linux statement into a statement which is supported by windows command prompt

This is my statement supported by unix environment
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to execute that statement in windows command prompt .
How do I do that? and what are the commands which are similar to cat, grep,sed .
please tell me the exact code supported for windows similar to above command

The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every pair of open/close brokets with an empty string. The regular expression [^<]* matches zero or more occurrences of a character which can be anything except <. If the XML is well-formed, these should always occur in pairs, and so we effectively remove all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space]]/!d
This is a rather crude way to extract the CDATA from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. It works even if the XML input doesn't have the beginning and ending <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)

May try with "powershell" ?
It is included since Win8 I think,
for sure on W10 it is.
I've just tested a "cat" command and it works.
"grep" don't but may be adapt like this :
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/

The equivalent of grep on windows would be findstr and the equivalent of cat would be type.

File Glob Patterns in Linux terminal

I want to search a filename which may contain kavi or kabhi.
I wrote command in the terminal:
ls -l *ka[vbh]i*
Between ka and i there may be v or bh .
The code I wrote isn't correct. What would be the correct command?

A nice way to do this is to use extended globs. With them, you can perform regular expressions on Bash.
To start you have to enable the extglob feature, since it is disabled by default:
shopt -s extglob
Then, write a regex with the required condition: stuff + ka + either v or bh + i + stuff. All together:
ls -l *ka#(v|bh)i*
The syntax is a bit different from the normal regular expressions, so you need to read in Extended Globs that...
#(list): Matches one of the given patterns.
Test
$ ls
a.php AABB AAkabhiBB AAkabiBB AAkaviBB s.sh
$ ls *ka#(v|bh)i*
AAkabhiBB AAkaviBB

a slightly longer cmd line could be using find, grep and xargs. it has the advantage of being easily extended to different search terms (by either extending the grep statement or by using additional options of find), a bit more readability (imho) and flexibility in being able to execute specific commands on the files which are found
find . | grep -e "kabhi" -e "kavi" | xargs ls -l

You can get what you want by using curly braces in bash:
ls -l *ka{v,bh}i*
Note: this is not a regular expression question so much as a "shell globbing" question. Shell "glob patterns" are different from regular expressions, though they are similar in many ways.

My regular expression isn't working in grep

Here's the text of the file I'm working with:
(4 spaces)Hi, everyone
(1 tab)yes
When I run this command - grep '^[[:space:]]+' myfile - it doesn't print anything to stdout.
Why doesn't it match the whitespace in the file?
I'm using GNU grep version 2.9.

There are several different regular expression syntaxes. The default for grep is called basic syntax in the grep documentation.
From man grep(1):
In basic regular expressions the meta-characters
?, +, {, |, (, and ) lose their special meaning; instead
use the backslashed versions \?, \+, \{, \|, \(, and \).
Therefore instead of + you should have typed \+:
grep '^[[:space:]]\+' FILE
If you need more power from your regular expressions, I also encourage you to take a look at Perl regular expression syntax. They are generally considered the most expressive. There is a C library called PCRE which emulates them, and grep links to it. To use them (instead of basic syntax) you can use grep -P.

You could use -E:
grep -E '^[[:space:]]+' FILE
This enables extended regex. Without it you get BREs (basic regex) which have a more simplified syntax. Alternatively you could run egrep instead with the same result.

I found you need to escape the +:
grep '^[[:space:]]\+' FILE

Try grep -P '^\s+' instead, provided you’re using GNU grep. It’s a lot easier to type, and has better regexes.

Using ? with sed

I just want to get the number of a file that may or may not be gzip'd. However, it appears that a regular expression in sed does not support a ?. Here's what I tried:
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and nothing was returned. Then I added a ? to the string being analyzed:
echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and got:
1
So, it looks like the ? used in most regex's is not supported in sed, right? Well then, I would just like sed to give a 1 for file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?

The equivalent to x? is \(x\|\).
However, many versions of sed support an option to enable "extended regular expressions" which includes ?. In GNU sed the flag is -r. Note that this also changes unescaped parens to do grouping. eg:
echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'
Actually, there's another bug in your regex which is that the greedy .* in the parens is going to swallow up the ".gz" if there is one. sed doesn't have a non-greedy equivalent to * as far as I know, but you can use | to work around this. | in sed (and many other regex implementations) will use the leftmost match that works, so you can do something like this:
echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'
This tries to match with .gz, and only tries without it if that doesn't work. Only one of group 2 or 4 will actually exist (since they are on opposite sides of the same |) so we just concatenate them to get the value we want.

If you're looking for an answer to the specific example given in the question, or why it uses the ? incorrectly (regardless of syntax), see the answer by Laurence Gonsalves.
If you're looking instead for the answer to the general question of why ? doesn't exhibit its special meaning in sed as you might expect:
By default, sed uses the " POSIX basic regular expressions syntax", so the question mark must be escaped as \? to apply its special meaning, otherwise it matches a literal question mark. As an alternative, you can use the -r or --regexp-extended option to use the "extended regular expression syntax", which reverses the meaning of escaped and non-escaped special characters, including ?.
In the words of the GNU sed documentation (view by running 'info sed' on Linux):
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, and braces
('{}'). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
and the option is explained:
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that `egrep' accepts;
they can be clearer because they usually have less backslashes,
but are a GNU extension and hence scripts that use them are not
portable.
Update
Newer versions of GNU sed now say this:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems.
So, if you need to preserve compatibility with ancient GNU sed, stick with -r. But if you prefer better cross-platform portability on more modern systems (e.g. Linux+Mac support), go with -E (but note that there are still some quirks and differences between GNU sed and BSD sed, so you'll have to make sure your scripts are portable in any case).

echo 'file_1.gz'|sed -n 's/.*_\(.*\)\?\(\.gz\)/\1/p'
Works. You have to put the return in the right spot, and you have to escape it.

A function that should return a number that follows the '_' in a filename, regardless of file extension:
realname () {
local n=${$1##*/}
local rn="${n%.*}"
sed 's/^.*\_//g' ${$rn:-$n}
}

You should use awk which is superior to sed when it comes to field grabbing/parsing:
$ awk -F'[._]' '{print $2}' <<<"file_1"
1
$ awk -F'[._]' '{print $2}' <<<"file_1.gz"
1
Alternatively you can just use Bash's parameter expansion like so:
var=file_1.gz;
temp=${var#*_};
file=${temp%.*}
echo $file
Note: works when var=file_1 as well

Part of the solution lies in escaping the question mark or using the -r option.
sed 's/.*_\([^.]*\)\(\.\?[^.]\+\)\?$/\1/'
or
sed -r 's/.*_([^.]*)(\.?[^.]+)?$/\1/'
will work for:
file_1.gz
file_12.txt
file_123
resulting in:
1
12
123

I just realized that could do something very easy:
echo 'file_1.gz'|sed -n 's/.*_\([0-9]*\).*/\1/p'
Notice the [0-9]* instead of a .*. #Laurence Gonsalves's answer made me realize the greediness of my previous post.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string