A Shell Script to simulate the wc command with its options?

We have to write a shell script that works like the wc command and accepts -l, -c and -w as its options.
Shell scripting syntax aside, my question is: can we simulate the logic of wc -c, wc -l, or wc -w using sed, grep, or anything else? If yes, how?
Important: don't use wc in the script.

A single awk command that you can parameterize: to suppress a count, set the corresponding -v variable (l, w, or c) to 0:
LC_ALL=C awk -v l=1 -v w=1 -v c=1 '
{ wc+=NF; cc+=1+length($0) }
END { printf "%s\t%s\t%s\n", l ? NR : "", w ? wc: "", c ? cc : ""}
' file
Note:
For simplicity, you always get 3 \t-separated output fields, with fields whose output wasn't requested empty; it wouldn't be hard to modify this to emulate wc's output behavior, however.
As explained in choroba's grep answer, you must prepend LC_ALL=C to awk ... if you really want to count bytes (-c) rather than (potentially multi-byte) characters (-m).
To count characters (the equivalent of wc -m), remove LC_ALL=C above.
Caveat: This won't work with BSD awk, as also found on macOS, unfortunately, because it is not Unicode-aware and always counts the number of bytes (try awk '{print length($0)}' <<<ü).
wc -l strictly counts the number of \n characters, so it doesn't count an incomplete line - one missing a trailing \n - at the end of its input; the above awk command, by contrast, does count that line (and an implied trailing newline in the byte/character count).
How it works:
awk's NF variable contains the number of fields on each input line, where the line is broken into fields by arbitrary runs of whitespace by default; in other words: by default, fields are words.
$0 is the input line at hand, whose length() tells you the number of characters / bytes, with 1 added to account for the \n character at the end of the line.
Note how variables wc and cc need no initialization, because awk implicitly treats empty/undefined variables as 0 in a numeric context (such as with compound operator +=).
NR contains the current, 1-based line number, which in the END block is equal to the total number of input lines.
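To tie this back to the original question, the awk logic above could be wrapped in a shell function that accepts -l, -w and -c. This is only a minimal sketch: the name mywc and the space-separated output format are assumptions, not part of the answer, and the caveat about a missing trailing newline still applies.

```shell
# mywc is a hypothetical name; it mimics wc's -l, -w and -c options with awk.
mywc() {
    l=0 w=0 c=0
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts lwc opt; do
        case $opt in
            l) l=1 ;;
            w) w=1 ;;
            c) c=1 ;;
            *) echo "usage: mywc [-lwc] [file...]" >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # Like wc, report all three counts when no option is given.
    if [ "$l$w$c" = 000 ]; then l=1 w=1 c=1; fi
    # LC_ALL=C makes length() count bytes rather than characters.
    LC_ALL=C awk -v l="$l" -v w="$w" -v c="$c" '
        { words += NF; bytes += 1 + length($0) }
        END {
            out = ""
            if (l) out = out " " NR
            if (w) out = out " " (words + 0)
            if (c) out = out " " (bytes + 0)
            print substr(out, 2)  # drop the leading space
        }
    ' "$@"
}
```

Called as mywc -l file, mywc -lc file, or just mywc file, it prints the requested counts in the fixed order lines, words, bytes.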

Using awk:
-l:
awk 'END{print NR}' inFile
-w:
awk '{words+=NF}END{print words}' inFile
-c:
ls -l inFile | awk '{print $5}'

If you can use grep, simulating the line count is easy: just count how many times something that matches always happens:
grep -c '^' filename
This should output the same as wc -l (but it might report one more line if the file doesn't end in a newline).
To get the number of words, you can use the following pipeline:
grep -o '[^[:space:]]\+' filename | grep -c '^'
You need a grep that supports the -o option, which prints each matching string on a line of its own. The expression matches every run of non-whitespace characters, and piping the matches into the command from the previous case simply counts them.
To get the number of characters (wc -c), you can use
LC_ALL=C grep -o . filename | grep -c '^'
Setting LC_ALL=C is needed if your locale supports UTF-8; otherwise you'd get the equivalent of wc -m (characters instead of bytes). Because . never matches the newline itself, you need to add the number of newlines to the result:
echo $(( $( grep -c '^' filename )
+ $( LC_ALL=C grep -o . filename | grep -c '^' ) ))
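The three grep pipelines above can be collected into small helper functions. This is a sketch under a few assumptions: the function names are made up for illustration, the word pattern uses -E with + instead of the GNU-specific \+, and grep must support -o.

```shell
# Hypothetical helpers; each takes a file name as its only argument.

# Lines: every line, even an empty one, matches '^'.
count_lines() { grep -c '^' "$1"; }

# Words: print each run of non-whitespace on its own line, then count lines.
count_words() { grep -oE '[^[:space:]]+' "$1" | grep -c '^'; }

# Bytes: non-newline bytes (one per output line of grep -o .) plus newlines.
count_bytes() {
    echo $(( $(grep -c '^' "$1") + $(LC_ALL=C grep -o . "$1" | grep -c '^') ))
}
```

For a file containing the two lines "hello world" and "foo", these report 2 lines, 3 words and 16 bytes. As noted above, a file without a trailing newline is counted as having one more line (and byte) than wc would report.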

Related

Finding a process by argument string

I'm using ps, grep, and sed to try to identify some java processes that are uniquely identified by some specific argument, e.g. -DAppService=DDDABC_456 or -DAppService=DDDXYZ_456_cazorla. I want to return a comma separated list: PID,argument,process
I'm working on CentOS7. So far I'm only about half way down the line but getting tangled up.
I'm shooting for this:
1234,-DAppService=DDDABC_456,/usr/java/jdk1.8.0_112/bin/java
2345,-DAppService=DDDABC_456_cazorla,/usr/java/jdk1.8.0_112/bin/java
3456,-DAppService=DDDXYZ_789,/usr/java/jdk1.8.0_112/bin/java
4567,-DAppService=DDDXYZ_789_cazorla,/usr/java/jdk1.8.0_112/bin/java
Note that the argument may or may not have a suffix of "_cazorla".
I tried this but it loses the arguments (and the number of arguments may vary so I don't think I can continue with $9, $10, etc.):
ps -ef | grep DAppService=DDD[A-Z]*_[0-9]*(?:_[a-z]*)? | grep -v grep | awk '{OFS=","; print $2,$8}'
Gives me:
1234,/usr/java/jdk1.8.0_112/bin/java
2345,/usr/java/jdk1.8.0_112/bin/java
3456,/usr/java/jdk1.8.0_112/bin/java
4567,/usr/java/jdk1.8.0_112/bin/java
I also tried this, which comma-separates all the ps columns and all the arguments too, which I don't want:
ps -aef | grep DAppService=DDD[A-Z]*_[0-9]*(?:_[a-z]*)? | grep -v grep | sed -e "s/\s\+/,/g"
Actual result too much to list here but e.g.
user,1234,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,2345,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,3456,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
user,4567,1,0,Jul03,pts/0,00:03:21,/usr/java/jdk1.8.0_112/bin/java,arg1,arg2,arg3,argn...
My sed knowledge is pretty poor (as is awk but would be open to that as an option too). Once I'm happy with the commands I want to put them into a bash script that I can call from elsewhere.

ps -eo pid=,args= |\
awk '
{
for (i=3; i<=NF; i++)
if ($i ~ regex) {
print $1, $i, $2
next
}
}
' OFS=, regex='awk re to match arg'
Ask ps to output just the PID and the command line.
Pass a regex to awk and have it check each argument (fields 3 to NF) for a match.
If one matches, output the PID ($1), the matching argument ($i), and the command ($2), in that order.
Notes:
awk can't distinguish cmd "arg1 with spaces" from cmd arg1 arg2 arg3 but that may not matter here
spaces in the command (eg. in a directory name in the path) will cause the command to be truncated at the first space
commas in the command (or the relevant argument) will break the csv output
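As an illustration of how the regex placeholder might be filled in for the question's -DAppService arguments, here is a sketch. The function name filter_by_arg and the exact regex are assumptions derived from the sample output in the question, and the input is faked with printf so the result is reproducible.

```shell
# filter_by_arg is a hypothetical helper: it reads "PID COMMAND ARGS..." lines
# (the format produced by: ps -eo pid=,args=) and prints PID,argument,command
# for the first argument that matches the regex given as $1.
filter_by_arg() {
    awk -v OFS=, -v regex="$1" '
    {
        for (i = 3; i <= NF; i++)
            if ($i ~ regex) {
                print $1, $i, $2
                next
            }
    }'
}

# Assumed pattern: DDD, capital letters, underscore, digits, optional _suffix.
re='^-DAppService=DDD[A-Z]+_[0-9]+(_[a-z]+)?$'

# Simulated ps output; real use would be: ps -eo pid=,args= | filter_by_arg "$re"
printf '%s\n' \
    ' 1234 /usr/java/jdk1.8.0_112/bin/java -Xmx1g -DAppService=DDDABC_456 -jar a.jar' \
    ' 2345 /usr/java/jdk1.8.0_112/bin/java -DAppService=DDDABC_456_cazorla' \
    ' 9999 /usr/bin/python3 server.py' |
    filter_by_arg "$re"
# prints:
# 1234,-DAppService=DDDABC_456,/usr/java/jdk1.8.0_112/bin/java
# 2345,-DAppService=DDDABC_456_cazorla,/usr/java/jdk1.8.0_112/bin/java
```

Because the awk process itself never carries the -DAppService argument as a separate field, no extra grep -v step is needed to filter it out.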

How to check that the 4th character in a file is 'a' using linux grep command

To match the 2nd character I used grep -e '^.[aA]'. What would it be for the 4th character? I tried grep -e'^...[aA]', but it didn't work.

grep processes the input line by line. ^.[aA] is true if a or A is the second character on any line.
You can combine grep with head to only inspect the first line:
head -n1 filename | grep '^...[aA]'
But it still wouldn't work for a file whose first line is shorter than four characters:
x
ya
To really check the fourth character in a file, grep is not the best tool.
#! /bin/bash
read -N4 chars < filename
if [[ "${chars:3:1}" == [aA] ]] ; then
echo Found
fi
But if you try hard enough, you can still use grep. E.g., use tr to replace newlines with spaces, so the whole file becomes a single line, then run your grep:
tr '\n' ' ' < filename | grep '^...[aA]'
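For shells without bash's read -N, a similar check can be sketched with dd, which is a different technique from the ones above. The function name is hypothetical, and note that this inspects the fourth byte, which equals the fourth character only for single-byte encodings.

```shell
# fourth_byte_is_a is a hypothetical helper; it succeeds (exit status 0)
# if the 4th byte of the file named by $1 is 'a' or 'A'.
fourth_byte_is_a() {
    # skip=3 skips the first three bytes; count=1 reads exactly one byte.
    byte=$(dd if="$1" bs=1 skip=3 count=1 2>/dev/null)
    case $byte in
        [aA]) return 0 ;;
        *)    return 1 ;;   # also reached when the file is shorter than 4 bytes
    esac
}
```

With the two-line example file above (x, newline, ya), the fourth byte is the a on the second line, so fourth_byte_is_a filename && echo Found prints Found.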

awk system does not take hyphens

I want to pipe the output of some command to awk and use the system call in awk, but awk does not seem to accept flags with a hyphen. For example, let's say I have a bunch of files and I want to "cat" them. I would use ls -1 | awk '{ system(" cat " $1)}'
Now, if I also want to print line numbers with -n, it does not work: ls -1 | awk '{ system(" cat -n" $1)}'

You need a space between -n and the file name:
ls -1 | awk '{ system(" cat -n " $1)}'
Notes
-1 is not needed. ls implicitly prints 1 file per line when its output goes to a pipe.
Any file name with whitespace in it will cause this code to fail.
Parsing the output of ls is generally a bad idea. Both find and the shell offer superior handling of difficult file names.
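As a sketch of the shell-based alternative mentioned in the last note, a plain glob loop avoids both ls and awk and copes with whitespace in file names. The function name cat_numbered is made up for illustration.

```shell
# cat_numbered is a hypothetical helper: it prints every regular file in the
# directory given as $1 with line numbers, without parsing ls output.
cat_numbered() {
    for f in "$1"/*; do
        # Skip directories and the literal pattern left over when nothing matches.
        if [ -f "$f" ]; then
            cat -n -- "$f"
        fi
    done
}
```

Calling cat_numbered . numbers the lines of every regular file in the current directory; because "$f" is quoted, a name such as my file.txt is passed to cat as a single operand.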

John1024's helpful answer fixes your problem and contains helpful advice, but let me focus on the syntax aspects:
As a command string, cat -n <file> requires at least 1 space (or tab) between the n, which is an option, and <file>, which is an operand.
String concatenation works differently in awk than in the shell:
" cat -n" $1, despite the presence of a space between " cat -n" and $1, does not insert that space in the resulting string, because awk's string concatenation works by directly joining strings placed next to one another irrespective of intervening whitespace.
For instance, the following commands all yield string literal ab, irrespective of any whitespace between the operands of the string concatenation:
awk 'BEGIN { print "a""b" }'
awk 'BEGIN { print "a" "b" }'
awk 'BEGIN { s = "b"; print "a"s }'
awk 'BEGIN { s = "b"; print "a" s }'

This is not a proper use case for awk; you're better off with something like this:
find . -maxdepth 1 -type f -exec cat -n {} \;

search a line that contain a special character using sed or awk

I wonder if there is a command in Linux that can help me to find a line that begins with "*" and contains the special character "|"
for example
* Date | Auteurs

Simply use:
grep -ne '^\*.*|' "${filename}"
Or if you want to use sed:
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
Or the (GNU) awk equivalent (the pipe must be escaped with a backslash, because an unescaped | means alternation in awk's extended regular expressions):
awk '/^\*.*\|/' "${filename}"
Where:
^ : start of the line
\*: a literal *
.*: zero or more of any character (except newline)
| : a literal pipe
NB: "${filename}": I've assumed you're using the command in a script, with the target file passed in a double-quoted variable as "${filename}". At an interactive shell, simply use the actual name of the file (or the path to it).
UPDATE (line numbers)
The commands above can be modified to also print the line number of each matched line. With grep it is as simple as adding the -n switch:
grep -ne '^\*.*|' "${filename}"
We obtain an output like this:
81806:* Date | Auteurs
To obtain exactly the same output from sed and awk we have to complicate the commands a little bit:
awk '/^\*.*\|/{print NR ":" $0}' "${filename}"
# = prints the line number and p prints the matching line, but on two separate lines, hence the second sed call to join them
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
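To see which lines the pattern accepts and rejects, here is a small self-contained check. The sample text is invented for illustration; only its first line both starts with * and contains |.

```shell
# Invented sample input: line 1 matches; line 2 lacks the leading *,
# line 3 lacks the |, and line 4 starts with a space rather than *.
sample='* Date | Auteurs
Date | Auteurs
* no pipe here
 * indented | pipe'

printf '%s\n' "$sample" | grep -n '^\*.*|'
# prints: 1:* Date | Auteurs

printf '%s\n' "$sample" | awk '/^\*.*\|/ { print NR ":" $0 }'
# prints: 1:* Date | Auteurs
```

Both commands report the same single line, which shows that the ^ anchor rejects the indented line and that the pipe is treated literally in each tool's regex dialect.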

How to join multiple lines of filenames into one with custom delimiter

How do I join the result of ls -1 into a single line and delimit it with whatever I want?

paste -s -d joins lines with a delimiter (e.g. ","), and does not leave a trailing delimiter:
ls -1 | paste -sd "," -
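The same paste invocation works on any line-oriented input, not just ls. A minimal sketch, with a made-up function name and sample data:

```shell
# join_lines is a hypothetical helper: it joins the lines on stdin using the
# single-character delimiter passed as $1.
join_lines() {
    paste -s -d "$1" -
}

printf '%s\n' alpha beta gamma | join_lines ','
# prints: alpha,beta,gamma
```

Keep the delimiter to one character: when -d is given several characters, paste cycles through them one at a time instead of inserting the whole string.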

EDIT: Simply use "ls -m" if you want your delimiter to be a comma.
Ah, the power and simplicity!
ls -1 | tr '\n' ','
Change the comma "," to whatever you want. Note that this includes a "trailing comma" (for lists that end with a newline)

This replaces the last comma with a newline:
ls -1 | tr '\n' ',' | sed 's/,$/\n/'
ls -m includes newlines at the screen-width character (80th for example).
Mostly Bash (only ls is external):
saveIFS=$IFS; IFS=$'\n'
files=($(ls -1))
IFS=,
list=${files[*]}
IFS=$saveIFS
Using readarray (aka mapfile) in Bash 4:
readarray -t files < <(ls -1)
saveIFS=$IFS
IFS=,
list=${files[*]}
IFS=$saveIFS
Thanks to gniourf_gniourf for the suggestions.

I think this one is awesome
ls -1 | awk 'ORS=","'
ORS is the "output record separator" so now your lines will be joined with a comma.

Parsing ls output is generally not advised; a better alternative is to use find, for example:
find . -type f -print0 | tr '\0' ','
Or by using find and paste:
find . -type f | paste -d, -s
For joining multiple lines in general (not related to the file system), check: Concise and portable “join” on the Unix command-line.

The combination of setting IFS and use of "$*" can do what you want. I'm using a subshell so I don't interfere with this shell's $IFS
(set -- *; IFS=,; echo "$*")
To capture the output,
output=$(set -- *; IFS=,; echo "$*")
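The IFS and "$*" combination can be generalized into a small function that joins arbitrary arguments. This is a sketch; the name join_args is an assumption.

```shell
# join_args is a hypothetical helper: the first argument is the delimiter,
# the remaining arguments are the items to join.
join_args() {
    delim=$1
    shift
    # "$*" joins the positional parameters with the first character of IFS;
    # the subshell keeps the IFS change from leaking into the caller.
    ( IFS=$delim; printf '%s\n' "$*" )
}

join_args , alpha "beta gamma" delta
# prints: alpha,beta gamma,delta
```

To join file names, call it as join_args , * in the directory of interest. Only the first character of the delimiter is used, since "$*" ignores the rest of IFS.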

Adding on top of majkinetor's answer, here is the way of removing trailing delimiter(since I cannot just comment under his answer yet):
ls -1 | awk 'ORS=","' | head -c -1
Just remove as many trailing bytes as your delimiter counts for.
I like this approach because I can use multi character delimiters + other benefits of awk:
ls -1 | awk 'ORS=", "' | head -c -2
EDIT
As Peter has noticed, negative byte count is not supported in native MacOS version of head. This however can be easily fixed.
First, install coreutils. "The GNU Core Utilities are the basic file, shell and text manipulation utilities of the GNU operating system."
brew install coreutils
Commands also provided by MacOS are installed with the prefix "g". For example gls.
Once you have done this you can use ghead which has negative byte count, or better, make alias:
alias head="ghead"
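If installing GNU head is not an option, a multi-character delimiter can also be handled inside awk itself by printing the delimiter before every line except the first. This is a sketch of a different technique from the ORS approach above, with a made-up function name.

```shell
# join_with is a hypothetical helper: it joins the lines on stdin with the
# (possibly multi-character) delimiter given as $1, with no trailing delimiter.
join_with() {
    awk -v d="$1" '
        NR > 1 { printf "%s", d }   # delimiter goes before lines 2, 3, ...
        { printf "%s", $0 }
        END { print "" }            # finish with a single newline
    '
}

printf '%s\n' alpha beta gamma | join_with ', '
# prints: alpha, beta, gamma
```

Because nothing is appended after the last line, there is no trailing delimiter to trim, so head -c with a negative count is not needed. One caveat: awk interprets backslash escapes in -v values, so a delimiter containing a backslash would need extra escaping.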

Don't reinvent the wheel.
ls -m
It does exactly that.

just bash
mystring=$(printf "%s|" *)
echo "${mystring%|}"

This command is for the PERL fans :
ls -1 | perl -l40pe0
Here 40 is the octal ascii code for space.
-p will process line by line and print
-l will take care of replacing the trailing \n with the ascii character we provide.
-e is to inform PERL we are doing command line execution.
0 means that there is actually no command to execute.
perl -e0 is same as perl -e ' '

To avoid potential newline confusion for tr we could add the -b flag to ls:
ls -1b | tr '\n' ';'

It looks like the answers already exist.
If you want
a, b, c format, use ls -m ( Tulains Córdova’s answer)
Or if you want a b c format, use ls | xargs (simplified version of Chris J’s answer)
Or if you want any other delimiter like |, use ls | paste -sd'|' (application of Artem’s answer)

The sed way,
sed -e ':a; N; $!ba; s/\n/,/g'
# :a # label called 'a'
# N # append next line into Pattern Space (see info sed)
# $!ba # if this is not (!) the last line ($), branch (b) to label a; on the last line the loop ends
# s/\n/,/g # any substitution you want
Note:
This is linear in complexity, substituting only once after all lines are appended into sed's Pattern Space.
@AnandRajaseka's answer, and some other similar answers, such as here, are O(n²), because sed has to run the substitution every time a new line is appended to the Pattern Space.
To compare,
seq 1 100000 | sed ':a; N; $!ba; s/\n/,/g' | head -c 80
# linear, in less than 0.1s
seq 1 100000 | sed ':a; /$/N; s/\n/,/; ta' | head -c 80
# quadratic, hung

sed -e :a -e '/$/N; s/\n/\\n/; ta' [filename]
Explanation:
-e - denotes a command to be executed
:a - is a label
/$/N - /$/ matches every line, so N appends the (N)ext line to the pattern space each time
s/\n/\\n/; - replaces the embedded newline with a literal \n
ta; - jumps back to label a if the substitution succeeded
Taken from my blog.

If your version of xargs supports the -d flag then this should work
ls | xargs -d, -L 1 echo
-d is the delimiter flag
If you do not have -d, then you can try the following
ls | xargs -I {} echo {}, | xargs echo
The first xargs allows you to specify your delimiter which is a comma in this example.

ls produces one column output when connected to a pipe, so the -1 is redundant.
Here's another perl answer using the builtin join function which doesn't leave a trailing delimiter:
ls | perl -F'\n' -0777 -anE 'say join ",", @F'
The obscure -0777 makes perl read all the input before running the program.
sed alternative that doesn't leave a trailing delimiter
ls | sed '$!s/$/,/' | tr -d '\n'

The other Python answer is interesting, but Python itself can already present the output nicely, as a list:
ls -1 | python -c "import sys; print(sys.stdin.read().splitlines())"

You can use:
ls -1 | perl -pe 's/\n$/some_delimiter/'

If Python3 is your cup of tea, you can do this (but please explain why you would?):
ls -1 | python -c "import sys; print(','.join(sys.stdin.read().splitlines()))"

ls has the option -m, which delimits the output with ", " (a comma and a space).
ls -m | tr -d ' ' | tr ',' ';'
Piping this result to tr to remove the spaces lets you pipe the result to tr again to replace the delimiter.
In my example I replace the delimiter , with the delimiter ;
Replace ; with whatever single-character delimiter you prefer; tr maps character to character, so it cannot insert a multi-character delimiter.

You can use chomp to merge multiple lines into a single line:
perl -e 'while (<>) { if (/\$/ ) { chomp; } print ;}' bad0 >test
Put the line-break condition in the if statement. It can be a special character or any delimiter.

Quick Perl version with trailing slash handling:
ls -1 | perl -E 'say join ", ", map {chomp; $_} <>'
To explain:
perl -E: execute Perl with optional features enabled (say, ...)
say: print with a trailing newline
join ", ", ARRAY_HERE: join an array with ", "
map {chomp; $_} ROWS: remove the trailing newline from each line and return the result
<>: stdin, each line is a ROW, coupling with a map it will create an array of each ROW
