Script to compare 2 files line by line - linux

I have two text files:
File1.txt
dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
fdf 767 4643 {"klhf":"3455" kgs:"4566"}
.
.
File2.txt
8853
6437437567
36265
4566
.
.
Output could be two files
Match.txt
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
fdf 767 4643 {"klhf":"3455" kgs:"4566"}
Non_Match.txt
dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
Can someone help me write a bash script for this?
I think I have the logic here, if it helps:
for (row in File1.txt) {
    bool found = false;
    for (id in File2.txt) {
        if (row contains id) {
            found = true;
            echo row >> Match.txt
            break;
        }
    }
    if (!found) {
        echo row >> Non_Match.txt
    }
}
Edit:
I also have a bash script, but it isn't helping: it writes only the ID that matches instead of the row that matches.
#!/bin/bash
set -e
file1="File2.txt"
file2="File1.txt"
for id in $(tail -n+1 "${file1}"); do
    if ! grep "${id}" "${file2}"; then
        echo "${id}" >>non_matches.txt
    else
        echo "${id}" >>matches.txt
    fi
done

You could use grep -f to look for search patterns that are listed in a separate file. It'd probably be good to use the -F (fixed strings) and -w (match whole words) flags as well.
grep -Fw -f File2.txt File1.txt > Match.txt
grep -Fwv -f File2.txt File1.txt > Non_Match.txt
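As a rough sketch, the two grep commands above can be wrapped in a small function. The name split_matches is made up for illustration; the output file names come from the question, and the function assumes the ID file has no blank lines.

```shell
# split_matches IDS DATA: hypothetical helper, not part of the original answer.
# Writes rows of DATA that contain any ID to Match.txt, the rest to Non_Match.txt.
split_matches() {
    ids=$1
    data=$2
    # -F: IDs are fixed strings, -w: match whole words only, -f: read the IDs from a file.
    # "|| true" because grep exits with status 1 when it selects no lines.
    grep -Fw -f "$ids" "$data" > Match.txt || true
    # -v inverts the selection: rows that contain none of the IDs.
    grep -Fwv -f "$ids" "$data" > Non_Match.txt || true
}

# Example call with the file names from the question:
#   split_matches File2.txt File1.txt
```

Unlike the loop in the question, this prints the whole row from File1.txt, and each row ends up in exactly one of the two output files.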

This sounds a bit like diff, or wdiff if you want to do this at word level.
If you run diff on your two files, you will generate the following output:
< dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
< gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
< fdf 767 4643 {"klhf":"3455" kgs:"4566"}
---
> 8853
> 6437437567
> 36265
> 4566
This means that the "minimal" way (per line) to turn the first file into the second is to remove all lines and add all the new ones.
If, however, the second file had been:
8853
6437437567
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
36265
4566
The diff output is:
1c1,2
< dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
---
> 8853
> 6437437567
3c4,5
< fdf 767 4643 {"klhf":"3455" kgs:"4566"}
---
> 36265
> 4566
So diff no longer asks to remove the second line.
wdiff does approximately the same, but on word level:
[-dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}-]{+8853
6437437567+}
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
[-fdf 767 4643 {"klhf":"3455" kgs:"4566"}-]
{+36265
4566+}


Line counting - How to exclude a directory and images?

In order to count the lines of my repository, I typed the command below, and found out that images and PDFs are also included in the line count.
git ls-files | xargs wc -l
When someone asks you for the scale of the repository, would you include the images/pdfs?
If not, could someone help me answer the questions below?
How to exclude the files under "/pdfs" directory
How to exclude .jpg and .png?
You can make use of cloc. It counts blank lines, comment lines, and physical lines of source code in many programming languages. Cloc can take file, directory, and/or archive names as inputs. For instance, if you want to count the number of lines of code in your repository and exclude some directories while counting, you can specify those directories separated by comma like this:
cloc --exclude-dir=imagedir,pdfdir your_repository
cloc will show you the report like this:
     387 text files.
     387 unique files.
      22 files ignored.

github.com/AlDanial/cloc v 1.88  T=0.97 s (376.5 files/s, 152866.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Go                             235          17216          11769          95308
InstallShield                    2            410              0          11178
XML                             41           1418            159           2738
Python                           5            516            523           1792
Bourne Shell                    21            266            283           1512
JSON                            19             24              0           1005
Markdown                        23            452              0            797
AsciiDoc                         4            119              0            312
Ruby                             4             44             31            238
YAML                             4              4              2            113
WiX source                       1             19             24            112
make                             3             16             25             68
DOS Batch                        2             13              2             38
WiX include                      1              0              0             28
Dockerfile                       1             13              9             17
-------------------------------------------------------------------------------
SUM:                           366          20530          12827         115256
-------------------------------------------------------------------------------
You can also use CLOC with Git like this:
cloc $(git ls-files)
which is equivalent to
git ls-files | xargs cloc
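cloc sounds like it does the job. If you use command substitution, though, you should remove space and tab from IFS. Note that a prefix assignment such as IFS=$'\n' cloc $(git ls-files) does not change how the substitution on that same line is split, so set IFS first, e.g. in a subshell: (IFS=$'\n'; cloc $(git ls-files))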
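For the original wc -l approach, one option is to filter the file list before counting. This is only a sketch: the helper name filter_paths is made up, it assumes the directory to skip is a top-level pdfs/ and that the image extensions are lowercase, and it assumes tracked paths contain no whitespace (otherwise xargs will split them).

```shell
# filter_paths: hypothetical helper. Reads paths on stdin and drops
# everything under pdfs/ as well as any file ending in .jpg or .png.
filter_paths() {
    grep -Ev '^pdfs/|\.(jpg|png)$'
}

# Intended use inside the repository:
#   git ls-files | filter_paths | xargs wc -l
```

Adding -i to grep would also catch names like LOGO.PNG, at the cost of matching the directory name case-insensitively too.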
If you just want to know a word count or line count, you could bodge it together like this. It gives you the language too. Clone the repo, test for text file / file type, count lines, delete files.
#!/bin/sh -e
# Get dir name from URL + remove trailing slashes - works for _most_ urls
url=${1:? No URL given}
url=${url%/}; url=${url%/}
# Use the cleaned-up $url here, not $1, so a trailing slash does not leave $repo empty
repo=${url##*/}
repo=${repo%.git}
dir=./$repo
# Clone repo in tmp
cd "${TMPDIR:-/tmp}"
[ -e "$dir" ] && { echo Exists: "$dir" >&2; exit 1; }
trap 'rm -rf "$dir"' EXIT INT
git clone "$url"
# Get column 1 width, for alignment
max_path_length=$(printf '%s\n' "$dir/"* | wc -L)
# Extract and print the data
printf '\n%s\n\n' "$repo text files details:"
for file in "$dir"/*; do
mime=$(file --brief --mime-type "$file")
type=${mime%%/*}
if [ "$type" = text ]; then
lines=$(grep -c . "$file") || true
lang=${mime##*/}
printf "%-${max_path_length}s %s\n" "${file#$dir}" "[$lang, $lines lines]"
total_lines=$((total_lines + lines))
fi
done
printf '\n%s\n\n' "${dir#./} total lines: $total_lines"
Example output:
$ git-wc 'git://git.savannah.gnu.org/sed.git'
Cloning into 'sed'...
remote: Counting objects: 6276, done.
remote: Compressing objects: 100% (1134/1134), done.
remote: Total 6276 (delta 4994), reused 6276 (delta 4994)
Receiving objects: 100% (6276/6276), 2.14 MiB | 495.00 KiB/s, done.
Resolving deltas: 100% (4994/4994), done.
sed text files details:
/AUTHORS [plain, 6 lines]
/BUGS [plain, 101 lines]
/COPYING [plain, 553 lines]
/ChangeLog-2014 [plain, 2586 lines]
/Makefile.am [x-makefile, 123 lines]
/NEWS [plain, 498 lines]
/README [plain, 12 lines]
/README-hacking [plain, 58 lines]
/THANKS.in [plain, 63 lines]
/basicdefs.h [x-c, 83 lines]
/bootstrap [x-shellscript, 930 lines]
/bootstrap.conf [plain, 121 lines]
/cfg.mk [plain, 343 lines]
/configure.ac [x-m4, 294 lines]
/init.cfg [plain, 163 lines]
/thanks-gen [x-perl, 12 lines]
sed total lines: 5946
If the repo is local, you can just adjust the input methods. I'm sure the idea is clear. I know cloning the whole repo may be the dumbest way to do something like this, but sometimes you just want to know a thing. Plus you can use bash/sh tests to skip a directory, e.g. [[ "$file" == "$dir/<exclude-dir>/"* ]].

Convert a text into time format using bash script

I am new to shell scripting. I have a tab-separated file, e.g.,
0018803 01 1710 2050 002571
0018951 01 1934 2525 003277
0019362 02 2404 2415 002829
0019392 01 2621 2820 001924
0019542 01 2208 2413 003434
0019583 01 1815 2134 002971
Here, the 3rd and 4th columns represent the start time and the end time.
I want to convert these two columns into a proper time format so that I can get a 6th column with the exact time difference between column 4 and column 3, in hours and minutes.
The column 6 results would be 3:40, 5:51, 00:11, 1:59, 2:05, 3:19.
One way with awk:
$ cat test.awk
# create a function to split hour and minute
function f(h, x) {
    h[0] = substr(x,1,2)+0
    h[1] = substr(x,3,2)+0
}
{
    f(start, $3);
    f(end, $4);
    # ">= 0" so that equal minutes (e.g. 1710 to 2010) give 3:00 rather than 2:60
    span = end[1] - start[1] >= 0 \
        ? sprintf("%d:%02d", end[0]-start[0], end[1]-start[1]) \
        : sprintf("%d:%02d", end[0]-start[0]-1, 60+end[1]-start[1]);
    print $0 OFS span
}
then run the awk file as the following:
$ awk -f test.awk input_file
Edit: per @glenn jackman's suggestion, the code can be simplified (refer to @Kamil Cuk's method):
function g(x) {
return substr(x,1,2)*60 + substr(x,3,2)
}
{
span = g($4) - g($3)
printf("%s%s%d:%02d\n", $0, OFS, int(span/60), span%60)
}
A simple bash solution using arithmetic expansion:
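while IFS='' read -r l; do
    # split on spaces or tabs (the input is tab-separated)
    IFS=$' \t' read -r _ _ st et _ <<<"$l"
    # 10# forces base 10, so values such as "08" are not parsed as octal
    d=$(( (10#${et:0:2} * 60 + 10#${et:2:2}) - (10#${st:0:2} * 60 + 10#${st:2:2}) ))
    printf "%s %02d:%02d\n" "$l" "$((d/60))" "$((d%60))"
done < input_file_path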
will output:
0018803 01 1710 2050 002571 03:40
0018951 01 1934 2525 003277 05:51
0019362 02 2404 2415 002829 00:11
0019392 01 2621 2820 001924 01:59
0019542 01 2208 2413 003434 02:05
0019583 01 1815 2134 002971 03:19
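For reference, the same minute arithmetic can be written without bash-only features. The sketch below is a POSIX sh variant; the function names to_minutes and hhmm_diff are made up for illustration, and it assumes both times are exactly four digits with the end time not earlier than the start time.

```shell
# to_minutes HHMM: convert a 4-digit HHMM string to a number of minutes.
# (Hypothetical helper name, not from the answers above.)
to_minutes() {
    hh=${1%??}    # drop the last two characters  -> hours
    mm=${1#??}    # drop the first two characters -> minutes
    # Strip one leading zero so that "08" or "09" is not rejected as invalid octal.
    echo $(( ${hh#0} * 60 + ${mm#0} ))
}

# hhmm_diff START END: print END - START as H:MM.
hhmm_diff() {
    d=$(( $(to_minutes "$2") - $(to_minutes "$1") ))
    printf '%d:%02d\n' "$((d / 60))" "$((d % 60))"
}

# Example call: hhmm_diff 1710 2050
```

Stripping the leading zero plays the same role here as the 10# prefix in the bash version.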
Here is one in GNU awk using time functions, mktime to convert to epoch time and strftime to convert the time to desired format HH:MM:
$ awk -v OFS="\t" '{
dt3="1970 01 01 " substr($3,1,2) " " substr($3,3,2) " 00"
dt4="1970 01 01 " substr($4,1,2) " " substr($4,3,2) " 00"
print $0,strftime("%H:%M",mktime(dt4)-mktime(dt3),1) # thanks @glennjackman,1 :)
}' file
Output ($6 only):
03:40
05:51
00:11
01:59
02:05
03:19

shell script to change directory to the tail result path

I'm using tail to catch errors as they happen in the log, like this:
tail -f syschecklog.log | grep "ERROR processEvent: /mnt/docs/"
and this gives results like:
01.lnxp.com 2019-03-13 07:10:24, 345 ERROR processEvent: /mnt/docs/003217899/cfo paid ¿ inv -inc 1234321
So what I do manually is to change the path using cd:
cd /mnt/docs/003217899/
Is there a script that can change directory automatically? At the moment I run another script manually to rename the files contained in /003217899/. Directories like /003217899/ show up many times a day, and they keep changing, so I need this script to catch those errors automatically, change to the path, and then run the file-renaming script.
In addition to the above, the log line can contain another subfolder that holds the file with the error, like /mnt/docs/003217899/attch/fees ¿ to be paid. How can we cd to that directory?
After Altering [Update]
grep "ERROR processEvent: /mnt/docs/" syschecklog.log | sed 's#.*ERROR processEvent: /mnt/docs/ \(/.*\)/.*#\1#' | while read -r DIR
do
BASEDIR=${DIR%/*}
if [ "$BASEDIR" != /mnt/docs/ ]
then
( cd "$BASEDIR" && find -type f -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]' )
fi
# end of code for additional requirement
( cd "$DIR" && find -type f -exec touch {} + | python -c 'import os, re;
[os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]' )
done
3rd script updated for renameFiles();
$ renameFiles()
> {
> # The next line is copied unchanged from the question. This could be improved.
> find -type f -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]'
> }
$
$ # Two possible variants because the question was modified.
$ #
$ # To process the complete input file as it is now
$ # grep "ERROR processEvent: /mnt/docs/" syschecklog.log | ...
$ #
$ # To continuously follow the file
$ # tail -f /mnt/docs/syschecklog.log | grep "ERROR processEvent: /mnt/docs/" | ...
$
$ grep "ERROR processEvent: /mnt/docs/" syschecklog.log | sed 's#.*ERROR processEvent: \(/.*\)/.*#\1#' | while read -r DIR
> do
> # additional requirement from comment: if DIR is /mnt/docs/003217899/attch
> # the script should be run both in .../003217899 and .../attch
> BASEDIR=${DIR%/*}
> if [ "$BASEDIR" != /mnt/docs/ ]
> then
> ( cd "$BASEDIR" && renameFiles)
> fi
> # end of code for additional requirement
> ( cd "$DIR" && renameFiles)
> done
-bash: cd: /mnt/docs/001234579/Exp8888861¿_Applicant_Case_Conference_l (No such file or directory): No such file or directory
-bash: cd: /mnt/docs/001888579/¿_SENIOR_RESOLUTION_MANAGER_i(No such file or directory): No such file or directory
-bash: cd: /mnt/docs/001234579/Exp2222276¿18 from all and Treatments Inc. February 27_ 20199999(No such file or directory): No such file or directory
grep results as you requested;
grep "ERROR processEvent: /mnt/docs/" syschecklog.log
01.lnxp.com 3 2019-03-14 07:04:30,446 ERROR processEvent: /mnt/docs/001111224/Exposure2178861/Email_from_LAT__18_009945_AABS¿__Summary_not_received12128050 (No such file or directory)
01.lnxp.com 3 2019-03-14 07:05:13,137 ERROR processEvent: /mnt/docs/001567890/Coop_subro_question__TO__ZED_LANDERS_¿_SENIOR__Basse12130781 (No such file or directory)
01.lnxp.com 3 2019-03-14 07:05:19,914 ERROR processEvent: /mnt/docs/001323289/Exposure2622276/OCF¿18 from All and Treatments Inc. February 27_ 201912129762 (No such file or directory)
Results of Locale
$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
Results of fgrep python yourscript | od -c -tx1
$ fgrep python invert.sh | od -c -tx1
0000000 f i n d - t y p e
20 20 20 20 66 69 6e 64 20 20 2d 74 79 70 65 20
0000020 f - e x e c t o u c h {
66 20 20 2d 65 78 65 63 20 74 6f 75 63 68 20 7b
0000040 } + | p y t h o n - c
7d 20 2b 20 7c 20 70 79 74 68 6f 6e 20 2d 63 20
0000060 ' i m p o r t o s , r e ;
27 69 6d 70 6f 72 74 20 6f 73 2c 20 72 65 3b 20
0000100 [ o s . r e n a m e ( i , r e
5b 6f 73 2e 72 65 6e 61 6d 65 28 69 2c 20 72 65
0000120 . s u b ( r " \ ? " , " 302 277 "
2e 73 75 62 28 72 22 5c 3f 22 2c 20 22 c2 bf 22
0000140 , i ) ) f o r i i n o
2c 20 69 29 29 20 66 6f 72 20 69 20 69 6e 20 6f
0000160 s . l i s t d i r ( " . " ) ] '
73 2e 6c 69 73 74 64 69 72 28 22 2e 22 29 5d 27
0000200 \n
0a
0000201
I need to change each '?' in the filename to '¿', because the system creates '?' and shows it as '¿', so the name has to be changed to that for the server to understand it.
I found, using cat, that a capital A with a hat (Â) appears by itself in the system:
cat invert.sh
#!/bin/bash
renameFiles()
{
find -type f -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]'
}
grep "ERROR processEvent: /mnt/docs/" syschecklog.log | sed 's#.*ERROR processEvent: /mnt/docs/ \(/.*\)/.*#\1#' | while read -r DIR
do
BASEDIR=${DIR%/*}
if [ "$BASEDIR" != /mnt/cc-docs ]
then
( cd "$BASEDIR" && renameFiles)
fi
( cd "$DIR" && renameFiles)
Results of od -c -tx1 on the error file:
echo *|od -c -tx1
0000000 O C F - 2 1 I n v 2 0 8 3 5
4f 43 46 2d 32 31 20 49 6e 76 20 32 30 38 33 35
0000020 9 9 A s s e s s M e d $ 6 2
39 39 20 41 73 73 65 73 73 4d 65 64 20 24 36 32
0000040 1 . 5 0 ( H a n g Q ) ?
31 2e 35 30 20 28 48 61 6e 67 20 51 29 20 3f 20
0000060 d t d F e b 2 7 _ 2 0 1 9
64 74 64 20 46 65 62 20 32 37 5f 20 32 30 31 39
0000100 1 2 1 7 4 5 8 3 \n
31 32 31 37 34 35 38 33 0a
0000111
I checked the system: when using echo with the hex encoding of ¿, it attaches Â to it, as below:
$echo -e '\xc2\xbf'
¿
Script modified again for additional requirements.
(As I did not get answers to all questions I modified the script based on the incomplete information.)
Instead of processing two directories separately the script now uses find in the parent directory (or the only directory), renames and touches all files that contain a '?' in the name. (-name '*\?*').
#! /bin/bash
# Two possible variants because the question was modified.
#
# To process the complete input file as it is now
# fgrep "ERROR processEvent: /mnt/docs/" syschecklog.log | ...
#
# To continuously follow the file
# tail -f syschecklog.log| fgrep "ERROR processEvent: /mnt/docs/" | ...
# The "LANG=C sed ..." avoids problems with invalid UTF-8 characters that do not match '.' in sed's pattern
fgrep "ERROR processEvent: /mnt/docs/" syschecklog.log | LANG=C sed 's#.*ERROR processEvent: \(/mnt/docs/[^/]*\)/.*#\1#' | while IFS= read -r DIR
do
    find "$DIR" -name '*\?*' | while IFS= read -r FILE
    do
        NEW=$(echo "$FILE"| tr '?' $'\xBF')
        mv "$FILE" "$NEW"
        touch "$NEW"
    done
done
Note that grep and sed will switch to buffered output when used in a pipeline. This will delay the processing of the extracted lines. You might have to disable buffering for the commands in the pipeline, see http://mywiki.wooledge.org/BashFAQ/009
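A minimal sketch of the unbuffered variant, assuming GNU grep and GNU sed (both --line-buffered and -u are GNU options). The function name extract_dirs is made up for illustration, and LC_ALL=C is used in place of LANG=C because LC_ALL takes precedence if it happens to be set.

```shell
# extract_dirs: hypothetical helper. Reads log lines on stdin and prints the
# first-level directory under /mnt/docs for every matching error line.
extract_dirs() {
    # --line-buffered (GNU grep) and -u (GNU sed) flush after each line,
    # so a downstream loop sees every directory as soon as it is logged.
    grep -F --line-buffered 'ERROR processEvent: /mnt/docs/' |
        LC_ALL=C sed -u 's#.*ERROR processEvent: \(/mnt/docs/[^/]*\)/.*#\1#'
}

# Intended use when following the log:
#   tail -f syschecklog.log | extract_dirs | while IFS= read -r DIR; do ...; done
```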
2nd major update
There was a problem with invalid characters. In a UTF-8 environment, sed behaves strangely when the input contains bytes that are not valid UTF-8 characters: the pattern . does not match these invalid bytes. (The example file contains a byte with the value 0xBF. See http://www.linuxproblem.org/art_21.html.) Setting LANG=C for the sed command fixes this problem.
I tested my script with the grep output added to the question. I wrote this into a file somelog.log. I modified my script to use grep pattern somelog.log | ... with a local file instead of using a log file with a full path which does not exist on my test system.
After adding LANG=C to the sed command the script ran successfully with the raw input file provided as an external link.
The output is
$ grep "ERROR processEvent: /mnt/docs/" syschecklog.log | sed 's#.*ERROR processEvent: \(/.*\)/.*#\1#' | while read -r DIR; do BASEDIR=${DIR%/*}; if [ "$BASEDIR" != /mnt/docs/ ]; then ( cd "$BASEDIR" && renameFiles); fi; ( cd "$DIR" && renameFiles); done
bash: cd: /mnt/docs/001234567: No such file or directory
bash: cd: /mnt/docs/001234567/Subdir9876543: No such file or directory
bash: cd: /mnt/docs/002345678: No such file or directory
bash: cd: /mnt/docs/003456789: No such file or directory
bash: cd: /mnt/docs/003456789/Subdir8765432: No such file or directory
... (more similar lines removed)
You can see that it tried to cd into the directories from the log messages. It does not show parts of the file name. In my case it simply failed because the directories don't exist. I think the script should work.
After replacing the two cd and renameFiles commands with find... the output with my test is
find: ‘/mnt/docs/001234567’: No such file or directory
find: ‘/mnt/docs/002345678’: No such file or directory
find: ‘/mnt/docs/003456789’: No such file or directory
...

find length of a fixed width file with a little twist

Hi wonderful people, my gurus, and all kind-hearted people.
I have a fixed-width file, and I'm currently trying to find the rows that contain x bytes. I tried a couple of awk commands, but they don't give me the result I want. Each row of my fixed-width file should contain 208 bytes, but a few rows don't. I'm trying to find the records that don't have 208 bytes.
This command gave me the record length (the length of the first line):
awk '{print length;exit}' file.text
Here I tried to print the rows that contain 101 bytes, but it didn't work:
awk '{print length==101}' file.text
Any help or insight would be much appreciated.
With awk:
awk 'length() < 208' file
Well, length() gives you the number of characters, not bytes. This number can differ in unicode context. You can use the LANG environment variable to force awk to use bytes:
LANG=C awk 'length() < 208' file
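Since the question asks for every record that is not exactly 208 bytes, a variant with != that also reports the line number may be handy. This is only a sketch: the helper name wrong_length is made up, and LC_ALL=C is used instead of LANG=C because LC_ALL overrides LANG and LC_CTYPE if either is set in the environment.

```shell
# wrong_length WANT FILE: hypothetical helper. Prints "line_number: length"
# for every line of FILE whose byte length differs from WANT.
wrong_length() {
    want=$1
    file=$2
    # LC_ALL=C makes awk count bytes rather than locale characters.
    LC_ALL=C awk -v want="$want" 'length($0) != want { print NR ": " length($0) }' "$file"
}

# Example call: wrong_length 208 file.text
```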
Perl to the rescue!
perl -lne 'print "$.:", length if length != 208' -- file.text
-n reads the input line by line
-l removes newlines from the input before processing it and adds them to print
The one-liner will print line number ($.) and the length of the line for each line whose length is different than 208.
if you're using gawk, then it's no issue, even in typical UTF-8 locale mode :
length(s) = # chars native to locale,
# typically that means # utf-8 chars
match(s, /$/) - 1 = # raw bytes # this also work for pure-binary
# inputs, without triggering
# any error messages in gawk Unicode mode
Best illustrated by example :
0000000 3347498554 3381184647 3182945161 171608122
: Ɔ ** LJ ** Ȉ ** ɉ ** 㷽 ** ** : 210 : \n
072 306 206 307 207 310 210 311 211 343 267 275 072 210 072 012
: ? 86 ? 87 ? 88 ? 89 ? ? ? : 88 : nl
58 198 134 199 135 200 136 201 137 227 183 189 58 136 58 10
3a c6 86 c7 87 c8 88 c9 89 e3 b7 bd 3a 88 3a 0a
0000020
# gawk profile, created Sat Oct 29 20:32:49 2022
BEGIN {
1 __ = "\306\206\307\207\310" (_="\210") \
"\311\211\343\267\275"
1 print "",__,_
1 STDERR = "/dev/stderr"
1 print ( match(_, /$/) - 1, "_" ) > STDERR # *A
1 print ( length(__), match(__, /$/) - 1 ) > STDERR # *B
1 print ( (__~_), match(__, (_) ".*") ) > STDERR # *C
1 print ( RSTART, RLENGTH ) > STDERR # *D
}
1 | _ *A # of bytes off "_" because it was defined as 0x88 \210
5 | 11 *B # of chars of "__", and
# of bytes of it :
# 4 x 2-byte UC
# + 1 x 3-byte UC = 11
1 | 3 *C # does byte \210 exist among larger string (true/1),
# and which unicode character is 1st to
# contain \210 - the 3rd one, by original definition
3 | 3 *D # notice I also added a ".*" to the tail of this match() :
# if the left-side string being tested is valid UTF-8,
# then this will match all the way to the end of string,
# inclusive, in which you can deduce :
#
# "\210 first appeared in 3rd-to-last utf-8 character"
Combining that inferred understanding :
RLENGTH = "3 chars to the end, inclusive",
with knowledge of how many to its left :
RSTART - 1 = "2 chars before",
yields a total count of 3 + 2 = 5, affirming length()'s result

find all users who has over N process and echo them in shell

I'm writing a script in ksh. I need to find all users who have more than N processes and echo them in the shell.
N is read in by the ksh script.
I know that I should use ps -elf, but how do I parse it, find the users with more than N processes, and create an array of them? I'm having a little trouble with arrays in ksh. Please help. Maybe a simple solution without creating an array would do.
s162103@helios:/home/s162103$ ps -elf
0 S s153308 4804 1 0 40 20 ? 17666 ? 11:03:08 ? 0:00 /usr/lib/gnome-settings-daemon --oa
0 S root 6546 1327 0 40 20 ? 3584 ? 11:14:06 ? 0:00 /usr/dt/bin/dtlogin -daemon -udpPor
0 S webservd 15646 485 0 40 20 ? 2823 ? п╪п╟я─я ? 0:23 /opt/csw/sbin/nginx
0 S s153246 6746 6741 0 40 20 ? 18103 ? 11:14:21 ? 0:00 iiim-panel --disable-crash-dialog
0 S s153246 23512 1 0 40 20 ? 17903 ? 09:34:08 ? 0:00 /usr/bin/metacity --sm-client-id=de
0 S root 933 861 0 40 20 ? 5234 ? 10:26:59 ? 0:00 dtgreet -display :14
...
When I type
ps -elf | awk '{a[$3]++;}END{for(i in a)if (a[i]>N)print i, a[i];}' N=1
I get:
s162103@helios:/home/s162103$ ps -elf | awk '{a[$3]++;}END{for(i in a)if (a[i]>N)print i, a[i];}' N=1
root 118
/usr/sadm/lib/smc/bin/smcboot 3
/usr/lib/autofs/automountd 2
/opt/SUNWut/lib/utsessiond 2
nasty 31
dima 22
/opt/oracle/product/Oracle_WT1/ohs/ 7
/usr/lib/ssh/sshd 5
/usr/bin/bash 11
But /usr/sadm/lib/smc/bin/smcboot is not a user.
It is the last field of ps -elf, not the user.
Something like this (assuming the 3rd field of your ps output gives the user id):
ps -elf |
awk '{a[$3]++;}
END {
for(i in a)
if (a[i]>N)
print i, a[i];
}' N=3
The minimal ps command you want to use here is ps -eo user=. This will just print the username for each process and nothing more. The rest can be done with awk:
ps -eo user= |
awk -v max=3 '{ n[$1]++ }
END {
for (user in n)
if (n[user]>max)
print n[user], user
}'
I recommend putting the count in the first column for readability.
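If only the user names are needed, the counting can also be done with sort and uniq -c. A minimal sketch, where users_over is a made-up helper name and user names are assumed to contain no whitespace:

```shell
# users_over MAX: hypothetical helper. Reads one user name per line on stdin
# and prints each name that occurs more than MAX times.
users_over() {
    # uniq -c prefixes every distinct name with its count; awk keeps the big ones.
    sort | uniq -c | awk -v max="$1" '$1 > max { print $2 }'
}

# Intended use:
#   read n
#   ps -eo user= | users_over "$n"
```

In ksh, something like set -A users $(ps -eo user= | users_over "$n") should collect the names into an array, under the same no-whitespace assumption.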
read number
ps -elfo user= | sort | uniq -c | while read count user
do
if (( $count > $number ))
then
echo $user
fi
done
That is best solution and it works!
