awk: Iterate through content of a large list of files [closed] - linux

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
So, I have about 60k-70k vCard files and want to check (or, at this point, count) which vCards contain an email address (EMAIL;INTERNET:me@my-domain.com).
I tried to pass the output of find to awk, but awk only works on the list of file names, not on the content of each file. How can I get awk to do that? I tried several combinations of find, xargs and awk, but I can't get it to work properly.
Thanks for your help,
Wolle

I'd probably use grep for this.
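If you want to extract addresses from the files (-h drops the file-name prefix that grep adds when it searches several files, so only the matched text is printed):
grep -rhio "EMAIL;INTERNET:.*@[a-z0-9-]*\.[a-z]*" *
Use cut, sed or awk to remove the leading EMAIL;INTERNET: prefix: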
... | cut -d: -f2
... | sed "s/.*://"
... | awk -F: '{print $2}'
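A minimal sketch that combines the grep and sed steps above into a helper function. The name extract_emails is hypothetical. It searches a directory argument instead of the * glob, so the shell never has to expand tens of thousands of file names, and it assumes a grep that supports -r, -h and -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# extract_emails [DIR]: print the address from every EMAIL;INTERNET: line in
# the files under DIR (default: current directory), one per line.
extract_emails() {
    # -r: recurse into DIR, -h: no "file:" prefix, -i: ignore case,
    # -o: print only the matched part of each line
    grep -rhio 'EMAIL;INTERNET:.*@[a-z0-9-]*\.[a-z]*' "${1:-.}" |
        sed 's/.*://'   # drop everything up to the last colon
}
```

Usage: extract_emails /path/to/vcards | sort -u lists each distinct address once.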
If you want the names of the files containing a particular address (pipe them to wc -l to count them):
grep -ril "EMAIL;INTERNET:me@my-domain\.com" *
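If the * glob expands to more file names than the shell can pass to grep at once (the error is "Argument list too long"), drop the -r option and use find and xargs instead; xargs runs grep on batches of file names that fit:
find /start/dir -name "*.vcf" -print0 | xargs -0 grep -hio "..."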
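Since the question asks about awk specifically: awk already iterates over the contents of every file named on its command line, so the file names have to reach awk as arguments (for example through find's -exec ... {} +), not on its standard input, where awk would just read the list of names itself. A minimal sketch that counts the vCards containing at least one EMAIL;INTERNET: line with an @ in it; the function name count_vcards_with_email is hypothetical, and it assumes the files end in .vcf.

```shell
#!/bin/sh
# count_vcards_with_email [DIR]: print how many *.vcf files under DIR contain
# at least one "EMAIL;INTERNET:...@..." line (matched case-insensitively).
count_vcards_with_email() {
    # find hands batches of file names to awk as arguments; awk reads each
    # file's contents and prints the file name once, on the first matching line.
    # Printing names (instead of a count in END) keeps the total correct even
    # when find has to split the files over several awk invocations.
    find "${1:-.}" -type f -name '*.vcf' -exec awk '
        FNR == 1 { seen = 0 }    # first line of a new file: reset the flag
        !seen && tolower($0) ~ /^email;internet:.*@/ { seen = 1; print FILENAME }
    ' {} + | wc -l
}
```

Usage: count_vcards_with_email /path/to/vcards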

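A recursive grep can do this (-E is needed because + is an ordinary character in grep's default basic regular expressions):
grep -rE 'EMAIL.+@' .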

Related

Bash Console putting some invisible chars into string var [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Below is my console session. I want to cut a string out of the output of some commands.
But there are 17 extra characters, and I have no idea where they come from.
Can someone please explain?
$ ls -al | grep total | sed 's/[[:blank:]].*$//' | wc -m
23
$ ns="total"
$ echo $ns | sed 's/[[:blank:]].*$//' | wc -c
6
But there are 17 extra characters, and I have no idea where they come from.
Those are ANSI escape codes that grep uses for coloring matching substrings. You probably have an alias (run alias | grep grep to examine) like
alias grep='grep --color=always'
somewhere, which makes grep color matches even when its output is not a tty (or something similar).
Try
ls -al | grep --color=never total | sed 's/[[:blank:]].*$//' | wc -m
and you'll get six.
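To see where the 17 characters come from, a small sketch that rebuilds the colored output by hand. It assumes GNU grep's default match color (GREP_COLORS unset), which wraps each match in ESC[01;31m ESC[K before and ESC[m ESC[K after: 8 + 3 + 3 + 3 = 17 characters. The strip_ansi helper is a hypothetical name.

```shell
#!/bin/sh
# What `grep --color=always total` prints around the match (GNU grep defaults)
esc=$(printf '\033')
colored="${esc}[01;31m${esc}[Ktotal${esc}[m${esc}[K"

# 17 escape characters + "total" + newline
printf '%s\n' "$colored" | wc -m    # 23

# Plain text: "total" + newline
printf '%s\n' total | wc -m         # 6

# Removing the color sequences with sed brings the count back down
strip_ansi() {
    sed "s/${esc}\[[0-9;]*[mK]//g"
}
printf '%s\n' "$colored" | strip_ansi | wc -m    # 6
```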

Use grep and get text after the pattern [duplicate]

This question already has answers here:
How to grep for contents after pattern?
(8 answers)
Closed 4 years ago.
I need to get the IP from a log: I need to grep for true-client-ip=[191.168.171.15] and get just the IP.
2019.02.14-08:26:06:713,asd:1234:chan,0.000,asd,S,request-begin-site,POST,{remoteHost=1.2.3.4,remoteAddr=1.2.3.4,requestType=POST,serverName=api=[text/html],accept-charset=[iso-12345-15, utf-8;q=0.5, *;q=0.5],accept-encoding=[gzip],server-origin=[5],cache-control=[no-cache, max-age=0],pragma=[no-cache],program-header=[true],te=[chunked;q=1.0],true-client-ip=[191.168.171.15],true-host=[www.server.com]
I was trying grep -o "true-client-ip=[^ ]*," but it returns:
true-client-ip=[191.168.171.15],true-host=[www.server.com]
I need just true-client-ip=[191.168.171.15], so that I can then cut out the IP with something like ... | cut -d= -f2
Use grep's -P flag if it is available:
grep -oP 'true-client-ip=\[\K[^]]*'
Perl's \K meta-character discards what precedes when displaying the result, so it will match the "true-client-ip=[" part but only display the IP.
If grep -P isn't available, I would use sed:
sed -nE 's/.*true-client-ip=\[([^]]*).*/\1/p'
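A quick sketch to check two portable variants on a shortened copy of the log line from the question: the sed command above, and the question's own grep -o | cut approach, repaired. The original [^ ]* only stops at a space, so it can run past the closing bracket; [^]]* stops at it. It assumes grep -o and sed -E, which GNU and BSD tools both support, so it should also work where grep -P is missing.

```shell
#!/bin/sh
# Shortened copy of the log line from the question
line='pragma=[no-cache],program-header=[true],te=[chunked;q=1.0],true-client-ip=[191.168.171.15],true-host=[www.server.com]'

# sed -E: capture everything between "true-client-ip=[" and the next "]"
ip_sed=$(printf '%s\n' "$line" | sed -nE 's/.*true-client-ip=\[([^]]*).*/\1/p')

# grep -o with [^]]* instead of [^ ]*, so the match ends at the closing
# bracket; cut keeps "[191.168.171.15]" and tr deletes the brackets
ip_cut=$(printf '%s\n' "$line" | grep -o 'true-client-ip=\[[^]]*]' | cut -d= -f2 | tr -d '[]')

echo "$ip_sed"   # 191.168.171.15
echo "$ip_cut"   # 191.168.171.15
```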
If you have GNU grep, you can do it like this:
$ grep -oP "(?<=true-client-ip=\[)[^\]]*" file
191.168.171.15
The (?<=...) construct is called a positive lookbehind.
The backslash in [^\]] is actually unnecessary; I added it only to make the expression more intuitive and less prone to misreading :-).

Compare ZIP file with dir with shell command [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I have compressed a lot of files with zip from infozip.org.
How do I make sure that the zip file contains all of the original files? Or is there a GUI tool to do it?
You can install the command-line tool unzip and run
$ unzip -l yourzipfile.zip
Files contained in yourzipfile.zip will be listed.
========
To verify the files automatically, you can follow these steps.
If the files compressed into yourzipfile.zip are in dir1, first unzip yourzipfile.zip into dir2 (e.g. unzip yourzipfile.zip -d dir2), then compare the files in dir1 and dir2 by running
$ diff --brief -r dir1/ dir2/
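The two steps can be wrapped in a small function. verify_zip is a hypothetical name, and the sketch assumes Info-ZIP's unzip is installed and that the archive was created from inside the source directory (e.g. cd dir1 && zip -r ../yourzipfile.zip .), so the stored paths are relative to it. It extracts into a temporary directory from mktemp instead of a fixed dir2.

```shell
#!/bin/sh
# verify_zip ZIPFILE SRCDIR: extract ZIPFILE into a temporary directory and
# compare the result with SRCDIR. Exit status: 0 = identical,
# 1 = differences (listed by diff), 2 = extraction or setup failed.
verify_zip() {
    tmpdir=$(mktemp -d) || return 2
    # -q: quiet; -d: extract into tmpdir instead of the current directory
    if ! unzip -q "$1" -d "$tmpdir"; then
        rm -rf "$tmpdir"
        return 2
    fi
    # --brief: only report which files differ; -r: recurse into subdirectories
    diff --brief -r "$2" "$tmpdir"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage: verify_zip yourzipfile.zip dir1 && echo "archive matches dir1"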
I tried to do this myself; you can string together a few commands to do it without unzipping to a directory:
diff <(unzip -l foo.zip | cut -d':' -f2 | cut -d' ' -f4-100 | sed 's/\/$//' | sort) <(find somedir/ | sort)
Basic breakdown is:
Use diff to compare the output of two commands (the <(...) process substitution needs a shell such as bash):
diff <(command1) <(command2)
Use unzip -l and process the output: two cuts extract just the file names, sed removes the trailing / on directories, and finally sort:
unzip -l foo.zip | cut -d':' -f2 | cut -d' ' -f4-100 | sed 's/\/$//' | sort
For the directory listing, a simple find and sort:
find somedir/ | sort
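A variation on the same idea, as a sketch: unzip -Z1 (zipinfo mode, one name per line) prints only the stored names, without the header, footer and date/time columns of unzip -l that the cut commands have to work around (they would otherwise leave some non-file-name lines in the output). It uses temporary files instead of <(...), so it also runs in a plain POSIX sh. The function name compare_zip_listing is hypothetical, and it assumes the archive was created from the directory's parent, e.g. zip -r foo.zip somedir.

```shell
#!/bin/sh
# compare_zip_listing ZIPFILE DIR: diff the names stored in ZIPFILE against the
# paths under DIR, without extracting anything. Exit status 0 means they match.
compare_zip_listing() {
    zip_list=$(mktemp) || return 2
    dir_list=$(mktemp) || { rm -f "$zip_list"; return 2; }
    # -Z1: zipinfo mode, names only; drop the trailing "/" of directory entries
    unzip -Z1 "$1" | sed 's|/$||' | sort > "$zip_list"
    # Strip a trailing "/" from DIR so find prints "somedir", not "somedir/"
    find "${2%/}" | sort > "$dir_list"
    diff "$zip_list" "$dir_list"
    status=$?
    rm -f "$zip_list" "$dir_list"
    return "$status"
}
```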

Scripting with unix to get the processes run by users [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
If I find out that two users (UserA and UserB) are logged in to the system right now, how do I find out which processes those two users are running? The catch is that the script has to run as an unattended batch job, without any keyboard input other than being invoked.
I know the first part of the script would be:
who | awk '{print $1}'
The output of this would be:
UserA
UserB
What I would like to know is how to take this output, feed it into a ps command automatically, and get the required result.
I finally figured out the one-liner I was searching for, with the help of the other answers (updated for the case where no users are logged in; see comments):
ps -fU "`who | cut -d' ' -f1 | sort -u | xargs echo`" 2> /dev/null
The part inside the backticks is executed and its output is inserted in its place. It works as follows:
who : lists the users currently logged in, one line per session
cut -d' ' : split each line into fields, using ' ' as the separator
-f1 : and return only field 1 (the user name)
sort -u : sort and return only unique entries
xargs echo : pass the values piped in to echo as arguments, which joins them into a single space-separated line
2> /dev/null : send any error messages from ps (file descriptor 2, stderr) to /dev/null, i.e. "dump them, never to be seen again"
The output of the part inside the backticks is
user1 user2 user3
...however many there are. You then call ps with the -fU flags, requesting all processes for these users in full format (you can of course change these flags to get the formatting you want; just keep -U immediately before the quoted backtick expression):
ps -fU "user1 user2 user3"
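Note that uniq only collapses adjacent duplicate lines. who prints one line per session, so a user logged in twice appears twice, not necessarily on adjacent lines; sort -u (or sort | uniq) avoids listing that user twice. A quick sketch with made-up who output (the user names alice and bob are hypothetical):

```shell
#!/bin/sh
# Made-up `who` output: alice has two sessions, separated by one of bob's
sessions='alice    pts/0
bob      pts/1
alice    pts/2'

users_uniq=$(printf '%s\n' "$sessions" | cut -d' ' -f1 | uniq | xargs echo)
users_sort=$(printf '%s\n' "$sessions" | cut -d' ' -f1 | sort -u | xargs echo)

echo "uniq:    $users_uniq"   # uniq:    alice bob alice
echo "sort -u: $users_sort"   # sort -u: alice bob
```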
Get a list of users (using who) and save it to a file, then list all processes and grep them using the file you just created:
tempfile=/tmp/wholist.$$
who | cut -f1 -d' ' | sort -u > "$tempfile"
ps -ef | grep -f "$tempfile"
rm "$tempfile"
LOGGED_IN=$( who | awk '{print $1}' | sort -u | xargs echo )
[ "$LOGGED_IN" ] && ps -fU "$LOGGED_IN"
The standard -U option restricts the output to processes whose real user ID matches any of the users given in its argument (e.g., ps -f -U "UserA UserB").
Not sure if I'm understanding your question correctly, but you can pipe the output of ps through grep to get the processes run by a particular user, like so:
ps -ef | grep '^xxxxx '
where xxxxx is the user.

Distribution of different shells [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
What would be the command in Linux to find the distribution of the different shells used by all users?
getent passwd | awk -F: '{print $7}' | sort | uniq -c
The getent command dumps the password database. Normally that's just a file, /etc/passwd, but it can come from other sources; using getent passwd rather than just reading /etc/passwd allows for that.
If your system doesn't have the getent command, find out what your system's equivalent is (perhaps ypcat passwd if your system uses NIS), or just read the /etc/passwd file directly if you're sure the information isn't stored elsewhere.
The awk command grabs the 7th colon-delimited field from each line, which is the login shell for that account.
sort | uniq -c prints the number of occurrences of each shell. Add | sort -rn if you want the list in decreasing order of popularity.
Note carefully that this lists the login shells of all accounts on the system, many of which do not actually correspond to users. There are various ways to filter the list (for example by the numeric user ID in the 3rd field, which typically starts at 1000 for regular users), but none that are 100% reliable.
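One such filter, as a sketch: keep only accounts whose numeric user ID (field 3) is at least a minimum, 1000 by default following the convention mentioned above. The function name shell_distribution and the default are assumptions; adjust the threshold for your system. As noted, it is not fully reliable: for example, the nobody account (often UID 65534 on Linux) also passes this test.

```shell
#!/bin/sh
# shell_distribution [MIN_UID]: read passwd-format lines on stdin and count
# the login shells of accounts whose UID (field 3) is >= MIN_UID
# (default 1000), most popular first.
shell_distribution() {
    awk -F: -v min="${1:-1000}" '$3 >= min { print $7 }' | sort | uniq -c | sort -rn
}
```

Usage: getent passwd | shell_distribution 1000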
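This will give you each username and its login shell, skipping accounts whose shell contains "nologin". (/etc/passwd is world-readable, so this does not have to run as root.)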
grep -v "nologin" /etc/passwd | awk 'BEGIN{FS=":"}{print $1,$7}'
You could cat /etc/passwd, extract the shell field with awk, filter out anything you don't want with grep -v, and then count the unique values. Like so:
cat /etc/passwd | awk -F ":" '{print $7}' | grep -v "whatever" | sort | uniq -c
On my Mac (which doesn't have any "real" users) this results in
10 with no shell, 1 with /bin/sh, 70 with /usr/bin/false and 1 with /usr/sbin/uucico
Presumably on a system with actual users there'd be /bin/sh, /bin/ksh, /bin/csh and /bin/bash quantities.
The information is usually just in /etc/passwd (but, as the answer above explains, it is given by getent passwd); on some systems it could be in a NIS/YP, LDAP or other database (see also PAM). The details are configurable in /etc/nsswitch.conf (see the nsswitch.conf(5) man page).
Also, the authorized login shells are listed in /etc/shells (see shells(5)); a shell's pathname must be listed there before users can switch to it with chsh (see chsh(1)).
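/etc/shells can also serve as a filter for the shell distribution: counting only accounts whose login shell is listed there drops most system accounts, whose shells are typically nologin or false (as long as those are not listed in /etc/shells). A sketch, with a hypothetical function name and file arguments defaulting to /etc/passwd and /etc/shells; comment lines and blank lines in the shells file are ignored.

```shell
#!/bin/sh
# authorized_shell_counts [PASSWD_FILE] [SHELLS_FILE]: count login shells,
# but only for accounts whose shell appears in SHELLS_FILE.
authorized_shell_counts() {
    awk -F: '
        # First file (the shells list): remember every non-comment, non-blank line
        FILENAME == ARGV[1] { if ($0 !~ /^#/ && $0 != "") ok[$0] = 1; next }
        # Second file (passwd format): print the shell if it is authorized
        ($7 in ok) { print $7 }
    ' "${2:-/etc/shells}" "${1:-/etc/passwd}" | sort | uniq -c | sort -rn
}
```

For example, getent passwd | authorized_shell_counts - reads the account database from standard input, since awk treats a - operand as stdin.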
