The following Linux command orders an Nginx access.log file by the clients making the most requests:
awk '{ print $1 }' access.log | uniq -c | sort -nr | more
What is the equivalent of this command in Windows PowerShell?
Get-Content access.log | ForEach-Object { $_.split()[0] -as [IPAddress] } | Sort-Object | Select-Object -Unique -ExpandProperty IPAddressToString
or
gc access.log |%{ $_.Split()[0] -as [IPAddress] } | sort -U |%{ "$_" }
Read the file
Process it line by line
Split on spaces and take the first element
Cast it to an IPAddress type so it will sort numerically
Sort and deduplicate one way or another
Get the string representation of the [IPAddress] back out
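If you also want the counts so the output is ordered by the busiest clients, as in the original uniq -c | sort -nr, a rough PowerShell sketch (skipping the [IPAddress] cast, since the ordering here is by count rather than by address) could be:
Get-Content access.log | ForEach-Object { $_.Split()[0] } | Group-Object | Sort-Object Count -Descending | Select-Object Count, Name
Group-Object does the work of uniq -c here, and Sort-Object Count -Descending corresponds to sort -nr.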
NB: your code doesn't do what you claim; you need to sort first, before uniq, because uniq only removes consecutive duplicates, not all duplicates.
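With that fix applied, the original Linux pipeline would become something like:
awk '{ print $1 }' access.log | sort | uniq -c | sort -nr | more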
Related
I would like to know the count of unique values in a column using Linux commands. The column has values like the ones below (the data is edited from the previous examples). I need to ignore the .M, .Q and .A at the end and just count the unique number of plants.
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL.M"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL.Q"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-WND-ALL.A"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL.Q"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL.A"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56841-WND-WT.Q"
"series_id":"ELEC.CONS_TOT.COW-GA-2.M"
"series_id":"ELEC.CONS_TOT.COW-GA-94.M"
I've tried this command, but I'm not able to ignore those suffixes:
cat ELEC.txt | grep 'series_id' | cut -d, -f1 | wc -l
For the sample above, the expected count is 6, but I get 8.
This should do the job:
grep -Po "ELEC.PLANT.*" FILE | cut -d. -f -4 | sort | uniq -c
You first grep for the "ELEC.PLANT." part
remove the trailing .Q/.A/.M with cut -d. -f -4
remove duplicates and count using sort | uniq -c
EDIT:
for the new data, it should only be necessary to do the following:
grep -Po "ELEC.*" FILE | cut -d. -f -4 | sort | uniq -c
When you have to do some counting, you can easily do it with awk. Awk is an extremely versatile tool and I strongly recommend having a look at it. Maybe start with Awk one-liners explained.
Having that said, you can easily do some conditioned counting here:
What you want is to count all unique lines which have series_id in them.
awk '/series_id/ && !($0 in a) { c++; a[$0] } END {print c}' ELEC.txt
This essentially states: if my line contains "series_id" and I have not yet stored the line in my array a, then this is the first time I encounter the line, so I increase the counter c by 1. At the END of the program, I print the count c.
Now you want to clean things up a bit. Your lines of interest essentially look like
"something":"something else"
So we are interested in something else, which is the 4th field when " is used as the field separator, and we only care about it when something (field 2) is series_id.
awk -F'"' '($2=="series_id") && (! $4 in a ) { c++; a[$4] } END {print c}'
Finally, you don't care about the last letter of the fourth field, so we need to make a small substitution:
awk -F'"' '($2=="series_id") { str=$4; gsub(/.$/,"",str); if (! str in a) {c++; a[str] } } END {print c}'
You could also rewrite this differently as:
awk -F'"' '($2 != "series_id" ) { next }
{ str=$4; gsub(/.$/,"",str) }
( str in a ) { next }
{ c++; a[str] }
END { print c }' ELEC.txt
My standard way to count unique values is to make sure I have the list of values (using grep and cut in your case), and then to add the following commands behind a pipe:
| sort -n | uniq -c
The sort does the sorting (numerically, because of -n), while uniq keeps the unique entries (the -c stands for "count").
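For this file, one complete sketch (swapping the cut for a small sed that strips the .M/.Q/.A suffix, since the number of .-separated fields varies between lines) might look like:
grep 'series_id' ELEC.txt | cut -d'"' -f4 | sed 's/\.[MQA]$//' | sort | uniq -c
Piping the whole thing through | wc -l instead would give the single number of unique plants (6 for the sample).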
Do this: cat ELEC.txt | grep 'series_id' | cut -f1-4 -d. | sort -u | wc -l
-f1-4 keeps only the first four .-separated fields, dropping everything from the fourth . onward (the trailing .M/.Q/.A), and sort -u (rather than plain uniq) removes duplicates even when they are not adjacent.
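For example, on one of the sample lines:
echo '"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL.M"' | cut -f1-4 -d.
prints "series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL, so the .M/.Q/.A variants of the same plant collapse into one value before counting.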
Here is a possible solution using awk:
awk 'BEGIN{FS="[:.\"]+"} /^"series_id":/{print $6}' \
ELEC.txt |sort -n |uniq -c
The output for the sample you posted will be something like this:
1 56841-WND-WT
2 56855-ALL-ALL
1 56855-WND-ALL
2 56868-LFG-ALL
If you need the entire string, you can print the other fields as well:
awk 'BEGIN{FS="[:.\"]+"; OFS="."} /^"series_id":/{print $3,$4,$5,$6}' \
ELEC.txt |sort -n | uniq -c
And the output will be something like this:
1 ELEC.PLANT.CONS_EG_BTU.56841-WND-WT
2 ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL
1 ELEC.PLANT.CONS_EG_BTU.56855-WND-ALL
2 ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL
The command
Get-VM | Where {$_.PowerState -eq "PoweredOn"} | Select Name,VMHost | Where {$_ -match "abc" -or $_ -match "def"} | foreach{$_.Name} | Out-File output.txt
writes a list to output.txt in which only the Name column is printed, in the form:
a
b
c
...
Now what I want to achieve is to append ,xxx to each line in some sort of loop, so that I get the following:
a,xxx
b,xxx
c,xxx
...
I tried to append the string, but this doesn't seem to work:
Get-VM | Where {$_.PowerState -eq "PoweredOn"} | Select Name,VMHost | Where {$_ -match "abc" -or $_ -match "def"} | foreach{$_.Name} | Out-File output.txt | Add-Content output.txt ",xxx"
I'm really not familiar with PowerShell, and I did not find a way to concatenate ,xxx.
In my case it is essential to do the concatenation within a loop, not with a file operation afterwards.
Instead of foreach { $_.Name }, write foreach { "$($_.Name),xxx" }
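Putting that into your original pipeline, the whole command would look something like this (untested sketch, based on your command):
Get-VM | Where {$_.PowerState -eq "PoweredOn"} | Select Name,VMHost | Where {$_ -match "abc" -or $_ -match "def"} | foreach{ "$($_.Name),xxx" } | Out-File output.txt
The ",xxx" is appended to each name inside the loop itself, so no separate file operation is needed afterwards.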
Everyone, I am dealing with a log file which has about 5 million lines, so I use awk in a Linux shell.
I have to extract the domains and get the top 100 from the log, so I wrote this:
awk '{print $19}' $1 |
awk '{ split($0, string, "/");print string[1]}' |
awk '{domains[$0]++} END{for(j in domains) print domains[j], j}' |
sort -n | tail -n 100 > $2
It runs in about 13 seconds.
Then I changed the script to this:
awk '{split($19, string, "/"); domains[string[1]]++}
END{for(j in domains) print domains[j], j}' $1 |
sort -n | tail -n 100 > $2
It runs in about 21 seconds.
Why?
I thought a single awk command would reduce the total amount of computation, since it reads each line only once, but the time increased...
So, if you know the answer, please tell me.
When you pipe commands, they run in parallel for as long as there is data flowing through the pipe.
So my guess is that in the first version the work is distributed among your CPU cores, while in the second one all the work is done by a single core.
You can verify this with top (or, better, htop).
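To compare the two directly, a rough sketch (with access.log standing in for $1 and /dev/null for $2) is to time each variant while watching htop in another terminal:
time (awk '{print $19}' access.log | awk '{split($0, string, "/"); print string[1]}' | awk '{domains[$0]++} END{for(j in domains) print domains[j], j}' | sort -n | tail -n 100 > /dev/null)
time (awk '{split($19, string, "/"); domains[string[1]]++} END{for(j in domains) print domains[j], j}' access.log | sort -n | tail -n 100 > /dev/null)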
Out of curiosity, is this faster? (untested):
cut -f 19 -d' ' $1 | cut -f1 -d'/' | sort | uniq -c | sort -nr | head -n 100 > $2
I have a JAR file, and I need to execute the files in it on Linux.
So I need to get the result of the unzip -l command line by line.
I have managed to extract the file names with this command:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 ;
But I can't figure out how to obtain the file names one after another in order to execute them.
How can I do it, please?
Thanks a lot.
If all you need is the first row of a column, add a pipe and get the first line using head -1.
So your one-liner will look like:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 |head -1;
That will give you the first line.
Now, combine head and tail to get the second line:
unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 |head -2 | tail -1;
to get the second line.
But from a scripting point of view this is not a good approach. What you need is a loop, as below:
for class in `unzip -l el-api.jar | awk '{print $NF}' | grep javax/el/[A-Za-Z] | cut -d "/" -f3`; do echo $class; done;
You can replace echo $class with whatever command you wish, and use $class to get the current class name.
HTH
Here is my attempt, which also takes into account Daddou's request to remove the .class extension:
unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
echo " $baseName"
done
Notes:
The awk command also handles the tasks of grep and cut
The awk command also handles the removal of the .class extension
The result of the awk command is piped into the while read... command
baseName represents the name of the class file, with the .class extension removed
Now, you can do something with that $baseName
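For instance, if the end goal is to run each class from the JAR, a hypothetical loop body (assuming the classes live in the com.tests package and each has a main method) could be:
unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
    java -cp package.jar "com.tests.$baseName"
done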
I have a small file (100 lines) of web requests (Apache standard format). There are multiple requests from the same clients. I ONLY want a list of requests (lines) in my file that come from a UNIQUE IP, keeping the latest entry for each.
I have this so far:
/home/$: cat all.txt | awk '{ print $1}' | sort -u | "{print the whole line ??}"
The above gives me the IPs (about 30, which is right); now I need to have the rest of the line (the request) as well.
Use an associative array to keep track of which IPs you've found already:
awk '{
if (!found[$1]) {
print;
found[$1]=1;
}
}' all.txt
This will print the first line for each IP. If you want the last one then:
awk '
{ found[$1] = $0 }
END {
for (ip in found)
print found[ip]
}
' all.txt
I hate that uniq doesn't come with the same options as sort, or that sort cannot do what it says. I reckon this should work[1]:
tac all.txt | sort -fb -k1V -u
but alas, it doesn't;
Therefore, it seems we're stuck doing something silly like:
cat all.txt | awk '{ print $1}' | sort -u | while read ip
do
tac all.txt | grep "^$ip" -h | head -1
done
Which is really inefficient, but 'works' (I haven't tested it: modulo typos, then).
[1] according to the man-page
The following should work:
tac all.txt | sort -f -k1,1 -us
This takes the file in reverse order and does a stable sort using the first field, keeping only unique items.