Piping, sorting, then counting - Linux

I have this task on hand.
Finish this task using a combined command (a.k.a. piping) using four individual commands. Task: Find all lines containing the keyword "wonderful" from file "day.txt" and sort the lines alphabetically. Pick the first fifteen of the sorted lines and count how many characters there are in the fifteen lines.
I know the first command should be grep "wonderful" day.txt but then I can't figure out what to do afterwards.

Something like this would do it.
grep "wonderful" day.txt | sort | head -15 | wc -c
References:
sort
head
wc
Piping in Linux

Related

How to display the number of times a word is repeated after a common pattern

I have a file which has N lines.
For example:
This/is/workshop/1
This/is/workshop/2
This/is/workshop/3
This/is/workshop/4
This/is/workshop/5
How can I get the result below using the uniq command:
This/is/workshop/ =5
Okay, so there are a couple of tools you can use here. Familiarize yourself with grep, cut, and uniq. My process for doing something like this may not be ideal, but given your original question I'll try to tailor the process to the lines in the file you've given.
First you'll want to grep the file for the relevant strings. Then you can pass that through cut, specifying the delimiter and the fields you want to keep. Lastly, you can pipe this through to uniq to count it.
Example:
Contents of file.txt
This/is/workshop/1
This/is/workshop/2
This/is/workshop/3
This/is/workshop/4
This/is/workshop/5
Use grep, cut and uniq
$ grep "This/is/workshop/" file.txt | cut -d/ -f1-3 | uniq -c
5 This/is/workshop
To specify the delimiter in cut, you use the -d flag followed by the delimiter. Each field is whatever sits between delimiters, numbered starting at 1; here -f1-3 keeps the first three. Then just pipe it through to uniq -c to get the count you are after.
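If you want the exact "This/is/workshop/ =5" format from the question, one way (a sketch, not the only option) is to post-process the uniq -c output with awk, relying on uniq -c printing the count followed by the text:
grep "This/is/workshop/" file.txt | cut -d/ -f1-3 | uniq -c | awk '{print $2"/ ="$1}'
# prints: This/is/workshop/ =5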

How to capture running log using bash script? [Case Closed]

I'm new to bash scripting. There's something I want to know about capturing a logfile using a bash script.
Let's say there is a server which stores a logfile every hour, with the filename format below.
file[20160509_130000].log
The logfile has detailed information like this.
13:00:00 INFO [file.chdev130] Event: ADIEVN_PLAY_DONE, Finished , nbytes=16360
13:00:00 INFO [file.chdev39] adiCollectDigits() success
My question is: how can I read the running log (or the one from an hour before) and extract a specific parameter (e.g. "Event") to a new file using bash scripting?
Can someone teach me how to do it? Thanks.
Update
Here is the flow that I want (for now I want to know how the first point works):
Get the running log or one hour before.
Read a 15-minute interval (13:00:00 - 13:15:00).
grep the parameter (e.g. "Event") in that interval.
Count the parameter.
Store it to another file.
SOLVED
Here is the solution in case someone needs it.
List all the files by timestamp using ls -t, then pipe it
Use grep -v ^d (I still don't know the exact explanation for ^d), pipe again
Display the first few lines with head
So the result is,
ls -t *.log | grep -v ^d | head -1 (for display the running log)
ls -t *.log | grep -v ^d | head -2 | tail -1 (for display the one log before the running log)
Hope it'll help. Thanks
== Case Closed ==
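Tying the steps above together, a minimal sketch (assuming the *.log files live in the current directory, "Event" is the parameter of interest, and the events.txt output name is made up):
latest=$(ls -t *.log | head -1)               # the running log
previous=$(ls -t *.log | head -2 | tail -1)   # the log from one hour before
grep -c "Event" "$previous"                   # count lines containing the parameter
grep "Event" "$previous" > events.txt         # store the matching lines in another file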
tail -f /path-to-logfile will allow you to read a file continuously until you cancel tail execution.
grep "^13:" /path-to-logfile will show you all strings in the file, which starts from "13:", in our case you'll get every record for 13-th hour.
grep "Event" /path-to-logfile will show you all strings with "Event" in them, i think, you got the idea about grep already. )
You can figure out the current or the previous log filename using date, using your logfile name convention:
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
will give you the previous filename (note that the -v-1H offset syntax is BSD date; with GNU date the equivalent is -d '1 hour ago'). Omit the offset to get the current one.
For the 15-minute intervals, you may use regexps like '^\d\d:(0\d|1[0-4])' for the 00:00-14:59 interval, '^\d\d:(1[5-9]|2\d)' for 15:00-29:59, and so on.
For example, in the first regex, ^ matches the beginning of the line, \d\d: matches two digits and a colon, and (0\d|1[0-4]) matches either 0 followed by any digit, or 1 followed by a digit from 0 to 4. In the second regex, (1[5-9]|2\d) matches 1 followed by a digit from 5 to 9, or 2 followed by any digit.
Then you grep -Po '(?<=Event: ).+(?=,)' to extract the event type from each line, assuming the event type is always followed by a comma. That regexp greedily matches as many characters as it can (at least one) between the strings Event: and , (Event: and the , themselves are not part of the match; that is what the lookbehind and lookahead are for). Then use sort | uniq -c to count the number of entries for each distinct event.
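Run against the sample line from the question, the greedy match captures everything up to the last comma; a lazy quantifier (.+?) is a possible variation, not part of the original answer, if you only want the part before the first comma:
echo '13:00:00 INFO [file.chdev130] Event: ADIEVN_PLAY_DONE, Finished , nbytes=16360' | grep -Po '(?<=Event: ).+(?=,)'
# prints: ADIEVN_PLAY_DONE, Finished  (greedy, note the trailing space before the last comma)
echo '13:00:00 INFO [file.chdev130] Event: ADIEVN_PLAY_DONE, Finished , nbytes=16360' | grep -Po '(?<=Event: ).+?(?=,)'
# prints: ADIEVN_PLAY_DONE  (lazy, stops at the first comma)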
So the resulting script would look something like
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
grep -P '^\d\d:(0\d|1[0-4])' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_00-14" # entries for the first 15 minutes (braces keep _00-14 out of the variable name)
grep -P '^\d\d:(1[5-9]|2\d)' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_15-29" # entries for the second 15 minutes
grep -P '^\d\d:(3\d|4[0-4])' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_30-44" # entries for the third 15 minutes
grep -P '^\d\d:(4[5-9]|5\d)' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_45-59" # entries for the fourth 15 minutes
for the last hour log.
Another approach is to use logtail with cron entries to get last log entries, instead of grepping. You can set up your script to be run by cron at 00, 15, 30 and 45 minutes each hour, determine the log filename inside it and logtail the file like logtail -f "$fname". The catch here would be that when running at 00 minutes, you'd need to use the -v-1H switch for date, and this approach is not as accurate as grepping out the times.
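A hedged sketch of that cron approach, reusing the same extraction pipeline (the script path /usr/local/bin/collect_events.sh and the output file /tmp/event_counts.txt are made-up names, and logtail is assumed to be installed):
# crontab entry: run at minutes 0, 15, 30 and 45 of every hour
0,15,30,45 * * * * /usr/local/bin/collect_events.sh
# contents of collect_events.sh:
#!/bin/bash
fname="file[$(date '+%Y%m%d_%H0000')].log"   # current hour's log; add the -v-1H offset when run at minute 0
logtail -f "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c >> /tmp/event_counts.txt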

diff command to get number of different lines only

Can I use the diff command to find out how many lines two files differ in?
I don't want the contextual difference, just the total number of lines that are different between two files. Best if the result is just a single integer.
diff can do the first part of the job but not the counting; wc -l does the rest:
diff -y --suppress-common-lines file1 file2 | wc -l
Yes you can, and in true Linux fashion you can use a number of commands piped together to perform the task.
First you need to use the diff command, to get the differences in the files.
diff file1 file2
This will output a list of changes. The ones you're interested in are the lines prefixed with a '>' symbol.
You can use the grep tool to filter these out as follows:
diff file1 file2 | grep "^>"
Finally, once you have a list of the changes you're interested in, you simply use the wc command in line-count mode (-l) to count the number of changes.
diff file1 file2 | grep "^>" | wc -l
And you have a perfect example of the philosophy that Linux is all about.
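Note that "^>" only counts lines present in file2; if you also want lines that exist only in file1 (prefixed with '<'), a hedged variation is to match both prefixes and let grep -c do the counting in place of wc -l:
diff file1 file2 | grep -c "^[<>]"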

How to make grep stop searching in each file after N lines?

It's best to describe the use by a hypothetical example:
Searching for some useful header info in a big collection of stored emails (each email in a separate file), e.g. doing stats on the top mail client apps used.
Normally with grep you can specify -m to stop at the first match, but what if an email does not contain X-Mailer, or whatever it is we are looking for in the header? grep will scan through the whole email. Since most headers are under 50 lines, performance could be improved by telling grep to search only the first 50 lines of any file. I could not find a way to do that.
I don't know if it would be faster but you could do this with awk:
awk '/match me/{print;exit}FNR>50{exit}' *.mail
will print the first line matching match me if it appears in the first 50 lines. (If you wanted to print the filename as well, grep style, change print; to print FILENAME ":" $0;)
awk doesn't have any equivalent to grep's -r flag, but if you need to recursively scan directories, you can use find with -exec:
find /base/dir -iname '*.mail' \
-exec awk '/match me/{print FILENAME ":" $0;exit}FNR>50{exit}' {} +
You could solve this problem by piping the output of head -n 50 into grep, but that would undoubtedly be slower since you'd have to start two new processes (one head and one grep) for each file. You could do it with just one head and one grep, but then you'd lose the ability to stop matching a file as soon as you find the magic line, and it would be awkward to label the lines with the filename.
You can do something like this:
head -50 <mailfile> | grep <your keyword>
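Applied to the use case in the question (stats on top mail clients), a sketch building on this idea; the *.mail glob and the X-Mailer header name are assumptions:
for f in *.mail; do head -n 50 "$f"; done | grep -i '^X-Mailer:' | sort | uniq -c | sort -rn
# prints each distinct X-Mailer header line with its count, most frequent first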
Try this command:
for i in *
do
head -n 50 "$i" | grep -H --label="$i" pattern
done
output:
1.txt: aaaaaaaa pattern aaaaaaaa
2.txt: bbbb pattern bbbbb
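If you also want grep to stop at the first match within each file (the -m option mentioned in the question), a hedged variant of the same loop:
for i in *
do
head -n 50 "$i" | grep -m 1 -H --label="$i" pattern
done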
ls *.txt | xargs head -<N lines> | grep 'your_string'

How to crop (cut) text files based on starting and ending line numbers in Cygwin?

I have a few log files, around 100 MB each.
Personally I find it cumbersome to deal with such big files. I know that the log lines that are interesting to me are only between 200 to 400 lines or so.
What would be a good way to extract the relevant log lines from these files, i.e. I just want to pipe the range of line numbers to another file.
For example, the inputs are:
filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number: 39276
Is there a command that I can run in Cygwin to cat out only that range in that file? I know that if I can somehow display that range in stdout then I can also pipe it to an output file.
Note: Adding the Linux tag for more visibility, but I need a solution that works in Cygwin. (Usually Linux commands do work in Cygwin.)
Sounds like a job for sed:
sed -n '8,12p' yourfile
...will send lines 8 through 12 of yourfile to standard out.
If you want to prepend the line number, you may wish to use cat -n first:
cat -n yourfile | sed -n '8,12p'
You can use wc -l to figure out the total # of lines.
You can then combine head and tail to get at the range you want. Let's assume the log is 40,000 lines and you want lines 38,438 through 39,276: that is the last 40,000 - 38,438 + 1 = 1,563 lines, and of those you want the first 39,276 - 38,438 + 1 = 839. So:
tail -1563 MyHugeLogFile.log | head -839 | ....
Or there's probably an easier way using sed or awk.
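As a sketch of the awk variant hinted at above, using the line numbers from the question (the output filename is made up):
awk 'NR==38438, NR==39276' MyHugeLogFile.log > extracted-range.log
# the range pattern prints every line from 38438 through 39276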
I saw this thread when I was trying to split a file into files of 100,000 lines each. A better solution than sed for that is:
split -l 100000 database.sql database-
It will give files like:
database-aaa
database-aab
database-aac
...
And if you simply want to cut part of a file - say from line 26 to 142 - and write it to a new file:
cat file-to-cut.txt | sed -n '26,142p' >> new-file.txt
How about this:
$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
It uses tail to output from the 10,000th line and onwards and then head to only keep 10 lines.
The same (almost) result with sed:
$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
This one has the advantage of allowing you to input the line range directly.
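Building on that, a parametrized sketch using the line numbers from the question; the extra ${end}q command makes sed quit after the last wanted line instead of reading the rest of a huge file (the output filename is made up):
start=38438
end=39276
sed -n "${start},${end}p;${end}q" MyHugeLogFile.log > extracted-range.log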
If you are interested only in the last X lines, you can use the "tail" command like this.
$ tail -n XXXXX yourlogfile.log >> mycroppedfile.txt
This will save the last XXXXX lines of your log file to a new file called "mycroppedfile.txt"
This is an old thread but I was surprised nobody mentioned grep. The -A option allows specifying a number of lines to print after a search match and the -B option includes lines before a match. The following command would output 10 lines before and 10 lines after occurrences of "my search string" in the file "mylogfile.log":
grep -A 10 -B 10 "my search string" mylogfile.log
If there are multiple matches within a large file the output can rapidly get unwieldy. Two helpful options are -n which tells grep to include line numbers and --color which highlights the matched text in the output.
If there is more than one file to be searched, grep allows multiple files to be listed, separated by spaces. Wildcards can also be used. Putting it all together:
grep -A 10 -B 10 -n --color "my search string" *.log someOtherFile.txt
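As a small aside, grep's -C option sets both context counts at once, so the command above can also be written as:
grep -C 10 -n --color "my search string" *.log someOtherFile.txt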
