Parsing a string with grep - Linux

I need some help with parsing a string in Linux.
I have a string:
[INFO] Total time: 2 minutes 8 seconds
and want to get only
2 minutes 8 seconds

Using grep:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | grep -o '[[:digit:]].*$'
2 minutes 8 seconds
Or sed:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | sed 's/.*: //'
2 minutes 8 seconds
Or awk:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | awk -F': ' '{print $2}'
2 minutes 8 seconds
Or cut:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | cut -d: -f2
 2 minutes 8 seconds
(note that cut keeps the leading space after the colon)
And then read sed & awk, Second Edition.

The sed and perl options do work, but in this trivial case, I'd prefer
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2
If you have something against spaces, you can just use
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2 | xargs
(bear in mind that xargs also squeezes internal whitespace and trips over quote characters, so it is only a quick trick)
or even...
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2 | cut -c2-
PS. Trivia: you could do this with grep alone only if grep implemented positive lookbehind, like this: egrep -o '(?<=: ).*'. Unfortunately, neither POSIX extended regex nor GNU extended regex implements lookbehind (http://www.regular-expressions.info/refflavors.html).
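That said, GNU grep's -P switch (PCRE mode, where available) does accept lookbehind:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | grep -oP '(?<=: ).*'
2 minutes 8 seconds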

If the line prefix is always the same, simply use sed and replace the prefix with an empty string:
sed 's/\[INFO\] Total time: //'
Assuming that the time is always the last thing in a line after a colon, use the following regex (replace each line with everything after the colon):
sed 's/^.*: \(.*\)$/\1/'
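For example, run against the sample line:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | sed 's/^.*: \(.*\)$/\1/'
2 minutes 8 seconds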

If you prefer awk, then it is quite simple:
echo "[INFO] Total time: 2 minutes 8 seconds" | awk -F": " '{ print $2 }'

Use sed or perl:
echo "[INFO] Total time: 2 minutes 8 seconds" | sed -e 's/^\[INFO\] Total time:\s*//'
echo "[INFO] Total time: 2 minutes 8 seconds" | perl -pe "s/^\[INFO\] Total time:\s*//;"

If the text is coming in on a pipe, you can grep for the line you want and use cut with a delimiter to drop everything before the part you want:
grep INFO | cut -f2 -d:
If the info is in a file, grep the file directly:
grep INFO somefilename | cut -f2 -d:
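For example, on the string from the question (the leading space after the colon survives; trim it with xargs or cut -c2- as shown earlier):
$ echo '[INFO] Total time: 2 minutes 8 seconds' | grep INFO | cut -f2 -d:
 2 minutes 8 seconds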


Pipe each row of csv into bash command [duplicate]

I have a single column CSV file with no header and I want to iteratively find the value of each row and count the number of times it appears in several files.
Something like this:
for i in file.csv:
zcat *json.gz | grep i | wc -l
However, I don't know how to iterate through the csv and pass the values forward
Imagine that file.csv is:
foo,
bar
If foo exists 20 times in *json.gz and bar exists 30 times in *json.gz, I would expect the output of my command to be:
20
30
Here is the solution I found:
while IFS=',' read -r column; do
    count=$(zgrep -o "$column" *json.gz | wc -l)
    echo "$column,$count"
done < file.csv
You can achieve that with a single grep operation by treating file.csv as a patterns file (one pattern per line):
grep -f file.csv -oh *.json | wc -l
-o - print only the matched parts
-h - suppress file names in the output
Note that this prints one grand total across all patterns rather than a separate count per pattern, and for the gzipped files in the question you would use zgrep instead of grep.
You can iterate through the output of cat in a command substitution:
for i in `cat file.csv` # iterates through all the rows in file.csv (word-splits on whitespace)
do echo "My value is $i"; done;
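A variant that does not word-split the values (useful if a row ever contains spaces) is a plain while read loop; a minimal sketch:
while IFS= read -r i; do
    echo "My value is $i"
done < file.csv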
Using ChatGPT :), try this:
#!/bin/bash
# Define the name of the CSV file
csv_file="path/to/file.csv"

# Count how often each distinct value appears in the CSV file
# (uniq -c only groups adjacent duplicates, so sort first;
#  the file is comma-separated, so tell cut about the delimiter)
cut -d',' -f1 "$csv_file" | sort | uniq -c | while read -r count val
do
    # Loop through each file and count occurrences of the value in it
    for file in path/to/file1 path/to/file2 path/to/file3
    do
        file_count=$(grep -o "$val" "$file" | wc -l)
        echo "$val appears $count times in $csv_file and $file_count times in $file"
    done
done

Obtaining the total number of matches for multiple patterns using the grep command

I have a file in Linux that contains these strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the count of each string (e.g. #Ending, #Starting, #CALLTMA). For my example I need to obtain:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtain this output by executing 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I tried running this command:
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returns the total number of matches across all patterns, not the count per pattern. I appreciate your help.
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (or -nr for descending order).
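On the sample file from the question, this produces:
$ sort myfile.txt | uniq -c
      2 CALLTMA
      4 Ending
      3 Starting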
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just loop over the argument array instead, e.g. for i in "$@"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4

wc -l is NOT counting the last line of the file if it does not have an end-of-line character

I need to count all the lines of a Unix file. The file has 3 lines, but wc -l gives a count of only 2.
I understand that it is not counting the last line because it does not have an end-of-line character.
Could anyone please tell me how to count that line as well?
grep -c returns the number of matching lines. Just use an empty string "" as your matching expression:
$ echo -n $'a\nb\nc' > 2or3.txt
$ cat 2or3.txt | wc -l
2
$ grep -c "" 2or3.txt
3
It is better to have all lines end with an EOL \n in Unix files. You can do:
{ cat file; echo ''; } | wc -l
(though note that this overcounts by one when the file already ends with a newline). Or use this awk:
awk 'END{print NR}' file
The awk approach gives the correct line count regardless of whether the last line in the file ends with a newline or not.
awk will make sure that, in its output, each line it prints ends with a new line character. Thus, to be sure each line ends in a newline before sending the line to wc, use:
awk '1' file | wc -l
Here, we use the trivial awk program that consists solely of the number 1. awk interprets this cryptic statement to mean "print the line" which it does, being assured that a trailing newline is present.
Examples
Let us create a file with three lines, each ending with a newline, and count the lines:
$ echo -n $'a\nb\nc\n' >file
$ awk '1' file | wc -l
3
The correct number is found.
Now, let's try again with the last new line missing:
$ echo -n $'a\nb\nc' >file
$ awk '1' file | wc -l
3
This still provides the right number. awk automatically corrects for a missing newline but leaves the file alone if the last newline is present.
Respect
I respect the answer from John1024 and would like to expand upon it.
Line Count function
I find myself comparing line counts A LOT, especially from the clipboard, so I have defined a bash function. I'd like to modify it to show the filenames and, when passed more than one file, a total; however, it hasn't been important enough for me to do so far.
# semicolons used because this is condensed to 1 line in my ~/.bash_profile
function wcl(){
    if [[ -z "${1:-}" ]]; then
        set -- /dev/stdin "$@";
    fi;
    for f in "$@"; do
        awk 1 "$f" | wc -l;
    done;
}
Counting lines without the function
# Line count of the file
$ cat file_with_newline | wc -l
3
# Line count of the file
$ cat file_without_newline | wc -l
2
# Line count of the file unchanged by cat
$ cat file_without_newline | cat | wc -l
2
# Line count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -l
3
# Line count of the file changed by only the first call to awk
$ cat file_without_newline | awk 1 | awk 1 | awk 1 | wc -l
3
# Line count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -l
3
Counting characters (why you don't want to put a wrapper around wc)
# Character count of the file
$ cat file_with_newline | wc -c
6
# Character count of the file unchanged by awk because it ends with a newline character
$ cat file_with_newline | awk 1 | awk 1 | awk 1 | wc -c
6
# Character count of the file
$ cat file_without_newline | wc -c
5
# Character count of the file changed by awk
$ cat file_without_newline | awk 1 | wc -c
6
Counting lines with the function
# Line count function used on stdin
$ cat file_with_newline | wcl
3
# Line count function used on stdin
$ cat file_without_newline | wcl
3
# Line count function used on filenames passed as arguments
$ wcl file_without_newline file_with_newline
3
3

tr "[1-9]" "['01'-'09']" not working properly

I'm trying to cut only the date part from the output of ls -lrth | grep TRACK:
-rw-r--r-- 1 ins ins 0 Dec 3 00:00 TRACK_1_20121203_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 3 00:00 TRACK_0_20121203_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 13 15:10 TRACK_9_20121213_01010014.LOG
-rw-r--r-- 1 ins ins 0 Dec 13 15:10 TRACK_8_20121213_01010014.LOG
But, doing this:
ls -lrth | grep TRACK | tr "\t" " " | cut -d" " -f 9
only gives me the dates that are double digits, and spaces for the single-digit ones:
13
13
So I tried the tr command, to translate all single-digit dates to double digits:
ls -lrth | grep TRACK | tr "\t" " " | tr "[1-9]" "['01'-'09']" | cut -d" " -f 9
But it's giving some weird results and evidently doesn't serve my purpose. Any ideas on how to get the correct output?
Don't parse ls output.
ls is a tool for interactively looking at file information. Its output is formatted for humans and will cause bugs in scripts. Use globs or find instead. Understand why: http://mywiki.wooledge.org/ParsingLs
I recommend this way:
If you want the date and the file path:
find . -name 'TRACK*' -printf '%t %p\n'
If you want only the date:
find . -name 'TRACK*' -printf '%t\n'
(%t prints the modification time, which is what ls -l shows; %a would print the access time.)
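If you want a sortable timestamp rather than the default long format, GNU find lets you pick strftime-style fields with %T (a sketch, assuming GNU find):
find . -name 'TRACK*' -printf '%TY-%Tm-%Td %p\n'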
You could try another approach with something like
find . -name 'TRACK*' -exec stat -c %y {} \; | sort
You can add something like | cut -f1 -d' ' if you only need the date.
I guess this does suffice:
ls -lhrt | grep TRACK | awk '{print $6, $7, $8}'
That kind of substitution is better handled with sed:
ls -lrth | grep TRACK | sed 's/ \+/ /g;s/ \([0-9]\) / 0\1 /g' | cut -d" " -f 7
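With the listing above, this prints the zero-padded day field from each line:
03
03
13
13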
As already said, never parse the output of ls!
Since you only want the modification time, the command date has a cool option for that: option -r (man date for more info).
Hence, you probably want this instead of your line:
for i in TRACK*; do date -r "$i"; done
I don't know how you want the format of the date, so play with the options, e.g.,
for i in TRACK*; do date -r "$i" "+%D"; done
(the formats are in man date).
Use stat to get information about a file.
Also, tr only does one-to-one character translation. It won't replace one-character sequences with two-character ones.
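A quick illustration of the one-to-one limit, and of printf as the usual zero-padding tool:
$ echo 3 | tr '1-9' 'A-I'    # one-to-one: each digit maps to one letter, 3 -> C
C
$ printf '%02d\n' 3          # two-digit padding needs printf, not tr
03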

How to do a mathematical operation on the first fields and assign the result to a variable in Linux

I have a file containing just 2 numbers, one number on each line:
4.1865E+02
4.1766E+02
I know it's something like BHF = ($1 from line 1 - $1 from line 2), but I can't find the exact command.
How can I do a mathematical operation on them and save the result to a variable?
PS: These two numbers were obtained using:
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
awk ' {print $13} ' nodout15 > 15
mv 15 nodout15
sed -i -e '/^$/d' nodout15
sed -i -e 's/^[ \t]*//;s/[ \t]*$//' nodout15
sed -n '/^[0-9]\{1\}/p' nodout15 > 15
mv 15 nodout15
tail -2 nodout15 > 15
mv 15 nodout15
After all this I have these two numbers, but now I am not able to do the arithmetic. If possible, please tell me a shorter way to do it on the spot rather than all this juggling. nodout is a file whose lines have varying numbers of columns, and I am only interested in the 13th column. Since not all lines make it into the daughter file, the empty lines are deleted first, then only the lines starting with a number are kept, and then the last two lines, as they show the final state. The difference between them will drive a conditional statement, so I need to save it in a variable.
Regards.
awk (with RS='' the whole file is read as one record, so the two lines become fields $1 and $2):
$ BHF=`awk -v RS='' '{print $1-$2}' input.txt`
$ echo $BHF
0.99
bc (printf converts the E-notation numbers to plain decimals, which bc cannot parse itself, and joins them into a subtraction expression for bc to evaluate):
$ BHF=`cat input.txt | xargs printf '%f-%f\n' | bc`
$ echo $BHF
.990000
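Since the difference is meant to drive a conditional, remember that bash's own comparisons are integer-only; bc can evaluate the float test as well (a sketch, with 0.5 as a made-up threshold):
if [ "$(echo "$BHF > 0.5" | bc)" -eq 1 ]; then
    echo "difference exceeds threshold"
fi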
