How can I get the length of each output line of grep - linux

I am very new to bash scripting.
I have a network trace file I want to parse. Part of the trace file is (two packets):
[continues...]
+---------+---------------+----------+
05:00:00,727,744 ETHER
|0
|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|
+---------+---------------+----------+
05:00:00,727,751 ETHER
|0
|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|56|00|00|3a|01|
[continues...]
For each packet, I want to print the time stamp, and the length of the packet (the hex values coming on the next line after |0 header) so the output will look like:
05:00:00.727744 20 bytes
05:00:00.727751 24 bytes
I can get the line with time stamp and the packets separately using grep in bash:
times=$(grep '..\:..\:' $fileName)
packets=$(grep '..|..|' $fileName)
But I can't work with the separate output lines after that. The whole result is concatenated in the two variables "times" and "packets". How can I get the length of each packet?
P.S. a good reference that really explains how to do bash programming, rather than just doing examples would be appreciated.

Okay, with plain old shell...
You can get the length of the line like this:
line="|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"
wc -c<<<$line
62
There are sixty-two characters in that line. Think of each byte as |00, where 00 can be any two hex digits. On top of that, there's an extra | on the end, and wc -c includes the NL on the end.
So, if we take the value of wc -c and subtract 2, we get 60. If we divide that by 3, we get 20, which is the number of bytes.
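The same arithmetic can be checked without wc. This is only a sketch: it relies on Bash's ${#line}, which gives the string length without the trailing newline, so only the closing | has to be subtracted. The function name byte_count is made up for illustration.

```shell
#!/bin/bash
# byte_count is a hypothetical helper: number of "|xx" groups in a packet line.
byte_count() {
    local line=$1
    # ${#line} does not include a newline, so subtract only the closing '|'.
    echo $(( (${#line} - 1) / 3 ))
}

byte_count "|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"   # prints 20
```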
Okay, now we need a little loop, figure out the various lines, and then parse them:
#! /bin/bash
while read line
do
if [[ $line =~ ^[[:digit:]]{2} ]]
then
echo -n "${line% *}"
elif [[ $line =~ ^\|[[:digit:]]{2} ]]
then
length=$(wc -c<<<$line)
((length-=2))
((length/=3))
echo "$length bytes"
fi
done < test.txt
There's a pure Bash solution to your problem!
You're a beginning Bash programmer, and you have no idea what's going on...
Let's take this one step at a time:
A common way to loop through a file in Bash is a while read loop, which combines while with read:
while read line
do
echo "My line is '$line'"
done < test.txt
Each line in test.txt is being read into the $line shell variable.
Let's take the next one:
if [[ $line =~ ^[[:digit:]]{2} ]]
This is an if statement. Prefer the [[ ... ]] brackets in Bash: unquoted variables inside them are not subject to word splitting or filename expansion. Plus, they have a bit more power, such as the =~ operator used here.
The =~ is a regular expression match. The [[:digit:]] matches any digit. The ^ anchors the regular expression to the beginning of the line, and {2} means I want exactly two of these. This says if I match a line that starts with two digits (which is your timestamp line), execute this if clause.
${line% *} is a parameter expansion. The % removes the shortest match of the glob pattern " *" (a space followed by anything) from the right-hand end of $line. I use this to remove the ETHER from my line. The -n tells echo not to print a trailing NL.
Let's take my elif which is an else if clause.
elif [[ $line =~ ^\|[[:digit:]]{2} ]]
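Again, I am matching a regular expression. This one starts (the ^) with a |. I have to put a backslash in front because | is a special regular expression character (alternation), and the \ makes it a literal pipe. That's followed by two digits. Note this skips |0 but catches |00. Also note that [[:digit:]] matches only 0-9, so a packet whose first byte contains a hex letter (|a0, for example) would be skipped; [[:xdigit:]] is the safer character class for real traces.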
Now, we have to do some calculations:
length=$(wc -c<<<$line)
The $(...) says to execute the enclosed command and substitute its output back into the line. The wc -c counts the bytes, and <<<$line supplies what we're counting. This gave us 62. We have to subtract 2, then divide by 3. That's the next two lines:
((length-=2))
((length/=3))
The ((...)) allows me to do integer based math. The first subtracts 2 from $length and the next divides it by 3. Now, I can echo this out:
echo "$length bytes"
And that's our pure Bash answer to this question.
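For comparison, here is a variant that avoids the external wc call altogether. Treat it as a sketch under a few assumptions: parse_trace is a made-up name, the trace arrives on standard input, and the packet test uses [[:xdigit:]] rather than [[:digit:]] so that lines starting with a hex letter are counted too.

```shell
#!/bin/bash
# parse_trace is a hypothetical name; it reads a trace on standard input.
# Usage: parse_trace < test.txt
parse_trace() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ ^[[:digit:]]{2}: ]]; then
            # Timestamp line: drop the trailing " ETHER", stay on the same output line.
            printf '%s ' "${line% *}"
        elif [[ $line =~ ^\|[[:xdigit:]]{2}\| ]]; then
            # Packet line: each byte is "|xx", plus one closing "|".
            printf '%s bytes\n' $(( (${#line} - 1) / 3 ))
        fi
    done
}
```

The timestamp is printed as it appears in the trace (05:00:00,727,744); converting it to the 05:00:00.727744 form from the question would need an extra substitution step.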

You really don't want to do such things with your shell.
You want to write a real parser that understands the format to output the needed information.
For a quick and dirty hack you can do something like this:
perl -wne 'print "$& " if /^\d\S*/; print split(/\|/)-2, " bytes\n" if /^\|..\|/'

Related

Grep the first line from each contiguous group of matching lines

I have a data file which looks like this:
a separator
interesting line 1
interesting line 2
a comment
interesting line 3
interesting line 4
interesting line 5
a non interesting line
some other data
interesting line 6
.
.
.
and I would like to extract the first interesting line from each contiguous group, no matter how many lines are in the group or how many extra lines separate the groups.
For the test input above the output would be:
interesting line 1
interesting line 3
interesting line 6
I could easily do this in python by having a state variable that triggers when I match a line, and resets when I encounter a non-matching line, but what about a one-line shell script? Is there a not-too-obscure way to do this?
You can use grep with a greedy regex, then print the first line of every match:
grep -Pzo '([^\n]*interesting line[^\n]*(\n|$))+' file |
while IFS='' read -d '' -r match
do
head -n1 <<< "$match"
done
grep parameters:
-P : Use a Perl-compatible regular expression (instead of the default basic regular expression), needed for the \n in the regex.
-z : Treat the input and output as sets of lines, each terminated by a zero byte instead of a newline. The whole file is therefore matched as one record, and an ASCII NUL character follows each match, allowing us to reliably separate the matches.
-o : Print only the matched parts of the input, here one match per contiguous group.
the regex ([^\n]*blablabla[^\n]*(\n|$))+ will match each group of contiguous lines containing blablabla.
In the while condition command, the IFS is emptied for the read. Otherwise, with the default IFS, the last newline character of each match would be eaten by read (that might not be a problem). It's a good practice to always clear IFS in "while read" to get the text in the variable exactly as it is read (leading spaces are also easily eaten up).
read parameters:
-d '' : Use the empty string as delimiter (= the ASCII NUL character). This is equivalent to -d $'\0' (see https://unix.stackexchange.com/q/61029/283498).
-r : don't interpret any backslash in the lines (see https://unix.stackexchange.com/q/192786/283498).
match : just a variable name I chose, which is used in the body of the loop.
And in the body of the loop: head -n1 <<< "$match" prints only the first line of the current match (the command head with -n 1 prints the first 1 line of its input). Side note: <<< is a bashism ; the command is equivalent to echo "$match" | head -n1.
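The state-variable approach mentioned in the question also fits in a short awk program. This is a sketch rather than a replacement for the grep answer: first_of_each_group and the variable names pat and in_group are made up for illustration.

```shell
#!/bin/bash
# first_of_each_group is a hypothetical helper.
# $1 is an awk regular expression; the data is read from standard input.
# Usage: first_of_each_group 'interesting line' < file
first_of_each_group() {
    awk -v pat="$1" '
        # A matching line is printed only when it starts a new run.
        $0 ~ pat { if (!in_group) print; in_group = 1; next }
        # Any other line ends the current run.
        { in_group = 0 }
    '
}
```

Note that in the sample data "a non interesting line" also contains the text "interesting line", so a stricter pattern such as '^interesting line' may be what you want.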

linux script to find specific words in file names

I need help writing a script to do the following stated below in part a.
The following code will output all of the words found in $filename, each word on a separate line.
for word in “cat $filename”
do
echo $word
done
a. Write a new script which receives two parameters. The first is a file’s name ($1 instead of $filename) and the second is a word you want to search for ($2). Inside the for loop, instead of echo $word, use an if statement to compare $2 to $word. If they are equal, add one to a variable called COUNT. Before the for loop, initialize COUNT to 0 and after the for loop, output a message that tells the user how many times $2 appeared in $1. That is, output $COUNT, $2 and $1 in an echo statement but make sure you have some literal words in here so that the output actually makes sense to the user. HINTS: to compare two strings, use the notation [ $string1 == $string2 ]. To add one to a variable, use the notation X=$((X+1)). If every instruction is on a separate line, you do not need any semicolons. Test your script on /etc/fstab with the word defaults (7 occurrences should be found)
This is what I got so far, but it does not work right. It says it finds 0 occurrences of the word "defaults" in /etc/fstab. I am sure my code is wrong but can't figure out the problem. Help is appreciated.
count=0
echo “what word do you want to search for?: “
read two
for word in “cat $1”
do
if [ “$two” == “$word” ]; then
count=$((count+1))
fi
done
echo $two appeared $count times in $1
You need to use command substitution; as written, the loop iterates over the literal string cat first_parameter instead of the words in the file. Use:
for word in $(cat "$1")
A better way to do this is with grep, paraphrasing "How do I count the number of occurrences of a word in a text file with the command line?":
grep -o "\<$two\>" "$1" | wc -l
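Putting the command-substitution fix back into the loop from the assignment gives something like the sketch below. The function name count_word is made up; it takes the file as $1 and the word as $2, as the assignment asks, instead of prompting with read.

```shell
#!/bin/bash
# count_word is a hypothetical function: $1 is the file, $2 the word to count.
# Usage: count_word /etc/fstab defaults
count_word() (
    set -f          # keep words such as * from being expanded as filename patterns
    count=0
    # The unquoted $(...) is split on whitespace, giving one word per iteration.
    for word in $(cat "$1"); do
        if [ "$word" = "$2" ]; then
            count=$((count + 1))
        fi
    done
    echo "$2 appeared $count times in $1"
)
```

Unlike the grep version, this only counts whitespace-delimited words that equal the target exactly, so an entry such as defaults,noatime is not counted.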

Bash: How to extract numbers preceded by _ and followed by

I have the following format for filenames: filename_1234.svg
How can I retrieve the digits preceded by an underscore and followed by a dot? There can be between one and four digits before the .svg
I have tried:
width=${fileName//[^0-9]/}
but if the fileName contains a number as well, it will return all numbers in the filename, e.g.
file6name_1234.svg
I found solutions for two underscores (and splitting it into an array), but I am looking for a way to check for the underscore as well as the dot.
You can use simple parameter expansion with substring removal to simply trim from the right up to, and including, the '.', then trim from the left up to, and including, the '_', leaving the number you desire, e.g.
$ width=filename_1234.svg; val="${width%.*}"; val="${val##*_}"; echo $val
1234
note: # trims from left to first-occurrence while ## trims to last-occurrence. % and %% work the same way from the right.
Explained:
width=filename_1234.svg - width holds your filename
val="${width%.*}" - val holds filename_1234
val="${val##*_}" - finally val holds 1234
Of course, there is no need to use a temporary value like val if your intent is that width should hold the width. I just used a temp to protect against changing the original contents of width. If you want the resulting number in width, just replace val with width everywhere above and operate directly on width.
note 2: using shell capabilities like parameter expansion avoids creating a subshell and spawning the separate process that a utility like sed, grep or awk (or anything that isn't part of the shell, for that matter) requires.
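To see the four trimming operators from the first note side by side, here is a small sketch. The sample value archive.tar.gz and the function name number_from_name are chosen only for illustration.

```shell
#!/bin/bash
name="archive.tar.gz"   # sample value with two dots, for illustration only
echo "${name#*.}"       # tar.gz       shortest match of '*.' removed from the left
echo "${name##*.}"      # gz           longest match of '*.' removed from the left
echo "${name%.*}"       # archive.tar  shortest match of '.*' removed from the right
echo "${name%%.*}"      # archive      longest match of '.*' removed from the right

# number_from_name is a hypothetical wrapper around the two-step trim shown above.
number_from_name() {
    local val=${1%.*}   # drop the extension
    echo "${val##*_}"   # drop everything up to and including the last underscore
}

number_from_name "file6name_1234.svg"   # prints 1234
```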
Try the following code :
filename="filename_6_1234.svg"
if [[ "$filename" =~ ^(.*)_([^.]*)\..*$ ]];
then
echo "${BASH_REMATCH[0]}" #will display 'filename_6_1234.svg'
echo "${BASH_REMATCH[1]}" #will display 'filename_6'
echo "${BASH_REMATCH[2]}" #will display '1234'
fi
Explanation :
=~ : bash operator for regex comparison
^(.*)_([^.]*)\..*$ : we look for any characters, followed by an underscore, followed by any characters other than a dot, followed by a dot and an extension. We create 2 capture groups, one for what comes before the last underscore, one for what comes after it (up to the dot)
BASH_REMATCH : array containing the captured groups
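If you want the regex to enforce the "one to four digits" rule from the question as well, a tighter pattern is possible. This is a sketch, not part of the answer above: digits_before_ext is a made-up name, and the pattern is stored in a variable so the shell does not interpret the backslash.

```shell
#!/bin/bash
# digits_before_ext is a hypothetical helper.
# Prints the 1-4 digits that sit between an underscore and the final extension;
# returns 1 if the name does not have that shape.
digits_before_ext() {
    local re='_([0-9]{1,4})\.[^.]+$'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

digits_before_ext "file6name_1234.svg"   # prints 1234
```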
Some more ways:
[akshay#localhost tmp]$ filename=file1b2aname_1234.svg
[akshay#localhost tmp]$ after=${filename##*_}
[akshay#localhost tmp]$ echo ${after//[^0-9]}
1234
Using awk
[akshay#localhost tmp]$ awk -F'[_.]' '{print $2}' <<< "$filename"
1234
I would use
sed 's!_! !g' | awk '{print "_" $NF}'
to get from filename_1234.svg to _1234.svg then
sed 's!svg!!g'
to get rid of the extension.
If you set IFS, you can use Bash's built-in read.
This splits the filename by underscores and dots and stores the result in the array a.
IFS='_.' read -a a <<<'file1b2aname_1234.svg'
And this takes the second last element from the array.
echo ${a[-2]}
There's a solution using cut:
name="file6name_1234.svg"
num=$(echo "$name" | cut -d '_' -f 2 | cut -d '.' -f 1)
echo "$num"
-d is for specifying a delimiter.
-f refers to the desired field.
I don't know anything about performance but it's simple to understand and simple to maintain.

Line from bash command output stored in variable as string

I'm trying to find a solution to a problem analogous to this one:
#command_A
A_output_Line_1
A_output_Line_2
A_output_Line_3
#command_B
B_output_Line_1
B_output_Line_2
Now I need to compare A_output_Line_2 and B_output_Line_1 and echo "Correct" if they are equal and "Not Correct" otherwise.
I guess the easiest way to do this is to copy a line of output in some variable and then after executing the two commands, simply compare the variables and echo something.
This I need to implement in a bash script and any information on how to get certain line of output stored in a variable would help me put the pieces together.
Also, it would be cool if anyone can tell me not only how to copy/store a line, but probably just a word or sequence like : line 1, bytes 4-12, stored like string in a variable.
I am not a complete beginner but also not anywhere near advanced linux bash user. Thanks to any help in advance and sorry for bad english!
An easier way might be to use diff, no?
Something like:
command_A > command_A.output
command_B > command_B.output
diff command_A.output command_B.output
This will work for comparing multiple strings.
But, since you want to know about single lines (and words in the lines) here are some pointers:
# first line of output of command_A
command_A | head -n 1
The -n 1 option says to output only the first line (the default is 10 lines).
# second line of output of command_A
command_A | head -n 2 | tail -n 1
that will take the first two lines of the output of command_A and then the last of those two lines. Happy times :)
You can now store this information in a variable:
output_A=$(command_A | head -n 2 | tail -n 1)
output_B=$(command_B | head -n 1)
And then compare it:
if [ "$output_A" == "$output_B" ]; then echo 'Correct'; else echo 'Not Correct'; fi
To just get parts of a string, try looking into cut or (for more powerful stuff) sed and awk.
Also, just learning a good general-purpose scripting language like Python or Ruby (even Perl) can go a long way with this kind of problem.
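For the "line 1, bytes 4-12" part of the question, sed and cut can be combined as in the sketch below. The printf calls are stand-ins for command_A and command_B, and nth_line is a made-up helper name.

```shell
#!/bin/bash
# nth_line is a hypothetical helper: prints only line number $1 of its input.
nth_line() { sed -n "${1}p"; }

# Stand-ins for the real commands, so the example runs on its own.
output_A=$(printf '%s\n' 'A_1' 'same_line' 'A_3' | nth_line 2)
output_B=$(printf '%s\n' 'same_line' 'B_2' | nth_line 1)
if [ "$output_A" = "$output_B" ]; then echo 'Correct'; else echo 'Not Correct'; fi

# Line 1, bytes 4-12, stored as a string in a variable:
part=$(printf '%s\n' 'A_output_Line_1' 'A_output_Line_2' | nth_line 1 | cut -b 4-12)
echo "$part"   # prints utput_Lin
```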
Use the IFS (internal field separator) to separate on newlines and store the outputs in an array.
#!/bin/bash
IFS='
'
array_a=( $(./a.sh) )
array_b=( $(./b.sh) )
if [ "${array_a[1]}" = "${array_b[0]}" ]; then
echo "CORRECT"
else
echo "INCORRECT"
fi
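In Bash 4 or later, mapfile does the same job without touching IFS and without the glob expansion that an unquoted $(...) is subject to. A sketch, with produce_a and produce_b as made-up stand-ins for ./a.sh and ./b.sh:

```shell
#!/bin/bash
# produce_a and produce_b are stand-ins for the two real commands.
produce_a() { printf '%s\n' 'A line 1' 'shared line' 'A line 3'; }
produce_b() { printf '%s\n' 'shared line' 'B line 2'; }

# mapfile -t stores one line per array element and strips the trailing newline.
mapfile -t array_a < <(produce_a)
mapfile -t array_b < <(produce_b)

# Array indices start at 0, so [1] is the second line and [0] the first.
if [ "${array_a[1]}" = "${array_b[0]}" ]; then
    echo "CORRECT"
else
    echo "INCORRECT"
fi
```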

Make sure int variable is 2 digits long, else add 0 in front to make it 2 digits long

How do I check an int variable ($inputNo) to see if it’s 2 or more decimal digits long?
Example:
inputNo="5"
Should be changed to: 05
inputNo="102"
Should be left alone: 102
I thought about using wc and if statements, but wc -m doesn’t seem to give the actual number of characters passed into it; it always seems to report one more than was given.
But I don’t know how to add a 0 in front of the current input number.
You can use the bash-builtin printf with the -v option to write it to a variable rather than print it to standard output:
pax> inputNo=5 ; printf -v inputNo "%02d" $inputNo ; echo $inputNo
05
pax> inputNo=102 ; printf -v inputNo "%02d" $inputNo ; echo $inputNo
102
You'll want to make sure it's numeric first otherwise the conversion will fail. If you want to be able to pad any string out to two or more characters, you can also use:
while [[ ${#inputNo} -lt 2 ]] ; do
inputNo="0${inputNo}"
done
which is basically a while loop that prefixes your string with "0" until the length is greater than or equal to two.
Note that this can also be done in bash by prefixing the number with two zeroes then simply getting the last two characters of that string, checking first that it's not already at least the desired size:
if [[ ${#inputNo} -lt 2 ]] ; then
inputNo="00${inputNo}"
inputNo="${inputNo: -2}"
fi
The difference is probably not too great for a two-digit number but you may find the latter solution is better if you need larger widths.
If you're using a shell other than bash (unlikely, based on your tags), you'll need to find the equivalents, or revert to using external processes to do the work, something like:
while [ $(echo -n "${inputNo}" | wc -c) -lt 2 ] ; do
inputNo="0${inputNo}"
done
This does basically what you were thinking of in your question, but note the use of -n in the echo command to prevent the trailing newline (which was almost certainly causing your off-by-one error).
But, as stated, this is a fall-back position. If you're using bash, the earlier suggestions of mine are probably best.
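The "make sure it's numeric first" check for the printf approach could look like the sketch below. pad2 is a made-up name, and the 10# prefix is an addition of mine: without it, an input such as 08 would be read as an invalid octal number.

```shell
#!/bin/bash
# pad2 is a hypothetical helper: prints $1 zero-padded to at least two digits.
pad2() {
    local n=$1 padded
    # Accept only plain decimal digits, so printf never sees a non-number.
    [[ $n =~ ^[0-9]+$ ]] || { echo "not a number: $n" >&2; return 1; }
    # 10# forces base 10 even when the input has a leading zero.
    printf -v padded '%02d' "$((10#$n))"
    echo "$padded"
}

pad2 5     # prints 05
pad2 102   # prints 102
```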
For general-purpose padding whether the string is numeric or not
No need for piping echo into wc or using a while loop.
In Bash, you can get the length of a string like this: ${#inputNo}.
And since you can do substrings, you can do this instead:
if [[ ${#inputNo} -lt 2 ]]
then
inputNo="00${inputNo}"
inputNo="${inputNo: -2}"
fi
You can use printf (see http://bash-hackers.org/wiki/doku.php/commands/builtin/printf); an example from there:
the_mac="0:13:ce:7:7a:ad"
# lowercase hex digits
the_mac="$(printf "%02x:%02x:%02x:%02x:%02x:%02x" 0x${the_mac//:/ 0x})"
# or the uppercase-digits variant
the_mac="$(printf "%02X:%02X:%02X:%02X:%02X:%02X" 0x${the_mac//:/ 0x})"
