Find a string occurrence between two occurrences of another string

I have a very long file, as follows.
Input file:
Text Point
Blah
Blah
Blah
Blah
Blah
Blah
String
Blah
Blah
Blah
Blah
Blah
Blah
Text Point
Blah
Blah
Blah
Blah
Blah
Blah
Text Point
String
Blah
Blah
Text Point
Blah
Blah
Blah
String
Blah
Blah
Blah
Text Point
Blah
Blah
String
Blah
After each occurrence of 'Text Point', and before the next occurrence, 'String' occurs at most once. I have to extract 'String' to an output file if it occurs between two consecutive 'Text Point's, or write a dash if it does not occur.
In this case, I need output like this:
String
-
String
String
String
I tried using the following command:
sed -n '/Text point/{:a;N;/^\n/s/^\n//;/Text point/{p;s/.*//;};ba};' $1 | grep "String" >> Outfile
But the problem with this is that when 'String' isn't found, nothing is appended to the output file.
So please help me out with the code. Thanks.

I have a solution with Perl:
use strict;
use warnings;

# Read the file in "Text Point"-separated chunks.
$/ = "Text Point";

while (<>) {
    next if $. == 1;          # the first chunk is just the leading "Text Point" itself
    if (/String/) {
        print "String\n";
    }
    else {
        print "-\n";
    }
}
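To run it, save the script under any name (extract.pl below is just a placeholder) and redirect the output to your file:
perl extract.pl inputfile > Outfile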

awk '/^Text Point/{ if (p != "") print p; p="-" } /String/{ p=$0 } END{ print p }' input
At each Text Point it prints the value collected for the previous block (skipping the very first Text Point, where nothing has been collected yet) and resets it to a dash; a String line overwrites the dash, and END prints the value for the last block.

Using a perl one-liner
perl -0777 -ne 'print /(.*String.*\n)/ ? $1 : "-\n" for split /(?=Text Point)/' file
Explanation:
Switches:
-0777: Slurps the entire file into $_.
-n: Creates a while(<>){...} loop for each "record" in your input file (here that is the whole file, because of -0777).
-e: Tells perl to execute the code given on the command line.
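The heavy lifting is done by split /(?=Text Point)/: the zero-width lookahead splits the slurped file into chunks that each begin at a Text Point, without consuming the marker, and /(.*String.*\n)/ then captures the String line from a chunk if there is one. If you want to see the chunking for yourself, a throwaway illustration (not part of the solution) is:
perl -0777 -ne 'print "== chunk ==\n$_" for split /(?=Text Point)/' file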

Related

Serialize multiline string with |?

Using YamlDotNet, the following string:
"blah blah blah \n blah blah blah"
gets serialized as:
test: >-
  blah blah blah
  blah blah blah
Is it possible to have this serialized as
test: |
  blah blah blah
  blah blah blah
dotnet fiddle:
https://dotnetfiddle.net/zT1Ujs
Found it by searching GitHub: adding a [YamlMember(ScalarStyle = ScalarStyle.Literal)] attribute to the property works.

I have a log file where I need to get the output as specified, and I need the logic written in a shell script.

Below are the log file and the expected output. The output file should have the CustomerName and the Size they have downloaded. CustomerName:John has downloaded twice, so in the final output I need the total size he has downloaded. I need help with writing a shell script.
Thank you
01-01-2012 01:13:36 Blah blah : blah CustomerName:Sam downloaded Blah Size:5432 bytes Carrier:Company-A
01-01-2012 01:13:45 Blah blah : blah CustomerName:John downloaded Blah Size:38655 bytes Carrier:Company-S
01-01-2012 01:13:47 Blah blah : blah CustomerName:Dave downloaded Blah Size:25632 bytes Carrier:Company-A
01-01-2012 01:13:50 Blah blah : blah CustomerName:John downloaded Blah Size:7213 bytes Carrier:Company-S
01-01-2012 01:13:58 Blah blah : blah CustomerName:Kristy downloaded Blah Size:70100 bytes Carrier:Company-V
Expected output
CustomerName: Sam Size: 5432
CustomerName: John Size: 45868
CustomerName: Dave Size: 25632
CustomerName: Kristy Size: 70100
Try this-
awk -F '[ :]' '{name[$11]++ ; size[$11]+=$15} END \
{for (i in name) print "CustomerName:", i, "Size:" size[i]}' test
Where test is the name of the input file.
Output-
CustomerName: Dave Size:25632
CustomerName: John Size:45868
CustomerName: Sam Size:5432
CustomerName: Kristy Size:70100
Explanation-
-F '[ :]' sets the delimiter to be either a space or a :, so the columns get numbered differently than with the default whitespace splitting.
I define two arrays. The array name is keyed on the names of the different people (it just counts their occurrences).
The array size uses the same names as keys but accumulates the sizes of the downloads.
In the part after END, I iterate over the names in the name array and print each name together with its accumulated size, with some extra text added to the print as per your question.
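As a quick sanity check on that numbering, you can run a throwaway one-liner against the first sample log line:
echo '01-01-2012 01:13:36 Blah blah : blah CustomerName:Sam downloaded Blah Size:5432 bytes Carrier:Company-A' | awk -F '[ :]' '{print $11, $15}'
It prints Sam 5432, confirming that $11 is the name and $15 is the size.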
$ cat xxx.txt | awk -F ":" '{print $5" "$6}' | awk '{print $1" "$5}' | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}'
Dave 25632
John 45868
Sam 5432
Kristy 70100
where xxx.txt is the input file
Explanation of awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}':
{arr[$1]+=$2} builds a map, using the name as the key and the number as the value; if the key already exists, the number is added to its value. The END block is executed after all lines have been processed by awk, which in this case means printing the map. Read more about the END block.
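If it helps, here is the same sum-by-key idiom in isolation, on toy data unrelated to the log format:
printf 'a 1\na 2\nb 5\n' | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}'
It prints a 3 and b 5 (the order of for (i in arr) is not guaranteed).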
cat InputFile |awk -F'blah' '{print $3}'|awk -F'downloaded Blah' '{print $1 $2}'|awk -F'bytes' '{print $1}'|awk '{print $1" "$2}'|sed 's/:/\ :\ /g'
or
#!/bin/bash
# $1 is the input file name from the command line.
cat "$1" |
awk -F'blah' '{print $3}' |
awk -F'downloaded Blah' '{print $1 $2}' |
awk -F'bytes' '{print $1}' |
awk '{print $1" "$2}' |
sed 's/:/ : /g'
Both do the same thing; the first is written as a one-liner and the second is kept as a script that you can modify later and understand more easily.
In awk, -F is the delimiter used to cut the string, which makes it easier to understand and to get the output. As you mentioned that you want spaces before and after the :, I have used sed for that. Both will give output like:
CustomerName : Sam Size : 5432
CustomerName : John Size : 38655
CustomerName : Dave Size : 25632
CustomerName : John Size : 7213
CustomerName : Kristy Size : 70100
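Note that this variant does not add up John's two downloads. If you need the totals as in the expected output, one more awk stage can be appended to either version; this is a sketch that assumes the CustomerName : Name Size : Bytes layout produced above (so the name is field 3 and the size is field 6), with ... standing for the whole pipeline:
... | awk '{sum[$3]+=$6} END {for (n in sum) print "CustomerName : " n " Size : " sum[n]}'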
Using gsub to weed out non-digits from $10:
$ awk '
{
    gsub(/[^0-9]/,"",$10)       # remove non-digits
    a[$7]+=$10                  # count the sizes grouping on the name
}
END {                           # in the end
    for(i in a)
        print i, "Size:" a[i]   # output
}' file
CustomerName:John Size:45868
CustomerName:Sam Size:5432
CustomerName:Kristy Size:70100
CustomerName:Dave Size:25632

Text in file manipulation

I have a text file with text like this:
{"id":2705,"status":"Analyze","severity":"Critical",Blah Blah ... "file":"/home/foo.c","message":"Message is...","url":"http://aaa..."}
{"id":2706,"status":"Fix","severity":"Low",Blah Blah ... "file":"/home/foo1.h","message":"Message2 is...","url":"http://bbb..."}
I would like to have a bash script that reads the file and, for each line, uses all the pairs of data as variables (for example id=2705, status="Analyze", ...) and echoes them.
awk 'BEGIN{RS=",";FS=":";OFS="="}{$1=$1;gsub("}|{|\"","")}1' infile
id=2705
status=Analyze
severity=Critical
Blah Blah ... file=/home/foo.c
message=Message is...
url=http=//aaa...
id=2706
status=Fix
severity=Low
Blah Blah ... file=/home/foo1.h
message=Message2 is...
url=http=//bbb...
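If you actually need the pairs available as shell variables rather than just printed, here is a minimal bash sketch built on the awk command above; it assumes the input file is called infile, and keys that are not valid shell identifiers (such as the Blah Blah ... file key in the sample) are echoed but not assigned:
#!/bin/bash
# Turn the key=value lines produced by the awk command into shell variables and echo them.
while IFS='=' read -r key value; do
    [ -n "$key" ] || continue                          # skip blank lines, if any
    if [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        declare "$key=$value"                          # e.g. id=2705, status=Analyze
    fi
    echo "$key=$value"
done < <(awk 'BEGIN{RS=",";FS=":";OFS="="}{$1=$1;gsub("}|{|\"","")}1' infile)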

grep -A <num> until a string

assuming that we have a file containing the following:
chapter 1 blah blah
blah num blah num
num blah num blah
...
blah num
chapter 2 blah blah
and we want to grep this file so we take the lines
from chapter 1 blah blah to blah num
(the line before the next chapter).
The only things we know are:
the starting string chapter 1 blah blah
somewhere after that, there is another line starting with chapter
A dummy way to do this is
grep -A <num> -i "chapter 1" <file>
with a large enough <num> so that the whole chapter will be in it.
sed -n '/^chapter 1/,/^chapter/{/^chapter 1/p;/^chapter/!p}' file
This prints the chapter 1 line itself plus everything after it, and drops the chapter line that closes the range.
This is easy to do with awk
awk '/chapter/ {f=0} /chapter 1/ {f=1} f' file
chapter 1 blah blah
blah num blah num
num blah num blah
...
blah num
It prints the line when the flag f is true.
The chapter 1 line sets the flag and the next chapter line clears it.
You can use a range with awk, but it is less flexible if you have other things to test.
awk '/chapter 1/,/chapter [^1]/ {if (!/chapter [^1]/) print}' file
chapter 1 blah blah
blah num blah num
num blah num blah
...
blah num
You could do this with grep itself too, but you need to enable the Perl-regexp (-P) and null-data (-z) options.
$ grep -oPz '^chapter 1[\s\S]*?(?=\nchapter)' file
chapter 1 blah blah
blah num blah num
num blah num blah
...
blah num
[\s\S]*? does a non-greedy match of zero or more characters (including newlines) up to, but not including, the next line that starts with the string chapter.
From man grep:
-z, --null-data       a data line ends in 0 byte, not newline
-P, --perl-regexp     PATTERN is a Perl regular expression
-o, --only-matching   show only the part of a line matching PATTERN
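Note that with -z GNU grep also terminates its output with a NUL byte instead of a newline, so if that gets in the way of whatever consumes the result, you can strip it:
grep -oPz '^chapter 1[\s\S]*?(?=\nchapter)' file | tr -d '\0'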

sed delete match within quotes on line containing several quotes

I have a file called names.xml
That looks like the below:
NAME="Stacey" SURNAME="Ford"
blah blah blah
NAME="Stacey" SURNAME="Ford"
blah blah blah
I need to find all occurrences of NAME="..." and, keeping the quotes, replace the name inside them with another value.
So the output needs to look like this:
NAME="Jack" SURNAME="Ford"
blah blah blah
NAME="Jack" SURNAME="Ford"
blah blah blah
I am using: sed 's/NAME=".*"/NAME="Jack"/g' names.xml
But this is the result it gives me:
NAME="Jack"
blah blah blah
NAME="Jack"
blah blah blah
It is matching everything up to the last " of SURNAME.
Your time and assistance is greatly appreciated.
You need to use a negated character class, [^"]*, which matches any character except " zero or more times. The .* in your regex is greedy by default; it eats up all the characters up to the last double quote, which is why it matches from Stacey all the way through the closing quote after Ford. You also need to add a word boundary \b before NAME so that it won't match the NAME inside SURNAME. \b matches between a word character and a non-word character.
sed 's/\bNAME="[^"]*"/NAME="Jack"/g' names.xml
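If you want to change names.xml itself rather than print the result, GNU sed can edit in place; keeping a backup until you are sure is a good habit:
sed -i.bak 's/\bNAME="[^"]*"/NAME="Jack"/g' names.xml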
Here is an awk version:
awk -F\" -vOFS=\" '$1~/NAME=/ {$2="Jack"}1' file
NAME="Jack" SURNAME="Ford"
blah blah blah
NAME="Jack" SURNAME="Ford"
blah blah blah
Use " as field separator. If field 1 contains NAME= replace filed 2 with Jack and print it.
