awk is not picking up string plus pipe as delimiter - Linux

I have a file where each record contains the string "TST_BI|", which I need to use as a delimiter so I can write the value after it to a file. There is only one occurrence of this string in each record.
It is working fine in AIX environment with below command.
awk -F "TST_BI|" '{print $2}' file.txt
But when I migrated the code to Linux and ran the same command, it does NOT work: a leading "|" also appears in the output. Below is the output from both AIX and Linux.
Input:
<14>1 2017-08-31T04:13:47.2345839+00:00 loggregator ecsdasc0985-cs656-4asdsds-asds-asdasg6ds73 [DEV/2] - - TST_BI|DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47|TEST_ID~biTestExecutor-2|COUNTRY_CODE~XX|GROUP_TESTS~BZAG
Output from AIX:
DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47|TEST_ID~biTestExecutor-2|COUNTRY_CODE~XX|GROUP_TESTS~BZAG
With the same command, the Linux output is:
|DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47|TEST_ID~biTestExecutor-2|COUNTRY_CODE~XX|GROUP_TESTS~BZAG
A pipe appears in the output because it is not being treated as part of the delimiter.
Can anyone please help?

When the field separator is longer than one character, awk treats it as a regular expression, and in a regex the vertical bar | is the alternation operator, so "TST_BI|" means "TST_BI or the empty string". To treat | literally, escape it as \\| or put it into a character class [|]:
awk -F'TST_BI[|]' '{print $2}' file.txt
The output:
DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47|TEST_ID~biTestExecutor-2|COUNTRY_CODE~XX|GROUP_TESTS~BZAG
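Both workarounds can be checked against the sample record; a minimal sketch (the sample line is trimmed for brevity):

```shell
# Trimmed version of the sample record
line='loggregator [DEV/2] - - TST_BI|DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47'

# Character class: a | inside [...] is always literal
printf '%s\n' "$line" | awk -F 'TST_BI[|]' '{print $2}'

# Escaped form: double backslash so awk receives \|
printf '%s\n' "$line" | awk -F 'TST_BI\\|' '{print $2}'
```

Both commands print DATE~2017-08-31 04:13:47,095|TIMESTAMP~04:13:47.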

Related

How to sed a special character if it comes inside double quotes in a Linux file

I have a txt file delimited by commas (,), with each column quoted in double quotes.
What I want to do is:
keep the comma delimiter, but remove every comma that appears inside a pair of double quotes (since each column is surrounded by double quotes).
Sample input and desired output:
Input file:
"2022111812160156601777153","","","false","test1",**"here the , issue , that comma comma come inside the column"**
The output I want:
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**
What I tried:
sed -i ':a' -e 's/\("[^"]*\),\([^"]*"\)/\1~\2/;ta' test.txt
but the above sed command replaces all commas, not only the commas inside a column.
Is there a way to do it?
Using sed
$ sed -Ei.bak ':a;s/((^|,)(\*+)?"[^"]*),/\1/;ta' input_file
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**
Any time you find yourself using more than s, g, and p (with -n) in sed you'd be better off using awk for some combination of clarity, robustness, efficiency, portability, etc.
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**
Just like GNU sed has -i as in your question to update the input file with the command's output, GNU awk has -i inplace, or just add > tmp && mv tmp file with any awk or any other Unix command.
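To see why the loop only visits the even-numbered fields: with FS set to ", everything inside a pair of quotes lands in fields 2, 4, 6, and so on. A minimal sketch with a made-up line:

```shell
printf '%s\n' '"a,b","c","d,e"' |
awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1'
# -> "ab","c","de"
```

The odd-numbered fields (the commas between quoted columns) are left untouched.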
This might work for you (GNU sed):
sed -E ':a;s/^(("[^",]*"\**,?\**)*"[^",]*),/\1/;ta' file
This iterates through each line removing any commas within paired double quoted fields.
N.B. The solution above also caters for double-quoted fields prefixed/suffixed by zero or more *'s. If this should not be catered for, here is a simplified solution:
sed -E ':a;s/^(("[^",]*",?)*"[^",]*),/\1/;ta' file
N.B. Escaped double quotes and commas would need a more involved regexp.

How to get first word of every line and pipe it into dmenu script

I have a text file like this:
first state
second state
third state
Getting the first word from every line isn't difficult, but the problem comes when adding the extra \n required to separate every word (selection) in dmenu, per its syntax:
echo -e "first\nsecond\nthird" | dmenu
I haven't been able to figure out how to add the separating \n. I've tried this:
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
But it doesn't work. I also tried this:
lol=$(grep -o "^\S*" states.txt | perl -ne 'print "$_"')
But same deal. Not sure what I'm doing wrong.
Your problem is in the awk script. awk already treats each input line as a record, and you can control how output records are separated via the ORS variable (output record separator). By default this separator is the newline, which should be good enough for your purpose.
Now to print the first word of every input record (each line in the input stream in this case), you just need to print the first field:
awk '{print $1}' textfile | dmenu
If you need the output to include the explicit \n string (not the control character), then you can just overwrite the ORS variable to fit your needs:
awk 'BEGIN{ORS="\\n"}{print $1}' textfile | dmenu
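The difference between the two is easy to see without dmenu; a quick sketch, writing the question's sample into states.txt:

```shell
printf 'first state\nsecond state\nthird state\n' > states.txt

awk '{print $1}' states.txt
# first
# second
# third     (real newlines -- what dmenu actually reads on stdin)

awk 'BEGIN{ORS="\\n"}{print $1}' states.txt
# first\nsecond\nthird\n    (the literal two-character \n sequence)
```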
This could be done more easily with a while loop. read splits each line into two variables: first gets the first field and rest gets the remainder of the line. Each first is printed on its own line and the whole list is piped to dmenu (dmenu reads its menu entries from stdin, not from its arguments):
while read -r first rest
do
  printf '%s\n' "$first"
done < "Input_file" | dmenu
Based on the text file example, the following should achieve what you require:
awk '{ printf "%s\\n",$1 }' textfile | dmenu
Print the first space separated field of each line along with \n (\n needs to be escaped to stop it being interpreted by awk)
In your code
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
you attempted to use ' inside your awk code. However, the shell ends the single-quoted string at the next ', so the awk program actually becomes {for(i=1;i<=NF;i+=2)print $(i), which does not work. Use " for strings inside awk code.
If you merely want to get the nth column, cut is enough in most cases. Let states.txt contain:
first state
second state
third state
then you can do:
cut -d ' ' -f 1 states.txt | dmenu
Explanation: treat space as delimiter (-d ' ') and get 1st column (-f 1)
(tested in cut (GNU coreutils) 8.30)
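One caveat worth knowing (a small sketch): cut -d ' ' treats every single space as a separator, so a leading space makes field 1 empty, whereas awk skips leading whitespace:

```shell
printf ' first state\n' | cut -d ' ' -f 1   # prints an empty field (text before the leading space)
printf ' first state\n' | awk '{print $1}'  # prints: first
```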

Extracting key word from a log line

I have a log which got like this :
.....client connection.....remote=/xxx.xxx.xxx.xxx]].......
I need to extract all lines in the log which contain the above, and print just the IP after remote=. This would be something in the pattern:
grep "client connection" xxx.log | sed -e ....
Using grep:
grep -oP '(?<=remote=/)[^\]]+' file
-o extracts only the matched pattern, instead of the entire line.
-P enables Perl-compatible regex. In this case, we are using a lookbehind: the pattern matches a run of characters that are not "]" and that is preceded by remote=/.
grep -oP 'client connection.*remote=/\K.*?(?=])' input
Prints anything between remote=/ and closest ] on the lines which contain client connection.
Or by using sed back-references: here the line is divided into three groups, later referred to as \1, \2, and \3. Each group is enclosed in ( and ). The IP address belongs to the 2nd group, so the whole line is replaced by \2. Note that ERE has no lazy .*?, so the middle group is written as [^]]* (any characters that are not ]):
sed -r '/client connection/ s_(^.*remote=/)([^]]*)]](.*)_\2_' input
Or using awk :
awk -F'/|]]' '/client connection/{print $2}' input
Try this:
grep 'client connection' test.txt | awk -F'[/\\]]' '{print $2}'
Test case
test.txt
---------
abcd
.....client connection.....remote=/10.20.30.40]].......
abcs
.....client connection.....remote=/11.20.30.40]].......
.....client connection.....remote=/12.20.30.40]].......
Result
10.20.30.40
11.20.30.40
12.20.30.40
Explanation
grep will shortlist the results to only lines matching client connection. awk uses -F flag for delimiter to split text. We ask awk to use / and ] delimiters to split text. In order to use more than one delimiter, we place the delimiters in [ and ]. For example, to split text by = and :, we'd do [=:].
However, in our case, one of the delimiters is ], since the intent is to extract the IP specifically from /x.x.x.x] by splitting the text on / and ]. So we escape it: \]. The IP is the 2nd item from the splitting.
A more robust way, improved over this answer would be to also use GNU grep in PCRE mode with -P for perl style regEx match, but matching both the patterns as suggested in the question.
grep -oP "client connection.*remote=/\K(\d{1,3}\.){3}\d{1,3}" file
10.20.30.40
11.20.30.40
12.20.30.40
Here, client connection.*remote matches both patterns on the line and extracts the IP from the file. \K is PCRE syntax that resets the start of the reported match, discarding everything matched so far, so only the text that follows it is printed.
(\d{1,3}\.){3}\d{1,3}
This matches the IP, i.e. three groups of one to three digits, each followed by a dot, then the fourth octet.
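Any of these can be sanity-checked without the real log. Here is the portable awk variant from above run on one sample line (the log text and IP are made up):

```shell
log='.....client connection.....remote=/10.20.30.40]].......'
printf '%s\n' "$log" | awk -F'/|]]' '/client connection/{print $2}'
# 10.20.30.40
```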

Linux command to extract the value for a given name

Suppose I have a text file (EmployeeDetails.txt) whose content is written as below (each name/value pair on its own line):
EmployeeName=XYZ
EmployeeBand=D5
EmployeeDesignation=SSE
I need a Linux command which will read EmployeeDetails.txt and give the value for EmployeeBand. The output should be
D5
Using grep: whatever follows EmployeeBand= will be printed.
grep -oP 'EmployeeBand=\K.*' EmployeeDetails.txt
Using awk, where = is the field separator and the second field is printed if the search pattern matches:
awk -F'=' '/EmployeeBand/{print $2}' EmployeeDetails.txt
Using sed, where the band D5 is captured as a group inside () and later referenced with \1:
sed -r '/EmployeeBand/ s/.*=(.*$)/\1/g' EmployeeDetails.txt
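All three can be tried on a throwaway copy of the file. A quick sketch using the awk variant; anchoring the pattern as ^EmployeeBand= also avoids accidental matches on a hypothetical key like EmployeeBandwidth:

```shell
cat > EmployeeDetails.txt <<'EOF'
EmployeeName=XYZ
EmployeeBand=D5
EmployeeDesignation=SSE
EOF

awk -F'=' '/^EmployeeBand=/{print $2}' EmployeeDetails.txt
# D5
```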

Pick a specific value in a program output (Bash)

I'm running LIBSVM in a Linux terminal, called from a C program. I need to pick out part of the output, which has the following format:
Accuracy = 80% (24/30) (classification)
I need to pick only the "80" value as an integer. I tried with sed and came to this command:
sed 's/[^0-9^'%']//g' 'f' >> f
This keeps all the integers in the output and thus isn't working yet, so I need help. Thanks in advance.
Try grep in PCRE mode (-P), printing only the matched parts (-o), with a lookahead assertion:
$ echo "Accuracy = 80% (24/30) (classification)" | grep -Po '[0-9]+(?=%)'
80
The regexp:
[0-9] # match a digit
+ # one or more times
(?=%) # assert that the digits are followed by a %
It is very trivial with awk. Identify the column you need and strip the '%' sign from it. The /^Accuracy/ regex ensures that the action is only performed on the lines starting with Accuracy. You don't need it if your file only contains one line.
awk '/^Accuracy/{sub(/%/,"");print $3}' inputFile
Alternatively, you can set space and % as field separators and do
awk -F'[ %]' '/^Accuracy/{print $3}' inputFile
If you want to do it with sed then you can try something like:
sed '/^Accuracy/s/.* \(.*\)%.*/\1/' inputFile
This might work for you (GNU sed):
sed -nr '/^Accuracy = ([^%]*)%.*/s//\1/p' file
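Each variant can be checked by piping the sample line in; for instance, with the field-separator approach (a quick sketch):

```shell
echo 'Accuracy = 80% (24/30) (classification)' |
awk -F'[ %]' '/^Accuracy/{print $3}'
# 80
```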
