I have the following scenario: a block of text, for example:
basketball:
  ball: round
I don't know exactly what is inside basketball:, but I would like to delete everything inside it. For example:
men:
  height: 170
  athlete: basketball
women:
  height:180
  athlete: basketball
I want to delete only the men block, leaving whatever is above or below this key untouched.
The AWK script filter.awk below removes every men section that contains basketball. Is that what you mean? Run it with awk -f filter.awk input.txt.
/^[A-Za-z0-9]/ {
    if (sectionWanted) {
        printf "%s", section
    }
    sectionWanted = 1
    section = ""
    sectionName = $1
}
/basketball/ && sectionName == "men:" {
    sectionWanted = 0
}
{
    section = section $0 "\n"
}
END {
    if (sectionWanted) {
        printf "%s", section
    }
}
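A quick way to try the script is to inline it in a small shell function and feed it the sample data. This is only a sketch: the function name drop_men is hypothetical, and it assumes that section headers start in column one while the keys below them are indented.

```shell
#!/bin/sh
# drop_men: hypothetical wrapper around the filter above; reads stdin.
drop_men() {
    awk '
    /^[A-Za-z0-9]/ {                 # an unindented line starts a new section
        if (sectionWanted) printf "%s", section
        sectionWanted = 1
        section = ""
        sectionName = $1
    }
    /basketball/ && sectionName == "men:" { sectionWanted = 0 }
    { section = section $0 "\n" }    # buffer the current section
    END { if (sectionWanted) printf "%s", section }
    '
}

drop_men <<'EOF'
men:
  height: 170
  athlete: basketball
women:
  height: 180
  athlete: basketball
EOF
```

With this input only the women: section is printed, because the men: section is discarded as soon as basketball is seen inside it.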
I'm trying to create my own program to do a recursive listing: each line corresponds to the full path of a single file. The tricky part I'm working on now is: I don't want bind mounts to trick my program into listing files twice.
So I already have a program that produces the right output except that if /foo is bind mounted to /bar then my program incorrectly lists
/foo/file
/bar/file
I need the program to list just what's below (EDIT: even if it was asked to list the contents of /foo)
/bar/file
One approach I thought of is to mount | grep bind | awk '{print $1 " " $3}' and then iterate over this to sed every line of the output, then sort -u.
My question is how do I iterate over the original output (a bunch of lines) and the output from mount (another bunch of lines)? (or is there a better approach) This needs to be POSIX (EDIT: and work with /bin/sh)
Run the 'mount | grep bind' command inside the awk script, in a BEGIN block, and store its output.
Something like:
PROG | awk 'BEGIN{
    # Define the data you want to store
    # Assign to global arrays
    command = "mount | grep bind";
    while ((command | getline) > 0) {
        count++;
        mount[count] = $1;
        mountPt[count] = $3
    }
    close(command)
}
# Assuming input is line-by-line and that mountPt is the value
# that is undesired
{
    replaceLine=0
    for (i=1; i<=count; i++) {
        idx = index($1, mountPt[i]);
        if (idx == 1) {
            replaceLine = 1;
            break;
        }
    }
    if (replaceLine == 1) {
        # Swap the prefix literally; sub() would treat mountPt[i] as a regex
        $1 = mount[i] substr($1, length(mountPt[i]) + 1);
    }
    if (printed[$1] != 1) {
        print $1;
    }
    printed[$1] = 1;
} '
Where I assume your current program, PROG, outputs to stdout.
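The prefix rewrite can be tried without touching real mounts by passing the bind table in by hand. The sketch below is an assumption-laden variant, not the answer's exact code: dedupe_binds is a hypothetical name, the table is a colon-separated list of "source mountpoint" pairs, paths are assumed to contain no spaces or colons, and the rewrite goes from the source to the mount point so that /bar/file is the path that survives, as the question asks.

```shell
#!/bin/sh
# dedupe_binds: hypothetical helper. $1 is a colon-separated list of
# "source mountpoint" pairs (assumption: no spaces or colons in paths).
dedupe_binds() {
    awk -v binds="$1" '
    BEGIN {
        n = split(binds, pair, ":")
        for (i = 1; i <= n; i++) {
            split(pair[i], p, " ")
            src[i] = p[1]
            dst[i] = p[2]
        }
    }
    {
        path = $0
        for (i = 1; i <= n; i++) {
            # Literal prefix test on a whole path component.
            if (path == src[i] || index(path, src[i] "/") == 1) {
                path = dst[i] substr(path, length(src[i]) + 1)
                break
            }
        }
        if (!seen[path]++) print path
    }'
}

printf '%s\n' /foo/file /bar/file /baz/other | dedupe_binds '/foo /bar'
```

Testing for src[i] "/" rather than src[i] alone keeps a path such as /foobar/x from being mistaken for something under /foo.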
find YourPath -print > YourFiles.txt
mount > Bind.txt
awk 'FNR == NR && $0 ~ /bind/ {
    Bind[ $1] = $3
    if( ( ThisLevel = split( $3, Unused, "/") - 1 ) > Level) Level = ThisLevel
}
FNR != NR && $0 !~ /^ *$/ {
    RealName = $0
    for( ThisLevel = Level; ThisLevel > 0; ThisLevel--){
        match( $0, "(/[^/]*){" ThisLevel "}" )
        UnBind = Bind[ substr( $0, 1, RLENGTH) ]
        if( UnBind !~ /^$/) {
            RealName = UnBind substr( $0, RLENGTH + 1)
            ThisLevel = 0
        }
    }
    if( ! File[ RealName]++) print RealName
}
' Bind.txt YourFiles.txt
The search is an exact path comparison against a bind array that is loaded first.
Bind.txt and YourFiles.txt could be replaced by direct redirections, making this a single command with no temporary files.
The first part of the awk script has to be adapted if the bind paths contain spaces (assumed not to be the case here).
File paths are rewritten on the fly while reading, by comparing them to the known bind relations.
A file is printed only if it has not been seen yet.
I have a shell script that produces some output. I want to print the string Unknown wherever a field in the output is blank.
I want to check whether (f[1] == "") or (f[2] == "") or (f[3] == ""); any empty field should be replaced by the string Unknown, and the result should be written to a single file:
if(f[1] == "") printf(fmt, id, f[1], f[2], f[3]) > file
Here f[1] is the first index and fmt is the format specifier I have defined in the code. How do I replace these empty fields with a string in Linux?
Any lead is appreciated.
Thanks
Use the conditional operator:
ec2-describe-instances | awk -F'\t' -v of="$out" -v mof="$file" '
function pr() { # Print accumulated data
    if(id != "") { # Skip if we do not have any unprinted data.
        printf(fmt, id, f[1], f[2], f[3]) > of
        if (f[1] == "" || f[2] == "" || f[3] == "") {
            printf(fmt, id, f[1]==""?"Unknown":f[1], f[2]==""?"Unknown":f[2], f[3]==""?"Unknown":f[3]) > mof
        }
    }
    # Clear accumulated data.
    id = f[1] = f[2] = f[3] = ""
}
BEGIN { # Set the printf() format string for the header and the data lines.
    fmt = "%-20s %-40s %-33s %s\n"
    # Print the header
    headerText="Instance Details"
    headerMaxLen=100
    padding=(length(headerText) - headerMaxLen) / 2
    printf("%" padding "s" "%s" "%" padding "s" "\n\n\n", "", headerText, "") > of
    printf(fmt, "Instance id", "Name", "Owner", "Cost.centre") > of
    printf("%" padding "s" "%s" "%" padding "s" "\n\n\n", "", headerText, "") > mof
    printf(fmt, "Instance id", "Name", "Owner", "Cost.centre") > mof
}
$1 == "TAG" {
    # Save the Instance ID.
    id = $3
    if($4 ~ /[Nn]ame/) fs = 1 # Name found
    else if($4 ~ /[Oo]wner/) fs = 2 # Owner found
    else if($4 ~ /[Cc]ost.[Cc]ent[er][er]/) fs = 3 # Cost center found
    else next # Ignore other TAGs
    f[fs] = $5 # Save data for this field.
}
$1 == "RESERVATION" {
    # First line of new entry found; print results from previous entry.
    pr()
}
END { # EOF found, print results from last entry.
    pr()
}'
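The conditional operator on its own is easier to see in a stripped-down form. A minimal sketch, assuming tab-separated input with four columns; fill_unknown, orUnknown and the sample instance id are made up for illustration:

```shell
#!/bin/sh
# fill_unknown: hypothetical helper; reads tab-separated lines on stdin
# (id, name, owner, cost centre) and prints them with blanks filled in.
fill_unknown() {
    awk -F'\t' '
    # cond ? a : b picks "Unknown" only when the field is empty
    function orUnknown(s) { return s == "" ? "Unknown" : s }
    { printf "%s|%s|%s|%s\n", $1, orUnknown($2), orUnknown($3), orUnknown($4) }'
}

printf 'i-123\tweb\t\tfinance\n' | fill_unknown
```

Wrapping the test in a function keeps the printf call readable when several fields need the same treatment.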
I'm trying to get the lines that contain special characters not prefixed with \. The special characters are:
^$%.*+?!(){}[]|\
I need to check the 2nd column for any of the above special characters that is not prefixed with \. I tried to do this with awk, but had no luck. I want the output shown below.
input.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
output.txt
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
Rows 5 and 7 are in output.txt because they contain two special characters (one with a backslash, another without).
"final" final edit: I wanted to allow "\x" whatever x is, but the OP seems to not want that, so I fixed it too.
After trying to find a "clever" regexp (which choked on "\\" or any odd number of "\", but apparently worked for the rest...),
I rewrote it in awk as a state machine:
The idea:
If, in "normal mode", we encounter a special char other than "\": we print the line!
If, in "normal mode", we encounter a "\": we enter "escaped mode", and in that mode we ignore the next char
(but if there is no next char, we need to print that line too!)
the script:
awk -F"," '
{
    IN_ESCAPED_MODE=0 ;
    for (i=1 ; i<=length($2) ; i++)
    {   char=substr($2,i,1)
        if ( IN_ESCAPED_MODE == 0)
        {   if ( index(".^$%*+?!(){}[]|",char) > 0 )
            {   print $0 ; break ;
            }
            if ( index("\\" , char ) > 0 )
            {   IN_ESCAPED_MODE=1 ; continue ;
            }
        }
        if ( IN_ESCAPED_MODE == 1)
        {   if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
            {   IN_ESCAPED_MODE=0 ; continue ;
            }
            else
            {   IN_ESCAPED_MODE=0 ; print $0; break;
            }
        }
    }
    # A trailing "\" with nothing after it: print the line
    # (no "break" here, we are already outside the loop)
    if (IN_ESCAPED_MODE == 1)
    {
        print $0 ;
    }
}
' input.txt > output.txt
With this change, you will have the same output as the OP, which prints a line when it contains "\e" for example... Which I find weird: to me "\e" is fine, we can "escape" anything?
With that input:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
8,wor\+k
10,\
11,\\
12,\\\
13,.
14,\.
15,..
16,^
17,\^
18,$
19,\$
20,%
21,\%
22,*
23,\*
24,+
25,\+
26,?
27,\?
28,!
29,\!
30,(
31,\(
32,)
33,\)
34,{
35,\{
36,}
37,\}
38,[
39,\[
40,]
41,\]
42,|
43,\|
it outputs:
1,ap^ple
2,o$range
3,bu+tter
4,gr(ape
5,sm\(ok\e
6,ra\in
7,p+la\\y
10,\
12,\\\
13,.
15,..
16,^
18,$
20,%
22,*
24,+
26,?
28,!
30,(
32,)
34,{
36,}
38,[
40,]
42,|
(so it appears to really work this time !)
If you prefer to allow any "\x" and NOT only if "x" is a SPECIAL char:
change the "middle lines":
if ( IN_ESCAPED_MODE == 1)
{ if ( index(".^$%*+?!(){}[]|\\",char) > 0 )
{ IN_ESCAPED_MODE=0 ; continue ;
}
else
{ IN_ESCAPED_MODE=0 ; print $0; break;
}
}
into:
if ( IN_ESCAPED_MODE == 1)
{ IN_ESCAPED_MODE=0 ; continue ;
}
For historical reasons, here is the regexp (which worked in "most" cases but choked in some, for example if there was a "\\"):
egrep '[^\][].^$%*+?!(){}[|]|[^\][\][^].^$%*+?!(){}[|\]' input.txt > output.txt
But that one will not display the line 12, for example...
A good read: http://www.regular-expressions.info/charclass.html .... and http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html (scary ...)
You can try the following:
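awk '
{
    line=$0
    gsub(/\\[\^$%.*+?!(){}\[\]|\\]/,"")   # remove every escaped special character, not just the first
    if(/[\^$%.*+?!(){}\[\]|\\]/)          # anything still special is unescaped
        print line
}' input.txt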
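The strip-then-test idea can be checked against a few of the sample rows. The following is a sketch rather than the answer's exact code: find_unescaped is a hypothetical name, only the 2nd column is examined, the bracket expressions are written in the POSIX "]-first" form, and the 2nd column is assumed to contain no comma.

```shell
#!/bin/sh
# find_unescaped: hypothetical helper; reads "n,text" lines on stdin and
# prints those whose 2nd column still holds a special character once
# every backslash-plus-special pair has been removed.
find_unescaped() {
    awk -F, '
    {
        field = $2
        gsub(/\\[][\\^$%.*+?!(){}|]/, "", field)   # drop escaped pairs
        if (field ~ /[][\\^$%.*+?!(){}|]/) print   # anything left is unescaped
    }'
}

find_unescaped <<'EOF'
5,sm\(ok\e
7,p+la\\y
8,wor\+k
EOF
```

Rows 5 and 7 are printed and row 8 is not. Working on a copy of the field leaves $0 intact, so the original line can be printed as is.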
sed -n '/[]\\^$%.*+?!(){}[|]/ {
    h
    s/\\[]\\^$%.*+?!(){}[|]/_/g
    /[]\\^$%.*+?!(){}[|]/ {
        x
        p
    }
}' YourFile
Depending on the shell and the sed implementation, this could be interpreted differently (especially the \). It works on my AIX/KSH.
I have this code, but it's giving me an error
awk '
FNR == NR {
# reading get_ids_only.txt
values[$1] = ""
next
}
BEGIN {
# reading default.txt
for (elem in values){
if ($0 ~ elem){
if (values[elem] == ""){
values[elem] = "\"" $0 "\""
getline;
values[elem] = "\n"" $0 ""\n"
}
else{
values[elem] = values[elem] ", \"" $0 "\""
getline;
values[elem] = values[elem] "\n"" $0 ""\n"
}
}
}
END {
for (elem in values)
print elem " [" values[elem] "]"
}
' get_ids_only.txt default.txt
The error says
awk: syntax error at source line 23
context is
>>> END <<< {
awk: illegal statement at source line 24
awk: illegal statement at source line 24
missing }
This is where my END{ } block starts.
What I'm trying to do is compare the strings in file 1: if a string is found in file 2, print the string and the line after it as well, then skip a line.
input1:
message id "hello"
message id "good bye"
message id "what is cookin"
input2:
message id "hello"
message value "greetings"
message id "good bye"
message value "limiting"
message id "what is there"
message value "looking for me"
message id "what is cooking"
message value "breakfast plate"
output:
It should print every entry of input1, taking the message value from input2.
Can anyone guide me on why this error is occurring?
I'm using the terminal on my Mac.
Here's your BEGIN block with recommended indentation and comments. Can you see the problem?
BEGIN {
  # reading default.txt
  for (elem in values){
    if ($0 ~ elem){
      if (values[elem] == ""){
        values[elem] = "\"" $0 "\""
        getline;
        values[elem] = "\n"" $0 ""\n"
      }
      else{
        values[elem] = values[elem] ", \"" $0 "\""
        getline;
        values[elem] = values[elem] "\n"" $0 ""\n"
      } # End inner if
    } # End outer if
  } # End for loop
You're missing a closing brace. Note that in the final concatenation with $0, $0 is actually quoted, so it is the literal text $0 rather than the current line.
There are some other issues with this. I'm not sure what you are trying to do, but it seems a very un-awky approach. Usually, if you find yourself overusing getline, you should think about spreading the code into separate blocks with appropriate conditions. See this article on the uses and misuses of getline for more.
A more awky way to solve it
If I understand you correctly, this is the way I would solve this task:
extract.awk
FNR==NR { id[$0]; next } # Collect id lines in the `id' array
$0 in id { f=1 } # Use the `f' as a printing flag
f # Print when `f' is 1
NF==0 { f=0 } # Stop printing after an empty line
Run it like this:
awk -f extract.awk input1 input2
Output:
message id "hello"
message value "greetings"
message id "good bye"
message value "limiting"
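If the records in the second file are not separated by empty lines, a counter can replace the flag. This is a variant under that assumption, not the script above: extract_pairs is a hypothetical name, and the sample lines are made up.

```shell
#!/bin/sh
# extract_pairs: hypothetical helper. $1 holds the id lines to look for,
# $2 is the file to search. Prints each matching line plus the line
# after it, followed by an empty line.
extract_pairs() {
    awk '
    FNR == NR { id[$0]; next }                 # first file: remember id lines
    $0 in id  { n = 2 }                        # match: this line and the next
    n > 0     { print; if (--n == 0) print "" }
    ' "$1" "$2"
}

tmp=$(mktemp -d) || exit 1
printf '%s\n' 'message id "hello"' > "$tmp/input1"
printf '%s\n' 'message id "hello"' 'message value "greetings"' \
              'message id "bye"' 'message value "later"' > "$tmp/input2"
extract_pairs "$tmp/input1" "$tmp/input2"
rm -rf "$tmp"
```

As with the flag version, no getline is needed: the counter carries the "print one more line" state from one record to the next.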