I thought it would be easy to define a string such as "1 2 3" and use it within AWK (GAWK) to extract the required fields; how wrong I have been.
I have tried creating AWK arrays, BASH arrays, splitting, string substitution etc., but could not find any method to use the resulting 'chunks' (i.e. the column/field numbers) in a print statement.
I believe Akshay Hegde has provided an excellent solution with the get_cols function, here,
but it was over 8 years ago, and I am really struggling to work out 'how it works', namely, what this is doing:
s = length(s) ? s OFS $(C[i]) : $(C[i])
I am unable to post a comment asking for clarification due to my lack of reputation (and it is an old post).
Is someone able to explain how the solution works?
NB I don't think I need the sub as I am using the following to clean up (replace all non-numeric characters with a comma, i.e. a separator, and sort numerically)
Columns=$(echo $Input_string | sed 's/[^0-9]\+/,/g')
Columns=$(echo $Columns | xargs -n1 | sort -n | xargs)
(using this string, the awk would be executed as awk -v cols=$Columns -f test.awk infile in the given solution)
Given the informative answer from @Ed Morton, with a nice worked example, I have attempted to remove the need for a function (and also an additional awk program file). The intention is to have this within a shell script, and I would rather it be self-contained; it also lets me investigate further 'how it works'.
Fields="1 2 3"
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print "s="s " arr1="Column[1]" arr2="Column[2]" arr3="Column[3]}'
The results have surprised me (taking note of my Comment to Ed)
s=1 2 3 arr1=1 arr2=2 arr3=3
The above clearly shows the split has worked into the array, but I thought s would include the $ for each ternary concatenation, i.e. "$1 $2 $3".
Moreover, I was hoping to append the actual file to the above command, as I have found this allows me to use echo $string | awk '{program}' file.name
NB it is a little insulting that my question has been marked as -1 indicating little research effort, as I have spent days trying to work this out.
Taking all the information above, I think s results in "1 2 3", but the print doesn't accept this in the same way as it does when called from a function; it simply tries to 'print 1 2 3' in relation to the file, which seems to be how all my efforts have ended up.
This really confuses me, as Ed's 'diagonal' example works from the command line, indicating that the concept of 'print s' is absolutely fine when used with a file name input.
Can anyone suggest how this (example below) can work?
I don't know if using an echo pipe and appending the file name is strictly allowed, but it appears to work (?!?!?!)
(failed result)
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print s}' myfile.txt
This appears to go through myfile.txt and output all lines containing many comma-separated values, i.e. the whole file (I haven't included the values; the lines below are for illustration only)
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
what this is doing: s = length(s) ? s OFS $(C[i]) : $(C[i])
You have encountered a ternary (conditional) operator; it has the following syntax
condition ? valueiftrue : valueiffalse
The length function, when given a single argument, returns the number of characters in it. In GNU AWK the integer 0 is considered false and other integers are considered true, so in this case it acts as a not-empty check. When s is not empty (it might also not be initialized yet, in which case GNU AWK assumes an empty string), the expression evaluates to s concatenated with the output field separator (OFS, a space by default) and the value of the C[i]-th field, and that is assigned to s; when s is empty, just the value of the C[i]-th field is assigned. Used multiple times, this builds a string of values separated by OFS. Consider the following simple example: say you want to get the diagonal of a 2D matrix stored in file.txt with the following content
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
then you might do
awk '{s = length(s) ? s OFS $(NR) : $(NR)}END{print s}' file.txt
which will give the output
1 7 13 19 25
Explanation: NR is the row (record) number, so for the 1st row $(NR) is the 1st field, for the 2nd row it is the 2nd field, for the 3rd it is the 3rd field, and so on
(tested in GNU Awk 5.0.1)
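To tie this back to the column-selection case in the question, here is a minimal self-contained sketch (assuming GNU awk; infile is a placeholder for a whitespace-separated data file, and the column list is passed in with -v just as in the referenced get_cols answer):
cols="1 2 3"
awk -v cols="$cols" '
BEGIN { n = split(cols, C, " ") }          # C[1]=1, C[2]=2, C[3]=3
{
    s = ""                                 # reset the accumulator for every input line
    for (i = 1; i <= n; i++)
        s = length(s) ? s OFS $(C[i]) : $(C[i])
    print s                                # fields 1, 2 and 3 of the current line of infile
}' infile
Because cols is split into C in the BEGIN block, $(C[i]) refers to fields of each data line read from infile, not to fields of the string "1 2 3" itself.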
I have a log that returns thousands of lines of data, and I want to extract a few values from it.
In the log there is only one line containing the unique unit reference, so I can grep for that using:
grep "unit=Central-C152" logfile.txt
That produces a line of output similar to the following:
a3cd23e,85d58f5,53f534abef7e7,unit=Central-C152,locale=32325687-8595-9856-1236-12546975,11="School",1="Mr Green",2="Qual",3="SWE",8="report",5="channel",7="reset",6="velum"
The format of the line may change, in that the values won't always be in the same position.
I'm trying to work out how to get the value of 2 and 7 in to separate variables.
I had thought about cut on , or = but as the values aren't in a set order I couldn't work out the best way to do it.
I'm trying to get:
var state=value of 2 without quotes
var mode=value of 7 without quotes
Can anyone advise on the best way to do this?
Thanks
Could you please try the following to create the variables' values.
state=$(awk '/unit=Central-C152/ && match($0,/2=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
mode=$(awk '/unit=Central-C152/ && match($0,/7=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
You could print them too by doing the following.
echo "$state"
echo "$mode"
Explanation: Adding an explanation of the command too now.
awk ' ##Starting awk program here.
/unit=Central-C152/ && match($0,/2=\"[^"]*/){ ##Checking if the line contains the string (unit=Central-C152) and using match with a regex to match from 2=" up to (but not including) the closing "
print substr($0,RSTART+3,RLENGTH-3) ##Printing the substring starting at RSTART+3 and spanning RLENGTH-3 characters, i.e. the value without the leading 2="
}
' Input_file ##Mentioning Input_file name here.
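To see why the +3 and -3 offsets work: match() sets RSTART to the position where the regex match begins and RLENGTH to its length, so skipping the three characters 2=" and shortening the length by the same three characters leaves only the value. A quick throwaway illustration (a made-up fragment, not the real log line):
echo '2="Qual"' | awk 'match($0,/2=\"[^"]*/){print RSTART, RLENGTH, substr($0,RSTART+3,RLENGTH-3)}'
which prints 1 7 Qual.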
You are probably better off doing all of the processing in Awk.
awk -F, '/unit=Central-C152/ {
for(i=1;i<=NF;++i)
if($i ~ /^[27]="/) {
b[++k] = $i
sub(/^[27]="/, "", b[k])
sub(/"$/, "", b[k])
gsub(/\\/, "", b[k])
}
print "state " b[1] ", mode " b[2]
}' logfile.txt
This presupposes that the fields always occur in the same order (2 before 7). You may need to change or disable the gsub that removes backslashes from the values.
If you want to do more than print the values, refactoring whatever Bash code you have into Awk is often a better approach than doing this processing in Bash.
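If the end goal is still two shell variables, one hedged way to combine this with Bash (same matching logic, just printing the two values tab-separated; this assumes the values themselves contain no whitespace, which holds for the sample line) could be:
read -r state mode < <(awk -F, '/unit=Central-C152/ {
    for (i = 1; i <= NF; ++i) {
        if ($i ~ /^2="/) { s = $i; sub(/^2="/, "", s); sub(/"$/, "", s) }
        if ($i ~ /^7="/) { m = $i; sub(/^7="/, "", m); sub(/"$/, "", m) }
    }
    print s "\t" m
    exit
}' logfile.txt)
echo "state=$state mode=$mode"    # state=Qual mode=reset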
Assuming you already have the line in a variable such as with:
line="$(grep 'unit=Central-C152' logfile.txt | head -1)"
You can then simply use the built-in parameter substitution features of bash:
f2=${line#*2=\"} ; f2=${f2%%\"*} ; echo ${f2}
f7=${line#*7=\"} ; f7=${f7%%\"*} ; echo ${f7}
The first command on each line strips off the first part of the line up to and including the <field-number>=". The second command then strips off everything from (and including) the first remaining quote. The third, of course, simply echoes the value.
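As a quick illustration of the two stripping steps on an abbreviated version of the line (the real one is much longer, of course):
line='a3cd23e,2="Qual",3="SWE",7="reset"'
f2=${line#*2=\"}     # strips everything up to and including 2="  ->  Qual",3="SWE",7="reset"
f2=${f2%%\"*}        # strips everything from the first remaining quote  ->  Qual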
When I run those commands against your input line, I see:
Qual
reset
which is, from what I can see, what you were after.
I'm trying to replace every nth occurrence of a string in a text file.
background:
I have a huge bibtex file (called in.bib) containing hundreds of entries beginning with "#". But every entry has a different amount of lines. I want to write a string (e.g. "#") right before every (let's say) 6th occurrence of "#" so, in a second step, I can use csplit to split the huge file at "#" into files containing 5 entries each.
The problem is to find and replace every fifth "#".
Since I need it repeatedly, the suggested answer in printing with sed or awk a line following a matching pattern won't do the job. Again, I am not looking for just one matching place but many of them.
What I have so far:
awk '/^#/ && v++%5 {sub(/^#/, "\n#\n#")} {print > "out.bib"}' in.bib
replaces the 2nd through 5th occurrences (and no more).
(btw, I found and adopted this solution here: "Sed replace every nth occurrence". Initially, it was meant to replace every second occurrence--which it does.)
And, second:
awk -v p="#" -v n="5" '$0~p{i++}i==n{sub(/^#/, "\n#\n#")}{print > "out.bib"}' in.bib
replaces exactly the 5th occurrence and nothing else.
(adopted the solution from here: "Display only the n'th match of grep")
What I need (and am not able to write) is, imho, a loop. Would a for loop do the job? Something like:
for (i = 1; i <= 200; i * 5)
<find "#"> and <replace with "\n#\n#">
then print
The material I have looks like this:
#article{karamanic_jedno_2007,
title = {Jedno Kosova, Dva Srbije},
journal = {Ulaznica: Journal for Culture, Art and Social Issues},
author = {Karamanic, Slobodan},
year = {2007}
}
#inproceedings{blome_eigene_2008,
title = {Das Eigene, das Andere und ihre Vermischung. Zur Rolle von Sexualität und Reproduktion im Rassendiskurs des 19. Jahrhunderts},
comment = {Rest of lines snipped off here for usability -- as in the following entries. All original entries may have a different number of lines.}
}
#book{doring_inter-agency_2008,
title = {Inter-agency coordination in United Nations peacebuilding}
}
#book{reckwitz_subjekt_2008,
address = {Bielefeld},
title = {Subjekt}
}
What I want is every sixth entry looking like this:
#
#book{reckwitz_subjekt_2008,
address = {Bielefeld},
title = {Subjekt}
}
Thanks for your help.
Your code is almost right; I modified it.
To replace every nth occurrence, you need a modulo expression.
So, written with brackets for clarity, you need an expression like ((i % n) == 0)
awk -v p="#" -v n="5" ' $0~p { i++ } ((i%n)==0) { sub(/^#/, "\n#\n#") }{ print }' in.bib > out.bib
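As a quick sanity check of the modulo condition on its own (counting 1 to 12 with n=5, it fires at 5 and 10):
awk 'BEGIN { n = 5; for (i = 1; i <= 12; i++) if ((i % n) == 0) print i }'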
You can do the splitting in awk easily in one step.
awk -v RS='#' 'NR==1{next} (NR-1)%5==1{c++} {print "#" $0 > FILENAME"."c}' file
will create file.1, file.2, etc. with 5 records each, where a record is an entry delimited by #. Printing the literal # in front of each record keeps the # on every entry, including the last one, which has no trailing separator in the file.
Instead of doing this in multiple steps with multiple tools, just do something like:
awk '/#/ && (++v%5)==1{out="out"++c} {print > out}' file
Untested since you didn't provide any sample input/output.
If you don't have GNU awk and your input file is huge you'll need to add a close(out) right before the out=... to avoid having too many files open simultaneously.
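Following that advice literally, a sketch of the portable variant (untested just like the original; the close is guarded so nothing is closed before the first output file has been opened) might look like:
awk '/#/ && (++v%5)==1{if (out) close(out); out="out"++c} {print > out}' file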
I am in the middle of a migration of PTR records from MSoft and I am adjusting the zone files for my needs. I have already prepared the zone files so they look like the following:
snapo#jump:~/mike/10$ cat 21.128
102 [AGE:3630582] 1200 PTR host1.domain.company.local.
69 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
    [AGE:3630762] 1200 PTR host2.domain.company.local.
80 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
So I have the filename in a variable x, and I want the output of the text file to be like the following, using awk (because I don't think that there is another way in bash). Please no php/python/perl answers, because the script will need to run on different systems and the only language that is supposed to be installed is bash.
Because this is a merge from multiple PTR zones to one, I would have to edit the zone file to look like this:
102.21.128 [AGE:3630582] 1200 PTR host1.domain.company.local.
69.21.128 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
21.128 [AGE:3630762] 1200 PTR host2.domain.company.local.
80.21.128 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
It is also possible that the first column is empty (no number); then it should add the prefix without a dot in front. Do you have an awk sample or any other sample (cut, grep, head, tail, sed)?
The command should either replace the strings in the existing file or, via redirection, write to an output file such as > editedtextfile.txt or similar.
With sed:
sed 's/^[^[:space:]]\+/&.21.128/' filename
Treating the input as plain text has the advantage of keeping the formatting intact.
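For illustration, & in the replacement stands for whatever the regex matched, so on the first sample line the non-blank first column is kept and .21.128 is appended directly to it (a throwaway printf just to show the effect):
printf '102 [AGE:3630582] 1200 PTR host1.domain.company.local.\n' | sed 's/^[^[:space:]]\+/&.21.128/'
which prints 102.21.128 [AGE:3630582] 1200 PTR host1.domain.company.local.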
For the edited question, this can be expanded to
sed 's/^[^[:space:]]\+/&.21.128/; s/^[[:space:]]/21.128&/' filename
Addendum: If you don't want to repeat the inserted data in the code, then
sed 's/^[^[:space:]]*/&\n21.128/; s/^\n//; s/\n/./' filename
is another approach that uses a little more trickery: It inserts a marker before the new data, removes the marker if there is nothing before it and otherwise replaces it with a dot.
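To make the marker trick concrete, here is roughly what the pattern space looks like after each substitution for the two kinds of input line (newlines written as \n; this is a trace, not code to run):
102 [AGE...]                      non-empty first column
  after s/^[^[:space:]]*/&\n21.128/   ->  102\n21.128 [AGE...]
  s/^\n//  does not apply (no leading \n)
  after s/\n/./                       ->  102.21.128 [AGE...]
 [AGE...]                         empty first column (line starts with whitespace)
  after s/^[^[:space:]]*/&\n21.128/   ->  \n21.128 [AGE...]   (the match is empty)
  after s/^\n//                       ->  21.128 [AGE...]
  s/\n/./  finds no newline left, so nothing changes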
Addendum 2: Using shell variables with sed code is a little tricky and potentially dangerous (because of code injection). If the variable comes from a trustworthy source and is known to not contain any metacharacters, then it is possible to write
sed "s/^[^[:space:]]*/&\n$variable/; s/^\n//; s/\n/./" filename
as @triplee points out in the comments. If $variable contains slashes but no other metacharacters and a character is known that it does not contain, then it is possible to use a different delimiter for the s command:
sed "s#^[^[:space:]]*#&\n$variable#; s/^\n//; s/\n/./" filename
(if it is known that $variable does not contain the character #).
If none of this is the case, deeper magic is required. For example, if $variable is known to be a single line (I suspect that this is the case because otherwise the transformation makes little sense), then it is possible to write
(echo "$variable"; cat filename) | sed '1 { h; d; }; s/^[^[:space:]]*/&\n/; G; s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/; s/^\n//; s/\n/./'
This feeds the variable to sed as first line of the input, and then works as follows:
1 { h; d; } # first line: hold, don't print
s/^[^[:space:]]*/&\n/ # after that: Insert marker as before
G # fetch variable from the hold buffer
s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/ # move it to the right place
s/^\n// # rest as before.
s/\n/./
However, at this point you may want to consider using awk instead, which has better facilities to deal with shell variables (that is to say, you can use them without treating them as code):
awk -v var="$variable" '{ n = match($0, /[ \t]/); print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n) }' filename
The -v var="$variable" makes a variable var known to the awk code, with the value of $variable, and the awk code then works as follows:
{
# find the first space or tab in the line (0 if none)
# (I would use [[:space:]] here, but there are commonly shipped versions
# of mawk that don't understand POSIX character classes, so for portability
# I resort to [ \t])
n = match($0, /[ \t]/)
# assemble output line accordingly and print it.
print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n)
}
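A hedged usage sketch to round this off (names taken from the question: the zone prefix lives in a shell variable, here x, which in this case is also the file name, and the result goes to a new file):
x="21.128"
awk -v var="$x" '{ n = match($0, /[ \t]/); print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n) }' "$x" > editedtextfile.txt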
awk -F" " '{print $1".21.128\t" $2"\t"$3"\t"$4"\t"$5}' $1