add text after keyword in bash / shell

I am in the middle of a migration for PTR records from MSoft and I am adjusting the zonefiles for my needs. I have already prepared the zone files so they look like the following:
snapo@jump:~/mike/10$ cat 21.128
102 [AGE:3630582] 1200 PTR host1.domain.company.local.
69 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
[AGE:3630762] 1200 PTR host2.domain.company.local.
80 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
I have the filename in a variable x, and I want to transform the text file as shown below, probably with awk (I don't think there is another way in bash). Please no php/python/perl answers: the script needs to run on different systems, and the only language guaranteed to be installed is bash.
Because this is a merge from multiple PTR zones to one, I would have to edit the zone file to look like this:
102.21.128 [AGE:3630582] 1200 PTR host1.domain.company.local.
69.21.128 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
21.128 [AGE:3630762] 1200 PTR host2.domain.company.local.
80.21.128 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
It is also possible that the first column is empty (no number); in that case the text should be added without a dot in front. Do you have an awk sample or any other sample (cut, grep, head, tail, sed)?
The command should either replace the strings in the existing file or write its output to a new file (> editedtextfile.txt or similar).

With sed:
sed 's/^[^[:space:]]\+/&.21.128/' filename
Treating the input as plain text has the advantage of keeping the formatting intact.
For the edited question, this can be expanded to
sed 's/^[^[:space:]]\+/&.21.128/; s/^[[:space:]]/21.128&/' filename
Addendum: If you don't want to repeat the inserted data in the code, then
sed 's/^[^[:space:]]*/&\n21.128/; s/^\n//; s/\n/./' filename
is another approach that uses a little more trickery: It inserts a marker before the new data, removes the marker if there is nothing before it and otherwise replaces it with a dot.
Addendum 2: Using shell variables with sed code is a little tricky and potentially dangerous (because of code injection). If the variable comes from a trustworthy source and is known to not contain any metacharacters, then it is possible to write
sed "s/^[^[:space:]]*/&\n$variable/; s/^\n//; s/\n/./" filename
as @triplee points out in the comments. If $variable contains slashes but no other metacharacters and a character is known that it does not contain, then it is possible to use a different delimiter for the s command:
sed "s#^[^[:space:]]*#&\n$variable#; s/^\n//; s/\n/./" filename
(if it is known that $variable does not contain the character #).
If none of this is the case, deeper magic is required. For example, if $variable is known to be a single line (I suspect that this is the case because otherwise the transformation makes little sense), then it is possible to write
(echo "$variable"; cat filename) | sed '1 { h; d; }; s/^[^[:space:]]*/&\n/; G; s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/; s/^\n//; s/\n/./'
This feeds the variable to sed as first line of the input, and then works as follows:
1 { h; d; } # first line: hold, don't print
s/^[^[:space:]]*/&\n/ # after that: Insert marker as before
G # fetch variable from the hold buffer
s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/ # move it to the right place
s/^\n// # rest as before.
s/\n/./
However, at this point you may want to consider using awk instead, which has better facilities to deal with shell variables (that is to say, you can use them without treating them as code):
awk -v var="$variable" '{ n = match($0, /[ \t]/); print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n) }' filename
The -v var="$variable" makes a variable var known to the awk code that has the value of $variable, and the awk code then works as follows:
{
# find the first space or tab in the line (0 if none)
# (I would use [[:space:]] here, but there are commonly shipped versions
# of mawk that don't understand POSIX character classes, so for portability
# I resort to [ \t])
n = match($0, /[ \t]/)
# assemble output line accordingly and print it.
print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n)
}
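Since the question mentions merging several zone files, the awk approach above can be wrapped in a small shell function. This is only a sketch: the function name add_suffix and the idea of taking the inserted text from the file's basename (e.g. 21.128) are assumptions made for illustration, not part of the original answer. It also assumes, as the answer does, that lines without a leading number start with whitespace.

```shell
#!/bin/bash
# add_suffix FILE: hypothetical helper. Uses FILE's basename (e.g. 21.128)
# as the text to insert after the first column of every line.
add_suffix() {
    local file=$1
    local suffix
    suffix=$(basename -- "$file")
    awk -v var="$suffix" '{
        n = match($0, /[ \t]/)   # position of the first blank, 0 if none
        # text before the blank, a dot only if that text is non-empty,
        # the inserted value, then the rest of the line unchanged
        print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n)
    }' "$file"
}
```

It could then be used in a loop such as: for f in 21.128 22.128; do add_suffix "$f"; done > merged.txt (the file names here are placeholders).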

awk -F" " '{print $1".21.128\t" $2"\t"$3"\t"$4"\t"$5}' $1

Related

Use bash to find line in java files which include a pattern, and then replace another part of the line

I have a directory that includes a lot of java files, and in each file I have a class variable:
String system = "x";
I want to be able to create a bash script which I execute in the same directory, which will go to only the java files in the directory, and replace this instance of x, with y. Here x and y are a word. Now this may not be the only instance of the word x in the java script, however it will definitely be the first.
I want to be able to execute my script in the command line similar to:
changesystem.sh -x -y
This way I can specify what the x should be, and the y I wish to replace it with. I found a way to find and print the line number at which the first instance of a pattern is found:
awk '$0 ~ /String system/ {print NR}' file
I then found how to replace a substring on a given line using:
awk 'NR==line_number { sub("x", "y") }'
However, I have not found a way to combine them. Maybe there is also an easier way? Or even, a better and more efficient way?
Any help/advice will be greatly appreciated

You may create a changesystem.sh file with the following GNU awk script:
#!/bin/bash
for f in *.java; do
awk -i inplace -v repl="$1" '
!x && /^\s*String\s+system\s*=\s*".*";\s*$/{
lwsp=gensub(/\S.*/, "", 1);
print lwsp"String system = \""repl"\";";
x=1;next;
}1' "$f";
done;
Or, with any awk:
#!/bin/bash
for f in *.java; do
awk -v repl="$1" '
!x && /^[[:space:]]*String[[:space:]]+system[[:space:]]*=[[:space:]]*".*";[[:space:]]*$/{
lwsp=$0; sub(/[^[:space:]].*/, "", lwsp);
print lwsp"String system = \""repl"\";";
x=1;next
}1' "$f" > tmp && mv tmp "$f";
done;
Then, make the file executable:
chmod +x changesystem.sh
Then, run it like
./changesystem.sh 'new_value'
Notes:
for f in *.java; do ... done iterates over all *.java files in the current directory
-i inplace - GNU awk feature to perform replacement inline (not available in a non-GNU awk)
-v repl="$1" passes the first argument of the script to the awk command
!x && /^\s*String\s+system\s*=\s*".*";\s*$/ - if x is false and the record starts with any amount of whitespace (\s* or [[:space:]]*), then String, one or more whitespace characters, system, an = surrounded by zero or more whitespace characters, a " char, any text, and finally "; followed by zero or more whitespace characters, then
lwsp=gensub(/\S.*/, "", 1); puts the leading whitespace in the lwsp variable (it removes all text starting with the first non-whitespace char from the line matched)
lwsp=$0; sub(/[^[:space:]].*/, "", lwsp); - same as above, just in a different way since gensub is not supported in non-GNU awk and sub modifies the given input string (here, lwsp)
{print "String system = \""repl"\";";x=1;next}1 - prints the String system = " + the replacement string + ";, assigns 1 to x, and moves to the next line, else, just prints the line as is.
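The question asked for both the old and the new value on the command line. A possible variation is sketched below: it looks only at the first String system declaration and replaces the value there if it equals the old one. The function name change_system, the argument order, and the temporary-file handling are assumptions for illustration; the sketch also assumes the values contain no double quotes.

```shell
#!/bin/bash
# change_system OLD NEW FILE: hypothetical helper. In the first
#   String system = "...";
# line of FILE, replaces "OLD" with "NEW"; all other lines are untouched.
change_system() {
    local old=$1 new=$2 file=$3 tmp
    tmp=$(mktemp) || return 1
    # The values travel through the environment so that awk does not
    # interpret backslash escapes in them (as it would with -v).
    OLD=$old NEW=$new awk '
        !done && /^[ \t]*String[ \t]+system[ \t]*=/ {
            done = 1                      # only the first declaration counts
            target = "\"" ENVIRON["OLD"] "\""
            pos = index($0, target)       # literal search, no regex
            if (pos)
                $0 = substr($0, 1, pos - 1) "\"" ENVIRON["NEW"] "\"" \
                     substr($0, pos + length(target))
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file mode
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

Inside the script it could be called as: for f in *.java; do change_system "$1" "$2" "$f"; done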

You don't need to pre-compute the line number. The whole job can be done by one not-too-complicated sed command. You probably do want to script it, though. For example:
#!/bin/bash
[[ $# -eq 3 ]] || {
echo "usage: $0 <context regex> <target regex> <replacement text>" 1>&2
exit 1
}
sed -si -e "/$1/ { s/\\<$2\\>/$3/; t1; p; d; :1; n; b1; }" ./*.java
That assumes that the files to modify are java source files in the current working directory, and I'm sure you understand the (loose) argument check and usage message.
As for the sed command itself,
the -s option instructs sed to treat each argument as a separate stream, instead of operating as if by concatenating all the inputs into one long stream.
the -i option instructs sed to modify the designated files in-place.
the sed expression takes the default action for each line (printing it verbatim) unless the line matches the "context" pattern given by the first script argument.
for lines that do match the context pattern,
s/\\<$2\\>/$3/ - attempt to perform the wanted substitution
the \< and \> match word start and end boundaries, respectively, so that the specified pattern will not match a partial word (though it can match multiple complete words if the target pattern allows)
t1 - if a substitution was made, then branch to label 1, otherwise
p; d - print the current line and immediately start the next cycle
:1; n; b1 - label 1 (reachable only by branching): print the current line and read the next one, then loop back to label 1. This prints the remainder of the file without any more tests or substitutions.
Example usage:
/path/to/replace_first.sh 'String system' x y
It is worth noting that this exposes the user to some details of sed's interpretation of regular expressions and replacement text, though that does not show up in the example usage.
Note that the script could be simplified by removing the context pattern if you are sure you want to modify the overall first appearance of the target in each file. You could also hard-code the context, the target pattern, and/or the replacement text. If you hard-code all three, the script no longer needs any argument handling or checking.

How to extract specific value using grep and awk?

I am facing a problem to extract a specific value in a .txt file using grep and awk.
I show below an excerpt from the .txt file:
"-
bravais-lattice index = 2
lattice parameter (alat) = 10.0000 a.u.
unit-cell volume = 250.0000 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 28.00
number of Kohn-Sham states= 18
kinetic-energy cutoff = 60.0000 Ry
charge density cutoff = 300.0000 Ry
convergence threshold = 1.0E-09
mixing beta = 0.7000"
I also defined some variable: ELEMENT and lat.
I want to extract the "unit-cell volume" value which is equal to 250.00.
I tried the following to extract the value using grep and awk:
volume=`grep "unit-cell volume" ./latt.10/$ELEMENT.scf.latt_$lat.out | awk '{printf "%15.12f\n",$5}'`
However, when I run the bash file I always get 00.000000 as a result instead of the correct value of 250.00.
Can anyone help, please?
Thanks in advance.

awk '{printf "%15.12f\n",$5}'
You're asking awk to print out the fifth field of the line ($5).
unit-cell volume = 250.0000 (a.u.)^3
1 2 3 4 5
The fifth field is (a.u.)^3, which you are then asking awk to interpret as a number via the %f format code. It's not a number, though (or actually, doesn't start with a number), and when awk is asked to treat a non-numeric string as a number, it uses 0 instead. Thus it prints 0.
Solution: use $4 instead.
By the way, you can skip invoking grep by using awk itself to select the line, e.g.
awk '/^ unit-cell/ {...}'
The /^ unit-cell/ is a regular expression that matches "unit-cell" (with a leading space) at the beginning of the line. Adjust as necessary if you have other lines that start with unit-cell which you don't want to select.

You never need grep when you're using awk since awk can do anything useful that grep can do. It sounds like this is all you need:
$ awk -F'=' '/unit-cell volume/{printf "%.2f\n",$2}' file
250.00
The above works because when FS is =, $2 is <spaces>250.0000 (a.u.)^3, and when awk is asked to convert a string to a number it strips off leading spaces and anything after the numeric part. That leaves 250.0000 to be converted to a number and printed by %.2f.
In the script you posted $5 was failing because the 5th space-separated field in:
$1 $2 $3 $4 $5
<unit-cell> <volume> <=> <250.0000> <(a.u.)^3>
is (a.u.)^3 - you could have just added print $5 to see that.

Since you are processing key-value pairs where the key can contain a variable amount of whitespace, you would need to tune the field number ($4, $5, etc.) separately for each record you want to process, unless you set the field separator (FS) appropriately to FS=" *= *". Then the key will always be in $1 and the value in $2.
Then use split to separate the value and unit parts from each other.
Also, you can lose that grep by defining a pattern (or condition, /unit-cell volume/) in awk for that print action:
$ awk 'BEGIN{FS=" *= *"} /unit-cell volume/{split($2,a," +");print a[1]}' file
250.0000
Explained:
$ awk '
BEGIN { FS=" *= *" } # set appropriate field separator
/unit-cell volume/ { # pattern or condition
split($2,a," +") # split value part to value and possible unit parts
print a[1] # output value part
}' file
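Building on the FS=" *= *" idea, the lookup can be made reusable for any key in the file. The following is a sketch only; the function name get_value is hypothetical, and it assumes each line has the form "key = value [unit]" with optional indentation before the key.

```shell
#!/bin/bash
# get_value KEY FILE: hypothetical helper. Prints the first word of the
# value on the first line whose key is exactly KEY, e.g. 250.0000.
get_value() {
    awk -v key="$1" '
        BEGIN { FS = " *= *" }           # key in $1, value (and unit) in $2
        {
            k = $1
            sub(/^[ \t]+/, "", k)        # drop indentation before the key
            if (k == key) {
                split($2, a, " +")       # a[1] = value, a[2] = unit, if any
                print a[1]
                exit                     # stop at the first matching line
            }
        }
    ' "$2"
}
```

With the variables from the question, the call would look like volume=$(get_value 'unit-cell volume' "./latt.10/$ELEMENT.scf.latt_$lat.out"), after which printf '%15.12f\n' "$volume" formats the number.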

Finding character location of all instances of a string in bash

I'm trying to find the location of all instances of a string in a particular file; however, the code I'm currently running only returns the location of the first instance and then stops there. Here is what I'm currently running:
str=$(cat temp1.txt)
tmp="${str%%<C>*}"
if [ "$tmp" != "$str" ]; then
echo ${#tmp}
fi
The file is a single line of text. I would show it, but the required question format won't let me keep the proper number of spaces between the characters.

I am not sure of many details of your requirements, however this is an awk one-liner:
awk -vRS='<C>' '{printf("%u:",a+=length($0));a+=length(RS)}END{print ""}' temp1.txt
Let’s test it with an actual line of input:
$ awk -vRS='<C>' \
'{printf("%u:",a+=length($0));a+=length(RS)}END{print ""}' \
<<<"    <C>       <C>  "
4:14:20:
This means: the first <C> is at byte 4, the second <C> is at byte 14 (including the three bytes of the first <C>), and the whole line is 20 bytes long (including final newline).
Is this what you want?
Explanation
We set (-v) record separator (RS) as <C>. Then we keep a variable a with the count of all bytes processed so far. For each “line” (i.e., <C>-separated substrings) we add the length of the current line to a, printf it with a suitable format "%u:", and increase a by the length of the separator which ended the current line. Since no printing so far included newlines, at the END we print an empty string, which is an idiom to output a final newline.

Essentially the same question has been asked here.
In particular, user JRFerguson's response using perl may answer your question for multiple instances.
EDIT: I found another solution that might just do the trick here. (The main question and response post is found here.)
I changed the shell from ksh to bash, changed the searched string to include multiple <C>'s to better demonstrate an answer to the question, and named the script "tester":
#!/bin/bash
printf '%s\n' '<C>abc<C>xyz<C>123456<C>zzz<C>' | awk -v s="$1" '
{ d = ""
for(i = 1; x = index(substr($0, i), s); i = i + x + length(s) - 1) {
printf("%s%d", d, i + x - 1)
d = ":"
}
print ""
}'
This is how I ran it:
$ tester '<C>'
1:7:13:22:28
I haven't figured the code out (I like to know why it works) but it seems to work! It would be nice to get an explanation and an elegant way to feed your string into this script. Cheers.
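For what it's worth, the loop appears to work like this: substr($0, i) is the part of the line not yet searched, index returns the 1-based position of s inside that part (0 ends the loop), i + x - 1 converts it to a position in the whole line, and i then jumps past the match. One way to feed both the search string and the file into it is sketched below; the function name find_all is hypothetical.

```shell
#!/bin/bash
# find_all STRING FILE: hypothetical wrapper. For each line of FILE, prints
# the 1-based positions of all non-overlapping occurrences of STRING,
# separated by colons.
find_all() {
    awk -v s="$1" '
    BEGIN { if (s == "") exit 1 }        # an empty needle would never advance
    {
        d = ""
        # x is the offset of the next match inside the unsearched tail
        for (i = 1; x = index(substr($0, i), s); i = i + x + length(s) - 1) {
            printf("%s%d", d, i + x - 1) # position within the whole line
            d = ":"
        }
        print ""
    }' "$2"
}
```

For the file from the question this would be called as find_all '<C>' temp1.txt. Note that awk -v processes backslash escapes in the value, so a search string containing backslashes would need different handling.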

find a pattern and print line based on finding the first pattern sed, awk grep

I have a rather large file. What is common to all is the hostname to break each section example :
HOSTNAME:host1
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 2
HOSTNAME:host-2
data 1
data here
data 2
text here
section 1
text here
part 4
data here
comm = 1
The above prints
As you can see above, each section contains subsections marked by keywords or by lines that have specific values.
I would like to use a one-liner to print the host name for each section and then print whichever lines I want to extract under each hostname section.
Can you please help? I am currently using grep -C 10 HOSTNAME | grep -C pattern
but this assumes that there are 10 lines in each section. This is not an optimal way to do it; can someone show a better way? I also need to be able to print more than one line under each pattern that I find. So if I find data 1 and there are additional lines under it, I would like to grab and print them.
So output of command would be like
grep -C 10 HOSTNAME | grep data 1
grep -C 10 HOSTNAME | grep -A 2 data 1
HOSTNAME:Host1
data 1
HOSTNAME:Hoss2
data 1
Beside Grep I use this sed command to print my output
sed -r '/HOSTNAME|shared/!d' filename
The only problem with this sed command is that it only prints the lines that have patterns shared & HOSTNAME in them. I also need to specify the number of lines I like to print in my case under the line that matched patterns shared. So I like to print HOSTNAME and give the number of lines I like to print under second search pattern shared.
Thanks

awk to the rescue!
$ awk -v lines=2 '/HOSTNAME/{c=lines} NF&&c&&c--' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
This prints the given number of lines, counting the pattern match itself and skipping empty lines.
If you want to specify a secondary keyword instead of a number of lines:
$ awk -v key='data 1' '/HOSTNAME/{h=1; print} h&&$0~key{print; h=0}' file
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
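The question also asks for a given number of lines under the secondary pattern. The two ideas above can be combined roughly as follows; this is a sketch, the function name print_sections and its argument order are assumptions, and the keyword is matched as a literal string rather than a regular expression.

```shell
#!/bin/bash
# print_sections KEY AFTER FILE: hypothetical helper. Prints every
# HOSTNAME line, plus each line containing KEY and the AFTER lines
# that follow it within the same section.
print_sections() {
    awk -v key="$1" -v after="$2" '
        /^HOSTNAME:/   { print; c = 0; next }      # new section: reset counter
        index($0, key) { print; c = after; next }  # keyword line: arm counter
        c > 0          { print; c-- }              # lines following the keyword
    ' "$3"
}
```

For example, print_sections 'data 1' 2 file prints each host name followed by its data 1 line and the two lines after it.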

Here is a sed twoliner:
sed -n -r '/HOSTNAME/ { p }
/^\s+data 1/ {p }' hostnames.txt
It prints (p)
when the line contains a HOSTNAME
when the line starts with some whitespace (\s+) followed by your search criterion (data 1)
non-matching lines are not printed (due to the sed -n option)
Edit: Some remarks:
this was tested with GNU sed 4.2.2 under Linux
you don't need the -r; if your sed version does not support it, change the second pattern to /^.*data 1/
we can squash everything in one line with ;
Putting it all together, here is a revised version in one line, without the need for the extended regex ( i.e without -r):
sed -n '/HOSTNAME/ { p } ; /^.*data 1/ {p }' hostnames.txt

The OP requirements seem to be very unclear, but the following is consistent with one interpretation of what has been requested, and more importantly, the program has no special requirements, and the code can easily be modified to meet a variety of requirements. In particular, both search patterns (the HOSTNAME pattern and the "data 1" pattern) can easily be parameterized.
The main idea is to print all lines in a specified subsection, or at least a certain number up to some limit.
If there is a limit on how many lines in a subsection should be printed, specify a value for limit, otherwise set it to 0.
awk -v limit=0 '
/^HOSTNAME:/ { subheader=0; hostname=1; print; next}
/^ *data 1/ { subheader=1; print; next }
/^ *data / { subheader=0; next }
subheader && (limit==0 || (subheader++ < limit)) { print }'
Given the lines provided in the question, the output would be:
HOSTNAME:host1
data 1
HOSTNAME:host-2
data 1
(Yes, I know the variable 'hostname' in the awk program is currently unused, but I included it to make it easy to add a test to satisfy certain obvious requirements regarding the preconditions for identifying a subheader.)

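sed -n -e '/hostname/,+Np' -e '/Duplex/,+Np'
The simplest way to do it is to combine two sed commands, replacing N with the number of lines to print after each match (the addr,+N address form is a GNU sed extension).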

sed - pass match to external command

I have written a little script using sed to transform this:
kaefert@Ultrablech ~ $ cat /sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000
POWER_SUPPLY_VOLTAGE_NOW=8370000
POWER_SUPPLY_POWER_NOW=0
POWER_SUPPLY_ENERGY_FULL_DESIGN=45640000
POWER_SUPPLY_ENERGY_FULL=44541000
POWER_SUPPLY_ENERGY_NOW=44541000
POWER_SUPPLY_MODEL_NAME=UX32-65
POWER_SUPPLY_MANUFACTURER=ASUSTeK
POWER_SUPPLY_SERIAL_NUMBER=
into a csv file format like this:
kaefert@Ultrablech ~ $ Documents/Asus\ Zenbook\ UX32VD/power_to_csv.sh
"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"
2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:02;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:04;"Full";8369000;0;44541000;44541000
... (in a loop)
What I would like now is to divide each of those numbers by 1,000,000 so that they represent V instead of µV and W instead of µW, making them easy to interpret at a quick glance. Of course I could do this manually afterwards once I've opened the csv in LibreOffice Calc, but I would like to automate it.
So what I found is, that I can call external programs in between sed, like this:
...
s/\nPOWER_SUPPLY_PRESENT=1\nPOWER_SUPPLY_TECHNOLOGY=Li-ion\nPOWER_SUPPLY_CYCLE_COUNT=0\nPOWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000\nPOWER_SUPPLY_VOLTAGE_NOW=\([0-9]\{1,\}\)/";'`echo 0`'\1/
and that I could get values like I want by something like this:
echo "scale=6;3094030/1000000" | bc | sed 's/0\{1,\}$//'
But the problem now is, how do I pass my match "\1" into the external command?
If you are interested in looking at the full script, you'll find it there:
http://koega.no-ip.org/mediawiki/index.php/Battery_info_to_csv

If your sed is GNU sed, you can use the 'e' flag to pass matched groups to external commands/tools within the sed command.
An example might help make it clear.
Say you have this problem:
you have a string "100+20foobar" and you want to get the calculation result of the 100+20 part, and replace "oo" with "xx" in the "foobar" part.
Note that this example is not for solving the problem above, just for
showing the sed 'e' usage.
So you could capture 100+20 in the first match group and the rest in the second group, then pass the two groups to different commands/tools and get the result, like:
kent$ echo "100+20foobar"|sed -r 's#([0-9+]*)(.*)#echo \1 \|bc\;echo \2 \| sed "s/oo/xx/g"#ge'
120
fxxbar
in this way, you could nest many seds one in another one, till you get lost. :D

As sed doesn't do arithmetic on its own I would recommend using awk for something like this, e.g. to divide 3rd, 5th and 6th field by a million do something like this:
awk -F';' -v OFS=';' '
NR == 1
NR != 1 {
$3 /= 1e6
$5 /= 1e6
$6 /= 1e6
print
}'
Explanation
-F';' and -v OFS=';' specify the input and output field separator.
NR == 1 pass first line through without change.
NR != 1 if it is not the first line, divide and print.

To divide by 1,000,000 directly with sed, you can do this:
Q='3094030/1000000'
echo "$Q" | sed ':r /^[[:digit:]]\{7\}/{s$\([[:digit:]]*\)\([[:digit:]]\{6\}\)/1000000$\1.\2$;p;d};s:^:0:;br;d'
