grep get text from specific section only from bash - linux

I have the following config file:
[general]
a=b
b=c
...
mykey=myvalue
n=X
[prod]
a=b
b=c
mykey=myvalue2
...
I want to get mykey only from the [general] section.
What I have tried is the following:
cat my.config | grep mykey
And, as expected, I got two results:
mykey=myvalue
mykey=myvalue2
The [general] section doesn't always appear in the first part of the config file.
How can I get the mykey that appears under the [general] section using Linux commands?

Here's one with awk:
$ awk -v RS="" '                # process empty-line-separated blocks
$1=="[general]" {               # if a block starts with the key string
    for(i=2;i<=NF;i++)          # iterate records, or fields in this case
        if($i~/^mykey=/) {      # find the key
            print $i            # and output the field
            exit                # once found, no point in continuing the search
        }
}' file
Output:
mykey=myvalue

You can get the values between [general] and the next bracketed section header.
awk '/^\[/{f=0} f; /\[general\]/{f=1}' file.config | grep mykey
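If you prefer sed, a roughly equivalent sketch of the same range idea (assuming section headers and keys start at the beginning of a line):
sed -n '/^\[general\]/,/^\[/{/^mykey=/p;}' file.config
The range opens at [general] and closes at the next line starting with [, and only mykey= lines inside it are printed.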

You can use a Python script, ini2arr.py:
#!/usr/bin/env python
# Python 2 (the ConfigParser module was renamed configparser in Python 3)
import sys, ConfigParser

config = ConfigParser.ConfigParser()
config.readfp(sys.stdin)
for sec in config.sections():
    print "declare -A %s" % (sec)
    for key, val in config.items(sec):
        print '%s[%s]="%s"' % (sec, key, val)
then
eval "$(cat t.ini | ./ini2arr.py)"
echo ${general["mykey"]}
EDIT: Or, with the section and key passed as arguments:
#!/usr/bin/env python
import sys
import ConfigParser
section_filter = sys.argv[1]
key_filter = sys.argv[2]
config = ConfigParser.ConfigParser()
config.readfp(sys.stdin)
print '%s[%s]="%s"' % (section_filter, key_filter, config.get(section_filter, key_filter))
then
cat t.ini | ./ini2arr.py prod a
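A pure-awk version of that section/key filter, for those avoiding Python (a sketch; assumes keys sit at the start of a line):
awk -v section='general' -v key='mykey' '
    /^\[/ { in_sec = ($0 == "[" section "]") }  # track whether we are inside the wanted section
    in_sec && index($0, key "=") == 1 {         # line starts with key=
        print substr($0, length(key) + 2)       # print just the value
        exit
    }
' my.config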

Here is another awk solution (using standard linux awk/gawk)
/\[general\]/,/^$/ {if ($0 ~ "mykey") print}
Explanation
/\[general\]/,/^$/    # match line range: starting with "[general]" and ending with "" (empty line)
{                     # for each line in range
    if ($0 ~ "mykey") # if the line matches regex pattern "mykey"
        print $0      # print the line
}
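Put together on the command line (using the question's file name):
awk '/\[general\]/,/^$/ {if ($0 ~ "mykey") print}' my.config
Note the range ends at the first empty line, so this assumes sections are separated by blank lines (or that [general] is the last section in the file).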

Related

How to reformat the contents of a text file?

move_lines.sh originally looked like this:
mv /home/user/filename.txt /home/user/filename.txt.old
mv /home/user/filename1.txt /home/user/filename1.txt.old
Code used so far:
awk '{print $1,$3}' move_lines.sh >> move_lines.sh
cd /home/user/oldfiles
currentdir=$(pwd)
echo "$currentdir" >> move_lines.sh
echo "$currentdir" >> move_lines.sh
unset $currentdir
Current output:
mv /home/user/filename.txt /home/user/filename.txt.old
mv /home/user/filename1.txt /home/user/filename1.txt.old
mv /home/user/filename1.txt.old
mv /home/user/filename1.txt.old
/home/user/oldfiles/
/home/user/oldfiles/
The goal is to make it look like this:
mv /home/user/filename.txt /home/user/filename.txt.old
mv /home/user/filename1.txt /home/user/filename1.txt.old
mv /home/user/filename1.txt.old /home/user/oldfiles/
mv /home/user/filename1.txt.old /home/user/oldfiles/
Unsure how to accomplish this. Any help would be greatly appreciated.
You can correct this using Python.
Let's say your input file is the "Current output" shown above. Assuming that:
Line 5 should go behind line 3,
Line 6 should go behind line 4,
here's the code you can write:
import os

input_file = 'input.txt'
output_file = 'output.txt'
default_path = '/home/user/oldfiles/'

# remove the output file
if os.path.exists(output_file):
    os.remove(output_file)

# read all lines from input file
with open(input_file, 'r') as infile:
    input_lines = infile.readlines()

# let's store good lines in output_lines and
# lines with a missing last argument in orphaned_lines
output_lines = []
orphaned_lines = []

# loop through all the read lines
for line in input_lines:
    # remove empty spaces and newline characters at the end of the line
    line = line.strip()
    print('Reading line ', line)

    # split by space and count # of arguments
    arguments = line.split(' ')
    print('Arguments count is ', len(arguments))

    # this is the ideal line. Copy it to output_lines
    if len(arguments) > 2:
        print('Adding it to output_lines')
        output_lines.append(line)

    # this line is missing the last argument. Hold it in orphaned_lines
    if len(arguments) == 2:
        print('Found orphaned')
        orphaned_lines.append(line)

    # we found just the path. Add that to the first orphaned line,
    # then take that new line and put it in output_lines
    if len(arguments) == 1:
        # if there are any orphaned lines left, add this path
        # to the first orphaned line
        if len(orphaned_lines) > 0:
            print('Adding it to the first orphaned')
            orphaned_line = orphaned_lines.pop(0)
            new_line = orphaned_line + ' ' + line
            output_lines.append(new_line)

# if there are any orphaned lines still left, let's give them
# the default path
for line in orphaned_lines:
    new_line = line + ' ' + default_path
    output_lines.append(new_line)

# write to an output file
with open(output_file, 'w') as outfile:
    for line in output_lines:
        outfile.write(line + '\n')

print('Done')
How do you run this file?
Save the code in a file called test.py
Assuming your input file that contains imperfect lines is called input.txt
From command line, type python3 test.py
You will get a file called output.txt containing the goal output shown above.
Easier option with AWK
On command line, type:
awk '{n=split($0,a); if(n==3) print $0; else if (n==2) print $0" /home/user/oldfiles/" }' input.txt > output.txt
In this one-liner, we ask awk to:
split the line (represented by $0); the number of items is returned in a variable called n
If awk finds 3 items, we print the line
If awk finds 2 items, we print the line followed by a space and the default path you desire
Otherwise, don't print anything.
The output will go into output.txt and will look the same as the goal output shown above.
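For instance, with the six "Current output" lines from the question saved as input.txt, you would expect:
$ awk '{n=split($0,a); if(n==3) print $0; else if (n==2) print $0" /home/user/oldfiles/" }' input.txt
mv /home/user/filename.txt /home/user/filename.txt.old
mv /home/user/filename1.txt /home/user/filename1.txt.old
mv /home/user/filename1.txt.old /home/user/oldfiles/
mv /home/user/filename1.txt.old /home/user/oldfiles/
The two bare path lines print nothing, since they have only one field.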
Update
Based on your code, you could just do this on your bash prompt.
awk '{n=split($0,a); if(n==3) print $0; else if (n==2) print $0" /home/user/oldfiles/" }' move_lines.sh > new_move_lines.sh

Using awk to make changes to nth character in nth line in a file

I have written an awk command
awk 'NR==5 {sub(substr($1,14,1),(substr($1,14,1) + 1)); print "test.py"}' > test.py
This is trying to change the 14th character on the 5th line of a python file. For some reason this doesn't stop executing and I have to break it. It also deletes the contents of the file.
Sample input:
import tools
tools.setup(
name='test',
tagvisc='0.0.8',
packages=tools.ges(),
line xyz
)
Output:
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)
If I understand the nuances of what you need to do now, you will need to split the first field of the 5th record into an array using "." as the field separator, then remove the "\"," from the end of the 3rd element of the array (optional) before incrementing the number and putting the field back together. You can do so with:
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
(NR==5 omitted for example)
Example Use/Output
$ echo 'tagvisc="3.4.30"', |
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
tagvisc="3.4.31",
I'll leave redirecting to a temp file and then back to the original to you. Let me know if this isn't what you need.
Adding NR == 5 you would have
awk 'NR==5 {split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1' test.py > tmp; mv -f tmp test.py
Get away from the fixed line number (NR==5) and fixed character position (14) and instead look at dynamically finding what you want to change/increment, eg:
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.10',
packages=tools.ges(),
line xyz
)
One awk idea to increment the 10 (3rd line, 3rd numeric string in line):
awk '
/tagvisc=/ { split($0,arr,".")                            # split line on periods
             sub("." arr[3]+0 "\047","." arr[3]+1 "\047") # replace .<oldvalue>\047 with .<newvalue>\047; \047 == single quote
           }
1
' test.py
NOTES:
arr[3] = 10',; with arr[3]+0 awk will take the leftmost all-numeric content, strip off everything else, then add 0, leaving us with arr[3]+0 = 10; the same logic applies for arr[3]+1 (= 11); basically a trick for discarding any suffix that is not numeric
if there are multiple lines in the file with the string tagvisc='x.y.z' then this will change z in all of the lines; we can get around this by adding some more logic to only change the first occurrence, but I'll leave that out for now assuming it's not an issue
This generates:
import tools
tools.setup(
name='test',
tagvisc='0.0.11',
packages=tools.ges(),
line xyz
)
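To see that numeric-coercion trick from the NOTES in isolation (an illustrative one-liner; \047 is the single quote):
$ awk 'BEGIN { s = "10\047,"; print s+0, s+1 }'
10 11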
If the objective is to overwrite the original file with the new values you have a couple options:
# use temporary file:
awk '...' test.py > tmp ; mv tmp test.py
# if using GNU awk, and once accuracy of script has been verified:
awk -i inplace '...' test.py
Using awk to make changes to nth character in [mth] line in a file:
$ awk 'BEGIN{FS=OFS=""}NR==5{$18=9}1' file # > tmp && mv tmp file
Outputs:
import tools
tools.setup(
name='test',
tagvisc='0.0.9', <----- the arrow is not part of the output; it marks what changed
packages=tools.ges(),
line xyz
)
Explained:
$ awk '
BEGIN {
    FS=OFS=""    # set the field separators to empty and you can reference
}                # each char in the record by a number
NR==5 {          # 5th record
    $18=9        # and the 18th char is replaced with a 9
}1' file         # > tmp && mv tmp file : output to a tmp file and replace
Notice: Some awks (probably all but GNU awk) will fail if you try to replace a multibyte char with a single-byte one (for example, replacing the utf8 ä (0xc3 0xa4) with an a (0x61) will result in 0x61 0xa4). Naturally, an ä before the position you'd like to replace will throw your calculations off by 1.
Oh yeah, you can replace one char with multiple chars, but not vice versa.
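For instance, replacing one char with several works fine in GNU awk (an illustrative one-liner, not from the original answer):
$ echo abc | gawk 'BEGIN{FS=OFS=""}{$2="XY"}1'
aXYc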
something like this...
$ awk 'function join(a,k,s,sep) {for(k in a) {s=s sep a[k]; sep="."} return s}
       BEGIN {FS=OFS="\""}
       /^tagvisc=/{v[split($2,v,".")]++; $2=join(v)}1' file > newfile
Using GNU awk for the 3rd arg to match() and "inplace" editing:
$ awk -i inplace '
match($0,/^([[:space:]]*tagvisc=\047)([^\047]+)(.*)/,a) {
    split(a[2],ver,".")
    $0 = a[1] ver[1] "." ver[2] "." ver[3]+1 a[3]
}
{ print }
' test.py
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)

Reformat a long file using awk/sed

I have a very long file. The contents of the file are like:
myserver1
kernel_version
os

myserver2
kernel_version
os

myserver3
kernel_version
os

...
There are more than 10,000 entries: 3 lines for each host (hostname, kernel_version, OS version), separated by blank lines.
I would like to have an output like:
myserver1, kernel_version, os
myserver2, kernel_version, os
myserver3, kernel_version, os
...
instead. So what is the best awk/sed command to provide this output?
With sed:
$ sed '/^$/d;N;N;s/\n/, /g' infile
myserver1, kernel_version, os
myserver2, kernel_version, os
myserver3, kernel_version, os
This works as follows:
/^$/d # Delete line if empty (skips rest of commands)
N # Append second line to pattern space
N # Append third line to pattern space
s/\n/, /g # Replace newlines by comma and a blank
If you want the criterion for skipping a line to be its line number (4, 8, 12, ...) rather than "empty line", you can replace the first command (this is a GNU extension):
sed '4~4d;N;N;s/\n/, /g' infile
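A quick check of the line-number variant (GNU sed), with blank lines falling on lines 4 and 8:
$ printf '%s\n' a b c '' d e f '' | sed '4~4d;N;N;s/\n/, /g'
a, b, c
d, e, f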
You can also use paste:
paste -d ',,\0' - - - - <file
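Here paste consumes four input lines per output row (the three data lines plus the blank separator); the delimiter list ,,\0 means comma, comma, then the empty string (\0 is not a NUL here). An illustrative run:
$ printf '%s\n' myserver1 kernel_version os '' myserver2 kernel_version os '' | paste -d ',,\0' - - - -
myserver1,kernel_version,os
myserver2,kernel_version,os
Note it joins with a bare comma rather than comma-space.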
You can use:
awk 'BEGIN{RS="";OFS=", "} {print $1,$2,$3}' data.txt
defining the record separator as the empty line, with the output field separator (OFS) set to ", "
You can also use:
awk 'BEGIN{RS="";OFS=", "} {$1=$1; print $0}' data.txt
$1=$1 forces the record to be reconstituted, so the new OFS is applied between all fields
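To see the reconstitution at work (an illustrative one-liner):
$ echo 'a b c' | awk 'BEGIN{OFS=", "} {print $0; $1=$1; print $0}'
a b c
a, b, c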
While AWK/SED could help you perform this task, a better way would be to use Python, assuming that the *NIX system you are working on has it installed to process this data.
You could use the following in python to process this quite easily:
import csv

output_writer = csv.writer(open("/path/to/output/file", "w"))
column_num = 3  # number of columns in your end-state data

with open("</path/to/your/input/file>", "r") as input:
    row = []
    for line in input:
        stripped = line.strip()  # to remove the newlines (\n)
        if stripped == '':       # skip the blank separator lines
            continue
        row.append(stripped)
        if len(row) == column_num:       # the row is complete
            output_writer.writerow(row)  # output the list as a csv row
            row = []                     # clear the row list to nothing
    if row:                              # flush a trailing partial row
        output_writer.writerow(row)

How to read a file each n characters instead of each line using awk?

This is the content of file.txt:
hello bro
my nam§
is Jhon Does
The file could also contain non-printable characters (for example \x00 or \x02), and, as you can see, the lengths of the lines are not the same.
I want to read it 5 characters at a time, treating line breaks as just another character. I thought of something like this using awk:
awk -v RS='' '{
    s=s $0;
}END{
    n=length(s);
    for(x=1; x<n; x=x+5){
        # Here I will put some calcs and stuff
        i++;
        print "line " i ": #" substr(s,x,5) "#"
    }
}' file.txt
The output is the following:
line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#
It works perfectly, but the input file will be very large, so the performance is important.
In short, I'm looking for something like this:
awk -v RS='.{5}' '{ # Here I will put some calcs and stuff }'
But it doesn't work.
Another alternative that works OK (23 is the hex code for #, and 230a is # plus a newline):
xxd -ps mifile.txt | tr -d '\n' | fold -w 10 | awk '{print "23" $0 "230a"}' | xxd -ps -r
Do you have any idea or alternative? Thank you.
I'm not sure I understand what you want, but this outputs the same as the script in your question that you say works perfectly, so hopefully this is it:
$ awk -v RS='.{5}' 'RT!=""{ print "line", NR ": #" RT "#" }' file
line 1: #hello#
line 2: # bro
#
line 3: #my na#
line 4: #m§
is#
line 5: # Jhon#
line 6: # Does#
The above uses GNU awk for multi-char RS and RT.
If you are okay with Python, you may try this (Python 2 syntax):
f = open('filename', 'r+')
w = f.read(5)
while w != '':
    print w
    w = f.read(5)
f.close()
You can use perl and binmode assuming you are using normal characters.
use strict;
use warnings;

open my $fh, '<', 'test';  # open the file
binmode $fh;               # set to binary mode
$/ = \5;                   # read a record as 5 bytes
while (<$fh>) {            # read records
    print "$_#";           # do whatever calculations you want here
}
For extended character sets you can use UTF8 and read every 5 characters instead of bytes.
use strict;
use warnings;

open my $fh, '<:utf8', 'test';          # open file in utf8
binmode(STDOUT, ":utf8");               # set stdout to utf8 as well
while ((read($fh, my $data, 5)) != 0) { # read 5 characters into $data
    print "$data#";                     # do whatever you want with the data here
}
So you asked How to read a file each n characters instead of each line using awk.
Solution:
If you have a modern gawk implementation use FPAT
Normally, when using FS, gawk defines the fields as the parts of the
record that occur in between each field separator. In other words, FS
defines what a field is not, instead of what a field is. However,
there are times when you really want to define the fields by what they
are, and not by what they are not.
Code:
gawk 'BEGIN{FS="\n";RS="";FPAT=".{,5}"}
{
    for (i=1;i<=NF;i++){
        printf("$%d = <%s>\n", i, $i)
    }
}' file

convert a fixed width file from text to csv

I have a large data file in text format and I want to convert it to csv by specifying each column length.
number of columns = 5
column lengths:
[4 2 5 1 1]
sample observations:
aasdfh9013512
ajshdj 2445df
Expected Output
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
GNU awk (gawk) supports this directly with FIELDWIDTHS; the $1=$1 assignment forces the record to be resplit into the fixed-width fields and rebuilt with the comma OFS, e.g.:
gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
I would use sed and catch the groups with the given length:
$ sed -r 's/^(.{4})(.{2})(.{5})(.{1})(.{1})$/\1,\2,\3,\4,\5/' file
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Here's a solution that works with regular awk (does not require gawk).
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
It uses awk's substr function to define each field's start position and length. OFS defines what the output field separator is (in this case, a comma).
(Side note: This only works if the source data does not have any commas. If the data has commas, then you have to escape them to be proper CSV, which is beyond the scope of this question.)
Demo:
echo 'aasdfh9013512
ajshdj 2445df' |
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Adding a generic way of handling this (an alternative to the FIELDWIDTHS option) in awk, where we need not hardcode substring positions; commas are inserted wherever the user-supplied widths dictate. Written and tested in GNU awk. To use it, put the field widths in the awk variable colLengh (as the OP showed in the samples), with a space between the numbers.
awk -v colLengh="4 2 5 1 1" '
BEGIN{
    num=split(colLengh,arr,OFS)
}
{
    j=sum=0
    while(++j<num){
        if(length($0)>sum){
            sub("^.{"arr[j]+sum"}","&,")
        }
        sum+=arr[j]+1
    }
}
1
' Input_file
Explanation: Create an awk variable named colLengh holding the field widths after which commas are to be inserted. In the BEGIN section, split colLengh into the array arr. In the main program section, first zero out the variables j and sum, then run a while loop over all but the last width (a comma after the final field would just be a trailing comma). In each pass, provided the line is longer than sum, substitute everything from the start of the current line up to the computed position with the matched text plus a comma: the sub pattern is ^.{4} the first time through the loop, then ^.{7}, because the 7th position (4 characters, plus the inserted comma, plus 2 more characters) is where the next comma goes, and so on. At the end, the lone 1 prints every line, edited or not.
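Tracing the first sample line through the loop makes the position arithmetic concrete:
aasdfh9013512       # input
aasd,fh9013512      # j=1: sub(/^.{4}/,"&,")
aasd,fh,9013512     # j=2: sub(/^.{7}/,"&,")
aasd,fh,90135,12    # j=3: sub(/^.{13}/,"&,")
aasd,fh,90135,1,2   # j=4: sub(/^.{15}/,"&,")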
If anyone is still looking for a solution, I have developed a small script in Python. It's easy to use, provided you have Python 3.5:
https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py
"""
This script will convert Fixed width File into Delimiter File, tried on Python 3.5 only
Sample run: (Order of argument doesnt matter)
python ConvertFixedToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
Inputs are as follows
1. Input FIle - Mandatory(Argument -i) - File which has fixed Width data in it
2. Config File - Optional (Argument -c, if not provided will look for Config.txt file on same path, if not present script will not run)
Should have format as
FieldName,fieldLength
eg:
FirstName,10
SecondName,8
Address,30
etc:
3. Output File - Optional (Argument -o, if not provided will be used as InputFIleName plus Delimited.txt)
4. Delimiter - Optional (Argument -d, if not provided default value is "|" (pipe))
"""
from collections import OrderedDict
import argparse
from argparse import ArgumentParser
import os.path
import sys

def slices(s, args):
    position = 0
    for length in args:
        length = int(length)
        yield s[position:position + length]
        position += length

def extant_file(x):
    """
    'Type' for argparse - checks that file exists but does not open.
    """
    if not os.path.exists(x):
        # Argparse uses the ArgumentTypeError to give a rejection message like:
        # error: argument input: x does not exist
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x

parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
parser.add_argument("-i", dest="InputFile", required=True, help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-o", dest="OutputFile", required=False, help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
parser.add_argument("-c", dest="ConfigFile", required=False, help="Provide your Config file name here, file should have value as fieldName,fieldLength. If file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-d", dest="Delimiter", required=False, help="Provide the delimiter string you want", metavar="STRING", default="|")
args = parser.parse_args()

# Input file is mandatory
InputFile = args.InputFile
# Delimiter, by default "|"
DELIMITER = args.Delimiter

# Output file checks
if args.OutputFile is None:
    OutputFile = str(InputFile) + "Delimited.txt"
    print("Setting Output file as " + OutputFile)
else:
    OutputFile = args.OutputFile

# Config file check
if args.ConfigFile is None:
    if not os.path.exists("Config.txt"):
        print("There is no Config File provided, exiting the script")
        sys.exit()
    else:
        ConfigFile = "Config.txt"
        print("Taking Config.txt file on this path as Default Config File")
else:
    ConfigFile = args.ConfigFile

fieldNames = []
fieldLength = []
myvars = OrderedDict()

with open(ConfigFile) as myfile:
    for line in myfile:
        name, var = line.partition(",")[::2]
        myvars[name.strip()] = int(var)

for key, value in myvars.items():
    fieldNames.append(key)
    fieldLength.append(value)

with open(OutputFile, 'w') as f1:
    fieldNames = DELIMITER.join(map(str, fieldNames))
    f1.write(fieldNames + "\n")
    with open(InputFile, 'r') as f:
        for line in f:
            rec = (list(slices(line, fieldLength)))
            myLine = DELIMITER.join(map(str, rec))
            f1.write(myLine + "\n")
Portable awk
Generate an awk script with the appropriate substr commands
cat cols
4
2
5
1
1
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1
Output:
substr($0,1,4)
substr($0,5,2)
substr($0,7,5)
substr($0,12,1)
substr($0,13,1)
Combine lines and make it a valid awk-script:
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1 |
paste -sd, | sed 's/^/{ print /; s/$/ }/'
Output:
{ print substr($0,1,4),substr($0,5,2),substr($0,7,5),substr($0,12,1),substr($0,13,1) }
Redirect the above to a file, e.g. /tmp/t.awk and run it on the input-file:
<infile awk -f /tmp/t.awk
Output:
aasd fh 90135 1 2
ajsh dj 2445 d f
Or with comma as the output separator:
<infile awk -f /tmp/t.awk OFS=,
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
