Related
I have written an awk command
awk 'NR==5 {sub(substr($1,14,1),(substr($1,14,1) + 1)); print "test.py"}' > test.py
This is trying to change the 14th character on the 5th line of a python file. For some reason this doesn't stop executing and I have to break it. It also deletes the contents of the file.
Sample input:
import tools
tools.setup(
name='test',
tagvisc='0.0.8',
packages=tools.ges(),
line xyz
)
`
Output:
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)
If I understand the nuances of what you need to do now, you will need to split the first field of the 5th record into an array using "." as the fieldsep and then remove the "\"," from the end of the 3rd element of the array (optional) before incrementing the number and putting the field back together. You can do so with:
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
(NR==5 omitted for example)
Example Use/Output
$ echo 'tagvisc="3.4.30"', |
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
tagvisc="3.4.31",
I'll leave redirecting to a temp file and then back to the original to you. Let me know if this isn't what you need.
Adding NR == 5 you would have
awk 'NR==5 {split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1' test.py > tmp; mv -f tmp test.py
Get away from the fixed line number (NR==5) and fixed character position (14) and instead look at dynamically finding what you want to change/increment, eg:
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.10',
packages=tools.ges(),
line xyz
)
One awk idea to increment the 10 (3rd line, 3rd numeric string in line):
awk '
/tagvisc=/ { split($0,arr,".") # split line on periods
sub("." arr[3]+0 "\047","." arr[3]+1 "\047") # replace .<oldvalue>\047 with .<newvalue>\047; \047 == single quote
}
1
' test.py
NOTES:
arr[3] = 10',; with arr[3]+0 awk will take the leftmost all-numeric content, strip off everything else, then add 0, leaving us with arr[3] = 10; same logic applies for arr[3]+1 (arr[3]+1 = 11); basically a trick for discarding any suffix that is not numeric
if there are multiple lines in the file with the string tagvisc='x.y.z' then this will change z in all of the lines; we can get around this by adding some more logic to only change the first occurrence, but I'll leave that out for now assuming it's not an issue
This generates:
import tools
tools.setup(
name='test',
tagvisc='0.0.11',
packages=tools.ges(),
line xyz
)
If the objective is to overwrite the original file with the new values you have a couple options:
# use temporary file:
awk '...' test.py > tmp ; mv tmp test.py
# if using GNU awk, and once accuracy of script has been verified:
awk -i inplace '...' test.py
Using awk to make changes to nth character in [mth] line in a file:
$ awk 'BEGIN{FS=OFS=""}NR==5{$18=9}1' file # > tmp && mv tmp file
Outputs:
import tools
tools.setup(
name='test',
tagvisc='0.0.9', <----- this is not output but points to what changed
packages=tools.ges(),
line xyz
)
Explained:
$ awk '
BEGIN {
FS=OFS="" # set the field separators to empty and you can reference
} # each char in record by a number
NR==5 { # 5th record
$18=9 # and 18th char is replaced with a 9
}1' file # > tmp && mv tmp file # output to a tmp file and replace
Notice: Some awks (probably all but GNU awk) will fail if you try to replace a multibyte char by a single byte one (for example utf8 ä (0xc3 0xa4) with an a (0x61) will result in 0x61 0xa4). Naturally an ä before the position you'd like to replace will set your calculations off by 1.
Oh yeah, you can replace one char with multiple chars but not vice versa.
something like this...
$ awk 'function join(a,k,s,sep) {for(k in a) {s=s sep a[k]; sep="."} return s}
BEGIN {FS=OFS="\""}
/^tagvisc=/{v[split($2,v,".")]++; $2=join(v)}1' file > newfile
Using GNU awk for the 3rd arg to match() and "inplace" editing:
$ awk -i inplace '
match($0,/^([[:space:]]*tagvisc=\047)([^\047]+)(.*)/,a) {
split(a[2],ver,".")
$0 = a[1] ver[1] "." ver[2] "." ver[3]+1 a[3]
}
{ print }
' test.py
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)
I have a file that contains two repeated strings, which are my markers to recognize. The marker is "(CO)VARIANCES" and each of theme following 3 different lines. I want to remove the 3 lines after the first string and replace it with the content of another file. Structure of file is like:
cat file1.txt
Something1
Something2
Something3
(CO)VARIANCES
44.572 0.28723E-01 0.0000
0.28723E-01 0.64501E-03 0.0000
0.0000 0.0000 0.0000
Something4
Something5
Something6
(CO)VARIANCES
34.891 0.38642E-01 1.7538
0.38642E-01 0.17122E-02 0.54735E-02
1.7538 0.54735E-02 0.23285
I want to remove three lines after (CO)VARIANCES and replace it with another file that contains something. The command
sed -e '/(CO)VARIANCES/{n;N;N;d}' file1.txt
is removing 3 lines after both markers and I don't know how to indicate the number of occurrence in this command. And I don't know how to conditionally paste the second content of the second file after those markers. Does somebody have an idea about that?
To perform the replacement on the Nth occurrence, consider the following script (N is the n-th occurrence)
awk -v N=3 '
# Delete if needed
DEL > 0 { DEL-- ; next } ;
# Find matching line
$1 == "(CO)VARIANCES" && (--N) == 0 {
print
DEL = 3 ;
# Include second file here.
while ( getline v < "file2.txt" ) print v ;
next
}
# Print all other lines
{ print }
'
This might work for you (GNU sed):
sed -En '/\(CO\)VARIANCES/{n;:a;s/[^\n]*/&/3;tb;N;ba;:b;p}' file2 |
sed -Ee '/\(CO\)VARIANCES/{n;:a;R /dev/stdin' -e 's/[^\n]*/&/3;tb;N;ba;:b;d}' file1
The solution consists of two sed invocations piped together.
The first sed invocation prepares a file consisting of the lines from file2 following the marker (in the this case 3).
The second sed invocation reads in the required number of lines from the first invocation and deletes the same number of lines from file1 following the marker.
I have a file with patterns like below
12345343|559|-2,0,-200000,-20|20161108000000|FL|62,859,1439,1956|0,0,21300,0|S
7778880|123|500,100|20161108000000|AL|21,135|3|S
I'm looking for a way to separate into multiple records mapping 3rd and 6th set of values
Required output:
12345343|559|-2|20161108000000|FL|62|0,0,21300,0|S
12345343|559|0|20161108000000|FL|859|0,0,21300,0|S
12345343|559|-200000|20161108000000|FL|1439|0,0,21300,0|S
12345343|559|-20|20161108000000|FL|1956|0,0,21300,0|S
7778880|123|500|20161108000000|AL|21|3|S
7778880|123|100|20161108000000|AL|135|3|S
This might work for you (GNU sed):
sed -r 's/^(.*\|.*\|)([^,]*),([^|]*)(\|.*\|.*\|)([^,]*),([^|]*)(.*)/\1\2\4\5\7\n\1\3\4\6\7/;P;D' file
Iteratively split the current line into pieces, building two lines separated by a newline. The first line contains the head of the 3rd and 6th fields, the second line contain the tails of the 3rd and 6th lines. Print then delete the first of the lines and then repeat till the lists in the 3rd and 6th fields are consumed.
You can use this awk
awk -F'|' -vOFS='|' '{a="";b=split($3,c,",");split($6,d,",");for(e=1;e<=b;e++){if(a)a=a RS;$3=c[e];$6=d[e];a=a$0};print a}' infile
Explanation :
awk -F'|' -vOFS='|' ' # fields are separate by | for input and output
{
a=""
b=split($3,c,",") # split field 3 in array c
# b is the number of elements
split($6,d,",") # split field 6 in array d
for(e=1;e<=b;e++) # for each element of array c and d
{
if(a) # if a is defined, append RS (\n) at the end
a=a RS
$3=c[e]
$6=d[e] # substitute fields 3 and 6 with the value of array c and d
a=a$0 # append the complete line to a
}
print a # at the end of the loop print a
}
' infile
I have a large data file in text format and I want to convert it to csv by specifying each column length.
number of columns = 5
column length
[4 2 5 1 1]
sample observations:
aasdfh9013512
ajshdj 2445df
Expected Output
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
GNU awk (gawk) supports this directly with FIELDWIDTHS, e.g.:
gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
I would use sed and catch the groups with the given length:
$ sed -r 's/^(.{4})(.{2})(.{5})(.{1})(.{1})$/\1,\2,\3,\4,\5/' file
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Here's a solution that works with regular awk (does not require gawk).
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
It uses awk's substr function to define each field's start position and length. OFS defines what the output field separator is (in this case, a comma).
(Side note: This only works if the source data does not have any commas. If the data has commas, then you have to escape them to be proper CSV, which is beyond the scope of this question.)
Demo:
echo 'aasdfh9013512
ajshdj 2445df' |
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Adding a Generic way of handling this(alternative to FIELDSWIDTH option) in awk(where we need not to harcode sub string positions, this will work as per position nuber provided by user wherever comma needs to be inserted) could be as follows, written and tested in GNU awk. To use this, we have to define values(like OP showed in samples), position numbers where we need to insert commas, awk variable name is colLength give position numbers with space between them.
awk -v colLengh="4 2 5 1 1" '
BEGIN{
num=split(colLengh,arr,OFS)
}
{
j=sum=0
while(++j<=num){
if(length($0)>sum){
sub("^.{"arr[j]+sum"}","&,")
}
sum+=arr[j]+1
}
}
1
' Input_file
Explanation: Simple explanation would be, creating awk variable named colLengh where we need to define position numbers wherever we need to insert commas. Then in BEGIN section creating array arr which has value of indexes where we need to insert commas in it.
In main program section first of all nullifying variables j and sum here. Then running a while loop from j=1 to till value of j becomes equal to num. In each run substituting from starting of current line(if length of current line is greater than sum else it doesn't make sense to perform substitution to I have put addiotnal check here) everything with everything + , as per need. Eg: sub function will become .{4} for first time loop runs then it becomes, .{7} because its 7th position we need to insert comma and so on. So sub will substitute those many characters from starting to till generated numbers with matched value + ,. At last in this program mentioning 1 will print edited/non-edited lines.
If any one is still looking for a solution, I have developed a small script in python. its easy to use provided you have python 3.5
https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py
"""
This script will convert Fixed width File into Delimiter File, tried on Python 3.5 only
Sample run: (Order of argument doesnt matter)
python ConvertFixedToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
Inputs are as follows
1. Input FIle - Mandatory(Argument -i) - File which has fixed Width data in it
2. Config File - Optional (Argument -c, if not provided will look for Config.txt file on same path, if not present script will not run)
Should have format as
FieldName,fieldLength
eg:
FirstName,10
SecondName,8
Address,30
etc:
3. Output File - Optional (Argument -o, if not provided will be used as InputFIleName plus Delimited.txt)
4. Delimiter - Optional (Argument -d, if not provided default value is "|" (pipe))
"""
from collections import OrderedDict
import argparse
from argparse import ArgumentParser
import os.path
import sys
def slices(s, args):
position = 0
for length in args:
length = int(length)
yield s[position:position + length]
position += length
def extant_file(x):
"""
'Type' for argparse - checks that file exists but does not open.
"""
if not os.path.exists(x):
# Argparse uses the ArgumentTypeError to give a rejection message like:
# error: argument input: x does not exist
raise argparse.ArgumentTypeError("{0} does not exist".format(x))
return x
parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
parser.add_argument("-i", dest="InputFile", required=True, help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-o", dest="OutputFile", required=False, help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
parser.add_argument("-c", dest="ConfigFile", required=False, help="Provide your Config file name here,File should have value as fieldName,fieldLength. if file is on different path than where this script resides then provide full path of the file", metavar="FILE",type=extant_file)
parser.add_argument("-d", dest="Delimiter", required=False, help="Provide the delimiter string you want",metavar="STRING", default="|")
args = parser.parse_args()
#Input file madatory
InputFile = args.InputFile
#Delimiter by default "|"
DELIMITER = args.Delimiter
#Output file checks
if args.OutputFile is None:
OutputFile = str(InputFile) + "Delimited.txt"
print ("Setting Ouput file as "+ OutputFile)
else:
OutputFile = args.OutputFile
#Config file check
if args.ConfigFile is None:
if not os.path.exists("Config.txt"):
print ("There is no Config File provided exiting the script")
sys.exit()
else:
ConfigFile = "Config.txt"
print ("Taking Config.txt file on this path as Default Config File")
else:
ConfigFile = args.ConfigFile
fieldNames = []
fieldLength = []
myvars = OrderedDict()
with open(ConfigFile) as myfile:
for line in myfile:
name, var = line.partition(",")[::2]
myvars[name.strip()] = int(var)
for key,value in myvars.items():
fieldNames.append(key)
fieldLength.append(value)
with open(OutputFile, 'w') as f1:
fieldNames = DELIMITER.join(map(str, fieldNames))
f1.write(fieldNames + "\n")
with open(InputFile, 'r') as f:
for line in f:
rec = (list(slices(line, fieldLength)))
myLine = DELIMITER.join(map(str, rec))
f1.write(myLine + "\n")
Portable awk
Generate an awk script with the appropriate substr commands
cat cols
4
2
5
1
1
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1
Output:
substr($0,1,4)
substr($0,5,2)
substr($0,7,5)
substr($0,12,1)
substr($0,13,1)
Combine lines and make it a valid awk-script:
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1 |
paste -sd, | sed 's/^/{ print /; s/$/ }/'
Output:
{ print substr($0,1,4),substr($0,5,2),substr($0,7,5),substr($0,12,1),substr($0,13,1) }
Redirect the above to a file, e.g. /tmp/t.awk and run it on the input-file:
<infile awk -f /tmp/t.awk
Output:
aasd fh 90135 1 2
ajsh dj 2445 d f
Or with comma as the output separator:
<infile awk -f /tmp/t.awk OFS=,
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
If I have 3 csv files, and I want to merge the data all into one, but beside each other, how would I do it? For example:
Initial Merged file:
,,,,,,,,,,,,
File 1:
20,09/05,5694
20,09/06,3234
20,09/08,2342
File 2:
20,09/05,2341
20,09/06,2334
20,09/09,342
File 3:
20,09/05,1231
20,09/08,3452
20,09/10,2345
20,09/11,372
Final merged File:
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372
Basically data from each file goes into a specific column of the merged file.
I know the awk function can be used for this, but I have no clue how to start
EDIT: Only the 2nd and 3rd Columns of each file are being printed. I was using this to print out the 2nd and 3rd columns:
awk -v f="${i}" -F, 'match ($0,f) { print $2","$3 }' file3.csv > d$i.csv
however, say for example, file1 and file2 were null in that row, the data for that row would be shifted to the left. so I came up with this to account for the shift:
awk -v x="${i}" -F, 'match ($0,x) { if ($2='/NULL') { print "," }; else { print $2","$3}; }' alld.csv > d$i.csv
Using GNU awk for ARGIND:
$ gawk '{ a[FNR,ARGIND]=$0; maxFnr=(FNR>maxFnr?FNR:maxFnr) }
END {
for (i=1;i<=maxFnr;i++) {
for (j=1;j<ARGC;j++)
printf "%s%s", (j==1?"":",,,"), (a[i,j]?a[i,j]:",")
print ""
}
}
' file1 file2 file3
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372
If you don't have GNU awk, just add an initial line that says FNR==1{ARGIND++}.
Commented version per request:
$ gawk '
{ a[FNR,ARGIND]=$0; # Store the current line in a 2-D array `a` indexed by
# the current line number `FNR` and file number `ARGIND`.
maxFnr=(FNR>maxFnr?FNR:maxFnr) # save the max FNR value
}
END{
for (i=1;i<=maxFnr;i++) { # Loop from 1 to max number of fields
# seen across all files and for each:
for (j=1;j<ARGC;j++) # Loop from 1 to total number of files parsed and:
printf "%s%s", # Print 2 strings, specifically:
(j==1?"":",,,"), # A field separator - empty if were printing
# the first field, three commas otherwise.
(a[i,j]?a[i,j]:",") # The value stored in the array if it was
# present in the files, a comma otherwise.
print "" # Print a newline
}
}
' file1 file2 file3
I originally was using an array fnr[FNR] to track the max value of FNR but IMHO that's kinda obscure and it has a flaw where if no lines have, say, a 2nd field then a loop on for (i=1;i in fnr;i++) in the END section would bail out before getting to the 3rd field.
paste is done for this:
$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
Note that the paste alone will output just one comma:
$ paste -d, f1 f2 f3
09/05,5694,09/05,2341,09/05,1231
09/06,3234,09/06,2334,09/08,3452
09/08,2342,09/09,342,09/10,2345
,,09/11,372
So to have multiple ones we can use another delimiter like ; and then replace by ,,, with sed:
$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
Using pr:
$ pr -mts',,,' file[1-3]
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372