Bash script to replace matched substrings within a larger substring

I'm trying to write a bash script to replace the newline characters and *s from comments, but only if that comment contains a particular substring.
// file.txt
/**
 * Here is a multiline
 * comment that contains substring
 * rest of it
 */
/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */
I would like the final result to be:
// file.txt
/** Here is a multiline comment that contains substring rest of it */
/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */
I have a regex that matches multiline comments: \/\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*\/ but can't figure out the second part: only matching when the substring is present, and then replacing every `\n * ` with just a space.
So, to make sure my question is articulated correctly:
Make a match of a substring within a file, i.e. a comment.
Make sure that match includes substring.
Replace all occurrences of a substring within the first match with another string, i.e. `\n * ` with a space.

If python is your option, would you please try:
#!/usr/bin/python
import re                                                  # use regex module
with open('file.txt') as f:                                # open "file.txt" to read
    str = f.read()                                         # assign "str" to the contents of the file
for i in re.split(r'(/\*.*?\*/)', str, flags=re.DOTALL):   # split the file on the comments, keeping the comments in the result
    if re.match(r'/\*.*substring', i, flags=re.DOTALL):    # if the comment includes the keyword "substring"
        i = re.sub(r'\n \* |\n (?=\*/)', ' ', i)           # then replace the newline and the asterisk with a whitespace
    print(i, end='')                                       # print the element without adding a newline
re.split(r'(/\*.*?\*/)', str, flags=re.DOTALL) splits "str" on the comments,
including the comments themselves in the resulting list.
The flags=re.DOTALL option makes a dot match newline characters as well.
The for i in .. syntax loops over the list, assigning "i" to each element.
re.match(r'/\*.*substring', i, flags=re.DOTALL) matches an element
which is a comment that includes the keyword "substring".
re.sub(r'\n \* |\n (?=\*/)', ' ', i) replaces a newline followed by
" * " on the next line with a whitespace.
\n (?=\*/) is a positive lookahead which matches a newline followed
by " */". It will match the last line of the comment block, leaving the
"*/" as is.
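For completeness, the same idea can be sketched without leaving the shell, in plain awk. This is only a sketch, not the python answer rewritten: it assumes the comment body lines start with " * " as in the question's sample, and it builds file.txt in place so the snippet is self-contained.

```shell
# Build the sample file from the question.
cat > file.txt <<'EOF'
// file.txt
/**
 * Here is a multiline
 * comment that contains substring
 * rest of it
 */
/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */
EOF

# Buffer each /* ... */ block; if it mentions "substring", join its lines.
result=$(awk '
/\/\*/ { inc = 1; buf = "" }          # a line opening a comment starts a buffer
inc {
    buf = buf (buf == "" ? "" : "\n") $0
    if (/\*\//) {                     # closing line: decide and flush
        inc = 0
        if (index(buf, "substring")) {
            gsub(/\n \* /, " ", buf)  # join " * " continuation lines
            gsub(/\n \*\//, " */", buf)
        }
        print buf
    }
    next
}
{ print }                             # lines outside comments pass through
' file.txt)
printf '%s\n' "$result"
```

The matching comment is collapsed onto one line while the subNOTstring comment is printed unchanged.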
[Edit]
If you want to embed the python script in bash, would you please try:
#!/bin/bash
infile="file.txt" # modify according to your actual filename
tmpfile=$(mktemp /tmp/temp.XXXXXX) # temporary file to output
# start of python script
python3 -c "
import re, sys

filename = sys.argv[1]
with open(filename) as f:
    str = f.read()
for i in re.split(r'(/\*.*?\*/)', str, flags=re.DOTALL):
    if re.match(r'/\*.*substring', i, flags=re.DOTALL):
        i = re.sub(r'\n \* |\n (?=\*/)', ' ', i)
    print(i, end='')
" "$infile" > "$tmpfile"
# end of python script
mv -f -- "$infile" "$infile".bak # backup the original file
mv -f -- "$tmpfile" "$infile" # replace the input file with the output


Using awk to make changes to nth character in nth line in a file

I have written an awk command
awk 'NR==5 {sub(substr($1,14,1),(substr($1,14,1) + 1)); print "test.py"}' > test.py
This is trying to change the 14th character on the 5th line of a python file. For some reason this doesn't stop executing and I have to break it. It also deletes the contents of the file.
Sample input:
import tools
tools.setup(
name='test',
tagvisc='0.0.8',
packages=tools.ges(),
line xyz
)
Output:
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)
If I understand the nuances of what you need to do now, you will need to split the first field of the 5th record into an array using "." as the fieldsep and then remove the "\"," from the end of the 3rd element of the array (optional) before incrementing the number and putting the field back together. You can do so with:
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
(NR==5 omitted for example)
Example Use/Output
$ echo 'tagvisc="3.4.30",' |
awk '{split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1'
tagvisc="3.4.31",
I'll leave redirecting to a temp file and then back to the original to you. Let me know if this isn't what you need.
Adding NR == 5 you would have
awk 'NR==5 {split($1,a,"."); sub(/["],/,"",a[3]); $1=a[1]"."a[2]"."(a[3]+1)"\","}1' test.py > tmp; mv -f tmp test.py
Get away from the fixed line number (NR==5) and fixed character position (14) and instead look at dynamically finding what you want to change/increment, eg:
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.10',
packages=tools.ges(),
line xyz
)
One awk idea to increment the 10 (3rd line, 3rd numeric string in line):
awk '
/tagvisc=/ { split($0,arr,".") # split line on periods
sub("." arr[3]+0 "\047","." arr[3]+1 "\047") # replace .<oldvalue>\047 with .<newvalue>\047; \047 == single quote
}
1
' test.py
NOTES:
arr[3] = 10',; with arr[3]+0 awk will take the leftmost all-numeric content, strip off everything else, then add 0, leaving us with arr[3] = 10; same logic applies for arr[3]+1 (arr[3]+1 = 11); basically a trick for discarding any suffix that is not numeric
if there are multiple lines in the file with the string tagvisc='x.y.z' then this will change z in all of the lines; we can get around this by adding some more logic to only change the first occurrence, but I'll leave that out for now assuming it's not an issue
This generates:
import tools
tools.setup(
name='test',
tagvisc='0.0.11',
packages=tools.ges(),
line xyz
)
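As an aside, the `arr[3]+0` coercion described in the notes above can be demonstrated in isolation, with a hypothetical value piped to awk:

```shell
# awk keeps the leading numeric part of a string and drops the rest,
# so the quote/comma suffix is discarded before the arithmetic
echo "10'," | awk '{ print $1+0, $1+1 }'
# prints: 10 11
```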
If the objective is to overwrite the original file with the new values you have a couple options:
# use temporary file:
awk '...' test.py > tmp ; mv tmp test.py
# if using GNU awk, and once accuracy of script has been verified:
awk -i inplace '...' test.py
Using awk to make changes to nth character in [mth] line in a file:
$ awk 'BEGIN{FS=OFS=""}NR==5{$18=9}1' file # > tmp && mv tmp file
Outputs:
import tools
tools.setup(
name='test',
tagvisc='0.0.9', <----- this is not output but points to what changed
packages=tools.ges(),
line xyz
)
Explained:
$ awk '
BEGIN {
FS=OFS="" # set the field separators to empty and you can reference
} # each char in record by a number
NR==5 { # 5th record
$18=9 # and 18th char is replaced with a 9
}1' file # > tmp && mv tmp file # output to a tmp file and replace
Notice: Some awks (probably all but GNU awk) will fail if you try to replace a multibyte char by a single byte one (for example utf8 ä (0xc3 0xa4) with an a (0x61) will result in 0x61 0xa4). Naturally an ä before the position you'd like to replace will set your calculations off by 1.
Oh yeah, you can replace one char with multiple chars but not vice versa.
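That one-to-many replacement can be sketched on a throwaway string (with GNU awk, or any awk that supports the empty-FS extension):

```shell
# FS=OFS="" makes each character a field, so one char can become a longer string
echo abc | awk 'BEGIN{FS=OFS=""} {$2="XY"} 1'
# prints: aXYc
```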
something like this...
$ awk 'function join(a,k,s,sep) {for(k in a) {s=s sep a[k]; sep="."} return s}
BEGIN {FS=OFS="\""}
/^tagvisc=/{v[split($2,v,".")]++; $2=join(v)}1' file > newfile
Using GNU awk for the 3rd arg to match() and "inplace" editing:
$ awk -i inplace '
match($0,/^([[:space:]]*tagvisc=\047)([^\047]+)(.*)/,a) {
split(a[2],ver,".")
$0 = a[1] ver[1] "." ver[2] "." ver[3]+1 a[3]
}
{ print }
' test.py
$ cat test.py
import tools
tools.setup(
name='test',
tagvisc='0.0.9',
packages=tools.ges(),
line xyz
)

How to extract a string after matching characters from a variable in shell script [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 2 years ago.
I have a file with following text as below
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
Objective:
Using a shell script, I want to read the text which says AB1234567 from the above complete text.
So to start, I can read the second line of the above text using following logic in my shell script:
secondLine=`sed -n '2p' my_file`;
echo $secondLine;
secondLine outputs classB = AB1234567. How do I extract AB1234567 from classB = AB1234567 in my shell script?
Question:
Considering the fact that AB is common in that particular part of the text in all the files I deal with, how can I make sed read all the numbers after AB?
Please note that classB = AB1234567 could end with a space or a newline. And I need to get this into a variable.
Try:
sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
2 is the line number.
{ opens a sed group command.
s/ substitutes the below match
^ is the anchor for the beginning of the line
\(...\) is known as a capture group, with \1 as its back-reference
[^ ]* means any characters but not a space
\(AB[^ ]*\) captures AB followed by anything up to, but not including, the first space (back-reference is \1)
 * means zero-or-more spaces
$ is the anchor for the end of the line
/ with the below
\1 back-reference of the above capture group
/ end of substitution
q quit, to avoid reading the rest of the file unnecessarily
} closes the group command.
d deletes any other lines seen before line number 2.
get into variable:
your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
Could you please try the following? It looks like it should be easy in awk. This considers that you want to print the 2nd line, and print only the digits in the last field.
secondLine=$(awk 'FNR==2{sub(/[^0-9]*/,"",$NF);print $NF}' Input_file)
You may try this awk:
awk -F ' *= *' '$1 ~ /B$/ { print $2 }' file
AB1234567
I'm not 100% sure this is what you're looking for, but if you know there's only a single element in the file that starts with AB, this will get it into a variable:
$ cat sample.txt
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
$ x=$(perl -ne 'print if s/^.*\s+(AB\S+)\s*$/$1/' sample.txt)
$ echo "the variable is: $x"
the variable is: AB1234567
Explanation of the regex:
^ beginning of line
.* anything
\s+ any number of spaces
(AB\S+) anything that starts with AB followed by non-spaces
\s*$ Zero or more spaces followed by the end of the line.
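If you'd rather avoid external tools entirely, bash's own regex matching can do the same extraction. A sketch, with the sample line hardcoded in place of reading my_file:

```shell
# pure-bash extraction with [[ =~ ]]; the capture lands in BASH_REMATCH
line="classB = AB1234567"          # e.g. from: line=$(sed -n '2p' my_file)
if [[ $line =~ (AB[0-9]+) ]]; then
    x=${BASH_REMATCH[1]}           # first capture group
fi
echo "the variable is: $x"
# the variable is: AB1234567
```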

sed remove characters between two strings on different lines [duplicate]

This question already has answers here:
How to extract characters between the delimiters using sed?
(2 answers)
Closed 2 years ago.
I would like to remove all the text between the strings /* and */ in a file, where the strings may occur on different lines and surround a comment. For example I would like to remove the following seven lines which are contained between /* and */:
/* "CyIHTAlgorithm.pyx":81
* #cython.wraparound(False)
* #cython.cdivision(True)
* cdef inline object IHTReconstruction2D(fType_t[:,:] data, # <<<<<<<<<<<<<<
* fType_t[:,:] residualFID,
* fType_t[:,:] CS_spectrum,
*/
I have managed to do this using sed where the strings occur on the same line:
sed -i.bak 's/\(\/\*\).*\(\*\/\)/\1\2/' test.txt
but I'm not sure how to extend this to multiple lines in the same file:
I have also tried:
sed -i.bak '/\/\*/{:a;N;/\*\//!ba;s/.*\/\*\|\*\/.*//g}' test.txt
following the ideas here (Extract text between two strings on different lines).
This deletes the /* at the beginning and */ but not the intervening text.
Why not work with sed ranges?
$ cat tmp/file13
first line
/* "CyIHTAlgorithm.pyx":81
* #cython.wraparound(False)
* #cython.cdivision(True)
* cdef inline object IHTReconstruction2D(fType_t[:,:] data, # <<<<<<<<<<<<<<
* fType_t[:,:] residualFID,
* fType_t[:,:] CS_spectrum,
*/
before last line
last line
$ sed '/\/\*/,/\*\//d' tmp/file13
first line
before last line
last line
You can use sed or cut, but they are really designed for patterns where each line matches.
You should determine the first and last lines by getting the line numbers of the start and finish, and then you can wrap it into a function.
So:
1) get the line number of the /* part
2) get the line number of the */
3) use a "while read line;" loop and cut every line in between using cut or sed.
awk is really better suited for this kind of things. It supports ranges out of the box with the /pattern/,/pattern2/ syntax.
awk '/[[:space:]]*\/\*/,/[[:space:]]*\*\// {next} {print}' file.txt
It works the following way: for the lines between the two patterns it executes {next} actually skipping the line, for everything else it just prints the input.
The following will try to do more, so test first if it fits your needs.
cpp -P test.txt
I found the answer here: https://askubuntu.com/questions/916424/how-to-replace-text-between-two-patterns-on-different-lines
sed -n '1h; 1!H; ${ g; s/<head>.*<\/head>/IF H = 2 THEN\n INSERT FILE '\''head.bes'\''\nEND/p }' myProgram.bes
Notes: This replaces all lines between <head> ... </head> (inclusive) with:
IF H = 2 THEN
 INSERT FILE 'head.bes'
END

Bash script string processing

I wrote a script that reads a plain text and a key, and then loops through each character of the plain text and shifts it by the value of the corresponding character in the key text, with a=0, b=1, c=2, ..., z=25.
The code works fine, but with a string of 1K characters it takes almost 3 seconds to execute.
this is the code:
small="abcdefghijklmnopqrstuvwxyz" ## used to search and return the position of some small letter in a string
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ" ## used to search and return the position of some capital letter in a string
read Plain_text
read Key_text
## saving the length of each string
length_1=${#Plain_text}
length_2=${#Key_text}
printf " Your Plain text is: %s\n The Key is: %s\n The resulting Cipher text is: " "$Plain_text" "$Key_text"
for(( i=0,j=0;i<$length_1;++i,j=`expr $(($j + 1)) % $length_2` )) ## variable 'i' is the index into the first string, 'j' is the index into the second string
do
    ## return a substring starting from position 'i' and with length 1
    c=${Plain_text:$i:1}
    d=${Key_text:$j:1}
    ## function index takes two parameters, the string to search in and a substring,
    ## and returns the index of the first occurrence of the substring with base-index 1
    x=`expr index "$small" $c`
    y=`expr index "$small" $d`
    ## shift the current letter to the right by the value of the corresponding letter in the key, mod 26
    z=`expr $(($x + $y - 2)) % 26`
    ## print the resulting letter from the capital letter string
    printf "%s" "${capital:$z:1}"
done
echo ""
How is it possible to improve the performance of this code?
Thank you.
You are creating 4 new processes in each iteration of your for loop by using command substitution (3 substitutions in the body, 1 in the head). You should use arithmetic expansion instead of calling expr (search for $(( in the bash(1) manpage). Note that you don't need the $ to substitute variables inside $(( and )).
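A sketch of the loop with those expr calls removed: positions are found with parameter expansion and all arithmetic uses $(( )), so no extra processes are spawned. The sample plaintext/key here are hypothetical stand-ins for the two read calls.

```shell
small="abcdefghijklmnopqrstuvwxyz"
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Plain_text="hello"; Key_text="key"      # stand-ins for: read Plain_text; read Key_text
out=""
for (( i=0, j=0; i<${#Plain_text}; ++i, j=(j+1)%${#Key_text} )); do
    c=${Plain_text:i:1}
    d=${Key_text:j:1}
    t=${small%%"$c"*}; x=${#t}          # 0-based position of c in small
    t=${small%%"$d"*}; y=${#t}          # 0-based position of d in small
    out+=${capital:(x+y)%26:1}          # shift mod 26, no expr needed
done
printf '%s\n' "$out"
# RIJVS
```

`${small%%"$c"*}` strips from the first occurrence of the character to the end, so the length of what remains is the character's 0-based index; with 0-based indices the original `x + y - 2` becomes simply `x + y`.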
You can change a character like this:
a=( soheil )
echo ${a/${a:0:1}/${a:1:1}}
To change every character, use a loop such as for.
And to change characters to upper case:
echo soheil | tr "[:lower:]" "[:upper:]"
I hope I understood your question.
Be at peace.
You will have a lot of repeated characters in a 1K string.
Imagine the input were 1M.
You should calculate all request/response pairs up front, so your routine only has to look up the replacement.
I would think a solution with arrays is the best approach here.
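One way to sketch that precomputation in bash: build an associative array of all 26×26 letter pairs once, after which each output character is a single table lookup (requires bash 4+; the sample plaintext/key are hypothetical):

```shell
small="abcdefghijklmnopqrstuvwxyz"
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
declare -A shift_map
for (( p=0; p<26; ++p )); do            # precompute every pair once
    for (( k=0; k<26; ++k )); do
        shift_map[${small:p:1}${small:k:1}]=${capital:(p+k)%26:1}
    done
done
Plain_text="hello"; Key_text="key"
out=""
for (( i=0, j=0; i<${#Plain_text}; ++i, j=(j+1)%${#Key_text} )); do
    out+=${shift_map[${Plain_text:i:1}${Key_text:j:1}]}   # pure lookup, no subprocesses
done
printf '%s\n' "$out"
# RIJVS
```

The 676-entry table costs a fixed amount of setup, and the per-character work in the main loop no longer depends on any external command.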

convert a fixed width file from text to csv

I have a large data file in text format and I want to convert it to csv by specifying each column length.
number of columns = 5
column length
[4 2 5 1 1]
sample observations:
aasdfh9013512
ajshdj 2445df
Expected Output
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
GNU awk (gawk) supports this directly with FIELDWIDTHS, e.g.:
gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
I would use sed and catch the groups with the given length:
$ sed -r 's/^(.{4})(.{2})(.{5})(.{1})(.{1})$/\1,\2,\3,\4,\5/' file
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Here's a solution that works with regular awk (does not require gawk).
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
It uses awk's substr function to define each field's start position and length. OFS defines what the output field separator is (in this case, a comma).
(Side note: This only works if the source data does not have any commas. If the data has commas, then you have to escape them to be proper CSV, which is beyond the scope of this question.)
Demo:
echo 'aasdfh9013512
ajshdj 2445df' |
awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
Adding a generic way of handling this (an alternative to the FIELDWIDTHS option) in awk, where we need not hardcode the substring positions: it inserts commas as per the position numbers provided by the user. Written and tested in GNU awk. To use this, define the field widths (like the OP showed in the samples) in the awk variable named colLengh, with a space between the numbers.
awk -v colLengh="4 2 5 1 1" '
BEGIN{
  num=split(colLengh,arr,OFS)
}
{
  j=sum=0
  while(++j<=num){
    if(length($0)>arr[j]+sum){
      sub("^.{"arr[j]+sum"}","&,")
    }
    sum+=arr[j]+1
  }
}
1
' Input_file
Explanation: Simply put, create an awk variable named colLengh in which the field widths are defined; commas are inserted after those positions. In the BEGIN section, split it into an array arr holding the widths.
In the main program section, first reset the variables j and sum. Then run a while loop from j=1 until j exceeds num. On each pass, substitute that many characters from the start of the current line with the matched text plus a comma, but only if the line is long enough (otherwise it makes no sense to perform the substitution, hence the additional check). E.g. the sub pattern becomes .{4} the first time the loop runs, then .{7}, because the 7th position is where the next comma needs to go, and so on. At the end, the 1 prints the edited/non-edited lines.
If anyone is still looking for a solution, I have developed a small script in python. It's easy to use, provided you have python 3.5.
https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py
"""
This script will convert Fixed width File into Delimiter File, tried on Python 3.5 only
Sample run (order of arguments doesn't matter):
python ConvertFixedToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
Inputs are as follows
1. Input FIle - Mandatory(Argument -i) - File which has fixed Width data in it
2. Config File - Optional (Argument -c, if not provided will look for Config.txt file on same path, if not present script will not run)
Should have format as
FieldName,fieldLength
eg:
FirstName,10
SecondName,8
Address,30
etc:
3. Output File - Optional (Argument -o, if not provided will be used as InputFIleName plus Delimited.txt)
4. Delimiter - Optional (Argument -d, if not provided default value is "|" (pipe))
"""
from collections import OrderedDict
import argparse
from argparse import ArgumentParser
import os.path
import sys

def slices(s, args):
    position = 0
    for length in args:
        length = int(length)
        yield s[position:position + length]
        position += length

def extant_file(x):
    """
    'Type' for argparse - checks that file exists but does not open.
    """
    if not os.path.exists(x):
        # Argparse uses the ArgumentTypeError to give a rejection message like:
        # error: argument input: x does not exist
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x

parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
parser.add_argument("-i", dest="InputFile", required=True, help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-o", dest="OutputFile", required=False, help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
parser.add_argument("-c", dest="ConfigFile", required=False, help="Provide your Config file name here, File should have value as fieldName,fieldLength. if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
parser.add_argument("-d", dest="Delimiter", required=False, help="Provide the delimiter string you want", metavar="STRING", default="|")
args = parser.parse_args()

# Input file mandatory
InputFile = args.InputFile
# Delimiter by default "|"
DELIMITER = args.Delimiter
# Output file checks
if args.OutputFile is None:
    OutputFile = str(InputFile) + "Delimited.txt"
    print("Setting Output file as " + OutputFile)
else:
    OutputFile = args.OutputFile
# Config file check
if args.ConfigFile is None:
    if not os.path.exists("Config.txt"):
        print("There is no Config File provided, exiting the script")
        sys.exit()
    else:
        ConfigFile = "Config.txt"
        print("Taking Config.txt file on this path as Default Config File")
else:
    ConfigFile = args.ConfigFile

fieldNames = []
fieldLength = []
myvars = OrderedDict()
with open(ConfigFile) as myfile:
    for line in myfile:
        name, var = line.partition(",")[::2]
        myvars[name.strip()] = int(var)
for key, value in myvars.items():
    fieldNames.append(key)
    fieldLength.append(value)
with open(OutputFile, 'w') as f1:
    fieldNames = DELIMITER.join(map(str, fieldNames))
    f1.write(fieldNames + "\n")
    with open(InputFile, 'r') as f:
        for line in f:
            rec = list(slices(line, fieldLength))
            myLine = DELIMITER.join(map(str, rec))
            f1.write(myLine + "\n")
Portable awk
Generate an awk script with the appropriate substr commands
cat cols
4
2
5
1
1
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1
Output:
substr($0,1,4)
substr($0,5,2)
substr($0,7,5)
substr($0,12,1)
substr($0,13,1)
Combine lines and make it a valid awk-script:
<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1 |
paste -sd, | sed 's/^/{ print /; s/$/ }/'
Output:
{ print substr($0,1,4),substr($0,5,2),substr($0,7,5),substr($0,12,1),substr($0,13,1) }
Redirect the above to a file, e.g. /tmp/t.awk and run it on the input-file:
<infile awk -f /tmp/t.awk
Output:
aasd fh 90135 1 2
ajsh dj  2445 d f
Or with comma as the output separator:
<infile awk -f /tmp/t.awk OFS=,
Output:
aasd,fh,90135,1,2
ajsh,dj, 2445,d,f
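With bash process substitution, the generate-and-run steps above can be combined into one pipeline, skipping the temporary /tmp/t.awk file (the cols and infile contents are recreated here so the snippet is self-contained):

```shell
# recreate the inputs from the question
printf '%s\n' 4 2 5 1 1 > cols
printf '%s\n' 'aasdfh9013512' 'ajshdj 2445df' > infile

# generate the awk script on the fly and feed it to awk via -f
out=$(awk -f <(<cols awk '{ print "substr($0,"p","$1")"; cs+=$1; p=cs+1 }' p=1 |
               paste -sd, | sed 's/^/{ print /; s/$/ }/') OFS=, infile)
printf '%s\n' "$out"
# aasd,fh,90135,1,2
# ajsh,dj, 2445,d,f
```

`<(...)` gives awk a file name to read the generated script from, so no file is left behind.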
