How to replace the last character in column 2 with value 0
input
1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1
output desired
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Modifying Input with gensub()
You can use several awk string functions to do this, but GNU awk's gensub() function is particularly useful. It has the signature:
gensub(regexp, replacement, how [, target])
which makes it extremely flexible for these sorts of transformations.
Converting Your Example
# Store your input in a shell variable for MCVE convenience, although
# you can have this data in a file or pass it on standard input if you
# prefer.
example_input='1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1'
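# Use GNU awk's gensub() to replace the second occurrence of "any character
# followed by a semicolon" with "0;". The lines contain no whitespace, so $1
# is the whole line.
echo "$example_input" | awk '{print gensub(/.;/, "0;", 2, $1)}'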
This results in the following output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
awk approach:
awk -F';' '{ sub(/.$/,0,$2) }1' OFS=';' file
The output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Or the same with the substr() function (awk strings are indexed from 1, so the start position must be 1, not 0):
awk -F';' '{ $2 = substr($2, 1, length($2)-1) "0" }1' OFS=';' file
Not necessarily better, but a mathematical approach for numerical data:
$ awk 'BEGIN{FS=OFS=";"} {$2=int($2/10)*10}1' file
This rounds column 2 down to a multiple of 10, which zeroes the last digit (ones). To zero the last two digits (ones and tens), replace 10 with 100.
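As a rough sketch of the same arithmetic with the divisor passed in as a parameter (round_down_col2 and unit are names made up for this illustration, not part of the answer above):

```shell
# Hypothetical helper: round column 2 of ';'-separated input down to a
# multiple of UNIT (10 zeroes the ones digit, 100 zeroes ones and tens).
round_down_col2() {
    awk -v unit="$1" 'BEGIN { FS = OFS = ";" } { $2 = int($2 / unit) * unit } 1'
}

# Example call: zero the last two digits of column 2.
printf '%s\n' '1232;1001;1' '2003;1114;1' | round_down_col2 100
```

Because column 2 is treated as a number here, any leading zeros in it would be dropped; the sub() and substr() variants keep the field as text.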
Or, simple replacement is easier with sed, which can target the second match on each line:
$ sed 's/.;/0;/2' file
I would do that with sed:
sed -e 's/^\([^;]*;[^;]*\).;/\10;/' filename
Related
I need to grep a file for a value, in this case "225". The value is actually a variable, $pd, so it changes depending on the user's input. It could be an integer or an alphanumeric string, and it must be a case-insensitive exact match. For example, if the value of the variable is "225", then "0225" or "11225" is not a valid match in the file I'm reading.
Input File:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.223.10|2000-H1|1/1/3|Undiscoverable|Unkwn
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore
10.20.225.10|2000-H1|1/1/8|DeviceY_2250_|Kenmore
Desired Output File:
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
If the user input is "lg", the matching lines should still be output even though the input file has "LG" in uppercase. (This part is already fixed in the script.)
Desired Output:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
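$ awk -F'|' -v n='225' '$4 ~ n' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore
10.20.225.10|2000-H1|1/1/8|DeviceY_2250_|Kenmore
That also prints partial matches (01225 and 2250). If you don't want those, one way is: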
$ awk -F'|' -v n='225' '$4 ~ ("(^|[^0-9])" n "([^0-9]|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or:
$ awk -F'|' -v n='225' '$4 ~ ("(^|_)" n "(_|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
There are other possibilities too. The right solution depends on requirements you haven't told us about, and each of these may pass or fail on input other than what you've shown us so far.
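One possible way to combine the whole-token match with case-insensitivity is sketched below. It assumes, as a guess at the unstated requirements, that only columns 4 and 5 are searched and that a token is delimited by anything other than a letter or digit. find_token is a hypothetical name, and the user's value is used as a regular expression, so regex metacharacters or backslashes in it are not handled.

```shell
# Hypothetical helper: read '|'-separated rows on stdin and print those whose
# 4th or 5th column contains TOKEN as a whole alphanumeric token, ignoring case.
find_token() {
    awk -F'|' -v n="$1" '
        BEGIN {
            n  = tolower(n)
            # Everything is lowercased first, so [^0-9a-z] means "not alphanumeric".
            re = "(^|[^0-9a-z])" n "([^0-9a-z]|$)"
        }
        tolower($4) ~ re || tolower($5) ~ re
    '
}

# Example call with two of the rows from the question.
printf '%s\n' \
    '10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore' \
    '10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore' | find_token 225
```

Restricting the test to columns 4 and 5 keeps an IP address such as 10.20.225.10 in column 1 from counting as a match for 225.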
awk
awk -F"|" -v var="[A-Za-z].225_" '$4 ~ var{print}' file
sed
sed -n '/[A-Za-z].225_/p' file
grep
grep '[A-Za-z].225_' file
Output
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
Using sed:
sed -n '/^\([^|]*|\)\{3\}[^|]*225/p' < input
Explanation:
the -n option disables automatic output at the end of each sed cycle
the pattern matches arbitrary contents of the first three (\{3\}) columns of data via the \(parenthesized\) pattern [^|]*| -- any number of non-delimiter characters followed by the column delimiter (left unescaped, because GNU sed treats \| as alternation)
it matches additional input at the beginning of the fourth column, but not spanning columns, with a similar subexpression: [^|]*
then comes the literal text you want to match
the p command after the pattern causes the line to be printed to sed's output in the event that it matches the pattern
There's almost certainly an awk solution too, but in Perl it's this:
$ perl -aF'\|' -ne '$F[3] =~ 225 and print' < input
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
-a: Autosplit the input into array @F
-F'\|': Set the autosplit delimiter to |
-n: Run code for each line in the input file
-e: Here's the code to run
$F[3]: The 4th element of the autosplit array @F
=~: Regex match
and print: Print the input line if the regex matches
Update: You can get the string you're interested in from a command line parameter by assigning it in a BEGIN block.
$ perl -aF'\|' -ne 'BEGIN { $x = shift } $F[3] =~ $x and print' 225 < input
I have strings like the following, which should be parsed using only Unix commands (bash):
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim strings like the one above up to the 4th underscore from the end (right side). So the output should be
49_sftp_mac_myfile_simul_test
The number of underscores in the string can vary. For example, the string could be
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
The output should be (after trimming up to the 4th occurrence of underscore from the right):
49_sftp_simul_test
This is easily done with awk by decrementing NF (the number of fields) by 4, after setting both the input and output field separators to underscore:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test
You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
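If the number of trailing fields to drop is not fixed at four, the same parameter expansion can be applied in a loop. This is only a sketch: strip_last_fields is a made-up name, and stopping early when no underscore is left is an assumption about the desired behavior.

```shell
# Hypothetical helper: remove the last COUNT underscore-separated fields.
# usage: strip_last_fields STRING COUNT
strip_last_fields() {
    s=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        case $s in
            *_*) s=${s%_*} ;;   # drop the last "_" and everything after it
            *)   break ;;       # no underscore left: stop early
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$s"
}

strip_last_fields '49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed' 4
```

Each ${s%_*} removes the shortest suffix that starts with an underscore, so four passes have the same effect as the single %_*_*_*_* pattern.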
With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
From the end of line, removes 4 occurrences of _ followed by non _ characters.
Perl one-liner ($n is the number of underscores, recounted for every input line):
echo "$your_string" | perl -lne '$n = () = /_/g; print join "_",((split/_/)[-$n-1..-5])'
input
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_mac_myfile_simul_test
input
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_simul_test
Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev
I have a command to cut a string, and I'd like to understand how to control the index with awk on Linux.
I have two different cases.
I want to get the word "Test" from the example strings below.
1. "Test-01-02-03"
2. "01-02-03-Test-Ref1-Ref2"
The first one I can get like this:
substr('Test-01-02-03',0,index('Test-01-02-03',"-"))
-> This returns only "Test".
For the second case, I am not sure how to get "Test" using the index function.
Do you have any idea how to do this using awk?
Thanks!
This is how to use index() to find/print a substring:
$ cat file
Test-01-02-03
01-02-03-Test-Ref1-Ref2
$ awk -v tgt="Test" 's=index($0,tgt){print substr($0,s,length(tgt))}' file
Test
Test
but that may not be the best solution for whatever your actual problem is.
For comparison here's how to do the equivalent with match() for an RE:
$ awk -v tgt="Test" 'match($0,tgt){print substr($0,RSTART,RLENGTH)}' file
Test
Test
and if you like the match() synopsis, here's how to write your own function to do it for strings:
awk -v tgt="Test" '
    function strmatch(source,target) {
        SSTART = index(source,target)
        SLENGTH = length(target)
        return SSTART
    }
    strmatch($0,tgt){print substr($0,SSTART,SLENGTH)}
' file
If these lines are the direct input to awk then the following work:
echo 'Test-01-02-03' | awk -F- '{print $1}' # First field
echo '01-02-03-Test-Ref1-Ref2' | awk -F- '{print $(NF-2)}' # Third field from the end; the parentheses matter, since $NF-2 means ($NF)-2.
If these lines are pulled out of a larger line in an awk script and need to be split again then the following snippets will do that:
str="Test-01-02-03"; split(str, a, /-/); print a[1]
str="01-02-03-Test-Ref1-Ref2"; numfields=split(str, a, /-/); print a[numfields-2]
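If the position of the word is not known in advance, one option is to scan the fields instead of indexing a fixed one. The sketch below assumes, purely for illustration, that the wanted word is the first dash-separated field that is not all digits; first_word is a hypothetical name.

```shell
# Hypothetical helper: print the first "-"-separated field that is not
# made up entirely of digits.
first_word() {
    printf '%s\n' "$1" | awk -F- '{
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[0-9]+$/) { print $i; break }
    }'
}

first_word 'Test-01-02-03'
first_word '01-02-03-Test-Ref1-Ref2'
```

Both calls print Test for the two sample strings in the question.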
Say I have a line:
Terminal="123" Pwd="567"
I want to select only the number portions using awk:
awk 'match($1, /[0-9]+/){print substr($1, RSTART, RLENGTH)};match($2, /[0-9]+/){print substr($2, RSTART, RLENGTH)}' file
This gives the desired result:
123 567
However, there must be a better way to select both numbers without writing two match statements.
Thanks.
does grep work for you?
kent$ echo 'Terminal="123" Pwd="567"'|grep -o '[0-9]\+'
123
567
quick and dirty with awk:
awk -F'[^0-9]*' '{$1=$1}7'
test:
kent$ awk -F'[^0-9]*' '{$1=$1}7'<<< 'Terminal="123" Pwd="567"'
123 567
or:
kent$ awk '{gsub(/[^0-9 ]/,"")}7'<<< 'Terminal="123" Pwd="567"'
123 567
Here is a nice little solution with awk:
awk '{gsub("[^0-9]+"," "); print}'
Just converts all consecutive non-digit characters into one space, so it leaves one space before the digit sequence 123.
Here is another way to do it with awk. We set the field separator to "
$ echo 'Terminal="123" Pwd="567"' | awk -F\" '{print $2, $4}'
123 567
I ran into a similar problem but my patterns were more complex so I couldn't brush off my problems with gsub or such. I wrote a recursive function and a wrapper to it. It finds multiple matches in one variable and prints them out separated with a space:
awk '
    function rec_wrap(str)
    {
        matches=""
        return rec_func(str)
    }
    function rec_func(str2)
    {
        where=match(str2, /RE/)
        if(where!=0) {
            matches=(matches substr(str2, RSTART, RLENGTH) " ")
            rec_func(substr(str2, RSTART+RLENGTH, length(str2)))
        }
        return matches
    }
    {print rec_wrap($1)}
' file.txt
The wrapper rec_wrap is needed to empty the variable matches before each record. The function match() writes the position and length of the leftmost match to the variables RSTART and RLENGTH; the match is extracted with substr() and appended to the variable matches. Then rec_func calls itself with the rest of the string str2 as its parameter until match() fails to find any more matches.
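The same idea can also be written as a loop rather than with recursion. The sketch below is one such variant, not the answer's code: it plugs in /[0-9]+/ as a stand-in pattern, works on the whole line instead of $1, and uses the made-up name extract_numbers.

```shell
# Hypothetical helper: print every run of digits found on each input line,
# separated by single spaces.
extract_numbers() {
    awk '{
        rest = $0
        out  = ""
        # match() sets RSTART and RLENGTH for the leftmost match in rest.
        while (match(rest, /[0-9]+/)) {
            out  = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
        print out
    }'
}

echo 'Terminal="123" Pwd="567"' | extract_numbers
```

Building the output inside the loop avoids the trailing space that the recursive version appends after the last match.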
I need to grep only the word between the 2nd and 3rd to last /.
This is shown in the extract below. Note that the position in the filename is not always the same when counting from the front. Any ideas would be helpful.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
Here is a Perl script that does the job:
my $str = q!/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt!;
my $res = (split('/',$str))[-3];
print $res;
output:
usernameneedtogrep
I'd use awk:
awk -F/ '{print $(NF-2)}'
splits on /
NF is the index of the last column, $NF the last column itself and $(NF-2) the 3rd-to-last column.
You might of course first need to filter out lines in your input that are not paths (e.g. using grep and then piping to awk)
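A minimal sketch of that filtering done inside awk itself, so no separate grep is needed. The NF >= 3 guard is an assumption about what counts as "a path" here, and third_from_last is a made-up name.

```shell
# Hypothetical helper: for each input line with at least three "/"-separated
# components, print the third component from the end; skip other lines.
third_from_last() {
    awk -F/ 'NF >= 3 { print $(NF-2) }'
}

printf '%s\n' \
    'not a path' \
    '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' |
    third_from_last
```

Because the separator is only /, spaces inside directory names such as "2010 Backup" do not affect the field count.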
A regular expression something like this should do the trick:
/([^\/]+)\/[^\/]+\/[^\/]+$/
(Note that each [^\/]+ cannot match a slash, so the capture group holds exactly the third component from the end.)