Trim a string up to 4th delimiter from right side - linux

I have strings like the following, which should be parsed using only Unix commands (bash):
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim strings like the one above up to the 4th underscore from the end (right side). So the output should be:
49_sftp_mac_myfile_simul_test
The number of underscores in the overall string can vary. For example, the string could be:
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
The output should then be (after trimming up to the 4th occurrence of an underscore from the right):
49_sftp_simul_test

This is easily done with awk: set both the input and output field separators to underscore, then decrement NF (the number of fields) by 4:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test
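Rebuilding $0 by decrementing NF may not behave the same way in every awk, so a version that never assigns to NF can be more portable. A minimal sketch, assuming bash as the calling shell; drop_last4 is a hypothetical helper name, and printing an empty line for inputs with four or fewer fields is an assumption, not part of the original answer:

```bash
#!/usr/bin/env bash
# drop_last4 (hypothetical name): print fields 1..NF-4 of each line,
# re-joined with "_", without ever assigning to NF.
# Lines with four or fewer "_"-separated fields print as empty lines.
drop_last4() {
    awk 'BEGIN { FS = "_" }
    {
        out = ""
        for (i = 1; i <= NF - 4; i++)
            out = (i == 1) ? $i : out "_" $i
        print out
    }' "$@"
}

s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
drop_last4 <<< "$s"    # prints 49_sftp_mac_myfile_simul_test
```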

You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
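To make the mechanism concrete: % removes the shortest suffix matching the pattern, and the shortest suffix containing four underscores starts at the 4th underscore from the right, so the number of underscores before it does not matter. A minimal sketch assuming bash; trim_last4 is a hypothetical function name:

```bash
#!/usr/bin/env bash
# trim_last4 (hypothetical name): drop the last four "_"-separated pieces of $1.
# ${var%pattern} removes the SHORTEST suffix matching pattern, so _*_*_*_*
# matches only from the 4th-to-last underscore onward.
trim_last4() {
    local s=$1
    printf '%s\n' "${s%_*_*_*_*}"
}

trim_last4 '49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
# prints 49_sftp_mac_myfile_simul_test
trim_last4 '49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
# prints 49_sftp_simul_test
```

If a string has fewer than four underscores, the pattern does not match and the string comes back unchanged.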

With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
Starting from the end of the line, this removes 4 occurrences of _ followed by non-_ characters.
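The {4} interval is what fixes the count, so it can be turned into a parameter. A sketch assuming a sed that supports -E (GNU and BSD sed do); trim_last_n is a hypothetical helper, and N is assumed to be a positive integer:

```bash
#!/usr/bin/env bash
# trim_last_n (hypothetical name): remove the last N "_"-separated pieces from
# each line of stdin. The ERE (_[^_]*){N}$ matches N repetitions of
# "underscore followed by non-underscores" anchored at the end of the line.
trim_last_n() {
    local n=$1
    sed -E "s/(_[^_]*){$n}\$//"
}

printf '%s\n' 'a_b_c_d_e_f' | trim_last_n 4    # prints a_b
printf '%s\n' 'a_b_c_d_e_f' | trim_last_n 2    # prints a_b_c_d
```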

Perl one-liner
echo "$your_string" | perl -lne '$n = 0; $n++ while /_/g; print join "_", ((split /_/)[-$n-1..-5])'
input
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_mac_myfile_simul_test
input
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_simul_test

Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev

Related

Replace last character in specific column with value 0

How do I replace the last character in column 2 with the value 0?
Input:
1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1
Desired output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Modifying Input with gensub()
You could use several GNU awk string functions to do this, but gensub() is particularly useful. It has the signature:
gensub(regexp, replacement, how [, target])
which makes it extremely flexible for these sorts of transformations.
Converting Your Example
# Store your input in a shell variable for MCVE convenience, although
# you can have this data in a file or pass it on standard input if you
# prefer.
example_input='1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1'
# Use GNU awk's gensub() string function (a gawk extension, not part of POSIX awk).
echo "$example_input" | awk '{print gensub(/.;/, "0;", 2, $1)}'
This results in the following output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
awk approach:
awk -F';' '{ sub(/.$/,0,$2) }1' OFS=';' file
The output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
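Or the same with the substr() function (awk string positions start at 1, and using length() avoids assuming a 4-character field):
awk -F';' '{ $2 = substr($2, 1, length($2)-1) "0" }1' OFS=';' file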
Not necessarily better, but here is a mathematical approach for numeric data:
$ awk 'BEGIN{FS=OFS=";"} {$2=int($2/10)*10}1'
This rounds down the last digit (the ones); to round down two digits (ones and tens), replace 10 with 100.
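The divisor can also be passed in with awk's -v option instead of being edited into the program. A sketch; round_col2 and its step argument are hypothetical names, and because int() truncates toward zero this only rounds down for non-negative values like the sample data:

```bash
#!/usr/bin/env bash
# round_col2 (hypothetical name): round column 2 of ';'-separated input down
# to a multiple of STEP (10 zeroes the ones digit, 100 the ones and tens).
# int() truncates toward zero, so negative values would round up instead.
round_col2() {
    local step=${1:-10}
    awk -v step="$step" 'BEGIN { FS = OFS = ";" } { $2 = int($2 / step) * step } 1'
}

printf '%s\n' '1232;1001;1' '2003;1114;1' | round_col2 10    # 1232;1000;1 and 2003;1110;1
printf '%s\n' '2003;1114;1' | round_col2 100                  # 2003;1100;1
```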
Or, a simple replacement is easier with GNU sed:
$ sed 's/.;/0;/2'
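I would do that with sed (in the replacement, \10 is the back-reference \1 followed by a literal 0, since sed back-references are single digits):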
sed -e 's/^\([^;]*;[^;]*\).;/\10;/' filename

bash Changing every other comma to point

I am working with a set of data written in Swedish format: in Sweden, a comma is used instead of a point for decimal numbers.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
I want to change every other comma to a point and get output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea how to do that with simple shell scripting? It is fine if I do it in multiple steps, i.e. if I change the first comma first, then the third one, and so on.
Thank you very much for your help.
Using GNU sed (the \| alternation is a GNU extension):
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i+=2) {printf "%s.%s", $i, $(i+1); if(i<NF-2) printf "%s", FS}; printf "\n" }' file
The for loop steps through the comma-separated fields (the separator is set with the option -F,) two at a time and prints the current field and the next one joined by a dot. The data is passed to printf as arguments rather than as the format string, so a % in the input cannot break it.
The comma separator, FS, is printed between pairs but not at the end of the line.
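A variant of the same loop that builds each output line in a variable and prints it once, so no separator bookkeeping is needed at the end of the line. A sketch assuming bash; pair_fields is a hypothetical name, and passing an odd trailing field through unchanged is an assumption beyond the original data:

```bash
#!/usr/bin/env bash
# pair_fields (hypothetical name): join fields 1+2, 3+4, ... with "." and
# separate the pairs with ",". An odd trailing field is printed as-is.
pair_fields() {
    awk -F, '{
        out = ""
        for (i = 1; i <= NF; i += 2) {
            pair = (i < NF) ? $i "." $(i + 1) : $i
            out = (i == 1) ? pair : out "," pair
        }
        printf "%s\n", out
    }' "$@"
}

printf '%s\n' '1,188,1,250,0,757,0,946,8,960' | pair_fields
# prints 1.188,1.250,0.757,0.946,8.960
```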
As a Perl one-liner, using split and array manipulation:
perl -F, -lane '@a = (); while (my @b = splice @F, 0, 2) {
push @a, join ".", @b } print join ",", @a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
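Many sed dialects allow you to specify which instance of a pattern to replace by giving a numeric flag to s///. Working from the highest-numbered comma down means each substitution leaves the numbering of the commas before it unchanged: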
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
I seem to recall that some sed dialects allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl
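When the number of commas per line varies, the chain of -e expressions can be generated rather than typed. A sketch assuming bash; odd_commas_to_dots and its max-commas argument (default 9, matching the sample data, and assumed to be at least 1) are hypothetical:

```bash
#!/usr/bin/env bash
# odd_commas_to_dots (hypothetical name): change the 1st, 3rd, 5th, ... comma
# on each line of stdin to "." using sed's numeric s///N flag.
# $1 = the largest number of commas expected on a line (default 9).
odd_commas_to_dots() {
    local max=${1:-9} n
    local -a args=()
    # Largest odd number <= max, then step down by 2 (e.g. 9, 7, 5, 3, 1).
    # Highest first, so no substitution renumbers a comma still to be replaced.
    for (( n = max - (1 - max % 2); n >= 1; n -= 2 )); do
        args+=(-e "s/,/./$n")
    done
    sed "${args[@]}"
}

printf '%s\n' '1,188,1,250,0,757,0,946,8,960' | odd_commas_to_dots 9
# prints 1.188,1.250,0.757,0.946,8.960
```

Substitutions whose comma number exceeds what a line contains simply do not match, so a generous maximum is harmless.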

split a string variable through shell script

I have a string containing a date and time: timestamp= 12-12-2012 16:45:00
I need to reformat it into timestamp= 16:45:00 12-12-2012
How to achieve this in shell script?
Please note: the variable's value is 12-12-2012 16:45:00, and timestamp is the variable's name.
#!/usr/bin/expect
set timestamp "16:45:00 12-12-2012"
Now what should I do so that the value of timestamp becomes 12-12-2012 16:45:00?
The script extension is .tcl, e.g. test.tcl.
You could use pattern removal in parameter expansion. ## means "greedily remove everything that matches the pattern, starting from the left"; %% means the same from the right:
tm=${timestamp##* }
dt=${timestamp%% *}
result="$tm $dt"
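Wrapped in a function for reuse, assuming bash; swap_date_time is a hypothetical name, and the value is assumed to have no leading or trailing spaces (so a "timestamp= " prefix should be stripped first):

```bash
#!/usr/bin/env bash
# swap_date_time (hypothetical name): turn "DATE TIME" into "TIME DATE".
# ${v##* } strips the longest prefix ending in a space (leaves the last word);
# ${v%% *} strips the longest suffix starting with a space (leaves the first word).
swap_date_time() {
    local timestamp=$1
    local tm=${timestamp##* }
    local dt=${timestamp%% *}
    printf '%s %s\n' "$tm" "$dt"
}

swap_date_time '12-12-2012 16:45:00'    # prints 16:45:00 12-12-2012
```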
or you could use cut to do the same, giving a field delimiter:
tm=$(echo "$timestamp" | cut -f2 -d' ')
dt=$(echo "$timestamp" | cut -f1 -d' ')
result="$tm $dt"
or you could use sed to swap them with a regex (see other post).
or if you are pulling the date from the date command, you could ask it to format it for you:
result=$(date +'%r %F')
and for that matter, you might have a version of date that will parse your date and then let you express it however you want:
result=$(date -d '12/12/2012 4:45 pm' +'%r %F')
Admittedly, this last one is picky about its date input; see "info date" for the accepted input formats.
If you want to use a regex, I like Perl's; they are cleaner to write:
echo "$timestamp" | perl -p -e 's/^(\S+)\s+(\S+)/$2 $1/'
where \S matches a non-whitespace character, + means "one or more", and \s matches whitespace. The parentheses capture the matched parts.
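The same capture-and-swap can be done without Perl, using bash's [[ =~ ]] operator and the BASH_REMATCH array. This swaps in POSIX character classes for Perl's \S and \s; swap_words is a hypothetical name, and like the Perl substitution it keeps whatever follows the two words:

```bash
#!/usr/bin/env bash
# swap_words (hypothetical name): swap the first two whitespace-separated
# words of $1. [^[:space:]]+ plays the role of \S+, [[:space:]]+ of \s+.
swap_words() {
    local re='^([^[:space:]]+)[[:space:]]+([^[:space:]]+)'
    if [[ $1 =~ $re ]]; then
        # Append the unmatched remainder, as Perl's s/// would leave it in place.
        printf '%s %s%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${1:${#BASH_REMATCH[0]}}"
    else
        printf '%s\n' "$1"    # no match: print the input unchanged
    fi
}

swap_words '12-12-2012 16:45:00'    # prints 16:45:00 12-12-2012
```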
EDIT:
Sorry, I didn't realize that the "timestamp=" was part of the actual data. All of the above examples work if you first strip that bit out:
var='timestamp=2012-12-12 16:45:11'
timestamp=${var#timestamp=}
... then as above ...
Using sed:
sed 's/\([0-9]*-[0-9]*-[0-9]*\)\([ \t]*\)\(.*\)/\3\2\1/' input
This command works on lines containing the pattern number-number-number, whitespace, anything. It simply swaps the number-number-number part \([0-9]*-[0-9]*-[0-9]*\) with the anything part \(.*\), keeping the original whitespace \([ \t]*\). So the replacement part of the sed command is \3\2\1, which means the third part, the whitespace, and then the first part.
Same logic with tcl:
set timestamp "12-12-2012 16:45:00"
set s [regsub {([0-9]*-[0-9]*-[0-9]*)([ \t]*)(.*)} $timestamp \\3\\2\\1]
puts $s
An awk solution:
string="timestamp= 12-12-2012 16:45:00"
awk '{print $1, $3, $2}' <<< "$string"
In bash (and similar shells):
$ timestamp="12-12-2012 16:45:00"
$ read -a tsarr <<< "$timestamp"
$ echo "${tsarr[1]} ${tsarr[0]}"
16:45:00 12-12-2012

grep only for certain word on line

I need to grep only the word between the 3rd-to-last and 2nd-to-last /.
This is shown in the extract below. Note that the position of this word in the path is not always the same when counting from the front. Any ideas would be helpful.
/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt
Here is a Perl script that does the job:
my $str = q!/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt!;
my $res = (split('/',$str))[-3];
print $res;
output:
usernameneedtogrep
I'd use awk:
awk -F/ '{print $(NF-2)}'
-F/ splits each line on /.
NF is the number of fields, i.e. the index of the last column; $NF is the last column itself and $(NF-2) the 3rd-to-last column.
You might of course first need to filter out lines in your input that are not paths (e.g. using grep and then piping to awk).
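For comparison, a pure-bash sketch of the same $(NF-2) idea, splitting each line on / with read -a. third_from_last is a hypothetical name, and skipping lines with fewer than three components is an assumption (the awk version does not guard against this):

```bash
#!/usr/bin/env bash
# third_from_last (hypothetical name): print the 3rd-to-last "/"-separated
# component of each line on stdin. Spaces inside components are preserved
# because IFS is set to "/" only.
third_from_last() {
    local line
    local -a parts
    while IFS= read -r line; do
        IFS=/ read -r -a parts <<< "$line"
        (( ${#parts[@]} >= 3 )) || continue
        printf '%s\n' "${parts[${#parts[@]}-3]}"
    done
}

printf '%s\n' '/home/user/Drive-backup/2010 Backup/2010 Account/Jan/usernameneedtogrep/user.dir/4.txt' |
    third_from_last    # prints usernameneedtogrep
```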
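A regular expression anchored at the end of the line should do the trick:
/([^\/]+)\/[^\/]+\/[^\/]+$/
(each [^\/]+ matches one path component and cannot swallow a slash, so the captured group is the 3rd-to-last component; lazy quantifiers such as +? are not enough on their own, because matching still starts at the leftmost possible position)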

Getting n-th line of text output

I have a script that generates two lines of output each time. I'm really only interested in the second line, and moreover only in the text that appears between a pair of #'s on that line. Additionally, between the hashes, another delimiter is used: ^A. It would be great if I could also break apart each ^A-delimited part of the text. (Note that ^A is the SOH special character and can be typed using Ctrl-A.)
output | sed -n '1p' #prints the 1st line of output
output | sed -n '1,3p' #prints the 1st, 2nd and 3rd line of output
your.program | tail -n +2 | cut -d# -f2
should get you 2/3 of the way.
Improving Grumdrig's answer:
your.program | head -n 2 | tail -n 1 | cut -d# -f2
I'd probably use awk for that.
your_script | awk -F# 'NR == 2 && NF == 3 {
num_tokens = split($2, tokens, "\001")
for (i = 1; i <= num_tokens; ++i) {
print tokens[i]
}
}'
This says
1. Set the field separator to #
2. On lines that are the 2nd line and also have 3 fields (text#text#text)
3. Split the middle (2nd) field into the array named tokens, using ^A (written as the octal escape "\001") as the delimiter
4. Print each token
Obviously this makes a lot of assumptions. You might need to tweak it if, for example, # or ^A can legitimately appear in the data without being a separator. But something like this should get you started. If your awk has trouble splitting on a control character, try nawk or gawk.
bash:
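read -r                     # discard the 1st line
IFS= read -r line           # keep the 2nd line
result="${line#*#}"         # drop everything up to and including the first #
result="${result%#*}"       # drop everything from the last # onward
IFS=$'\001' read -r -a result <<< "$result"
result is now an array ("${result[@]}") containing the elements you're interested in. Just pipe the output of the script into this one.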
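The same three steps (take line 2, keep the text between the first and last #, split on SOH) can also be sketched as a pipeline, swapping in sed and tr for the bash builtins. second_line_fields is a hypothetical name; it prints one ^A-delimited field per line:

```bash
#!/usr/bin/env bash
# second_line_fields (hypothetical name): read a script's output on stdin and
# print the SOH-delimited fields found between the first and last "#" of line 2.
second_line_fields() {
    # sed: on line 2 only, strip through the first "#" and from the last "#" on.
    # tr: turn each SOH (octal 001) into a newline, one field per output line.
    sed -n '2{
s/^[^#]*#//
s/#[^#]*$//
p
}' | tr '\001' '\n'
}

printf 'first line\nprefix#one\001two\001three#suffix\n' | second_line_fields
# prints one, two and three on separate lines
```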
Here's a possible awk solution:
awk -F"#" 'NR==2{
for(i=2;i<=NF;i+=2){
n=split($i,a,"\001") # split on SOH
for(j=1;j<=n;j++) print a[j] # print each piece in order (for-in would yield the indices, in no set order)
}
}' file
