Replace value in string

Replace value in string - linux

I have a text file with ~70k lines like this:
/dir1/dir2/dir3/2013/04/04/file.pdf
and I need to convert it to:
dir4/dir5/2013/04/4/file.pdf
It's important that the leading 0 of the 6th path component (the day) is removed; values in this position range from 1 to 31. Can anyone help with this?

Using sed :
sed -E 's#(/[^/]*){3}(/[0-9]+/[0-9]+/)0?([0-9]+.*)#dir4/dir5\2\3#' your_file
We match the three leading directories in a first group, which is discarded (we would use a non-capturing group if sed supported them), then the year and month in a second group, then optionally the leading 0 of the day, then the rest of the day and the filename in a third group. The replacement specifies the new path root, then refers to the second and third groups. I'm using # as the delimiter to avoid having to escape every / in the pattern and replacement; any character that doesn't appear in them would work as well.
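As a quick check of the command above, here is a small sketch that wraps it in a shell function and feeds it a few made-up sample paths, including days that have no leading zero to strip. The name convert_paths is hypothetical and not part of the original answer, and the sketch assumes a sed that supports -E (GNU and BSD sed both do).

```shell
# Hypothetical wrapper around the sed command from the answer; reads stdin.
convert_paths() {
  sed -E 's#(/[^/]*){3}(/[0-9]+/[0-9]+/)0?([0-9]+.*)#dir4/dir5\2\3#'
}

# Made-up sample lines: a day with a leading zero and two without.
printf '%s\n' \
  '/dir1/dir2/dir3/2013/04/04/file.pdf' \
  '/dir1/dir2/dir3/2013/04/10/file.pdf' \
  '/dir1/dir2/dir3/2013/12/31/file.pdf' | convert_paths
# prints:
# dir4/dir5/2013/04/4/file.pdf
# dir4/dir5/2013/04/10/file.pdf
# dir4/dir5/2013/12/31/file.pdf
```

Once the output looks right, the same command can be run against the real file, with the result redirected to a new file.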

Related

linux shell script delimiter

How can I change the delimiter from the current comma (,) to a semicolon (;) inside a .txt file using a Linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the values in the last two columns are numbers with 2 decimal places, so each of the last 2 columns in the first row, for example, holds the value "27,08".
That could be the main reason why the delimiter couldn't be changed properly.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including the ones inside the values of the last 2 columns.
Is there anyone who can help me out with this issue?

With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
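To see the same idea on one of the rows from the question, here is a short sketch. The function name fix_delims is hypothetical, and the approach only holds under the assumption stated above: every line has exactly 11 comma-separated fields.

```shell
# Hypothetical wrapper: turn every comma into a semicolon, then restore
# the 10th and the 8th separators, which are the decimal commas.
fix_delims() {
  sed 's/,/;/g;s/;/,/10;s/;/,/8'
}

printf '%s\n' \
  'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' \
  | fix_delims
# prints: Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08
```

The 10th separator is restored before the 8th, so undoing one does not shift the position of the other.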

Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl on its own does not loop over the input lines, but the -p option tells it to loop over the input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ option says to modify the file in place, but save the original as a backup with a tilde appended to its file name. The regex uses a negative lookahead (?!...) to protect the two commas you want to exempt from the replacement. Lookaheads are a more recent regex feature that older tools like sed do not support.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.

You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$(NF-3) "," $(NF-2); b=$(NF-1) "," $NF; NF-=4; print $0, a, b}' input_file
This should work with most awk versions (do not count on the standard Solaris awk).
The idea is to store the last two decimal values (each of which spans two comma-separated fields) in variables, decrease the number of fields by four so that the remaining fields are rejoined with the new delimiter, and then print them followed by the two stored values.
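If assigning to NF is a concern on a particular awk, a loop-based variant is one possible alternative. This is only a sketch under the assumption that the last two values each contain exactly one decimal comma; the function name join_with_semicolons is made up for the example.

```shell
# Hypothetical helper: rebuild each line field by field, keeping a comma
# before the last field and before the third-to-last field.
join_with_semicolons() {
  awk -F, '{
    out = $1
    for (i = 2; i <= NF; i++) {
      sep = (i == NF || i == NF - 2) ? "," : ";"
      out = out sep $i
    }
    print out
  }'
}

printf '%s\n' \
  'Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58' \
  | join_with_semicolons
# prints: Data Warehouse;ME_1384;Budget for HW/SVC;09/05/2022;10;9999;09/05/2022;45,58;45,58
```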

sed/awk | single digits to two digits (zero) after a second slash

maybe someone can help me briefly...
for example in file.txt...
nw-3001-e0z-4581a/2/5
sed 's/\<[0-9]\>/0&/' file.txt ...
nw-3001-e0z-4581a/02/5
but I want the filled zero only after the second slash, the first number should remain a single digit
thanks in advance! greetz

Could you please try the following, written and tested with the shown samples. It sets both the field separator and the output field separator to / for the awk program, zero-pads the 3rd column to two digits (which adds a 0 only when the column holds a single digit), and prints the line.
echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} {$3=sprintf("%02d",$3)} 1'
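A short sketch of how this behaves on one- and two-digit values, using a hypothetical function name pad_third. One assumption to keep in mind: %02d converts the field to an integer, so this is only suitable when the 3rd column is always numeric.

```shell
# Hypothetical wrapper: zero-pad the 3rd /-separated field to two digits.
pad_third() {
  awk 'BEGIN{FS=OFS="/"} {$3=sprintf("%02d",$3)} 1'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'nw-3001-e0z-4581a/2/12' | pad_third
# prints:
# nw-3001-e0z-4581a/2/05
# nw-3001-e0z-4581a/2/12
```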

You can use
awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1' file.txt
Details:
BEGIN{FS=OFS="/"} - sets input/output field separator to /
$NF ~ /^[0-9]$/ - if last field is a single digit
{$NF="0"$NF} - prepend last field with 0
1 - print the result.
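Note that this targets the last field rather than the 3rd one, which only makes a difference when a line has more than three fields. The sketch below illustrates that with made-up input; pad_last is a hypothetical name.

```shell
# Hypothetical wrapper: prepend 0 to the last field if it is one digit.
pad_last() {
  awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'a/2/15' 'a/2/5/7' | pad_last
# prints:
# nw-3001-e0z-4581a/2/05
# a/2/15
# a/2/5/07
```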

Using sed:
sed -rn 's#(^.*/)(.*/)([[:digit:]]{1}$)#\1\20\3#p' <<< "nw-3001-e0z-4581a/2/5"
Split the string into 3 sections using an extended regular expression (-r). Ensure that the last section is exactly one digit with [[:digit:]]{1}, and replace the line with the first and second sections followed by "0" and the third section, printing the result. Because of -n, lines that do not match are not printed at all.
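One thing worth checking with this command is how -n combined with the p flag treats lines that do not match. The sketch below uses -E, which GNU sed accepts as an equivalent of -r and which BSD sed also supports; the function name pad_matching is hypothetical.

```shell
# Hypothetical wrapper: same substitution, reading stdin.
# With -n and the p flag, only lines where the substitution happened are printed.
pad_matching() {
  sed -En 's#(^.*/)(.*/)([[:digit:]]{1}$)#\1\20\3#p'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'nw-3001-e0z-4581a/2/15' | pad_matching
# prints only the line that was changed:
# nw-3001-e0z-4581a/2/05
```

If unchanged lines should be kept in the output, dropping both -n and the trailing p is one way to get that behavior.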

$ sed 's:/:&0:2' file
nw-3001-e0z-4581a/2/05
If that's not all you need, then edit your question to show more truly representative sample input/output, including cases that this doesn't work for.
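One such case is a value after the second slash that already has two digits: the command inserts a 0 regardless. The sketch below shows that, together with a guarded variant that is only an assumption about what might be wanted (pad only when exactly one digit follows the second slash at the end of the line). Both function names are hypothetical, and the guarded variant assumes a sed with -E.

```shell
# The command from the answer: always insert 0 after the 2nd slash.
pad_blind() {
  sed 's:/:&0:2'
}

# Hypothetical guarded variant: pad only a single trailing digit.
# In the replacement, \1 is followed by a literal 0 and then \2.
pad_guarded() {
  sed -E 's:^([^/]*/[^/]*/)([0-9])$:\10\2:'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/15' | pad_blind
# prints: nw-3001-e0z-4581a/2/015

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'nw-3001-e0z-4581a/2/15' | pad_guarded
# prints:
# nw-3001-e0z-4581a/2/05
# nw-3001-e0z-4581a/2/15
```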

unix command to replace anything between two delimiter positions

Please help me with a unix command to replace anything between two delimiter positions.
For example: I have multiple files with the header data below, and I want to replace the data between the 9th and 10th * delimiters.
ISA*00* *00* *ZZ*80881 *ZZ*TNC0022 *190115*1237*^*00501*000320089*0*P*|~
My output should look like this:
ISA*00* *00* *ZZ*80881 *ZZ*TNC0022 *190327*1237*^*00501*000320089*0*P*|~

Try this:
perl -pe 's/^((?:[^*]*\*){9})([^*]+)(.*)/${1}190327$3/'
The regexp matches 9 occurrences ({9}) of any run of non-star characters ([^*]*) followed by a star (\*) and stores all of them in the first capture group. The second capture group is at least one non-star character ([^*]+), and the third is the rest of the line.
A matching line gets replaced by the first part ${1}, your new value 190327 and the third part $3.

How to change a specific column content strings using bash/shell?

I have a .txt file that looks like this (about 400 rows):
lettuceFMnode_1240 J_C7R5_99354_KNKSR3_Oligomycin 81.52
lettuceFMnode_3755 H_C1R3_99940_KNKSF2_Tubulysin 70
lettuceFMnode_17813 G_C4R5_80184_KNKS113774F_Tetronasin 79.57
lettuceFMnode_69469 J_C11R7_99276_KNKSF2_Nystatin 87.27
I want to edit the names in the entire 2nd column so that only the last part will stay (meaning delete anything before that, so in fact leaving what comes after the last _).
I looked into different solutions using a combination of cut and sed, but couldn't understand how the code should be built.
Would appreciate any tips and help!
Thank you!

Here's one way:
perl -pe 's/^\S+\s+\K\S+_//'
For every line of input (-p) we execute some code (-e ...).
The code performs a substitution (s/PATTERN/REPLACEMENT/).
The pattern matches as follows:
^ beginning of string
\S+ 1 or more non-whitespace characters (the first column)
\s+ 1 or more whitespace characters (the space after the first column)
\K do not treat the text matched so far as part of the final match
\S+ 1 or more non-whitespace characters (the second column)
_ an underscore
Because + is greedy (it matches as many characters as possible), \S+_ will match everything up to the last _ in the second column.
Because we used \K, only the rest of the pattern (i.e. the part of the match that lies in the second column) gets replaced.
The replacement string is empty, so the match is effectively removed.

With sed:
sed 's/ [^ ]*_/ /' file
Replace the first space, followed by non-space characters ([^ ]*) and then _, with a single space.
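The command above relies on the columns being separated by literal spaces. If the file might be tab-separated instead (an assumption, since the question does not say), a variant using character classes could look like the sketch below. The function name strip_prefix_ws is hypothetical, and -E is assumed to be available.

```shell
# Hypothetical variant: accept any whitespace between columns and keep
# the original separator by capturing it and putting it back.
strip_prefix_ws() {
  sed -E 's/([[:space:]]+)[^[:space:]]*_/\1/'
}

printf '%s\n' 'lettuceFMnode_3755 H_C1R3_99940_KNKSF2_Tubulysin 70' | strip_prefix_ws
# prints: lettuceFMnode_3755 Tubulysin 70
```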

changing the date format in linux bash

Regarding an earlier answer, I need to change the date format from yyyy-mm-dd to yyyy/mm/dd.
I was given this answer:
sed -i 's#,\(....\)-\(..\)-\(..\) #,\1/\2/\3 #' /home/Documents/blah.csv
This works perfectly, but only for one instance per line. However, one line can have many of these dates. How do I change the sed command so that it handles every instance detected (not just the first)?
Example document:
2012-09-09,123143,2012-09-09,12837,2012-09-07,2131,2012-08-06,1237
#and many more lines like that.
after running the sed command, I get this:
2012-09-09,123143,2012/09/09,12837,2012-09-07,2131,2012-08-06,1237
It only works on the second date instance. How do I make it work for all of them?

Use the g flag, to make substitutions for every match in a line, not just the first. Also, the first date isn't matched because it isn't preceded by a comma.
sed -i 's#\(....\)-\(..\)-\(..\)#\1/\2/\3#g' /home/Documents/blah.csv
This fixes a few issues:
Don't bother matching the commas; the 4-2-2 nature of the data should be sufficient, and the first field is not matched because it isn't preceded by a comma.
Add the g flag following the terminating # to replace all matches, not just the first.
Added a forgotten / between the year (\1) and the month (\2).
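To try this without touching the file, the substitution can be run on standard input first. The sketch below is a slightly stricter variant that matches digits only instead of any character; that tightening is an assumption on my part, not something the question asked for. The function name slashify_dates is hypothetical, and -E is assumed to be available.

```shell
# Hypothetical wrapper: rewrite every yyyy-mm-dd on a line as yyyy/mm/dd.
slashify_dates() {
  sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\1/\2/\3#g'
}

printf '%s\n' \
  '2012-09-09,123143,2012-09-09,12837,2012-09-07,2131,2012-08-06,1237' \
  | slashify_dates
# prints: 2012/09/09,123143,2012/09/09,12837,2012/09/07,2131,2012/08/06,1237
```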
