sed/awk | single digits to two digits (zero) after a second slash - linux

maybe someone can help me briefly...
for example in file.txt...
nw-3001-e0z-4581a/2/5
sed 's/\<[0-9]\>/0&/' file.txt ...
nw-3001-e0z-4581a/02/5
but I want the zero fill only after the second slash; the first number should remain a single digit
thanks in advance! greetz

Could you please try the following, written and tested with the shown samples. It simply sets the field separator and output field separator to / for the awk program, then zero-pads the 3rd column to two digits with sprintf and prints the line.
echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} {$3=sprintf("%02d",$3)} 1'

You can use
awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1' file.txt
Details:
BEGIN{FS=OFS="/"} - sets input/output field separator to /
$NF ~ /^[0-9]$/ - if last field is a single digit
{$NF="0"$NF} - prepend last field with 0
1 - print the result.
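For example:
$ echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1'
nw-3001-e0z-4581a/2/05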

Using sed:
sed -rn 's#(^.*/)(.*/)([[:digit:]]{1}$)#\1\20\3#p' <<< "nw-3001-e0z-4581a/2/5"
Split the string into 3 sections using extended regular expressions (-r). Ensure that the last section is exactly one digit with [[:digit:]]{1} and substitute the line with the first and second sections, followed by "0" and the third section, printing the result (p, with -n suppressing any other output).
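With the shown sample this should print:
nw-3001-e0z-4581a/2/05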

$ sed 's:/:&0:2' file
nw-3001-e0z-4581a/2/05
If that's not all you need, then edit your question to show more truly representative sample input/output, including cases that this doesn't work for.

Related

linux shell script delimiter

How to change delimiter from current comma (,) to semicolon (;) inside .txt file using linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the value of the last two columns is a number with 2 decimal places, so the value of the last 2 columns in the first row, for example, is "27,08".
That could be the main reason why the delimiter couldn't be changed properly.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including those in the mentioned values of the last 2 columns.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
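For example:
$ echo "a,b,c,d,e,f" | sed 's/,/;/4'
a,b,c,d;e,f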
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl in isolation does not perform any loop over the input lines, but the -p option says to loop over the input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ says to modify the file, but save the original with a tilde added to its file name as a backup. The regex uses a negative lookahead (?!...) to protect the two fields you want to exempt from the replacement. Lookaheads are a modern regex feature that isn't supported by older tools like sed.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
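For example, on the first sample line (using -pe here instead of -pi~ so the result just goes to standard output):
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | perl -pe 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08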
You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$NF;NF-=1; printf "%s,%s\n",$0,a} ' input_file
This should work with most awk versions (do not count on the Solaris standard awk).
The idea is to store the last field of the row in a variable, decrease the number of fields (which makes awk rebuild the record with the new delimiter), and then print the rebuilt record, a comma, and the stored last field.
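For example, on the first sample line:
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | awk -F, 'BEGIN {OFS=";"} {a=$NF;NF-=1; printf "%s,%s\n",$0,a}'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27;08;27,08
Note that only the very last comma is kept this way.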

AWK - Show lines where column contains a specific string

I have a document (.txt) composed like this:
info1: info2: info3: info4
And I want to show some information by column.
For example, I have some different information in the "info3" field; I want to see only the lines that contain "test" in the "info3" column.
I think I have to use sort but I'm not sure.
Any idea?
The previous answers assume that the third column is exactly equal to test. It looks like you are looking for lines where the value includes test. We need to use awk's match function:
awk -F: 'match($3, "test")' file
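For example, with a made-up two-line file:
$ printf 'a: b: mytest: d\na: b: other: d\n' | awk -F: 'match($3, "test")'
a: b: mytest: d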
You can use awk for this. Assuming your columns are de-limited by : and column 3 has entries having test, below command lists only those lines having that value.
awk -F':' '$3=="test"' input-file
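For example, assuming the values are not padded with spaces:
$ printf 'a:b:test:d\na:b:other:d\n' | awk -F':' '$3=="test"'
a:b:test:d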
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt

Finding different groups between 0 and 1000 that are in the file

I have a file with 7 fields separated with a :. In field 4 it has the group number. I want to display the group numbers within 0-1000. If there is a duplicate, I only want to print one copy of it along with the other group numbers that don't have a duplicate.
I have to use grep, awk, sort and uniq.
I don't know the first place to start. Can someone please help me?
awk to the rescue!
$ awk -F: '$4>=0 && $4<=1000 && !a[$4]++' file
The conditions are trivial: the array indexed by $4 has a nonzero value for duplicates, so they are not printed; only the first occurrence of each value finds a zero (before the ++ increment) and is printed.
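For example, with some made-up sample lines:
$ printf 'u1:x:1:100:g\nu2:x:2:100:g\nu3:x:3:2000:g\nu4:x:4:7:g\n' | awk -F: '$4>=0 && $4<=1000 && !a[$4]++'
u1:x:1:100:g
u4:x:4:7:g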

linux command to get lines in specified format

I have data in csv in the following form:
1 number tab one
2 number two
3 number three
Now I want to convert the data to the following form:
1 number tab one
2 number two
3 number three
i.e. I want the first tab to remain as it is, but the second and subsequent tabs to be replaced by spaces. Is it possible to do so using a Linux command (like sed)? I know I can use sed for substitution, but is it possible to make it skip the first tab and start substituting from the second tab?
This might work for you (GNU sed):
sed 's/\t/ /2g' file
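For example (piping through cat -A to make the remaining tab visible as ^I):
$ printf 'a\tb\tc\td\n' | sed 's/\t/ /2g' | cat -A
a^Ib c d$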
Using awk, you can do like this.
cat file
1 number tab one
2 number two
3 number three
The awk
awk '{$1=$1;sub(/ /,"\t")}1'
1 number tab one
2 number two
3 number three
$1=$1 rebuilds the record, squeezing all whitespace to single spaces.
sub(/ /,"\t") changes the first space to a tab.
1 prints everything.
PS You can skip the first tab using a for loop going through all the fields, but why make it more complicated than needed when the functions are there? Only school work has this type of request.
cat file
1 number tab one
2 number two
3 number three
Try this:
sed 's/\s\+/ /2g' file
1 number tab one
2 number two
3 number three
Skipping the first tab ain't easy.
But you could reframe the problem this way:
Replace all the tabs with spaces
Replace the first space with tab
This may be a bit lossy (a space occurring before the first tab would throw it off), but with data like this that's negligible, and the outcome is the same:
sed -e 's/\t/ /g; s/ /\t/' < yourfile.txt
GNU sed understands the \t escape; with other sed implementations, to enter literal TAB characters on the command line you may have to type Ctrl-V TAB.
In older implementations of sed where semicolon doesn't work to separate two commands you can use 2 -e expressions instead:
sed -e 's/\t/ /g' -e 's/ /\t/' < yourfile.txt
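Again with cat -A to make the tab visible:
$ printf '1\tnumber\ttab\tone\n' | sed -e 's/\t/ /g' -e 's/ /\t/' | cat -A
1^Inumber tab one$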

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same line twice in a given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This will only compare columns 2 through the end of the line, ignoring the first column. This is a typical awk idiom: we are simply building an array named x (one letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string is seen. The first time it is seen, it is printed. In the first case, we are using the entire input line contained in $0. In the second case we are only using the substring consisting of everything including and after the 2nd character.
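With the sample file this should print:
$ awk '!x[ substr( $0, 2 )]++' file.txt
1 AA
2 ab
3 azd
6 aslmdkfj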
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first, it only checks for consecutive lines.
Like this:
sort file.txt | uniq
