How can I change the delimiter from the current comma (,) to a semicolon (;) inside a .txt file using a Linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the values of the last two columns stay as numbers with 2 decimal places; for example, in the first row each of the last two columns has the value "27,08".
That is probably the main reason why the delimiter could not be changed properly.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including the ones inside the values of the last two columns mentioned above.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl in isolation does not loop over the input lines, but the -p option tells it to read input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ option modifies the file in place but saves the original, with a tilde appended to its file name, as a backup. The regex uses a negative lookahead (?!...) to protect the two fields you want to exempt from the replacement; lookaheads are a modern regex feature which isn't supported by older tools like sed.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
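If you want to check the regex before touching the files, a quick dry run (dropping -i so nothing is modified) on the first sample line should look like this:
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | perl -pe 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08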
You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$(NF-1)","$NF; b=$(NF-3)","$(NF-2); NF-=4; printf "%s;%s;%s\n",$0,b,a}' input_file
This should work with most awk versions (do not count on the Solaris standard awk, where assigning to NF may not rebuild the record).
The idea is to store the two decimal values at the end of the row (each of which spans two comma-separated fields) in variables, drop those four fields so the rest of the record is rebuilt with the new delimiter, and then print it followed by the two stored values.
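Fed the first sample row, the expected output is (assuming an awk such as gawk or mawk that rebuilds the record when NF is assigned):
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | awk -F, 'BEGIN {OFS=";"} {a=$(NF-1)","$NF; b=$(NF-3)","$(NF-2); NF-=4; printf "%s;%s;%s\n",$0,b,a}'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08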
I have a document (.txt) composed like that.
info1: info2: info3: info4
And I want to show some information by column.
For example, I have some different information in "info3" shield, I want to see only the lines who are composed by "test" in "info3" column.
I think I have to use sort but I'm not sure.
Any idea ?
The other answers assume that the third column is exactly equal to test. It sounds like you were looking for lines where the value merely contains test. For that, use awk's match function:
awk -F: 'match($3, "test")' file
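For example, with a made-up line where the third field only contains test as part of a longer string, the line is still printed:
$ echo 'a: b: my test 1: d' | awk -F: 'match($3, "test")'
a: b: my test 1: d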
You can use awk for this. Assuming your columns are delimited by : and column 3 holds the value test, the command below lists only those lines.
awk -F':' '$3=="test"' input-file
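Note that with lines shaped like the sample (info1: info2: info3: info4), -F':' leaves a leading space in $3, so the exact comparison would miss "test". One possible adjustment, assuming the columns really are separated by colon-plus-space, is to put the space into the separator:
awk -F': ' '$3=="test"' input-file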
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt
I have a file with 7 fields separated with a :. In field 4 it has the group number. I want to display the group numbers within 0-1000. If there is a duplicate, I only want to print one copy of it along with the other group numbers that don't have a duplicate.
I have to use grep, awk, sort and uniq.
I don't know the first place to start. Can someone please help me?
awk to the rescue!
$ awk -F: '$4>=0 && $4<=1000 && !a[$4]++' file
The conditions are straightforward: the array indexed by $4 counts how many times each group number has been seen. The first time a value appears, a[$4] is still zero (before the ++ increment), so the negation is true and the line is printed; on every later occurrence the value is nonzero, so duplicates are suppressed.
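A quick check with made-up passwd-style lines (hypothetical data: 7 colon-separated fields with the group number in field 4). The duplicate 500 and the out-of-range 1500 are dropped:
$ printf 'alice:x:1:500:g:/h:/bin/sh\nbob:x:2:500:g:/h:/bin/sh\ncarl:x:3:1500:g:/h:/bin/sh\ndave:x:4:7:g:/h:/bin/sh\n' | awk -F: '$4>=0 && $4<=1000 && !a[$4]++'
alice:x:1:500:g:/h:/bin/sh
dave:x:4:7:g:/h:/bin/sh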
I have data in csv in the following form:
1 number tab one
2 number two
3 number three
Now I want to convert the data to the following form:
1 number tab one
2 number two
3 number three
i.e. I want the first tab to remain as it is, but the second and subsequent tabs to be replaced by spaces. Is it possible to do so using a Linux command (like sed, etc.)? I know I can use sed for substitution, but is it possible to make it skip the first tab and start substituting from the second tab onwards?
This might work for you (GNU sed):
sed 's/\t/ /2g' file
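A quick sanity check, assuming GNU sed (which understands \t, and whose 2g flag means "replace from the 2nd match onwards"); cat -A shows tabs as ^I and line ends as $:
$ printf '1\t2\t3\t4\n' | sed 's/\t/ /2g' | cat -A
1^I2 3 4$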
Using awk, you can do like this.
cat file
1 number tab one
2 number two
3 number three
The awk
awk '{$1=$1;sub(/ /,"\t")}1' file
1 number tab one
2 number two
3 number three
$1=$1 forces awk to rebuild the record, collapsing each run of whitespace to a single space (the default OFS).
sub(/ /,"\t") replaces the first space with a tab.
1 is an always-true pattern, so every line is printed with the default action.
PS You can skip the first tab by looping over all the fields (a sketch is below), but why make it more complicated than needed when the built-in functions are there? Only school work has this type of requirement.
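For completeness, a sketch of that field-loop approach: split on tabs and re-join with a tab only after the first field, so spaces inside the remaining fields are left untouched:
awk -F'\t' '{out=$1; for(i=2;i<=NF;i++) out = out (i==2 ? "\t" : " ") $i; print out}' file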
cat file
1 number tab one
2 number two
3 number three
Try this:
sed 's/\s\+/ /2g' file
1 number tab one
2 number two
3 number three
Skipping the first tab ain't easy.
But you could reframe the problem this way:
Replace all the tabs with spaces
Replace the first space with tab
This may be a bit lossy, but it's actually negligible, and the outcome is the same:
sed -e 's/<TAB>/ /g; s/ /<TAB>/' < yourfile.txt
Here <TAB> stands for a literal tab character; to enter it on the command line you may have to type Ctrl-V TAB.
In older implementations of sed where a semicolon doesn't work to separate two commands, you can use two -e expressions instead:
sed -e 's/<TAB>/ /g' -e 's/ /<TAB>/' < yourfile.txt
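Alternatively, if your sed is GNU sed, \t is understood in both the pattern and the replacement, which avoids typing literal tabs altogether:
sed -e 's/\t/ /g' -e 's/ /\t/' < yourfile.txt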
Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same lines in double, given a specific text file. Could you show me the command please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares only the substring starting at the second character of each line, ignoring the first character (which, for this sample, means ignoring the one-character first column). It is a typical awk idiom: we build an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen, and a line is printed only the first time its key appears. In the first command the key is the entire input line, $0; in the second it is the substring consisting of everything from the 2nd character onwards.
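Run against the sample file above, the second (substr) form keeps exactly the lines requested:
$ awk '!x[ substr( $0, 2 )]++' file.txt
1 AA
2 ab
3 azd
6 aslmdkfj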
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; uniq only compares consecutive lines.
Like this:
sort file.txt | uniq
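Assuming a sort that supports the standard -u flag, the two steps can also be combined:
sort -u file.txt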