I have data in csv in the following form:
1 number tab one
2 number two
3 number three
Now I want to convert the data to the following form:
1 number tab one
2 number two
3 number three
That is, I want the first tab to remain as it is, but the second and all following tabs to be replaced by spaces. Is it possible to do this with a Linux command (sed, etc.)? I know I can use sed for substitution, but can it skip the first tab and start substituting from the second one?
This might work for you (GNU sed):
sed 's/\t/ /2g' file
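As a quick check of the 2g flag combination (start at the second match, then replace every match after it), here is a minimal sketch. The function name keep_first_tab is made up for illustration, and both \t in the pattern and the combined 2g flags assume GNU sed.

```shell
# Hypothetical helper: keep the first tab on each line, turn later tabs into spaces.
# Assumes GNU sed (\t and the combined "2g" flags are GNU extensions).
keep_first_tab() {
    sed 's/\t/ /2g'
}

# Sample line with three tabs; only the first one survives.
printf '1\tnumber\ttab\tone\n' | keep_first_tab
# prints: 1<TAB>number tab one   (<TAB> marks a tab character)
```

Lines without any tab pass through unchanged.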
Using awk, you can do it like this.
cat file
1 number tab one
2 number two
3 number three
The awk command:
awk '{$1=$1;sub(/ /,"\t")}1' file
1 number tab one
2 number two
3 number three
$1=$1 rebuilds the record, so every run of whitespace between fields becomes a single space (the default OFS).
sub(/ /,"\t") changes the first space to a tab.
1 prints every line.
PS: You could skip the first tab with a for loop over all the fields, but why make it more complicated than needed when the functions are already there? Only school work has this type of request.
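If the original spacing has to be preserved (the $1=$1 trick also squeezes repeated spaces and drops leading whitespace), one possible variant splits each line at the first tab instead. This is only a sketch; the function name tabs_after_first is hypothetical.

```shell
# Hypothetical helper: leave the first tab alone, replace each later tab with a space.
tabs_after_first() {
    awk '{
        i = index($0, "\t")              # position of the first tab, 0 if none
        if (i > 0) {
            head = substr($0, 1, i)      # everything up to and including the first tab
            rest = substr($0, i + 1)     # everything after it
            gsub(/\t/, " ", rest)        # later tabs become single spaces
            $0 = head rest
        }
        print
    }'
}

printf '1\tnumber  two\tthree\n' | tabs_after_first
# prints: 1<TAB>number  two three   (the double space is kept)
```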
cat file
1 number tab one
2 number two
3 number three
Try this:
sed 's/\s\+/ /2g' file
1 number tab one
2 number two
3 number three
Skipping the first tab ain't easy.
But you could reframe the problem this way:
Replace all the tabs with spaces
Replace the first space with tab
This is slightly lossy (a space that comes before the first tab would be the one turned back into a tab), but for data like yours the outcome is the same:
sed -e 's/<TAB>/ /g; s/ /<TAB>/' < yourfile.txt
Here <TAB> stands for a literal TAB character; to enter one on the command line you may have to type Ctrl-V TAB.
In older implementations of sed where a semicolon doesn't work to separate two commands, you can use two -e expressions instead:
sed -e 's/<TAB>/ /g' -e 's/ /<TAB>/' < yourfile.txt
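To avoid typing a literal TAB at all, the same two-step idea can be sketched with the tab held in a shell variable. The names tab and first_tab_only are illustrative and not part of the answer.

```shell
# Hypothetical helper: replace all tabs with spaces, then turn the first space back into a tab.
first_tab_only() {
    tab=$(printf '\t')                    # a literal TAB character
    sed -e "s/$tab/ /g" -e "s/ /$tab/"    # step 1: tabs to spaces; step 2: first space to tab
}

printf '1\tnumber\ttab\tone\n' | first_tab_only
# prints: 1<TAB>number tab one   (<TAB> marks a tab character)
```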
How to change delimiter from current comma (,) to semicolon (;) inside .txt file using linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the value in each of the last two columns stays a number with 2 decimal places; for example, the value of each of the last 2 columns in the first row is "27,08".
That is probably the main reason why the delimiter can't be changed in a straightforward way.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including the ones inside the values of the last 2 columns.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
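Applied to the first row from the question, the same command can be checked like this. The function name swap_delims is only for illustration, and the sketch assumes every row has exactly 11 comma-separated fields.

```shell
# Hypothetical helper: all commas become semicolons, then the 10th and the 8th are restored.
# The 10th is restored first so that the 8th is still the 8th semicolon afterwards.
swap_delims() {
    sed 's/,/;/g;s/;/,/10;s/;/,/8'
}

printf '%s\n' 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | swap_delims
# prints: Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08
```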
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl in isolation does not perform any loop over the input lines, but the -p option says to loop over input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n); the -i~ says to modify the file, but save the original with a tilde added to its file name as a backup; and the regex uses a negative lookahead (?!...) to protect the two commas you want to exempt from the replacement. Lookaheads are a more modern regex feature which older tools like sed don't support.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$(NF-3) "," $(NF-2); b=$(NF-1) "," $NF; NF-=4; print $0, a, b}' input_file
This should work with most awk versions (do not count on Solaris standard awk).
The idea is to store the last two values (each made of two comma-separated fields) in variables, decrease the number of fields by four so that the record is rebuilt with the new delimiter, and then print it followed by the two stored values.
Given a text file with several lines (for example, a file with three sentences has three lines).
I need to add the current time at the start of every line that contains a number.
I have more or less figured out how to insert the current time:
sed "s/^/$(date +%T) /" text.txt
I saw a solution, but it doesn't suit me because it uses if.
But how can I also check each line for digits and insert the time before it with a single command?
Is it possible without an if statement?
You can use a regex to match the lines
sed "/[0-9]/s/^/$(date +%T) /" text.txt
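The /[0-9]/ in front of the s command is an address: the substitution runs only on lines that match it. A minimal sketch to see that in action (the function name stamp_digit_lines and the sample lines are made up):

```shell
# Hypothetical helper: prefix the current time to lines containing a digit,
# leave all other lines untouched.
stamp_digit_lines() {
    now=$(date +%T)               # e.g. 14:03:27; taken once so every line gets the same stamp
    sed "/[0-9]/s/^/$now /"
}

printf 'no digits here\nroom 42\n' | stamp_digit_lines
# first line is printed unchanged, second line becomes "HH:MM:SS room 42"
```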
maybe someone can help me briefly...
for example in file.txt...
nw-3001-e0z-4581a/2/5
sed 's/\<[0-9]\>/0&/' file.txt ...
nw-3001-e0z-4581a/02/5
but I want the zero padding only after the second slash; the first number should remain a single digit
thanks in advance! greetz
Could you please try the following, written and tested with the shown samples. It sets both the field separator and the output field separator to / for the awk program, zero-pads the 3rd column to two digits (so a single digit gets a leading 0), and prints the line.
echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} {$3=sprintf("%02d",$3)} 1'
You can use
awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1' file.txt
Details:
BEGIN{FS=OFS="/"} - sets input/output field separator to /
$NF ~ /^[0-9]$/ - if last field is a single digit
{$NF="0"$NF} - prepend last field with 0
1 - print the result.
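A quick way to see the effect of the $NF ~ /^[0-9]$/ guard is to feed it one line that needs padding and one that does not. The function name pad_last_field and the second sample line are assumptions for illustration.

```shell
# Hypothetical helper: prepend 0 to the last /-separated field only when it is a single digit.
pad_last_field() {
    awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF} 1'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'nw-3001-e0z-4581a/2/15' | pad_last_field
# prints:
# nw-3001-e0z-4581a/2/05
# nw-3001-e0z-4581a/2/15
```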
Using sed:
sed -rn 's#(^.*/)(.*/)([[:digit:]]{1}$)#\1\20\3#p' <<< "nw-3001-e0z-4581a/2/5"
Split the string into 3 sections using extended regular expressions (-r). Ensure that the last section is exactly one digit with [[:digit:]]{1}$, then replace the line with the first and second sections followed by "0" and the third section, and print the result.
$ sed 's:/:&0:2' file
nw-3001-e0z-4581a/2/05
If that's not all you need, then edit your question to show more truly representative sample input/output, including cases that this doesn't work for.
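One such case: if the third part can already have two digits, inserting a 0 after the second slash unconditionally would give /015. A possible guarded variant, sketched under the assumption that lines always have exactly three /-separated parts (pad_after_second_slash is a made-up name):

```shell
# Hypothetical helper: pad the part after the second slash only if it is a single digit.
# In the replacement, \1 is "first/second/" and \2 is the lone digit;
# "\10" is \1 followed by a literal 0 (sed back-references are single-digit).
pad_after_second_slash() {
    sed 's:^\([^/]*/[^/]*/\)\([0-9]\)$:\10\2:'
}

printf '%s\n' 'nw-3001-e0z-4581a/2/5' 'nw-3001-e0z-4581a/2/15' | pad_after_second_slash
# prints:
# nw-3001-e0z-4581a/2/05
# nw-3001-e0z-4581a/2/15
```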
Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want any line to appear twice in a given text file. Could you show me the command please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(E.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares only character columns 2 through the end of the line, ignoring the first character (so it fits the sample only while the leading number is a single digit). This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. The first time a string is seen, its line is printed. In the first case, we use the entire input line contained in $0. In the second case we use only the substring starting at the 2nd character.
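If the leading numbers can grow past one digit, keying the array on a field rather than on a character offset may be more robust. This is a sketch that assumes the text to compare is a single word in the second whitespace-separated field; dedupe_by_text and seen are illustrative names.

```shell
# Hypothetical helper: print a line only the first time its second field is seen.
dedupe_by_text() {
    awk '!seen[$2]++' "$@"
}

printf '%s\n' '1 AA' '2 ab' '3 azd' '4 ab' '5 AA' '6 aslmdkfj' '7 AA' | dedupe_by_text
# prints:
# 1 AA
# 2 ab
# 3 azd
# 6 aslmdkfj
```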
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; it only checks consecutive lines.
Like this:
sort file.txt | uniq
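For what it's worth, sort -u does the same job in one step. Keep in mind that either form reorders the output and compares whole lines, so lines that differ only in a leading number still count as different. A small sketch (unique_sorted is a made-up name; LC_ALL=C is set only to make the sort order predictable):

```shell
# Hypothetical helper: sorted output with exact duplicate lines removed.
unique_sorted() {
    LC_ALL=C sort -u "$@"
}

printf '%s\n' 'ab' 'AA' 'ab' 'azd' 'AA' | unique_sorted
# prints:
# AA
# ab
# azd
```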
How can you run AWK in Vim's selection of the search?
My pseudo-code
%s/!awk '{ print $2 }'//d
I am trying to delete the given column in the file.
Though they probably address the issue of the original poster, none of the answers addresses the issue advertised in the title of the question. Since my proposal to remove the first line of the question and to retitle it as "Deleting one column in vim" was unanimously rejected, here is a solution for people who arrive here actually looking for that.
Deleting a column (here the second one, as in OP's pseudocode example) with awk in vim :
:%!awk '{$2=""; print $0}'
Of course, it also works for a portion of the file — e.g. for lines 10 to 20 :
:10,20!awk '{$2=""; print $0}'
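Setting $2="" leaves two separators where the column used to be. If that matters, a possible variant rebuilds the line from the remaining fields; shown here as a plain shell sketch that can be tried outside vim first (drop_second_field is a hypothetical name).

```shell
# Hypothetical helper: print every field except the second, joined by single spaces.
drop_second_field() {
    awk '{
        out = $1
        for (i = 3; i <= NF; i++) out = out OFS $i   # skip field 2
        print out
    }' "$@"
}

printf 'a b c d\n' | drop_second_field
# prints: a c d
```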
As for "[running] awk in Vim's selection of the search", not sure you can exactly do that but anyway the search and substitution is an easy job for awk, if not its primary purpose. The following replaces "pattern" with "betterpattern" in the second column if it matches :
:%!awk '$2~"pattern" {gsub("pattern","betterpattern",$2)} 1'
Note that the NOT operator requires escaping (\! instead of !). The following adds 10 to the first column when the second column matches "number" and leaves other lines unchanged :
:%!awk '$2~"number" {$1+=10; print} $2\!~"number" {print $0}'
Apart from this point, it's just awk syntax.
In normal mode, press Ctrl-v to enter visual block mode, then block-select the column using cursor movement keys. You can then yank and put it, delete it, or do whatever you need using the appropriate vim commands and keystrokes.
You do not have to use awk, even if the second column is not a rectangular region. Use a substitution:
:%s/ \w\+ / /
The second column is made up of one or more word characters (\w\+) surrounded by blanks. The replacement is a single blank. This one is for a selected range of lines:
:'<,'>s/ \w\+ / /
If you want to delete something, use :%s/pattern//
The pattern can't be a command; it is a regular expression, and expressing the 2nd field as a regular expression is not very easy.
If you want to delete the 2nd field, you can filter the text through the cut utility instead (--complement is a GNU cut option):
:%! cut -d ' ' -f 2 --complement
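Where cut has no --complement (it is a GNU extension), listing the fields to keep should give the same result. A sketch to try in a shell before using it as a vim filter; drop_field_2 is a made-up name.

```shell
# Hypothetical helper: keep field 1 and fields 3 onward, i.e. drop field 2.
drop_field_2() {
    cut -d ' ' -f 1,3- "$@"
}

printf 'a b c d\n' | drop_field_2
# prints: a c d
```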
You can delete a given column in a file just from vim.
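In command mode, use the following to delete character column n, typing the actual value of n-1 in place of n-1 (for example, 4 to delete the 5th character of every line):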
:%s/\(.\{n-1}\).\{1}\(.*$\)/\1\2/g
You could press 0, then press w to go to your 2nd column, and do cw (or dw to delete the word rather than change it).