Output column format for sort on Linux

Basically, after I sort, I want my columns to be separated by tabs; right now they are separated by two spaces. The man page did not have anything related to output formatting (at least I didn't notice it).
If it's not possible, I guess I have to use awk to sort and print. Is there a better alternative?
EDIT:
To clarify the question: the location of the double spaces is not consistent. I actually have data like this:
<date>\t<user>\t<message>
I sort by date (year, month, day, and time), where a date looks like
Wed Jan 11 23:44:30 CST 2012
and then want the output of the sorted data to look like the original file, that is:
<date>\t<user>\t<message>
EDIT 2: It seems my test for tabs was wrong. I was copy-pasting a raw line from bash to my Windows box; that's why it wasn't recognized as a tab and showed up as spaces instead. I downloaded the whole file to Windows and now I can see that the fields are tab separated.
Also, I figured out that the field separators (\t, \n, ',', ':', ';', etc.) are the same in the new file after sorting. That means if the original file has tab-separated fields, my sorted file is also going to be tab separated.
One last thing, the "correct" answer was not exactly the correct solution to the problem. I don't know if I can comment on my own thread and mark it as correct. If it is OK to do that, please let me know.
Thanks for the comments guys. Really appreciate your help!

Pipe your output to column:
sort <whatever> | column -t -s$'\t'
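(Note the $'\t' quoting: a bare \t would reach column as a literal t.) A quick way to see the effect, with a couple of invented tab-separated lines:
$ printf 'a\tbb\tccc\nlonger\tx\ty\n' | column -t -s$'\t'
a       bb  ccc
longer  x   y
Be aware that column -t aligns the output by padding with spaces, so this gives you lined-up columns rather than tab-separated ones.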

You can use sed:
sort data.txt | sed 's/  /\t/g'
(the pattern between the first two slashes is two blank spaces)
This will take the output of your sort operation and substitute a single tab for two consecutive blanks.

From what I understood, the file is already sorted and what you want is to replace the two separating spaces with a TAB character. In that case, use the following:
sed 's/  /\t/g' < sorted_file > new_formatted_file
(Be careful to copy/paste the two spaces in the regular expression correctly.)
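To verify that the output really contains tabs, cat -A (GNU coreutils) displays them as ^I. A minimal check on an invented line:
$ printf 'a  b  c\n' | sed 's/  /\t/g' | cat -A
a^Ib^Ic$
Note that \t in the replacement is a GNU sed extension; with BSD sed you would insert a literal tab instead.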

Related

linux shell script delimiter

How to change the delimiter from a comma (,) to a semicolon (;) inside a .txt file using a Linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the values of the last two columns are numbers with 2 decimal places; for example, the value of the last 2 columns in the first row is "27,08".
That is probably the main reason why the delimiter couldn't be changed properly.
I tried with:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including those in the values of the last 2 columns.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know you have 11 fields (10 commas), you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl in isolation does not perform any loop over the input lines, but the -p option says to loop over the input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ option says to modify the file in place, but save the original, with a tilde added to its file name, as a backup. The regex uses a negative lookahead (?!...) to protect the two fields you want to exempt from the replacement; lookaheads are a modern regex feature that isn't supported by older tools like sed.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
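You can sanity-check the regex before touching any files by running it with -pe alone (no -i, so nothing is modified), using a sample row from the question:
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | perl -pe 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08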
You can do this with awk:
awk -F, 'BEGIN {OFS=";"} {a=$NF;NF-=1; printf "%s,%s\n",$0,a} ' input_file
This should work with most awk versions (do not count on the Solaris standard awk).
The idea is to store the last field of the row in a variable, decrease the number of fields (which makes awk rebuild the line with the new separator), and then print the rebuilt line, a comma, and the stored last field.
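With GNU awk, the first sample row from the question gives:
$ echo 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' | awk -F, 'BEGIN {OFS=";"} {a=$NF;NF-=1; printf "%s,%s\n",$0,a}'
Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27;08;27,08
Note that only the final decimal comma is preserved; the one in the second-to-last value still becomes a semicolon.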

AWK - Show lines where column contains a specific string

I have a document (.txt) composed like this:
info1: info2: info3: info4
And I want to show some information by column.
For example, I have different information in the "info3" field, and I want to see only the lines that have "test" in the "info3" column.
I think I have to use sort but I'm not sure.
Any idea ?
The previous answers assume that the third column is exactly equal to test. It looks like you were looking for lines where the value includes test, so we need to use awk's match function:
awk -F: 'match($3, "test")' file
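With a few invented sample lines, this prints every line whose third field contains test, not only those exactly equal to it:
$ printf 'a:b:test:d\na:b:testing:d\na:b:c:d\n' | awk -F: 'match($3, "test")'
a:b:test:d
a:b:testing:d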
You can use awk for this. Assuming your columns are delimited by : and you want the lines whose column 3 is test, the command below lists only those lines.
awk -F':' '$3=="test"' input-file
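On the same invented lines as above, this exact-match version keeps only the first line:
$ printf 'a:b:test:d\na:b:testing:d\na:b:c:d\n' | awk -F':' '$3=="test"'
a:b:test:d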
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt

Working with complex CSV from Linux command line

I have a complex CSV file (shared as an external link in the original post, because even a small part of it wouldn't look nice on SO) where a particular column may be composed of several sub-values separated by spaces.
reset,angle,sine,multiStepPredictions.actual,multiStepPredictions.1,anomalyScore,multiStepBestPredictions.actual,multiStepBestPredictions.1,anomalyLabel,multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=sine,multiStepBestPredictions:multiStep:errorMetric='aae':steps=[1]:window=1000:field=sine
int,string,string,string,string,string,string,string,string,float,float
R,,,,,,,,,,
0,0.0,0.0,0.0,None,1.0,0.0,None,[],0,0
0,0.0314159265359,0.0314107590781,0.0314107590781,{0.0: 1.0},1.0,0.0314107590781,0.0,[],100.0,0.0314107590781
0,0.0628318530718,0.0627905195293,0.0627905195293,{0.0: 0.0039840637450199202 0.03141075907812829: 0.99601593625497931},1.0,0.0627905195293,0.0314107590781,[],66.6556977331,0.0313952597647
0,0.0942477796077,0.0941083133185,0.0941083133185,{0.03141075907812829: 1.0},1.0,0.0941083133185,0.0314107590781,[],66.63923621,0.0418293579232
0,0.125663706144,0.125333233564,0.125333233564,{0.06279051952931337: 0.98942669172932329 0.03141075907812829: 0.010573308270676691},1.0,0.125333233564,0.0627905195293,[],59.9506102238,0.0470076969512
0,0.157079632679,0.15643446504,0.15643446504,{0.03141075907812829: 0.0040463956041429626 0.09410831331851431: 0.94917381047888194 0.06279051952931337: 0.046779793916975114},1.0,0.15643446504,0.0941083133185,[],53.2586756624,0.0500713879053
0,0.188495559215,0.187381314586,0.187381314586,{0.12533323356430426: 0.85789473684210527 0.09410831331851431: 0.14210526315789476},1.0,0.187381314586,0.125333233564,[],47.5170631454,0.0520675034246
For viewing I am using this trick: column -s,$'\t' -t < *.csv | less -#2 -N -S, which is an upgraded version borrowed from Command line CSV viewer. With this trick it is explicitly clear which is the 1st, 2nd, 3rd, ... column, and which data consist of several space-separated values within one column.
My question is whether there is any trick for manipulating such a complex CSV. I know that I can use awk to filter the 5th column, then filter the 2nd column from that to get the desired portion of the complex data, but I need to watch out for composed columns appearing before the 5th (in which case I actually need the 6th, not the 5th, column, etc.); some columns may also contain a mix of composed and non-composed data. So awk is probably not the right tool.
The CSV viewer link mentions a tool called csvlook, which adds pipes to the output as separators. That could be easier to filter, because pipes would delimit columns and whitespace would delimit the composed data within one column. But I cannot run csvlook with multiple delimiters (comma and tab) as I did for column, so it did not generate the data properly. What is the most comfortable way of handling this?
As long as your input doesn't contain columns with escaped embedded , chars., you should be able to parse it with awk, using , as the field separator; e.g.:
awk -F, '{ n = split($5, subField, "[[:blank:]]+"); for (i=1;i<=n;++i) print subField[i] }' file.csv
The above splits the 5th field into sub-fields by whitespace, using the split() function.
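For instance, with an invented row whose 5th field holds two space-separated pairs:
$ echo 'a,b,c,d,{0.0: 0.5 1.0: 0.5},f' | awk -F, '{ n = split($5, subField, "[[:blank:]]+"); for (i=1;i<=n;++i) print subField[i] }'
{0.0:
0.5
1.0:
0.5}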
Take a look at the cut command. You can specify a list of fields or a range of fields.
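For example, to keep fields 2 and 4 through 5 of a comma-separated line:
$ echo 'a,b,c,d,e' | cut -d, -f2,4-5
b,d,e
(cut splits purely on the delimiter, though, so it won't help with the space-separated sub-fields inside a column.)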

What's the best way to rearrange columns in a tab delimited text file in vim?

I already know how to do it with
:%s/\(\S\+\)^I\(\S\+\)/\2^I\1/
but I feel like I'm typing way too much. Is there a cleaner, quicker way to do it?
If the columns are lined up, you can use visual block mode by hitting Ctrl+V, then cut and paste. If the columns are not lined up, increase the tab width first so that it's longer than the content of the columns in question.
The best way to do it in Vim is not to do it in Vim at all: (re)use existing tools for the job. A *NIX-specific solution:
:%!awk -F \\t '{print $2 FS $1}'
This pipes the content of the tab-delimited file to awk, which prints the first two columns swapped, separated by the field separator (FS). awk can also be found for Windows.
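You can try the same filter in the shell first, on an invented line (quoted as '\t' here; the doubled backslash in the Vim command serves the same purpose):
$ printf 'first\tsecond\n' | awk -F '\t' '{print $2 FS $1}'
second	first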
P.S. Initially I wanted to do the same with cut, but on my system cut -f 2,1 (-d is not needed, as TAB is the default delimiter) printed the fields in the same order, not swapped. This is actually documented behavior: cut always outputs fields in input order, no matter what order you list them in.

Running AWK in Vim's search

How can you run AWK in Vim's selection of the search?
My pseudo-code
%s/!awk '{ print $2 }'//d
I am trying to delete the given column in the file.
Though they probably address the issue of the original poster, none of the answers addresses the issue advertised in the title of the question. My proposal to remove the first line of the question and retitle it "Deleting one column in vim" having been unanimously rejected, here is a solution for people who arrive here actually looking for that.
Deleting a column (here the second one, as in the OP's pseudocode example) with awk in Vim:
:%!awk '{$2=""; print $0}'
Of course, it also works for a portion of the file — e.g. for lines 10 to 20 :
:10,20!awk '{$2=""; print $0}'
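One thing to be aware of: clearing a field makes awk rebuild the line with its output field separator, so the deleted field leaves a doubled separator behind. You can see this by trying the filter in the shell on an invented line:
$ echo 'one two three' | awk '{$2=""; print $0}'
one  three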
As for "[running] awk in Vim's selection of the search", I'm not sure you can do exactly that, but search and substitution is an easy job for awk anyway, if not its primary purpose. The following replaces "pattern" with "betterpattern" in the second column if it matches (the trailing {print} is needed so that every line is written back to the buffer):
:%!awk '$2~"pattern" {gsub("pattern","betterpattern",$2)} {print}'
Note that the NOT operator requires escaping (\! instead of !) on Vim's command line. The following replaces the value in the second column with that value incremented by 10 if it matches "number", and leaves other lines unchanged:
:%!awk '$2~"number" {gsub($2,$2+10); print} $2\!~"number" {print $0}'
Apart from this point, it's just awk syntax.
In command mode, press Ctrl-v to go into visual mode, then you can block-select the column using cursor movement keys. You can then yank and put it or delete it or whatever you need using the appropriate vim commands and keystrokes.
You do not have to use awk, even if the second column is not a rectangular region. Use a substitution:
:%s/ \w\+ / /
The second column is made up of at least one word character (\w\+) surrounded by blanks. The replacement is a single blank, so for example "one two three" becomes "one three". This one is for a selected range of lines:
:'<,'>s/ \w\+ / /
If you want to delete something, use :%s/pattern//
The pattern can't be a command; it is basically a regular expression, and expressing "the 2nd field" as a regular expression is not very easy.
If you want to delete the 2nd field, you can filter the text through the cut utility:
:%! cut -d ' ' -f 2 --complement
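The same filter can be previewed in the shell (note that --complement is a GNU cut extension):
$ echo 'one two three' | cut -d ' ' -f 2 --complement
one three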
You can delete a given column in a file from within Vim itself.
In command mode, use the following to delete column n (note that this treats "column" as a character position; substitute the actual number for n-1):
:%s/\(.\{n-1}\).\{1}\(.*$\)/\1\2/g
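For example, with n = 3, this deletes the 3rd character of every line:
:%s/\(.\{2}\).\{1}\(.*$\)/\1\2/g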
You could press 0, then w to go to your 2nd column, and then dw to delete it (or cw to change it).
