Convert a single column to multiple columns in Linux

I am trying to convert a single column to multiple columns using
grep -v '^\s*$' $1| pr -ts" " --columns $2
but I get this error:
pr: page width too narrow
could someone help me with this error?

The question isn't fully clear to me; maybe you can provide some sample input data as well. If you want to convert 1 column to multiple columns, the xargs command can be used.
xargs -n10 will convert 1 column to 10 columns, i.e. it will take 10 lines at a time and join them into one.
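For instance, reusing the arguments from your script ($1 is the input file, $2 the number of columns), and assuming the values contain no quotes or backslashes (which xargs treats specially):
grep -v '^\s*$' "$1" | xargs -n "$2"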
BR,

Related

Trouble pipelining grep into a sort

I have data in file in the form:
Torch Lake township | Antrim | 1194
I'm able to use grep to look for keywords and pipe that into a sort, but the sort isn't behaving how I intended.
This is what I have:
grep '| Saginaw' data | sort -k 5,5
I would like to be able to sort by the numerical value in the last column but it currently isn't and I'm unsure what I'm actually doing wrong.
A few things seem to be bogging you down.
First, the vertical bar can be a special character in grep. In extended regular expressions (grep -E or egrep) it means OR. Ex:
A|B
could be interpreted as A or B, and not A vertical bar B.
Plain grep uses basic regular expressions, where an unescaped | is treated literally, so your pattern already matches the bar as written:
grep '| Saginaw' data
If you switch to grep -E, escape it as \| instead, or simply remove the bar from the pattern altogether, if your data format allows that.
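If you would rather not think about regex metacharacters at all, grep -F treats the whole pattern as a fixed string:
grep -F '| Saginaw' data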
Second, the sort command needs to know what your column separator is. By default it splits on whitespace (any run of blanks), so sort -k 5,5 actually says "sort on the 5th word"; because the township names contain varying numbers of words, the numeric value is not always the 5th word.
To specify that your column separator is actually the vertical bar, use the -t option. Note that with | as the separator your sample line has only three fields, and the number is in field 3, so the key becomes -k 3,3:
sort -t'|' -k 3,3
alternatively,
sort --field-separator='|' -k 3,3
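A quick way to double-check the field numbering (assuming data is the file from your example):
awk -F'|' '{print $3}' data
which should print the numeric field (with its leading space) for every line.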
Third, you've got a bit of a sticky wicket now. Your data is formatted as:
Field1 | Field2 | Field3
...and not...
Field1|Field2|Field3
You may have issues with that additional space, or maybe not. If all of your data has EXACTLY the same whitespace, you'll be fine. If some lines have a single space, some have two spaces, and others have a tab, your sort will get jacked up.
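One way to sidestep that, assuming the separator is always a vertical bar with optional spaces around it, is to squeeze the spacing out before sorting:
sed 's/ *| */|/g' data | sort -t'|' -k 3,3 -n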
Fourth, sorting numbers as text may not do what you expect: in the default lexicographic sort, 10 comes after 1 and before 2.
To sort numerically, so that 10 comes after 9, use the -n option.
grep '| Saginaw' data | sort -t'|' -k 3,3 -n
The entire field #3 is used as the sort key; thus, 10 Abbington will come before 10 Andover.

Linux command to find Lowest value in a particular column of a data file [closed]

I have a data file with three columns and I want to search the lowest value in the third column and print the corresponding values of column 1,2 and 3.
I want to do it using a Linux terminal command. How should I do it?
I also tried grep, and cut -f1 -d"," contourRESFsi1.dat | sort -n | head -1, but it is not giving me the right values.
Thank you.
You can use sort to get the min of the third column:
sort -n -k 3 file.txt | head -n 1
-n option is to sort by number (default is alpha)
-k option is to specify which column to sort
-t option is to specify the column separator (default is space or tab)
Explanation:
The command sort is sorting your file on the fly (putting the lowest number first, and the greatest at the end), thanks to the option -n.
Then, the head command takes the first line of that "on the fly, sorted buffer", hence producing only one line (the one with the lowest value).
To get the three lowest values, for example:
sort -n -k 3 file.txt | head -n 3
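If your file is actually comma-separated, as the cut -d"," in your attempt suggests, tell sort about that separator too, for example:
sort -t',' -n -k 3,3 contourRESFsi1.dat | head -n 1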

Deleting columns of large CSV files

I have a large CSV file of around 2 GB, containing 7 columns. I want to delete its 4th column, which is text (a snippet). I used the "cut" command like this:
cut -d, -f 4 --complement file
But it is not removing the intended column, because cut starts a new column at every comma in a row, including commas inside the quoted text, and then deletes the 4th of those columns. Following an answer here, I used csvquote like this:
csvquote file | cut -d "," -f 4 --complement | uniq -c | csvquote -u
It worked for a small file, but throws an error for large files:
errno: Value too large for defined data type
I want to know some solutions for deleting columns of the large data file. Thanks.
Edit: Head file output:
funny,user_id,review_id,text,business_id,stars,date,useful,type,cool
0,WV5XKbgVHJXEgw7f-b6PVA,hhmpSM4LcHQv6noXlYYCgw,"Went out of our way to find this place because I read they had amazing poutine. Worth the traveling. It really was spot on amazing. Served out of a storage container this place is hip. $10 for two huge portions of poutine. The fries were crisp and held up to the creamy gravy well. Topped with a huge portion of squeaky white cheese curds this was a fantastic meal.
Have you tried telling cut to use the other fields instead?
Like this:
csvquote file | cut -f 1-3,5- -d , | uniq -c | csvquote -u
I tested it on my machine and it seems to work. But I didn't see a sample of your data, and you didn't note which program is throwing the error:
errno: Value too large for defined data type
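If it is csvquote that fails on the 2 GB file, one guess is that it was built without large-file support; in that case it may help to feed the data in on stdin instead of passing the filename, e.g.:
cat file | csvquote | cut -d "," -f 4 --complement | uniq -c | csvquote -u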

Descending order sort for very small numbers

I have my input file as :
Helguson 1.11889675673e-06
CAPTION_spot 1.37407731642e-07
Earning 1.20657023177e-06
340km 6.82228429758e-07
Mortimer 3.08700799033e-07
yellow 6.26784196571e-06
four 0.000271117940104
Pronk 5.79848408861e-07
jihad 3.25632057648e-07
I want to sort in descending order of the second column, so I tried the Linux command:
sort -k2 -nr input.txt > output.txt
My output is generated as:
340km 6.82228429758e-07
yellow 6.26784196571e-06
Pronk 5.79848408861e-07
jihad 3.25632057648e-07
Mortimer 3.08700799033e-07
CAPTION_spot 1.37407731642e-07
Earning 1.20657023177e-06
Helguson 1.11889675673e-06
four 0.000271117940104
It is not sorting properly. How can I resolve this? Please help.
You need to use the -g (general numeric) option instead of -n. The -n option does not understand the scientific notation in your second column, so it effectively compares only the mantissas, while -g parses each value as a full floating-point number (exponent included) before sorting.
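For example, keeping the rest of your command the same:
sort -k2 -gr input.txt > output.txt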

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How can I delete the lines that appear at least twice in the same file in bash? What I mean is that I want to get this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have the same lines duplicated in a given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
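If you did want runs of whitespace and trailing spaces to be ignored, one pre-processing sketch is:
sed 's/[[:space:]]\{1,\}/ /g; s/ *$//' file.txt | awk '!x[$0]++'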
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This will compare everything from the 2nd character through the end of the line, ignoring the leading line number. It is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen, and a line is printed only the first time its string is seen. In the first case we use the entire input line contained in $0; in the second case we use only the substring consisting of everything from the 2nd character onward.
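Running it on the sample file from the question gives exactly the result you asked for:
awk '!x[ substr( $0, 2 )]++' file.txt
1 AA
2 ab
3 azd
6 aslmdkfj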
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first, since uniq only checks consecutive lines.
Like this:
sort file.txt | uniq
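If you don't need any of uniq's extra options, sort -u does the same thing in one step:
sort -u file.txt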
