View tabular file such as CSV from command line [closed] - linux

Anyone know of a command-line CSV viewer for Linux/OS X? I'm thinking of something like less but that spaces out the columns in a more readable way. (I'd be fine with opening it with OpenOffice Calc or Excel, but that's way too overpowered for just looking at the data like I need to.) Having horizontal and vertical scrolling would be great.

You can also use this:
column -s, -t < somefile.csv | less -#2 -N -S
column is a standard unix program that is very convenient -- it finds the appropriate width of each column, and displays the text as a nicely formatted table.
Note: whenever you have empty fields, you need to put some kind of placeholder in them, otherwise the column gets merged with the following columns. The following example demonstrates how to use sed to insert a placeholder:
$ cat data.csv
1,2,3,4,5
1,,,,5
$ column -s, -t < data.csv
1  2  3  4  5
1  5
$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5
Note that the substitution of ,, for , , is done twice. If you do it only once, 1,,,4 will become 1, ,,4 since the second comma is matched already.

You can install csvtool (on Ubuntu) via
sudo apt-get install csvtool
and then run:
csvtool readable filename | view -
This will make it nice and pretty inside of a read-only vim instance, even if you have some cells with very long values.

Have a look at csvkit. It provides a set of tools that adhere to the UNIX philosophy (meaning they are small, simple, single-purposed and can be combined).
Here is an example that extracts the ten most populated cities in Germany from the free Maxmind World Cities database and displays the result in a console-readable format:
$ csvgrep -e iso-8859-1 -c 1 -m "de" worldcitiespop | csvgrep -c 5 -r "\d+"
| csvsort -r -c 5 -l | csvcut -c 1,2,4,6 | head -n 11 | csvlook
---------------------------------------------------
| line_number | Country | AccentCity | Population |
---------------------------------------------------
| 1           | de      | Berlin     | 3398362    |
| 2           | de      | Hamburg    | 1733846    |
| 3           | de      | Munich     | 1246133    |
| 4           | de      | Cologne    | 968823     |
| 5           | de      | Frankfurt  | 648034     |
| 6           | de      | Dortmund   | 594255     |
| 7           | de      | Stuttgart  | 591688     |
| 8           | de      | Düsseldorf | 577139     |
| 9           | de      | Essen      | 576914     |
| 10          | de      | Bremen     | 546429     |
---------------------------------------------------
Csvkit is platform independent because it is written in Python.
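If you only need the readable view asked for in the question, csvlook on its own is enough. A minimal sketch (assuming a file named data.csv):
csvlook data.csv | less -S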

Tabview: a lightweight Python curses command-line CSV file viewer (which also handles other tabular Python data, like a list of lists) is available on GitHub.
Features:
Python 2.7+, 3.x
Unicode support
Spreadsheet-like view for easily visualizing tabular data
Vim-like navigation (h,j,k,l, g(top), G(bottom), 12G goto line 12, m - mark,
' - goto mark, etc.)
Toggle persistent header row
Dynamically resize column widths and gap
Sort ascending or descending by any column. 'Natural' order sort for numeric values.
Full-text search, n and p to cycle between search results
'Enter' to view the full cell contents
Yank cell contents to clipboard
F1 or ? for keybindings
Can also be used from the Python command line to visualize any tabular data (e.g. a list of lists)
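A minimal sketch of installing and launching it from the shell (assuming pip is available and the PyPI package name is tabview):
pip install tabview
tabview data.csv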

If you're a vimmer, use the CSV plugin, which is juuust beautiful.

The nodejs package tecfu/tty-table can be globally installed to do precisely this:
apt-get install nodejs
npm i -g tty-table
cat data.csv | tty-table
It can also handle streams.
For more info, see the project's docs on terminal usage.

xsv is more than a viewer. I recommend it for most CSV tasks on the command line, especially when dealing with large datasets.
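For plain viewing, a quick sketch using its table subcommand (assuming xsv is installed, e.g. via cargo install xsv):
xsv table data.csv | less -S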

I used pisswillis's answer for a long time.
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
But then I combined it with some code I found at http://chrisjean.com/2011/06/17/view-csv-data-from-the-command-line, which works better for me:
csview()
{
local file="$1"
cat "$file" | sed -e 's/,,/, ,/g' | column -s, -t | less -#5 -N -S
}
The reason it works better for me is that it handles wide columns better.

Ofri's answer gives you everything you asked for.
But.. if you don't want to remember the command you can add this to your ~/.bashrc (or equivalent):
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
This is exactly the same as Ofri's answer, except that I have wrapped it in a shell function and am using the less -S option to stop line wrapping (which makes less behave more like office/oocalc).
Open a new shell (or type source ~/.bashrc in your current shell) and run the command using:
csview <filename>

Here's a (probably too) simple option:
sed "s/,/\t/g" filename.csv | less

Yet another multi-functional CSV (and not only CSV) manipulation tool: Miller. By its own description, it is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. (GitHub repository: https://github.com/johnkerl/miller)
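For the viewing task in the question, a minimal sketch with Miller is to read CSV and pretty-print it (the --icsv/--opprint flags choose the input and output formats):
mlr --icsv --opprint cat data.csv | less -S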

tblless in the Tabulator package wraps the unix column command, and also aligns numeric columns.

I've created tablign for these (and other) purposes. Install with
pip install tablign
and
$ cat test.csv
Header1,Header2,Header3
Pizza,Artichoke dip,Bob's Special of the Day
BLT,Ham on rye with the works,
$ tablign test.csv
Header1 , Header2                   , Header3
Pizza   , Artichoke dip             , Bob's Special of the Day
BLT     , Ham on rye with the works ,
It also works if the data is separated by something other than commas. Most importantly, it preserves the delimiters, so you can also use it to style your ASCII tables without sacrificing your [Markdown,CSV,LaTeX] syntax.

I wrote this csv_view.sh to format CSVs from the command line. It reads the entire file to figure out the optimal width of each column (it requires Perl, assumes there are no commas within fields, and also uses less):
#!/bin/bash
perl -we '
sub max( @ ) {
    my $max = shift;
    map { $max = $_ if $_ > $max } @_;
    return $max;
}
sub transpose( @ ) {
    my @matrix = @_;
    my $width  = scalar @{ $matrix[ 0 ] };
    my $height = scalar @matrix;
    return map { my $x = $_; [ map { $matrix[ $_ ][ $x ] } 0 .. $height - 1 ] } 0 .. $width - 1;
}
# Read all lines, as arrays of fields
my @lines = map { s/\r?\n$//; [ split /,/ ] } <>;
my $widths =
    # Build a pack expression based on column lengths
    join "",
    # For each column get the longest length plus 1
    map { "A" . ( 1 + max map { length } @$_ ) }
    # Get arrays of columns
    transpose
    @lines
;
# Format all lines with pack
map { print pack( $widths, @$_ ) . "\n" } @lines;
' $1 | less -NS

Tabview is really good. It worked with 200+ MB files that displayed nicely, while the same files were buggy in LibreOffice as well as with the csv plugin in gvim.
The Anaconda version is available here: https://anaconda.org/bioconda/tabview
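If you use conda, it can presumably be installed from that channel (an assumption based on the linked page):
conda install -c bioconda tabview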

Using TxtSushi you can do:
csvtopretty filename.csv | less -S

I wrote a script, viewtab, in Groovy for just this purpose. You invoke it like:
viewtab filename.csv
It is basically a super-lightweight spreadsheet that can be invoked from the command line, handles CSV and tab separated files, can read VERY large files that Excel and Numbers choke on, and is very fast. It's not command-line in the sense of being text-only, but it is platform independent and will probably fit the bill for many people looking for a solution to the problem of quickly inspecting many or large CSV files while working in a command line environment.
The script and how to install it are described here:
http://bayesianconspiracy.blogspot.com/2012/06/quick-csvtab-file-viewer.html

There's this short command line script in python: https://github.com/rgrp/csv2ascii/blob/master/csv2ascii.py
Just download and place in your path. Usage is like
csv2ascii.py [options] csv-file-path
Convert csv file at csv-file-path to ascii form returning the result on
stdout. If csv-file-path = '-' then read from stdin.
Options:
-h, --help show this help message and exit
-w WIDTH, --width=WIDTH
Width of ascii output
-c COLUMNS, --columns=COLUMNS
Only display this number of columns

Related

Tee pipe into 3 different processes and grepping the second match

I am trying to create a bash script which shows me the latest stats about corona infection numbers in Germany and Switzerland as well as in the whole world.
corona () {
curl -s https://corona-stats.online\?minimal\=true | tee >(head -n 1) > >(grep "(CH)\|(DE)")
curl -s https://corona-stats.online\?minimal\=true | tail -n 20 | grep World
}
As you can see, I had to create this very ugly script in which curl is called twice, because the website's output looks like this:
Rank World Total Cases New Cases ▲ Total Deaths New Deaths ▲ Recovered Active Critical Cases / 1M pop
1 USA (US) 7,497,256 2,585 ▲ 212,694 34 ▲ 4,737,369 2,547,193 14,190 22,617
2 India (IN) 6,397,896 5,936 ▲ 99,833 29 ▲ 5,352,078 945,985 8,944 4,625
3 Brazil (BR) 4,849,229 144,767 4,212,772 491,690 8,318 22,773
4 Russia (RU) 1,194,643 9,412 ▲ 21,077 186 ▲ 970,296 203,270 2,300 8,185
...
22 Germany (DE) 295,943 413 ▲ 9,586 259,500 26,857 362 3,529
...
58 Switzerland (CH) 54,384 552 ▲ 2,075 1 ▲ 45,300 7,009 32 6,272
...
World 34,534,040 63,822 ▲ 1,028,540 1,395 ▲ 25,482,492 8,023,008 66,092 4,430.85
Code: https://github.com/sagarkarira/coronavirus-tracker-cli
Twitter: https://twitter.com/ekrysis
Last Updated on: 02-Oct-2020 12:10 UTC
US STATES API: https://corona-stats.online/states/us
HELP: https://corona-stats.online/help
SPONSORED BY: ZEIT NOW
Checkout fun new side project I am working on: https://messagink.com/story/5eefb79b77193090dd29d3ce/global-response-to-coronavirus
I only want to display the first line, the last line of the table (World), and the two lines about Germany and Switzerland. I managed to display the first line as well as the two countries by piping the output of curl into head -n 1 and grepping the country codes. I was able to do both things thanks to this answer.
Now I want to get the last line in the table, the one where the current cases of the whole World are displayed. I tried to use tee again to pipe it into a third process tee >(head -n 1) > >(grep "(CH)\|(DE)") > >(tail -n 20 | grep World). But that didn't work. My first question is, how can I pipe an output into 3 different processes using tee?
The second question revolves around the way I try to grep the World line. I tail the last 20 lines and then grep "World". I do this because if I simply grep "World", it only returns the title line, where "World" can also be found. So my second question is: how can I grep only the last or second occurrence?
You can chain several tee commands and throw away only the last output of tee:
curl -s ... | tee >( cmd1 ) | tee >( cmd2 ) | tee > >( cmd3 )
Actually, we can shorten it to:
curl -s ... | tee >( cmd1 ) | tee >( cmd2 ) | cmd3
because we do not use the output of the last tee anyway.
Having multiple commands write to the terminal at the same time might get the output mixed up. A much more elegant solution is to use only one grep, e.g.
curl -s ... | grep '(DE)\|(CH)\|World.*,'
The expression World.*, will just look for a comma in the same line after World, in order to exclude the head line.
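Folded back into the original function, this single-grep approach might look like the following sketch (adding Rank to the pattern would keep the header line too):
corona () {
    curl -s 'https://corona-stats.online?minimal=true' | grep '(DE)\|(CH)\|World.*,'
}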
I think a variable would suit what you need better (at least in this case), something like:
corona() {
data="$(curl -s https://corona-stats.online\?minimal\=true)"
echo "$data" | head -n 1
echo "$data" | grep "(CH)\|(DE)"
echo "$data" | tail -n 20 | grep World
}
It conveys what you're trying to do more clearly and would also be easier to extend if you need to change anything.
You can try this:
curl -s https://corona-stats.online\?minimal\=true | grep -E "(Rank|^1[^0-9]|\(CH\)|\(DE\))"
Use grep to display only the lines containing "Rank", a leading "1" followed by a non-digit, "(CH)", or "(DE)".

Bash: Loop Read N lines at time from CSV

I have a csv file of 100000 ids
wef7efwe1fwe8
wef7efwe1fwe3
ewefwefwfwgrwergrgr
that are being transformed into a json object using jq
output=$(jq -Rsn '
{"id":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| $input[0]]}
' <$FILE)
output
{
"id": [
"wef7efwe1fwe8",
"wef7efwe1fwe3",
....
]
}
currently, I need to manually split the file into smaller 10000 line files... because the API call has a limit.
I would like a way to automatically loop through the large file and use only 10000 lines at a time as $FILE, up until the end of the list.
I would use the split command and write a little shell script around it:
#!/bin/bash
input_file=ids.txt
temp_dir=splits
api_limit=10000
# Make sure that there are no leftovers from previous runs
rm -rf "${temp_dir}"
# Create temporary folder for splitting the file
mkdir "${temp_dir}"
# Split the input file based on the api limit
split --lines "${api_limit}" "${input_file}" "${temp_dir}/"
# Iterate through splits and make an api call per split
for split in "${temp_dir}"/* ; do
jq -Rsn '
{"id":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| $input[0]]
}' "${split}" > api_payload.json
# now do something ...
# curl -d @api_payload.json http://...
rm -f api_payload.json
done
# Clean up
rm -rf "${temp_dir}"
Here's a simple and efficient solution that at its core just uses jq. It takes advantage of the -c command-line option. I've used xargs printf ... for illustration - mainly to show how easy it is to set up a shell pipeline.
< data.txt jq -Rnc '
def batch($n; stream):
def b: [limit($n; stream)]
| select(length > 0)
| (., b);
b;
{id: batch(10000; inputs | select(length>0) | (. / ";")[0])}
' | xargs printf "%s\n"
Parameterizing batch size
It might make sense to set things up so that the batch size is specified outside the jq program. This could be done in numerous ways, e.g. by invoking jq along the lines of:
jq --argjson n 10000 ....
and of course using $n instead of 10000 in the jq program.
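Put together, the parameterized invocation might look like this sketch of the same program as above, with $n supplied from the command line:
< data.txt jq --argjson n 10000 -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)]
      | select(length > 0)
      | (., b);
    b;
  {id: batch($n; inputs | select(length>0) | (. / ";")[0])}
'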
Why “def b:”?
For efficiency. jq’s TCO (tail recursion optimization) only works for arity-0 filters.
Note on -s
In the Q as originally posted, the command-line options -sn are used in conjunction with inputs. Using -s with inputs defeats the whole purpose of inputs, which is to make it possible to process input in a stream-oriented way (i.e. one line of input or one JSON entity at a time).

How can I fix my bash script to find a random word from a dictionary?

I'm studying bash scripting and I'm stuck fixing an exercise of this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script to output a random word from a dictionary whose length is equal to the number supplied as the first command line argument.
My idea was to create a sub-dictionary, assign each word a line number, select a random number from those lines and filter the output, which worked for a similar, simpler script, but not for this one.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get using as input "21":
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
In bash I tried splitting the code into smaller pieces, obtaining no error (input=21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script, lines 9 and 10 give errors.
Where do you think the error is?
problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so its content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; when using the above mentioned echo $SUBDIC you will have multiple words, but all in one line...
RWORD=$(nl "$SUBDIC" | ...) same problem as above: there's only one line (also note @armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match the dictionary entry catch 22
likely RWORD=$(... | awk '{print $2}') won't handle lines containing spaces
a simple solution
doing a "random sort" over all the possible words and taking the first line should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1
MAX=$( $SUBDIC | wc -l ) - A pipe is used for connecting a command's output, while $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<$SUBDIC wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".
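Applying both corrections to the original script might look like the following sketch; as an extra assumption, the random line is selected with an exact awk field comparison instead of grep "\b$RANDRANGE\b", to avoid the partial-match issue mentioned above:
#!/bin/bash
DIC='/usr/share/dict/words'
SUBDIC=$( egrep '^.{'"$1"'}$' "$DIC" )
MAX=$( <<<"$SUBDIC" wc -l )
RANDRANGE=$(( 1 + RANDOM % MAX ))
# pick the word on the randomly chosen line
RWORD=$( nl <<<"$SUBDIC" | awk -v n="$RANDRANGE" '$1 == n { print $2 }' )
echo "Random generated word from $DIC which is $1 characters long:"
echo "$RWORD"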
This code will do it. My test dictionary of words is in the file file. It's a good idea to get all words of the given length first, but put them in an array, not in a plain variable. Then get a random index and echo that element.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((0 + RANDOM % ${#dic[@]}))
echo ${dic[$ind]}
I am also doing this activity and came up with a simple solution.
I created this script:
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
If you want a random word, run the script from the terminal like this:
./script.sh $RANDOM
If you want to print the word on a specific line, run it like this:
./script.sh 465
cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time it is referred to.
This simple line outputs a random word from the mentioned dictionary.
Otherwise, as umläute mentioned, you can do:
cat /usr/share/dict/american-english | sort -R | head -1

Exclude one string from bash output

I'm working on a project in which, for various reasons, I need to exclude from the output (or file) the first string that matches a pattern. The difficulty is that I need to exclude just one string, only the first one in the stream.
For example, if I have:
1 abc
2 qwerty
3 open
4 abc
5 talk
After the script has run, I should have this:
2 qwerty
3 open
4 abc
5 talk
NOTE: I don't know anything about digits before words, so I can't filter the output using knowledge about them.
I've written a small script with grep, but it cuts out every string that matches the pattern:
'some program' | grep -v "abc"
I've read about awk, sed, etc., but couldn't figure out whether they can solve my problem.
Any help is appreciated, thank you.
Using awk:
some program | awk '{ if (/abc/ && !seen) { seen = 1 } else print }'
Alternatively, using only filters:
some program | awk '!/abc/ || seen { print } /abc/ && !seen { seen = 1 }'
You can use the Ex editor. For example, to remove the first match from the file:
ex +"/abc/d" -scwq file.txt
From the input (replace cat with your program):
ex +"/abc/d" +%p -scq! <(cat file.txt)
You can also read from stdin by replacing cat with /dev/stdin.
Explanation:
+cmd - execute Ex/Vim command
/pattern/d - find the pattern and delete,
%p - print the current buffer
-s - silent mode
-cq! - execute quit without saving (!)
<(cmd) - shell process substitution
With sed you can give the line numbers you want to delete:
sed 1,2d
Instead of 1,2, use the line numbers you want to delete.
Otherwise you can delete by pattern:
sed '/pattern to match/d'
To delete only the first match, as needed here, that becomes:
sed '0,/abc/{//d;}'
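For example, applied to the sample input from the question, saved as file (the 0,/regexp/ address form is a GNU sed extension):
$ sed '0,/abc/{//d;}' file
2 qwerty
3 open
4 abc
5 talk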
You can also use a list of commands { list; } to read the first line and print the rest:
command | { read first_line; cat -; }
Simple example:
$ cat file
1 abc
2 qwerty
3 open
4 abc
5 talk
$ cat file | { read first_line; cat -; }
2 qwerty
3 open
4 abc
5 talk
awk '!/1/' file
2 qwerty
3 open
4 abc
5 talk
That's all!

grouping lines from a txt file using filters in Linux to create multiple txt files

I have a txt file, where each line starts with participant No, followed by the date and other variables (numbers only), so has format:
S001_2 20090926 14756 93
S002_2 20090803 15876 13
I want to write a script that creates smaller txt files containing only 20 participants per file (so the first one will contain lines from S001_2 to S020_2, the second from S021_2 to S040_2; the total number of subjects is approximately 200). However, the subjects are not ordered, therefore I can't set a range with sed.
What would be the best command to split participants into chunks depending on which number (e.g. S001_2) the line starts with?
Thanks in advance.
Use the split command to split a file (or a filtered result) without ranges and sed. According to the documentation, this should work:
cat file.txt | split -l 20 - PREFIX
This will produce the files PREFIXaa, PREFIXab, ... (Note that it does not add the .txt extension to the file name!)
If you want to filter the files first, in the way @Sergey described:
cat file.txt | sort | split -l 20 - PREFIX
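As an aside, GNU split can also keep a .txt extension on the pieces (assuming a reasonably recent coreutils that supports --additional-suffix):
sort file.txt | split -l 20 --additional-suffix=.txt - PREFIX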
Sort without any parameters should be suitable, because there are leading zeros in your numbers like S001_2. So, first sort the file:
sort file.txt > sorted.txt
Then you will be able to set ranges with sed on sorted.txt.
This looks like a whole script for splitting sorted file into 20-line files:
num=1
i=1
lines=`wc -l sorted.txt | cut -d' ' -f 1`  # get number of lines
while [ $i -lt $lines ]; do
    sed -n $i,`echo $i+19 | bc`p sorted.txt > file$num
    num=`echo $num+1 | bc`
    i=`echo $i+20 | bc`
done
$ split -d -l 20 file.txt -a3 db_
produces: db_000, db_001, db_002, ..., db_N
