I'm studying bash scripting and I'm stuck on an exercise from this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script that outputs a random word from a dictionary whose length equals the number supplied as the first command line argument.
My idea was to create a sub-dictionary, assign each word a line number, select a random number from those lines and filter the output. This worked for a similar, simpler script, but not for this one.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get when using "21" as input:
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried splitting the code into smaller pieces in bash and got no error (input = 21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script, lines 9 and 10 give errors.
Where do you think the error is?
problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so its content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz, which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; when using the above mentioned echo $SUBDIC you will have multiple words, but all in one line...
RWORD=$(nl "$SUBDIC" | ...) has the same problem as above: there's only one line (also note @armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match a dictionary entry like catch 22
RWORD=$(... | awk '{print $2}') likely won't handle lines containing spaces
a simple solution
Doing a "random sort" over all the possible words and taking the first line should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1
MAX=$( $SUBDIC | wc -l ) - A pipe connects a command's output to another command, but $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<$SUBDIC wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".
This code will do it. My test dictionary of words is in the file file. It's a good idea to get all words of the given length first, but put them in an array, not in a plain variable. Then pick a random index and echo the word at it.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((0 + RANDOM % ${#dic[@]}))
echo ${dic[$ind]}
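One caveat with this approach (my addition, same filenames assumed): if no word has the requested length, ${#dic[@]} is 0 and the modulo divides by zero, the same error the question ran into. A small guard avoids that:
dic=( $(sed -n "/^.\{$1\}$/p" file) )
if (( ${#dic[@]} == 0 )); then          # nothing matched: bail out instead of dividing by zero
    echo "no words of length $1 found" >&2
    exit 1
fi
ind=$((RANDOM % ${#dic[@]}))
echo ${dic[$ind]}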
I am also doing this activity and I came up with one simple solution.
I created this script:
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
If you want a random word, run the script from the terminal with the command below:
./script.sh $RANDOM
If you want to print the word at a specific line number, run it like this:
./script.sh 465
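One caveat worth noting here (my addition): bash's $RANDOM only yields values from 0 to 32767, while /usr/share/dict/words is typically much longer, so later words can never be selected this way. A sketch that draws uniformly from the whole file using awk's own rand():
awk 'BEGIN { srand() } { line[NR] = $0 } END { print line[int(rand() * NR) + 1] }' /usr/share/dict/words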
cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time it is referred to.
This simple line outputs a random word from the mentioned dictionary.
Otherwise, as umläute mentioned, you can do:
cat /usr/share/dict/american-english | sort -R | head -1
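To also honor the exercise's length argument (assuming the desired length arrives as $1), the same idea works with a length filter in front:
grep -E "^.{$1}$" /usr/share/dict/american-english | sort -R | head -n 1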
My question is not easy to ask; I'll try to explain the problem with the following example:
/home/luther/tipical_surnames.txt
Smith
Johnson
Williams
Jones
Brown
#Davis
Miller
Wilson
#Moore
Taylor
Anderson
/home/luther/employers.txt
2000 Johnson A lot-of details / BJC3000,6000, i550 0
2101 Smith A lot-of details / BJC3000,6000, i550 0
2102 Smith A lot-of details / BJC3000,6000, i550 0
2103 Jones A lot-of details / BJC3000,6000, i550 0
2104 Johnson A lot-of details / BJC3000,6000, i550 0
2100 Smith A lot-of details / BJC3000,6000, i550 0
I have a list with the typical surnames and another file with the employee records.
Let's check how many people in the company have the most popular surname, using the console:
grep -v "#" /home/luther/tipical_surnames.txt | sed -n 1'p' | cut -f 1
Smith
grep Smith /home/luther/employers.txt | wc -l
230
Works perfectly.
Now let's check the first 5 most popular surnames using a simple bash script:
#!/bin/bash
counter=1
while [ $counter -le 5 ]
do
surname=`grep -v "#" /home/luther/tipical_surnames.txt | sed -n "$counter"'p' | cut -f 1`
qty=`grep "$surname" /home/luther/employers.txt | wc -l`
echo $surname
echo $qty
counter=$(( $counter + 1 ))
done
And the result is as follows:
Smith
0
Johnson
0
Williams
0
Jones
0
Brown
0
What's wrong?
Update:
As I wrote, I tested the script on another computer and everything works fine there.
Then I tried the following:
root@problematic:/var/www# cat testfile.bash
#!/bin/bash
for (( c=1; c<=5; c++ ))
{
echo $c
}
root@problematic:/var/www# bash testfile.bash
testfile.bash: line 2: syntax error near unexpected token `$'\r''
'estfile.bash: line 2: `for (( c=1; c<=5; c++ ))
root@problematic:/var/www# echo $BASH_VERSION
4.2.37(1)-release
root@problematic:/var/www#
Of course, on the other computer this simple script works as expected, without error.
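A note on the symptom (my addition): the $'\r' in the error message is the classic sign of Windows-style CRLF line endings in the files on the problematic machine. It would also explain why grep finds nothing, since each $surname would carry an invisible trailing carriage return. Assuming sed or dos2unix is available, a check and fix might look like:
cat -A testfile.bash | head -n 3    # a \r shows up as ^M before the line-end $
sed -i 's/\r$//' testfile.bash /home/luther/tipical_surnames.txt
dos2unix testfile.bash /home/luther/tipical_surnames.txt    # alternative, where installed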
This is obviously untested since you haven't posted sample input, but this is the kind of approach you should use:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (name in cnt) {
        print name, cnt[name]
        if (++c == 5) {
            break
        }
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt
Replace "WHATEVER" with the field number where employee surnames are stored in employers.txt.
The above uses GNU awk for sorted_in, with other awks I'd just remove the PROCINFO line and the count from the output loop and pipe the output to sort then head, e.g.:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5
or whatever the right sort options are.
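For the sample employers.txt shown in the question, the surname is the second whitespace-separated field, so the placeholder would be filled in like this (my concretization of the above):
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$2]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5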
I'm actually not quite sure. I tested your script, by copying it and pasting it, with imagined data (/usr/share/dict/words) and it seems to work as expected. I wonder if there is a difference between the script you posted and the script you're running?
While at it, I took the liberty of making it run a bit more smoothly. Notice how, in the loop, you read the entirety of the surnames file in each iteration? Also, grep + wc -l may be replaced by grep -c. I'm also adding -F to the first invocation of grep since the pattern (#) is a fixed string. The grep into the employee file uses \<$name\> to make sure we only get the Johns and no Johnssons when $name is John.
#!/bin/bash
employees_in="/usr/share/dict/words"
names_in="/usr/share/dict/words"
grep -v -F "#" "$names_in" | head -n 5 | cut -f 1 |
while read -r name; do
count="$( grep -c "\<$names\> " "$employees_in" )"
printf "name: %-10s\tcount: %d\n" "$name" "$count"
done
Testing it:
$ bash script.sh
name: A          count: 1
name: a          count: 1
name: aa         count: 1
name: aal        count: 1
name: aalii      count: 1
Note: I get only ones in the count because the dictionary (not surprisingly) contains only unique words.
I'm writing a script in unix for obtaining specific data. After running a program, it gives a very huge string as output, for example (this is just a random example):
In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;
and I'd like that whenever it finds the string "scorex: " it gives me the actual score: 34, 687 or 9, for this example.
Thank you
I forgot: my string is inside a variable called RESULTADO.
You can use grep:
grep -oP 'scorex:\s?\K\d*' input
or
<command> | grep -oP 'scorex:\s?\K\d*'
For your example:
$ echo "In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;" | grep -oP 'scorex:\s?\K\d*'
34
687
9
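Since the question mentions the string is inside a variable called RESULTADO, the same grep can read it from a here-string:
grep -oP 'scorex:\s?\K\d*' <<< "$RESULTADO"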
This can be solved via a regex. Consider the following pattern (using a POSIX bracket expression, since grep -E does not support \d):
scorex: ([0-9]+)
Using this pattern with grep would look like this:
grep -Eo "scorex: [0-9]+" file_containing_string | cut -d' ' -f2
The output of this, one line per match, is:
34
687
9
Anyone know of a command-line CSV viewer for Linux/OS X? I'm thinking of something like less but that spaces out the columns in a more readable way. (I'd be fine with opening it with OpenOffice Calc or Excel, but that's way too overpowered for just looking at the data like I need to.) Having horizontal and vertical scrolling would be great.
You can also use this:
column -s, -t < somefile.csv | less -#2 -N -S
column is a standard unix program that is very convenient -- it finds the appropriate width of each column, and displays the text as a nicely formatted table.
Note: whenever you have empty fields, you need to put some kind of placeholder in it, otherwise the column gets merged with following columns. The following example demonstrates how to use sed to insert a placeholder:
$ cat data.csv
1,2,3,4,5
1,,,,5
$ column -s, -t < data.csv
1  2  3  4  5
1  5
$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5
Note that the substitution of ,, for , , is done twice. If you do it only once, 1,,,4 will become 1, ,,4 since the second comma is matched already.
You can install csvtool (on Ubuntu) via
sudo apt-get install csvtool
and then run:
csvtool readable filename | view -
This will make it nice and pretty inside of a read-only vim instance, even if you have some cells with very long values.
Have a look at csvkit. It provides a set of tools that adhere to the UNIX philosophy (meaning they are small, simple, single-purposed and can be combined).
Here is an example that extracts the ten most populated cities in Germany from the free Maxmind World Cities database and displays the result in a console-readable format:
$ csvgrep -e iso-8859-1 -c 1 -m "de" worldcitiespop | csvgrep -c 5 -r "\d+" |
    csvsort -r -c 5 -l | csvcut -c 1,2,4,6 | head -n 11 | csvlook
---------------------------------------------------
| line_number | Country | AccentCity | Population |
---------------------------------------------------
| 1           | de      | Berlin     | 3398362    |
| 2           | de      | Hamburg    | 1733846    |
| 3           | de      | Munich     | 1246133    |
| 4           | de      | Cologne    | 968823     |
| 5           | de      | Frankfurt  | 648034     |
| 6           | de      | Dortmund   | 594255     |
| 7           | de      | Stuttgart  | 591688     |
| 8           | de      | Düsseldorf | 577139     |
| 9           | de      | Essen      | 576914     |
| 10          | de      | Bremen     | 546429     |
---------------------------------------------------
Csvkit is platform independent because it is written in Python.
Tabview: a lightweight Python curses command-line CSV file viewer (which also handles other tabular Python data, like a list of lists), available here on Github.
Features:
Python 2.7+, 3.x
Unicode support
Spreadsheet-like view for easily visualizing tabular data
Vim-like navigation (h,j,k,l, g (top), G (bottom), 12G goto line 12, m - mark, ' - goto mark, etc.)
Toggle persistent header row
Dynamically resize column widths and gap
Sort ascending or descending by any column. 'Natural' order sort for numeric values.
Full-text search, n and p to cycle between search results
'Enter' to view the full cell contents
Yank cell contents to clipboard
F1 or ? for keybindings
Can also be used from the Python command line to visualize any tabular data (e.g. a list of lists)
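Assuming a pip-based install (an assumption; the project can also be installed from its GitHub page), getting it running is typically:
pip install tabview
tabview data.csv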
If you're a vimmer, use the CSV plugin, which is juuust beautiful.
The nodejs package tecfu/tty-table can be globally installed to do precisely this:
apt-get install nodejs
npm i -g tty-table
cat data.csv | tty-table
It can also handle streams.
For more info, see the docs for terminal usage here.
xsv is more than a viewer. I recommend it for most CSV tasks on the command line, especially when dealing with large datasets.
I used pisswillis's answer for a long time.
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
But then I combined it with some code I found at http://chrisjean.com/2011/06/17/view-csv-data-from-the-command-line, which works better for me:
csview()
{
local file="$1"
cat "$file" | sed -e 's/,,/, ,/g' | column -s, -t | less -#5 -N -S
}
The reason it works better for me is that it handles wide columns better.
Ofri's answer gives you everything you asked for.
But.. if you don't want to remember the command you can add this to your ~/.bashrc (or equivalent):
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
This is exactly the same as Ofri's answer, except I have wrapped it in a shell function and am using the less -S option to stop the wrapping of lines (which makes less behave more like office/oocalc).
Open a new shell (or type source ~/.bashrc in your current shell) and run the command using:
csview <filename>
Here's a (probably too) simple option:
sed "s/,/\t/g" filename.csv | less
Yet another multi-functional CSV (and not only CSV) manipulation tool: Miller. By its own description, it is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON (GitHub repository: https://github.com/johnkerl/miller).
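As a quick taste of Miller as a viewer (assuming mlr is installed and a data.csv exists), the pretty-print output format aligns columns much like column -t:
mlr --icsv --opprint cat data.csv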
tblless in the Tabulator package wraps the unix column command, and also aligns numeric columns.
I've created tablign for these (and other) purposes. Install with
pip install tablign
and
$ cat test.csv
Header1,Header2,Header3
Pizza,Artichoke dip,Bob's Special of the Day
BLT,Ham on rye with the works,
$ tablign test.csv
Header1 , Header2                   , Header3
Pizza   , Artichoke dip             , Bob's Special of the Day
BLT     , Ham on rye with the works ,
It also works if the data is separated by something other than commas. Most importantly, it preserves the delimiters, so you can also use it to style your ASCII tables without sacrificing your [Markdown,CSV,LaTeX] syntax.
I wrote this csv_view.sh to format CSVs from the command line. It reads the entire file to figure out the optimal width of each column (it requires perl, assumes there are no commas in fields, and also uses less):
#!/bin/bash
perl -we '
    sub max( @ ) {
        my $max = shift;
        map { $max = $_ if $_ > $max } @_;
        return $max;
    }

    sub transpose( @ ) {
        my @matrix = @_;
        my $width  = scalar @{ $matrix[ 0 ] };
        my $height = scalar @matrix;
        return map { my $x = $_; [ map { $matrix[ $_ ][ $x ] } 0 .. $height - 1 ] } 0 .. $width - 1;
    }

    # Read all lines, as arrays of fields
    my @lines = map { s/\r?\n$//; [ split /,/ ] } <>;

    my $widths =
        # Build a pack expression based on column lengths
        join "",
        # For each column get the longest length plus 1
        map { "A" . ( 1 + max map { length } @$_ ) }
        # Get arrays of columns
        transpose
        @lines
        ;

    # Format all lines with pack
    map { print pack( $widths, @$_ ) . "\n" } @lines;
' $1 | less -NS
Tabview is really good. It worked with 200+ MB files that displayed nicely, files which were buggy in LibreOffice as well as in the csv plugin for gvim.
The Anaconda version is available here: https://anaconda.org/bioconda/tabview
Using TxtSushi you can do:
csvtopretty filename.csv | less -S
I wrote a script, viewtab, in Groovy for just this purpose. You invoke it like:
viewtab filename.csv
It is basically a super-lightweight spreadsheet that can be invoked from the command line, handles CSV and tab separated files, can read VERY large files that Excel and Numbers choke on, and is very fast. It's not command-line in the sense of being text-only, but it is platform independent and will probably fit the bill for many people looking for a solution to the problem of quickly inspecting many or large CSV files while working in a command line environment.
The script and how to install it are described here:
http://bayesianconspiracy.blogspot.com/2012/06/quick-csvtab-file-viewer.html
There's this short command line script in python: https://github.com/rgrp/csv2ascii/blob/master/csv2ascii.py
Just download and place in your path. Usage is like
csv2ascii.py [options] csv-file-path
Convert csv file at csv-file-path to ascii form returning the result on
stdout. If csv-file-path = '-' then read from stdin.
Options:
-h, --help show this help message and exit
-w WIDTH, --width=WIDTH
Width of ascii output
-c COLUMNS, --columns=COLUMNS
Only display this number of columns