How to convert columnar labeled data to json using Bash? [duplicate]

How to convert columnar labeled data to json using Bash? [duplicate] - linux

This question already has answers here:
CSV to JSON using jq
(9 answers)
Closed 3 years ago.
Suppose I have the following stdout output from an expression
Min 1st_Qu Median Mean 3rd_Qu Max NAs
1.000 1.000 2.000 1.875 2.250 3.000 1
And the goal is:
{ "Min":1.0, "1st_Qu":1.0, ..., "NAs":1 }
Is there a trivial way to do this with bash? (also trying to limit dependencies, and would prefer to just cherry pick what I need using awk over adding any non-native linux dependencies).

The following might count as trivial but it does not preserve the precision of the numbers. It assumes the separator is the regex " +", but you could easily change that, e.g. to "\t" if the value separator is a tab:
jq -n -R '[inputs | [splits(" +") | select(length>0)]]
| transpose | map({(.[0]): .[1]|tonumber}) | add'
Handling multiple data rows
The following assumes a jq -nR invocation:
def zip(headers):
. as $in
| reduce range(0; headers|length) as $i ({}; .[headers[$i]] = ($in[$i]) );
def s2a: [splits(" +") | select(length>0)];
(input | s2a) as $h
| inputs | s2a | map(tonumber) | zip($h)

Related

find different beginnings of files with bash [duplicate]

This question already has answers here:
Getting the count of unique values in a column in bash
(7 answers)
Closed 2 years ago.
I have a directory with n files in it, all starting with a date in format yyyymmdd.
Example:
20210208_bla.txt
20210208_bla2.txt
20210209_bla.txt
I want know how many files of a certain date I have, so output should be like:
20210208 112
20210209 96
20210210 213
...
Or at least find the different beginnings of the actual files (=the different dates) in my folder.
Thanks

A very simple solution would be to do something like:
ls | cut -f 1 -d _ | sort -n | uniq -c
With your example this gives:
2 20210208
1 20210209
Update: If you need to swap the two columns you can follow: https://stackoverflow.com/a/11967849/2001017
ls | cut -f 1 -d _ | sort -n | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print; }'
which prints:
20210208 2
20210209 1

Add "invisible" decimal places to end of number?

I am printing a "Table" to the console. I will be using this same table structure for several different variables. However as you can see from Output below, the lines don't all align.
One way to resolve it would be to increase the number of decimal places (e.g. 6.730000 for Standard Deviation) which would push the line into place.
However, I do not want this many decimal places.
Is it possible to add extra 0s to the end of a number, and make these invisible?
I am planning on using this table structure for several variables, and the length of Mean, Stddev, and Median will likely never be more than 6 characters.
EDIT - I would really like to ensure that each value which appears in the table will be 6 characters long, and if it is not 6 characters long, add additional "invisible" zeros.
Input
# Create and structure Table to store descriptive statistics for each variable.
subtitle = "| Mean | Stddev | Median |"
structure = '| {:0.2f} | {:0.2f} | {:0.2f} |'
lines = '=' * len(subtitle)
# Print table.
print(lines)
print(subtitle)
print(lines)
print(structure.format(mean, std, median))
print(lines)
Output:
======================================
| Mean | Stddev | Median |
======================================
| 181.26 | 6.73 | 180.34 |
======================================

Didn't really figure this out - but found a workaround.
I just did the following:
"| {:^6} | {:^6} | {:^6} | {:^6} | {:^6} |"
This keeps the width between | consistent.

Calculating a sum of numbers in C shell

I'm trying to calculate a sum numbers positioned on separate lines using C shell.
I must do it with specific commands using pipes.
There is a number of commands: comand.. | comand.. | (comands...)
printing lines in the following form:
1
2
8
4
7
The result should be 22, since 1 + 2 + 8 + 4 + 7 = 22.
I tried ... | bc | tr "\n" "+" | bc, but it didn't work.
I can't use AWK, or variables. That is why I am asking for help.

You actually can use the C shell variables, as they are part of the syntax. Without using variables, you need to pipe, and pipe again:
your-command | sed '2~1 s/^/+/' | xargs | bc
The sed command prepends plus character to all lines starting from the second; xargs joins the lines as a sequence of arguments.
The SED expression can be improved to filter out non-numeric lines:
'/^[^0-9]\+$/ d; 2~1 s/\([0-9]\+\)/+\1/'

Cannot get this simple sed command

This sed command is described as follows
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I have to add another pipe at the end of that command I think which "quits as soon as we match a regular expression representing 5 or more digits at the end of a record"
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?

Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).

Another way.
As you don't post a file sample, a did it as a guess.
Here I'm looking for lines with the word "chevy" where the field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot the flag -i from grep ... so the correct is:
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000

grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'

Sorting space delimited numbers with Linux/Bash

Is there a Linux utility or a Bash command I can use to sort a space delimited string of numbers?

Here's a simple example to get you going:
echo "81 4 6 12 3 0" | tr " " "\n" | sort -g
tr translates the spaces delimiting the numbers, into carriage returns, because sort uses carriage returns as delimiters (ie it is for sorting lines of text). The -g option tells sort to sort by "general numerical value".
man sort for further details about sort.

This is a variation from #JamesMorris answer:
echo "81 4 6 12 3 0" | xargs -n1 | sort -g | xargs
Instead of tr, I use xargs -n1 to convert to new lines. The final xargs is to convert back, to a space separated sequence of numbers.

This is a variation on ghostdog74's answer that's too big to fit in a comment. It shows digits instead of names of numbers and both the original string and the result are in space-delimited strings (instead of an array which becomes a newline-delimited string).
$ s="3 2 11 15 8"
$ sorted=$(echo $(printf "%s\n" $s | sort -n))
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2 3 8 11 15
If you didn't use the echo when setting the value of sorted, then the string has newlines in it. In that case echoing it without quotes puts it all on one line, but, as echoing it with quotes would show, each number would appear on its own line. This is the case whether the original is an array or a string.
# demo
$ s="3 2 11 15 8"
$ sorted=$(printf "%s\n" $s | sort -n)
$ echo $sorted
2 3 8 11 15
$ echo "$sorted"
2
3
8
11
15

$ s=(one two three four)
$ sorted=$(printf "%s\n" ${s[#]}|sort)
$ echo $sorted
four one three two

Using Bash parameter expansion (to replace spaces with newlines) we can do:
str="3 2 11 15 8"
sort -n <<< "${str// /$'\n'}"
# alternative
NL=$'\n'
str="3 2 11 15 8"
sort -n <<< "${str// /${NL}}"

If you actually have a space-delimited string of numbers, then one of the other answers provided would work fine. If your list is a bash array, then:
oldIFS="$IFS"
IFS=$'\n'
array=($(sort -g <<< "${array[*]}"))
IFS="$oldIFS"
might be a better solution. The newline delimiter would help if you want to generalize to sorting an array of strings instead of numbers.

Improving on Evan Krall's nice Bash "array sort" by limiting the scope of IFS to a single command:
printf "%q\n" "${IFS}"
array=(3 2 11 15 8)
array=($(IFS=$'\n' sort -n <<< "${array[*]}"))
echo "${array[#]}"
printf "%q\n" "${IFS}"

$ awk 'BEGIN{split(ARGV[1], numbers);for(i in numbers) {print numbers[i]} }' \
"6 7 4 1 2 3" | sort -n

I added this to my .zshrc (or .bashrc) file:
#sort a space-separated list of words (e.g. a list of HTML classes)
sortwords() {
echo $1 | xargs -n1 | sort -g | xargs
}
Call it from the terminal like this:
sortwords "banana date apple cherry"
# apple banana cherry date
Thanks to #FranMowinckel and others for inspiration.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to convert columnar labeled data to json using Bash? [duplicate] - linux

Related

find different beginnings of files with bash [duplicate]

Add "invisible" decimal places to end of number?

Calculating a sum of numbers in C shell

Cannot get this simple sed command

Sorting space delimited numbers with Linux/Bash

Categories

Resources