convert bash `ls` output to json array - linux

Is it possible to use a bash script to format the output of ls as a JSON array? To be valid JSON, all names of the dirs and files need to be wrapped in double quotes, separated by commas, and the entire thing needs to be wrapped in square brackets. I.e. convert:
jeroen@jeroen-ubuntu:~/Desktop$ ls
foo.txt bar baz
to
[ "foo.txt", "bar", "baz" ]
edit: I strongly prefer something that works across all my Linux servers; hence rather not depend on python, but have a pure bash solution.

If you know that no filename contains newlines, use jq:
ls | jq -R -s -c 'split("\n")[:-1]'
Short explanation of the flags to jq:
-R treats the input as raw text instead of JSON
-s (slurp) reads the whole input at once; combined with -R, it arrives as one long string
-c creates a compact output
[:-1] removes the last empty string in the output array
This requires version 1.4 or later of jq. Try this if it doesn't work for you:
ls | jq -R '[.]' | jq -s -c 'add'

Yes, but the corner cases and Unicode handling will drive you up the wall. Better to delegate to a scripting language that supports it natively.
$ ls
あ a "a" à a b 私
$ python -c 'import os, json; print(json.dumps(os.listdir(".")))'
["\u00e0", "\"a\"", "\u79c1", "a b", "\u3042", "a"]

Hello, you can do that with sed and awk:
ls | awk ' BEGIN { ORS = ""; print "["; } { print "\/\@"$0"\/\@"; } END { print "]"; }' | sed "s^\"^\\\\\"^g;s^\/\@\/\@^\", \"^g;s^\/\@^\"^g"
EDIT: updated to solve the problem with " and spaces. I use /@ as a placeholder for ", since / is not a valid character in a filename.

Use perl as the encoder; it's guaranteed to be non-buggy, is everywhere, and with pipes, it's still reasonably clean:
ls | perl -e 'use JSON; @in=grep(s/\n$//, <>); print encode_json(\@in)."\n";'

Most Linux machines already have Python. All you have to do is:
python -c 'import os, json; print(json.dumps(os.listdir("/yourdirectory")))'
Replace /yourdirectory with any path; use "." for the current directory.

Here's a bash line
echo '[' ; ls --format=commas|sed -e 's/^/\"/'|sed -e 's/,$/\",/'|sed -e 's/\([^,]\)$/\1\"\]/'|sed -e 's/, /\", \"/g'
Won't properly deal with ", \ or some commas in the name of the file. Also, if ls puts newlines between filenames, so will this.
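A pure-Bash variant can avoid parsing ls output altogether by globbing and escaping each name itself. The following is a minimal sketch, not a substitute for a real JSON encoder: the function names json_escape and ls_json are made up for illustration, and only backslash, double quote, newline, tab and carriage return are escaped (other control characters are assumed not to occur).

```shell
#!/usr/bin/env bash

# json_escape STRING: escape the characters that most often break JSON strings.
# Assumption: other control characters (U+0000-U+001F) do not occur in names.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# ls_json [DIR]: print the non-hidden entries of DIR (default: .) as a JSON array.
# The body runs in a subshell so that nullglob does not leak into the caller.
ls_json() (
  shopt -s nullglob
  out=''
  for f in "${1:-.}"/*; do
    out+="${out:+, }\"$(json_escape "${f##*/}")\""
  done
  printf '[%s]\n' "$out"
)
```

Calling ls_json with no argument lists the current directory; an empty directory yields [].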

I was also searching for a way to output a Linux folder / file tree to some JSON or XML file. Why not use this simple terminal command:
$ tree --dirsfirst --noreport -n -X -i -s -D -f -o my.xml
So, just use the Linux tree command and configure your own parameters. Here -X gives XML output! For me, that's OK, and I guess there's some script to convert XML to JSON.
NOTE: I think this covers the same question.

Personally, I would write a script that runs the ls command and sends the output to a file of your choice, parsing the output to format it as valid JSON.
I'm sure that a simple Bash script will do the job.
Bash output

Can't you use a python script like this?
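import subprocess
# Run ls and split its output into one name per line
myOutput = subprocess.check_output(["ls"]).decode().splitlines()
# Wrap each name in double quotes (special characters are not escaped)
output = ['"' + e + '"' for e in myOutput]
print("[" + ", ".join(output) + "]")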
I didn't check if it works, but you can find the specification here

Should be pretty easy.
$ cat ls2json.bash
#!/bin/bash
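# Print the directory listing as a JSON array (names containing newlines are not handled).
sep=''
printf '['
ls | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | while IFS= read -r FILE
do
printf '%s"%s"' "$sep" "$FILE"
sep=','
done
printf ']\n'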
then run:
$ ./ls2json.bash > json.out
but python would be even easier
import json
import os
directory = '/some/dir'
ls = os.listdir(directory)
# json.dumps quotes and escapes each name correctly
print(json.dumps(ls))

Here's an elegant one-liner solution that doesn't rely on jq:
echo '[ "'"$(echo "$list" | sed ':a;N;$!ba;s/\n/", "/g')"'" ]'
$list here is a newline-separated string.

Using gnu column (i.e. doesn't work on OSX)
ls -ldG * --time-style=long-iso | column -t -n "$PWD" -N mod,links,user,size,date,time,name -J
Output :
{
"/home/pouet": [
{"mod":"-rwxr-xr-x", "links":"1", "user":"pouet", "size":"21978", "date":"2022-08-12", "time":"11:47", "name":"file1"},
{"mod":"-rw-r--r--", "links":"1", "user":"pouet", "size":"2634", "date":"2022-06-20", "time":"11:14", "name":"file2"}
]
}

Don't use bash, use a scripting language. Untested perl example:
use JSON;
my @ls_output = `ls`; ## probably better to use a perl module to do this, like DirHandle
chomp @ls_output; ## drop the trailing newline from each name
print encode_json( \@ls_output ); ## encode_json expects a reference

Related

How to get only some simbols in bash script? [duplicate]

Given a string file path such as /foo/fizzbuzz.bar, how would I use bash to extract just the fizzbuzz portion of said string?
Here's how to do it with the # and % operators in Bash.
$ x="/foo/fizzbuzz.bar"
$ y=${x%.bar}
$ echo ${y##*/}
fizzbuzz
${x%.bar} could also be ${x%.*} to remove everything after a dot or ${x%%.*} to remove everything after the first dot.
Example:
$ x="/foo/fizzbuzz.bar.quux"
$ y=${x%.*}
$ echo $y
/foo/fizzbuzz.bar
$ y=${x%%.*}
$ echo $y
/foo/fizzbuzz
Documentation can be found in the Bash manual. Look for ${parameter%word} and ${parameter%%word} trailing portion matching section.
look at the basename command:
NAME="$(basename /foo/fizzbuzz.bar .bar)"
instructs it to remove the suffix .bar, results in NAME=fizzbuzz
Pure bash, done in two separate operations:
Remove the path from a path-string:
path=/foo/bar/bim/baz/file.gif
file=${path##*/}
#$file is now 'file.gif'
Remove the extension from a path-string:
base=${file%.*}
#${base} is now 'file'.
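The two operations can be wrapped in a small helper. This is only a sketch; the function name stem is hypothetical, and it removes just the last extension, so dots inside the name are preserved.

```shell
#!/usr/bin/env bash

# stem PATH: print the file name without its directory and last extension.
stem() {
  local file=${1##*/}            # drop everything up to the last slash
  printf '%s\n' "${file%.*}"     # drop the shortest trailing ".suffix"
}

stem /foo/bar/bim/baz/file.gif   # prints file
stem /foo/fizzbuzz.bar.quux      # prints fizzbuzz.bar
```

A name without any dot is returned unchanged, while a dotfile such as .bashrc comes out empty, so callers may want to special-case those.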
Using basename I used the following to achieve this:
for file in *; do
ext=${file##*.}
fname=$(basename "$file" ".$ext")
# Do things with $fname
done;
This requires no a priori knowledge of the file extension and works even when the filename has dots in it (in front of its extension). It does require the basename program, but that is part of GNU coreutils, so it should ship with any distro.
The basename and dirname functions are what you're after:
mystring=/foo/fizzbuzz.bar
echo basename: $(basename "${mystring}")
echo basename + remove .bar: $(basename "${mystring}" .bar)
echo dirname: $(dirname "${mystring}")
Has output:
basename: fizzbuzz.bar
basename + remove .bar: fizzbuzz
dirname: /foo
Pure bash way:
~$ x="/foo/bar/fizzbuzz.bar.quux.zoom";
~$ y=${x/\/*\//};
~$ echo ${y/.*/};
fizzbuzz
This functionality is explained on man bash under "Parameter Expansion". Non bash ways abound: awk, perl, sed and so on.
EDIT: Works with dots in file suffixes and doesn't need to know the suffix (extension), but doesn’t work with dots in the name itself.
Using basename assumes that you know what the file extension is, doesn't it?
And I believe that the various regular expression suggestions don't cope with a filename containing more than one "."
The following seems to cope with double dots. Oh, and filenames that contain a "/" themselves (just for kicks)
To paraphrase Pascal, "Sorry this script is so long. I didn't have time to make it shorter"
#!/usr/bin/perl
$fullname = $ARGV[0];
($path,$name) = $fullname =~ /^(.*[^\\]\/)*(.*)$/;
($basename,$extension) = $name =~ /^(.*)(\.[^.]*)$/;
print $basename . "\n";
In addition to the POSIX conformant syntax used in this answer,
basename string [suffix]
as in
basename /foo/fizzbuzz.bar .bar
GNU basename supports another syntax:
basename -s .bar /foo/fizzbuzz.bar
with the same result. The difference and advantage is that -s implies -a, which supports multiple arguments:
$ basename -s .bar /foo/fizzbuzz.bar /baz/foobar.bar
fizzbuzz
foobar
This can even be made filename-safe by separating the output with NUL bytes using the -z option, for example for these files containing blanks, newlines and glob characters (quoted by ls):
$ ls has*
'has'$'\n''newline.bar' 'has space.bar' 'has*.bar'
Reading into an array:
$ readarray -d $'\0' arr < <(basename -zs .bar has*)
$ declare -p arr
declare -a arr=([0]=$'has\nnewline' [1]="has space" [2]="has*")
readarray -d requires Bash 4.4 or newer. For older versions, we have to loop:
while IFS= read -r -d '' fname; do arr+=("$fname"); done < <(basename -zs .bar has*)
perl -pe 's/\..*$//;s{^.*/}{}'
If you can't use basename as suggested in other posts, you can always use sed. Here is an (ugly) example. It isn't the greatest, but it works by extracting the wanted string and replacing the input with the wanted string.
echo '/foo/fizzbuzz.bar' | sed 's|.*\/\([^\.]*\)\(\..*\)$|\1|g'
Which will get you the output
fizzbuzz
Beware of the suggested perl solution: it removes anything after the first dot.
$ echo some.file.with.dots | perl -pe 's/\..*$//;s{^.*/}{}'
some
If you want to do it with perl, this works:
$ echo some.file.with.dots | perl -pe 's/(.*)\..*$/$1/;s{^.*/}{}'
some.file.with
But if you are using Bash, the solutions with y=${x%.*} (or basename "$x" .ext if you know the extension) are much simpler.
The basename does that, removes the path. It will also remove the suffix if given and if it matches the suffix of the file but you would need to know the suffix to give to the command. Otherwise you can use mv and figure out what the new name should be some other way.
Combining the top-rated answer with the second-top-rated answer to get the filename without the full path:
$ x="/foo/fizzbuzz.bar.quux"
$ y=(`basename ${x%%.*}`)
$ echo $y
fizzbuzz
If you want to keep just the filename with extension and strip the file path
$ x="myfile/hello/foo/fizzbuzz.bar"
$ echo ${x##*/}
fizzbuzz.bar
Explanation in Bash manual, see ${parameter##word}
You can use
mv *<PATTERN>.jar "$(basename *<PATTERN>.jar <PATTERN>.jar).jar"
For e.g:- I wanted to remove -SNAPSHOT from my file name. For that used below command
mv *-SNAPSHOT.jar "$(basename *-SNAPSHOT.jar -SNAPSHOT.jar).jar"

Linux sed command to return truncated output [duplicate]

I'm trying to parse JSON returned from a curl request, like so:
curl 'http://twitter.com/users/username.json' |
sed -e 's/[{}]/''/g' |
awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'
The above splits the JSON into fields, for example:
% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...
How do I print a specific field (denoted by the -v k=text)?
There are a number of tools specifically designed for manipulating JSON from the command line, and they will be a lot easier and more reliable than doing it with Awk. One such tool is jq:
curl -s 'https://api.github.com/users/lambda' | jq -r '.name'
You can also do this with tools that are likely already installed on your system, like Python using the json module, and so avoid any extra dependencies, while still having the benefit of a proper JSON parser. The following assume you want to use UTF-8, which the original JSON should be encoded in and is what most modern terminals use as well:
Python 3:
curl -s 'https://api.github.com/users/lambda' | \
python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"
Python 2:
export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
python2 -c "import sys, json; print json.load(sys.stdin)['name']"
Frequently Asked Questions
Why not a pure shell solution?
The standard POSIX/Single Unix Specification shell is a very limited language which doesn't contain facilities for representing sequences (list or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.
Bash 4 and later, zsh, and ksh have support for arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3, due to a change from GPLv2 to GPLv3, while many Linux systems don't have zsh installed out of the box). It's possible that you could write a script that would work in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be tough to write a shebang line that worked for such a polyglot script.
Finally, writing a full fledged JSON parser in shell would be a significant enough dependency that you might as well just use an existing dependency like jq or Python instead. It's not going to be a one-liner, or even small five-line snippet, to do a good implementation.
Why not use awk, sed, or grep?
It is possible to use these tools to do some quick extraction from JSON with a known shape and formatted in a known way, such as one key per line. There are several examples of suggestions for this in other answers.
However, these tools are designed for line based or record based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.
So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and break if some aspect of the input format changes, such as collapsing whitespace, or adding additional levels of nesting to the JSON objects, or an escaped quote within a string. A solution that is robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not too much different than adding another dependency on jq or Python.
I have had to deal with large amounts of customer data being deleted due to poor input parsing in a shell script before, so I never recommend quick and dirty methods that may be fragile in this way. If you're doing some one-off processing, see the other answers for suggestions, but I still highly recommend just using an existing tested JSON parser.
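The fragility described above is easy to reproduce. In this sketch, naive_get is a hypothetical helper built on grep and cut, and the JSON strings are made-up samples; it only works when there is no whitespace after the colon and no escaped quote in the value.

```shell
#!/usr/bin/env bash

# naive_get KEY JSON: print the raw (still quoted) string value of KEY.
# Deliberately simplistic: assumes "key":"value" with no spaces or escapes.
naive_get() {
  printf '%s\n' "$2" | grep -o "\"$1\":\"[^\"]*\"" | cut -d: -f2-
}

naive_get text '{"text":"My status"}'       # works: prints "My status"
naive_get text '{"text": "My status"}'      # prints nothing: the space defeats the pattern
naive_get text '{"text":"say \"hi\""}'      # cut off at the escaped quote: prints "say \"
```

All three inputs are valid JSON with a string under "text", yet only the first is extracted correctly.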
Historical notes
This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq, and depends on a standalone JavaScript interpreter being installed which is less common than a Python interpreter, so the above answers are probably preferable:
curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'
This answer also originally used the Twitter API from the question, but that API no longer works, making it hard to copy the examples to test out, and the new Twitter API requires API keys, so I've switched to using the GitHub API which can be used easily without API keys. The first answer for the original question would be:
curl 'http://twitter.com/users/username.json' | jq -r '.text'
To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:
grep -Po '"text":.*?[^\\]",' tweets.json
This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)
And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)
To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but
1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)
To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.
One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:
json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'
This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.
On the basis that some of the recommendations here (especially in the comments) suggested the use of Python, I was disappointed not to find an example.
So, here's a one-liner to get a single value from some JSON data. It assumes that you are piping the data in (from somewhere) and so should be useful in a scripting context.
echo '{"hostname":"test","domainname":"example.com"}' | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hostname"])'
Following martinr's and Boecko's lead:
curl -s 'http://twitter.com/users/username.json' | python -mjson.tool
That will give you an extremely grep-friendly output. Very convenient:
curl -s 'http://twitter.com/users/username.json' | python -mjson.tool | grep my_key
You could just download jq binary for your platform and run (chmod +x jq):
$ curl 'https://twitter.com/users/username.json' | ./jq -r '.name'
It extracts "name" attribute from the json object.
jq homepage says it is like sed for JSON data.
Using Node.js
If the system has Node.js installed, it's possible to use the -p print and -e evaluate script flags with JSON.parse to pull out any value that is needed.
A simple example using the JSON string { "foo": "bar" } and pulling out the value of "foo":
node -pe 'JSON.parse(process.argv[1]).foo' '{ "foo": "bar" }'
Output:
bar
Because we have access to cat and other utilities, we can use this for files:
node -pe 'JSON.parse(process.argv[1]).foo' "$(cat foobar.json)"
Output:
bar
Or any other format such as an URL that contains JSON:
node -pe 'JSON.parse(process.argv[1]).name' "$(curl -s https://api.github.com/users/trevorsenior)"
Output:
Trevor Senior
Use Python's JSON support instead of using AWK!
Something like this:
curl -s http://twitter.com/users/username.json | \
python -c "import json,sys;obj=json.load(sys.stdin);print(obj['name']);"
macOS v12.3 (Monterey) removed /usr/bin/python, so we must use /usr/bin/python3 for macOS v12.3 and later.
curl -s http://twitter.com/users/username.json | \
python3 -c "import json,sys;obj=json.load(sys.stdin);print(obj['name']);"
You've asked how to shoot yourself in the foot and I'm here to provide the ammo:
curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'
You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.
If you want to strip off the outer quotes, pipe the result of the above through sed 's/\(^"\|"$\)//g'
I think others have sounded sufficient alarm. I'll be standing by with a cell phone to call an ambulance. Fire when ready.
Using Bash with Python
Create a Bash function in your .bashrc file:
function getJsonVal () {
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}
Then
curl 'http://twitter.com/users/username.json' | getJsonVal "['text']"
Output:
My status
Here is the same function, but with error checking.
function getJsonVal() {
if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
cat <<EOF
Usage: getJsonVal 'key' < /tmp/
-- or --
cat /tmp/input | getJsonVal 'key'
EOF
return;
fi;
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}
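Here $# -ne 1 is true unless exactly one argument was given, and -t 0 is true when stdin is a terminal, i.e. when nothing is being piped or redirected in. In either case the usage message is printed.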
The nice thing about this implementation is that you can access nested JSON values and get JSON content in return! =)
Example:
echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' | getJsonVal "['foo']['a'][1]"
Output:
2
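The two guard conditions in the error-checking version can be tried in isolation, without Python. A minimal sketch, where need_key_and_pipe is a hypothetical name and the return code 2 is an arbitrary choice:

```shell
#!/usr/bin/env bash

# need_key_and_pipe ARGS...: succeed only with exactly one argument
# and with stdin connected to something other than a terminal.
need_key_and_pipe() {
  if [ "$#" -ne 1 ] || [ -t 0 ]; then
    echo "Usage: some_command | need_key_and_pipe 'key'" >&2
    return 2
  fi
  cat >/dev/null    # consume stdin; a real function would parse it here
  echo "key=$1"
}

echo '{}' | need_key_and_pipe "['foo']"               # prints key=['foo']
echo '{}' | need_key_and_pipe || echo "status $?"     # usage on stderr, then: status 2
```

Using two [ ] tests joined by || avoids the -o operator and the escaped parentheses, which POSIX marks as obsolescent.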
If you want to be really fancy, you could pretty print the data:
function getJsonVal () {
python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1, sort_keys=True, indent=4))";
}
echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' | getJsonVal "['foo']"
{
"a": [
1,
2,
3
],
"bar": "baz"
}
Update (2020)
My biggest issue with external tools (e.g., Python) was that you have to deal with package managers and dependencies to install them.
However, now that we have jq as a standalone, static tool that's easy to install cross-platform via GitHub Releases and Webi (webinstall.dev/jq), I'd recommend that:
Mac, Linux:
curl -sS https://webi.sh/jq | bash
Windows 10:
curl.exe -A MS https://webi.ms/jq | powershell
Cheat Sheet: https://webinstall.dev/jq
Original (2011)
TickTick is a JSON parser written in bash (less than 250 lines of code).
Here's the author's snippet from his article, Imagine a world where Bash supports JSON:
#!/bin/bash
. ticktick.sh
``
people = {
"Writers": [
"Rod Serling",
"Charles Beaumont",
"Richard Matheson"
],
"Cast": {
"Rod Serling": { "Episodes": 156 },
"Martin Landau": { "Episodes": 2 },
"William Shatner": { "Episodes": 2 }
}
}
``
function printDirectors() {
echo " The ``people.Directors.length()`` Directors are:"
for director in ``people.Directors.items()``; do
printf " - %s\n" ${!director}
done
}
`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors
newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors
echo "Shifted: "``people.Directors.shift()``
printDirectors
echo "Popped: "``people.Directors.pop()``
printDirectors
This is using standard Unix tools available on most distributions. It also works well with backslashes (\) and quotes (").
Warning: This doesn't come close to the power of jq and will only work with very simple JSON objects. It's an attempt to answer to the original question and in situations where you can't install additional tools.
function parse_json()
{
echo $1 | \
sed -e 's/[{}]/''/g' | \
sed -e 's/", "/'\",\"'/g' | \
sed -e 's/" ,"/'\",\"'/g' | \
sed -e 's/" , "/'\",\"'/g' | \
sed -e 's/","/'\"---SEPERATOR---\"'/g' | \
awk -F=':' -v RS='---SEPERATOR---' "\$1~/\"$2\"/ {print}" | \
sed -e "s/\"$2\"://" | \
tr -d "\n\t" | \
sed -e 's/\\"/"/g' | \
sed -e 's/\\\\/\\/g' | \
sed -e 's/^[ \t]*//g' | \
sed -e 's/^"//' -e 's/"$//'
}
parse_json '{"username":"john, doe","email":"john@doe.com"}' username
parse_json '{"username":"john doe","email":"john@doe.com"}' email
--- outputs ---
john, doe
john@doe.com
Parsing JSON with PHP CLI
It is arguably off-topic, but since precedent reigns, this question remains incomplete without a mention of our trusty and faithful PHP, am I right?
It is using the same example JSON, but let’s assign it to a variable to reduce obscurity.
export JSON='{"hostname":"test","domainname":"example.com"}'
Now for PHP goodness, it is using file_get_contents and the php://stdin stream wrapper.
echo $JSON | php -r 'echo json_decode(file_get_contents("php://stdin"))->hostname;'
Or as pointed out using fgets and the already opened stream at CLI constant STDIN.
echo $JSON | php -r 'echo json_decode(fgets(STDIN))->hostname;'
If someone just wants to extract values from simple JSON objects without the need for nested structures, it is possible to use regular expressions without even leaving Bash.
Here is a function I defined using bash regular expressions based on the JSON standard:
function json_extract() {
local key=$1
local json=$2
local string_regex='"([^"\]|\\.)*"'
local number_regex='-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?'
local value_regex="${string_regex}|${number_regex}|true|false|null"
local pair_regex="\"${key}\"[[:space:]]*:[[:space:]]*(${value_regex})"
if [[ ${json} =~ ${pair_regex} ]]; then
echo $(sed 's/^"\|"$//g' <<< "${BASH_REMATCH[1]}")
else
return 1
fi
}
Caveats: objects and arrays are not supported as values, but all other value types defined in the standard are supported. Also, a pair will be matched no matter how deep in the JSON document it is as long as it has exactly the same key name.
Using the OP's example:
$ json_extract text "$(curl 'http://twitter.com/users/username.json')"
My status
$ json_extract friends_count "$(curl 'http://twitter.com/users/username.json')"
245
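The caveat about depth can be seen offline with a literal document. The sketch below is a condensed variant of the function above (it strips the surrounding quotes with parameter expansion instead of sed); the sample JSON is made up for illustration.

```shell
#!/usr/bin/env bash

json_extract() {
  local key=$1 json=$2
  local string_regex='"([^"\]|\\.)*"'
  local number_regex='-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?'
  local value_regex="${string_regex}|${number_regex}|true|false|null"
  local pair_regex="\"${key}\"[[:space:]]*:[[:space:]]*(${value_regex})"
  [[ $json =~ $pair_regex ]] || return 1
  local value=${BASH_REMATCH[1]}
  value=${value#\"}     # strip one leading quote, if any
  value=${value%\"}     # strip one trailing quote, if any
  printf '%s\n' "$value"
}

sample='{"status": {"text": "My status"}, "friends_count": 245}'
json_extract text "$sample"                          # nested key is still found
json_extract friends_count "$sample"
json_extract missing "$sample" || echo "not found"
```

Here "text" sits one level down inside "status", yet it is matched exactly like the top-level "friends_count".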
Unfortunately the top voted answer that uses grep returns the full match that didn't work in my scenario, but if you know the JSON format will remain constant you can use lookbehind and lookahead to extract just the desired values.
# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="FooBar":")(.*?)(?=",)'
he\"llo
# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="TotalPages":)(.*?)(?=,)'
33
# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="anotherValue":)(.*?)(?=})'
100
Version which uses Ruby and http://flori.github.com/json/
< file.json ruby -e "require 'rubygems'; require 'json'; puts JSON.pretty_generate(JSON[STDIN.read]);"
Or more concisely:
< file.json ruby -r rubygems -r json -e "puts JSON.pretty_generate(JSON[STDIN.read]);"
This is yet another Bash and Python hybrid answer. I posted it because I wanted to process more complex JSON output while reducing the complexity of my Bash application. I want to crack open the following JSON object from http://www.arcgis.com/sharing/rest/info?f=json in Bash:
{
"owningSystemUrl": "http://www.arcgis.com",
"authInfo": {
"tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
"isTokenBasedSecurity": true
}
}
In the following example, I created my own implementation of jq and unquote leveraging Python. You'll note that once we import the Python object from json to a Python dictionary we can use Python syntax to navigate the dictionary. To navigate the above, the syntax is:
data
data[ "authInfo" ]
data[ "authInfo" ][ "tokenServicesUrl" ]
By using magic in Bash, we omit data and only supply the Python text to the right of data, i.e.
jq
jq '[ "authInfo" ]'
jq '[ "authInfo" ][ "tokenServicesUrl" ]'
Note, with no parameters, jq acts as a JSON prettifier. With parameters, we can use Python syntax to extract anything we want from the dictionary including navigating subdictionaries and array elements.
Here are the Bash Python hybrid functions:
#!/bin/bash -xe
jq_py() {
cat <<EOF
import json, sys
data = json.load( sys.stdin )
print( json.dumps( data$1, indent = 4 ) )
EOF
}
jq() {
python -c "$( jq_py "$1" )"
}
unquote_py() {
cat <<EOF
import json,sys
print( json.load( sys.stdin ) )
EOF
}
unquote() {
python -c "$( unquote_py )"
}
Here's a sample usage of the Bash Python functions:
curl http://www.arcgis.com/sharing/rest/info?f=json | tee arcgis.json
# {"owningSystemUrl":"https://www.arcgis.com","authInfo":{"tokenServicesUrl":"https://www.arcgis.com/sharing/rest/generateToken","isTokenBasedSecurity":true}}
cat arcgis.json | jq
# {
# "owningSystemUrl": "https://www.arcgis.com",
# "authInfo": {
# "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
# "isTokenBasedSecurity": true
# }
# }
cat arcgis.json | jq '[ "authInfo" ]'
# {
# "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
# "isTokenBasedSecurity": true
# }
cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]'
# "https://www.arcgis.com/sharing/rest/generateToken"
cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]' | unquote
# https://www.arcgis.com/sharing/rest/generateToken
There is an easier way to get a property from a JSON string. Using a package.json file as an example, try this:
#!/usr/bin/env bash
my_val="$(json=$(<package.json) node -pe "JSON.parse(process.env.json)['version']")"
We're using process.env, because this gets the file's contents into Node.js as a string without any risk of malicious contents escaping their quoting and being parsed as code.
Now that PowerShell is cross platform, I thought I'd throw its way out there, since I find it to be fairly intuitive and extremely simple.
curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json
ConvertFrom-Json converts the JSON into a PowerShell custom object, so you can easily work with the properties from that point forward. If you only wanted the 'id' property for example, you'd just do this:
curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json | select -ExpandProperty id
If you wanted to invoke the whole thing from within Bash, then you'd have to call it like this:
powershell 'curl -s "https://api.github.com/users/lambda" | ConvertFrom-Json'
Of course, there's a pure PowerShell way to do it without curl, which would be:
Invoke-WebRequest 'https://api.github.com/users/lambda' | select -ExpandProperty Content | ConvertFrom-Json
Finally, there's also ConvertTo-Json which converts a custom object to JSON just as easily. Here's an example:
(New-Object PsObject -Property @{ Name = "Tester"; SomeList = @('one','two','three')}) | ConvertTo-Json
Which would produce nice JSON like this:
{
"Name": "Tester",
"SomeList": [
"one",
"two",
"three"
]
}
Admittedly, using a Windows shell on Unix is somewhat sacrilegious, but PowerShell is really good at some things, and parsing JSON and XML are a couple of them. This is the GitHub page for the cross platform version: PowerShell
I can not use any of the answers here. Neither jq, shell arrays, declare, grep -P, lookbehind, lookahead, Python, Perl, Ruby, or even Bash, is available.
The remaining answers simply do not work well. JavaScript sounded familiar, but the tin says Nescaffe - so it is a no go, too :) Even if available, for my simple needs - they would be overkill and slow.
Yet, it is extremely important for me to get many variables from the JSON formatted reply of my modem. I am doing it in Bourne shell (sh) with a very trimmed down BusyBox at my routers! There aren't any problems using AWK alone: just set delimiters and read the data. For a single variable, that is all!
awk 'BEGIN { FS="\""; RS="," }; { if ($2 == "login") {print $4} }' test.json
Remember I don't have any arrays? I had to assign within the AWK parsed data to the 11 variables which I need in a shell script. Wherever I looked, that was said to be an impossible mission. No problem with that, either.
My solution is simple. This code will:
parse the .json file from the question (actually, I have borrowed a working data sample from the most upvoted answer) and pick out the quoted data, plus
create shell variables from within awk, assigning freely chosen shell variable names.
eval $( curl -s 'https://api.github.com/users/lambda' |
awk ' BEGIN { FS="\""; RS="," };
{
if ($2 == "login") { print "Login=\""$4"\"" }
if ($2 == "name") { print "Name=\""$4"\"" }
if ($2 == "updated_at") { print "Updated=\""$4"\"" }
}' )
echo "$Login, $Name, $Updated"
There aren't any problems with blanks within. In my use, the same command parses a long single line output. As eval is used, this solution is suited for trusted data only.
It is simple to adapt it to pickup unquoted data. For a huge number of variables, a marginal speed gain can be achieved using else if. Lack of arrays obviously means: no multiple records without extra fiddling. But where arrays are available, adapting this solution is a simple task.
@maikel's sed answer almost works (but I can not comment on it). For my nicely formatted data - it works. Not so much with the example used here (missing quotes throw it off). It is complicated and difficult to modify. Plus, I do not like having to make 11 calls to extract 11 variables. Why? I timed 100 loops extracting 9 variables: the sed function took 48.99 seconds and my solution took 0.91 second! Not fair? Doing just a single extraction of 9 variables: 0.51 vs. 0.02 second.
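Where Bash is available, the else if adaptation mentioned above can be combined with read, so the variables are assigned without eval. This is a sketch under assumptions: pick_fields is a hypothetical name, the sample document is made up, and the here-string and $'\t' are Bash features that a trimmed-down BusyBox sh may lack. As with the original, values containing commas or escaped quotes are not handled, and an empty first field would shift the columns.

```shell
#!/usr/bin/env bash

# pick_fields: read JSON on stdin, print "login<TAB>name" on one line.
# Same FS/RS trick as above; "else if" stops testing once a key has matched.
pick_fields() {
  awk 'BEGIN { FS = "\""; RS = "," }
       {
         if ($2 == "login")     { login = $4 }
         else if ($2 == "name") { name = $4 }
       }
       END { printf "%s\t%s\n", login, name }'
}

# Made-up sample in the shape of the GitHub user document.
sample='{"login":"example-user","id":1,"name":"Example User"}'

# read assigns the shell variables directly, so no eval is needed.
IFS=$'\t' read -r Login Name <<< "$(printf '%s' "$sample" | pick_fields)"
echo "$Login, $Name"
```

Because nothing from the input is ever evaluated as shell code, this form is also usable with data that is not fully trusted.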
Someone who also has XML files, might want to look at my Xidel. It is a command-line interface, dependency-free JSONiq processor. (I.e., it also supports XQuery for XML or JSON processing.)
The example in the question would be:
xidel -e 'json("http://twitter.com/users/username.json")("name")'
Or with my own, nonstandard extension syntax:
xidel -e 'json("http://twitter.com/users/username.json").name'
You can try something like this -
curl -s 'http://twitter.com/users/jaypalsingh.json' |
awk -F=":" -v RS="," '$1~/"text"/ {print}'
One interesting tool that hasn't been covered in the existing answers is gron, written in Go, whose tagline is Make JSON greppable!, which is exactly what it does.
Essentially, gron breaks your JSON down into discrete assignments, so you can see the absolute 'path' to each value. Its primary advantage over tools like jq is that you can search for a value without knowing how deeply the record is nested, and without breaking the original JSON structure.
E.g., to search for the 'twitter_username' field from the following link, I just do
% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username'
json.twitter_username = "unlambda";
% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username' | gron -u
{
"twitter_username": "unlambda"
}
As simple as that. Note how the gron -u (short for ungron) reconstructs the JSON back from the search path. The need for fgrep is just to filter your search to the paths needed and not let the search expression be evaluated as a regex, but as a fixed string (which is essentially grep -F)
Another example: search for a string to see where in the nested structure the record sits
% echo '{"foo":{"bar":{"zoo":{"moo":"fine"}}}}' | gron | fgrep "fine"
json.foo.bar.zoo.moo = "fine";
It also supports streaming JSON with its -s command line flag, where you can continuously gron the input stream for a matching record. Also gron has zero runtime dependencies. You can download a binary for Linux, Mac, Windows or FreeBSD and run it.
More usage examples and tips can be found at the official GitHub page - Advanced Usage
As for why one would use gron over other JSON parsing tools, see the author's note on the project page.
Why shouldn't I just use jq?
jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed.
You can use jshon:
curl 'http://twitter.com/users/username.json' | jshon -e text
Here's one way you can do it with AWK:
curl -sL 'http://twitter.com/users/username.json' | awk -F"," -v k="text" '{
gsub(/{|}/,"")
for(i=1;i<=NF;i++){
if ( $i ~ k ){
print $i
}
}
}'
Here is a good reference. In this case:
curl 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) { where = match(a[i], /\"text\"/); if(where) {print a[i]} } }'
Parsing JSON is painful in a shell script. With a more appropriate language, create a tool that extracts JSON attributes in a way consistent with shell scripting conventions. You can use your new tool to solve the immediate shell scripting problem and then add it to your kit for future situations.
For example, consider a tool jsonlookup such that if I say jsonlookup access token id it will return the attribute id defined within the attribute token defined within the attribute access from standard input, which is presumably JSON data. If the attribute doesn't exist, the tool returns nothing (exit status 1). If the parsing fails, exit status 2 and a message to standard error. If the lookup succeeds, the tool prints the attribute's value.
Having created a Unix tool for the precise purpose of extracting JSON values you can easily use it in shell scripts:
access_token=$(curl <some horrible crap> | jsonlookup access token id)
Any language will do for the implementation of jsonlookup. Here is a fairly concise Python version:
#!/usr/bin/python
import sys
import json

# Parse the JSON document from standard input.
try:
    rep = json.loads(sys.stdin.read())
except ValueError:
    sys.stderr.write(sys.argv[0] + ": unable to parse JSON from stdin\n")
    sys.exit(2)

# Walk down the attribute path given on the command line.
for key in sys.argv[1:]:
    if key not in rep:
        sys.exit(1)
    rep = rep[key]
print(rep)
A two-liner which uses Python. It works particularly well if you're writing a single .sh file and you don't want to depend on another .py file. It also leverages the usage of pipe |. echo "{\"field\": \"value\"}" can be replaced by anything printing a JSON file to standard output.
echo "{\"field\": \"value\"}" | python -c 'import sys, json
print(json.load(sys.stdin)["field"])'
If you have the PHP interpreter installed:
php -r 'var_export(json_decode(`curl http://twitter.com/users/username.json`, 1));'
For example:
We have a resource that provides JSON content with countries' ISO codes: http://country.io/iso3.json and we can easily see it in a shell with curl:
curl http://country.io/iso3.json
But the raw output is neither convenient nor readable. It is better to parse the JSON content and see a readable structure:
php -r 'var_export(json_decode(`curl http://country.io/iso3.json`, 1));'
This code will print something like:
array (
'BD' => 'BGD',
'BE' => 'BEL',
'BF' => 'BFA',
'BG' => 'BGR',
'BA' => 'BIH',
'BB' => 'BRB',
'WF' => 'WLF',
'BL' => 'BLM',
...
If you have nested arrays, this output will look much better...
There is also a very simple, but powerful, JSON CLI processing tool, fx.
Examples
Use an anonymous function:
echo '{"key": "value"}' | fx "x => x.key"
Output:
value
If you don't pass an anonymous function (parameter → ...), the code will be automatically transformed into an anonymous function, and you can access the JSON through the this keyword:
$ echo '[1,2,3]' | fx "this.map(x => x * 2)"
[2, 4, 6]
Or just use dot syntax too:
echo '{"items": {"one": 1}}' | fx .items.one
Output:
1
You can pass any number of anonymous functions for reducing JSON:
echo '{"items": ["one", "two"]}' | fx "this.items" "this[1]"
Output:
two
You can update existing JSON using spread operator:
echo '{"count": 0}' | fx "{...this, count: 1}"
Output:
{"count": 1}
Just plain JavaScript. There isn't any need to learn new syntax.
Later versions of fx have an interactive mode!
I needed something in Bash that was short and would run without dependencies beyond vanilla Linux LSB and Mac OS for both Python 2.7 & 3 and handle errors, e.g. would report JSON parse errors and missing property errors without spewing Python exceptions:
json-extract () {
  if [[ "$1" == "" || "$1" == "-h" || "$1" == "-?" || "$1" == "--help" ]] ; then
    echo 'Extract top level property value from json document'
    echo ' Usage: json-extract <property> [ <file-path> ]'
    echo ' Example 1: json-extract status /tmp/response.json'
    echo ' Example 2: echo $JSON_STRING | json-extract status'
    echo ' Status codes: 0 - success, 1 - json parse error, 2 - property missing'
  else
    python -c $'import sys, json;\ntry: obj = json.load(open(sys.argv[2])); \nexcept: sys.exit(1)\ntry: print(obj[sys.argv[1]])\nexcept: sys.exit(2)' "$1" "${2:-/dev/stdin}"
  fi
}

How to get list of commands used in a shell script?

I have a shell script of more than 1000 lines, and I would like to check whether all the commands used in the script are installed on my Linux operating system.
Is there any tool to get the list of Linux commands used in a shell script?
Or how can I write a small script which can do this for me?
The script runs successfully on an Ubuntu machine, where it is invoked as part of a C++ application. We need to run the same script on a device that runs a Linux with limited capabilities. I have manually identified a few commands which the script runs and which are not present on the device OS. Before we try installing these commands, I would like to check all the other commands and install everything at once.
Thanks in advance
I already tried this in the past and came to the conclusion that it is very difficult to provide a solution which would work for all scripts. The reason is that each script with complex commands takes a different approach to using the shell's features.
In case of a simple linear script, it might be as easy as using debug mode.
For example: bash -x script.sh 2>&1 | grep ^+ | awk '{print $2}' | sort -u
In case the script has some decisions, you might use the same approach and assume that for the "else" cases the commands would still be the same, just with different arguments, or would be something trivial (echo + exit).
In case of a complex script, I attempted to write a script that would just look for commands in the same places I would look myself. The challenge is to create expressions that identify all the possibilities in use. I would say this is doable for about 80-90% of the script, and the output should only be used as a reference since it will contain invalid data (~20%).
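To see what the debug-mode pipeline above produces, here is a small sketch that traces a throwaway script. The demo script and its contents are made up for illustration, and the default trace prefix (PS4 set to "+ ") is assumed.

```shell
#!/usr/bin/env bash
# Write a tiny demo script (hypothetical content) to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
echo "hello"
true
printf '%s\n' bye
EOF

# Trace it, keep only the trace lines (they start with "+"),
# and take the first word after the marker: the command name.
traced=$(bash -x "$demo" 2>&1 | grep '^+' | awk '{print $2}' | sort -u)
rm -f "$demo"

printf '%s\n' "$traced"
```

Only commands on the branches that actually run are traced, which is why this fits linear scripts best.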
Here is an example script that would parse itself using a very simple approach (separate commands on different lines, 1st word will be the command):
# 1. Eliminate all quoted text
# 2. Eliminate all comments
# 3. Replace all delimiters between commands with new lines ( ; | && || )
# 4. extract the command from 1st column and print it once
cat $0 \
| sed -e 's/\"/./g' -e "s/'[^']*'//g" -e 's/"[^"]*"//g' \
| sed -e "s/^[[:space:]]*#.*$//" -e "s/\([^\\]\)#[^\"']*$/\1/" \
| sed -e "s/&&/;/g" -e "s/||/;/g" | tr ";|" "\n\n" \
| awk '{print $1}' | sort -u
the output is:
.
/
/g.
awk
cat
sed
sort
tr
There are many more cases to consider (command substitutions, aliases etc.); steps 1, 2 and 3 are just the beginning, but they would still cover 80% of most complex scripts.
The regular expressions used would need to be adjusted or extended to increase precision and handle special cases.
In conclusion, if you really need something like this, then you can write a script as above, but don't trust the output until you verify it yourself.
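Once a candidate list of names has been extracted and reviewed, checking it against the target system could look roughly like this. The report_missing helper and the sample names are hypothetical; command -v is the standard way to ask the shell whether it can resolve a name.

```shell
#!/usr/bin/env bash
# Print every name from the arguments that the shell cannot resolve
# as a command, builtin, function or alias.
report_missing() {
  local name
  for name in "$@"; do
    command -v "$name" >/dev/null 2>&1 || printf '%s\n' "$name"
  done
}

# Example run with two names that should exist and one that should not.
report_missing sh ls no-such-command-xyz
```

Running the same check on the limited device gives the list of packages that still need installing.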
Add export PATH='' to the second line of your script.
Execute your_script.sh 2>&1 > /dev/null | grep 'No such file or directory' | awk '{print $4;}' | grep -v '/' | sort | uniq | sed 's/.$//'.
If you have a fedora/redhat based system, bash has been patched with the --rpm-requires flag
--rpm-requires: Produce the list of files that are required for the shell script to run. This implies -n and is subject to the same limitations as compile time error checking; Command substitutions, Conditional expressions and eval builtin are not parsed so some dependencies may be missed.
So when you run the following:
$ bash --rpm-requires script.sh
executable(command1)
function(function1)
function(function2)
executable(command2)
function(function3)
There are some limitations here:
command and process substitutions and conditional expressions are not picked up. So the following are ignored:
$(command)
<(command)
>(command)
command1 && command2 || command3
commands as strings are not picked up. So the following line will be ignored
"/path/to/my/command"
commands that contain shell variables are not listed. This generally makes sense since some might be the result of some script logic, but even the following is ignored
$HOME/bin/command
This point can however be bypassed by using envsubst and running it as
$ bash --rpm-requires <(<script envsubst)
However, if you use shellcheck, you most likely quoted this and it will still be ignored due to point 2
So if you want to check whether all the commands your script needs are present, you can do something like:
while IFS='' read -r app; do
[ "${app%%(*}" == "executable" ] || continue
app="${app#*(}"; app="${app%)}";
if [ "$(type -t "${app}")" != "builtin" ] && \
! [ -x "$(command -v "${app}")" ]
then
echo "${app}: missing application"
fi
done < <(bash --rpm-requires <(<"$0" envsubst) )
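The parameter expansions in the loop above can be tried without a patched bash. This sketch feeds a made-up sample of --rpm-requires output through the same filtering; the sample lines and the names array are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample of --rpm-requires output, inlined for the demo.
sample='executable(awk)
function(helper)
executable(sed)'

names=()
while IFS= read -r line; do
  kind="${line%%(*}"      # text before the first "("
  [ "$kind" = "executable" ] || continue
  name="${line#*(}"       # drop everything up to the first "("
  name="${name%)}"        # drop the trailing ")"
  names+=("$name")
done <<< "$sample"

printf '%s\n' "${names[@]}"
```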
If your script contains files that are sourced that might contain various functions and other important definitions, you might want to do something like
bash --rpm-requires <(cat source1 source2 ... <(<script.sh envsubst))
Based on @czvtools' answer, I added some extra checks to filter out bad values:
#!/usr/bin/fish
if test "$argv[1]" = ""
echo "Give path to command to be tested"
exit 1
end
set commands (cat $argv \
| sed -e 's/\"/./g' -e "s/'[^']*'//g" -e 's/"[^"]*"//g' \
| sed -e "s/^[[:space:]]*#.*\$//" -e "s/\([^\\]\)#[^\"']*\$/\1/" \
| sed -e "s/&&/;/g" -e "s/||/;/g" | tr ";|" "\n\n" \
| awk '{print $1}' | sort -u)
for command in $commands
if command -q -- $command
set -a resolved (realpath (which $command))
end
end
set resolved (string join0 $resolved | sort -z -u | string split0)
for command in $resolved
echo $command
end

How to search with grep exactly string in a file via shell linux?

I have a file, the content of file has a string like this:
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
I want to check whether the file contains this string. But when I use grep to check, it always returns false. I have tried several ways:
grep "'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'" foo.txt
grep "'/ad/e','#'\.base64_decode\("ZXZhbA\=\="\)\.'\(\$zad\)', 'add'" foo.txt
str="'/ad/e','#'\.base64_decode\("ZXZhbA\=\="\)\.'\(\$zad\)', 'add'"
grep "$str" foo.txt
Can you help me? Maybe, another command line.
This is my case:
while read str; do
if [ ! -z "$str" ]; then
if grep -Fxq "$str" "$file_path"; then
do somthing
fi
fi
done < <(cat /usr/local/caotoc/db.dat)
Thank you so much!
First, you need to make sure the string is quoted properly. This is a bit of an art form, since your string contains both single and double quotes.
One thought would be to use read and a here-document to avoid having to escape anything.
Second, you need to use -F to perform exact string matching instead of more general regular-expression matching.
IFS= read -r str <<'EOF'
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
EOF
grep -F "$str" foo.txt
Based on the update, you can use a simple loop to read them one at a time.
while IFS= read -r str; do
grep -F "$str" foo.txt
done < /usr/local/caotoc/db.dat
You may be able to simply use the -f option to grep, which will cause grep to output lines from foo.txt that match any line from db.dat.
grep -f /usr/local/caotoc/db.dat -F foo.txt
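A self-contained way to try the here-document plus -F combination is sketched below. The temporary file stands in for foo.txt and its first line is made up; the search string is the one from the question.

```shell
#!/usr/bin/env bash
# Temporary stand-in for foo.txt containing the string from the question.
file=$(mktemp)
cat > "$file" <<'EOF'
some unrelated line
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
EOF

# Read the search string verbatim; the quoted 'EOF' disables all expansion.
IFS= read -r str <<'EOF'
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
EOF

# -F: fixed string, -q: quiet, exit status only.
if grep -Fq -- "$str" "$file"; then
  result=found
else
  result=missing
fi
rm -f "$file"

echo "$result"
```

Adding -x, as in the question's loop, would additionally require the string to match a whole line rather than part of one.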
Instead of trying to work around regexes, the simplest way is to turn off regular expressions with the -F (or --fixed-strings) option, which makes grep act like a simple string search
-F, --fixed-strings PATTERN is a set of newline-separated strings
like this:
grep -F "'/ad/e','#'.base64_decode(\"ZXZhbA==\").'(\$zad)', 'add'" test
Note: because of the shell, you still need to escape:
double quotes
dollar sign or else $zad is evaluated as an environment variable

Looping through the elements of a path variable in Bash

I want to loop through a path list that I have gotten from an echo $VARIABLE command.
For example:
echo $MANPATH will return
/usr/lib:/usr/sfw/lib:/usr/info
So that is three different paths, each separated by a colon. I want to loop through each of those paths. Is there a way to do that? Thanks.
Thanks for all the replies so far, it looks like I actually don't need a loop after all. I just need a way to take out the colon so I can run one ls command on those three paths.
You can set the Internal Field Separator:
( IFS=:
for p in $MANPATH; do
echo "$p"
done
)
I used a subshell so the change in IFS is not reflected in my current shell.
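One caveat with the unquoted expansion is that the resulting words are also subject to globbing. A hedged variant of the loop that turns globbing off inside the subshell might look like this; the sample value is made up to include a space and a glob character.

```shell
#!/usr/bin/env bash
sample='/usr/lib:/opt/my tools:/tmp/*'   # hypothetical value for the demo

# $( ... ) runs in a subshell, so neither IFS nor set -f leaks out.
out=$(
  IFS=:
  set -f                 # disable globbing so /tmp/* stays literal
  for p in $sample; do
    printf '<%s>\n' "$p"
  done
)

printf '%s\n' "$out"
```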
The canonical way to do this, in Bash, is to use the read builtin appropriately:
IFS=: read -r -d '' -a path_array < <(printf '%s:\0' "$MANPATH")
This is the only robust solution: it will do exactly what you want, splitting the string on the delimiter : while being safe with respect to spaces, newlines, and glob characters like *, [ ], etc. (unlike the other answers: they are all broken).
After this command, you'll have an array path_array, and you can loop on it:
for p in "${path_array[@]}"; do
printf '%s\n' "$p"
done
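To check the claim about spaces and glob characters, the same read call can be run on a made-up value. The sample string and the parts array name are assumptions for this sketch.

```shell
#!/usr/bin/env bash
sample='/usr/lib:/opt/my tools:/tmp/*'   # hypothetical value for the demo

# Same technique as above: append ":" and a NUL, then split on ":" only.
IFS=: read -r -d '' -a parts < <(printf '%s:\0' "$sample")

printf '%s\n' "${#parts[@]}"     # number of elements
printf '[%s]\n' "${parts[@]}"    # one element per line, unexpanded
```

The element containing a space stays in one piece and /tmp/* is kept literally, because read splits only on IFS and never globs.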
You can use Bash's pattern substitution parameter expansion to populate your loop variable. For example:
MANPATH=/usr/lib:/usr/sfw/lib:/usr/info
# Replace colons with spaces to create list.
for path in ${MANPATH//:/ }; do
echo "$path"
done
Note: Don't enclose the substitution expansion in quotes. You want the expanded values from MANPATH to be interpreted by the for-loop as separate words, rather than as a single string.
In this way you can safely go through the $PATH with a single loop, while $IFS will remain the same inside or outside the loop.
while IFS=: read -d: -r path; do # `$IFS` is only set for the `read` command
echo "$path"
done <<< "${PATH:+"${PATH}:"}" # append an extra ':' if `$PATH` is set
You can check the value of $IFS,
IFS='xxxxxxxx'
while IFS=: read -d: -r path; do
echo "${IFS}${path}"
done <<< "${PATH:+"${PATH}:"}"
and the output will be something like this.
xxxxxxxx/usr/local/bin
xxxxxxxx/usr/bin
xxxxxxxx/bin
Reference to another question on StackExchange.
for p in $(echo $MANPATH | tr ":" " ") ;do
echo $p
done
IFS=:
arr=(${MANPATH})
for path in "${arr[@]}" ; do # <- quotes required
echo $path
done
... it does take care of spaces :o) but also adds empty elements if you have something like:
:/usr/bin::/usr/lib:
... then index 0,2 will be empty (''), cannot say why index 4 isn't set at all
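The missing index 4 appears to follow from how the shell splits fields: a non-whitespace IFS character terminates a field rather than starting a new one, so a trailing colon does not produce an extra empty element. A small sketch to observe this, with count_fields as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Count the fields produced by unquoted splitting on ":".
# The function body is a subshell, so IFS and set -f stay local to it.
count_fields() (
  IFS=:
  set -f              # no globbing while splitting
  arr=($1)
  printf '%s\n' "${#arr[@]}"
)

count_fields ':/usr/bin::/usr/lib:'   # leading and doubled colons add empty fields
count_fields '/usr/bin:/usr/lib'
```

For the first input the elements are an empty string, /usr/bin, another empty string and /usr/lib, i.e. indices 0 to 3, which matches the observation above.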
This can also be solved with Python, on the command line:
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" echo {}
Or as an alias:
alias foreachpath="python -c \"import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]\""
With example usage:
foreachpath echo {}
The advantage of this approach is that {} will be replaced by each path in succession. This can be used to construct all sorts of commands, for instance to list the size of all files and directories in the directories in $PATH, including directories with spaces in the name:
foreachpath 'for e in "{}"/*; do du -h "$e"; done'
Here is an example that shortens the length of the $PATH variable by creating symlinks to every file and directory in the $PATH in $HOME/.allbin. This is not useful for everyday usage, but may be useful if you get the too many arguments error message in a docker container, because bitbake uses the full $PATH as part of the command line...
mkdir -p "$HOME/.allbin"
python -c "import os,sys;[os.system(' '.join(sys.argv[1:]).format(p)) for p in os.getenv('PATH').split(':')]" 'for e in "{}"/*; do ln -sf "$e" "$HOME/.allbin/$(basename $e)"; done'
export PATH="$HOME/.allbin"
This should also, in theory, speed up regular shell usage and shell scripts, since there are fewer paths to search for every command that is executed. It is pretty hacky, though, so I don't recommend that anyone shorten their $PATH this way.
The foreachpath alias might come in handy, though.
Combining ideas from:
https://stackoverflow.com/a/29949759 - gniourf_gniourf
https://stackoverflow.com/a/31017384 - Yi H.
code:
PATHVAR='foo:bar baz:spam:eggs:' # demo path with space and empty
printf '%s:\0' "$PATHVAR" | while IFS=: read -d: -r p; do
echo $p
done | cat -n
output:
1 foo
2 bar baz
3 spam
4 eggs
5
You can use Bash's for X in ${} notation to accomplish this:
for p in ${PATH//:/$'\n'} ; do
echo $p;
done
OP's update wants to ls the resulting folders, and has pointed out that ls only requires a space-separated list.
ls $(echo $PATH | tr ':' ' ') is nice and simple and should fit the bill nicely.

Resources