I want to convert a json file to csv using shell script without using jq. Is it possible?
Here is a json :
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522,
}
I don't want to use jq.
I want to store it in a csv file.
Bare-bones core-only perl one-liner version, to complement the python and ruby ones already given:
perl -MJSON::PP -0777 -nE '$,=","; say @$_{"id","type","name","ppu"} for @{decode_json $_}' input.json
A more robust one would use a more efficient non-core JSON parser and a CSV module to do things like properly quote fields when needed, but since your sample data doesn't include such fields I didn't bother. Can if requested.
And the unrequested jq version, because that really is the best approach whether you want it or not:
jq -r '.[] | [.id, .type, .name, .ppu] | @csv' input.json
Bash is not the right tool for this at all. But if for some reason you cannot install jq, you can simply use Python 3, which comes by default in most Linux distros and in macOS.
#!/usr/local/bin/python3
import json
objs=json.loads("""
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522
}
]
""")
# Print one comma-separated line per object
for item in objs:
    print("{},{},{},{}".format(item['id'], item['type'], item['name'], item['ppu']))
If you do not have Python 3 either, you can then do it in Ruby, which also comes by default in most distros and MacOS :
#!/usr/bin/ruby
require 'json'
content = '
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522
}
]
'
JSON.parse(content).each { |item| puts "#{item['id']},#{item['type']},#{item['name']},#{item['ppu']}" }
You can then redirect the output to a file:
./script.rb > output.csv
And that's it.
Nevertheless, if you can be completely sure of the format of your input, you can do some bash magic, especially using awk. But as others also said, please don't do that.
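For what it's worth, here is a minimal Python sketch of the "more robust" variant mentioned above: it uses the standard csv module so that fields containing commas or quotes get quoted properly. The function name json_to_csv, the header row, and the "Cake, large" value are my own additions for illustration, and it assumes the input is a valid JSON array of flat objects (the sample in the question would first need its trailing commas removed and to be wrapped in [ ]).

```python
import csv
import io
import json

# Column order taken from the sample data in the question.
FIELDS = ["id", "type", "name", "ppu"]


def json_to_csv(json_text, fields=FIELDS):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    out = io.StringIO()
    # extrasaction="ignore" skips keys that are not listed in `fields`.
    writer = csv.DictWriter(out, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


sample = """
[
  {"id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55},
  {"id": "0002", "type": "donut2", "name": "Cake, large", "ppu": 0.5522}
]
"""
print(json_to_csv(sample), end="")
```

In a real script you would read the text from input.json and write the result to output.csv instead of printing it.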
I am looking for a way to use grep on a Linux server to find duplicate JSON records. Is it possible to grep for duplicate ids in the example below?
So the grep would return: 01
{
"book": [
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "02",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "03",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "04",
"language": "C++",
"edition": "second",
"author": "E.Balagurusamy"
}
]
}
Use grep along with sort and uniq.
grep '"id":' filename | sort | uniq -d
The -d option only prints duplicates.
However, this depends on the JSON being laid out neatly. To handle more general formatting, I recommend you use the jq utility.
A jq-based approach:
jq -r '.book[].id' < in.json | sort | uniq -d
01
This should work even for minified JSON files with no newlines.
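If jq is not available, the same idea can be sketched in Python with only the standard library. This assumes the layout shown in the question (a top-level "book" array whose items each have an "id"); the function name duplicate_ids is made up for illustration.

```python
import json
from collections import Counter


def duplicate_ids(json_text):
    """Return the ids that occur more than once in the top-level "book" array."""
    data = json.loads(json_text)
    counts = Counter(book["id"] for book in data["book"])
    return sorted(book_id for book_id, n in counts.items() if n > 1)


sample = '{"book": [{"id": "01"}, {"id": "02"}, {"id": "03"}, {"id": "01"}, {"id": "04"}]}'
for book_id in duplicate_ids(sample):
    print(book_id)
```

Because it parses the document rather than matching lines, this also works for minified input.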
OK, discarding any whitespace from the JSON strings, I can offer this if awk is acceptable. Here hutch is a file containing the formatted chunk of JSON above.
I use tr to remove all whitespace and use , as the field separator in awk. I then iterate over the elements of the one long line with a for loop, pattern-match in awk to isolate the id fields, and increment an array entry for each matched id. At the end of processing I iterate over the array and print the ids that have more than one match.
Here your data:
$ cat hutch
{
"book": [
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "02",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "03",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "04",
"language": "C++",
"edition": "second",
"author": "E.Balagurusamy"
}
]
}
And here the finding of dupes:
$ tr -d '[:space:]' <hutch | awk -F, '{for(i=1;i<=NF;i++){if($i~/"id":/){a[gensub(/^.*"id":"([0-9]+)"$/, "\\1","1",$i)]++}}}END{for(i in a){if(a[i]>1){print i}}}'
01
Use a Perl one-liner to extract the numeric ids, then sort | uniq -d to print only the duplicates (as in the answer by Barmar):
This assumes that the id key/value pair is on the same line, but disregards whitespace (or lack of whitespace) anywhere on the line (leading, trailing, and in between):
perl -lne 'print for /"id":\s*"(\d+)"/' in.json | sort | uniq -d
This makes no assumptions (disregards whitespace and newlines). Note that it reads the entire json file into memory (using the -0777 command line switch):
perl -0777 -nE 'say for /"id":\s*"(\d+)"/g' in.json | sort | uniq -d
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-E : Tells Perl to look for code in-line, instead of in a file. Also enables all optional features. Here, enables say.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-0777 : Slurp files whole.
The regex uses this modifier:
/g : Multiple matches.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
So I am building this react app, where I have to create a json structure and then add this to a json file.
Here is the structure I have to build:
{
"name": {
"label": "Name",
"type": "text",
"operators": ["equal", "not_equal"],
"defaultOperator": "not_equal"
},
"age": {
"label": "Age",
"type": "number",
"operators": [
"equal",
"not_equal",
"less",
"less_or_equal",
"greater",
"greater_or_equal",
"between",
"not_between",
"is_empty",
"is_not_empty"
]
},
"gender": {
"label": "Gender",
"type": "select",
"listValues": {
"male": "Male",
"female": "Female"
}
}
}
After finishing with the json structure, I want to push this to a json file, which is the configuration file for a react library (react-awesome-query-builder). Now how can I write to a json file using JS?
I know that I can use Node.js and use fs for this, but I am not sure how to use this in react. Perhaps there is a library I can use to do this?
Can someone point me in the right direction?
You can write it exactly like a traditional text file (because it is basically text). Just don't forget to give your file the extension .json!
It is only when you open it afterwards that you will have to parse it as JSON, which is well handled by plenty of libraries.
Well... I guess I figured it out. Instead of using an external json file I tested the config file inside as a variable and it works. No need to use read/write from node.js or any of that.
I have a JSON file with 12,166,466 lines.
I want to remove the quotes from the values of these keys:
"timestamp": "1538564256", and "score": "10", so that they look like
"timestamp": 1538564256, and "score": 10,.
Input:
{
"title": "DNS domain", ,
"timestamp": "1538564256",
"domain": {
"dns": [
"www.google.com"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa"
"id": "c-1eOWYB9XD0VZRJuWL6"
}, {
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"google.de"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
}
Expected output:
{
"title": "DNS domain", ,
"timestamp": 1538564256,
"domain": {
"dns": [
"www.google.com"
]
},
"score": 10,
"link": "www.bit.ky/sdasd/asddsa"
"id": "c-1eOWYB9XD0VZRJuWL6"
}, {
"title": "DNS domain",
"timestamp": 1538564256,
"domain": {
"dns": [
"google.de"
]
},
"score": 10,
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
}
I have tried:
sed -E '
s/"timestamp": "/"timestamp": /g
s/"score": "/"score": /g
'
The first part is quite straightforward, but how do I remove the ", at the end of the lines that contain "timestamp" and "score"? How do I do that using sed, awk, or another tool, bearing in mind that I have 12 million lines to process?
Assuming that you fix your JSON input file like this:
<file jq .
[
{
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"www.google.com"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "c-1eOWYB9XD0VZRJuWL6"
},
{
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"google.de"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
]
You can use jq and its tonumber function to change the wanted strings to values:
<file jq '.[].timestamp |= tonumber | .[].score |= tonumber'
If the JSON structure matches roughly your example (e. g., there won't be any other whitespace characters between "timestamp", the colon, and the value), then this awk should be ok. If available, using jq for JSON transformation is the better choice by far!
awk '{print gensub(/("(timestamp|score)": )"([0-9]+)"/, "\\1\\3", "g")}' file
Be warned that tonumber can lose precision. If using tonumber is inadmissible, and if the output is produced by jq (or otherwise linearized vertically), then using awk as proposed elsewhere on this page is a good way to go. (If your awk does not have gensub, then the awk program can be easily adapted.) Here is the same thing using sed, assuming its flag for extended regex processing is -E:
sed -E -e 's/"(timestamp|score)": "([0-9]+)"/"\1": \2/'
For reference, if there's any doubt about where the relevant keys are located, here's a filter in jq that is agnostic about that:
walk(if type == "object"
     then if has("timestamp") then .timestamp|=tonumber else . end
          | if has("score") then .score|=tonumber else . end
     else . end)
If your jq does not have walk/1, then simply snarf its def from the web, e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq
If you wanted to convert all number-valued strings to numbers, you could write:
walk(if type=="object" then map_values(tonumber? // .) else . end)
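As an alternative that avoids the precision concern, here is a rough Python sketch of the same key-agnostic idea. Python integers are arbitrary precision, so long digit strings are converted exactly. The names NUMERIC_KEYS and unquote_numbers are hypothetical, and note that json.loads reads the whole document into memory, which may matter for a 12-million-line file.

```python
import json
import re

NUMERIC_KEYS = {"timestamp", "score"}  # the two keys named in the question
DIGITS = re.compile(r"[0-9]+")


def unquote_numbers(node):
    """Recursively turn digit-only strings stored under NUMERIC_KEYS into ints."""
    if isinstance(node, dict):
        return {
            key: (int(value)
                  if key in NUMERIC_KEYS
                  and isinstance(value, str)
                  and DIGITS.fullmatch(value)
                  else unquote_numbers(value))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [unquote_numbers(item) for item in node]
    return node  # strings, numbers, booleans and null are left untouched


sample = """
[{"title": "DNS domain", "timestamp": "1538564256",
  "domain": {"dns": ["google.de"]}, "score": "10",
  "id": "du1eOWYB9XD0VZRJuWL6"}]
"""
print(json.dumps(unquote_numbers(json.loads(sample)), indent=2))
```

Values that are not purely digits, and digit strings under other keys, are passed through unchanged.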
This might work for you (GNU sed):
sed ':a;/"timestamp":\s*"1538564256",/{s/"//3g;:b;n;/timestamp/ba;/"score":\s*"10"/s/"//3g;Tb}' file
On encountering a line that contains "timestamp": "1538564256", remove the 3rd or more "'s. Then read on until another line containing timestamp and repeat or a line containing "score": "10 and remove the 3rd or more "'s.
I tried using Parse JSON to array in a shell script
but was unable to get the required field. Below is my JSON:
{
"status": "UP",
"databaseHealthCheck":
{
"status": "UP",
"dataSource":
{
"maxActive": 100,
"maxIdle": 8,
"numActive": 0,
"url": "jdbc:oracle:thin:@hostname:port/db_name",
"userName": "test_123"
}
},
"JMSHealthCheck":
{
"status": "UP",
"producerTemplate":
{
"name": "Test_2",
"pendingCount": 0,
"operator": "<"
}
},
"diskSpace":
{
"status": "UP",
"total": 414302519296,
"free": 16099868672,
"threshold": 10485760
}
}
I want to extract pendingCount value under producerTemplate under JMSHealthCheck.
I am not allowed to use utilities like jq.
Bash Version 3.x
In the absence of jq, you may use this GNU grep command:
read -r s < <(grep -zoP '"JMSHealthCheck":\s*{[^{}]*?"producerTemplate":\s*{[^{}]*?"pendingCount":\h*\K\d+' file.json)
echo "$s"
0
However, please keep in mind that parsing JSON using regex is not recommended. If you have jq then it would be a very simple jq command like this:
jq '.JMSHealthCheck.producerTemplate.pendingCount' file.json
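If jq is off limits but Python 3 happens to be installed, a real JSON parser is still within reach. The following is only a sketch: the function name pending_count is made up, and it assumes the document has the nesting shown in the question.

```python
import json


def pending_count(json_text):
    """Return JMSHealthCheck -> producerTemplate -> pendingCount from a JSON document."""
    doc = json.loads(json_text)
    return doc["JMSHealthCheck"]["producerTemplate"]["pendingCount"]


sample = """
{
  "status": "UP",
  "JMSHealthCheck": {
    "status": "UP",
    "producerTemplate": {"name": "Test_2", "pendingCount": 0, "operator": "<"}
  }
}
"""
print(pending_count(sample))
```

From a shell script the same lookup can be done inline, for example: python3 -c 'import json,sys; print(json.load(sys.stdin)["JMSHealthCheck"]["producerTemplate"]["pendingCount"])' < file.json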
Hi, I am using the content below in a file, and I want the value of shortversion to be printed:
{
"app_versions": [
{
"version": "15",
"shortversion": "0.0.15",
"title": "java expert",
"timestamp": 1469530069,
"appsize": 3436229,
"notes": ,
"mandatory": false,
"external": false,
"device_family": null,
"id": 9,
"app_id": 356250,
"minimum_os_version": "4.1",
,
{
"version": "7",
"shortversion": "0.0.7",
"title": "java expert",
"timestamp": 1469528889,
"appsize": 3436225,
,
{
"version": "3",
"shortversion": "0.0.3",
"title": "javaExpert",
"timestamp": 1469209202,
"appsize": 3420965,
How can I print the value of the first occurrence of shortversion using sed? I have used the following awk command to get the shortversion:
awk -F'"' '/\"shortversion\"/{print $10;}' read.version
This command outputs 0.0.15, which is correct, but the file is generated dynamically. I would appreciate your help with this.
It is more modular to use a command-line JSON parser like jq to parse your JSON input. It would also make your script easier to maintain in case your JSON object tree changes in the future.
You can get shortversion for the first element of your app_versions array with the following (array indices in jq start at 0):
jq -r ". | .app_versions[0].shortversion" your_file.json
Maybe you can change the field separator to ':', e.g.
awk -F":" '/shortversion/{print $2}' datafile
and then use sed to remove the ',' and '"' characters.
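The text-matching idea can also be sketched in Python. This is a plain regex match, not a JSON parse, so it shares the fragility of the awk and sed approaches; it is shown only because the excerpt in the question is not valid JSON as posted. The function name first_shortversion is hypothetical.

```python
import re

# Matches "shortversion": "<value>" with optional whitespace around the colon.
SHORTVERSION = re.compile(r'"shortversion"\s*:\s*"([^"]*)"')


def first_shortversion(text):
    """Return the value of the first "shortversion" key in text, or None if absent."""
    match = SHORTVERSION.search(text)
    return match.group(1) if match else None


sample = '''
"version": "15",
"shortversion": "0.0.15",
"title": "java expert",
"version": "7",
"shortversion": "0.0.7",
'''
print(first_shortversion(sample))
```

Because search stops at the first match, it does not matter how many lines or fields the generated file contains.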