How can I remove these special characters from a JSON output file - linux

^[[0;32m ?~V? ^[[0m
The JSON file is written by a shell script, so the text processing produces these special characters. I tried dos2unix, and also changing the characters globally using %s.

Check this out. I introduced some control characters into a sample JSON file; they can be displayed using the "cat -v" command. The ^B, ^A, and ^D sequences are control characters.
Use perl to remove the control characters completely. You can redirect the output to a new file:
> cat -v json_control.txt
^B{"menu": {
"id": "file",
"value": "File",
"popup": ^B{
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}^D
^A
> perl -pe ' { s/[\x00-\x09\x0B-\x1F]//g } ' json_control.txt | cat -v
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
>
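Note that the characters in the original question are ANSI color escape sequences (^[ is the ESC byte): deleting control bytes alone removes ESC but leaves the [0;32m text behind. A sketch that strips the whole sequence first (GNU sed assumed; output.json and clean.json are hypothetical names):

```shell
# Strip ANSI color sequences like ESC[0;32m ... ESC[0m entirely,
# then drop any remaining control characters except newline.
sed 's/\x1b\[[0-9;]*m//g' output.json | tr -d '\000-\011\013-\037' > clean.json
```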

Related

BASH How to treat JSON inside a variable to remove only a specific part of the text

I have a big JSON inside a variable, and I need to remove everything from one specific comma (there are countless other commas before it) up to the penultimate curly bracket.
In short, only the BOLD text (the text between ** and the next **).
Edit:
Originally there are no ** markers in the JSON; I put them in just to show where the part I want to remove starts and ends.
##################################################
}
]
}**,
"meta": {
"timeout": 0,
"priority": "LOW_PRIORITY",
"validationType": "SAME_FINGERS",
"labelFilters": [],
"externalIDs": [
{
"name": "chaveProcesso",
"key": "01025.2021.0002170"
}
]
}**
}
It would help if you showed more context, but basically you want something like:
jq 'del(.meta)'
or:
jq 'with_entries(select(.key != "meta"))'
eg:
#!/bin/sh
json='{
"foo": 5,
"meta": {
"timeout": 0,
"priority": "LOW_PRIORITY",
"validationType": "SAME_FINGERS",
"labelFilters": [],
"externalIDs": [
{
"name": "chaveProcesso",
"key": "01025.2021.0002170"
}
]
}
}'
echo "$json" | jq 'del(.meta)'

using grep commands to find a duplicate id within a json file

I am looking for a way to use grep on a Linux server to find duplicate JSON records. Is it possible to have grep search for duplicate ids in the example below?
The grep would then return: 01
{
"book": [
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "02",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "03",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "04",
"language": "C++",
"edition": "second",
"author": "E.Balagurusamy"
}
]
}
Use grep along with sort and uniq.
grep '"id":' filename | sort | uniq -d
The -d option only prints duplicates.
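To print just the duplicated value rather than the whole line, the match can be narrowed with grep -o (a sketch; filename is the file from the question):

```shell
# -o prints only the matched part; uniq -d keeps matches that repeat.
grep -o '"id": "[0-9]*"' filename | sort | uniq -d | grep -o '[0-9]*'
```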
However, this depends on the JSON being laid out neatly. To handle more general formatting, I recommend you use the jq utility.
A jq-based approach:
jq -r '.book[].id' < in.json | sort | uniq -d
01
This should work even for minified JSON files with no newlines.
OK, discarding any whitespace from the JSON strings, I can offer this if awk is acceptable - hutch being the formatted chunk of JSON above, stored in a file.
I use tr to remove all whitespace and , as the field separator in awk; a for-loop iterates over the elements of the resulting single long line, pattern-matching in awk isolates the id fields, and an array entry is incremented for each matched id. At the end of processing I iterate over the array and print the ids that matched more than once.
Here your data:
$ cat hutch
{
"book": [
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "02",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "03",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "01",
"language": "Java",
"edition": "third",
"author": "Herbert Schildt"
},
{
"id": "04",
"language": "C++",
"edition": "second",
"author": "E.Balagurusamy"
}
]
}
And here the finding of dupes:
$ tr -d '[:space:]' <hutch | awk -F, '{for(i=1;i<=NF;i++){if($i~/"id":/){a[gensub(/^.*"id":"([0-9]+)"$/, "\\1","1",$i)]++}}}END{for(i in a){if(a[i]>1){print i}}}'
01
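For comparison, a shorter awk sketch of the same idea that avoids gawk's gensub, assuming each "id" pair sits on its own line as in the formatted file:

```shell
# Split on double quotes: on an  "id": "NN"  line, field 4 is the id value.
# seen[] counts occurrences; print the value the second time it appears.
awk -F'"' '/"id":/ { if (seen[$4]++ == 1) print $4 }' hutch
```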
Use a Perl one-liner to extract the numeric ids, then sort | uniq -d to print only the duplicates (as in the answer by Barmar):
This assumes that the id key/value pair is on the same line, but disregards whitespace (or lack of whitespace) anywhere on the line (leading, trailing, and in between):
perl -lne 'print for /"id":\s*"(\d+)"/' in.json | sort | uniq -d
This makes no assumptions (disregards whitespace and newlines). Note that it reads the entire json file into memory (using the -0777 command line switch):
perl -0777 -nE 'say for /"id":\s*"(\d+)"/g' in.json | sort | uniq -d
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-E : Tells Perl to look for code in-line, instead of in a file. Also enables all optional features. Here, enables say.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-0777 : Slurp files whole.
The regex uses this modifier:
/g : Multiple matches.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes): quantifiers; character classes and other special escapes; assertions; capture groups
perldoc perlrequick: Perl regular expressions quick start

Convert a json file to csv using shell script without using jq?

I want to convert a JSON file to CSV using a shell script, without using jq. Is it possible?
Here is the JSON:
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522,
}
I don't want to use jq.
I want to store it in a csv file.
Bare-bones core-only perl one-liner version, to complement the python and ruby ones already given:
perl -MJSON::PP -0777 -nE '$,=","; say @$_{"id","type","name","ppu"} for @{decode_json $_}' input.json
A more robust version would use a more efficient non-core JSON parser and a CSV module to do things like properly quote fields when needed, but since your sample data doesn't include such fields I didn't bother. I can if requested.
And the unrequested jq version, because that really is the best approach whether you want it or not:
jq -r '.[] | [.id, .type, .name, .ppu] | @csv' input.json
Bash is not the tool to do this at all. But if for some reason you cannot install jq, you can simply use Python 3, which comes by default in most Linux distros and in macOS.
#!/usr/local/bin/python3
import json
objs=json.loads("""
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522
}
]
""")
for item in objs:
    print("{},{},{},{}".format(item['id'], item['type'], item['name'], item['ppu']))
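If field values may themselves contain commas or quotes, Python's csv module takes care of the quoting; a minimal sketch, assuming the data is a JSON array stored in a hypothetical in.json:

```shell
python3 - in.json <<'EOF'
import csv, json, sys

# Parse the JSON array and emit one properly quoted CSV row per object.
with open(sys.argv[1]) as f:
    objs = json.load(f)
# lineterminator avoids csv's default \r\n row endings on stdout.
writer = csv.writer(sys.stdout, lineterminator='\n')
for item in objs:
    writer.writerow([item['id'], item['type'], item['name'], item['ppu']])
EOF
```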
If you do not have Python 3 either, you can do it in Ruby, which also comes by default in most distros and macOS:
#!/usr/bin/ruby
require 'json'
content = '
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55
},
{
"id": "0002",
"type": "donut2",
"name": "Cake2",
"ppu": 0.5522
}
]
'
JSON.parse(content).each { |item| puts "#{item['id']},#{item['type']},#{item['name']},#{item['ppu']}" }
You can then redirect the output to a file:
ruby script.rb > output.csv
And that's it.
Nevertheless, if you can be completely sure of the format of your input, you can do some bash magic, especially using awk. But as others have said, please don't do that.

Removing pattern from multiple lines using sed or awk in two places in the same line

I have a JSON file with 12,166,466 lines.
I want to remove the quotes from the values of the keys
"timestamp": "1538564256", and "score": "10", so that they look like
"timestamp": 1538564256, and "score": 10,.
Input:
{
"title": "DNS domain", ,
"timestamp": "1538564256",
"domain": {
"dns": [
"www.google.com"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa"
"id": "c-1eOWYB9XD0VZRJuWL6"
}, {
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"google.de"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
}
Expected output:
{
"title": "DNS domain", ,
"timestamp": 1538564256,
"domain": {
"dns": [
"www.google.com"
]
},
"score": 10,
"link": "www.bit.ky/sdasd/asddsa"
"id": "c-1eOWYB9XD0VZRJuWL6"
}, {
"title": "DNS domain",
"timestamp": 1538564256,
"domain": {
"dns": [
"google.de"
]
},
"score": 10,
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
}
I have tried:
sed -E '
s/"timestamp": "/"timestamp": /g
s/"score": "/"score": /g
'
The first part is quite straightforward, but how do I remove the ", at the end of the lines that contain "timestamp" and "score"? How do I do that using sed or even awk, or some other tool, keeping in mind that I have 12 million lines to process?
Assuming that you fix your JSON input file like this:
<file jq .
[
{
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"www.google.com"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "c-1eOWYB9XD0VZRJuWL6"
},
{
"title": "DNS domain",
"timestamp": "1538564256",
"domain": {
"dns": [
"google.de"
]
},
"score": "10",
"link": "www.bit.ky/sdasd/asddsa",
"id": "du1eOWYB9XD0VZRJuWL6"
}
]
You can use jq and its tonumber function to change the wanted strings to values:
<file jq '.[].timestamp |= tonumber | .[].score |= tonumber'
If the JSON structure matches roughly your example (e. g., there won't be any other whitespace characters between "timestamp", the colon, and the value), then this awk should be ok. If available, using jq for JSON transformation is the better choice by far!
awk '{print gensub(/("(timestamp|score)": )"([0-9]+)"/, "\\1\\3", "g")}' file
Be warned that tonumber can lose precision. If using tonumber is inadmissible, and if the output is produced by jq (or otherwise linearized vertically), then using awk as proposed elsewhere on this page is a good way to go. (If your awk does not have gensub, then the awk program can be easily adapted.) Here is the same thing using sed, assuming its flag for extended regex processing is -E:
sed -E -e 's/"(timestamp|score)": "([0-9]+)"/"\1": \2/'
For reference, if there's any doubt about where the relevant keys are located, here's a filter in jq that is agnostic about that:
walk(if type == "object"
then if has("timestamp") then .timestamp|=tonumber else . end
| if has("score") then .score|=tonumber else . end
else . end)
If your jq does not have walk/1, then simply snarf its def from the web, e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq
If you wanted to convert all number-valued strings to numbers, you could write:
walk(if type=="object" then map_values(tonumber? // .) else . end)
This might work for you (GNU sed):
sed ':a;/"timestamp":\s*"1538564256",/{s/"//3g;:b;n;/timestamp/ba;/"score":\s*"10"/s/"//3g;Tb}' file
On encountering a line that contains "timestamp": "1538564256",, remove the third and subsequent "'s. Then read on until either another line containing timestamp appears (and repeat), or a line containing "score": "10" appears, and again remove the third and subsequent "'s.

Replacing date within a JSON config file

I'm new to this, but I'm trying to get a script running, and I need it to use today's date as a value within the configuration file so the program can run.
I'm not sure of the best way to implement it. So far, the line below replaces the correct part of the configuration file, but I can't figure out how to make it use today's date, e.g. the output of the date +%F command.
sed -i 's/"to_date":.*/"to_date":"date +%F"/' /config/settings
The config follows:
{
"username":"admin",
"password":"redhat",
"assumeyes":true,
"to_date": "2011-10-01",
"skip_depsolve":false,
"skip_errata_depsolve":false,
"security_only":false,
"use_update_date":false,
"no_errata_sync":false,
"dry_run":false,
"errata": ["RHSA-2014:0043", "RHBA-2014:0085"],
"blacklist": {
},
"removelist": {
},
"channels":[
{
"rhel-x86_64-server-5": {
"label": "my-rhel5-x86_64-clone",
"existing-parent-do-not-modify": true
},
"rhn-tools-rhel-x86_64-server-5": {
"label": "my-tools-5-x86_64-clone",
"name": "My Clone's Name",
"summary": "This is my channel's summary",
"description": "This is my channel's description"
}
},
{
"rhel-i386-server-5": "my-rhel5-i386-clone"
}
]
}
Use a proper JSON parser, jq, with the --arg option to pass the current date:
jq --arg inputDate "$(date +%F)" '.to_date = $inputDate' /config/settings
{
"username": "admin",
"password": "redhat",
"assumeyes": true,
"to_date": "2017-01-27",
"skip_depsolve": false,
"skip_errata_depsolve": false,
"security_only": false,
"use_update_date": false,
"no_errata_sync": false,
"dry_run": false,
"errata": [
"RHSA-2014:0043",
"RHBA-2014:0085"
],
"blacklist": {},
"removelist": {},
"channels": [
{
"rhel-x86_64-server-5": {
"label": "my-rhel5-x86_64-clone",
"existing-parent-do-not-modify": true
},
"rhn-tools-rhel-x86_64-server-5": {
"label": "my-tools-5-x86_64-clone",
"name": "My Clone's Name",
"summary": "This is my channel's summary",
"description": "This is my channel's description"
}
},
{
"rhel-i386-server-5": "my-rhel5-i386-clone"
}
]
}
The jq download and usage instructions are pretty straightforward. I recommend using it for manipulating JSON instead of depending on regexes.
jq does not edit the file in place; save the output to a temporary file and rename it back, using GNU mktemp:
jsonTemp=$(mktemp)
jq --arg inputDate "$(date +%F)" '.to_date = $inputDate' /config/settings > "$jsonTemp"
mv "$jsonTemp" /config/settings
To include the output of a command inside quoted text, you have to use command substitution and double quotes so that the text gets expanded:
sed -i "s/\"to_date\":.*/\"to_date\":\"$(date +%F)\"/" /config/settings
Also I second Inian's comment : you should be using jq to manipulate JSON data.
For example, the following command should do the modification you need (note that the date must be passed as a string, and that the key in the config is to_date):
jq --arg d "$(date +%F)" '.to_date = $d' /config/settings
