Use match from file in string - linux

I have a list of user ids in a file and I'm trying to generate some sql from the file.
cat users.json | awk '/UID/{print "INSERT IGNORE INTO potential_problem_users VALUES ("$2");" }'
But it's not doing what I expected it to do:
);SERT IGNORE INTO potential_problem_users VALUES ("1"
);SERT IGNORE INTO potential_problem_users VALUES ("2"
The query I am trying for is:
INSERT IGNORE INTO potential_problem_users VALUES ([userID]);
example json if it helps:
{
"results": [
{
"UID": "abc"
},
{
"UID": "124"
}
],
"objectsCount": 5,
"totalCount": 10966,
"statusCode": 200,
"errorCode": 0,
"statusReason": "OK"
}
Am I using the right tool? If I am, what am I doing wrong?

Your JSON input has a couple of issues that make it impossible to parse with a JSON parser such as jq (on OS X you can install it with brew install jq).
The error free JSON from your question would be
{
"results": [
{
"UID": "abc"
},
{
"UID": "124"
}
],
"objectsCount": 5,
"totalCount": 10966,
"statusCode": 200,
"errorCode": 0,
"statusReason": "OK"
}
To parse this JSON and produce one query per UID value, run
jq --raw-output '"INSERT IGNORE INTO potential_problem_users VALUES (" + (.results[] | .UID) + ")"' users.json
which produces
INSERT IGNORE INTO potential_problem_users VALUES (abc)
INSERT IGNORE INTO potential_problem_users VALUES (124)
The addition operator + in jq allows you to concatenate strings to form the final string.
jq snippet from jqplay.

awk -F: '/UID/ {print "INSERT IGNORE INTO potential_problem_users VALUES ("gensub(" ","","g",$2)");"}' users.json
Use gensub (a GNU awk extension) to take the spaces out of the second :-delimited field. Note also that you don't need to cat the file to awk it.
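Since gensub is only available in GNU awk, a more portable variant can use POSIX gsub instead. This is only a minimal sketch, not a JSON parser: the function name uid_to_sql is made up for illustration, and it assumes each UID sits on its own line and contains no colon.

```shell
# Hypothetical helper: print one INSERT statement per "UID" line of the file in $1.
uid_to_sql() {
  awk -F: '/"UID"/ {
    v = $2
    # Drop spaces, tabs, trailing commas and any carriage returns.
    gsub(/[ \t\r,]/, "", v)
    print "INSERT IGNORE INTO potential_problem_users VALUES (" v ");"
  }' "$1"
}
```

Call it as uid_to_sql users.json. The values keep the double quotes they have in the JSON.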

You have several JSON parsers available to you on OS X that will also work on Linux. You can use Perl, Python or Ruby among others.
Here is a demo in Ruby:
Given:
$ cat json
{
"results": [
{
"UID": "abc"
},
{
"UID": "124"
}
],
"objectsCount": 5,
"totalCount": 10966,
"statusCode": 200,
"errorCode": 0,
"statusReason": "OK"
}
You can parse that file and print the lines of interest this way:
$ ruby -0777 -lane 'require "json"
d=JSON.parse($_)
d["results"].each {|e| puts "INSERT IGNORE INTO potential_problem_users VALUES (#{e["UID"]})" }' json
INSERT IGNORE INTO potential_problem_users VALUES (abc)
INSERT IGNORE INTO potential_problem_users VALUES (124)

users.json contains carriage returns (aka \r, aka control-M). The \r at the end of $2 moves the cursor back to the start of the line, so the trailing ); overwrites the first two characters of INSERT. Run dos2unix or similar on the file first, then re-run your awk script (or any other script) on it and let us know if you still have a problem.
Almost every time characters you expect at the end of a line show up at the start instead, the problem is control-Ms.
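If dos2unix is not installed, tr can do the same job. A small sketch (the function name strip_cr is just for illustration):

```shell
# Hypothetical helper: write the file named in $1 to stdout without carriage returns.
strip_cr() {
  tr -d '\r' < "$1"
}
```

For example, strip_cr users.json > users.unix.json, then run awk on the new file. To check whether carriage returns are present in the first place, od -c users.json | head shows them as \r.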

Related

How do I grep and replace string in bash

I have a file which contains my json
{
"type": "xyz",
"my_version": "1.0.1.66~22hgde",
}
I want to edit the value for the key my_version and every time replace the value after the third dot with another number, which is stored in a variable, so that it becomes something like 1.0.1.32~22hgde. I am using sed to replace it:
sed -i "s/\"my_version\": \"1.0.1.66~22hgde\"/\"my_version\": \"1.0.1.$VAR~22hgde\"/g" test.json
This works, but the issue is that the my_version string doesn't remain constant; it can change to something like 1.0.2.66 or 2.0.1.66. So how do I handle such a case in bash?
how do I handle such case?
You write a regular expression that matches any combination of characters that can appear there. Regex crosswords online are a fun way to learn regex.
Do not edit JSON files with sed - sed is for lines. Consider using JSON aware tools - like jq, which will handle any possible case.
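If you do stay with sed, the regular-expression approach could look roughly like this. It is a sketch under assumptions: the key and its value are on one line, formatted as in the question, and the new number consists of digits only. The function name bump_version is hypothetical.

```shell
# Hypothetical helper: $1 = new fourth version component, $2 = JSON file.
# Prints the edited text to stdout; the file itself is left untouched.
bump_version() {
  sed -E 's/("my_version": *"[0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\1'"$1"'/' "$2"
}
```

bump_version "$VAR" test.json prints the file with only the fourth number replaced, whatever the first three are. It still breaks as soon as the JSON is laid out differently, which is why a JSON-aware tool is the safer choice.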
A jq answer: file.json contains
{
"type": "xyz",
"my_version": "1.0.1.66~22hgde",
"object": "can't end with a comma"
}
then, replacing the last octet before the tilde:
VAR=32
jq --arg octet "$VAR" '.my_version |= sub("[0-9]+(?=~)"; $octet)' file.json
outputs
{
"type": "xyz",
"my_version": "1.0.1.32~22hgde",
"object": "can't end with a comma"
}

Sed replace inside a json file with regex

I have a huge JSON file; here is a part of it:
"panels": [
"targets": [
{
"alias": "First Prod",
"dimensions": {
"Function": "robot-support-dev-create-human"
},
}
{
"alias": "Second Prod",
"dimensions": {
"Function": "robot-support-prototype-dev-beta-activate-human"
},
}
{
"alias": "third Prod",
"dimensions": {
"Function": "robot-support-dev-jira-kill-human"
},
}
{
"alias": "Somehting",
"dimensions": {
"Robotalias": "default",
"RobotName": "Robot-prod-prototype",
"Operation": "Fight"
},
]
]
I want to apply a regex to Function and, each time it contains robot-support-dev, change it to robot-support-prod-...
sed -i ' s/"robot-support-([a-zA-Z0-9_]+)-dev-([^"]+)"/robot-support-\\1-prod-\\2/g;'
This is what I did, but there may be something wrong with my regex.
You can't match a json with a regex. The "right way(TM)" would be to first extract Function using a json aware tool like jq, then modify it with sed, then insert it back using json aware tool.
Sed by default uses basic regex, so you need to change ( ) + into \( \) \{1,\} (the \+ is a GNU extension) or with GNU sed just use sed -E to use extended regex. Also the \\1 would be interpreted as 2 characters \ with 1, you want to use \1 with a single \. But anyway, your regex is just invalid and does not match what you want (I guess). Also the " are missing on the right side in the replacement string, so your command would just remove the ". Just substitute what you need, try:
sed 's/"robot-support-dev-/"robot-support-prod-/g;'
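To illustrate the basic-versus-extended point, here is a sketch using sed -E with capture groups that also covers an optional word between robot-support- and dev-. The function name fix_function is hypothetical, and the pattern assumes each Function value sits on one line, as in the sample.

```shell
# Hypothetical helper: rewrite "robot-support-[<word>-]dev-..." to "...-prod-..." on stdin.
fix_function() {
  sed -E 's/"robot-support-(([A-Za-z0-9_]+-)?)dev-([^"]+)"/"robot-support-\1prod-\3"/g'
}
```

Used as fix_function < input.json, it keeps the surrounding double quotes in the replacement and uses \1 and \3 with a single backslash.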
As @KamilCuk pointed out, there are some issues with the regex you are currently using. I think that if the occurrences you give as an example are the only possibilities, it would work if you match these groups:
sed -i 's/"robot-support\(-\|-.*-\)dev-\(.*\)"/"robot-support\1prod-\2"/g'
As pointed out elsewhere on this page, using sed for this type of problem is, at best, fraught with danger. Since the question has been tagged with jq, it should be pointed out that jq is an excellent match for this type of problem. In particular, a trivial solution can be obtained using the filter sub of arity 2 or 3, i.e. sub/3: sub("FROM"; "TO"; "g").
You might also wish to use walk so that you don't have to be concerned about where exactly the "Function" keys occur, e.g.
walk( if type == "object" and (.Function|type) == "string"
then .Function |= sub( "robot-support-(?<a>([^-]+-)?)dev-"; "robot-support-\(.a)prod-"; "g")
else . end)

AWK argument prints unwanted newline

Disclaimer: I used an extremely simple example, thinking each argument had some hidden encoding I wasn't aware of. Turns out my formatting was entirely wrong. As @miken32 said, I should be using commas. I changed my format and it works perfectly. Valuable lesson learned.
I've exported a CSV file from an .xlsx file with Excel 2013 (on Windows). I emailed myself the new CSV file and am running these tests on Unix (macOS Sierra).
Consider the following CSV file:
John
Adam
Cameron
Jordan
I'm trying to format each line to look like this:
{'operator':'EQ', 'property':'first_name', 'value':'John'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam'},
{'operator':'EQ', 'property':'first_name', 'value':'Cameron'},
{'operator':'EQ', 'property':'first_name', 'value':'Jordan'}
So value is the only argument changing between each line.
Here is the awk file I wrote:
BEGIN { }
{
print "{'operator':'EQ', 'property':'first_name', 'value':'"$0"'},";
}
END { }
But after executing this is the output I get:
{'operator':'EQ', 'property':'first_name', 'value':'John
'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam
'},
Notice how right after the argument ($0) is printed out, a newline is printed? This is messing with my JSON format. I have a feeling this has to do with the excel exporting (which was done by Save as .csv).
Any suggestions?
In awk, $0 represents the entire line, whereas $1, $2, $n represent the delimited fields in the line.
The sample provided isn't a CSV file, since there aren't any values separated by commas. If it were, you could do this:
awk -F, '{print "{'"'"'operator'"'"':'"'"'EQ'"'"', '"'"'property'"'"':'"'"'first_name'"'"', '"'"'value'"'"':'"'"'"$1"'"'"'},"}' foo.txt
Which gets a bit crazy with the shell-friendly quoting!
You should be aware that there are tools such as jq, which are designed to create and work with JSON data. If this is more than a one-off task you might be better served looking at those.
Edit using a suggestion by Ed Morton from a comment:
awk -F, '{print "{\047operator\047:\047EQ\047, \047property\047:\047first_name\047, \047value\047:\047"$1"\047},"}' foo.txt
(But from your original question it looks like you're using a separate script file anyway, so you won't have to worry about escaping quotes.)
As has been noted, your sample output with '-based quoting isn't valid JSON, where only " may be used.
Ensuring valid JSON output is a good reason to use the jq CLI, which not only makes the task more robust, but also simplifies it:
jq -Rnc 'inputs | { operator: "EQ", property: "first_name", value: . }' <<EOF
John
Adam
Cameron
Jordan
EOF
yields:
{"operator":"EQ","property":"first_name","value":"John"}
{"operator":"EQ","property":"first_name","value":"Adam"}
{"operator":"EQ","property":"first_name","value":"Cameron"}
{"operator":"EQ","property":"first_name","value":"Jordan"}
Explanation:
-R reads Raw input (input that isn't JSON)
-n suppresses automatic reading of the input, so that special variables input and inputs can be used instead.
-c produces compact output (not pretty-printed)
inputs represents all input lines, and the expression after | sees each line as ., iteratively.
The output object can be specified using JavaScript syntax, which simplifies matters because the property names don't require quoting; the expanded value of { ... } is converted to JSON on output.
Perl:
perl -MJSON -nlE 'push @p,{operator=>"EQ",property=>"first_name",value=>$_}}{say JSON->new->pretty->encode(\@p)' file
output is valid, pretty-printed JSON:
[
{
"operator" : "EQ",
"property" : "first_name",
"value" : "John"
},
{
"operator" : "EQ",
"value" : "Adam",
"property" : "first_name"
},
{
"operator" : "EQ",
"property" : "first_name",
"value" : "Cameron"
},
{
"property" : "first_name",
"value" : "Jordan",
"operator" : "EQ"
}
]
More readable:
perl -MJSON -nlE '
push @p, { operator=>"EQ", property=>"first_name", value=>$_ };
END {
say JSON->new->pretty->encode(\@p)
}' file
A final note if you are generating JSON: single quotes aren't allowed as string delimiters in JSON.

Bash script print output in json format

I'm trying to curl a webpage, do some processing on it, and finally print the result in JSON format (which actually needs to be MongoDB input).
So the input (which is read through curl) is
Input:
brendan google engineer
stones microsoft chief_engineer
david facebook tester
For the processing, I'm assigning values to variables ($name, $employer, $designation).
my final command which converts to json is,
echo "[{\"Name\":\"$name\",\"Employer\":\"$employer\",\"Designation\":\"$designation\"}]"
The current output is,
[{"Name":"brendan","Employer":"google","Designation":"engineer"}]
[{"Name":"stones","Employer":"microsoft","Designation":"chief_engineer"}]
[{"Name":"david","Employer":"facebook","Designation":"tester"}]
But I want the output on a single line, separated by commas, with square brackets only at the start and end (not on every line).
Expected output:
[{"Name":"brendan","Employer":"google","Designation":"engineer"},{"Name":"stones","Employer":"microsoft","Designation":"chief_engineer"},
{"Name":"david","Employer":"facebook","Designation":"tester"}]
Any suggestions?
Conventional text-processing tools can't do this right for the general case. There are a bunch of corner cases to JSON -- nonprintable and high-Unicode characters (and quotes) need to be escaped, for instance. Use a tool that's actually built for the job, such as jq:
jq -n -R '
[
inputs |
split(" ") |
{ "Name": .[0], "Employer": .[1], "Designation": .[2] }
]' <<EOF
brendan google engineer
stones microsoft chief_engineer
david facebook tester
EOF
...emits as output:
[
{
"Name": "brendan",
"Employer": "google",
"Designation": "engineer"
},
{
"Name": "stones",
"Employer": "microsoft",
"Designation": "chief_engineer"
},
{
"Name": "david",
"Employer": "facebook",
"Designation": "tester"
}
]
Something like this?
sep='['
curl "...whatever..." |
while read -r name employer designation; do
printf '%s{"Name": "%s", "Employer": "%s", "Designation": "%s"}' "$sep" "$name" "$employer" "$designation"
sep=', '
done
printf ']\n'
I do agree that this is brittle and error-prone; if you can use a JSON-aware tool like jq, by all means do that instead.
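To make the loop above a little less brittle, the values could be escaped before they are printed. The sketch below is a partial measure only: the helper name json_escape is hypothetical, and it handles backslashes and double quotes but not control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes in $1
# so the result can be placed inside a JSON string.
# Control characters (tabs, newlines, ...) are NOT handled here.
json_escape() {
  # Backslashes must be doubled first, otherwise the backslash added
  # in front of each quote would be doubled as well.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

In the loop you would then pass "$(json_escape "$name")" to printf instead of "$name", and likewise for the other two fields.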
If you have access to jq 1.5 or later, then you can use inputs and may wish to consider using splits(" +") in case the tokens might be separated by more than one space:
jq -n -R '
[inputs
| [splits(" +")]
| { "Name": .[0], "Employer": .[1], "Designation": .[2] }]'
If you do not have ready access to jq 1.5 or later, then please note that the following will work with jq 1.4:
jq -R -s '
[split("\n")[]
| select(length>0)
| split(" ")
| { "Name": .[0], "Employer": .[1], "Designation": .[2] }]'

Can some one explain what this code is doing? what are all the special characters in sed doing?

jsonval () {
temp=`echo $haystack | sed 's/\\\\\//\//g' | sed 's/[{}]//g' | awk -v k="text" ' {n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | sed 's/\"\:\"/\|/g' | sed 's/[\,]/ /g' | sed ' s/\"//g' | grep -w $needle`
echo ${temp##*|}
}
dev_key='xxxxxxxxxxxx'
zip_code='48446'
city='Lapeer'
state='MI'
red=$(tput setaf 1)
textreset=$(tput sgr0)
haystack=$(curl -Ls -X GET http://api.wunderground.com/api/$dev_key/conditions/q/$state/$city.json)
needle='temperature_string'
temperature=$(jsonval $needle $haystack)
needle='weather'
current_condition=$(jsonval $needle $haystack)
echo -e '\n' $red $current_condition 'and' $temperature $textreset '\n'
This code is supposed to fetch JSON weather data, using a developer key, and print it in the terminal.
This is the full code. Can someone explain what sed is doing? I know it is supposed to act as a substitution method, but why are there so many slashes and special characters?
Also, what is echo ${temp##*|} doing? All these special characters make it hard for me to understand this code.
It seems that this command tries to parse JSON. That is far from a good idea, since there are better tools in the toolbox. One of them is jq. It's good at formatting JSON output and at retrieving items from a complicated data source. Example:
file.json
{
"items": [
{
"tags": [
"bash",
"vim",
"zsh"
],
"owner": {
"reputation": 178,
"user_id": 22734,
"user_type": "registered",
"profile_image": "https://www.gravatar.com/avatar/25ee9a1b9f5a16feb1432882a9ef2f06?s=128&d=identicon&r=PG",
"display_name": "Brad Parks",
"link": "http://unix.stackexchange.com/users/22734/brad-parks"
},
"is_answered": false,
"view_count": 2,
"answer_count": 0,
"score": 0,
"last_activity_date": 1417919326,
"creation_date": 1417919326,
"question_id": 171907,
"link": "http://unix.stackexchange.com/questions/171907/use-netrw-or-nerdtree-in-zsh-bash-to-select-a-file-by-browsing",
"title": "Use Netrw or Nerdtree in Zsh/Bash to select a file BY BROWSING?"
}
]
}
Output from searching owner's sub HASH :
Don't reinvent the wheel badly ;)
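As for the ${temp##*|} part of the question: that is ordinary shell parameter expansion, not sed. The ## operator removes the longest prefix that matches the pattern *|, i.e. everything up to and including the last | character. A small sketch, with a made-up sample value standing in for what the pipeline produces:

```shell
# Made-up sample of what the sed/awk/grep pipeline leaves in temp: key|value
temp='weather|Partly Cloudy'

# ${var##pattern} strips the longest leading match of the pattern,
# here everything up to and including the last "|".
value=${temp##*|}
echo "$value"
```

So the function turns each "key":"value" pair into key|value with sed, picks the line containing the needle with grep, and finally keeps only the part after the |.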
