Disclaimer: I used an extremely simple example thinking each argument had some hidden encoding I wasn't aware of. Turns out my
formatting was entirely wrong. As @miken32 said, I should be using
commas. I changed my format and it works perfectly. Valuable lesson
learned.
I've exported a CSV file from an .xlsx with Excel 2013 (on Windows). I emailed myself the new CSV file and am running these tests on Unix (macOS Sierra).
Consider the following CSV file:
John
Adam
Cameron
Jordan
I'm trying to format each line to look like this:
{'operator':'EQ', 'property':'first_name', 'value':'John'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam'},
{'operator':'EQ', 'property':'first_name', 'value':'Cameron'},
{'operator':'EQ', 'property':'first_name', 'value':'Jordan'}
So value is the only field that changes from line to line.
Here is the awk file I wrote:
BEGIN { }
{
print "{'operator':'EQ', 'property':'first_name', 'value':'"$0"'},";
}
END { }
But after executing this is the output I get:
{'operator':'EQ', 'property':'first_name', 'value':'John
'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam
'},
Notice how a newline is printed right after the argument ($0)? This is messing up my JSON format. I have a feeling it has to do with the Excel export (which was done by Save As .csv).
Any suggestions?
In awk, $0 represents the entire line, whereas $1, $2, ..., $n represent the delimited fields in the line.
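A trivial illustration with a made-up comma-delimited line:
$ echo 'John,Smith' | awk -F, '{print "line: " $0; print "f1: " $1; print "f2: " $2}'
line: John,Smith
f1: John
f2: Smith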
The sample provided isn't a CSV file, since there aren't any values separated by commas. If it were, you could do this:
awk -F, '{print "{'"'"'operator'"'"':'"'"'EQ'"'"', '"'"'property'"'"':'"'"'first_name'"'"', '"'"'value'"'"':'"'"'"$1"'"'"'},"}' foo.txt
Which gets a bit crazy with the shell-friendly quoting!
You should be aware that there are tools such as jq, which are designed to create and work with JSON data. If this is more than a one-off task you might be better served looking at those.
Edit using a suggestion by Ed Morton from a comment:
awk -F, '{print "{\047operator\047:\047EQ\047, \047property\047:\047first_name\047, \047value\047:\047"$1"\047},"}' foo.txt
(But from your original question it looks like you're using a separate script file anyway, so you won't have to worry about escaping quotes.)
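Incidentally, the stray line break in your output looks like the classic symptom of Windows (CRLF) line endings from the Excel export: each $0 ends in a carriage return. If that's the case, stripping it first should fix the broken quoting — a hedged sketch of your script file with the CR removed:
{
    sub(/\r$/, "");  # drop the Windows carriage return, if present
    print "{'operator':'EQ', 'property':'first_name', 'value':'"$0"'},";
}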
As has been noted, your sample output with '-based quoting isn't valid JSON, where only " may be used.
Ensuring valid JSON output is a good reason to
use the jq CLI, which not only makes the task more robust, but also simplifies it:
jq -Rnc 'inputs | { operator: "EQ", property: "first_name", value: . }' <<EOF
John
Adam
Cameron
Jordan
EOF
yields:
{"operator":"EQ","property":"first_name","value":"John"}
{"operator":"EQ","property":"first_name","value":"Adam"}
{"operator":"EQ","property":"first_name","value":"Cameron"}
{"operator":"EQ","property":"first_name","value":"Jordan"}
Explanation:
-R reads Raw input (input that isn't JSON)
-n suppresses automatic reading of the input, so that the special builtins input and inputs can be used instead.
-c produces compact output (not pretty-printed)
inputs represents all input lines, and the expression after | sees each line as ., iteratively.
The output object can be specified using JavaScript syntax, which simplifies matters because the property names don't require quoting; the expanded value of { ... } is converted to JSON on output.
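If you instead want the objects collected into a single JSON array (the closest valid-JSON analogue of the comma-separated lines shown in the question), a small variant of the command above:
jq -Rnc '[inputs | { operator: "EQ", property: "first_name", value: . }]' file
yields:
[{"operator":"EQ","property":"first_name","value":"John"},{"operator":"EQ","property":"first_name","value":"Adam"},{"operator":"EQ","property":"first_name","value":"Cameron"},{"operator":"EQ","property":"first_name","value":"Jordan"}]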
Perl:
perl -MJSON -nlE 'push @p,{operator=>"EQ",property=>"first_name",value=>$_}}{say JSON->new->pretty->encode(\@p)' file
output is valid, pretty-printed JSON:
[
{
"operator" : "EQ",
"property" : "first_name",
"value" : "John"
},
{
"operator" : "EQ",
"value" : "Adam",
"property" : "first_name"
},
{
"operator" : "EQ",
"property" : "first_name",
"value" : "Cameron"
},
{
"property" : "first_name",
"value" : "Jordan",
"operator" : "EQ"
}
]
More readable:
perl -MJSON -nlE '
push @p, { operator=>"EQ", property=>"first_name", value=>$_ };
END {
say JSON->new->pretty->encode(\@p)
}' file
A final note if you're generating JSON: single quotes aren't allowed in JSON; strings must use double quotes.
Related
I have a file which contains my json
{
"type": "xyz",
"my_version": "1.0.1.66~22hgde",
}
I want to edit the value for the key my_version and each time replace the number after the third dot with another number stored in a variable, so it becomes something like 1.0.1.32~22hgde. I am using sed to replace it:
sed -i "s/\"my_version\": \"1.0.1.66~22hgde\"/\"my_version\": \"1.0.1.$VAR~22hgde\"/g" test.json
This works, but the issue is that the my_version string doesn't remain constant: it can also be something like 1.0.2.66 or 2.0.1.66. So how do I handle such a case in bash?
how do I handle such case?
You write a regular expression that matches any possible combination of characters that can appear there. You can learn regex the fun way with regex crossword puzzles online.
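For instance, if you do stay with sed, a hedged sketch using an extended regex (-E, supported by GNU and BSD sed) that matches any three leading octets instead of hard-coding 1.0.1, with the same test.json and $VAR as in the question:
sed -i -E "s/(\"my_version\": \"[0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+(~)/\1$VAR\2/" test.json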
That said, do not edit JSON files with sed - sed is for lines. Consider using JSON-aware tools like jq, which will handle any possible case.
A jq answer: file.json contains
{
"type": "xyz",
"my_version": "1.0.1.66~22hgde",
"object": "can't end with a comma"
}
then, replacing the last octet before the tilde:
VAR=32
jq --arg octet "$VAR" '.my_version |= sub("[0-9]+(?=~)"; $octet)' file.json
outputs
{
"type": "xyz",
"my_version": "1.0.1.32~22hgde",
"object": "can't end with a comma"
}
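Note that unlike sed, jq has no -i (in-place) option, so the usual pattern is to write to a temporary file and move it back over the original:
jq --arg octet "$VAR" '.my_version |= sub("[0-9]+(?=~)"; $octet)' file.json > tmp.json && mv tmp.json file.json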
I have a command that gives me the following output:
'First' : 'abc',
'Second' :'xyz',
'Third' :'lmn'
The requirement is to convert this output into valid JSON format.
So I replaced every ' with " using sed:
<command> | sed "s/'/\"/g"
"First" : "abc",
"Second" :"xyz",
"Third" :"lmn"
Now I also need to add { at the beginning and } at the end of the output. How can I do that?
Any other thoughts are also welcome.
sed -z "s/[[:space:]]*'\([^']*\)'[[:space:]]*:[[:space:]]*'\([^']*\)'[[:space:]]*,\?/"'"\1":"\2",/g; s/,$//; s/^/{/; s/$/}/'
First match each '<this>' : '<and this>' pair, along with any trailing comma.
Then convert each such sequence into "<this>":"<and this>",.
Remove the final trailing comma.
Add { and } around the result.
-z is a GNU extension to parse it all as one line. Alternatively you could remove newlines before passing to sed.
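For example, feeding the sample output through the command above (one possible invocation):
$ printf "'First' : 'abc',\n'Second' :'xyz',\n'Third' :'lmn'\n" | sed -z "s/[[:space:]]*'\([^']*\)'[[:space:]]*:[[:space:]]*'\([^']*\)'[[:space:]]*,\?/"'"\1":"\2",/g; s/,$//; s/^/{/; s/$/}/'
{"First":"abc","Second":"xyz","Third":"lmn"}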
|sed -e '1s/^/{/' -e "s/'/\"/g" -e '$s/$/}/' does the work.
I found the following bash script for converting a file with key:value information to CSV file:
awk -F ":" -v OFS="," '
BEGIN { print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value" }
function printline() {
print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
}
{data[$1] = $2}
NF == 0 {printline(); delete data}
END {printline()}
' file.yaml
But after executing it, it only converts the first group of data (only the first 6 rows), like this:
category,recommenderSubtype,resource,matchesPattern,resource,value
COST,CHANGE_MACHINE_TYPE,instance-1,f1-micro,instance-1,g1-small
My original file is like this (with 1000+ rows):
category:COST
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:portal-1
matchesPattern:f1-micro
resource:portal-1
value:g1-small
category:PERFORMANCE
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:old-3
matchesPattern:n1-standard-4
resource:old-3
value:n1-highmem-2
Is there a command I am missing?
The problem with the original script are these lines:
NF == 0 {printline(); delete data}
END {printline()}
The first line means: call printline() when the current line has no fields (i.e., on blank lines). The second line means: call printline() once more after all input has been processed.
The difficulty with this input format is that it gives no reliable indicator of when to output the next record. In the following, I have simply changed the script to output the data every six records. In case there can be duplicate keys, the criterion for output might be "all fields populated" or similar, which would need to be programmed slightly differently.
#!/bin/sh -e
awk -F ":" -v OFS="," '
BEGIN {
records_in = 0
print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value"
}
{
data[$1] = $2
records_in++
if(records_in == 6) {
records_in = 0;
print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
}
}
' file.yaml
Other comments
I have just removed the delete statement because it is not fully portable. The POSIX specification for awk only defines delete for deleting single array elements; to clear a whole array, it recommends looping over the elements (see the sketch below). If all fields are always present in each group, it can be eliminated altogether, since each new record simply overwrites the previous values.
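For reference, a hedged sketch of that portable idiom, had we wanted to keep the original blank-line trigger and still clear the array without relying on the delete data extension:
NF == 0 {
    printline()
    for (k in data)    # POSIX-portable way to empty the array
        delete data[k]
}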
Welcome to SO (I am new here as well). Next time, I would recommend tagging the question awk rather than bash, because AWK is really the scripting language used in this question, with bash only being responsible for calling awk with suitable parameters :)
I have a file that includes, among other things, a JSON string. The JSON contains passwords that need masking. The bash script responsible for the masking has no way of knowing the actual password itself, so it's not a simple sed search-and-replace.
The passwords appear within the json under a constant key named "password" or "Password". Typically, the appearance is like -
...random content..."Password\":\"actualPWD\"...random content....
The bash script needs to change such appearances to -
...random content..."Password\":\"******\"...random content....
The quotes aren't important, so even ...random
content..."Password\":******...random content...
would work.
I reckon the logic would need to find the index of the ':' that appears after the text "Password"/"password", take the substring from that point up to the second occurrence of a quote (") after it, and replace the whole thing with ******. But I'm not sure how to do this with sed or awk. Any suggestion would be helpful.
Perl to the rescue!
perl -pe 's/("[Pp]assword\\":\\")(.*?)(\\")/$1 . ("." x length $2) . $3/ge'
/e interprets the replacement part as code, so you can use the repetition operator x and repeat the dot length $2 times.
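A quick check of the one-liner against a made-up line (hunter2 is hypothetical and 7 characters long, hence 7 asterisks):
$ printf '%s\n' 'foo "Password\":\"hunter2\" bar' | perl -pe 's/("[Pp]assword\\":\\")(.*?)(\\")/$1 . ("*" x length $2) . $3/ge'
foo "Password\":\"*******\" bar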
Since JSON is structured, any approach based solely on regular expressions is bound to fail at some point unless the input is constrained in some way. It would be far better (simpler and safer) to use a JSON-aware approach.
One particularly elegant JSON-aware tool worth knowing about is jq. (Yes, the "j" is for JSON :-)
Assuming we have an input file consisting of valid JSON and that we want to change the value of every "password" or "Password" key to "******" (no matter how deeply nested the object having these keys may be), we could proceed as follows:
Place the following into a file, say mask.jq:
def mask(p): if has(p) then .[p] = "******" else . end;
.. |= if type == "object"
then mask("password") | mask("Password") else . end
Now suppose in.json has this JSON:
{"password": "secret", "details": [ {"Password": "another secret"} ]}
Then executing the command:
jq -f mask.jq in.json
produces:
{
"password": "******",
"details": [
{
"Password": "******"
}
]
}
More on jq at https://github.com/stedolan/jq
I wrote a small Perl script to extract all the values from a JSON-formatted string for a given key name (shown below). So, if I set a command-line switch for the Perl script to id, it would return 1, 2, and stringVal from the JSON example below. This script does the job, but I want to see how others would solve the same problem using other Unix-style tools such as awk, sed, or perl itself. Thanks!
{
"id":"1",
"key2":"blah"
},
{
"id":"2",
"key9":"more blah"
},
{
"id":"stringVal",
"anotherKey":"even more blah"
}
Excerpt of the Perl script that extracts JSON values:
my @values;
while (<STDIN>) {
    chomp;
    s/\s+//g; # Remove spaces
    s/"//g;   # Remove quotes
    push @values, /$opt_s:([\w]+),?/g; # $opt_s is a command-line switch for the key to find
}
print join("\n", @values);
use JSON;
I would strongly suggest using the JSON module. It will parse your JSON input in one function call (and encode it back just as easily). It also offers an OOP interface.
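A minimal sketch of that OOP interface (the sample string is illustrative):
use JSON;
my $string = '{"id":"1","key2":"blah"}';   # sample JSON text
my $json   = JSON->new;                    # the OOP interface
my $data   = $json->decode($string);       # JSON text -> Perl data structure
print $json->pretty->encode($data);        # and back again, pretty-printed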
gawk
gawk 'BEGIN{
FS=":"
printf "Enter key name: "
getline key < "-"
}
$0~key{
k=$2; getline ; v = $2
gsub("\"","",k)
gsub("\"","",v)
print k,v
}' file
output
$ ./shell.sh
Enter key name: id
1, blah
2, more blah
stringVal, even more blah
If you just want the id value,
$ key="id"
$ awk -vkey=$key -F":" '$0~key{gsub("\042|,","",$2);print $2}' file
1
2
stringVal
Here is a very rough Awk script to accomplish the task:
awk -v k=id -F: '/{|}/{next}{gsub(/^ +|,$/,"");gsub(/"/,"");if($1==k)print $2}' data
The -F: specifies ':' as the field separator.
The -v k=id sets the key you're searching for.
Lines containing '{' or '}' are skipped.
The first gsub gets rid of leading whitespace and trailing commas.
The second gsub gets rid of double quotes.
Finally, if k matches $1, $2 is printed.
data is the file containing your JSON.
sed (provided that file is formatted as above, no more than one entry per line):
KEY=id;cat file|sed -n "s/^[[:space:]]*\"$KEY\":\"//p"|sed 's/".*$//'
Why are you parsing the string yourself when there are libraries to do this for you? json.org has JSON parsing and encoding libraries for practically every language you can think of (and probably a few that you haven't). In Perl:
use strict;
use warnings;
use JSON qw(from_json to_json);
# enable slurp mode
local $/;
my $string = <DATA>;
my $data = from_json($string);
use Data::Dumper;
print "the data was parsed as: " . Dumper($data);
__DATA__
[
{
"id":"1",
"key2":"blah"
},
{
"id":"2",
"key9":"more blah"
},
{
"id":"stringVal",
"anotherKey":"even more blah"
}
]
...produces the output (I added a top-level array around the data so it would be parsed as one object):
the data was parsed as: $VAR1 = [
{
'key2' => 'blah',
'id' => '1'
},
{
'key9' => 'more blah',
'id' => '2'
},
{
'anotherKey' => 'even more blah',
'id' => 'stringVal'
}
];
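From there, extracting all the values for a given key is just a map over the arrayref — a sketch reusing the $data from above:
print join("\n", map { $_->{id} } @$data), "\n";
which prints 1, 2, and stringVal, one per line.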
If you don't mind seeing the quote and colon characters, I would simply use grep:
grep id file.json