Bash - String replace of index-based substring - string

I have a file that includes, among other things, a JSON document. The JSON contains passwords that need masking. The bash script responsible for the masking has no way of knowing the actual password itself, so it's not a simple sed search-and-replace.
The passwords appear within the JSON under a constant key named "password" or "Password". Typically, an appearance looks like this:
...random content..."Password\":\"actualPWD\"...random content....
The bash script needs to change such appearances to -
...random content..."Password\":\"******\"...random content....
The quotes aren't important, so even
...random content..."Password\":******...random content...
would work.
I reckon the logic would need to find the index of the ':' that appears after the text "Password"/"password", take the substring from that point up to the second occurrence of a quote (") after it, and replace the whole thing with *****. But I'm not sure how to do this with sed or awk. Any suggestion would be helpful.

Perl to the rescue!
perl -pe 's/("[Pp]assword\\":\\")(.*?)(\\")/$1 . ("." x length $2) . $3/ge'
/e interprets the replacement part as code, so you can use the repetition operator x and repeat the dot length $2 times.
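For example, a quick test on a sample line using the escaped-quote form from the question (swap the "." for "*" in the replacement if you want asterisks; "hunter2" is just a stand-in password):
$ echo '...random content..."Password\":\"hunter2\"...random content...' | perl -pe 's/("[Pp]assword\\":\\")(.*?)(\\")/$1 . ("." x length $2) . $3/ge'
...random content..."Password\":\".......\"...random content...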

Since JSON is structured, any approach based solely on regular expressions is bound to fail at some point unless the input is constrained in some way. It would be far better (simpler and safer) to use a JSON-aware approach.
One particularly elegant JSON-aware tool worth knowing about is jq. (Yes, the "j" is for JSON :-)
Assuming we have an input file consisting of valid JSON and that we want to change the value of every "password" or "Password" key to "******" (no matter how deeply nested the object having these keys may be), we could proceed as follows:
Place the following into a file, say mask.jq:
def mask(p): if has(p) then .[p] = "******" else . end;
.. |= if type == "object"
      then mask("password") | mask("Password")
      else . end
Now suppose in.json has this JSON:
{"password": "secret", "details": [ {"Password": "another secret"} ]}
Then executing the command:
jq -f mask.jq in.json
produces:
{
  "password": "******",
  "details": [
    {
      "Password": "******"
    }
  ]
}
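If your jq is recent enough to ship the built-in walk filter (jq 1.6+), a roughly equivalent one-liner, sketched here with the same mask logic, would be:
jq 'walk(if type == "object"
         then (if has("password") then .password = "******" else . end
               | if has("Password") then .Password = "******" else . end)
         else . end)' in.json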
More on jq at https://github.com/stedolan/jq

Related

How do I grep and replace string in bash

I have a file which contains my json
{
  "type": "xyz",
  "my_version": "1.0.1.66~22hgde",
}
I want to edit the value for the key my_version and each time replace the value after the third dot with another number that is stored in a variable, so it becomes something like 1.0.1.32~22hgde. I am using sed to replace it:
sed -i "s/\"my_version\": \"1.0.1.66~22hgde\"/\"my_version\": \"1.0.1.$VAR~22hgde\"/g" test.json
This works, but the issue is that the my_version string doesn't remain constant; it can change to something like 1.0.2.66 or 2.0.1.66. So how do I handle such a case in bash?
how do I handle such a case?
You write a regular expression to match any possible combination of characters that can be there. You can have fun learning regex with regex crosswords online.
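For example, a version-agnostic pattern for the file above might look like this (a sketch using GNU sed, assuming four dot-separated numeric components and the exact spacing shown):
VAR=32
sed -i -E "s/(\"my_version\": \"[0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\1$VAR/" test.json
That said: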
Do not edit JSON files with sed - sed is for lines. Consider using JSON aware tools - like jq, which will handle any possible case.
A jq answer: file.json contains
{
  "type": "xyz",
  "my_version": "1.0.1.66~22hgde",
  "object": "can't end with a comma"
}
then, replacing the last octet before the tilde:
VAR=32
jq --arg octet "$VAR" '.my_version |= sub("[0-9]+(?=~)"; $octet)' file.json
outputs
{
  "type": "xyz",
  "my_version": "1.0.1.32~22hgde",
  "object": "can't end with a comma"
}
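Note that jq has no in-place option like sed -i, so to overwrite the original file a common pattern is to write to a temporary file first (or use sponge from moreutils):
VAR=32
tmp=$(mktemp)
jq --arg octet "$VAR" '.my_version |= sub("[0-9]+(?=~)"; $octet)' file.json > "$tmp" && mv "$tmp" file.json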

Is there a linux command that can cut and pick columns that match string patterns?

I need to analyze logs, and my end user has to be able to see them in a formatted way, as shown below. The nature of the logs is that the key variables appear in different positions rather than in fixed columns, because the log formats come from various applications.
"thread":"t1","key1":"value1","key2":"value2",......"key15":"value15"
I have a way to split and cut this to analyze only particular keys, using the following,
cat file.txt | grep 'value1' | cut -d',' -f2,7,8-
This is the command I have so far. The requirement is that I need to grep all logs which have 'key1' as 'value1'; this value1 will most likely be unique among all lines, so I am using a plain grep (if required, I can grep for the key together with the value string). The main problem I am facing is the part after cut: I want to pick only key2, key7 and key8 from these lines, but they might not appear in the same column positions or in this order; key2 might even be at column 3 or 4, or after key7/key8. So I want to pick based on the key name and get exactly
"key2":"value2", "key7":"value7", "key8:value8"
The end user is not particularly picky about the order in which they appear; they only need these keys from each line to be displayed.
Can someone help me? I tried piping into awk/grep again, but they still match the entire line, not the individual columns.
My input is
{"#timestamp":"2021-08-05T06:38:48.084Z","level":"INFO","thread":"main","logger":"className1","message":"Message 1"}
{"#timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"main","logger":"className2","message":"Message 2"}
{"#timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"thead1","logger":"className2","message":"Message 2"}
I basically want my output to come from only the "thread":"main" lines, printing only the keys and values of "logger" and "message" for each line that matched, since the other keys and values are irrelevant to me. There are more than 15 or 16 keys in my file, and the key positions can be swapped; "message" could appear first and "logger" second in some log files. Of course, the keys are just an example; the real keys I am trying to find are not "logger" and "message" alone.
There are log analysis tools, but this is a pretty old system, and the logs are not real-time ones; I am analyzing and displaying files that are years old.
Not sure I really understand your specification, but the following awk script could be a starting point:
$ cat foo.awk
BEGIN {
    k["\"key1\""] = 1; k["\"key7\""] = 1; k["\"key8\""] = 1;
}
/"key1":"value1"/ {
    s = "";
    for(i = 1; i <= NF; i+=2)
        if($i in k)
            s = s (s ? "," : "") $i ":" $(i+1);
    print s;
}
$ awk -F',|:' -f foo.awk foo.txt
"key1":"value1","key7":"value7","key8":"value8"
Explanation:
awk is called with the -F',|:' option such that the field separator in each record is the comma or the colon.
In the BEGIN section we declare an associative array (k) of the selected keys, including the surrounding double quotes.
The rest of the awk script applies to each record containing "key1":"value1".
Variable s is used to prepare the output string; it is initialized to "".
For each odd field (the keys) we check whether it is in k. If it is, we concatenate to s: a comma (if s is not empty), the key field, a colon, and the following even field (the value).
We print s.
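Since each of the sample input lines is itself a JSON object, a jq alternative may also be worth considering, assuming the real logs are valid JSON per line (shown here with the "thread"/"logger"/"message" keys from the sample; substitute your real keys):
jq -c 'select(.thread == "main") | {logger, message}' file.txt
which, for the sample input above, prints:
{"logger":"className1","message":"Message 1"}
{"logger":"className2","message":"Message 2"}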

How to extract data from live log and pipe it to postgres

I need help with awk/grep/sed or whatever you think can do the job.
I have a log file and need to continuously monitor it and get some data out of the new lines as they are written to it.
The new lines are very long and not structured, but they will contain the following patterns: UserName=SOMEUSERNAME, NetworkDevice=SOMENETWORKDEVICE, Calling-Station-ID=SOMEMACADDRESS.
Example:
May 15 03:59:16 MTN-LAB-ISE-B1 CISE_Passed_Authentications 0000043297 1 0 2017-05-15 03:59:16.979 +00:00 0013123384 5200 NOTICE Passed-Authentication: Authentication succeeded, ConfigVersionId=170, Device IP Address=10.97.31.130, DestinationIPAddress=10.62.56.152, DestinationPort=1812, UserName=abcd\testuser, Protocol=Radius, RequestLatency=313, NetworkDeviceName=SHROCLUSW-WLAN-LAB, User-Name=d4d748fefe96, NAS-IP-Address=10.97.31.130, NAS-Port=50005, Service-Type=Call Check, Framed-IP-Address=10.97.109.64, Framed-MTU=1500, Called-Station-ID=64-E9-50-B6-DE-05, Calling-Station-ID=D4-D7-48-FE-FE-96, NAS-Port-Type=Ethernet, NAS-Port-Id=GigabitEthernet0/5, EAP-Key-Name=,
I was thinking of using tail -f to monitor the log file and piping it to grep/sed/awk to extract the needed data.
I only need the SOMEUSERNAME, SOMENETWORKDEVICE, SOMEMACADDRESS, not the pattern text itself.
And of course, to make this even more complicated, after the extraction is done I need to pipe it to postgres.
Can someone give me a hint on how to do matching/extraction part and maybe the pipe to postgres?
This might be done with grep/sed as well, but I personally prefer awk.
I wrote this short script, filter.awk:
{
    # find info in line (gensub() returns the line unchanged when a
    # pattern does not match, so reset the variable to "" in that case)
    userName = gensub(/^.*UserName=([^,\r\n]+).*$/, "\\1", 1, $0)
    if (userName == $0) userName = ""
    networkDevice = gensub(/^.*NetworkDeviceName=([^,\r\n]+).*$/, "\\1", 1, $0)
    if (networkDevice == $0) networkDevice = ""
    callingStationId = gensub(/^.*Calling-Station-ID=([^,\r\n]+).*$/, "\\1", 1, $0)
    if (callingStationId == $0) callingStationId = ""
    # print filtered info (if any of the patterns matched)
    if (userName != "" || networkDevice != "" || callingStationId != "") {
        print "INSERT INTO logs (username, networkdevice, calling_station_id) VALUES ('"userName"', '"networkDevice"', '"callingStationId"');"
    }
    # If "all patterns" is required instead of "any pattern",
    # the "||" operators have to be replaced with "&&".
}
I tested it with GNU awk on bash in Cygwin (Windows 10):
$ cat >filter.txt <<EOF
> May 15 03:59:16 MTN-LAB-ISE-B1 CISE_Passed_Authentications 0000043297 1 0 2017-05-15 03:59:16.979 +00:00 0013123384 5200 NOTICE Passed-Authentication: Authentication succeeded, ConfigVersionId=170, Device IP Address=10.97.31.130, DestinationIPAddress=10.62.56.152, DestinationPort=1812, UserName=abcd\testuser, Protocol=Radius, RequestLatency=313, NetworkDeviceName=SHROCLUSW-WLAN-LAB, User-Name=d4d748fefe96, NAS-IP-Address=10.97.31.130, NAS-Port=50005, Service-Type=Call Check, Framed-IP-Address=10.97.109.64, Framed-MTU=1500, Called-Station-ID=64-E9-50-B6-DE-05, Calling-Station-ID=D4-D7-48-FE-FE-96, NAS-Port-Type=Ethernet, NAS-Port-Id=GigabitEthernet0/5, EAP-Key-Name=,
> EOF
$ awk -f filter.awk filter.txt
INSERT INTO logs (username, networkdevice, calling_station_id) VALUES ('abcd\testuser', 'SHROCLUSW-WLAN-LAB', 'D4-D7-48-FE-FE-96');
$
Notes:
The NetworkDevice= pattern didn't seem sufficient to me, so I replaced it with NetworkDeviceName=. (It should be easy to change back if I'm wrong.)
I do not know how to format output correctly for postgres, nor do I know the questioner's database structure, so the print statement probably has to be adjusted. (There is only one print statement in the script.) However, the print statement writes to the standard output channel (as you might have expected), so it can easily be piped into any other input-consuming process.
It is unclear whether it is required that all patterns must match or (instead) at least one.
I implemented "at least one".
To implement "all", the || operators in the if statement had to be replaced by && operators. (There is only one if statement in script.)
Unfortunately, the gensub() function is available in GNU awk only. For non-GNU awk, another solution could be done using gsub() instead. However, the gensub() function is much more convenient to use. Thus, I prefer it as long as a non-GNU awk solution is not explicitly required.
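To wire this up end to end, a rough sketch might look like the following (the log path and database name are hypothetical placeholders; note also that gawk block-buffers its output when writing into a pipe, so for live tailing you may want to call fflush() after the print statement):
# hypothetical path and database name - adjust to your setup
tail -F /path/to/ise.log \
    | gawk -f filter.awk \
    | psql -d yourdb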

AWK argument prints unwanted newline

Disclaimer: I used an extremely simple example, thinking each argument had some hidden encoding I wasn't aware of. Turns out my formatting was entirely wrong. As @miken32 said, I should be using commas. I changed my format and it works perfectly. Valuable lesson learned.
I've exported a csv file from an xlsx with Excel 2013 (on Windows). I emailed myself the new csv file and am running these tests on Unix (macOS Sierra).
Consider the following CSV file:
John
Adam
Cameron
Jordan
I'm trying to format each line to look like this:
{'operator':'EQ', 'property':'first_name', 'value':'John'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam'},
{'operator':'EQ', 'property':'first_name', 'value':'Cameron'},
{'operator':'EQ', 'property':'first_name', 'value':'Jordan'}
So value is the only argument changing between each line.
Here is the awk file I wrote:
BEGIN { }
{
    print "{'operator':'EQ', 'property':'first_name', 'value':'"$0"'},";
}
END { }
But after executing this is the output I get:
{'operator':'EQ', 'property':'first_name', 'value':'John
'},
{'operator':'EQ', 'property':'first_name', 'value':'Adam
'},
Notice how right after the argument ($0) is printed out, a newline is printed? This is messing with my JSON format. I have a feeling this has to do with the Excel export (which was done via Save As .csv).
Any suggestions?
In awk, $0 represents the entire line, whereas $1, $2, $n represent the delimited fields in the line.
The sample provided isn't a CSV file, since there aren't any values separated by commas. If it were, you could do this:
awk -F, '{print "{'"'"'operator'"'"':'"'"'EQ'"'"', '"'"'property'"'"':'"'"'first_name'"'"', '"'"'value'"'"':'"'"'"$1"'"'"'},"}' foo.txt
Which gets a bit crazy with the shell-friendly quoting!
You should be aware that there are tools such as jq, which are designed to create and work with JSON data. If this is more than a one-off task you might be better served looking at those.
Edit using a suggestion by Ed Morton from a comment:
awk -F, '{print "{\047operator\047:\047EQ\047, \047property\047:\047first_name\047, \047value\047:\047"$1"\047},"}' foo.txt
(But from your original question it looks like you're using a separate script file anyway, so you won't have to worry about escaping quotes.)
As has been noted, your sample output with '-based quoting isn't valid JSON, where only " may be used.
Ensuring valid JSON output is a good reason to use the jq CLI, which not only makes the task more robust, but also simplifies it:
jq -Rnc 'inputs | { operator: "EQ", property: "first_name", value: . }' <<EOF
John
Adam
Cameron
Jordan
EOF
yields:
{"operator":"EQ","property":"first_name","value":"John"}
{"operator":"EQ","property":"first_name","value":"Adam"}
{"operator":"EQ","property":"first_name","value":"Cameron"}
{"operator":"EQ","property":"first_name","value":"Jordan"}
Explanation:
-R reads Raw input (input that isn't JSON)
-n suppresses automatic reading of the input, so that the builtin functions input and inputs can be used to read it instead.
-c produces compact output (not pretty-printed)
inputs represents all input lines, and the expression after | sees each line as ., iteratively.
The output object can be specified using JavaScript syntax, which simplifies matters because the property names don't require quoting; the expanded value of { ... } is converted to JSON on output.
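To run the same filter on the exported file directly instead of a here-document (names.csv is a hypothetical filename; if the Excel export uses CRLF line endings, strip the \r characters first, e.g. with tr -d '\r'):
tr -d '\r' < names.csv | jq -Rnc 'inputs | { operator: "EQ", property: "first_name", value: . }'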
Perl:
perl -MJSON -nlE 'push @p,{operator=>"EQ",property=>"first_name",value=>$_}}{say JSON->new->pretty->encode(\@p)' file
output is valid, pretty-printed JSON:
[
   {
      "operator" : "EQ",
      "property" : "first_name",
      "value" : "John"
   },
   {
      "operator" : "EQ",
      "value" : "Adam",
      "property" : "first_name"
   },
   {
      "operator" : "EQ",
      "property" : "first_name",
      "value" : "Cameron"
   },
   {
      "property" : "first_name",
      "value" : "Jordan",
      "operator" : "EQ"
   }
]
More readable:
perl -MJSON -nlE '
    push @p, { operator=>"EQ", property=>"first_name", value=>$_ };
    END {
        say JSON->new->pretty->encode(\@p)
    }' file
If you're generating JSON, a final note: single quotes are not allowed in JSON; strings must use double quotes.

Bash (or alternative) to find and replace a number of patterns in csv file using another csv file

I have a very large csv file that is too big to open in Excel for this operation.
I need to replace a specific string for approximately 6000 records out of the 1.5 million in the csv. The string itself is in comma-separated format, like so:
ABC,FOO.BAR,123456
With other columns on either side that are of no concern; I only need enough of the row to make sure the final field (the numbers) is unique.
I have another file with the string to replace and the replacement string like (for the above):
"ABC,FOO.BAR,123456","ABC,FOO.BAR,654321"
So in the case above, 123456 is being replaced by 654321. A simple (yet maddeningly slow) way to do this is to open both docs in Notepad++, find the first string and replace it with the second string, but with over 6000 records this isn't great.
I was hoping someone could give advice on a scripting solution? e.g.:
$file1 = base.csv
$file2 = replace.csv
For each row in $file2 {
awk '{sub(/$file2($firstcolumn)/,$file2($Secondcolumn)' $file1
}
Though I'm not entirely sure how to adapt awk to do an operation like this...
EDIT: Sorry, I should have been more specific: the data in my replacement csv is only two columns; two raw strings!
It would be easier, of course, if your delimiter were not used within the fields...
You can do this in two steps: create a sed script from the lookup file, then use it on the main data file for the replacements.
For example (this assumes there are no escaped quotes in the fields, which may not hold):
$ awk -F'","' '{print "s/" $1 "\"/\"" $2 "/"}' lookup_file > replace.sed
$ sed -f replace.sed data_file
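With the sample lookup line above, the generated replace.sed contains a command like:
s/"ABC,FOO.BAR,123456"/"ABC,FOO.BAR,654321"/
Note that this expects the strings to appear quoted in the data file as well; if they are unquoted there, drop the quote handling when building the script.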
awk -F\" '
NR==FNR { subst[$2]=$4; next }
{
for (s in subst) {
pos = index($0, s)
if (pos) {
$0 = substr($0, 1, pos-1) subst[s] substr($0, pos + length(s))
break
}
}
print
}
' "$file2" "$file1" # > "$file1.$$.tmp" && mv "$file1.$$.tmp" "$file1"
The part after the # shows how you could replace the input data file with the output.
The block associated with NR==FNR is only executed for the first input file, the one with the search and replacement strings.
subst[$2]=$4 builds an associative array (dictionary): the key is the search string, the value the replacement string.
Fields $2 and $4 are the search string and the replacement string, respectively, because Awk was instructed to break the input into fields by " (-F\"); note that this assumes that your strings do not contain escaped embedded " chars.
The remaining block then processes the data file:
For each input line, it loops over the search strings and looks for a match on the current line:
Once a match is found, the replacement string is substituted for the search string, and matching stops.
print simply prints the (possibly modified) line.
Note that since you want literal string replacements, regex-based functions such as sub() are explicitly avoided in favor of literal string-processing functions index() and substr().
As an aside: since you say there are columns on either side in the data file, consider making the search/replacement strings more robust by placing , on either side of them (this could be done inside the awk script).
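A minimal (untested) way to do that inside the awk script would be to build both the keys and the replacement values with the surrounding commas included, e.g. by changing the first block to:
NR==FNR { subst["," $2 ","] = "," $4 ","; next }
This keeps index() purely literal, but it will miss matches at the very start or end of a line.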
I would recommend using a language with a CSV parsing library rather than trying to do this with shell tools. For example, Ruby:
require 'csv'
replacements = CSV.open('replace.csv','r').to_h
File.open('base.csv', 'r').each_line do |line|
  replacements.each do |old, new|
    line.gsub!(old) { new }
  end
  puts line
end
Note that Enumerable#to_h requires Ruby v2.1+; replace with this for older Rubys:
replacements = Hash[*CSV.open('replace.csv','r').to_a.flatten]
You only really need CSV for the replacements file; this assumes you can apply the substitutions to the other file as plain text, which speeds things up a bit and avoids having to parse the old/new strings out into fields themselves.
