Bash script key/value pair regardless of bash version - linux

I am writing a curl bash script to test web services. I will have a file, file_1, which contains the URL paths:
/path/to/url/1/{dynamic_path}.xml
/path/to/url/2/list.xml?{query_param}
Since the values in between {} are dynamic, I am creating a separate file which will hold the values for these params. The input would be in key-value pairs, i.e.,
dynamic_path=123
query_param=shipment
By combining the two files, the result should become
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
That is the background of my problem. Now my questions:
I am doing it in a bash script, and my approach is to first read the file with the parameters, parse each line on '=' and store the result as key/value pairs. That way the replacement is easy: for each URL I find the substring between {} and, whatever text it contains, use it as the key to fetch the value from the array.
My approach sounds okay (at least to me), BUT I just realized that
declare -A input_map is only supported in bash 4.0 and higher. Now, I am not 100% sure what the target environment for my script will be, since it could run in multiple departments.
Is there anything better you could suggest? Any other approach? Any other design?
P.S.: This is the first time I am working on a bash script.

Here's a risky way to do it, assuming the values are in a file named "values":
. values
eval "$( sed 's/^/echo "/; s/{/${/; s/$/"/' file_1 )"
Basically, stick a dollar sign in front of the braces and transform each line into an echo statement.
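To make the mechanism concrete: the sed pass just turns each URL line into an echo command with parameter expansions, which eval then runs after the values file has been sourced. On the sample file_1, the generated commands would look like this:
echo "/path/to/url/1/${dynamic_path}.xml"
echo "/path/to/url/2/list.xml?${query_param}"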
More effort, with awk:
awk '
NR==FNR {split($0, a, /=/); v[a[1]]=a[2]; next}
(i=index($0, "{")) && (j=index($0, "}")) {
    key = substr($0, i+1, j-i-1)
    print substr($0, 1, i-1) v[key] substr($0, j+1)
}
' values file_1

There are many ways to do this. You seem to be thinking of putting all inputs in a hashmap and then iterating over that hashmap. In shell scripting it's more common and practical to process things as a stream using pipelines.
For example, your inputs could be in a csv file:
123,shipment
345,order
Then you could process this file like this:
while IFS=, read path param; do
    sed -e "s/{dynamic_path}/$path/" -e "s/{query_param}/$param/" file_1
done < input.csv
The output will be:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
/path/to/url/1/345.xml
/path/to/url/2/list.xml?order
But this is just an example; there are many other ways.
You should definitely start by writing a proof of concept and test it on your deployment server. This example should work in old versions of bash too.

Related

How can I make the lines in a file variables?

I am using a Unix-based program. I want to automate the process so that I don't have to copy and paste the data one by one. For this, I need to feed the line-by-line data in a file to the code as a variable.
The program converts xyz coordinates to local coordinates. How can I run the coordinates in the xyz_coordinates file I created, one by one, through the code below? In the program I use, the conversion code works like this:
echo 4208830.039709186 2334850.551667509 4171267.377406844 -6.753E-01 4.493E-01 2.849E-01 | xyz2env.py
and this is the file I am trying to run:
2679689.926729193 -727950.9964290063 5722789.538975053 7.873E-02 3.466E-01 6.410E-01
2679689.927123377 -727950.9971557076 5722789.540522 7.912E-02 3.458E-01 6.425E-01
2679689.930567728 -727950.9979971027 5722789.550832021 8.257E-02 3.450E-01 6.528E-01
2679689.931029495 -727950.9992263148 5722789.549927638 8.303E-02 3.438E-01 6.519E-01
2679689.929031829 -727950.9981009626 5722789.546359798 8.103E-02 3.449E-01 6.484E-01
........
It goes on like this. Also, there are blank lines between the lines. Will this be a problem?
You can use xargs to invoke the command with a specific number of arguments at a time (6 in your case), with the advantage that empty lines are skipped automatically:
< file.txt xargs -n 6 xyz2env.py
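If xyz2env.py actually reads the six values from stdin (as the echo pipeline above suggests) rather than from its argument list, a plain loop would do the same job; a sketch, assuming the data is in xyz_coordinates:
while read -r line; do
    # assumes xyz2env.py reads the values from stdin; blank lines are skipped
    [ -n "$line" ] && echo "$line" | xyz2env.py
done < xyz_coordinates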

Best way to identify similar text inside strings?

I have a list of phrases; actually it's an Excel file, but I can extract each single line if needed.
I need to find the lines that are quite similar. For example, one line can be:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°)
and some lines later I can have the same line, or this one:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA200 (temp.max60°)
As you can see, these two lines are pretty much the same: not equal in this case, but about 98% similar.
The main problem is that I have to process about 45k lines; for this reason I'm searching for a way to do that quickly and maybe visually.
The first thing that came to my mind was to compare the 1st line to the 2nd, then the 3rd, and so on till the end, and likewise with the 2nd and 3rd lines till the last-but-one, making a kind of score: for example, the 1st line is 100% similar to line 42, 99% to line 522 ... 21% to line 22142, etc.
But that is only one idea, maybe not the best one.
Maybe there's already a good program/script/online service out there; I searched but couldn't find one, so in the end I asked here.
Does anyone know a good way (if this is possible), a script, or an online service to achieve this?
One thing you can do is write a script which does the following:
Extract the data from the CSV file.
Define a regex which can capture the similarity; a Python example can be:
[\w\s]+\([\w]+\)[\w\s]+\([\w°]+\)
Or something similar; refer to the documentation.
The problem you have is that you are not looking for an exact match, but for a "like".
This is a problem that even databases have never solved, and it results in a full table scan.
So we're unlikely to solve it outright.
However, I'd like to propose that you consider alternatives:
You could decide to limit the differences to specific character sets.
In the above example, you ignored numbers but respected letters.
If we can assume that this rule will always hold true, then we can perform a text replace on the string.
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°) ==> ANTIBRATING SSPIRING JOINT (type _) mod. GA_ (temp.max_°)
Now we can deal with this problem by performing an exact string comparison. This can be done by hashing: the easiest way is to feed a hashmap/hashset, or a database with a hash index on the column where you will store this adjusted text.
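As a rough illustration of that idea (a sketch only, assuming the phrases sit one per line in a plain-text file exported from the Excel sheet, here called phrases.txt): collapse every run of digits into a placeholder, then group the lines that become identical.
# phrases.txt is a placeholder name; collapse digits, then count identical adjusted lines, most frequent first
sed 's/[0-9]\+/_/g' phrases.txt | sort | uniq -c | sort -rn | head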
You could decide to trade time for space.
For example, you can feed the strings to a service which will build lots of different variations of indexes on your string. For example, feed elasticsearch with your data, and then perform analytic queries on it.
Fuzzy search is the key.
I found several projects and ideas, but the one I used is tre-agrep. I know it is quite old, but in this case it works for me. I created this little script to help me build a list of differences, so I can check it manually against my file:
#!/bin/bash
########## CONFIGURATIONS ##########
original_file=/path/jjj.txt
t_agrep_bin="$(command -v tre-agrep)"
destination_file=/path/destination_file.txt
distance=1
########## CONFIGURATIONS ##########
lines=$(grep "" -c "$original_file")
if [[ -s "$destination_file" ]]; then
    rm -f "$destination_file"
fi
start=1
while IFS= read -r line; do
    echo "Checking line $start/$lines"
    lista=$("$t_agrep_bin" -$distance -B --colour -s -n -i "$line" "$original_file")
    echo "$lista" | awk -F ':' '{print $1}' ORS=' ' >> "$destination_file"
    echo >> "$destination_file"
    start=$((start+1))
done < "$original_file"

Split single record into Multiple records in Unix shell Script

I have a record, for example:
EMP_ID|EMP_NAME|AGE|SALARAy
123456|XXXXXXXXX|30|10000000
Is there a way I can split the record into multiple records? The example output should look like:
EMP_ID|Attributes
123456|XXXXXXX
123456|30
123456|10000000
I want to split the same record into multiple records. Here the employee ID is my unique column, and I want to loop over the remaining 3 columns and create 3 records, like EMP_ID|EMP_NAME, EMP_ID|AGE, EMP_ID|SALARY. I may have some more columns as well, but for the sample I have provided 3 columns along with the employee ID.
Please help me with any suggestions.
With bash:
record='123456|XXXXXXXXX|30|10000000'
IFS='|' read -ra fields <<<"$record"
for ((i=1; i < "${#fields[@]}"; i++)); do
    printf "%s|%s\n" "${fields[0]}" "${fields[i]}"
done
123456|XXXXXXXXX
123456|30
123456|10000000
For the whole file:
{
    IFS= read -r header
    while IFS='|' read -ra fields; do
        for ((i=1; i < "${#fields[@]}"; i++)); do
            printf "%s|%s\n" "${fields[0]}" "${fields[i]}"
        done
    done
} < filename
Records with fields separated by a special delimiter character such as | can be manipulated with basic Unix command line tools such as awk. For example, with your input records in the file records.txt:
awk -F\| 'NR>1{for(i=2;i<=NF;i++){print $1"|"$(i)}}' records.txt
I recommend reading an awk tutorial and playing around with it. Related command line tools worth learning include grep, sort, wc, uniq, head, tail, and cut. If you regularly process delimiter-separated files, you will likely need them on a daily basis. As soon as your data format gets more complex (e.g. CSV with the possibility of the delimiter character also appearing in field values), you need more specific tools; for instance, see this question on CSV tools, or jq for processing JSON. Still, knowledge of basic Unix command line tools will save you a lot of time.

Pipe output to bash function

I have a simple function in a bash script and I would like to pipe stdout to it as input.
jc_hms(){
printf "$1"
}
I'd like to use it in this manner.
var=`echo "teststring" | jc_hms`
Of course I used redundant functions echo and printf to simplify the question, but you get the idea. Right now I get a "not found" error, which I assume means my parameter delimiting is wrong (the "$1" part). Any suggestions?
Originally the jc_hms function was used like this:
echo `jc_hms "teststring"` > //dev/tts/0
but I'd like to store the results in a variable for further processing first, before sending it to the serial port.
EDIT:
So to clarify, I am NOT trying to print stuff to the serial port; I'd like to interface with my bash functions using the "|" pipe character, and I am wondering if this is possible.
EDIT: Alright, here's the full function.
jc_hms(){
hr=$(($1 / 3600))
min=$(($1 / 60))
sec=$(($1 % 60))
printf "$hs:%02d:%02d" $min $sec
}
I'm using the function to form a string which comes from this line of code:
songplaytime=`echo $songtime | awk '{print S1 }'`
printstring="`jc_hms $songplaytime`" #store resulting string in printstring
Where $songtime is a string expressed as "playtime totaltime" delimited by a space.
I wish I could just do this in one line, and pipe it after the awk:
printstring=`echo $songtime | awk '{print S1 }' | jc_hms`
like so.
To answer your actual question: when a shell function is on the receiving end of a pipe, standard input is inherited by all commands in the function, but only commands that actually read from their standard input consume any data. For commands that run one after the other, later commands can only see what wasn't consumed by previous commands. When two commands run in parallel, which commands see which data depends on how the OS schedules them.
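A tiny demonstration of that inheritance (the function name and input are made up purely for illustration): each read consumes one line of the piped input, and the next read only sees what is left.
consume_twice() {    # hypothetical helper, not from the question
    read -r first     # consumes the first line of the piped input
    read -r second    # sees only what the first read left over
    echo "first=$first second=$second"
}
printf 'a\nb\n' | consume_twice    # prints: first=a second=b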
Since printf is the first and only command in your function, standard input is effectively ignored. There are several ways around that, including using the read built-in to read standard input into a variable which can be passed to printf:
jc_hms () {
    read foo
    hr=$(($foo / 3600))
    min=$(($foo % 3600 / 60))
    sec=$(($foo % 60))
    printf "%d:%02d:%02d" "$hr" "$min" "$sec"
}
However, since your need for a pipeline seems to depend on your perceived need to use awk, let me suggest the following alternative:
printstring=$( jc_hms $songtime )
Since songtime consists of a space-separated pair of numbers, the shell performs word-splitting on the value of songtime, and jc_hms sees two separate parameters. This requires no change in the definition of jc_hms, and no need to pipe anything into it via standard input.
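To illustrate the word-splitting (the numbers below are made-up sample values, not taken from the question):
songtime="125 3600"                  # hypothetical "playtime totaltime" pair
printstring=$( jc_hms $songtime )    # unquoted, so jc_hms sees $1=125 and $2=3600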
If you still have a different reason for jc_hms to read standard input, please let us know.
You can't pick up piped input through the positional parameters like that; however, you can use read to pull it in instead:
jc_hms() {
    while read -r data; do
        printf "%s" "$data"
    done
}
should be what you want.
1) I know this is a pretty old post
2) I like most of the answers here
However, I found this post because I needed to do something similar. While everyone agrees stdin is what needs to be used, what the answers here are missing is the actual use of the /dev/stdin file.
Using the read builtin forces this function to be used with piped input, so it can no longer be used in the typical way. I think utilizing /dev/stdin is a superior way of solving this problem, so I wanted to add my 2 cents for completeness.
My solution:
jc_hms() {
    declare -i i=${1:-$(</dev/stdin)};
    declare hr=$(($i/3600)) min=$(($i/60%60)) sec=$(($i%60));
    printf "%02d:%02d:%02d\n" $hr $min $sec;
}
In action:
user@hostname:pwd$ jc_hms 7800
02:10:00
user@hostname:pwd$ echo 7800 | jc_hms
02:10:00
I hope this may help someone.
Happy hacking!
Or, you can also do it in a simple way.
jc_hms() {
cat
}
Though all the answers so far have disregarded the fact that this was not what the OP wanted (he stated the function is simplified).
I like user.friendly's answer using the Bash built-in conditional unset substitution syntax.
Here's a slight tweak to make his answer more generic, such as for cases with an indeterminate parameter count:
function myfunc() {
    declare MY_INPUT=${*:-$(</dev/stdin)}
    for PARAM in $MY_INPUT; do
        : # do what needs to be done on each input value
    done
}
Hmmmm....
songplaytime=`echo $songtime | awk '{print S1 }'`
printstring="`jc_hms $songplaytime`" #store resulting string in printstring
If you're calling awk anyway, why not use it?
printstring=`TZ=UTC gawk -vT=$songplaytime 'BEGIN{print strftime("%T",T)}'`
I'm assuming you're using GNU Awk (gawk), which is the best one and also free; this will work on common Linux distros which aren't necessarily running the most recent gawk. The most recent versions of gawk let you specify UTC as a third parameter to the strftime() function.
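With a gawk recent enough to support that utc-flag argument, the TZ=UTC prefix could likely be dropped; a sketch, reusing the 7800-second example from above:
printstring=$(gawk -v T=7800 'BEGIN{print strftime("%T", T, 1)}')    # prints 02:10:00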
The proposed solutions either require content on stdin or require read to be called only conditionally; otherwise the function will wait for input from the console and require an Enter or Ctrl+D before continuing.
A workaround is to use read with a timeout, e.g. read -t <seconds>.
function test ()
{
    # ...
    # process any parameters
    # ...
    read -t 0.001 piped
    if [[ "${piped:-}" ]]; then
        echo $piped
    fi
}
Note, -t 0 did not work for me.
You might have to use a different value for the timeout. Too small a value might result in bugs, and too large a timeout delays the script.
It seems nothing works directly, but there are workarounds.
One workaround already mentioned is to use xargs and reference the function:
$ FUNCS=$(functions hi); seq 3 | xargs -I{} zsh -c "eval $FUNCS; hi {}"
But this doesn't quite work either, because your function could reference another function. So I ended up writing functions that accept piped input, like this:
somefunc() {
    while read -r data; do
        printf "%s" "$data"
    done
}

Using Awk to process a file where each record has different fixed-width fields

I have some data files from a legacy system that I would like to process using Awk. Each file consists of a list of records. There are several different record types, and each record type has a different set of fixed-width fields (there is no field separator character). The first two characters of the record indicate the type; from this you then know which fields should follow. A file might look something like this:
AAField1Field2LongerField3
BBField4Field5Field6VeryVeryLongField7Field8
CCField99
Using Gawk I can set the FIELDWIDTHS, but that applies to the whole file (unless I am missing some way of setting this on a record-by-record basis), or I can set FS to "" and process the file one character at a time, but that's a bit cumbersome.
Is there a good way to extract the fields from such a file using Awk?
Edit: Yes, I could use Perl (or something else). I'm still keen to know whether there is a sensible way of doing it with Awk though.
Hopefully this will lead you in the right direction. Assuming your multi-line records are guaranteed to be terminated by a 'CC' type row, you can pre-process your text file using simple if-then logic. I have presumed you require fields 1, 5 and 7 on one row; a sample awk script would be:
BEGIN {
    field1=""
    field5=""
    field7=""
}
{
    record_type = substr($0,1,2)
    if (record_type == "AA")
    {
        field1 = substr($0,3,6)
    }
    else if (record_type == "BB")
    {
        field5 = substr($0,9,6)
        field7 = substr($0,21,18)
    }
    else if (record_type == "CC")
    {
        print field1 "|" field5 "|" field7
    }
}
Create an awk script file called program.awk and pop that code into it. Execute the script using:
awk -f program.awk < my_multi_line_file.txt
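With the sample file from the question, this should print a single combined line along these lines:
Field1|Field5|VeryVeryLongField7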
Maybe you can use two passes:
1step.awk
/^AA/{printf "2 6 6 12" }
/^BB/{printf "2 6 6 6 18 6"}
/^CC/{printf "2 8" }
{printf "\n%s\n", $0}
2step.awk
NR%2 == 1 {FIELDWIDTHS=$0}
NR%2 == 0 {print $2}
And then
awk -f 1step.awk sample | awk -f 2step.awk
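For what it's worth, the intermediate stream produced by 1step.awk on the sample file would look roughly like this, with each record preceded by the FIELDWIDTHS line that 2step.awk then picks up:
2 6 6 12
AAField1Field2LongerField3
2 6 6 6 18 6
BBField4Field5Field6VeryVeryLongField7Field8
2 8
CCField99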
You probably need to suppress (or at least ignore) awk's built-in field separation code, and use a program along the lines of:
awk '/^AA/ { manually process record AA out of $0 }
/^BB/ { manually process record BB out of $0 }
/^CC/ { manually process record CC out of $0 }' file ...
The manual processing will be a bit fiddly - I suppose you'll need to use the substr function to extract each field by position, so what I've got as one line per record type will be more like one line per field in each record type, plus the follow-on printing.
I do think you might be better off with Perl and its unpack feature, but awk can handle it too, albeit verbosely.
Could you use Perl and then select an unpack template based on the first two chars of the line?
Better to use a fully featured scripting language like Perl or Ruby.
What about 2 scripts? E.g. the 1st script inserts field separators based on the first characters, and then the 2nd processes it.
Or, first of all, define some function in your awk script which splits the lines into variables based on the input. I would go this way, for the sake of re-usability; a rough sketch follows below.
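A minimal sketch of that idea (the helper name split_fixed and the chosen widths and fields are illustrative assumptions, reusing the widths from the answers above); save it as program.awk and run it with awk -f program.awk my_multi_line_file.txt:
# hypothetical helper: split a record into out[1..n] according to a space-separated list of widths
function split_fixed(rec, widths, out,    n, w, i, pos) {
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out[i] = substr(rec, pos, w[i])
        pos += w[i]
    }
    return n
}
/^AA/ { split_fixed($0, "2 6 6 12", f);     print f[2] }
/^BB/ { split_fixed($0, "2 6 6 6 18 6", f); print f[5] }
/^CC/ { split_fixed($0, "2 8", f);          print f[2] }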
