how can i make the lines variable in a file? - linux

I am using a unix based program. I want to automate the code so as not to copy and paste data one by one. For this, I need to define line-by-line data in a file as a variable for the code
The program converts xyz coordinates to local coordinates. How can I run the coordinates in the xyz_coordinates file I created, one by one, in the code below? In the program I use, the conversion code works like this:
echo 4208830.039709186 2334850.551667509 4171267.377406844 -6.753E-01 4.493E-01 2.849E-01 | xyz2env.py
and this is the file i am trying to run:
2679689.926729193 -727950.9964290063 5722789.538975053 7.873E-02 3.466E-01 6.410E-01
2679689.927123377 -727950.9971557076 5722789.540522 7.912E-02 3.458E-01 6.425E-01
2679689.930567728 -727950.9979971027 5722789.550832021 8.257E-02 3.450E-01 6.528E-01
2679689.931029495 -727950.9992263148 5722789.549927638 8.303E-02 3.438E-01 6.519E-01
2679689.929031829 -727950.9981009626 5722789.546359798 8.103E-02 3.449E-01 6.484E-01
........
and it goes on like this. Also, there are spaces between the lines. Will this be a problem?

You can use xargs to invoke the command for a specific number of arguments (6 in your case) and have the advantage of skipping empty lines automatically
< file.txt xargs -n 6 xyz2env.py

Related

Best way to identify similar text inside strings?

I've a list of phrases, actually it's an Excel file, but I can extract each single line if needed.
I need to find the line that is quite similar, for example one line can be:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°)
and some line after I can have the same line or this one:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA200 (temp.max60°)
Like you can see these two lines are pretty the same, not equal in this case but at 98%
The main problem is that I've to process about 45k lines, for this reason I'm searching a way to do that in a quick and maybe visual way.
The first thing that came in my mind was to compare the very 1st line to the 2nd then the 3rd till the end, and so on with the 2nd one and the 3rd one till latest-1 and make a kind of score, for example the 1st line is 100% with line 42, 99% with line 522 ... 21% with line 22142 etc etc...
But is only one idea, maybe not the best.
Maybe out there's already a good program/script/online services/program, I searched but I can't find it, so at the end I asked here.
Anyone knows a good way (if this is possible) or script or one online services to achieve this?
One thing you can do is write a script, which does as follows:
Extract data from csv file
Define a regex which can conclude a similarity, a python example can be:
[\w\s]+\([\w]+\)[\w\s]+\([\w°]+\)
Or such, refer the documentation.
The problem you have is that you are not looking for an exact match, but a like.
This is a problem even databases have never solved and results in a full table scan.
So we're unlikely to solve it.
However, I'd like to propose that you consider alternatives:
You could decide to limit the differences to specific character sets.
In the above example, you were ignoring numbers, but respected letters.
If we can assume that this rule will always hold true, then we can perform a text replace on the string.
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°) ==> ANTIBRATING SSPIRING JOINT (type _) mod. GA_ (temp.max_°)
Now, we can deal with this problem by performing an exact string comparison. This can be done by hashing. The easiest way is to feed a hashmap/hashset or a database with a hash index on the column where you will store this adjusted text.
You could decide to trade time for space.
For example, you can feed the strings to a service which will build lots of different variations of indexes on your string. For example, feed elasticsearch with your data, and then perform analytic queries on it.
Fuzzy searches is the key.
I found several projects and ideas, but the one I used is tree-agrep, I know that is quite old but in this case works for me, I created this little script to help me to create a list of differences, so I can manually check it with my file
#!/bin/bash
########## CONFIGURATIONS ##########
original_file=/path/jjj.txt
t_agrep_bin="$(command -v tre-agrep)"
destination_file=/path/destination_file.txt
distance=1
########## CONFIGURATIONS ##########
lines=$(grep "" -c "$original_file")
if [[ -s "$destination_file" ]]; then
rm -rf "$destination_file"
fi
start=1
while IFS= read -r line; do
echo "Checking line $start/$lines"
lista=$($t_agrep_bin -$distance -B --colour -s -n -i "$line" $original_file)
echo "$lista" | awk -F ':' '{print $1}' ORS=' ' >> "$destination_file"
echo >> "$destination_file"
start=$((start+1))
done < "$original_file"

How to block-sort rows of text file in bash?

I have a bar-separated text file that looks like this:
stringC|rest-of-lineC3
stringC|rest-of-lineC1
stringC|rest-of-lineC2
stringA|rest-of-lineA2
stringA|rest-of-lineA1
stringB|rest-of-lineB4
stringB|rest-of-lineB1
stringB|rest-of-lineB3
stringB|rest-of-lineB2
I need to block-sort it by the string before the | without doing secondary sort of what comes after the bar. So the example below SHOULD be sorted into:
stringA|rest-of-lineA2
stringA|rest-of-lineA1
stringB|rest-of-lineB4
stringB|rest-of-lineB1
stringB|rest-of-lineB3
stringB|rest-of-lineB2
stringC|rest-of-lineC3
stringC|rest-of-lineC1
stringC|rest-of-lineC2
but NOT into:
stringA|rest-of-lineA1
stringA|rest-of-lineA2
stringB|rest-of-lineB1
...
Is there a way of doing this in a bash script using sort or any other commands?
How can I get the same result as above in case the original file is not in blocks, i.e. it looks something like this:
stringC|rest-of-lineC3
stringA|rest-of-lineA2
stringC|rest-of-lineC1
stringA|rest-of-lineA1
stringB|rest-of-lineB4
stringB|rest-of-lineB1
stringC|rest-of-lineC2
stringB|rest-of-lineB3
stringB|rest-of-lineB2
sort with -s option for stable
sort -st'|' -k1,1 file
initial blocks doesn't matter, should work for both.

Bash script key/value pair regardless of bash version

I am writing a curl bash script to test webservices. I will have file_1 which would contain the URL paths
/path/to/url/1/{dynamic_path}.xml
/path/to/url/2/list.xml?{query_param}
Since the values in between {} is dynamic, I am creating a separate file, which will have values for these params. the input would be in key-value pair i.e.,
dynamic_path=123
query_param=shipment
By combining two files, the input should become
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
This is the background of my problem. Now my questions
I am doing it in bash script, and the approach I am using is first reading the file with parameters and parse it based on '=' and store it in key/value pair. so it will be easy to replace i.e., for each url I will find the substring between {} and whatever the text it comes with, I will use it as the key to fetch the value from the array
My approach sounds okay (at least to me) BUT, I just realized that
declare -A input_map is only supported in bashscript higher than 4.0. Now, I am not 100% sure what would be the target environment for my script, since it could run in multiple department.
Is there anything better you could suggest ? Any other approach ? Any other design ?
P.S:
This is the first time i am working on bash script.
Here's a risky way to do it: Assuming the values are in a file named "values"
. values
eval "$( sed 's/^/echo "/; s/{/${/; s/$/"/' file_1 )"
Basically, stick a dollar sign in front of the braces and transform each line into an echo statement.
More effort, with awk:
awk '
NR==FNR {split($0, a, /=/); v[a[1]]=a[2]; next}
(i=index($0, "{")) && (j=index($0,"}")) {
key=substr($0,i+1, j-i-1)
print substr($0, 1, i-1) v[key] substr($0, j+1)
}
' values file_1
There are many ways to do this. You seem to think of putting all inputs in a hashmap, and then iterate over that hashmap. In shell scripting it's more common and practical to process things as a stream using pipelines.
For example, your inputs could be in a csv file:
123,shipment
345,order
Then you could process this file like this:
while IFS=, read path param; do
sed -e "s/{dynamic_path}/$path/" -e "s/{query_param}/$param/" file_1
done < input.csv
The output will be:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
/path/to/url/1/345.xml
/path/to/url/2/list.xml?order
But this is just an example, there can be so many other ways.
You should definitely start by writing a proof of concept and test it on your deployment server. This example should work in old versions of bash too.

Passing data into perl script from command line

I have a perl script the creates a report based on an xml definition. Currently these definitions all exist as .xml files.
So I have the script run-report.pl, which can take a path to a definition file and create the report.
Now I want to create run-reports-from-db.pl, which will generate the report definition based on same database entries. I don't want to create temp files to pass to run-report.pl, I would just like to pass in the definition somehow.
So instead of saying:
run-report.pl -def=./path/to/def.xml
I want to be able to say:
run-report.pl --stream
And have the report definition available in <STDIN>
I am sure there is pretty trivial way to do this???
If I understand your question correctly, all you need is one | (pipe).
./generate-xml-from-db.pl | ./run-report.pl --stream
Anything the first process in the pipeline prints to stdout will appear in the second process's stdin.
As long as you read from STDIN, you have it available. Notice what happens with you take the code below name it something like echo.pl run it at the command line and paste reams of text.
#!/usr/bin/perl -w
use 5.010;
use strict;
use warnings;
while ( <> ) {
say;
}
<> is the Perl shorthand for "read from STDIN".
As long as the method you're using to launch the process has a way to get a hold of the standard input and outputs, you can just write it to that handle. You have to use the ways that are available to you. In Java, for example, you'd have to get the input stream of the process, in a batch command you have to pipe it. At a GUI terminal you can cut and paste.

Filename manipulation in cygwin

I am running cygwin on Windows 7. I am using a signal processing tool and basically performing alignments. I had about 1200 input files. Each file is of the format given below.
input_file_ format = "AC_XXXXXX.abc"
The first step required building some kind of indexes for all the input files, this was done with the tool's build-index command and now each file had 6 indexes associated with it. Therefore now I have about 1200*6 = 7200 index files. The indexes are of the form given below.
indexes_format = "AC_XXXXXX.abc.1",
"AC_XXXXXX.abc.2",
"AC_XXXXXX.abc.3",
"AC_XXXXXX.abc.4",
"AC_XXXXXX.abc.rev.1",
"AC_XXXXXX.abc.rev.1"
Now, I need to use these indexes to perform the alignment. All the 6 indexes of each file are called together and the final operation is done as follows.
signal-processing-tool ..\path-to-indexes\AC_XXXXXX.abc ..\Query file
Where AC_XXXXXX.abc is the index associated with that particular index file. All 6 index files are called with **AC_XXXXXX.abc*.
My problem is that I need to use only the first 14 characters of the index file names for the final operation.
When I use the code below, the alignment is not executed.
for file in indexes/*; do ./tool $file|cut -b1-14 Project/query_file; done
I'd appreciate help with this!
First of all, keep in mind that $file will always start with "indexes/", so trimming first 14 characters would always include that folder name in the beginning.
To use first 14 characters in a variable, use ${file:0:14}, where 0 is the starting string index, and 14 is the length of the desired substring.
Alternatively, if you want to use cut, you need to run it in a subshell: for file in indexes/*; do ./tool $(echo $file|cut -c 1-14) Project/query_file; done I changed the arg for cut to -c for characters instead of bytes

Resources