Create multiple files based on a pattern - linux

I am interested in creating multiple files based on a simple pattern.
a.kf
b.kf
c.kf
...
fss1.lsk
fss2.lsk
fss3.lsk
...
Of course I could use a loop, but I would prefer a more elegant solution. Perhaps the tee command could help, but I've had difficulty implementing that idea.

Use brace expansion:
$ touch {a..c}.kf fss{1..3}.lsk
$ ls
a.kf b.kf c.kf fss1.lsk fss2.lsk fss3.lsk
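Brace expansions can also be combined in a single word, so more elaborate layouts still need no loop. The file names below are made up for illustration:
$ touch img_{1..3}.{png,jpg}
$ ls
img_1.jpg img_1.png img_2.jpg img_2.png img_3.jpg img_3.png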

Related

What is the logic behind Unix's sort command?

So I'm sorting a large file that has many lines of the form:
<node_number>,<value>
The Unix command I'm using is very simple:
sort <file_name>
Here's a part of the output:
....
100009,0.000000
1000090,0.000000
100009,0.050510
1000093,0.000000
1000095,0.000000
1000095,0.000000
....
But why is:
1000090,0.000000
coming before:
100009,0.050510
even though:
100009,0.000000
was before it?
Shouldn't the order be:
....
100009,0.000000
100009,0.050510
1000090,0.000000
....
Why is this not the order in the output file?
This weird ordering appears in many other parts of the file.
Important note: I know there are options for the sort command that solve this issue. My intention isn't to find those options; I just want to understand the logic behind the current sort output.
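For what it's worth: by default, sort compares lines using the locale's collation rules (LC_COLLATE), and in most locales punctuation such as the comma carries less weight than the digits, which is why 1000090,... can land between two 100009,... lines. Forcing the traditional byte-by-byte comparison produces the order you expected; file_name stands for the file from the question:
$ LC_ALL=C sort file_name
....
100009,0.000000
100009,0.050510
1000090,0.000000
....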

Best way to identify similar text inside strings?

I have a list of phrases; actually it's an Excel file, but I can extract each single line if needed.
I need to find lines that are nearly identical. For example, one line can be:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°)
and some lines later I can have the same line, or this one:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA200 (temp.max60°)
As you can see, these two lines are pretty much the same; not equal in this case, but about 98% similar.
The main problem is that I have to process about 45k lines, so I'm looking for a quick and maybe visual way to do this.
My first idea was to compare the 1st line to the 2nd, then the 3rd, and so on to the end, then do the same starting from the 2nd line, and so on, producing a similarity score for each pair: for example, the 1st line is 100% similar to line 42, 99% to line 522 ... 21% to line 22142, etc.
But that's only one idea, and maybe not the best one.
Maybe there's already a good program/script/online service for this; I searched but couldn't find one, so in the end I'm asking here.
Does anyone know a good way (if this is possible), a script, or an online service to achieve this?
One thing you can do is write a script which does the following:
Extract the data from the csv file.
Define a regex which captures the common shape of the lines; a Python-style example can be:
[\w\s]+\([\w\s]+\)[\w\s.]+\([\w°.]+\)
Or something like that; refer to the regex documentation.
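For a quick first pass you can even run such a pattern from the shell to flag candidate lines; this assumes GNU grep with PCRE support, and phrases.txt is a stand-in for the exported list:
# -P: PCRE syntax, -n: print line numbers with the matches
grep -Pn '[\w\s]+\([\w\s]+\)[\w\s.]+\([\w°.]+\)' phrases.txt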
The problem you have is that you are not looking for an exact match, but for something like a match.
This is a problem that even databases haven't solved efficiently: a similarity search can't use an ordinary index and results in a full table scan.
So we're unlikely to solve it outright.
However, I'd like to propose that you consider alternatives:
You could decide to limit the differences to specific character sets.
In the example above, the numbers differed while the letters stayed the same.
If we can assume this rule will always hold, then we can perform a text replacement on the string:
ANTIBRATING SSPIRING JOINT (type 2) mod. GA160 (temp.max60°) ==> ANTIBRATING SSPIRING JOINT (type _) mod. GA_ (temp.max_°)
Now, we can deal with this problem by performing an exact string comparison. This can be done by hashing. The easiest way is to feed a hashmap/hashset or a database with a hash index on the column where you will store this adjusted text.
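As a minimal shell sketch of that replace-then-compare idea (assuming the phrases live in a plain-text file, here called phrases.txt, and that only the digits may differ):
# Replace every run of digits with "_", then list normalized lines
# that occur more than once, most frequent first
sed 's/[0-9][0-9]*/_/g' phrases.txt | sort | uniq -cd | sort -rn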
You could decide to trade time for space.
For example, you can feed the strings to a service which will build lots of different variations of indexes on your string. For example, feed elasticsearch with your data, and then perform analytic queries on it.
Fuzzy search is the key.
I found several projects and ideas, but the one I used is tre-agrep. I know it's quite old, but in this case it works for me. I created this little script to help me build a list of differences, so I can check it manually against my file:
#!/bin/bash
########## CONFIGURATIONS ##########
original_file=/path/jjj.txt
t_agrep_bin="$(command -v tre-agrep)"
destination_file=/path/destination_file.txt
distance=1
########## CONFIGURATIONS ##########

# Total line count, for progress reporting
lines=$(grep -c "" "$original_file")

# Start from a clean destination file
if [[ -s "$destination_file" ]]; then
    rm -f "$destination_file"
fi

start=1
while IFS= read -r line; do
    echo "Checking line $start/$lines"
    # Line numbers of every line within $distance edits of the current one
    lista=$("$t_agrep_bin" -$distance -B --colour -s -n -i "$line" "$original_file")
    # One space-separated group of matching line numbers per input line
    echo "$lista" | awk -F ':' '{print $1}' ORS=' ' >> "$destination_file"
    echo >> "$destination_file"
    start=$((start + 1))
done < "$original_file"

Bash script key/value pair regardless of bash version

I am writing a curl bash script to test web services. I will have file_1, which will contain the URL paths:
/path/to/url/1/{dynamic_path}.xml
/path/to/url/2/list.xml?{query_param}
Since the values between {} are dynamic, I am creating a separate file which will hold the values for these params. The input would be key-value pairs, i.e.,
dynamic_path=123
query_param=shipment
Combining the two files should produce:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
This is the background of my problem. Now, my questions.
I am doing it in a bash script, and my approach is to first read the file with the parameters, parse it on '=', and store the result as key/value pairs, so replacement is easy: for each URL I find the substring between {} and use that text as the key to fetch the value from the array.
My approach sounds okay (at least to me), BUT I just realized that
declare -A input_map is only supported in bash 4.0 and higher. Now, I am not 100% sure what the target environment for my script will be, since it could run in multiple departments.
Is there anything better you could suggest ? Any other approach ? Any other design ?
P.S.: This is the first time I am working on a bash script.
Here's a risky way to do it. Assuming the values are in a file named "values":
. values
eval "$( sed 's/^/echo "/; s/{/${/; s/$/"/' file_1 )"
Basically, stick a dollar sign in front of the braces and transform each line into an echo statement.
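For example, the first line of file_1 becomes the following command, which eval then runs with the variables from values in scope:
echo "/path/to/url/1/${dynamic_path}.xml"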
More effort, with awk:
awk '
    NR==FNR { split($0, a, /=/); v[a[1]] = a[2]; next }
    (i = index($0, "{")) && (j = index($0, "}")) {
        key = substr($0, i+1, j-i-1)
        print substr($0, 1, i-1) v[key] substr($0, j+1)
    }
' values file_1
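With the values and file_1 shown above, this prints:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment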
There are many ways to do this. You seem to be thinking of putting all the inputs in a hash map and then iterating over it. In shell scripting it's more common and practical to process things as a stream using pipelines.
For example, your inputs could be in a csv file:
123,shipment
345,order
Then you could process this file like this:
while IFS=, read -r path param; do
    sed -e "s/{dynamic_path}/$path/" -e "s/{query_param}/$param/" file_1
done < input.csv
The output will be:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
/path/to/url/1/345.xml
/path/to/url/2/list.xml?order
But this is just an example; there are many other ways.
You should definitely start by writing a proof of concept and testing it on your deployment server. This example should work in old versions of bash too.
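Since the target bash version is unknown, here is a sketch that avoids associative arrays entirely and should run in any POSIX shell. It assumes at most one {placeholder} per line and the values/file_1 files shown above:
# Print the value stored for a key in the values file
lookup() {
    sed -n "s/^$1=//p" values
}

while IFS= read -r url; do
    case $url in
        *'{'*'}'*)
            key=${url#*\{}; key=${key%\}*}    # text between { and }
            echo "${url%%\{*}$(lookup "$key")${url#*\}}"
            ;;
        *)
            echo "$url"
            ;;
    esac
done < file_1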

Using a Chef recipe to append multiple lines to a config file

I'm trying to create a Chef recipe to append multiple lines (20-30) to a specific config file.
I'm aware the recommended pattern is to change entire config files rather than just appending to a file, but I dislike this approach for multiple reasons.
So far the only solution I found was to use a cookbook_file and then use a bash resource to do:
cat lines_to_append >> /path/configfile
Obviously this wouldn't work properly, as it'd append to the file over and over, each time you run chef-client. I'd have to create a small bash script to check for a specific string first and, if not found, append to the file.
But this seems to defeat the purpose of using Chef. There must be a better way.
One promising solution was the line cookbook from OpsCode Community. It aimed to solve this exact problem. Unfortunately the functionality is incomplete, buggy, and the code is just a quick hack. Far from being a solid solution.
Another option I evaluated was augeas. Seems pretty powerful, but it'd add yet-another layer of abstraction to the system. Overkill, in my case.
Given that this is one of the most obvious tasks for any sysadmin, is there any easy and beautiful solution with Chef that I'm not seeing?
EDIT: here's how I'm solving it so far:
cookbook_file "/tmp/parms_to_append.conf" do
  source "parms_to_append.conf"
end

bash "append_to_config" do
  user "root"
  code <<-EOF
    cat /tmp/parms_to_append.conf >> /etc/config
    rm /tmp/parms_to_append.conf
  EOF
  not_if "grep -q MY_IDENTIFIER /etc/config"
end
It works, but not sure this is the recommended Chef pattern.
As you said yourself, the recommended Chef pattern is to manage the whole file.
If you're using Chef 11 you could probably make use of partials for what you're trying to achieve.
There's more info here and on this example cookbook.
As long as you have access to the original config template, just append <%= render "original_config.erb" %> to the top of your parms_to_append.conf template.
As said before, using templates and partials is the common way of doing this, but Chef also allows appending to files, and even changing (editing) file lines. Appending is performed using the following functions:
insert_line_after_match(regex, newline)
insert_line_if_no_match(regex, newline)
You may find an example here on Stack Overflow, and the full documentation on rubydoc.info.
Please use it with caution, and only when partials and templates are not appropriate.
I did something like this:
monit_overwrites/templates/default/monitrc.erb:
#---FLOWDOCK-START
set mail-format { from: monit#ourservice.com }
#---FLOWDOCK-END
In my recipe I did this:
monit_overwrites/recipes/default.rb:
execute "Clean up monitrc from earlier runs" do
  user "root"
  command "sed -i '/#---FLOWDOCK-START/,/#---FLOWDOCK-END/d' /etc/monitrc"
end

template "/tmp/monitrc_append.conf" do
  source "monitrc_append.erb"
end

execute "Setup monit to push notifications into flowdock" do
  user "root"
  command "cat /tmp/monitrc_append.conf >> /etc/monitrc"
end

execute "Remove monitrc_append" do
  command "rm /tmp/monitrc_append.conf"
end
The easiest way to tackle this would be to create a string and pass it to content. Of course bash blocks work... but I think file resources are elegant.
lines = ""
File.open('input file') do |f|
  f.each_line do |line|
    lines += line            # each line already ends with "\n"
  end
end

file "file path" do
  content lines
end
Here is an example ruby_block for inserting two new lines after a match:
ruby_block "insert_lines" do
  block do
    file = Chef::Util::FileEdit.new("/etc/nginx/nginx.conf")
    file.insert_line_after_match("worker_rlimit_nofile", "load_module 1")
    file.insert_line_after_match("pid", "load_module 2")
    file.write_file
  end
end
insert_line_after_match searches for the regex/string and inserts the value after the matching line.

Passing data into perl script from command line

I have a perl script that creates a report based on an xml definition. Currently these definitions all exist as .xml files.
So I have the script run-report.pl, which can take a path to a definition file and create the report.
Now I want to create run-reports-from-db.pl, which will generate the report definition based on some database entries. I don't want to create temp files to pass to run-report.pl; I would just like to pass in the definition somehow.
So instead of saying:
run-report.pl -def=./path/to/def.xml
I want to be able to say:
run-report.pl --stream
And have the report definition available in <STDIN>
I am sure there is a pretty trivial way to do this?
If I understand your question correctly, all you need is one | (pipe).
./generate-xml-from-db.pl | ./run-report.pl --stream
Anything the first process in the pipeline prints to stdout will appear in the second process's stdin.
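If the definition is already sitting in a file or a literal string, shell redirection or a here-document feeds STDIN just as well (the path and XML below are placeholders):
./run-report.pl --stream < ./path/to/def.xml

./run-report.pl --stream <<'XML'
<report>...</report>
XML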
As long as you read from STDIN, you have it available. Notice what happens when you take the code below, name it something like echo.pl, run it at the command line, and paste reams of text:
#!/usr/bin/perl -w
use 5.010;
use strict;
use warnings;

while ( <> ) {
    say;
}
<> is the Perl shorthand for "read from the files named on the command line, or from STDIN if none are given".
As long as the method you're using to launch the process gives you a handle on its standard input and output, you can just write the definition to that handle, using whatever mechanism is available. In Java, for example, you'd write to the stream connected to the process's standard input; in a batch command you'd pipe it; at a GUI terminal you can cut and paste.
