Concatenate multiple YAML files with a separator - linux

I need to concatenate multiple k8s deployment YAML files into one deployment script, and in doing so insert the specific separator --- between each file. I know the specific depth at which the files will live as well as the filename, but I don't know how many there will be at a given time, so I've used the find statement below to
recursively search for the YAML files
concatenate each one
pipe in the tail command as a separator
find . -type f -name 'deployment.yml' -exec cat {} + | tail -n +1 * > finalDeployment.yml
However, this creates broken YAML syntax by inserting the ==> <== delimiter:
I could simply have another task run a find/replace using the above as prefix/suffix tokens, but I'd like something more succinct within the above statement.
Is it possible to pipe in a specific character or set of characters as a delimiter within a cat command, or is there another method to accomplish this?

What you want to do is not guaranteed to work. For example, suppose you have these two YAML files:
foo: bar
and:
%YAML 1.2
---
baz
As you can see, the second file contains a directive. It also shows that --- in YAML is not a separator, but a directives end marker. It is optional if you don't have any directives like in the first document. If you concatenate both documents in the way you want to do it, you will get a document with two --- and %YAML 1.2 will be interpreted as content because it occurs after a directives end marker.
So what you actually want to do is to mark the end of each document with ..., the document end marker. After that marker, the parser is reset to its initial state, which guarantees that the second document is parsed exactly as it would have been when it was in a separate file.
Also, no harm is done by adding ... to the last document, since it does not implicitly start another document. So your command could look like this (I rely on your statement that you know the depth at which the files lie; as an example, I assume a depth of 3 directories):
echo -n > finalDeployment.yml
for f in */*/*/deployment.yml; do
cat "$f" >> finalDeployment.yml; echo "..." >> finalDeployment.yml
done
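If the depth turns out not to be fixed after all, the same idea can be sketched with find instead of a fixed glob. This is a hedged variant, assuming a POSIX find with -exec; it appends the document end marker after each file's contents:

```shell
# Sketch: let find visit every deployment.yml at any depth and append
# the YAML document end marker "..." after each file's contents.
find . -type f -name 'deployment.yml' \
  -exec sh -c 'cat "$1"; echo "..."' _ {} \; > finalDeployment.yml
```

The inner sh -c runs once per file, so the marker lands between every pair of documents, not just once at the end.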

Related

How to rename files in bash to increase number in name?

I have a few thousand files named as follows:
Cyprinus_carpio_600_nanopore_trim_reads.fasta
Cyprinus_carpio_700_nanopore_trim_reads.fasta
Cyprinus_carpio_800_nanopore_trim_reads.fasta
Cyprinus_carpio_900_nanopore_trim_reads.fasta
Vibrio_cholerae_3900_nanopore_trim_reads.fasta
for 80 variations of the first two words (80 different species). I would like to rename all of these files so that the number is increased by 100. For example:
Vibrio_cholerae_3900_nanopore_trim_reads.fasta
would become
Vibrio_cholerae_4000_nanopore_trim_reads.fasta
or
Cyprinus_carpio_300_nanopore_trim_reads.fasta
would become
Cyprinus_carpio_400_nanopore_trim_reads.fasta
Unfortunately I can't work out how to rename them. I've had some luck following the solutions at https://unix.stackexchange.com/questions/40523/rename-files-by-incrementing-a-number-within-the-filename
But I can't get it to work for the middle of the name. I'm running Ubuntu 18.04, if that helps.
If you can get hold of the Perl-flavoured version of rename, it is as simple as this:
rename -n 's/(\d+)/$1 + 100/e' *fasta
Sample Output
'Ciprianus_maximus_11_fred.fasta' would be renamed to 'Ciprianus_maximus_111_fred.fasta'
'Ciprianus_maximus_300_fred.fasta' would be renamed to 'Ciprianus_maximus_400_fred.fasta'
'Ciprianus_maximus_3900_fred.fasta' would be renamed to 'Ciprianus_maximus_4000_fred.fasta'
If you can't read Perl, that says... "Do a single substitution as follows. Wherever you see a bunch of digits next to each other in a row (\d+), remember them (because I put that in parentheses), and then replace them with the evaluated expression of that bunch of digits ($1) plus 100.".
Remove the -n if the dry-run looks correct. The only "tricky part" is the use of e at the end of the substitution which means evaluate the expression in the substitution - or I call it a "calculated replacement".
If there is only one number in your string, then the few lines of code below should help you resolve your issue:
filename="Vibrio_cholerae_3900_nanopore_trim_reads.fasta"
var=$(echo $filename | grep -oP '\d+')
echo ${filename/${var}/$((var+100))}
Instead of echoing the changed file name, you can capture it in a variable and use the mv command to rename the file.
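For instance, a minimal sketch of that mv step, assuming GNU grep (for -oP) and exactly one run of digits per name, as stated above:

```shell
# Sketch: bump the single number in one filename by 100 and rename.
# Assumes GNU grep and exactly one digit run in the name.
f="Vibrio_cholerae_3900_nanopore_trim_reads.fasta"
num=$(echo "$f" | grep -oP '\d+')      # extracts 3900
mv -- "$f" "${f/$num/$((num + 100))}"  # renames to ..._4000_...
```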
Considering possible filename conflicts as the numbers increase, I first thought of reversing the order, but there still remains the possibility of conflicts, because the alphabetical (standard) sort differs from the numerical sort.
So how about a two-step solution: in the 1st step, an escape character (or any character that does not appear in the filenames) is inserted into the filename, and it is removed in the 2nd step.
#!/bin/bash
esc=$'\033' # ESC character
# 1st pass: increase the number by 100 and insert a ESC before it
for f in *.fasta; do
num=${f//[^0-9]/}
num2=$((num + 100))
f2=${f/$num/$esc$num2}
mv "$f" "$f2"
done
# 2nd pass: remove the ESC from the filename
for f in *.fasta; do
f2=${f/$esc/}
mv "$f" "$f2"
done
Mark's perl-rename solution looks great, but you should apply it twice with a bump of 50 to avoid name conflicts. If you can't find this flavor of rename, you could try my rene.py (https://rene-file-renamer.sourceforge.io), for which the command would be (also applied twice) rene *_*_*_* *_*_?_* B/50. rene is a little easier because it automatically shows you the changes and asks whether you want to make them, and it has an undo if you change your mind.

copy and append specific lines to a file with specific name format?

I am copying some specific lines from one file to another.
grep '^stringmatch' /path/sfile-*.cfg >> /path/nfile-*.cfg
Here is what's happening: it's creating a new file literally called nfile-*.cfg and copying those lines into it. The file names sfile-* and nfile-* are randomly generated and generally followed by a number. Both sfile-* and nfile-* are existing files, and there is only one file of each kind in the directory; only the number that follows is randomly generated. The numbers following sfile and nfile need not be the same. The files are not created simultaneously but are generated when a specific command is given, and then some lines from one file need to be appended to the other.
I'm guessing you actually want something like
for f in /path/sfile-*.cfg; do
grep '^stringmatch' "$f" >"/path/nfile-${f#/path/sfile-}"
done
This will loop over all sfile matches and create an nfile target file with the same number after the dash as the corresponding source sfile. (The parameter substitution ${variable#prefix} returns the value of variable with any leading match on the pattern prefix removed.)
If there is only one matching file, the loop will only run once. If there are no matches on the wildcard, the loop will still run once unless you enable nullglob, which changes the shell's globbing behavior so that wildcards with no matches expand into nothing, instead of to the wildcard expression itself. If you don't want to enable nullglob, a common workaround is to add this inside the loop, before the grep:
test -e "$f" || break
If you want the loop to only process the first match if there are several, add break on a line by itself before the done.
If I interpret your question correctly, you want to output to an existing nfile, which has a random number in it, but instead the shell is creating a file with an asterisk in it, so literally nfile-*.cfg.
This is happening because the nfile doesn't exist when you first run the command. If the file doesn't exist, bash will fail to expand nfile-*.cfg and will instead use the * as a literal character. This is correct behaviour in bash.
So, it looks like the problem is that the nfile doesn't exist when you start your grep. You'll need to create one.
I'll leave code to others, but I hope the explanation is useful.

Using diff for two files and send by email [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I have files like below. I use crontab every 5 minutes to check whether the system has added a new file, for example: AIR_2015xxxxT0yyyyyyyy.cfg. Then I need to run the diff command automatically between the last file and the one before it.
AIR_20151021T163514000.cfg
AIR_20151026T103845000.cfg
AIR_2015xxxxT0yyyyyyyy.cfg
I want to do this in a script like the one below:
#!/bin/bash
/var/opt/fds/
diff AIR_2015xxxxT0yyyyyyyy.cfg AIR_20151026T103845000.cfg > Test.txt
body(){
cat body.txt
}
(echo -e "$(body)") | -a Test.txt mailx -s 'Comparison' user@email.com
Given a list of files in the directory /var/opt/fds with names in the format:
AIR_YYYYmmddTHHMMSSfff.cfg
where the letter Y represents digits for the year, m for month, d for day, H for hour, M for minute, S for second, and f for fraction (milliseconds), then you need to establish the two most recent files in the directory to compare them.
One way to do this is:
cd /var/opt/fds || exit 1
old=
new=
for file in AIR_20[0-9][0-9]????T?????????.cfg
do
old=$new
new=$file
done
if [ -n "$old" ] && [ -n "$new" ]
then
diff "$old" "$new" > test.txt
mailx -a test.txt -s 'Comparison' user@example.com < body.txt
fi
Note that if the new file has a name containing letters x and y as shown in the question and comments, it will be listed after the names containing the time stamp as digits, so it will be picked up as the new file. It also assumes permission to write in the /var/opt/fds directory, and that the mail body file is present in that directory too. Those assumptions can be trivially fixed if necessary. The test.txt file should be deleted after it is sent, too, and you could check that it is non-empty before sending the email (just in case the two most recent files are in fact identical). You could embed a time-stamp in the generated file name containing the diffs instead of using test.txt:
output="diff.$(date +'%Y%m%dT%H%M%S000').txt"
and then use $output in place of test.txt.
The test ensures that there was both an old and a new name. The pattern match is sloppier than it could be, but using [0-9] or an appropriate subrange ([01], [0-3], [0-2], [0-5]) for the question marks makes the pattern unreadably long:
for file in AIR_20[0-9][0-9][01][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9][0-9][0-9][0-9].cfg
It also probably provides very little extra in the way of protection. Of course, as shown, it imposes a Y2.1K crisis on the system, not that it is hard to fix that. You could also cut down the range of valid dates by basing it on today's date, but beware of the end of the year, etc. You might decide you only need entries from the last month or so.
Using globbing is generally better than trying to parse ls or find output. In this context, where the file names have a restricted set of characters in the name (no newlines, no blanks or tabs, no quotes, no dollar signs, etc), it is feasible to use either find or ls — but if you have to deal with arbitrary names created by random end users, those tools are not suitable. (The ls command does all sorts of weird stuff with weird names and is basically hard to use reliably in the face of user cussedness. The find command and its -print0 option can be used, especially if you have a sort that recognizes -z to work with null-terminated 'lines' and an xargs that supports -0 to handle such lines too — but you have to be very careful.)
Note that this scheme does not keep a record of the last file analyzed (so if no new files appear for an hour, you might send a dozen copies of the same differences), nor does it directly report on the file names (but using diff -u or diff -c would include the file names being diffed in the output). Again, these issues can be worked around if that's appropriate (and it probably is). Keeping the record of which files have been compared is probably the hardest job; even that's not too bad:
echo "$old" "$new" >> reported.diffs
to record what's been processed; then
if grep -q "$old $new" reported.diffs
then : Already processed
else : Process $old and $new
fi
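Putting those pieces together, a hedged sketch of that bookkeeping wrapped around the diff; the file names are reused from the question, and the mailx line is commented out here since the mail setup is site-specific:

```shell
# Sketch: only diff (and mail) a pair of files once, recording
# already-handled pairs in reported.diffs.
old="AIR_20151021T163514000.cfg"
new="AIR_20151026T103845000.cfg"
if ! grep -qF "$old $new" reported.diffs 2>/dev/null; then
  diff "$old" "$new" > test.txt
  # mailx -a test.txt -s 'Comparison' user@example.com < body.txt
  echo "$old $new" >> reported.diffs
fi
```

Running this a second time for the same pair finds the record in reported.diffs and does nothing, so repeated cron invocations do not resend the same differences.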

How to remove part of file names between periods?

I would like to rename many files in this format
abc.123.fits
abcd.1234.fits
efg.12.fits
to this format
abc.fits
abcd.fits
efg.fits
I tried the rename function, but since the part I'm trying to replace is not the same in all files, it did not work. I am using Linux.
for f in *; do mv "$f" "${f%%.*}.${f##*.}"; done
${f%%.*} removes everything after the first period, including the period. ${f##*.} removes everything before the last period, including the period (i.e. it gets the file extension). Concatenating these two, with a period between them, gives you the desired result.
You can change the * to a more restrictive pattern such as *.fits if you don't want to rename all files in the current directory. The quotes around the parameters to mv are necessary if any filenames contain whitespace.
Many other variable substitution expressions are available in bash; see a reference such as TLDP's Bash Parameter Substitution for more information.
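The two expansions can be checked at the prompt with echo before committing to the mv:

```shell
# The file name used here is just an example from the question.
f="abcd.1234.fits"
echo "${f%%.*}"            # abcd  (everything up to the first period)
echo "${f##*.}"            # fits  (everything after the last period)
echo "${f%%.*}.${f##*.}"   # abcd.fits
```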

Copy a section within two keywords into a target file

I have thousands of files in a directory, and each file contains a number of variable definitions starting with the keyword DEFINE and ending with a semicolon (;). I want to copy all occurrences of the data between these keywords (inclusive) into a target file.
Example: Below is the content of the text file:
/* This code is for lookup */
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
END.
Now from the above content I just want to copy the section starting with DEFINE and ending with ; into a target file, i.e. the output should be:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
This needs to be done for thousands of scripts, with multiple occurrences per script. Please help out.
Thanks a lot, the provided code works, but only to a limited extent: it works when the whole statement is on a single line, but the data is not always on one line; it can be spread over multiple lines like below:
/* This code is for lookup */
DEFINE variable as a1 expr= if branchno > 55
then
extract (n123f1 using brach, code)
else
branchno = null
;
END.
The code follows the above pattern: I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be a terminating semicolon; this is the pattern.
It sounds like you want grep(1):
grep '^DEFINE.*;$' input > output
Try using grep. Let's say you have files with the extension .txt in the present directory:
grep -ho 'DEFINE.*;' *.txt > outfile
Output:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
Short Description
-o will give you only the matching string rather than the whole line, in case the line also contains something else you want to omit.
-h will suppress file names before the matching results.
Read the man page of grep by typing man grep in your terminal.
EDIT
If you want capability to search in multiple lines, you can use pcregrep with -M option
pcregrep -M 'DEFINE.*?(\n|.)*?;' *.txt > outfile
Works fine on my system. Check man pcregrep for more details
Reference : SO Question
One can make a simple solution using sed:
sed -n -e '/^DEFINE/{:a p;/;$/!{n;ba}}' your-file
Option -n prevents sed from printing every line; then each time a line begins with DEFINE, print the line (command p) then enter a loop: until you find a line ending with ;, grab the next line and loop to the print command. When exiting the loop, you do nothing.
It looks a bit dirty; some versions of sed have a shorter (and more straightforward) way to achieve this in one line:
sed -n -e '/^DEFINE/,/;$/p' your-file
Indeed, only some versions of sed accept both patterns as a range this way; for other versions, like mine under Cygwin, the range patterns must be handled on separate lines to work properly.
One last thing to remember: the range form does not handle overlapping ranges; it stops printing at the first end-pattern encountered, even if multiple start patterns have been matched. Prefer something with awk if that is a feature you are looking for.
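For reference, a hedged awk sketch along those lines. It uses a flag rather than a range, so it restarts at every DEFINE and handles both single-line and multi-line definitions; the *.txt glob and outfile name are just the conventions used earlier:

```shell
# Sketch: print every DEFINE ... ; block, even several per file.
# p is a flag: set on a line starting with DEFINE, printed while set,
# cleared after a line ending with ';'.
awk '/^DEFINE/{p=1} p{print} /;$/{p=0}' *.txt > outfile
```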
