Dynamic searching and string copying in bash - linux

I use mailget for a home-made "backup" system, which backs up pre-specified files when a mail containing the string "backup" is received. I detect such mails with the following search command:
$ grep -rnw '/path/to/mailbox/' -e "backup"
I want to extract the sender's mail address into a variable $var. The string "Return-Path: " (13 characters) is always at the beginning of each mail file, followed by the address, like this:
Return-Path: <someone#domain.com>
In short: when a file containing the string "backup" is detected under a given path, the script should extract the mail address from that file into $var.
Can't get my head around this one, grateful for any help.

The natural mechanism for capturing the output of a command in a variable is "command substitution". The syntax for a command substitution is $( <the command> ); it expands to the standard output of the specified command.
The standard lightweight general tools appropriate for extracting text from a file such as yours are sed and awk. You can also use grep's -l option to make it emit the name of the file wherein it found a match, rather than the match itself. You might put those together something like this:
var=$(sed -n -e '/^Return-Path:/ {s/.*<\(.*\)>.*/\1/;p;q}' $(grep -rlw '/path/to/mailbox/' -e "backup"))
The nested command substitution obtains the names of the files containing the target string; the sed command processes those files and extracts (only) the text between the < and > on the first line starting with "Return-Path:". It makes some assumptions that render it shorter but less robust; my objective is merely to demonstrate, not to write production-quality code for you.
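If more than one mail file might match, a slightly more defensive sketch of the same idea (same assumptions about the Return-Path: line; the -Z / read -d '' pairing for filename-safe handling is GNU-specific) would loop over the matches:
grep -rlwZ '/path/to/mailbox/' -e "backup" |
while IFS= read -r -d '' file; do
    var=$(sed -n '/^Return-Path:/{s/.*<\(.*\)>.*/\1/;p;q}' "$file")
    echo "backup requested by: $var"
done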

Related

convert this linux statement into a statement which is supported by windows command prompt

This is my statement, which works in a Unix environment:
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to execute that statement in the Windows command prompt.
How do I do that? And what are the Windows commands similar to cat, grep and sed?
Please tell me the exact Windows equivalent of the command above.
The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every pair of open/close angle brackets, along with everything between them, with an empty string. The regular expression [^<]* matches zero or more occurrences of any character except <. If the XML is well-formed, these should always occur in pairs, so we effectively remove all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
This is a rather crude way to extract the CDATA from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. It works even if the XML input doesn't have the beginning and ending <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)
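For instance, a sketch of an xmlstarlet invocation (this assumes document.xml is a WordprocessingML file and that the w prefix is bound to the standard namespace URI shown here; check your document's root element to be sure):
xmlstarlet sel -N w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" \
    -t -v '//w:t' -n document.xml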
You might try PowerShell. It has been included since Windows 8, I think; it certainly is on Windows 10. I've just tested a "cat" command and it works. "grep" doesn't, but it can be adapted like this:
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/
The equivalent of grep on Windows would be findstr, and the equivalent of cat would be type.

Get numeric value from file name

I am new to Linux. I have a question:
I have a bunch of files in a directory, like:
abc-188_1.out
abc-188_2.out
abc-188_3.out
How can I get the number 188 from those names?
Assuming (since you are on Linux and are working with files) that you will use a shell/bash script... (If you use something different, say Python, the solution will of course be a different one.)
... this will work
for file in *; do out=$(echo "${file//[!0-9]/ }" | xargs | cut -d' ' -f1); echo "$out"; done
Explanation
The basic problem is to extract a number from a string in bash script (search stackoverflow for this, you will find dozens of different solutions).
This is done in the command above (the string from which numbers are to be extracted is saved in the variable file) as:
${file//[!0-9]/ }
or, without spaces
${file//[!0-9]/}
It is complicated here by two things:
Do this for every file in a directory. This is done here with a bash for loop (note that the variable file takes as its value the name of each file in the current working directory, one after another):
for file in *; do (commands you want done for every file in the CWD, separated by ";"); done
There are multiple numbers in the filenames, and you just want the first one.
Therefore, we leave the spaces in and pipe the result (which is only the digits and spaces from the current file name) into two other commands: xargs (which removes leading and trailing whitespace) and cut -d' ' -f1 (which returns only the part of the string before the first remaining space, i.e. the first number in our filename).
We save the resulting string in a variable out and print it with echo $out:
out=$(echo "${file//[!0-9]/ }" | xargs | cut -d' ' -f1); echo "$out"
Note that the number is still a string. You can transform it into an integer if you want by using arithmetic expansion, i.e. double parentheses preceded by $: out_int=$((out))
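Since the file names in the question all follow the pattern abc-NNN_M.out, a shorter sketch that relies on that naming scheme and uses only parameter expansion (no external commands) would be:
for file in abc-*_*.out; do
    num=${file#abc-}    # strip the leading "abc-":  abc-188_1.out -> 188_1.out
    num=${num%%_*}      # strip from the first "_" onward:  188_1.out -> 188
    echo "$num"
done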

grep -f on files in a zipped folder

I am performing a recursive fgrep/grep -f search on a zipped-up folder in one of my programs. The command I am using:
grep -r -i -z -I -f /path/to/pattern/file /home/folder/TestZipFolder.zip
Inside the pattern file is the string "Dog" that I am trying to search for.
In the zipped up folder there are a number of text files containing the string "Dog".
The grep -f command successfully finds the string "Dog" in 3 text files inside the zipped-up folder, but it prints all the output on one line and some strange characters appear in it, e.g. PK (as shown below). And when I try to print the output to a file in my program, other characters such as ^B^T^# appear as well.
Output from the grep -f command:
TestZipFolder/test.txtThis is a file containing the string DogPKtest1.txtDog, is found again in this file.PKTestZipFolder/another.txtDog is written in this file.PK
How would I get each of the files where the string "Dog" has been found to print on a new line so they are not all grouped together on one line like they are now?
Also, where are the "PK" and other strange characters coming from, and how do I prevent them from appearing?
Desired output
TestZipFolder/test.txt:This is a file containing the string Dog
TestZipFolder/test1.txt:Dog, is found again in this file
TestZipFolder/another.txt:Dog is written in this file
Something along these lines, so that the user can see where the string was found in each file (you actually get the output in this format if you run the grep command on a file that is not a zip file).
The strange characters appear because grep is reading the raw bytes of the zip archive itself; the PK sequences are the signatures that begin the archive's internal headers. If you need multiline, per-file output, it is better to use zipgrep:
zipgrep -s "pattern" TestZipFolder.zip
The -s option suppresses error messages (it is optional). This command will print every matched line along with the file name. If you want to remove duplicate file names when a file contains more than one match, some further processing must be done using loops, grep, awk or sed, as sketched below.
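A sketch of that post-processing (assuming zipgrep prefixes each match with the archive member's name, as described above): keep only the file-name field and drop duplicates.
zipgrep -s "Dog" TestZipFolder.zip | cut -d: -f1 | sort -u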
Actually, zipgrep is a combination of egrep and unzip, and its usage is as follows:
zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]
so you can pass any egrep options to it.
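For example, to make the search case-insensitive you could pass egrep's -i option through (a sketch):
zipgrep -i -s "dog" TestZipFolder.zip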

How do I insert the results of several commands on a file as part of my sed stream?

I use DJing software on Linux (xwax) which uses a 'scanning' script (visible here) that compiles all the music files available to the software and outputs a string containing the path to the file and then the title of the mp3. For example, if it scans path-to-mp3/Artist - Test.mp3, it will spit out a string like so:
path-to-mp3/Artist - Test.mp3[tab]Artist - Test
I have tagged all my mp3s with BPM information via the id3v2 tool and have a commandline method for extracting that information as follows:
id3v2 -l name-of-mp3.mp3 | grep TBPM | cut -d: -f2
That spits out JUST the numerical BPM to me. What I'd like to do is prepend the BPM number from the above command as part of the xwax scanning script, but I'm not sure how to insert that command in the midst of the script. What I'd want it to generate is:
path-to-mp3/Artist - Test.mp3[tab][bpm]Artist - Test
Any ideas?
It's not clear to me where in that script you want to insert the BPM number, but the idea is this:
To embed the output of one command into the arguments of another, you can use the "command substitution" notation `...` or $(...). For example, this:
rm $(echo abcd)
runs the command echo abcd and substitutes its output (abcd) into the overall command; so that's equivalent to just rm abcd. It will remove the file named abcd.
The above doesn't work inside single-quotes. If you want, you can just put it outside quotes, as I did in the above example; but it's generally safer to put it inside double-quotes (so as to prevent some unwanted postprocessing). Either of these:
rm "$(echo abcd)"
rm "a$(echo bc)d"
will remove the file named abcd.
In your case, you need to embed the command substitution into the middle of an argument that's mostly single-quoted. You can do that by simply putting the single-quoted strings and double-quoted strings right next to each other with no space in between, so that Bash will combine them into a single argument. (This also works with unquoted strings.) For example, either of these:
rm a"$(echo bc)"d
rm 'a'"$(echo bc)"'d'
will remove the file named abcd.
Edited to add: O.K., I think I understand what you're trying to do. You have a command that either (1) outputs all the files in a specified directory (and any subdirectories, and so on), one per line, or (2) outputs the contents of a file, where that file contains a list of files, one per line. So in either case, it's outputting a list of files, one per line. And you're piping that list into this command:
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\t\2:pi
}
'
which runs a sed script over that list. What you want is for all of the replacement-strings to change from \0\t... to \0\tBPM\t..., where BPM is the BPM number computed from your command. Right? And you need to compute that BPM number separately for each file, so instead of relying on sed's implicit line-by-line looping, you need to handle the looping yourself, and process one line at a time. Right?
So, you should change the above command to this:
while read -r LINE ; do # loop over the lines, saving each one as "$LINE"
BPM=$(id3v2 -l "$LINE" | grep TBPM | cut -d: -f2) # save BPM as "$BPM"
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\t\2:pi
}
' <<<"$LINE" # take $LINE as input, rather than reading more lines
done
(where the only change to the sed script itself was to insert '"$BPM"'\t in a few places to switch from single-quoting to double-quoting, then insert the BPM, then switch back to single-quoting and add a tab).
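As a smaller, self-contained illustration of the same quoting trick (the file names here are hypothetical, and the BPM is simply appended after a tab instead of being woven into the full sed script above):
printf '%s\n' 'path-to-mp3/Artist - Test.mp3' 'path-to-mp3/Other - Song.mp3' |
while read -r LINE ; do
    BPM=$(id3v2 -l "$LINE" | grep TBPM | cut -d: -f2)   # empty if the file has no TBPM frame
    sed 's:$:\t'"$BPM"':' <<<"$LINE"                    # single-quoted sed, double-quoted "$BPM", back to single quotes
done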

Extracting sub-strings in Unix

I'm using Cygwin on Windows 7. I want to loop through a folder consisting of about 10,000 files and perform a signal processing tool's operation on each file. The problem is that the file names have some excess characters that are not compatible with the operation. Hence, I need to extract just a certain part of the file names.
For example if the file name is abc123456_justlike.txt.rna I need to use abc123456_justlike.txt. How should I write a loop to go through each file and perform the operation on the shortened file names?
I tried the cut -b1-10 command but that doesn't let my tool perform the necessary operation. I'd appreciate help with this problem.
Try some shell scripting, using the ${NAME%TAIL} parameter substitution: the contents of variable NAME are expanded, but any suffix material which matches the TAIL glob pattern is chopped off.
$ NAME=abc12345.txt.rna
$ echo ${NAME%.rna}
abc12345.txt
# process all files in the directory, taking off their .rna suffix
$ for x in *; do signal_processing_tool "${x%.rna}" ; done
If there are variations among the file names, you can classify them with a case:
for x in * ; do
case $x in
*.rna )
# do something with .rna files
;;
*.txt )
# do something else with .txt files
;;
* )
# default catch-all-else case
;;
esac
done
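Applied to the question's files, a minimal sketch of that case approach might look like this (signal_processing_tool stands in for whatever your actual tool is called):
for x in * ; do
    case $x in
        *.txt.rna )
            signal_processing_tool "${x%.rna}"   # abc123456_justlike.txt.rna -> abc123456_justlike.txt
            ;;
        * )
            # ignore anything else
            ;;
    esac
done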
Try sed:
echo a.b.c | sed 's/\.[^.]*$//'
The s command in sed performs a search-and-replace operation, in this case it replaces the regular expression \.[^.]*$ (meaning: a dot, followed by any number of non-dots, at the end of the string) with the empty string.
If you are not yet familiar with regular expressions, this is a good point to learn them. I find manipulating strings using regular expressions much more straightforward than using tools like cut (or their equivalents).
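For completeness, here is a sketch of the sed approach wired into the loop from the question (signal_processing_tool is again a placeholder for your actual tool):
for f in *.rna; do
    base=$(echo "$f" | sed 's/\.[^.]*$//')   # abc123456_justlike.txt.rna -> abc123456_justlike.txt
    signal_processing_tool "$base"
done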
If you are trying to extract the list of filenames from a directory, use the command below.
ls -ltr | awk -F " " '{print $9}' | cut -c1-10

Resources