Search directory for files with specific string and print location - string

I need to create a Perl script that will search through a given directory and find all .txt files that contain a specific string. I want the script to then print the location of those files. So far I have accomplished that task, but I want to add a few more features to the script.
This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $dir = 'C:\path\to\dir';
my $keyword = "apple";
find(\&txtsearch, $dir);
sub txtsearch {
    if (-f $_ && $_ =~ /\.txt$/) {
        open my $file, "<", $_ or do {
            warn qq(Unable to open "$File::Find::name": $!); #checks permissions
            return;
        };
        while (<$file>) {
            if (/\Q$keyword\E/) {
                print "$File::Find::name\n"; #prints location of file
                return; #stops searching once found
            }
        }
    }
}
Now I want to add a few things:
Firstly, I would like to include a way to include capitalization in the search. For example, if I want to find all the instances of the word "apple" I don't want it to overlook any uses of "Apple" or "APPLE".
I would also like to be able to input the keyword on the command line rather than in the script itself.
Also, would it be possible to have the script create a file in the specified directory named "output.txt" and have the output of the script written into that? I know that on the command line you would use something like > output.txt, but I am not completely sure how to implement that in the script.
Lastly, it would be nice if I could include a printout of how many occurrences of the string are in the file. For example, if the word "apple" appears 5 times in a text file, I would like to see an output of the location of the file and also, something like "apple appears 5 times in FILE.txt"
Thank you so much for the help!

Firstly, I would like to include a way to include capitalization in the search. For example, if I want to find all the instances of the word "apple" I don't want it to overlook any uses of "Apple" or "APPLE".
Add the i modifier to your regex, e.g. /\Q$keyword\E/i
Reference: perldoc perlre
I would also like to be able to input the keyword on the command line rather than in the script itself.
Use the Getopt::Long module for command-line parameter processing.
Reference: Getopt::Long
Also, would it be possible to have the script create a file in the specified directory named "output.txt" and have the output of the script written into that? I know that on the command line you would use something like > output.txt, but I am not completely sure how to implement that in the script.
Open a file for writing using open(my $output_filehandle, ">", "output.txt") or die "Can not open file for writing: output.txt. $!"
Then replace print <STUFF> with print $output_filehandle <STUFF> .
Reference: perldoc open and Tutorial on opening things in Perl on Perldoc
Lastly, it would be nice if I could include a printout of how many occurrences of the string are in the file. For example, if the word "apple" appears 5 times in a text file, I would like to see an output of the location of the file and also, something like "apple appears 5 times in FILE.txt"
Keep a counter variable instead of returning after first hit. Then print filename and counter after file-lines loop ends.
Reference: Basic understanding of programming and loops, not really related to Perl at all.
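For comparison, the same four ideas (case-insensitive match, keyword from the command line, output to a file, per-file count) can be sketched with find and grep. This is not the Perl approach described above, only a rough equivalent. The function name count_matches is made up for illustration, and grep -o is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
#!/bin/sh
# count_matches KEYWORD DIR   (hypothetical helper, not part of the original script)
# Prints "KEYWORD appears N times in PATH" for every .txt file under DIR
# that contains KEYWORD, ignoring case.
count_matches() {
    keyword=$1
    dir=$2
    # Skip output.txt so that a report redirected into DIR is not scanned itself.
    find "$dir" -type f -name '*.txt' ! -name 'output.txt' |
    while IFS= read -r file; do
        # -o prints every match on its own line, -i ignores case,
        # -F treats the keyword as a fixed string (like \Q...\E in Perl).
        count=$(grep -o -i -F -- "$keyword" "$file" | wc -l)
        count=$((count))    # normalise any padding that wc may add
        if [ "$count" -gt 0 ]; then
            printf '%s appears %d times in %s\n' "$keyword" "$count" "$file"
        fi
    done
}
```

Called as count_matches "$1" /path/to/dir > /path/to/dir/output.txt, the keyword comes from the first command-line argument and the report lands in output.txt. File names containing newlines are not handled by this sketch.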

Related

how to use do loop to read several files with similar names in shell script

I have several files named scale1.dat, scale2.dat scale3.dat ... up to scale9.dat.
I want to read these files in do loop one by one and with each file I want to do some manipulation (I want to write the 1st column of each scale*.dat file to scale*.txt).
So my question is, is there a way to read files with similar names. Thanks.
The regular syntax for this is
for file in scale*.dat; do
awk '{print $1}' "$file" >"${file%.dat}.txt"
done
The asterisk * matches any text or no text; if you want to constrain to just single non-zero digits, you could say for file in scale[1-9].dat instead.
In Bash, there is a non-standard additional syntax, brace expansion, as in scale{1..9}.dat, but this is Bash-only, and so will not work in #!/bin/sh scripts. Strictly speaking it is not a glob: it expands to all nine names whether or not the files exist. (Your question has both sh and bash so it's not clear which you require. Your comment that the Bash syntax is not working for you suggests that you may need a POSIX portable solution.) Furthermore, Bash has something called extended globbing, which allows for quite elaborate pattern matching. See also http://mywiki.wooledge.org/glob
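One detail worth guarding against in a portable loop: when a glob matches nothing, POSIX shells leave the pattern as a literal string, so the loop body runs once with a file that does not exist. A minimal sketch of the loop with that guard, wrapped in a function whose name (first_columns) and directory argument are assumptions made for illustration:

```shell
#!/bin/sh
# first_columns DIR   (hypothetical wrapper around the loop shown above)
# Writes the first column of DIR/scaleN.dat to DIR/scaleN.txt for N = 1..9.
first_columns() {
    for file in "$1"/scale[1-9].dat; do
        # If nothing matched, $file is the literal pattern; skip it.
        [ -e "$file" ] || continue
        # ${file%.dat} strips the .dat suffix before .txt is appended.
        awk '{ print $1 }' "$file" > "${file%.dat}.txt"
    done
}
```

Because the pattern is scale[1-9].dat, a file such as scale10.dat is left alone.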
For a simple task like this, you don't really need the shell at all, though.
awk 'FNR==1 { if (f) close (f); f=FILENAME; sub(/\.dat$/, ".txt", f); }
    { print $1 >f }' scale[1-9]*.dat
(Okay, maybe that's slightly intimidating for a first-timer. But the basic point is that you will often find that the commands you want to use will happily work on multiple files, and so you don't need shell loops at all in those cases.)
I don't think so. Similar names or not, you will have to iterate through all your files (perhaps with a for loop) and use a nested loop to iterate through lines or words or whatever you plan to read from those files.
Alternatively, you can copy your files into one (say, scale-all.dat) and read that single file.

Scripting-Search Multiple Strings

I have a script (./lookup) that will search a file ($c). The file will contain a list of cities. What I would like to do is search the file for whatever the user enters as an argument (./lookup Miami). For example, I can make the script return what I want if it is a single-word city (Miami), but I can't figure out a way to make it work for two or more words (Los Angeles). I can get single strings to return what I want with the following.
grep $1 $c
I was thinking about a loop, but I am not sure on how to do that as I am new to scripting and Linux. Thanks for any help.
Whenever arguments could possibly contain spaces, proper quoting is essential in Bash:
grep "$1" "$c"
The user will need to say ./lookup "Los Angeles". If you don't like that, you can try:
grep "$*" "$c"
Then all arguments to the script will be passed together as one string to grep.
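A small sketch of the "$*" variant, written as a function so it can be tried without creating a script file. The variable c is the city list from the question; the default file name cities.txt is a placeholder. The -F and -- options are additions that were not in the answer: they make grep treat the input as a fixed string and keep a leading dash from being read as an option.

```shell
#!/bin/sh
# c holds the path to the city list, as in the question.
# cities.txt is only a placeholder default.
c=${c:-cities.txt}

# lookup WORD...   (stands in for the ./lookup script)
lookup() {
    # "$*" joins all arguments with single spaces, so
    # `lookup Los Angeles` searches for the one string "Los Angeles".
    grep -F -- "$*" "$c"
}
```

Note that "$*" collapses whatever spacing the user typed between words into single spaces, which is usually what is wanted for city names.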

Remove string of characters from filename using Applescript or Linux

I am converting several Word documents to PDFs. The input file names are like this: "CM_Genetics_in_OBGYN_docx", while the output file names are like this: "job_10-Microsoft_Word_-_CM_Genetics_in_OBGYN_docx.pdf". I want to delete "job_10-Microsoft_Word_-_" and "_docx" and only have the PDF file name left: "CM_Genetics_in_OBGYN.pdf". I would really like to end up with "CM Genetics in OBGYN.pdf", but "CM_Genetics_in_OBGYN.pdf" would be acceptable if that last part makes it too complicated. I have some experience with AppleScript and Linux commands but can't nail this down.
Here you go:
for fn in job_*.pdf; do
newname=${fn#job_??-*-_}
newname=${newname/_docx}
newname=${newname//_/ }
echo "mv '$fn' '$newname'"
done
This will print mv commands ready to execute, but without renaming anything. To execute the rename, simply pipe the output to sh.
The echo is useful to test everything safely. Make sure to check on the strangest pattern you can find to cover all corner cases. If everything looks good, change the echo to do the real action you want to perform instead, for example:
for fn in job_*.pdf; do
newname=${fn#job_??-*-_}
newname=${newname/_docx}
newname=${newname//_/ }
mv "$fn" "/some/other/dir/$newname"
done
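The ${var/pattern} and ${var//pattern/replacement} substitutions used above are Bash extensions. Where only a POSIX sh is available, the same renaming rule can be approximated with sed and tr. A sketch, with the function name clean_name chosen purely for illustration:

```shell
#!/bin/sh
# clean_name FILENAME   (hypothetical helper)
# Prints the cleaned-up name without touching any file.
clean_name() {
    # POSIX prefix removal: strip the shortest match of job_??-*-_
    name=${1#job_??-*-_}
    # Drop the first "_docx", then turn every underscore into a space.
    printf '%s\n' "$name" | sed 's/_docx//' | tr '_' ' '
}

clean_name 'job_10-Microsoft_Word_-_CM_Genetics_in_OBGYN_docx.pdf'
# prints: CM Genetics in OBGYN.pdf
```

Inside the loop this would be used as mv "$fn" "$(clean_name "$fn")".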

Copy a section within two keywords into a target file

I have thousands of files in a directory, and each file contains a number of defined variables, each starting with the keyword DEFINE and ending with a semicolon (;). I want to copy all the occurrences of the data between these markers (inclusive) into a target file.
Example: Below is the content of the text file:
/* This code is for lookup */
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
END.
Now, from the above content I just want to copy the section starting with DEFINE and ending with ; into a target file, i.e. the output should be:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
This needs to be done for thousands of scripts and multiple occurrences. Please help out.
Thanks a lot. The provided code works, but only when the whole statement is on a single line. The data is not guaranteed to be on one line; it can be spread over multiple lines, like below:
/* This code is for lookup */
DEFINE variable as a1 expr= if branchno > 55
then
extract (n123f1 using brach, code)
else
branchno = null
;
END.
The code is also in the above fashion. I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be an ending semicolon; this is the pattern.
It sounds like you want grep(1):
grep '^DEFINE.*;$' input > output
Try using grep. Let's say you have files with extension .txt in present directory,
grep -ho 'DEFINE.*;' *.txt > outfile
Output:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
Short Description
-o will give you only the matching string rather than the whole line, in case the line also contains something else that you want to omit.
-h will suppress file names before matching result
Read man page of grep by typing man grep on your terminal
EDIT
If you want capability to search in multiple lines, you can use pcregrep with -M option
pcregrep -M 'DEFINE.*?(\n|.)*?;' *.txt > outfile
Works fine on my system. Check man pcregrep for more details
Reference : SO Question
A simple solution is possible with sed:
sed -n -e '/^DEFINE/{:a p;/;$/!{n;ba}}' your-file
Option -n prevents sed from printing every line; then each time a line begins with DEFINE, print the line (command p) then enter a loop: until you find a line ending with ;, grab the next line and loop to the print command. When exiting the loop, you do nothing.
It looks a bit dirty; it seems that the version sed15 has a shorter (and more straightforward) way to achieve this in one line:
sed -n -e '/^DEFINE/,/;$/p' your-file
Indeed, only for this version of sed, both patterns are treated; for other versions of sed like mine under cygwin, the range patterns must be on separate lines to work properly.
One last thing to remember: it does not treat inclusive patterned ranges, i.e. it stops printing after the first encountered end-pattern even if multiple start patterns have been matched. Prefer something with awk if this is a feature you are looking for.
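As a starting point for the awk route, here is a minimal sketch of the same start/end logic written as an explicit flag. It assumes DEFINE starts at the beginning of a line and that the closing semicolon is the last non-blank character on its line, as in the samples above. It does not handle nested DEFINE blocks either, but the flag is easy to extend. The function name extract_defines is made up.

```shell
#!/bin/sh
# extract_defines FILE...   (hypothetical helper)
# Prints every block that starts on a line beginning with DEFINE and
# ends on the first following line whose last non-blank character is ";".
extract_defines() {
    awk '
        /^DEFINE/            { inside = 1 }   # start of a block
        inside               { print }        # print while inside a block
        inside && /;[ \t]*$/ { inside = 0 }   # closing semicolon ends it
    ' "$@"
}
```

Running extract_defines *.txt > outfile collects the blocks from all files; with thousands of files, find with -exec may be needed to stay under the argument-length limit.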

extract variable's value from script file in Install Anywhere

I am using Install Anywhere 2012 and would like to be able to parse a batch or shell script for a given value and have that value stored in an IA variable. For instance, if I have the following shell file:
MY_VAR1=123
MY_VAR2=a\b\c
ECHO $MY_VAR1
I would like to pass in the path to the file and the variable name (e.g. MY_VAR1) and have the result, 123, stored in an IA variable of my choosing (let's say $OUTPUT$). I could achieve this by writing some custom Java code, but was wondering if there is an alternative approach built into IA that would make this much easier. The variable will not be initialized when I need to figure out its value, so just echoing its value or something similar will not work. Any help would be greatly appreciated!
example in Windows batch:
@echo off &setlocal
for /f "tokens=2delims==" %%a in ('findstr "MY_VAR1" "ShellFile"') do set "output=%%a"
if defined output (echo MY_VAR1: %output%) else echo MY_VAR1 not found!
On Linux/Unix, you could use perl or awk (both are standard utilities in most distros). Python or Ruby are also candidates, but may not be installed on your target system. You could even write your own targeted parser using Lex and Yacc and ship it with your installer. However, for your needs, that's surely overkill.
Here's an example of a possible awk solution in an Execute Script/Batch File Action:
#!/bin/bash
awk '
# Process lines that begin with our variable name,
# preceded by optional spaces or tabs.
/^[ \t]*$TARGET_VARIABLE_NAME$=.+/ {
# Split the current line on "=" into
# an array called result
split($0, result, "=")
value = result[2]
# Look for trailing comments and remove them.
offset = index(value, "#")
if (offset > 0) {
value = substr(value, 1, offset - 1)
}
# Remove any possible leading spaces and quotes.
# Note the four-character sequence used for the single-quote
# below: it closes the bash string, adds an escaped quote, and
# reopens the string. That is for bash, not for awk. I am doing
# this from memory and do not have access to IA right now.
# You may have to play with the escaping.
gsub(/^['\''" ]*/, "", value)
# Remove any possible trailing spaces and quotes.
# See above regarding the escaped single-quote.
gsub(/['\''" ]*$/, "", value)
# send "value" to stdout
print value
}
' < $SHELL_INPUT_FILE$
The print value line (near the end) sends value to stdout.
In the Execute Script/Batch File Action settings you can designate variables that receive the stdout and stderr streams produced by the script action. By default, the stdout stream is stored in $EXECUTE_STDOUT$. You can change this to a variable name of your choosing.
In the example, above, $TARGET_VARIABLE_NAME$ and $SHELL_INPUT_FILE$ are InstallAnywhere variables that hold the name of the variable to find and the name of the file to parse, respectively. These variables will be replaced by their values before the Action executes.
Assume we have a script called /home/fred/hello.sh, which contains the following code:
#!/bin/bash
WIFE='Wilma'
NEIGHBOR="Barney Rubble"
echo "Hello to $WIFE and $NEIGHBOR from $PWD"
Before the Execute Script/Batch File Action runs, stuff the name of the script file into $SHELL_INPUT_FILE$ (/home/fred/hello.sh). Then set the value of $TARGET_VARIABLE_NAME$ to the variable you wish to find (say, NEIGHBOR). After the action completes, $EXECUTE_STDOUT$ in InstallAnywhere will contain Barney Rubble.
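To try the parsing logic outside InstallAnywhere, the same idea can be sketched as a plain shell function in which ordinary arguments stand in for $TARGET_VARIABLE_NAME$ and $SHELL_INPUT_FILE$. The name get_var is hypothetical, and the sketch differs from the script above in two ways: it takes everything after the first "=" instead of splitting on "=", and it writes the single quote as the octal escape \047 so that no shell quoting tricks are needed.

```shell
#!/bin/sh
# get_var NAME FILE   (hypothetical stand-in for the IA action)
# Prints the value assigned to NAME in FILE, without surrounding quotes.
get_var() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                 # drop leading blanks
            if (index(line, name "=") != 1) next     # not "NAME=..."
            value = substr(line, length(name) + 2)   # text after the "="
            # Naive trailing-comment removal: also cuts a "#" inside quotes.
            offset = index(value, "#")
            if (offset > 0) value = substr(value, 1, offset - 1)
            # \047 is the single quote; strip quotes and spaces at both ends.
            gsub("^[\047\" ]+", "", value)
            gsub("[\047\" ]+$", "", value)
            print value
        }
    ' "$2"
}
```

With the hello.sh file above, get_var NEIGHBOR /home/fred/hello.sh prints Barney Rubble, which is the text the action would hand back on stdout.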
You can build on this idea to parse arbitrarily complex files in an Execute Script/Batch File Action. Just make your awk (or perl/Ruby/Python) script as complex as needed.
NOTE: when scripting Unix shell scripts in InstallAnywhere ALWAYS check the "Do not replace unknown variables" option. If you don't, InstallAnywhere will quietly convert anything that looks vaguely like an InstallAnywhere variable into blanks... It's very annoying.
For a Windows solution, find a standalone Windows version of awk or perl and include it with your installation. Then extend the above solution to work for batch files.
You'd want to create two Execute Script/Batch File Actions, one with a rule for Linux/Unix and one with a rule for Windows. You'd have to install the Windows awk or perl executable before calling this action though. Also, you'd need to fully qualify the path to the awk/perl executable. Finally, the actual script will need to be sensitive to differences in batch syntax versus shell syntax.
Below is an awk script modified to look for batch variable definitions. The pattern changes and you won't have to worry about embedded comments:
$PATH_TO_AWK_EXE$ '
# This pattern looks for optional spaces, the word SET
# with any capitalization, the target variable, more
# optional spaces and the equals sign.
/^[ \t]*[Ss][Ee][Tt][ \t]*$TARGET_VARIABLE_NAME$[ \t]*=.+/ {
# Split the current line on "=" into
# an array called result
split($0, result, "=")
value = result[2]
# No trailing comments in Batch files.
# Remove any possible leading spaces and quotes.
# Note that the single-quote is escaped. That escape
# is for bash, not for awk. I am doing this from
# memory and do not have access to IA right now.
# you may have to play with the escaping.
gsub(/^[\'" ]*/, "", value)
# Remove any possible trailing spaces and quotes.
# See above regarding the escaped single-quote.
gsub(/[\'" ]*$/, "", value)
# send "value" to stdout
print value
}
' < $SHELL_INPUT_FILE$
Above, the IA variable $PATH_TO_AWK_EXE$ points to the location where awk was installed. It would be set as some combination of $USER_INSTALL_FOLDER$, possibly other directory names, and the name of the awk.exe file. $PATH_TO_AWK_EXE$ can later be used to remove the awk executable, if desired.
You can try to get the variable from the output of the script:
"Execute Script" -> Store process's stdout in: $EXECUTE_OUTPUT$
Then you can use $EXECUTE_OUTPUT$ as a variable after that.
