Recursive grep with include giving incorrect results for current folder - linux

I have created a test directory structure:
t1.html
t2.php
b/t1.html
b/t2.php
c/t1.html
c/t2.php
All files contain the string "HELLO".
The following commands are run from the root folder above:
> grep -r "HELLO" *
b/t1.html:HELLO
b/t2.php:HELLO
c/t1.html:HELLO
c/t2.php:HELLO
t1.html:HELLO
t2.php:HELLO
> grep -r --include=*.html "HELLO" *
b/t1.html:HELLO
c/t1.html:HELLO
t2.php:HELLO
Why is it including the correct .html files from the sub-directories, but the .php file from the current directory?
If I go up a level to the directory above my whole structure, it gives the following result:
> grep -r --include=*.html "HELLO" *
a/t1.html:HELLO
a/c/t1.html:HELLO
a/b/t1.html:HELLO
This is what I expected when running it from within my structure.
I assume I can achieve the goal using find and grep together, but I thought this was a valid use of grep.
Thanks for any help.
Andy

Use a dot instead of the asterisk:
grep -r HELLO .
The asterisk is expanded by the shell and replaced with the list of all entries in the current directory (whose names don't start with a dot). Each of them is then grepped recursively.
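To tie this back to the --include case from the question, here is a minimal sketch. It assumes a grep that supports --include (GNU grep does) and a throwaway directory created with mktemp; the variable names demo and html_hits are only for illustration. Quoting the pattern keeps the shell from touching it, and the dot lets grep walk the tree itself.

```shell
#!/bin/bash
# Build a throwaway copy of the test layout (hypothetical temp directory).
demo=$(mktemp -d)
mkdir "$demo/b" "$demo/c"
for f in t1.html t2.php b/t1.html b/t2.php c/t1.html c/t2.php; do
    echo HELLO > "$demo/$f"
done

# '.' makes grep do the directory walk, so --include is applied to every file.
# Quoting '*.html' stops the shell from expanding it before grep sees it.
html_hits=$(cd "$demo" && grep -r --include='*.html' HELLO . | sort)
printf '%s\n' "$html_hits"

rm -rf "$demo"
```

Only the three .html files should be listed, including the one in the top-level directory.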


How to copy multiple files with varying version numbers from one directory to another using bash?

I have a folder /home/user/Document/filepath containing three files: file1-1.1.0.txt, file2-1.1.1.txt and file3-1.1.2.txt,
and another folder, /home/user/Document/backuppath, into which I have to move files from /home/user/Document/folderpath, which contains file1-1.0.0.txt, file2-1.0.1.txt and file3-1.0.2.txt.
The task is to copy specific files from the folder path to the backup path.
To summarize:
Below is files.txt, where I listed the files that have to be copied:
file1-*.txt
file2-*.txt
Below is the move.sh script that performs the copy:
for file in `cat files.txt`; do cp "/home/user/Document/folderpath/$file" "/home/user/Documents/backuppath/" ; done
With the above script I get errors like:
cp: cannot stat '/home/user/Document/folderpath/file1-*.txt': No such file or directory found
cp: cannot stat '/home/user/Document/folderpath/file2-*.txt': No such file or directory found
What I would like to accomplish is to use the script to copy specific files, using * in place of the version numbers, since the version numbers may vary in the future.
You have wildcard characters in your files.txt. In your cp command, you are using quotes. These quotes prevent the wildcards from being expanded, as you can clearly see from the error message.
One obvious possibility is to not use quotes:
cp /home/user/Document/folderpath/$file /home/user/Documents/backuppath/
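Or do not use a loop at all (this assumes the current directory is the source folder, since the patterns in files.txt are relative):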
cp $(<files.txt) /home/user/Documents/backuppath/
However, this would of course break if a line in your files.txt is a filename pattern that contains whitespace. Therefore, I would recommend a second loop over the expanded pattern:
while read -r file # Puts the next line into 'file'
do
    for f in /home/user/Document/folderpath/$file # This expands the pattern in 'file' inside the source folder
    do
        cp "$f" /home/user/Documents/backuppath
    done
done < files.txt
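If a pattern itself may contain spaces, an unquoted expansion is still split into words before globbing. A minimal sketch of one way around that, assuming bash and throwaway directories (src, dst and list are hypothetical stand-ins for folderpath, backuppath and files.txt): with IFS empty, word splitting is switched off while pathname expansion still happens.

```shell
#!/bin/bash
# Hypothetical stand-ins for folderpath, backuppath and files.txt.
src=$(mktemp -d)
dst=$(mktemp -d)
list=$(mktemp)
touch "$src/file1-1.0.0.txt" "$src/file2-1.0.1.txt" "$src/file3-1.0.2.txt" "$src/my notes-2.0.txt"
printf '%s\n' 'file1-*.txt' 'file2-*.txt' 'my notes-*.txt' > "$list"

old_ifs=$IFS
IFS=                                  # no word splitting; globbing still happens
while read -r pattern; do
    for f in "$src"/$pattern; do      # the pattern is expanded inside $src
        [ -e "$f" ] || continue       # skip patterns that matched nothing
        cp "$f" "$dst/"
    done
done < "$list"
IFS=$old_ifs

# List what was copied.
copied=$(cd "$dst" && printf '%s\n' *)
printf '%s\n' "$copied"
```

file3-1.0.2.txt is not copied because no pattern in the list matches it.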

string manipulation of Directory structure

Scenario: I have a script but no idea where it is in the directory tree; I need to resolve back to the nearest known location, UPROC[something].
What I have so far:
I have a script running in a directory for example:
/home/jim/query/UPROCL/test/bob/dircut.sh
Now the only constant in this is that the directory I want will begin with UPROC; maybe not UPROCL, but definitely UPROC.
So I have written the following:
#!/bin/bash
#Absolute path for this script
SCRIPT=$(readlink -f "$0")
echo $SCRIPT
#Gets Path of script without script name
SCRIPTPATH=$(dirname "$SCRIPT")
echo $SCRIPTPATH
#Cuts everything after UPROC(.* is wildcard)/
CUTDOWN=$(sed 's/\(UPROC.*\/\).*/\1/' <<< $SCRIPTPATH)
echo $CUTDOWN
The only problem is that the output is:
/home/jim/query/UPROCL/test/bob/dircut.sh
/home/jim/query/UPROCL/test/bob
/home/jim/query/UPROCL/test/
Can someone tell me what is wrong with my sed command, as it is not cutting down to
/home/jim/query/UPROCL/
Because * is greedy. You want to be more selective about which characters are allowed to follow "UPROC": only non-slash characters.
Not
sed 's/\(UPROC.*\/\).*/\1/'
but
sed -r 's,(UPROC[^/]*/).*,\1,'
Using different delimiters for the s/// command reduces the "leaning toothpick" problem.
Because the .* inside the \( \) matches all the way to the / at the end of test/.
You need [^/]* instead of .* so that it does not match any slashes.
If you want to know which directory you are in, why not use pwd?
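A small side-by-side sketch of the two expressions, using the path from the question as a fixed string (the variable names greedy and trimmed are just for illustration). Both commands use basic regular expressions so that no extra sed option is needed.

```shell
#!/bin/bash
path=/home/jim/query/UPROCL/test/bob

# Greedy: '.*' inside the group runs up to the LAST '/', so 'test/' is kept.
greedy=$(printf '%s\n' "$path" | sed 's/\(UPROC.*\/\).*/\1/')

# '[^/]*' cannot cross a '/', so the group ends at the first '/' after UPROC.
trimmed=$(printf '%s\n' "$path" | sed 's,\(UPROC[^/]*/\).*,\1,')

echo "$greedy"    # /home/jim/query/UPROCL/test/
echo "$trimmed"   # /home/jim/query/UPROCL/
```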
One thing which might be useful: the command pwd shows the value of the environment variable PWD (uppercase). In case you want to use the current directory as a value, you might use this.

Unix: finding a string within a directory and listing only its associated file names

I have been working on this for quite some time and decided to ask for some help. I'm trying to use a command to find multiple occurrences of a function (basically a string) within a directory (that has multiple files), and I would like to see only the names of the files in which the string is found.
Let's say the directory I want to search, filled with multiple .h and .cpp files, is:
~/Project/Files
and I was looking for occurrences of a function called 'doThis'
So far I have tried:
grep -r doThis ~/Project/Files
But I get the path and the line where it occurs in the file; I only need the file names.
Also, grep -f won't work because I get an error message saying "No such file or directory", and when using just grep I get an error message saying "path is a directory".
Any help would be great: Thanks guys!
Simply use the -l switch ;)
So :
grep -rl foobar dir
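Applied to the names from the question, a minimal sketch might look like this. The temp directory proj is a hypothetical stand-in for ~/Project/Files, and the file contents are made up for the demo.

```shell
#!/bin/bash
# Hypothetical stand-in for ~/Project/Files with a few made-up sources.
proj=$(mktemp -d)
mkdir "$proj/sub"
printf 'void doThis();\n' > "$proj/a.h"
printf 'void doThis() {}\nvoid run() { doThis(); }\n' > "$proj/sub/a.cpp"
printf 'int other();\n' > "$proj/b.h"

# -r recurses into subdirectories; -l prints each matching file name once,
# without the matching lines.
matches=$(cd "$proj" && grep -rl doThis . | sort)
printf '%s\n' "$matches"

rm -rf "$proj"
```

Note that sub/a.cpp contains the string twice but is listed only once, and b.h is not listed at all.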

How do you format output string in bash script for input by another script?

I need to unzip a bunch of student assignment (jar) files so that I can use a script to submit the contents to the Moss (Stanford) plagiarism detection server. I did the same thing in Java, which was trivial, but I'm trying to re-implement it as a bash script.
I am trying to do the following:
1. Get a list of student names (each student has a directory).
2. In each student directory, sub-directories exist numbered from 1 to the latest submission. I need to get the directory with the highest number.
3. Each of those submission directories contains a jar file that I need. I copy each jar into a temp directory with the same name as the student and unzip it.
4. I need that temp directory listing formatted as a string in the form
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
The student directory has the basic structure:
Student_Root_Directory:
    Student1
    Student2
Student1
    Sub-Directories: 1 2 3 4 5
    1: student1.jar
    2: student1.jar
    ...
Student2
    Sub-Directories: 1 2 3
    1: student2.jar
    ...
To do the first 3 steps above I did:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: .c, .cpp, .java, .py
students=`ls $1`
student_dir=$1
languageExt=$2
mossDir="/home/moss"
tempDir="/home/moss/tempJarStorage"
for student in $students
do
    latestSubmissionDir=`ls -t $student_dir/$student | head -1`
    for jarDir in $latestSubmissionDir
    do
        mkdir $tempDir/$student
        cp $student_dir/$student/$jarDir/*.jar $tempDir/$student
        unzip -d $tempDir/$student/ -o -j $tempDir/$student/$student.jar *.$languageExt
        rm $tempDir/$student/$student.jar
    done
done
...which results in a number of student directories being created in a temp directory that contains only the unzipped contents for the student submissions.
I need the ls output of the new temp directories formatted as a string that contains:
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
I have tried variations on
find "$tempDir" -iname "*.$languageExt" -printf "%p/*.$languageExt"
using iname and not - but I either have output that contains extra directory information such as $tempDir/*.languageExt (when I just need the subdirectories $tempDir/$studentName/*.languageExt) or I have output where the path for every source file is also listed such as:
$tempDir/$studentName/studentNameA.java
$tempDir/$studentName/studentNameB.java
when I only need
$tempDir/$studentName/*.java
I think this should be really easy and I'm just overthinking it. Any hints for improving the script are also appreciated.
Here's a revised version of the script that may work:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: c, cpp, java, py
students_dir=$1
languageExt=$2
studentPathsT=( "$students_dir"/*/ )
mossDir='/home/moss'
tempDir='/home/moss/tempJarStorage'
for studentPathT in "${studentPathsT[@]}"; do
    student=$(basename "$studentPathT")
    mkdir "$tempDir/$student"
    submissionDirsT=( "$studentPathT"*/ )
    # The last array element is the lexically last (here: latest) submission directory.
    latestSubmissionDirT=${submissionDirsT[${#submissionDirsT[@]}-1]}
    cp "$latestSubmissionDirT"*.jar "$tempDir/$student/"
    unzip -d "$tempDir/$student/" -o -j "$tempDir/$student/*.jar" "*.$languageExt"
    rm "$tempDir/$student"/*.jar
done
# Note that at this point `"$tempDir"/*/*.$languageExt` would expand
# to all extracted submission files, across all students.
# Finally, output each student's extracted files as an unexpanded glob à la
# /{tempDir}/{studentName1}/*.{languageExt}
for pT in "$tempDir"/*/; do
    echo "$pT*.$languageExt"
    # Note: If there is a chance that your filenames contain
    # embedded newlines (rare in practice) using `echo` won't work properly
    # as @Charles Duffy points out.
    # If that is a concern, use
    # printf '%s\0' "$pT*.$languageExt"
    # and process the output with a utility that can process NUL characters
    # as separators, such as `xargs -0`.
done
It avoids using ls and only uses pathname expansion and array variables so as to properly deal with paths that contain embedded spaces and other shell metacharacters.
The suffix ...T in variable names indicates that a particular path or array of paths is terminated (hence the T), i.e., that it ends in a /.
The assumption is that the numbered subdirectories do not go beyond 9, as the implicit lexical sorting of pathname expansion is relied upon; if the numbers go higher, explicit numerical sorting must be applied.
Note that the globs (pathname patterns) passed to unzip are intentionally double-quoted, as they should be interpreted by unzip, not the shell.
Note that, based on your original code, I've assumed that $languageExt does NOT start with . (e.g., cpp rather than .cpp), despite what your comment says.
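For the case where submission numbers go beyond 9, here is a minimal sketch of explicit numerical sorting. It assumes the sub-directory names are purely numeric and contain no newlines; the temp directory and the variable names lexicalLast and latest are hypothetical, chosen to match the ...T naming used above.

```shell
#!/bin/bash
# Hypothetical student directory with submissions numbered up to 12.
studentPathT=$(mktemp -d)/
for n in 1 2 9 10 11 12; do
    mkdir "$studentPathT$n"
done

# Pathname expansion sorts lexically: 1 10 11 12 2 9, so the last one is 9.
submissionDirsT=( "$studentPathT"*/ )
lexicalLast=$(basename "${submissionDirsT[${#submissionDirsT[@]}-1]}")

# Sorting the bare names numerically picks the real latest submission.
latest=$(for d in "${submissionDirsT[@]}"; do basename "$d"; done | sort -n | tail -n 1)
latestSubmissionDirT=$studentPathT$latest/

echo "lexical last: $lexicalLast"   # 9
echo "numeric last: $latest"        # 12
```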

Execute program on Files in subDirectory

I have the following arrangement of files in a directory.
Directory
/A/abc.xyz
/B/abc.xyz
/C/abc.xyz
/D/abc.xyz
/E/abc.xyz
I want to execute a program on abc.xyz in each subdirectory and save the output files in a different directory, i.e. Directory/processed, with the name of the subdirectory added to the name of each output file.
Can it be written in the following way? It needs corrections.
for i in `ls "Directory/"`
do
program.pl $i/abc.xyz > processed/$i-abc.xyz
done
for dir in Directory/*; do
    program.pl "$dir/abc.xyz" > "processed/${dir##*/}-abc.xyz"
done
The ${dir##*/} part strips the leading directory names from $dir, so Directory/A becomes just A. I added quotes to ensure directory names with whitespace don't cause issues (a good habit, even if you know there are no spaces).
As an alternative to the string munging you could simplify this if you first change directory:
cd Directory
for dir in *; do
    program.pl "$dir/abc.xyz" > "processed/$dir-abc.xyz"
done
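One thing to watch with the cd variant: if processed lives inside Directory, as the question says, the * also matches processed itself. A minimal sketch that skips it, assuming a throwaway directory tree and a shell function named program as a hypothetical stand-in for program.pl (here it just upper-cases its input):

```shell
#!/bin/bash
# Throwaway layout; 'program' is a hypothetical stand-in for program.pl.
root=$(mktemp -d)
mkdir "$root/A" "$root/B" "$root/processed"
echo "data A" > "$root/A/abc.xyz"
echo "data B" > "$root/B/abc.xyz"
program() { tr 'a-z' 'A-Z' < "$1"; }

(
    cd "$root" || exit 1
    for dir in */; do
        dir=${dir%/}                          # */ matches directories only; drop the slash
        [ "$dir" = processed ] && continue    # do not treat the output directory as input
        [ -f "$dir/abc.xyz" ] || continue     # skip subdirectories without the input file
        program "$dir/abc.xyz" > "processed/$dir-abc.xyz"
    done
)

# List the generated output files.
outputs=$(cd "$root/processed" && printf '%s\n' *)
printf '%s\n' "$outputs"
```

The subshell keeps the cd from affecting the rest of the script.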
