Output filename/lines/type for given directory - linux

I'm trying to teach myself basic file manipulation and scripting in linux but I've hit a wall. Right now I'm trying to output a table that gives something like
FILENAME LINES TYPE
File1 22 File
File2 56 File
Folder1 N/A Directory
when given any directory to search. I've been researching how to format output using awk and using maybe grep and wc to try and get my data but I'm a bit lost. For all I know I'm barking up the wrong tree entirely.

Look at printf to format your output, at the file command to find the file type, and at wc to count the number of lines.
All of this can be done with a find | while read loop:
# Header row; shell printf takes its arguments without commas
printf "%-20.20s %-5.5s %s\n" "File" "Lines" "Type"
find . -type f -print0 | while IFS= read -r -d '' file
do
    file_name=$(basename "$file")
    lines="$(cat "$file" | wc -l | sed 's/^ *//')"
    desc="$(file --brief "$file")"
    printf "%-20.20s %5.5s %s\n" "$file_name" "$lines" "$desc"
done
The $(...) syntax returns the output of the enclosed command as a string that can be assigned to a variable. I pipe the file through cat into wc -l so that wc does not print the name of the file, and then use sed to remove leading spaces.
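The loop above only visits regular files, while the table in the question also lists directories with N/A. A minimal sketch of one way to cover both, assuming only the entries directly inside the given directory are wanted; list_dir is a hypothetical helper name, not something from the answer:

```shell
# list_dir DIR: print a FILENAME / LINES / TYPE table for the entries in DIR.
# list_dir is a hypothetical name; the body runs in a subshell so its
# variables do not leak into the caller.
list_dir() (
    dir=${1:-.}
    printf '%-20s %-6s %s\n' "FILENAME" "LINES" "TYPE"
    for path in "$dir"/*; do
        [ -e "$path" ] || continue            # the glob matched nothing
        name=${path##*/}                      # strip the leading directory
        if [ -d "$path" ]; then
            printf '%-20s %-6s %s\n' "$name" "N/A" "Directory"
        else
            # Reading from stdin keeps the file name out of wc's output
            lines=$(wc -l < "$path" | tr -d ' ')
            printf '%-20s %-6s %s\n' "$name" "$lines" "File"
        fi
    done
)
```

Call it as list_dir /some/dir. Hidden files are skipped because the * glob does not match them.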

Related

Open header-files in editor based on content in corresponding source

I have several files that have the same name, but a different extension. For example
echo "array" > A.hpp
echo "..." > A.h
echo "content" > B.hpp
echo "..." > B.h
echo "content" > C.hpp
echo "..." > C.h
I want to get a list of *.h files based on some content in the corresponding *.hpp file. In particular I am looking for a one-liner to open them in my editor.
It is fair to assume that for each *.hpp file the corresponding *.h file exists. Also, since they are source files, it may be assumed that the filenames do not contain whitespaces.
Current approach
I know how to get a list of *.hpp files based on their content. An approach (but surely not the only or the best) is to
find . -type f -iname '*.hpp' -print | xargs grep -i 'content' | cut -d":" -f1
which gives
./B.hpp
./C.hpp
Opening in my editor is then done by
st `find . -type f -iname '*.hpp' -print | xargs grep -i 'content' | cut -d":" -f1`
But how can I get/open the corresponding *.h files?
You say you want to get a list of *.h files based on some content in the corresponding *.hpp file.
while read -r line ; do
echo "${line%.hpp}.h"
done < <(grep -i 'content' *.hpp| cut -d":" -f1)
BashFAQ 001 recommends using a while loop with the read command to read a data stream line by line.
One-liner as requested
st `while IFS= read -r line ; do echo "${line%.hpp}.h"; done < <(grep -i 'content' *.hpp| cut -d":" -f1)`
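If you are dealing with filenames containing whitespace, use printf with %q instead of echo, and let the shell re-parse the result with eval. Plain command substitution only word-splits the output and would leave the escapes in place.
eval "st $(while IFS= read -r line ; do printf '%q ' "${line%.hpp}.h"; done < <(grep -i 'content' *.hpp| cut -d":" -f1))"
The %q lets printf format the output so that it can be reused as shell input. Note the trailing space in the format string, which keeps the names separated.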
Explanation
Read it from the end backwards. First we grep all files ending in .hpp in the current directory for the string 'content' and cut away everything but the filename.
The while loop reads the output of grep and assigns each filename to the variable line.
Inside the while loop we use bash's parameter expansion to change the file extension from .hpp to .h.
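The same steps can be wrapped in a small function. This is only a sketch: headers_for is a hypothetical name, and it uses grep -l (print each matching file name once) in place of the cut step, which also avoids duplicate names when a file matches on several lines.

```shell
# headers_for PATTERN [DIR]: print the .h file paired with each .hpp file
# in DIR whose content matches PATTERN (case-insensitive).
# headers_for is a hypothetical helper name.
headers_for() (
    pattern=$1
    dir=${2:-.}
    # grep -l lists each matching file once, so no cut is needed
    grep -il -- "$pattern" "$dir"/*.hpp | while IFS= read -r hpp; do
        printf '%s\n' "${hpp%.hpp}.h"         # drop the .hpp suffix, append .h
    done
)
```

With filenames free of whitespace, as the question assumes, the editor call would then be st $(headers_for content).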
Your question still isn't clear but is this all you're trying to do (using GNU awk for gensub())?
$ awk '/content/{print gensub(/[^.]+$/,"h",1,FILENAME)}' *.hpp
B.h
C.h

Printing the number of lines

I have a directory that contains only .txt files. I want to print the number of lines for every file. When I write cat file.txt | wc -l the number of lines appears but when I want to make a script it's more complicated. I have this code:
for fis in `ls -R $1`
do
echo `cat $fis | wc -l`
done
I also tried wc -l $fis, awk, and grep, and none of them work. The output is:
cat: fis1: No such file or directory
0
How can I print the number of lines?
To find files recursively in subdirectories, use the find command, not ls -R, which is mainly intended for human reading.
find "$1" -type f -exec wc -l {} +
The problems with looping over the output of ls -R are:
Filenames with whitespace won't be parsed correctly.
It prints other output beside just the filenames.
Not the problem here, but wrapping the command in echo and backticks is unnecessary.
You can simply use
wc -l "${fis}"
What goes wrong?
You have a subdirectory called fis1. Look at the output of ls:
# ls -R fis1
fis1:
file1_in_fis1.txt
When you are parsing this output, your script will try
echo `cat fis1: | wc -l`
The cat will tell you No such file or directory and wc counts 0.
As @Barmar explained, ls prints additional output you do not want.
Do not try to patch your attempt with | grep .txt and if [ -f "${fis}" ]; then ..., as these will fail for a file named "filename with spaces.txt". Use find or shopt -s globstar instead (and accept the answer of @Barmar or @Cyrus).
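For completeness, a sketch of the find route that prints one count per file and copes with spaces in names. count_lines is a hypothetical helper name, and restricting to *.txt is an assumption taken from the question:

```shell
# count_lines DIR: print "<lines> <path>" for every .txt file under DIR.
# count_lines is a hypothetical helper name.
count_lines() (
    dir=${1:-.}
    # find passes the file names to the inner sh as arguments, so names
    # with spaces never go through word splitting
    find "$dir" -type f -name '*.txt' -exec sh -c '
        for f do
            printf "%s %s\n" "$(wc -l < "$f" | tr -d " ")" "$f"
        done' sh {} +
)
```

In the script from the question this would be called as count_lines "$1".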

Reformatting name / content pairs from grep in a bash script

I'm attempting to create a bash script that will grep a single file for two separate pieces of data, and print them to stdout.
So far this is what I have:
#!/bin/sh
cd /my/filePath/to/directory
APP=`grep -r --include "inputs.conf" "\[" | grep -oP '^[^\/]+'`
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
for i in $APP
do
{cd /opt/splunk/etc/deployment-apps
INPUT=`grep -r --include "inputs.conf" "\[" | grep -oP '\[[^\]]+'`
echo -n "$i | $INPUT"}
done
echo "";
exit
Which gives me an output printing the entire output of the first command (which is about 200 lines), then a |, then the other results from the second command. I was thinking I could create an array to do this, however I'm still learning bash.
This is an output example from the command without piping to grep:
TA-XA6x-Server/local/inputs.conf:[perfmon://Processor]
There are 200+ of these in a single execution, and I was looking to have the format be printed as something like this
app="TA-XA6x-Server/local/inputs.conf:" | input="[perfmon://Processor]"
There are essentially two pieces of information I'm attempting to stitch together:
the file path to the file
the contents of the file itself (the input)
Here is an example of the file path:
/opt/splunk/etc/deployment-apps/TA-XA6x-Server/local/inputs.conf
and this is an example of the inputs.conf file contents:
[perfmon://TCPv4]
The easy, mostly-working-ish approach is something like this:
#!/bin/bash
while IFS=: read -r name content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --include "inputs.conf" "\[")
If you need to work reliably with all possible filenames (including names with colons or newlines) and have GNU grep available, consider the --null argument to grep and adjusting the read usage appropriately:
#!/bin/bash
while IFS= read -r -d '' name && IFS= read -r content; do
printf 'app="%s" | input="%s"\n' "$name" "$content"
done < <(grep -r --null --include "inputs.conf" "\[")
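To see what the IFS=: split in the first version does to a single line, here is a small sketch that reads the grep-style lines from stdin instead of a process substitution. format_pairs is a hypothetical name. Because content is the last variable given to read, it receives the rest of the line, including the colon in perfmon://.

```shell
# format_pairs: read "path:content" lines on stdin (the shape grep -r prints)
# and reformat each one. format_pairs is a hypothetical helper name.
format_pairs() {
    # name gets the text before the first colon; content gets everything after
    while IFS=: read -r name content; do
        printf 'app="%s" | input="%s"\n' "$name" "$content"
    done
}
```

Usage would be along the lines of grep -r --include "inputs.conf" "\[" | format_pairs.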

how to compare output of two ls in linux

So here is the task which I can't solve. I have a directory with .h files and a directory with .i files, which have the same names as the .h files. I want just by typing a command to have all .h files which are not found as .i files. It's not a hard problem, I can do it in some programming language, but I'm just curious how it will look like in cmd :). To be more specific here is the algo:
get file names without extensions from ls *.h
get file names without extensions from ls *.i
compare them
print all names from 1 that are not met in 2
Good luck!
diff \
<(ls dir.with.h | sed 's/\.h$//') \
<(ls dir.with.i | sed 's/\.i$//') \
| grep '^<' \
| cut -c3-
diff <(ls dir.with.h | sed 's/\.h$//') <(ls dir.with.i | sed 's/\.i$//') executes ls on the two directories, cuts off the extensions, and compares the two lists. Then grep '^<' keeps the lines that start with "<", which are the names found only in the first listing, and cut -c3- cuts off the "< " characters that diff inserted.
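A variation on the same idea, swapping diff for comm, which prints the lines unique to its first sorted input directly, so no grep or cut is needed afterwards. This is a sketch, not the answer's method: only_in_h is a hypothetical name, temporary files stand in for the process substitution, and it assumes file names without newlines.

```shell
# only_in_h H_DIR I_DIR: print base names that have a .h file in H_DIR
# but no .i file in I_DIR. only_in_h is a hypothetical helper name.
only_in_h() (
    h_dir=$1
    i_dir=$2
    tmp=$(mktemp -d) || exit 1
    # sed -n ... p keeps only names that really end in the extension
    ls "$h_dir" | sed -n 's/\.h$//p' | sort > "$tmp/h"
    ls "$i_dir" | sed -n 's/\.i$//p' | sort > "$tmp/i"
    comm -23 "$tmp/h" "$tmp/i"                # lines unique to the first list
    rm -rf "$tmp"
)
```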
ls ./dir_h/*.h | sed -r -n 's:.*dir_h/([^.]*).h$:dir_i/\1.i:p' | xargs ls 2>&1 | \
grep "No such file or directory" | awk '{print $4}' | sed -n -r 's:dir_i/([^:]*).*:dir_h/\1:p'
ls -1 dir1/*.hh dir2/*.ii | awk -F"/" '{print $NF}' |awk -F"." '{a[$1]++;b[$0]}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
explanation:
ls -1 dir1/*.hh dir2/*.ii
above will list all the files *.hh and *.ii files in both the directories.
awk -F"/" '{print $NF}'
above will just print the file name excluding the complete path of the file.
awk -F"." '{a[$1]++;b[$0]}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
above will create two associative arrays: a, keyed by the file name without its extension, and b, keyed by the full file name.
If both the hh and ii files exist, the value in array a will be 2; if there is only one file, the value will be 1. So we need the array items whose value is 1 and which belong to a header file (.hh).
This is checked in the END block using the associative array b.
Assuming bash is your shell:
for file in dir_with_h/*.h; do
    name=${file%.h}            # trim trailing ".h" file extension
    name=${name#dir_with_h/}   # trim leading folder name
    if [ ! -e "dir_with_i/${name}.i" ]; then
        echo "${name}"
    fi
done
Undoubtedly this can be ported to virtually all other shells. I find this less cryptic than some other approaches (although this is surely my problem) but it is a little wordy. As such, saving it as a shell script might help recall it.

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul
The 'problem' is the standard I/O library. When it is writing to a terminal
its output is line-buffered, but when it is writing to a pipe it switches to full buffering.
Try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? If there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
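One thing worth ruling out along these lines is the unquoted $CURRENT_LINE in the grep calls. If the line contains a space, grep sees the first word as the pattern and the rest as file names. A minimal sketch of a quoted, literal match follows; count_matches is a hypothetical name, and using -F assumes the lines are meant as fixed strings rather than regular expressions:

```shell
# count_matches TEXT FILE: count the lines of FILE that contain TEXT literally.
# count_matches is a hypothetical helper name.
count_matches() {
    # -F: treat TEXT as a fixed string; --: TEXT may start with a dash;
    # the quotes keep a TEXT with spaces together as one argument
    grep -cF -- "$1" "$2"
}
```

In the script this would read MATCH=$(count_matches "$CURRENT_LINE" "$FILE_BASENAME_LIST").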
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT=""
md5sum * |
sort |
while read -r md5sum filename
do
    # After sorting, equal hashes sit on adjacent lines
    [[ $CURRENT == "$md5sum" ]] && echo "$filename is duplicate"
    CURRENT=$md5sum
done
You tagged linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; | sort | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1
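The last pipeline can be wrapped for reuse. A sketch, assuming GNU md5sum and GNU uniq (the -w and -D options are GNU extensions); find_dupes is a hypothetical name:

```shell
# find_dupes DIR: print the md5sum lines of files under DIR whose content
# hash occurs more than once. Assumes GNU md5sum and GNU uniq.
# find_dupes is a hypothetical helper name.
find_dupes() (
    dir=${1:-.}
    # sort groups equal hashes together; uniq -w32 compares only the
    # 32-character hash and -D prints every member of each duplicate group
    find "$dir" -type f -exec md5sum {} + | sort | uniq -w32 -D
)
```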

Resources