Linux: How to list information about files or directories (size, permissions, number of files by type) in total

Suppose I am in the current directory. I want to list the total number of files, as well as their size, permissions, and the number of files of each type.
Here is a sample output:
Print information about "/home/user/poker"
total number of file : 83
pdf files : 5
html files : 9
text files : 15
unknown : 5
NB: any file without an extension can be considered unknown.
I hope to use simple commands such as ls, cut, sort and uniq (just examples): put each different extension in a file and use wc -l to count the number of lines.
Or do I need to use grep, awk, or something else?
I hope to get everybody's advice. Thank you!

The best way is to use file to output only the MIME type and pipe it to awk.
file * -ib | awk -F'[;/.]' '{print $(NF-1)}' | sort -n | uniq -c
In my home directory it produces this output:
35 directory
3 html
1 jpeg
1 octet-stream
1 pdf
32 plain
5 png
1 spreadsheet
7 symlink
1 text
1 x-c++
3 x-empty
1 xml
2 x-ms-asf
4 x-shellscript
1 x-shockwave-flash
If you think text/x-c++ and text/plain should be counted together, use this:
file * -ib | awk -F'[;/.]' '{print $1}' | sort -n | uniq -c
6 application
6 image
45 inode
40 text
2 video
Change the {print $1} part according to your need to get the appropriate output.

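You need bash (arrays are a bash feature). With nullglob set, a pattern that matches nothing expands to an empty array instead of to the literal pattern:
shopt -s nullglob
files=(*)       # every non-hidden entry, directories included
pdfs=(*.pdf)    # only the PDFs
echo "${#files[@]}"                        # total
echo "${#pdfs[@]}"                         # PDFs
echo "$(( ${#files[@]} - ${#pdfs[@]} ))"   # everything else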
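Extending the array idea to every extension at once produces the summary format from the question. This is a minimal sketch that makes several assumptions. It needs bash 4 or later for associative arrays. It treats the extension as whatever follows the last dot. It counts only regular, non-hidden files in the given directory. The name count_by_extension is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise the regular files in one directory by extension.
# Assumes bash 4+ (associative arrays). The body runs in a subshell, so the
# nullglob setting does not leak into the caller's shell.
count_by_extension() (
    dir=${1:-.}
    shopt -s nullglob                # an empty directory gives zero iterations
    declare -A count=()
    total=0 unknown=0
    for f in "$dir"/*; do            # * skips hidden files
        [ -f "$f" ] || continue      # regular files only (symlinks to files count too)
        name=${f##*/}                # drop the directory part
        ext=${name##*.}              # text after the last dot
        if [[ $name != *.* || -z $ext ]]; then
            unknown=$((unknown + 1)) # no extension: "unknown", as the question asks
        else
            count[$ext]=$(( ${count[$ext]:-0} + 1 ))
        fi
        total=$((total + 1))
    done
    echo "Print information about \"$dir\""
    echo "total number of file : $total"
    for ext in "${!count[@]}"; do
        echo "$ext files : ${count[$ext]}"
    done | sort
    echo "unknown : $unknown"
)
```

For example, count_by_extension /home/user/poker prints the header and the total. It then prints one line per extension in alphabetical order and ends with the unknown count.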

find . -type f | xargs -n1 basename | fgrep . | sed 's/.*\.//' | sort | uniq -c | sort -n
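That gives you a recursive list of file extensions, sorted by count. If you want only the current directory, add -maxdepth 1 right after the starting point (find . -maxdepth 1 -type f ...). Note that xargs splits its input on whitespace, so file names containing spaces may be miscounted.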
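None of the answers covers the size and permission part of the question. On GNU/Linux, stat from coreutils can print both for each file. This minimal sketch assumes GNU stat; BSD and macOS stat use -f with different format directives. The name list_size_perms is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print permissions, size in bytes and name for each entry
# in a directory. Relies on GNU coreutils stat format sequences:
# %A = permissions as shown by ls -l, %s = size in bytes, %n = file name.
list_size_perms() (
    shopt -s nullglob
    entries=("${1:-.}"/*)
    # Nothing to report in an empty directory (stat with no operands would fail).
    [ "${#entries[@]}" -gt 0 ] || exit 0
    stat -c '%A %s %n' -- "${entries[@]}"
)
```

For example, list_size_perms /home/user/poker prints one line per entry in the form permissions, size, path.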

Related

Write a command to display text file names and their sizes on separate lines in Linux

I want to display each text file's name and its size on separate lines.
I have tried
du *.* | cut -f 1
This gives me only the sizes of the files in the given directory.
du *.* | cut -f 2
This gives the filenames
But I couldn't figure out how to format it so that the size comes first and then the file name.
Example:
4
file1.txt
5
file2.txt
I just figured it out; this works as expected (as long as no file name contains whitespace):
du *.txt | tr '[:space:]' '\n'
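You can do some awk scripting; this prints the size on one line and the name on the next:
for file in *.txt
do
    du -- "$file" | awk '{print $1}'   # du prints size<TAB>name; keep the size
    echo "$file"
done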
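Here is a variant that keeps the requested order and also copes with spaces in file names. du separates the size from the name with a tab, so awk can split on the tab alone. This minimal sketch assumes GNU du, where sizes are disk usage in 1 KiB blocks by default rather than byte counts. The name sizes_then_names is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every *.txt file in a directory, print the disk usage
# on one line and the file name on the next. du writes "size<TAB>name", so
# splitting on the tab (not on any whitespace) keeps names with spaces intact.
sizes_then_names() (
    shopt -s nullglob
    files=("${1:-.}"/*.txt)
    [ "${#files[@]}" -gt 0 ] || exit 0   # no .txt files: print nothing
    du -- "${files[@]}" | awk -F'\t' '{ print $1; print $2 }'
)
```

Run it as sizes_then_names for the current directory, or pass a directory as the first argument.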

Validating file records shell script

I have a file with the content below and want to validate it as follows:
1. I have entries of the form rec$NUM, and each rec$NUM should appear exactly 7 times.
For example, for rec1.any_attribute, rec1 should appear only 7 times in the whole file.
2. I need a validation script for this.
If a rec$NUM appears fewer or more than 7 times, the script should report that record.
The file is as follows:
rec1:sourcefile.name=
rec1:mapfile.name=
rec1:outputfile.name=
rec1:logfile.name=
rec1:sourcefile.nodename_col=
rec1:sourcefle.snmpnode_col=
rec1:mapfile.enc=
rec2:sourcefile.name=abc
rec2:mapfile.name=
rec2:outputfile.name=
rec2:logfile.name=
rec2:sourcefile.nodename_col=
rec2:sourcefle.snmpnode_col=
rec2:mapfile.enc=
rec3:sourcefile.name=abc
rec3:mapfile.name=
rec3:outputfile.name=
rec3:logfile.name=
rec3:sourcefile.nodename_col=
rec3:sourcefle.snmpnode_col=
rec3:mapfile.enc=
Please Help
Thanks in Advance... :)
Simple awk:
awk -F: '/^rec/{a[$1]++}END{for(t in a){if(a[t]!=7){print "Some error for record: " t}}}' test.rc
grep -c '^rec1:' file.txt
grep -c '^rec2:' file.txt
grep -c '^rec3:' file.txt
Each of the above should print 7. (Anchoring on the colon keeps '^rec1' from also counting rec10, rec11 and so on.)
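The commands:
grep '^rec' file2.txt | cut -d':' -f1 | uniq -c | egrep -v '^ *7 '
print nothing if the file follows your rules, and print each failing record (with its count) if it doesn't. The space after the 7 keeps counts such as 70 or 71 from being treated as valid.
(uniq -c only merges adjacent lines, so if lines for the same record can be mixed, insert sort before uniq -c.)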
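If the check should also drive a script's exit status, the awk answer can be extended. This sketch reports each bad record with its count and exits non-zero when any are found. The name validate_records is hypothetical, and 7 is just the default expected count taken from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every recN prefix occurs exactly EXPECTED times
# (default 7). Prints one line per offending record and returns 1 if any were
# found, 0 otherwise, so it can be used with if, && or ||.
validate_records() {
    local file=$1 expected=${2:-7}
    awk -F: -v want="$expected" '
        /^rec[0-9]+:/ { count[$1]++ }          # $1 is "rec1", "rec2", ...
        END {
            bad = 0
            for (r in count)
                if (count[r] != want) {
                    printf "%s: %d entries (expected %d)\n", r, count[r], want
                    bad = 1
                }
            exit bad
        }' "$file"
}
```

For example: validate_records test.rc || echo "record check failed".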

bash script - print X rows from a selected file in a folder

I'm trying to write a script that helps me follow the logs of my application.
The logs of my application are written to "/var/log/MyLogs/" with the following pattern:
runningNumber_XXX.txt , for example:
0_XXX.txt
37_xxx.txt
99_xxx.txt
101_xxx.txt
103_xxx.txt
I'm trying to write a bash script (without success so far) that prints the last 20 rows of the last log file (the last log file is the file with the biggest prefix number).
I know I need to go over the files in the folder (for file in /var/log/MyLogs/*), check which file name has the biggest prefix, and then print the last 20 rows of the selected file.
please help me....
Thanks...
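find /var/log/MyLogs -iname '*_xxx.txt' | sort -t/ -k5,5n | tail -1 | xargs tail -20
Get correct files
Sort numerically on the file name, which is field 5 of the /-separated path (a plain sort -n would see the leading / and fall back to text order, putting 99_xxx.txt after 103_xxx.txt)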
Get last log file
Get last 20 rows
tail -20 $(ls -1 /var/log/MyLogs/*_*.txt | sort -t/ -k5,5nr | head -1)
ls -1 [0-9]*_XXX.txt | sort -rn | head -1 | xargs tail -20
Using ls in shell scripts is usually bad practice, but if you can ensure that the log file names don't contain spaces or other unusual characters, you can use a simple:
tail -20 $(ls -t1 /var/log/MyLogs/[0-9]*_XXX.txt | head -1)
Here:
ls -t sorts the files by modification time, newest first
head -1 takes the first one
tail prints its last lines
Note that this picks the most recently modified file, which is the one with the biggest prefix only if the files are written in order.
AGAIN, this is usually bad practice; use it only when you know what you're doing.
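This sketch follows the approach the question outlines: loop over the files, keep the one with the largest numeric prefix, then tail it, all without parsing ls. It assumes bash and a plain decimal prefix before the first underscore. The name tail_latest_log is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines (default 20) of the file in DIR whose
# name starts with the largest number, e.g. 103_xxx.txt beats 99_xxx.txt.
tail_latest_log() {
    local dir=${1:-/var/log/MyLogs} lines=${2:-20}
    local f name num best=-1 best_file=
    for f in "$dir"/[0-9]*_*.txt; do
        [ -f "$f" ] || continue          # the glob stays literal if nothing matches
        name=${f##*/}                    # e.g. 103_xxx.txt
        num=${name%%_*}                  # e.g. 103
        [[ $num =~ ^[0-9]+$ ]] || continue
        if (( 10#$num > best )); then    # 10# avoids octal parsing of e.g. 08
            best=$((10#$num))
            best_file=$f
        fi
    done
    [ -n "$best_file" ] || { echo "no log files in $dir" >&2; return 1; }
    tail -n "$lines" -- "$best_file"
}
```

For example: tail_latest_log /var/log/MyLogs 20. Comparing the prefixes arithmetically avoids the text-order trap described above, and file names with spaces are handled because nothing is word-split.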

How to compare the output of two ls commands in Linux

So here is the task I can't solve. I have a directory with .h files and a directory with .i files that have the same names as the .h files. With a single command, I want to list all .h files that have no corresponding .i file. It's not a hard problem, and I could do it in some programming language, but I'm curious what it would look like on the command line :). To be more specific, here is the algorithm:
1. get the file names without extensions from ls *.h
2. get the file names without extensions from ls *.i
3. compare them
4. print all names from 1 that do not appear in 2
Good luck!
diff \
<(ls dir.with.h | sed 's/\.h$//') \
<(ls dir.with.i | sed 's/\.i$//') \
| grep '^<' \
| cut -c3-
diff <(ls dir.with.h | sed 's/\.h$//') <(ls dir.with.i | sed 's/\.i$//') runs ls on the two directories, cuts off the extensions, and compares the two lists. Then grep '^<' keeps only the lines that appear in the first listing alone, and cut -c3- removes the "< " prefix that diff inserts.
ls ./dir_h/*.h | sed -r -n 's:.*dir_h/([^.]*).h$:dir_i/\1.i:p' | xargs ls 2>&1 | \
grep "No such file or directory" | awk '{print $4}' | sed -n -r 's:dir_i/([^:]*).*:dir_h/\1:p'
ls -1 dir1/*.hh dir2/*.ii | awk -F"/" '{print $NF}' |awk -F"." '{a[$1]++;b[$0]=1}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
Explanation:
ls -1 dir1/*.hh dir2/*.ii
The above lists all *.hh and *.ii files in the two directories.
awk -F"/" '{print $NF}'
The above prints just the file name, without the directory part of the path.
awk -F"." '{a[$1]++;b[$0]=1}END{for(i in a)if(a[i]==1 && b[i".hh"]) print i}'
The above builds two associative arrays. a is keyed by the file name without its extension and counts how often each name occurs. b marks every full file name with 1. The =1 matters: an element that is only referenced holds an empty value, which tests as false in the END block.
If both the .hh and the .ii file exist, the count in a is 2; if only one of them exists, it is 1. So we need the names whose count is 1 and whose file is a header (.hh).
That last check is done with array b in the END block.
Assuming bash is your shell:
for file in dir_with_h/*.h; do
    [ -e "$file" ] || continue    # no .h files: the glob stays literal
    name=${file%.h}               # trim trailing ".h" file extension
    name=${name#dir_with_h/}      # trim leading folder name
    if [ ! -e "dir_with_i/${name}.i" ]; then
        echo "${name}"
    fi
done
Undoubtedly this can be ported to virtually all other shells. I find this less cryptic than some other approaches (although that is surely my problem), but it is a little wordy. As such, saving it in a shell script might help you recall it.
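Another option is comm, which compares two sorted lists. The -23 option suppresses lines unique to the second list and lines common to both. That leaves only the names that exist as .h but not as .i. This sketch assumes bash (for process substitution) and file names without newlines. The directory arguments are placeholders, and missing_i_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list base names that have a .h file in the first directory
# but no matching .i file in the second. comm needs both inputs sorted the same
# way, which sort and comm guarantee when they run under the same locale.
missing_i_files() (
    hdir=$1 idir=$2
    shopt -s nullglob
    # Print base names (no directory, no extension), one per line, sorted.
    list() {
        for f in "$1"/*"$2"; do
            f=${f##*/}
            printf '%s\n' "${f%"$2"}"
        done | sort
    }
    comm -23 <(list "$hdir" .h) <(list "$idir" .i)
)
```

For example: missing_i_files dir_with_h dir_with_i.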

Looping through a text file containing domains using bash script

I have written a script that reads the href attributes of a webpage, fetches the links on that page and writes them to a text file. Now I have a text file containing links such as these for example:
http://news.bbc.co.uk/2/hi/health/default.stm
http://news.bbc.co.uk/weather/
http://news.bbc.co.uk/weather/forecast/8?area=London
http://newsvote.bbc.co.uk/1/shared/fds/hi/business/market_data/overview/default.stm
http://purl.org/dc/terms/
http://static.bbci.co.uk/bbcdotcom/0.3.131/style/3pt_ads.css
http://static.bbci.co.uk/frameworks/barlesque/2.8.7/desktop/3.5/style/main.css
http://static.bbci.co.uk/frameworks/pulsesurvey/0.7.0/style/pulse.css
http://static.bbci.co.uk/wwhomepage-3.5/1.0.48/css/bundles/ie6.css
http://static.bbci.co.uk/wwhomepage-3.5/1.0.48/css/bundles/ie7.css
http://static.bbci.co.uk/wwhomepage-3.5/1.0.48/css/bundles/ie8.css
http://static.bbci.co.uk/wwhomepage-3.5/1.0.48/css/bundles/main.css
http://static.bbci.co.uk/wwhomepage-3.5/1.0.48/img/iphone.png
http://www.bbcamerica.com/
http://www.bbc.com/future
http://www.bbc.com/future/
http://www.bbc.com/future/story/20120719-how-to-land-on-mars
http://www.bbc.com/future/story/20120719-road-opens-for-connected-cars
http://www.bbc.com/future/story/20120724-in-search-of-aliens
http://www.bbc.com/news/
I would like to be able to filter them such that I return something like:
http://www.bbc.com : 6
http://static.bbci.co.uk: 15
The values on the side indicate the number of times the domain appears in the file. How can I achieve this in bash, considering I would have a loop going through the file? I am a newbie to bash shell scripting.
$ cut -d/ -f-3 urls.txt | sort | uniq -c
3 http://news.bbc.co.uk
1 http://newsvote.bbc.co.uk
1 http://purl.org
8 http://static.bbci.co.uk
1 http://www.bbcamerica.com
6 http://www.bbc.com
Just like this
egrep -o '^http://[^/]+' domain.txt | sort | uniq -c
Output of this on your example data:
3 http://news.bbc.co.uk
1 http://newsvote.bbc.co.uk
1 http://purl.org
8 http://static.bbci.co.uk
6 http://www.bbc.com
1 http://www.bbcamerica.com
This solution works even if a line is just a bare URL without a trailing slash, so
http://www.bbc.com/news
http://www.bbc.com/
http://www.bbc.com
will all be in the same group.
If you want to allow https, then you can write:
egrep -o '^https?://[^/]+' domain.txt | sort | uniq -c
If other protocols are possible, such as ftp, mailto, etc. you can even be very loose and write:
egrep -o '^[^:]+://[^/]+' domain.txt | sort | uniq -c
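This sketch produces exactly the "domain : count" format from the question and uses the kind of loop the asker mentioned. It relies on a bash 4+ associative array. Like the cut answer, it treats scheme://host as the domain, and it skips lines without "://". The name count_domains is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count URLs per scheme://host and print "host : count",
# most frequent first. Reads one URL per line from the given file.
count_domains() {
    local -A count=()
    local url rest domain
    while IFS= read -r url || [ -n "$url" ]; do   # also handles a missing final newline
        [[ $url == *://* ]] || continue          # skip blank or malformed lines
        rest=${url#*://}                         # drop the scheme: news.bbc.co.uk/2/hi/...
        domain=${url%%://*}://${rest%%/*}        # scheme://host
        count[$domain]=$(( ${count[$domain]:-0} + 1 ))
    done < "$1"
    for domain in "${!count[@]}"; do
        printf '%s : %d\n' "$domain" "${count[$domain]}"
    done | sort -k3,3nr -k1,1                    # by count, then by name
}
```

For example, count_domains domain.txt on the sample data should list http://static.bbci.co.uk : 8 first. The pipelines above are faster on large files, but the loop shows where per-line logic (filtering, normalising hosts) would go.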
