How to count the number of files an extension was just added to? - linux

So I just added the extension .txt to all files in a directory. I want to go beyond that and now count the number of files whose extension I just changed. Any help is appreciated!

To know the number of .txt files, you can simply do ls | grep '\.txt$' | wc -l (escaping the dot so grep does not treat it as "any character").
To know the number of files you changed, you need to either count them while you change the extension (see the sketch after the code below), or count the number before, the number after, and subtract them.
This last method can be done like this:
oldnum="$(ls | grep '\.txt$' | wc -l)"
# Do the rename here
newnum="$(ls | grep '\.txt$' | wc -l)"
result=$((newnum - oldnum)) # $result now holds the number of renamed files
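If you prefer the first option (counting while you rename), here is a minimal sketch, assuming you are appending .txt to every regular file in the directory:
count=0
for f in *; do
    [ -f "$f" ] || continue              # skip anything that is not a regular file
    case "$f" in *.txt) continue ;; esac # skip files that already have the extension
    mv -- "$f" "$f.txt" && count=$((count + 1))
done
echo "Renamed $count files"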

Alternatively, if you remember roughly when you modified the files, you can use find.
For example, if you modified the files about an hour ago, just run this in the working directory:
find . -maxdepth 1 -type f -name '*.txt' -cmin -65
This will print all *.txt files whose status was changed (-cmin checks ctime) less than 65 minutes ago.
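To get a count instead of a listing, just append wc -l (assuming none of the names contain newlines):
find . -maxdepth 1 -type f -name '*.txt' -cmin -65 | wc -l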

Related

How to list files in a directory, sorted by size, but without listing folder sizes?

I'm writing a bash script that should output the first 10 heaviest files in a directory and all subfolders (the directory is passed to the script as an argument).
And for this I use the following command:
sudo du -haS ../../src/ | sort -hr
but its output contains folder sizes, and I only need files. Help!
Why use du at all? You could do a
ls -S1AF
This will list all entries in the current directory, sorted descending by size. It will also include the names of the subdirectories, but they will be at the end (because the size of a directory entry is always zero), and you can recognize them because they have a slash at the end.
To exclude those directories and pick the first 10 lines, you can do a
ls -S1AF | head -n 10 | grep -v '/$'
UPDATE:
If your directory contains not only subdirectories, but also files of length zero, some of those empty files might not be shown in the output, as pointed out in the comment by F.Hauri. If this is an issue for your application, I suggest exchanging the order and doing a
ls -S1AF | grep -v '/$' | head -n 10
instead.
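As a side note, -F also appends markers such as * and @ to executables and symbolic links; if those get in the way, -p (which only marks directories with a slash) should work the same way here:
ls -S1Ap | grep -v '/$' | head -n 10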
Would you please try the following:
dir="../../src/"
sudo find "$dir" -type f -printf "%s\t%p\n" | sort -nr | head -n 10 | cut -f2-
find "$dir" -type f searches $dir for files recursively.
The -printf "%s\t%p\n" option tells find to print the filesize
and the filename delimited by a tab character.
The final cut -f2- in the pipeline prints the 2nd and the following
columns, dropping the filesize column only.
It will work with the filenames which contain special characters such as a whitespace except for
a newline character.
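If you also need to cope with newlines in file names, here is a sketch of a null-delimited variant, assuming a reasonably recent GNU findutils/coreutils (sort, head and cut all need to support -z):
sudo find "$dir" -type f -printf "%s\t%p\0" | sort -z -nr | head -z -n 10 | cut -z -f2- | tr '\0' '\n'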

Shell script to find and count total number of characters in all the files

I'm struggling to make a script to find every file in your home directory that is less than 3 days old and then get a count of the total number of characters in all of these files.
Any suggestions?
Thanks.
The command below should work from the current directory.
find . -type f -ctime -3 | xargs wc -c
The one below should work for the home directory.
find ~ -type f -ctime -3 | xargs wc -c
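Note that wc -c prints a per-file count plus a subtotal for each xargs batch. If you just want one grand total, and file names may contain spaces, a null-delimited sketch:
find ~ -type f -ctime -3 -print0 | xargs -0 cat | wc -c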

Bash script that writes subdirectories that have more than 5 files

While I was trying to practice my Linux skills, I came across this question but could not solve it.
So it's basically saying "Write a bash script that takes a name of
directory as a command argument and printf the name of subdirectories
that has more than 5 files in it."
I thought we would use the find command but I still could not figure it out. My code is:
find directory -type d -mindepth5
but it's not working.
You can use find twice:
First you can use find and wc to count the number of files in a given directory:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | wc -l)
This just asks find to output an x on a line for each file in the directory directory, proceeding non-recursively, then wc -l counts the number of lines, so, really, nb is the number of files in directory.
If you want to know whether a directory contains more than 5 files, it's a good idea to stop find as soon as 6 files are found:
nb=$(find directory -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l)
Here nb has an upper threshold of 6.
Now if for each subdirectory of a directory directory you want to output the number of files (threshold at 6), you can do this:
find directory -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); echo "$nb"' {} \;
where the $0 that appears is the 0-th argument of the inner bash, namely the {} that find replaces with each subdirectory of directory.
Finally, you only want to display the subdirectory name if the number of files is more than 5:
find . -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print
The final test ((nb>5)) succeeds or fails depending on whether nb is greater than 5, and in case of success, find will -print the subdirectory name.
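Note that the command above also tests the starting directory itself; if you only want to test its subdirectories, the same command with -mindepth 1 added:
find directory -mindepth 1 -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print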
This should do the trick:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5) print($2)}'
Using -mindepth is useless here since it only lists directories at depth 5 or more. You said you need subdirectories with more than 5 files in them.
find directory -type f prints all files in subdirectories
sed 's/\(.*\)\/.*/\1/g' removes the file names, leaving only the list of subdirectories
sort sorts that list so we can use uniq
uniq -c merges duplicate lines and writes how many times each occurred
sort -n sorts it by number of occurrences (so you end up with a list: (how many times, subdirectory))
awk '{if($1>5) print($2)}' prints only the lines whose first column is > 5 (and it only prints the second column)
So you end up with a list of subdirectories with more than 5 files inside.
EDIT:
A fix for paths with spaces was proposed:
Instead of awk '{if($1>5) print($2)}' there should be awk '{if($1>5){ $1=""; print(substr($0,2)) }}', which sets the first field to "" and then prints the whole line without the leading space (which was the delimiter). So put together we get this:
find directory/ -type f | sed 's/\(.*\)\/.*/\1/g' | sort | uniq -c | sort -n | awk '{if($1>5){ $1=""; print(substr($0,2)) }}'
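Either approach can be dropped into the one-argument script the question asks for; for example, using the find-based command from earlier (the script name is just an example):
#!/bin/bash
# subdirs_over5.sh <directory> - print subdirectories containing more than 5 files
dir="${1:?usage: $0 directory}"
find "$dir" -type d -exec bash -c 'nb=$(find "$0" -maxdepth 1 -type f -printf "x\n" | head -6 | wc -l); ((nb>5))' {} \; -print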

Linux/Bash - Create Loop for "find" to output to a file

Apologies if this has been answered; I'm somewhat new to Linux, but I didn't see anything here that was on target.
Anyway, I'm running this command:
find 2013-12-28 -name '*.gz' | xargs zcat | gzip > /fast/me/2013-12-28.csv.gz
The issue is that I need to run this command for about 250 distinct dates, so doing this one at a time is quite tedious.
What I want to do is have a script that will increment the date by 1 day after the "find" and in the file name. I really don't even know what this would look like, what commands to use, etc.
Background:
The find command is being used in a folder that's full of folders, each for 1 day of data. Each day's folder contains 24 subfolders, with each subfolder containing about 100 gzipped CSV files. So the find command is necessary 2 levels up from the folder because it will scan through each folder to combine all the data. The end result is that all the zipped up files are combined into 1 large zipped up file.
If anyone can help it would be hugely appreciated, otherwise I have about 250 more commands to execute, which obviously will suck.
What about something like this?
prev_date="2013-12-28"
for i in {0..250}; do
    next_date=$(date -d "$prev_date +1 day" +%Y-%m-%d)
    prev_date=$next_date
    find "$next_date" -name '*.gz' | xargs zcat | gzip > "/fast/me/$next_date.csv.gz"
done
It should iterate through about 250 dates, the last few being:
2014-08-27
2014-08-28
2014-08-29
2014-08-30
2014-08-31
2014-09-01
2014-09-02
2014-09-03
2014-09-04
2014-09-05
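If you would rather derive the loop from a start and end date instead of hard-coding the count, a rough sketch (the end date is just an example; GNU date assumed):
start="2013-12-29"
end="2014-09-05" # example end date
d="$start"
while [ "$(date -d "$d" +%s)" -le "$(date -d "$end" +%s)" ]; do
    find "$d" -name '*.gz' | xargs zcat | gzip > "/fast/me/$d.csv.gz"
    d=$(date -d "$d +1 day" +%Y-%m-%d)
done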
jmunsch's solution works very well if the dates are sequential. Otherwise you could do this:
(edited to replace dash characters with colons)
for folderName in $(find . -mindepth 1 -maxdepth 1 -type d)
do
    date=$(basename "$folderName")
    dateWithColons=$(echo "$date" | sed "s#-#:#g") # this will replace - with :
    find "$folderName" -name '*.gz' | xargs zcat | gzip > "/fast/me/$dateWithColons.csv.gz"
done

Finding and Listing Duplicate Words in a Plain Text file

I have a rather large file that I am trying to make sense of.
I generated a list of my entire directory structure that contains a lot of files using the du -ah command.
The result basically lists all the folders under a specific folder, and the files inside each folder, in plain text format.
eg:
4.0G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_003.R3D
3.1G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_004.R3D
15G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC
Is there any command I can run or utility I can use that will help me identify whether there is more than one record of the same filename (usually the last 16 characters in each line + extension) and, if such duplicate entries exist, write out the entire path (full line) to a different text file, so I can find and move the duplicate files off my NAS using a script or something.
Please let me know, as this is incredibly stressful to do when the plain-text file itself is 5.2 MB :)
Split each line on /, get the last item (cut cannot do it, so reverse each line and take the first field), then sort and run uniq with -d, which shows duplicates.
rev FILE | cut -f1 -d/ | rev | sort | uniq -d
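To get the full lines (with paths) for those duplicated names, a rough follow-up sketch, assuming FILE is the du -ah listing and that a duplicated name is unlikely to appear as a substring of unrelated paths (the two output file names are just placeholders):
rev FILE | cut -f1 -d/ | rev | sort | uniq -d > dup_names.txt
grep -F -f dup_names.txt FILE > duplicate_paths.txt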
I'm not entirely sure what you want to achieve here, but I have the feeling that you are doing it in a difficult way anyway :) Your text file seems to contain spaces in file names, which makes it hard to parse.
I take it that you want to find all files whose name is duplicate. I would start with something like:
find DIR -type f -printf '%f\n' | sort | uniq -d
That means
DIR - look for files in this directory
-type f - print only files (not directories or other special files)
-printf '%f\n' - do not use the default find output format, print only the file name of each file
sort - uniq only detects adjacent duplicates, so sort the names first
uniq -d - print only lines which occur multiple times
You may want to list only some files, not all of them. You can limit which files are taken into account by more rules to find. If you care only about *.R3D and *.RDC files you can use
find . \( -name '*.RDC' -o -name '*.R3D' \) -type f -printf '%f\n' | ...
If I wrongly guessed what you need, sorry :)
I think you are looking for fslint: http://www.pixelbeat.org/fslint/
It can find duplicate files, broken links, and stuff like that.
The following will scan the current subdirectory (using find) and print the full path to duplicate files. You can adapt it to take a different action, e.g. delete/move the duplicate files.
while IFS="|" read -r FNAME LINE; do
    # FNAME contains the filename (without dir), LINE contains the full path
    if [ "$PREV" != "$FNAME" ]; then
        PREV="$FNAME" # new filename found. store
    else
        echo "Duplicate : $LINE" # duplicate filename. Do something with it
    fi
done < <(find . -type f -printf "%f|%p\n" | sort -s)
To try it out, simply copy paste that into a bash shell or save it as a script.
Note that:
due to the sort, the list of files will have to be loaded into memory before the loop begins, so performance will be affected by the number of files returned
the order in which the files appear after the sort will affect which files are treated as duplicates, since the first occurrence is assumed to be the original. The -s option ensures a stable sort, which means the order will be dictated by find.
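For example, to move the duplicates aside instead of just printing them, the echo line could be replaced with something like this (the destination directory is just a placeholder):
mkdir -p /path/to/duplicates # placeholder destination directory
mv -- "$LINE" "/path/to/duplicates/" # move the duplicate out of the tree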
A more straightforward but less robust approach would be something along the lines of:
find . -type f -printf "%20f %p\n" | sort | uniq -D -w20 | cut -c 22-
That will print all files that have duplicate entries, assuming that no filename is longer than 20 characters. The output differs from the solution above in that all entries with the same name are listed (not N-1 entries as above).
You'll need to change the numbers in the find, uniq and cut commands to match the actual case. A number too small may result in false positives.
find . -type f -printf "%20f %p\n" | sort | uniq -D -w20 | cut -c 22-
find . -type f -printf "%20f %p\n" - find all files in the current dir and subdirs and print out the filename (padded to 20 characters) followed by the full path
sort - sort the output
uniq -D -w20 - print out all entries that have duplicates, but only look at the first 20 characters
cut -c 22- - discard the first 21 characters of each line
