How do I determine the newest created file in a directory on a linux machine? - linux

I am trying to figure out a way to determine the most recent file created in a directory. I cannot use a module and I am on a Linux OS.

A simple Google search gave me a good answer:
my @list = `ls -t`;
chomp(my $newest = $list[0]);
Or completely in Perl:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
$sorted_files[0] is the most recently modified file. If it isn't the actual file of interest, you can iterate through @sorted_files until you find the interesting file(s).
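If you only need the single newest entry, here is an alternative sketch (untested, module-free, assuming $DIR is set as above) using Perl's -M file-test operator, which reports a file's age in days relative to script start, so the smallest value is the most recently modified file:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
# -M gives each file's age in days (relative to script start), so the
# smallest value is the most recently modified file
my ($newest) = sort { -M "$DIR/$a" <=> -M "$DIR/$b" }
               grep { -f "$DIR/$_" }      # skip subdirectories, . and ..
               readdir($DH);
closedir($DH);
print "Newest: $newest\n" if defined $newest;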

No, you cannot get files on the basis of their birth (creation) date, as there is no standard Linux command to get the birth date of a file, but of course you can get the access, modification and change times. To get the access, modification and change time information of any file, use this:
stat file-name
Also, to get the most recently changed/modified file, use this:
ls -ltr | tail -1
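If you are doing this from Perl rather than the shell, a minimal, module-free sketch printing the same three timestamps via the built-in stat (field indexes 8, 9 and 10 are atime, mtime and ctime; 'file-name' is a placeholder path):
my $file = 'file-name';    # placeholder path
my ($atime, $mtime, $ctime) = (stat $file)[8, 9, 10];
print "access:       ", scalar localtime($atime), "\n";
print "modification: ", scalar localtime($mtime), "\n";
print "inode change: ", scalar localtime($ctime), "\n";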

Try:
cd DIR
ls -l -rt | tail -1

The naughty IO::All method: ;-)
use IO::All;
use v5.20;
# sort files by their modification time and store in an array:
my @files = sort { $b->mtime <=> $a->mtime } io(".")->all_files;
# get the first/newest file from the sort:
say "$files[0] ". ~~localtime($files[0]->mtime);

Related

Find patterns and rename multiple files

I have a list of machine names and hostnames
ex)
# cat /tmp/machine_list.txt
[one]apple machine #1 myserver1
[two]apple machine #2 myserver2
[three]apple machine #3 myserver3
and there is a directory for each server; each directory contains a tar file and a file with the host name written in it.
# ls /tmp/sos1/*
sosreport1.tar.gz
hostname_map.txt
# cat /tmp/sos1/hostname_map.txt
myserver1
# ls /tmp/sos2/*
sosreport2.tar.gz
hostname_map.txt
# cat /tmp/sos2/hostname_map.txt
myserver2
# ls /tmp/sos3/*
sosreport3.tar.gz
hostname_map.txt
# cat /tmp/sos3/hostname_map.txt
myserver3
Is it possible to rename the sosreport*.tar.gz by referencing the hostname_map in each directory relative to the /tmp/machine_list.txt file? (like below)
# ls /tmp/sos1/*
[one]apple_machine_#1_myserver1_sosreport1.tar.gz
# ls /tmp/sos2/*
[two]apple_machine_#2_myserver2_sosreport2.tar.gz
# ls /tmp/sos3/*
[three]apple_machine_#3_myserver3_sosreport3.tar.gz
A single change is easy enough, but what about making all of these changes at once?
Something like this?
srvname () {
    awk -v srv="$(cat "$1")" -F '\t' '$2==srv { print $1; exit }' machine_list.txt
}

for dir in /tmp/sos*/; do
    server=$(srvname "$dir"/hostname_map.txt)
    mv "$dir"/sosreport*.tar.gz "$dir/$server.tar.gz"
done
Demo: https://ideone.com/TS5VyQ
The function assumes your mapping file is tab-delimited. If you want underscores instead of spaces in the server names, change the mapping file.
This should be portable to POSIX sh; the cat could be replaced with a Bash redirection, but I feel that it's not worth giving up portability for such a small change.
If this were my project, I'd probably make the function into a self-contained reusable script (with the input file replaced with a here document in the script itself) since there will probably be more situations where you need to perform the same mapping.
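For reference, a rough Perl equivalent of that self-contained idea, with the mapping embedded in a __DATA__ section (Perl's stand-in for a here document). It mirrors the shell solution above and assumes the same tab-delimited columns; treat it as a sketch, not a drop-in replacement:
#!/usr/bin/perl
use strict;
use warnings;

# Build a "hostname => label" lookup from the mapping embedded below
# (the columns in the __DATA__ section are tab-separated, as in machine_list.txt)
my %label_for;
while (my $line = <DATA>) {
    chomp $line;
    my ($label, $host) = split /\t/, $line;
    $label_for{$host} = $label;
}

for my $dir (glob '/tmp/sos*') {
    next unless -d $dir;
    open my $fh, '<', "$dir/hostname_map.txt" or next;
    chomp(my $host = <$fh>);
    close $fh;
    my $prefix = $label_for{$host};
    next unless defined $prefix;
    for my $tar (glob "$dir/sosreport*.tar.gz") {
        rename $tar, "$dir/$prefix.tar.gz" or warn "rename failed for $tar: $!";
    }
}

__DATA__
[one]apple machine #1	myserver1
[two]apple machine #2	myserver2
[three]apple machine #3	myserver3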

How not to process files which were already processed before? (they will not be zipped or renamed)

I have read-only access to a folder containing a lot of logs with names starting with SystemOut*:
SystemOut_15.03.12_1215124.log
SystemOut_15.03.12_23624.log
SystemOut_15.03.02_845645.log
SystemOut_15.03.14_745665.log
SystemOut_15.03.16_456457.log
SystemOut_15.03.07_474574.log
The logs are not zipped or renamed.
What I need to implement is to parse them in such a way that logs already processed are not processed again. Also, a mandatory condition is not to process the log with the latest modification date and time.
I think I need to create a separate file, in a location where I have write access, listing the log names my script has already processed?
I'd be grateful if you could provide some suggestions and how to implement them.
Thanks
I agree keeping track of the logs you have already processed in a separate file is a good idea. It's not clear from your question how you will identify the current log, so I leave that in your court.
Try something like this:
mysavedfiles=/some/path/file.txt
curfile=$(ls -tr | tail -n 1)
for fn in logfiles/*.log
do
    if ! grep -q "$fn" "$mysavedfiles" && [ "$fn" != "$curfile" ]
    then
        # ... process it ...
        echo "$fn" >> "$mysavedfiles"
    fi
done
You could also exclude the last file by changing to a while read loop fed by some processing:
ls -tr logfiles/*.log | head -n -1 | while read fn
do
    ....
done
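If you would rather do the bookkeeping in Perl, here is a rough sketch of the same idea (track processed names in a file you can write to, and skip the most recently modified log); the two paths are placeholders you would need to fill in:
#!/usr/bin/perl
use strict;
use warnings;

my $log_dir    = '/path/to/logs';            # read-only log directory (placeholder)
my $state_file = '/writable/path/done.txt';  # list of already-processed names (placeholder)

# Load the names we have already processed
my %done;
if (open my $fh, '<', $state_file) {
    chomp(my @names = <$fh>);
    @done{@names} = ();
    close $fh;
}

# Collect the logs with their modification times, newest last
my @logs = sort { (stat $a)[9] <=> (stat $b)[9] } glob "$log_dir/SystemOut*.log";
pop @logs;    # drop the most recently modified log, per the requirement

open my $out, '>>', $state_file or die "Cannot append to $state_file: $!";
for my $log (@logs) {
    next if exists $done{$log};
    # ... process $log here ...
    print {$out} "$log\n";
}
close $out;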

perl while loop

In this code I parse a file (containing the output of ls -lrt) for each log file's modification date. Then I move all the log files into a new folder with their modification dates added to the filenames, and then make a tar of all those files.
The problem is in the while loop: because it reads the data for all the files, the loop runs 15 times. I understand there is some issue in the code, but I can't figure it out.
Inside the while loop I split the ls -lrt records to find the log file's modification date. $file holds the output of the ls command, which I store in the text file /scripts/yagya.txt in order to get the modification dates. But the while loop executes 15 times, since there are 15 log files in the folder that match the pattern.
#!/usr/bin/perl
use File::Find;
use strict;

my @field;
my $filenew;
my $date;
my $file = `ls -lrt /scripts/*log*`;
my $directory = "/scripts/*.log";
my $current = localtime;
my $current_time = $current;
$current_time = s/\s+//g;
my $freetime = $current_time;
my $daytime = substr($current_time,0,8);
my $seconddir = "/$freetime/";
system ("mkdir $seconddir");

open (MYFILE,">/scripts/yagya.txt");
print MYFILE "$file";
close (MYFILE);

my $data = "/scripts/yagya.txt";
my $datas = "/scripts/";
my %options = (
    wanted  => \&wanted,
    untaint => 1
);
find (\%options, $datas);

sub wanted {
    if (/[._]log\d*$/){
        my $files;
        my @fields;
        my $fields;
        chomp;
        $files = $_;
        open (MYFILE,$data);
        while (<MYFILE>){
            chop;
            s/#.*//;
            next unless /\S/;
            @fields = (split)[5,6,7];
            $fields = join('',@fields), "\n";
        }
        close (MYFILE);
        system ("mv $files $seconddir$fields$files");
    }
}

system ("tar cvf /$daytime/$daytime.tar.gz /$daytime/*log*");
system ("rm $seconddir*log*");
system ("rm $data");
Your code is very difficult to read. It looks like you have written the program as a single big chunk before you started to test it. That way of working is common but very wrong. You should start by implementing a small part of the program and testing that before you add a little more functionality, test again, and so on. That way you won't be overwhelmed with fixing many problems at once in a large untested program.
It would also help you a lot if you added use warnings to your use strict at the top of the program. It helps to catch simple errors that you may overlook.
Also, are you aware that File::Find will call your wanted callback subroutine every time it encounters a file? It doesn't pass all the files at once.
The problem seems to be that you are reading all the way through the yagya.txt file when you should be stopping when you find the record that matches the current file that File::Find has found. What you need to do is to check whether the current record in the ls output ends with the name of the current file. If you write the loop like this
while (<MYFILE>) {
    if (/\Q$files\E$/) {
        my @fields = (split)[5,6,7];
        $fields = join('',@fields);
        last;
    }
}
then $fields will end up with the modification date of the current file, which is what you want.
But this would be a thousand times easier if you used Perl to read the file modification date for you.
Instead of writing an ls listing to a file and reading it back, you should do something like this
use File::stat;
my $mtime = localtime(stat($files)->mtime);
which will give you a string like Wed Jun 13 11:25:23 2012. The date from my ls output includes only the month name, day of month, and time of day, like Jun 8 12:37. That isn't very specific and you perhaps should at least include a year, but to generate the same string from this $mtime you can write
my $fields = join '', (split ' ', $mtime)[1,2,3];
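Putting that together, a stripped-down wanted subroutine along those lines could look like the sketch below. It keeps your month/day/time format, assumes $seconddir is still set up as in your program, and uses the list form of system so the filename is not passed through the shell:
use File::stat;

sub wanted {
    return unless /[._]log\d*$/ && -f;
    # Build e.g. "Jun812:37" from the file's own modification time,
    # instead of parsing an ls listing written to a temporary file
    my $mtime  = localtime(stat($_)->mtime);
    my $fields = join '', (split ' ', $mtime)[1, 2, 3];
    system("mv", $_, "$seconddir$fields$_");
}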
There is a lot more I could say about your program, but I hope this gets it going for you for now.
Another couple of things I have noticed:
The line $current_time = s/\s+//g should be $current_time =~ s/\s+//g to remove all spaces from the current time string
A value like Sun Jun 3 11:50:54 2012 will be reduced to SunJun311:50:542012, and $daytime will then take the value SunJun31, which is incorrect
I don't usually recommend using bash instead of Perl, but sometimes it is much shorter.
This problem has two parts:
renaming the files into another directory, adding a timestamp to the filenames
archiving them by minute, hour, day, etc.
For part 1:
find ./scripts -name \*[_.]log\* -type f -printf "%p\0./logs/%TY%Tm%Td-%TH%Tk%TM-%f\0" | xargs -0 -L 2 mv
The above will find all plain files with [_.]log in their names and rename them into the ./logs directory with a timestamp prefix, e.g.
./scripts/aaa.log12 gets renamed to ./logs/20120403-102233-aaa.log12
For part 2, archiving:
ls logs | sed 's/\(........-....\).*/\1/' | sort -u | while read groupby
do
    ( cd logs && echo tar cvzf ../$groupby.tgz $groupby* )
done
This will create tar archives grouped by timestamp prefix (assuming ./logs contains only files with valid, timestamped filenames).
Of course, the above sed pattern is not pretty, but it clearly shows how the seconds are stripped from the timestamp, so it creates archives per minute. If you want another grouping, you can use:
sed 's/\(........-..\).*/\1/' - by hours
sed 's/\(........\).*/\1/' - by days
Other notes:
the -printf action of find is supported only in the GNU version of find, which is the common one on Linux
it is usually not good practice to work directly under '/', like /scripts, so my example uses ./
if the same filename with the same timestamp exists more than once in your ./scripts subtree, mv will overwrite the earlier one, e.g. ./scripts/a/a.log and ./scripts/x/a.log with the same timestamp will both be renamed to ./logs/TIMESTAMP-a.log

"Copy failed: File too large" error in perl

OK, so I have 6.5 million images in a folder and I need to get them moved ASAP. I will be moving them into their own folder structure, but first I must get them moved off this server.
I tried rsync and cp and all sorts of other tools, but they always end up erroring out. So I wrote a Perl script to pull the information in a more direct way. Using opendir and having it count all the files works perfectly; it can count them all in about 10 seconds. Now I try to step my script up one more notch and have it actually move the files, and I get the error "File too large". This must be some sort of false error, as the files themselves are all fairly small.
#!/usr/bin/perl
#############################################
# CopyFilesLite
# Russell Perkins
# 7/12/2010
#
# Tool is used to copy millions of files
# while using as little memory as possible.
#############################################
use strict;
use warnings;
use File::Copy;

#dir1, dir2 passed from command line
my $dir1 = shift;
my $dir2 = shift;

#Varibles to keep count of things
my $count = 0;
my $cnt_FileExsists = 0;
my $cnt_FileCopied = 0;

#simple error checking and validation
die "Usage: $0 directory1 directory2\n" unless defined $dir2;
die "Not a directory: $dir1\n" unless -d $dir1;
die "Not a directory: $dir2\n" unless -d $dir2;

opendir DIR, "$dir1" or die "Could not open $dir1: $!\n";
while (my $file = readdir DIR){
    if (-e $dir2 .'/' . $file){
        #print $file . " exsists in " . $dir2 . "\n"; #debuging
        $cnt_FileExsists++;
    }else{
        copy($dir1 . '/' . $file,$dir2 . '/' . $file) or die "Copy failed: $!";
        $cnt_FileCopied++;
        #print $file . " does not exsists in " . $dir2 . "\n"; #debuging
    }
    $count++;
}
closedir DIR;

#ToDo: Clean up output.
print "Total files: $count\nFiles not copied: $cnt_FileExsists\nFiles Copied: $cnt_FileCopied\n\n";
So have any of you ran into this before? What would cause this and how can it be fixed?
In your error handling code, could you please change
or die "Copy failed: $!";
to
or die "Copy failed: '$dir1/$file' to '$dir2/$file': $!";
Then it should tell you where the error happens.
Then check two things:
1) Does it fail every time on the same file?
2) Is that file somehow special? Weird name? Unusual size? Not a regular file? Not a file at all (as the other answer theorized)?
I am not sure if this is related to your problem, but readdir will return a list of all directory contents, including subdirectories, if present, and the current (.) and parent directories (..) on many operating systems. You may be attempting to copy directories as well as files.
The following will not attempt to copy any directories:
while (my $file = readdir DIR){
    next if -d "$dir1/$file";
It seems this was an issue with either my NFS mount or the server it was mounted to. I hooked up a USB drive to it and the files are copying with extreme speed... if you count USB 2 as extreme.
6.5 million images in one folder is very extreme and puts a load on the machine just to read a directory, whether it's in shell or Perl. That's one big folder structure.
I know you're chasing a solution in Perl now, but when dealing with that many files from the shell you'll want to take advantage of the xargs command. It can help a lot by grouping the files into manageable chunks. http://en.wikipedia.org/wiki/Xargs
Maybe the file system of the partition you are sending the data to does not support very large amounts of data.

How to compare two tarball's content

I want to tell whether two tarball files contain identical files, in terms of file name and file content, not including meta-data like date, user, group.
However, there are some restrictions:
First, I have no control over whether the metadata is included when making the tar files; in fact, the tar files always contain metadata, so directly diffing the two tar files doesn't work.
Second, some tar files are so large that I cannot afford to untar them into a temp directory and diff the contained files one by one. (I know that if I could untar file1.tar into file1/, I could compare them by invoking 'tar -dvf file2.tar' in file1/. But usually I cannot afford to untar even one of them.)
Any idea how I can compare the two tar files? It would be better if it could be accomplished with shell scripts. Alternatively, is there any way to get each sub-file's checksum without actually untarring the tarball?
Thanks,
Try also pkgdiff to visualize differences between packages (it detects added/removed/renamed files and changed content, and exits with code zero if unchanged):
pkgdiff PKG-0.tgz PKG-1.tgz
Are you controlling the creation of these tar files?
If so, the best trick would be to create an MD5 checksum file and store it within the archive itself. Then, when you want to compare two archives, you just extract these checksum files and compare them.
If you can afford to extract just one tar file, you can use the --diff option of tar to look for differences with the contents of the other tar file.
One more crude trick if you are fine with just a comparison of the filenames and their sizes.
Remember, this does not guarantee that the other files are same!
Execute tar tvf to list the contents of each archive and store the outputs in two different files. Then slice out everything besides the filename and size columns, preferably sorting the two files too. Then just diff the two lists.
Just remember that this last scheme does not really do any checksumming.
Sample tar and output (all files are zero size in this example).
$ tar tvfj pack1.tar.bz2
drwxr-xr-x user/group 0 2009-06-23 10:29:51 dir1/
-rw-r--r-- user/group 0 2009-06-23 10:29:50 dir1/file1
-rw-r--r-- user/group 0 2009-06-23 10:29:51 dir1/file2
drwxr-xr-x user/group 0 2009-06-23 10:29:59 dir2/
-rw-r--r-- user/group 0 2009-06-23 10:29:57 dir2/file1
-rw-r--r-- user/group 0 2009-06-23 10:29:59 dir2/file3
drwxr-xr-x user/group 0 2009-06-23 10:29:45 dir3/
Command to generate sorted name/size list
$ tar tvfj pack1.tar.bz2 | awk '{printf "%10s %s\n",$3,$6}' | sort -k 2
0 dir1/
0 dir1/file1
0 dir1/file2
0 dir2/
0 dir2/file1
0 dir2/file3
0 dir3/
You can take two such sorted lists and diff them.
You can also use the date and time columns if that works for you.
tarsum is almost what you need. Take its output, run it through sort to get the ordering identical for each archive, and then compare the two with diff. That should get you a basic implementation going, and it would be easy enough to pull those steps into the main program by modifying the Python code to do the whole job.
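If you would rather stay in Perl and avoid writing anything to disk, here is a rough sketch along the same lines using the core Archive::Tar iterator and Digest::MD5 (each member is still read into memory one at a time, so enormous individual members could be a problem):
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Tar;
use Digest::MD5 qw(md5_hex);

my $tarball = shift or die "Usage: $0 archive.tar[.gz]\n";

# Walk the archive member by member without unpacking it to disk
my $next = Archive::Tar->iter($tarball);
my @lines;
while (my $entry = $next->()) {
    next unless $entry->is_file;
    push @lines, md5_hex($entry->get_content) . "  " . $entry->name . "\n";
}

# Sorted "checksum  name" list, ready to be compared against another archive's list
print sort @lines;
Run it once per tarball, redirect each listing to a file, and diff the two files, as with the tarsum approach.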
Here is my variant; it checks the Unix permissions too.
It works only if the filenames are shorter than 200 characters.
diff <(tar -tvf 1.tar | awk '{printf "%10s %200s %10s\n",$3,$6,$1}'|sort -k2) <(tar -tvf 2.tar|awk '{printf "%10s %200s %10s\n",$3,$6,$1}'|sort -k2)
EDIT: See the comment by @StéphaneGourichon
I realise that this is a late reply, but I came across the thread whilst attempting to achieve the same thing. The solution that I've implemented outputs the tar to stdout, and pipes it to whichever hash you choose:
tar -xOzf archive.tar.gz | sort | sha1sum
Note that the order of the arguments is important; in particular O, which tells tar to extract to stdout.
Is tardiff what you're looking for? It's "a simple perl script" that "compares the contents of two tarballs and reports on any differences found between them."
There is also diffoscope, which is more generic and allows comparing things recursively (including various formats).
pip install diffoscope
I propose gtarsum, which I have written in Go, meaning it is a standalone executable (no Python or other runtime environment needed).
go get github.com/VonC/gtarsum
It will read a tar file, and:
sort the list of files alphabetically,
compute a SHA256 for each file content,
concatenate those hashes into one giant string
compute the SHA256 of that string
The result is a "global hash" for a tar file, based on the list of files and their content.
It can compare multiple tar files, and return 0 if they are identical, 1 if they are not.
Just throwing this out there since none of the above solutions worked for what I needed.
This function gets the md5 hash of the md5 hashes of all the file-paths matching a given path. If the hashes are the same, the file hierarchy and file lists are the same.
I know it's not as performant as others, but it provides the certainty I needed.
PATH_TO_CHECK="some/path"
for template in $(find build/ -name '*.tar'); do
tar -xvf $template --to-command=md5sum |
grep $PATH_TO_CHECK -A 1 |
grep -v $PATH_TO_CHECK |
awk '{print $1}' |
md5sum |
awk "{print \"$template\",\$1}"
done
*note: An invalid path simply returns nothing.
If you don't need to extract the archives and don't need the detailed differences, try diff's -q option:
diff -q 1.tar 2.tar
The quiet output will be "1.tar 2.tar differ", or nothing if there are no differences.
There is a tool called archdiff. It is basically a Perl script that can look inside archives.
Takes two archives, or an archive and a directory and shows a summary of the
differences between them.
I had a similar question and I resolved it with Python; here is the code.
PS: although this code compares the contents of two zipballs, the approach is similar for tarballs. Hope it helps.
import zipfile
import os, md5
import hashlib
import shutil

def decompressZip(zipName, dirName):
    try:
        zipFile = zipfile.ZipFile(zipName, "r")
        fileNames = zipFile.namelist()
        for file in fileNames:
            zipFile.extract(file, dirName)
        zipFile.close()
        return fileNames
    except Exception,e:
        raise Exception,e

def md5sum(filename):
    f = open(filename, "rb")
    md5obj = hashlib.md5()
    md5obj.update(f.read())
    hash = md5obj.hexdigest()
    f.close()
    return str(hash).upper()

if __name__ == "__main__":
    oldFileList = decompressZip("./old.zip", "./oldDir")
    newFileList = decompressZip("./new.zip", "./newDir")

    oldDict = dict()
    newDict = dict()
    for oldFile in oldFileList:
        tmpOldFile = "./oldDir/" + oldFile
        if not os.path.isdir(tmpOldFile):
            oldFileMD5 = md5sum(tmpOldFile)
            oldDict[oldFile] = oldFileMD5

    for newFile in newFileList:
        tmpNewFile = "./newDir/" + newFile
        if not os.path.isdir(tmpNewFile):
            newFileMD5 = md5sum(tmpNewFile)
            newDict[newFile] = newFileMD5

    additionList = list()
    modifyList = list()
    for key in newDict:
        if not oldDict.has_key(key):
            additionList.append(key)
        else:
            newMD5 = newDict[key]
            oldMD5 = oldDict[key]
            if not newMD5 == oldMD5:
                modifyList.append(key)

    print "new file list: %s" % additionList
    print "modified file list: %s" % modifyList

    shutil.rmtree("./oldDir")
    shutil.rmtree("./newDir")
