Find the latest log file from multiple servers - python-3.x

For our daily monitoring we need to access 16 servers of a particular application and find the latest log file on one of those servers (it usually generates on the first 8).
The problem is that this code is giving me the latest file from each server instead of providing the latest log file from the entire group of servers.
Also, since this is an hourly activity, each file is archived once it has been processed, so many of the servers have no log files at any given time. Because of this, the code below raises ValueError: max() arg is an empty sequence and stops at server 3 if server 4 does not have any log files.
I tried adding a default=0 argument to the max() call for latest_file, but then I get the error message TypeError: expected str, bytes or os.PathLike object, not int
Can you please help me out here? I am using Python 3.8 and PyCharm.
This is what I have so far :
import glob
import os
import re
paths = [r'\\Server1\Logs\*.log',
         r'\\Server2\Logs\*.log',
         .....
         r'\\Server16\Logs\*.log']
for path in paths:
    list_of_files = glob.glob(path)
    latest_file = max(list_of_files, key=os.path.getctime)
    f = open(os.path.join(latest_file), "r")
    print(latest_file)

Create the list first and then find the max.
import glob
import os
paths = [r'\\Server1\Logs\*.log',
         r'\\Server2\Logs\*.log',
         .....
         r'\\Server16\Logs\*.log']
# Collect the matches from every server before looking for the newest one
list_of_files = []
for path in paths:
    list_of_files.extend(glob.glob(path))
if list_of_files:
    latest_file = max(list_of_files, key=os.path.getctime)
    f = open(latest_file, "r")
    print(latest_file)
else:
    print("No log files found!")
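On the default=0 attempt from the question: max() simply returns the default when the sequence is empty, so the integer 0 ends up being passed on as if it were a file name, which is where the TypeError comes from. A minimal sketch that uses None as the default instead; the helper name latest_file and the choice of None are illustrative assumptions, not part of the answer above:

```python
import glob
import os


def latest_file(patterns):
    """Return the most recently created file matching any glob pattern, or None."""
    candidates = []
    for pattern in patterns:
        candidates.extend(glob.glob(pattern))
    # default=None avoids ValueError on an empty list; the caller must check for None.
    return max(candidates, key=os.path.getctime, default=None)
```

The caller would then test the result with "if result is not None:" before opening it. Note that os.path.getctime is the creation time on Windows but the last metadata change on Unix, so os.path.getmtime may be the better key if "latest" means "last written".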

Related

Python how to search files using regular expression [duplicate]

I recently started getting into Python and I am having a hard time searching through directories and matching files based on a regex that I have created.
Basically I want it to scan through all the directories inside another directory, find all the files that end with .zip, .rar or .r01, and then run various commands depending on which kind of file it is.
import os, re
rootdir = "/mnt/externa/Torrents/completed"
for subdir, dirs, files in os.walk(rootdir):
    if re.search('(w?.zip)|(w?.rar)|(w?.r01)', files):
        print "match: " . files
import os
import re
rootdir = "/mnt/externa/Torrents/completed"
regex = re.compile('(.*zip$)|(.*rar$)|(.*r01$)')
for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)
CODE BELOW ANSWERS QUESTION IN FOLLOWING COMMENT
That worked really well, is there a way to do this if match is found on regex group 1 and do this if match is found on regex group 2 etc ? – nillenilsson
import os
import re
regex = re.compile('(.*zip$)|(.*rar$)|(.*r01$)')
for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only the alternative that actually matched has a non-empty group
            if res.group(1):
                print("ZIP", file)
            if res.group(2):
                print("RAR", file)
            if res.group(3):
                print("R01", file)
It might be possible to do this in a nicer way, but this works.
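One possibly nicer way is to look up a handler by file extension instead of testing regex groups one by one. A rough sketch; the handler functions and the HANDLERS table are hypothetical stand-ins for whatever commands you want to run per file type:

```python
import os


def handle_zip(path):
    return "ZIP " + path


def handle_rar(path):
    return "RAR " + path


def handle_r01(path):
    return "R01 " + path


# Hypothetical mapping from extension to the function that should process the file.
HANDLERS = {".zip": handle_zip, ".rar": handle_rar, ".r01": handle_r01}


def dispatch(rootdir):
    """Walk rootdir and call the matching handler for every recognised file."""
    results = []
    for root, dirs, files in os.walk(rootdir):
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            handler = HANDLERS.get(extension)
            if handler is not None:
                results.append(handler(os.path.join(root, name)))
    return results
```

Comparing the real extension also avoids a quirk of the pattern '(.*zip$)', which matches any name that merely ends in the letters "zip" (for example "myzip").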
Given that you are a beginner, I would recommend using glob in place of a quickly written file-walking-regex matcher.
Snippets of functions using glob and a file-walking-regex matcher
The below snippet contains two file-regex searching functions (one using glob and the other using a custom file-walking-regex matcher). The snippet also contains a "stopwatch" function to time the two functions.
import glob
import os
import re
import time
from datetime import timedelta
def stopwatch(method):
    """Decorator that prints how long each call to method takes."""
    def timed(*args, **kw):
        ts = time.perf_counter()
        result = method(*args, **kw)
        te = time.perf_counter()
        duration = timedelta(seconds=te - ts)
        print(f"{method.__name__}: {duration}")
        return result
    return timed
@stopwatch
def get_filepaths_with_oswalk(root_path: str, file_regex: str):
    files_paths = []
    pattern = re.compile(file_regex)
    for root, directories, files in os.walk(root_path):
        for file in files:
            if pattern.match(file):
                files_paths.append(os.path.join(root, file))
    return files_paths
@stopwatch
def get_filepaths_with_glob(root_path: str, file_regex: str):
    # Despite its name, file_regex is a glob pattern here, not a regular expression
    return glob.glob(os.path.join(root_path, file_regex))
Comparing runtimes of the above functions
Using the two functions above to find the 5076 files matching filename_*.csv (a glob pattern; the os.walk version takes the regex filename_(.*).csv instead) in a directory called root_path (containing 66,948 files):
>>> glob_files = get_filepaths_with_glob(root_path, 'filename_*.csv')
get_filepaths_with_glob: 0:00:00.176400
>>> oswalk_files = get_filepaths_with_oswalk(root_path,'filename_(.*).csv')
get_filepaths_with_oswalk: 0:03:29.385379
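In this test the glob method is much faster and the code for it is shorter. Keep in mind that glob.glob as used here only looks in root_path itself, while os.walk descends into every subdirectory, so the two functions do not do the same amount of work.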
For your case
For your case, you can probably use something like the following to get your *.zip,*.rar and *.r01 files:
files = []
for ext in ['*.zip', '*.rar', '*.r01']:
    files += get_filepaths_with_glob(root_path, ext)
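If the archives may sit in subdirectories, as in the question, a recursive glob pattern is one option. A sketch, assuming Python 3.5 or later (for recursive=True); the function name find_by_extensions is made up for illustration:

```python
import glob
import os


def find_by_extensions(root_path, extensions):
    """Recursively collect files under root_path ending in one of the extensions."""
    matches = []
    for ext in extensions:
        # With recursive=True, '**' matches root_path itself and any depth below it.
        # glob.escape protects special characters that may occur in root_path.
        pattern = os.path.join(glob.escape(root_path), "**", "*." + ext)
        matches.extend(glob.glob(pattern, recursive=True))
    return matches
```

It would be called as find_by_extensions(root_path, ['zip', 'rar', 'r01']). Expect it to be slower than the non-recursive version on large trees, since it has to visit every directory.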
Here's an alternative using glob.
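from pathlib import Path
rootdir = "/mnt/externa/Torrents/completed"
for extension in 'zip rar r01'.split():
    # Path.glob only looks in rootdir itself; use rglob to descend into subdirectories
    for path in Path(rootdir).glob('*.' + extension):
        # path is a Path object, so it cannot be concatenated to a str with +
        print("match:", path)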
I would do it this way:
import re
from pathlib import Path
def glob_re(path, regex="", glob_mask="**/*", inverse=False):
    p = Path(path)
    if inverse:
        res = [str(f) for f in p.glob(glob_mask) if not re.search(regex, str(f))]
    else:
        res = [str(f) for f in p.glob(glob_mask) if re.search(regex, str(f))]
    return res
NOTE: by default it will recursively scan all subdirectories. If you want to scan only the current directory, you should explicitly specify glob_mask="*".

How do I get the latest file of the same name from a folder?

I have different files, some of which may have the same name, and I need to select the latest file of each.
I have this code:
import glob
import os
import pandas as pd
path = r'C:\Work\files\TestFolders\WebIQ'
files_path = os.path.join(path, '*')
files = []
# Get the latest Billing Code file
for i in os.listdir(path):
    if 'Billing Code' in i:
        # files.append(i)
        files = sorted(glob.iglob(i), key=os.path.getctime, reverse=True)[0]
I'm struggling to get it right inside the if statement. I have 3 Billing Code files and need to get the latest one, but my if statement returns all three.
The following code modifications will provide your solution.
# Get the latest Billing Code file
for i in glob.iglob(files_path):
    if 'Billing Code' in i:
        files.append(i)
# Sort once, after the loop, so that all matching files are compared
files = sorted(files, key=os.path.getctime, reverse=True)[0]
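If you need the latest file for several name patterns at once ("the latest file of each"), one way is to keep a running best match per keyword. A sketch under assumptions: the function name, the keywords argument and the key parameter are all illustrative, and a file counts as a match when the keyword appears in its base name:

```python
import glob
import os


def latest_per_keyword(folder, keywords, key=os.path.getctime):
    """Map each keyword to the newest file in folder whose name contains it."""
    latest = {}
    for path in glob.iglob(os.path.join(glob.escape(folder), "*")):
        name = os.path.basename(path)
        for keyword in keywords:
            if keyword not in name:
                continue
            best = latest.get(keyword)
            # Keep this file if it is the first match or newer than the current best.
            if best is None or key(path) > key(best):
                latest[keyword] = path
    return latest
```

Keywords with no matching file are simply absent from the returned dict, so there is no IndexError to guard against. Passing key=os.path.getmtime compares modification times instead of creation times.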

How to copy from zip file to a folder without unzipping it?

How can I make this code work?
There is a zip file with folders and .png files in it. Folder ".\icons_by_year" is empty. I need to get every file one by one without unzipping it and copy to the root of the selected folder (so no extra folders made).
class ArrangerOutZip(Arranger):
    def __init__(self):
        self.base_source_folder = '\\icons.zip'
        self.base_output_folder = ".\\icons_by_year"
    def proceed(self):
        self.create_and_copy()
    def create_and_copy(self):
        reg_pattern = re.compile('.+\.\w{1,4}$')
        f = open(self.base_source_folder, 'rb')
        zfile = zipfile.ZipFile(f)
        for cont in zfile.namelist():
            if reg_pattern.match(cont):
                with zfile.open(cont) as file:
                    shutil.copyfileobj(file, self.base_output_folder)
        zfile.close()
        f.close()
arranger = ArrangerOutZip()
arranger.proceed()
shutil.copyfileobj uses file objects for source and destination files. To open the destination you need to construct a file path for it. pathlib is a part of the standard python library and is a nice way to handle file paths. And ZipFile.extract does some of the work of creating intermediate output directories for you (plus sets file metadata) and can be used instead of copyfileobj.
One risk of unzipping files is that they can contain absolute or relative paths outside of the target directory you intend (e.g., "../../badvirus.exe"). extract is a bit too lax about that - putting those files in the root of the target directory - so I wrote a little something to reject the whole zip if you are being messed with.
With a few tweaks to make this a testable program,
from pathlib import Path
import re
import zipfile
#import shutil
#class ArrangerOutZip(Arranger):
class ArrangerOutZip:
    def __init__(self, base_source_folder, base_output_folder):
        self.base_source_folder = Path(base_source_folder).resolve(strict=True)
        self.base_output_folder = Path(base_output_folder).resolve()
    def proceed(self):
        self.create_and_copy()
    def create_and_copy(self):
        """Unzip files matching pattern to base_output_folder, raising
        ValueError if any resulting paths are outside of that folder.
        Output folder created if it does not exist."""
        reg_pattern = re.compile(r'.+\.\w{1,4}$')
        with open(self.base_source_folder, 'rb') as f:
            with zipfile.ZipFile(f) as zfile:
                wanted_files = [cont for cont in zfile.namelist()
                                if reg_pattern.match(cont)]
                rebased_files = self._rebase_paths(wanted_files,
                                                   self.base_output_folder)
                for cont, rebased in zip(wanted_files, rebased_files):
                    print(cont, rebased, rebased.parent)
                    # option 1: use shutil
                    #rebased.parent.mkdir(parents=True, exist_ok=True)
                    #with zfile.open(cont) as file, open(rebased, 'wb') as outfile:
                    #    shutil.copyfileobj(file, outfile)
                    # option 2: zipfile does the work for you
                    zfile.extract(cont, self.base_output_folder)
    @staticmethod
    def _rebase_paths(pathlist, target_dir):
        """Rebase relative file paths to target directory, raising
        ValueError if any resulting paths are not within target_dir"""
        target = Path(target_dir).resolve()
        newpaths = []
        for path in pathlist:
            newpath = target.joinpath(path).resolve()
            newpath.relative_to(target) # raises ValueError if not subpath
            newpaths.append(newpath)
        return newpaths
#arranger = ArrangerOutZip('\\icons.zip', '.\\icons_by_year')
import sys
try:
    arranger = ArrangerOutZip(sys.argv[1], sys.argv[2])
    arranger.proceed()
except IndexError:
    print("usage: test.py zipfile targetdir")
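Since the question asks for the files to land directly in the root of the output folder (no extra folders), here is a sketch of the copyfileobj route that drops each member's directory part. The function name extract_flat and the suffixes parameter are assumptions for illustration, not part of the program above:

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_flat(zip_path, output_dir, suffixes=(".png",)):
    """Copy matching archive members straight into output_dir, dropping folders."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Member names use '/' separators; keep only the final component.
            name = PurePosixPath(info.filename).name
            if Path(name).suffix.lower() not in suffixes:
                continue
            target = out / name
            # Stream the member into the destination file without extracting the archive.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Because only the final path component is used, members with the same file name in different folders overwrite each other, so add a renaming scheme if that can happen in your archive.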
I'd take a look at the zipfile library's ZipFile.getinfo() and also zipfile.Path, since its constructor also accepts paths, which may help if you intend to create anything.
Specifically, look at Path objects: zipfile.Path constructs an object that holds a path, and it appears to be modeled on pathlib. Assuming you don't need to create zip files, you can ignore zipfile.Path.
However, that's not exactly what I wanted to point out. Rather, consider the following:
ZipFile.getinfo()
There is someone who I think is dealing with this exact situation here:
https://www.programcreek.com/python/example/104991/zipfile.getinfo
This person seems to be getting a path using getinfo(). It's also clear that not every zip file has that info.
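For reference, ZipFile.getinfo(name) returns a ZipInfo object for one archive member and raises KeyError if the archive has no member with that name. A small sketch of how that might be wrapped; the helper name member_info and its return shape are assumptions for illustration:

```python
import zipfile


def member_info(zip_path, member_name):
    """Return (size, is_dir) for a member, or None if the archive lacks it."""
    with zipfile.ZipFile(zip_path) as zf:
        try:
            info = zf.getinfo(member_name)
        except KeyError:
            # getinfo() raises KeyError when the name is not in the archive.
            return None
        return info.file_size, info.is_dir()
```

The ZipInfo object also carries the member's stored path in info.filename, which is what you would use to build a destination path.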

How do I update an AWS Gamelift script with boto3 in python?

I am running into a problem trying to update an AWS Gamelift script with a python command that zips a directory and uploads it with all its contents as a newer version to AWS Gamelift.
from zipfile import ZipFile
import os
from os.path import basename
import boto3
import sys, getopt
def main(argv):
    versInput = sys.argv[1]
    #initializes client for updating script in aws gamelift
    client = boto3.client('gamelift')
    #Where is the directory relative to the script directory. In this case, one folder dir lower and the contents of the RealtimeServer dir
    dirName = '../RealtimeServer'
    # create a ZipFile object
    with ZipFile('RealtimeServer.zip', 'w') as zipObj:
        # Iterate over all the files in directory
        for folderName, subfolders, filenames in os.walk(dirName):
            rootlen = len(dirName) + 1
            for filename in filenames:
                #create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # Add file to zip
                zipObj.write(filePath, filePath[rootlen:])
    response = client.update_script(
        ScriptId=SCRIPT_ID_GOES_HERE,
        Version=sys.argv[1],
        ZipFile=b'--zip-file \"fileb://RealtimeServer.zip\"'
    )
if __name__ == "__main__":
    main(sys.argv[1])
I plan on using it by giving it a new version number every time I make changes with:
python updateScript.py "0.1.1"
This is meant to help speed up development. However, I am doing something wrong with the ZipFile parameter of client.update_script()
For context, I can use the AWS CLI directly from the commandline and update a script without a problem by using:
aws gamelift update-script --script-id SCRIPT_STRING_ID_HERE --script-version "0.4.5" --zip-file fileb://RealtimeServer.zip
However, I am not sure what is going on because it fails to unzip the file when I try it:
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the UpdateScript operation: Failed to unzip the zipped file.
UPDATE:
After reading more documentation about the ZipFile parameter:
https://docs.aws.amazon.com/gamelift/latest/apireference/API_UpdateScript.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/gamelift.html#GameLift.Client.update_script
I tried sending a base64 encoded version of the zip file. However, that didn't work. I put the following code before the client_update part of the script and used b64EncodedZip as the ZipFile parameter.
with open("RealtimeServer.zip", "rb") as f:
    bytes = f.read()
    b64EncodedZip = base64.b64encode(bytes)
I was able to get it to work with some help from a maintainer of boto3 over at https://github.com/boto/boto3/issues/2646
(Thanks @swetashre)
Here is the code. It only works for zip files up to 5 MB; anything larger has to be uploaded through an S3 bucket.
from zipfile import ZipFile
import os
import sys
import boto3
def main(version):
    # initializes client for updating script in aws gamelift
    client = boto3.client('gamelift')
    # Where is the directory relative to the script directory. In this case, one folder dir lower and the contents of the RealtimeServer dir
    dirName = '../RealtimeServer'
    # create a ZipFile object
    with ZipFile('RealtimeServer.zip', 'w') as zipObj:
        # Iterate over all the files in directory
        for folderName, subfolders, filenames in os.walk(dirName):
            rootlen = len(dirName) + 1
            for filename in filenames:
                # create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # Add file to zip
                zipObj.write(filePath, filePath[rootlen:])
    # Read the finished archive back and pass its raw bytes, not a fileb:// string
    with open('RealtimeServer.zip', 'rb') as f:
        contents = f.read()
    response = client.update_script(
        ScriptId="SCRIPT_ID_GOES_HERE",
        Version=version,
        ZipFile=contents
    )
if __name__ == "__main__":
    main(sys.argv[1])
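Since update_script only needs the bytes of the archive, the zip does not have to touch the disk at all. A sketch that builds it in memory and lets you check the size against the 5 MB limit mentioned above before uploading; the function name and the MAX_DIRECT_UPLOAD constant are assumptions for illustration:

```python
import io
import os
import zipfile

# 5 MB limit for direct uploads, as stated in the answer above.
MAX_DIRECT_UPLOAD = 5 * 1024 * 1024


def zip_directory_to_bytes(dir_name):
    """Zip the contents of dir_name in memory and return the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zip_obj:
        for folder, _subfolders, filenames in os.walk(dir_name):
            for filename in filenames:
                file_path = os.path.join(folder, filename)
                # Store paths relative to dir_name so the archive has no leading folders.
                zip_obj.write(file_path, os.path.relpath(file_path, dir_name))
    return buffer.getvalue()
```

The result would be passed as ZipFile=contents exactly as in the code above, after checking len(contents) <= MAX_DIRECT_UPLOAD.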
I got the script working but I did it by avoiding the use of boto3. I don't like it but it works.
os.system("aws gamelift update-script --script-id \"SCRIPT_ID_GOES_HERE\" --script-version " + sys.argv[1] + " --zip-file fileb://RealtimeServer.zip")
If anyone knows how to get boto3 to work for updating an AWS Gamelift script then please let me know.
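If you stay with the CLI route, passing the arguments as a list to subprocess.run avoids building a shell string out of sys.argv. A sketch that uses only the flags from the command above; both function names are made up for illustration:

```python
import subprocess


def build_update_command(script_id, version, zip_path):
    """Build the argument list for 'aws gamelift update-script'."""
    return [
        "aws", "gamelift", "update-script",
        "--script-id", script_id,
        "--script-version", version,
        "--zip-file", "fileb://" + zip_path,
    ]


def update_script_via_cli(script_id, version, zip_path):
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(build_update_command(script_id, version, zip_path), check=True)
```

Calling update_script_via_cli("SCRIPT_ID_GOES_HERE", sys.argv[1], "RealtimeServer.zip") would then replace the os.system line, with no quoting to get wrong.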

How to save files to a directory and append those files to a list in Python?

Scenario:
I want to check whether a directory contains a certain '.png' image file. If so, this image file, along with all the other files with a .png extension, gets stored in a different directory. The solution I am looking for should work on all OS platforms (Windows, Unix, etc.) and on a remote server (FTP, etc.).
I have tried the following code below:
import os, sys
import shutil
import pathlib
import glob
def search():
    image_file = 'picture.png'
    try:
        arr = [] #List will be used to append all the files in a particular directory.
        directory = pathlib.Path("collection") #checks if the collection directory exists.
        files = []
        #need to convert the PosixPath (directory) to a string.
        [files.extend(glob.glob(str(directory) + "/**/*.png", recursive = True))]
        res = [img for img in files if(img in image_file)] #checks if the image is within the list of files i.e 'picture.png' == 'collection\\picture.png'
        if str(bool(res)): #If True...proceed
            print("Image is available in image upload storage directory")
            for file in files:
                transfer_file = str(file)
                shutil.copy(file, 'PNG_files/') #send all the files to a different directory i.e 'PNG_files' by using the shutil module.
                arr.append(transfer_file)
            return arr
        else:
            print("image not found in directory")
    except OSError as e:
        return e.errno
result = search() #result should return the 'arr' list. This list should contain png images only.
However, during execution, the For loop is not getting executed. Which means:
The image files are not stored in the 'PNG_files' directory.
The images are not getting appended in the 'arr' list.
The code above the For loop worked as expected. Can anyone explain to me what went wrong?
There are several issues:
In this line
res = [img for img in files if(img in image_file)] #checks if the image is within the list of files i.e 'picture.png' == 'collection\\picture.png'
you should check the other way around (as written in the comment): image_file in img, e.g. picture.png in collection/picture.png.
str(directory) + "/**/*.png" is not OS independent. If you need this to work on Windows, too, you should use os.path.join(str(directory), '**', '*.png') instead!
This check is incorrect: if str(bool(res)):. It's actually always true, because bool(res) is either True or False, so str(bool(res)) is either "True" or "False", and both of those are truthy, as neither is an empty string. Correctly: if res:.
And finally, you're missing the creation of the PNG_files directory. You need to either manually create it before running the script, or call os.mkdir().
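Putting the points above together, a corrected version might look roughly like this. It is a sketch for local directories only (it does not cover the FTP case), and the function name and parameters are illustrative rather than taken from the question:

```python
import glob
import os
import shutil


def collect_pngs(source_dir, target_dir, required_name="picture.png"):
    """Copy every .png under source_dir to target_dir if required_name is present."""
    # os.path.join keeps the pattern portable across operating systems.
    pattern = os.path.join(glob.escape(source_dir), "**", "*.png")
    files = glob.glob(pattern, recursive=True)
    # Compare base names so 'picture.png' matches 'collection/picture.png'.
    if not any(os.path.basename(f) == required_name for f in files):
        return []
    # Create the destination first; shutil.copy does not create directories.
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for file in files:
        shutil.copy(file, target_dir)
        copied.append(file)
    return copied
```

An empty list means the required image was not found, so the caller can test the result with a plain "if result:".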

Resources