python shutil copy the files from source dir to remote dir based on condition - python-3.x

I'm looking to copy files from a source directory to a remote directory using shutil; however, I need a few checks, as follows:
Don't copy zero-byte files to the remote.
If a file already exists on the remote, don't copy it again unless the source file's contents have changed or been updated.
Only look in the directory for the current month: traverse into it if it exists (for example, the January directory when the current month is January).
Importing the modules:
import os
import glob
import shutil
import datetime
Variable to pick the current month:
Info_month = datetime.datetime.now().strftime("%B")
Code snippet:
for filename in glob.glob("/data/Info_month/*/*.txt"):
    if not os.path.exists("/remote/data/" + os.path.basename(filename)):
        shutil.copy(filename, "/remote/data/")
The code above doesn't substitute the value of Info_month into the path (it looks for a directory literally named Info_month); however, hardcoding the directory name works.
I'm having challenges due to my lack of Python knowledge.
How can I include the variable Info_month into the source dir path?
How can I add a check so that zero-byte files are not copied?
os.path.getsize(fullpathhere) > 0
My rudimentary silly logic:
for filename in glob.glob("/data/Info_month/*/*.txt"):
    if os.path.getsize(fullpathhere) > 0 :
        if not os.path.exists("/remote/data/" + os.path.basename(filename)):
            shutil.copy(filename, "/remote/data/")
    else:
        pass

Here is a fix of your existing script. This doesn't yet attempt to implement the "source newer than target" logic since you didn't specifically ask about that, and this is arguably too broad already.
for filename in glob.glob("/data/{0}/*/*.txt".format(Info_month)):
    # The result of the above glob _is_ a full path
    if os.path.getsize(filename) > 0:
        # Minor tweak: use os.path.join for portability (it takes separate arguments, not a list)
        if not os.path.exists(os.path.join("/remote/data/", os.path.basename(filename))):
            shutil.copy(filename, "/remote/data/")
    # no need for an explicit "else" if it's a no-op
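For the "copy only if the source has changed" requirement that the answer above leaves out, here is a minimal sketch. It assumes that comparing file size and modification time is good enough to detect a change (filecmp.cmp(src, dest, shallow=False) would compare contents byte by byte, at the cost of reading both files). The function name sync_month and its parameters are hypothetical:

```python
import os
import glob
import shutil
import datetime


def sync_month(src_root="/data", dest_dir="/remote/data", month=None):
    """Copy non-empty *.txt files from src_root/<Month>/*/ into dest_dir.

    A file is copied when it is missing at the destination, or when the
    source differs in size or is newer (by modification time) than the copy.
    """
    if month is None:
        # Full month name in the current locale, e.g. "January"
        month = datetime.datetime.now().strftime("%B")
    copied = []
    pattern = os.path.join(src_root, month, "*", "*.txt")
    for src in glob.glob(pattern):
        src_stat = os.stat(src)
        if src_stat.st_size == 0:
            continue  # skip zero-byte files
        dest = os.path.join(dest_dir, os.path.basename(src))
        if os.path.exists(dest):
            dest_stat = os.stat(dest)
            unchanged = (dest_stat.st_size == src_stat.st_size
                         and dest_stat.st_mtime >= src_stat.st_mtime)
            if unchanged:
                continue  # already copied and not modified since
        # copy2 also copies the modification time, so the next run compares like with like
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Using shutil.copy2 rather than shutil.copy matters here: it copies the modification time along with the data, so on the next run an unchanged file compares equal and is skipped.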

Related

Extract text from first page of word document and use it as a folder name, then move the file inside that folder

I have hundreds of Word documents that need to be processed, but first I need to organize them by version in subfolders.
I get a drop of these Word documents in a single folder and need to automate the organization from now on before I go nuts.
I have a script that creates a folder with the same name as each file and moves the file into that folder; this part is done.
Now I need to go into each subfolder, get the document version from the first page of each document, create a subfolder named after the version number, and move the Word file into that subfolder.
The structure should be as follows (taking two folders as examples):
(Folder) Test
    (Subfolder) 12.0
        Test.docx
(Folder) Test1
    (Subfolder) 13.0
        Test1.docx
Luckily, I was able to figure out that doc.paragraphs[6].text always returns the version information on a single line, as follows:
>>> doc.paragraphs[6].text
'Version Number: 12.0'
I would appreciate it if someone could point me in the right direction.
This is the script I have so far:
#!/usr/bin/env python3
import glob, os, shutil, docx, sys
folder = sys.argv[1]
#print(folder)
for file_path in glob.glob(os.path.join(folder, '*.docx')):
    new_dir = file_path.rsplit('.', 1)[0]
    #print(new_dir)
    try:
        # new_dir already includes `folder` (glob returns full paths), so don't join it again
        os.mkdir(new_dir)
    except FileExistsError:
        # Handle the case where the target dir already exists (WindowsError only exists on Windows).
        pass
    shutil.move(file_path, os.path.join(new_dir, os.path.basename(file_path)))
Please see below the complete solution to your requirement.
Note: for an explanation of re.search, see https://www.geeksforgeeks.org/python-regex-re-search-vs-re-findall/
import docx, os, glob, re, shutil
from pathlib import Path
def create_dir(path):  # function to check if a given path exists and create it if not
    # Check whether the specified path exists or not
    is_exist = os.path.exists(path)
    # Create a new directory if the path does not exist
    if not is_exist:
        os.makedirs(path)
folder = r"C:\Users\rams\Documents\word_docs"  # my local folder
for file in glob.glob(os.path.join(folder, '*.docx')):
    # Test, Test1, Test2 in your structure
    main_folder = os.path.join(folder, Path(file).stem)
    file_name = os.path.basename(file)
    # Get the version line; per the question it is always paragraph 6
    doc = docx.Document(file).paragraphs[6].text
    # group(1) = the whole match, e.g. "Version Number: 12.0"
    version_no = re.search("(Version Number: (.*))", doc).group(1)
    # extract the number portion from version_no
    sub_folder = version_no.split(':')[1].strip()
    # path to actual sub_folder with version_no
    sub_folder = os.path.join(main_folder, sub_folder)
    # destination path
    dest_file_path = os.path.join(sub_folder, file_name)
    for i in [main_folder, sub_folder]:
        create_dir(i)  # function call
    # move the file to the corresponding version folder (overwrite if it exists)
    if os.path.exists(dest_file_path):
        os.remove(dest_file_path)
    shutil.move(file, sub_folder)
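One fragile spot in the code above: if a document lacks the version line, re.search() returns None and .group(1) raises AttributeError. A minimal sketch of a safer parser (parse_version is a hypothetical name) that returns None instead, so the caller can skip such files:

```python
import re


def parse_version(text):
    """Return the version from a line like 'Version Number: 12.0', or None if absent."""
    match = re.search(r"Version Number:\s*(\S+)", text)
    return match.group(1) if match else None


print(parse_version("Version Number: 12.0"))  # 12.0
print(parse_version("Some other paragraph"))  # None
```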
So you have a script that creates a folder named after each file and moves the file into that folder. This part is done. OK.
Now that you know how to get the document version from the first page of each document, you need to create a sub-folder named after that version number and move the Word file into it. This can be done with the same code as before, changing the glob pattern to os.path.join(folder, '*', '*.docx') so that it finds the documents inside their per-document folders, and replacing:
new_dir = file_path.rsplit('.', 1)[0]
with
document_dir = os.path.dirname(file_path)
document_name = os.path.basename(file_path)
# check if the document is already in the right directory:
assert os.path.basename(document_dir) == document_name.rsplit('.', 1)[0]
doc = docx.Document(file_path)  # python-docx; the script in the question already imports docx
doc_version_tuple = doc.paragraphs[6].text.rsplit(': ', 1)
# check if doc_version_tuple has the right content:
assert doc_version_tuple[0] == 'Version Number'
doc_version = doc_version_tuple[1]
new_dir = os.path.join(document_dir, doc_version)
Note that you can also do both steps in a single pass over the list of full document paths.
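Doing both steps in one pass can be sketched as follows. This is an illustration under assumptions: organize is a hypothetical name, and the version lookup is passed in as a function so the file moving can be tried without python-docx. With python-docx you would pass something like lambda p: docx.Document(p).paragraphs[6].text.rsplit(': ', 1)[1].

```python
import os
import glob
import shutil


def organize(folder, read_version):
    """Move every folder/<name>.docx to folder/<name>/<version>/<name>.docx.

    read_version(path) must return the version string, e.g. "12.0".
    """
    moved = []
    for file_path in glob.glob(os.path.join(folder, "*.docx")):
        name = os.path.splitext(os.path.basename(file_path))[0]
        version = read_version(file_path)
        target_dir = os.path.join(folder, name, version)
        # Creates both <name> and <name>/<version>; no error if they already exist
        os.makedirs(target_dir, exist_ok=True)
        dest = os.path.join(target_dir, os.path.basename(file_path))
        shutil.move(file_path, dest)
        moved.append(dest)
    return moved
```

Because it only looks at .docx files directly inside folder, running it a second time finds nothing to move, which sidesteps the rerun problem described below.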
Note also that if you run the script you posted in your question a second time without a check such as:
assert os.path.basename(document_dir) != document_name.rsplit('.', 1)[0]
which raises an error when the documents are already in folders named after them, it will destroy what you have already achieved, and you will need to write another script to reverse it.
This is why it is a good idea to keep a backup copy of all the documents, which you can use to re-create the directory structure in case something goes wrong. In general, always keep a backup when you work on files, especially when using a self-written script.

Using Python to copy contents of multiple files and paste in a main file

I'll start by mentioning that I have no knowledge of Python, but I read online that it could help with my situation.
I'd like to do a few things using (I believe?) a Python script.
I have a bunch of .yml files whose contents I want to transfer into one main .yml file (let's call it Main.yml). I'd also like to take the name of each individual .yml file and add it before its content in Main.yml as "##Name". If possible, the script should look at each file in a directory instead of requiring me to list every .yml file (the directory in question only contains .yml files). In case it matters: I want to append the contents of all files into Main.yml and keep the indentation (spacing). P.S. I'm on Windows.
Example of what I want:
File: Apes.yml
Contents:
Documentation:
  "Apes":
    year: 2009
    img: 'link'
After running the script, my Main.yml would look like:
##Apes.yml
Documentation:
  "Apes":
    year: 2009
    img: 'link'
I'm just starting out in Python too, so this was a great opportunity to see if my newly learned skills work!
I think you want to use the os.walk function to go through all of the files and folders in the directory.
This code should work. It assumes your files are stored in a folder called "Folder" inside the current working directory (normally the folder you run the script from).
# This ensures that you have the correct library available
import os
# Open the main file to write to (created, or overwritten if it already exists)
output_file = open('Main.yml', 'w')
# This starts the 'walk' through the directory
for folder, sub_folders, files in os.walk("Folder"):
    # For each file...
    for f in files:
        # build the current path from the folder and the file name
        current_path = os.path.join(folder, f)
        # write the "##Name" header line for this file
        output_file.write("##" + f + "\n")
        # Open the file to read the contents
        current_file = open(current_path, 'r')
        # read each line one at a time and then write them to your file
        for line in current_file:
            output_file.write(line)
        # close the file
        current_file.close()
# close your output file
output_file.close()
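If the .yml files sit directly in one folder, a minimal pathlib-based sketch (merge_yml is a hypothetical name) can skip Main.yml itself and make sure every file ends with a newline, so the next "##" header starts on its own line. Note that it rewrites Main.yml on each run rather than appending, so running it twice doesn't duplicate content; if you really want to append, you would open Main.yml in "a" mode instead.

```python
from pathlib import Path


def merge_yml(src_dir, output_name="Main.yml"):
    """Write every *.yml file in src_dir into src_dir/output_name, each preceded by ##<filename>."""
    src = Path(src_dir)
    out_path = src / output_name
    parts = []
    for path in sorted(src.glob("*.yml")):
        if path.name == output_name:
            continue  # don't copy Main.yml into itself
        text = path.read_text(encoding="utf-8")  # contents copied verbatim, indentation kept
        if not text.endswith("\n"):
            text += "\n"  # keep the next header on its own line
        parts.append(f"##{path.name}\n{text}")
    out_path.write_text("".join(parts), encoding="utf-8")
    return out_path
```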

How do I write a python script to read through files in a Linux directory and perform certain actions?

I need to write a Python script that reads through the files in a directory and retrieves the header record of each file (which contains a date). I need to compare the date in each file's header record with the current date and, if the difference is greater than 30 days, delete the file.
I managed to come up with the code below, but I'm not sure how to proceed since I am new to Python.
Example:
Sample file in the directory (/tmp/ah): abcdedfgh1234.123456
Header record : FILE-edidc40: 20200602-123539 46082 /tmp/ah/srcfile
I have the code below to list the files in the current directory. I need the Python equivalent of the following Unix command:
head -1 file|cut -c 15-22
Output: 20200602, which should be compared with the current date; if it is more than 30 days old, the file should be deleted (as with the rm command).
import os
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file
for file in files("."):  # prints the list of files
    print(file)
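A sketch of the whole job, assuming every file's first line has the YYYYMMDD date at columns 15-22 as in the sample header. header_date and purge_old_files are hypothetical names; files whose first line doesn't parse as a date are skipped rather than deleted, and today can be passed in for testing:

```python
import os
import datetime


def header_date(path):
    """Parse the date in columns 15-22 of the first line (like `head -1 | cut -c 15-22`)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        first_line = fh.readline()
    # cut -c is 1-based and inclusive, so columns 15-22 are the slice [14:22]
    return datetime.datetime.strptime(first_line[14:22], "%Y%m%d").date()


def purge_old_files(path, max_age_days=30, today=None):
    """Delete regular files in `path` whose header date is more than max_age_days old."""
    if today is None:
        today = datetime.date.today()
    deleted = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        try:
            file_date = header_date(full)
        except ValueError:
            continue  # first line holds no YYYYMMDD date; leave the file alone
        if (today - file_date).days > max_age_days:
            os.remove(full)  # the Python equivalent of rm
            deleted.append(name)
    return deleted
```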

The system cannot find the file specified - WinError 2

When I loop over a directory to delete ONLY the .txt files, I get a message saying The system cannot find the file specified: 'File.txt'.
I've made sure the .txt files I'm attempting to delete exist in the directory I'm looping over. I've also checked that my code can see the files by printing them as a list.
import os
fileLoc = 'c:\\temp\\files'
for files in os.listdir(fileLoc):
    if files.endswith('.txt'):
        os.unlink(files)
Upon initial execution, I expected all .txt files to be deleted and all non-.txt files to be kept. The actual result was the error message "FileNotFoundError: [WinError 2] The system cannot find the file specified: 'File.txt'".
Not sure what I'm doing wrong, any help would be appreciated.
It isn't found because os.listdir() returns bare file names, while the files you intended to unlink live in fileLoc. With your code, each name is resolved relative to the current working directory instead. If there were *.txt files in the cwd, the code would have unfortunate side effects.
Another way to look at it:
Essentially, by analogy, in the shell what you're trying to do is equivalent to this:
# first the setup
$ mkdir foo
$ touch foo/a.txt
# now your code is equivalent to:
$ rm *.txt
# won't work as intended because it removes the *.txt files in the
# current directory. In fact the bug is also that your code would unlink
# any *.txt files in the current working directory unintentionally.
# what you intended was:
$ rm foo/*.txt
The missing piece was the path to the file in question.
I'll add some editorial: The Old Bard taught us to "when in doubt, print variables". In other words, debug it. I don't see from the OP an attempt to do that. Just a thing to keep in mind.
Anyway, here is the revised code:
import os
fileLoc = 'c:\\temp\\files'
for file in os.listdir(fileLoc):
    if file.endswith('.txt'):
        os.unlink(os.path.join(fileLoc, file))
The fix: os.path.join() builds a path for you from parts. One part is the directory (path) where the file exists, aka: fileLoc. The other part is the filename, aka file.
os.path.join() makes a whole valid path from them using whatever OS directory separator is appropriate for your platform.
Also, you might want to glance through:
https://docs.python.org/2/library/os.path.html
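For comparison, pathlib avoids this class of bug, because Path.glob() yields paths that already include the directory. A minimal sketch (delete_txt_files is a hypothetical name); the matches are collected into a list first so the directory isn't modified while it is still being scanned:

```python
from pathlib import Path


def delete_txt_files(directory):
    """Delete *.txt files directly inside `directory`; return the deleted names, sorted."""
    deleted = []
    # glob yields full paths (directory + name), so no manual join is needed
    for path in list(Path(directory).glob("*.txt")):
        if path.is_file():
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```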

Moving all files out from a directory and all its subdirectories

I have a small program that moves all files out of a directory and then searches all subdirectories for other files which it also moves out.
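import shutil
import os
import ctypes
import sys
# A raw string can't end with a single backslash, so leave the trailing one off
copyfrom = r'D:\Downloads'
copyto = r'D:\Downloads'
for r, d, f in os.walk(copyfrom):
    # skip files that are already in the destination folder
    if os.path.normpath(r) == os.path.normpath(copyto):
        continue
    for file in f:
        print(os.path.join(r, file))
        shutil.move(os.path.join(r, file), os.path.join(copyto, file))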
It works right now, but it overwrites any file whose name matches an existing file. For example, if two subdirectories each contain a banana.mp3, one will overwrite the other. Instead, I would like a file with an existing name to be renamed.
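You can check whether the destination file exists with os.path.exists(destination) and, if it does, choose a different name. Be aware, though, that checking and then moving is not atomic: another process could create a file with the same name in between (a race condition). If that matters, you can reserve the name atomically by creating the destination with os.open(destination, os.O_CREAT | os.O_EXCL | os.O_WRONLY), which raises FileExistsError if the file already exists, then write the data into it and close the file.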
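A minimal sketch of the rename-on-clash approach, assuming a "name (1).ext" naming scheme and that no other process writes to the destination folder while it runs (the check-then-move race mentioned above is not handled). unique_destination and flatten are hypothetical names:

```python
import os
import shutil


def unique_destination(directory, filename):
    """Return a path in `directory` for `filename` that doesn't exist yet.

    banana.mp3 -> banana.mp3, banana (1).mp3, banana (2).mp3, ...
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem} ({counter}){ext}")
        counter += 1
    return candidate


def flatten(src_root, dest_dir):
    """Move every file under src_root into dest_dir, renaming on name clashes."""
    moved = []
    for root, _dirs, files in os.walk(src_root):
        if os.path.abspath(root) == os.path.abspath(dest_dir):
            continue  # files already at the top level stay where they are
        for name in files:
            dest = unique_destination(dest_dir, name)
            shutil.move(os.path.join(root, name), dest)
            moved.append(dest)
    return moved
```

Empty subdirectories are left behind; removing them would be a separate step.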
