How do I write a python script to read through files in a Linux directory and perform certain actions? - python-3.x

I need to write a python script to read through files in a directory, retrieve the header record (which contains date)? I need to compare the date in the header record of each file with current date and if the difference is greater than 30 days. I need to delete such files.
I managed to come up with below code but not sure how to proceed since I am new to Python.
Example:
Sample file in the directory (/tmp/ah): abcdedfgh1234.123456
Header record : FILE-edidc40: 20200602-123539 46082 /tmp/ah/srcfile
I have below code for the list of files in the current directory. I need to pass the python equivalent of below actions on unix files
head -1 file|cut -c 15-22
Output: 20200206 (to compare with current date and if older than 30) then delete file(using rm command).
import os
def files in os.listdir(path):
for files in os.listdir(path):
if os.path.isfile(os.path.join(path,file)):
yield file
for file in files(".") : # prints the list of files

Related

Using regex and cp: cannot stat

I am trying to copy files over from an old file structure where data are stored in folders with improper names to a new (better) structure, but as there are 671 participants I need to copy, I want to use regex in order to streamline it (each participant has files saved in the same format). However, I keep getting a cp: cannot stat error message saying that no file/directory exists. I had assumed this meant that I had missed a / or put "" in the wrong location but I cannot see anything in the code that would suggest it.
My code is as follows (which I add a lot of comments so other collaborators can understand):
#!/bin/bash
# This code below copies the initial .nii file.
# These data are copied into my Trial Participant folders.
# Create a variable called parent_folder1 that describes initial mask directory e.g. folders for each participant which contains the files.
parent_folder1="/this/path/here/contains/Trial_Participants"
# The original folders are named according to ClinicalID_scandate_randomdigits e.g. folder 1234567890_20000101_987654.
# The destination folders are named according to TrialIDNumber e.g. LBC100001.
# The .nii files are saved under TrialIDNumber_1_ICV.nii.gz e.g. LBC1000001_1_ICV.nii.gz.
# These files need copied over from their directories into the Trial Participant folders, using the for loop function.
# The * symbol is used as a wildcard.
for i in $(ls -1d "${parent_folder1}"/*_20*); do
lbc=$(ls ${i}/finalMasks/*ICV* | sed 's/^.*\///'); lbc=${lbc:0:9}
cp "${parent_folder1}/${i}"/finalMasks/*_1_ICV.nii.gz /this/path/is/the/destination/path/${lbc}/
done
# This code uses regular expression to find the initial ICV file.
# ls asks for a list, -1 makes each new folder on a new line, d is for directory.
# *_20* refers to the name of the folders. The * covers the ClinicalID, _20* refers to the scan date and random digits.
# I have no idea what the | sed 's/^.*\///' does, but I think it strips the path.
# lbc=${lbc:0:9} is used to keep the ID numbers.
# cp copies the files that are named under TrialIDNumber(replaced by *)_1_ICV.nii.gz to the destination under the respective folder.
So after a bit of fooling around, I changed the code a lot (took out sed as it confuses me), and came up with this that worked. Thanks to those who commented!
# Create a variable called parent_folder1 that describes initial mask directory.
parent_folder1="/original/path/here"
# Iterate over directories in parent_folder1
for i in $(ls -1d "${parent_folder1}"/*_20*); do
# Extract the base name of the file in the finalMasks directory
lbc=$(basename $(ls "${i}"/finalMasks/*ICV*))
# Extract the LBC number from the file name
lbc=${lbc:0:9}
# Copy the file to the specific folder
cp "${i}"/finalMasks/${lbc}_1_ICV.nii.gz /destination/path/here/${lbc}/
done

Using Python to copy contents of multiple files and paste in a main file

I'll start by mentioning that I've no knowledge in Python but read online that it could help me with my situation.
I'd like to do a few things using (I believe?) a Python script.
I have a bunch of .yml files that I want to transfer the contents into one main .yml file (let's call it Main.yml). However, I'd also like to be able to take the name of each individual .yml and add it before it's content into Main.yml as "##Name". If possible, the script would look like each file in a directory, instead of having to list every .yml file I want it to look for (my directory in question only contains .yml files). Not sure if I need to specify, but just in case: I want to append the contents of all files into Main.yml & keep the indentation (spacing). P.S. I'm on Windows
Example of what I want:
File: Apes.yml
Contents:
Documentation:
"Apes":
year: 2009
img: 'link'
After running the script, my Main.yml would like like:
##Apes.yml
Documentation:
"Apes":
year: 2009
img: 'link'
I'm just starting out in Python too so this was a great opportunity to see if my newly learned skills work!
I think you want to use the os.walk function to go through all of the files and folders in the directory.
This code should work - it assumes your files are stored in a folder called "Folder" which is a subfolder of where your Python script is stored
# This ensures that you have the correct library available
import os
# Open a new file to write to
output_file = open('output.txt','w+')
# This starts the 'walk' through the directory
for folder , sub_folders , files in os.walk("Folder"):
# For each file...
for f in files:
# create the current path using the folder variable plus the file variable
current_path = folder+"\\"+f
# write the filename & path to the current open file
output_file.write(current_path)
# Open the file to read the contents
current_file = open(current_path, 'r')
# read each line one at a time and then write them to your file
for line in current_file:
output_file.write(line)
# close the file
current_file.close()
#close your output file
output_file.close()

Extract tar.gz{some integer} in python

I am trying to extract a file name with this format--> filename.tar.gz10
I have tried mutpile wayd but for all of them, I get the error that is unknow format. it works fine for files ends with tar.gz00. I tried to change the name but still does not work.
Here are what I have tried,
import tarfile
file = tarfile.open('filename.tar.gz10')
file.extractall('./extracted_path')
file.close()
Another way is,
shutil.unpack_archive('./filename.tar.gz10', './extracted_path', 'tar.gz17')
Thanks for your help in advance.
This coule be because the archive was split into smaller chunks, on linux you could do so using the split -b command so one big file is actually multiple smaller ones now, and they are named like
file.tar.gz01
file.tar.gz02
file.tar.gz03
file.tar.gz04
etc...
you wont be able to decompress these file individually, so you have to concatenate them first into one file then decompress.
To verify whther it was split or not, run file {filename} and if does not recognize it as a gzip compressed archive then it is propably split (this is why you get unknown format error)
You can try to do the following:
from glob import glob
import os
path = '/path/to/' # location of your files
list_of_files = glob(path + '*.tar.gz*') # list all gzip files
bash_command = 'gzip -dk filename.tar.gz' + ' '.join(list_of_files) # create bash command to concatenate the files
os.system(bash_command)

python shutil copy the files from source dir to remote dir based on condition

I'm looking to copy the files from source directory to the remote directory using shutil(), however I need to have few checks as follows.
Don't copy the zero byte file to the remote.
If the file is already exits on the remote then don't copy it again unless the file in source has changed contents or updated.
I'm looking for the directory which is of current month, so, traverse to the directory if its available for the current month, like it should be January for the current month.
Importing the modules:
import os
import glob
import shutil
import datetime
Variable to pick the current month:
Info_month = datetime.datetime.now().strftime("%B")
Code snippet:
for filename in glob.glob("/data/Info_month/*/*.txt"):
if not os.path.exists("/remote/data/" + os.path.basename(filename)):
shutil.copy(filename, "/remote/data/")
Above code doesn't take the variable Info_month However, hardcoding the directory name works.
I'm having challenges due to my lack of Python knowledge.
How can I include the variable Info_month into the source dir path?
How to place the check for not to copy zero byte files?
os.path.getsize(fullpathhere) > 0
My rudimentary silly logic:
for filename in glob.glob("/data/Info_month/*/*.txt"):
if os.path.getsize(fullpathhere) > 0 :
if not os.path.exists("/remote/data/" + os.path.basename(filename)):
shutil.copy(filename, "/remote/data/")
else:
pass
Here is a fix of your existing script. This doesn't yet attempt to implement the "source newer than target" logic since you didn't specifically ask about that, and this is arguably too broad already.
for filename in glob.glob("/data/{0}/*/*.txt".format(Info_month)):
# The result of the above glob _is_ a full path
if os.path.getsize(filename) > 0:
# Minor tweak: use os.path.join for portability
if not os.path.exists(os.path.join(["/remote/data/", os.path.basename(filename)])):
shutil.copy(filename, "/remote/data/")
# no need for an explicit "else" if it's a no-op

Replacing files in one folder and all its subdirectories with modified versions in another folder

I have two folders, one called 'modified' and one called 'original'.
'modified' has no subdirectories and contains 4000 wav files each with unique names.
The 4000 files are copies of files from 'original' except this folder has many subdirectories inside which the original wav files are located.
I want to, for all the wav files in 'modified', replace their name-counterpart in 'original' wherever they may be.
For example, if one file is called 'sound1.wav' in modified, then I want to find 'sound1.wav' in some subdirectory of 'original' and replace the original there with the modified version.
I run Windows 8 so command prompt or cygwin would be best to work in.
As requested, I've written the python code that does the above. I use the 'os' and 'shutil' modules to first navigate over directories and second to overwrite files.
'C:/../modified' refers to the directory containing the files we have modified and want to use to overwrite the originals.
'C:/../originals' refers to the directory containing many sub-directories with files with the same names as in 'modified'.
The code works by listing every file in the modified directory, and for each file, we state the path for the file. Then, we look through all the sub-directories of the original directory, and where the modified and original files share the same name, we replace the original with the modified using shutil.copyfile().
Because we are working with Windows, it was necessary to change the direction of the slashes to '/'.
This is performed for every file in the modified directory.
If anyone ever has the same problem I hope this comes in handy!
import os
import shutil
for wav in os.listdir('C:/../modified'):
modified_file = 'C:/../modified/' + wav
for root, dirs, files in os.walk('C:/../original'):
for name in files:
if name == wav:
original_file = root + '/' + name
original_file = replace_file.replace('\\','/')
shutil.copyfile(modified_file, original_file)
print wav + ' overwritten'
print 'Complete'

Resources