Hi everyone, I am trying to build a file watcher in Python 3.5 using watchgod. I want to continuously watch a directory, and whenever a file is added, send the list of added files to another program which will perform a series of tasks. The following is my code in Python:
print("execution of main file begins !!!!")
import os
from watchgod import watch
#changes gives a set object when watch finds any kind of changes in directory
for changes in watch(r'C:\Users\Rajat.Malik\Desktop\Requests'):
fileStatus = [obj[0] for obj in list(changes) ] #converting set to list which gives file status as added, changed or modified
fileLocation = [obj[1] for obj in list(changes) ] #similarly getting list of location of files added
var2 = 0
for var1 in fileLocation:
if fileStatus[var2] == 1: #if file is added then passing all files to another code which will work on the list of files added
os.system('python split_thread_module.py '+var1) #now this code will start executing
var2 = var2 + 1
The problem I am having is that while split_thread_module.py is executing, the watcher is not watching the directory. Any file that arrives while split_thread_module.py is running is not reflected in changes. How can I watch for changes in the directory and pass them to the other program on the fly, even while that other program is executing? I am not a Python programmer. Can anyone help me in this regard?
Thanks in advance!
Sorry for the delayed reply, I'm the developer of watchgod. I've added a python-watchgod tag to your question which I'll watch (no pun intended) in future so I can answer such questions more quickly.
To answer your question, watchgod will not miss changes which occur in the filesystem while other code is running. They'll just be reported as changes the next time watch iterates.
More generally, the best approach would be to run the other code asynchronously so the main process can get back to watching the directory.
A few other hints for neater Python:
- there's no need to call list(changes) in the comprehension; you can iterate over the set directly
- os.system is effectively superseded; subprocess.run is the recommended way to start another process (see the short sketch after this list)
- since split_thread_module.py is also Python, do you really need to run it in a separate process at all? Even if you do, you might have more luck with Python's multiprocessing module than with starting a new process through the system shell.
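A quick sketch of the subprocess.run suggestion; the script name mirrors the question's os.system call, and file_path is a hypothetical stand-in for the asker's var1:

import subprocess

file_path = 'some_added_file.txt'  # hypothetical stand-in for var1 from the question
# run the helper script and raise CalledProcessError if it exits with a non-zero status
subprocess.run(['python', 'split_thread_module.py', file_path], check=True)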
Overall you might try something like:
from concurrent.futures import ProcessPoolExecutor
from time import sleep

from watchgod import watch


def slow_job(status, location):
    print(f'status: {status}, location: {location}, starting...')
    sleep(10)
    print(f'status: {status}, location: {location}, done')


if __name__ == '__main__':  # required on Windows, where child processes re-import this module
    with ProcessPoolExecutor() as executor:
        for changes in watch('./tests'):
            for status, location in changes:
                executor.submit(slow_job, status, location)
I am pretty confused about the best way to do what I am trying to do.
What do I want?
- An API call to the Flask application
- The Flask route starts 4-5 processes using the multiprocessing Process module and combines the results (computed on slices of a pandas DataFrame) using a shared Manager().list()
- The computed results are returned to the client
My implementation:
pos_iter_list = get_chunking_iter_list(len(position_records), 10000)

manager = Manager()
data_dict = manager.list()
processes = []
for i in range(len(pos_iter_list) - 1):
    temp_list = data_dict[pos_iter_list[i]:pos_iter_list[i + 1]]
    p = Process(
        target=transpose_dataset,
        args=(temp_list, name_space, align_namespace, measure_master_id, df_searchable, products,
              channels, all_cols, potential_col, adoption_col, final_segment, col_map, product_segments,
              data_dict)
    )
    p.start()
    processes.append(p)

for p in processes:
    p.join()
My directory structure:
- main.py (Flask entry point)
- helper.py (contains the function where the above code is executed and which calls the transpose_dataset function)
The error I am getting while running this:
RuntimeError: No root path can be found for the provided module "mp_main". This can happen because the module came from an import hook that does not provide file name information or because it's a namespace package. In this case the root path needs to be explicitly provided.
Not sure what went wrong here; the Manager list works fine when called from a sample.py file guarded by if __name__ == '__main__':.
Update: the same piece of code works fine on my MacBook but not on Windows.
A sample Flask API call:

@app.route(PREFIX + "ping", methods=['GET'])
def ping():
    man = mp.Manager()
    data = man.list()
    processes = []
    for i in range(0, 5):
        pr = mp.Process(target=test_func, args=(data, i))
        pr.start()
        processes.append(pr)
    for pr in processes:
        pr.join()
    return json.dumps(list(data))
Stack has an ongoing bug preventing me from commenting, so I'll just write up an answer.
Python has two (main) ways to start a new process: "spawn" and "fork". Fork is a system call only available on *nix (read: Linux or macOS), and therefore spawn is the only option on Windows. After 3.8, spawn will become the default on macOS, but fork is still available. The big difference is that fork basically makes a copy of the existing process, while spawn starts a whole new process (like just opening a new cmd window).
There's a lot of nuance to why and how, but in order to run the function you want the child process to run using spawn, the child has to import the main file. Importing a file is tantamount to executing that file and then binding its namespace to a variable: import flask will run the flask/__init__.py file and bind its global namespace to the variable flask. There's often code, however, that is only used by the main process and doesn't need to be imported / executed in the child process. In some cases running that code again actually breaks things, so you need to prevent it from running outside of the main process. This is taken into account in that the "magic" variable __name__ is only equal to "__main__" in the main file (and not in child processes or when importing modules).
In your specific case, you're creating a new app = Flask(__name__), which does some amount of validation and checks before you ever run the server. It's one of these setup/validation steps that it's tripping over when run from the child process. Fixing it by not letting it run at all is, in my opinion, the cleaner solution, but you can also fix it by giving it a value that it won't trip over, and then just never starting that secondary server (again by protecting it with if __name__ == "__main__":).
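A minimal sketch of that guarded layout (the route, helper names, and module structure here are placeholders, not the asker's actual code):

import json
import multiprocessing as mp

from flask import Flask


def worker(data, i):
    # runs in a spawned child process; only this function is needed there
    data.append(i)


def create_app():
    app = Flask(__name__)

    @app.route('/ping')
    def ping():
        man = mp.Manager()
        data = man.list()
        processes = [mp.Process(target=worker, args=(data, i)) for i in range(5)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        return json.dumps(list(data))

    return app


if __name__ == '__main__':
    # only the main process builds and runs the server; children that
    # re-import this module under "spawn" skip this block entirely
    create_app().run()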
I'm writing a simple GUI script A which calls another script B. Script B runs a subprocess which takes some time. I would like to print something like "processing..." in one of the labels on that GUI, and that text should stay there until the subprocess from script B is finished. How can I do that?
Edit:
If I had to listen for the termination of a subprocess started by script A from within script A, I would simply name that process (i.e. p) and check its p.poll(). Since that subprocess is the product of another script B, I thought I could name that process, import script B into script A, and then check p.poll(). But I faced another problem: I couldn't import script B into A. The steps I was following were from:
Importing variables from another file?
Every time I got a message that there is no such file. Fortunately, in the end I found another way to achieve what I wanted.
I would split this task into two stages (sketched together below):
1. get the subprocess PID;
2. check whether that PID is still running.
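A minimal sketch of both stages in a Tkinter GUI, assuming script B can be launched directly as script_b.py (the file name and widget layout are placeholders): Popen hands back the process object (and its PID), and poll() reports whether it is still running without freezing the GUI.

import subprocess
import tkinter as tk

root = tk.Tk()
label = tk.Label(root, text='idle')
label.pack()

proc = None


def start_job():
    global proc
    # stage 1: launch script B without blocking the GUI event loop;
    # proc.pid is the subprocess PID
    proc = subprocess.Popen(['python', 'script_b.py'])
    label.config(text='processing...')
    root.after(500, check_done)


def check_done():
    # stage 2: poll() returns None while the process is still running
    if proc.poll() is None:
        root.after(500, check_done)  # check again in half a second
    else:
        label.config(text='done')


tk.Button(root, text='Run', command=start_job).pack()
root.mainloop()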
I have multiple folders in a directory, and each folder has multiple files. I have code that checks for a specific file in each folder and does some data preprocessing and analysis if that file is present.
A snippet of it is given below.
import pandas as pd
import json
import os

rootdir = os.path.abspath(os.getcwd())
df_list = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith("StudyParticipants") and file.endswith(".csv"):
            temp = pd.read_csv(os.path.join(subdir, file))
            .....
            .....
            'some analysis'
            Merged_df.to_excel(path + '\Processed Data Files\Study_Participants_Merged.xlsx')
Now I want to automate this process: the script should be executed whenever a new folder is added. This is my first step into automation, and I have been stuck on this for quite a while without major progress.
I am using a Windows system and a Jupyter notebook to create these DataFrames and perform the analysis.
Any help is greatly appreciated.
Thanks.
I've written a script which you only need to run once and it will work.
Please note:
1.) This solution does not take into account which folder was created. If that information is required, I can rewrite the answer.
2.) This solution assumes folders won't be deleted from the main folder. If that isn't the case, I can rewrite the answer as well.
import time
import os


def DoSomething():
    pass


if __name__ == '__main__':
    # go to the folder of interest
    os.chdir('/home/somefolders/.../A1')
    # get the current number of folders inside it
    N = len(os.listdir())

    while True:
        time.sleep(5)  # sleep for 5 secs
        if N != len(os.listdir()):
            print('New folder added! Doing something useful...')
            DoSomething()
            N = len(os.listdir())  # update N
Take a look at watchdog:
http://thepythoncorner.com/dev/how-to-create-a-watchdog-in-python-to-look-for-filesystem-changes/
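A minimal sketch of the watchdog approach (assuming the watchdog package is installed; the path and the handler's reaction are placeholders):

import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class NewFolderHandler(FileSystemEventHandler):
    def on_created(self, event):
        # fires for every new entry; react only to directories
        if event.is_directory:
            print('New folder added:', event.src_path)
            # kick off the analysis script here


observer = Observer()
observer.schedule(NewFolderHandler(), path='.', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()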
You could also code a very simple watchdog service on your own, as sketched after this list:
- list all files in the directory you want to observe
- wait a time span you define, say a few seconds
- make a list of the directory contents again
- compare the two lists and take their difference
- the entries in that difference are your filesystem changes
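A minimal sketch of that polling loop (the directory path and the three-second interval are assumptions):

import os
import time

WATCH_DIR = '.'

before = set(os.listdir(WATCH_DIR))
while True:
    time.sleep(3)  # wait a few seconds between scans
    after = set(os.listdir(WATCH_DIR))
    added = after - before    # entries that appeared since the last scan
    removed = before - after  # entries that disappeared
    if added:
        print('added:', sorted(added))
    if removed:
        print('removed:', sorted(removed))
    before = after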
Best regards
I have two different Python scripts, the first called main.py and the second called running.py.
The main.py script is the following:
# setting things up and stuff
while True:
    # stuff
    # here running.py should be started
    # stuff
while running.py contains the following:

# various imports
while True:
    # stuff
My question is: how can I run running.py from main.py as a new thread, knowing only the name of the script to run?
I looked into it a bit and, since I need to communicate and share data between main.py and running.py, I don't think creating a subprocess is the best course of action, but I haven't found a way to run a whole script in a thread knowing only the script's name.
For various (stupid) company reasons I can't change the content of running.py and I can't import it into main.py, so creating a threading class inside it is not a possibility, but I have free rein over main.py.
Is what I'm trying to do even possible?
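For reference, one hedged sketch of a possible approach (an assumption, not something from this thread): the standard-library runpy module can execute a script given only its file name, and that call can be made the target of a thread. Note that run_path builds a fresh namespace, so sharing data would still need a separate channel.

import runpy
import threading

# run running.py in a background daemon thread; run_path executes the file
# as a script, so its own while True loop keeps spinning in that thread
t = threading.Thread(target=runpy.run_path, args=('running.py',), daemon=True)
t.start()

while True:
    # main.py's own loop continues here, concurrently with running.py
    ...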
I have a number of Python scripts that I would like to automate using Python's datetime and schedule modules.
They are too numerous to consider breaking apart and merging into one large file.
What is the easiest way to write a Python script that will open and run these other Python scripts?
I have browsed similar questions, but none offered a concrete answer that I could find. Thanks for your help.
A minimally demonstrative example
In a file called "child.py", write a file to the current directory:
with open('test', 'w') as f:
    f.write('hello world')
Then, in a file called "parent.py", execute the "child.py" script:
import subprocess
subprocess.call(['python', 'child.py'])
Now, from your command line, you can type (assuming both "parent.py" and "child.py" are in the current directory):
python parent.py
In the next instant, you should see a file called "test" in your current directory. Open it up. What do you see?
Well, hello world of course!
The above example makes a child of the current process (meaning it inherits the environment variables in the parent), and waits until the child process completes before returning control to the parent. If you want the child script to run in the background, then you need to use Popen:
subprocess.Popen(['python', 'child.py'])
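A short follow-up sketch of that background case: with Popen, poll() lets the parent check on the child without blocking, and returncode is set once it finishes.

import subprocess
import time

child = subprocess.Popen(['python', 'child.py'])
while child.poll() is None:  # None means the child is still running
    time.sleep(0.1)  # the parent is free to do other work here
print('child finished with return code', child.returncode)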