gsutil without -m: default multithreading / parallelism behavior

I am trying to find out what the defaults are when gsutil mv is called without the -m option. From the config.py source code it looks like even without -m, gsutil computes a process count from the number of CPU cores and pairs it with 5 threads per process. So by default on a 4-core machine you would get 4 processes with 5 threads each, i.e. multi-threaded out of the box. How would we find out what -m does? I think I saw in some documentation that -m defaults to 10 threads, but how many processes are spawned? I know you can override these settings, but what is the default with -m?
should_prohibit_multiprocessing, unused_os = ShouldProhibitMultiprocessing()
if should_prohibit_multiprocessing:
  DEFAULT_PARALLEL_PROCESS_COUNT = 1
  DEFAULT_PARALLEL_THREAD_COUNT = 24
else:
  DEFAULT_PARALLEL_PROCESS_COUNT = min(multiprocessing.cpu_count(), 32)
  DEFAULT_PARALLEL_THREAD_COUNT = 5
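(For reference: these defaults can be overridden in the boto configuration file that gsutil reads, per gsutil's documented parallel_process_count / parallel_thread_count options. A sketch of the relevant section, with example values:

```ini
[GSUtil]
parallel_process_count = 8
parallel_thread_count = 5
```

The same options can also be set per-invocation with gsutil's -o flag, e.g. -o GSUtil:parallel_process_count=8.)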
Also, would a mv command in a for loop take advantage of -m, or will the loop just feed gsutil one command at a time, rendering the parallelism useless? I ask because the loop below took 24 hours to complete for 50,000 files, and I want to know whether the -m option would have helped. I am not sure whether calling gsutil on each iteration would allow full threading, or whether it would just run with 10 processes and 10 threads, making it twice as fast.
#!/bin/bash
for files in $(cat listing2.txt) ; do
    echo "Renaming: $files --> ${files#removeprefix-}"
    gsutil mv gs://testbucket/$files gs://testbucket/${files#removeprefix-}
done
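Note that each gsutil mv in a loop like this is a separate process moving a single object, so -m has nothing to parallelize within one call; the parallelism has to come from the caller running many invocations at once. A minimal sketch of that idea (the commands here are harmless placeholders standing in for ['gsutil', 'mv', src, dst]):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run(cmd):
    """Run one command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


# Placeholder commands; in practice each entry would be a gsutil mv invocation.
cmds = [[sys.executable, '-c', 'pass'] for _ in range(4)]

# Run up to 4 commands concurrently instead of one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run, cmds))

print(results)  # each entry is an exit code; 0 means success
```

Threads are enough here because the work happens in the child processes; the Python threads only wait on them.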

Thanks to commenter @guillaume blaquiere,
I wrote a Python program that multiprocesses the API calls to move the files in the cloud, with 25 concurrent processes. I will share the code here in the hope that it helps others.
import subprocess
import multiprocessing
import os


class GsRenamer:
    def __init__(self):
        # '~' is not expanded when running without a shell, so expand it here.
        self.gs_cmd = os.path.expanduser('~/google-cloud-sdk/bin/gsutil')

    def execute_jobs(self, cmd):
        try:
            print('RUNNING PARALLEL RENAME: [{0}]'.format(cmd))
            subprocess.run(cmd, check=True, shell=True)
        except subprocess.CalledProcessError as e:
            print('[{0}] FATAL: Command failed with error [{1}]'.format(cmd, e))

    def get_filenames_from_gs(self):
        self.file_list = []
        cmd = [self.gs_cmd, 'ls', 'gs://gs-bucket/jason_testing']
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        output = p.stdout.readlines()
        for files in output:
            files = files.decode('utf-8').strip()
            tokens = files.split('/')[-1]
            self.file_list.append(tokens)
        self.file_list = list(filter(None, self.file_list))

    def rename_files(self, string_original, string_replace):
        final_rename_list = []
        for files in self.file_list:
            renamed_files = files.replace(string_original, string_replace)
            rename_command = '{0} mv gs://gs-bucket/jason_testing/{1} ' \
                             'gs://gs-bucket/jason_testing/{2}'.format(
                                 self.gs_cmd, files, renamed_files)
            final_rename_list.append(rename_command)

        final_rename_list.sort()
        with multiprocessing.Pool(processes=25) as pool:
            pool.map(self.execute_jobs, final_rename_list)


def main():
    gsr = GsRenamer()
    gsr.get_filenames_from_gs()
    # gsr.rename_files('sample', 'jason')
    gsr.rename_files('jason', 'sample')


if __name__ == "__main__":
    main()

Related

Run linux shell commands in Python3

I am creating a service manager to manage services such as Apache, Tomcat, etc.
I can enable/disable services from a shell with srvmanage.sh enable <service_name>.
I want to do this from a Python script. How do I do it?
service_info = ServiceDB.query.filter_by(service_id=service_id).first()
service_name = service_info.service
subprocess.run(['/home/service_manager/bin/srvmanage.sh enable', service_name],shell=True)
What is the problem with this code?
I'm guessing that if you want to do this in Python you may want more functionality. If not, @programandoconro's answer would do. However, you could also use the subprocess module to get more functionality. It will allow you to run a command with arguments and return a CompletedProcess instance. For example:
import subprocess
# call to shell script
process = subprocess.run(['/path/to/script', options/variables], capture_output=True, text=True)
You can add in additional functionality by capturing the stderr/stdout and return code. For example:
# call to shell script
process = subprocess.run(['/path/to/script', options/variables], capture_output=True, text=True)
# return code of 0 test case. Successful run should return 0
if process.returncode != 0:
    print('There was a problem')
    exit(1)
The docs for subprocess are here.
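As a concrete toy example of the return-code check, substituting a small Python one-liner for the script path (the one-liner and its exit code are made up for illustration):

```python
import subprocess
import sys

# A stand-in for '/path/to/script': writes to stderr and exits non-zero.
process = subprocess.run(
    [sys.executable, '-c', 'import sys; sys.stderr.write("boom\\n"); sys.exit(3)'],
    capture_output=True, text=True)

# A return code of 0 means success; anything else signals a failure.
if process.returncode != 0:
    print('There was a problem:', process.stderr.strip())
```

With capture_output=True and text=True, process.stdout and process.stderr arrive as ready-to-use strings instead of bytes.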
You can use the os module to access system commands.
import os
os.system("srvmanage.sh enable <service_name>")
I fixed this issue.
operation = 'enable'
service_operation_script = '/home/service_manager/bin/srvmanage.sh'
service_status = subprocess.check_output(
    "sudo /bin/bash " + service_operation_script + " " + operation + " " + service_name,
    shell=True)
response = service_status.decode("utf-8")
print(response)

Run random process in ubuntu terminal

I want to know if there is a way to run random processes to simulate normal activity from a user working at the PC.
For example, generate random processes that consume resources the way a browser or a PDF reader would, until 50% or 60% of memory is in use.
I am collecting data from virtual machines, and I would like the data to be as heterogeneous as possible.
I have tried the following:
Run random command in bash script
https://unix.stackexchange.com/questions/174688/how-can-i-start-a-process-with-any-name-which-does-nothing
But that's not exactly what I am looking for.
Can you help me?
Thanks in advance.
Would something like this work for you?
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import random
import subprocess
import time

import psutil

timeout = 5 * 60  # seconds
poll_time = 5  # seconds
mem_limit = psutil.virtual_memory().total * (50 / 100)  # cap memory usage at 50%

cmds = [
    'firefox www.some_website.com',
    'firefox www.other_website.com',
    'okular some_document.pdf',
    'vlc some_video.mp4',
    'vlc some_audio.mp3',
    # etc.
]

procs = []
quit = False
init_time = time.time()
while not quit:
    # Spawn a new random process only while memory usage is still below the limit.
    if psutil.virtual_memory().used < mem_limit:
        cmd = random.choice(cmds)  # randint(0, len(cmds)) could raise IndexError
        procs.append(subprocess.Popen(cmd, shell=True))
    time.sleep(poll_time)
    # Once the timeout has elapsed, kill everything we started and stop.
    if (time.time() - init_time) > timeout:
        for proc in procs:
            proc.kill()
        quit = True

Find execution time for subprocess.Popen python

Here's the Python code to run an arbitrary command returning its stdout data, or raise an exception on non-zero exit codes:
proc = subprocess.Popen(
    cmd,
    stderr=subprocess.STDOUT,  # Merge stdout and stderr
    stdout=subprocess.PIPE,
    shell=True)
The subprocess module does not report execution time, nor does it support a timeout when a specific threshold is exceeded (the ability to kill a process running for more than X seconds).
What is the simplest way to implement get_execution_time and a timeout in a Python 2.6 program meant to run on Linux?
Good question. Here is the complete code for this:
import time, subprocess # Importing modules.
timeoutInSeconds = 1 # Our timeout value.
cmd = "sleep 5" # Your desired command.
proc = subprocess.Popen(cmd,shell=True) # Starting main process.
timeStarted = time.time() # Save start time.
cmdTimer = "sleep "+str(timeoutInSeconds) # Waiting for timeout...
cmdKill = "kill "+str(proc.pid)+" 2>/dev/null" # And killing process.
cmdTimeout = cmdTimer+" && "+cmdKill # Combine commands above.
procTimeout = subprocess.Popen(cmdTimeout,shell=True) # Start timeout process.
proc.communicate() # Process is finished.
timeDelta = time.time() - timeStarted # Get execution time.
print("Finished process in "+str(timeDelta)+" seconds.") # Output result.
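On Python 3 the sleep-and-kill helper process isn't needed: communicate accepts a timeout argument, and the wall-clock time can be measured around it. A sketch under that assumption (the deliberately slow one-liner and the 1-second timeout are arbitrary examples):

```python
import subprocess
import sys
import time

cmd = [sys.executable, '-c', 'import time; time.sleep(10)']  # deliberately slow
timeoutInSeconds = 1

proc = subprocess.Popen(cmd)
timeStarted = time.time()  # Save start time.
try:
    proc.communicate(timeout=timeoutInSeconds)
    timedOut = False
except subprocess.TimeoutExpired:
    proc.kill()          # On timeout we must kill and reap the child ourselves.
    proc.communicate()
    timedOut = True
timeDelta = time.time() - timeStarted  # Get execution time.

print("Finished process in %.2f seconds (timed out: %s)" % (timeDelta, timedOut))
```

This avoids the race in the shell version, where the kill could fire after the PID has been reused.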

Setting timeout when using os.system function

Firstly, I'd like to say I have just begun to learn Python, and I want to execute a Maven command inside my Python script (see the partial code below):
os.system("mvn surefire:test")
Unfortunately, this command sometimes times out, so I want to know how to set a timeout threshold to control it.
That is to say, if execution takes longer than X seconds, the program should skip the command.
Moreover, are there other useful solutions to my problem? Thanks in advance!
Use the subprocess module instead. By passing a list and sticking with the default shell=False, we can just kill the process when the timeout hits:
import subprocess

my_timeout = 60  # seconds; pick whatever threshold you need

p = subprocess.Popen(['mvn', 'surefire:test'])
try:
    p.wait(my_timeout)
except subprocess.TimeoutExpired:
    p.kill()
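The same pattern works with the higher-level subprocess.run, which kills the child for you when the timeout expires. A toy sketch, with a Python one-liner standing in for the Maven command:

```python
import subprocess
import sys

# Stand-in for the real command; it would run for 30 seconds if allowed.
slow_cmd = [sys.executable, '-c', 'import time; time.sleep(30)']

try:
    subprocess.run(slow_cmd, timeout=2)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True  # subprocess.run has already killed and reaped the child

print('timed out:', timed_out)
```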
Also, you can use the terminal's timeout command, like this:
import os
os.system('timeout 5s [Type Command Here]')
You can use s, m, h, d for seconds, minutes, hours, days.
You can also send a different signal to the command. If you want to learn more, see:
https://linuxize.com/post/timeout-command-in-linux/
Simple answer
os.system does not support a timeout.
You can use Python 3's subprocess instead, which supports a timeout parameter, such as:
yourCommand = "mvn surefire:test"
timeoutSeconds = 5
subprocess.check_output(yourCommand, shell=True, timeout=timeoutSeconds)
Detailed Explanation
Going further, I have encapsulated this into a function getCommandOutput for you:
def getCommandOutput(consoleCommand, consoleOutputEncoding="utf-8", timeout=2):
    """Get command output from terminal.

    Args:
        consoleCommand (str): console/terminal command string
        consoleOutputEncoding (str): console output encoding, default is utf-8
        timeout (int): max time in seconds to wait for the command
    Returns:
        (bool, str): whether the command succeeded, and its output
    """
    isRunCmdOk = False
    consoleOutput = ""
    try:
        consoleOutputByte = subprocess.check_output(consoleCommand, shell=True, timeout=timeout)
        consoleOutput = consoleOutputByte.decode(consoleOutputEncoding)  # e.g. '640x360\n'
        consoleOutput = consoleOutput.strip()  # e.g. '640x360'
        isRunCmdOk = True
    except subprocess.CalledProcessError as callProcessErr:
        cmdErrStr = str(callProcessErr)
        print("Error %s for run command %s" % (cmdErrStr, consoleCommand))
    return isRunCmdOk, consoleOutput
Demo:
isRunOk, cmdOutputStr = getCommandOutput("mvn surefire:test", timeout=5)

Execute python scripts from another python script opening another shell

I'm using Python 3. I need one script to call the other and run it in a different shell, without passing arguments. I'm on macOS, but I need it to be cross-platform.
I tried with
os.system('script2.py')
subprocess.Popen('script2.py', shell=True)
os.execl(sys.executable, "python3", 'script2.py')
But none of them accomplish what I need.
I use the second script to get inputs, while the first one handles the outputs...
EDIT
This is the code on my second script:
import sys
import os
import datetime

os.remove('Logs/consoleLog.txt')
try:
    os.remove('Temp/commands.txt')
except:
    ...

stopSim = False
command = ''
okFile = open('ok.txt', 'w')
okFile.write('True')
consoleLog = open('Logs/consoleLog.txt', 'w')
okFile.close()
while not stopSim:
    try:
        sysTime = datetime.datetime.now()
        stringT = str(sysTime)
        split1 = stringT.split(" ")
        split2 = split1[0].split("-")
        split3 = split1[1].split(":")
        for i in range(3):
            split2.append(split3[i])
        timeString = "{0}-{1}-{2} {3}:{4}".format(split2[2], split2[1], split2[0], split2[3], split2[4])
    except:
        timeString = "Time"  # was 'timestring', which left timeString undefined
    commandFile = open('Temp/commands.txt', 'w')
    command = input(timeString + ": ")
    command = command.lower()
    consoleLog.write(timeString + ': ' + command + "\n")
    commandFile.write(command)
    commandFile.close()
    if command == 'stop simulation' or command == 'stop sim':
        stopSim = True
consoleLog.close()
os.remove('Temp/commands.txt')
and this is where I call and wait for the other script to become operative in script 1:
# Open console
while not consoleOpent:
    try:
        okFile = open('ok.txt', 'r')
        c = okFile.read()
        if c == 'True':
            consoleOpent = True
    except:
        ...
Sorry for the long question...
Any suggestions to improve the code are welcome.
Probably the easiest solution is to make the contents of your second script a function in the first script, and execute it as a multiprocessing Process. Note that you can use e.g. multiprocessing.Pipe or multiprocessing.Queue to exchange data between the different processes. You can also share values and arrays via multiprocessing.sharedctypes.
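A minimal sketch of that suggestion: the second script's body becomes a function run as a Process, with a Queue carrying its input back to the first script (the function name and the 'stop sim' payload are placeholders standing in for your real logic):

```python
import multiprocessing


def script2_main(queue):
    """Stand-in for the second script: produce some input for script 1."""
    queue.put('stop sim')


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=script2_main, args=(queue,))
    proc.start()
    command = queue.get()  # blocks until script2_main sends something
    proc.join()
    print('received:', command)
```

Compared to the ok.txt / commands.txt polling in the question, the Queue blocks until data arrives, so no busy-wait loop or temp-file cleanup is needed.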
This will be platform-dependent. Here a solution for Mac OS X.
Create new file run_script2 with this content:
/full/path/to/python /full/path/to/script2.py
Make it executable: chmod +x run_script2
Run from Python with:
os.system('open -a Terminal run_script2')
Alternatively you can use subprocess.call:
subprocess.call(['open -a Terminal run_script2'], shell=True)
On Windows you can do something similar with (untested):
os.system('start cmd /D /C "python script2.py && pause"')
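The two platform-specific commands above can be selected at runtime from sys.platform. A sketch (the helper name is mine, and the Linux branch is an untested assumption that gnome-terminal is installed):

```python
import sys


def new_terminal_command(script, platform=None):
    """Build a shell command that runs `script` in a new terminal window."""
    platform = platform or sys.platform
    if platform == 'darwin':                      # macOS
        return 'open -a Terminal {0}'.format(script)
    if platform.startswith('win'):                # Windows
        return 'start cmd /D /C "python {0} && pause"'.format(script)
    # Assumed default for Linux desktops with gnome-terminal available.
    return 'gnome-terminal -- python3 {0}'.format(script)


print(new_terminal_command('script2.py', 'darwin'))  # → open -a Terminal script2.py
```

The returned string would then be passed to os.system or subprocess.call(..., shell=True) as in the answers above.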
