Pass an array as command line argument to the script - python-3.x

I'd like to experiment with code from the command line, so I import argv from sys.
from sys import argv

def binary_search(array, item):
    low = 0
    high = len(array) - 1
    while low <= high:
        mid = (low + high) // 2  # round down if the sum is odd
        guess = array[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1
        else:
            low = mid + 1
    return None

def main():
    script, array, item = argv
    binary_search(array, item)
When I run it on the command line:
$ python3 binary_search.py [1, 2, 3] 8
Traceback (most recent call last):
  File "binary_search.py", line 94, in <module>
    main()
  File "binary_search.py", line 89, in main
    script, array, item = argv
ValueError: too many values to unpack (expected 3)
I tested and found that arguments passed from the command line are treated as str by argv.
How can I pass an array as an argument?

There are a couple of different ways you can do this...
using re
Using regular expressions may be one of the easiest ways of handling this.
from sys import argv
import re

def binary_search(array, item):
    low = 0
    high = len(array) - 1
    while low <= high:
        mid = (low + high) // 2  # round down if the sum is odd
        guess = array[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1
        else:
            low = mid + 1
    return None

def main():
    array = re.findall(r"[\w]+", argv[1])
    array = [int(i) for i in array]
    item = int(argv[2])
    binary_search(array, item)

if __name__ == "__main__":
    main()
using exec()
You can also use exec(), which is riskier and more complicated. Here's a simple example:
from sys import argv

command = 'mylist = {0}'.format(argv[1])
exec(command)
for item in mylist:
    print(item)
example output:
C:\path>py foo.py [1,2,3]
1
2
3
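If you want to keep the bracketed-list syntax, a safer alternative to exec() (my addition, not part of the original answer) is ast.literal_eval, which only evaluates Python literals:
from sys import argv
import ast

# hedged sketch: literal_eval rejects anything that is not a plain Python literal
mylist = ast.literal_eval(argv[1])  # "[1,2,3]" -> [1, 2, 3]
for item in mylist:
    print(item)
Note that the argument has to be quoted or written without spaces, otherwise the shell splits it into several pieces before Python ever sees it.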

The arguments on the command line are strings; they are not parsed as Python literals by the interpreter.
argv collects those strings into a list, one entry per whitespace-separated argument (with the script name first); in short,
sys.argv is a list of strings.
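For example, with the invocation above the shell typically splits [1, 2, 3] on the spaces, so the script receives five strings rather than three, which is exactly why the unpacking into script, array, item fails:
# roughly what the script sees for: python3 binary_search.py [1, 2, 3] 8
sys.argv == ['binary_search.py', '[1,', '2,', '3]', '8']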
Additionally, the argparse module helps:
The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.
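A minimal sketch of that approach (my own illustration, assuming the binary_search function from the question is defined in the same file): the numbers are passed as separate arguments and argparse converts them to int.
import argparse

def main():
    # assumes binary_search(array, item) from the question is defined above
    parser = argparse.ArgumentParser(description='Binary search a sorted list of integers.')
    parser.add_argument('array', nargs='+', type=int, help='sorted numbers to search')
    parser.add_argument('--item', type=int, required=True, help='number to look for')
    args = parser.parse_args()
    print(binary_search(args.array, args.item))

if __name__ == '__main__':
    main()
Invoked as python3 binary_search.py 1 2 3 --item 8, argparse handles the string-to-int conversion and generates the usage message for free.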

Related

Multiprocessing to return large data sets in Python

I have 2 functions in a Python 3.7 script that search 2 separate network nodes and return very large data sets of strings in a list. The smaller data set is ~300K entries, while the larger one is ~1.5M. The script takes almost an hour to execute because of how it has to compile the data sets, and because the second data set is significantly larger. I don't have a way to shorten the run time by changing how the compilation happens; there's no easier way for me to get the data from the network nodes. But I can cut almost 10 minutes if I can run them simultaneously, so I'm trying to shorten the run time by using multiprocessing to run both of them at once.
I do not need them to necessarily start within the same second or finish at the same second, just want them to run at the same time.
Here's a breakdown of my first attempt at coding for multiprocessing:
import multiprocessing

def p_func(arg1, arg2, pval):
    ## Do Stuff
    return pval

def s_func(arg1, sval):
    ## Do Stuff
    return sval

# Creating variables to get return values that multiprocessing can handle
pval = multiprocessing.Value(list)
sval = multiprocessing.Value(list)

# setting up multiprocessing Processes for each function and passing arguments
p1 = multiprocessing.Process(target=p_func, args=(arg1, arg2, pval))
s1 = multiprocessing.Process(target=s_func, args=(arg3, sval))

p1.start()
s1.start()

p1.join()
s1.join()

print("Number of values in pval: ", len(pval))
print("Number of values in sval: ", len(sval))
I believe I have solved my list concerns, so....
Based on comments I've updated my code as follows:
#! python3
import multiprocessing as mp

def p_func(arg1, arg2, pval):
    # takes arg1 and arg2 and queries network node to return list of ~300K
    # values and assigns that list to pval for return to main()
    return pval

def s_func(arg1, sval):
    # takes arg1 and queries network node to return list of ~1.5M
    # values and assigns that list to sval for return to main()
    return sval

# Creating variables to get return values that multiprocessing can handle in
# main()
with mp.Manager() as mgr:
    pval = mgr.list()
    sval = mgr.list()

    # setting up multiprocessing Processes for each function and passing
    # arguments
    p1 = mp.Process(target=p_func, args=(arg1, arg2, pval))
    s1 = mp.Process(target=s_func, args=(arg3, sval))

    p1.start()
    s1.start()

    p1.join()
    s1.join()

# out of with block
print("Number of values in pval: ", len(pval))
print("Number of values in sval: ", len(sval))
Now I'm getting TypeError: can't pickle _thread.lock objects on the p1.start() invocation. I'm guessing that one of the variables I'm passing in the p1 declaration is causing a problem with multiprocessing, but I'm not sure how to read the error or resolve the problem.
Use a Manager.list() instead:
import multiprocessing as mp

def p_func(pval):
    pval.extend(list(range(300000)))

def s_func(sval):
    sval.extend(list(range(1500000)))

if __name__ == '__main__':
    # Creating variables to get return values that mp can handle
    with mp.Manager() as mgr:
        pval = mgr.list()
        sval = mgr.list()

        # setting up mp Processes for each function and passing arguments
        p1 = mp.Process(target=p_func, args=(pval,))
        s2 = mp.Process(target=s_func, args=(sval,))

        p1.start()
        s2.start()

        p1.join()
        s2.join()

        print("Number of values in pval: ", len(pval))
        print("Number of values in sval: ", len(sval))
Output:
Number of values in pval: 300000
Number of values in sval: 1500000
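Note that the workers mutate the manager lists in place with extend(); rebinding the parameter inside a worker (e.g. pval = [...]) would only change the local name, and the parent process would never see the data.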
Manager objects are more flexible but slower than shared memory. If you know an upper limit for your arrays, you could instead use a fixed-size shared-memory Array together with a shared Value that records how many slots are in use, such as:
#!python3
import multiprocessing as mp

def p_func(parr, psize):
    for i in range(10):
        parr[i] = i
    psize.value = 10

def s_func(sarr, ssize):
    for i in range(5):
        sarr[i] = i
    ssize.value = 5

if __name__ == '__main__':
    # Creating variables to get return values that mp can handle
    parr = mp.Array('i', 2<<20)  # room for ~2M ints
    sarr = mp.Array('i', 2<<20)
    psize = mp.Value('i', 0)
    ssize = mp.Value('i', 0)

    # setting up mp Processes for each function and passing arguments
    p1 = mp.Process(target=p_func, args=(parr, psize))
    s2 = mp.Process(target=s_func, args=(sarr, ssize))

    p1.start()
    s2.start()

    p1.join()
    s2.join()

    print("parr: ", parr[:psize.value])
    print("sarr: ", sarr[:ssize.value])
Output:
parr: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sarr: [0, 1, 2, 3, 4]
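Here 'i' is the typecode for a C signed int, so each Array pre-allocates roughly two million integer slots (2<<20) up front, and the companion Value records how many of those slots are actually filled; that bookkeeping is what makes the fixed-size buffer usable for variable-length results.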

how to change this code that use xrange to run in python 3?

I'm reading the High Performance Python book from the O'Reilly collection. On page 11 I found this code, which works for Python 2; the point here is to issue one instruction that (by vectorizing) performs several operations at the same time.
import math

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    numbers = range(2, int(sqrt_number)+1)
    for i in xrange(0, len(numbers), 5):
        # the following line is not valid Python code
        result = (number_float / numbers[i:(i+5)]).is_integer()
        if any(result):
            return False
    return True
but I get this error
TypeError: unsupported operand type(s) for /: 'float' and 'list'
I've tried to change it to work on Python 3; here is my attempt:
import math

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    numbers = list(range(2, int(sqrt_number)+1))
    for i in range(0, len(numbers), 5):
        # the following line is not valid Python code
        result = (number_float / numbers[i:(i+5)]).is_integer()
        if any(result):
            return False
    return True
I changed xrange to range and range(2, int(sqrt_number)+1) to list(range(2, int(sqrt_number)+1)), but I did not succeed. I suppose there is a special operator for sequences or something like that, but I have no idea. If anyone can help me, I'll be very grateful.
I looked at the book and that line is not supposed to actually work as is; you cannot divide by a list in Python. The author uses that code as an example of what vectorization would look like. The # the following line is not valid Python code comment is in the original to indicate that.
The closest in terms of functionality and semantics would probably be this code:
import math

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    numbers = list(range(2, int(sqrt_number)+1))
    for i in range(0, len(numbers), 5):
        # the following line is now valid Python code, but not vectorized
        result = [(number_float / n).is_integer() for n in numbers[i:(i+5)]]
        if any(result):
            return False
    return True
Note that the processing for result in this version is not done in parallel, so it's probably not what the author wanted to demonstrate. As far as I know, vectorization isn't natively available in Python, you would have to use numpy to do it. This article should be useful if you want to try it.
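For completeness, here is a rough numpy sketch (my own illustration, not from the book or that article) of what a genuinely vectorized version could look like:
import math
import numpy as np

def check_prime(number):
    sqrt_number = math.sqrt(number)
    # all candidate divisors from 2 up to sqrt(number), held in one array
    numbers = np.arange(2, int(sqrt_number) + 1)
    # a single vectorized operation tests every candidate divisor at once
    return not np.any(number % numbers == 0)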
Try this:
import math

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    numbers = list(range(2, int(sqrt_number)+1))
    for i in range(0, len(numbers), 5):
        result = [number_float % num == 0 for num in numbers[i:(i+5)]]
        if any(result):
            return False
    return True

Problem converting decimal to binary in python

The following code is to convert decimal to binary.
My question is: when num becomes less than or equal to 1, Python jumps to the last line, i.e. print(num % 2, end=''), and consequently prints 1. But after that, why does it move back to the line decimalToBinary(num // 2)? That line is supposed to execute only when num > 1.
def decimalToBinary(num):
    if num > 1:
        decimalToBinary(num // 2)
    print(num % 2, end='')

decimalToBinary(17)
It's because the innermost call on the stack has finished, so control jumps back to the call site of decimalToBinary(num // 2) in the caller one level up, which then runs its own print. If you add a = 1 after the print(num % 2, end='') statement, you will see that a = 1 executes before control returns to the caller.
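A quick trace of decimalToBinary(17) (my own illustration) makes the unwinding visible:
# decimalToBinary(17)            calls decimalToBinary(8) before printing
#   decimalToBinary(8)           calls decimalToBinary(4)
#     decimalToBinary(4)         calls decimalToBinary(2)
#       decimalToBinary(2)       calls decimalToBinary(1)
#         decimalToBinary(1)     num > 1 is False, so it prints 1
#       then prints 2 % 2   -> 0
#     then prints 4 % 2     -> 0
#   then prints 8 % 2       -> 0
# then prints 17 % 2        -> 1      output: 10001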
# Read the input
s = int(input())
# Write your code below
print(format(s, 'b'))
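For an input of 17, for example, this prints 10001, matching the recursive version above.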

python3 - easiest way to share variable from parent to child using scripts

Using a script parent.py I would like to set a variable parvar, execute child.py, and have parvar printed. I am having a hard time wrapping my head around the easiest way to accomplish this. It seems like I could use os.fork(), since data present in the parent is available to the forked child, but I cannot get it to work. Reading through examples of multiprocessing, I cannot find any that show sharing data across two different scripts like this.
This is what I have so far:
parent.py
#!/usr/bin/env python3
import subprocess, os
parvar = 'parent var'
pid = os.fork()
if pid == 0:
print('child pid is running')
subprocess.call(['python3', 'child.py'])
exit()
child.py
#!/usr/bin/env python3
childvar = 'child var'
print('this is child var: ', childvar)
print(parvar)
Which returns a NameError:
$ ./parent.py
child pid is running
$ this is child var: child var
Traceback (most recent call last):
File "child.py", line 4, in <module>
print(parvar)
NameError: name 'parvar' is not defined
I think I understand why that is not working: subprocess.call starts a brand-new Python process rather than continuing in the forked child, and because that third process was not forked from the parent, whatever I made available to my child PID is inaccessible to it.
Can someone help me with a simple example of getting the above to work?
This is not an answer (not yet, at least), but I figured I would post it here since it seems to move things in the right direction.
Using the mmap example from this page:
https://blog.schmichael.com/2011/05/15/sharing-python-data-between-processes-using-mmap/
I have (I believe correctly) re-factored it for Python 3 (3.6.8):
a.py
#!/usr/bin/env python3
import ctypes
import mmap
import os
import struct
def main():
# Create new empty file to back memory map on disk
fd = os.open('/tmp/mmaptest', os.O_CREAT | os.O_TRUNC | os.O_RDWR)
# Zero out the file to insure it's the right size
assert os.write(fd, b'\x00' * mmap.PAGESIZE) == mmap.PAGESIZE
# Create the mmap instace with the following params:
# fd: File descriptor which backs the mapping or -1 for anonymous mapping
# length: Must in multiples of PAGESIZE (usually 4 KB)
# flags: MAP_SHARED means other processes can share this mmap
# prot: PROT_WRITE means this process can write to this mmap
buf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_WRITE)
# Now create an int in the memory mapping
i = ctypes.c_int.from_buffer(buf)
# Set a value
i.value = 10
# And manipulate it for kicks
i.value += 1
assert i.value == 11
# Before we create a new value, we need to find the offset of the next free
# memory address within the mmap
offset = struct.calcsize(i._type_)
# The offset should be uninitialized ('\x00')
assert buf[offset] == 0
# Now ceate a string containing 'foo' by first creating a c_char array
s_type = ctypes.c_char * len('foo')
# Now create the ctypes instance
s = s_type.from_buffer(buf, offset)
# And finally set it
s.raw = b'foo'
print('First 10 bytes of memory mapping: %r' % buf[:10])
input('Now run b.py and press ENTER')
print
print('Changing i')
i.value *= i.value
print('Changing s')
s.raw = b'bar'
new_i = input('Enter a new value for i: ')
i.value = int(new_i)
if __name__ == '__main__':
main()
b.py
#!/usr/bin/env python3
import mmap
import os
import struct
import time
def main():
# Open the file for reading
fd = os.open('/tmp/mmaptest', os.O_RDONLY)
# Memory map the file
buf = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ)
i = None
s = None
while 1:
new_i, = struct.unpack('i', buf[:4])
new_s, = struct.unpack('3s', buf[4:7])
if i != new_i or s != new_s:
print('i: %s => %d' % (i, new_i))
print('s: %s => %s' % (s, new_s))
print('Press Ctrl-C to exit')
i = new_i
s = new_s
time.sleep(1)
if __name__ == '__main__':
main()
Execution Example
terminal 1
$ ./a.py
First 10 bytes of memory mapping: b'\x0b\x00\x00\x00foo\x00\x00\x00'
Now run b.py and press ENTER
Changing i
Changing s
Enter a new value for i: 87
terminal 2
$ ./b.py
i: None => 11
s: None => b'foo'
Press Ctrl-C to exit
i: 11 => 121
s: b'foo' => b'bar'
Press Ctrl-C to exit
i: 121 => 87
s: b'bar' => b'bar'
Press Ctrl-C to exit
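Because the mapping is just a page of raw bytes, the writer and the reader have to agree on the layout by convention: a.py places a 4-byte C int at offset 0 and a 3-byte string immediately after it, and b.py unpacks exactly those offsets with struct.unpack('i', buf[:4]) and struct.unpack('3s', buf[4:7]).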

Factorial of a given number using sequential program

I am new to Python coding, so can someone please find what the problem with this code is?
def factorial(n):
    sum = 1
    for i in range(1..n+1):
        sum = sum*i
        print(sum)
    return sum

v = int(input("enter the number:"))
factorial(v)
the error I get:
enter the number:4
Traceback (most recent call last):
  File "C:/Users/Ramakrishnar/AppData/Local/Programs/Python/Python36/fact.py", line 9, in <module>
    factorial(v)
  File "C:/Users/Ramakrishnar/AppData/Local/Programs/Python/Python36/fact.py", line 3, in factorial
    for i in range(1..n+1):
AttributeError: 'float' object has no attribute 'n'
The error comes from range(1..n+1): Python reads 1. as the float 1.0 and then tries to look up an attribute called n on it, hence the AttributeError; you meant range(1, n+1). Beyond that, there are two ways you can write your program. To reformat your code so that it is in good form, you might organize your program like so:
def main():
    variable = int(input('Enter the number: '))
    print(factorial(variable))

def factorial(number):
    total = 1
    for integer in range(1, number + 1):
        total *= integer
    return total

if __name__ == '__main__':
    main()
If instead you are trying to accomplish the same thing using the least amount of code, the following two lines will do the exact same thing for you:
import math
print(math.factorial(int(input('Enter the number: '))))
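Entering 4 at the prompt prints 24 with either version.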

Resources