I'm having trouble with the call function
I'm trying to redirect the output of a program to a text file by using the '>'
This is what I've tried:
import subprocess
subprocess.call(["python3", "test.py", ">", "file.txt"])
but it is still displaying the output on the command prompt and not in the text file.
There are two approaches to solving this.
Have python handle the redirection:
with open('file.txt', 'w') as f:
    subprocess.call(["python3", "test.py"], stdout=f)
Have the shell handle redirection:
subprocess.call("python3 test.py > file.txt", shell=True)
Generally, the first is to be preferred because it avoids the vagaries of the shell.
Lastly, you should look into the possibility that test.py can be run as an imported module rather than calling it via subprocess. Python is designed so that it is easy to write scripts so that the same functionality is available either at the command line (python3 test.py) or as a module (import test).
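As a rough sketch of the module route: if test.py exposed a function (called main here, which is purely my assumption, your script may be organised differently), its printed output could be sent to a file with contextlib.redirect_stdout, without starting a second interpreter. The main below is only a stand-in so the example runs on its own.

```python
import contextlib


def main():
    # Hypothetical stand-in for whatever test.py prints when run as a script.
    print("hello from test")


def run_to_file(path):
    # Everything main() prints while the context manager is active goes to the file.
    with open(path, "w") as f, contextlib.redirect_stdout(f):
        main()
```

Calling run_to_file('file.txt') would then play the role of the shell's > redirection. Note that redirect_stdout only affects Python-level writes to sys.stdout in the current process, not output from child processes.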
I have a big text file that has around 200K lines of records/lines.
But I need to extract only specific lines which Start with CLM. For example, if the file has 100K lines that start with CLM I should print all that 100K lines alone.
Can anyone help me to achieve this using python script?
There can be multiple ways to achieve this.
you can simply iterate through the lines and search for a pattern using the re library
Solution 1
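# Note: for a fixed prefix like this, line.startswith("CLM") is simpler and usually at least as fast as a regex
import re
pattern = re.compile(r"^CLM")  # anchor the match at the start of the line
with open("sample.txt") as f:
    for line in f:
        if pattern.match(line):
            print(line, end="")  # line already ends with a newline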
If you want you can also run the bash command inside the python script.
Solution 2
There are two popular modules to use:- os and subprocess
The os functions for running commands (os.system, os.popen) are discouraged for new code, so I would recommend using the subprocess module as below:-
Below is the code to print the output on the console: -
import subprocess
process = subprocess.Popen(['grep', '-i', '^hel*', 'sample.txt'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = process.communicate()
print(stdout)
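In the above, we pass universal_newlines=True because the output (stdout) would otherwise be returned as bytes; with it, the output is decoded to str. (On Python 3.7 and later, text=True is an equivalent spelling.)
In the above grep command I have passed the -i argument to ignore case. If you only want to match CLM and not clm, remove it. The pattern '^hel*' is just an example; for your case it would be '^CLM'.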
I have used the grep command to depict the use case, you can also use awk or sed or any command as per your requirement.
Just an add-on: if you want to save the output to a file, let's say output.txt, you can achieve this as below:-
import subprocess
with open('output.txt', 'w') as f:
    process = subprocess.Popen(['grep', '-i', '^hel*', 'file.txt'], stdout=f)
If your file is extremely large, you can also do a poll and check for the subprocess execution status. Refer to the below link for more details on that.
Python-Shell-Commands
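A minimal sketch of what such polling could look like. The child command here is a made-up placeholder (a short Python one-liner) so that the example runs anywhere; in practice it would be the grep call from above.

```python
import subprocess
import sys
import time

# Hypothetical slow child process; replace with the real command list.
cmd = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
polls = 0
while process.poll() is None:  # poll() returns None while the child is still running
    polls += 1
    time.sleep(0.1)            # report progress or do other work here

stdout, _ = process.communicate()  # child has exited; collect what it wrote
returncode = process.returncode
```

One caveat: if the child writes a lot to a pipe and nothing reads it while you poll, the pipe can fill up and the child will block. For large outputs it is safer to send stdout to a file (as in the snippet above) and poll only for completion.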
try:
    with open('file.txt') as f:
        for line in f:
            if line.startswith('CLM'):
                print(line.rstrip())
except OSError as e:
    print(f"Could not read file.txt: {e}")
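If the matching lines should end up in a file rather than on the console, the same loop can write them out directly. This is only a sketch; the function name and the paths passed to it are placeholders of mine, not something from the question.

```python
def copy_matching_lines(src_path, dst_path, prefix="CLM"):
    """Copy the lines of src_path that start with prefix into dst_path.

    Returns the number of lines copied. The file is read line by line,
    so memory use stays small even for large inputs.
    """
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(prefix):
                dst.write(line)
                count += 1
    return count
```

For example, copy_matching_lines('file.txt', 'clm_only.txt') would leave only the CLM records in clm_only.txt.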
I am using a third-party C++ program to generate intermediate results for the python program that I am working on. The terminal command that I use looks like follows, and it works fine.
./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2.txt
If I break it down into smaller pieces:
./ukb/src/ukb_wsd - executable program
--ppr_w2w - one of the options/switches
-K ukb/scripts/wn30g.bin - parameter K indicates that the next item is a file (network file)
-D ukb/scripts/wn30_dict.txt - parameter D indicates that the next item is a file (dictionary file)
../data/glass_ukb_input2.txt - input file
> - shell command to write the output to a file
../data/glass_ukb_output2w2.txt - output file
The above works fine for one instance. I am trying to do this for around 70000 items (input files). So found a way by using the subprocess module in Python. The body of the python function that I created looks like this.
with open('../data/glass_ukb_input2.txt', 'r') as input, open('../data/glass_ukb_output2w2w_subproc.txt', 'w') as output:
    subprocess.run(['./ukb/src/ukb_wsd', '--ppr_w2w', '-K', 'ukb/scripts/wn30g.bin', '-D', 'ukb/scripts/wn30_dict.txt'],
                   stdin=input,
                   stdout=output)
This error is no longer there
When I execute the function, it gives an error as follows:
...
STDOUT = subprocess.STDOUT
AttributeError: module 'subprocess' has no attribute 'STDOUT'
Can anyone shed some light on how to solve this problem?
EDIT
The error was due to a file named subprocess.py in the source directory, which shadowed Python's standard subprocess module. Once it was removed, the error went away.
But the program could not identify the input file given in stdin. I am thinking it has to do with having 3 input files. Is there a way to provide more than one input file?
EDIT 2
This problem is now solved with the current approach:
subprocess.run('./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2w_subproc.txt',shell=True)
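For what it's worth, the same thing can probably be done without shell=True. In the terminal command the input file is a positional argument rather than something read from stdin, so a sketch along these lines may work. It assumes ukb_wsd behaves the same when started directly, and the helper names build_command and run_one are made up for illustration.

```python
import subprocess

# Fixed part of the command, copied from the terminal command above.
UKB_CMD = ['./ukb/src/ukb_wsd', '--ppr_w2w',
           '-K', 'ukb/scripts/wn30g.bin',
           '-D', 'ukb/scripts/wn30_dict.txt']


def build_command(input_path):
    # The input file goes last, as a positional argument (not via stdin).
    return UKB_CMD + [str(input_path)]


def run_one(input_path, output_path):
    # Python opens the output file itself, replacing the shell's '>' redirection.
    with open(output_path, 'w') as output:
        return subprocess.run(build_command(input_path), stdout=output, check=True)
```

With around 70000 input files, run_one could then be called in a loop over the input paths; check=True makes a failing run raise subprocess.CalledProcessError instead of passing silently.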
I have a very simple (test) code which I'm running either from a Linux shell, or in interactive mode, and I have two different behaviours I cannot figure out the reason of.
I have a file generated by a Popen call, previously, where each line is a file path. This is the code used to generate the file:
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
(Incidentally, I was trying to build a PIPE originally, namely inputting the output of this command to a grep command, and since I wasn't successful in any way, I decided to break the problem down and just read the file paths from a file, and process them one by one. So maybe there is a common issue that is blocking me somewhere in this procedure).
Since in this second step I wasn't even able to open and process the files by opening the addresses contained in each line of the find.txt file, I just tried to print the file lines out, because for sure they're available in there:
with open('find.txt','r') as g:
    for l in g.readlines():
        print(l)
Now, the interesting part:
if I paste the lines above into a python shell, everything works fine and I get my outputs as expected
if, on the other hand, I try to run python test.py, where test.py is the name of the file containing the lines above, no output appears in the shell's stdout.
I've tried sys.stdout.flush() to no avail. I've also inserted some dummy print() statements along the way: everything gets printed but what's after the g.readlines() statement.
Here's the full script I'm trying to make work (a pre-precursor of what I'm actually after, tbh).
#!/usr/bin/env python3
import subprocess
import sys
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
sys.stdout.flush()
output being:
{ancis:>106> python test.py
hello
hello?
{ancis:>106>
EDIT
I've quickly tried the very same lines (but without the call to find, which isn't available) on my python installation in Windows: it works as expected.
Based on that, I've tried to run the simpler code below:
print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
sys.stdout.flush()
as a script, in Linux - This also works w/o problems.
This should mean that somehow I'm messing things up with the call to Popen... But what?
This is a race condition.
Your call to
find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
is opening another process and running your find command which takes a bit of time to fully execute.
Python then continues on and reaches the reading of the file portion before the command is fully executed and the file is generated.
Want to test it out?
Add a time.sleep(1) just before the opening of the file.
Full test script:
#!/usr/bin/env python3
import subprocess
import time
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
time.sleep(1)
with open('find.txt','r') as g:
    for l in g:
        print(l)
To block until the process is complete you can use find.communicate().
With this you can also optionally set a timeout if that's something that you want.
#!/usr/bin/env python3
import subprocess
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
    find.communicate()
with open('find.txt','r') as g:
    for l in g:
        print(l)
Source:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
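Another option, if you don't need the Popen object for anything else, is subprocess.run, which only returns once the child has exited. A small sketch of that idea; the child command is a made-up stand-in for find so the example runs anywhere, and write_then_read is just an illustrative name.

```python
import subprocess
import sys


def write_then_read(path):
    # Hypothetical child standing in for `find`; it just prints two paths.
    cmd = [sys.executable, "-c", "print('a/myfile.out'); print('b/myfile.out')"]
    with open(path, "w") as f:
        # run() blocks until the child has finished writing and exited.
        subprocess.run(cmd, stdout=f, check=True)
    with open(path) as g:
        return [line.rstrip("\n") for line in g]
```

Because run() waits, the file is complete by the time it is opened for reading, so no sleep is needed.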
Currently, I have a command that looks something like the following:
my_command = Popen([activate_this_python_virtualenv_file,
                    "-m", "my_command", "-l",
                    directory_where_ini_file_for_my_command_is + "/" + my_ini_file_name],
                   stderr=subprocess.STDOUT, stdout=subprocess.PIPE, shell=False,
                   universal_newlines=False, cwd=directory_where_my_module_is)
I have figured out how to access and process the output, deal with subprocess.PIPE, and make subprocess do a few other neat tricks.
However, it seems odd to me that the standard Python documentation for subprocess doesn't mention a way to just get the actual command line as subprocess.Popen puts it together from arguments to the Popen constructor.
For example, perhaps my_command.get_args() or something like that?
Is it just that getting the command line run in Popen should be easy enough?
I can just put the arguments together on my own, without accessing the command subprocess runs with Popen, but if there's a better way, I'd like to know it.
It was added in Python 3.3. According to docs:
The following attributes are also available:
Popen.args The args argument as it was passed to Popen – a sequence of
program arguments or else a single string.
New in version 3.3.
So sample code would be:
my_args_list = [] # yourlist
p = subprocess.Popen(my_args_list)
assert p.args == my_args_list
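If what you want is a single printable command line rather than the list, one possibility (my suggestion, not something the quoted docs mention) is to pass p.args through shlex.join, available from Python 3.8. The child command below is a placeholder so the sketch is runnable.

```python
import shlex
import subprocess
import sys

# Placeholder command: a Python one-liner, so no external program is needed.
args = [sys.executable, "-c", "print('hi there')"]

p = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
out, _ = p.communicate()

# p.args is exactly what was passed in; shlex.join quotes it for display.
command_line = shlex.join(p.args)
print(command_line)
```

Keep in mind that shlex.join produces POSIX shell quoting for display or logging; it is a readable rendering of the argument list, not necessarily the literal string the operating system received.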
I would like to easily test my python programs without constantly using the python shell since each time the program is modified you have to quit, re-enter the python shell and import the program again. I am using a 2012 Macbook pro with OSX. I have the following code:
import sys
def read_strings(filename):
    with open(filename) as file:
        return file.read().split('>')[1:0]
file1 = sys.argv[1]
filename = read_strings(file1)
Essentially I would like to read into and split a txt file containing:
id1>id2>id3>id4
I am entering this into my command line:
pal-nat184-102-127:python_stuff ceb$ python3 program.py string.txt
However when I try the sys.argv approach on the command line my program returns nothing. Is this a good approach to testing code, could anyone point me in the correct direction?
This is what I would like to happen:
pal-nat184-102-127:python_stuff ceb$ python3 program.py string.txt
['id1', 'id2', 'id3', 'id4']
Let's take this a piece at a time:
However when I try the sys.argv approach on the command line my
program returns nothing
The final result of your program is that it stores the list returned by read_strings in the variable filename. It's a little strange to talk about a program "returning" a value. Generally, you want a program to print something out or save something to a file. I'm guessing it would ease your debugging if you modified your program by adding,
print(filename)
at the end: you'd be able to see the result of your program.
could anyone point me in the correct direction?
One other debugging note: It can be useful to write your .py files so that they can be run both independently at the command line or in a python shell. How you've currently structured your code, this will work semi-poorly. (Starting a shell and then importing your file will cause an error because sys.argv[1] isn't defined.)
A solution to this is to change the bottom section of your code as follows:
if __name__ == '__main__':
    file1 = sys.argv[1]
    filename = read_strings(file1)
The if guard at the top says, "If running as a standalone script, then run what's below me. If you imported me from some place else, then do not execute what's below me."
Feel free to follow up below if I misinterpreted your question.
You never do anything with the result of read_strings. Try:
print(read_strings(file1))
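One more thing that may be worth checking once the result is printed: the slice [1:0] in read_strings. A slice that starts at index 1 and stops at index 0 is always empty, so even with the print you would likely see []. A small sketch using the sample text from the question (split_ids is a name I made up, and the strip() call is my addition to drop a trailing newline):

```python
def split_ids(text):
    # strip() removes the trailing newline so the last id comes out clean.
    return text.strip().split('>')


sample = "id1>id2>id3>id4\n"

print(split_ids(sample))                # ['id1', 'id2', 'id3', 'id4']
print(sample.strip().split('>')[1:0])   # [] because the slice [1:0] selects nothing
```

If the intent was to skip the first field, [1:] would do that; for the output shown in the question, no slice is needed at all.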