In Python, list files of a certain type in a directory on Linux

My directory contains several files whose names end in .log.
Ordinarily I use the ls .*log command to list them.
However, I want to do this from Python. I have tried two approaches.
First:
import subprocess
ls_al = subprocess.check_output(['ls','.*log'])
but it returns ls: .*log: No such file or directory
Second:
import subprocess
ls_al = subprocess.check_Popen(['ls','.*log'],stdout=subprocess.PIPE)
ls = ls_al.stdout.read().strip()
Neither of these worked.
Can anyone help with this?

Globbing patterns are expanded by the shell, but you are running the command directly. You'd have to run the command through the shell:
ls_al = subprocess.check_output('ls *.log', shell=True)
where you pass in the full command line to the shell as a string (and use the correct glob syntax).
Demo (using *.py):
>>> subprocess.check_output(['ls', '*.py'])
ls: *.py: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['ls', '*.py']' returned non-zero exit status 1
>>> subprocess.check_output('ls *.py', shell=True)
'calc.py\ndAll.py\nexample.py\ninplace.py\nmyTests.py\ntest.py\n'
Note that the correct way in Python is to use os.listdir() with manual filtering, filter with the fnmatch module, or use the glob module to list and filter together:
>>> import glob
>>> glob.glob('*.py')
['calc.py', 'dAll.py', 'example.py', 'inplace.py', 'myTests.py', 'test.py']
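The three pure-Python options mentioned above can be sketched side by side. This is only a minimal sketch, not the answerer's code: the helper names (logs_by_suffix, logs_by_fnmatch, logs_by_glob) and the default suffix and pattern are assumptions made for illustration.

```python
import fnmatch
import glob
import os


def logs_by_suffix(directory, suffix=".log"):
    """os.listdir() with manual filtering on the file name suffix."""
    return sorted(n for n in os.listdir(directory) if n.endswith(suffix))


def logs_by_fnmatch(directory, pattern="*.log"):
    """os.listdir() filtered with a shell-style pattern via fnmatch."""
    return sorted(fnmatch.filter(os.listdir(directory), pattern))


def logs_by_glob(directory, pattern="*.log"):
    """glob lists and filters in one step; it returns paths, so strip the directory."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)
```

One difference worth knowing: like the shell, glob does not match names that start with a dot for a pattern such as *.log, whereas the os.listdir()-based variants do.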

.*log looks like a regular expression, not a glob pattern. Did you mean *.log? (You also need the shell=True argument to make the shell do the glob expansion.)
BTW, glob.glob('*.log') is the preferable way if you want a list of file paths.

Rather than run an external command, you could use Python's os module to get the files in the directory. Then the re module can be used to create a regular expression to filter for your log files. I think this would be a more pythonic approach. It should also work on multiple platforms without modification. Note that in the code below I'm assuming your log files all end with '.log'; if you need something else you'll need to tinker with the regex.
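import os
import re
import sys
the_dir = sys.argv[1]  # directory to scan, given on the command line
all_files = os.listdir(the_dir)
log_files = []
# Raw string, anchored with $ so that names such as "foo.login" are not matched
log_pattern = re.compile(r'.*\.log$')
for fn in all_files:
    if log_pattern.match(fn):
        log_files.append(fn)
print(log_files)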

Why not use glob?
$ ls
abc.txt bar.log def.txt foo.log ghi.txt zoo.log
$ python
>>> import glob
>>> for logfile in glob.glob('*.log'):
...     print(logfile)
...
bar.log
foo.log
zoo.log
>>>

Related

Is there a way to extract only specific lines from a text file using python

I have a big text file with around 200K lines.
I need to extract only the lines that start with CLM. For example, if the file has 100K lines that start with CLM, I should print only those 100K lines.
Can anyone help me achieve this with a Python script?
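There are multiple ways to achieve this.
You can simply iterate through the lines and search for a pattern using the re library.
Solution 1
# Note: a regex is useful when the pattern is more complex than a fixed prefix;
# for a plain prefix such as CLM, str.startswith() is simpler and usually at least as fast.
import re
pattern = re.compile(r"^CLM")  # ^ anchors the match at the start of the line
with open("sample.txt") as f:
    for line in f:
        if pattern.match(line):
            print(line.rstrip("\n"))  # the line already ends with a newline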
If you want, you can also run a shell command from inside the Python script.
Solution 2
Two standard modules can do this: os and subprocess.
The subprocess module is the recommended replacement for os.system(), so I would recommend using it, as below.
The following code prints the output to the console:
import subprocess
process = subprocess.Popen(['grep', '-i', '^CLM', 'sample.txt'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = process.communicate()
print(stdout)
In the above, we pass universal_newlines=True because stdout would otherwise be returned as bytes; with it, the output is decoded to str.
In the grep command above I have passed the -i argument to ignore case. If you want to match only CLM and not clm, remove it.
I have used grep to illustrate the use case; you can also use awk, sed, or any other command, depending on your requirement.
As an add-on, if you want to save the output to a file, say output.txt, you can do it as below:
import subprocess
with open('output.txt', 'w') as f:
    # call() waits for grep to finish before the file is closed
    subprocess.call(['grep', '-i', '^CLM', 'file.txt'], stdout=f)
If your file is extremely large, you can also do a poll and check for the subprocess execution status. Refer to the below link for more details on that.
Python-Shell-Commands
with open('file.txt') as f:
    for line in f:
        if line.startswith('CLM'):
            print(line.rstrip())
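The startswith approach can also be wrapped in a small generator, so that a 200K-line file is processed one line at a time and the matches can be printed or written elsewhere. This is just a sketch; the function name lines_with_prefix and its parameters are made up for illustration.

```python
def lines_with_prefix(path, prefix="CLM"):
    """Yield the lines of `path` that start with `prefix`, without the trailing newline."""
    with open(path) as f:
        for line in f:  # the file is read lazily, one line at a time
            if line.startswith(prefix):
                yield line.rstrip("\n")
```

It could be used as for line in lines_with_prefix('file.txt'): print(line).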

Python: NameError from calling a file from the commandline arguments

For an assignment I'm supposed to have a line that opens a file passed as an argument on the command line. I keep getting
Traceback (most recent call last):
  File "execute.py", line 1, in <module>
    program=open(programfilename, "r")
NameError: name 'programfilename' is not defined
My code so far is program=open(programfilename, "r"). I'm not quite sure what is wrong. It is the first line in my program. execute.py is the name of my script.
You need to set the programfilename variable to the name/path of the file on a previous line. Alternatively, you could put the filename in quotes instead.
"It is the first line in my program"
Well there's your problem. You are using programfilename without having defined it first.
Try something like
import sys
programfilename = sys.argv[1]  # first argument passed to your program (sys.argv[0] is the script itself)
program=open(programfilename, "r")
I am not sure exactly what you are trying to do.
If you want to open a file named on the command line, the code can look like this
import sys
with open(sys.argv[1], 'r') as f:
    print(f.read())
Run like this
python3 execute.py programfilename
If you want your program to print its own source code to the console, the code can look like this
import sys
with open(sys.argv[0], 'r') as f:
    print(f.read())
This will print the code on the console.
Run like this
python3 execute.py
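For the case in the question (a file name passed on the command line), it can help to check that the argument is actually there before opening it. The following is only a sketch under that assumption; the function name read_program and the usage message are hypothetical.

```python
import sys


def read_program(argv):
    """Return the contents of the file named by the first command-line argument."""
    if len(argv) < 2:
        # No file name was given after the script name
        raise SystemExit("usage: python3 execute.py programfilename")
    programfilename = argv[1]  # argv[0] is the script itself
    with open(programfilename, "r") as program:
        return program.read()
```

Inside execute.py this would be called as print(read_program(sys.argv)).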

subprocess.call() problems using the '>'

I'm having trouble with the subprocess.call function.
I'm trying to redirect the output of a program to a text file by using '>'.
This is what I've tried:
import subprocess
subprocess.call(["python3", "test.py", ">", "file.txt"])
but it still displays the output in the command prompt instead of writing it to the text file.
There are two approaches to solving this.
Have python handle the redirection:
with open('file.txt', 'w') as f:
    subprocess.call(["python3", "test.py"], stdout=f)
Have the shell handle redirection:
subprocess.call("python3 test.py >file.txt", shell=True)
Generally, the first is to be preferred because it avoids the vagaries of the shell.
Lastly, you should look into the possibility that test.py can be run as an imported module rather than calling it via subprocess. Python is designed so that it is easy to write scripts so that the same functionality is available either at the command line (python3 test.py) or as a module (import test).
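The first approach (Python handles the redirection) can be tried out without a test.py by running a one-line child script. This is a sketch only; the helper name run_to_file is an assumption, and sys.executable is used in place of "python3" so that the same interpreter is reused.

```python
import subprocess
import sys


def run_to_file(args, out_path):
    """Run `args` without a shell, sending its stdout to `out_path`; return the exit status."""
    with open(out_path, "w") as f:
        return subprocess.call(args, stdout=f)
```

For example, run_to_file([sys.executable, "-c", "print('hello')"], "file.txt") leaves the child's output in file.txt and nothing is printed to the terminal.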

Execute command on linux terminal using subprocess in python

I want to execute the following command in a Linux terminal from a Python script
hg log -r "((last(tag())):(first(last(tag(),2))))" work
This command gives the changesets between the last two tags that affected files in the "work" directory
I tried:
import subprocess
releaseNotesFile = 'diff.txt'
with open(releaseNotesFile, 'w') as f:
    f.write(subprocess.call(['hg', 'log', '-r', '"((last(tag())):(first(last(tag(),2))))"', 'work']))
error:
abort: unknown revision '((last(tag())):(first(last(tag(),2))))'!
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    f.write(subprocess.call(['hg', 'log', '-r', '"((last(tag())):(first(last(tag(),2))))"', 'work']))
TypeError: expected a character buffer object
Working with os.popen()
with open(releaseNotesFile, 'w') as file:
    f = os.popen('hg log -r "((last(tag())):(first(last(tag(),2))))" work')
    file.write(f.read())
How to execute that command using subprocess ?
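subprocess.call() returns the command's exit status (an integer), not its output, which is why f.write() raises the TypeError. To solve your problem, remove the extra double quotes and let the command write directly to the file by changing the f.write(subprocess... line to:
subprocess.call(['hg', 'log', '-r', '((last(tag())):(first(last(tag(),2))))', 'work'], stdout=f)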
Explanation
When you call a program from a command line (like bash), the shell "ignores" the " characters: it strips them before passing the arguments to the program. The two commands below are equivalent:
hg log -r something
hg "log" "-r" "something"
In your specific case, the original version in the shell has to be enclosed in double quotes because it contains parentheses, which have a special meaning in bash. In Python that is not necessary, because no shell is involved: each list element is passed to hg exactly as written, so the extra double quotes became part of the revision expression.
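The quoting behaviour is easy to check without hg by having a child Python process print the arguments it receives. This is a rough sketch for illustration; the helper echo_args is made up and is not part of any answer above.

```python
import subprocess
import sys


def echo_args(*args):
    """Run a child Python that prints each argument it received, one per line."""
    child = "import sys; print('\\n'.join(sys.argv[1:]))"
    out = subprocess.check_output(
        [sys.executable, "-c", child] + list(args),
        universal_newlines=True,  # decode the output to str
    )
    return out.splitlines()
```

Calling echo_args('"a(b)"', 'work') shows that the first argument reaches the child with its double quotes still attached, which is what happened to the revision expression passed to hg.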

Subprocess library won't execute compgen

I am trying to make a list of every command available on my Linux (Lubuntu) machine. I would like to work further with that list in Python. Normally, to list the commands in the console, I would write "compgen -c" and it would print the results to stdout.
I would like to execute that command using Python's subprocess library, but it gives me an error and I don't know why.
Here is the code:
#!/usr/bin/python
import subprocess
#get list of available linux commands
l_commands = subprocess.Popen(['compgen', '-c'])
print l_commands
Here is the error I'm getting:
Traceback (most recent call last):
  File "commands.py", line 6, in <module>
    l_commands = subprocess.Popen(['compgen', '-c'])
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
I'm stuck. Could you guys help me with this? How do I execute the compgen command using subprocess?
compgen is a bash builtin command, so it has to be run in the shell:
from subprocess import check_output
output = check_output('compgen -c', shell=True, executable='/bin/bash')
commands = output.splitlines()
You could also write it as:
output = check_output(['/bin/bash', '-c', 'compgen -c'])
But it puts the essential part (compgen) last, so I prefer the first variant.
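Either variant can be wrapped in a function that returns the command names as a Python list. The sketch below uses the second variant and assumes bash may not be installed everywhere, so it looks it up with shutil.which and returns an empty list if it is missing; the function name bash_command_names is hypothetical.

```python
import shutil
import subprocess


def bash_command_names():
    """Return the sorted, de-duplicated names printed by bash's `compgen -c`."""
    bash = shutil.which("bash")
    if bash is None:
        return []  # bash is not available on this machine
    output = subprocess.check_output(
        [bash, "-c", "compgen -c"],
        universal_newlines=True,  # decode the output to str
    )
    return sorted(set(output.splitlines()))
```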
I'm not sure what compgen is, but that path needs to be absolute. When I use subprocess, I spell out the exact path /absolute/path/to/compgen
