File editing after glob function through Python - Linux

I want to find every instance of a file under different directories, search each of those files for the value 0, and replace it with 500.
Please find the code below:
#!/usr/bin/python
import glob
import os
a = glob.glob('/home/*/hostresolve')
for i in a:
    print i
=================================
Now that I have found all instances of the hostresolve file under /home, I want to search each of these files for the value 0 and replace it with 500. I know Python has find-and-replace functionality, but I wanted to know how to apply it to the output I got through glob.

As the [Python Docs](https://docs.python.org/2/library/glob.html) state, glob.glob returns a list; in your case, a list of the matching files in the directory. Hence, to replace the required text in all the files, we should iterate over the list. Accordingly, the code would be:
import glob
import os
a = glob.glob('/home/*/host*')
for files in a:
    with open(files, 'r') as writingfile:
        read_data = writingfile.read()
    with open(files, 'w') as writingfile:
        write_data = read_data.replace('0', '500')
        writingfile.write(write_data)
Also, using "with" to operate on file data is good practice, because it handles close() and flush() automatically and avoids excess code; this has been suggested in previous answers ([1](https://stackoverflow.com/a/17141572/6005652)).
Further, to reuse this or make it more efficient, you can refer to [map](https://docs.python.org/3/library/functions.html?highlight=map#map), as the list of files is an iterable object.
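For example, a minimal sketch of the map() idea (the helper name replace_in_file is only illustrative, not from your code):
import glob

def replace_in_file(path):
    # read the whole file, then rewrite it with every '0' replaced by '500'
    with open(path, 'r') as f:
        read_data = f.read()
    with open(path, 'w') as f:
        f.write(read_data.replace('0', '500'))

list(map(replace_in_file, glob.glob('/home/*/hostresolve')))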
From my understanding, this suffices as an answer to your question.

It worked except for one thing.
There are three instances of the file; the code worked for the second and third instances, but the first file remains the same.
[root@localhost home]# cat /home/dir1/hostresolve
O
[root@localhost home]# cat /home/dir2/hostresolve
500
[root@localhost home]# cat /home/dir3/hostresolve
500
Please find the code below:
#!/usr/bin/python
import glob
import os
a = glob.glob('/home/*/hostresolve')
for files in a:
    print files
    with open(files, 'r') as writingfile:
        read_data = writingfile.read()
    with open(files, 'w') as writingfile:
        write_data = read_data.replace('0','500')
        writingfile.write(write_data)
But when I print files, I get all three instances of the file, which means the for loop processes all three of them. I also checked the permissions of these files and found that all three have the same permissions.
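One way to double-check what the first file really contains is to print its raw contents with repr(), since a character that merely looks like a zero (for example the letter O) would not be touched by replace('0', '500'). A quick diagnostic sketch:
with open('/home/dir1/hostresolve', 'r') as f:
    print repr(f.read())  # shows the exact characters, including newlines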

Related

Extract tar.gz{some integer} in python

I am trying to extract a file with this name format: filename.tar.gz10
I have tried multiple ways, but for all of them I get an error that it is an unknown format. It works fine for files ending with tar.gz00. I tried changing the name, but it still does not work.
Here is what I have tried:
import tarfile
file = tarfile.open('filename.tar.gz10')
file.extractall('./extracted_path')
file.close()
Another way is,
shutil.unpack_archive('./filename.tar.gz10', './extracted_path', 'tar.gz17')
Thanks for your help in advance.
This could be because the archive was split into smaller chunks. On Linux you could do this with the split -b command, so one big file is actually multiple smaller ones now, and they are named like
file.tar.gz01
file.tar.gz02
file.tar.gz03
file.tar.gz04
etc...
You won't be able to decompress these files individually, so you have to concatenate them into one file first and then decompress.
To verify whether it was split or not, run file {filename}; if it does not recognize the file as a gzip compressed archive, then it is probably split (this is why you get the unknown format error).
You can try to do the following:
from glob import glob
import os
path = '/path/to/' # location of your files
list_of_files = sorted(glob(path + '*.tar.gz*')) # list all the split chunks, in order
bash_command = 'cat ' + ' '.join(list_of_files) + ' > ' + path + 'filename.tar.gz' # concatenate the chunks back into a single archive
os.system(bash_command)
os.system('tar -xzf ' + path + 'filename.tar.gz') # then extract the reassembled archive
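If you prefer not to shell out, a rough pure-Python sketch of the same idea is below: stitch the chunks together yourself and extract with the tarfile module (the combined file name is just an assumption):
from glob import glob
import tarfile

chunks = sorted(glob('/path/to/filename.tar.gz*'))  # the split chunks, in order
with open('/path/to/combined.tar.gz', 'wb') as out:
    for chunk in chunks:
        with open(chunk, 'rb') as part:
            out.write(part.read())  # concatenate the pieces back into one archive
with tarfile.open('/path/to/combined.tar.gz', 'r:gz') as tar:
    tar.extractall('./extracted_path')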

Is there a way to extract only specific lines from a text file using python

I have a big text file that has around 200K lines/records.
But I need to extract only the specific lines that start with CLM. For example, if the file has 100K lines that start with CLM, I should print exactly those 100K lines.
Can anyone help me achieve this using a Python script?
There can be multiple ways to achieve this.
You can simply iterate through the lines and search for a pattern using the re library.
Solution 1
# Note :- Regex is faster in terms of execution as compared to string match
import re

pattern = re.compile("CLM")
for line in open("sample.txt"):
    if pattern.match(line):  # match() only succeeds at the start of the line
        print(line)
If you want, you can also run a bash command inside the Python script.
Solution 2
There are two popular modules to use: os and subprocess.
The os route (os.system) is more or less superseded; I would recommend using the subprocess module, as below.
Below is the code to print the output to the console:
import subprocess
process = subprocess.Popen(['grep', '-i', '^CLM', 'sample.txt'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = process.communicate()
print(stdout)
In the above, we pass the argument universal_newlines=True because otherwise the output (stdout) would be of type bytes.
In the above grep command I have passed the -i argument to ignore case. If you only want to search for CLM and not clm, remove that argument.
I have used the grep command to depict the use case; you can also use awk or sed or any other command, as per your requirement.
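For example, a rough equivalent with awk instead of grep (the pattern and file name are only illustrative):
import subprocess
process = subprocess.Popen(['awk', '/^CLM/', 'sample.txt'],
                           stdout=subprocess.PIPE, universal_newlines=True)
stdout, _ = process.communicate()  # awk prints every line matching the pattern
print(stdout)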
Just as an add-on, if you want to save the output in some file, let's say output.txt, you can achieve this as below:
import subprocess
with open('output.txt', 'w') as f:
    process = subprocess.Popen(['grep', '-i', '^CLM', 'file.txt'], stdout=f)
If your file is extremely large, you can also do a poll and check for the subprocess execution status. Refer to the below link for more details on that.
Python-Shell-Commands
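For completeness, a small sketch of what such polling could look like (the file names are only examples):
import subprocess
import time
with open('output.txt', 'w') as f:
    process = subprocess.Popen(['grep', '^CLM', 'sample.txt'], stdout=f)
    while process.poll() is None:  # None means the process is still running
        time.sleep(0.5)
print('finished with exit code', process.returncode)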
Try:
with open('file.txt') as f:
    for line in f:
        if line.startswith('CLM'):
            print(line.rstrip())

Python script does not print output as expected

I have a very simple (test) piece of code which I run either from a Linux shell or in interactive mode, and I get two different behaviours whose reason I cannot figure out.
I have a file generated by a Popen call, previously, where each line is a file path. This is the code used to generate the file:
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
(Incidentally, I was originally trying to build a PIPE, namely feeding the output of this command into a grep command, and since I wasn't successful in any way, I decided to break the problem down and just read the file paths from a file and process them one by one. So maybe there is a common issue that is blocking me somewhere in this procedure.)
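For reference, the PIPE version I had in mind was roughly along these lines (the grep pattern is just a placeholder):
import subprocess
find = subprocess.Popen(["find", ".", "-name", "myfile.out"], stdout=subprocess.PIPE)
grep = subprocess.Popen(["grep", "myfile"], stdin=find.stdout,
                        stdout=subprocess.PIPE, universal_newlines=True)
find.stdout.close()  # let find receive SIGPIPE if grep exits first
print(grep.communicate()[0])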
Since in this second step I wasn't even able to open and process the files at the paths contained in each line of find.txt, I just tried to print the lines out, because they are certainly in there:
with open('find.txt','r') as g:
    for l in g.readlines():
        print(l)
Now, the interesting part:
if I paste the lines above into a python shell, everything works fine and I get my outputs as expected
if, on the other hand, I try to run python test.py, where test.py is the name of the file containing the lines above, no output appears in the shell's stdout.
I've tried sys.stdout.flush() to no avail. I've also inserted some dummy print() statements along the way: everything gets printed but what's after the g.readlines() statement.
Here's the full script I'm trying to make work (a pre-precursor of what I'm actually after, tbh).
#!/usr/bin/env python3
import subprocess
import sys

with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
sys.stdout.flush()
output being:
{ancis:>106> python test.py
hello
hello?
{ancis:>106>
EDIT
I've quickly tried the very same lines (but without the call to find, which isn't available) with my Python installation on Windows: it works as expected.
Based on that, I've tried to run the simpler code below:
import sys

print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
sys.stdout.flush()
as a script, on Linux. This also works without problems.
This should mean that somehow I'm messing things up with the call to Popen... But what?
This is a race condition.
Your call to
find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
is opening another process and running your find command which takes a bit of time to fully execute.
Python then continues on and reaches the reading of the file portion before the command is fully executed and the file is generated.
Want to test it out?
Add a time.sleep(1) just before the opening of the file.
Full test script:
#!/usr/bin/env python3
import subprocess
import time

with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
time.sleep(1)
with open('find.txt','r') as g:
    for l in g:
        print(l)
To block until the process is complete you can use find.communicate().
With this you can also optionally set a timeout if that's something that you want.
#!/usr/bin/env python3
import subprocess

with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
find.communicate()
with open('find.txt','r') as g:
    for l in g:
        print(l)
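If you do want a timeout, a minimal sketch (the 30 seconds is arbitrary) would replace the plain find.communicate() above with:
try:
    find.communicate(timeout=30)
except subprocess.TimeoutExpired:
    find.kill()
    find.communicate()  # reap the process after killing it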
Source:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate

File Paths From Ch 10 Python Crash Course File Path Example

I'm trying to figure out how to use file paths. This is the example that I am given, but it makes zero sense to me; I tried to copy the exact path and that didn't work. I'm using PyCharm.
What I tested
file_path = 'D:\PycharmProjects\Standard_Library\pi_digits.txt'
with open(file_path) as file_object:
Book Example Below
file_path = '/home/ehmatthes/other_files/text_files/filename.txt'
with open(file_path) as file_object:
The author uses a Unix system and you are using a Windows system, and the only difference between the two examples is the file separator.
In Python, you can write the separators hard-coded (for Unix: /, for Windows: \).
But you can use os.path to remove the confusion around the OS separator. Just place the text file in your current directory and you can use it as in the example below:
import os.path
text_file = 'pi_digits.txt'
file_path = os.path.join(os.getcwd(), text_file)
print(file_path)
Out:
/Users/PycharmProjects/StackOverFlow-pip/pi_digits.txt
Since I'm also using a Unix system, my example is similar to the book example. But if you try it on your PC, you will see something similar to the below:
'D:\PycharmProjects\Standard_Library\pi_digits.txt'
Then you can open the text file and read it using with open(file_path) as file_object:
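Putting it together, a small end-to-end sketch (assuming pi_digits.txt sits in the current working directory):
import os.path

text_file = 'pi_digits.txt'
file_path = os.path.join(os.getcwd(), text_file)  # build an OS-independent path
with open(file_path) as file_object:
    contents = file_object.read()
print(contents)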

In Python, list certain type of file in a directory on Linux

In my directory there are files of a certain type, ending in .log.
Ordinarily, I use the ls .*log command to list all of them.
However, I want to use Python code to handle this. Here are two ways I've tried.
First:
import subprocess
ls_al = subprocess.check_output(['ls','.*log'])
but it returns ls: .*log: No such file or directory
Second:
import subprocess
ls_al = subprocess.Popen(['ls','.*log'],stdout=subprocess.PIPE)
ls = ls_al.stdout.read().strip()
but those two didn't work.
Can anyone help with this?
Globbing patterns are expanded by the shell, but you are running the command directly. You'd have to run the command through the shell:
ls_al = subprocess.check_output('ls *.log', shell=True)
where you pass in the full command line to the shell as a string (and use the correct glob syntax).
Demo (using *.py):
>>> subprocess.check_output(['ls', '*.py'])
ls: *.py: No such file or directory
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/subprocess.py", line 575, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['ls', '*.py']' returned non-zero exit status 1
>>> subprocess.check_output('ls *.py', shell=True)
'calc.py\ndAll.py\nexample.py\ninplace.py\nmyTests.py\ntest.py\n'
Note that the correct way in Python is to use os.listdir() with manual filtering, filter with the fnmatch module, or use the glob module to list and filter together:
>>> import glob
>>> glob.glob('*.py')
['calc.py', 'dAll.py', 'example.py', 'inplace.py', 'myTests.py', 'test.py']
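For completeness, the fnmatch route mentioned above looks roughly like this, filtering os.listdir() against a glob-style pattern (here '*.log' to match the question):
import fnmatch
import os

log_files = [name for name in os.listdir('.') if fnmatch.fnmatch(name, '*.log')]
print(log_files)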
.*log looks like a regular expression, not a globbing pattern. Do you mean *.log? (You need the shell=True argument to make the shell do glob expansion.)
BTW, glob.glob('*.log') is the preferable way if you want a list of file paths.
Rather than running an external command, you could use Python's os module to get the files in the directory. Then the re module can be used to create a regular expression to filter for your log files. I think this would be a more Pythonic approach. It should also work on multiple platforms without modification. Note that in the code below I'm assuming your log files all end with '.log'; if you need something else you'll need to tinker with the regex.
import os
import re
import sys

the_dir = sys.argv[1]
all_files = os.listdir(the_dir)
log_files = []
log_pattern = re.compile(r'.*\.log$')
for fn in all_files:
    if re.match(log_pattern, fn):
        log_files.append(fn)
print log_files
Why not use glob?
$ ls
abc.txt bar.log def.txt foo.log ghi.txt zoo.log
$ python
>>> import glob
>>> for logfile in glob.glob('*.log'):
... print(logfile)
...
bar.log
foo.log
zoo.log
>>>
