Finding and deleting files using a Python script [duplicate] - linux

I am writing a Python script to find and remove all .py files that have corresponding .pyc files.
How do I extract this file list and remove those files?
For example, consider these files in /foo/bar:
file.py
file.pyc
file3.py
file2.py
file2.pyc
...etc.
I want to delete file.py and file2.py but not file3.py, as it does not have a corresponding .pyc file.
I want to do this in all folders under '/'.
Is there a bash one-liner for this?
P.S.: I am using CentOS 6.8 with Python 2.7.

Here's my solution:
import os

# collect all .py files under the tree
ab = []
for roots, dirs, files in os.walk("/home/foo/bar/"):
    for file in files:
        if file.endswith(".py"):
            ab.append(os.path.join(roots, file))

# the names the corresponding .pyc files would have
bc = []
for i in range(len(ab)):
    bc.append(ab[i] + "c")

# collect all .pyc files that actually exist
xy = []
for roots, dirs, files in os.walk("/home/foo/bar/"):
    for file in files:
        if file.endswith(".pyc"):
            xy.append(os.path.join(roots, file))

# keep only the .py files whose .pyc counterpart exists, then delete them
ex = [x[:-1] for x in bc if x in xy]
for i in ex:
    os.remove(i)
P.S.: I am a newbie at Python scripting.
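For reference, here is a more compact sketch of the same idea: it walks the tree once, keeps the .pyc paths in a set for a fast membership test, and assumes the same /home/foo/bar/ root as above.
import os

py_files = []
pyc_files = set()
for root, dirs, files in os.walk("/home/foo/bar/"):
    for name in files:
        path = os.path.join(root, name)
        if name.endswith(".pyc"):
            pyc_files.add(path)
        elif name.endswith(".py"):
            py_files.append(path)

# remove each .py file whose compiled counterpart exists
for path in py_files:
    if path + "c" in pyc_files:
        os.remove(path)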

Bash solution:
#!/bin/bash
find /foo/bar -name "*.py" -exec ls {} \; > file1.txt
find /foo/bar/ -name "*.pyc" -exec ls {} \; > file2.txt
p=`wc -l file1.txt | cut -d' ' -f1`
for ((c=1; c<=p; c++))
do
    if grep "`sed -n ${c}p file1.txt | sed s/$/c/g`" file2.txt > /dev/null
    then
        list=`sed -n ${c}p file1.txt`
        echo "exists: $list"
        rm -f "$list"
    fi
done

This is a solution that works very close to the operating system.
You could put the following commands in a shell script and invoke it from Python using subprocess.call (How to call a shell script from python code?, Calling an external command in Python).
find . -name "*.pyc" > /tmp/pyc.txt
find . -name "*.py" > /tmp/py.txt
from the entries of these files remove path and file ending using sed or basename:
for f in $(cat /tmp/pyc.txt) ; do
sed 's/.*\///' remove path
sed 's/\.[^.]*$//' remove file ending
done
for f in $(cat /tmp/py.txt) ; do
sed 's/.*\///' remove path
sed 's/\.[^.]*$//' remove file ending
done
(https://unix.stackexchange.com/questions/44735/how-to-get-only-filename-using-sed)
awk 'FNR==NR{a[$1];next}($1 in a){print}' /tmp/pyc.txt /tmp/py.txt > /tmp/rm.txt (https://unix.stackexchange.com/questions/125155/compare-two-files-for-matching-lines-and-store-positive-results)
for f in $(cat /tmp/rm.txt) ; do
rm $f
done (Unix: How to delete files listed in a file)
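If you go this route, here is a minimal sketch of the Python side, assuming the commands above are saved as an executable script with the hypothetical name cleanup_py.sh:
import subprocess

# run the (hypothetical) cleanup script and check its exit status
status = subprocess.call(["/bin/bash", "/path/to/cleanup_py.sh"])
if status != 0:
    print("cleanup script failed with exit code", status)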

The following code will work for a single-layer directory. (Note: I wasn't sure how you wanted to handle multiple layers of folders. For example, if you have A.py in one folder and A.pyc in another, does that count as having both present, or do they have to be in the same folder? In the latter case, it should be fairly simple to loop through the folders and call this code within each loop.)
import os

# Produces a sorted list of all files in a directory
dirList = os.listdir(folder_path)  # use os.listdir('.') for the current directory
dirList.sort()

# Takes advantage of the fact that a .py file and its .pyc counterpart share
# the same name, so the .pyc appears immediately after the .py in dirList
lastPyName = ""
for file in dirList:
    if file[-3:] == ".py":
        lastPyName = file[:-3]
    elif file[-4:] == ".pyc":
        if lastPyName == file[:-4]:
            os.remove(os.path.join(folder_path, lastPyName + ".py"))
            os.remove(os.path.join(folder_path, lastPyName + ".pyc"))  # in case you want to delete this too

Related

How to output the difference of files from two folders and save the output with the same name in a different folder

I have two folders which have the same file names but different contents, so I am trying to write a script that gets the differences and shows what has changed. I wrote the script below:
folder1="/opt/dir1"
folder2=`ls /opt/dir2`
find "$folder1/" /opt/dir2/ -printf '%P\n' | sort | uniq -d
for item in `ls $folder1`
do
    if [[ $item == $folder2 ]]; then
        diff -r $item $folder2 >> output.txt
    fi
done
I believe this script should work, but it is not producing any output in the output file.
The desired output should be in one file. Ex:
cat output.txt
diff -r /opt/folder1/file1 /opt/folder2/file1
1387c1387
< ALL X'25' BY SPACE
---
> ALL X'0A' BY SPACE
diff -r /opt/folder1/file2 /opt/folder2/file2
2591c2591
< ALL X'25' BY SPACE
---
> ALL X'0A' BY SPACE
Any help is appreciated!
OK, so, twofold:
First get the files in one folder. Never use ls; forget it exists. ls is for nice printing in your console. In scripts, use find.
Then run some command for each file: a simple while read loop.
So:
{
    # make find print paths relative to the /opt/dir1 directory
    cd /opt/dir1 &&
    # use %P so paths print without a leading ./
    find . -mindepth 1 -type f -printf "%P\n"
} |
while IFS= read -r file; do
    diff /opt/dir1/"$file" /opt/dir2/"$file" >> output/"$file"
done
Notes:
always quote your variables
Why you shouldn't parse the output of ls(1)
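For comparison, here is a sketch of the same idea in Python, writing all differences into a single output.txt as the asker wanted; the /opt/dir1 and /opt/dir2 paths are taken from the question.
import difflib
import os

dir1, dir2 = "/opt/dir1", "/opt/dir2"

with open("output.txt", "w") as out:
    for root, _dirs, files in os.walk(dir1):
        for name in files:
            path1 = os.path.join(root, name)
            path2 = os.path.join(dir2, os.path.relpath(path1, dir1))
            if not os.path.exists(path2):
                continue  # only compare files present in both trees
            with open(path1) as f1, open(path2) as f2:
                diff = list(difflib.unified_diff(
                    f1.readlines(), f2.readlines(), path1, path2))
            if diff:
                out.writelines(diff)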

How to rename a file based on parent and child folder names in a bash script

I would like to rename files based on the names of their parent and grandparent directories.
For example, a test.xml file is located at:
/usr/local/data/A/20180101
/usr/local/data/A/20180102
/usr/local/data/B/20180101
How do I save each test.xml file in /usr/local/data/output as:
A_20180101_test.xml
A_20180102_test.xml
B_20180101_test.xml
I tried the shell script below, but it does not help.
#!/usr/bin/env bash
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
    l1="${file%%/*}"
    l2="${file#*/}"
    l2="${l2%%/*}"
    filename="${file##*/}"
    target_file_name="${l1}_${l2}_${filename}"
    echo cp "$file" "${target_dir_path}/${target_file_name}"
done
Is there anything I am doing wrong in this shell script?
You can use the following command to do this operation:
source_folder="usr/local/data/";target_folder="target"; find $source_folder -type f -name test.xml | awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' | xargs -n2 cp;
or on several lines for readability:
source_folder="usr/local/data/";
target_folder="target";
find $source_folder -type f -name test.xml |\
awk -v targetF=$target_folder 'BEGIN{FS="/"; OFS="_"}{printf $0" "; print targetF"/"$(NF-2),$(NF-1),$NF}' |\
xargs -n2 cp;
where
target_folder is your target folder
source_folder is your source folder
the find command searches for all files named test.xml under the source folder
the awk command receives the target folder as a variable so it can be used; in the BEGIN block you define the field separator and the output field separator, then you print the initial filename followed by the new one
xargs passes the output, grouped in pairs, to the cp command, and the trick is done
TODO: set your source_folder and target_folder variables to match your environment, optionally put it all in a script, and you are good to go!
I've modified your code a little to get it to work; see the comments in the code:
target_dir_path="/usr/local/data/output"
for file in /usr/local/data/*/*/test.xml; do
    tmp=${file%/*/*/*}                   # strip the last three path components
    curr="${file#"$tmp/"}"               # extract the wanted part of the filename
    mod=${curr//[\/]/_}                  # replace forward slashes with underscores
    mv "$file" "$target_dir_path/$mod"   # move the file
done
If you have a Perl-based rename command:
$ for f in tst/*/*/test.xml; do
      rename -n 's|.*/([^/]+)/([^/]+)/(test.xml)|./$1_$2_$3|' "$f"
  done
rename(tst/A/20180101/test.xml, ./A_20180101_test.xml)
rename(tst/A/20180102/test.xml, ./A_20180102_test.xml)
rename(tst/B/20180101/test.xml, ./B_20180101_test.xml)
The -n option is for a dry run; remove it after testing.
Change tst to /usr/local/data and ./ to /usr/local/data/output/ for your use case.
.*/ ignores the file path
([^/]+)/([^/]+)/(test.xml) captures the required portions
$1_$2_$3 rearranges them as required
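A sketch of the same renaming in Python, assuming the directory layout from the question; shutil.copy is used so the originals are left in place.
from pathlib import Path
import shutil

source = Path("/usr/local/data")
target = Path("/usr/local/data/output")
target.mkdir(parents=True, exist_ok=True)

for xml in source.glob("*/*/test.xml"):
    # xml.parent is the date folder (20180101), xml.parent.parent the letter folder (A)
    new_name = f"{xml.parent.parent.name}_{xml.parent.name}_{xml.name}"
    shutil.copy(xml, target / new_name)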

UNIX/LINUX: How to add the directory name to each line inside a file?

NOTE: I am just using the shell (the command tool of Linux Red Hat EPIC), nothing else...
You see, I have many log files (.txt.gz) and I was able to open all of them just by using:
foreach i (./*/*dumpfiles.txt.gz_*)
> foreach? zcat $i
> foreach? grep "-e" $i
> foreach? END
Meaning I am going through all those folders and finding the dumpfiles.txt.gz_ files.
The output is like:
0x4899252 move x -999
0x4899231 move y -0
0x4899222 find scribe
0x4899231 move x -999
etc..
The problem is that I need the directory name added to each line of the file...
I can get the directory with the pwd command.
The question is: how do I add the directory name to each line of the file?
Example:
(directory) (per line of all files)
machine01 0x4899252 move x -999
machine01 0x4899231 move y -0
machine09 0x4899222 find scribe
machine09 0x4899231 move x -999
etc..
I tried using sed but I can't find the solution... :(
Thanks...
Here's a little Perl script that does what you ask for (the input is the filename):
$file = shift;                 # filename from the command line
$path = `pwd`;                 # current directory
chomp($path);
open(TRY, "< $file");
while ($line = <TRY>) { print($path . " " . $line); }   # prefix each line
close(TRY);
Of course this prints to the screen, but you can redirect it to a file and rename that to $file at the end of the script.
If you want to run it on the entire directory tree, you can run:
find . -exec scriptname {} \;
If you want it to run on the current directory only, add a -maxdepth 1 flag to the find after the '.'.
Update:
This also works (with no script, just a shell one-liner):
perl -pi -e 's/^/$ENV{PWD} /g'
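A Python take on the same task, sketched under the assumption that the logs match the ./*/*dumpfiles.txt.gz_* pattern from the question; each line is prefixed with the name of the directory the file lives in.
import glob
import gzip
import os

for path in glob.glob("./*/*dumpfiles.txt.gz_*"):
    directory = os.path.basename(os.path.dirname(path))
    with gzip.open(path, "rt") as f:
        for line in f:
            print(directory, line, end="")  # e.g. "machine01 0x4899252 move x -999"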

Remove all files of a certain type except for one type in linux terminal

On my computer running Ubuntu, I have a folder full of hundreds of files, all named "index.html.n" where n starts at one and counts upwards. Some of these files are actual HTML files, some are image files (png and jpg), and some are zip files.
My goal is to permanently remove every file except the zip archives. I assume it's some combination of rm and file, but I'm not sure of the exact syntax.
If it fits into your argument list and no filenames contain a colon, a simple pipe with xargs should do:
file * | grep -vi zip | cut -d: -f1 | tr '\n' '\0' | xargs -0 rm
First find to find the matching files, then file to get their types. sed drops the zip archives and strips everything but the filename from the output of file. Lastly, rm for deleting:
find -name 'index.html.[0-9]*' | \
    xargs file | \
    sed -n '/: Zip archive/!s/\([^:]*\):.*/\1/p' | \
    xargs rm
I would run:
for f in index.html.*
do
    file "$f" | grep -qi zip
    [ $? -ne 0 ] && rm -i "$f"
done
and remove the -i option once you feel confident enough
Here's the approach I'd use; it's not entirely automated, but it's less error-prone than some other approaches.
file * > cleanup.sh
or
file index.html.* > cleanup.sh
This generates a list of all files (excluding dot files), or of all index.html.* files, in your current directory and writes the list to cleanup.sh.
Using your favorite text editor (mine happens to be vim), edit cleanup.sh:
Add #!/bin/sh as the first line
Delete all lines containing the string "Zip archive"
On each line, delete everything from the : to the end of the line (in vim, :%s/:.*$//)
Replace the beginning of each line with "rm" followed by a space
Exit your editor, updating the file.
chmod +x cleanup.sh
You should now have a shell script that will delete everything except zip files.
Carefully inspect the script before running it. Look out for typos, and for files whose names contain shell metacharacters. You might need to add quotation marks to the file names.
(Note that if you do this as a one-line shell command, you don't have the opportunity to inspect the list of files you're going to delete before you actually delete them.)
Once you're satisfied that your script is correct, run
./cleanup.sh
from your shell prompt.
for i in index.html.*
do
    type=$(file "$i")
    if [[ ! $type =~ "Zip" ]]
    then
        rm "$i"
    fi
done
Change the rm to ls for testing purposes.
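A Python alternative that avoids parsing the output of file altogether, sketched with zipfile.is_zipfile doing the type check:
import glob
import os
import zipfile

# delete every index.html.* file that is not a real zip archive
for path in glob.glob("index.html.*"):
    if not zipfile.is_zipfile(path):
        os.remove(path)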

Script for renaming files with logic

Someone very kindly helped get me started on a mass rename script for renaming PDF files.
As you can see, I need to add a bit of logic to stop the collision below from happening, for example by adding a unique number to a duplicate file name:
rename 's/^(.{5}).*(\..*)$/$1$2/' *
rename -n 's/^(.{5}).*(\..*)$/$1$2/' *
Annexes 123114345234525.pdf renamed as Annex.pdf
Annexes 123114432452352.pdf renamed as Annex.pdf
Hope this makes sense?
Thanks
for i in *
do
    x=''                      # counter
    j="${i:0:2}"              # new name prefix
    e="${i##*.}"              # extension
    while [ -e "$j$x.$e" ]    # find a name that is still free
    do
        ((x++))               # increment the counter
    done
    mv "$i" "$j$x.$e"         # rename
done
before
$ ls
he.pdf hejjj.pdf hello.pdf wo.pdf workd.pdf world.pdf
after
$ ls
he.pdf he1.pdf he2.pdf wo.pdf wo1.pdf wo2.pdf
This should check whether there will be any duplicates:
rename -n [...] | grep -o ' renamed as .*' | sort | uniq -d
If you get any output of the form renamed as [...], then you have a collision.
Of course, this won't work in a couple of corner cases: if your files contain newlines or the literal string "renamed as", for example.
As noted in my answer to your previous question:
for f in *.pdf; do
    tmp=`echo "$f" | sed -r 's/^(.{5}).*(\..*)$/$1$2/'`
    mv -b ./"$f" ./"$tmp"
done
That will make backups of deleted or overwritten files. A better alternative is this script:
#!/bin/bash
for f in "$@"; do
    tar -rvf /tmp/backup.tar "$f"   # back the file up first
    tmp=`echo "$f" | sed -r 's/^(.{5}).*(\..*)$/$1$2/'`
    i=1
    while [ -e "$tmp" ]; do         # find a name that is still free
        tmp=`echo "$tmp" | sed "s/\./-$i./"`
        ((i++))
    done
    mv -b ./"$f" ./"$tmp"
done
Run the script like this:
find . -exec thescript '{}' \;
The find command gives you lots of options for specifying which files to run on, works recursively, and passes the filenames to the script. The script backs up every file with tar (uncompressed) and then renames it.
This isn't the best script, since it isn't smart enough to avoid the manual loop when checking for identical file names.
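For what it's worth, here is a sketch of the same collision-avoiding rename in Python with pathlib; the five-character truncation mirrors the sed expression above.
from pathlib import Path

# truncate each name to its first five characters, appending a counter
# when the shortened name is already taken
for p in sorted(Path(".").glob("*.pdf")):
    stem, ext = p.stem[:5], p.suffix
    target = p.with_name(stem + ext)
    n = 0
    while target.exists() and target != p:
        n += 1
        target = p.with_name(f"{stem}{n}{ext}")
    if target != p:
        p.rename(target)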
