I've written a script (foo) that makes a simple sed replacement on text in an input file. I have a directory (a) containing a large number of subdirectories (a/b1, a/b2, etc.), which all have the same subdirectory (c) and contain a file with the same name (d). So the rough structure is:
a/
-b1/
--c/
---d
-b2/
--c/
---d
-b3/
--c/
---d
I want to run my script on every file (d) in the tree. Unfortunately the following doesn't work:
sudo sh foo a/*/c/d
How do I use wildcards in a bash command like this? Do I have to use find with specific -maxdepth and -mindepth options, or is there a more elegant solution?
The wildcard expansion in your example should work, and no find should be needed. I assume a, b, and c are just generic names to simplify the question. Do any of your folders/files contain spaces?
If you do:
ls -l a/*/c/d
are you getting the files you need listed? If so, then the issue is how you handle the $* (or "$@") arguments in your script file. Mind sharing it with us?
As you can see, wildcard expansion works:
$ ls -l a/*/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b1/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b2/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b3/c/d
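If the expansion does list the files correctly, a common culprit is a script that only uses $1 (or mishandles $*), so every path after the first is silently ignored. Purely as a hedged sketch (the real foo wasn't shown, and the sed expression below is a placeholder), a version that runs the replacement on every argument could look like:
#!/bin/sh
# foo: run a sed replacement on every file given on the command line.
# 's/old/new/g' is a placeholder for the real expression.
# Note: -i edits in place with GNU sed; BSD/macOS sed needs -i ''.
for f in "$@"; do
    sed -i 's/old/new/g' "$f"
done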
Related
In Linux how do I move files without replacing if a particular file already exists in the destination?
I tried the following command:
mv --backup=t <source> <dest>
The file doesn't get replaced, but the issue is that the extension gets changed, because it puts "~" at the back of the filename.
Is there any other way to preserve the extension but only the filename gets changed when moving?
E.g.
test~1.txt instead of test.txt~1
When the extension gets changed like that, you can no longer view the file just by double-clicking on it.
If you want to do it in the shell, without requiring atomicity (so if two shell processes are running the same code at the same time, you could be in trouble), you can simply use the built-in test(1) feature of your shell:
[ -f destfile.txt ] || mv srcfile.txt destfile.txt
If you require atomicity (something that works when two processes are running it simultaneously), things are quite difficult, and you'll need to use system calls from C. Look into renameat2(2).
Perhaps you should also consider using a version control system like git?
mv has an option:
-S, --suffix=SUFFIX
override the usual backup suffix
which you might use; however, as far as I know, mv has no functionality to change part of the filename while keeping the extension. If you just want to be able to open the backup file with a text editor, you might consider something like:
mv --suffix=.backup.txt <source> <dest>
Here is how this would work: suppose you have
-rw-r--r-- 1 chris users 2 Jan 25 11:43 test2.txt
-rw-r--r-- 1 chris users 0 Jan 25 11:42 test.txt
then after the command mv --suffix=.backup.txt test.txt test2.txt you get:
-rw-r--r-- 1 chris users 0 Jan 25 11:42 test2.txt
-rw-r--r-- 1 chris users 2 Jan 25 11:43 test2.txt.backup.txt
@aandroidtest: if you are able to rely upon a Bash shell script, and the source directory (where the files reside presently) and the target directory (where you want them to move to) are on the same file system, I suggest you try out a script that I wrote. You can find it at https://github.com/jmmitchell/movestough
In short, the script allows you to move files from a source directory to a target directory while taking into account new files, duplicate (same file name, same contents) files, and file collisions (same file name, different contents), as well as replicating needed subdirectory structures. In addition, the script handles file collision renaming in three forms. As an example, if /some/path/somefile.name.ext were found to be a conflicting file, it would be moved to the target directory with a name like one of the following, depending on the deconflicting style chosen (via the -u= or --unique-style= flag):
default style : /some/path/somefile.name.ext-< unique string here >
style 1 : /some/path/somefile.name.< unique string here >.ext
style 2 : /some/path/somefile.< unique string here >.name.ext
Let me know if you have any questions.
I guess the mv command is quite limited when it comes to moving files with the same filename.
Below is a bash script that can be used for the move; if a file with the same filename already exists at the destination, it appends a number to the filename, and the extension is preserved for easier viewing.
I modified the script that can be found here:
https://superuser.com/a/313924
#!/bin/bash
source=$1
dest=$2

file=$(basename "$source")
basename=${file%.*}
ext=${file##*.}

if [[ ! -e "$dest/$basename.$ext" ]]; then
    # No clash: move the file as-is
    mv "$source" "$dest"
else
    # Find the first free numbered name, e.g. test1.txt, test2.txt, ...
    num=1
    while [[ -e "$dest/$basename$num.$ext" ]]; do
        (( num++ ))
    done
    mv "$source" "$dest/$basename$num.$ext"
fi
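Assuming the script above is saved as move_keep.sh (a name chosen just for illustration) and made executable, a call would look like:
chmod +x move_keep.sh
./move_keep.sh /path/to/test.txt /dest/dir
If /dest/dir/test.txt already exists, the source ends up as /dest/dir/test1.txt, then test2.txt, and so on.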
I have a directory that looks a little like this:
drw-r--r-- 1 root root 0 Jan 24 17:26 -=1=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=2=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=3=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=4=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=5=-directoryname
I am trying to write a script to change these folders from
-=1=- Folder#1
to strip off the "-=1=-" section, but alas I am having no luck.
Can anyone help me find a solution to this?
So far my script below has failed me.
#!/bin/bash
for i in {1..250}
do
rename "-=$i=-" ""*
i=i+1
done
I have used the 1..250 because there are 250 folders.
Given the number, you can manufacture the names and use the mv command:
#!/bin/bash
for i in {1..250}
do
mv "-=$i=- Folder#$i" "Folder#$i"
done
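If you would rather preview the renames before running them for real, a quick dry run that only prints the mv commands (assuming the directories really do follow the "-=N=- Folder#N" pattern) is:
#!/bin/bash
# Dry run: print each mv command instead of executing it
for i in {1..250}
do
    echo mv "-=$i=- Folder#$i" "Folder#$i"
done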
With the Perl-based rename command (sometimes called prename), you could use:
rename 's/-=\d+=- //' -=*=-*Folder#*
or, given the revised question (the information after the pattern isn't fixed):
rename 's/-=\d+=- //' -=*=-*
This worked! Can you please explain how it worked? What's the \d+ for?
The \d is Perl regex notation for a digit 0-9. The + modifier means 'one or more'. So the regex part of s/-=\d+=- // looks for a minus, an equals, one or more digits, an equals, a minus, and a space. The replacement part converts all of the matched material into an empty string. It's all surrounded by single quotes so the shell leaves it alone (the backslash is the only shell metacharacter in that substitute command, but both it and the space would need protecting if you omitted the quotes).
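If you want to check what the substitution will do before committing to it, the Perl-based rename supports a -n (no-act) flag that only prints the planned renames:
rename -n 's/-=\d+=- //' -=*=-*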
I'm not sure how you'd use the C-based rename command for this job; it is much less powerful than the Perl-based version.
What does the "->" notation mean in Linux?
E.g. when I do ls -l in a particular folder, I get the following:
lrwxrwxrwx 1 root root 29 Feb 27 12:23 ojdbc.jar -> /apps/hadoop/sqoop/ojdbc6.jar
Is the first file a placeholder of the second one?
Kind Regards.
It means the file is not a physical file, but a symbolic link pointing to the file to the right of the arrow.
The command "ls -l" uses "->" to denote a symbolic-link (that is, a psuedo-file which only points to another file).
In your example ojdbc.jar is a symbolic-link to /apps/hadoop/sqoop/ojdbc6.jar.
I'm not aware that this meaning holds beyond ls, however.
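You can reproduce the notation yourself: create a symbolic link with ln -s and inspect it; readlink prints the target directly, and ls -l shows the same "->" arrow as in your listing:
$ ln -s /apps/hadoop/sqoop/ojdbc6.jar ojdbc.jar
$ readlink ojdbc.jar
/apps/hadoop/sqoop/ojdbc6.jar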
I need to do a find on roughly 1500 file names and was wondering if there is a way to run multiple find commands simultaneously.
Right now I do something like
for fil in $(cat my_file)
do
find . -name $fil >> outputfile
done
Is there a way to spawn multiple instances of find to speed up the process? Right now it takes about 7 hours to run this loop, one file at a time.
Given the 7-hour runtime you mention, I presume the file system has some millions of files in it, so that the OS disk buffers loaded during one query are recycled before the next query begins. You can test this hypothesis by timing the same find a few times, as in the following example.
tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx 1 omg omg 9732338 Aug 1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x 1 omg omg 5144339 Apr 22 2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x 1 omg omg 2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG
real 0m15.823s
user 0m0.908s
sys 0m1.608s
tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx 1 omg omg 9732338 Aug 1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x 1 omg omg 5144339 Apr 22 2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x 1 omg omg 2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG
real 0m0.715s
user 0m0.340s
sys 0m0.368s
In the example, the second find ran much faster because the OS still had buffers in RAM from the first find. [On my small Linux 3.2.0-32 system, according to top at the moment 2.5GB of RAM is buffers, 0.3GB is free, and 3.8GB in use (ie about 1.3GB for programs and OS).]
Anyhow, to speed up processing, you need to find a way to make better use of OS disk buffering. For example, double or quadruple your system memory. As an alternative, try the locate command. The query
time locate IMG_0772.JPG
consistently takes under a second on my system. You may wish to run updatedb just before starting the job that looks up the 1500 file names; see man updatedb. If the directory . in your find commands covers only a small part of the overall file system, so that the locate database includes numerous irrelevant files, use the various prune options when you run updatedb to minimize the size of the database that locate has to scan; afterwards, run a plain updatedb to restore the other filenames to the locate database. Using locate, you can probably cut the run time to 20 minutes.
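If pruning the system-wide database is not an option, mlocate can also build a private database restricted to the tree you care about and query only that. The paths and database name below are just examples; the flags are mlocate's:
# Build a private locate database for the tree under the current directory
updatedb -l 0 -U . -o ~/my.locate.db
# Look up each name from my_file against that database only
while read -r name; do
    locate -d ~/my.locate.db "$name"
done <my_file >outputfile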
This solution calls find and fgrep only once:
find . | fgrep -f my_file > outputfile
I assume that my_file has a list of files you are looking for, with each name on a separate line.
Explanation
The find command finds all the files (including directories) in the current directory. Its output is a list of files/directories, one per line.
The fgrep command searches the output of the find command, but instead of specifying the search term on the command line, it gets the search terms from my_file; that's what the -f flag is for.
The output of the fgrep command, which is the list of files you are looking for, is redirected into outputfile.
Maybe something like:
find . \( -name file1 -o -name file2 -o ... \) >outputfile
You could build lines of this kind, depending on the number of names in my_file:
find . \( $(xargs <my_file printf "-name %s -o " | sed 's/-o $//') \) >outputfile
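For instance, if my_file contained just the two names file1 and file2, the command substitution would expand to roughly:
find . \( -name file1 -o -name file2 \) >outputfile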
is there a way to spawn multiple instances of find to speed up the process.
This is not how you want to solve the problem, since find is I/O- and FS-limited.
Either use multiple -name arguments grouped together with -o in order to use one find command to look for multiple filenames at once, or find all files once and use a tool such as grep to search the resultant list of files for the filenames of interest.
I was sent a zip file containing 40 files with the same name.
I wanted to extract each of these files to a separate folder OR extract each file with a different name (file1, file2, etc).
Is there a way to do this automatically with standard linux tools? A check of man unzip revealed nothing that could help me. zipsplit also does not seem to allow an arbitrary splitting of zip files (I was trying to split the zip into 40 archives, each containing one file).
At the moment I am (r)enaming my files individually. This is not so much of a problem with a 40 file archive, but is obviously unscalable.
Anyone have a nice, simple way of doing this? More curious than anything else.
Thanks.
Assuming that no such tool currently exists, it should be quite easy to write one in Python. Python has a zipfile module that should be sufficient.
Something like this (maybe, untested):
#!/usr/bin/env python
import os
import sys
import zipfile

count = 0
z = zipfile.ZipFile(sys.argv[1], "r")
for info in z.infolist():
    # Extract each member into its own numbered directory (0, 1, 2, ...)
    directory = str(count)
    os.makedirs(directory)
    z.extract(info, directory)
    count += 1
z.close()
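Saved as, say, unzip_each.py (a name chosen only for illustration), it would be run against the archive like this, leaving each member in its own numbered directory (0, 1, 2, ...):
python unzip_each.py file.zip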
I know this is a couple years old, but the answers above did not solve my particular problem here so I thought I should go ahead and post a solution that worked for me.
Without scripting, you can just use command-line input to interact with the unzip tool's text interface. That is, when you type this at the command line:
unzip file.zip
and it contains files of the same name, it will prompt you with:
replace sameName.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename:
If you wanted to do this by hand, you would type "r", and then at the next prompt:
new name:
you would just type the new file name.
To automate this, simply create a text file with the responses to these prompts and use it as the input to unzip, as follows.
r
sameName_1.txt
r
sameName_2.txt
...
That is generated pretty easily using your favorite scripting language. Save it as unzip_input.txt and then use it as input to unzip like this:
unzip file.zip < unzip_input.txt
For me, this was less of a headache than trying to get the Perl or Python extraction modules working the way I needed. Hope this helps someone...
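For example, a small bash sketch that generates such a response file (assuming the colliding name is sameName.txt; note that the first copy usually extracts without a prompt, so you typically need one fewer response than the number of copies):
# Generate rename responses for the 2nd through 40th copies
for i in $(seq 1 39); do
    printf 'r\nsameName_%d.txt\n' "$i"
done > unzip_input.txt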
Here is a Linux shell script version.
In this case, 834733991_T_ONTIME.csv is the name of the file that is the same inside every zip file, and the .csv after "$count" simply has to be swapped for the file type you want.
#!/bin/bash
count=0
for a in *.zip
do
    # Extract quietly, then rename the known duplicate to a numbered file
    unzip -q "$a"
    mv 834733991_T_ONTIME.csv "$count".csv
    count=$(($count+1))
done
This thread is old but there is still room for improvement. Personally I prefer the following one-liner in bash
unzipd ()
{
    unzip -d "${1%.*}" "$1"
}
Nice, clean, and simple way to strip the extension and use the archive's own name as the extraction directory.
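For example, with the function defined in your shell (archive name purely illustrative):
unzipd report.zip    # extracts into ./report/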
Using unzip -B file.zip did the trick for me. It creates a backup file suffixed with ~<number> in case the file already exists.
For example:
$ rm *.xml
$ unzip -B bogus.zip
Archive: bogus.zip
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
inflating: foo.xml
$ ls -l
-rw-rw-r-- 1 user user 1161 Dec 20 20:03 bogus.zip
-rw-rw-r-- 1 user user 1501 Dec 16 14:34 foo.xml
-rw-rw-r-- 1 user user 1520 Dec 16 14:45 foo.xml~
-rw-rw-r-- 1 user user 1501 Dec 16 14:47 foo.xml~1
-rw-rw-r-- 1 user user 1520 Dec 16 14:53 foo.xml~2
-rw-rw-r-- 1 user user 1520 Dec 16 14:54 foo.xml~3
Note: the -B option does not show up in unzip --help, but is mentioned in the man pages: https://manpages.org/unzip#options