Running a command over several files while keeping the same name - Linux

How can I run a shell command on several files in Linux/Mac while keeping the same name (excluding the extension)?
e.g. let's assume that I want to compile a list of files, using a command, into other files with the same name:
{command} [name].less [same-name].css

EDIT: Supposing, more generally, that the two targets are located in two different paths, say path/to/folder2 and path/to/folder3, and keeping in mind that you can always change the list used in the for loop, you can try:
for i in $(ls path/to/folder3 | grep '\.less'); do
    . /path/to/folder1/script.sh "path/to/folder3/$i" "path/to/folder2/$(echo "$i" | sed -e 's/\.less/.css/')"
done
Apologies for the brute-force and perhaps inelegant solution.
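A more robust variant (a sketch, using the same hypothetical paths) avoids parsing the output of ls, and handles file names containing spaces, by using a glob and basename:
for f in path/to/folder3/*.less; do
    name=$(basename "$f" .less)    # strip the directory and the .less extension
    . /path/to/folder1/script.sh "$f" "path/to/folder2/$name.css"
done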

You can do something like this:
ls sameName.*
or simply
ls same* > list_of_filenames_starting_with_SAME.txt
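For the original task, a minimal sketch that keeps the base name via parameter expansion ({command} is the question's placeholder; substitute your real command):
for f in *.less; do
    {command} "$f" "${f%.less}.css"    # ${f%.less} drops the .less suffix
done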

IMHO, the most concise, performant and intuitive solution is to use GNU Parallel. Your command becomes:
parallel command {} {.}.css ::: *.less
So, for example, let's say your "command" is ls -l, and you have these files in your directory:
Freddy Frog.css
Freddy Frog.less
a.css
a.less
then your command would be
parallel ls -l {} {.}.css ::: *.less
-rw-r--r-- 1 mark staff 0 7 Aug 08:09 Freddy Frog.css
-rw-r--r-- 1 mark staff 0 7 Aug 08:09 Freddy Frog.less
-rw-r--r-- 1 mark staff 0 7 Aug 08:09 a.css
-rw-r--r-- 1 mark staff 0 7 Aug 08:09 a.less
The benefits are, firstly, a nice, concise one-liner syntax. Secondly, it runs the commands in parallel, using as many cores as your CPU(s) have, so it is faster. When running in parallel, you may want the -k option to keep the outputs of the different commands in order.
If you need it to run across many folders in a hierarchy, you can pipe the filenames in like this:
find <someplace> -name \*.less | parallel <command> {} {.}.css
To understand these last two points (piping in and order), look at this example:
seq 1 10 | parallel echo
6
7
8
5
4
9
3
2
1
10
And now with -k to keep the order:
seq 1 10 | parallel -k echo
1
2
3
4
5
6
7
8
9
10
If, for some reason, you want to run the jobs sequentially one after the other, just add the switch -j 1 to the parallel command to set the number of parallel jobs to 1.
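For example, applied to the earlier ls -l command (a sketch):
parallel -j 1 ls -l {} {.}.css ::: *.less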
Try this out on your Linux machine, as GNU Parallel is generally installed there. On a Mac under OS X, the easiest way to install GNU Parallel is with Homebrew; if you are not familiar with it, please ask before trying.

Related

Top Command Output is Empty when run from cron

I was trying to redirect the top command output to a dated file every 5 minutes with the below command.
top -b -n 1 > /var/tmp/TOP_USAGE.csv.$(date +"%I-%M-%p_%d-%m-%Y")
-rw-r--r-- 1 root root 0 Dec 9 17:20 TOP_USAGE.csv.05-20-PM_09-12-2015
-rw-r--r-- 1 root root 0 Dec 9 17:25 TOP_USAGE.csv.05-25-PM_09-12-2015
-rw-r--r-- 1 root root 0 Dec 9 17:30 TOP_USAGE.csv.05-30-PM_09-12-2015
-rw-r--r-- 1 root root 0 Dec 9 17:35 TOP_USAGE.csv.05-35-PM_09-12-2015
Hence I made a very small (1-line) shell script for this, so that I can run it every 5 minutes via a cronjob.
The problem is that when I run this script manually, I can see the output in the file; however, when the script runs automatically, the file is generated every 5 minutes but contains no data (i.e. the file is empty).
Can anyone please help me with this?
I have now modified the script and it still behaves the same.
#!/bin/sh
PATH=$(/usr/bin/getconf PATH)
/usr/bin/top -b -n 1 > /var/tmp/TOP_USAGE.csv.$(date +"%I-%M-%p_%d-%m-%Y")
I met the same problem as you.
The top command must be run with the -b (batch) option, and its output saved to a variable before use. The script is below:
date >> /tmp/mysql-mem-moniter.log
MEM=$(/usr/bin/top -b -n 1 -u mysql)    # capture top's batch output
echo "$MEM" | grep mysql >> /tmp/mysql-mem-moniter.log
Most likely the environment passed to your script from cron is too minimal. In particular, PATH may not be what you think it is (no profiles are read by scripts started from cron).
Place PATH=$(/usr/bin/getconf PATH) at the start of your script, then run it with
/usr/bin/env -i /path/to/script
Once that works without error, it's ready for cron.
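For reference, a minimal crontab entry that runs such a script every 5 minutes (the script path is hypothetical):
*/5 * * * * /path/to/script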

Using RSync to copy a sequential range of files

Sorry if this makes no sense, but I will try to give all the information needed!
I would like to use rsync to copy a range of sequentially numbered files from one folder to another.
I am archiving a DCDM (it's a film thing) and it contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB ea.).
I need to break this up to archive it properly onto LTO6 tapes, and I would like to use rsync to prep the folders so that my simple bash .sh file can automate the various folders and files that I want to back up to tape.
The command I normally use when running rsync is:
sudo rsync -rvhW --progress --size-only <src> <dest>
I use sudo if needed, and I always test the outcome first with --dry-run
The only way I've got anything to work (without it kicking out errors) is by using the * wildcard. However, this only matches files with the set pattern (e.g. 01* will only move files in the range 010000-019999), and I would have to repeat it for 02, 03, 04, etc.
I've looked on the internet, and am struggling to find an answer that works.
This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!
Any thoughts as to how (if at all) this could be done?
Owen.
You can check for file names starting with a digit by using pattern matching:
for file in [0-9]*; do
    # do something with $file, whose name starts with a digit
done
Or, you could enable the extglob option and loop over all file names that contain only digits. This could eliminate any potential unwanted files that start with a digit but contain non-digits after the first character.
shopt -s extglob
for file in +([0-9]); do
    # do something with $file, whose name contains only digits
done
+([0-9]) expands to one or more occurrences of a digit.
Update:
Based on the file name pattern in your recent comment:
shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
    # do something to $file
done
Globbing is the feature of the shell that expands a wildcard to a list of matching file names. You have already used it in your question.
For the following explanations, I will assume we are in a directory with the following files:
$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep 8 17:26 report_2014-2.pdf
The simplest case is to match all files. The following makes for a poor man's ls.
$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf
If we want to match all reports from 2013, we can narrow the match:
$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf
We could, for example, have left out the .pdf part but I like to be as specific as possible.
You have already come up with a solution that uses this to select a range of numbered files. For example, we can match reports by quarter:
$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter: report_2013-1.pdf report_2014-1.pdf
2. quarter: report_2013-2.pdf report_2014-2.pdf
3. quarter: report_2013-3.pdf
4. quarter: report_2013-4.pdf
If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with the argument 4 and substitutes its output (1 2 3 4 in this case).
Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.
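For instance, a sketch that splits six-digit, sequentially numbered files into chunks of 100,000 by their leading digit (source and destination paths are hypothetical; note that very large globs can exceed the shell's argument-list limit, in which case feeding rsync a list via --files-from is safer):
for d in 0 1 2 3 4 5; do
    mkdir -p "/path/to/dest/chunk_$d"
    # copy only the files whose six-digit number starts with $d
    rsync -rvhW --size-only /path/to/src/"$d"*.tif "/path/to/dest/chunk_$d/"
done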
Old question, I know, but someone may find this useful: the above examples for expanding a range also work with rsync. For example, to copy files starting with a, b and c, but not d and e, from the directory /tmp/from_here to /tmp/to_here:
$ rsync -avv /tmp/from_here/[a-c]* /tmp/to_here
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
alice/
bob/
cedric/
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 89 bytes received 24 bytes 226.00 bytes/sec
total size is 0 speedup is 0.00
If you are writing to LTO6 tapes, you should consider adding --inplace to your command. --inplace is meant for writing to linear filesystems such as LTO.

What's the "p" permission found on /var/run/screen/ ...?

I was wondering if I could do a tail on screen session files,
so I went into /var/run/screen/S-Username.
This is what I found in that directory (using ll -a):
XXXX#ubuntu:/var/run/screen/S-XXXX $ ll -a
total 0
drwx------ 2 XXXX XXXX 60 XXXX 5 09:42 ./
drwxrwxr-x 3 root utmp 60 XXXX 5 09:42 ../
prwx------ 1 XXXX XXXX 0 XXXX 5 09:42 3031.pts-1.ubuntu
I’ve tried googling for “Linux file permissions”,
and no one seems to mention the p flag. Can anyone
tell me what the p permission flag is?
P.S.: Also, it seems that I can't cat or tail that file either.
It's not a permission. The p means that it's a named pipe, not a regular file.
p stands for FIFO, a named pipe. So it's not a permission, but a file type (just like d for directory).
You can't use cat or tail to get its content, because a FIFO isn't a regular file, it's used for inter-process communication.
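To see this for yourself, a quick sketch with a FIFO you create (the name mypipe is hypothetical):
mkfifo mypipe
ls -l mypipe          # the first character of the mode is 'p'
echo hello > mypipe & # the writer blocks until a reader opens the pipe
cat mypipe            # prints "hello", then both ends close
rm mypipe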

Backup files on webserver ! and ~

My LAMP web server's directory contains backup files like these:
!index.php
!~index.php
bak.index.php
Copy%20of%20index.php
I tried deleting them with rm, but it cannot find the files.
Does this have something to do with bash or vim? How can this be fixed?
Escape the characters (with a backslash) like so:
[ 09:55 jon@hozbox.com ~/t ]$ ll
total 0
-rw-r--r-- 1 jon people 0 Nov 27 09:55 !abc.html
-rw-r--r-- 1 jon people 0 Nov 27 09:55 ~qwerty.php
[ 09:55 jon@hozbox.com ~/t ]$ rm -v \!abc.html \~qwerty.php
removed '!abc.html'
removed '~qwerty.php'
[ 09:56 jon@hozbox.com ~/t ]$ ll
total 0
[ 09:56 jon@hozbox.com ~/t ]$
Another way to do it, other than the one suggested by chown, is to write the filenames within double quotes.
Example:
rm "!abc.html" "~qwerty.php"
If you don't like the special treatment of the character !, use set +H in your shell to turn off history expansion. See the section 'HISTORY EXPANSION' in man bash for more information.
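For example, in an interactive shell (a quick sketch; history expansion is already off in scripts):
set +H          # disable history expansion
rm !abc.html    # the ! no longer triggers an event lookup
set -H          # re-enable it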
Interestingly, I can delete files starting with ~ without having to escape the file names. This works because tilde expansion only applies when the characters after the ~ (up to the first slash) name a valid user; since no user called qwerty.php exists, the word is left unchanged.

How to measure CPU usage

I would like to log CPU usage at a frequency of 1 second.
One possible way to do it is via vmstat 1 command.
The problem is that the time between each output is not always exactly one second, especially on a busy server. I would like to be able to output the timestamp along with the CPU usage every second. What would be a simple way to accomplish this, without installing special tools?
There are many ways to do that. Besides top, another way is to use the sar utility. So something like
sar -u 1 10
will give you the CPU utilization 10 times, once every second. At the end it will print the averages for each of sys, user, iowait and idle.
Another utility is mpstat, which gives you similar information to sar.
Use the well-known UNIX tool top, which is normally available on Linux systems:
top -b -d 1 > /tmp/top.log
The first line of each output block from top contains a timestamp.
I see no command line option to limit the number of rows that top displays.
Sections 5a. SYSTEM Configuration File and 5b. PERSONAL Configuration File of the top man page describe pressing W while running top in interactive mode to create a $HOME/.toprc configuration file.
I did this, then edited my .toprc file and changed all maxtasks values to maxtasks=4. Then top displays only 4 rows of output.
For completeness, the alternative way to do this using pipes is:
top -b -d 1 | awk '/load average/ {n=10} {if (n-- > 0) {print}}' > /tmp/top.log
You might want to try htop and atop. htop is beautifully interactive while atop gathers information and can report CPU usage even for terminated processes.
I found a neat way to get a timestamp displayed along with the output of vmstat.
Sample command:
vmstat -n 1 3 | while read line; do echo "$(date --iso-8601=seconds) $line"; done
Output:
2013-09-13T14:01:31-0700 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
2013-09-13T14:01:31-0700 r b swpd free buff cache si so bi bo in cs us sy id wa
2013-09-13T14:01:31-0700 1 1 4197640 29952 124584 12477708 12 5 449 147 2 0 7 4 82 7
2013-09-13T14:01:32-0700 3 0 4197780 28232 124504 12480324 392 180 15984 180 1792 1301 31 15 38 16
2013-09-13T14:01:33-0700 0 1 4197656 30464 124504 12477492 344 0 2008 0 1892 1929 32 14 43 10
To monitor disk usage, CPU and load, I created a small bash script that writes the values to a log file every 10 seconds.
This log file is processed by Logstash, Kibana and Riemann.
#!/usr/bin/env bash
LOGPATH="/var/log/systemstatus.log"

# Define a timestamp function
timestamp() {
    date +"%Y-%m-%dT%T.%N"
}

while sleep 10; do
    # server load
    echo -n "$(timestamp) linux::systemstatus::load " >> $LOGPATH
    cat /proc/loadavg >> $LOGPATH
    # cpu usage
    echo -n "$(timestamp) linux::systemstatus::cpu " >> $LOGPATH
    top -bn 1 | sed -n 3p >> $LOGPATH
    # disk usage
    echo -n "$(timestamp) linux::systemstatus::storage " >> $LOGPATH
    df --total | grep total | sed "s/total//g" | sed 's/^ *//' >> $LOGPATH
done
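To keep such a script running after you log out, a sketch (the script path is hypothetical):
nohup /path/to/systemstatus.sh >/dev/null 2>&1 &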
