Command line sorcery - linux

I have a directory full of .xls files that I want to convert to .csv. I'm using xls2csv. This command only prints the csv to the screen, so I believe you have to do xls2csv (xls file) > (new file).csv. So for this I need to write a loop.
for f in `ls`; do xls2csv > `rev $f` | cut -d "." | rev | echo ".csv"
That's what I have so far and it doesn't work. I'm just hoping you can understand exactly what I want to do by the above example.

for f in *.xls; do
basename="${f%.xls}"
csvname="$basename.csv"
xls2csv "$f" > "$csvname"
done
[update] fixed the typo, so that $basename is actually used. Thanks.
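For reference, "${f%.xls}" is plain POSIX suffix removal, so no external tools are needed. A quick illustration (not part of the original answer; the file name is made up):
f="quarterly report.xls"
echo "${f%.xls}"        # -> quarterly report
echo "${f%.xls}.csv"    # -> quarterly report.csv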

GNU Parallel has a feature for this: {.} which is the original string but with the .extension removed:
ls | parallel xls2csv {} ">" {.}.csv
Plus you get the added bonus that xls2csv will be run in parallel if you have multiple CPUs. It also deals correctly with file names like:
My Brother's 12" records.xls
To learn more watch the intro video: http://www.youtube.com/watch?v=OpaiGYxkSuQ
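As a hedged variation (not from the original answer), GNU Parallel's ::: input source lets you skip piping ls entirely and pass the glob directly:
parallel xls2csv {} '>' {.}.csv ::: *.xls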

for f in *; do
c=`echo "$f" | sed 's/\.xls$/.csv/'`
xls2csv "$f" > "$c"
done

You should check out the basename command, using the -s switch.
(I think you're using rev to reverse the filename - is that right? I removed it.)
for f in *.xls; do
xls2csv "$f" > "$(basename -s xls "$f")csv"
done
Try that. I don't know if xls2csv is destructive (like sed), so back up your directory.
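For reference, here is roughly what that substitution produces (illustration only; basename -s strips a trailing suffix):
basename -s xls "report.xls"                 # -> report.
echo "$(basename -s xls "report.xls")csv"    # -> report.csv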

Try this
type=".csv";
for f in `ls -1`;
do
file=`echo $f|cut -d '.' -f1`
file=${file}${type}
`xls2csv $f > $file`
done

Related

How to touch all files that are returned by a sorted ls?

If I have the following:
ls|sort -n
How would I touch all those files in the order of the sorted files? Something like:
ls|sort -n|touch
What would be the proper syntax? Note that I need to touch the files in the exact order they're being sorted, as I'm trying to sort these files for a FAT reader with minimal metadata reading.
ls -1tr | while read file; do touch "$file"; sleep 1; done
If you want to preserve distance in modification time from one file to the next then call this instead:
upmodstamps() {
  oldest_elapsed=$(( $(date +%s) - $(stat -c %Y "`ls -1tr | head -1`") ))
  for file in *; do
    oldstamp=$(stat -c %Y "$file")
    newstamp=$(( $oldstamp + $oldest_elapsed ))
    newstamp_fmt=$(date --date=@${newstamp} +'%Y%m%d%H%M.%S')
    touch -t ${newstamp_fmt} "$file"
  done
}
Note: date usage assumes GNU
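For example, GNU date accepts an epoch timestamp after @ (quick illustration only, -u forces UTC for a stable result):
date -u --date=@1577836800 +'%Y%m%d%H%M.%S'   # -> 202001010000.00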
You can use this command
(ls|sort -n >> list.txt )
touch $(cat list.txt)
OR
touch $(ls /path/to/dir | sort -n)
OR if you want to copy the files instead of creating empty ones, use this command
cp $(cat list.txt) ./DirectoryWhereYouWantToCopy
Try it like this:
touch $(ls | sort -n)
Can you give a few file names?
If you have file names with numbers, such as 1file, 10file, 11file .. 20file, then you need to use --general-numeric-sort:
ls | sort --general-numeric-sort --output=../workingDirectory/sortedFiles.txt
cat sortedFiles.txt
1file
10file
11file
12file
20file
and move sortedFiles.txt into your working directory or wherever you want.
touch $(cat ../workingDirectory/sortedFiles.txt)
This will create empty files with the exact same names.

Linux batch copy files into directories based on filename pattern

I have a list of almost 500 pdf files with the following filename structure:
XXXX-YYYY-MM-DD.pdf
where XXXX is a variable length numeric code (1 to 4 digits), always delimited by "-", for example:
51-2016-08-22.pdf
776-2016-08-22.pdf
3881-2016-08-22.pdf
4-2016-08-22.pdf
2860-2016-08-22.pdf
The goal is to copy each file into its own directory, naming the directories after the leading code (i.e. file 776-2016-08-22.pdf goes to directory 776). How can I use awk or sed to extract the variable length field?
Here's my code:
for f in *.pdf
do
FOLDERNAME=`echo $f| awk (awk or sed missing code here)`
mkdir /my/dir/structure/$FOLDERNAME
cp $f /my/dir/structure/$FOLDERNAME/
done
Thanks for your support.
You can use:
for f in *.pdf; do
d="${f%%-*}"
mkdir -p "$d" && cp "$f" "$d"
done
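The "${f%%-*}" here is ordinary parameter expansion: %% deletes the longest suffix matching -*, leaving everything before the first dash. A quick illustration (not part of the original answer):
f=776-2016-08-22.pdf
echo "${f%%-*}"   # -> 776
echo "${f%-*}"    # -> 776-2016-08  (single % removes only the shortest matching suffix)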
As rightly pointed out by ed-morton, this is NOT a recommended solution, as it fails in many cases. Please follow https://stackoverflow.com/a/39089589/3834860 instead.
Keeping this answer for reference.
awk -F '-' specifies the delimiter, and '{print $1}' prints the first element before the delimiter.
for f in *.pdf
do
FOLDERNAME=`echo $f| awk -F '-' '{print $1}'`
mkdir /my/dir/structure/$FOLDERNAME
cp $f /my/dir/structure/$FOLDERNAME/
done

linux-shell: renaming files to creation time

Good morning everybody,
for a website I'd like to rename files (pictures) in a folder from "1.jpg, 2.jpg, 3.jpg ..." to "yyyymmdd_hhmmss.jpg" - so I'd like to read out the creation times and set these times as the names for the pics. Does anybody have an idea how to do that, for example with a linux shell or with imagemagick?
Thank you!
Naming based on file system date
In the linux shell:
for f in *.jpg
do
mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
done
Explanation:
for f in *.jpg
do
This starts the loop over all jpeg files. A feature of this is that it will work with all file names, even ones with spaces, tabs or other difficult characters in the names.
mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
This renames the file. It uses the -r option which tells date to display the date of the file rather than the current date. The specification +"%Y%m%d_%H%M%S" tells date to format it as you specified.
The file name, $f, is placed in double quotes where ever it is used. This assures that odd file names will not cause errors.
The -n option to mv tells move never to overwrite an existing file.
done
This completes the loop.
For interactive use, you may prefer that the command is all on one line. In that case, use:
for f in *.jpg; do mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"; done
Naming based on EXIF Create Date
To name the file based on the EXIF Create Date (instead of the file system date), we need exiftool or equivalent:
for f in *.jpg
do
mv -n "$f" "$(exiftool -d "%Y%m%d_%H%M%S" -CreateDate "$f" | awk '{print $4".jpg"}')"
done
Explanation:
The above is quite similar to the commands for the file date but with the use of exiftool and awk to extract the EXIF image Create Date.
The exiftool command provides the date in a format like:
$ exiftool -d "%Y%m%d_%H%M%S" -CreateDate sample.jpg
Create Date : 20121027_181338
The actual date that we want is the fourth field in the output.
We pass the exiftool output to awk so that it can extract the field that we want:
awk '{print $4".jpg"}'
This selects the date field and also adds on the .jpg extension.
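Tracing it by hand on the sample line (illustration only): awk splits on whitespace, so the fields are "Create", "Date", ":", and "20121027_181338", hence $4:
echo 'Create Date : 20121027_181338' | awk '{print $4".jpg"}'   # -> 20121027_181338.jpg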
Thanks to @John1024!
I needed to rename files with different extensions at the same time, according to last modification date:
for f in *; do
fn=$(basename "$f")
mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")_$fn"
done
"DSC_0189.JPG" ➜ "2016-02-21_18-22-15_DSC_0189.JPG"
"MOV_0131.avi" ➜ "2016-01-01_20-30-31_MOV_0131.avi"
If you don't want to keep the original filename:
mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")"
Hope it helps noobs like me!
Try this
for file in *.jpg; do name=`stat -c %y "$file" | awk -F"." '{ print $1 }' | sed -e "s/\-//g" -e "s/\://g" -e "s/[ ]/_/g"`.jpg; mv "$file" "$name"; done
Though there might be an easier way.
I created a shell script; I think it's mac only, linux might need other arguments.
#!/bin/bash
BASEDIR=$1;
for file in `ls -1 $BASEDIR`; do
TIMESTAMP=`stat -f "%B" $BASEDIR/$file`;
DATENAME=`date -r $TIMESTAMP +'%Y%m%d-%H%M%S'`-$file
mv -v $BASEDIR/$file $BASEDIR/$DATENAME;
done
When called with a directory path, it renames all files in that directory, prepending each file's creation date, like:
../camera/P1210232.JPG -> ../camera/20220121-103456-P1210232.JPG
Change filename based on file creation time:
exiftool "-filename<FileCreateDate" -d %Y%m%d_%H%M%S%z%%-c.%%le input.jpg

How do I insert a new line before concatenating?

I have about 80000 files which I am trying to concatenate. This one:
cat files_*.raw >> All
is extremely fast whereas the following:
for f in `ls files_*.raw`; do cat $f >> All; done;
is extremely slow. For this reason, I am trying to stick with the first option, except that I need to be able to insert a new line after each file is concatenated to All. Is there any fast way of doing this?
What about
ls files_*.raw | xargs -L1 sed -e '$s/$/\n/' >> All
That will insert an extra newline at the end of each file as you concat them.
And a parallel version if you don't care about the order of concatenation:
find ./ -name "*.raw" -print | xargs -n1 -P4 sed -e '$s/$/\n/' >>All
The second command might be slow because you are opening the 'All' file for append 80000 times vs. 1 time in the first command. Try a simple variant of the second command:
for f in `ls files_*.raw`; do cat $f ; echo '' ; done >> All
I don't know why it would be slow, but I don't think you have much choice:
for f in `ls files_*.raw`; do cat $f >> All; echo '' >> All; done
Each time awk starts reading another file, FNR (the per-file record number) resets to 1, so:
awk 'FNR==1 && NR>1 {print ""} {print}' files_*.raw >> All
Note, it's all done in one awk process. Performance should be close to the cat command from the question.
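To convince yourself on a tiny example (illustration only, the file names are made up):
printf '1\n' > x.raw
printf '2\n' > y.raw
awk 'FNR==1 && NR>1 {print ""} {print}' x.raw y.raw
# prints: 1, an empty line, then 2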

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul
The `problem' is the standard I/O library. When it is writing to a terminal
it is unbuffered, but if it is writing to a pipe then it sets up buffering.
try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
Are there any directories with spaces in their names in $FILE_LOCTN_LIST? Because if there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
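A hedged sketch of that general pattern (the search path here is hypothetical; -print0 pairs with xargs -0 so names with spaces survive intact):
find /some/search/root -type d -print0 | xargs -0 ls -ld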
A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT="" md5sum * |
sort |
while read md5sum filename;
do
[[ $CURRENT == $md5sum ]] && echo $filename is duplicate;
CURRENT=$md5sum;
done
You tagged linux, so I assume you have tools like GNU find, md5sum, uniq, sort etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; |sort -n | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1
