Output a list of PDF files as one PDF using a pdftk bash script - Linux

I have a bash script containing a list of PDF files.
The printer output is tailored by commenting/uncommenting files as required (see the example script below).
I wish to add to this so I can choose to print all selected files to a single PDF using pdftk (or to the paper printer). I am familiar with pdftk though not bash.
Can anyone indicate the bash code to output the selected files as one PDF?
Thank you
#!/bin/bash
# Print document pack
# Instructions
# ls > print.txt Generate the file list into a file
# -o sides=two-sided-long-edge -#1 Prints two sided long edge, one copy
# -o sides=two-sided-long-edge -#2 Prints two sided long edge, two copies
# -o page-ranges=1 -#1 Print first page only, one copy
# -o page-ranges=1-2,5-7 -#1 Print pages 1-2 & 5-7 only, one copy
# -lt x sets number of packs to be printed
# while [ $COUNTER -lt ***** 1 ****** ]; do NUMBER INSIDE STARS INDICATES NUMBER PACKS TO PRINT
COUNTER=0
while [ $COUNTER -lt 1 ]; do
lpr "doc01.pdf"
lpr "doc02.pdf"
lpr -o sides=two-sided-long-edge -#1 "doc03.pdf"
# lpr "doc04.pdf"
# lpr "doc05.pdf"
lpr -o sides=one-sided-long-edge -o page-ranges=1 -#1 "doc06.pdf"
# lpr "doc07.pdf"
lpr -o sides=two-sided-long-edge -#2 "doc08.pdf"
lpr "doc09.pdf"
lpr "doc10.pdf"
let COUNTER=COUNTER+1
done

Something like this:
#!/bin/bash
files=()
add() {
    files+=("$1")    # append the filename as a single array element
}
add "file1.pdf"
#add "file2.pdf"
add "file3.pdf"
add "file with spaces.pdf"
echo "${files[@]}"
Naturally, substitute the proper pdftk command for echo.
Edit 2
Because each filename is stored as a separate array element, names containing spaces are handled correctly.
Edit 3
To hand the files over to the command, expand the array quoted with @, so each stored name becomes its own argument:
pdftk "${files[@]}" cat output combined.pdf
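Putting it together with the original print-pack script, here is a minimal sketch of how the same selection list could feed either lpr or pdftk; the COMBINE toggle, the out helper, and the name combined.pdf are illustrative, not part of the original:
#!/bin/bash
# Document pack: COMBINE=1 collects the selection into one PDF, COMBINE=0 prints it.
COMBINE=1
files=()
out() {                        # out <file> [lpr options...]
    if [ "$COMBINE" -eq 1 ]; then
        files+=("$1")          # lpr-specific options are ignored when combining
    else
        local f=$1
        shift
        lpr "$@" "$f"
    fi
}
out "doc01.pdf"
out "doc03.pdf" -o sides=two-sided-long-edge -#1
#out "doc04.pdf"
out "doc08.pdf" -o sides=two-sided-long-edge -#2
if [ "$COMBINE" -eq 1 ]; then
    pdftk "${files[@]}" cat output combined.pdf
fi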

Related

Is there a way to pass multiple values into a CSV file, based on the output of a linux script

I have written a small script that will take the user's input and then generate the md5sum values for it.
count=0
echo "Enter number of records"
read number
while [ $count -lt $number ]
do
echo "Enter path"
read path
echo "file name"
read file_name
md5sum "$path/$file_name" # shows the md5sum value and path+filename
((count++))
done
How can I pass these values (path, file name, and md5sum) to a CSV file, assuming the user chooses to enter more than one record?
The output should be like
/c/training,sample.txt,34234435345346549862123454651324 #placeholder values
/c/file,text.sh,4534534534534534345345435342342
Interactively prompting for the number of files to process is just obnoxious. Change the script so it accepts the files you want to process as command-line arguments.
#!/bin/sh
md5sum "$@" |
sed 's%^\([0-9a-f]*\) \(\(.*\)/\)\?\([^/]*\)$%\3,\4,\1%'
There are no Bash-only constructs here, so I switched the shebang to /bin/sh; obviously, you are still free to use Bash if you like.
There is a reason md5sum prints the checksum before the path name. The reordered output will be ambiguous if you have file names which contain commas (or newlines, for that matter). Using CSV format is actually probably something you should avoid if you can; Unix tools generally work better with simpler formats like tab-delimited (which of course also breaks if you have file names with tabs in them).
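If tab-delimited output is acceptable, here is a sketch of the same idea that sidesteps the reordering problem by keeping the checksum first (dirname/basename do the path splitting; this still breaks on names containing tabs or newlines, as noted above):
#!/bin/sh
# checksum, directory, filename separated by tabs
md5sum "$@" | while read -r sum path; do
    printf '%s\t%s\t%s\n' "$sum" "$(dirname "$path")" "$(basename "$path")"
done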
Rather than prompting the user for both a path to a directory and the name of a file in that directory, you could prompt for a full path to the file. You can then extract what you need from that path using bash string manipulations.
#!/bin/bash
set -euo pipefail
function calc_md5() {
local path="${1}"
if [[ -f "${path}" ]] ; then
echo "${path%/*}, ${path##*/}, $(md5sum "${path}" | awk '{ print $1 }')"
else
echo "
x - Script requires path to file.
Usage: $0 /path/to/file.txt
"
exit 1
fi
}
calc_md5 "$@"
Usage example:
$ ./script.sh /tmp/test/foo.txt
/tmp/test, foo.txt, b05403212c66bdc8ccc597fedf6cd5fe
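Since calc_md5 only looks at its first argument, one way to accept several paths per invocation would be to wrap the call in a loop over the arguments (a sketch):
for p in "$@"; do
    calc_md5 "${p}"
done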

Problem with splitting files based on numeric suffix

I have a file called files.txt and I need to split it based on lines. The command is as follows -
split -l 1 files.txt file --numeric-suffixes=1 --suffix-length=4
The numeric suffixes here run from file0001 to file9000, but I want them to be file1 to file9000.
I can't simply set --suffix-length=1, because split stops with "output file suffixes exhausted". Any suggestions using the same split command?
I don't think split will do what you want it to do, though I'm on macOS, so the *nix I'm using is Darwin not Linux; however, a simple shell script would do the trick:
#!/bin/bash
N=1
while IFS= read -r line
do
    echo "$line" > "file$N"
    N=$((N + 1))
done < "$1"
Assuming you save it as mysplit (don't forget chmod +x mysplit), you then run it:
./mysplit files.txt
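If you'd rather avoid the shell loop entirely, awk can do the same one-line-per-file split in a single pass (a sketch; closing each file keeps awk from running out of open file descriptors on large inputs):
awk '{ f = "file" NR; print > f; close(f) }' files.txt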

How does Google search PDFs so quickly even though it takes way more time for the PDF to load when I open the link?

It takes time for a PDF to load completely when I click on links in Google, yet Google searches millions of files and returns exact results, even showing the part of the document where the searched words appear (briefly, below each result), all in a few seconds.
However, it takes far more time to open those links individually.
My reasoning: Google has already gone through the internet (as soon as a link is uploaded to the net) and just gives me links from its index rather than doing the search in real time.
But that still sounds unconvincing, since it might make things slightly quicker but not by this much.
Also, as an extension, can the solution to this be used as a hack to open web pages/PDFs quickly, avoiding all the irrelevant parts (like ads and toolbars on some news pages)? If Google can search in such a short time, there should be a way for us to get the relevant pages quickly, right?
Thanks in advance.
Example: a PDF that took over 10s to open, although Google returned the result in 0.58s (according to Google itself).
In case you wanted to do this yourself, here is one solution that I came up with to the problem of quickly searching every law ever enacted by the U.S. Congress. These laws are in more than 32GB of PDF files, which you can download for free from here and there on the Internet.
For more than 150 PDF files that I downloaded, I used the naming convention of V<volume>[C<congress>[S<session>]]Y<year>[<description>].pdf. Here are some examples of how my PDF files were named, whitespace and all.
V1C1Y1789-1791.pdf
V6Y1789-1845 Private Laws and Resolutions.pdf
V7Y1789-1845 Indian Treaties.pdf
V50C75S1Y1937.pdf
V51C75S2Y1937.pdf
V52C75S3Y1938.pdf
V53C76S1Y1939.pdf
V54C76S2-3Y1939-1941.pdf
Then I made the following directories to hold the PDF files and their text representations (to be used for fast searching) that I was going to create. I placed all the downloaded PDFs in the first, Originals, directory.
~/Documents/Books/Laws/Originals/
~/Documents/Books/Laws/PDF/
~/Documents/Books/Laws/Text/
The first problem I encountered was that, for the volumes before number 65, the (selectable) text in the PDF files was poorly constructed: often out of order and jumbled around. (I initially discovered this when using a pdfgrep tool.) This problem made the text almost impossible to search through. However, the images of the PDF files seemed quite reasonable.
Using brew, I installed the ocrmypdf tool ("brew install ocrmypdf") to improve the OCR text layer on those problematic PDF files. It worked very well.
To get around some apparent limitations of xargs (long command lines halted file substitutions) and zargs (file substitution halted after a pipe, a redirect, or the first substitution in a string), I created the following Zsh function, which I used to mass-execute ocrmypdf on 92 PDF files.
# Make a function to execute a string of shell code in the first argument on
# one or more files specified in the remaining arguments. Every specification
# of $F in the string will be replaced with the current file. If the last
# argument is "test," show each command before asking the user to execute it.
#
# XonFs <command_string> <file_or_directory>... [test]
#
XonFs()
{
# If the last argument is a test, identify and remove it.
#
local testing
if [[ ${argv[-1]} == "test" ]]
then
testing=1
unset -v 'argv[-1]'
fi
# Get a list of files, from each argument after the first one, and sort
# the list like the Finder does. The IFS setting makes the output of the
# sort command be separated by newlines, instead of by whitespace.
#
local F IFS=$'\n' answer=""
local files=($(sort -Vi <<<"${argv[2,-1]}"))
# Execute the command for each file. But if we are testing, show the
# command that will be executed before asking the user for permission to
# do so.
#
for F in $files
do
# If this is a test, show the user the command that we will execute,
# using the current filename. Then ask the user if they want to execute
# the command or not, or just quit the script. If this is not a test,
# execute the command.
#
if (( $testing ))
then
# Separate each file execution with a newline. Show the first
# argument to the function, the command that can be executed, with
# the F variable expanded as the current file. Then, ask the user
# whether the command should be executed.
#
[[ -n $answer ]] && print
printf "%s\n" "${(e)1}"
read -ks "answer?EXECUTE? y/n/q [no] "
# Report what the user's answer is interpreted to be, and do what
# they want.
#
if [[ "$answer" == [yY] ]]
then
print "Yes."
eval $1
elif [[ "$answer" == [qQ] ]]
then
print "Quit."
break
else
answer="n"
print "No."
fi
else
eval $1
fi
done
}
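For example, a dry run of XonFs using its trailing test argument shows each expanded command and asks for confirmation before running it (the print command here is just a harmless placeholder):
XonFs 'print "Would process: $F"' V*.pdf test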
Thus, I used the following shell commands to put the best versions of the PDF files in the PDF directory. (It took about a day on my computer to complete the ocrmypdf conversions.) During the conversions, I also had text files created from the converted PDF files and placed in the Originals directory.
cd ~/Documents/Books/Laws/Originals/
cp V{65..128}[YC]*.pdf ../PDF
XonFs 'print "$F"; ocrmypdf -f --sidecar "${F:r}.txt" "$F" "../PDF/$F"; print;' V{1..64}[YC]*.pdf
I then used pdftotext to create the text file versions of the original (unconverted) PDF files, as follows. If I remember correctly, the pdftotext tool is installed automatically with the installation of pdfgrep ("brew install pdfgrep").
XonFs 'print "$F"; pdftotext "$F" "${F:r}.txt";' V{65..128}[YC]*.pdf
Next, I created easily and quickly searchable versions of all the text files (direct conversions of the PDF files) and placed these new versions of the text files in the Text directory with the following command.
XonFs 'print "$F"; cat "$F" | tr -cds "[:space:][:alnum:]\!\$%&,.:;?" "\!\$%&,.:;?" | tr -s "\n" "=" | tr "\f" "_" | tr -s "[:space:]" " " | sed -E -e "s/ ?& ?/ and /g" -e '"'s/[ =]*_[ =]*/\\'$'\n/g' -e 's/( ?= ?)+/\\'$'\t/g'"' > "../Text/$F";' *.txt
(OK, the tr and sed commands look crazy, but they basically do the following: delete everything except certain characters and squeeze some repeats; change all newlines to =; change each formfeed to _; squeeze all whitespace to single spaces; change each "&" into "and"; then change each _ back to a newline and each run of = to a tab. Thus, in the new versions of the text files, a lot of extraneous characters are removed or reduced, newlines separate pages, and tabs represent the original line breaks. Unfortunately, sed seems to require tricky escaping of characters like \n and \t in sed replacement specifications.)
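As a side note, GNU sed interprets \n and \t directly in replacements, so the same reshaping idea can be sketched with far less escaping (BSD/macOS sed is what forces the tricks above); a toy example with two "pages":
printf 'line one\nline two\fpage two\n' \
| tr -s '\n' '=' \
| tr '\f' '_' \
| tr -s '[:space:]' ' ' \
| sed -e 's/ *_ */\n/g' -e 's/\( *= *\)\+/\t/g'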
Below is the zsh code for a tool I created called greplaw (and a supporting function called error). Since I will be using this tool a lot, I placed this code in my ~/.zshenv file.
# Provide a function to print an error message for the current executor, which
# is identified by the first argument. The second argument, if not null, is a
# custom error message to print. If the third argument exists and is neither
# zero nor null, the script is exited, but only to the prompt if there is one.
# The fourth argument, if present, is the current line number to report in the
# error message.
#
# Usage:
# error [executor [messageString [exitIndicator [lineNumber]]]]
#
# Examples:
# error greplaw
# error greplaw "" 1
# error greplaw "No text files found" 0 $LINENO
# error greplaw "No pdf files found" "" $LINENO
# error greplaw "No files found" x $LINENO
# error greplaw HELL eject $LINENO
#
error()
{
print ${1:-"Script"}": "${4:+"Line $4: "}"Error: "${2:-"Unknown"} 1>&2
[[ $3 =~ '^0*$' ]] || { ${missing_variable_ejector:?} } 2>/dev/null
}
# Function to grep through law files: see usage below or execute with -h.
#
greplaw()
{
# Provide a function to print an error message for the current executor.
# If the user did not include any arguments, give the user a little help.
#
local executor=$( basename $0 )
local err() {error $executor $*}
(( $# )) || 1="-h"
# Create variables with any defaults that we have. The color and no color
# variables are for coloring the extra output that this script outputs
# beyond what grep does.
#
local lawFileFilter=() contextLines=1 maxFileMatches=5 grepOutput=1
local maxPageMatches=5 quiet=0 c="%B%F{cyan}" nc="%f%b" grep="pcregrep"
local grepOptions=(-M -i --color) fileGrepOptions=(-M -n -i)
# Print out the usage for the greplaw function roughly in the fashion of a
# man page, with color and bolding. The output of this function should be
# sent to a print -P command.
#
local help()
{
# Make some local variables to make the description more readable in
# the below string. However, insert the codes for bold and color as
# appropriate.
#
local func="%B%F{red}$executor%f%b" name synopsis description examples
local c f g h l o p q number pattern option
# Mass declare our variables for usage categories, function flags, and
# function argument types. Use the man page standards for formatting
# the text in these variables.
#
for var in name synopsis description examples
declare $var="%B%F{red}"${(U)var}"%f%b"
for var in c f g h l o p q
declare $var="%B%F{red}-"$var"%f%b"
for var in number pattern option
declare $var="%U"$var"%u"
# Print the usage for the function, using our easier to use and read
# variables.
#
cat <<greplawUsage
$name
$func
$synopsis
$func [$c $number] [$f $number] [$g] [$h] [$l $pattern]... [$o $option]...
[$p $number] [$q] $pattern...
$description
This function searches law files with a regular expression, which is
specified as one or more arguments. If more than one argument is provided,
they are joined, with a pattern of whitespace, into a singular expression.
The searches are done without regard to case or whitespace, including line
and page breaks. The output of this function is the path of each law file
the expression is found in, including the PDF page as well as the results
of the $grep. If just a page is reported without any $grep results,
the match begins on that page and continues to the next one.
The following options are available:
$c $number
Context of $number lines around each match shown. Default is 1.
$f $number
File matches maximum is $number. Default is 5. Infinite is -1.
$g
Grep output is omitted
$h
Help message is merely printed.
$l $pattern
Law file regex $pattern will be added as a filename filter.
$o $option
Option $option added to the final $grep execution, for output.
$p $number
Page matches maximum is $number. Default is 5. Infinite is -1.
$q
Quiet file and page information: information not from $grep.
$examples
$func bureau of investigation
$func $o --color=always congress has not | less -r
$func $l " " $l Law congress
greplawUsage
}
# Update our defaulted variables according to the supplied arguments,
# until an argument is not understood. If an argument looks invalid,
# complain and eject. Add each law file filter to an array.
#
while (( $# ))
do
case $1 in
(-[Cc]*)
[[ $2 =~ '^[0-9]+$' ]] || err "Bad $1 argument: $2" eject
contextLines=$2
shift 2;;
(-f*)
[[ $2 =~ '^-?[0-9]+$' ]] || err "Bad $1 argument: $2" eject
maxFileMatches=$2
shift 2;;
(-g*)
grepOutput=0
shift;;
(-h*|--h*)
print -P "$( help )"
return;;
(-l*)
lawFileFilter+=$2
shift 2;;
(-o*)
grepOptions+=$2
shift 2;;
(-p*)
[[ $2 =~ '^-?[0-9]+$' ]] || err "Bad $1 argument: $2" eject
maxPageMatches=$2
shift 2;;
(-q*)
quiet=1
shift;;
(*)
break;;
esac
done
# If the user asked for quiet mode and also omitted the grep output, then
# we would give no output at all, so just eject with an error. Also, make
# sure we have remaining arguments to assemble the search pattern with.
# Assemble it by joining them with the only allowable whitespace in the
# text files: a space, a tab (which represents a newline), or a newline
# (which represents a new PDF page).
#
(( $quiet && ! $grepOutput )) && err "No grep output and quiet: nothing" x
(( $# )) || err "No pattern supplied to grep law files with" eject
local pattern=${(j:[ \t\n]:)argv[1,-1]}
# Quickly searchable text files are searched as representatives of the
# actual PDF law files. Define our PDF and text directories. Note that to
# expand the home directory specification, no quotes can be used.
#
local pdfDirectory=~/Documents/Books/Laws/PDF
local textDirectory=${pdfDirectory:h}"/Text"
# Get a list of the text files, without their directory specifications,
# sorted like the Finder would: this makes the file in order of when the
# laws were created. The IFS setting separates the output of the sort
# command by newlines, instead of by whitespace.
#
local filter fileName fileMatches=0 IFS=$'\n'
local files=( $textDirectory/*.txt )
local fileNames=( $( sort -Vi <<<${files:t} ) )
[[ $#files -gt 0 ]] || err "No text files found" eject $LINENO
# Repeatedly filter the fileNames for each of the law file filters that
# were passed in.
#
for filter in $lawFileFilter
fileNames=( $( grep $filter <<<"${fileNames}" ) )
[[ $#fileNames -gt 0 ]] || err "All law files were filtered out" eject
# For each filename, search for pattern matches. If there are any, report
# the corresponding PDF file, the page numbers and lines of the match.
#
for fileName in $fileNames
do
# Do a case-insensitive, multiline grep of the current file for the
# search pattern. In the grep, have each line prepended with the line
# number, which represents the PDF page number.
#
local pages=() page="" pageMatches=0
local file=$textDirectory"/"$fileName
pages=( $( $grep $fileGrepOptions -e $pattern $file ) )
# If the grep found nothing, move on to the next file. Otherwise, if
# the maximum file matches has been defined and has been exceeded, then
# stop processing files.
#
if [[ $#pages -eq 0 ]]
then
continue
elif [[ ++fileMatches -gt $maxFileMatches && $maxFileMatches -gt 0 ]]
then
break
fi
# For each page with a match, print the page number and the matching
# lines in the page.
#
for page in $pages
do
# If there have been no previous page matches in the current file,
# identify the corresponding PDF file that the matches, in theory,
# come from.
#
if [[ ++pageMatches -eq 1 ]]
then
# Put a blank line between matches for each file, unless
# either minimum output is requested or page matches are not
# reported.
#
if [[ $fileMatches -ne 1 && $pageMatches -ne 0
&& $maxPageMatches -ne 0 ]]
then
(( $quiet )) || print
fi
# Identify and print in color the full location of the PDF
# file (prepended with an open command for easy access),
# unless minimum output is requested.
#
local pdfFile=$pdfDirectory"/"${fileName:r}".pdf"
(( $quiet )) || print -P $c"open "$pdfFile$nc
fi
# If the maximum page matches has been defined and has been
# exceeded, stop processing pages for the current file.
#
if [[ $maxPageMatches -gt 0 && $pageMatches -gt $maxPageMatches ]]
then
break
fi
# Extract and remove the page number specification (an initial
# number before a colon) from the grep output for the page. Then
# extract the lines of the page: tabs are decoded as newlines.
#
local pageNumber=${page%%:*}
page=${page#*:}
local lines=( $( tr '\t' '\n' <<<$page ) )
# Print the PDF page number in yellow. Then grep the lines of the
# page that we have, matching possibly multiple lines without
# regard to case. And have any grep output use color and a line
# before and after the match, for context.
#
(( $quiet )) || print -P $c"Page "$pageNumber$nc
if (( $grepOutput ))
then
$grep -C $contextLines -e $pattern $grepOptions <<<$lines
fi
done
done
}
Yes, if I had to do it again, I would have used Perl...
Here's the usage for greplaw, as a zsh shell function. It's a surprisingly fast search tool.
NAME
greplaw
SYNOPSIS
greplaw [-c number] [-f number] [-g] [-h] [-l pattern]... [-o option]...
[-p number] [-q] pattern...
DESCRIPTION
This function searches law files with a regular expression, which is
specified as one or more arguments. If more than one argument is provided,
they are joined, with a pattern of whitespace, into a singular expression.
The searches are done without regard to case or whitespace, including line
and page breaks. The output of this function is the path of each law file
the expression is found in, including the PDF page as well as the results
of the pcregrep. If just a page is reported without any pcregrep results,
the match begins on that page and continues to the next one.
The following options are available:
-c number
Context of number lines around each match shown. Default is 1.
-f number
File matches maximum is number. Default is 5. Infinite is -1.
-g
Grep output is omitted
-h
Help message is merely printed.
-l pattern
Law file regex pattern will be added as a filename filter.
-o option
Option option added to the final pcregrep execution, for output.
-p number
Page matches maximum is number. Default is 5. Infinite is -1.
-q
Quiet file and page information: information not from pcregrep.
EXAMPLES
greplaw bureau of investigation
greplaw -o --color=always congress has not | less -r
greplaw -l " " -l Law congress
That'll do it... (if I remembered everything correctly and didn't make any typos...)

How can we increment a string variable within a for loop

#!/bin/bash
for i in $(ls); do
    j=1
    echo "$i"
done
Current (not the expected) output:
autodeploy
bin
config
console-ext
edit.lok
I need the output to be a numbered directory list like below, and if I then give 2 as input it should print "bin":
1.)autodeploy
2.)bin
3.)config
4.)console-ext
5.)edit.lok
Per BashFAQ #1, a while read loop is the correct way to read content line-by-line:
#!/usr/bin/env bash
enumerate() {
local line i
i=0
while IFS= read -r line; do
((++i))
printf '%d.) %s\n' "$i" "$line"
done
}
ls | enumerate
However, ls is not an appropriate tool for programmatic use; the above is acceptable if the results of ls are only for human consumption, but not if they're going to be parsed by a machine -- see Why you shouldn't parse the output of ls(1).
If you want to list files and let the user choose among them by number, pass the results of a glob expression to select:
select filename in *; do
echo "$filename" && break
done
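If you want the numbered listing and the prompt in a single script without select, here is a minimal sketch using a glob-filled array (the variable names are illustrative):
#!/usr/bin/env bash
files=(*)                          # glob instead of parsing ls
for i in "${!files[@]}"; do
    printf '%d.)%s\n' "$((i + 1))" "${files[i]}"
done
read -rp 'Enter a number: ' n
echo "${files[n - 1]}"
Entering 2 against the directory in the question would print bin.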
I don't understand what you mean in your question by like Directory list, but following your example, you do not need to write a loop:
ls|nl -s '.)' -w 1
If you want to avoid ls, you can do the following (but be careful: this only works if the directory entries do not contain whitespace, because whitespace would make fmt break a name across two lines):
echo *|fmt -w 1 |nl -s '.)' -w 1
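A middle ground that survives spaces in names (though still not newlines) is printf, which writes each glob match on its own line:
printf '%s\n' * | nl -s '.)' -w 1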

Writing a function to replace duplicate files with hardlinks

I need to write a bash script that iterates through the files of a specified directory and replaces duplicates of files with hardlinks. Right now, my entire function looks like this:
#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory
D=$1
cd $D #go to directory specified as default input
fileNum=0 #loop counter
DIR=".*|*"
for f in $DIR #for every file in the directory
do
files[$fileNum]=$f #save that file into the array
fileNum=$((fileNum+1)) #increment the counter
done
for((j=0; j<$fileNum; j++)) #for every file
do
if [ -f "$files[$j]" ] #access that file in the array
then
for((k=0; k<$fileNum; k++)) #for every other file
do
if [ -f "$files[$k]" ] #access other files in the array
then
test[cmp -s ${files[$j]} ${files[$k]}] #compare if the files are identical
[ln ${files[$j]} ${files[$k]}] #change second file to a hard link
fi
done
fi
done
Basically:
Loop through all files of depth 1 in specified directory
Put file contents into array
Compare each array item with every other array item and replace duplicates with hardlinks
The test directory has four files: a, b, c, d
a and b are different, but c and d are duplicates (they are empty). After running the script, ls -l shows that all of the files still only have 1 hardlink, so the script appears to have basically done nothing.
Where am I going wrong?
DIR=".*|*"
for f in $DIR #for every file in the directory
do
echo $f
done
This code outputs
.*|*
You should not loop over files like this. Look into the find command. As you see, your code doesn't work because the first loop is already faulty.
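For example, a robust way to enumerate the regular files at depth 1, including names containing whitespace, is find with NUL separators (a sketch; read -d '' is a bash feature):
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    echo "$f"
done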
BTW, don't name your variables all uppercase; those are conventionally reserved for environment and system variables.
You may be making this process a bit harder on yourself than necessary. There is already a Linux command fdupes that scans a directory conducting a byte-by-byte, md5sum, date & time comparison to determine whether files are duplicates of one another. It can easily find and return groups of files that are duplicates. You are left with only using the results.
Below is a quick example of using this tool for the job. NOTE: this quick example works only for filenames that do not contain spaces within them; you will have to modify it if you are dealing with filenames containing spaces. It is intended to show an approach to using a tool that already does what you want. Also note the actual ln command is commented out below; the program just prints what it would do. After testing, you can remove the comment from the ln command once you are satisfied with the results.
#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory using fdupes
[ -d "$1" ] || { # test valid directory supplied
printf "error: invalid directory '%s'. usage: %s <dir>\n" "$1" "${0##*/}"
exit 1
}
type fdupes &>/dev/null || { # verify fdupes is available in path
printf "error: 'fdupes' required. Program not found within your path\n"
exit 1
}
pushd "$1" &>/dev/null # go to directory specified as default input
declare -a files # declare files and dupes array
declare -a dupes
## read duplicate files into files array
IFS=$'\n' read -d '' -a files < <(fdupes --sameline .)
## for each list of duplicates
for ((i = 0; i < ${#files[@]}; i++)); do
printf "\n duplicate files %s\n\n" "${files[i]}"
## split into original files (no internal spaces allowed in filenames)
dupes=( ${files[i]} )
## for the 1st duplicate onward
for ((j = 1; j < ${#dupes[@]}; j++)); do
## create hardlink to original (actual command commented)
printf " ln -f %s %s\n" "${dupes[0]}" "${dupes[j]}"
# ln -f "${dupes[0]}" "${dupes[j]}"
done
done
exit 0
Output/Example
$ bash rmdupes.sh dat
duplicate files ./output.dat ./tmptest ./env4.dat.out
ln -f ./output.dat ./tmptest
ln -f ./output.dat ./env4.dat.out
duplicate files ./vh.conf ./vhawk.conf
ln -f ./vh.conf ./vhawk.conf
duplicate files ./outfile.txt ./newfile.txt
ln -f ./outfile.txt ./newfile.txt
duplicate files ./z1 ./z1cpy
ln -f ./z1 ./z1cpy
