How does Google search PDFs so quickly even though it takes way more time for a PDF to load when I open the link?

It takes time for a PDF to load completely when I click on links in Google. Yet Google searches millions of files and returns the exact result, even showing the part of the document where the words I searched for can be found (briefly, below the result). All of this in a few seconds.
However, it takes far more time to open those links individually.
MY REASONING: Google has already gone through the Internet (as soon as a link is uploaded to the net), so Google just gives me links from its index rather than doing the search in real time.
But that still sounds unconvincing, as it would make things somewhat quicker but not by this much.
Also, as an extension, could the solution to this be used as a hack to open web pages / PDFs quickly, skipping all the irrelevant parts (like ads and toolbars on some news pages)? If Google can search them in such a short time, there should be a way for us to get the relevant pages quickly, right?
Thanks in advance.
Example: an image of a PDF which took over 10 s to open, while Google returned the result in 0.58 s (according to Google itself).

In case you wanted to do this yourself, here is one solution that I came up with to the problem of quickly searching every law ever enacted by the U.S. Congress. These laws fill more than 32 GB of PDF files, which you can download for free from here and there on the Internet.
For the more than 150 PDF files that I downloaded, I used the naming convention V<volume>[C<congress>[S<session>]]Y<year>[<description>].pdf. Here are some examples of how my PDF files were named, whitespace and all.
V1C1Y1789-1791.pdf
V6Y1789-1845 Private Laws and Resolutions.pdf
V7Y1789-1845 Indian Treaties.pdf
V50C75S1Y1937.pdf
V51C75S2Y1937.pdf
V52C75S3Y1938.pdf
V53C76S1Y1939.pdf
V54C76S2-3Y1939-1941.pdf
Then I made the following directories to hold the PDF files and the text representations of them (to be used for fast searching) that I was going to create. I placed all the downloaded PDFs in the first, Originals, directory.
~/Documents/Books/Laws/Originals/
~/Documents/Books/Laws/PDF/
~/Documents/Books/Laws/Text/
The first problem I encountered was that, for the volumes before number 65, the (selectable) text in the PDF files was poorly constructed: often out of order and jumbled around. (I initially discovered this when using the pdfgrep tool.) This problem made the text almost impossible to search through. However, the page images in those PDF files seemed quite reasonable.
Using brew, I installed the ocrmypdf tool ("brew install ocrmypdf") to improve the OCR text layer on those problematic PDF files. It worked very well.
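For a single file, the invocation looks like this (a sketch using one of the filenames above; -f forces a fresh OCR text layer over the bad one, and --sidecar writes the recognized text to a separate file, just as in the batch command further below):
ocrmypdf -f --sidecar "V1C1Y1789-1791.txt" "V1C1Y1789-1791.pdf" "../PDF/V1C1Y1789-1791.pdf"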
To get around some apparent limitations of xargs (long command lines halted file substitutions) and zargs (file substitution halted after a pipe, a redirect, or the first substitution in a string), I created the following Zsh function, which I used to mass-execute ocrmypdf on 92 PDF files.
# Make a function to execute a string of shell code in the first argument on
# one or more files specified in the remaining arguments. Every specification
# of $F in the string will be replaced with the current file. If the last
# argument is "test," show each command before asking the user to execute it.
#
# XonFs <command_string> <file_or_directory>... [test]
#
XonFs()
{
    # If the last argument is a test, identify and remove it.
    #
    local testing
    if [[ ${argv[-1]} == "test" ]]
    then
        testing=1
        unset -v 'argv[-1]'
    fi
    # Get a list of files, from each argument after the first one, and sort
    # the list like the Finder does. The IFS setting makes the output of the
    # sort command be separated by newlines, instead of by whitespace.
    #
    local F IFS=$'\n' answer=""
    local files=($(sort -Vi <<<"${argv[2,-1]}"))
    # Execute the command for each file. But if we are testing, show the
    # command that will be executed before asking the user for permission to
    # do so.
    #
    for F in $files
    do
        # If this is a test, show the user the command that we will execute,
        # using the current filename. Then ask the user whether they want to
        # execute the command or not, or just quit the script. If this is not
        # a test, execute the command.
        #
        if (( $testing ))
        then
            # Separate each file execution with a newline. Show the first
            # argument to the function, the command that can be executed, with
            # the F variable expanded as the current file. Then, ask the user
            # whether the command should be executed.
            #
            [[ -n $answer ]] && print
            printf "%s\n" "${(e)1}"
            read -ks "answer?EXECUTE? y/n/q [no] "
            # Report what the user's answer is interpreted to be, and do what
            # they want.
            #
            if [[ "$answer" == [yY] ]]
            then
                print "Yes."
                eval $1
            elif [[ "$answer" == [qQ] ]]
            then
                print "Quit."
                break
            else
                answer="n"
                print "No."
            fi
        else
            eval $1
        fi
    done
}
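For example, a dry run that just prints each matched file, asking for confirmation before executing each command, could look like this (hypothetical file set):
XonFs 'print "Current file: $F"' *.pdf test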
Thus, I used the following shell commands to put the best versions of the PDF files in the PDF directory. (It took about a day on my computer to complete the ocrmypdf conversions.) During the conversions, I also had text files created from the converted PDF files and placed in the Originals directory.
cd ~/Documents/Books/Laws/Originals/
cp V{65..128}[YC]*.pdf ../PDF
XonFs 'print "$F"; ocrmypdf -f --sidecar "${F:r}.txt" "$F" "../PDF/$F"; print;' V{1..64}[YC]*.pdf
I then used pdftotext to create the text file versions of the original (unconverted) PDF files, as follows. If I remember correctly, the pdftotext tool is installed automatically with the installation of pdfgrep ("brew install pdfgrep").
XonFs 'print "$F"; pdftotext "$F" "${F:r}.txt";' V{65..128}[YC]*.pdf
Next, I created easily and quickly searchable versions of all the text files (direct conversions of the PDF files) and placed these new versions of the text files in the Text directory with the following command.
XonFs 'print "$F"; cat "$F" | tr -cds "[:space:][:alnum:]\!\$%&,.:;?" "\!\$%&,.:;?" | tr -s "\n" "=" | tr "\f" "_" | tr -s "[:space:]" " " | sed -E -e "s/ ?& ?/ and /g" -e '"'s/[ =]*_[ =]*/\\'$'\n/g' -e 's/( ?= ?)+/\\'$'\t/g'"' > "../Text/$F";' *.txt
(OK, the tr and sed commands look crazy, but they basically do the following: delete everything except certain characters and squeeze some of the repeats, change all newlines to =, change each formfeed to _, collapse all whitespace to single spaces, change each "&" into "and", change each _ to a newline, and change each run of = to a tab. Thus, in the new versions of the text files, a lot of extraneous characters are removed or reduced, newlines separate pages, and tabs represent the original newlines. Unfortunately, sed seems to require tricky escaping of characters like \n and \t in sed replacement specifications.)
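If you want to see just the page/line encoding in isolation, here is a stripped-down sketch of that part of the pipeline (the character cleanup is omitted):
printf 'line one\nline two\fpage two\n' | tr -s '\n' '=' | tr '\f' '_' | tr '_' '\n' | tr '=' '\t'
# The output is one line per PDF page, with tabs where the original
# line breaks were:
#   line one<TAB>line two
#   page two<TAB>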
Below is the zsh code for a tool I created called greplaw (and a supporting function called error). Since I will be using this tool a lot, I placed this code in my ~/.zshenv file.
# Provide a function to print an error message for the current executor, which
# is identified by the first argument. The second argument, if not null, is a
# custom error message to print. If the third argument exists and is neither
# zero nor null, the script is exited, but only to the prompt if there is one.
# The fourth argument, if present, is the current line number to report in the
# error message.
#
# Usage:
# error [executor [messageString [exitIndicator [lineNumber]]]]
#
# Examples:
# error greplaw
# error greplaw "" 1
# error greplaw "No text files found" 0 $LINENO
# error greplaw "No pdf files found" "" $LINENO
# error greplaw "No files found" x $LINENO
# error greplaw HELL eject $LINENO
#
error()
{
    print ${1:-"Script"}": "${4:+"Line $4: "}"Error: "${2:-"Unknown"} 1>&2
    [[ $3 =~ '^0*$' ]] || { ${missing_variable_ejector:?} } 2>/dev/null
}
# Function to grep through law files: see usage below or execute with -h.
#
greplaw()
{
    # Provide a function to print an error message for the current executor.
    # If the user did not include any arguments, give the user a little help.
    #
    local executor=$( basename $0 )
    local err() {error $executor $*}
    (( $# )) || 1="-h"
    # Create variables with any defaults that we have. The color and no color
    # variables are for coloring the extra output that this script outputs
    # beyond what grep does.
    #
    local lawFileFilter=() contextLines=1 maxFileMatches=5 grepOutput=1
    local maxPageMatches=5 quiet=0 c="%B%F{cyan}" nc="%f%b" grep="pcregrep"
    local grepOptions=(-M -i --color) fileGrepOptions=(-M -n -i)
    # Print out the usage for the greplaw function roughly in the fashion of a
    # man page, with color and bolding. The output of this function should be
    # sent to a print -P command.
    #
    local help()
    {
        # Make some local variables to make the description more readable in
        # the below string. However, insert the codes for bold and color as
        # appropriate.
        #
        local func="%B%F{red}$executor%f%b" name synopsis description examples
        local c f g h l o p q number pattern option
        # Mass declare our variables for usage categories, function flags, and
        # function argument types. Use the man page standards for formatting
        # the text in these variables.
        #
        for var in name synopsis description examples
            declare $var="%B%F{red}"${(U)var}"%f%b"
        for var in c f g h l o p q
            declare $var="%B%F{red}-"$var"%f%b"
        for var in number pattern option
            declare $var="%U"$var"%u"
        # Print the usage for the function, using our easier to use and read
        # variables.
        #
        cat <<greplawUsage
$name
    $func
$synopsis
    $func [$c $number] [$f $number] [$g] [$h] [$l $pattern]... [$o $option]...
        [$p $number] [$q] $pattern...
$description
    This function searches law files with a regular expression, which is
    specified as one or more arguments. If more than one argument is provided,
    they are joined, with a pattern of whitespace, into a singular expression.
    The searches are done without regard to case or whitespace, including line
    and page breaks. The output of this function is the path of each law file
    the expression is found in, including the PDF page as well as the results
    of the $grep. If just a page is reported without any $grep results,
    the match begins on that page and continues to the next one.
    The following options are available:
    $c $number
        Context of $number lines around each match shown. Default is 1.
    $f $number
        File matches maximum is $number. Default is 5. Infinite is -1.
    $g
        Grep output is omitted
    $h
        Help message is merely printed.
    $l $pattern
        Law file regex $pattern will be added as a filename filter.
    $o $option
        Option $option added to the final $grep execution, for output.
    $p $number
        Page matches maximum is $number. Default is 5. Infinite is -1.
    $q
        Quiet file and page information: information not from $grep.
$examples
    $func bureau of investigation
    $func $o --color=always congress has not | less -r
    $func $l " " $l Law congress
greplawUsage
    }
    # Update our defaulted variables according to the supplied arguments,
    # until an argument is not understood. If an argument looks invalid,
    # complain and eject. Add each law file filter to an array.
    #
    while (( $# ))
    do
        case $1 in
        (-[Cc]*)
            [[ $2 =~ '^[0-9]+$' ]] || err "Bad $1 argument: $2" eject
            contextLines=$2
            shift 2;;
        (-f*)
            [[ $2 =~ '^-?[0-9]+$' ]] || err "Bad $1 argument: $2" eject
            maxFileMatches=$2
            shift 2;;
        (-g*)
            grepOutput=0
            shift;;
        (-h*|--h*)
            print -P "$( help )"
            return;;
        (-l*)
            lawFileFilter+=$2
            shift 2;;
        (-o*)
            grepOptions+=$2
            shift 2;;
        (-p*)
            [[ $2 =~ '^-?[0-9]+$' ]] || err "Bad $1 argument: $2" eject
            maxPageMatches=$2
            shift 2;;
        (-q*)
            quiet=1
            shift;;
        (*)
            break;;
        esac
    done
    # If the user specified quiet mode and also no grep output, we would give
    # no output at all, so just eject with an error. Also, make sure we have
    # remaining arguments to assemble the search pattern with. Assemble it by
    # joining them with the only allowable whitespace in the text files: a
    # space, a tab (which represents a newline), or a newline (which
    # represents a new PDF page).
    #
    (( $quiet && ! $grepOutput )) && err "No grep output and quiet: nothing" x
    (( $# )) || err "No pattern supplied to grep law files with" eject
    local pattern=${(j:[ \t\n]:)argv[1,-1]}
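    # (For example, `greplaw bureau of investigation` joins its arguments
    # into the single pattern 'bureau[ \t\n]of[ \t\n]investigation', a
    # character class between each pair of words, which is how one match
    # can span line breaks and page breaks.)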
    # Quickly searchable text files are searched as representatives of the
    # actual PDF law files. Define our PDF and text directories. Note that to
    # expand the home directory specification, no quotes can be used.
    #
    local pdfDirectory=~/Documents/Books/Laws/PDF
    local textDirectory=${pdfDirectory:h}"/Text"
    # Get a list of the text files, without their directory specifications,
    # sorted like the Finder would: this puts the files in order of when the
    # laws were created. The IFS setting separates the output of the sort
    # command by newlines, instead of by whitespace.
    #
    local filter fileName fileMatches=0 IFS=$'\n'
    local files=( $textDirectory/*.txt )
    local fileNames=( $( sort -Vi <<<${files:t} ) )
    [[ $#files -gt 1 ]] || err "No text files found" eject $LINENO
    # Repeatedly filter the fileNames for each of the law file filters that
    # were passed in.
    #
    for filter in $lawFileFilter
        fileNames=( $( grep $filter <<<"${fileNames}" ) )
    [[ $#fileNames -gt 0 ]] || err "All law files were filtered out" eject
    # For each filename, search for pattern matches. If there are any, report
    # the corresponding PDF file, the page numbers and lines of the match.
    #
    for fileName in $fileNames
    do
        # Do a case-insensitive, multiline grep of the current file for the
        # search pattern. In the grep, have each line prepended with the line
        # number, which represents the PDF page number.
        #
        local pages=() page="" pageMatches=0
        local file=$textDirectory"/"$fileName
        pages=( $( $grep $fileGrepOptions -e $pattern $file ) )
        # If the grep found nothing, move on to the next file. Otherwise, if
        # the maximum file matches has been defined and has been exceeded,
        # then stop processing files.
        #
        if [[ $#pages -eq 0 ]]
        then
            continue
        elif [[ ++fileMatches -gt $maxFileMatches && $maxFileMatches -gt 0 ]]
        then
            break
        fi
        # For each page with a match, print the page number and the matching
        # lines in the page.
        #
        for page in $pages
        do
            # If there have been no previous page matches in the current file,
            # identify the corresponding PDF file that the matches, in theory,
            # come from.
            #
            if [[ ++pageMatches -eq 1 ]]
            then
                # Put a blank line between matches for each file, unless
                # either minimum output is requested or page matches are not
                # reported.
                #
                if [[ $fileMatches -ne 1 && $pageMatches -ne 0
                    && $maxPageMatches -ne 0 ]]
                then
                    (( $quiet )) || print
                fi
                # Identify and print in color the full location of the PDF
                # file (prepended with an open command for easy access),
                # unless minimum output is requested.
                #
                local pdfFile=$pdfDirectory"/"${fileName:r}".pdf"
                (( $quiet )) || print -P $c"open "$pdfFile$nc
            fi
            # If the maximum page matches has been defined and has been
            # exceeded, stop processing pages for the current file.
            #
            if [[ $maxPageMatches -gt 0 && $pageMatches -gt $maxPageMatches ]]
            then
                break
            fi
            # Extract and remove the page number specification (an initial
            # number before a colon) from the grep output for the page. Then
            # extract the lines of the page: tabs are decoded as newlines.
            #
            local pageNumber=${page%%:*}
            page=${page#*:}
            local lines=( $( tr '\t' '\n' <<<$page ) )
            # Print the PDF page number in cyan. Then grep the lines of the
            # page that we have, matching possibly multiple lines without
            # regard to case. And have any grep output use color and a line
            # before and after the match, for context.
            #
            (( $quiet )) || print -P $c"Page "$pageNumber$nc
            if (( $grepOutput ))
            then
                $grep -C $contextLines -e $pattern $grepOptions <<<$lines
            fi
        done
    done
}
Yes, if I had to do it again, I would have used Perl...
Here's the usage for greplaw, as a zsh shell function. It's a surprisingly fast search tool.
NAME
    greplaw
SYNOPSIS
    greplaw [-c number] [-f number] [-g] [-h] [-l pattern]... [-o option]...
        [-p number] [-q] pattern...
DESCRIPTION
    This function searches law files with a regular expression, which is
    specified as one or more arguments. If more than one argument is provided,
    they are joined, with a pattern of whitespace, into a singular expression.
    The searches are done without regard to case or whitespace, including line
    and page breaks. The output of this function is the path of each law file
    the expression is found in, including the PDF page as well as the results
    of the pcregrep. If just a page is reported without any pcregrep results,
    the match begins on that page and continues to the next one.
    The following options are available:
    -c number
        Context of number lines around each match shown. Default is 1.
    -f number
        File matches maximum is number. Default is 5. Infinite is -1.
    -g
        Grep output is omitted
    -h
        Help message is merely printed.
    -l pattern
        Law file regex pattern will be added as a filename filter.
    -o option
        Option option added to the final pcregrep execution, for output.
    -p number
        Page matches maximum is number. Default is 5. Infinite is -1.
    -q
        Quiet file and page information: information not from pcregrep.
EXAMPLES
    greplaw bureau of investigation
    greplaw -o --color=always congress has not | less -r
    greplaw -l " " -l Law congress
That'll do it... (assuming I remembered everything correctly and didn't make any typos...)

Related

Is there a way to pass multiple values into a CSV file, based on the output of a linux script

I have written a small script that will take the user's input and then generate the md5sum values for it:
count=0
echo "Enter number of records"
read number
while [ $count -lt $number ]
do
    echo "Enter path"
    read path
    echo "file name"
    read file_name
    md5sum "$path/$file_name" # it shows the md5sum value and path+filename
    ((count++))
done
How can I pass these values (path, file name, and md5sum) to a CSV file, assuming the user chooses to enter more than one record?
The output should be like:
/c/training,sample.txt,34234435345346549862123454651324 #placeholder values
/c/file,text.sh,4534534534534534345345435342342
Interactively prompting for the number of files to process is just obnoxious. Change the script so it accepts the files you want to process as command-line arguments.
#!/bin/sh
md5sum "$@" |
sed 's%^\([0-9a-f]*\)  \(\(.*\)/\)\?\([^/]*\)$%\3,\4,\1%'
There are no Bash-only constructs here, so I switched the shebang to /bin/sh; obviously, you are still free to use Bash if you like.
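For example, saved as script.sh and run on the sample data from the question (placeholder checksum):
$ ./script.sh /c/training/sample.txt
/c/training,sample.txt,34234435345346549862123454651324
Note that a file name given without any directory component produces an empty first field.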
There is a reason md5sum prints the checksum before the path name. The reordered output will be ambiguous if you have file names which contain commas (or newlines, for that matter). Using CSV format is actually probably something you should avoid if you can; Unix tools generally work better with simpler formats like tab-delimited (which of course also breaks if you have file names with tabs in them).
Rather than prompting the user for both a path to a directory and the name of a file in that directory, you could prompt for a full path to the file. You can then extract what you need from that path using bash string manipulations.
#!/bin/bash
set -euo pipefail
function calc_md5() {
    local path="${1}"
    if [[ -f "${path}" ]] ; then
        echo "${path%/*}, ${path##*/}, $(md5sum "${path}" | awk '{ print $1 }')"
    else
        echo "
x - Script requires path to file.
Usage: $0 /path/to/file.txt
"
        exit 1
    fi
}
calc_md5 "$@"
Usage example:
$ ./script.sh /tmp/test/foo.txt
/tmp/test, foo.txt, b05403212c66bdc8ccc597fedf6cd5fe

Bash Issue: AWK

I came back to work from a break to see that my Bash script wasn't working like it used to. The below tidbit of code would grab and filter what's in a file. Here are the contents of said file:
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
# $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
What the code used to produce and throw into an array:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
It would look at the file (oratab), find the database names (e.g. xtst160), and put them into an array. I then used this array for other tasks later in the script. Here's the relevant Bash script code:
# Collect the databases using a mixture of AWK and regex, and throw it into an array.
printf "\n2) Collecting databases on %s:\n" $HOSTNAME
declare -a arr_dbs=(`awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $ddma_input}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
    printf "%s " $i
done
It doesn't seem that anyone has modified the code or that the oratab file format has changed, so I'm not 100% sure what's going on. Instead of grabbing just the database names, it's grabbing the entire line:
dev068:/software/oracle/ora-10.02.00.04.11:Y
I'm trying to understand Bash and regex better, but I'm stumped; this is definitely not my forte. A broken-down explanation of the awk line would be greatly appreciated.
I found the error. We changed the number of arguments being passed in and the order they are received in. Printing $1 instead of $ddma_input resolved the issue.
declare -a arr_dbs=(`awk -F ":" -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab`)
# Loop through and print the array of databases.
for i in ${arr_dbs[@]}
do
    printf "%s " $i
done
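Since the question also asked for a breakdown of the awk line, here is the working version again with my annotations (the comments are mine, not part of the original script):
# -F: tells awk to split each oratab line on colons, so that for
#   dev068:/software/oracle/ora-10.02.00.04.11:Y
# $1 is dev068, $2 is the Oracle home path, and $3 is Y.
# -v key='...' passes the path prefix in as an awk variable, and
# '$2 ~ key' selects only the lines whose second field matches it.
# An unset awk variable used as a field number, such as $ddma_input
# with ddma_input undefined, evaluates to $0, the whole line -- which
# is exactly why the broken version printed entire lines.
awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab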
You could easily implement this whole thing in native bash with no external tools at all:
arr_dbs=( )
while IFS= read -r line; do
    case $line in
        "#"*) continue ;;
        *:/software/oracle/ora*:*) arr_dbs+=( "${line%%:*}" ) ;;
    esac
done </etc/oratab
printf ' %s\n' "${arr_dbs[@]}"
This actually avoids some bugs you had in your original implementation. Let's say you had a line like the following:
*:/software/oracle/ora-default:Y
If you aren't careful with how you handle that *, it'll be replaced with a list of filenames in the current directory by the shell whenever expansion occurs.
What does "whenever expansion occurs" mean in this context? Well:
# this will expand a * into a list of filenames during the assignment to the array
arr=( $(echo "*") ) # vs the correct read -a arr < <(echo "*")
# this will expand a * into a list of filenames while generating items to iterate over
for i in ${arr[@]} # vs the correct for i in "${arr[@]}"
# this will expand a * into a list of filenames while building the argument list for echo
i="*"
echo $i # vs the correct printf '%s\n' "$i"
Note the use of printf over echo -- see the APPLICATION USAGE section of the POSIX specification of echo.

Split multiple files

I have a directory with hundreds of files, and I have to divide all of them into files of 400 lines (or fewer).
I have tried combining ls and split, and wc and split, and writing some scripts.
Honestly, I'm lost.
Please, can anybody help me?
EDIT:
Thanks to John Bollinger and his answer, this is the script we will use for our purpose:
#!/bin/bash
# $@ -> all args passed to the script
# The arguments, in order:
# $1 = number of lines (required)
# $2 = origin dir (optional)
# $3 = destination dir (optional)
if [ $# -gt 0 ]; then
    lin=$1
    if [ $# -gt 1 ]; then
        dirOrg=$2
        if [ $# -gt 2 ]; then
            dirDest=$3
            if [ ! -d "$dirDest" ]; then
                mkdir -p "$dirDest"
            fi
        else
            dirDest=$dirOrg
        fi
    else
        dirOrg=.
        dirDest=.
    fi
else
    echo "Missing parameters: NumLineas [DirectorioOrigen] [DirectorioDestino]"
    exit 1
fi
# The shell glob expands to all the files in the target directory; a different
# glob pattern could be used if you want to restrict splitting to a subset,
# or if you want to include dotfiles.
for file in "$dirOrg"/*; do
    # Details of the split command are up to you. This one splits each file
    # into pieces named by appending a sequence number to the original file's
    # name. The original file is left in place.
    fileDest=${file##*/}
    split --lines="$lin" --numeric-suffixes "$file" "$dirDest"/"$fileDest"
done
exit 0
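For example, to split every file in ./reports into pieces of at most 400 lines placed under ./pieces (hypothetical names, with the script saved as splitfiles.sh):
./splitfiles.sh 400 ./reports ./pieces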
Since you seem to know about split, and to want to use it for the job, I guess your issue revolves around using one script to wrap the whole task. The details are unclear, but something along these lines is probably what you want:
#!/bin/bash
# If an argument is given then it is the name of the directory containing the
# files to split. Otherwise, the files in the working directory are split.
if [ $# -gt 0 ]; then
    dir=$1
else
    dir=.
fi
# The shell glob expands to all the files in the target directory; a different
# glob pattern could be used if you want to restrict splitting to a subset,
# or if you want to include dotfiles.
for file in "$dir"/*; do
    # Details of the split command are up to you. This one splits each file
    # into pieces named by appending a sequence number to the original file's
    # name. The original file is left in place.
    split --lines=400 --numeric-suffixes "$file" "$file"
done
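If this is saved as splitdir.sh (a hypothetical name), a run might look like:
./splitdir.sh /data/logs
Each original file in /data/logs is left in place next to its numbered pieces (file00, file01, and so on).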

Bash and variable substitution for files with spaces in their names: application to gpsbabel

I am trying to write a script to run gpsbabel, but I am stuck on handling files whose names contain (white) spaces.
My problem is with the bash syntax. Any help or insight from bash programmers will be much appreciated.
gpsbabel is software which permits merging of tracks recorded by GPS devices.
The syntax for my purpose, which works, is:
gpsbabel -i gpx -f "file 1.gpx" -f "file 2.gpx" -o gpx -F output.gpx -x track,merge
The input format of the GPS data is given by -i, the output format by -o.
The input data files are listed after -f, and the resulting file after -F
(ref. the gpsbabel manual, see example 4.9).
I am trying to write a batch script that runs this syntax with a number of input files not known in advance. This means the sequence -f "name_of_the_input_file" has to be repeated for each input file passed in the batch parameters.
Here is a script that works for files with no spaces in their names:
#!/bin/bash
# Append multiple gpx files easily
# batch name merge_gpx.sh
# Usage:
# merge_gpx.sh track_*.gpx
gpsbabel -i gpx $(echo $* | for GPX; do echo -n " -f $GPX "; done) \
-o gpx -F appended.gpx
So I tried to modify this script to also handle filenames containing spaces.
I got lost in the bash substitutions, and wrote a more step-by-step script for debugging purposes, with no success.
Here is one of my trials.
I get an error from gpsbabel, "Extra arguments on command line", suggesting that I made a mistake in the variable usage.
#!/bin/bash
# Merging all tracks into a single one
old_IFS=$IFS # Backup internal separator
IFS=$'\n'    # New IFS
let i=0
echo " Merging GPX files"
for file in $(ls -1 "$@")
do
    let i++
    echo "i=" $i "," "$file"
    tGPX[$i]=$file
done
IFS=$old_IFS
#
echo "Number of files:" ${#tGPX[@]}
echo
# List of the data files to treat (each name protected with a ')
LISTE=$(for (( ifile=1; ifile<=${#tGPX[@]} ; ifile++)) ;do echo -ne " -f '""${tGPX[$ifile]}""'"; done)
echo "LISTE: " $(echo -n $LISTE)
echo "++Merging .."
if (( $i>=1 )); then
    gpsbabel -t \
        -i gpx $(echo -n $LISTE) \
        -x track,merge,title="TEST COMPIL" \
        -o gpx -F track_compil.gpx
else
    echo "Wrong selection of input file"
fi
#end
You are making things way more complicated for yourself than they need to be.
Any reasonably POSIX/GNU-compatible utility which takes an option in the form of two command-line arguments (-f STRING, or equivalently -f FILENAME) should also accept the single command-line argument -fSTRING. If the utility uses either getopt or getopt_long, this is automatic. gpsbabel appears not to use the standard POSIX or GNU libraries for argument parsing, but I believe it still gets this right.
Apparently, your script expects its arguments to be a list of filenames; presumably, if the filenames include whitespace, you will quote the names which include whitespace:
./myscript "file 1.gpx" "file 2.gpx"
In that case, you only need to change the list of arguments by prepending -f to each one, so that the argument list becomes, in effect:
"-ffile 1.gpx" "-ffile 2.gpx"
That's extremely straightforward. We'll use the bash-specific find-and-replace syntax, described in the bash manual: (I highlighted the two features this solution uses)
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
So, "${#/#/-f}" is the list of arguments (#), with the empty pattern at the beginning (#) replaced with -f:
#!/bin/bash
# Merging all tracks into a single one
# $# is the number of arguments to the script.
if (( $# > 0 )); then
    gpsbabel -t \
        -i gpx "${@/#/-f}" \
        -x track,merge,title="TEST COMPIL" \
        -o gpx -F track_compil.gpx
else
    # I changed the error message to make it more clear, sent it to stderr,
    # and made the script fail.
    echo "No input files specified" >&2
    exit 1
fi
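You can watch the substitution on its own, outside gpsbabel, like this (hypothetical filenames):
$ set -- "file 1.gpx" "file 2.gpx"
$ printf '<%s>\n' "${@/#/-f}"
<-ffile 1.gpx>
<-ffile 2.gpx>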
Use an array:
files=()
for f; do
    files+=(-f "$f")
done
gpsbabel -i gpx "${files[@]}" -o gpx -F appended.gpx
for f; do is short for for f in "$@"; do; most often you want to use $@ to access the command-line arguments instead of $*. Quoting "${files[@]}" produces a list of words, one per element, that are treated as if they were quoted, so array elements containing whitespace are treated as a single word.

Attempting to pass two arguments to a called script for a pattern search

I'm having trouble getting a script to do what I want.
I have a script that will search a file for a pattern and print the line numbers and the instances of that pattern.
I want to know how to make it print the file name before it prints the lines found.
I also want to know how to write a new script that will call this one, passing two arguments to it:
the first argument being the pattern for grep, and the second the location.
If the location is a directory, it will loop and search for the pattern in all files in the directory, using the script.
#!/bin/bash
if [[ $# -ne 2 ]]
then
    echo "error: must provide 2 arguments."
    exit -1
fi
if [[ ! -e $2 ]];
then
    echo "error: second argument must be a file."
    exit -2
fi
echo "------ File =" $2 "------"
grep -ne "$1" "$2"
This is the script I'm using that I need the new one to call. I just got a lot of help from asking a similar question, but I'm still kind of lost. I know that I can use the -d test to check for a directory and then use a for loop to run the command on each file, but exactly how isn't panning out for me.
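A minimal sketch of that wrapper, assuming the script above is saved as ./findpattern.sh (both names here are hypothetical), could look like this:
#!/bin/bash
# search_all.sh <pattern> <location>
# If the location is a directory, call the existing script once per
# regular file in it; otherwise pass the location through as a file.
if [[ -d $2 ]]
then
    for f in "$2"/*
    do
        [[ -f $f ]] && ./findpattern.sh "$1" "$f"
    done
else
    ./findpattern.sh "$1" "$2"
fi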
I think you just want to add the -H option to grep:
-H, --with-filename
Print the file name for each match. This is the default when there is more than one file to search.
grep has an option -r which can help you avoid testing for the second argument being a directory and using a for loop to iterate over all files of that directory.
From the man page:
-R, -r, --recursive
Recursively search subdirectories listed.
It will also print the filename.
Test:
On one file:
[JS웃:~/Temp]$ grep -r '5' t
t:5 10 15
t:10 15 20
On a directory:
[JS웃:~/Temp]$ grep -r '5' perl/
perl//hello.pl:my $age=65;
perl//practice.pl:use v5.10;
perl//practice.pl:#array = (1,2,3,4,5);
perl//temp/person5.pm:#person5.pm
perl//temp/person9.pm: my #date = (localtime)[3,4,5];
perl//text.file:This is line 5
