Hi, I have this file full of data; the timestamps are basically at the beginning of each line. I need to break down the file and print each line individually. How can I accomplish this using only bash and (if needed) standard UNIX tools (sed, awk, etc.)?
The timestamp field goes from 08:30:00:324810: onward, e.g. 17:30:00:324810:. The number of fields following the timestamp varies, so there could be 1 to x fields. So I need to find the timestamp format and then insert a line break before it.
08:30:00:324810: usg_07Y BidYield=1.99788141 Bid=99.20312500 08:30:00:325271: usg_07Y
AskYield=1.98578274 Ask=99.28125000 08:30:00:325535: usg_10Y Ask=0.00000000 08:30:01:324881:
usg_07Y BidYield=2.02938740 AskYield=1.97127853 Bid=99.00000000 Ask=99.37500000 08:30:01:377021:
usg_05Y Bid=0.00000000 Ask=0.00000000
Thanking you in advance,
Matt
It is fairly trivial. Read the file into an array, find the timestamp, output a newline before it:
#!/bin/bash
set -f                            # inhibit globbing (filename expansion)
declare -i cnt=0                  # simple counter
a=( $(<"$1") )                    # read file into array, word-split on whitespace
for i in "${a[@]}"; do            # for each word in file
    if [ "$cnt" -gt 0 ]; then     # test counter > 0
        # if last char is ':', then output newline before word
        [ "${i:(-1):1}" = ':' ] && printf "\n%s" "$i" || printf " %s" "$i"
    else
        printf "%s" "$i"          # if first word, just print
    fi
    ((cnt++))
done
printf "\n"
Use/output:
$ bash parsedtstamp.sh filename.txt
08:30:00:324810: usg_07Y BidYield=1.99788141 Bid=99.20312500
08:30:00:325271: usg_07Y AskYield=1.98578274 Ask=99.28125000
08:30:00:325535: usg_10Y Ask=0.00000000
08:30:01:324881: usg_07Y BidYield=2.02938740 AskYield=1.97127853 Bid=99.00000000 Ask=99.37500000
08:30:01:377021: usg_05Y Bid=0.00000000 Ask=0.00000000
I added a counter var to only output the newline if not the first word.
Alternate version that avoids temporary array storage (for large files)
While there is no limit on array size in Bash, if you find yourself parsing million-line files, it is probably better to avoid storing the whole file in memory. This can be accomplished by simply processing the lines as they are read from the file. It is just a way of doing the same thing without using an array for intermediate storage:
#!/bin/bash
set -f                                    # inhibit globbing (filename expansion)
declare -i cnt=0                          # simple counter
# read each line in file
while read -r line_entries || [ -n "$line_entries" ]; do
    for i in $line_entries; do            # for each word in line (no quotes, for word splitting)
        if [ "$cnt" -gt 0 ]; then         # test counter > 0
            # if last char is ':', then output newline before word
            if [ "${i:(-1):1}" = ':' ]; then
                printf "\n%s" "$i"
            else
                printf " %s" "$i"
            fi
        else
            printf "%s" "$i"              # if first word, just print
        fi
        ((cnt++))                         # increment counter
    done
done <"$1"
printf "\n"
An awk way
awk -v ORS="" '
    {for (i = 1; i <= NF; i++) if ($i ~ /:$/ && x++) $i = "\n" $i}
    $NF = $NF " "
    END {print "\n"}
' file
Sets the output record separator to nothing.
Loops through the fields.
If a field's last character is :, it adds a newline before that field (the x++ guard skips the very first timestamp, so the output doesn't start with a blank line).
Adds a space to the last field of each input line, so that a line ending in a timestamp isn't glued to the field that follows it.
Prints a newline at the end.
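For completeness, since the question also allows sed: the same split can be sketched by joining the file into one stream with tr and then breaking before each timestamp with GNU sed (the regex and the GNU-only \n in the replacement are my assumptions, not part of the answers above):

tr '\n' ' ' < filename.txt | sed -E 's/ +([0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]+:)/\n\1/g'

The result may end with a trailing space and no final newline; the bash versions above handle those edges more carefully.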
DESCRIPTION: I currently need to pass the "realpath" of each file hit found into a text file named "RESULTS.txt" that can be accessed by the user after the script runs.
ISSUE: The syntax I'm using to write into the text file is failing. When I go to access the file, it is blank. The following is the current code being used.
while read -r filename
do
    filecount=$((filecount+1))
    tput rc    # return cursor to previously saved terminal line (tput sc)
    # print filename (1st line of output); if shorter than previous filename we need to erase rest of line
    filename="${filename%$'\r'}"
    printf "file: ${filename}${erase}\n"
    realpath "$filename" > RESULTS.txt
    # print our status bar (2nd line of output) on the first and every ${modcount}th pass through the loop
    if [ ${filecount} -eq 1 ]
    then
        printf "[${barhash}${barspace}]\n"
    elif [[ $((filecount % ${modcount})) -eq 0 ]]
    then
        # for every ${modcount}th file we ...
        barspace=${barspace:1:100000}         # strip a space from barspace
        barhash="${barhash}#"                 # add a '#' to barhash
        printf "[${barhash}${barspace}]\n"    # print our new status bar
    fi
done < <(find "$dir_choice" -type f | sort -V)
...Pretty sure the mistake I am making is silly and is a matter of syntax and placement. Not sure if I should be using realpath "$filename" >> RESULTS.txt instead of realpath "$filename" > RESULTS.txt.
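That suspicion is right: with >, RESULTS.txt is truncated and rewritten on every pass through the loop, so at best only the last path survives. Either switch to >>, or, cleaner, open the file once for the whole loop. A minimal sketch of the second option (the progress-bar printing is omitted here; in the full script it would have to go to stderr so it doesn't end up in the file):

while read -r filename; do
    filename="${filename%$'\r'}"
    realpath "$filename"    # one resolved path per line, captured by the loop's redirection
done < <(find "$dir_choice" -type f | sort -V) > RESULTS.txt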
Write a shell script to count the number of lines, characters, and words in a file (without the use of external commands). Also delete every occurrence of the word “Linux” from the file wherever it appears and save the result in a new file.
This is the nearest I could get without using any third party packages...
#!/bin/bash
filename="$1"    # file to inspect
count=0
while read -r line
do
    count=$((count + 1))
done < "$filename"
echo "Number of lines: $count"
Sachin Bharadwaj gave a script that counts the lines.
Now, to count the words, we can use set to split the line into $# positional parameters.
And to count the characters, we can use the parameter length: ${#line}.
Finally, to delete every “Linux”, we can use pattern substitution: ${line//Linux}.
(Cf. Shell Parameter Expansion.)
All taken together:
while read -r line
do
    ((++count))
    set -- $line                    # word-split the line into positional parameters
    ((wordcount += $#))
    ((charcount += ${#line} + 1))   # +1 for the '\n' stripped by read
    echo "${line//Linux}"
done < "$filename" > anewfile
echo "Number of lines: $count"
echo "Number of words: $wordcount"
echo "Number of chars: $charcount"
I need to find strings matching some regexp pattern and represent the search result as an array for iterating through with a loop. Do I need to use sed? In general I want to replace some strings, but analyse them before replacing.
Using sed and diff:
sed -i.bak 's/this/that/' input
diff input input.bak
GNU sed will create a backup file before substitutions, and diff will show you those changes. However, if you are not using GNU sed:
mv input input.bak
sed 's/this/that/' input.bak > input
diff input input.bak
Another method, testing each line in the shell itself:
pattern="/X"
subst=that
while IFS='' read -r line; do
    if [[ $line = *"$pattern"* ]]; then
        echo "changing line: $line" 1>&2
        echo "${line//$pattern/$subst}"
    else
        echo "$line"
    fi
done < input > output
The best way to do this would be to use grep to get the lines, and populate an array with the result using newline as the internal field separator:
#!/bin/bash
# get just the desired lines
results=$(grep "mypattern" mysourcefile.txt)
# change the internal field separator to be a newline
IFS=$'\n'
# populate an array from the result lines
lines=($results)
# return the third result
echo "${lines[2]}"
You could build a loop to iterate through the results of the array, but a more traditional and simple solution would just be to use bash's iteration:

for line in "${lines[@]}"; do
    echo "$line"
done
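As a side note, on bash 4+ the IFS juggling can be avoided entirely with mapfile, which reads the grep output straight into an array, one line per element (a sketch using the same hypothetical pattern and file names):

mapfile -t lines < <(grep "mypattern" mysourcefile.txt)
for line in "${lines[@]}"; do
    echo "$line"
done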
FYI: Here is a similar concept I created for fun. I thought it would be good to show how to loop over a file like this. This is a script where I look at a Linux sudoers file and check that each line contains one of the valid words in my valid_words array list. Of course it ignores comment ("#") and blank ("") lines with sed. In this example, we would probably want to print only the invalid lines, but this script prints both.
#!/bin/bash
# -- Inspect a sudoers file, look for valid and invalid lines.
file="${1}"
declare -a valid_words=( _Alias = Defaults includedir )
actual_lines=$(cat "${file}" | wc -l)
functional_lines=$(cat "${file}" | sed '/^\s*#/d;/^\s*$/d' | wc -l)
while read -r line; do
    # -- set the line to nothing "" if it has a comment or is an empty line
    line="$(echo "${line}" | sed '/^\s*#/d;/^\s*$/d')"
    # -- if not set to nothing "", check whether the line matches our list of valid words
    if ! [[ -z "$line" ]]; then
        unset found
        for each in "${valid_words[@]}"; do
            found="$(echo "$line" | egrep -i "$each")"
            [[ -z "$found" ]] || break
        done
        [[ -z "$found" ]] && { echo "Invalid=$line"; sleep 3; } || echo "Valid=$found"
    fi
done < "${file}"
echo "actual lines: $actual_lines functional lines: $functional_lines"
I'm writing my first Bash script. I have some experience with C and C#, so I think the logic of the program is correct; it's just that the syntax is so complicated, because apparently there are many different ways to write the same thing!
Here is the script. It simply checks whether the argument (string) is contained in a certain file. If so, it stores each line of the file in an array and writes an item of the array to a file. I'm sure there must be easier ways to achieve that, but I want to get some practice with bash loops.
#!/bin/bash
NOME=$1
c=0
#IF NAME IS FOUND IN THE PHONEBOOK THEN STORE EACH LINE OF THE FILE INTO ARRAY
#ONCE THE ARRAY IS DONE GET THE INDEX OF MATCHING NAME AND RETURN ARRAY[INDEX+1]
if grep "$NOME" /root/phonebook.txt ; then
    echo "CREATING ARRAY"
    while read line
    do
        myArray[$c]=$line    # store line
        c=$(expr $c + 1)     # increase counter by 1
    done < /root/phonebook.txt
else
    echo "Name not found"
fi
c=0
for i in myArray;
do
    if myArray[$i]="$NOME" ; then
        echo ${myArray[i+1]} >> /root/numbertocall.txt
    fi
done
This code returns only the second item of myArray (myArray[2]), or the second line of the file. Why?
The first part (where you build the array) looks ok, but the second part has a couple of serious errors:
for i in myArray; -- this executes the loop once, with $i set to "myArray". In this case, you want $i to iterate over the indexes of myArray, so you need to use
for i in "${!myArray[#]}"
or
for ((i=0; i<${#myArray[@]}; i++))
(although I generally prefer the first, since it'll work with noncontiguous and associative arrays).
Also, you don't need the ; unless do is on the same line (in shell, ; is mostly equivalent to a line break so having a semicolon at the end of a line is redundant).
if myArray[$i]="$NOME" ; then -- the if statement takes a command, and will therefore treat myArray[$i]="$NOME" as an assignment command, which is not at all what you wanted. In order to compare strings, you could use the test command or its synonym [
if [ "${myArray[i]}" = "$NOME" ]; then
or a bash conditional expression
if [[ "${myArray[i]}" = "$NOME" ]]; then
The two are very similar, but the conditional expression has much cleaner syntax (e.g. in a test command, > redirects output, while \> is a string comparison; in [[ ]] a plain > is a comparison).
In either case, you need to use an appropriate $ expression for myArray, or it'll be interpreted as a literal. On the other hand, you don't need a $ before the i in "${myArray[i]}" because it's in a numeric expression context and therefore will be expanded automatically.
Finally, note that the spaces between elements are absolutely required -- in shell, spaces are very important delimiters, not just there for readability like they usually are in C.
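A quick demonstration of the bracket difference (both lines print yes, since "b" sorts after "a"):

[ "b" \> "a" ] && echo yes     # inside [ ], > must be escaped, or it becomes a redirection to a file named a
[[ "b" > "a" ]] && echo yes    # inside [[ ]], a plain > is a lexicographic comparison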
1.- This is what you wrote, with small adjustments:
#!/bin/bash
NOME=$1
#IF NAME IS FOUND IN THE PHONE-BOOK **THEN** READ THE PHONE BOOK LINES INTO AN ARRAY VARIABLE
#ONCE THE ARRAY IS COMPLETED, GET THE INDEX OF THE MATCHING LINE AND RETURN ARRAY[INDEX+1]
c=0
if grep "$NOME" /root/phonebook.txt ; then
    echo "CREATING ARRAY...."
    while IFS= read -r line    # IFS= in case you want to preserve leading and trailing spaces
    do
        myArray[c]=$line       # put line in the array
        c=$((c+1))             # increase counter by 1
    done < /root/phonebook.txt
    for i in "${!myArray[@]}"; do
        if [[ "${myArray[i]}" = "$NOME" ]] ; then
            echo "${myArray[i+1]}" >> /root/numbertocall.txt
        fi
    done
else
    echo "Name not found"
fi
2.- But you can also read the file into the array and stop looping early, like this:
#!/bin/bash
NOME=$1
if grep "$NOME" /root/phonebook.txt ; then
    echo "CREATING ARRAY...."
    readarray -t myArray < /root/phonebook.txt    # -t strips the trailing newline from each line
    for i in "${!myArray[@]}"; do
        if [[ "${myArray[i]}" = "$NOME" ]] ; then
            echo "${myArray[i+1]}" >> /root/numbertocall.txt
            break    # stop looping
        fi
    done
else
    echo "Name not found"
fi
exit 0
3.- The following improves things. Supposing that a) $NOME matches the whole line that contains it, and b) there is always one line after a found $NOME, this will work; if not (if $NOME can be the last line in the phone book), then you need to make small adjustments.
#!/bin/bash
PHONEBOOK="/root/phonebook.txt"
NUMBERTOCALL="/root/numbertocall.txt"
NOME="$1"
myline=""
myline=$(grep -A1 "$NOME" "$PHONEBOOK" | sed '1d')    # the line after the matching line
if [ -z "$myline" ]; then
    echo "Name not found :-("
else
    echo -n "$NOME FOUND.... "
    echo "$myline" >> "$NUMBERTOCALL"
    echo " .... AND SAVED! :-)"
fi
exit 0
I'm looping over a series of large files with a shell script:
i=0
while read line
do
    # get first char of line
    first=`echo "$line" | head -c 1`
    # make output filename
    name="$first"
    if [ "$first" = "," ]; then
        name='comma'
    fi
    if [ "$first" = "." ]; then
        name='period'
    fi
    # save line to new file
    echo "$line" >> "$2/$name.txt"
    # show live counter and inc
    echo -en "\rLines:\t$i"
    ((i++))
done <$file
The first character in each line will either be alphanumeric, or one of the above defined characters (which is why I'm renaming them for use in the output file name).
It's way too slow.
5,000 lines takes 128 seconds.
At this rate I've got a solid month of processing.
Will awk be faster here?
If so, how do I fit the logic into awk?
This can certainly be done more efficiently in bash.
To give you an example: echo foo | head does a fork() call, creates a subshell, sets up a pipeline, starts the external head program... and there's no reason for it at all.
If you want the first character of a line, without any inefficient mucking with subprocesses, it's as simple as this:
c=${line:0:1}
I would also seriously consider sorting your input, so that you only re-open the output file when a new first character is seen, rather than on every pass through the loop.
That is, preprocess with sort (e.g. by replacing <$file with < <(sort "$file")) and do the following each time through the loop, reopening the output file only conditionally:
if [[ $name != "$current_name" ]]; then
    current_name="$name"
    exec 4>>"$2/$name"    # open the output file on FD 4
fi
...and then append to the open file descriptor:
printf '%s\n' "$line" >&4
(not using echo because it can behave undesirably if your line is, say, -e or -n).
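To see why (a tiny demonstration):

line='-n'
echo "$line"             # bash's echo prints nothing here: -n is parsed as an option
printf '%s\n' "$line"    # prints -n literally, as intended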
Alternately, if the number of possible output files is small, you can just open them all on different FDs up-front (substituting other, higher numbers where I chose 4), and conditionally output to one of those pre-opened files. Opening and closing files is expensive -- each close() forces a flush to disk -- so this should be a substantial help.
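A minimal sketch of that pre-opening idea, assuming (as in the question) $file is the input and $2 the output directory, and simplifying to just comma, period, and an "everything else" bucket:

exec 4>>"$2/comma.txt" 5>>"$2/period.txt" 6>>"$2/other.txt"    # open each file once, up front
while IFS= read -r line; do
    case ${line:0:1} in
        ,) printf '%s\n' "$line" >&4 ;;
        .) printf '%s\n' "$line" >&5 ;;
        *) printf '%s\n' "$line" >&6 ;;
    esac
done < "$file"
exec 4>&- 5>&- 6>&-    # close the descriptors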
A few things to speed it up:
Don't use echo/head to get the first character. You're spawning at least two additional processes per line. Instead, use bash's parameter expansion facilities to get the first character.
Use if-elif to avoid checking $first against all the possibilities each time. Even better, if you are using bash 4.0 or later, use an associative array to store the output file names, rather than checking against $first in a big if-statement for each line.
If you don't have a version of bash that supports associative arrays, replace your if statements with the following:
if [[ "$first" = "," ]]; then
name='comma'
elif [[ "$first" = "." ]]; then
name='period'
else
name="$first"
fi
But the following is suggested. Note the use of $REPLY, the default variable used by read when no name is given (just FYI).
declare -A output
output[","]=comma
output["."]=period
output["?"]=question_mark
output["!"]=exclamation_mark
output["-"]=hyphen
output["'"]=apostrophe
i=0
while read -r
do
    # get first char of line
    first=${REPLY:0:1}
    # make output filename, falling back to the character itself
    name=${output[$first]:-$first}
    # save line to new file
    echo "$REPLY" >> "$name.txt"
    # show live counter and inc
    echo -en "\r$i"
    ((i++))
done < "$file"
#!/usr/bin/awk -f
BEGIN {
    punctlist = ", . ? ! - '"
    pnamelist = "comma period question_mark exclamation_mark hyphen apostrophe"
    pcount = split(punctlist, puncts)
    ncount = split(pnamelist, pnames)
    if (pcount != ncount) {
        print "error: counts don't match, pcount:", pcount, "ncount:", ncount
        exit
    }
    for (i = 1; i <= pcount; i++) {
        punct_lookup[puncts[i]] = pnames[i]
    }
}
{
    # look up the output name; alphanumeric first characters map to themselves
    first = substr($0, 1, 1)
    name = (first in punct_lookup) ? punct_lookup[first] : first
    print > (name ".txt")
    printf "\r%6d", NR
}
END {
    printf "\n"
}
The BEGIN block builds an associative array so you can do punct_lookup[","] and get "comma".
The main block simply does the lookups for the filenames and outputs the line to the file. In AWK, > truncates the file the first time and appends subsequently. If you have existing files that you don't want truncated, then change it to >> (but don't use >> otherwise).
Yet another take:
declare -i i=0
declare -A names
while read -r line; do
    first=${line:0:1}
    if [[ -z ${names[$first]} ]]; then
        case $first in
            ,) names[$first]="$2/comma.txt" ;;
            .) names[$first]="$2/period.txt" ;;
            *) names[$first]="$2/$first.txt" ;;
        esac
    fi
    printf "%s\n" "$line" >> "${names[$first]}"
    printf "\rLine %d" "$((++i))"
done < "$file"
and
awk -v dir="$2" '
{
    first = substr($0, 1, 1)
    if (!(first in names)) {
        if (first == ",")      names[first] = dir "/comma.txt"
        else if (first == ".") names[first] = dir "/period.txt"
        else                   names[first] = dir "/" first ".txt"
    }
    print > names[first]
    printf("\rLine %d", NR)
}
' "$file"