Reading from file bash Linux

I am having a hard time with the following bash script:
Basically, the script receives a directory and then searches all of the folders in that directory for files that end with .log. After that it should print to stdout all the lines from those files, sorted by the date they were written.
my script is this:
#!/bin/bash
find . -name ".*log" | cat *.log | sort --stable --reverse --key=2,3
When I run the script it does return the list, but the sort doesn't work properly. My guess is that this is because some files contain \n characters, which start a new line.
Is there a way to ignore the \n characters that are in the file while still having each line printed on a new line?
thank you!
xxd command output:
ise#ise-virtual-machine:~$ xxd /home/ise/Downloads/f1.log
00000000: 3230 3139 2d30 382d 3232 5431 333a 3333 2019-08-22T13:33
00000010: 3a34 342e 3132 3334 3536 3738 3920 4865 :44.123456789 He
00000020: 6c6c 6f0a 576f 726c 640a 0032 3032 302d llo.World..2020-
00000030: 3031 2d30 3154 3131 3a32 323a 3333 2e31 01-01T11:22:33.1
00000040: 3233 3435 3637 3839 206c 6174 650a 23456789 late.
ise#ise-virtual-machine:~$ xxd /home/ise/Downloads/f2.log
00000000: 3230 3139 2d30 392d 3434 5431 333a 3434 2019-09-44T13:44
00000010: 3a32 312e 3938 3736 3534 3332 3120 5369 :21.987654321 Si
00000020: 6d70 6c65 206c 696e 650a mple line.
ise#ise-virtual-machine:~$ xxd /home/ise/Downloads/f3.log
00000000: 3230 3139 2d30 382d 3232 5431 333a 3333 2019-08-22T13:33
00000010: 3a34 342e 3132 3334 3536 3738 3920 4865 :44.123456789 He
00000020: 6c6c 6f0a 576f 726c 6420 320a 0032 3032 llo.World 2..202
00000030: 302d 3031 2d30 3154 3131 3a32 323a 3333 0-01-01T11:22:33
00000040: 2e31 3233 3435 3637 3839 206c 6174 6520 .123456789 late
00000050: 320a 2.

Given that the entries in the log file are terminated with \0 (NUL), find, sed and sort can be combined:
find . -name '*.log' | xargs sed -z 's/\n//g' | sort -z --key=2,3 --reverse

Assuming each record in the file starts with the date, the --key=2,3 option is not necessary; please try:
find . -name "*.log" -exec cat '{}' \; | sort -z | xargs -I{} -0 echo "{}"
The final xargs .. echo .. command is necessary to print the null-terminated lines properly.
If you still require the --key option, please modify the code as you like. I don't know what the lines look like at the moment.
[UPDATE]
According to the information provided by the OP, I assume the format of the log files is:
Each record starts with the date in "yyyy-mm-ddTHH:MM:SS.nanosec" format
and a simple dictionary order sort can be applied.
Each record ends with "\n\0" except for the last record of the file
which ends just with "\n".
Each record may contain newline character(s) in the middle, as part
of the record, for line-folding purposes.
Then how about:
find . -name "*.log" -type f -exec cat "{}" \; -exec echo -ne "\0" \; | sort -z
echo -ne "\0" appends a null character to the last record of a file.
Otherwise that record would be merged with the first record of the next file.
The -z option to sort treats the null character as a record separator.
No other option to sort will be required so far.
Result with the posted input by the OP:
2019-08-22T13:33:44.123456789 Hello
World
2019-08-22T13:33:44.123456789 Hello
World 2
2019-09-44T13:44:21.987654321 Simple line
2020-01-01T11:22:33.123456789 late
2020-01-01T11:22:33.123456789 late 2
It still keeps the null character "\0" at the end of each record.
If you want to trim it off, please add the tr -d "\0" command
at the end of the pipeline as:
find . -name "*.log" -type f -exec cat "{}" \; -exec echo -ne "\0" \; | sort -z | tr -d "\0"
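For anyone who wants to try this without the OP's disk, here is a self-contained sketch. The file contents follow the xxd dumps in the question; the scratch directory and the use of printf instead of echo -ne are my own choices:

```shell
#!/bin/sh
# Recreate two of the question's log files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

# f1.log: two records separated by "\n\0"; the first record contains
# an embedded newline ("Hello\nWorld").
{
    printf '2019-08-22T13:33:44.123456789 Hello\nWorld\n'
    printf '\0'
    printf '2020-01-01T11:22:33.123456789 late\n'
} > f1.log
printf '2019-09-44T13:44:21.987654321 Simple line\n' > f2.log

# Append a NUL after each file so every record is NUL-terminated,
# sort the NUL-separated records, then strip the NULs for display.
find . -name "*.log" -type f -exec cat "{}" \; -exec printf '\0' \; |
    sort -z | tr -d '\0'
```

printf '\0' is used instead of echo -ne "\0" because echo's -n/-e flags are not portable across shells; with GNU sort, -z makes NUL the record separator exactly as described above.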
Hope this helps.

ffmpeg messes up variables [duplicate]

I am trying to split audio files by their chapters. I have downloaded this as audio with yt-dlp with its chapters on. I have tried this very simple script to do the job:
#!/bin/sh
ffmpeg -loglevel 0 -i "$1" -f ffmetadata meta # take the metadata and output it to the file meta
cat meta | grep "END" | awk -F"=" '{print $2}' | awk -F"007000000" '{print $1}' > ends #
cat meta | grep "title=" | awk -F"=" '{print $2}' | cut -c4- > titles
from="0"
count=1
while IFS= read -r to; do
title=$(head -$count titles | tail -1)
ffmpeg -loglevel 0 -i "$1" -ss $from -to $to -c copy "$title".webm
echo $from $to
count=$(( $count+1 ))
from=$to
done < ends
You see that I echo out $from and $to because I noticed they are just wrong. Why is this? When I comment out the ffmpeg command in the while loop, the variables $from and $to turn out to be correct, but when it is uncommented they just become some stupid numbers.
Commented output:
0 465
465 770
770 890
890 1208
1208 1554
1554 1793
1793 2249
2249 2681
2681 2952
2952 3493
3493 3797
3797 3998
3998 4246
4246 4585
4585 5235
5235 5375
5375 5796
5796 6368
6368 6696
6696 6961
Uncommented output:
0 465
465 70
70 890
890 08
08 1554
1554 3
3 2249
2249
2952
2952 3493
3493
3998
3998 4246
4246 5235
5235 796
796 6368
6368
I tried lots of other stuff thinking it might be the problem, but nothing changed. One thing I remember is that I tried having $from and $to in the form %H:%M:%S, which, again, gave the same result.
Thanks in advance.
Here is an untested refactoring; hopefully it can at least help steer you in another direction.
Avoid temporary files.
Avoid reading the second input file repeatedly inside the loop.
Refactor the complex Awk scripts into a single script.
To be on the safe side, add a redirection from /dev/null to prevent ffmpeg from eating the input data.
#!/bin/sh
from=0
ffmpeg -loglevel 0 -i "$1" -f ffmetadata - |
awk -F '=' '/END/ { s=$2; sub(/007000000.*/, "", s); end[++i] = s }
/title=/ { title[++j] = substr($2, 4) }
END { for(n=1; n<=i; n++) { print end[n]; print title[n] } }' |
while IFS="" read -r end; do
IFS="" read -r title
ffmpeg -loglevel 0 -i "$1" -ss "$from" -to "$end" -c copy "$title".webm </dev/null
from="$end"
done
The Awk script reads all the data into memory, and then prints one "end" marker followed by the corresponding title on the next line; I can't be sure what your ffmpeg -f ffmetadata command outputs, so I just blindly refactored what your scripts seemed to be doing. If the output is somewhat structured you can probably read one record at a time.
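The underlying problem can be reproduced without ffmpeg at all: any command inside the loop that reads standard input will swallow the rest of the loop's input. Here is a minimal sketch using head as a stand-in for ffmpeg (the file name nums is made up for the demo):

```shell
#!/bin/sh
printf '1\n2\n3\n' > nums

echo "without the redirection:"
while IFS= read -r n; do
    echo "got $n"
    head -c 99 >/dev/null            # inherits the loop's stdin and eats it, like ffmpeg
done < nums

echo "with the redirection:"
while IFS= read -r n; do
    echo "got $n"
    head -c 99 >/dev/null </dev/null # reads nothing from the loop's stdin
done < nums
```

The first loop prints only "got 1" because head consumes the remaining lines; the second prints all three. This is exactly why the refactoring above redirects ffmpeg's input from /dev/null (ffmpeg also has a -nostdin flag for the same purpose).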

Use unix command inside awk commandline to process fields

Is there a way to use a unix command inside an awk one-liner to do something and output the result on STDOUT?
For example:
ls -lrt|awk '$8 !~ /:/ {system(date -d \"$6" "$7" "$8\" +"%Y%m%d")"|\t"$0}'
You are parsing ls, which can cause several problems.
When you are trying to get your filenames order by last modification with yyyymmdd in front of it, you can look at
# not correct for some filenames
stat --format "%.10y %n" * | tr -d '-' | sort
The solution fails for filenames with -. One way to solve that is using
# Still not ok
stat --format "%.10y %n" * | sed -r 's/^(..)-(..)/\1\2/' | sort
This will fail for filenames with newlines.
touch -d "2019-09-01 12:00:00" "two
lines.txt"
shows some of the problems you can also see with ls.
How you should solve this depends on your exact requirements.
Example
find . -maxdepth 1 ! -name "[.]*" -printf '%TY%Tm%Td ' -print0 |
sed 's#[.]/##g'| tr "\n\0" "/\n" | sort
Explanation:
maxdepth 1 Only look in current directory
! -name "[.]*" Ignore hidden files
-printf '%TY%Tm%Td ' YYYYMMDD and space
-print0 Don't use \n but NULL at the end of each result
sed 's#[.]/##g' Remove the path ./
tr "\n\0" "/\n" Replace newlines in a filename with / and NULLs with newlines
After the sort you might want to tr '/' '\n'.
If you want to have the output of the command, you can make use of getline:
kent$ awk 'BEGIN{"date"|getline;print}'
Fri 15 Nov 2019 10:51:10 AM CET
You can also assign the output to an awk variable:
kent$ awk 'BEGIN{"date"|getline v;print v}'
Fri 15 Nov 2019 10:50:20 AM CET
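One detail worth adding (my own note, not part of the answer above): getline can be called in a loop to read every line of the command's output, and close() should be used when you are done, so the command can be re-run later in the same program:

```shell
# Read several lines of a command's output with getline inside awk.
awk 'BEGIN {
    cmd = "printf \"a\\nb\\nc\\n\""
    while ((cmd | getline line) > 0)
        print "read:", line
    close(cmd)   # without close(), a second "cmd | getline" would keep
                 # reading from the old (exhausted) pipe
}'
```

This prints "read: a", "read: b", "read: c", one per line.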
You are trying to format the date output from ls.
The find command has extensive control over date and time output via its -printf action.
For example:
$ ls -l
-rw-rw-r--. 1 cfrm cfrm 41 Nov 15 09:12 input.txt
-rw-rw-r--. 1 cfrm cfrm 67 Nov 15 09:13 script.awk
$ find . -printf "fileName=%f \t formatedDate-UTC=[%a] \t formatedDate-custom=[%AY-%Am-%Ad]\n"
fileName=. formatedDate-UTC=[Fri Nov 15 09:43:32.0222415982 2019] formatedDate-custom=[2019-11-15]
fileName=input.txt formatedDate-UTC=[Fri Nov 15 09:12:33.0117279463 2019] formatedDate-custom=[2019-11-15]
fileName=script.awk formatedDate-UTC=[Fri Nov 15 09:13:38.0743189896 2019] formatedDate-custom=[2019-11-15]
For sorting by timestamp, we can make the sort key start at the timestamp marker ([ in the following example):
$ find . -printf "%f timestamp=[%AY%Am%Ad:%AT]\n" |sort -t [
22114 timestamp=[20190511:10:32:22.6453184660]
5530 timestamp=[20190506:01:03:01.2225343480]
5764 timestamp=[20190506:01:03:34.7107944450]
.font-unix timestamp=[20191115:13:27:01.8699219890]
hsperfdata_artemis timestamp=[20191115:13:27:01.8699219890]
hsperfdata_cfrm timestamp=[20191115:13:27:01.8709219730]
hsperfdata_elasticsearch timestamp=[20191115:13:27:01.8699219890]
.ICE-unix timestamp=[20191115:13:27:01.8699219890]
input.txt timestamp=[20191115:09:12:33.1172794630]
junk timestamp=[20191115:09:43:32.2224159820]
script.awk timestamp=[20191115:09:13:38.7431898960]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-chronyd.service-AoZvZM timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vgauthd.service-f2m9rt timestamp=[20190516:05:09:51.1884573210]
systemd-private-1a6c51334d6f4723b46fe5ca51b632c6-vmtoolsd.service-0CJ32C timestamp=[20190516:05:09:51.1884573210]
.Test-unix timestamp=[20191115:13:27:01.8699219890]
. timestamp=[20191115:13:26:56.8770048750]
.X11-unix timestamp=[20191115:13:27:01.8699219890]
.XIM-unix timestamp=[20191115:13:27:01.8699219890]
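If the goal is simply a sortable key rather than a human-readable one, GNU find's %T@ (seconds since the epoch) avoids date parsing entirely. A small sketch (the temp directory and file names are invented for the demo):

```shell
# %T@ prints the mtime as seconds since the epoch, which sorts
# numerically with no field tricks (GNU find assumed).
dir=$(mktemp -d)
touch -d '2019-01-01 00:00:00' "$dir/a.txt"
touch -d '2019-01-02 00:00:00' "$dir/b.txt"
touch -d '2019-01-03 00:00:00' "$dir/c.txt"

find "$dir" -maxdepth 1 -type f -printf '%T@ %f\n' | sort -n
```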

Read output of dd into a shell script variable

Being very new to shell scripts, I have pieced together the following to search /dev/sdd1, sector by sector, to find a string. How do I get the sector data into the $HAYSTACK variable?
#!/bin/bash
HAYSTACK=""
START_SEARCH=$1
NEEDLE=$2
START_SECTOR=2048
END_SECTOR=226512895+1
SECTOR_NUMBER=$((START_SEARCH + START_SECTOR))
while [ $SECTOR_NUMBER -lt $END_SECTOR ]; do
$HAYSTACK=`dd if=/dev/sdd1 skip=$SECTOR_NUMBER count=1 bs=512`
if [[ "$HAYSTACK" =~ "$NEEDLE" ]]; then
echo "Match found at sector $SECTOR_NUMBER"
break
fi
let SECTOR_NUMBER=SECTOR_NUMBER+1
done
Update
The intention is not to make a perfect script to handle fragmented file scenarios (I doubt that is possible at all).
In my case, not being able to distinguish strings containing NULs is also a non-issue.
If you could expand the pipe suggestions into an answer it would be more than enough. Thanks!
Background
I have managed to wipe my www folder and have been trying to recover as much of my source files as possible. I have used Scalpel to recover my php and html files. But the version I could get working on my Ubuntu 16.04 is Version 1.60 which does not support regex in header/footer so I cannot make a good pattern for css, js, and json files.
I remember fairly rare strings to search for and find my files, but have no idea where in a block the string could be. The solution I came up with is this shell script to read blocks from the partition and look for the substring and if a match is found print out the LSB number and exit.
If the searched for item is a text string, consider using the -t
option of the strings command to print the offset of where the
string is found. Since strings doesn't care where the data is
from, it works on files, block devices, and piped input from dd.
Example from the start of a hard disk:
sudo strings -t d /dev/sda | head -5
Output:
165 ZRr=
286 `|f
295 \|f1
392 GRUB
398 Geom
Instead of head, the output could be piped to grep -m 1 GRUB, which
outputs only the first line containing "GRUB":
sudo strings -t d /dev/sda | grep -m 1 GRUB
Output:
392 GRUB
From there, bash can do quite a lot. This code finds the first 5
instances of "GRUB" on my boot partition /dev/sda7:
s=GRUB ; sudo strings -t d /dev/sda7 | grep "$s" |
while read a b ; do
n=${b%%${s}*}
printf "String %-10.10s found %3i bytes into sector %i\n" \
"\"${b#${n}}\"" $(( (a % 512) + ${#n} )) $((a/512 + 1))
done | head -5
Output (the sector numbers here are relative to the start of the
partition):
String "GRUB Boot found 7 bytes into sector 17074
String "GRUB." found 548 bytes into sector 25702
String "GRUB." found 317 bytes into sector 25873
String "GRUBLAYO" found 269 bytes into sector 25972
String "GRUB" found 392 bytes into sector 26457
Things to watch out for:
Don't do dd-based single-block searches with strings, as they would fail if the string spanned two blocks. Use strings to get
the offset first, then convert that offset to blocks (or
sectors).
strings -t d can return big strings, and the "needle" might be several bytes into a string, in which case the offset would be the
start of the big string, rather than the grep string (or
"needle"). The above bash code allows for that and uses the $n
to calculate a corrected offset.
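The offset-to-sector arithmetic is easy to sanity-check on an ordinary file instead of a block device. In this sketch the needle is planted at a known offset (the file and the offset are invented for the demo):

```shell
# Plant "NEEDLE" at byte offset 1000 of a scratch file, then convert
# the offset reported by `strings -t d` into a sector and byte-in-sector.
f=$(mktemp)
head -c 1000 /dev/zero > "$f"   # 1000 bytes of padding
printf 'NEEDLE' >> "$f"         # the needle starts at offset 1000

off=$(strings -t d "$f" | awk '/NEEDLE/ {print $1}')
echo "offset=$off sector=$((off / 512 + 1)) byte=$((off % 512))"
```

With the same a/512 + 1 convention as the code above, this reports sector 2, byte 488.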
Lazy all-in-one util rafind2 method. Example, search for the
first instance of "GRUB" on /dev/sda7 as before:
sudo rafind2 -Xs GRUB /dev/sda7 | head -7
Output:
0x856207
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00856207 4752 5542 2042 6f6f 7420 4d65 6e75 006e GRUB Boot Menu.n
0x00856217 6f20 666f 6e74 206c 6f61 6465 6400 6963 o font loaded.ic
0x00856227 6f6e 732f 0069 636f 6e64 6972 0025 733a ons/.icondir.%s:
0x00856237 2564 3a25 6420 6578 7072 6573 7369 6f6e %d:%d expression
0x00856247 2065 7870 6563 7465 6420 696e 2074 expected in t
With some bash and sed that output can be reworked into the same
format as the strings output:
s=GRUB ; sudo rafind2 -Xs "$s" /dev/sda7 |
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" |
sed -r -n 'h;n;n;s/.{52}//;H;n;n;n;n;g;s/\n//p' |
while read a b ; do
printf "String %-10.10s\" found %3i bytes into sector %i\n" \
"\"${b}" $((a%512)) $((a/512 + 1))
done | head -5
The first sed instance is borrowed from jfs' answer to "Program
that passes STDIN to STDOUT with color codes stripped?", since
the rafind2 outputs non-text color codes.
Output:
String "GRUB Boot" found 7 bytes into sector 17074
String "GRUB....L" found 36 bytes into sector 25703
String "GRUB...LI" found 317 bytes into sector 25873
String "GRUBLAYO." found 269 bytes into sector 25972
String "GRUB .Geo" found 392 bytes into sector 26457
Have you thought about something like this?
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/x F l/'g > v2
cmp -lb v1 v2
for example applying this to a .pdf file
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g > v2
cmp -l v1 v2
gives the output
228 106 F 170 x
23525 106 F 170 x
37737 106 F 170 x
48787 106 F 170 x
52577 106 F 170 x
56833 106 F 170 x
57869 106 F 170 x
118322 106 F 170 x
119342 106 F 170 x
where the numbers in the first column are the byte offsets at which the sought pattern starts. These offsets are four times the real byte offsets, because od -cv emits four characters for every input byte; divide by four to get the offset in the original file.
A single line form (in a bash shell), without writing large temporary files, would be
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g | cmp -lb - <(od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g )
this avoids needing to write the contents of /dev/sdd1 to temporary files somewhere.
Here is an example looking for PDF on a USB drive device and dividing by 4 and 512 to get block numbers
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/P D F/x D F/'g ) | awk '{print int($1/512/4)}' | head -10
testing this gives
100000+0 records in
100000+0 records out
51200000 bytes transferred in 18.784280 secs (2725683 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 40.915697 secs (1251353 bytes/sec)
cmp: EOF on -
28913
32370
32425
33885
35097
35224
37177
38522
39981
41570
where numbers are 512 byte block numbers. Checking gives
dd if=/dev/disk5s1 bs=512 skip=35224 count=1 | od -vc | grep P
0000340 \0 \0 \0 001 P D F C A R O \0 \0 \0 \0
Here is what an actual full example looks like on a disk, searching for the character sequence "live" where the characters are separated by NULs:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/x \\0 i \\0 v \\0 e/'g | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/l \\0 i \\0 v \\0 e/'g )
Note
this would not deal with fragmentation into non-consecutive blocks where that splits the pattern. The second sed, which does pattern and substitution, could be replaced by a custom program that does some partial pattern match and makes a substitution if number of matching characters is above some level. That might return false positives, but is probably the only way to deal with fragmentation.

Bash script check if zip file is empty

I am rather inexperienced with Linux.
I have to check in a bash script whether a zip file is empty - i.e. the zip contains no files.
I found this code:
if ! zipinfo ${filetotransfer} | tail -n 1 | grep '^0 ' >/dev/null ; then # we have empty zip file!
echo " zip empty"
rm $filetotransfer
exit 0
fi
But it removes the file whether the zip is empty or not.
Is there any way to check this?
You can check whether the file size is 22 bytes (the size of an empty zip) with stat, compare against the known md5sum of an empty zip, or inspect the zip header:
# by checking file size
% stat -f%z a.zip
22
% xxd a.zip
0000000: 504b 0506 0000 0000 0000 0000 0000 0000 PK..............
0000010: 0000 0000 0000 ......
# with md5sum
$ md5sum a.zip
76cdb2bad9582d23c1f6f4d868218d6c a.zip
# or by checking zip header
% [ `head -n22 a.zip | tr -d '\0-\6'` = "PK" ] && echo 1
1
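For reference, that 22-byte file is nothing but the zip end-of-central-directory record, so it can even be fabricated with printf to sanity-check the size test without the zip tool installed (the scratch file is a throwaway):

```shell
# An empty zip is just the 22-byte end-of-central-directory record:
# the signature "PK\005\006" followed by 18 zero bytes.
f=$(mktemp)
printf 'PK\005\006' > "$f"
head -c 18 /dev/zero >> "$f"

size=$(wc -c < "$f")
[ "$size" -eq 22 ] && echo "looks like an empty zip"
```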
You can check the error status of zipinfo -t
f=test.zip
if zipinfo -t "$f" > /dev/null
then
echo "not empty"
else
echo "empty"
fi

Bash Script - getting input from standard input or a file

I have a bash script that prints columns by name taken from the command line. It works well if I give the script the file as one of the arguments. It does not work well if I pipe input to the script and use /dev/stdin as the file. Does anyone know how I can modify the script to accept standard input from a pipe correctly? Here is my script.
#!/bin/bash
insep=" "
outsep=" "
while [[ ${#} > 0 ]]
do
option="$1"
if [ -f $option ] || [ $option = /dev/stdin ];
then
break;
fi
case $option in
-s|--in_separator)
insep="$2"
shift # past argument
shift # past argument
;;
-o|--out_separator)
outsep="$2"
shift # past argument
shift # past argument
;;
*)
echo "unknown option $option"
exit 1;
;;
esac
done
headers="${#:2}"
grep_headers=$(echo "${headers[@]}" | sed 's/ /|/g')
file=$1
columns=$(awk -F: 'NR==FNR{b[($2)]=tolower($1);next}{print $1,b[$1]}' \
<(head -1 $file | sed "s/$insep/\n/g" | egrep -iwn "$grep_headers" | awk '{s=tolower($0);print s}') \
<(awk -F: -v header="$headers" 'BEGIN {n=split(tolower(header),a," ");for(i=1;i<=n;i++) print a[i]}' $file ) \
| awk '{print "$"$2}' ORS='OFS' | sed "s/OFS\$//")
awk -v insep="$insep" -v outsep="$outsep" "BEGIN{FS=insep;OFS=outsep}{print $columns}" $file
exit;
Sample Input:
col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10
10000 10010 10020 10030 10040 10050 10060 10070 10080 10090
10001 10011 10021 10031 10041 10051 10061 10071 10081 10091
10002 10012 10022 10032 10042 10052 10062 10072 10082 10092
10003 10013 10023 10033 10043 10053 10063 10073 10083 10093
10004 10014 10024 10034 10044 10054 10064 10074 10084 10094
10005 10015 10025 10035 10045 10055 10065 10075 10085 10095
10006 10016 10026 10036 10046 10056 10066 10076 10086 10096
10007 10017 10027 10037 10047 10057 10067 10077 10087 10097
10008 10018 10028 10038 10048 10058 10068 10078 10088 10098
Running with file as an argument (works as expected):
> ./shell_scripts/print_columns.sh file1.txt col_1 col_4 col_6 col_2 | head
col_1 col_4 col_6 col_2
10000 10030 10050 10010
10001 10031 10051 10011
10002 10032 10052 10012
10003 10033 10053 10013
Piping from standard in (does not work as expected):
> head file1.txt | ./shell_scripts/print_columns.sh /dev/stdin col_1 col_4 col_6 col_2 | head
0185 10215 10195
10136 10166 10186 10146
10137 10167 10187 10147
10138 10168 10188 10148
10139 10169 10189 10149
An example:
script.sh:
#!/bin/bash
if [[ -f "$1" ]]; then
file="$1"
cat "$file"
shift
else
while read -r file; do echo "$file"; done
fi
echo "${#}"
Test with:
./script.sh file1.txt abc 123 456
and with UUOC:
cat file1.txt | ./script.sh abc 123 456
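An alternative to testing for /dev/stdin explicitly is the common convention of treating - (or a missing file argument) as standard input, and letting cat normalize both cases into one stream. A sketch (the function name and sample file are invented):

```shell
#!/bin/sh
# "-" (or no argument at all) means standard input; cat already
# understands "-", so the rest of the pipeline never needs to know.
print_first_column() {
    file=${1:--}                    # default to "-" (stdin) if no argument
    cat -- "$file" | awk '{print $1}'
}

printf 'a 1\nb 2\n' > sample.txt
print_first_column sample.txt       # read from a file
printf 'c 3\n' | print_first_column # read from a pipe
```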
