How to remove the last CR char with `cut`

I would like to get a portion of a string using cut. Here is a dummy example:
$ echo "foobar" | cut -c1-3 | hexdump -C
00000000 66 6f 6f 0a |foo.|
00000004
Notice the \n char added at the end.
In that case there is no point in using cut to remove the last char, as follows:
echo "foobar" | cut -c1-3 | rev | cut -c 1- | rev
I would still get this extra, unwanted char, and I would like to avoid using an extra command such as:
shasum file | cut -c1-16 | perl -pe chomp

The \n is added by echo. Instead, use printf:
$ echo "foobar" | od -c
0000000 f o o b a r \n
0000007
$ printf "foobar" | od -c
0000000 f o o b a r
0000006
Interestingly, cut itself also adds a newline:
$ printf "foobar" | cut -b1-3 | od -c
0000000 f o o \n
0000004
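So the solution seems to be to print cut's output with printf, since command substitution strips the trailing newline and printf "%s" does not add one back:
$ printf "%s" "$(cut -b1-3 <<< "foobar")" | od -c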
0000000 f o o
0000003
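The same idea can be wrapped in small functions. A minimal sketch, assuming bash; the names first_chars and first_bytes are hypothetical. The second variant uses head -c instead of cut: it copies exactly N bytes and appends nothing, so it replaces cut rather than adding a command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first_chars N STRING prints the first N characters
# of STRING with no trailing newline. cut still appends its newline, but
# command substitution $(...) strips trailing newlines, and printf '%s'
# does not add one back.
first_chars() {
  printf '%s' "$(printf '%s\n' "$2" | cut -c1-"$1")"
}

# Hypothetical alternative without cut: head -c copies the first N bytes
# and appends nothing.
first_bytes() {
  printf '%s' "$2" | head -c "$1"
}

first_chars 3 foobar | od -c
first_bytes 3 foobar | od -c
```

For the shasum case in the question, the same approach would be `printf '%s' "$(shasum file | cut -c1-16)"` or simply `shasum file | head -c 16`. Note that head -c counts bytes, which only matters for multibyte text (shasum output is plain hex).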

Related

How to replace all characters A and c in the input with Z and e respectively

without using sed or awk
I tried this command to solve the problem:
tr Ac Ze
but this command doesn't work.
Can anyone help, please?
You can use the sed command:
g: apply the replacement to all matches of the regexp, not just the first.
s: stands for substitute.
$ -> echo Aca | sed 's/A/Z/g; s/c/e/g'
Zea
Or just use the tr command, as @James said:
$ -> echo Aca | tr Ac Ze
Zea
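tr takes no file operands: run on its own, `tr Ac Ze` just waits for text on standard input, which may be why it seemed not to work. A minimal sketch, assuming bash (for the here-string); swap_ac is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap_ac replaces every A with Z and every c with e.
# tr maps SET1 to SET2 position by position (A->Z, c->e). It reads only
# standard input, so the text has to arrive through a pipe or redirect.
swap_ac() {
  printf '%s\n' "$1" | tr Ac Ze
}

swap_ac "Aca"          # Zea
tr Ac Ze <<< "cAbAc"   # eZbZe
```

To translate a file, redirect it into tr (`tr Ac Ze < input.txt`) rather than passing the file name as an argument.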
Another example:
#!/bin/bash
read -r -p "Insert word: " word
echo "$word" | tr Ac Ze
Result:
Insert word: Aca
Zea
Or:
#!/bin/bash
read -r -p "Insert word: " word
echo "$word" | sed 's/A/Z/g; s/c/e/g'
Additional info:
tr
$ -> whatis tr
tr (1) - translate or delete characters
sed
$ -> whatis sed
sed (1) - stream editor for filtering and transforming text

Equivalent of head/tail command to show head/tail of a line

What is the equivalent of the head/tail command to show the head/tail of a line?
head2 -2 "abcdefghijklmnopqrstuvwxyz"
=> ab
tail2 -2 "abcdefghijklmnopqrstuvwxyz"
=> yz
If you want the first/last characters of the whole stream, the equivalents are head -c and tail -c:
$ head -c2 <<<"abcdefghijklmnopqrstuvwxyz"
ab<will not output a newline>
$ tail -c3 <<<"abcdefghijklmnopqrstuvwxyz"
yz<newline>
head will not output a newline, as it outputs only the first two characters. tail counts the newline added by the here-string as a character, so we need to ask for 3 to get the last two. Reformatting the commands to take arguments as in your example is trivial, and I leave that to the OP.
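The argument-taking version left to the OP could look like this. A minimal sketch, assuming bash; head2 and tail2 are the hypothetical names from the question. Feeding the string with printf '%s' avoids the extra newline, so tail -c needs no +1 correction. Both -c options count bytes, so multibyte characters could be split.

```shell
#!/usr/bin/env bash
# Hypothetical head2/tail2 with the calling convention from the question:
#   head2 -N STRING -> first N bytes, tail2 -N STRING -> last N bytes.
# printf '%s' feeds STRING without a trailing newline; the final echo
# only terminates the output line.
head2() {
  local n=${1#-}   # strip the leading dash: "-2" -> "2"
  printf '%s' "$2" | head -c "$n"
  echo
}

tail2() {
  local n=${1#-}
  printf '%s' "$2" | tail -c "$n"
  echo
}

head2 -2 "abcdefghijklmnopqrstuvwxyz"   # ab
tail2 -2 "abcdefghijklmnopqrstuvwxyz"   # yz
```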
You can use cut if you want the first characters of each line:
$ cut -c-2 <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line'
ab
se
and use the rev | cut | rev idiom to get the last characters:
$ rev <<<"abcdefghijklmnopqrstuvwxyz"$'\n''second line' | cut -c-2 | rev
yz
ne
cut is not limited to short ranges: cut -c-12, for example, outputs the first 12 characters of each line.
You could use cut
https://linux.die.net/man/1/cut
But you don't need it, as bash has substring extraction:
txt="abcef"
echo "head: ${txt:0:2}"
echo "tail: ${txt: -2}"
https://www.tldp.org/LDP/abs/html/string-manipulation.html
You can directly use bash substring extraction syntax, no need to use external commands:
$ input="abcdefghijklmnopqrstuvwxyz"; echo ${input: -2}
yz
$ input="abcdefghijklmnopqrstuvwxyz"; echo ${input:0:2}
ab
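The same substring expansion also works line by line, without cut or rev. A minimal sketch, assuming bash; edges is a hypothetical helper. One detail worth handling: ${line: -n} expands to an empty string when n is larger than the line, so the start offset is clamped at 0 instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: edges N prints the first and last N characters of
# every input line, using only bash substring expansion (no cut/rev).
edges() {
  local n=$1 line start
  # "|| [[ -n $line ]]" also handles a final line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # ${line: -n} would be empty for lines shorter than n, so clamp at 0.
    start=$(( ${#line} > n ? ${#line} - n : 0 ))
    printf '%s %s\n' "${line:0:n}" "${line:start}"
  done
}

printf '%s\n' "abcdefghijklmnopqrstuvwxyz" "second line" | edges 2
# ab yz
# se ne
```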
With sed:
echo abcdefghijklmnopqrstuvwxyz | sed -E 's/^(..).*/\1/'; echo abcdefghijklmnopqrstuvwxyz | sed -E 's/^.*(..)$/\1/';
ab
yz
You can also use the cut command:
Head:
echo "Hello! This is a test string." | cut -c1-2
Yields: He
For tail you basically do the same thing, but you reverse the string first, cut it, and reverse it again.
Tail:
echo "Hello! This is a test string." | rev | cut -c1-2 | rev
Yields: g.
2 is the number of characters to print.

Using tr to trim newlines from command-line argument ignored

I have a shell script that needs to trim newline from input. I am trying to trim new line like so:
param=$1
trimmed_param=$(echo $param | tr -d "\n")
# is the new line in my trimmed_param? yes
echo $trimmed_param| od -xc
# if i just run the tr -d on the data, it's trimmed.
# why is it not trimmed in the dynamic execution of echo in line 2
echo $param| tr -d "\n" |od -xc
I run it from command line as follows:
sh test.sh someword
And I get this output:
0000000 6f73 656d 6f77 6472 000a
s o m e w o r d \n
0000011
0000000 6f73 656d 6f77 6472
s o m e w o r d
0000010
The last command in the script echoes what I would think trimmed_param would be if the tr -d "\n" had worked in line 2. What am I missing?
I realize I can use sed etc., but I would love to understand why this method is failing.
There has never been a newline in the param. The echo in line 2 appends one, and tr -d does strip it, so trimmed_param holds just "someword"; it's the next echo, in echo $trimmed_param | od -xc, that appends a newline again. Try
# script.sh
param=$1
printf "%s" "${param}" | od -xc
Then
bash script.sh foo
gives you
0000000 6f66 006f
f o o
0000003
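To see where the newline comes from, compare printing the same variable with printf and with echo. A minimal sketch, assuming bash; the literal someword stands in for $1 from the question's script.

```shell
#!/usr/bin/env bash
# Demonstration: tr -d '\n' did remove the newline; echo adds a new one.
param="someword"   # stands in for $1 in the question's script
trimmed_param=$(printf '%s\n' "$param" | tr -d '\n')

printf '%s' "$trimmed_param" | od -c   # 8 bytes, no \n
echo "$trimmed_param" | od -c          # 9 bytes: echo appended \n

# $(...) already strips trailing newlines, so tr is not even needed here:
plain=$(echo "$param")
[ "$plain" = "$trimmed_param" ] && echo "same value"
```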

Read output of dd into a shell script variable

Being very new to shell scripts, I have pieced together the following to search /dev/sdd1, sector by sector, to find a string. How do I get the sector data into the $HAYSTACK variable?
#!/bin/bash
HAYSTACK=""
START_SEARCH=$1
NEEDLE=$2
START_SECTOR=2048
END_SECTOR=226512895+1
SECTOR_NUMBER=$((START_SEARCH + START_SECTOR))
while [ $SECTOR_NUMBER -lt $END_SECTOR ]; do
$HAYSTACK=`dd if=/dev/sdd1 skip=$SECTOR_NUMBER count=1 bs=512`
if [[ "$HAYSTACK" =~ "$NEEDLE" ]]; then
echo "Match found at sector $SECTOR_NUMBER"
break
fi
let SECTOR_NUMBER=SECTOR_NUMBER+1
done
Update
The intention is not to make a perfect script to handle fragmented file scenarios (I doubt that is possible at all).
In my case not being able to distinguish strings with nulls is also a non-issue.
If you could expand the pipe suggestions into an answer it would be more than enough. Thanks!
Background
I have managed to wipe my www folder and have been trying to recover as many of my source files as possible. I have used Scalpel to recover my php and html files, but the version I could get working on my Ubuntu 16.04 is version 1.60, which does not support regex in headers/footers, so I cannot make a good pattern for css, js, and json files.
I remember fairly rare strings to search for and find my files, but have no idea where in a block the string could be. The solution I came up with is this shell script to read blocks from the partition and look for the substring and if a match is found print out the LSB number and exit.
If the searched-for item is a text string, consider using the -t
option of the strings command to print the offset of where the
string is found. Since strings doesn't care where the data is
from, it works on files, block devices, and piped input from dd.
Example from the start of a hard disk:
sudo strings -t d /dev/sda | head -5
Output:
165 ZRr=
286 `|f
295 \|f1
392 GRUB
398 Geom
Instead of head that could be piped to grep -m 1 GRUB, which
would output only the first line with "GRUB":
sudo strings -t d /dev/sda | grep -m 1 GRUB
Output:
392 GRUB
From there, bash can do quite a lot. This code finds the first 5
instances of "GRUB" on my boot partition /dev/sda7:
s=GRUB ; sudo strings -t d /dev/sda7 | grep "$s" |
while read -r a b ; do
# n: text before the needle; o: absolute byte offset of the needle
n=${b%%"$s"*}
o=$(( a + ${#n} ))
printf "String %-10.10s found %3i bytes into sector %i\n" \
"\"${b#"$n"}\"" $(( o % 512 )) $(( o / 512 + 1 ))
done | head -5
Output (the sector numbers here are relative to the start of the
partition):
String "GRUB Boot found 7 bytes into sector 17074
String "GRUB." found 36 bytes into sector 25703
String "GRUB." found 317 bytes into sector 25873
String "GRUBLAYO" found 269 bytes into sector 25972
String "GRUB" found 392 bytes into sector 26457
Things to watch out for:
Don't do dd-based single-block searches with strings, as they would fail if the string spanned two blocks. Use strings to get
the offset first, then convert that offset to blocks (or
sectors).
strings -t d can return big strings, and the "needle" might be several bytes into a string, in which case the offset would be the
start of the big string, rather than the grep string (or
"needle"). The above bash code allows for that and uses the $n
to calculate a corrected offset.
A lazy all-in-one method uses the rafind2 utility (part of radare2). For example, search for the
first instance of "GRUB" on /dev/sda7 as before:
sudo rafind2 -Xs GRUB /dev/sda7 | head -7
Output:
0x856207
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00856207 4752 5542 2042 6f6f 7420 4d65 6e75 006e GRUB Boot Menu.n
0x00856217 6f20 666f 6e74 206c 6f61 6465 6400 6963 o font loaded.ic
0x00856227 6f6e 732f 0069 636f 6e64 6972 0025 733a ons/.icondir.%s:
0x00856237 2564 3a25 6420 6578 7072 6573 7369 6f6e %d:%d expression
0x00856247 2065 7870 6563 7465 6420 696e 2074 expected in t
With some bash and sed that output can be reworked into the same
format as the strings output:
s=GRUB ; sudo rafind2 -Xs "$s" /dev/sda7 |
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" |
sed -r -n 'h;n;n;s/.{52}//;H;n;n;n;n;g;s/\n//p' |
while read -r a b ; do
printf "String %-10.10s\" found %3i bytes into sector %i\n" \
"\"${b}" $((a%512)) $((a/512 + 1))
done | head -5
The first sed instance is borrowed from jfs' answer to "Program
that passes STDIN to STDOUT with color codes stripped?", since
the rafind2 outputs non-text color codes.
Output:
String "GRUB Boot" found 7 bytes into sector 17074
String "GRUB....L" found 36 bytes into sector 25703
String "GRUB...LI" found 317 bytes into sector 25873
String "GRUBLAYO." found 269 bytes into sector 25972
String "GRUB .Geo" found 392 bytes into sector 26457
Have you thought about something like this?
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g > v2
cmp -lb v1 v2
For example, applying this to a .pdf file
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g > v2
cmp -l v1 v2
gives the output
228 106 F 170 x
23525 106 F 170 x
37737 106 F 170 x
48787 106 F 170 x
52577 106 F 170 x
56833 106 F 170 x
57869 106 F 170 x
118322 106 F 170 x
119342 106 F 170 x
where the numbers in the first column are the offsets at which the pattern being sought starts. These offsets are multiplied by four, since od -c uses four characters for every byte.
A single-line form (in a bash shell), which avoids writing large temporary files, would be:
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g | cmp -lb - <(od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g )
This avoids having to write the contents of /dev/sdd1 to temporary files.
Here is an example looking for PDF on a USB drive device, dividing by 4 and 512 to get block numbers:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/P D F/x D F/'g ) | awk '{print int($1/512/4)}' | head -10
Testing this gives:
100000+0 records in
100000+0 records out
51200000 bytes transferred in 18.784280 secs (2725683 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 40.915697 secs (1251353 bytes/sec)
cmp: EOF on -
28913
32370
32425
33885
35097
35224
37177
38522
39981
41570
where the numbers are 512-byte block numbers. Checking one of them gives:
dd if=/dev/disk5s1 bs=512 skip=35224 count=1 | od -vc | grep P
0000340 \0 \0 \0 001 P D F C A R O \0 \0 \0 \0
Here is what a full example looks like on a disk, looking for the character sequence live where the characters are separated by NULs:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/x \\0 i \\0 v \\0 e/'g | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/l \\0 i \\0 v \\0 e/'g )
Note
This would not deal with fragmentation into non-consecutive blocks that splits the pattern. The second sed, which does the pattern substitution, could be replaced by a custom program that does partial pattern matching and makes a substitution if the number of matching characters is above some threshold. That might return false positives, but it is probably the only way to deal with fragmentation.
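For the literal question (getting one sector into $HAYSTACK), the main bug in the script is the assignment `$HAYSTACK=...`, which must be written `HAYSTACK=...` with no leading $. A minimal sketch of the per-sector loop, assuming bash; find_sector and its arguments are hypothetical. As noted above, it misses a string that spans two sectors and is much slower than the strings approach. Bash drops NUL bytes from command substitution (recent versions also warn), so they are stripped with tr first; that can join text that was separated by NULs.

```shell
#!/usr/bin/env bash
# Sketch only: scan a device or image one 512-byte sector at a time and
# report the first sector whose contents contain NEEDLE.
# Usage (hypothetical): find_sector DEVICE NEEDLE FIRST_SECTOR END_SECTOR
find_sector() {
  local device=$1 needle=$2 first=$3 end=$4 sector haystack
  for (( sector = first; sector < end; sector++ )); do
    # No leading $ on the assigned name; strip NULs before assigning.
    haystack=$(dd if="$device" bs=512 skip="$sector" count=1 2>/dev/null | tr -d '\0')
    if [[ $haystack == *"$needle"* ]]; then
      echo "Match found at sector $sector"
      return 0
    fi
  done
  return 1
}
```

A call might look like `find_sector /dev/sdd1 "$NEEDLE" 2048 $((226512895 + 1))`. Note that the question's `END_SECTOR=226512895+1` assigns the literal text `226512895+1`, which `[ ... -lt ... ]` rejects as not an integer; arithmetic expansion is needed.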

Does awk CR LF handling break on cygwin?

On Linux, this runs as expected:
$ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
awk: line1
awk: line2
But under Windows the \r is dropped (awk treats this as one line):
Windows:
$ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
awk: line1
line2
Windows GNU Awk 4.0.1
Linux GNU Awk 3.1.8
EDIT from @EdMorton (sorry if this is an unwanted addition, but I think it may help demonstrate the issue):
Consider this RS setting and input (on cygwin):
$ awk 'BEGIN{printf "\"%s\"\n", RS}' | cat -v
"
"
$ echo -e "line1\r\nline2" | cat -v
line1^M
line2
This is Solaris with gawk:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1^M
line2
and this is cygwin with gawk:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1
line2
RS was just its default newline, so where did the control-M go in Cygwin?
I just checked with Arnold Robbins (the maintainer of gawk), and the answer is that it's something done by the C libraries; to stop it from happening, set the awk BINMODE variable to 3:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1
line2
$ echo -e "line1\r\nline2" | awk -v BINMODE=3 '1' | cat -v
line1^M
line2
See the man page for more info if interested.
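Putting that together with the original goal of one record per CRLF line, here is a minimal sketch, assuming bash; the function names are hypothetical. BINMODE is a gawk variable (other awks just treat it as an ordinary user variable), and a multi-character regex RS is a gawk/mawk extension, so the second function strips a trailing CR with the default RS instead, which should work in any awk.

```shell
#!/usr/bin/env bash
# Sketch: two ways to split CRLF input into one record per line.

# 1) gawk: BINMODE=3 turns off the CRLF->LF translation (per the answer
#    above), and a regex RS matches either a CRLF or a bare LF ending.
crlf_records_binmode() {
  awk -v BINMODE=3 -v RS='\r?\n' '/^line/ {print "awk: " $0}'
}

# 2) Any awk: keep the default RS and strip a trailing CR from each record.
crlf_records_strip() {
  awk '{ sub(/\r$/, "") } /^line/ {print "awk: " $0}'
}

printf 'line1\r\nline2\r\n' | crlf_records_strip
```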
It seems like the issue is awk-specific under Cygwin.
I tried a few different things, and it seems that awk is silently replacing \r\n with \n in the input data.
If we simply ask awk to repeat the text unmodified, it will "sanitize" the carriage returns without asking:
$ echo -e "line1\r\nline2" | od -a
0000000 l i n e 1 cr nl l i n e 2 nl
0000015
$ echo -e "line1\r\nline2" | awk '{ print $0; }' | od -a
0000000 l i n e 1 nl l i n e 2 nl
0000014
It will, however, leave other carriage returns intact:
$ echo -e "Test\rTesting\r\nTester\rTested" | awk '{ print $0; }' | od -a
0000000 T e s t cr T e s t i n g nl T e s
0000020 t e r cr T e s t e d nl
0000033
Using a custom record separator of _ ended up leaving the carriage returns intact:
$ echo -e "Testing\r_Tested" | awk -v RS="_" '{ print $0; }' | od -a
0000000 T e s t i n g cr nl T e s t e d nl
0000020 nl
0000021
The most telling example involves having \r\n in the data, but not as a record separator:
$ echo -e "Testing\r\nTested_Hello_World" | awk -v RS="_" '{ print $0; }' | od -a
0000000 T e s t i n g nl T e s t e d nl H
0000020 e l l o nl W o r l d nl nl
0000034
awk is blindly converting \r\n to \n in the input data even though we didn't ask it to.
This substitution seems to be happening before applying record separation, which explains why RS="\r\n" never matches anything. By the time awk is looking for \r\n, it's already substituted it with \n in the input data.

Resources