Add a line feed every 2391 bytes - Linux

I am using Red Hat Linux 6.
I have a file that comes from a mainframe (MVS) and has gone through EBCDIC-to-ASCII conversion.
(But I suspect some of the conversion may be wrong.)
Anyway, I know that the record length is 2391 bytes. There are 10 records and the file size is 23910 bytes.
Within each 2391-byte record there are many 0a or 0d characters (not CRLF pairs). I want to replace them with, say, # and #.
Also, I want to add an LF (i.e. 0a) after every 2391 bytes so that the file becomes a normal Unix text file for further processing.
I have tried
dd ibs=2391 obs=2391 if=emyfile of=myfile.new
But this does not work. Both files are the same.
I also tried
dd ibs=2391 obs=2391 if=myfile | awk '{print $0}'
But this does not work either.
Can anyone help with this?

Something like this:
#!/bin/bash
for i in {0..9}; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
echo
done > newfile
If your files are longer, you will need more than 10 iterations. I would handle that by running an infinite loop and exiting the loop on error, like this:
#!/bin/bash
i=0
while :; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
[ ${PIPESTATUS[0]} -ne 0 ] && break
echo
((i++))
done > newfile
However, on my iMac under OS X, dd doesn't seem to exit with an error when you read past the end of the file, so maybe try your luck on your OS.

You could try
$ dd bs=2391 cbs=2391 conv=ascii,unblock if=emyfile of=myfile.new
conv=ascii converts from EBCDIC to ASCII. conv=unblock inserts a newline at the end of each cbs-sized block (after removing trailing spaces).
If you already have a file in ASCII and just want to replace some characters in it before splitting the blocks, you could use tr(1). For example, the following will replace each carriage return with '#' and each newline (linefeed) with '#':
$ tr '\r\n' '##' < emyfile | dd bs=2391 cbs=2391 conv=unblock of=myfile.new
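As a variation on the tr pipeline above, here is a minimal sketch that swaps dd for fold (this is not what either answer uses). Because conv=unblock removes trailing spaces, the output lines can end up shorter than the record length; fold -b simply breaks the stream every N bytes and keeps everything. The function name add_record_newlines is hypothetical, and the sketch assumes the input is already ASCII.

```shell
# Hypothetical helper: reads a stream of fixed-length records on stdin,
# replaces every CR and LF with '#', and writes one record per line.
# $1 is the record length in bytes (2391 in the question).
add_record_newlines() {
  local reclen="$1"
  # After tr there are no newlines left, so fold -b breaks the stream
  # exactly every $reclen bytes.
  LC_ALL=C tr '\r\n' '##' | LC_ALL=C fold -b -w "$reclen"
  # fold does not terminate the final record, so add the last LF here.
  echo
}
```

Usage would be along the lines of add_record_newlines 2391 < emyfile > myfile.new.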

Related

How to generate a random string without repeated characters?

I'm trying to generate a password in Bash that meets the macOS password requirements, one of which is that it can't have repeated characters (aa, bb, 44, 00, etc.).
I know I can use openssl rand -base64 or /dev/urandom and use tr -d to manipulate the output string. I use grep -E '(.)\1{1,}' to search for repeated characters, but if I use this regex to delete (tr -d '(.)\1{1,}'), it deletes the entire string. I even tried tr -s '(.)\1{1,}' to squeeze the characters to just one occurrence, but it keeps generating repeated characters in some attempts. Is it possible to achieve what I'm trying to do?
P.S.: this is a situation where I can't download any "password generator tool" such as pwgen. It must be "native".
Sorry, I have no Bash at hand, but I'm trying to help.
What about iteratively grabbing the unique chars, e.g. by
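# openssl rand needs the number of random bytes as an argument
chars=$(openssl rand -base64 32)
pwd=
for (( i=0; i<${#chars}; i++ )); do
  # append the character only if it is not already in the password
  if [[ $pwd != *"${chars:$i:1}"* ]]; then
    pwd=$pwd${chars:$i:1}
  fi
done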
The issue might be that you have non-printable characters, so it's not actually repeated. If you first get the first e.g. 30 characters, then delete any non-alphanumeric, non punctuation characters, then squeeze any of those characters, then from whatever is left get the first 20 characters, it seems to work:
LC_ALL=C tr -dc '[:alnum:][:punct:]' < /dev/urandom | fold -w "${1:-30}" | head -n 1 | tr -s '[:alnum:][:punct:]' | cut -c-20
Output e.g.:
]'Zc,fs6m;wUo%wLIG%K
2O3Ff4dzi30~L.RH8jR0
sU?,WkK]&I;z'|eTSLjY
5gK]\H51i#Rtux.{bdC=
:g"\?5JsjBd1r])2^WR+
;{cR:jY\rIc&Q(2yo:|-
fFykmxvZ|ATX_l6L(8h:
^Sd*,V%9}bWnTYNv"w?'
6foMgbU6:n<*cWj2W=3&
*v39FWmB#LwE5O`a3C36
Is there a specific size requirement? Other required characters?
How about -
openssl rand -base64 20 | sed -E 's/(.)\1+/\1/g'
You're getting close. tr doesn't use regex (but does use POSIX character classes).
Either of these will squeeze repeats:
tr -cs '\0'
tr -s '[:graph:][:space:]'
They differ only in how we refer to "all characters". The first is "complement of NUL"; the second is all printable and all whitespace characters. There may be a neater way to specify "all characters".
Or using sed:
sed -E 's/(.)\1+/\1/g'
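To see that the tr and sed forms above behave the same on ordinary text, they can be tried on a fixed string rather than on random data. This is only a small sketch; the wrapper names squeeze_with_tr and squeeze_with_sed are hypothetical.

```shell
# Hypothetical wrappers around the two squeezing commands shown above.
# Both read stdin and collapse runs of a repeated character into one.
squeeze_with_tr() {
  LC_ALL=C tr -s '[:graph:][:space:]'
}

squeeze_with_sed() {
  sed -E 's/(.)\1+/\1/g'
}

# Example: both print "ab4c0 d" for this input.
printf 'aab44c00  d\n' | squeeze_with_tr
printf 'aab44c00  d\n' | squeeze_with_sed
```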
This both squeezes printable characters, and removes white space:
tr -ds '[:space:]' '[:graph:]'
Example for 32 non whitespace characters, with no repeats:
tr -ds '[:space:]' '[:graph:]' < /dev/urandom |
dd bs=32 count=1
Also, this example specifies a list of allowed characters (letters, digits, and _.), then squeezes any repeats:
tr -dc '[:alnum:]_.' < /dev/urandom |
tr -sc '\0' |
dd bs=32 count=1
Example output:
9mCEqrhHPwmq7.1qEky6qn4jqzDpRK7b
Putting dd at the end means we get 32 characters after removing repeats. You may also want to add status=none to hide dd logging on stderr.
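An alternative to squeezing is to generate a candidate and retry until it happens to contain no adjacent repeats. The sketch below takes that approach (it is a different technique from the tr -s pipelines above, not a restatement of them). The name gen_password and the factor of 16 used to size the random chunk are assumptions made for illustration; the allowed character set is the one from the example above.

```shell
# Hypothetical helper: prints a password of $1 characters (default 20)
# drawn from letters, digits, '_' and '.', with no two adjacent
# characters equal. Candidates with a repeat are discarded and redrawn.
gen_password() {
  local len="${1:-20}" pw
  while :; do
    # Read a finite chunk of random bytes, keep only allowed characters,
    # and cut the result down to the requested length.
    pw=$(head -c "$((len * 16))" /dev/urandom |
         LC_ALL=C tr -dc '[:alnum:]_.' |
         cut -c "1-$len")
    # Too few allowed characters in this chunk: try again.
    [ "${#pw}" -eq "$len" ] || continue
    # Accept only if no character is immediately followed by itself.
    if ! printf '%s\n' "$pw" | grep -q '\(.\)\1'; then
      printf '%s\n' "$pw"
      return 0
    fi
  done
}
```

Calling gen_password 20 prints one 20-character password. Reading a bounded chunk with head -c keeps tr from running against an endless stream.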
It's not clear whether you want no consecutive repeated characters or no repeated characters at all (in either case I don't think it's a good idea, as it would make your passwords weaker and easier to guess). Having said that:
#! /bin/bash
awk -vN=20 '
{
n=split($0,ch,"");
for (i=1; i<n; i++) {
a[ch[i]]++
}
n=0;
for (c in a) {
if (++n > N) {
break;
}
printf("%c",c)
}
printf("\n")
}
' < <(openssl rand -base64 32)
This generates passwords of length N without repeated characters from 32 random bytes (so N should be much smaller than 32).

How to split a delimited string to compose a dd command in bash?

I would like to read a config file that should look similar to what is shown below:
source/path:blocksize,offset,seek,count
source/path2:blocksize,offset,seek
source/path3:blocksize,offset
Where source/path, source/path2 and source/path3 are paths to some binary files, and offset, seek, count and blocksize are the respective values for the dd command.
Note that the fields may vary: some entries may omit trailing values such as count, or both seek and count.
How should I split the above lines to compose dd commands like these?
dd if=${source/path} bs=${blocksize} seek=${seek} count=${count}
dd if=${source/path} bs=${blocksize} seek=${seek}
dd if=${source/path} bs=${blocksize}
It is OK if the above format needs to be modified to make it easier to parse, because I have run out of all the possibilities my naive mind can think of.
Hope this helps:
$ cat <<EOF | while read line; do arr=($(sed 's/[,:]/ /g' <<< $line)); echo "source:${arr[0]} block:${arr[1]} offset:${arr[2]} seek:${arr[3]} count:${arr[4]}"; done
source/path:blocksize,offset,seek,count
source/path2:blocksize,offset,seek
source/path3:blocksize,offset
EOF
source:source/path block:blocksize offset:offset seek:seek count:count
source:source/path2 block:blocksize offset:offset seek:seek count:
source:source/path3 block:blocksize offset:offset seek: count:
General Idea:
#!/usr/bin/env bash
your_command | while read line; do
arr=($(sed 's/[,:]/ /g' <<< $line));
echo "source:${arr[0]} block:${arr[1]} offset:${arr[2]} seek:${arr[3]} count:${arr[4]}"
# Do whatever processing & validation you want here
# access from array : ${arr[0]}....${arr[n]}
#
done
If the input comes from a file:
#!/usr/bin/env bash
while read line; do
arr=($(sed 's/[,:]/ /g' <<< $line));
echo "source:${arr[0]} block:${arr[1]} offset:${arr[2]} seek:${arr[3]} count:${arr[4]}"
# Do whatever processing & validation you want here
# access from array : ${arr[0]}....${arr[n]}
#
done < "path/to/your-file"
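The loops above stop at echoing the parsed fields. A minimal sketch of the remaining step, composing the dd command line, might look like the following. It lets read split on both delimiters through IFS=':,' and appends an operand only when its field is present. The name compose_dd is hypothetical, and mapping offset to skip= is an assumption, since the question does not say which dd operand offset corresponds to. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical helper: reads lines of the form
#   path:blocksize,offset,seek,count
# on stdin (trailing fields may be missing) and prints one dd command
# per line. ASSUMPTION: "offset" is mapped to dd's skip= operand.
compose_dd() {
  local src bs offset seek count
  # The "|| [ -n "$src" ]" part handles a last line without a newline.
  while IFS=':,' read -r src bs offset seek count || [ -n "$src" ]; do
    # Skip empty lines.
    [ -n "$src" ] || continue
    # ${var:+text} expands to text only when var is non-empty.
    printf '%s\n' "dd if=$src bs=$bs${offset:+ skip=$offset}${seek:+ seek=$seek}${count:+ count=$count}"
  done
}
```

For example, compose_dd < "path/to/your-file" prints the commands. Paths that themselves contain ':' or ',' would need a different delimiter.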

how to "decdump" a string in bash?

I need to convert a string into a sequence of decimal ASCII codes using a bash command.
example:
for the string 'abc' the desired output would be 979899 where a=97, b=98 and c=99 in ascii decimal code.
I was able to achieve this with ascii hex code using xxd.
printf '%s' 'abc' | xxd -p
which gives me the result: 616263
where a=61, b=62 and c=63 in ascii hexadecimal code.
Is there an equivalent to xxd that gives the result in ascii decimal code instead of ascii hex code?
If you don't mind the results being merged into one line, please try the following:
echo -n "abc" | xxd -p -c 1 |
while read -r line; do
echo -n "$(( 16#$line ))"
done
Result:
979899
str=abc
printf '%s' "$str" | od -An -tu1
The -An gets rid of the address line, which od normally outputs, and the -tu1 treats each input byte as unsigned integer. Note that it assumes that one character is one byte, so it won't work with Unicode, JIS or the like.
If you really don't want spaces in the result, pipe it further into tr -d ' '.
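Putting the od command and the tr step together, a small sketch could look like this. The function name decdump is hypothetical (borrowed from the question title). It deletes all whitespace rather than only spaces, because od wraps its output over several lines for longer input, and like the answer above it works byte by byte.

```shell
# Hypothetical helper: prints the decimal byte values of its first
# argument, concatenated without separators.
decdump() {
  # -A n : no address column, -t u1 : unsigned 1-byte decimals,
  # -v   : do not collapse repeated lines into '*'.
  printf '%s' "$1" | od -A n -v -t u1 | tr -d '[:space:]'
  # tr removed od's newline as well, so end the line explicitly.
  echo
}

decdump 'abc'   # prints 979899
```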
Unicode Solution
What makes this problem annoying is that you have to pipeline characters when converting from hex to decimal. So you can't do a simple conversion from char to hex to dec, as some characters' hex representations are longer than others.
Both of these solutions are compatible with Unicode and use a character's code point. In both solutions, a newline is chosen as the separator for clarity; change this to '' for no separator.
Bash
sep='\n'
charAry=($(printf 'abc🎶' | grep -o .))
for i in "${charAry[@]}"; do
printf "%d$sep" "'$i"
done && echo
97
98
99
127926
Python (in Bash)
Here, we use a list comprehension to convert every character to a decimal number (ord), join it as a string and print it. sys.stdin.read() allows us to use Python inline to get input from a pipe. If you replace input with your intended string, this solution is then cross-platform.
printf '%s' 'abc🎶' | python3 -c "
import sys
input = sys.stdin.read()
sep = '\n'
print(sep.join([str(ord(i)) for i in input]))"
97
98
99
127926
Edit: If all you care about is using hex regardless of encoding, use @user1934428's answer

remove end of line characters with a bash script?

I'm trying to write a script to remove the characters (\r\n) that Windows adds, BUT ONLY if they are between quotes ( " ). Why?
Because the dump file contains these characters, I don't know why.
And why between quotes? Because they only affect me when they chop up my result.
For example: "this","is","a","result","from","database"
The problem:
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to @Cyrus's answer I got something like this,
but it fails with bad flag in substitute command '}'. I'm on Mac OS X.
Can you help me?
Thanks
OS X uses a different sed than the one that's typically installed in Linux.
The big differences are that sequences like \r and \n don't get expanded or used as part of the expression as you might expect, and you tend to need to separate commands with semicolons a little more.
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000   F   o   r       E   x   a   m   p   l   e   .       "   t   h
0000020   i   s   "   ,   "   i   s   "   ,   "   a   "   ,   "   r   e
0000040   s   u   l   t   "   ,   "   f   r   o   m   "   ,   "   d   a
0000060   t   a  \r  \n   b   a   s   e   "  \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers, one per line
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 015 (i.e. octal for a carriage return),
read n - read the next number (the newline that follows the CR),
&& continue - and continue to the next octal number (thus skipping both the CR and that newline)
printf "\\$n"; done - otherwise print the byte for the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
while IFS= read -r line; do
  if [[ $line =~ .*\".*$'\r'$ ]]; then
    # line contains a quote and ends in CR: drop the CR and the newline
    printf '%s' "${line%$'\r'}"
  else
    printf '%s\n' "$line"
  fi
done < input.txt
This walks through the text, and if it sees a line that contains a quote and ends in a CR, it prints everything up to and not including the CR, with no trailing newline. All other lines are printed as usual. The result is that lines that had a carriage return are joined to the next line, and other lines are not.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
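Since the requirement was to join lines only when the break falls between quotes, one way to express that outside sed is to count quotes in awk: a line is joined with the next one only while the accumulated text contains an odd number of double quotes and ends in a CR. This is a sketch under the assumption that CRLF is the only kind of break that occurs inside quoted fields; join_quoted_crlf is a hypothetical name.

```shell
# Hypothetical helper: reads text on stdin and joins a line with the
# following one when it ends in CR while a double quote is still open.
# ASSUMPTION: breaks inside quoted fields are always CR LF.
join_quoted_crlf() {
  awk '{
    buf = buf $0                 # accumulate the current logical line
    tmp = buf
    nq = gsub(/"/, "", tmp)      # number of double quotes seen so far
    if (nq % 2 == 1 && buf ~ /\r$/) {
      sub(/\r$/, "", buf)        # open quote: drop the CR and keep joining
      next
    }
    print buf                    # quotes balanced: emit the line as is
    buf = ""
  }
  END { if (buf != "") print buf }'
}
```

With the sample data, join_quoted_crlf < input.txt prints the record on a single line; a CR at the end of a line whose quotes are balanced is left untouched.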

Read characters from a text file using bash

Does anyone know how I can read the first two characters from a file using a bash script? The file in question is actually an I/O driver; it has no newline characters in it and is in effect infinitely long.
The read builtin supports the -n parameter:
$ echo "Two chars" | while read -n 2 i; do echo $i; done
Tw
o
ch
ar
s
$ cat /proc/your_driver | (read -n 2 i; echo $i;)
I think
dd if=your_file ibs=2 count=1 will do the trick.
Looking at it with strace shows that it does indeed perform a two-byte read from the file.
Here is an example reading from /dev/zero, piped into hd to display the zeros:
dd if=/dev/zero bs=2 count=1 | hd
1+0 records in
1+0 records out
2 bytes (2 B) copied, 2.8497e-05 s, 70.2 kB/s
00000000 00 00 |..|
00000002
echo "Two chars" | sed 's/../&\n/g'
G'day,
Why not use od to get the slice that you need?
od --read-bytes=2 my_driver
Edit: You can't use head for this as the head command prints to stdout. If the first two chars are not printable, you don't see anything.
The od command has several options to format the bytes as you want.
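The dd and od suggestions above can be combined so that the first bytes are always shown in a visible form, printable or not. The following is a minimal sketch; first_bytes_hex is a hypothetical name. It uses bs=1 with count=N rather than a single N-byte block, so that a short read from a device cannot return fewer bytes than requested.

```shell
# Hypothetical helper: prints the first $1 bytes of file $2 as hex digits.
first_bytes_hex() {
  local n="$1" file="$2"
  # bs=1 count=N reads exactly N single bytes; dd's statistics on
  # stderr are discarded.
  dd if="$file" bs=1 count="$n" 2>/dev/null |
    od -A n -v -t x1 |
    tr -d '[:space:]'
  echo
}

first_bytes_hex 2 /dev/zero   # prints 0000
```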
