How to make a binary string k bits long

The function asks for the file name, the number of bits k, and the number n of k-bit binary strings to write.
import random

def makeStrings():
    fileName = str(input("File Name: "))
    k = input("Bits Input: ")
    n = int(input("Number of Strings: "))
    outputFile = open(fileName, "w")
    counter = 0
    while (counter < n):
        randomNumber = random.randint(0, 9223372036854775808) # 2**63, roughly the largest signed 64-bit value
        binary = ('{:0' + str(k) + 'b}').format(randomNumber)
        outputFile.write(str(binary) + "\n")
        counter = counter + 1
    outputFile.close()
My issue is that the function mostly works; it does what it is supposed to do, except that it doesn't format the binary to be k bits long. I can get any binary representation from the random numbers, but I only want them to be k bits long. If I set k = 8 it should give me random binary strings that are exactly 8 bits long, and if I set k = 15 it should give me random 15-bit binary strings.
So say for instance my input is
>>> FileName: binarystext.txt #Name of the file
>>> 16 #number of bits for binary string
>>> 2 #number of times the k-bit binary string is written to the file
It should write to the file the following
1111110110101010
0001000101110100
The same format should apply for any bit length: 2, 3, 8, 32, etc.
I thought of maybe enumerating all the numbers in their binary forms, e.g. 0-255 for 8 bits, and formatting those when k = 8, but that would mean doing that a ton of times.
Right now what I get is
1010001100000111111001100010000001000000011010111
or some other really long binary string.

The problem is that the width in the format string is the minimum width of the field:
width is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.
You could just limit the number you're randomizing to fix the issue:
randomNumber = random.randint(0, 2 ** k - 1)
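A minimal sketch of the whole function with that one-line fix applied (note that k must also be converted to int, since the original input() leaves it as a string and 2 ** k needs a number):

import random

def makeStrings():
    fileName = str(input("File Name: "))
    k = int(input("Bits Input: "))  # int, so 2 ** k works
    n = int(input("Number of Strings: "))
    with open(fileName, "w") as outputFile:
        for _ in range(n):
            randomNumber = random.randint(0, 2 ** k - 1)  # fits in k bits
            outputFile.write('{:0{}b}\n'.format(randomNumber, k))  # zero-pad to k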

Use random.getrandbits() and a format string:
>>> import random
>>> random.getrandbits(8)
41
>>>
>>> s = '{{:0{}b}}'
>>> k = 24
>>> s = s.format(k)
>>> s.format(random.getrandbits(k))
'101111111001101111100100'
>>>
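Note that getrandbits(k) can return a number with fewer than k significant bits (leading zeros), which is why the zero-padded format string is still needed. As an illustrative helper:

import random

def random_bit_string(k):
    """Return a random binary string of exactly k characters."""
    return format(random.getrandbits(k), '0{}b'.format(k))

print(random_bit_string(8))  # e.g. '00101001'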

Related

Image compression in python

For my image compression, I am using the Pillow library to get every pixel in RGB (for example: (100, 0, 200)). Using Huffman encoding, I already convert to binary to reduce the number of bits. The compressed file should be consistently smaller than the original, but for now my txt file is larger than the original. What should I do?
And after that, how can I read the file and decompress it? Here is the instruction:
Your code should read in an image file, compute how many bits are required for a fixed length encoding
and then apply a compression algorithm to create a smaller encoding – you need to implement the
compression, you cannot use a compression library. You should output how many bits are required to store the image in your compressed format as well as the compression ratio achieved. When it comes
to saving your compressed image, you won’t be able to save it as a standard image format, since you will
have created your own encoding, but you can save the sequence of bits into a text or binary file.
Your code should also be able to prompt the user for the filename of a text file containing a compressed
sequence of bits and then decompress that file into the original image – you can assume that the file
uses the same compression format as the last file you compressed. So, for example, if you compressed pacificat.bmp into a series of bits stored in pacificat.txt and then the user asked you to decompress alt_encode.txt, you could assume that alt_pacificat.txt used the same compression data structure as encode.txt (it might be a subset of the data from the original image, for example).
There are a number of libraries that can help you store formatted data into a file from Python. If you research the options and find a way to store your compression data structure into a file, such that the user can select both a bit file and a data structure file and use the data structure to decompress the bit file
just use my current image: flag2.bmp
here is my code
from PIL import Image
import sys, string
import copy
import time

codes = {}

def sortFreq(freqs):
    letters = freqs.keys()
    tuples = []
    for let in letters:
        tuples.append((freqs[let], let))
    tuples.sort()
    return tuples

def buildTree(tuples):
    while len(tuples) > 1:
        leastTwo = tuple(tuples[0:2])               # get the 2 to combine
        theRest = tuples[2:]                        # all the others
        combFreq = leastTwo[0][0] + leastTwo[1][0]  # the branch points freq
        tuples = theRest + [(combFreq, leastTwo)]   # add branch point to the end
        tuples.sort()                               # sort it into place
    return tuples[0]                                # Return the single tree inside the list

def trimTree(tree):
    # Trim the freq counters off, leaving just the letters
    p = tree[1]                                     # ignore freq count in [0]
    if type(p) == type(""): return p                # if just a leaf, return it
    else: return (trimTree(p[0]), trimTree(p[1]))   # trim left then right and recombine

def assignCodes(node, pat=''):
    global codes
    if type(node) == type(""):
        codes[node] = pat                # A leaf. set its code
    else:
        assignCodes(node[0], pat + "0")  # Branch point. Do the left branch
        assignCodes(node[1], pat + "1")  # then do the right branch.

start = time.time()

dictionary = {}
table = {}
image = Image.open('flag2.bmp')
#image.show()
width, height = image.size
px = image.load()

totalpixel = width * height
print("Total pixel: " + str(totalpixel))

for x in range(width):
    for y in range(height):
        # print(px[x, y])
        for i in range(3):
            if dictionary.get(str(px[x, y][i])) is None:
                dictionary[str(px[x, y][i])] = 1
            else:
                dictionary[str(px[x, y][i])] = dictionary[str(px[x, y][i])] + 1

table = copy.deepcopy(dictionary)

def encode2(str):
    global codes
    output = ""
    for ch in str: output += codes[ch]
    return output

def decode(tree, str):
    output = ""
    p = tree
    for bit in str:
        if bit == '0': p = p[0]  # Head up the left branch
        else: p = p[1]           # or up the right branch
        if type(p) == type(""):
            output += p          # found a character. Add to output
            p = tree             # and restart for next character
    return output

combination = len(dictionary)
for value in table:
    table[value] = table[value] / (totalpixel * combination) * 100

print(table)
print(dictionary)

sortdic = sortFreq(dictionary)
tree = buildTree(sortdic)
print("tree")
print(tree)
trim = trimTree(tree)
print("trim")
print(trim)
print("assign 01")
assignCodes(trim)
print(codes)

empty_tuple = ()
f = open("answer.txt", "w")

for x in range(width):
    for y in range(height):
        list = []
        list.append(codes[str(px[x, y][0])])
        list.append(codes[str(px[x, y][1])])
        list.append(codes[str(px[x, y][2])])
        print(str(px[x, y]) + ": " + str(list))
        f.write(str(list))

print("decode test:", str(decode(trim, "1100")))

stop = time.time()
times = (stop - start) * 1000
print("Run time takes %d miliseconds" % times)
Code Cleanup
Let's try to refactor your code a little, taking advantage of algorithms provided by the Python standard library, while keeping to the spirit of your approach to Huffman tree calculation and image encoding.
Calculating Symbol Counts
First of all, we can refactor the symbol counting into a function and rewrite it in a more concise way:
Use Image.getdata() to iterate over all the pixels in the image
Since each pixel is represented by a tuple, use itertools.chain.from_iterable to get a flattened view of intensities.
Take advantage of collections.Counter to get the symbol (intensity) counts
Additionally, we can change it to return a list of (symbol, count), sorted in ascending order by (count, symbol). To do so, we can combine it with a rewritten version of your sortFreq(...) function, taking advantage of:
Python sorted(...) function (which allows us to define the key to sort by), together with
Tuple slicing to reverse the (symbol, count) tuples for sorting
Implementation:
from collections import Counter
from itertools import chain

def count_symbols(image):
    pixels = image.getdata()
    values = chain.from_iterable(pixels)
    counts = Counter(values).items()
    return sorted(counts, key=lambda x: x[::-1])
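For a quick sanity check, consider a hypothetical 2x2 image filled with the colour (10, 20, 10) (my example, not part of the original answer); the value 10 occurs eight times and 20 occurs four times:

>>> from PIL import Image
>>> count_symbols(Image.new('RGB', (2, 2), (10, 20, 10)))
[(20, 4), (10, 8)]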
Building the Tree
Only a small change is needed here -- since we already have the symbol counts sorted, we just need to reverse the tuples to let your existing tree-building algorithm work. We can use a list comprehension together with tuple slicing to express this concisely.
Implementation:
def build_tree(counts):
    nodes = [entry[::-1] for entry in counts]       # Reverse each (symbol, count) tuple
    while len(nodes) > 1:
        leastTwo = tuple(nodes[0:2])                # get the 2 to combine
        theRest = nodes[2:]                         # all the others
        combFreq = leastTwo[0][0] + leastTwo[1][0]  # the branch points freq
        nodes = theRest + [(combFreq, leastTwo)]    # add branch point to the end
        nodes.sort()                                # sort it into place
    return nodes[0]                                 # Return the single tree inside the list
Trimming the Tree
Again, just two small changes from your original implementation:
Change the test to check for tuple (node), to be independent of how a symbol is represented.
Get rid of the unnecessary else
Implementation:
def trim_tree(tree):
    p = tree[1]              # Ignore freq count in [0]
    if type(p) is tuple:     # Node, trim left then right and recombine
        return (trim_tree(p[0]), trim_tree(p[1]))
    return p                 # Leaf, just return it
Assigning Codes
The most important change here is to eliminate the reliance on a global codes variable. To resolve it, we can split the implementation into two functions, one which handles the recursive code assignment, and a wrapper which creates a new local codes dictionary, dispatches the recursive function on it, and returns the output.
Let's also switch the representation of codes from strings to lists of bits (integers in range [0,1]) -- the usefulness of this will be apparent later.
Once more, we'll change the test to check for tuples (for same reason as when trimming).
Implementation:
def assign_codes_impl(codes, node, pat):
    if type(node) == tuple:
        assign_codes_impl(codes, node[0], pat + [0])  # Branch point. Do the left branch
        assign_codes_impl(codes, node[1], pat + [1])  # then do the right branch.
    else:
        codes[node] = pat                             # A leaf. set its code

def assign_codes(tree):
    codes = {}
    assign_codes_impl(codes, tree, [])
    return codes
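As a hypothetical illustration (not from the original answer), a trimmed tree ((1, 2), 3) produces the expected prefix codes:

>>> assign_codes(((1, 2), 3))
{1: [0, 0], 2: [0, 1], 3: [1]}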
Encoding
Let's make a small detour, and talk about encoding of the data.
First of all, let's observe that a raw RGB pixel is represented by 3 bytes (one for each colour channel). That's 24 bits per pixel, and forms our baseline.
Now, your current algorithm encodes the first pixel as the following ASCII string:
['000', '0010', '0011']
That's 23 bytes in total (or 184 bits). That's much, much worse than raw. Let's examine why:
There are two spaces, which just make it more readable to a human. Those carry no information. (2 bytes)
Each of the three codes is delimited by two apostrophes. Since the codes only consist of 0s and 1s, the apostrophes are unnecessary for parsing, and thus also carry no information. (6 bytes)
Each of the codes is a prefix code, therefore they can be parsed unambiguously, and thus the two commas used for code separation are also unnecessary. (2 bytes)
We know there are three codes per pixel, so we don't need the braces ([,]) to delimit pixels either (for same reason as above). (2 bytes)
In total, that's 12 bytes per pixel that carry no information at all. The remaining 11 bytes (in this particular case) do carry some information... but how much?
Notice that the only two possible symbols in the output alphabet are 0 and 1. That means that each symbol carries 1 bit of information. Since you store each symbol as ASCII character (a byte), you use 8 bits for each 1 bit of information.
Put together, in this particular case, you used 184 bits to represent 11 bits of information -- ~16.7x more than necessary, and ~7.67x worse than just storing the pixels in raw format.
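To make the arithmetic concrete, here is a quick illustrative check (my snippet, not part of the original answer):

>>> encoded = "['000', '0010', '0011']"
>>> len(encoded)                                  # bytes as ASCII text
23
>>> len(encoded) * 8                              # bits used
184
>>> sum(len(c) for c in ['000', '0010', '0011'])  # bits of information
11
>>> 184 / 11                                      # overhead factor
16.727272727272727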
Obviously, using a naive text representation of the encoded data will not yield any compression. We will need a better approach.
Bitstreams
From our earlier analysis, it becomes evident that in order to perform compression (and decompression) effectively, we need to be able to treat our output (or input) as a stream of individual bits. The standard Python libraries do not provide a direct solution to do this -- at the lowest granularity, we can only read or write a file one byte at a time.
Since we want to encode values that may consist of multiple bits, it's essential to decide how they shall be ordered based on significance. Let's order them from the most significant to the least significant.
Bit I/O Utilities
As mentioned earlier, we shall represent a sequence of bits as a list of integers in range [0,1]. Let's start by writing some simple utility functions:
A function that converts an integer into the shortest sequence of bits that uniquely represents it (i.e. at least 1 bit, but otherwise no leading zeros).
A function that converts a sequence of bits into an integer.
A function that zero-extends (adds zeros to most significant positions) a sequence of bits (to allow fixed-length encoding).
Implementation:
def to_binary_list(n):
    """Convert integer into a list of bits"""
    return [n] if (n <= 1) else to_binary_list(n >> 1) + [n & 1]

def from_binary_list(bits):
    """Convert list of bits into an integer"""
    result = 0
    for bit in bits:
        result = (result << 1) | bit
    return result

def pad_bits(bits, n):
    """Prefix list of bits with enough zeros to reach n digits"""
    assert(n >= len(bits))
    return ([0] * (n - len(bits)) + bits)
Example Usage:
>>> to_binary_list(14)
[1, 1, 1, 0]
>>> from_binary_list([1,1,1,0])
14
>>> pad_bits(to_binary_list(14),8)
[0, 0, 0, 0, 1, 1, 1, 0]
Output Bitstream
Since the file I/O API allows us to save only whole bytes, we need to create a wrapper class that will buffer the bits written into a stream in memory.
Let's provide means to write a single bit, as well as a sequence of bits.
Each write command (of 1 or more bits) will first add the bits into the buffer. Once the buffer contains at least 8 bits, groups of 8 bits are removed from the front, converted to an integer in range [0, 255] and saved to the output file. This is done until the buffer contains fewer than 8 bits.
Finally, let's provide a way to "flush" the stream -- when the buffer is non-empty, but doesn't contain enough bits to make a whole byte, add zeros to the least significant position until there are 8 bits, and then write the byte. We need this when we're closing the bitstream (and there are some other benefits that we'll see later).
Implementation:
class OutputBitStream(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self.file = open(self.file_name, 'wb')
        self.bytes_written = 0
        self.buffer = []

    def write_bit(self, value):
        self.write_bits([value])

    def write_bits(self, values):
        self.buffer += values
        while len(self.buffer) >= 8:
            self._save_byte()

    def flush(self):
        if len(self.buffer) > 0:  # Add trailing zeros to complete a byte and write it
            self.buffer += [0] * (8 - len(self.buffer))
            self._save_byte()
        assert(len(self.buffer) == 0)

    def _save_byte(self):
        bits = self.buffer[:8]
        self.buffer[:] = self.buffer[8:]
        byte_value = from_binary_list(bits)
        self.file.write(bytes([byte_value]))
        self.bytes_written += 1

    def close(self):
        self.flush()
        self.file.close()
Input Bitstream
The input bitstream follows a similar theme. We want to read 1 or more bits at a time. To do so, we load bytes from the file, convert each byte to a list of bits and add it to the buffer, until there are enough to satisfy the read request.
The flush command in this case purges the buffer (asserting it contains only zeros).
Implementation:
class InputBitStream(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self.file = open(self.file_name, 'rb')
        self.bytes_read = 0
        self.buffer = []

    def read_bit(self):
        return self.read_bits(1)[0]

    def read_bits(self, count):
        while len(self.buffer) < count:
            self._load_byte()
        result = self.buffer[:count]
        self.buffer[:] = self.buffer[count:]
        return result

    def flush(self):
        assert(not any(self.buffer))
        self.buffer[:] = []

    def _load_byte(self):
        value = ord(self.file.read(1))
        self.buffer += pad_bits(to_binary_list(value), 8)
        self.bytes_read += 1

    def close(self):
        self.file.close()
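A minimal round trip through both classes (illustrative only; 'demo.bin' is a hypothetical scratch file):

out = OutputBitStream('demo.bin')
out.write_bits([1, 0, 1, 1])
out.close()              # flush pads the byte to 0b10110000

inp = InputBitStream('demo.bin')
print(inp.read_bits(4))  # [1, 0, 1, 1]
inp.flush()              # the remaining padding bits are all zero
inp.close()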
Compressed Format
Next we need to define the format of our compressed bitstream. There are three essential chunks of information that are needed to decode the image:
The shape of the image (height and width), with the assumption that it's a 3-channel RGB image.
Information necessary to reconstruct the Huffman codes on the decode side
Huffman-encoded pixel data
Let's make our compressed format as follows:
Header
Image height (16 bits, unsigned)
Image width (16 bits, unsigned)
Huffman table (beginning aligned to whole byte)
See https://stackoverflow.com/a/759766/3962537 for the algorithm.
Pixel codes (beginning aligned to whole byte)
width * height * 3 Huffman codes in sequence
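To see how these pieces add up, here is a worked size check using the six-symbol table from the test run further below (illustrative arithmetic only):

counts = [(24, 90), (131, 90), (215, 90), (59, 324), (60, 324), (110, 324)]
code_lengths = {215: 3, 24: 4, 131: 4, 59: 2, 60: 2, 110: 2}
header_bits = 2 * 16                  # two 16-bit dimensions
tree_bits = 6 * (1 + 8) + 5           # 6 leaves, 5 inner nodes -> 59 bits
tree_bits += -tree_bits % 8           # byte-aligned -> 64 bits
pixel_bits = sum(c * code_lengths[s] for s, c in counts)  # 2934 bits
pixel_bits += -pixel_bits % 8         # byte-aligned -> 2936 bits
print((header_bits + tree_bits + pixel_bits) // 8)        # 379 bytes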
Compression
Implementation:
from PIL import Image

def compressed_size(counts, codes):
    header_size = 2 * 16  # height and width as 16 bit values

    tree_size = len(counts) * (1 + 8)  # Leafs: 1 bit flag, 8 bit symbol each
    tree_size += len(counts) - 1       # Nodes: 1 bit flag each
    if tree_size % 8 > 0:              # Padding to next full byte
        tree_size += 8 - (tree_size % 8)

    # Sum for each symbol of count * code length
    pixels_size = sum([count * len(codes[symbol]) for symbol, count in counts])
    if pixels_size % 8 > 0:            # Padding to next full byte
        pixels_size += 8 - (pixels_size % 8)

    return (header_size + tree_size + pixels_size) / 8

def encode_header(image, bitstream):
    height_bits = pad_bits(to_binary_list(image.height), 16)
    bitstream.write_bits(height_bits)
    width_bits = pad_bits(to_binary_list(image.width), 16)
    bitstream.write_bits(width_bits)

def encode_tree(tree, bitstream):
    if type(tree) == tuple:  # Node - write 0 and encode children
        bitstream.write_bit(0)
        encode_tree(tree[0], bitstream)
        encode_tree(tree[1], bitstream)
    else:                    # Leaf - write 1, followed by 8 bit symbol
        bitstream.write_bit(1)
        symbol_bits = pad_bits(to_binary_list(tree), 8)
        bitstream.write_bits(symbol_bits)

def encode_pixels(image, codes, bitstream):
    for pixel in image.getdata():
        for value in pixel:
            bitstream.write_bits(codes[value])

def compress_image(in_file_name, out_file_name):
    print('Compressing "%s" -> "%s"' % (in_file_name, out_file_name))
    image = Image.open(in_file_name)
    print('Image shape: (height=%d, width=%d)' % (image.height, image.width))
    size_raw = raw_size(image.height, image.width)
    print('RAW image size: %d bytes' % size_raw)

    counts = count_symbols(image)
    print('Counts: %s' % counts)
    tree = build_tree(counts)
    print('Tree: %s' % str(tree))
    trimmed_tree = trim_tree(tree)
    print('Trimmed tree: %s' % str(trimmed_tree))
    codes = assign_codes(trimmed_tree)
    print('Codes: %s' % codes)

    size_estimate = compressed_size(counts, codes)
    print('Estimated size: %d bytes' % size_estimate)

    print('Writing...')
    stream = OutputBitStream(out_file_name)
    print('* Header offset: %d' % stream.bytes_written)
    encode_header(image, stream)
    stream.flush()  # Ensure next chunk is byte-aligned
    print('* Tree offset: %d' % stream.bytes_written)
    encode_tree(trimmed_tree, stream)
    stream.flush()  # Ensure next chunk is byte-aligned
    print('* Pixel offset: %d' % stream.bytes_written)
    encode_pixels(image, codes, stream)
    stream.close()

    size_real = stream.bytes_written
    print('Wrote %d bytes.' % size_real)
    print('Estimate is %scorrect.' % ('' if size_estimate == size_real else 'in'))
    print('Compression ratio: %0.2f' % (float(size_raw) / size_real))
Decompression
Implementation:
from PIL import Image

def decode_header(bitstream):
    height = from_binary_list(bitstream.read_bits(16))
    width = from_binary_list(bitstream.read_bits(16))
    return (height, width)

# https://stackoverflow.com/a/759766/3962537
def decode_tree(bitstream):
    flag = bitstream.read_bits(1)[0]
    if flag == 1:  # Leaf, read and return symbol
        return from_binary_list(bitstream.read_bits(8))
    left = decode_tree(bitstream)
    right = decode_tree(bitstream)
    return (left, right)

def decode_value(tree, bitstream):
    bit = bitstream.read_bits(1)[0]
    node = tree[bit]
    if type(node) == tuple:
        return decode_value(node, bitstream)
    return node

def decode_pixels(height, width, tree, bitstream):
    pixels = bytearray()
    for i in range(height * width * 3):
        pixels.append(decode_value(tree, bitstream))
    return Image.frombytes('RGB', (width, height), bytes(pixels))

def decompress_image(in_file_name, out_file_name):
    print('Decompressing "%s" -> "%s"' % (in_file_name, out_file_name))

    print('Reading...')
    stream = InputBitStream(in_file_name)
    print('* Header offset: %d' % stream.bytes_read)
    height, width = decode_header(stream)
    stream.flush()  # Ensure next chunk is byte-aligned
    print('* Tree offset: %d' % stream.bytes_read)
    trimmed_tree = decode_tree(stream)
    stream.flush()  # Ensure next chunk is byte-aligned
    print('* Pixel offset: %d' % stream.bytes_read)
    image = decode_pixels(height, width, trimmed_tree, stream)
    stream.close()
    print('Read %d bytes.' % stream.bytes_read)

    print('Image size: (height=%d, width=%d)' % (height, width))
    print('Trimmed tree: %s' % str(trimmed_tree))
    image.save(out_file_name)
Test Run
from PIL import Image, ImageChops
import time

def raw_size(width, height):
    header_size = 2 * 16                  # height and width as 16 bit values
    pixels_size = 3 * 8 * width * height  # 3 channels, 8 bits per channel
    return (header_size + pixels_size) / 8

def images_equal(file_name_a, file_name_b):
    image_a = Image.open(file_name_a)
    image_b = Image.open(file_name_b)
    diff = ImageChops.difference(image_a, image_b)
    return diff.getbbox() is None

if __name__ == '__main__':
    start = time.time()

    compress_image('flag.png', 'answer.txt')
    print('-' * 40)
    decompress_image('answer.txt', 'flag_out.png')

    stop = time.time()
    times = (stop - start) * 1000

    print('-' * 40)
    print('Run time takes %d miliseconds' % times)
    print('Images equal = %s' % images_equal('flag.png', 'flag_out.png'))
I ran the script with the sample image you provided.
Console Output:
Compressing "flag.png" -> "answer.txt"
Image shape: (height=18, width=23)
RAW image size: 1246 bytes
Counts: [(24, 90), (131, 90), (215, 90), (59, 324), (60, 324), (110, 324)]
Tree: (1242, ((594, ((270, ((90, 215), (180, ((90, 24), (90, 131))))), (324, 59))), (648, ((324, 60), (324, 110)))))
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
Codes: {215: [0, 0, 0], 24: [0, 0, 1, 0], 131: [0, 0, 1, 1], 59: [0, 1], 60: [1, 0], 110: [1, 1]}
Estimated size: 379 bytes
Writing...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Wrote 379 bytes.
Estimate is correct.
Compression ratio: 3.29
----------------------------------------
Decompressing "answer.txt" -> "flag_out.png"
Reading...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Read 379 bytes.
Image size: (height=18, width=23)
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
----------------------------------------
Run time takes 32 miliseconds
Images equal = True
Potential Improvements
Huffman table per colour channel
Palette image support
Transformation filter (delta coding per channel, or more sophisticated predictor)
Model to handle repetitions (RLE, LZ...)
Canonical Huffman tables

Convert text to decimal python3

I need to convert words to numbers for an RSA cipher, so I found code which can convert text to decimal, but when I run it in the terminal with Python 3 I get:
Traceback (most recent call last):
  File "test.py", line 49, in <module>
    numberOutput = int(bit_list_to_string(string_to_bits(inputString)),2) #1976620216402300889624482718775150
  File "test.py", line 31, in string_to_bits
    map(chr_to_bit, s)
  File "test.py", line 30, in <listcomp>
    return [b for group in
  File "test.py", line 29, in chr_to_bit
    return pad_bits(convert_to_bits(ord(c)), ASCII_BITS)
  File "test.py", line 14, in pad_bits
    assert len(bits) <= pad
AssertionError
When I run it with "python convert_text_to_decimal.py" in the terminal, it works correctly.
Code:
BITS = ('0', '1')
ASCII_BITS = 8

def bit_list_to_string(b):
    """converts list of {0, 1}* to string"""
    return ''.join([BITS[e] for e in b])

def seq_to_bits(seq):
    return [0 if b == '0' else 1 for b in seq]

def pad_bits(bits, pad):
    """pads seq with leading 0s up to length pad"""
    assert len(bits) <= pad
    return [0] * (pad - len(bits)) + bits

def convert_to_bits(n):
    """converts an integer `n` to bit array"""
    result = []
    if n == 0:
        return [0]
    while n > 0:
        result = [(n % 2)] + result
        n = n / 2
    return result

def string_to_bits(s):
    def chr_to_bit(c):
        return pad_bits(convert_to_bits(ord(c)), ASCII_BITS)
    return [b for group in
            map(chr_to_bit, s)
            for b in group]

def bits_to_char(b):
    assert len(b) == ASCII_BITS
    value = 0
    for e in b:
        value = (value * 2) + e
    return chr(value)

def list_to_string(p):
    return ''.join(p)

def bits_to_string(b):
    return ''.join([bits_to_char(b[i:i + ASCII_BITS])
                    for i in range(0, len(b), ASCII_BITS)])

inputString = "attack at dawn"
numberOutput = int(bit_list_to_string(string_to_bits(inputString)), 2) #1976620216402300889624482718775150
bitSeq = seq_to_bits(bin(numberOutput)[2:]) #[2:] is needed to get rid of 0b in front
paddedString = pad_bits(bitSeq, len(bitSeq) + (8 - (len(bitSeq) % 8))) #Need to pad because conversion from dec to bin throws away MSB's
outputString = bits_to_string(paddedString) #attack at dawn
When I use plain python, it is version 2.7. Please help me fix this code for Python 3.
Change line 22,
n = n / 2
to
n = n // 2
This solves the immediate error you get (and another one that follows from it). The rest of the routine may or may not work for your purposes; I did not check any further.
You get the assert error because the function convert_to_bits should be, theoretically speaking, returning a proper list of single bit values for a valid integer in its range. It calculates this list by dividing the integer by 2 until 0 remains.
However.
One of the more significant changes from Python 2.7 to 3.x was the behavior of the division operator. Previously it always returned an integer for integer operands, but in Python 3 it was decided to have it return a float instead.
That means the simple bit calculation loop
while n > 0:
    result = [(n % 2)] + result
    n = n / 2
no longer produces a clean list of 0s and 1s that terminates once the source integer is exhausted; instead you get a list of more than a thousand floating point numbers. At a glance it may be unclear what that list represents, but as it ends with
… 1.03125, 0.0625, 0.125, 0.25, 0.5, 1]
you can see it's the divide-by-two loop that keeps on dividing until its input finally runs out of floating point accuracy and stops dividing further.
The resulting array is not only way, way larger than the next routines expect, its data is also of the wrong type. The values in this list are used as an index for the BITS tuple at the top of your code. With the floating point division, you get an error when trying to use the value as an index, even if it is a round 0.0 or 1.0. The integer division, again, fixes this.
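To see the difference at a glance (a quick illustrative session):

>>> n = 13
>>> n / 2   # Python 3: true division, returns a float
6.5
>>> n // 2  # floor division, returns an int -- what the bit loop needs
6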

Dynamic programming table - Finding the minimal cost to break a string

A certain string-processing language offers a primitive operation
which splits a string into two pieces. Since this operation involves
copying the original string, it takes n units of time for a string of
length n, regardless of the location of the cut. Suppose, now, that
you want to break a string into many pieces.
The order in which the breaks are made can affect the total running
time. For example, suppose we wish to break a 20-character string (for
example "abcdefghijklmnopqrst") after characters at indices 3, 8, and
10 to obtain four substrings: "abcd", "efghi", "jk" and "lmnopqrst". If
the breaks are made in left-right order, then the first break costs 20
units of time, the second break costs 16 units of time and the third
break costs 11 units of time, for a total of 47 steps. If the breaks
are made in right-left order, the first break costs 20 units of time,
the second break costs 11 units of time, and the third break costs 9
units of time, for a total of only 40 steps. However, the optimal
solution is 38 (and the order of the cuts is 10, 3, 8).
The input is the length of the string and an ascending-sorted array with the cut indexes. I need to design a dynamic programming table to find the minimal cost to break the string and the order in which the cuts should be performed.
I can't figure out how the table structure should look (certain cells should be the answer to certain sub-problems and should be computable from other entries etc.). Instead, I've written a recursive function to find the minimum cost to break the string: b0, b1, ..., bK are the indexes for the cuts that have to be made to the (sub)string between i and j.
totalCost(i, j, {b0, b1, ..., bK}) = j - i + 1 + min {
    totalCost(b0 + 1, j, {b1, b2, ..., bK}),
    totalCost(i, b1, {b0}) + totalCost(b1 + 1, j, {b2, b3, ..., bK}),
    totalCost(i, b2, {b0, b1}) + totalCost(b2 + 1, j, {b3, b4, ..., bK}),
    ...
    totalCost(i, bK, {b0, b1, ..., b(K - 1)})
} if K + 1 (the number of cuts) > 1,

totalCost(i, j, {b0, b1, ..., bK}) = j - i + 1 otherwise.
Please help me figure out the structure of the table, thanks!
For example, we have a string of length n = 20 and we need to break it at positions cuts = [3, 8, 10]. First of all, let's add two fake cuts to our array: -1 and n - 1 (to avoid edge cases); now we have cuts = [-1, 3, 8, 10, 19]. Let's fill a table M, where M[i, j] is the minimum number of units of time to make all breaks between the i-th and j-th cuts. We can fill it by the rule: M[i, j] = (cuts[j] - cuts[i]) + min(M[i, k] + M[k, j]) where i < k < j. The minimum time to make all cuts will be in the cell M[0, len(cuts) - 1]. Full code in Python:
# input
n = 20
cuts = [3, 8, 10]

# add fake cuts
cuts = [-1] + cuts + [n - 1]
cuts_num = len(cuts)

# init table with zeros
table = []
for i in range(cuts_num):
    table += [[0] * cuts_num]

# fill table
for diff in range(2, cuts_num):
    for start in range(0, cuts_num - diff):
        end = start + diff
        table[start][end] = 1e9
        for mid in range(start + 1, end):
            table[start][end] = min(table[start][end],
                                    table[start][mid] + table[mid][end])
        table[start][end] += cuts[end] - cuts[start]

# print result: 38
print(table[0][cuts_num - 1])
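The question also asks for the order in which the cuts should be performed. A minimal sketch (my addition, not part of the original answer) that extends the same table with an argmin so the order can be read back:

n = 20
cuts = [-1] + [3, 8, 10] + [n - 1]
m = len(cuts)
cost = [[0] * m for _ in range(m)]
best = [[None] * m for _ in range(m)]  # best[i][j]: first cut inside segment (i, j)

for diff in range(2, m):
    for i in range(m - diff):
        j = i + diff
        cost[i][j] = float('inf')
        for k in range(i + 1, j):
            if cost[i][k] + cost[k][j] < cost[i][j]:
                cost[i][j] = cost[i][k] + cost[k][j]
                best[i][j] = k
        cost[i][j] += cuts[j] - cuts[i]

def cut_order(i, j):
    # Emit the cheapest first cut for (i, j), then recurse into both halves
    if best[i][j] is None:
        return []
    k = best[i][j]
    return [cuts[k]] + cut_order(i, k) + cut_order(k, j)

print(cost[0][m - 1])       # 38
print(cut_order(0, m - 1))  # [10, 3, 8]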
In case you find it easier to follow when everything is 1-based (the same convention as problem 6.9 in the DPV Dasgupta algorithms book, and as the Udacity graduate algorithms course from GaTech), the following Python code does the equivalent of the previous code by Jemshit and Aleksei. It follows the chain-multiplication (binary tree) pattern taught in the video lecture.
import numpy as np

# n is the string length; P is of size m where P[i] is the split pos that
# splits the string into [1, i] and [i+1, n] (1-based)
def spliting_cost(P, n):
    P = [0,] + P + [n,]      # make sure pos list contains both ends of string
    m = len(P)
    P = [0,] + P             # both C and P are 1-based indexed for easy reading
    C = np.full((m + 1, m + 1), np.inf)
    for i in range(1, m + 1):
        C[i, i:i+2] = 0      # any segment <= 2 does not need split so is zero cost
    for s in range(2, m):    # s is split string len
        for i in range(1, m - s + 1):
            j = i + s
            for k in range(i, j + 1):
                C[i, j] = min(C[i, j], P[j] - P[i] + C[i, k] + C[k, j])
    return C[1, m]

spliting_cost([3, 5, 10, 14, 16, 19], 20)
The output answer is 55, same as that with split points [2, 4, 9, 13, 15, 18] in the previous algorithm.

Python - binary to num and num to binary - Wrong Output

While working on some complex project, I came across an interesting bug:
One program reads a file, converts each byte to binary, converts the binary to integers, and writes those to a file.
A colleague's program reads this file, converts the integers back to binary, and writes the result to a file.
Ideally, the input file and the reconstructed file should be identical. But that is not happening.
Please find the code below:
# read file -> convert to binary -> binary to num -> write file
def bits(f):
    byte = (ord(b) for b in f.read())
    for b in byte:
        bstr = []
        for i in range(8):
            bstr.append((b >> i) & 1)
        yield bstr

def binaryToNum(S):
    bits = len(S)
    if (S == ''): return 0
    elif (S[0] == '0'): return binaryToNum(S[1:])
    elif (S[0] == '1'): return ((2**(bits-1))) + binaryToNum(S[1:])

bstr = []
for b in bits(open('input_test', 'r')):
    bstr.append(b)

dstr = ''
for i in bstr:
    b_num = str(binaryToNum(''.join(str(e) for e in i))).zfill(6)
    dstr = dstr + b_num

ter = open('im1', 'w')
for item in dstr:
    ter.write(item)
ter.close()
This part seems correct; I checked manually for a-z, A-Z and 0-9.
The code on the other machine does this:
def readDecDataFromFile(filename):
    data = []
    with open(filename) as f:
        data = data + f.readlines()
    chunks, chunk_size = len(data[0]), 6
    return [data[0][i:i+chunk_size] for i in range(0, chunks, chunk_size)]

def numToBinary(N):
    return str(int(bin(int(N))[2:]))

ddata = readDecDataFromFile('im1')
bytes = []
for d in ddata:
    bits = numToBinary(d)
    bytes.append(int(bits[::-1], 2).to_bytes(1, 'little'))

f = open('orig_input', 'wb')
for b in bytes:
    f.write(b)
f.close()
And here is the output:
input_test: my name is XYZ
orig_input: my7ameisY-
bytes list in last code yields:
[b'm', b'y', b'\x01', b'7', b'a', b'm', b'e', b'\x01', b'i', b's', b'\x01', b'\x0b', b'Y', b'-', b'\x05']
What could be the potential error?
Two modifications are required.
While reading the bits, the current order is little-endian (least significant bit first). To emit the most significant bit first,
reversed(range(8))
should be used in the bits function.
While converting from bits to bytes at write time, the bit string is reversed again; that is not needed. So the code changes from
bytes.append(int(bits[::-1], 2).to_bytes(1, 'little'))
to
bytes.append(int(bits, 2).to_bytes(1, 'little'))
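A quick illustrative round trip for a single character shows why the MSB-first order is the one that survives the text format (my session, not from the original answer):

>>> b = ord('m')                                # 109, i.e. 0b01101101
>>> [(b >> i) & 1 for i in reversed(range(8))]  # MSB first
[0, 1, 1, 0, 1, 1, 0, 1]
>>> int('01101101', 2).to_bytes(1, 'little')
b'm'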

How to sort 4 integers using only min() and max()? Python

I am trying to sort 4 integers input by the user into numerical order using only the min() and max() functions in Python. I can get the highest and lowest number easily, but cannot work out a combination to order the two middle numbers. Does anyone have an idea?
So I'm guessing your input is something like this?
string = input('Type your numbers, separated by a space')
Then I'd do:
numbers = [int(i) for i in string.strip().split(' ')]
amount_of_numbers = len(numbers)

sorted = []  # note: this name shadows the built-in sorted()
for i in range(amount_of_numbers):
    x = max(numbers)
    numbers.remove(x)
    sorted.append(x)

print(sorted)
This will sort them in descending order using max (each pass extracts the current maximum); min can be used the same way for ascending order.
If you didn't have to use min and max:
string = input('Type your numbers, separated by a space')
numbers = [int(i) for i in string.strip().split(' ')]
numbers.sort()  # an optional reverse argument is possible
print(numbers)
LITERALLY just min and max? Odd, but why not. I'm about to crash, but I think the following would work:
arr = [None] * 4

# Easy
arr[0] = max(a, b, c, d)

# Take the smallest element from each pair.
#
# You will never take the largest element from the set, but since one of the
# pairs will be (largest, second_largest) you will at some point take the
# second largest. Take the maximum value of the selected items - which
# will be the maximum of the items ignoring the largest value.
arr[1] = max(min(a, b),
             min(a, c),
             min(a, d),
             min(b, c),
             min(b, d),
             min(c, d))

# Similar logic, but reversed, to take the smallest of the largest of each
# pair - again omitting the smallest number, then taking the smallest.
arr[2] = min(max(a, b),
             max(a, c),
             max(a, d),
             max(b, c),
             max(b, d),
             max(c, d))

# Easy
arr[3] = min(a, b, c, d)
Using Tankerbuzz's answer with the following inputs:
first_integer = 9
second_integer = 19
third_integer = 1
fourth_integer = 15
I get 1, 15, 9, 19 as the supposedly ascending values, which are not actually in ascending order.
The following is one of the forms that gives the symbolic form of the ascending values (using i1-i4 instead of first_integer, etc.):
Min(i1, i2, i3, i4)
Max(Min(i4, Max(Min(i1, i2), Min(i3, Max(i1, i2))), Max(i1, i2, i3)), Min(i1, i2, i3, Max(i1, i2)))
Max(Min(i1, i2), Min(i3, Max(i1, i2)), Min(i4, Max(i1, i2, i3)))
Max(i1, i2, i3, i4)
It was generated by a 'bubble sort' using the Min and Max functions of SymPy (a python CAS):
def minmaxsort(*v):
    """return a sorted list of the elements in v using the
    Min and Max functions.

    Examples
    ========

    >>> minmaxsort(3, 2, 1)
    [1, 2, 3]
    >>> minmaxsort(1, x, y)
    [Min(1, x, y), Max(Min(1, x), Min(y, Max(1, x))), Max(1, x, y)]
    >>> minmaxsort(1, y, x)
    [Min(1, x, y), Max(Min(1, y), Min(x, Max(1, y))), Max(1, x, y)]
    """
    from sympy import Min, Max  # SymPy's symbolic Min/Max
    v = list(v)                 # *v so the docstring examples run as written
    v0 = Min(*v)
    for j in range(len(v)):
        for i in range(len(v) - j - 1):
            w = v[i:i + 2]
            v[i:i + 2] = [Min(*w), Max(*w)]
    v[0] = v0
    return v
I have worked it out.
min_integer = min(first_integer, second_integer, third_integer, fourth_integer)
mid_low_integer = min(max(first_integer, second_integer), max(third_integer, fourth_integer))
mid_high_integer = max(min(first_integer, second_integer), min(third_integer, fourth_integer))
max_integer = max(first_integer, second_integer, third_integer, fourth_integer)
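As the earlier answer demonstrates with the inputs 9, 19, 1, 15, these two middle expressions always produce the two middle values, but not necessarily in ascending order. A small sketch (my addition, not part of the original answers) that repairs the ordering with one extra min/max pair, plus a brute-force check:

from itertools import permutations

def sort4(a, b, c, d):
    lo = min(a, b, c, d)
    hi = max(a, b, c, d)
    x = min(max(a, b), max(c, d))  # one of the two middle values
    y = max(min(a, b), min(c, d))  # the other middle value
    return [lo, min(x, y), max(x, y), hi]

# Check against every ordering of four distinct values
assert all(sort4(*p) == sorted(p) for p in permutations([9, 19, 1, 15]))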
