Image compression in python - python-3.x

For my image Compression, I am using the pillow library to get every pixel in rgb (for ex: (100, 0, 200). Using the Huffman encoding I already convert to binary to reduce the number of bits. For now, I have to save the sequence of bits into a text or binary file. The compress files to be consistently smaller than original, but for now, my txt file is larger than the original. What should I do ?
And after that how can I read the file and decompress it. Here is an instruction:
Your code should read in an image file, compute how many bits are required for a fixed length encoding
and then apply a compression algorithm to create a smaller encoding – you need to implement the
compression, you cannot use a compression library. You should output how many bits are required to store the image in your compressed format as well as the compression ratio achieved. When it comes
to saving your compressed image, you won’t be able to save it as a standard image format, since you will
have created your own encoding, but you can save the sequence of bits into a text or binary file.
Your code should also be able to prompt the user for the filename of a text file containing a compressed
sequence of bits and then decompress that file into the original image – you can assume that the file
uses the same compression format as the last file you compressed. So, for example, if you compressed pacificat.bmp into a series of bits stored in pacificat.txt and then the user asked you to decompress alt_encode.txt, you could assume that alt_pacificat.txt used the same compression data structure as encode.txt (it might be a subset of the data from the original image, for example).
There are a number of libraries that can help you store formatted data into a file from Python. If you research the options and find a way to store your compression data structure into a file, such that the user can select both a bit file and a data structure file and use the data structure to decompress the bit file
just use my current image: flag2.bmp
here is my code
from PIL import Image
import sys, string
import copy
import time
codes = {}
def sortFreq (freqs) :
letters = freqs.keys()
tuples = []
for let in letters :
tuples.append((freqs[let],let))
tuples.sort()
return tuples
def buildTree(tuples) :
while len(tuples) > 1 :
leastTwo = tuple(tuples[0:2]) # get the 2 to combine
theRest = tuples[2:] # all the others
combFreq = leastTwo[0][0] + leastTwo[1][0] # the branch points freq
tuples = theRest + [(combFreq,leastTwo)] # add branch point to the end
tuples.sort() # sort it into place
return tuples[0] # Return the single tree inside the list
def trimTree (tree) :
# Trim the freq counters off, leaving just the letters
p = tree[1] # ignore freq count in [0]
if type(p) == type("") : return p # if just a leaf, return it
else : return (trimTree(p[0]), trimTree(p[1])) # trim left then right and recombine
def assignCodes(node, pat=''):
global codes
if type(node) == type("") :
codes[node] = pat # A leaf. set its code
else : #
assignCodes(node[0], pat+"0") # Branch point. Do the left branch
assignCodes(node[1], pat+"1") # then do the right branch.
start = time.time()
dictionary = {}
table = {}
image = Image.open('flag2.bmp')
#image.show()
width, height = image.size
px= image.load()
totalpixel = width*height
print("Total pixel: "+ str(totalpixel))
for x in range(width):
for y in range(height):
# print(px[x, y])
for i in range(3):
if dictionary.get(str(px[x, y][i])) is None:
dictionary[str(px[x, y][i])] = 1
else:
dictionary[str(px[x, y][i])] = dictionary[str(px[x, y][i])] +1
table = copy.deepcopy(dictionary)
def encode2 (str) :
global codes
output = ""
for ch in str : output += codes[ch]
return output
def decode (tree, str) :
output = ""
p = tree
for bit in str :
if bit == '0' : p = p[0] # Head up the left branch
else : p = p[1] # or up the right branch
if type(p) == type("") :
output += p # found a character. Add to output
p = tree # and restart for next character
return output
combination = len(dictionary)
for value in table:
table[value] = table[value] / (totalpixel * combination) * 100
print(table)
print(dictionary)
sortdic = sortFreq(dictionary)
tree = buildTree(sortdic)
print("tree")
print(tree)
trim = trimTree(tree)
print("trim")
print(trim)
print("assign 01")
assignCodes(trim)
print(codes)
empty_tuple = ()
f = open("answer.txt","w")
for x in range(width):
for y in range(height):
list = []
list.append(codes[str(px[x, y][0])])
list.append(codes[str(px[x, y][1])])
list.append(codes[str(px[x, y][2])])
print(str(px[x, y]) + ": " +str(list))
f.write(str(list))
print("decode test:", str(decode (trim, "1100")))
stop = time.time()
times = (stop - start) * 1000
print("Run time takes %d miliseconds" % times)
[flag2.bmp][1]

Code Cleanup
Let's try to refactor your code a little, taking advantage of algorithms provided by Python standard library, while keeping to the spirit of your approach to Huffman tree calculation and image encoding.
Calculating Symbol Counts
First of all, we can refactor the symbol counting into a function and rewrite it in more concise way:
Use Image.getdata() to iterate over all the pixels in the image
Since each pixel is represented by a tuple, use itertools.chain.from_iterable to get a flattened view of intensities.
Take advantage of collections.Counter to get the symbol (intensity counts)
Additionally, we can change it to return a list of (symbol, count), sorted in ascending order by (count, symbol). To do so, we can combine it with a rewritten version of your sortFreq(...) function, taking advantage of:
Python sorted(...) function (which allows us to define the key to sort by), together with
Tuple slicing to reverse the (symbol, count) tuples for sorting
Implementation:
from collections import Counter
from itertools import chain
def count_symbols(image):
pixels = image.getdata()
values = chain.from_iterable(pixels)
counts = Counter(values).items()
return sorted(counts, key=lambda x:x[::-1])
Building the Tree
Only a small change is needed here -- since we already have the symbol counts sorted, we just need to reverse the tuples to let your existing tree-building algorithm to work. We can use list comprehension together with tuple slicing to express this concisely.
Implementation:
def build_tree(counts) :
nodes = [entry[::-1] for entry in counts] # Reverse each (symbol,count) tuple
while len(nodes) > 1 :
leastTwo = tuple(nodes[0:2]) # get the 2 to combine
theRest = nodes[2:] # all the others
combFreq = leastTwo[0][0] + leastTwo[1][0] # the branch points freq
nodes = theRest + [(combFreq, leastTwo)] # add branch point to the end
nodes.sort() # sort it into place
return nodes[0] # Return the single tree inside the list
Trimming the Tree
Again, just two small changes from your original implementation:
Change the test to check for tuple (node), to be independent of how a symbol is represented.
Get rid of the unnecessary else
Implementation:
def trim_tree(tree) :
p = tree[1] # Ignore freq count in [0]
if type(p) is tuple: # Node, trim left then right and recombine
return (trim_tree(p[0]), trim_tree(p[1]))
return p # Leaf, just return it
Assigning Codes
The most important change here is to eliminate the reliance on a global codes variable. To resolve it, we can split the implementation into two functions, one which handles the recursive code assignment, and a wrapper which creates a new local codes dictionary, dispatches the recursive function on it, and returns the output.
Let's also switch the representation of codes from strings to lists of bits (integers in range [0,1]) -- the usefulness of this will be apparent later.
Once more, we'll change the test to check for tuples (for same reason as when trimming).
Implementation:
def assign_codes_impl(codes, node, pat):
if type(node) == tuple:
assign_codes_impl(codes, node[0], pat + [0]) # Branch point. Do the left branch
assign_codes_impl(codes, node[1], pat + [1]) # then do the right branch.
else:
codes[node] = pat # A leaf. set its code
def assign_codes(tree):
codes = {}
assign_codes_impl(codes, tree, [])
return codes
Encoding
Let's make a small detour, and talk about encoding of the data.
First of all, let's observe that a raw RGB pixel is represented by 3 bytes (one for each colour channel. That's 24 bits per pixel, and forms our baseline.
Now, your current algorithm encodes the first pixel as the following ASCII string:
['000', '0010', '0011']
That's 23 bytes in total (or 184 bits). That's much, much worse than raw. Let's examine why:
There are two spaces, which just make it more readable to a human. Those carry no information. (2 bytes)
Each of the three codes is delimited by two apostrophes. Since the codes only consist of 0s and 1s, the apostrophes are unnecessary for parsing, and thus also carry no information. (6 bytes)
Each of the codes is a prefix code, therefore they can be parsed unambiguously, and thus the two commas used for code separation are also unnecessary. (2 bytes)
We know there are three codes per pixel, so we don't need the braces ([,]) to delimit pixels either (for same reason as above). (2 bytes)
In total, that's 12 bytes per pixel that carry no information at all. The remaining 11 bytes (in this particular case) do carry some information... but how much?
Notice that the only two possible symbols in the output alphabet are 0 and 1. That means that each symbol carries 1 bit of information. Since you store each symbol as ASCII character (a byte), you use 8 bits for each 1 bit of information.
Put together, in this particular case, you used 184 bits to represent 11 bits of information -- ~16.7x more than necessary, and ~7.67x worse than just storing the pixels in raw format.
Obviously, using a naive text representation of the encoded data will not yield any compression. We will need a better approach.
Bitstreams
From our earlier analysis, it becomes evident that in order to perform compression (and decompression) effectively, we need to be able to treat our output (or input) as a stream of individual bits. The standard Python libraries do not provide a direct solution to do this -- at the lowest granularity, we can only read or write a file one byte at a time.
Since we want to encode values that may consist of multiple bits, it's essential to decode on how they shall be ordered based on significance. Let's order them from the most significant to the least significant.
Bit I/O Utilities
As mentioned earlier, we shall represent a sequence of bits as a list of integers in range [0,1]. Let's start by writing some simple utility functions:
A function that converts an integer into the shortest sequence of bits that uniquely represents it (i.e. at least 1 bit, but otherwise no leading zeros).
A function that converts a sequence of bits into an integer.
A function that zero-extends (adds zeros to most significant positions) a sequence of bits (to allow fixed-length encoding).
Implementation:
def to_binary_list(n):
"""Convert integer into a list of bits"""
return [n] if (n <= 1) else to_binary_list(n >> 1) + [n & 1]
def from_binary_list(bits):
"""Convert list of bits into an integer"""
result = 0
for bit in bits:
result = (result << 1) | bit
return result
def pad_bits(bits, n):
"""Prefix list of bits with enough zeros to reach n digits"""
assert(n >= len(bits))
return ([0] * (n - len(bits)) + bits)
Example Usage:
>>> to_binary_list(14)
[1, 1, 1, 0]
>>> from_binary_list([1,1,1,0])
14
>>> pad_bits(to_binary_list(14),8)
[0, 0, 0, 0, 1, 1, 1, 0]
Output Bitstream
Since the file I/O API allows us to save only whole bytes, we need to create a wrapper class that will buffer the bits written into a stream in memory.
Let's provide means to write a single bit, as well as a sequence of bits.
Each write command (of 1 or more bits) will first add the bits into the buffer. Once the buffer contains more than 8 bits, groups of 8 bits are removed from the front, converted to an integer in range [0-255] and saved to the output file. This is done until the buffer contains less than 8 bits.
Finally, let's provide a way to "flush" the stream -- when the buffer is non-empty, but doesn't contain enough bits to make a whole byte, add zeros to the least significant position until there are 8 bits, and then write the byte. We need this when we're closing the bitstream (and there are some other benefits that we'll see later).
Implementation:
class OutputBitStream(object):
def __init__(self, file_name):
self.file_name = file_name
self.file = open(self.file_name, 'wb')
self.bytes_written = 0
self.buffer = []
def write_bit(self, value):
self.write_bits([value])
def write_bits(self, values):
self.buffer += values
while len(self.buffer) >= 8:
self._save_byte()
def flush(self):
if len(self.buffer) > 0: # Add trailing zeros to complete a byte and write it
self.buffer += [0] * (8 - len(self.buffer))
self._save_byte()
assert(len(self.buffer) == 0)
def _save_byte(self):
bits = self.buffer[:8]
self.buffer[:] = self.buffer[8:]
byte_value = from_binary_list(bits)
self.file.write(bytes([byte_value]))
self.bytes_written += 1
def close(self):
self.flush()
self.file.close()
Input Bitstream
Input bit stream follows similar theme. We want to read 1 or more bits at a time. To do so, we load bytes from the file, convert each byte to a list of bits and add it to the buffer, until there are enough to satisfy the read request.
The flush command in this case purges the buffer (assuring it contains only zeros).
Implementation:
class InputBitStream(object):
def __init__(self, file_name):
self.file_name = file_name
self.file = open(self.file_name, 'rb')
self.bytes_read = 0
self.buffer = []
def read_bit(self):
return self.read_bits(1)[0]
def read_bits(self, count):
while len(self.buffer) < count:
self._load_byte()
result = self.buffer[:count]
self.buffer[:] = self.buffer[count:]
return result
def flush(self):
assert(not any(self.buffer))
self.buffer[:] = []
def _load_byte(self):
value = ord(self.file.read(1))
self.buffer += pad_bits(to_binary_list(value), 8)
self.bytes_read += 1
def close(self):
self.file.close()
Compressed Format
Next we need to define the format of our compressed bitstream. There are three essential chunks of information that are needed to decode the image:
The shape of the image (height and width), with the assumption that it's a 3-channel RGB image.
Information necessary to reconstruct the Huffman codes on the decode side
Huffman-encoded pixel data
Let's make our compressed format as follows:
Header
Image height (16 bits, unsigned)
Image width (16 bits, unsigned)
Huffman table (beginning aligned to whole byte)
See this for the algorithm.
Pixel codes (beginning aligned to whole byte)
width * height * 3 Huffman codes in sequence
Compression
Implementation:
from PIL import Image
def compressed_size(counts, codes):
header_size = 2 * 16 # height and width as 16 bit values
tree_size = len(counts) * (1 + 8) # Leafs: 1 bit flag, 8 bit symbol each
tree_size += len(counts) - 1 # Nodes: 1 bit flag each
if tree_size % 8 > 0: # Padding to next full byte
tree_size += 8 - (tree_size % 8)
# Sum for each symbol of count * code length
pixels_size = sum([count * len(codes[symbol]) for symbol, count in counts])
if pixels_size % 8 > 0: # Padding to next full byte
pixels_size += 8 - (pixels_size % 8)
return (header_size + tree_size + pixels_size) / 8
def encode_header(image, bitstream):
height_bits = pad_bits(to_binary_list(image.height), 16)
bitstream.write_bits(height_bits)
width_bits = pad_bits(to_binary_list(image.width), 16)
bitstream.write_bits(width_bits)
def encode_tree(tree, bitstream):
if type(tree) == tuple: # Note - write 0 and encode children
bitstream.write_bit(0)
encode_tree(tree[0], bitstream)
encode_tree(tree[1], bitstream)
else: # Leaf - write 1, followed by 8 bit symbol
bitstream.write_bit(1)
symbol_bits = pad_bits(to_binary_list(tree), 8)
bitstream.write_bits(symbol_bits)
def encode_pixels(image, codes, bitstream):
for pixel in image.getdata():
for value in pixel:
bitstream.write_bits(codes[value])
def compress_image(in_file_name, out_file_name):
print('Compressing "%s" -> "%s"' % (in_file_name, out_file_name))
image = Image.open(in_file_name)
print('Image shape: (height=%d, width=%d)' % (image.height, image.width))
size_raw = raw_size(image.height, image.width)
print('RAW image size: %d bytes' % size_raw)
counts = count_symbols(image)
print('Counts: %s' % counts)
tree = build_tree(counts)
print('Tree: %s' % str(tree))
trimmed_tree = trim_tree(tree)
print('Trimmed tree: %s' % str(trimmed_tree))
codes = assign_codes(trimmed_tree)
print('Codes: %s' % codes)
size_estimate = compressed_size(counts, codes)
print('Estimated size: %d bytes' % size_estimate)
print('Writing...')
stream = OutputBitStream(out_file_name)
print('* Header offset: %d' % stream.bytes_written)
encode_header(image, stream)
stream.flush() # Ensure next chunk is byte-aligned
print('* Tree offset: %d' % stream.bytes_written)
encode_tree(trimmed_tree, stream)
stream.flush() # Ensure next chunk is byte-aligned
print('* Pixel offset: %d' % stream.bytes_written)
encode_pixels(image, codes, stream)
stream.close()
size_real = stream.bytes_written
print('Wrote %d bytes.' % size_real)
print('Estimate is %scorrect.' % ('' if size_estimate == size_real else 'in'))
print('Compression ratio: %0.2f' % (float(size_raw) / size_real))
Decompression
Implementation:
from PIL import Image
def decode_header(bitstream):
height = from_binary_list(bitstream.read_bits(16))
width = from_binary_list(bitstream.read_bits(16))
return (height, width)
# https://stackoverflow.com/a/759766/3962537
def decode_tree(bitstream):
flag = bitstream.read_bits(1)[0]
if flag == 1: # Leaf, read and return symbol
return from_binary_list(bitstream.read_bits(8))
left = decode_tree(bitstream)
right = decode_tree(bitstream)
return (left, right)
def decode_value(tree, bitstream):
bit = bitstream.read_bits(1)[0]
node = tree[bit]
if type(node) == tuple:
return decode_value(node, bitstream)
return node
def decode_pixels(height, width, tree, bitstream):
pixels = bytearray()
for i in range(height * width * 3):
pixels.append(decode_value(tree, bitstream))
return Image.frombytes('RGB', (width, height), bytes(pixels))
def decompress_image(in_file_name, out_file_name):
print('Decompressing "%s" -> "%s"' % (in_file_name, out_file_name))
print('Reading...')
stream = InputBitStream(in_file_name)
print('* Header offset: %d' % stream.bytes_read)
height, width = decode_header(stream)
stream.flush() # Ensure next chunk is byte-aligned
print('* Tree offset: %d' % stream.bytes_read)
trimmed_tree = decode_tree(stream)
stream.flush() # Ensure next chunk is byte-aligned
print('* Pixel offset: %d' % stream.bytes_read)
image = decode_pixels(height, width, trimmed_tree, stream)
stream.close()
print('Read %d bytes.' % stream.bytes_read)
print('Image size: (height=%d, width=%d)' % (height, width))
print('Trimmed tree: %s' % str(trimmed_tree))
image.save(out_file_name)
Test Run
from PIL import ImageChops
def raw_size(width, height):
header_size = 2 * 16 # height and width as 16 bit values
pixels_size = 3 * 8 * width * height # 3 channels, 8 bits per channel
return (header_size + pixels_size) / 8
def images_equal(file_name_a, file_name_b):
image_a = Image.open(file_name_a)
image_b = Image.open(file_name_b)
diff = ImageChops.difference(image_a, image_b)
return diff.getbbox() is None
if __name__ == '__main__':
start = time.time()
compress_image('flag.png', 'answer.txt')
print('-' * 40)
decompress_image('answer.txt', 'flag_out.png')
stop = time.time()
times = (stop - start) * 1000
print('-' * 40)
print('Run time takes %d miliseconds' % times)
print('Images equal = %s' % images_equal('flag.png', 'flag_out.png'))
I ran the script with the sample image you provided.
Console Output:
Compressing "flag.png" -> "answer.txt"
Image shape: (height=18, width=23)
RAW image size: 1246 bytes
Counts: [(24, 90), (131, 90), (215, 90), (59, 324), (60, 324), (110, 324)]
Tree: (1242, ((594, ((270, ((90, 215), (180, ((90, 24), (90, 131))))), (324, 59))), (648, ((324, 60), (324, 110)))))
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
Codes: {215: [0, 0, 0], 24: [0, 0, 1, 0], 131: [0, 0, 1, 1], 59: [0, 1], 60: [1, 0], 110: [1, 1]}
Estimated size: 379 bytes
Writing...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Wrote 379 bytes.
Estimate is correct.
Compression ratio: 3.29
----------------------------------------
Decompressing "answer.txt" -> "flag_out.png"
Reading...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Read 379 bytes.
Image size: (height=18, width=23)
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
----------------------------------------
Run time takes 32 miliseconds
Images equal = True
Potential Improvements
Huffman table per colour channel
Palette image support
Transformation filter (delta coding per channel, or more sophisticated predictor)
Model to handle repetitions (RLE, LZ...)
Canonical Huffman tables

Related

Python: stream a tarfile to S3 using multipart upload

I would like to create a .tar file in an S3 bucket from Python code running in an AWS Lambda function. Lambda functions are very memory- and disk- constrained. I want to create a .tar file that contains multiple files that are too large to fit in the Lambda function's memory or disk space.
Using "S3 multipart upload," it is possible to upload a large file by uploading chunks of 5MB or more in size. I have this figured out and working. What I need to figure out is how to manage a buffer of bytes in memory that won't grow past the limits of the Lambda function's runtime environment.
I think the solution is to create an io.BytesIO() object and manage both a read pointer and a write pointer. I can then write into the buffer (from files that I want to add to the .tar file) and every time the buffer exceeds some limit (like 5MB) I can read off a chunk of data and send another file part to S3.
What I haven't quite wrapped my head around is how to truncate the part of the buffer that has been read and is no longer needed in memory. I need to trim the head of the buffer, not the tail, so the truncate() function of BytesIO won't work for me.
Is the 'correct' solution to create a new BytesIO buffer, populating it with the contents of the existing buffer from the read pointer to the end of the buffer, when I truncate? Is there a better way to truncate the head of the BytesIO buffer? Is there a better solution than using BytesIO?
For the random Google-r who stumbles onto this question six years in the future and thinks, "man, that describes my problem exactly!", here's what I came up with:
import io
import struct
from tarfile import BLOCKSIZE
#This class was designed to write a .tar file to S3 using multipart upload
#in a memory- and disk constrained environment, such as AWS Lambda Functions.
#
#Much of this code is copied or adapted from the Python source code tarfile.py
#file at https://github.com/python/cpython/blob/3.10/Lib/tarfile.py
#
#No warranties expressed or implied. Your mileage may vary. Lather, rinse, repeat
class StreamingTarFileWriter:
#Various constants from tarfile.py that we need
GNU_FORMAT = 1
NUL = b"\0"
BLOCKSIZE = 512
RECORDSIZE = BLOCKSIZE * 20
class MemoryByteStream:
def __init__(self, bufferFullCallback = None, bufferFullByteCount = 0):
self.buf = io.BytesIO()
self.readPointer = 0
self.writePointer = 0
self.bufferFullCallback = bufferFullCallback
self.bufferFullByteCount = bufferFullByteCount
def write(self, buf: bytes):
self.buf.seek(self.writePointer)
self.writePointer += self.buf.write(buf)
bytesAvailableToRead = self.writePointer - self.readPointer
if self.bufferFullByteCount > 0 and bytesAvailableToRead > self.bufferFullByteCount:
if self.bufferFullCallback:
self.bufferFullCallback(self, bytesAvailableToRead)
def read(self, byteCount = None):
self.buf.seek(self.readPointer)
if byteCount:
chunk = self.buf.read(byteCount)
else:
chunk = self.buf.read()
self.readPointer += len(chunk)
self._truncate()
return chunk
def size(self):
return self.writePointer - self.readPointer
def _truncate(self):
self.buf.seek(self.readPointer)
self.buf = io.BytesIO(self.buf.read())
self.readPointer = 0
self.writePointer = self.buf.seek(0, 2)
def stn(self, s, length, encoding, errors):
#Convert a string to a null-terminated bytes object.
s = s.encode(encoding, errors)
return s[:length] + (length - len(s)) * self.NUL
def itn(self, n, digits=8, format=GNU_FORMAT):
#Convert a python number to a number field.
# POSIX 1003.1-1988 requires numbers to be encoded as a string of
# octal digits followed by a null-byte, this allows values up to
# (8**(digits-1))-1. GNU tar allows storing numbers greater than
# that if necessary. A leading 0o200 or 0o377 byte indicate this
# particular encoding, the following digits-1 bytes are a big-endian
# base-256 representation. This allows values up to (256**(digits-1))-1.
# A 0o200 byte indicates a positive number, a 0o377 byte a negative
# number.
original_n = n
n = int(n)
if 0 <= n < 8 ** (digits - 1):
s = bytes("%0*o" % (digits - 1, n), "ascii") + self.NUL
elif format == self.GNU_FORMAT and -256 ** (digits - 1) <= n < 256 ** (digits - 1):
if n >= 0:
s = bytearray([0o200])
else:
s = bytearray([0o377])
n = 256 ** digits + n
for i in range(digits - 1):
s.insert(1, n & 0o377)
n >>= 8
else:
raise ValueError("overflow in number field")
return s
def calc_chksums(self, buf):
"""Calculate the checksum for a member's header by summing up all
characters except for the chksum field which is treated as if
it was filled with spaces. According to the GNU tar sources,
some tars (Sun and NeXT) calculate chksum with signed char,
which will be different if there are chars in the buffer with
the high bit set. So we calculate two checksums, unsigned and
signed.
"""
unsigned_chksum = 256 + sum(struct.unpack_from("148B8x356B", buf))
signed_chksum = 256 + sum(struct.unpack_from("148b8x356b", buf))
return unsigned_chksum, signed_chksum
def __init__(self, bufferFullCallback = None, bufferFullByteCount = 0):
self.buf = self.MemoryByteStream(bufferFullCallback, bufferFullByteCount)
self.expectedFileSize = 0
self.fileBytesWritten = 0
self.offset = 0
pass
def addFileRecord(self, filename, filesize):
REGTYPE = b"0" # regular file
encoding = "utf-8"
LENGTH_NAME = 100
GNU_MAGIC = b"ustar \0" # magic gnu tar string
errors="surrogateescape"
#Copied from TarInfo.tobuf()
tarinfo = {
"name": filename,
"mode": 0o644,
"uid": 0,
"gid": 0,
"size": filesize,
"mtime": 0,
"chksum": 0,
"type": REGTYPE,
"linkname": "",
"uname": "",
"gname": "",
"devmajor": 0,
"devminor": 0,
"magic": GNU_MAGIC
}
buf = b""
if len(tarinfo["name"].encode(encoding, errors)) > LENGTH_NAME:
raise Exception("Filename is too long for tar file header.")
devmajor = self.stn("", 8, encoding, errors)
devminor = self.stn("", 8, encoding, errors)
parts = [
self.stn(tarinfo.get("name", ""), 100, encoding, errors),
self.itn(tarinfo.get("mode", 0) & 0o7777, 8, self.GNU_FORMAT),
self.itn(tarinfo.get("uid", 0), 8, self.GNU_FORMAT),
self.itn(tarinfo.get("gid", 0), 8, self.GNU_FORMAT),
self.itn(tarinfo.get("size", 0), 12, self.GNU_FORMAT),
self.itn(tarinfo.get("mtime", 0), 12, self.GNU_FORMAT),
b" ", # checksum field
tarinfo.get("type", REGTYPE),
self.stn(tarinfo.get("linkname", ""), 100, encoding, errors),
tarinfo.get("magic", GNU_MAGIC),
self.stn(tarinfo.get("uname", ""), 32, encoding, errors),
self.stn(tarinfo.get("gname", ""), 32, encoding, errors),
devmajor,
devminor,
self.stn(tarinfo.get("prefix", ""), 155, encoding, errors)
]
buf = struct.pack("%ds" % BLOCKSIZE, b"".join(parts))
chksum = self.calc_chksums(buf[-BLOCKSIZE:])[0]
buf = buf[:-364] + bytes("%06o\0" % chksum, "ascii") + buf[-357:]
self.buf.write(buf)
self.expectedFileSize = filesize
self.fileBytesWritten = 0
self.offset += len(buf)
def addFileData(self, buf):
self.buf.write(buf)
self.fileBytesWritten += len(buf)
self.offset += len(buf)
pass
def completeFileRecord(self):
if self.fileBytesWritten != self.expectedFileSize:
raise Exception(f"Expected {self.expectedFileSize:,} bytes but {self.fileBytesWritten:,} were written.")
#write the end-of-file marker
blocks, remainder = divmod(self.fileBytesWritten, BLOCKSIZE)
if remainder > 0:
self.buf.write(self.NUL * (BLOCKSIZE - remainder))
self.offset += BLOCKSIZE - remainder
def completeTarFile(self):
self.buf.write(self.NUL * (BLOCKSIZE * 2))
self.offset += (BLOCKSIZE * 2)
blocks, remainder = divmod(self.offset, self.RECORDSIZE)
if remainder > 0:
self.buf.write(self.NUL * (self.RECORDSIZE - remainder))
An example use of the class is:
OUTPUT_CHUNK_SIZE = 1024 * 1024 * 5
f_out = open("test.tar", "wb")
def get_file_block(blockNum):
block = f"block_{blockNum:010,}"
block += "0123456789abcdef" * 31
return bytes(block, 'ascii')
def buffer_full_callback(x: StreamingTarFileWriter.MemoryByteStream, bytesAvailable: int):
while x.size() > OUTPUT_CHUNK_SIZE:
buf = x.read(OUTPUT_CHUNK_SIZE)
#This is where you would write the chunk to S3
f_out.write(buf)
x = StreamingTarFileWriter(buffer_full_callback, OUTPUT_CHUNK_SIZE)
import random
numFiles = random.randint(3,8)
print(f"Creating {numFiles:,} files.")
for fileIdx in range(numFiles):
minSize = 1025 #1kB plus 1 byte
maxSize = 10 * 1024 * 1024 * 1024 + 5 #10GB plus 5 bytes
numBytes = random.randint(minSize, maxSize)
print(f"Creating file {str(fileIdx)} with {numBytes:,} bytes.")
blocks,remainder = divmod(numBytes, 512)
x.addFileRecord(f"File{str(fileIdx)}", numBytes)
for block in range(blocks):
x.addFileData(get_file_block(block))
x.addFileData(bytes(("X" * remainder), 'ascii'))
x.completeFileRecord()

Problem in the function of my program code python

I tried to make a program to do the below things but apparently, the function doesn't work. I want my function to take two or more arguments and give me the average and median and the maximum number of those arguments.
example input:
calc([2, 20])
example output : (11.0, 11.0, 20)
def calc():
total = 0
calc = sorted(calc)
for x in range(len(calc)):
total += int(calc[x])
average = total / len(calc)
sorted(calc)
maximum = calc[len(calc) - 1]
if len(calc) % 2 != 0:
median = calc[(len(calc) // 2) + 1]
else:
median = (float(calc[(len(calc) // 2) - 1]) + float(calc[(len(calc) // 2)])) / 2
return (average, median, maximum)
There are some things I'm going to fix as I go since I can't help myself.
First, you main problem is arguments.
If you hand a function arguments
calc([2, 20])
It needs to accept arguments.
def calc(some_argument):
This will fix your main problem but another thing is you shouldn't have identical names for your variables.
calc is your function name so it should not also be the name of your list within your function.
# changed the arg name to lst
def calc(lst):
lst = sorted(lst)
# I'm going to just set these as variables since
# you're doing the calculations more than once
# it adds a lot of noise to your lines
size = len(lst)
mid = size // 2
total = 0
# in python we can just iterate over a list directly
# without indexing into it
# and python will unpack the variable into x
for x in lst:
total += int(x)
average = total / size
# we can get the last element in a list like so
maximum = lst[-1]
if size % 2 != 0:
# this was a logical error
# the actual element you want is mid
# since indexes start at 0
median = lst[mid]
else:
# here there is no reason to explicity cast to float
# since python division does that automatically
median = (lst[mid - 1] + lst[mid]) / 2
return (average, median, maximum)
print(calc([11.0, 11.0, 20]))
Output:
(14.0, 11.0, 20)
Because you are passing arguments into a function that doesn't accept any, you are getting an error. You could fix this just by making the first line of your program:
def calc(calc):
But it would be better to accept inputs into your function as something like "mylist". To do so you would just have to change your function like so:
def calc(mylist):
calc=sorted(mylist)

Python calculation of LennardJones 2D interaction pair correlation distribution function in Grand Canonical Ensemble

Edit
I believe there is a problem with the normalization of the histogram, since one must divide with the radius of each element.
I am trying trying to calculate the fluctuations of particle number and the radial distribution function of a 2d LennardJones(LJ) system using python3. Although I believe the particle fluctuations come out right, the pair correlation g(r) come right for small distances but then blow up ( the calculation uses numpy's histogram method).
The thing is, I can' t find out why such a behavior emerges- perhaps of some misunderstanding of a method? As it is, I am posting the relevant code right below, and if needed, I could also upload other parts of the code or the entire script.
Note first, that since we are working with the Grand-Canonical Ensemble, as the number of particles changes, so is the array that stores the particles- and perhaps that's another point where a mistake in implementation could exist.
Particle removal or insertion
def mcex(L,npart,particles,beta,rho0,V,en):
factorin=(rho0*V)/(npart+1)
factorout=(npart)/(V*rho0)
print("factorin=",factorin)
print("factorout",factorout)
# Produce random number and check:
rand = random.uniform(0, 1)
if rand <= 0.5:
# Insert a particle at a random location
x_new_coord = random.uniform(0, L)
y_new_coord = random.uniform(0, L)
new_particle = [x_new_coord,y_new_coord]
new_E = particleEnergy(new_particle,particles, npart+1)
deltaE = new_E
print("dEin=",deltaE)
# Acceptance rule for inserting
if(deltaE>10):
P_in=0
else:
P_in = (factorin) *math.exp(-beta*deltaE)
print("pinacc=",P_in)
rand= random.uniform(0, 1)
if rand <= P_in :
particles.append(new_particle)
en += deltaE
npart += 1
print("accepted insertion")
else:
if npart != 0:
p = random.randint(0, npart-1)
this_particle = particles[p]
prev_E = particleEnergy(this_particle, particles, p)
deltaE = prev_E
print("dEout=",deltaE)
# Acceptance rule for removing
if(deltaE>10):
P_re=1
else:
P_re = (factorout)*math.exp(beta*deltaE)
print("poutacc=",P_re)
rand = random.uniform(0, 1)
if rand <= P_re :
particles.remove(this_particle)
en += deltaE
npart = npart - 1
print("accepted removal")
print()
return particles, en, npart
Monte Carlo relevant part: for 1/10 runs, check the possibility of inserting or removing a particle
# MC
for step in range(0, runTimes):
print(step)
print()
rand = random.uniform(0,1)
if rand <= 0.9:
#----------- change energies-------------------------
#........
#........
else:
particles, en, N = mcex(L,N,particles,beta,rho0,V, en)
# stepList.append(step)
if((step+1)%1000==0):
for i, particle1 in enumerate(particles):
for j, particle2 in enumerate(particles):
if j!= i:
# print(particle1)
# print(particle2)
# print(i)
# print(j)
dist.append(distancesq(particle1, particle2))
NList.append(N)
where we call the function mcex and perhaps the particles array is not updated correctly:
def mcex(L,npart,particles,beta,rho0,V,en):
factorin=(rho0*V)/(npart+1)
factorout=(npart)/(V*rho0)
print("factorin=",factorin)
print("factorout",factorout)
# Produce random number and check:
rand = random.uniform(0, 1)
if rand <= 0.5:
# Insert a particle at a random location
x_new_coord = random.uniform(0, L)
y_new_coord = random.uniform(0, L)
new_particle = [x_new_coord,y_new_coord]
new_E = particleEnergy(new_particle,particles, npart+1)
deltaE = new_E
print("dEin=",deltaE)
# Acceptance rule for inserting
if(deltaE>10):
P_in=0
else:
P_in = (factorin) *math.exp(-beta*deltaE)
print("pinacc=",P_in)
rand= random.uniform(0, 1)
if rand <= P_in :
particles.append(new_particle)
en += deltaE
npart += 1
print("accepted insertion")
else:
if npart != 0:
p = random.randint(0, npart-1)
this_particle = particles[p]
prev_E = particleEnergy(this_particle, particles, p)
deltaE = prev_E
print("dEout=",deltaE)
# Acceptance rule for removing
if(deltaE>10):
P_re=1
else:
P_re = (factorout)*math.exp(beta*deltaE)
print("poutacc=",P_re)
rand = random.uniform(0, 1)
if rand <= P_re :
particles.remove(this_particle)
en += deltaE
npart = npart - 1
print("accepted removal")
print()
return particles, en, npart
and finally, we create the g(r) histogramm
where perhaps the normalization or the use of the histogram method are not as they should
RDF(N,particles,L)
with the function:
def RDF(N,particles, L):
minb=0
maxb=8
nbin=500
skata=np.asarray(dist).flatten()
rDf = np.histogram(skata, np.linspace(minb, maxb,nbin))
prefactor = (1/2/ np.pi)* (L**2/N **2) /len(dist) *( nbin /(maxb -minb) )
# prefactor = (1/(2* np.pi))*(L**2/N**2)/(len(dist)*num_increments/(rMax + 1.1 * dr ))
rDf = [prefactor*rDf[0], 0.5*(rDf[1][1:]+rDf[1][:-1])]
print('skata',len(rDf[0]))
print('incr',len(rDf[1]))
plt.figure()
plt.plot(rDf[1],rDf[0])
plt.xlabel("r")
plt.ylabel("g(r)")
plt.show()
The results are:
Particle N number fluctuations
and
[
but we want
Although I have accepted an answer, I am posting here some more details.
To normalize the pair correlation correctly one must divide each "number of particles found at a certain distance" or mathematically the sum of delta function of the distances , one must divide with the distance it's self.
Understanding first that a numpy.histogram is an array of two elements, first element the array of all counted events and second element the vector of bins, one must take each element of the first array, lets say np.histogram[0] and multiply it pairwise with np.histogram[1] of the second array.
That is, one must do the following:
def RDF(N,particles, L):
minb=0
maxb=25
nbin=200
width=(maxb-minb)/(nbin)
rings=np.linspace(minb, maxb,nbin)
skata=np.asarray(dist).flatten()
rDf = np.histogram(skata, rings ,density=True)
prefactor = (1/( np.pi*(L**2/N**2)))
rDf = [prefactor*rDf[0], 0.5*(rDf[1][1:]+rDf[1][:-1])]
rDf[0]=np.multiply(rDf[0],1/(rDf[1]*( width )))
where before the last multiply line, we are centering the bins so that their numbers equals the number of elements of the first array( you have five fingers, but four intermediate gaps between them)
Your g(r) is not correctly normalised. You need to divide the number of pairs found in each bin by the average density of the system times the area of the annulus associated to that bin, where the latter is just 2 pi r dr, with r being the bin's midpoint and dr the bin size. As far as I can tell, your prefactor does not contain the "r" bit. There is also something else that is missing, but it's hard to tell without knowing what all the other constants contain.
EDIT: here is a link that will guide you the implementation of a routine to compute the radial distribution function in 2D and 3D

Python Image Compression

I am using the Pillow library of Python to read in image files. How can I compress and decompress using Huffman encoding? Here is an instruction:
You have been given a set of example images and your goal is to compress them as much as possible without losing any perceptible information –upon decompression they should appear identical to the original images. Images are essentially stored as a series of points of color, where each point is represented as a combination of red, green, and blue (rgb). Each component of the rgb value ranges between 0-255, so for example: (100, 0, 200) would represent a shade of purple. Using a fixed-length encoding, each component of the rgb value requires 8 bits to encode (28= 256) meaning that the entire rgb value requires 24 bits to encode. You could use a compression algorithm like Huffman encoding to reduce the number of bits needed for more common values and thereby reduce the total number of bits needed to encode your image.
# For my current code I just read the image, get all the rgb and build the tree
from PIL import Image
import sys, string
import copy
codes = {}
def sortFreq(freqs):
letters = freqs.keys()
tuples = []
for let in letters:
tuples.append (freqs[let],let)
tuples.sort()
return tuples
def buildTree(tuples):
while len (tuples) > 1:
leastTwo = tuple (tuples[0:2]) # get the 2 to combine
theRest = tuples[2:] # all the others
combFreq = leastTwo[0][0] + leastTwo[1][0] # the branch points freq
tuples = theRest + [(combFreq, leastTwo)] # add branch point to the end
tuples.sort() # sort it into place
return tuples[0] # Return the single tree inside the list
def trimTree(tree):
# Trim the freq counters off, leaving just the letters
p = tree[1] # ignore freq count in [0]
if type (p) == type (""):
return p # if just a leaf, return it
else:
return (trimTree (p[0]), trimTree (p[1]) # trim left then right and recombine
def assignCodes(node, pat=''):
global codes
if type (node) == type (""):
codes[node] = pat # A leaf. Set its code
else:
assignCodes(node[0], pat+"0") # Branch point. Do the left branch
assignCodes(node[1], pat+"1") # then do the right branch.
dictionary = {}
table = {}
image = Image.open('fall.bmp')
#image.show()
width, height = image.size
px = image.load()
totalpixel = width*height
print ("Total pixel: "+ str(totalpixel))
for x in range (width):
for y in range (height):
# print (px[x, y])
for i in range (3):
if dictionary.get(str(px[x, y][i])) is None:
dictionary[str(px[x, y][i])] = 1
else:
dictionary[str(px[x, y][i])] = dictionary[str(px[x, y][i])] +1
table = copy.deepcopy(dictionary)
#combination = len(dictionary)
#for value in table:
# table[value] = table[value] / (totalpixel * combination) * 100
#print(table)
print(dictionary)
sortdic = sortFreq(dictionary)
tree = buildTree(sortdic)
trim = trimTree(tree)
print(trim)
assignCodes(trim)
print(codes)
The class HuffmanCoding takes complete path of the text file to be compressed as parameter. (as its data members store data specific to the input file).
The compress() function returns the path of the output compressed file.
The function decompress() requires path of the file to be decompressed. (and decompress() is to be called from the same object created for compression, so as to get code mapping from its data members)
import heapq
import os
class HeapNode:
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __cmp__(self, other):
if(other == None):
return -1
if(not isinstance(other, HeapNode)):
return -1
return self.freq > other.freq
class HuffmanCoding:
def __init__(self, path):
self.path = path
self.heap = []
self.codes = {}
self.reverse_mapping = {}
# functions for compression:
def make_frequency_dict(self, text):
frequency = {}
for character in text:
if not character in frequency:
frequency[character] = 0
frequency[character] += 1
return frequency
def make_heap(self, frequency):
for key in frequency:
node = HeapNode(key, frequency[key])
heapq.heappush(self.heap, node)
def merge_nodes(self):
while(len(self.heap)>1):
node1 = heapq.heappop(self.heap)
node2 = heapq.heappop(self.heap)
merged = HeapNode(None, node1.freq + node2.freq)
merged.left = node1
merged.right = node2
heapq.heappush(self.heap, merged)
def make_codes_helper(self, root, current_code):
if(root == None):
return
if(root.char != None):
self.codes[root.char] = current_code
self.reverse_mapping[current_code] = root.char
return
self.make_codes_helper(root.left, current_code + "0")
self.make_codes_helper(root.right, current_code + "1")
def make_codes(self):
root = heapq.heappop(self.heap)
current_code = ""
self.make_codes_helper(root, current_code)
def get_encoded_text(self, text):
encoded_text = ""
for character in text:
encoded_text += self.codes[character]
return encoded_text
def pad_encoded_text(self, encoded_text):
extra_padding = 8 - len(encoded_text) % 8
for i in range(extra_padding):
encoded_text += "0"
padded_info = "{0:08b}".format(extra_padding)
encoded_text = padded_info + encoded_text
return encoded_text
def get_byte_array(self, padded_encoded_text):
if(len(padded_encoded_text) % 8 != 0):
print("Encoded text not padded properly")
exit(0)
b = bytearray()
for i in range(0, len(padded_encoded_text), 8):
byte = padded_encoded_text[i:i+8]
b.append(int(byte, 2))
return b
def compress(self):
filename, file_extension = os.path.splitext(self.path)
output_path = filename + ".bin"
with open(self.path, 'r+') as file, open(output_path, 'wb') as output:
text = file.read()
text = text.rstrip()
frequency = self.make_frequency_dict(text)
self.make_heap(frequency)
self.merge_nodes()
self.make_codes()
encoded_text = self.get_encoded_text(text)
padded_encoded_text = self.pad_encoded_text(encoded_text)
b = self.get_byte_array(padded_encoded_text)
output.write(bytes(b))
print("Compressed")
return output_path
""" functions for decompression: """
def remove_padding(self, padded_encoded_text):
padded_info = padded_encoded_text[:8]
extra_padding = int(padded_info, 2)
padded_encoded_text = padded_encoded_text[8:]
encoded_text = padded_encoded_text[:-1*extra_padding]
return encoded_text
def decode_text(self, encoded_text):
current_code = ""
decoded_text = ""
for bit in encoded_text:
current_code += bit
if(current_code in self.reverse_mapping):
character = self.reverse_mapping[current_code]
decoded_text += character
current_code = ""
return decoded_text
def decompress(self, input_path):
filename, file_extension = os.path.splitext(self.path)
output_path = filename + "_decompressed" + ".txt"
with open(input_path, 'rb') as file, open(output_path, 'w') as output:
bit_string = ""
byte = file.read(1)
while(byte != ""):
byte = ord(byte)
bits = bin(byte)[2:].rjust(8, '0')
bit_string += bits
byte = file.read(1)
encoded_text = self.remove_padding(bit_string)
decompressed_text = self.decode_text(encoded_text)
output.write(decompressed_text)
print("Decompressed")
return output_path
Running the program:
Save the above code, in a file huffman.py.
Create a sample text file. Or download a sample file from sample.txt (right click, save as)
Save the code below, in the same directory as the above code, and Run this python code (edit the path variable below before running. initialize it to text file path)
UseHuffman.py
from huffman import HuffmanCoding
#input file path
path = "/home/ubuntu/Downloads/sample.txt"
h = HuffmanCoding(path)
output_path = h.compress()
h.decompress(output_path)
The compressed .bin file and the decompressed file are both saved in the same directory as of the input file.
Result
On running on the above linked sample text file:
Initial Size: 715.3 kB
Compressed file Size: 394.0 kB
Plus, the decompressed file comes out to be exactly the same as the original file, without any data loss.
And that is all for Huffman Coding implementation, with compression and decompression. This was fun to code.
The above program requires the decompression function to be run using the same object that created the compression file (because the code mapping is stored in its data members). We can also make the compression and decompression function run independently, if somehow, during compression we store the mapping info also in the compressed file (in the beginning). Then, during decompression, we will first read the mapping info from the file, then use that mapping info to decompress the rest file.

AttributeError: 'NoneType' object has no attribute 'get_width'

This is simple script on Ascii art generator from image , I get this error :
I run it in cmd line , and I am using windows 7 operating system
Traceback (most recent call last):
File "C:\Python33\mbwiga.py", line 251, in <module>
converter.convertImage(sys.argv[-1])
File "C:\Python33\mbwiga.py", line 228, in convertImage
self.getBlobs()
File "C:\Python33\mbwiga.py", line 190, in getBlobs
width, height = self.cat.get_width(), self.cat.get_height()
AttributeError: 'NoneType' object has no attribute 'get_width'
what am I messing here..?? can some one help..?
Here is full source code some one asked :
import sys
import pygame
NAME = sys.argv[0]
VERSION = "0.1.0" # The current version number.
HELP = """ {0} : An ASCII art generator. Version {1}
Usage:
{0} [-b BLOB_SIZE] [-p FONT_WIDTH:HEIGHT] [-c] image_filename
Commands:
-b | --blob Change the blob size used for grouping pixels. This is the width of the blob; the height is calculated by multiplying the blob size by the aspect ratio.
-p | --pixel-aspect Change the font character aspect ratio. By default this is 11:5, which seems to look nice. Change it based on the size of your font. Argument is specified in the format "WIDTH:HEIGHT". The colon is important.
-c | --colour Use colour codes in the output. {0} uses VT100 codes by default, limiting it to 8 colours, but this might be changed later.
-h | --help Shows this help.""""
.format(NAME, VERSION)
NO_IMAGE = \
""" Usage: %s [-b BLOB_SIZE] [-p FONT_WIDTH:HEIGHT] image_filename """ % (NAME)
import math
CAN_HAS_PYGAME = False
try:
import pygame
except ImportError:
sys.stderr.write("Can't use Pygame's image handling! Unable to proceed, sorry D:\n")
exit(-1)
VT100_COLOURS = {"000": "[0;30;40m",
"001": "[0;30;41m",
"010": "[0;30;42m",
"011": "[0;30;43m",
"100": "[0;30;44m",
"101": "[0;30;45m",
"110": "[0;30;46m",
"111": "[0;30;47m",
"blank": "[0m"}
VT100_COLOURS_I = {"000": "[0;40;30m",
"001": "[0;40;31m",
"010": "[0;40;32m",
"011": "[0;40;33m",
"100": "[0;40;34m",
"101": "[0;40;35m",
"110": "[0;40;36m",
"111": "[0;40;37m",
"blank": "[0m"}
# Convenient debug function.
DO_DEBUG = True
def debug(*args):
if not DO_DEBUG: return # Abort early, (but not often).
strrep = ""
for ii in args:
strrep += str(ii)
sys.stderr.write(strrep + "\n") # Write it to stderr. Niiicce.
# System init.
def init():
""" Start the necessary subsystems. """
pygame.init() # This is the only one at the moment...
# Get a section of the surface.
def getSubsurface(surf, x, y, w, h):
try:
return surf.subsurface(pygame.Rect(x, y, w, h))
except ValueError as er:
return getSubsurface(surf, x, y, w - 2, h - 2)
# The main class.
class AAGen:
""" A class to turn pictures into ASCII "art". """
def __init__(self):
""" Set things up for a default conversion. """
# Various blob settings.
self.aspectRatio = 11.0 / 5.0 # The default on my terminal.
self.blobW = 12 # The width. Also, the baseline for aspect ratio.
self.blobH = self.aspectRatio * self.blobW # The height.
self.blobList = []
self.cat = None # The currently open file.
self.chars = """##%H(ks+i,. """ # The characters to use.
self.colour = False # Do we use colour?
def processArgs(self):
""" Process the command line arguments, and remove any pertinent ones. """
cc = 0
for ii in sys.argv[1:]:
cc += 1
if ii == "-b" or ii == "--blob":
self.setBlob(int(sys.argv[cc + 1]))
elif ii == "-p" or ii == "--pixel-aspect":
jj = sys.argv[cc + 1]
self.setAspect(float(jj.split(":")[1]) / float(jj.split(":")[0]))
elif ii == "-c" or ii == "--colour":
self.colour = True
elif ii == "-h" or ii == "--help":
print(HELP)
exit(0)
if len(sys.argv) == 1:
print(NO_IMAGE)
exit(0)
def setBlob(self, blobW):
""" Set the blob size. """
self.blobW = blobW
self.blobH = int(math.ceil(self.aspectRatio * self.blobW))
def setAspect(self, aspect):
""" Set the aspect ratio. Also adjust the blob height. """
self.aspectRatio = aspect
self.blobH = int(math.ceil(self.blobW * self.aspectRatio))
def loadImg(self, fname):
""" Loads an image into the store. """
try:
tmpSurf = pygame.image.load(fname)
except:
print("Either this is an unsupported format, or we had problems loading the file.")
return None
self.cat = tmpSurf.convert(32)
if self.cat == None:
sys.stderr.write("Problem loading the image %s. Can't convert it!\n"
% fname)
return None
def makeBlob(self, section):
""" Blob a section into a single ASCII character."""
pxArr = pygame.surfarray.pixels3d(section)
colour = [0, 0, 0]
size = 0 # The number of pixels.
# Get the density/colours.
for i in pxArr:
for j in i:
size += 1
# Add to the colour.
colour[0] += j[0]
colour[1] += j[1]
colour[2] += j[2]
# Get just the greyscale.
grey = apply(lambda x, y, z: (x + y + z) / 3 / size,
colour)
if self.colour:
# Get the 3 bit colour.
threshold = 128
nearest = ""
nearest += "1" if (colour[0] / size > threshold) else "0"
nearest += "1" if (colour[1] / size > threshold) else "0"
nearest += "1" if (colour[2] / size > threshold) else "0"
return VT100_COLOURS[nearest], grey
return grey
# We just use a nasty mean function to find the average value.
# total = 0
# for pix in pxArr.flat:
# total += pix # flat is the array as a single-dimension one.
# return total / pxArr.size # This is a bad way to do it, it loses huge amounts of precision with large blob size. However, with ASCII art...
def getBlobs(self):
""" Get a list of blob locations. """
self.blobList = [] # Null it out.
width, height = self.cat.get_width(), self.cat.get_height()
# If the image is the wrong size for blobs, add extra space.
if height % self.blobH != 0 or width % self.blobW != 0:
oldimg = self.cat
newW = width - (width % self.blobW) + self.blobW
newH = height - (height % self.blobH) + self.blobH
self.cat = pygame.Surface((newW, newH))
self.cat.fill((255, 255, 255))
self.cat.blit(oldimg, pygame.Rect(0, 0, newW, newH))
# Loop over subsections.
for row in range(0, height, int(self.blobH)):
rowItem = []
for column in range(0, width, self.blobW):
# Construct a Rect to use.
src = pygame.Rect(column, row, self.blobW, self.blobH)
# Now, append the reference.
rowItem.append(self.cat.subsurface(src))
self.blobList.append(rowItem)
return self.blobList
def getCharacter(self, value, colour = False):
""" Get the correct character for a pixel value. """
col = value[0] if colour else ""
value = value[1] if colour else value
if not 0 <= value <= 256:
sys.stderr.write("Incorrect pixel data provided! (given %d)\n"
% value)
return "E"
char = self.chars[int(math.ceil(value / len(self.chars))) % len(self.chars)]
return char + col
def convertImage(self, fname):
""" Convert an image, and print it. """
self.loadImg(fname)
self.getBlobs()
pval = "" # The output value.
# Loop and add characters.
for ii in converter.blobList:
for jj in ii:
ch = self.makeBlob(jj)
pval += self.getCharacter(ch, self.colour) # Get the character.
# Reset the colour at the end of the line.
if self.colour: pval += VT100_COLOURS["blank"]
pval += "\n" # Split it up by line.
pval = pval[:-1] # Cut out the final newline.
print(pval) # Print it.
# Main program execution.
if __name__ == "__main__":
init()
converter = AAGen()
converter.processArgs()
converter.convertImage(sys.argv[-1])
sys.exit(1)
The problem is hidden somewhere in the loadImg. The error says that self.cat is None. The self.cat could get the None when initialised at the line 97, or it was assigned the result of tmpSurf.convert(32) and the result of that call is None. In the first case, you should observe the message Either this is an unsupported format..., in the later case you should see the message Problem loading the image... as you are testing self.cat against None:
def loadImg(self, fname):
""" Loads an image into the store. """
try:
tmpSurf = pygame.image.load(fname)
except:
print("Either this is an unsupported format, or we had problems loading the file.")
return None
self.cat = tmpSurf.convert(32)
if self.cat == None:
sys.stderr.write("Problem loading the image %s. Can't convert it!\n"
% fname)
return None
By the way, return None is exactly the same as return without argument. Also, the last return None can be completely removed because any function implicitly returns None when the end of the body is reached.
For testing to None, the is operator is recommended -- i.e. if self.cat is None:.
Update based on the comment from May 31.
If you want to make a step further, you should really learn Python a bit. Have a look at the end of the original script (indentation fixed):
# Main program execution.
if __name__ == "__main__":
init() # pygame is initialized here
converter = AAGen() # you need the converter object
converter.processArgs() # the command-line arguments are
# converted to the object attributes
converter.convertImage(sys.argv[-1]) # here the conversion happens
sys.exit(1) # this is unneccessary for the conversion
If the original script is saved in the mbwiga.py, then you can or call it as a script, or you can use it as a module. In the later case, the body below the if __name__ == "__main__": is not executed, and you have to do it in the caller script on your own. Say you have test.py that tries to do that. Say it is located at the same directory. It must import the mbwiga. Then the mbwiga. becomes the prefix of the functionality from inside the module. Your code may look like this:
import mbwiga
mbwiga.init() # pygame is initialized here
converter = mbwiga.AAGen() # you need the converter object
# Now the converter is your own object name. It does not take the mbwiga. prefix.
# The original converter.processArgs() took the argumens from the command-line
# when mbwiga.py was called as a script. If you want to use some of the arguments
# you can set the converter object's attributes the way that is shown
# in the .processArgs() method definition. Or you can call it the same way to
# extract the information from the command line passed when you called the test.py
#
converter.processArgs()
# Now the conversion
converter.convertImage('myImageFilename.xxx') # here the conversion happens

Resources