Porting a sub-class of Python 2's 'file' class to Python 3

I have legacy code that defines class TiffFile(file). What is the Python 3 way to write it?
I tried to replace the following Python 2 code:
class TiffFile(file):
    def __init__(self, path):
        file.__init__(self, path, 'r+b')
with this in Python 3:
class TiffFile(RawIOBase):
    def __init__(self, path):
        super(TiffFile, self).__init__(path, 'r+b')
But now I am getting TypeError: object.__init__() takes no parameters.

RawIOBase.__init__ does not take any arguments; that is where your error is.
Your TiffFile implementation also inherits from file, which the Python 2 documentation recommends you not instantiate directly (open is the preferred way to obtain a file object), so your Python 2 implementation is non-idiomatic, and someone could even claim it is wrong-ish. You should use open instead of file, and in a class context you should use an io module class for input and output.
You can use open to get a file object, as you would use file in Python 2.7, or you can use io.FileIO in both Python 2 and Python 3 to access file streams in the same way.
So your implementation would be more like:
import io

class TiffFile(io.FileIO):
    def __init__(self, name, mode='r+b', *args, **kwargs):
        # The two-argument super() form keeps this compatible with Python 2 as well.
        super(TiffFile, self).__init__(name, mode, *args, **kwargs)
This should work in all currently supported Python versions and allow you the same interface as your old implementation while being more correct and portable.
Are you actually using r+b to open the file in read-write binary mode (which matters on Windows) on purpose? You should probably be using rb if you are not writing into the file, but just reading the TIFF data: rb opens the file in binary mode for reading only, while the appended + opens it in read-write mode.
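For illustration, a minimal usage sketch of the class above (the file name and the four-byte read are just examples, not from the original question):
# FileIO supports the context manager protocol, and accepts 'rb' for read-only.
with TiffFile('example.tif', mode='rb') as tif:
    header = tif.read(4)  # a TIFF file starts with b'II*\x00' or b'MM\x00*'
    print(header)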


How to combine multiple custom management commands in Django?

I wrote a set of commands for my Django project in the usual way. Is it possible to combine multiple commands into one command?
class Command(BaseCommand):
    """Import all files in the media dir and add them to the db,
    then call a process to generate thumbnails."""
    def handle(self, *args, **options):
        ...
To keep the steps simple I have commands like: import files, read metadata from files, create thumbnails, etc. The goal now is to create a "do all" command that somehow imports these commands and executes them one after another.
How to do that?
You can define a DoAll command and use django.core.management.call_command to run all your subcommands. Try something like this:
from django.core.management import call_command
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    """Call the foo and bar commands in sequence."""
    def handle(self, *args, **options):
        call_command("foo_command", *args, **options)
        call_command("bar_command", *args, **options)

Overriding file.write in python 3

I'm aware of the SO post How do I override file.write() in Python 3? but after looking it over and trying what's suggested I'm still stuck.
I want to override the file.write method in Python 3 so that I can "REDACT" certain words (usernames, passwords, etc.).
I found a great example of overriding the print and general stdout and stderr http://code.activestate.com/recipes/119404/
The issue is that it doesn't work for file.write. How can I override file.write?
My code for redacting when printing is:
def write(self, text):
    for word in self.redacted_list:
        text = text.replace(word, "REDACTED")
    self.origOut.write(text)
    return text
thanks
From the self.origOut.write(text) I assume you are trying to write an in-between class that pretends to be a file but provides a different .write() method.
I don't see any problems in the code you posted (assuming it's a method of a class you use). Possibly you wrote a class but forgot to create instances of it?
Did you try to write something like this?
class IAmNotARealFile:
    def __init__(self, real_file):
        self.origOut = real_file

    def __getattr__(self, attr_name):  # provide everything a real file has
        return getattr(self.origOut, attr_name)

    def write(self, text):
        # your redaction logic from the question goes here
        self.origOut.write(text)

with open('test.txt', 'w') as f:
    f = IAmNotARealFile(f)  # did you forget this?
    f.write('some text SECRET blah SECRET')  # calls IAmNotARealFile.write with your extra code

with open('test.txt') as f:
    f = IAmNotARealFile(f)
    print(f.read())  # this "falls through" to the actual file object
You will also probably want to return self.origOut.write(text) from your own .write(), if you don't have a specific reason not to.
Note that if you rewrite open() to directly return an IAmNotARealFile (keep a reference to the real built-in open, otherwise the wrapper would call itself recursively):
import builtins

def open(*args, **kwargs):
    return IAmNotARealFile(builtins.open(*args, **kwargs))
you will have to manually supply (some) "magic methods" because
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
--docs for .__getattribute__(), but it also applies to .__getattr__()
Why?
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
-- On special ("magic") method lookup [code style and emphasis mine]
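To make the caveat concrete, here is a small demonstration of that lookup rule (a sketch; the exact exception and message vary between Python versions):
f = IAmNotARealFile(open('test.txt', 'w'))
f.write('hello')  # fine: .write is found on the wrapper itself

# with f:         # fails: the with statement looks up __enter__/__exit__ on
#     pass        # type(f), bypassing __getattr__ entirely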

python (3.7) dataclass for self referenced structure [duplicate]

This question already has answers here:
How do I type hint a method with the type of the enclosing class?
class Node:
    def append_child(self, node: Node):
        if node != None:
            self.first_child = node
            self.child_nodes += [node]
How do I write node: Node? When I run it, it says name 'Node' is not defined.
Should I just remove the : Node annotation and instance-check it inside the function?
But then how could I access node's properties (which I would expect to be instances of the Node class)?
I don't know how to implement type casting in Python, BTW.
"self" references in type checking are typically done using strings:
class Node:
    def append_child(self, node: 'Node'):
        if node is not None:
            self.first_child = node
            self.child_nodes += [node]
This is described in the "Forward references" section of PEP 484.
Please note that this doesn't do any type checking or casting. It is a type hint which Python (normally) disregards completely.[1] However, third-party tools (e.g. mypy) use type hints to perform static analysis on your code and can report errors before runtime.
Also, starting with Python 3.7, you can implicitly convert all of the type hints in a module to strings by using from __future__ import annotations (this was at one point slated to become the default behaviour, though that change has since been deferred).
[1] The hints are introspectable -- so you could use them to build some kind of runtime checker using decorators or the like if you really wanted to, but Python doesn't do this by default.
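As a quick illustration of that introspectability (a minimal sketch, not from the original answer):
import typing

class Node:
    def append_child(self, node: 'Node'):
        ...

print(Node.append_child.__annotations__)         # {'node': 'Node'} -- still a string
print(typing.get_type_hints(Node.append_child))  # {'node': <class 'Node'>} -- resolved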
Python 3.7 onwards
PEP 563 introduced postponed evaluation of annotations, stored in __annotations__ as strings. A user can enable this through the __future__ directive:
from __future__ import annotations
This makes it possible to write:
class C:
    a: C

    def foo(self, b: C):
        ...
Starting in Python 3.10 (released 2021-10-04), this behaviour was slated to become the default; the change was ultimately deferred, so the __future__ import is still required for now.
Edit 2020-11-15: Originally it was announced to be mandatory starting in Python 4.0, but now it appears this will be the default already in Python 3.10, which is expected 2021-10-04. This surprises me, as it appears to be a violation of the promise in __future__ that this backward compatibility would not be broken until Python 4.0. Maybe the developers consider that 3.10 is 4.0, or maybe they have changed their mind. See also Why did __future__ MandatoryRelease for annotations change between 3.7 and 3.8?.
In Python 3.7+ you can also use a dataclass, and you can annotate the dataclass's fields with its own type.
In this particular example Node references itself, and if you run it you will get
NameError: name 'Node' is not defined
To overcome this error you have to include
from __future__ import annotations
It must be the first statement in the module (only comments and the module docstring may precede it). Once postponed evaluation becomes the default (see above), the import will no longer be needed:
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Node:
    value: int
    left: Node
    right: Node

    @property
    def is_leaf(self) -> bool:
        """Check if node is a leaf"""
        return not self.left and not self.right
Example:
node5 = Node(5, None, None)
node25 = Node(25, None, None)
node40 = Node(40, None, None)
node10 = Node(10, None, None)
# balanced tree
node30 = Node(30, node25, node40)
root = Node(20, node10, node30)
# unbalanced tree
node30 = Node(30, node5, node40)
root = Node(20, node10, node30)
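For completeness, the is_leaf property from the dataclass above can then be queried like this (illustrative only):
print(node5.is_leaf)  # True: node5 has no children
print(root.is_leaf)   # False: root has children on both sides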
If you just want an answer to the question, go read mgilson's answer.
mgilson's answer provides a good explanation of how you should work around this limitation of Python. But I think it's also important to have a good understanding of why this doesn't work, so I'm going to provide that explanation.
Python is a little different from other languages. In Python, there's really no such thing as a "declaration." As far as Python is concerned, code is just code. When you import a module, Python creates a new namespace (a place where global variables can live), and then executes each line of the module from top to bottom.
def foo(args): code is just a compound statement that bundles a bunch of source code together into a function and binds that function to the name foo. Similarly, class Bar(bases): code creates a class, executes all of the code immediately (inside a separate namespace which holds any class-level variables that might be created by the code, particularly including methods created with def), and then binds that class to the name Bar. It has to execute the code immediately, because all of the methods need to be created immediately.
Because the code gets executed before the name has been bound, you can't refer to the class at the top level of the code. It's perfectly fine to refer to the class inside of a method, however, because that code doesn't run until the method gets called.
(You might be wondering why we can't just bind the name first and then execute the code. It turns out that, because of the way Python implements classes, you have to know which methods exist up front, before you can even create the class object. It would be possible to create an empty class and then bind all of the methods to it one at a time with attribute assignment (and indeed, you can manually do this, by writing class Bar: pass and then doing def method1():...; Bar.method1 = method1 and so on), but this would result in a more complicated implementation, and be a little harder to conceptualize, so Python does not do this.)
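The manual route described in the parenthetical above would look something like this (purely illustrative):
class Bar:
    pass

def method1(self):
    return Bar  # Bar already exists at call time, so this reference is fine

Bar.method1 = method1  # bind the function as a method after the fact
print(Bar().method1() is Bar)  # True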
To summarize in code:
class C:
    C  # NameError: C doesn't exist yet.

    def method(self):
        return C  # This is fine. By the time the method gets called, C will exist.

C  # This is fine; the class has been created by the time we hit this line.

Python - Multiprocessing vs Multithreading for file-based work

I have created a GUI (using wxPython) that allows the user to search for files and their content. The program consists of three parts:
1. The main GUI window
2. The searching mechanism (separate class)
3. The output window that displays the results of the file search (separate class)
Currently I'm using Python's threading module to run the searching mechanism (2) in a separate thread, so that the main GUI stays responsive. I'm passing the results to the output window (3) at runtime using a Queue. This works fine for less performance-demanding file-reading actions, but as soon as the searching mechanism requires more performance, the main GUI window (1) starts lagging.
This is roughly the schematic:
import threading
import os
import wx

class MainWindow(wx.Frame):  # this is part (1)
    def __init__(self, parent, ...):
        # Initialize frame and panels etc.
        self.button_search = wx.Button(self, label="Search")
        self.button_search.Bind(wx.EVT_BUTTON, self.onSearch)

    def onSearch(self, event):
        """Run file search in a separate thread"""  # Here is the kernel of my question
        searcher = threading.Thread(target=FileSearch().Run, args=(args,))
        searcher.start()

class FileSearch():  # this is part (2)
    def __init__(self):
        ...

    def Run(self, args):
        """Performs the file search"""
        for root, dirs, files in os.walk(...):
            for file in files:
                ...

    def DetectEncoding(self):
        """Detects the encoding of a file for reading"""
        ...

class OutputWindow(wx.Frame):  # this is part (3)
    def __init__(self, parent, ...):
        # Initialize output window
        ...

    def AppendItem(self, data):
        """Appends a file item to the list"""
        ...
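(For reference, the Queue hand-off described above is not shown in the schematic; a minimal sketch of how it might look, where result_queue and poll_results are illustrative names rather than my actual code:)
import queue

result_queue = queue.Queue()

# In FileSearch.Run (worker thread): push results instead of touching the GUI:
#     result_queue.put((root, file))

# In OutputWindow (GUI thread): drain the queue periodically, e.g. from a wx.Timer:
def poll_results(window):
    try:
        while True:
            window.AppendItem(result_queue.get_nowait())
    except queue.Empty:
        pass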
My questions:
Is Python's multiprocessing module better suited for this performance-demanding job?
If yes, which form of interprocess communication (IPC) should I choose to send the results from the searching mechanism class (2) to the output window (3), and how should I implement it schematically?

Inheritance and Pandas

I am trying to create a file writer based on Pandas' ExcelWriter. I proceeded as I usually do with inheritance in Python 3:
import pandas as pd

class Writer(pd.ExcelWriter):
    def __init__(self, fname, engine='openpyxl'):
        pd.ExcelWriter.__init__(self, fname, engine=engine)
        self.newvar = 0
However, when I try to use it, I cannot access newvar:
test = Writer('test.xlsx')
test.newvar
returns:
AttributeError: '_XlsxWriter' object has no attribute 'newvar'
And when I check the type of test, it returns:
pandas.io.excel._XlsxWriter
I don't understand what I am missing since I used this kind of inheritance in many other cases. Any idea would be appreciated!
This is because pandas.ExcelWriter.__new__ returns an instance of a different class than itself (ExcelWriter is an abstract base class). The concrete class is chosen based on the extension of the file path and the engine which is used - you could observe that when you checked the type of the newly created instance. That means the __init__ method of whatever class is returned gets called. You can think of ExcelWriter as a kind of proxy for the specific writers for each format and engine (though it also defines the API which such a writer must provide).
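You can see this dispatch in action with a quick check (a sketch; the concrete class name depends on your pandas version and installed engines):
import pandas as pd

writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
print(type(writer))  # a concrete writer class, not ExcelWriter itself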
In order to make your writer available (for the given engine), you need to register it.
But before you can do that you need to make your class compatible by following the instructions which you'll find via help(pandas.ExcelWriter). For the sake of completeness I cite them here:
# Defining an ExcelWriter implementation (see abstract methods for more...)
# - Mandatory
# - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)``
# --> called to write additional DataFrames to disk
# - ``supported_extensions`` (tuple of supported extensions), used to
# check that engine supports the given extension.
# - ``engine`` - string that gives the engine name. Necessary to
# instantiate class directly and bypass ``ExcelWriterMeta`` engine
# lookup.
# - ``save(self)`` --> called to save file to disk
# - Mostly mandatory (i.e. should at least exist)
# - book, cur_sheet, path
# - Optional:
# - ``__init__(self, path, engine=None, **kwargs)`` --> always called
# with path as first argument.
So with that in mind we can extend your class:
import pandas as pd

class Writer(pd.ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('xlsx',)

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        # Implement something useful here.
        pass

    def save(self):
        # Implement something useful here.
        pass

    def __init__(self, fname, engine='openpyxl', **kwargs):
        # Note: no explicit `self` argument here -- super() already binds it.
        super().__init__(fname, engine=engine, **kwargs)
Now you can use pd.io.excel.register_writer(Writer) to register the writer. But you need to make sure the engine which you've specified matches your version of openpyxl. You can check the process of how a specific writer is chosen in the pandas source; the writers which are currently registered can be checked via print(pd.io.excel._writers).
As a side note: You can also subclass one of the already available specific writers and reuse their write_cells and save methods for example (however you'll need to register your writer also in that case):
_Openpyxl1Writer
_Openpyxl20Writer
_Openpyxl22Writer
_XlwtWriter
_XlsxWriter
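A minimal sketch of that subclassing approach, assuming an older pandas that exposes _XlsxWriter and register_writer in pandas.io.excel (the internal names are version-dependent; newvar comes from the question):
import pandas as pd
from pandas.io.excel import _XlsxWriter, register_writer  # internal, version-dependent

class Writer(_XlsxWriter):
    # Reuses the parent's write_cells and save implementations.
    def __init__(self, fname, engine='xlsxwriter', **kwargs):
        super().__init__(fname, engine=engine, **kwargs)
        self.newvar = 0

register_writer(Writer)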
