Improper Use of _New_ to Generate Class Instances

Improper use of __new__ to generate class instances?

I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".

If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however see Python 3.6+ Update below). Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.

That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important way in which it's better — which is now adding subclasses doesn't require modifying the __new__() method. This means it's now easily extensible since it effectively supports what you could call virtual constructors.

A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.

# Works in Python 2 and 3.

import os
import re

class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass

# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

@classmethod
def _get_all_subclasses(cls):
""" Recursive generator of all class' subclasses. """
for subclass in cls.__subclasses__():
yield subclass
for subclass in subclass._get_all_subclasses():
yield subclass

@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None

def __new__(cls, path):
""" Create instance of appropriate subclass using path prefix. """
path_prefix = cls._get_prefix(path)

for subclass in cls._get_all_subclasses():
if subclass.prefix == path_prefix:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (& no default defined)
raise FileSystem.Unknown(
'path "{}" has no known file system prefix'.format(path))

def count_files(self):
raise NotImplementedError

class Nfs(FileSystem):
prefix = 'nfs'

def __init__ (self, path):
pass

def count_files(self):
pass

class LocalDrive(FileSystem):
prefix = None # Default when no file system prefix is found.

def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path

def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))

if __name__ == '__main__':

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.

print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive

print(data2.count_files()) # -> <some number>

Python 3.6+ Update

The code above works in both Python 2 and 3.x. However in Python 3.6 a new class method was added to object named __init_subclass__() which makes the finding of subclasses simpler by using it to automatically create a "registry" of them instead of potentially having to check every subclass recursively as the _get_all_subclasses() method is doing in the above.

I got the idea of using __init_subclass__() to do this from the Subclass registration section in the PEP 487 -- Simpler customisation of class creation proposal. Since the method will be inherited by all the base class' subclasses, registration will automatically be done for sub-subclasses, too (as opposed to only to direct subclasses) — it completely eliminates the need for a method like _get_all_subclasses().

# Requires Python 3.6+

import os
import re

class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass

# Pattern for matching "xxx://" # x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.

@classmethod
def __init_subclass__(cls, /, path_prefix, **kwargs):
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.

@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None

def __new__(cls, path):
""" Create instance of appropriate subclass. """
path_prefix = cls._get_prefix(path)
subclass = cls._registry.get(path_prefix)
if subclass:
return object.__new__(subclass)
else: # No subclass with matching prefix found (and no default).
raise cls.Unknown(
f'path "{path}" has no known file system prefix')

def count_files(self):
raise NotImplementedError

class Nfs(FileSystem, path_prefix='nfs'):
def __init__ (self, path):
pass

def count_files(self):
pass

class Ufs(Nfs, path_prefix='ufs'):
def __init__ (self, path):
pass

def count_files(self):
pass

class LocalDrive(FileSystem, path_prefix=None): # Default file system.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise self.NoAccess(f'Cannot read directory {path!r}')
self.path = path

def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))

if __name__ == '__main__':

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
data4 = FileSystem('ufs://192.168.1.18')

print(type(data1)) # -> <class '__main__.Nfs'>
print(type(data2)) # -> <class '__main__.LocalDrive'>
print(f'file count: {data2.count_files()}') # -> file count: <some number>

try:
data3 = FileSystem('c:/foobar') # A non-existent directory.
except FileSystem.NoAccess as exc:
print(f'{exc} - FileSystem.NoAccess exception raised as expected')
else:
raise RuntimeError("Non-existent path should have raised Exception!")

try:
data4 = FileSystem('foobar://42') # Unregistered path prefix.
except FileSystem.Unknown as exc:
print(f'{exc} - FileSystem.Unknown exception raised as expected')
else:
raise RuntimeError("Unregistered path prefix should have raised Exception!")

How do I create an instance of a nested class from within that class in Python?

Your error comes from how python handles classes.

When it encounters a class statement, the body of the class is run, and the names it defines are placed in a separate namespace, which will eventually become the class __dict__. The class object is not created and bound to its name until (well) after the body has run. That means that when you put class NestedClass: inside the body of class ExampleClass:, ExampleClass does not exist yet, and neither does NestedClass. Indirectly because of this, all the new class namespaces live in the top level available namespace (e.g. global or function), and are not actually nested within one another.

As a consequence of this order of operations, class bodies are not aware of the namespaces of surrounding classes at all. So the namespace of NestedClass looks out to the global namespace, not to the __dict__ of ExampleClass, as you might expect coming from say Java. A class defined in a function would be able to see the functions local namespace before globals, but still not that of an enclosing class.

And so, the line newclass = NestedClass() raises an error. The name NestedClass does not exist in the function's namespace, or in the global namespace. There are three simple workarounds available:

  1. Use the staticly scoped __class__:

    newclass = __class__()
  2. Refer to the class by its global name:

    newclass = ExampleClass.NestedClass()
  3. Don't use nested classes in Python. This is generally the preferred approach. Just move NestedClass to the top level. Then your makenew method will work without modification, and ExampleClass.Example can refer to NestedClass directly instead of as self.NestedClass.

Why is __init__() always called after __new__()?

Use __new__ when you need to control
the creation of a new instance.

Use
__init__ when you need to control initialization of a new instance.

__new__ is the first step of instance creation. It's called first, and is
responsible for returning a new
instance of your class.

In contrast,
__init__ doesn't return anything; it's only responsible for initializing the
instance after it's been created.

In general, you shouldn't need to
override __new__ unless you're
subclassing an immutable type like
str, int, unicode or tuple.

From April 2008 post: When to use __new__ vs. __init__? on mail.python.org.

You should consider that what you are trying to do is usually done with a Factory and that's the best way to do it. Using __new__ is not a good clean solution so please consider the usage of a factory. Here's a good example: ActiveState Fᴀᴄᴛᴏʀʏ ᴘᴀᴛᴛᴇʀɴ Recipe.

Is it bad to store all instances of a class in a class field?

Yes. It's bad. It conflates the instance with the collection of instances.

Collections are one thing.

The instances which are collected are unrelated.

Also, class-level variables which get updated confuse some of us. Yes, we can eventually reason on what's going on, but the Standard Expectation™ is that state change applies to objects, not classes.


 class Foobar_Collection( dict ):
def __init__( self, *arg, **kw ):
super( Foobar_Collection, self ).__init__( *arg, **kw ):
def foobar( self, *arg, **kw ):
fb= Foobar( *arg, **kw )
self[fb.name]= fb
return fb

class Foobar( object ):
def __init__( self, name, something )
self.name= name
self.something= something

fc= Foobar_Collection()
fc.foobar( 'first', 42 )
fc.foobar( 'second', 77 )

for name in fc:
print name, fc[name]

That's more typical.


In your example, the wait_for_deps is simply a method of the task collection, not the individual task. You don't need globals.

You need to refactor.

__new__ and __init__ in Python

how I should structure the class using __init__ and __new__ as they are different and both accepts arbitrary arguments besides default first argument.

Only rarely will you have to worry about __new__. Usually, you'll just define __init__ and let the default __new__ pass the constructor arguments to it.

self keyword is in terms of name can be changed to something else? But I am wondering cls is in terms of name is subject to change to something else as it is just a parameter name?

Both are just parameter names with no special meaning in the language. But their use is a very strong convention in the Python community; most Pythonistas will never change the names self and cls in these contexts and will be confused when someone else does.

Note that your use of def __new__(tuple) re-binds the name tuple inside the constructor function. When actually implementing __new__, you'll want to do it as

def __new__(cls, *args, **kwargs):
# do allocation to get an object, say, obj
return obj

Albeit I said I want to return tuple, this code works fine and returned me [1,2,3].

MyClass() will have the value that __new__ returns. There's no implicit type checking in Python; it's the responsibility of the programmer to return the correct type ("we're all consenting adults here"). Being able to return a different type than requested can be useful for implementing factories: you can return a subclass of the type requested.

This also explains the issubclass/isinstance behavior you observe: the subclass relationship follows from your use of class MyClass(tuple), the isinstance reflects that you return the "wrong" type from __new__.

For reference, check out the requirements for __new__ in the Python Language Reference.

Edit: ok, here's an example of potentially useful use of __new__. The class Eel keeps track of how many eels are alive in the process and refuses to allocate if this exceeds some maximum.

class Eel(object):
MAX_EELS = 20
n_eels = 0

def __new__(cls, *args, **kwargs):
if cls.n_eels == cls.MAX_EELS:
raise HovercraftFull()

obj = super(Eel, cls).__new__(cls)
cls.n_eels += 1
return obj

def __init__(self, voltage):
self.voltage = voltage

def __del__(self):
type(self).n_eels -= 1

def electric(self):
"""Is this an electric eel?"""
return self.voltage > 0

Mind you, there are smarter ways to accomplish this behavior.

How can I automate the creation of instances of a class?

# Create Class 
class Player:
def greating(self):
print 'Hello!'

# List to store instanses
l = []

for i in range(4):
l.append(Player())

# Call Instance #1 methods
l[0].greating()

Here we have a player class and 4 instances from this class stored in l list.

Is it inefficient to highly frequently create short-lived new instances of a class?

The actual creation of an object with a very short lifetime is minuscule. Creating a new object pretty much just involves incrementing the heap pointer by the size of the object and zeroing out those bits. This isn't going to be a problem.

As for the actual collection of those objects, when the garbage collector performs a collection it is taking all of the objects still alive and copying them. Any objects not "alive" are not touched by the GC here, so they aren't adding work to collections. If the objects you create never, or very very rarely, are alive during a GC collection then they're not adding any costs there.

The one thing that they can do, is decrease the amount of currently available memory such that the GC needs to perform collections noticeably more often than it otherwise would. The GC will actually perform a collection when it needs some more memory. If you'r constantly using all of your available memory creating these short lived objects, then you could increase the rate of collections in your program.

Of course, it would take a lot of objects to actually have a meaningful impact on how often collections take place. If you're concerned you should spend some time measuring how often your program is performing collections with and without this block of code, to see what effect it is having. If you really are causing collections to happen much more frequently than they otherwise would, and you're noticing performance problems as a result, then consider trying to address the problem.

There are two avenues that come to mind for resolving this problem, if you have found a measurable increase in the number of collections. You could investigate the possibility of using value types instead of reference types. This may or may not make sense conceptually in context for you, and it may or may not actually help the problem. It'd depend way too much on specifics not mentioned, but it's at least something to look into. The other possibility is trying to aggressively cache the objects so that they can be re-used over time. This also needs to be looked at carefully, because it can greatly increase the complexity of a program and make it much harder to write programs that are correct, maintainable, and easy to reason about, but it can be an effective tool for reintroducing memory if used correctly.



Related Topics



Leave a reply



Submit