Python notes

ch11.a_pythonic_object

Ch 11. A Pythonic Object

graph LR;

    AAA["Pythonic?"] --> AAX["Making it easy and natural for Python programmers to perform tasks."]

    AAA --> AY["To build Pythonic objects\nobserve how real Python objects behave."]
graph LR;

    philosophy["Zen of Python: Simple is better than complex"]
    XXX["An object should be as simple\nas the requirements dictate\nand not a parade of language features."]
    XXY["If the code is for an application\nthen it should focus on\nwhat is needed to support the end users, not more."]
    XXZ["If the code is for a library for other programmers to use\nit’s reasonable to implement special methods\nsupporting behaviors that Pythonistas expect."]

    philosophy --> XXX
    XXX --> XXY
    XXX --> XXZ
graph LR;
    subgraph Context["Python Special Methods and Conventions"]
      S["Object Representation\n__repr__\n__str__\n__format__\n__bytes__"]
      U["Object to Number\n__abs__\n__bool__\n__hash__"]
      W["Equality and Hashing\n__eq__"]
    end
graph LR;
    Y["Supporting Bytes Conversion"] --> |Inspired by array.array.frombytes|Z["Alternative Constructor\n@classmethod frombytes()"]
    Z --> |Comparison|AA["@staticmethod\ndon't have special first arg: cls \notherwise same as @classmethod"]


    AC["__format__\nExtensible Format Spec"]
    AC --> AF["format(obj, format_spec)\n'{:format_spec}' in f-strings\nor str.format()"]
graph LR;
    AI["Python lacks a private modifier for variables."]
    AI --> |Why?| AIII["What’s the simplest thing that could possibly work?\nPublic attributes can be changed to properties later if needed.\navoid unnecessary up-front complexity"]

    AI --> AKK["@property decorator marks getter methods.\nWhich can wrap the actual members\nand creates private attribute"]
    AI --> AKJ["Attributes named with two leading underscores\n(e.g., __mood) are name-mangled.\ndirect access is still possible"]


    AM["__slots__ Attribute\nfor memory savings"]
    AM --> AO["Caveat: has side effects\nshould only used when\nmillions+ number of Instances"]
    AM --> AQ["Normally: Consider pandas instead"]




    AS["Techniaues:Class attributes\nas default values for instance attributes."] --> AT["Assigning a value to\nan undefined instance attribute\ncreates a new instance attribute.\n(and class attribute with same name intact)"]

Pythonic

  • Definition of Pythonic:

    • Making it easy and natural for Python programmers to perform tasks.
  • Expectations for Libraries/Frameworks:

    • Programmers expect custom classes to behave like built-in Python classes.
    • Meeting these expectations is a key aspect of being "Pythonic."
  • Implementing Special Methods:

    • Application-Specific Needs:

      • Only implement special methods if required by your application.
      • End users are not concerned with the "Pythonic" nature of internal objects.
    • Library Development:

      • For libraries intended for other Python programmers:
        • Expect diverse uses of your objects.
        • Implement more "Pythonic" behaviors to meet varied expectations.

"To build Pythonic objects, observe how real Python objects behave."

Simple is better than complex (Zen of Python).

Object Representations in Python

  • Standard String Representations:

    • repr():
      • Returns a string representation as seen by the developer.
      • Used by the Python console or debugger.
    • str():
      • Returns a string representation as seen by the user.
      • Used by the print() function.
  • Additional Special Methods:

    • bytes():
      • Analogous to str.
      • Called by bytes() to get the object as a byte sequence.
    • format():
      • Used by f-strings, format(), and str.format().
      • Calls obj.format(format_spec) for formatted string displays.
# from
# https://github.com/fluentpython/example-code-2e/blob/master/11-pythonic-obj/vector2d_v0.py
from array import array
import math


class Vector2d:
    # typecode is a class attribute

    typecode = 'd'  # <1>

    def __init__(self, x, y):
        self.x = float(x)    # <2>
        self.y = float(y)

    def __iter__(self):
        # Converting x and y to float in __init__ catches errors early, which
        # is helpful in case Vector2d is called with unsuitable arguments
        return (i for i in (self.x, self.y))  # <3>

    def __repr__(self):
        class_name = type(self).__name__
        #  interpolating the components with {!r} to get their repr; because
        # Vector2d is iterable, *self feeds the x and y components to format.
        return '{}({!r}, {!r})'.format(class_name, *self)  # <4>

    def __str__(self):
        return str(tuple(self))  # <5>

    def __bytes__(self):
        # To generate bytes, we convert the typecode to bytes and concatenate…
        # …bytes converted from an array built by iterating over the instance.
        return (bytes([ord(self.typecode)]) +  # <6>
                bytes(array(self.typecode, self)))  # <7>

    def __eq__(self, other):
        # when comparing Vector2d instances to other iterables holding the same
        # numeric values (e.g., Vector(3, 4) == [3, 4]), it's also True
        # This may be considered a feature or a bug
        return tuple(self) == tuple(other)  # <8>

    def __abs__(self):
        # hypotenuse
        return math.hypot(self.x, self.y)  # <9>

    def __bool__(self):
        return bool(abs(self))  # <10>
  • classmethod vs. staticmethod:

    • Purpose and Use Cases:
      • classmethod:
        • Defines a method that operates on the class, not instances.
        • Receives the class (cls) as the first argument.
        • Commonly used for alternative constructors.
      • staticmethod:
        • Defines a method that does not receive a special first argument.
        • Functions like a plain function within a class body.
        • Use cases are rare; sometimes used for closely related functions that do not touch the class.
  • Example Code:

    class Demo:
        @classmethod
        def klassmeth(*args):
            return args 
    
        @staticmethod
        def statmeth(*args):
            return args 
    
    # classmethod usage
    >>> Demo.klassmeth() 
    (<class '__main__.Demo'>,)
    
    >>> Demo.klassmeth('spam')
    (<class '__main__.Demo'>, 'spam')
    
    # staticmethod usage
    >>> Demo.statmeth() 
    ()
    
    >>> Demo.statmeth('spam')
    ('spam',)
    • Explanation:
      • klassmeth returns all positional arguments, always receiving the class (Demo) as the first argument.
      • statmeth behaves like a regular function, returning the positional arguments without receiving any special first argument.

Alternative Constructor with classmethod

class Vector2d:
    # ... other methods and attributes ...

    @classmethod  # <1>
    def frombytes(cls, octets):  # <2>
        typecode = chr(octets[0])  # <3>
        memv = memoryview(octets[1:]).cast(typecode)  # <4>
        return cls(*memv)  # <5>
  1. @classmethod decorator:
    • Modifies the method to be called directly on the class.
  2. Method Definition:
    • frombytes method, with cls as the first argument instead of self.
  3. Typecode Extraction:
    • Reads the typecode from the first byte of octets.
  4. Memoryview Creation:
    • Creates a memoryview from the binary sequence (octets[1:]) and casts it using the typecode.
  5. Object Creation:
    • Unpacks the memoryview into the pair of arguments needed for the constructor and returns a new instance of cls (i.e., Vector2d).

Formatted Displays

  • Delegation to format(format_spec):

    • f-strings, format(), and str.format() call .__format__(format_spec) on each type.
    • format_spec:
      • Second argument in format(my_obj, format_spec).
      • Appears after colon {} in f-strings or fmt in fmt.str.format().
  • Example Usages:

    brl = 1 / 4.82  # BRL to USD conversion rate
    >>> format(brl, '0.4f') 
    '0.2075'
    >>> '1 BRL = {rate:0.2f} USD'.format(rate=brl) 
    '1 BRL = 0.21 USD'
    >>> f'1 USD = {1 / brl:0.2f} BRL' 
    '1 USD = 4.82 BRL'
    • Formatting specifier 0.4f and 0.2f indicate precision for floating-point numbers.
    • {rate:0.2f} specifies the rate keyword argument's format.
  • Formatting Mini-Language:

    • Notation for formatting specifiers (e.g., 5.3e).
    • Example: '0.mass:5.3e':
      • 0.mass is the field_name.
      • 5.3e is the formatting specifier.
    • Study format() for understanding, then move to f-strings and str.format() for advanced formatting.
  • Built-in Types:

    • int: b for base 2, x for base 16.

    • float: f for fixed-point, % for percentage.

      >>> format(42, 'b')
      '101010'
      >>> format(2 / 3, '.1%')
      '66.7%'
  • DateTime Example:

    from datetime import datetime
    now = datetime.now()
    >>> format(now, '%H:%M:%S')
    '18:49:05'
    >>> "It's now {:%I:%M %p}".format(now)
    "It's now 06:49 PM"
  • Handling Custom Classes:

    • Classes without __format__ fallback to str().

      >>> v1 = Vector2d(3, 4)
      >>> format(v1)
      '(3.0, 4.0)'
  • Implementing Custom Formatting in Vector2d:

    • Format each component of the vector.
    def __format__(self, fmt_spec=''):
        components = (format(c, fmt_spec) for c in self) 
        return '({}, {})'.format(*components)
    • Polar Coordinates:

      • Use custom format specifier ending with 'p'.
      def angle(self):
          return math.atan2(self.y, self.x)
      
      def __format__(self, fmt_spec=''):
          if fmt_spec.endswith('p'):
              fmt_spec = fmt_spec[:-1]
              coords = (abs(self), self.angle())
              outer_fmt = '<{}, {}>'
          else:
              coords = self
              outer_fmt = '({}, {})'
          components = (format(c, fmt_spec) for c in coords)
          return outer_fmt.format(*components)
      • Examples:
      >>> format(Vector2d(1, 1), 'p')
      '<1.4142135623730951, 0.7853981633974483>'
      >>> format(Vector2d(1, 1), '.3ep')
      '<1.414e+00, 7.854e-01>'
      >>> format(Vector2d(1, 1), '0.5fp')
      '<1.41421, 0.78540>'

Make Attributes Immutable

  • Make x and y read-only properties.

    class Vector2d:
        typecode = 'd'
    
        def __init__(self, x, y):
            self.__x = float(x)  #1
            self.__y = float(y)
    
        @property  #2
        def x(self):  #3
            return self.__x  #4
    
        @property  #5
        def y(self):
            return self.__y
    
        def __iter__(self):
            return (i for i in (self.x, self.y))  #6
  1. (Convention) Use double leading underscores to make attributes private.
  2. @property decorator marks getter methods.
  3. Getter method x returns self.__x.
  4. Getter method y returns self.__y.
  5. Methods that read x and y should access these properties instead of private attributes.

Make Vector2d Hashable

  • Implementing __hash__:

    • Ensure it returns an int and considers object attributes used in __eq__.

      def __hash__(self):
          return hash((self.x, self.y))
    • Explanation:

      • Hash is computed using a tuple of x and y components.
  • Example Usage:

    >>> v1 = Vector2d(3, 4)
    >>> hash(v1)
    >>> v2 = Vector2d(3.1, 4.2)
    >>> hash(v1), hash(v2)

Additional Methods

  • Optional Methods:

    • Implement __int__, __float__, and __complex__ for additional type coercion.
    • Example: __complex__ could be added to support complex() constructor.
  • Implementation:

    def __int__(self):
        return int(self.x), int(self.y)
    
    def __float__(self):
        return float(self.x), float(self.y)
    
    def __complex__(self):
        return complex(self.x, self.y)

Supporting Positional Pattern Matching

  • Current Keyword Class Patterns:

    • Example:

      def keyword_pattern_demo(v: Vector2d) -> None:
          match v:
              case Vector2d(x=0, y=0):
                  print(f'{v!r} is null')
              case Vector2d(x=0):
                  print(f'{v!r} is vertical')
              case Vector2d(y=0):
                  print(f'{v!r} is horizontal')
              case Vector2d(x=x, y=y) if x == y:
                  print(f'{v!r} is diagonal')
              case _:
                  print(f'{v!r} is awesome')
    • These patterns work as expected.

  • Positional Pattern Issue:

    • Example of positional pattern causing an error:

      def positional_pattern_demo(v: Vector2d) -> None:
          match v:
              case Vector2d(_, 0):
                  print(f'{v!r} is horizontal')
    • Results in TypeError because Vector2d does not support positional sub-patterns.

  • Solution: Adding __match_args__ Attribute:

    • To support positional patterns, add __match_args__ to the Vector2d class.

      class Vector2d:
          __match_args__ = ('x', 'y')
          # other attributes and methods...
  • Updated Example with Positional Patterns:

    • Example:

      def positional_pattern_demo(v: Vector2d) -> None:
          match v:
              case Vector2d(0, 0):
                  print(f'{v!r} is null')
              case Vector2d(0):
                  print(f'{v!r} is vertical')
              case Vector2d(_, 0):
                  print(f'{v!r} is horizontal')
              case Vector2d(x, y) if x == y:
                  print(f'{v!r} is diagonal')
              case _:
                  print(f'{v!r} is awesome')
  • Explanation:

    • __match_args__ lists instance attributes for positional pattern matching.
    • Not all public instance attributes need to be included, particularly optional ones.

Private and Protected Attributes in Python

  • No True Private Variables:

    • Unlike Java, Python lacks a private modifier for variables.
    • Uses a simple mechanism to prevent accidental overwriting in subclasses.
  • Name Mangling:

    • Attributes named with two leading underscores (e.g., __mood) are name-mangled.

    • Name mangling stores the attribute with a prefix of the class name:

      v1 = Vector2d(3, 4)
      >>> v1.__dict__
      {'_Vector2d__y': 4.0, '_Vector2d__x': 3.0}
      >>> v1._Vector2d__x
      3.0
  • Purpose of Name Mangling:

    • Ensures safety by preventing accidental access.
    • Not designed for security; direct access is still possible.
  • Accessing Mangled Names:

    • Useful for debugging and serialization.
    • Directly accessing or modifying mangled names in production is discouraged.
  • Implications:

    • Attributes like __x and __y in Vector2d are "private" by convention.
    • Instances are "immutable" by design, but not enforced by Python.
  • Criticism and Alternatives:

    • Double-underscore mangling is sometimes disliked for being too private.
    • Single underscore prefix (e.g., _x) is preferred by some for "protected" attributes.
      • Indicates attributes should not be accessed from outside the class.
      • Considered "protected" by convention, not by the interpreter.
      • Widely respected among Python programmers.

Saving Memory with __slots__:

  • Default Behavior:

    • Python stores instance attributes in a __dict__ with significant memory overhead.
  • Using __slots__:

    • Define a class attribute __slots__ with a sequence of attribute names.
    • Stores attributes in a hidden array of references, using less memory.
  • Example:

    class Pixel:
        __slots__ = ('x', 'y')
    
    p = Pixel()
    >>> p.__dict__
    Traceback (most recent call last):
    ...
    AttributeError: 'Pixel' object has no attribute '__dict__'
    >>> p.x = 10
    >>> p.y = 20
    >>> p.color = 'red'
    Traceback (most recent call last):
    ...
    AttributeError: 'Pixel' object has no attribute 'color'
    • Effects:
      • Instances have no __dict__.
      • Setting an attribute not in __slots__ raises AttributeError.
  • Inheritance:

    • Subclasses with no __slots__ defined will have a __dict__.

    • Example:

      class OpenPixel(Pixel):
          pass
      
      op = OpenPixel()
      >>> op.__dict__  # Exists in subclass
      {}
      >>> op.x = 8
      >>> op.__dict__
      {}
      >>> op.color = 'green'
      >>> op.__dict__
      {'color': 'green'}
    • Partial Inheritance:

      • To ensure no __dict__ in subclasses, declare __slots__ again.

      • Use __slots__ = () to restrict attributes to those in the base class.

      • Superclasses' __slots__ are added to the current class's __slots__.

      • Example:

        class ColorPixel(Pixel):
            __slots__ = ('color',)
        
        cp = ColorPixel()
        >>> cp.__dict__
        Traceback (most recent call last):
        ...
        AttributeError: 'ColorPixel' object has no attribute '__dict__'
        >>> cp.x = 2
        >>> cp.color = 'blue'
        >>> cp.flavor = 'banana'
        Traceback (most recent call last):
        ...
        AttributeError: 'ColorPixel' object has no attribute 'flavor'
  • Including __dict__ and __weakref__:

    • Add '__dict__' to __slots__ to support dynamic attributes.

    • Necessary for decorators like @cached_property.

    • Example:

      class PixelWithDict:
          __slots__ = ('x', 'y', '__dict__')
    • Weak References:

      • Include '__weakref__' in __slots__ for weak reference support.

      • Example:

        class PixelWithWeakRef:
            __slots__ = ('x', 'y', '__weakref__')
  • Cautions:

    • Including '__dict__' may negate memory savings.
    • Careful optimization is needed to balance benefits and complexity.

Simple Measure of slot Savings

# Checks if exactly one command-line argument is given
# Imports the specified module dynamically using importlib.import_module.

import importlib

module = None
if len(sys.argv) == 2:
    module_name = sys.argv[1].replace('.py', '')
    module = importlib.import_module(module_name)
# Uses resource.getrusage to get the memory usage before and after creating
# the vector instances.
import resource

mem_init = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(f'Creating {NUM_VECTORS:,} {cls.__qualname__!r} instances')
vectors = [cls(3.0, 4.0) for i in range(NUM_VECTORS)]

mem_final = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

Run with

example-code-2e/11-pythonic-obj $ time python3 mem_test.py vector2d_v3
Selected Vector2d type: vector2d_v3.Vector2d
Creating 10,000,000 'Vector2d' instances
Initial RAM usage:          9,344
  Final RAM usage:      1,656,044

real    0m4.621s
user    0m4.306s
sys     0m0.310s

example-code-2e/11-pythonic-obj $  time python3 mem_test.py vector2d_v3_slots
Selected Vector2d type: vector2d_v3_slots.Vector2d
Creating 10,000,000 'Vector2d' instances
Initial RAM usage:          9,648
  Final RAM usage:        560,120

real    0m3.184s
user    0m3.061s
sys     0m0.120s

Summary of issues with __slots__

  • Memory Savings with Caveats:
    • Redeclaration in Subclasses:
      • __slots__ must be redeclared in each subclass to avoid the creation of __dict__.
    • Attribute Restrictions:
      • Instances can only have attributes listed in __slots__.
      • Including '__dict__' in __slots__ allows additional attributes but may negate memory savings.
    • Decorator Limitations:
      • Classes using __slots__ cannot use the @cached_property decorator unless '__dict__' is included in __slots__.
    • Weak References:
      • Instances cannot be targets of weak references unless '__weakref__' is included in __slots__.

Overriding Class Attributes

  • Class Attributes as Default Values:

    • Class attributes can serve as default values for instance attributes.
    • Example: typecode in Vector2d is used in the __bytes__ method.
    • Accessed via self.typecode, defaulting to Vector2d.typecode.
  • Instance Attribute Shadowing:

    • Assigning a value to an undefined instance attribute creates a new instance attribute.
    • This new instance attribute shadows the class attribute.
    • Allows customization of individual instances without altering the class attribute.
  • Example: Changing typecode for an Instance:

    from vector2d_v3 import Vector2d
    
    # Default `typecode` is `'d'` (8-byte double precision float).
    v1 = Vector2d(1.1, 2.2)
    dumpd = bytes(v1)
    >>> dumpd
    b'd\x9a\x99\x99\x99\x99\x99\xf1?\x9a\x99\x99\x99\x99\x99\x01@'
    >>> len(dumpd)
    17
    
    #  Setting `typecode` to `'f'` on an instance changes its export format to
    # 4-byte single precision float.
    v1.typecode = 'f'
    dumpf = bytes(v1)
    >>> dumpf
    b'f\xcd\xcc\x8c?\xcd\xcc\x0c@'
    >>> len(dumpf)
    9
    
    # Class attribute `Vector2d.typecode` remains unchanged
    >>> Vector2d.typecode
    'd'
    
  • Changing Class Attributes:

    • Change class attributes directly on the class.
    Vector2d.typecode = 'f'
  • Subclassing to Customize Class Attributes:

    • Create a subclass to customize class attributes more explicitly.
    from vector2d_v3 import Vector2d
    
    class ShortVector2d(Vector2d):
        typecode = 'f'
    
    sv = ShortVector2d(1/11, 1/27)
    >>> sv
    ShortVector2d(0.09090909090909091, 0.037037037037037035)
    >>> len(bytes(sv))
    9
  • Avoiding Hardcoding in __repr__:

    • Use type(self).__name__ to make __repr__ safer to inherit.
      • If I had hardcoded the class_name, subclasses of Vector2d like ShortVector2d would have to overwrite __repr__ just to change the class_name.
      • By reading the name from the type of the instance, I made __repr__ safer to inherit
    # inside class Vector2d:
    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)
  • Summary:

    • Built a simple class leveraging the data model:
      • Different object representations.
      • Custom formatting code.
      • Read-only attributes.
      • Support for hash() to integrate with sets and mappings.

Java v.s. Python

  • Python: Public Attributes as Default:

    • Initial versions of Vector2d used public attributes x and y.
    • Users could access components via my_vector.x and my_vector.y.
    • Vectors are iterable and unpackable into variables.
  • Python with properties:

    • Properties were implemented to avoid accidental updates to x and y.
    • Public interface remained unchanged, verified by doctests.
    • Demonstrates flexibility to start with public attributes and later add control.
  • Comparison with Java:

    • Python:
      • Start with public attributes.
      • Add getters and setters later using properties without changing code that interacts with the attributes.
    • Java:
      • No properties; getters and setters are required from the start.
      • Cannot evolve from public attributes to getters and setters without breaking code.
      • Typing getter/setter calls is verbose and unnecessary in Python.
  • Why Python do this? Simplest Thing That Works:

    • Ward Cunningham's advice: "What’s the simplest thing that could possibly work?"
      • Focus on the goal; avoid unnecessary up-front complexity.
      • Public attributes can be changed to properties later if needed.
  • Python's Flexibility:

    • No strict enforcement of privacy.
    • Convention over configuration: use public attributes, convert to properties if necessary.

Comparison

  • A key aspect of Java's privacy guarantees: despite the use of access control modifiers, Java's reflection API can bypass these restrictions to access private fields.

Confidential.java

public class Confidential {
    private String secret = "";  // Private field
    public Confidential(String text) {
        this.secret = text.toUpperCase();  // Set field value in constructor
    }
}
  • Class Definition: The Confidential class has a private field secret and a constructor that initializes this field with the uppercased version of the input text.
  • Private Field: The secret field is marked private, which means it should not be accessible from outside the class directly.
#!/usr/bin/env jython
import Confidential  # Import the Java class using Jython

# Create an instance of Confidential from the Java class
message = Confidential('top secret text')

# Access the private field using reflection
secret_field = Confidential.getDeclaredField('secret')

# Allow access to the private field
secret_field.setAccessible(True)

# Access the value of the private field
print 'message.secret =', secret_field.get(message)

#$ jython expose.py
#message.secret = TOP SECRET TEXT
  • The script prints the value of the private field secret from the Confidential class instance, showing that Java's private modifier does not prevent access via reflection.

  • Security Implications: Although private fields are not intended to be accessible outside the class, reflection can bypass these protections. This can be a security concern if sensitive data is handled inappropriately.

  • SecurityManager:

    • Role: A SecurityManager can restrict reflective access and other sensitive operations.
    • Usage: In practice, SecurityManager is not commonly used or enforced, meaning reflection can often bypass access controls in typical applications.
  • Conclusion:

    • Python allows flexibility and simplicity in class design.
    • Start with public attributes and evolve to properties as needed.
    • Enjoy the power and responsibility of Python's design choices.