Python notes

ch03.dict_and_sets

Ch 3. Dictionaries and Sets

graph LR;

GG["Hashable?"] --> GGX["Stable Hash Code through __hash__()"]
GG --> GGY["Equality Comparison through __eq__()"]
GG --> GGZ["Consistency between Equality and Hashing"]
graph LR;
A["Dictionary Syntax"] --> X["Dictionary Comprehension\n"]
A --> Y["Unpacking dictionary with **\ninto several key=val"]
A --> Z["Packing several key=val\nas parameter, through **"]
A --> XX["Merging with |, |=, .update(), ..."]
A --> XY["Pattern Matching with Mappings"]
graph LR;
BB["Handling missing keys"] --> J["setdefault Method\nUpdate Mutable Values\nAvoids Second Search"]
BB --> BBX["Using defaultdict"]
BB --> BBY["implementing __missing__"]
graph LR;

D["Specialized Mappings"]
E["defaultdict"]
F["ChainMap\nmaintains references\nto a list of dictionaries or mappings"]
DDS["shelve.Shelf\nPersistent Storage for Python Objects"]
G["Counter"] --> GGX["Can be used as multiset"]
H["OrderedDict"] --> HX["Key Ordering in ==\nBackward Compatibility"]
H --> HY["popitem(last=False) can pop first inserted"]
H --> HZ["move_to_end() method\ne.g. for LRU like ops"]
I["UserDict\nBase Class for Custom Mappings"]

D --> E
D --> F
D --> G
D --> DDS
D --> H
D --> I
graph LR;
P["collections.abc Module"]
Q["Mapping and MutableMapping ABCs\nRuntime Type Checking"]
R["MappingProxyType\nImmutable Façade"]

P --> Q
P --> R
graph LR;
FF["set"] --> FFX["set"]
FFX --> FFXA["can use set comprehension"]
FF --> FFY["frozenset"]
FF --> FFZ["set operators"]

S["Dictionary Views\nEfficient .keys(), .values(), .items()"]
T["dict_keys and dict_items\nSupport frozenset Operators"]



S --> T
FFZ --> T

Introduction

  • Python dictionaries (dict) are a fundamental part of the language, serving as highly versatile data structures that support a wide range of functionalities.
  • They are a core component of Python's internal implementation and are optimized for performance.

Key Areas Where Dictionaries Are Used

  1. Class and Instance Attributes:

    • The attributes of classes and instances are stored as dictionaries. This allows for dynamic attribute management and flexible object models.
  2. Module Namespaces:

    • Each module in Python has its own namespace, represented as a dictionary. This enables efficient organization and access to variables and functions within modules.
  3. Function Keyword Arguments:

    • Keyword arguments in functions are passed and managed using dictionaries, enabling flexible and readable function calls.
  4. Built-in Types and Functions:

    • The __builtins__.__dict__ dictionary stores all built-in types, objects, and functions, providing quick access to core Python features.

Modern dict Syntax

  • Some require Python 3.9 (like the | operator) or Python 3.10 (like match/case).

Dictionary Comprehensions in Python

  • The syntax of a dictionary comprehension is similar to that of a list comprehension, with the addition of a colon (:) to separate keys and values. The general form is:
  • Dictionary comprehensions are typically more efficient than constructing dictionaries with loops because they are optimized internally by Python.
{key_expression: value_expression for item in iterable if condition}
  • key_expression: Expression for the key in the resulting dictionary.
  • value_expression: Expression for the value in the resulting dictionary.
  • iterable: Any iterable from which the elements are taken.
  • condition: An optional condition to filter elements.
dial_codes = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (62, 'Indonesia'),
    (81, 'Japan'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russia'),
    (1, 'United States'),
]


# Swap the pairs: country is the key, and code is the value
country_dial = {country: code for code, country in dial_codes}

Using Conditions in Dictionary Comprehensions

You can also apply conditions to filter the items in the comprehension:

# Create a dictionary of country codes less than 70, with country names in uppercase
filtered_country_dial = {
  code: country.upper()
    for country, code in sorted(country_dial.items()) if code < 70}

# Display the resulting dictionary
print(filtered_country_dial)

Unpacking Mappings in Python

  • Python's unpacking generalizations, introduced in PEP 448, provide enhanced support for unpacking mappings.
  • These enhancements, available since Python 3.5, allow for more flexible and expressive manipulation of dictionaries and other mapping types.

1. Multiple Mapping Unpacking in Function Calls

  • Prior to Python 3.5, you could only unpack a single dictionary into a function's keyword arguments using **.
  • With PEP 448, you can unpack multiple mappings into a single function call, as long as all keys are strings and unique across all mappings.
# Function that accepts any number of keyword arguments
def dump(**kwargs):
    return kwargs

# Unpacking multiple dictionaries into a function call
result = dump(**{'x': 1}, y=2, **{'z': 3})

# Displaying the result
print(result)
{'x': 1, 'y': 2, 'z': 3}
  • The function dump(**kwargs) accepts any number of keyword arguments and returns them as a dictionary.
  • The function call dump(**{'x': 1}, y=2, **{'z': 3}) unpacks multiple mappings into the kwargs dictionary.
  • The keys 'x', 'y', and 'z' are unique across the mappings, so the unpacking succeeds without any conflicts.
Key Considerations
  • Unique Keys:

    • All keys must be strings and unique across the unpacked mappings, as duplicate keyword arguments are not allowed in function calls.
  • Flexible Argument Passing:

    • This feature enables more flexible function calls, allowing you to dynamically compose keyword arguments from multiple sources.

2. Unpacking Mappings in Dictionary Literals

  • PEP 448 also introduced the ability to use ** unpacking within dictionary literals, allowing you to merge multiple mappings into a single dictionary.
  • This syntax supports multiple unpackings and allows duplicate keys, with later occurrences overwriting previous ones.
Example: Unpacking Mappings in Dictionary Literals
# Unpacking multiple mappings into a dictionary literal
merged_dict = {'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}}

# Displaying the merged dictionary
print(merged_dict)
{'a': 0, 'x': 4, 'y': 2, 'z': 3}
  • The dictionary literal {'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}} merges multiple mappings into a single dictionary.
  • Duplicate keys are allowed, and later occurrences overwrite previous ones. In this example, the key 'x' appears twice, with the final value being 4.
Use Cases
  • Merging Dictionaries:

    • Dictionary unpacking is useful for merging multiple dictionaries, especially when the order of precedence for keys matters.
  • Constructing Complex Dictionaries:

    • This syntax allows for the construction of complex dictionaries from multiple sources, facilitating cleaner and more expressive code.
Alternative Ways to Merge Dictionaries
  1. Using update() Method
# Using the update() method to merge dictionaries
dict1 = {'a': 0, 'x': 1}
dict2 = {'y': 2, 'z': 3, 'x': 4}

# Creating a copy of dict1 and updating it with dict2
merged_dict = dict1.copy()
merged_dict.update(dict2)

# Displaying the merged dictionary
print(merged_dict)
{'a': 0, 'x': 4, 'y': 2, 'z': 3}

Merging Mappings with the | and |= Operators

Using the | Operator to Merge Dictionaries

  • The | operator is used to merge two dictionaries into a new one. The resulting dictionary contains all key-value pairs from both dictionaries, with keys from the right-hand operand overwriting keys from the left-hand operand in case of conflicts.
# Define two dictionaries
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# Merge dictionaries using the | operator
merged_dict = d1 | d2

# Display the merged dictionary
print(merged_dict)
{'a': 2, 'b': 4, 'c': 6}

Behavior with Different Types

  • The type of the resulting dictionary is usually the same as the type of the left operand (d1 in this case).
  • However, if user-defined dictionary-like types are involved, the behavior may vary according to operator overloading rules.

Using the |= Operator to Update Dictionaries

  • The |= operator updates a dictionary in place by merging another dictionary into it.
  • This operation modifies the original dictionary rather than creating a new one.
# Define two dictionaries
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# Display original dictionary
print("Original d1:", d1)

# Update d1 in place using the |= operator
d1 |= d2

# Display the updated dictionary
print("Updated d1:", d1)
Original d1: {'a': 1, 'b': 3}
Updated d1: {'a': 2, 'b': 4, 'c': 6}

Using update()

# Using the update() method to merge dictionaries
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# Merge dictionaries
d1.update(d2)
print(d1)
{'a': 2, 'b': 4, 'c': 6}

Using Dictionary Unpacking (Python 3.5+)

# Using dictionary unpacking to merge dictionaries
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# Merge dictionaries using unpacking
merged_dict = {**d1, **d2}
print(merged_dict)
{'a': 2, 'b': 4, 'c': 6}
  • Explanation:
    • Dictionary unpacking {**d1, **d2} creates a new dictionary by unpacking key-value pairs from d1 and d2.

Pattern Matching with Mappings in Python

  • Mapping patterns are designed to match dictionary-like structures, which include any actual or virtual subclass of collections.abc.Mapping.
  • The syntax for mapping patterns resembles dictionary literals, but with the ability to match specific key-value pairs and even nest other patterns, such as sequence patterns.

Example: Extracting Data with Mapping Patterns

def get_creators(record: dict) -> list:
    match record:
        case {'type': 'book', 'api': 2, 'authors': [*names]}:
            return names
        case {'type': 'book', 'api': 1, 'author': name}:
            return [name]
        case {'type': 'book'}:
            raise ValueError(f"Invalid 'book' record: {record!r}")
        case {'type': 'movie', 'director': name}:
            return [name]
        case _:
            raise ValueError(f'Invalid record: {record!r}')
  1. Pattern: {'type': 'book', 'api': 2, 'authors': [*names]}

    • Description: Matches any mapping with 'type' equal to 'book', 'api' equal to 2, and an 'authors' key whose value is a sequence.
    • Action: The variable names is bound to the list of authors, which is then returned.
  2. Pattern: {'type': 'book', 'api': 1, 'author': name}

    • Description: Matches any mapping with 'type' equal to 'book', 'api' equal to 1, and an 'author' key whose value is a single object.
    • Action: The single author name is returned inside a list.
  3. Pattern: {'type': 'book'}

    • Description: Matches any other mapping with 'type' equal to 'book' but missing required keys for specific versions.
    • Action: Raises a ValueError, indicating an invalid record for books.
  4. Pattern: {'type': 'movie', 'director': name}

    • Description: Matches any mapping with 'type' equal to 'movie' and a 'director' key.
    • Action: The director's name is returned inside a list.
  5. Pattern: _

    • Description: A catch-all pattern that matches anything not matched by previous cases.
    • Action: Raises a ValueError, indicating an invalid record type.

Handling of Mapping Patterns

  • Order-Independent Matching:

    • The order of keys in patterns is irrelevant.
    • This means patterns can match against mappings regardless of how keys are ordered, which is useful for matching dictionaries like OrderedDict.
  • Partial Matching:

    • Unlike sequence patterns, mapping patterns succeed on partial matches.
    • If all required keys are present in the mapping, a match occurs, even if extra keys exist in the subject.
  • Destructuring with Patterns:

    • Mapping patterns allow destructuring complex data structures with nested patterns, providing a clean way to extract data.
  • Combining with Other Patterns:

    • You can nest sequence patterns within mapping patterns to handle nested data structures.

Handling Missing Keys

  • In the context of pattern matching, a match succeeds only if the required keys are present in the subject.
  • Pattern matching uses the d.get(key, sentinel) method internally, ensuring that missing keys result in a failed match rather than triggering default values or exceptions.
>>> b1 = dict(api=1, author='Douglas Hofstadter',
...           type='book', title='Gödel, Escher, Bach')
>>> get_creators(b1)
['Douglas Hofstadter']
  • Explanation:
    • The first example matches the second pattern ({'type': 'book', 'api': 1, 'author': name}) and extracts the author name.
>>> from collections import OrderedDict
>>> b2 = OrderedDict(api=2, type='book',
...                  title='Python in a Nutshell',
...                  authors='Martelli Ravenscroft Holden'.split())
>>> get_creators(b2)
['Martelli', 'Ravenscroft', 'Holden']
  • Explanation:
    • The second example matches the first pattern ({'type': 'book', 'api': 2, 'authors': [*names]}) and returns the list of authors.
>>> get_creators({'type': 'book', 'pages': 770})
Traceback (most recent call last):
 ...
ValueError: Invalid 'book' record: {'type': 'book', 'pages': 770}
  • Explanation:
    • The third example matches the third pattern ({'type': 'book'}) but lacks the required keys for valid book records, raising a ValueError.
>>> get_creators('Spam, spam, spam')
Traceback (most recent call last):
 ...
ValueError: Invalid record: 'Spam, spam, spam'
  • Explanation:
    • The fourth example doesn't match any pattern and falls to the catch-all, raising a ValueError.

Capturing Additional Key-Value Pairs

  • When matching mapping patterns, you can capture additional key-value pairs using ** syntax. This is useful if you need to capture extra information beyond the required keys.
food = dict(category='ice cream', flavor='vanilla', cost=199)

match food:
    case {'category': 'ice cream', **details}:
        print(f'Ice cream details: {details}')

Output:

Ice cream details: {'flavor': 'vanilla', 'cost': 199}
  • Explanation:
    • The pattern {'category': 'ice cream', **details} matches any dictionary with 'category' equal to 'ice cream' and captures all other key-value pairs into the details dictionary.

Standard API of Mapping Types in Python

  • Python's standard library includes several built-in types and tools to work with mappings, most notably the dict type.
  • For more advanced use cases, Python provides abstract base classes (ABCs) that define interfaces for mapping types.
  • These are part of the collections.abc module and offer a formalized way to define custom mapping types that behave consistently with Python's built-in dictionary types.

Mapping and MutableMapping ABCs

The collections.abc module defines two key abstract base classes for mapping types:

  1. Mapping
  2. MutableMapping

Hashability of Keys

  • All standard and custom mappings built on top of hash tables share a common limitation: keys must be hashable.
  • Hashability is a requirement because hash tables use hash values of keys to store and retrieve data efficiently.

What Is Hashable in Python?

Definition of Hashable

An object in Python is considered hashable if it meets the following criteria:

  1. Stable Hash Code:

    • The object has a hash code that remains constant during its lifetime. This requires the object to implement a __hash__() method.
  2. Equality Comparison:

    • The object can be compared to other objects for equality, which means it implements an __eq__() method.
  3. Consistency between Equality and Hashing:

    • If two objects compare equal (__eq__() returns True), they must have the same hash code (__hash__() returns the same value). (e.g. if a == b, then hash(a) == hash(b))
    • This consistency is vital for the integrity of hash-based collections. When you look up an item in a dictionary or set, the hash code is used to find the correct bucket or slot where the item should be. Once in that bucket, the equality comparison is used to find the exact item. If the hash code were different for two objects that are considered equal, the collection might end up in a state where it cannot correctly locate or differentiate between these objects. This would lead to logical errors and data corruption.

Built-in Hashable Types

  • Immutable Numeric Types:

    • All built-in numeric types, such as int, float, and complex, are hashable because they are immutable.
  • Immutable Sequence Types:

    • Strings (str) and bytes (bytes) are hashable because they are immutable.
  • Tuples:

    • A tuple is hashable if all its elements are hashable. This makes tuples a common choice for keys in dictionaries if they contain hashable items.
  • Frozensets:

    • A frozenset is always hashable because it is immutable and requires that all its elements are hashable.

Hashable Tuple

# A tuple containing hashable elements
tt = (1, 2, (30, 40))

# Calculate hash
hash_tt = hash(tt)
print("Hash of hashable tuple:", hash_tt)
Hash of hashable tuple: 8027212646858338501

Unhashable Tuple

# A tuple containing an unhashable list
tl = (1, 2, [30, 40])

# Attempt to calculate hash
try:
    hash_tl = hash(tl)
except TypeError as e:
    print("Error:", e)
Error: unhashable type: 'list'

Hashable Tuple with Frozenset

# A tuple containing a frozenset, which is hashable
tf = (1, 2, frozenset([30, 40]))

# Calculate hash
hash_tf = hash(tf)
print("Hash of hashable tuple with frozenset:", hash_tf)
Hash of hashable tuple with frozenset: -4118419923444501110

Considerations for Hashing

Hash Code Variability

  • Platform and Version Dependent:
    • The hash code of an object may vary across different Python versions, machine architectures, and due to randomization for security reasons (a feature known as hash randomization).
    • Thus, hash values are only consistent within a single Python process.

User-Defined Types

  • Default Behavior:

    • User-defined types are hashable by default because their hash code is derived from their id(), and the __eq__() method inherited from the object class compares object IDs.
  • Custom Equality and Hashing:

    • If a user-defined class implements a custom __eq__() method to compare objects based on internal state, it must also implement a consistent __hash__() method. The __hash__() method should only consider immutable attributes to ensure that the hash code does not change over the object's lifetime.

Example: Custom Hashable Class

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if isinstance(other, Point):
            return self.x == other.x and self.y == other.y
        return False

    def __hash__(self):
        return hash((self.x, self.y))

# Usage
p1 = Point(1, 2)
p2 = Point(1, 2)

# Compare points
print(p1 == p2)  # Output: True

# Hash codes
print(hash(p1))  # Example output: -3550055125485641917
print(hash(p2))  # Same output as above since they are equal
  • Explanation:
    • The Point class implements __eq__() and __hash__() based on its x and y attributes. Since these attributes are immutable, the class instances are hashable and behave correctly when used in hash-based collections.

Hashable Types and Mappings

  • Hashability is crucial for keys in mapping types like dict, defaultdict, and OrderedDict, as these data structures rely on hash codes for efficient key lookups.
  • Dictionaries and Sets:
    • Keys in dictionaries and elements in sets must be hashable because these collections use hash tables to store and retrieve data efficiently.

Common Mapping Methods in Python

Key Mapping Types

  1. dict: The standard dictionary type in Python, used for general-purpose key-value storage.
  2. defaultdict: A dictionary subclass that provides default values for missing keys.
  3. OrderedDict: A dictionary subclass that maintains the order of keys based on their insertion order.

update()

  • The update() method is an example of duck typing in Python. It updates the dictionary with elements from another dictionary or an iterable of key-value pairs. The method first checks if the argument has a keys() method, treating it as a mapping. If not, it assumes the argument is an iterable of key-value pairs.
# Using update with another dictionary
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d1.update(d2)
print(d1)  # Output: {'a': 1, 'b': 3, 'c': 4}

# Using update with an iterable of pairs
d3 = {'x': 5}
d3.update([('y', 6), ('z', 7)])
print(d3)  # Output: {'x': 5, 'y': 6, 'z': 7}

Inserting or Updating Mutable Values in a Dictionary

Using dict.get() vs. dict.setdefault()

  • When working with dictionaries, especially when updating mutable values such as lists, using dict.get() might not be the most efficient approach. Instead, dict.setdefault() can simplify the code and improve performance by reducing the number of key lookups.

  • Consider a script to index words in a text file, mapping each word to a list of occurrences where the word appears. Each occurrence is represented as a pair of line and column numbers.

    """Build an index mapping word -> list of occurrences"""
    import re
    import sys
    
    WORD_RE = re.compile(r'\w+')
    index = {}
    
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                location = (line_no, column_no)
                # This is inefficient
                occurrences = index.get(word, [])
                occurrences.append(location)
                index[word] = occurrences
    • The script reads a text file and builds an index of word occurrences.
    • dict.get(word, []) retrieves the list of occurrences for a word, or an empty list if the word is not found.
    • After updating the list, it assigns it back to the dictionary, resulting in redundant key lookups.
    """Build an index mapping word -> list of occurrences"""
    import re
    import sys
    
    WORD_RE = re.compile(r'\w+')
    index = {}
    
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                location = (line_no, column_no)
                # Efficiently retrieve or create the list of occurrences
                index.setdefault(word, []).append(location)
    
  • Explanation:

    • index.setdefault(word, []) retrieves the list of occurrences for a word or sets it to an empty list if the word is not found.
    • The list is updated in place, eliminating redundant key lookups.

Performance Comparison

The code snippet using setdefault():

my_dict.setdefault(key, []).append(new_value)

is equivalent to the longer version using if statements:

if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)
  • Key Differences:
    • The setdefault() approach performs a single lookup.
    • The if statement approach performs at least two lookups (three if the key is not found).

Handling Missing Keys Efficiently

  • Using dict.setdefault() is particularly useful when dealing with mutable values, such as lists or dictionaries, where you want to insert or update items efficiently. This method ensures that a key exists in the dictionary and retrieves or initializes the associated value in one step.

Automatic Handling of Missing Keys in Python

  • Handling missing keys in mappings (dictionaries) can be streamlined using two main approaches: using defaultdict from the collections module, or subclassing dict and implementing the __missing__ method.

1. Using defaultdict for Missing Keys

  • defaultdict is a subclass of dict that calls a factory function to supply missing values when a key is accessed but does not exist in the dictionary. This can simplify code by avoiding explicit checks for key existence.
How defaultdict Works
  • When you instantiate a defaultdict, you provide a callable that produces the default value. This callable is stored in the instance attribute default_factory. When a missing key is accessed, defaultdict:
  1. Calls the default_factory to create a new value.
  2. Inserts the key and the newly created value into the dictionary.
  3. Returns the newly created value.
"""Build an index mapping word -> list of occurrences"""
import collections
import re
import sys

WORD_RE = re.compile(r'\w+')
index = collections.defaultdict(list) # <--------------------

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index[word].append(location) # <--------------------
  • Explanation:
    • The defaultdict is created with list as the default_factory.
    • For each word, if it’s not already in the dictionary, defaultdict automatically creates a new list and appends the location.

2. Using __missing__ Method in Custom Dictionary Subclasses

  • The __missing__ method provides another way to handle missing keys in a dictionary subclass. This method is automatically invoked by dict.__getitem__() when a key is not found.
  • The __missing__ method only affects __getitem__() and not other dictionary methods like get() or membership tests.
class DefaultDict(dict):
    def __init__(self, default_factory):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

# Usage
dd = DefaultDict(list)

dd['a'].append(1)
print(dd)  # Output: {'a': [1]}

dd['b'].append(2)
print(dd)  # Output: {'a': [1], 'b': [2]}

Variations of dict in Python

collections.OrderedDict

  • OrderedDict is a dictionary subclass that maintains the order of keys based on their insertion order.
  • While this feature has been incorporated into the regular dict in Python 3.6+, OrderedDict still has specific use cases and characteristics that differentiate it from the built-in dict.

Key Differences Between OrderedDict and dict

  1. Equality Operation:

    • In OrderedDict, the equality operation (==) checks not only that the keys and values are identical but also that the order of the keys is the same.
    • In a regular dict, equality is based solely on the keys and values, regardless of their order.
    from collections import OrderedDict
    
    d1 = OrderedDict([('a', 1), ('b', 2)])
    d2 = OrderedDict([('b', 2), ('a', 1)])
    print(d1 == d2)  # Output: False, because the order is different
    
    d3 = {'a': 1, 'b': 2}
    d4 = {'b': 2, 'a': 1}
    print(d3 == d4)  # Output: True, because order does not matter in dict
  2. popitem() Method:

    • OrderedDict.popitem() has an additional argument that allows you to specify whether to remove and return the last or the first inserted key-value pair.
    • In a regular dict, popitem() only removes and returns the last inserted key-value pair (reflecting insertion order in Python 3.7+).
    d = OrderedDict([('a', 1), ('b', 2)])
    print(d.popitem(last=False))  # Output: ('a', 1), removes the first item
    print(d.popitem())  # Output: ('b', 2), removes the last item
  3. move_to_end() Method:

    • OrderedDict has a move_to_end(key, last=True) method, which efficiently moves an existing key to either end of the dictionary.
    • This method is particularly useful in scenarios like implementing an LRU (Least Recently Used) cache.
    d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    d.move_to_end('b')
    print(d)  # Output: OrderedDict([('a', 1), ('c', 3), ('b', 2)])
  4. Design Priorities:

    • dict:
      • Optimized for mapping operations like lookups and updates.
      • Tracking insertion order is secondary (but now included in Python 3.6+).
    • OrderedDict:
      • Optimized for operations involving reordering, such as moving elements to different positions.
      • This makes it more suitable for use cases that require frequent reordering, like implementing a cache.
  5. Efficiency Considerations:

    • OrderedDict:
      • Better suited for frequent reordering operations, such as tracking recent accesses.
      • Less space-efficient and slower in iteration and update operations compared to a regular dict.
    • dict:
      • More space-efficient and faster for general-purpose usage where order preservation is needed but reordering is not frequent.

Use Cases for OrderedDict

Even though the built-in dict maintains insertion order since Python 3.6+, OrderedDict remains relevant in specific scenarios:

  • Backward Compatibility: Code that needs to maintain compatibility with Python versions before 3.6.
  • Order-Sensitive Equality: When the order of keys matters in equality comparisons.
  • Efficient Reordering: Use cases where frequent reordering of elements is required, such as implementing LRU caches or other order-sensitive algorithms.

collections.ChainMap

  • ChainMap is a convenient tool from the collections module in Python that allows you to group multiple dictionaries or mappings together and treat them as a single unit.
  • It performs lookups across the grouped mappings in the order they were added, which is particularly useful in scenarios like variable scopes in programming languages, where a lookup should proceed from the innermost scope outward.

How ChainMap Works

  • A ChainMap instance maintains references to a list of dictionaries or mappings, rather than copying them.
  • When you attempt to retrieve a value using a key, ChainMap checks each mapping in the order they were provided during the creation of the ChainMap instance. The lookup stops as soon as a key is found.
from collections import ChainMap

# Define two dictionaries
d1 = dict(a=1, b=3)
d2 = dict(a=2, b=4, c=6)

# Create a ChainMap instance
chain = ChainMap(d1, d2)

# Lookups
print(chain['a'])  # Output: 1 (from d1)
print(chain['c'])  # Output: 6 (from d2)

Mutable Operations on ChainMap

While ChainMap allows you to look up keys across multiple mappings, any updates, insertions, or deletions only affect the first dictionary in the chain.

# Modify the ChainMap
chain['c'] = -1

# Check the dictionaries
print(d1)  # Output: {'a': 1, 'b': 3, 'c': -1} (updated with new 'c' value)
print(d2)  # Output: {'a': 2, 'b': 4, 'c': 6}  (unchanged)

Use Cases for ChainMap

ChainMap is particularly useful in scenarios where multiple contexts or layers need to be managed simultaneously:

  1. Nested Scopes in Interpreters:

    • ChainMap can be used to model the behavior of nested variable scopes in programming languages. For example, local variables can be stored in the first dictionary, global variables in the second, and built-ins in the third.

      import builtins
      # mimics how Python looks up variables: it first checks
      # the local scope, then the global scope, and finally
      # the built-in scope.
      pylookup = ChainMap(locals(), globals(), vars(builtins))
  2. Configuration Management:

    • ChainMap can combine multiple configuration sources (e.g., default settings, user settings, and environment variables), allowing for layered configuration without overriding the original mappings.
    default_settings = {'theme': 'light', 'show_line_numbers': True}
    user_settings = {'theme': 'dark'}
    env_settings = {'show_line_numbers': False}
    
    config = ChainMap(env_settings, user_settings, default_settings)
    print(config['theme'])  # Output: 'dark'
    print(config['show_line_numbers'])  # Output: False
  3. Dynamic Scope Changes:

    • When implementing dynamic scopes or context-dependent variables, ChainMap can easily be updated to reflect the current state by adding or removing mappings.

collections.Counter

  • collections.Counter is a specialized dictionary subclass in Python designed for counting hashable objects. It allows you to easily count occurrences of elements, and it offers several powerful features for tallying and manipulating these counts.

  • Counter works by storing elements as dictionary keys and their counts as the corresponding values. You can initialize a Counter with an iterable, a dictionary, or by using keyword arguments.

    import collections
    
    # Create a Counter from a string
    ct = collections.Counter('abracadabra')
    
    # Output the counts
    print(ct)  # Output: Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
  • You can update the counts in a Counter using the .update() method, which accepts an iterable or another Counter.

    # Update the counter with more characters
    ct.update('aaaaazzz')
    
    # Output the updated counts
    print(ct)  # Output: Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
  • most_common([n]):

    • Returns a list of the n most common elements and their counts, sorted in descending order of frequency.
    # Get the 3 most common elements
    print(ct.most_common(3))  # Output: [('a', 10), ('z', 3), ('b', 2)]
  • Arithmetic Operations:

    • Counter supports addition (+), subtraction (-), union (|), and intersection (&) operations.
    ct1 = collections.Counter('hello')
    ct2 = collections.Counter('world')
    
    # Addition of counters
    print(ct1 + ct2)  # Output: Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, 'w': 1, 'r': 1, 'd': 1})
    
    # Subtraction of counters
    print(ct1 - ct2)  # Output: Counter({'h': 1, 'e': 1, 'l': 1})
  • elements():

    • Returns an iterator over elements, repeating each as many times as its count.
    ct = collections.Counter({'a': 2, 'b': 3, 'c': 1})
    print(list(ct.elements()))  # Output: ['a', 'a', 'b', 'b', 'b', 'c']

Using Counter as a Multiset

  • In mathematics, a multiset is a generalized concept of a set that allows multiple occurrences of its elements. Counter can be used as a multiset where each key represents an element, and its count represents the multiplicity (how many times it appears in the set).

    # Example of a multiset using Counter
    multiset = collections.Counter(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
    
    # Display the multiset
    print(multiset)  # Output: Counter({'apple': 3, 'banana': 2, 'orange': 1})
    
    # Add another apple
    multiset.update(['apple'])
    print(multiset)  # Output: Counter({'apple': 4, 'banana': 2, 'orange': 1})
    
    # Remove an apple
    multiset.subtract(['apple'])
    print(multiset)  # Output: Counter({'apple': 3, 'banana': 2, 'orange': 1})
  • Explanation:

    • The Counter behaves like a multiset, allowing you to add or remove elements and adjusting the counts accordingly.

shelve.Shelf: Persistent Storage for Python Objects

  • The shelve module in Python's standard library offers a convenient way to persistently store a mapping of string keys to Python objects.
  • These objects are serialized using the pickle module, and the resulting data is stored in a DBM-style database.
  • The term "shelve" is a playful reference to storing pickle jars on shelves, as the data is pickled and saved on disk.

Key Features of shelve.Shelf

  1. Mapping Type:

    • shelve.Shelf is a subclass of abc.MutableMapping, meaning it behaves like a standard Python dictionary. You can use it to store and retrieve key-value pairs, where keys are strings and values are pickled Python objects.
  2. Persistence:

    • Unlike a standard dictionary, the data stored in a Shelf object is persistent, meaning it is saved to disk and can be reloaded across different Python sessions.
  3. Automatic Storage:

    • When you assign a new value to a key, shelve.Shelf automatically saves the key-value pair to disk.
  4. Context Manager Support:

    • Shelf objects can be used within a with statement to ensure they are properly closed after use, which helps avoid data corruption and ensures all data is written to disk.
  5. String Keys:

    • Keys must be strings, as the underlying DBM database requires this.
  6. Pickleable Values:

    • Values must be objects that can be serialized by the pickle module. This includes most built-in Python types, but some objects (like open file handles) cannot be pickled.

Basic Example of Using shelve.Shelf

import shelve

# Open a shelf (file on disk)
with shelve.open('my_shelf.db') as shelf:
    # Store data in the shelf
    shelf['name'] = 'Alice'
    shelf['age'] = 30
    shelf['hobbies'] = ['reading', 'hiking', 'coding']

    # Retrieve data from the shelf
    print(shelf['name'])     # Output: 'Alice'
    print(shelf['age'])      # Output: 30
    print(shelf['hobbies'])  # Output: ['reading', 'hiking', 'coding']

# Shelf is automatically closed at the end of the `with` block

Managing Shelf Storage

The shelve.Shelf class provides additional methods for managing the storage:

  • sync(): This method forces any unsaved data to be written to disk. It can be useful if you want to ensure that all data is saved at a specific point in your code.

    with shelve.open('my_shelf.db') as shelf:
        shelf['new_data'] = 'important information'
        shelf.sync()  # Ensure the data is written to disk immediately
  • close(): This method closes the shelf explicitly. While close() is called automatically when the shelf is used as a context manager, you can call it manually if not using the with statement.

    shelf = shelve.open('my_shelf.db')
    shelf['more_data'] = 'some value'
    shelf.close()  # Manually close the shelf

Caveats and Considerations

  1. Serialization Limitations:

    • Not all Python objects can be pickled and stored in a Shelf. Objects like file handles, database connections, and certain custom objects may not be serializable.
  2. Concurrent Access:

    • The shelve module is not designed for concurrent access from multiple processes or threads. If your application requires this, consider using a more robust database solution.
  3. Potential Risks with pickle:

    • The pickle module has several known issues and security concerns, particularly when loading data from untrusted sources. It's recommended to understand these risks and consider alternative serialization formats if security is a concern.
  4. Database Size:

    • Shelves are backed by DBM-style databases, which may have limitations on the size and number of entries they can handle efficiently.

When to Use shelve.Shelf

  • Prototyping and Small Projects: shelve is ideal for simple use cases where you need a quick way to persistently store data between sessions without setting up a full database.
  • Lightweight Data Persistence: If your application needs to store a small number of key-value pairs persistently and without much overhead, shelve is a good fit.
  • Temporary or Intermediate Storage: Use shelve for temporary storage during computations or as a quick solution during development.

Subclassing UserDict Instead of dict

  • When creating a new mapping type in Python, it's often better to subclass collections.UserDict rather than directly subclassing dict.
  • This approach offers several advantages, primarily because UserDict provides a more flexible and less error-prone foundation for creating custom dictionary-like objects.

Key Advantages of Subclassing UserDict

  1. Avoiding Implementation Shortcuts:

    • The built-in dict has various internal implementation shortcuts that can complicate subclassing. For example, directly subclassing dict might force you to override several methods to ensure consistent behavior across all dictionary operations.
    • UserDict, on the other hand, avoids these complications by using composition. It stores the actual data in an internal dictionary called data, making it easier to manage and extend.
  2. Simpler Method Overrides:

    • With UserDict, you often only need to override a few methods (__setitem__, __getitem__, etc.) without worrying about internal optimizations or side effects present in dict.
  3. Composition over Inheritance:

    • UserDict is not a subclass of dict. Instead, it wraps a dict internally (self.data). This composition approach makes it easier to control how data is stored and manipulated, reducing the risk of unintended recursion or other issues.
import collections

class StrKeyDict(collections.UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, item):
        self.data[str(key)] = item
  • __missing__(self, key):
    • This method handles cases where a key is not found in the dictionary. If the key is not a string, it converts it to a string and retries the lookup.
  • __contains__(self, key):
    • This method checks if the string version of the key is in the dictionary, ensuring consistency in key handling.
  • __setitem__(self, key, item):
    • This method ensures that all keys are converted to strings before being added to the dictionary. This simplifies key management and avoids issues with mixed key types.

Inheritance from UserDict and ABCs

  • UserDict inherits from abc.MutableMapping, which provides default implementations for many standard dictionary methods. This inheritance simplifies the creation of fully functional dictionary-like objects:

  • MutableMapping.update:

    • This method allows bulk updates from other mappings or iterables. It uses self[key] = value internally, which ensures that the custom __setitem__ logic is applied.
  • Mapping.get:

    • The get method is inherited from Mapping, and it behaves consistently with __getitem__, avoiding the need for custom implementations.

Immutable Mappings

  • Problem: Standard library mappings are mutable, which can lead to accidental modifications.

    • For example, in hardware programming libraries like Pingo, the board.pins mapping represents physical GPIO pins, and altering this mapping could create inconsistencies with the actual hardware.
  • Solution: Use MappingProxyType from the types module to create a read-only, dynamic proxy for the original mapping.

    • Behavior:
      • The mappingproxy instance reflects changes made to the original mapping but prevents modifications via the proxy itself.
  • Example:

    from types import MappingProxyType
    
    d = {1: 'A'}
    d_proxy = MappingProxyType(d)
    
    # Access through proxy
    print(d_proxy)  # Output: mappingproxy({1: 'A'})
    print(d_proxy[1])  # Output: 'A'
    
    # Attempting modification through proxy raises an error
    d_proxy[2] = 'x'  # TypeError: 'mappingproxy' object does not support item assignment
    
    # Modifications in the original mapping are reflected in the proxy
    d[2] = 'B'
    print(d_proxy)  # Output: mappingproxy({1: 'A', 2: 'B'})
    print(d_proxy[2])  # Output: 'B'

Dictionary Views

  • Overview:

    • The dict methods .keys(), .values(), and .items() return instances of dict_keys, dict_values, and dict_items respectively.
    • These views provide a read-only, memory-efficient way to access dictionary data, unlike older methods in Python 2 that duplicated data by returning lists.
  • Key Features:

    • Read-Only Projections:
      • These views are read-only and directly reflect the current state of the dictionary.
    • Memory Efficiency:
      • Views avoid the memory overhead associated with creating duplicate lists of dictionary content.
    • Dynamic Nature:
      • Changes to the dictionary are immediately visible in any existing view.
  • Example:

    d = dict(a=10, b=20, c=30)
    values = d.values()
    
    # View representation
    print(values)  # Output: dict_values([10, 20, 30])
    
    # Length of the view
    print(len(values))  # Output: 3
    
    # Convert view to list
    print(list(values))  # Output: [10, 20, 30]
    
    # Reverse iteration over the values
    print(reversed(values))  # Output: <dict_reversevalueiterator object>
    
    # Attempting to index the view raises an error
    values[0]  # TypeError: 'dict_values' object is not subscriptable
  • Dynamic Updates:

    • If the dictionary is modified, the view reflects these changes automatically:

      d['z'] = 99
      print(d)  # Output: {'a': 10, 'b': 20, 'c': 30, 'z': 99}
      print(values)  # Output: dict_values([10, 20, 30, 99])
  • Internal Classes:

    • dict_keys, dict_values, and dict_items are internal classes in Python.

    • They are not accessible directly via standard library modules or builtins.

    • Attempts to instantiate these classes directly result in errors:

      values_class = type({}.values())
      v = values_class()  # TypeError: cannot create 'dict_values' instances
  • Functionality:

    • dict_values: Supports basic methods like __len__, __iter__, and __reversed__.
    • dict_keys and dict_items: In addition to these methods, they support several set operations, similar to the frozenset class.

Practical Consequences of How dict Works

  • Key Characteristics:

    • Keys Must Be Hashable:
      • Keys must be objects that implement proper __hash__ and __eq__ methods.
    • Fast Item Access:
      • Python can quickly locate a key, even in a dict with millions of keys, by computing the key's hash code and finding the corresponding index in the hash table. This process usually requires only a few comparisons.
    • Preserved Key Ordering:
      • From Python 3.6 (official in 3.7), dict preserves the insertion order of keys. This is a result of a more compact memory layout in CPython.
  • Memory Considerations:

    • Memory Overhead:
      • Although dict is now more compact, it still requires significant memory. A hash table needs to store more data per entry compared to an array of pointers, and to maintain efficiency, at least one-third of the hash table must remain empty.
  • Instance Attributes and Memory Optimization:

    • Avoid Creating Attributes Outside __init__:
      • To save memory, define all instance attributes within the __init__ method.
    • PEP 412 - Key-Sharing Dictionary:
      • Implemented in Python 3.3, this optimization allows instances of the same class to share a common hash table for their __dict__ attributes, provided they have identical attribute names after __init__ runs.
      • Impact:
        • This shared hash table reduces memory usage by 10% to 20% in object-oriented programs.
        • If an attribute is added outside of __init__, a new hash table is created for that particular instance, increasing memory usage.

Set Theory in Python

  • Sets in Python, including set and frozenset, are used to create collections of unique objects.

    • set: Mutable, unordered collection.
    • frozenset: Immutable version of a set.
  • Elements in a set must be hashable.

    • set objects are not hashable, so you can't have nested sets.
    • frozenset is hashable, allowing it to be an element within another set.
  • Common operations:

    • Union: a | b
    • Intersection: a & b
    • Difference: a - b
    • Symmetric Difference: a ^ b
  • On-the-Fly Set Creation:

    • If needles and haystack are not sets, you can create them on the fly:

      found = len(set(needles) & set(haystack))
      # or
      found = len(set(needles).intersection(haystack))
    • Performance Note:

      • Building sets on the fly has an additional cost but can still be faster if one of the collections is already a set.

Set Literals

  • Syntax:

    • Set literals use {...} notation: {1}, {1, 2}, etc.

    • Empty Set:

      • No literal syntax for an empty set; use set().
      • {} creates an empty dictionary, not a set.
      s = {1}
      print(type(s))  # Output: <class 'set'>
      print(s)        # Output: {1}
      
      s.pop()         # Removes an element
      print(s)        # Output: set()  # Now empty
    • Literal set syntax ({1, 2, 3}) is faster and more readable than using set([1, 2, 3]).

    • The literal syntax leverages the BUILD_SET bytecode, which is optimized.

  • frozenset:

    • No literal syntax for frozenset; must use the constructor.
    frozenset(range(10))
    # Output: frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

Set Comprehensions (Setcomps)

  • Introduced in Python 2.7, allows building sets using comprehension syntax, similar to list comprehensions.

    from unicodedata import name
    chars_with_sign = {chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i), '')}
    print(chars_with_sign)
    # Output: {'§', '=', '¢', '#', '¤', '<', '¥', 'µ', '×', '$', '¶', '£', '©', '°', '+', '÷', '±', '>', '¬', '®', '%'}

Practical Consequences of How Sets Work

  • Hash Table Implementation:
    • Sets are implemented using hash tables, affecting performance and behavior.
  • Hashability:
    • Set elements must be hashable, implementing __hash__ and __eq__ methods.
  • Efficient Membership Testing:
    • Set membership tests are very fast, even with millions of elements.
  • Memory Overhead:
    • Sets use more memory than a low-level array, but this trade-off offers faster lookups.
  • Element Ordering:
    • Element order in a set is influenced by insertion order but is not reliable.
    • Adding elements might change the order due to the need for hash table resizing, which can occur if the table becomes more than two-thirds full.

Set Operations in Python

  • Usage Notes:

  • Overview:

    • Mutable Set Operations: These include methods that modify the original set in place.
    • Immutable Set Operations: These operations are available for both set and frozenset, but in-place modifications are not allowed on frozenset.
  • Table of Common Set Operations:

    • Union (| or .union() method):
      • Python 3.5+ Syntax:
        • Use {*a, *b, *c, *d} to create a new set with the union of multiple iterables.
    • Intersection (& or .intersection() method):
    • Difference (- or .difference() method):
    • Symmetric Difference (^ or .symmetric_difference() method):
      • Returns elements in either set but not in both.
    • Subset (<= or .issubset() method):
      • Checks if one set is a subset of another.
    • Superset (>= or .issuperset() method):
      • Checks if one set is a superset of another.
  • In-Place Operations:

    • Intersection Update (&= or .intersection_update() method):

      • Modifies the original set to keep only elements found in both sets.
    • Difference Update (-= or .difference_update() method):

      • Modifies the original set to remove elements found in another set.
    • Symmetric Difference Update (^= or .symmetric_difference_update() method):

      • Modifies the original set to keep elements in either set but not in both.
    • Infix Operators: Require both operands to be sets.

    • Methods: Can take any iterable as an argument, allowing operations across different types of collections (e.g., lists, tuples).

Set Operations on dict Views

  • The view objects returned by dict.keys() and dict.items() behave similarly to frozenset and support powerful set operations such as intersection (&), union (|), difference (-), and symmetric difference (^).

  • These operations can be applied directly to compare keys and items between dictionaries, or between a dictionary view and a set.

    d1 = dict(a=1, b=2, c=3, d=4)
    d2 = dict(b=20, d=40, e=50)
    common_keys = d1.keys() & d2.keys()
    print(common_keys)  # Output: {'b', 'd'}
  • Compatibility with Sets:

    • dict_keys views can be directly compared with a set:

      s = {'a', 'e', 'i'}
      key_intersection = d1.keys() & s
      key_union = d1.keys() | s
      print(key_intersection)  # Output: {'a'}
      print(key_union)         # Output: {'a', 'c', 'b', 'd', 'i', 'e'}
    • Operations on dict_items:

      • dict_items views can also be treated like sets, but only if all dictionary values are hashable.

      • Example:

        d1 = dict(a=1, b=2, c=3)
        d2 = dict(a=1, c=4, d=5)
        common_items = d1.items() & d2.items()
        print(common_items)  # Output: {('a', 1)}
      • Important: If any value in a dict_items view is unhashable, attempting set operations will raise a TypeError.

  • Efficiency and Practicality:

    • Using set operators on dictionary views eliminates the need for loops and conditional statements when comparing dictionary contents.
    • Python's underlying C implementation makes these operations highly efficient, saving both development time and computational resources.