Python notes

fluent_python

Book

Ch 2. An Array of Sequences

  • Python provides various built-in sequence types with operations like iteration, slicing, and concatenation.

  • Sequences can be mutable or immutable, and container or flat.

    • Container sequences: Hold items of different types, e.g., list, tuple, collections.deque.
    • Flat sequences: Hold items of one simple type, e.g., str, bytes, array.array.
      • Flat sequences are more memory-efficient as they store primitive values directly.
    • Mutable sequences: Can be modified, e.g., list, bytearray, array.array.
    • Immutable sequences: Cannot be modified, e.g., tuple, str, bytes.
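  • Every Python object in memory has a header with metadata; a float, for example, has the fields ob_refcnt, ob_type, and ob_fval (the value itself), which is why an array of raw values is more compact than a tuple of float objects.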

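  • Built-in sequence types do not inherit from the Sequence and MutableSequence abstract base classes (ABCs); they are virtual subclasses registered with those ABCs, so isinstance and issubclass checks still succeed.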
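
A minimal sketch of the classification above, using only the standard library. The variable names (mixed, floats, is_seq, is_mutable) are illustrative and not taken from the book.

```python
from array import array
from collections import abc, deque

# Container sequence: holds references to objects of any type.
mixed = [1, "two", (3.0, 4.0)]

# Flat sequence: holds values of one simple type; array enforces the type.
floats = array("d", [1.0, 2.5, 4.0])
try:
    floats.append("not a number")
    rejected = False
except TypeError:
    rejected = True

# Built-in sequences pass ABC checks without inheriting from the ABCs.
is_seq = {t.__name__: issubclass(t, abc.Sequence)
          for t in (list, tuple, str, bytes, deque)}
is_mutable = {t.__name__: issubclass(t, abc.MutableSequence)
              for t in (list, tuple, str, bytearray, deque)}
```

Here is_mutable separates list, bytearray, and deque from tuple and str, matching the mutable/immutable split in the notes.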


  • List comprehensions: Create lists using a concise syntax ([expression for item in iterable if condition]).
  • Generator expressions: Similar to list comprehensions, but yield items one by one for memory efficiency ((expression for item in iterable if condition)).
    • Memory efficiency: Generators are preferable for large sequences due to lazy evaluation.
  • Readability: List comprehensions and generator expressions improve readability when kept short; if one grows past a couple of lines, a plain for loop is usually clearer.
    • map and filter: Achieve similar results but are less readable, especially with nested lambda expressions.
  • Performance: List comprehensions are usually faster than an equivalent for loop that calls append; generator expressions save memory for large datasets.
    • Performance: List comprehensions are generally as fast as or faster than map and filter.
  • Syntax tips: Line breaks inside [], {}, and () are allowed, and trailing commas improve maintainability.
  • List comprehensions: More readable alternative to map and filter for filtering and transforming lists.
  • Local scope in comprehensions: Variables in comprehensions are local, preventing unintended side effects on outer variables.
    • Walrus operator (:=): Assigns values within comprehensions and keeps the variable accessible after execution.
  • Generator expressions: Yield items one by one, using the iterator protocol, for memory efficiency.
    • Syntax: Similar to list comprehensions but use parentheses () instead of brackets [].
    • Tuple and array initialization: Useful for initializing sequences like tuples and arrays without building the entire list in memory.
    • Memory efficiency: Ideal for large datasets where storing an entire list is costly.
    • Use cases: Suitable for scenarios requiring iteration over items one at a time, such as initializing sequences or working with large data.
  • Tuples as Records: Tuples can represent records where item position holds meaning, making them useful for structured data (e.g., coordinates, city data).
  • Tuples as Immutable Lists: Tuples are used as immutable lists, offering clarity (fixed size) and performance benefits.
  • Immutability: Tuples are immutable, but contained mutable objects can still be altered.
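  • Performance Advantages: Tuples use less memory than lists of the same length (no spare capacity is allocated), a tuple literal can be compiled to a single constant, and tuple(t) returns t itself instead of making a copy.
  • Tuple Methods: Tuples support the list methods that do not add or remove items; they lack __reversed__, but reversed(t) still works through the sequence protocol.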
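
The points on comprehensions, generator expressions, scope, and tuple immutability can be sketched as follows. This is an illustrative example with made-up names (symbols, codes, city), not code from the book.

```python
symbols = "ABC"

# List comprehension: builds the whole list eagerly.
codes = [ord(ch) for ch in symbols if ch != "B"]

# Generator expression: yields one item at a time; no list is stored.
total = sum(ord(ch) for ch in symbols)

# A generator expression can feed the tuple constructor directly.
code_tuple = tuple(ord(ch) for ch in symbols)

# The loop variable of a comprehension is local to the comprehension...
x = "outer"
lowered = [x.lower() for x in symbols]

# ...but a target assigned with := stays bound after the comprehension.
upper_codes = [(last := ord(ch)) for ch in symbols]

# A tuple is immutable, yet a mutable item inside it can still change.
city = ("Tokyo", [35.68, 139.69])
city[1].append(0.0)
try:
    city[0] = "Kyoto"
    replaced = True
except TypeError:
    replaced = False
```

After this runs, x is still "outer" while last holds the final code point, which shows the difference between the two scoping rules.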

  • Basic unpacking: Assigns items from an iterable to multiple variables (e.g., latitude, longitude = lax_coordinates).
  • Swapping values: Achieved through unpacking without needing a temporary variable (a, b = b, a).
  • Function argument unpacking: Use * to unpack arguments from a tuple (e.g., divmod(*t)).
    • Returning multiple values: Functions return multiple values as tuples, which can be unpacked by the caller.
    • * operator: Captures excess items in sequences, allowing flexible unpacking in various positions.
  • Function parameters with *args: Collects additional positional arguments into a tuple.
    • Unpacking in function calls: Use * to unpack multiple iterables in a function call.
    • Sequence literals with *: Use * in list, tuple, or set literals to combine sequences.
  • Nested unpacking: Simplifies unpacking of complex, nested structures (e.g., tuples within tuples).
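
A short sketch of the unpacking forms listed above. The tag function and the sample values are hypothetical, chosen only to show each form once.

```python
# Basic (parallel) unpacking.
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates

# Swapping without a temporary variable.
a, b = 1, 2
a, b = b, a

# Unpacking a tuple into function arguments; divmod returns a tuple.
t = (20, 8)
quotient, remainder = divmod(*t)

# * captures excess items, in any single position.
first, *middle, last = range(5)

# *args collects extra positional arguments into a tuple.
def tag(name, *args):
    return name, args

tagged = tag("p", "hello", "world")

# * inside a list literal combines several iterables.
combined = [*range(3), *"ab"]

# Nested unpacking mirrors the shape of the data.
city, _, _, (lat, lon) = ("Tokyo", "JP", 36.933, (35.689722, 139.691667))
```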

  • Pattern matching (Python 3.10+): The match/case statement allows destructuring and matching of complex sequences and data structures.
  • Command handling with pattern matching: Use patterns to match different command structures.
  • Destructuring nested tuples: Extract values from nested sequences directly in the case patterns.
  • * operator in pattern matching: Captures remaining items in sequences (e.g., a, *body, c).
  • as keyword: Binds parts of a pattern to variables for later use.
  • Type constraints: Patterns can specify types (e.g., case [str(name), float(lat), float(lon)]).
  • Guards: Add conditions to patterns using if clauses for additional checks.
  • Original if/elif structure: Uses unpacking to handle different Lisp-like expression forms (e.g., quote, if, lambda, define).
  • Pattern matching refactor: Replaces if/elif with match/case for clearer and more declarative handling of expressions.
  • Quote pattern: Matches two-item sequences starting with 'quote'.
  • If pattern: Matches four-item sequences starting with 'if', evaluating the test and returning the consequence or alternative.
  • Lambda pattern: Matches sequences starting with 'lambda' where parameters are a list and the body has one or more expressions.
  • Define pattern: Matches three-item sequences starting with 'define' and ensures the name is a Symbol.
  • Catch-all case: Raises an error for unmatched expressions.
  • Safe lambda pattern: Ensures the parameters part of lambda is always a list by using nested sequence matching ([*parms]).
  • Alternative define syntax: Supports defining functions with the name and parameters inside a list, matched using ['define', [Symbol() as name, *parms], *body].
  • Pattern for quote: Matches sequences starting with 'quote' to return the expression.
  • Pattern for if: Matches four-item sequences starting with 'if' and evaluates the test and branches.
  • Pattern for lambda: Ensures parameters are a list, with at least one body expression.
  • Pattern for define: Supports both variable definitions and function definitions with named parameters.
  • Catch-all: Raises SyntaxError for invalid expressions.

  • Slicing excludes the last item: Python slices and ranges use zero-based indexing and exclude the stop index, simplifying length calculation and ensuring non-overlapping splits.
  • Slice syntax: s[a:b:c] allows defining start, stop, and step, enabling skipping or reversing items.
  • Examples of slicing:
    • s[::3] takes every third item (skipping two items between picks).
    • s[::-1] reverses the sequence.
  • Slice objects: Python internally represents slices using slice objects.
  • Named slices: Improve code readability when handling complex structures.
  • Multidimensional slicing: Allows slicing across multiple axes (e.g., using NumPy).
  • Ellipsis (...): Used for multidimensional slicing to represent unspecified dimensions.
  • Modifying sequences with slices: Slices can replace parts of mutable sequences.
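
A brief sketch of the slicing behavior described above, limited to built-in sequences. The invoice line and the slice names SKU, DESCRIPTION, and PRICE are made up for illustration.

```python
s = "bicycle"
every_third = s[::3]    # items at indexes 0, 3, 6
backwards = s[::-1]     # negative step reverses

# Excluding the stop index gives non-overlapping splits at any index.
items = [10, 20, 30, 40, 50, 60]
head, tail = items[:2], items[2:]
length = len(items[1:4])   # stop - start = 3

# Named slice objects make fixed-width parsing readable.
SKU = slice(0, 4)
DESCRIPTION = slice(6, 23)
PRICE = slice(25, None)
line = "1909  Pimoroni PiBrella  $17.50"
sku, description, price = line[SKU], line[DESCRIPTION], line[PRICE]

# Slices on the left side of an assignment modify a mutable sequence.
numbers = list(range(10))
numbers[2:5] = [20, 30]    # replace three items with two
del numbers[5:7]           # remove two items
```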

  • Concatenation (+) with sequences: Combines two sequences of the same type into a new sequence.
    • Example: [1, 2] + [3, 4] → [1, 2, 3, 4].
  • Repetition (*) with sequences: Repeats a sequence a specified number of times, producing a new sequence.
    • Example: ['a'] * 3 → ['a', 'a', 'a'].
  • Caution with mutable items in sequences: Using * with sequences containing mutable items can result in multiple references to the same object, leading to unintended modifications.
    • Example: [['x']] * 3 creates three references to the same list.
  • Correct approach for list of lists: Use list comprehensions to create independent sublists.
    • Example: [['x'] for _ in range(3)] builds three independent one-item lists.
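
The shared-reference pitfall is easiest to see by mutating one sublist. The sketch below uses illustrative names (shared, independent).

```python
joined = [1, 2] + [3, 4]     # new list; operands are unchanged
repeated = ['a'] * 3         # safe: strings are immutable

# Pitfall: the outer list holds three references to ONE inner list.
shared = [['x']] * 3
shared[0].append('y')        # the change shows up in every row

# A list comprehension evaluates ['x'] anew on each iteration.
independent = [['x'] for _ in range(3)]
independent[0].append('y')   # only the first row changes
```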

  • Augmented assignment (+= and *=):
    • Mutable sequences (lists): Modifies the sequence in place if __iadd__ or __imul__ is implemented.
    • Immutable sequences (tuples): Creates a new sequence, as in-place modification is not possible.
  • Unexpected behavior with +=: Applying += to a mutable item inside a tuple raises TypeError, yet the item is still modified before the exception, because the in-place extend succeeds and only the assignment back into the tuple fails.
    • Example: t = (1, 2, [30, 40]); t[2] += [50, 60] modifies the list but raises TypeError.
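
A small sketch of both behaviors, using object identity to show whether += created a new object. Variable names are illustrative.

```python
# Mutable sequence: += extends the same list object in place.
nums = [1, 2, 3]
same_list = nums
nums += [4]
list_in_place = nums is same_list

# Immutable sequence: += builds a new tuple and rebinds the name.
tup = (1, 2, 3)
old_tup = tup
tup += (4,)
tuple_replaced = tup is not old_tup

# The corner case: the list is extended, then item assignment fails.
t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
    raised = False
except TypeError:
    raised = True
```

Calling t[2].extend([50, 60]) instead avoids the exception, since no assignment into the tuple takes place.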

  • list.sort method:
    • Sorts the list in place, modifying the original list.
    • Returns None as it operates on the existing list.
    • Is a method of list only; other sequences and iterables must be sorted with sorted.
  • sorted function:
    • Returns a new sorted list from any iterable, leaving the original sequence unchanged.
    • Can be used with lists, tuples, strings, or generators.
  • Arguments for both list.sort and sorted:
    • reverse argument: Reverses the sort order when set to True.
    • key argument: A function used to extract a comparison key from each list element (e.g., key=str.lower for case-insensitive sorting).
  • Stability of sorting: Python's sort is stable, maintaining the relative order of items that compare equal (important when sorting by multiple criteria).
  • Performance considerations:
    • list.sort: More memory-efficient since it does not create a new list.
    • sorted: Requires more memory as it creates a new list but leaves the original list unchanged.
  • Efficient searching in sorted sequences: Once sorted, sequences can be searched efficiently using binary search (via the bisect module).
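
The differences between list.sort and sorted, plus a bisect lookup, can be sketched as follows. The fruit names are sample data chosen for illustration.

```python
import bisect

fruits = ["grape", "raspberry", "apple", "banana"]

# sorted returns a new list and accepts any iterable.
by_length = sorted(fruits, key=len)
by_length_desc = sorted(fruits, key=len, reverse=True)
from_tuple = sorted(("b", "A", "c"), key=str.lower)

# list.sort sorts in place and returns None.
result = fruits.sort()

# Binary search on the now-sorted list.
position = bisect.bisect_left(fruits, "grape")
```

Stability is visible in by_length: "grape" and "apple" both have length 5 and keep their original relative order, even with reverse=True.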

  • Use of arrays (array.array) for numerical data:

    • Arrays store numerical data more compactly, improving memory efficiency.
    • They support binary I/O for faster loading and saving.
    • array has no in-place sort method; to sort, rebuild the array from sorted, e.g., a = array.array(a.typecode, sorted(a)).
  • collections.deque: Efficient for frequent appends and pops from both ends (O(1) time complexity).

    • Example: dq = deque([1, 2, 3]); dq.append(4); dq.popleft().
  • Memory views (memoryview): Allows working with slices of data without copying, useful for large datasets.

    • Example: mview = memoryview(array('B', range(6))).
  • NumPy arrays: More memory-efficient and faster for numerical operations than Python lists, supporting high-level operations like reshaping and element-wise math.

    • Example: a = np.arange(12).reshape(3, 4).
  • Deques vs lists: Deques outperform lists for queue-like operations where adding/removing items at both ends is frequent, while lists offer O(1) access by index; inserting or removing in the middle is O(n) for both.
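
A sketch of the standard-library alternatives above (NumPy is left out so the example has no third-party dependency). Names such as numbers, restored, and grid are illustrative.

```python
from array import array
from collections import deque

# array: compact storage; sort by rebuilding from sorted().
numbers = array("i", [3, 1, 2])
numbers = array(numbers.typecode, sorted(numbers))

# Binary round trip through raw bytes.
raw = numbers.tobytes()
restored = array("i")
restored.frombytes(raw)

# deque: O(1) appends and pops at both ends.
dq = deque([1, 2, 3])
dq.append(4)
leftmost = dq.popleft()

# memoryview: a different view of the same bytes, with no copying.
octets = array("B", range(6))
mview = memoryview(octets)
grid = mview.cast("B", [2, 3])   # view the 6 bytes as 2 rows x 3 columns
rows = grid.tolist()
grid[1, 1] = 22                  # writes through to octets
```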

  • Queue implementations:

    • queue.Queue: Thread-safe FIFO.
    • asyncio.Queue: Asynchronous FIFO for event loops.
    • heapq: Functions that treat a plain list as a heap, which can serve as a priority queue.
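
A minimal single-threaded sketch contrasting FIFO order with priority order. The job names and priorities are invented for illustration; asyncio.Queue is omitted because it needs a running event loop.

```python
import heapq
from queue import Queue

# queue.Queue: items come out in the order they went in.
fifo = Queue()
for job in ("first", "second", "third"):
    fifo.put(job)
fifo_order = [fifo.get() for _ in range(3)]

# heapq: the smallest item (here, the lowest priority number) pops first.
heap = []
heapq.heappush(heap, (2, "write tests"))
heapq.heappush(heap, (1, "fix bug"))
heapq.heappush(heap, (3, "update docs"))
priority_order = []
while heap:
    priority, task = heapq.heappop(heap)
    priority_order.append(task)
```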

  • Memory model:
    • Container sequences: Use references to manage complex data.
    • Flat sequences: Store data directly, optimizing memory for simple types.
  • Container sequences: Store references to objects, allowing them to hold heterogeneous and nested data types (e.g., list, tuple, deque).
    • Example: nested_list = [1, 'a', [2, 3], ('b', 'c')].
  • Flat sequences: Store simple, atomic data types in contiguous memory, making them more memory-efficient but unable to hold complex structures (e.g., str, bytes, array.array).
    • Example: numbers = array.array('i', [1, 2, 3, 4]).
  • Use cases:
    • Container sequences: Best for storing mixed or nested data.
    • Flat sequences: Ideal for large collections of primitive types where memory efficiency matters.
  • Python Container abstract class: collections.abc.Container describes any object supporting the in operator (__contains__), including str and array.array; it is unrelated to the term "container sequence" used above.
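
The memory model can be observed indirectly: a container sequence exposes the very object it refers to, while a flat sequence exposes only packed values. A rough sketch, with illustrative names:

```python
from array import array
from collections import abc

# Container sequence: the list stores a reference to inner, not a copy.
inner = [2, 3]
nested_list = [1, "a", inner, ("b", "c")]
same_object = nested_list[2] is inner
inner.append(4)                      # visible through nested_list too

# Flat sequence: values are packed back to back in one buffer.
numbers = array("i", [1, 2, 3, 4])
payload_bytes = len(numbers.tobytes())
expected_bytes = numbers.itemsize * len(numbers)

# Both flat types satisfy collections.abc.Container (they support "in").
flat_are_containers = (isinstance("abc", abc.Container)
                       and isinstance(numbers, abc.Container))
```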

Ch 20. Concurrent executors

  • Michele Simionato: Simple thread spawning and queue collection pattern is sufficient for 99% of application programming cases.
  • concurrent.futures.Executor: Simplifies concurrent execution with threads and processes.
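  • ThreadPoolExecutor simplifies concurrency by managing a pool of worker threads; when max_workers is omitted, the default is derived from the CPU count (min(32, os.cpu_count() + 4) since Python 3.8).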
  • Future objects track the status of asynchronous operations.
  • executor.submit schedules tasks, returning futures for tracking results.
  • futures.as_completed allows reacting to tasks as they finish, offering more control than executor.map.
  • ProcessPoolExecutor in concurrent.futures simplifies parallel execution of CPU-bound tasks across multiple processes.
    • Benefits:
      • Hides process management, inter-process communication, and task distribution complexities.
      • Best suited for CPU-bound tasks, while threads are better for I/O-bound tasks.
    • Advantages:
      • Simplified code with no manual handling of multiprocessing.
      • Efficient use of CPU cores for parallel processing.
      • With executor.map, results are returned predictably in the order of task submission.
    • Key Takeaways:
      • ProcessPoolExecutor offers clean and scalable parallel execution for CPU-intensive tasks.
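      • The actual completion order is not visible through executor.map, but the simplicity and efficiency of the approach are major benefits.
  • executor.map schedules all tasks without blocking, but its result iterator yields results in submission order, so a slow early task delays the results behind it.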
  • executor.submit + futures.as_completed provides more flexibility by yielding results as they complete.
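
The two retrieval styles can be compared with a thread pool. This is a sketch under stated assumptions: slow_square is a hypothetical stand-in for an I/O-bound task, and the sleep times are arbitrary. Swapping in ProcessPoolExecutor should work the same way for CPU-bound functions, provided the code runs under an if __name__ == "__main__" guard.

```python
import time
from concurrent import futures


def slow_square(n):
    """Hypothetical task: larger inputs finish sooner."""
    time.sleep((4 - n) * 0.05)
    return n * n


# executor.map: results arrive in submission order.
with futures.ThreadPoolExecutor(max_workers=3) as executor:
    mapped = list(executor.map(slow_square, [1, 2, 3]))

# submit + as_completed: results arrive as each future finishes.
with futures.ThreadPoolExecutor(max_workers=3) as executor:
    to_do = {executor.submit(slow_square, n): n for n in [1, 2, 3]}
    completed = [(to_do[future], future.result())
                 for future in futures.as_completed(to_do)]
```

The order of completed depends on thread timing, so code consuming it should not rely on any particular sequence; the dict from future to input is one way to tell which result belongs to which task.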