fluent_python
Book
Ch 2. An Array of Sequences
- Python provides various built-in sequence types that share operations such as iteration, slicing, and concatenation.
- Sequences can be grouped in two ways: mutable or immutable, and container or flat.
  - Container sequences: Hold items of different types, e.g., `list`, `tuple`, `collections.deque`.
  - Flat sequences: Hold items of one simple type, e.g., `str`, `bytes`, `array.array`.
    - Flat sequences are more memory-efficient as they store primitive values directly.
  - Mutable sequences: Can be modified, e.g., `list`, `bytearray`, `array.array`.
  - Immutable sequences: Cannot be modified, e.g., `tuple`, `str`, `bytes`.
- Every Python object in memory has a header with metadata. A `float`, for example, has the metadata fields `ob_refcnt` and `ob_type` in addition to its value field, `ob_fval`.
- Built-in sequence types are virtual subclasses of the `Sequence` and `MutableSequence` abstract base classes (ABCs).
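The classification above can be checked interactively. The following is a minimal sketch; the variable names (`mixed`, `floats`, `checks`) are illustrative and not taken from the book.

```python
from array import array
from collections import abc, deque

# A container sequence holds references, so items may be of any type.
mixed = [1, "two", (3.0, 4.0)]

# A flat sequence holds one simple type: every item here is stored as a C double.
floats = array("d", [1.0, 2.0, 3.0])

# Built-in sequences pass ABC checks even though they do not inherit from the ABCs.
checks = {
    "tuple is Sequence": issubclass(tuple, abc.Sequence),
    "list is MutableSequence": issubclass(list, abc.MutableSequence),
    "deque is MutableSequence": issubclass(deque, abc.MutableSequence),
    "tuple is MutableSequence": issubclass(tuple, abc.MutableSequence),
}
print(checks)
```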
- List comprehensions: Create lists using a concise syntax (`[expression for item in iterable if condition]`).
- Generator expressions: Similar to list comprehensions, but yield items one by one for memory efficiency (`(expression for item in iterable if condition)`).
  - Memory efficiency: Generators are preferable for large sequences due to lazy evaluation.
- Readability: List comprehensions and generator expressions improve readability when kept short; logic that spans more than a couple of lines is usually clearer as a plain `for` loop.
- `map` and `filter`: Achieve similar results but are less readable, especially with nested `lambda` expressions.
  - Performance: List comprehensions are typically faster than equivalent `for` loops; generator expressions save memory for large datasets.
  - Performance: List comprehensions are generally as fast as or faster than `map` and `filter`.
- Syntax tips: Line breaks inside `[]`, `{}`, and `()` are allowed, and trailing commas improve maintainability.
- List comprehensions: More readable alternative to `map` and `filter` for filtering and transforming lists.
- Local scope in comprehensions: Variables in comprehensions are local, preventing unintended side effects on outer variables.
  - Walrus operator (`:=`): Assigns values within comprehensions and keeps the variable accessible after execution.
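The points on `map`/`filter`, local scope, and the walrus operator can be sketched as follows. The sample data (`words`) and variable names are assumptions made for illustration.

```python
words = ["apple", "Fig", "kiwi", "Banana"]

# Filtering and transforming with a list comprehension...
long_upper = [w.upper() for w in words if len(w) > 3]

# ...and the same result with map and filter, which needs a lambda.
same_result = list(map(str.upper, filter(lambda w: len(w) > 3, words)))

# The loop variable of a comprehension is local to the comprehension.
x = "outer"
codes = [ord(x) for x in "ABC"]
# x is still "outer" here.

# A target assigned with := remains accessible after the comprehension.
codes_again = [last := ord(c) for c in "ABC"]
print(long_upper, x, last)
```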
- Generator expressions: Yield items one by one, using the iterator protocol, for memory efficiency.
  - Syntax: Similar to list comprehensions but use parentheses `()` instead of brackets `[]`.
  - Tuple and array initialization: Useful for initializing sequences like tuples and arrays without building the entire list in memory.
  - Memory efficiency: Ideal for large datasets where storing an entire list is costly.
  - Use cases: Suitable for scenarios requiring iteration over items one at a time, such as initializing sequences or working with large data.
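A short sketch of generator expressions feeding other constructors, with no intermediate list built. The names below are illustrative.

```python
from array import array

symbols = "ABC"

# When a generator expression is the only argument, the extra parentheses can be dropped.
as_tuple = tuple(ord(s) for s in symbols)

# With more than one argument, the generator expression needs its own parentheses.
as_array = array("I", (ord(s) for s in symbols))

# Items are produced lazily, one at a time, as the consumer asks for them.
squares = (n * n for n in range(10))
first = next(squares)        # consumes only the first item
rest_total = sum(squares)    # consumes the remaining nine
print(as_tuple, as_array, first, rest_total)
```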
- Tuples as Records: Tuples can represent records where item position holds meaning, making them useful for structured data (e.g., coordinates, city data).
- Tuples as Immutable Lists: Tuples are used as immutable lists, offering clarity (fixed size) and performance benefits.
  - Immutability: The references held by a tuple cannot change, but mutable objects it refers to can still be altered.
  - Performance Advantages: Compared with lists, tuples use less memory, are allocated with exactly the space they need, and tuple literals are evaluated in fewer bytecode operations.
  - Tuple Methods: Tuples lack the list methods that add or remove items. They also lack `__reversed__`, but `reversed()` still works on them.
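The following sketch illustrates tuples as records and the limits of tuple immutability. The helper `is_fixed` is a hypothetical name used here for illustration; it treats "hashable" as a proxy for "value cannot change".

```python
# A tuple used as a record: each position has a meaning.
city, year, population = ("Tokyo", 2003, 32_450)

# Two tuples that start out equal...
a = (10, "alpha", [1, 2])
b = (10, "alpha", [1, 2])
equal_before = a == b

# ...stop being equal once a mutable item inside one of them changes.
b[-1].append(99)
equal_after = a == b


def is_fixed(obj):
    """Return True if obj is hashable, meaning its value cannot change."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# reversed() works on tuples even though tuple has no __reversed__ method.
has_dunder_reversed = hasattr(tuple, "__reversed__")
backwards = list(reversed((1, 2, 3)))
print(equal_before, equal_after, is_fixed(a), backwards)
```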
- Basic unpacking: Assigns items from an iterable to multiple variables (e.g., `latitude, longitude = lax_coordinates`).
- Swapping values: Achieved through unpacking without needing a temporary variable (`a, b = b, a`).
- Function argument unpacking: Use `*` to unpack arguments from a tuple (e.g., `divmod(*t)`).
  - Returning multiple values: Functions return multiple values as tuples, which can be unpacked by the caller.
- `*` operator: Captures excess items in sequences, allowing flexible unpacking in various positions.
- Function parameters with `*args`: Collects additional positional arguments into a tuple.
  - Unpacking in function calls: Use `*` to unpack multiple iterables in a function call.
  - Sequence literals with `*`: Use `*` in list, tuple, or set literals to combine sequences.
- Nested unpacking: Simplifies unpacking of complex, nested structures (e.g., tuples within tuples).
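The unpacking forms listed above, gathered in one sketch. The function `collect` and the sample values are assumptions for illustration.

```python
# Basic unpacking and swapping.
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates
a, b = 1, 2
a, b = b, a

# Unpacking a tuple into function arguments; divmod returns a tuple.
t = (20, 8)
quotient, remainder = divmod(*t)

# A starred target captures excess items as a list, in any position.
first, *middle, last = range(5)
*head, tail = "abc"


def collect(x, y, *rest):
    """Extra positional arguments are gathered into the tuple `rest`."""
    return x, y, rest


# Several iterables can be unpacked in a single call or literal.
collected = collect(*[1, 2], 3, *range(4, 6))
combined = [*range(3), 3, *"ab"]

# Nested unpacking mirrors the structure of the data.
name, (lat, lon) = ("Tokyo", (35.689722, 139.691667))
print(quotient, remainder, middle, collected, combined)
```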
- Pattern matching (Python 3.10+): The `match`/`case` statement allows destructuring and matching of complex sequences and data structures.
- Command handling with pattern matching: Use patterns to match different command structures.
  - Destructuring nested tuples: Extract values from nested sequences directly in the `case` patterns.
  - `*` operator in pattern matching: Captures remaining items in sequences (e.g., `a, *body, c`).
  - `as` keyword: Binds parts of a pattern to variables for later use.
  - Type constraints: Patterns can specify types (e.g., `case [str(name), float(lat), float(lon)]`).
  - Guards: Add conditions to patterns using `if` clauses for additional checks.
- Original `if`/`elif` structure: Uses unpacking to handle different Lisp-like expression forms (e.g., `quote`, `if`, `lambda`, `define`).
- Pattern matching refactor: Replaces `if`/`elif` with `match`/`case` for clearer and more declarative handling of expressions.
  - Quote pattern: Matches two-item sequences starting with `'quote'`.
  - If pattern: Matches four-item sequences starting with `'if'`, evaluating the test and returning the consequence or alternative.
  - Lambda pattern: Matches sequences starting with `'lambda'` where parameters are a list and the body has one or more expressions.
  - Define pattern: Matches three-item sequences starting with `'define'` and ensures the name is a `Symbol`.
  - Catch-all case: Raises an error for unmatched expressions.
- Safe `lambda` pattern: Ensures the parameters part of `lambda` is always a list by using nested sequence matching (`[*parms]`).
- Alternative define syntax: Supports defining functions with the name and parameters inside a list, matched using `['define', [Symbol() as name, *parms], *body]`.
- Pattern for `quote`: Matches sequences starting with `'quote'` to return the expression.
- Pattern for `if`: Matches four-item sequences starting with `'if'` and evaluates the test and branches.
- Pattern for `lambda`: Ensures parameters are a list, with at least one body expression.
- Pattern for `define`: Supports both variable definitions and function definitions with named parameters.
- Catch-all: Raises `SyntaxError` for invalid expressions.
- Slicing excludes the last item: Python slices and ranges use zero-based indexing and exclude the stop index, so the length is simply `stop - start` and a sequence can be split at any index without overlap (`s[:x]` and `s[x:]`).
- Slice syntax: `s[a:b:c]` defines start, stop, and step, enabling skipping or reversing items.
- Examples of slicing:
  - `s[::3]` takes every third item.
  - `s[::-1]` reverses the sequence.
- Slice objects: Python internally represents slices using `slice` objects.
- Named slices: Improve code readability when handling complex structures.
- Multidimensional slicing: Allows slicing across multiple axes (e.g., using NumPy).
- Ellipsis (`...`): Used for multidimensional slicing to represent unspecified dimensions.
- Modifying sequences with slices: Slices can replace parts of mutable sequences.
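A sketch of the slicing features above using only the standard library. The log line, the named slices, and the `ShowIndex` class are illustrative assumptions; `ShowIndex` only echoes the index it receives, to show what `[]` passes to `__getitem__`.

```python
s = "bicycle"
every_third = s[::3]     # items at indexes 0, 3, 6
backwards = s[::-1]
no_overlap = s[:3] + s[3:] == s

# Named slices give meaning to otherwise opaque index pairs.
record = "2024-05-17 ERROR disk full"
LEVEL = slice(11, 16)
MESSAGE = slice(17, None)
level, message = record[LEVEL], record[MESSAGE]

# Assigning to or deleting a slice changes a mutable sequence in place.
numbers = list(range(10))
numbers[2:5] = [20, 30]
del numbers[5:7]


class ShowIndex:
    """Return whatever the [] operator passes to __getitem__."""

    def __getitem__(self, index):
        return index


# Commas inside [] produce a tuple; ... is the Ellipsis object.
index = ShowIndex()[1:3, ...]
print(every_third, backwards, level, message, numbers, index)
```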
- Concatenation (`+`) with sequences: Combines two sequences of the same type into a new sequence.
  - Example: `[1, 2] + [3, 4]` → `[1, 2, 3, 4]`.
- Repetition (`*`) with sequences: Repeats a sequence a specified number of times, producing a new sequence.
  - Example: `['a'] * 3` → `['a', 'a', 'a']`.
- Caution with mutable items in sequences: Using `*` with sequences containing mutable items can result in multiple references to the same object, leading to unintended modifications.
  - Example: `[['x']] * 3` creates three references to the same list.
- Correct approach for list of lists: Use list comprehensions to create independent sublists.
  - Example: `[['x' for _ in range(3)] for _ in range(3)]`.
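The difference between repeated references and independent sublists shows up as soon as one row is modified. A minimal sketch, with illustrative names:

```python
# Outer * repeats a reference to the SAME inner list three times.
aliased = [["_"] * 3] * 3
aliased[1][2] = "O"          # appears to change every row

# The list comprehension builds a new inner list on each iteration.
board = [["_"] * 3 for _ in range(3)]
board[1][2] = "X"            # changes only row 1

rows_are_same_object = aliased[0] is aliased[1]
print(aliased)
print(board)
```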
- Augmented assignment (`+=` and `*=`):
  - Mutable sequences (lists): Modifies the sequence in place if `__iadd__` or `__imul__` is implemented.
  - Immutable sequences (tuples): Creates a new sequence, as in-place modification is not possible.
- Unexpected behavior with `+=`: Applying `+=` to a mutable item inside a tuple raises `TypeError`, yet the item is still modified, because the in-place operation succeeds before the assignment back into the tuple fails.
  - Example: `t = (1, 2, [30, 40]); t[2] += [50, 60]` modifies the list but raises `TypeError`.
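Both behaviors can be observed directly. In this sketch, an alias to the original object is kept so that `is` can show whether the augmented assignment rebound the name; the variable names are illustrative.

```python
# A list is changed in place: the name still refers to the same object.
numbers = [1, 2, 3]
alias = numbers
numbers *= 2
list_in_place = numbers is alias

# A tuple cannot change, so *= builds a new tuple and rebinds the name.
point = (1, 2, 3)
old_point = point
point *= 2
tuple_in_place = point is old_point

# += on a list inside a tuple: the list is extended, then item assignment fails.
t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
    error = None
except TypeError as exc:
    error = str(exc)

print(list_in_place, tuple_in_place, t, error)
```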
- `list.sort` method:
  - Sorts the list in place, modifying the original list.
  - Returns `None` as it operates on the existing list.
  - Only works with lists (mutable sequences).
- `sorted` function:
  - Returns a new sorted list from any iterable, leaving the original sequence unchanged.
  - Can be used with lists, tuples, strings, or generators.
- Arguments for both `list.sort` and `sorted`:
  - `reverse` argument: Reverses the sort order when set to `True`.
  - `key` argument: A function used to extract a comparison key from each list element (e.g., `key=str.lower` for case-insensitive sorting).
- Stability of sorting: Python's sort is stable, maintaining the relative order of items that compare equal (important when sorting by multiple criteria).
- Performance considerations:
  - `list.sort`: More memory-efficient since it does not create a new list.
  - `sorted`: Requires more memory as it creates a new list but leaves the original list unchanged.
- Efficient searching in sorted sequences: Once sorted, sequences can be searched efficiently using binary search (via the `bisect` module).
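A sketch contrasting `sorted` with `list.sort`, followed by a binary search with `bisect`. The fruit names are sample data chosen for illustration.

```python
import bisect

fruits = ["grape", "raspberry", "apple", "banana"]

# sorted() returns a new list; the original order is untouched.
by_length = sorted(fruits, key=len)
by_length_desc = sorted(fruits, key=len, reverse=True)
original_unchanged = fruits[0] == "grape"

# Stability: "grape" and "apple" have equal keys, so they keep their original order.

# list.sort() sorts in place and returns None.
result = fruits.sort()

# On a sorted list, bisect finds positions by binary search.
position = bisect.bisect_left(fruits, "grape")
bisect.insort(fruits, "cherry")   # insert while keeping the list sorted
print(by_length, by_length_desc, result, position, fruits)
```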
- Use of arrays (`array.array`) for numerical data:
  - Arrays store numerical data more compactly, improving memory efficiency.
  - They support binary I/O for faster loading and saving.
  - Sorting requires using `sorted` since arrays lack an in-place `sort` method.
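A minimal sketch of the array features above: compact storage, binary I/O with `tofile` and `fromfile`, and sorting by rebuilding the array from `sorted`. The file name and the use of a temporary directory are assumptions for illustration.

```python
import os
import tempfile
from array import array

# 1,000 doubles stored as packed machine values, built from a generator expression.
floats = array("d", (i / 10 for i in range(1000)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "floats.bin")   # hypothetical file name
    with open(path, "wb") as fp:
        floats.tofile(fp)                    # raw binary dump
    loaded = array("d")
    with open(path, "rb") as fp:
        loaded.fromfile(fp, len(floats))     # read back the same number of items

# array has no sort method, so rebuild the array from the sorted items.
descending = array(floats.typecode, sorted(floats, reverse=True))
print(loaded == floats, descending[0], descending[-1])
```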
- `collections.deque`: Efficient for frequent appends and pops from both ends (O(1) time complexity).
  - Example: `dq = deque([1, 2, 3]); dq.append(4); dq.popleft()`.
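Extending the example above, this sketch also shows a bounded deque, which discards items from the opposite end once `maxlen` is reached. The variable names are illustrative.

```python
from collections import deque

dq = deque([1, 2, 3])
dq.append(4)              # add on the right
leftmost = dq.popleft()   # remove from the left
dq.appendleft(0)          # add on the left
rightmost = dq.pop()      # remove from the right

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

print(leftmost, rightmost, dq, recent)
```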
- Memory views (`memoryview`): Allows working with slices of data without copying, useful for large datasets.
  - Example: `mview = memoryview(array('B', range(6)))`.
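Building on the example above, this sketch shows that a slice of a `memoryview` shares memory with the underlying array, and that `cast` can present the same bytes with a different shape. The names `window` and `grid` are illustrative.

```python
from array import array

octets = array("B", range(6))
mview = memoryview(octets)

# Slicing a memoryview creates another view, not a copy.
window = mview[2:5]
window[0] = 200           # writes through to octets[2]

# cast() reinterprets the same bytes, here as 2 rows of 3 columns.
grid = mview.cast("B", [2, 3])
rows = grid.tolist()

print(octets, rows)
```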
- NumPy arrays: More memory-efficient and faster for numerical operations than Python lists, supporting high-level operations like reshaping and element-wise math.
  - Example: `a = np.arange(12).reshape(3, 4)`.
- Deques vs lists: Deques outperform lists for queue-like operations where adding/removing items from both ends is frequent, while lists excel with random access and operations on the middle of the sequence.
- Queue implementations:
  - `queue.Queue`: Thread-safe FIFO.
  - `asyncio.Queue`: Asynchronous FIFO for event loops.
  - `heapq`: Functions for using a list as a priority queue (heap).
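A brief sketch of two of the queue options above, run in a single thread for simplicity. The task names and priorities are sample data.

```python
import heapq
import queue

# queue.Queue is first in, first out; it is also safe to share between threads.
fifo = queue.Queue()
for item in ("first", "second", "third"):
    fifo.put(item)
fifo_order = [fifo.get() for _ in range(3)]

# heapq treats a plain list as a heap: the smallest item is always popped first.
tasks = []
heapq.heappush(tasks, (2, "write"))
heapq.heappush(tasks, (1, "plan"))
heapq.heappush(tasks, (3, "review"))
priority_order = [heapq.heappop(tasks) for _ in range(3)]

print(fifo_order, priority_order)
```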
- Memory model:
  - Container sequences: Use references to manage complex data.
  - Flat sequences: Store data directly, optimizing memory for simple types.
- Container sequences: Store references to objects, allowing them to hold heterogeneous and nested data types (e.g., `list`, `tuple`, `deque`).
  - Example: `nested_list = [1, 'a', [2, 3], ('b', 'c')]`.
- Flat sequences: Store simple, atomic data types in contiguous memory, making them more memory-efficient but unable to hold complex structures (e.g., `str`, `bytes`, `array.array`).
  - Example: `numbers = array.array('i', [1, 2, 3, 4])`.
- Use cases:
  - Container sequences: Best for storing mixed or nested data.
  - Flat sequences: Ideal for large collections of primitive types where memory efficiency matters.
- Python `Container` abstract class: Defines objects supporting the `in` operator (`__contains__`), including `str` and `array.array`.
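The two examples above can be combined into a small check. This sketch also shows that "container sequence" and the `Container` ABC are separate ideas: flat sequences such as `str` and `array.array` support `in` as well.

```python
from array import array
from collections import abc

nested_list = [1, "a", [2, 3], ("b", "c")]   # mixed and nested items
numbers = array("i", [1, 2, 3, 4])           # integers only

# A flat sequence rejects items of the wrong type.
try:
    numbers.append("five")
    rejected = False
except TypeError:
    rejected = True

# Flat sequences still satisfy the Container ABC because they implement __contains__.
str_is_container = isinstance("abc", abc.Container)
array_is_container = isinstance(numbers, abc.Container)
found = 3 in numbers

print(rejected, str_is_container, array_is_container, found)
```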
Ch 20. Concurrent executors
- Michele Simionato: The simple pattern of spawning threads and collecting results in a queue is sufficient for 99% of application programming cases.
- `concurrent.futures.Executor`: Simplifies concurrent execution with threads and processes.
- `ThreadPoolExecutor` simplifies concurrency, choosing a default number of worker threads based on the number of CPU cores.
- `Future` objects track the status of asynchronous operations.
- `executor.submit` schedules tasks, returning futures for tracking results.
- `futures.as_completed` allows reacting to tasks as they finish, offering more control than `executor.map`.
- `ProcessPoolExecutor` in `concurrent.futures` simplifies parallel execution of CPU-bound tasks across multiple processes.
  - Benefits:
    - Hides process management, inter-process communication, and task distribution complexities.
    - Best suited for CPU-bound tasks, while threads are better for I/O-bound tasks.
  - Advantages:
    - Simplified code with no manual handling of `multiprocessing`.
    - Efficient use of CPU cores for parallel processing.
    - Results returned predictably in the order of task submission.
  - Key Takeaways:
    - `ProcessPoolExecutor` offers clean and scalable parallel execution for CPU-intensive tasks.
    - While the task completion order is hidden, the simplicity and efficiency of the approach are major benefits.
- `executor.map` submits tasks without blocking but retrieves results in submission order.
- `executor.submit` + `futures.as_completed` provides more flexibility by yielding results as they complete.
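The difference between `executor.map` and `submit` with `as_completed` can be sketched with a thread pool. The function `slow_square` and its artificial delays are assumptions for illustration; a `ProcessPoolExecutor` could be substituted for CPU-bound work, provided the function is importable by the worker processes.

```python
import time
from concurrent import futures


def slow_square(n):
    """Hypothetical task: larger inputs finish sooner, to vary completion order."""
    time.sleep(0.05 * (5 - n))
    return n * n


numbers = [1, 2, 3, 4]

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map: results come back in submission order, whatever the completion order.
    in_order = list(executor.map(slow_square, numbers))

    # submit + as_completed: each future is yielded as soon as it finishes.
    to_do = {executor.submit(slow_square, n): n for n in numbers}
    finished = [(to_do[f], f.result()) for f in futures.as_completed(to_do)]

print(in_order)
print(finished)   # order depends on which task completed first
```

Mapping each future back to its input, as the `to_do` dictionary does here, is one way to recover which task a result belongs to when results arrive out of order.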