The Elegant Dict-Object Hybrid: A Pythonic Design Pattern for Flexible Data Containers


Introduction

In Python development, we frequently work with structured data: configuration settings, API responses, form submissions, event logs, and more. Most developers reach for one of two common approaches:

# Dictionary approach
user_data = {"name": "Alice", "email": "alice@example.com", "active": True}
print(user_data["name"])  # Dictionary access

# Class-based approach
class UserData:
    def __init__(self, name, email, active):
        self.name = name
        self.email = email
        self.active = active

user = UserData("Alice", "alice@example.com", True)
print(user.name)  # Attribute access

But what if we could get the best of both worlds? What if we could create a flexible data container that combines the dynamic nature of dictionaries with the elegant attribute access of objects, while adding powerful transformation capabilities?

This blog post explores a surprisingly simple yet powerful design pattern that can transform how you handle data in your Python applications.

The Problem Space

Let's consider a common scenario: handling API responses.

Imagine you're building an application that interacts with various REST APIs. Each API returns JSON data that needs to be parsed, validated, transformed, and passed through different parts of your application.

Here are the pain points:

  1. Access syntax: Dictionary access (data["key"]) is more verbose and error-prone than attribute access (data.key)

  2. Data transformation: You often need to create modified versions of the data without mutating the original

  3. Contextual metadata: Sometimes you need to track metadata about the fields without cluttering the data itself

  4. Consistency: Different parts of your codebase might expect different formats

  5. Selective access: You frequently need to extract subsets of the data based on field types or categories

Let's see how our design pattern addresses these challenges.

Building a Flexible Data Container

Let's create a flexible DataContainer class that solves these problems:

class DataContainer:
    def __init__(self, base=None, **kwargs):
        # Internal storage
        self._store = {}
        self._metadata = {}

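        # Initialize from another DataContainer (or any subclass of it).
        # Checking against DataContainer rather than type(self) lets a
        # subclass be built from a plain DataContainer; the explicit
        # isinstance checks also make the truthiness test unnecessary.
        if isinstance(base, DataContainer):
            self._store = base._store.copy()
            self._metadata = base._metadata.copy()
        # Initialize from a dict (shallow copy: nested values are shared)
        elif isinstance(base, dict):
            self._store = base.copy()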

        # Update with provided kwargs
        self._store.update(kwargs)

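    def __getattr__(self, key):
        # Only called when normal attribute lookup fails
        if key.startswith("__") and key.endswith("__"):
            raise AttributeError(key)
        # Read _store through __dict__: if a lookup happens before __init__
        # has set _store (for example a hasattr() check in a subclass),
        # touching self._store here would recurse without end
        store = self.__dict__.get("_store", {})
        if key in store:
            return store[key]
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{key}'")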

    def __setattr__(self, key, value):
        if key.startswith("_"):
            super().__setattr__(key, value)
        else:
            self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __contains__(self, key):
        return key in self._store

    def __repr__(self):
        return f"{type(self).__name__}({self._store})"

    def keys(self):
        return self._store.keys()

    def values(self):
        return self._store.values()

    def items(self):
        return self._store.items()

    def get(self, key, default=None):
        return self._store.get(key, default)

    def copy(self, **kwargs):
        result = type(self)(base=self)
        result._store.update(kwargs)
        return result

    def without(self, *keys):
        result = self.copy()
        for key in keys:
            if key in result._store:
                del result._store[key]
        return result

    def with_metadata(self, **metadata):
        result = self.copy()
        for key, value in metadata.items():
            result._metadata[key] = value
        return result

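    def filter(self, predicate):
        filtered = {k: v for k, v in self._store.items() if predicate(k, v)}
        result = type(self)(filtered)
        # Carry metadata over, as copy() and without() do
        result._metadata = self._metadata.copy()
        return result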

This class gives us:

  1. Both dictionary-style and attribute-style access

  2. Non-mutating operations that return new (shallow) copies, such as copy() and without()

  3. Metadata tracking via the _metadata dictionary

  4. Filtering capabilities
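
To make the first two points concrete, here is a minimal sketch. It uses a trimmed-down stand-in called MiniContainer (an illustrative name, not part of the class above) so that it runs on its own, and the field names are made up. It shows that both access styles agree, that copy() and without() leave the original untouched, and that the copies are shallow.

```python
class MiniContainer:
    """Trimmed-down stand-in for DataContainer (illustrative name)."""

    def __init__(self, base=None, **kwargs):
        self._store = dict(base or {})
        self._store.update(kwargs)

    def __getattr__(self, key):
        # Called only when normal lookup fails; read _store via __dict__
        store = self.__dict__.get("_store", {})
        if key in store:
            return store[key]
        raise AttributeError(key)

    def __getitem__(self, key):
        return self._store[key]

    def __contains__(self, key):
        return key in self._store

    def copy(self, **kwargs):
        # New object with a shallow copy of the stored fields
        return type(self)(self._store, **kwargs)

    def without(self, *keys):
        kept = {k: v for k, v in self._store.items() if k not in keys}
        return type(self)(kept)


user = MiniContainer(name="Alice", tags=["admin"])
trimmed = user.copy(active=True).without("name")

# Both access styles return the same value
same_value = user.name == user["name"]
# The original still has "name" and never gained "active"
original_intact = "name" in user and "active" not in user
# Shallow copy: the list object is shared between original and copy
shared_list = trimmed.tags is user.tags
```

Because the copies are shallow, appending to trimmed.tags would also change user.tags. If that matters for your data, copy.deepcopy is one option, at the cost of extra work on every transformation.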

Real-World Example: API Response Handling

Let's see how this improves a typical API workflow:

import requests
from datacontainer import DataContainer

def get_user(user_id):
    response = requests.get(f"https://api.example.com/users/{user_id}")
    # Fail early on 4xx/5xx instead of wrapping an error payload
    response.raise_for_status()
    data = response.json()

    # Convert the plain dict to our DataContainer
    user = DataContainer(data)

    # Add some metadata about this request
    user = user.with_metadata(
        source="api.example.com",
        timestamp=response.headers.get("Date"),
        request_id=response.headers.get("X-Request-ID")
    )

    return user

# Usage
user = get_user(123)

# Attribute-style access reads more cleanly (fields are dynamic, so
# editors cannot autocomplete them)
print(f"Hello, {user.name}!")

# We can make a modified copy without affecting the original
# (format_date is assumed to be a helper defined elsewhere in your code)
display_user = user.copy(
    full_name=f"{user.first_name} {user.last_name}",
    display_date=format_date(user.created_at)
).without("password_hash", "security_question")

# We can filter fields based on type or other criteria
contact_info = user.filter(
    lambda key, value: key in ("email", "phone", "address")
)
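
The usage above calls format_date, which the post never defines. One possible version is sketched below, under the assumption that created_at arrives as an ISO 8601 string; both the helper and the sample timestamp are hypothetical.

```python
from datetime import datetime


def format_date(value):
    """Hypothetical helper: turn an ISO 8601 timestamp into YYYY-MM-DD."""
    return datetime.fromisoformat(value).strftime("%Y-%m-%d")


formatted = format_date("2024-01-15T10:30:00+00:00")
```

If your API returns epoch seconds or another format, the parsing line is the only part that needs to change.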

Comparison with Pydantic and dataclasses

A common question might be: "Why not just use Pydantic or dataclasses?" Let's compare these approaches to understand where our DataContainer fits in the ecosystem.

Dataclasses

from dataclasses import dataclass, field, asdict

@dataclass
class UserData:
    name: str
    email: str
    active: bool = True
    metadata: dict = field(default_factory=dict)

    def to_dict(self):
        return asdict(self)

Pros of dataclasses:

  • Type hints and IDE support

  • Auto-generated methods

  • Clear structure defined upfront

Cons of dataclasses:

  • Fields need to be predefined

  • Adding dynamic fields requires extra work

  • Transformations often require creating new classes
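
To be fair to dataclasses, the standard library does offer dataclasses.replace for copy-with-changes. The sketch below (field values are made up) shows that it handles declared fields well, while a field that was never declared is rejected, which is the limitation the list above is pointing at.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserData:
    name: str
    email: str
    active: bool = True


user = UserData("Alice", "alice@example.com")
# replace() builds a new instance; the original is untouched
inactive = replace(user, active=False)

try:
    replace(user, nickname="Al")  # "nickname" is not a declared field
    dynamic_field_ok = True
except TypeError:
    dynamic_field_ok = False
```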

Pydantic

from pydantic import BaseModel, Field
from typing import Dict, Any

class UserData(BaseModel):
    name: str
    email: str
    active: bool = True
    metadata: Dict[str, Any] = Field(default_factory=dict)

Pros of Pydantic:

  • Powerful validation

  • Schema generation

  • Serialization capabilities

  • Type safety

Cons of Pydantic:

  • Less flexibility for dynamic fields

  • More verbose for simple use cases

  • Performance overhead for validation

  • Transformations can be cumbersome

Our DataContainer

user = DataContainer(name="Alice", email="alice@example.com")
user.active = True
user = user.with_metadata(source="signup_form")

Pros of DataContainer:

  • Maximum flexibility for both defined and dynamic fields

  • Clean, fluent API for transformations

  • Lightweight with minimal dependencies

  • Both dict-like and object-like interfaces

  • Excellent for evolving or unpredictable data structures

Cons of DataContainer:

  • No built-in validation

  • No type hinting for specific fields

  • Less formal structure can lead to inconsistency

When to Use Each Approach

Use dataclasses when:

  • You have a well-defined, stable data structure

  • You value type hints and IDE support

  • You don't need many dynamic transformations

Use Pydantic when:

  • Data validation is critical

  • You're working with external APIs and need schema validation

  • You're building larger applications where type safety matters

  • You need automatic documentation (via schema generation)

Use our DataContainer when:

  • You need maximum flexibility for evolving data structures

  • You're working with unpredictable data sources

  • You want clean, chainable transformations

  • You value simplicity and readability over strict validation

  • You're building prototypes or smaller applications

Enhanced Configuration Management

One area where our DataContainer pattern truly shines is in configuration management. Let's expand on this use case with a more robust implementation:

import json
import os
from pathlib import Path

import yaml  # provided by the PyYAML package

# Same module name as in the API example above
from datacontainer import DataContainer

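class Config(DataContainer):
    def __init__(self, base=None, **kwargs):
        super().__init__(base, **kwargs)
        self._frozen = False
        # Track where config values came from. Carry the history over when
        # copying from another Config, otherwise copy(), merge() and
        # freeze() would silently drop it.
        self._sources = list(base._sources) if isinstance(base, Config) else []

    def __setattr__(self, key, value):
        # Test the key first: private attributes are always writable, and
        # _frozen is read via __dict__ so nothing is looked up through
        # __getattr__ before __init__ has finished
        if not key.startswith("_") and self.__dict__.get("_frozen", False):
            raise AttributeError(f"Cannot modify frozen config key '{key}'")
        super().__setattr__(key, value)

    def __setitem__(self, key, value):
        # Item assignment has to honor the freeze as well
        if self.__dict__.get("_frozen", False):
            raise TypeError(f"Cannot modify frozen config key '{key}'")
        super().__setitem__(key, value)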

    def freeze(self):
        """Return a frozen copy; the original stays mutable"""
        result = self.copy()
        result._frozen = True
        return result

    def with_prefix(self, prefix, strip_prefix=True):
        """Extract all keys with a specific prefix"""
        result = type(self)()
        prefix_len = len(prefix)

        for key, value in self.items():
            if key.startswith(prefix):
                new_key = key[prefix_len:] if strip_prefix else key
                result[new_key] = value

        return result

    def with_source(self, source, **kwargs):
        """Add config values with tracking of their source"""
        result = self.copy(**kwargs)
        for key, value in kwargs.items():
            if not hasattr(result, "_sources"):
                result._sources = []
            result._sources.append((key, source))
        return result

    def get_source(self, key):
        """Get the source of a config value"""
        for k, source in getattr(self, "_sources", []):
            if k == key:
                return source
        return None

    def merge(self, other_config, overwrite=True):
        """Merge with another config, with option to preserve existing values"""
        result = self.copy()
        for key, value in other_config.items():
            if overwrite or key not in result:
                result[key] = value
                # Preserve source information if available
                if hasattr(other_config, "get_source"):
                    source = other_config.get_source(key)
                    if source and hasattr(result, "_sources"):
                        result._sources.append((key, source))
        return result

    def with_env_override(self, prefix="APP_"):
        """Override config values from environment variables"""
        result = self.copy()
        for env_key, env_value in os.environ.items():
            if env_key.startswith(prefix):
                config_key = env_key[len(prefix):].lower()
                # Convert environment variable to appropriate type
                if config_key in result and isinstance(result[config_key], bool):
                    typed_value = env_value.lower() in ("true", "yes", "1")
                elif config_key in result and isinstance(result[config_key], int):
                    typed_value = int(env_value)
                elif config_key in result and isinstance(result[config_key], float):
                    typed_value = float(env_value)
                else:
                    typed_value = env_value

                result = result.with_source(f"env:{env_key}", **{config_key: typed_value})
        return result

    @classmethod
    def from_file(cls, filepath):
        """Load from a config file based on extension"""
        path = Path(filepath)
        if not path.exists():
            raise FileNotFoundError(f"Config file not found: {filepath}")

        with open(path, "r") as f:
            if path.suffix.lower() in (".yml", ".yaml"):
                # safe_load returns None for an empty file
                data = yaml.safe_load(f) or {}
            elif path.suffix.lower() == ".json":
                data = json.load(f)
            else:
                raise ValueError(f"Unsupported config file type: {path.suffix}")

        # Record the file as the source of every key it supplied;
        # with_source() only tags keys passed as keyword arguments
        return cls.from_dict(data, source=f"file:{filepath}")

    @classmethod
    def from_dict(cls, data, source="dict"):
        """Create from a dictionary with source tracking"""
        config = cls(data)
        for key in data:
            if not hasattr(config, "_sources"):
                config._sources = []
            config._sources.append((key, source))
        return config

    def to_dict(self):
        """Convert config to a plain dictionary"""
        return dict(self.items())

    def to_file(self, filepath):
        """Save config to a file based on extension"""
        path = Path(filepath)

        with open(path, "w") as f:
            if path.suffix.lower() in (".yml", ".yaml"):
                yaml.dump(self.to_dict(), f)
            elif path.suffix.lower() == ".json":
                json.dump(self.to_dict(), f, indent=2)
            else:
                raise ValueError(f"Unsupported config file type: {path.suffix}")
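
The type conversion inside with_env_override is easier to reason about when pulled out on its own. The sketch below mirrors that logic in a standalone function; coerce_env_value is a hypothetical name, and the sample values are made up.

```python
def coerce_env_value(raw, current=None):
    """Convert an environment string to the type of the existing value."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(current, bool):
        return raw.lower() in ("true", "yes", "1")
    if isinstance(current, int):
        return int(raw)
    if isinstance(current, float):
        return float(raw)
    # No existing value, or a string: keep the raw text
    return raw


coerced = [
    coerce_env_value("YES", current=False),
    coerce_env_value("8080", current=80),
    coerce_env_value("0.5", current=1.0),
    coerce_env_value("debug"),
]
```

Note that int() and float() raise ValueError on malformed input, so a typo in an environment variable surfaces when the config is loaded rather than later at the point of use.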

Building a Layered Configuration System

With our enhanced Config class, we can create a sophisticated configuration system that loads from multiple sources with clear precedence:

def load_application_config(app_name, env="development"):
    # Start with default configuration
    config = Config.from_file("config/defaults.yaml")

    # Add environment-specific configuration
    try:
        env_config = Config.from_file(f"config/{env}.yaml")
        config = config.merge(env_config)
    except FileNotFoundError:
        print(f"No configuration found for environment: {env}")

    # Add local overrides (not committed to version control)
    try:
        local_config = Config.from_file("config/local.yaml")
        config = config.merge(local_config)
    except FileNotFoundError:
        # Local config is optional
        pass

    # Override with environment variables
    config = config.with_env_override(prefix=f"{app_name.upper()}_")

    # Freeze configuration to prevent accidental modification
    return config.freeze()

# Usage
config = load_application_config("myapp", env="production")

# Extract subsystem configuration
database_config = config.with_prefix("database_")
logging_config = config.with_prefix("logging_")

# Check sources for debugging
print(f"Database URL source: {config.get_source('database_url')}")
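
The precedence order used above (environment variables over files over defaults) can also be illustrated without the Config class. The sketch below swaps in collections.ChainMap from the standard library purely to show the lookup order; it is not how Config.merge works, and every key and value in it is made up.

```python
from collections import ChainMap

defaults = {"database_url": "sqlite:///dev.db", "debug": False}
env_file = {"database_url": "postgresql://db.internal/app"}
env_vars = {"debug": True}

# ChainMap searches left to right, so the highest-priority layer goes first
layers = ChainMap(env_vars, env_file, defaults)
resolved = dict(layers)


def source_of(key):
    """Return the name of the first layer that defines key, or None."""
    names = ("env", "file", "defaults")
    for name, layer in zip(names, layers.maps):
        if key in layer:
            return name
    return None
```

Here source_of plays the role that get_source plays on Config: it answers where a value came from, which is often the first question when a deployed setting looks wrong.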

Hierarchical Configuration

Our pattern also handles hierarchical configurations elegantly:

def nested_get(config, path, default=None):
    """Get a nested configuration value using dot notation"""
    keys = path.split(".")
    current = config

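    for key in keys:
        # Look up stored fields only, so a path segment such as "items"
        # cannot resolve to a container method; plain dicts work too
        if isinstance(current, (DataContainer, dict)) and key in current:
            current = current[key]
        else:
            return default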

    return current

# Create a hierarchical config
server_config = Config(
    host="localhost",
    port=8080,
    ssl=Config(
        enabled=True,
        cert="/path/to/cert.pem",
        key="/path/to/key.pem"
    ),
    cors=Config(
        enabled=True,
        origins=["https://example.com"]
    )
)

# Access nested values
ssl_enabled = nested_get(server_config, "ssl.enabled")
cors_origins = nested_get(server_config, "cors.origins")

# Or use direct attribute access for a cleaner API
ssl_enabled = server_config.ssl.enabled
cors_origins = server_config.cors.origins
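
The example above builds the hierarchy by hand. Data parsed from JSON arrives as nested plain dicts, so dotted access such as server_config.ssl.enabled only works if the inner dicts are wrapped too. A minimal sketch of such a recursive conversion follows; wrap is a hypothetical helper, and types.SimpleNamespace stands in for Config so the block runs on its own.

```python
import json
from types import SimpleNamespace


def wrap(value):
    """Recursively convert dicts (and dicts inside lists) for dot access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: wrap(v) for k, v in value.items()})
    if isinstance(value, list):
        return [wrap(item) for item in value]
    return value


raw = json.loads(
    '{"host": "localhost", "ssl": {"enabled": true, "cert": "/path/to/cert.pem"}}'
)
server = wrap(raw)
ssl_enabled = server.ssl.enabled
```

With the real classes, the dict branch would return Config(...) instead of SimpleNamespace(...).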

The Grand Reveal: Inspiration from DSPy

If you found this pattern useful, you might be interested to know that it was inspired by the Example class in DSPy, a powerful framework for programming with language models.

In DSPy, this pattern is used to represent examples in machine learning datasets, elegantly separating inputs from labels while maintaining a clean, unified interface.

Here's a simplified version of how DSPy uses this pattern:

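from dspy import Example

# `model` below stands for any callable that maps text to a predicted label

# Creating a dataset for sentiment analysis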
dataset = [
    Example(text="I loved this movie!", sentiment="positive").with_inputs("text"),
    Example(text="Terrible acting and plot.", sentiment="negative").with_inputs("text")
]

# Using the data in a model
for item in dataset:
    # Just get the inputs (the "text" field)
    inputs = item.inputs()

    # Make a prediction
    prediction = model(inputs.text)

    # Compare to the actual label (a per-example check, not an accuracy score)
    is_correct = (prediction == item.sentiment)
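
To connect this back to the container pattern, here is a rough sketch of how an input/label split could be layered on top of it. The Sample class is an illustrative stand-in written for this post; it is not DSPy's actual Example implementation, only one way the same idea might be expressed.

```python
class Sample:
    """Illustrative stand-in; not DSPy's actual Example implementation."""

    def __init__(self, _input_keys=(), **fields):
        self._store = dict(fields)
        self._input_keys = set(_input_keys)

    def __getattr__(self, key):
        # Called only when normal lookup fails; read _store via __dict__
        store = self.__dict__.get("_store", {})
        if key in store:
            return store[key]
        raise AttributeError(key)

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields are model inputs
        return type(self)(_input_keys=keys, **self._store)

    def inputs(self):
        picked = {k: v for k, v in self._store.items() if k in self._input_keys}
        return type(self)(_input_keys=self._input_keys, **picked)

    def labels(self):
        rest = {k: v for k, v in self._store.items() if k not in self._input_keys}
        return type(self)(**rest)

    def keys(self):
        return list(self._store)


item = Sample(text="I loved this movie!", sentiment="positive").with_inputs("text")
input_fields = item.inputs().keys()
label_fields = item.labels().keys()
```

The set of input keys plays the same role as _metadata in DataContainer: information about the fields that travels with the object without appearing among the fields themselves.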

Conclusion

The Dict-Object hybrid pattern demonstrates the elegance of Python's design flexibility. With just a few magic methods and thoughtful API design, we created a powerful data container that:

  1. Provides both dictionary and object interfaces

  2. Supports non-mutating transformations that return new copies

  3. Enables domain-specific extensions

  4. Makes code more readable and maintainable

While Pydantic and dataclasses excel at validation and type safety, our DataContainer pattern shines in scenarios requiring flexibility, dynamic transformations, and clean APIs. The configuration management example shows how a relatively simple design can solve complex real-world problems in an elegant way.

This pattern can be adapted to numerous domains beyond the examples shown here:

  • Event data in logging systems

  • ETL pipeline transformations

  • Command-line argument parsing

  • JSON/XML data processing

  • Application state management

The next time you find yourself juggling dictionaries and custom classes for structured data, consider implementing this flexible container pattern. It might just transform how you think about data handling in Python.