The Elegant Dict-Object Hybrid: A Pythonic Design Pattern for Flexible Data Containers

Introduction

In Python development, we frequently work with structured data: configuration settings, API responses, form submissions, event logs, and more. Most developers reach for one of two common approaches:

# Dictionary approach
user_data = {"name": "Alice", "email": "alice@example.com", "active": True}
print(user_data["name"])  # Dictionary access

# Class-based approach
class UserData:
    def __init__(self, name, email, active):
        self.name = name
        self.email = email
        self.active = active

user = UserData("Alice", "alice@example.com", True)
print(user.name)  # Attribute access

But what if we could get the best of both worlds? What if we could create a flexible data container that combines the dynamic nature of dictionaries with the elegant attribute access of objects, while adding powerful transformation capabilities?

This blog post explores a small design pattern for exactly that: a thin wrapper around a dict that exposes its keys as attributes and returns modified copies instead of mutating in place.

The Problem Space

Let's consider a common scenario: handling API responses.

Imagine you're building an application that interacts with various REST APIs. Each API returns JSON data that needs to be parsed, validated, transformed, and passed through different parts of your application.

Here are the pain points:

Access syntax: Dictionary access (data["key"]) is more verbose than attribute access (data.key), especially in chained lookups
Data transformation: You often need to create modified versions of the data without mutating the original
Contextual metadata: Sometimes you need to track metadata about the fields without cluttering the data itself
Consistency: Different parts of your codebase might expect different formats
Selective access: You frequently need to extract subsets of the data based on field types or categories

Let's see how our design pattern addresses these challenges.

Building a Flexible Data Container

Let's create a flexible DataContainer class that solves these problems:

class DataContainer:
    def __init__(self, base=None, **kwargs):
        # Internal storage; names starting with "_" bypass the field store
        # (see __setattr__ below)
        self._store = {}
        self._metadata = {}

        # Initialize from another DataContainer (or any subclass of it)
        if isinstance(base, DataContainer):
            # Shallow copies: nested mutable values are shared with `base`
            self._store = base._store.copy()
            self._metadata = base._metadata.copy()
        # Initialize from a dict
        elif isinstance(base, dict):
            self._store = base.copy()

        # Update with provided kwargs
        self._store.update(kwargs)

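    def __getattr__(self, key):
        # Only called when normal attribute lookup fails. Private and dunder
        # names are never fields, and refusing them here prevents infinite
        # recursion if _store is not set yet (e.g. during unpickling).
        if key.startswith("_"):
            raise AttributeError(key)
        if key in self._store:
            return self._store[key]
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{key}'")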

    def __setattr__(self, key, value):
        if key.startswith("_"):
            super().__setattr__(key, value)
        else:
            self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __contains__(self, key):
        return key in self._store

    def __repr__(self):
        return f"{type(self).__name__}({self._store})"

    def keys(self):
        return self._store.keys()

    def values(self):
        return self._store.values()

    def items(self):
        return self._store.items()

    def get(self, key, default=None):
        return self._store.get(key, default)

    def copy(self, **kwargs):
        result = type(self)(base=self)
        result._store.update(kwargs)
        return result

    def without(self, *keys):
        result = self.copy()
        for key in keys:
            if key in result._store:
                del result._store[key]
        return result

    def with_metadata(self, **metadata):
        result = self.copy()
        for key, value in metadata.items():
            result._metadata[key] = value
        return result

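    def filter(self, predicate):
        # Keep only the fields for which predicate(key, value) is true;
        # metadata carries over, as it does for copy() and without()
        result = self.copy()
        result._store = {k: v for k, v in self._store.items() if predicate(k, v)}
        return result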

This class gives us:

Both dictionary-style and attribute-style access
Non-mutating operations that return new (shallow) copies (like copy() and without())
Metadata tracking via the _metadata dictionary
Filtering capabilities
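One caveat about those copies: copy() builds on dict.copy(), which is shallow. The following minimal sketch uses plain dicts (the field names are made up for illustration) to show what that means for nested values, and how copy.deepcopy differs:

```python
import copy

original = {"name": "Alice", "tags": ["admin"]}

# dict.copy() is shallow: the new dict holds the same list object
shallow = original.copy()
shallow["tags"].append("staff")

# copy.deepcopy() duplicates nested values as well
deep = copy.deepcopy(original)
deep["tags"].append("guest")

print(original["tags"])  # ['admin', 'staff'] (changed through the shallow copy)
print(deep["tags"])      # ['admin', 'staff', 'guest']
```

If your containers hold lists or nested dicts that callers may modify, a deep copy inside copy() is probably the safer default, at some cost in speed.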

Real-World Example: API Response Handling

Let's see how this improves a typical API workflow:

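import requests
from datacontainer import DataContainer  # assumes the class above is saved as datacontainer.py

def get_user(user_id):
    response = requests.get(f"https://api.example.com/users/{user_id}", timeout=10)
    response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
    data = response.json()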

    # Convert the plain dict to our DataContainer
    user = DataContainer(data)

    # Add some metadata about this request
    user = user.with_metadata(
        source="api.example.com",
        timestamp=response.headers.get("Date"),
        request_id=response.headers.get("X-Request-ID")
    )

    return user

# Usage
user = get_user(123)

# Attribute-style access reads cleanly, but fields are resolved at runtime,
# so IDEs and type checkers cannot autocomplete or verify them
print(f"Hello, {user.name}!")

# We can make a modified copy without affecting the original
# (format_date is an application-specific helper, not shown here)
display_user = user.copy(
    full_name=f"{user.first_name} {user.last_name}",
    display_date=format_date(user.created_at)
).without("password_hash", "security_question")

# We can filter fields based on type or other criteria
contact_info = user.filter(
    lambda key, value: key in ("email", "phone", "address")
)
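Since the server controls the field names in an API response, one thing to watch is a field whose name matches a method on the container. __getattr__ only runs when normal attribute lookup fails, so the method wins. Here is a minimal sketch, using a hypothetical trimmed-down Record class as a stand-in for DataContainer:

```python
class Record:
    """Hypothetical, trimmed-down stand-in for DataContainer."""

    def __init__(self, **fields):
        # Write straight to the instance dict to create the backing store
        object.__setattr__(self, "_store", dict(fields))

    def __getattr__(self, key):
        # Only reached when normal attribute lookup has already failed
        try:
            return self.__dict__["_store"][key]
        except KeyError:
            raise AttributeError(key) from None

    def __getitem__(self, key):
        return self._store[key]

    def items(self):
        return self._store.items()


payload = Record(name="Alice", items=["book", "pen"])

print(payload.name)             # Alice
print(callable(payload.items))  # True: the method shadows the "items" field
print(payload["items"])         # ['book', 'pen']
```

Item access is therefore the safer choice for keys you do not control, with attribute access reserved for field names you know are free of collisions.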

Comparison with Pydantic and dataclasses

A common question might be: "Why not just use Pydantic or dataclasses?" Let's compare these approaches to understand where our DataContainer fits in the ecosystem.

Dataclasses

from dataclasses import dataclass, field, asdict

@dataclass
class UserData:
    name: str
    email: str
    active: bool = True
    metadata: dict = field(default_factory=dict)

    def to_dict(self):
        return asdict(self)

Pros of dataclasses:

Type hints and IDE support
Auto-generated methods
Clear structure defined upfront

Cons of dataclasses:

Fields need to be predefined
Adding dynamic fields requires extra work
Transformations often require creating new classes
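To make the last point concrete, here is a small sketch of what transformations look like with the standard library alone. dataclasses.replace covers changing values; dropping a field has no dataclass equivalent, so the sketch falls back to a plain dict for that:

```python
from dataclasses import dataclass, replace, asdict

@dataclass(frozen=True)
class UserData:
    name: str
    email: str
    active: bool = True

user = UserData("Alice", "alice@example.com")

# Value changes: replace() returns a new instance, the original is untouched
deactivated = replace(user, active=False)

# Structural changes (removing a field) need a dict or a second class
public_view = {k: v for k, v in asdict(user).items() if k != "email"}

print(deactivated)
print(public_view)
```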

Pydantic

from pydantic import BaseModel, Field
from typing import Dict, Any

class UserData(BaseModel):
    name: str
    email: str
    active: bool = True
    metadata: Dict[str, Any] = Field(default_factory=dict)

Pros of Pydantic:

Powerful validation
Schema generation
Serialization capabilities
Type safety

Cons of Pydantic:

Less flexibility for dynamic fields
More verbose for simple use cases
Performance overhead for validation
Transformations can be cumbersome

Our DataContainer

user = DataContainer(name="Alice", email="alice@example.com")
user.active = True
user = user.with_metadata(source="signup_form")

Pros of DataContainer:

Maximum flexibility for both defined and dynamic fields
Clean, fluent API for transformations
Lightweight, with no third-party dependencies
Both dict-like and object-like interfaces
Excellent for evolving or unpredictable data structures

Cons of DataContainer:

No built-in validation
No type hinting for specific fields
Less formal structure can lead to inconsistency

When to Use Each Approach

Use dataclasses when:

You have a well-defined, stable data structure
You value type hints and IDE support
You don't need many dynamic transformations

Use Pydantic when:

Data validation is critical
You're working with external APIs and need schema validation
You're building larger applications where type safety matters
You need automatic documentation (via schema generation)

Use our DataContainer when:

You need maximum flexibility for evolving data structures
You're working with unpredictable data sources
You want clean, chainable transformations
You value simplicity and readability over strict validation
You're building prototypes or smaller applications

Enhanced Configuration Management

One area where our DataContainer pattern truly shines is in configuration management. Let's expand on this use case with a more robust implementation:

import os
import json
from pathlib import Path

import yaml  # third-party: PyYAML

class Config(DataContainer):
    def __init__(self, base=None, **kwargs):
        super().__init__(base, **kwargs)
        self._frozen = False
        # Track where config values came from, as (key, source) pairs.
        # Carried over from `base` so that copy(), merge() and freeze()
        # keep earlier records instead of starting from an empty list.
        self._sources = list(getattr(base, "_sources", []))

    def __setattr__(self, key, value):
        if not key.startswith("_") and getattr(self, "_frozen", False):
            raise AttributeError(f"Cannot modify frozen config key '{key}'")
        super().__setattr__(key, value)

    def __setitem__(self, key, value):
        # Item assignment has to respect freezing as well
        if getattr(self, "_frozen", False):
            raise TypeError(f"Cannot modify frozen config key '{key}'")
        super().__setitem__(key, value)

    def freeze(self):
        """Return a copy that rejects attribute and item assignment"""
        result = self.copy()
        result._frozen = True
        return result

    def with_prefix(self, prefix, strip_prefix=True):
        """Extract all keys with a specific prefix"""
        result = type(self)()
        prefix_len = len(prefix)

        for key, value in self.items():
            if key.startswith(prefix):
                new_key = key[prefix_len:] if strip_prefix else key
                result[new_key] = value

        return result

    def with_source(self, source, **kwargs):
        """Add config values with tracking of their source"""
        result = self.copy(**kwargs)
        for key in kwargs:
            result._sources.append((key, source))
        return result

    def get_source(self, key):
        """Get the most recent source of a config value"""
        # Later entries override earlier ones, so search newest first
        for k, source in reversed(getattr(self, "_sources", [])):
            if k == key:
                return source
        return None

    def merge(self, other_config, overwrite=True):
        """Merge with another config, with option to preserve existing values"""
        result = self.copy()
        for key, value in other_config.items():
            if overwrite or key not in result:
                result[key] = value
                # Preserve source information if available
                if hasattr(other_config, "get_source"):
                    source = other_config.get_source(key)
                    if source and hasattr(result, "_sources"):
                        result._sources.append((key, source))
        return result

    def with_env_override(self, prefix="APP_"):
        """Override config values from environment variables"""
        result = self.copy()
        for env_key, env_value in os.environ.items():
            if env_key.startswith(prefix):
                config_key = env_key[len(prefix):].lower()
                # Convert environment variable to appropriate type
                if config_key in result and isinstance(result[config_key], bool):
                    typed_value = env_value.lower() in ("true", "yes", "1")
                elif config_key in result and isinstance(result[config_key], int):
                    typed_value = int(env_value)
                elif config_key in result and isinstance(result[config_key], float):
                    typed_value = float(env_value)
                else:
                    typed_value = env_value

                result = result.with_source(f"env:{env_key}", **{config_key: typed_value})
        return result

    @classmethod
    def from_file(cls, filepath):
        """Load from a config file based on extension"""
        path = Path(filepath)
        if not path.exists():
            raise FileNotFoundError(f"Config file not found: {filepath}")

        with open(path, "r") as f:
            if path.suffix.lower() in (".yml", ".yaml"):
                data = yaml.safe_load(f)
            elif path.suffix.lower() == ".json":
                data = json.load(f)
            else:
                raise ValueError(f"Unsupported config file type: {path.suffix}")

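        # Record the file as the source of every key it defines
        # (an empty YAML file parses to None, hence the fallback)
        return cls.from_dict(data or {}, source=f"file:{filepath}")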

    @classmethod
    def from_dict(cls, data, source="dict"):
        """Create from a dictionary with source tracking"""
        config = cls(data)
        for key in data:
            if not hasattr(config, "_sources"):
                config._sources = []
            config._sources.append((key, source))
        return config

    def to_dict(self):
        """Convert config to a plain dictionary"""
        return dict(self.items())

    def to_file(self, filepath):
        """Save config to a file based on extension"""
        path = Path(filepath)
        suffix = path.suffix.lower()
        # Validate before opening so an unsupported type never truncates a file
        if suffix not in (".yml", ".yaml", ".json"):
            raise ValueError(f"Unsupported config file type: {path.suffix}")

        with open(path, "w") as f:
            if suffix == ".json":
                json.dump(self.to_dict(), f, indent=2)
            else:
                yaml.safe_dump(self.to_dict(), f)
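The type conversion inside with_env_override is easier to follow in isolation. The sketch below pulls it out into a hypothetical helper, coerce_like, and runs it against a made-up environment dict rather than os.environ so the result is reproducible:

```python
def coerce_like(raw, current):
    """Convert an environment string to the type of the existing value."""
    # bool must be tested before int: bool is a subclass of int in Python
    if isinstance(current, bool):
        return raw.lower() in ("true", "yes", "1")
    if isinstance(current, int):
        return int(raw)
    if isinstance(current, float):
        return float(raw)
    # Unknown key or string value: keep the raw string
    return raw


defaults = {"debug": False, "port": 8080, "timeout": 2.5, "host": "localhost"}
environ = {
    "APP_DEBUG": "yes",
    "APP_PORT": "9090",
    "APP_TIMEOUT": "10",
    "APP_HOST": "0.0.0.0",
}

overridden = dict(defaults)
for env_key, raw in environ.items():
    key = env_key[len("APP_"):].lower()
    overridden[key] = coerce_like(raw, defaults.get(key))

print(overridden)
# {'debug': True, 'port': 9090, 'timeout': 10.0, 'host': '0.0.0.0'}
```

Note that int() and float() raise ValueError on malformed input, so a production version would likely want to catch that and report which variable was at fault.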

Building a Layered Configuration System

With our enhanced Config class, we can create a sophisticated configuration system that loads from multiple sources with clear precedence:

def load_application_config(app_name, env="development"):
    # Start with default configuration
    config = Config.from_file("config/defaults.yaml")

    # Add environment-specific configuration
    try:
        env_config = Config.from_file(f"config/{env}.yaml")
        config = config.merge(env_config)
    except FileNotFoundError:
        print(f"No configuration found for environment: {env}")

    # Add local overrides (not committed to version control)
    try:
        local_config = Config.from_file("config/local.yaml")
        config = config.merge(local_config)
    except FileNotFoundError:
        # Local config is optional
        pass

    # Override with environment variables
    config = config.with_env_override(prefix=f"{app_name.upper()}_")

    # Freeze configuration to prevent accidental modification
    return config.freeze()

# Usage
config = load_application_config("myapp", env="production")

# Extract subsystem configuration
database_config = config.with_prefix("database_")
logging_config = config.with_prefix("logging_")

# Check sources for debugging
print(f"Database URL source: {config.get_source('database_url')}")
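The precedence rule behind this loader is simply "later layers win, and remember who won". Stripped of file handling, it can be sketched with plain dicts. The merge_layers helper and the sample values here are hypothetical, chosen only to illustrate the idea:

```python
def merge_layers(layers):
    """Merge (source, mapping) pairs in order; later layers win."""
    values, sources = {}, {}
    for source, mapping in layers:
        for key, value in mapping.items():
            values[key] = value
            sources[key] = source  # remember which layer set the final value
    return values, sources


layers = [
    ("file:config/defaults.yaml", {"database_url": "sqlite:///dev.db", "debug": True}),
    ("file:config/production.yaml", {"database_url": "postgresql://db/prod", "debug": False}),
    ("env:MYAPP_DEBUG", {"debug": True}),
]

values, sources = merge_layers(layers)

print(values["database_url"], "from", sources["database_url"])
print(values["debug"], "from", sources["debug"])
```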

Hierarchical Configuration

Our pattern also handles hierarchical configurations elegantly:

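def nested_get(config, path, default=None):
    """Get a nested configuration value using dot notation"""
    current = config

    for key in path.split("."):
        # Use item lookup rather than hasattr(), which would also match
        # method names such as "items" or "copy"; plain dicts (e.g. nested
        # sections loaded from a file) are handled the same way
        if isinstance(current, (DataContainer, dict)) and key in current:
            current = current[key]
        else:
            return default

    return current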

# Create a hierarchical config
server_config = Config(
    host="localhost",
    port=8080,
    ssl=Config(
        enabled=True,
        cert="/path/to/cert.pem",
        key="/path/to/key.pem"
    ),
    cors=Config(
        enabled=True,
        origins=["https://example.com"]
    )
)

# Access nested values
ssl_enabled = nested_get(server_config, "ssl.enabled")
cors_origins = nested_get(server_config, "cors.origins")

# Or use direct attribute access for a cleaner API
ssl_enabled = server_config.ssl.enabled
cors_origins = server_config.cors.origins
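Attribute chaining like this works because the nested sections were built as Config objects by hand. A config loaded from a file keeps its nested sections as plain dicts, since the constructor only copies the top level. For JSON input, one standard-library way to get attribute access at every level is the object_hook parameter of json.loads, sketched here with types.SimpleNamespace instead of our container:

```python
import json
from types import SimpleNamespace

raw = """
{
  "host": "localhost",
  "port": 8080,
  "ssl": {"enabled": true, "cert": "/path/to/cert.pem"},
  "cors": {"enabled": true, "origins": ["https://example.com"]}
}
"""

# object_hook is called for every decoded JSON object, innermost first,
# so each nested section becomes a namespace too
cfg = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

print(cfg.ssl.enabled)
print(cfg.cors.origins)
```

The same hook could presumably build Config instances instead, if you want the copy and merge methods available on nested sections.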

The Grand Reveal: Inspiration from DSPy

If you found this pattern useful, you might be interested to know that it was inspired by the Example class in DSPy, a powerful framework for programming with language models.

In DSPy, this pattern is used to represent examples in machine learning datasets, elegantly separating inputs from labels while maintaining a clean, unified interface.

Here's a simplified version of how DSPy uses this pattern:

import dspy

# Creating a dataset for sentiment analysis
dataset = [
    dspy.Example(text="I loved this movie!", sentiment="positive").with_inputs("text"),
    dspy.Example(text="Terrible acting and plot.", sentiment="negative").with_inputs("text")
]

# Using the data in a model
# (`model` is a placeholder for any callable that maps text to a label)
for item in dataset:
    # Just get the inputs (the "text" field)
    inputs = item.inputs()

    # Make a prediction
    prediction = model(inputs.text)

    # Compare to the actual label
    is_correct = (prediction == item.sentiment)
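To see how the input/label split can sit on top of the same dict-object idea, here is a minimal sketch. LabeledExample is a hypothetical class written for this post; it is not DSPy's actual implementation, which is more involved:

```python
class LabeledExample:
    """Hypothetical, simplified stand-in; not DSPy's actual implementation."""

    def __init__(self, input_keys=(), **fields):
        self._store = dict(fields)
        self._input_keys = frozenset(input_keys)

    def __getattr__(self, key):
        # Only reached when normal lookup fails; read fields from the store
        store = self.__dict__.get("_store", {})
        if key in store:
            return store[key]
        raise AttributeError(key)

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields are model inputs
        return type(self)(input_keys=keys, **self._store)

    def inputs(self):
        picked = {k: v for k, v in self._store.items() if k in self._input_keys}
        return type(self)(input_keys=self._input_keys, **picked)

    def labels(self):
        rest = {k: v for k, v in self._store.items() if k not in self._input_keys}
        return type(self)(**rest)


item = LabeledExample(text="I loved this movie!", sentiment="positive").with_inputs("text")

print(item.inputs().text)       # I loved this movie!
print(item.labels().sentiment)  # positive
```

The input keys play the same role as _metadata in our DataContainer: information about the fields that travels with the object without appearing among the fields themselves.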

Conclusion

The Dict-Object hybrid pattern demonstrates the elegance of Python's design flexibility. With just a few magic methods and thoughtful API design, we created a powerful data container that:

Provides both dictionary and object interfaces
Supports non-mutating transformations that return new copies
Enables domain-specific extensions
Makes code more readable and maintainable

While Pydantic and dataclasses excel at validation and type safety, our DataContainer pattern shines in scenarios requiring flexibility, dynamic transformations, and clean APIs. The configuration management example shows how a relatively simple design can solve complex real-world problems in an elegant way.

This pattern can be adapted to numerous domains beyond the examples shown here:

Event data in logging systems
ETL pipeline transformations
Command-line argument parsing
JSON/XML data processing
Application state management

The next time you find yourself juggling dictionaries and custom classes for structured data, consider implementing this flexible container pattern. It might just transform how you think about data handling in Python.
