Skip to content

Data Management in Optiscope Dash App

This document details the architecture and data flow for managing optimization data within the Optiscope Dash application. It covers how data is imported, stored, and accessed across user sessions.

Overview

The data management system in Optiscope is designed to handle multiple data sources simultaneously, ensuring data isolation between user sessions while providing a unified interface for the application to access optimization results.

The system is built around a few key concepts: - Data Sources: Origins of data (uploaded files, connected databases). - Session Isolation: Each user has a private workspace and database. - Storage Backends: Abstractions for different storage types (Database, Memory, FileSystem). - Registry: A central point to access active storage instances.

Architecture

The architecture is split between the client-side (browser) and the server-side (Python backend).

Client-Side

  • dcc.Store: The browser holds references to data sources, not the data itself.
    • storage-references: Stores metadata (source ID, label, type) for all loaded sources.
    • selected-results-store: Stores IDs of currently selected results for analysis.
  • UI Components: The DataManagerPage provides the interface for uploading files and connecting to databases.

Server-Side

  • DataSourceManager: The high-level orchestrator for adding/removing sources.
  • StorageRegistry: A singleton (per worker) that maps source_id to StorageBackend instances.
  • SessionManager: Manages per-user directories and session-specific databases (DuckDB/SQLite).
  • StorageBackend: The actual implementation of data persistence (e.g., DatabaseStorage, SessionDBStorage).
classDiagram
    class DataSourceManager {
        +add_data_source(config)
        +remove_data_source(source_id)
        -_import_to_session_db()
        -_mount_database()
    }

    class StorageRegistry {
        -_storages: dict
        +register(source_id, backend)
        +get(source_id)
        +unregister(source_id)
    }

    class SessionManager {
        +session_id: str
        +session_path: Path
        +get_db_connection_string()
        +cleanup()
    }

    class StorageBackend {
        <<interface>>
        +save_result(key, result)
        +load_result(key)
        +list_results()
    }

    class DatabaseStorage {
        +connection_string: str
        +save_result()
        +load_result()
    }

    class SessionDBStorage {
        +source_id: str
        -_prefix_key(key)
        +save_result()
    }

    DataSourceManager --> SessionManager : uses
    DataSourceManager --> StorageRegistry : registers backends
    StorageRegistry o-- StorageBackend : holds
    DatabaseStorage --|> StorageBackend : implements
    SessionDBStorage --|> DatabaseStorage : extends

Data Flow Scenarios

1. File Upload (Import)

When a user uploads a file (e.g., CSV, JSON), the data is imported into the user's private session database.

  1. Upload: dcc.Upload sends file content to the server.
  2. Process: DataSourceManager identifies the file type.
  3. Import: Data is parsed and saved into the Session Database using SessionDBStorage.
    • SessionDBStorage prefixes all keys with the source_id to prevent collisions with other imports in the same DB.
  4. Register: The SessionDBStorage instance is registered in StorageRegistry.
  5. Reference: A lightweight reference (ID, label) is sent back to the client's dcc.Store.
sequenceDiagram
    participant Client as Browser (UI)
    participant Callback as Dash Callback
    participant DSM as DataSourceManager
    participant Reg as StorageRegistry
    participant SessDB as Session Database

    Client->>Callback: Upload File (content)
    Callback->>DSM: add_data_source(content)
    DSM->>DSM: Identify Format
    DSM->>SessDB: Save Data (prefixed with source_id)
    DSM->>Reg: Register SessionDBStorage(source_id)
    DSM-->>Callback: Return Source Reference
    Callback-->>Client: Update dcc.Store & UI

2. Database Connection (Mount)

When a user connects to an existing SQLite or DuckDB file, the database is mounted. Data is NOT copied; it is read directly from the source file.

  1. Connect: User provides a path to a .db or .duckdb file.
  2. Mount: DataSourceManager creates a DatabaseStorage instance pointing to that file.
  3. Register: The DatabaseStorage instance is registered in StorageRegistry.
  4. Reference: A reference is sent back to the client.
sequenceDiagram
    participant Client as Browser (UI)
    participant Callback as Dash Callback
    participant DSM as DataSourceManager
    participant Reg as StorageRegistry
    participant ExtDB as External Database

    Client->>Callback: Connect DB (path)
    Callback->>DSM: add_data_source(path, type='sqlite')
    DSM->>DSM: Verify Path
    DSM->>Reg: Register DatabaseStorage(path)
    Note over Reg, ExtDB: Connection established (Read-Only)
    DSM-->>Callback: Return Source Reference
    Callback-->>Client: Update dcc.Store & UI

3. Data Retrieval

When a visualization page needs data:

  1. Request: Client sends source_id and result_key to a callback.
  2. Lookup: Callback asks StorageRegistry for the backend associated with source_id.
  3. Load: The backend (SessionDBStorage or DatabaseStorage) loads the OptimizationResult object.
    • If SessionDBStorage, it automatically handles the source_id prefixing/unprefixing transparently.
flowchart LR
    subgraph Server
        Callback[Dash Callback]
        Registry[StorageRegistry]
        Backend[StorageBackend]
        DB[(Database)]
    end

    Client[Browser] -->|source_id, result_key| Callback
    Callback -->|"get(source_id)"| Registry
    Registry -->|return backend| Callback
    Callback -->|"load_result(key)"| Backend
    Backend -->|SQL Query| DB
    DB -->|Data| Backend
    Backend -->|OptimizationResult| Callback
    Callback -->|Figure/Data| Client

Database Schema

The DatabaseStorage uses SQLAlchemy to manage two main tables. This schema is used for both external databases and the internal session database.

  • optimization_results: Stores the main optimization objects.
  • result_sets: Stores subsets of data (e.g., Pareto fronts) associated with a result.
erDiagram
    OPTIMIZATION_RESULTS ||--o{ RESULT_SETS : contains
    OPTIMIZATION_RESULTS {
        string key PK "Unique identifier (prefixed in session DB)"
        blob data "Serialized OptimizationResult (JSON/Pickle)"
        datetime created_at
        datetime modified_at
        int n_points "Number of solutions"
        int n_sets "Number of subsets"
        text metadata_json "Custom metadata"
    }
    RESULT_SETS {
        int id PK
        string result_key FK "Links to parent result"
        string set_name "Name of the set (e.g., 'pareto')"
        text data "Serialized ResultSet"
        datetime created_at
    }

Session Management Details

  • Isolation: Each user session gets a unique session_id.
  • Directory: A directory is created at ./sessions/<session_id>/.
  • Session DB: A data.duckdb (or data.db) file is created inside the session directory.
  • Cleanup: When a session expires or the server shuts down, SessionManager.cleanup() removes the directory.

SessionDBStorage Logic

SessionDBStorage is a clever wrapper that allows multiple imported files to live in the same physical data.duckdb file without mixing data.

  • Prefixing: When saving result_A from source_1, it is stored as source_1::result_A.
  • Filtering: When listing results for source_1, it queries for keys starting with source_1:: and strips the prefix before returning them.
  • Transparency: The consumer of the class (the app) doesn't need to know about the prefixes.

Summary

Feature Imported Data (File Upload) Mounted Data (DB Connection)
Storage Location User's Session DB (./sessions/...) External File (/path/to/file.db)
Persistence Temporary (Session duration) Permanent (External file)
Backend Class SessionDBStorage DatabaseStorage
Write Access Read/Write (Isolated) Read-Only (Safety)
Performance Fast (Local DuckDB) Depends on external DB