Data Management in Optiscope Dash App¶
This document details the architecture and data flow for managing optimization data within the Optiscope Dash application. It covers how data is imported, stored, and accessed across user sessions.
Overview¶
The data management system in Optiscope is designed to handle multiple data sources simultaneously, ensuring data isolation between user sessions while providing a unified interface for the application to access optimization results.
The system is built around a few key concepts: - Data Sources: Origins of data (uploaded files, connected databases). - Session Isolation: Each user has a private workspace and database. - Storage Backends: Abstractions for different storage types (Database, Memory, FileSystem). - Registry: A central point to access active storage instances.
Architecture¶
The architecture is split between the client-side (browser) and the server-side (Python backend).
Client-Side¶
dcc.Store: The browser holds references to data sources, not the data itself.storage-references: Stores metadata (source ID, label, type) for all loaded sources.selected-results-store: Stores IDs of currently selected results for analysis.
- UI Components: The
DataManagerPageprovides the interface for uploading files and connecting to databases.
Server-Side¶
DataSourceManager: The high-level orchestrator for adding/removing sources.StorageRegistry: A singleton (per worker) that mapssource_idtoStorageBackendinstances.SessionManager: Manages per-user directories and session-specific databases (DuckDB/SQLite).StorageBackend: The actual implementation of data persistence (e.g.,DatabaseStorage,SessionDBStorage).
classDiagram
class DataSourceManager {
+add_data_source(config)
+remove_data_source(source_id)
-_import_to_session_db()
-_mount_database()
}
class StorageRegistry {
-_storages: dict
+register(source_id, backend)
+get(source_id)
+unregister(source_id)
}
class SessionManager {
+session_id: str
+session_path: Path
+get_db_connection_string()
+cleanup()
}
class StorageBackend {
<<interface>>
+save_result(key, result)
+load_result(key)
+list_results()
}
class DatabaseStorage {
+connection_string: str
+save_result()
+load_result()
}
class SessionDBStorage {
+source_id: str
-_prefix_key(key)
+save_result()
}
DataSourceManager --> SessionManager : uses
DataSourceManager --> StorageRegistry : registers backends
StorageRegistry o-- StorageBackend : holds
DatabaseStorage --|> StorageBackend : implements
SessionDBStorage --|> DatabaseStorage : extends
Data Flow Scenarios¶
1. File Upload (Import)¶
When a user uploads a file (e.g., CSV, JSON), the data is imported into the user's private session database.
- Upload:
dcc.Uploadsends file content to the server. - Process:
DataSourceManageridentifies the file type. - Import: Data is parsed and saved into the Session Database using
SessionDBStorage.SessionDBStorageprefixes all keys with thesource_idto prevent collisions with other imports in the same DB.
- Register: The
SessionDBStorageinstance is registered inStorageRegistry. - Reference: A lightweight reference (ID, label) is sent back to the client's
dcc.Store.
sequenceDiagram
participant Client as Browser (UI)
participant Callback as Dash Callback
participant DSM as DataSourceManager
participant Reg as StorageRegistry
participant SessDB as Session Database
Client->>Callback: Upload File (content)
Callback->>DSM: add_data_source(content)
DSM->>DSM: Identify Format
DSM->>SessDB: Save Data (prefixed with source_id)
DSM->>Reg: Register SessionDBStorage(source_id)
DSM-->>Callback: Return Source Reference
Callback-->>Client: Update dcc.Store & UI
2. Database Connection (Mount)¶
When a user connects to an existing SQLite or DuckDB file, the database is mounted. Data is NOT copied; it is read directly from the source file.
- Connect: User provides a path to a
.dbor.duckdbfile. - Mount:
DataSourceManagercreates aDatabaseStorageinstance pointing to that file. - Register: The
DatabaseStorageinstance is registered inStorageRegistry. - Reference: A reference is sent back to the client.
sequenceDiagram
participant Client as Browser (UI)
participant Callback as Dash Callback
participant DSM as DataSourceManager
participant Reg as StorageRegistry
participant ExtDB as External Database
Client->>Callback: Connect DB (path)
Callback->>DSM: add_data_source(path, type='sqlite')
DSM->>DSM: Verify Path
DSM->>Reg: Register DatabaseStorage(path)
Note over Reg, ExtDB: Connection established (Read-Only)
DSM-->>Callback: Return Source Reference
Callback-->>Client: Update dcc.Store & UI
3. Data Retrieval¶
When a visualization page needs data:
- Request: Client sends
source_idandresult_keyto a callback. - Lookup: Callback asks
StorageRegistryfor the backend associated withsource_id. - Load: The backend (
SessionDBStorageorDatabaseStorage) loads theOptimizationResultobject.- If
SessionDBStorage, it automatically handles thesource_idprefixing/unprefixing transparently.
- If
flowchart LR
subgraph Server
Callback[Dash Callback]
Registry[StorageRegistry]
Backend[StorageBackend]
DB[(Database)]
end
Client[Browser] -->|source_id, result_key| Callback
Callback -->|"get(source_id)"| Registry
Registry -->|return backend| Callback
Callback -->|"load_result(key)"| Backend
Backend -->|SQL Query| DB
DB -->|Data| Backend
Backend -->|OptimizationResult| Callback
Callback -->|Figure/Data| Client
Database Schema¶
The DatabaseStorage uses SQLAlchemy to manage two main tables. This schema is used for both external databases and the internal session database.
optimization_results: Stores the main optimization objects.result_sets: Stores subsets of data (e.g., Pareto fronts) associated with a result.
erDiagram
OPTIMIZATION_RESULTS ||--o{ RESULT_SETS : contains
OPTIMIZATION_RESULTS {
string key PK "Unique identifier (prefixed in session DB)"
blob data "Serialized OptimizationResult (JSON/Pickle)"
datetime created_at
datetime modified_at
int n_points "Number of solutions"
int n_sets "Number of subsets"
text metadata_json "Custom metadata"
}
RESULT_SETS {
int id PK
string result_key FK "Links to parent result"
string set_name "Name of the set (e.g., 'pareto')"
text data "Serialized ResultSet"
datetime created_at
}
Session Management Details¶
- Isolation: Each user session gets a unique
session_id. - Directory: A directory is created at
./sessions/<session_id>/. - Session DB: A
data.duckdb(ordata.db) file is created inside the session directory. - Cleanup: When a session expires or the server shuts down,
SessionManager.cleanup()removes the directory.
SessionDBStorage Logic¶
SessionDBStorage is a clever wrapper that allows multiple imported files to live in the same physical data.duckdb file without mixing data.
- Prefixing: When saving
result_Afromsource_1, it is stored assource_1::result_A. - Filtering: When listing results for
source_1, it queries for keys starting withsource_1::and strips the prefix before returning them. - Transparency: The consumer of the class (the app) doesn't need to know about the prefixes.
Summary¶
| Feature | Imported Data (File Upload) | Mounted Data (DB Connection) |
|---|---|---|
| Storage Location | User's Session DB (./sessions/...) |
External File (/path/to/file.db) |
| Persistence | Temporary (Session duration) | Permanent (External file) |
| Backend Class | SessionDBStorage |
DatabaseStorage |
| Write Access | Read/Write (Isolated) | Read-Only (Safety) |
| Performance | Fast (Local DuckDB) | Depends on external DB |