Merge branch 'main' of http://jeason.online:3000/zhaojie/iov_data_analysis_agent
.kiro/specs/agent-robustness-optimization/.config.kiro (new file, +1)
@@ -0,0 +1 @@
{"specId": "ea41aaef-0737-4255-bcad-90f156a5b2d5", "workflowType": "requirements-first", "specType": "feature"}

.kiro/specs/agent-robustness-optimization/design.md (new file, +515)
@@ -0,0 +1,515 @@
# Design Document: Agent Robustness Optimization

## Overview

This design addresses five areas of improvement for the AI Data Analysis Agent: data privacy fallback recovery, conversation history trimming, analysis template integration, frontend progress display, and multi-file chunked/parallel loading. The changes span the Python backend (`data_analysis_agent.py`, `config/app_config.py`, `utils/data_privacy.py`, `utils/data_loader.py`, `web/main.py`) and the vanilla JS frontend (`web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css`).

The core design principle is **minimal invasiveness**: each feature is implemented as a composable module or method that plugs into the existing agent loop, avoiding large-scale refactors of the `DataAnalysisAgent.analyze()` main loop.

## Architecture

The system follows a layered architecture where the `DataAnalysisAgent` orchestrates LLM calls and code execution, the FastAPI server manages sessions and exposes APIs, and the frontend polls for status updates.

```mermaid
graph TD
    subgraph Frontend
        UI[script.js + index.html]
    end

    subgraph FastAPI Server
        API[web/main.py]
        SM[SessionManager]
    end

    subgraph Agent Core
        DA[DataAnalysisAgent]
        EC[ErrorClassifier]
        HG[HintGenerator]
        HT[HistoryTrimmer]
        TI[TemplateIntegration]
    end

    subgraph Utilities
        DP[data_privacy.py]
        DL[data_loader.py]
        AT[analysis_templates.py]
        CE[code_executor.py]
    end

    subgraph Config
        AC[app_config.py]
    end

    UI -->|POST /api/start, GET /api/status, GET /api/templates| API
    API --> SM
    API --> DA
    DA --> EC
    DA --> HG
    DA --> HT
    DA --> TI
    DA --> CE
    HG --> DP
    DL --> AC
    DA --> DL
    TI --> AT
    EC --> AC
    HT --> AC
```

### Change Impact Summary

| Area | Files Modified | New Files |
|------|---------------|-----------|
| Data Privacy Fallback | `data_analysis_agent.py`, `utils/data_privacy.py`, `config/app_config.py` | None |
| Conversation Trimming | `data_analysis_agent.py`, `config/app_config.py` | None |
| Template System | `data_analysis_agent.py`, `web/main.py`, `web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css` | None |
| Progress Bar | `web/main.py`, `web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css` | None |
| Multi-File Loading | `utils/data_loader.py`, `data_analysis_agent.py`, `config/app_config.py` | None |

## Components and Interfaces

### 1. Error Classifier (`data_analysis_agent.py`)

A new method `_classify_error(error_message: str) -> str` on `DataAnalysisAgent` that inspects error messages and returns `"data_context"` or `"other"`.

```python
import re

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def _classify_error(self, error_message: str) -> str:
    """Classify execution error as data-context or other."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"
```
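A quick sanity check of the classification logic, written as a standalone function for illustration (in the design it is a method on the agent; the sample error strings are hypothetical):

```python
import re

# Same patterns as the design's DATA_CONTEXT_PATTERNS
DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def classify_error(error_message: str) -> str:
    """Return "data_context" if any pattern matches, else "other"."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"

# A missing-column error is recoverable; an arithmetic error is not.
print(classify_error("KeyError: 'vehicle_speed'"))            # data_context
print(classify_error("ZeroDivisionError: division by zero"))  # other
```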

### 2. Enriched Hint Generator (`utils/data_privacy.py`)

A new function `generate_enriched_hint(error_message: str, safe_profile: str) -> str` that extracts the referenced column name from the error, looks it up in the safe profile, and returns a hint string containing only schema-level metadata.

```python
import re
from typing import Optional

def generate_enriched_hint(error_message: str, safe_profile: str) -> str:
    """
    Generate an enriched hint from the safe profile for a data-context error.
    Returns schema-level metadata only — no real data values.
    """
    column_name = _extract_column_from_error(error_message)
    column_meta = _lookup_column_in_profile(column_name, safe_profile)

    hint = "[RETRY CONTEXT] 上一次代码执行因数据上下文错误失败。\n"
    hint += f"错误信息: {error_message}\n"
    if column_meta:
        hint += f"相关列 '{column_name}' 的结构信息:\n"
        hint += f"  - 数据类型: {column_meta['dtype']}\n"
        hint += f"  - 唯一值数量: {column_meta['unique_count']}\n"
        hint += f"  - 空值率: {column_meta['null_rate']}\n"
        hint += f"  - 特征描述: {column_meta['description']}\n"
    hint += "请根据以上结构信息修正代码,不要假设具体的数据值。"
    return hint

def _extract_column_from_error(error_message: str) -> Optional[str]:
    """Extract column name from error message patterns like KeyError: 'col_name'."""
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    if match:
        return match.group(1)
    match = re.search(r"column\s+['\"](.+?)['\"]", error_message, re.IGNORECASE)
    if match:
        return match.group(1)
    return None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    """Look up column metadata in the safe profile markdown table."""
    if not column_name:
        return None
    # Parse the markdown table rows for the matching column
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {
                    "dtype": parts[1],
                    "null_rate": parts[2],
                    "unique_count": parts[3],
                    "description": parts[4],
                }
    return None
```
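A self-contained illustration of the table lookup, assuming the Safe_Profile row layout `| column | dtype | null_rate | unique_count | description |` (the sample table below is hypothetical; the real profile format comes from `build_safe_profile()`):

```python
# Hypothetical Safe_Profile excerpt mirroring the assumed column order
SAMPLE_PROFILE = """
| 列名 | 数据类型 | 空值率 | 唯一值数量 | 特征描述 |
|------|---------|--------|-----------|---------|
| speed | float64 | 0.2% | 1523 | continuous numeric |
"""

def lookup_column(column_name, safe_profile):
    """Mirror of _lookup_column_in_profile for a standalone demo."""
    if not column_name:
        return None
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {
                    "dtype": parts[1],
                    "null_rate": parts[2],
                    "unique_count": parts[3],
                    "description": parts[4],
                }
    return None

meta = lookup_column("speed", SAMPLE_PROFILE)
# meta == {'dtype': 'float64', 'null_rate': '0.2%',
#          'unique_count': '1523', 'description': 'continuous numeric'}
```

Note that the header and separator rows are skipped naturally: neither contains the column name as the first cell.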

### 3. Conversation History Trimmer (`data_analysis_agent.py`)

A new method `_trim_conversation_history()` on `DataAnalysisAgent` that implements sliding window trimming with summary compression.

```python
def _trim_conversation_history(self):
    """Apply sliding window trimming to conversation history."""
    window_size = app_config.conversation_window_size
    max_messages = window_size * 2  # pairs of user+assistant messages

    if len(self.conversation_history) <= max_messages:
        return  # No trimming needed

    first_message = self.conversation_history[0]  # Always retain

    # Determine trim boundary: skip first message + possible existing summary
    start_idx = 1
    has_existing_summary = (
        len(self.conversation_history) > 1
        and self.conversation_history[1]["role"] == "user"
        and self.conversation_history[1]["content"].startswith("[分析摘要]")
    )
    if has_existing_summary:
        start_idx = 2

    # Messages to trim vs keep
    messages_to_consider = self.conversation_history[start_idx:]
    messages_to_trim = messages_to_consider[:-max_messages]
    messages_to_keep = messages_to_consider[-max_messages:]

    if not messages_to_trim:
        return

    # Generate summary of trimmed messages
    summary = self._compress_trimmed_messages(messages_to_trim)

    # Rebuild history: first_message + summary + recent messages
    # (any previous summary is dropped and replaced, per Requirement 5.3)
    self.conversation_history = [first_message]
    if summary:
        self.conversation_history.append({"role": "user", "content": summary})
    self.conversation_history.extend(messages_to_keep)

def _compress_trimmed_messages(self, messages: list) -> str:
    """Compress trimmed messages into a summary string."""
    summary_parts = ["[分析摘要] 以下是之前分析轮次的概要:"]
    round_num = 0

    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant":
            round_num += 1
            # Extract action type from YAML-like content
            action = "generate_code"
            if 'action: "collect_figures"' in content or "action: collect_figures" in content:
                action = "collect_figures"
            elif 'action: "analysis_complete"' in content or "action: analysis_complete" in content:
                action = "analysis_complete"
            summary_parts.append(f"- 轮次{round_num}: 动作={action}")
        elif msg["role"] == "user" and "代码执行反馈" in content and round_num > 0:
            # round_num > 0 guards against feedback arriving before any
            # assistant turn, which would otherwise mutate the header line
            success = "失败" if "[ERROR]" in content or "执行错误" in content else "成功"
            summary_parts[-1] += f", 执行结果={success}"

    return "\n".join(summary_parts)
```
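The sliding-window invariant can be walked through on a toy history. This is a sketch with plain dicts, assuming `window_size = 2` (so at most four recent messages are kept in full); the message contents are hypothetical:

```python
# Build a history: 1 initial message + 5 rounds of (assistant, user) pairs
history = [{"role": "user", "content": "initial requirement + Safe_Profile"}]
for i in range(5):
    history.append({"role": "assistant", "content": f"round {i} action"})
    history.append({"role": "user", "content": f"round {i} feedback"})

window_size = 2
max_messages = window_size * 2

first, rest = history[0], history[1:]
kept = rest[-max_messages:]          # most recent 2 pairs, kept verbatim
trimmed = rest[:-max_messages]       # older messages, compressed away
summary = {"role": "user", "content": f"[分析摘要] {len(trimmed)} messages compressed"}
history = [first, summary] + kept

# Invariants: first message survives at index 0; only recent pairs remain.
```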

### 4. Template Integration (`data_analysis_agent.py` + `web/main.py`)

The `analyze()` method gains an optional `template_name` parameter. When provided, the template prompt is prepended to the user requirement.

**Agent side:**
```python
def analyze(self, user_input: str, files=None, session_output_dir=None,
            reset_session=True, max_rounds=None, template_name=None):
    # ... existing init code ...
    if template_name:
        from utils.analysis_templates import get_template
        template = get_template(template_name)  # Raises ValueError if invalid
        template_prompt = template.get_full_prompt()
        user_input = f"{template_prompt}\n\n{user_input}"
    # ... rest of analyze ...
```

**API side (`web/main.py`):**
```python
# New endpoint
@app.get("/api/templates")
async def list_available_templates():
    from utils.analysis_templates import list_templates
    return {"templates": list_templates()}

# Modified StartRequest
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None
```

### 5. Progress Bar Integration

**Backend (`web/main.py`):** Update `run_analysis_task` to set progress fields on `SessionData` via a callback or by polling the agent's `current_round`. The simplest approach is to add a progress callback to the agent.

```python
# In DataAnalysisAgent
def set_progress_callback(self, callback):
    """Set a callback function(current_round, max_rounds, message) for progress updates."""
    self._progress_callback = callback

# Called at the start of each round in the analyze() loop:
if hasattr(self, '_progress_callback') and self._progress_callback:
    self._progress_callback(self.current_round, self.max_rounds, f"第{self.current_round}轮分析中...")
```

**Backend (`web/main.py`):** In `run_analysis_task`, wire the callback:
```python
def progress_cb(current, total, message):
    session.current_round = current
    session.max_rounds = total
    session.progress_percentage = round((current / total) * 100, 1) if total > 0 else 0
    session.status_message = message

agent.set_progress_callback(progress_cb)
```
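The callback wiring can be exercised end to end without FastAPI. A minimal sketch, using a plain stand-in class in place of `SessionData` (the real class lives in `web/main.py`):

```python
class Session:
    """Hypothetical stand-in exposing only the progress fields."""
    current_round = 0
    max_rounds = 0
    progress_percentage = 0.0
    status_message = ""

session = Session()

def progress_cb(current, total, message):
    # Same derivation as the design's run_analysis_task wiring
    session.current_round = current
    session.max_rounds = total
    session.progress_percentage = round((current / total) * 100, 1) if total > 0 else 0
    session.status_message = message

# Simulate the agent reporting round 3 of 8:
progress_cb(3, 8, "第3轮分析中...")
print(session.progress_percentage)  # 37.5
```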

**API response:** Add progress fields to `GET /api/status`:
```python
return {
    "is_running": session.is_running,
    "log": log_content,
    "has_report": ...,
    "current_round": session.current_round,
    "max_rounds": session.max_rounds,
    "progress_percentage": session.progress_percentage,
    "status_message": session.status_message,
    ...
}
```

**Frontend (`script.js`):** During polling, render a progress bar when `is_running` is true:
```javascript
// In the polling callback:
if (data.is_running) {
    updateProgressBar(data.progress_percentage, data.status_message);
}
```

### 6. Multi-File Chunked & Parallel Loading

**Chunked loading enhancement (`utils/data_loader.py`):**

```python
import os

import pandas as pd

def load_and_profile_data_smart(file_paths: list, max_file_size_mb: int = None) -> str:
    """Smart loader: uses chunked reading for large files, regular for small."""
    if max_file_size_mb is None:
        max_file_size_mb = app_config.max_file_size_mb

    profile_summary = "# 数据画像报告 (Data Profile)\n\n"
    for file_path in file_paths:
        file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
        if file_size_mb > max_file_size_mb:
            profile_summary += _profile_chunked(file_path)
        else:
            profile_summary += _profile_full(file_path)
    return profile_summary

def _profile_chunked(file_path: str) -> str:
    """Profile a large file by reading first chunk + sampling subsequent chunks."""
    chunks = load_data_chunked(file_path)
    first_chunk = next(chunks, None)
    if first_chunk is None:
        return f"[ERROR] 无法读取文件: {file_path}\n"

    # Sample from subsequent chunks (each part is a DataFrame)
    sample_parts = [first_chunk]
    for i, chunk in enumerate(chunks):
        if i % 5 == 0:  # Sample every 5th chunk
            sample_parts.append(chunk.sample(min(100, len(chunk))))

    combined = pd.concat(sample_parts, ignore_index=True)
    # Generate profile from combined sample
    return _generate_profile_for_df(combined, file_path, sampled=True)
```
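The shape of the chunk-sampling loop, sketched with plain lists in place of pandas DataFrames so it is self-contained: keep the first chunk whole, then sample every fifth subsequent chunk. `iter_chunks` is a hypothetical stand-in for `load_data_chunked()`:

```python
import random

def iter_chunks():
    # Hypothetical stand-in for load_data_chunked(): 11 chunks of 10 rows
    for c in range(11):
        yield [f"chunk{c}_row{r}" for r in range(10)]

chunks = iter_chunks()
first_chunk = next(chunks, None)   # kept in full (10 rows)
sampled = [first_chunk]
for i, chunk in enumerate(chunks):
    if i % 5 == 0:  # chunks 1, 6, ... of the original file
        sampled.append(random.sample(chunk, min(3, len(chunk))))

# 10 full rows + 3 sampled rows from each of two chunks = 16 rows total,
# far less than the 110 rows the file contains.
combined = [row for part in sampled for row in part]
```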

**Parallel profiling (`data_analysis_agent.py`):**

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def _profile_files_parallel(self, file_paths: list) -> tuple[str, str]:
    """Profile multiple files concurrently."""
    max_workers = app_config.max_parallel_profiles
    safe_profiles = []
    local_profiles = []

    def profile_single(path):
        safe = build_safe_profile([path])
        local = build_local_profile([path])
        return path, safe, local

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(profile_single, p): p for p in file_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                _, safe, local = future.result()
                safe_profiles.append(safe)
                local_profiles.append(local)
            except Exception as e:
                error_entry = f"## 文件: {os.path.basename(path)}\n[ERROR] 分析失败: {e}\n\n"
                safe_profiles.append(error_entry)
                local_profiles.append(error_entry)

    return "\n".join(safe_profiles), "\n".join(local_profiles)
```
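The merge-with-error-resilience behavior (Property 11) can be demonstrated with dummy profilers: every submitted path yields either a profile entry or an error entry, so no file is missing from the merged output. The file names and `profile_single` body below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def profile_single(path):
    """Dummy profiler: fails for paths containing 'bad'."""
    if "bad" in path:
        raise ValueError("unreadable file")
    return path, f"profile({path})"

paths = ["a.csv", "bad.csv", "b.csv"]
entries = []
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = {executor.submit(profile_single, p): p for p in paths}
    for future in as_completed(futures):
        path = futures[future]
        try:
            _, profile = future.result()
            entries.append(profile)
        except Exception as e:
            entries.append(f"[ERROR] {path}: {e}")

# One entry per submitted path, success or failure.
assert len(entries) == len(paths)
```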

## Data Models

### AppConfig Extensions (`config/app_config.py`)

```python
import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    # ... existing fields ...

    # New fields
    max_data_context_retries: int = field(default=2)
    conversation_window_size: int = field(default=10)
    max_parallel_profiles: int = field(default=4)

    @classmethod
    def from_env(cls) -> 'AppConfig':
        config = cls()
        # ... existing env overrides ...
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        if val := os.getenv("APP_MAX_PARALLEL_PROFILES"):
            config.max_parallel_profiles = int(val)
        return config
```
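A self-contained check of the walrus-based env-override pattern (Property 4), with a pared-down config class whose field names match the design:

```python
import os
from dataclasses import dataclass

@dataclass
class MiniConfig:
    max_data_context_retries: int = 2
    conversation_window_size: int = 10

    @classmethod
    def from_env(cls):
        config = cls()
        # Each override fires only when the env var is set and non-empty
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        return config

os.environ["APP_MAX_DATA_CONTEXT_RETRIES"] = "5"
os.environ.pop("APP_CONVERSATION_WINDOW_SIZE", None)
config = MiniConfig.from_env()
# config.max_data_context_retries == 5; conversation_window_size keeps its default
```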

### StartRequest Extension (`web/main.py`)

```python
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None  # New field
```

### SessionData Progress Fields (already exist, just need wiring)

The `SessionData` class already has `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields. These just need to be updated during analysis and included in the `/api/status` response.
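A sketch of that wiring, assuming `SessionData` is a simple attribute container (the `status_payload` helper is hypothetical; the real class and endpoint live in `web/main.py` and may differ in detail):

```python
from dataclasses import dataclass

@dataclass
class SessionData:
    # Only the progress-related fields from the design are modeled here
    is_running: bool = False
    current_round: int = 0
    max_rounds: int = 0
    progress_percentage: float = 0.0
    status_message: str = ""

def status_payload(session: SessionData) -> dict:
    """Fields surfaced to GET /api/status during an active analysis."""
    return {
        "is_running": session.is_running,
        "current_round": session.current_round,
        "max_rounds": session.max_rounds,
        "progress_percentage": session.progress_percentage,
        "status_message": session.status_message,
    }

payload = status_payload(SessionData(is_running=True, current_round=2,
                                     max_rounds=8, progress_percentage=25.0,
                                     status_message="第2轮分析中..."))
```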

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Error Classification Correctness

*For any* error message string, if it contains a data-context pattern (KeyError on a column name, ValueError on column values, NameError for data variables, or empty DataFrame conditions), `_classify_error` SHALL return `"data_context"`; otherwise it SHALL return `"other"`.

**Validates: Requirements 1.1**

### Property 2: Retry Below Limit Produces Enriched Hint

*For any* `max_data_context_retries` value and any current retry count strictly less than that value, when a data-context error is detected, the agent SHALL produce an enriched hint message rather than forwarding the raw error.

**Validates: Requirements 1.3**

### Property 3: Enriched Hint Contains Correct Column Metadata Without Real Data

*For any* error message referencing a column name present in the Safe_Profile, the generated enriched hint SHALL contain that column's data type, unique value count, null rate, and categorical description, and SHALL NOT contain any real data values (min, max, mean, sample rows) from the Local_Profile.

**Validates: Requirements 2.1, 2.2, 2.4**

### Property 4: Environment Variable Override for Config Fields

*For any* positive integer value set as the `APP_MAX_DATA_CONTEXT_RETRIES` environment variable, `AppConfig.from_env()` SHALL produce a config where `max_data_context_retries` equals that integer value.

**Validates: Requirements 3.2**

### Property 5: Sliding Window Trimming Preserves First Message and Retains Recent Pairs

*For any* conversation history whose length exceeds `2 * conversation_window_size` and any `conversation_window_size >= 1`, after trimming: (a) the first user message is always retained at index 0, and (b) the most recent `conversation_window_size` message pairs are retained in full.

**Validates: Requirements 4.2, 4.3**

### Property 6: Trimming Summary Contains Round Info and Excludes Code/Raw Output

*For any* set of trimmed conversation messages, the generated summary SHALL list each trimmed round's action type and execution success/failure, and SHALL NOT contain any code blocks (``` markers) or raw execution output.

**Validates: Requirements 4.4, 5.1, 5.2**

### Property 7: Template Prompt Integration

*For any* valid template name in `TEMPLATE_REGISTRY` and any user requirement string, the initial conversation message SHALL contain the template's `get_full_prompt()` output prepended to the user requirement.

**Validates: Requirements 6.1, 6.2**

### Property 8: Invalid Template Name Raises Descriptive Error

*For any* string that is not a key in `TEMPLATE_REGISTRY`, calling `get_template()` SHALL raise a `ValueError` whose message contains the list of available template names.

**Validates: Requirements 6.3**

### Property 9: Chunked Loading Threshold

*For any* file path and `max_file_size_mb` threshold, if the file's size in MB exceeds the threshold, the smart loader SHALL use chunked loading; otherwise it SHALL use full loading.

**Validates: Requirements 10.1**

### Property 10: Chunked Profiling Uses First Chunk Plus Samples

*For any* file loaded in chunked mode, the generated profile SHALL be based on the first chunk plus sampled rows from subsequent chunks, not from the entire file loaded into memory.

**Validates: Requirements 10.3**

### Property 11: Parallel Profile Merge With Error Resilience

*For any* set of file paths where some are valid and some are invalid/corrupted, the merged profile output SHALL contain valid profile entries for successful files and error entries for failed files, with no files missing from the output.

**Validates: Requirements 11.2, 11.3**

## Error Handling

| Scenario | Handling Strategy |
|----------|------------------|
| Data-context error below retry limit | Generate enriched hint, retry with LLM |
| Data-context error at retry limit | Fall back to normal sanitized error forwarding |
| Invalid template name | Raise `ValueError` with available template list |
| File too large for memory | Automatically switch to chunked loading |
| Chunked loading fails | Return descriptive error, continue with other files |
| Single file profiling fails in parallel | Include error entry, continue profiling remaining files |
| Conversation history exceeds window | Trim old messages, generate compressed summary |
| Summary generation fails | Log warning, proceed without summary (graceful degradation) |
| Progress callback fails | Log warning, analysis continues without progress updates |

## Testing Strategy

### Property-Based Tests (using `hypothesis`)

Each correctness property maps to a property-based test with a minimum of 100 iterations. The test library is `hypothesis` (Python).

- **Property 1**: Generate random error strings with/without data-context patterns → verify classification
- **Property 2**: Generate random retry counts and limits → verify hint vs raw error behavior
- **Property 3**: Generate random Safe_Profile tables and error messages → verify hint content and absence of real data
- **Property 4**: Generate random positive integers → set env var → verify config
- **Property 5**: Generate random conversation histories and window sizes → verify trimming invariants
- **Property 6**: Generate random trimmed message sets → verify summary content and absence of code blocks
- **Property 7**: Pick random valid template names and requirement strings → verify prompt construction
- **Property 8**: Generate random strings not in registry → verify ValueError
- **Property 9**: Generate random file sizes and thresholds → verify loading method selection
- **Property 10**: Generate random chunked data → verify profile source
- **Property 11**: Generate random file sets with failures → verify merged output

Tag format: `Feature: agent-robustness-optimization, Property {N}: {title}`
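A sketch of what the Property 1 test looks like, using stdlib `random` in place of `hypothesis` so it is self-contained (the real suite would use `@given(st.text())`; only two of the design's patterns are reproduced here for brevity):

```python
import random
import re
import string

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
]

def classify(msg):
    return ("data_context"
            if any(re.search(p, msg, re.IGNORECASE) for p in DATA_CONTEXT_PATTERNS)
            else "other")

random.seed(0)  # deterministic for repeatability
for _ in range(100):
    # Injecting a data-context pattern must always classify as data_context
    col = "".join(random.choices(string.ascii_lowercase, k=8))
    assert classify(f"KeyError: '{col}'") == "data_context"
    # Pattern-free noise (no colons or quotes) must classify as other
    noise = "".join(random.choices(string.ascii_lowercase + " ", k=40))
    assert classify(noise) == "other" or "valueerror" in noise
```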

### Unit Tests

- Error classifier with specific known error messages (KeyError, ValueError, NameError, generic errors)
- Enriched hint generation with known column profiles
- Conversation trimming with exact message counts at boundary conditions
- Template retrieval for each registered template
- Progress callback wiring
- API endpoint response shapes (`GET /api/templates`, `GET /api/status` with progress fields)

### Integration Tests

- `GET /api/templates` returns all registered templates
- `POST /api/start` with `template` field passes template to agent
- `GET /api/status` includes progress fields during analysis
- Multi-file parallel profiling with real CSV files
- End-to-end: start analysis with template → verify template prompt in conversation history

.kiro/specs/agent-robustness-optimization/requirements.md (new file, +142)
@@ -0,0 +1,142 @@
# Requirements Document

## Introduction

This document specifies the requirements for improving the robustness, efficiency, and usability of the AI Data Analysis Agent. The improvements span five areas: a data privacy fallback mechanism for recovering from LLM-generated code failures when real data is unavailable, conversation history trimming to reduce token consumption and prevent data leakage, integration of the existing analysis template system, frontend progress bar display, and multi-file parallel/chunked analysis support.

## Glossary

- **Agent**: The `DataAnalysisAgent` class in `data_analysis_agent.py` that orchestrates LLM calls and IPython code execution for data analysis.
- **Safe_Profile**: The schema-only data description generated by `build_safe_profile()` in `utils/data_privacy.py`, containing column names, data types, null rates, and unique value counts — but no real data values.
- **Local_Profile**: The full data profile generated by `build_local_profile()` containing real data values, statistics, and sample rows — used only in the local execution environment.
- **Code_Executor**: The `CodeExecutor` class in `utils/code_executor.py` that runs Python code in an IPython sandbox and returns execution results.
- **Conversation_History**: The list of `{"role": ..., "content": ...}` message dictionaries maintained by the Agent across analysis rounds.
- **Feedback_Sanitizer**: The `sanitize_execution_feedback()` function in `utils/data_privacy.py` that removes real data values from execution output before sending to the LLM.
- **Template_Registry**: The `TEMPLATE_REGISTRY` dictionary in `utils/analysis_templates.py` mapping template names to template classes.
- **Session_Data**: The `SessionData` class in `web/main.py` that tracks session state including `progress_percentage`, `current_round`, `max_rounds`, and `status_message`.
- **Polling_Loop**: The `setInterval`-based polling mechanism in `web/static/script.js` that fetches `/api/status` every 2 seconds.
- **Data_Loader**: The module `utils/data_loader.py` providing `load_and_profile_data`, `load_data_chunked`, and `load_data_with_cache` functions.
- **AppConfig**: The `AppConfig` dataclass in `config/app_config.py` holding configuration values such as `max_rounds`, `chunk_size`, and `max_file_size_mb`.

## Requirements

### Requirement 1: Data Privacy Fallback — Error Detection

**User Story:** As a system operator, I want the Agent to detect when LLM-generated code fails due to missing real data context, so that the system can attempt intelligent recovery instead of wasting an analysis round.

#### Acceptance Criteria

1. WHEN the Code_Executor returns a failed execution result, THE Agent SHALL classify the error as either a data-context error or a non-data error by inspecting the error message for patterns such as `KeyError`, `ValueError` on column values, `NameError` for undefined data variables, or empty DataFrame conditions.
2. WHEN a data-context error is detected, THE Agent SHALL increment a per-round retry counter for the current analysis round.
3. WHILE the retry counter for a given round is below the configured maximum retry limit, THE Agent SHALL attempt recovery by generating an enriched hint prompt rather than forwarding the raw error to the LLM as a normal failure.
4. IF the retry counter reaches the configured maximum retry limit, THEN THE Agent SHALL fall back to normal error handling by forwarding the sanitized error feedback to the LLM and proceeding to the next round.

### Requirement 2: Data Privacy Fallback — Enriched Hint Generation

**User Story:** As a system operator, I want the Agent to provide the LLM with enriched schema hints when data-context errors occur, so that the LLM can generate corrected code without receiving raw data values.

#### Acceptance Criteria

1. WHEN a data-context error is detected and retry is permitted, THE Agent SHALL generate an enriched hint containing the relevant column's data type, unique value count, null rate, and a categorical description (e.g., "low-cardinality category with 5 classes") extracted from the Safe_Profile.
2. WHEN the error involves a specific column name referenced in the error message, THE Agent SHALL include that column's schema metadata in the enriched hint.
3. THE Agent SHALL append the enriched hint to the conversation history as a user message with a prefix indicating it is a retry context, before requesting a new LLM response.
4. THE Agent SHALL NOT include any real data values, sample rows, or statistical values (min, max, mean) from the Local_Profile in the enriched hint sent to the LLM.

### Requirement 3: Data Privacy Fallback — Configuration

**User Story:** As a system operator, I want to configure the maximum number of data-context retries, so that I can balance between recovery attempts and analysis throughput.

#### Acceptance Criteria

1. THE AppConfig SHALL include a `max_data_context_retries` field with a default value of 2.
2. WHEN the `APP_MAX_DATA_CONTEXT_RETRIES` environment variable is set, THE AppConfig SHALL use its integer value to override the default.
3. THE Agent SHALL read the `max_data_context_retries` value from AppConfig during initialization.

### Requirement 4: Conversation History Trimming — Sliding Window

**User Story:** As a system operator, I want the conversation history to be trimmed using a sliding window, so that token consumption stays bounded and early execution results containing potential data leakage are removed.

#### Acceptance Criteria

1. THE AppConfig SHALL include a `conversation_window_size` field with a default value of 10, representing the maximum number of recent message pairs to retain in full.
2. WHEN the Conversation_History length exceeds twice the `conversation_window_size` (counting individual messages), THE Agent SHALL retain only the most recent `conversation_window_size` pairs of messages in full detail.
3. THE Agent SHALL always retain the first user message (containing the original requirement and Safe_Profile) regardless of window trimming.
4. WHEN messages are trimmed from the Conversation_History, THE Agent SHALL generate a compressed summary of the trimmed messages and prepend it after the first user message.

### Requirement 5: Conversation History Trimming — Summary Compression

**User Story:** As a system operator, I want trimmed conversation rounds to be compressed into a summary, so that the LLM retains awareness of prior analysis steps without consuming excessive tokens.

#### Acceptance Criteria

1. WHEN conversation messages are trimmed, THE Agent SHALL produce a summary string that lists each trimmed round's action type (generate_code, collect_figures), a one-line description of what was done, and whether execution succeeded or failed.
2. THE summary SHALL NOT contain any code blocks, raw execution output, or data values from prior rounds.
3. THE summary SHALL be inserted into the Conversation_History as a single user message immediately after the first user message, replacing any previous summary message.
4. IF no messages have been trimmed, THEN THE Agent SHALL NOT insert a summary message.
|
||||
|
||||
### Requirement 6: Analysis Template System — Backend Integration

**User Story:** As a user, I want to select a predefined analysis template when starting an analysis, so that the Agent follows a structured analysis plan tailored to my scenario.

#### Acceptance Criteria

1. WHEN a template name is provided in the analysis request, THE Agent SHALL retrieve the corresponding template from the Template_Registry using the `get_template()` function.
2. WHEN a valid template is retrieved, THE Agent SHALL call `get_full_prompt()` on the template and prepend the resulting structured prompt to the user's requirement in the initial conversation message.
3. IF an invalid template name is provided, THEN THE Agent SHALL raise a descriptive error listing available template names.
4. WHEN no template name is provided, THE Agent SHALL proceed with the default unstructured analysis flow.
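The prepend-or-passthrough logic can be sketched as below. `get_template()` is the registry function named in the spec, reimplemented here as a stand-in over a toy registry; the template name and prompt text are invented for illustration:

```python
# Stand-in registry; the real Template_Registry lives elsewhere.
TEMPLATES = {"battery_health": "1. Profile SOC columns\n2. Plot degradation trends"}

def get_template(name: str) -> str:
    if name not in TEMPLATES:
        # Criterion 3: descriptive error listing available names.
        raise ValueError(f"Unknown template '{name}'. Available: {sorted(TEMPLATES)}")
    return TEMPLATES[name]

def build_initial_requirement(user_req: str, template_name=None) -> str:
    if not template_name:
        return user_req  # criterion 4: default unstructured flow
    # Criterion 2: structured prompt prepended to the user requirement.
    return f"{get_template(template_name)}\n\nUser requirement: {user_req}"
```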
### Requirement 7: Analysis Template System — API Endpoint

**User Story:** As a frontend developer, I want API endpoints to list available templates and to accept a template selection when starting analysis, so that the frontend can offer template choices to users.

#### Acceptance Criteria

1. THE FastAPI server SHALL expose a `GET /api/templates` endpoint that returns the list of available templates by calling `list_templates()`, with each entry containing `name`, `display_name`, and `description`.
2. THE `POST /api/start` request body SHALL accept an optional `template` field containing the template name string.
3. WHEN the `template` field is present in the start request, THE FastAPI server SHALL pass the template name to the Agent's `analyze()` method.
4. WHEN the `template` field is absent or empty, THE FastAPI server SHALL start analysis without a template.
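A framework-free sketch of the handler shapes. In `web/main.py` these would be FastAPI route functions with a Pydantic request model; here plain functions and a dataclass show the same contract, with `list_templates()` stubbed:

```python
from dataclasses import dataclass
from typing import Optional

def list_templates() -> list:
    # Stub: the real registry helper returns one entry per template.
    return [{"name": "battery_health",
             "display_name": "Battery Health",
             "description": "SOC/SOH degradation analysis"}]

@dataclass
class StartRequest:
    requirement: str
    template: Optional[str] = None  # optional per criterion 2

def api_templates() -> dict:
    # Shape of the GET /api/templates response (criterion 1).
    return {"templates": list_templates()}

def api_start(req: StartRequest) -> dict:
    # Criterion 4: an empty string is treated the same as absent.
    template = req.template or None
    return {"started": True, "template": template}
```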
### Requirement 8: Analysis Template System — Frontend Template Selector

**User Story:** As a user, I want to see and select analysis templates in the web interface before starting analysis, so that I can choose a structured analysis approach.

#### Acceptance Criteria

1. WHEN the web page loads, THE frontend SHALL fetch the template list from `GET /api/templates` and render selectable template cards above the requirement input area.
2. WHEN a user selects a template card, THE frontend SHALL visually highlight the selected template and store the template name.
3. WHEN the user clicks "Start Analysis" with a template selected, THE frontend SHALL include the template name in the `POST /api/start` request body.
4. THE frontend SHALL provide a "No Template (Free Analysis)" option that is selected by default, allowing users to proceed without a template.
### Requirement 9: Frontend Progress Bar Display

**User Story:** As a user, I want to see a real-time progress bar during analysis, so that I can understand how far the analysis has progressed.

#### Acceptance Criteria

1. THE FastAPI server SHALL update the Session_Data's `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields during each analysis round in the `run_analysis_task` function.
2. THE `GET /api/status` response SHALL include `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields.
3. WHEN the Polling_Loop receives status data with `is_running` equal to true, THE frontend SHALL render a progress bar element showing the `progress_percentage` value and the `status_message` text.
4. WHEN `progress_percentage` changes between polls, THE frontend SHALL animate the progress bar width transition smoothly.
5. WHEN `is_running` becomes false, THE frontend SHALL set the progress bar to 100% and display a completion message.
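The backend half of this requirement reduces to a per-round callback that updates session state. Field names mirror criterion 1; the callback signature and the `ProgressReporter` class are assumptions about how `run_analysis_task` might wire it:

```python
# Progress plumbing sketch: the agent calls on_round() at the start of
# each round; the server stores the result for GET /api/status.
class ProgressReporter:
    def __init__(self, max_rounds: int):
        self.state = {"current_round": 0, "max_rounds": max_rounds,
                      "progress_percentage": 0, "status_message": ""}

    def on_round(self, round_no: int, message: str) -> None:
        pct = int(round_no / self.state["max_rounds"] * 100)
        self.state.update(current_round=round_no,
                          progress_percentage=min(pct, 100),
                          status_message=message)
```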
### Requirement 10: Multi-File Chunked Loading

**User Story:** As a user, I want large data files to be loaded in chunks, so that the system can handle files that exceed available memory.

#### Acceptance Criteria

1. WHEN a data file's size exceeds the `max_file_size_mb` threshold in AppConfig, THE Data_Loader SHALL use `load_data_chunked()` to stream the file in chunks of `chunk_size` rows instead of loading the entire file into memory.
2. WHEN chunked loading is used, THE Agent SHALL instruct the Code_Executor to make the chunked iterator available in the notebook environment as a variable, so that LLM-generated code can process data in chunks.
3. WHEN chunked loading is used for profiling, THE Agent SHALL generate the Safe_Profile by reading only the first chunk plus sampling from subsequent chunks, rather than loading the entire file.
4. IF a file cannot be loaded even in chunked mode, THEN THE Data_Loader SHALL return a descriptive error message indicating the failure reason.
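A dependency-free sketch of the streaming pattern. The real `load_data_chunked()` presumably wraps `pd.read_csv(path, chunksize=chunk_size)`; here the stdlib `csv` module illustrates the same idea of never materializing the whole file:

```python
import csv
from itertools import islice

def iter_chunks(path: str, chunk_size: int):
    """Yield lists of row dicts, chunk_size rows at a time."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk

def profile_first_chunk(path: str, chunk_size: int = 1000) -> dict:
    # Criterion 3: build the profile from the first chunk only, instead
    # of loading the entire file.
    first = next(iter_chunks(path, chunk_size), [])
    return {"columns": list(first[0]) if first else [], "sample_rows": len(first)}
```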
### Requirement 11: Multi-File Parallel Profiling

**User Story:** As a user, I want multiple data files to be profiled concurrently, so that the initial data exploration phase completes faster when multiple files are uploaded.

#### Acceptance Criteria

1. WHEN multiple files are provided for analysis, THE Agent SHALL profile each file concurrently using thread-based parallelism rather than sequentially.
2. THE Agent SHALL collect all profiling results and merge them into a single Safe_Profile string and a single Local_Profile string, maintaining the same format as the current sequential output.
3. IF any individual file profiling fails, THEN THE Agent SHALL include an error entry for that file in the profile output and continue profiling the remaining files.
4. THE AppConfig SHALL include a `max_parallel_profiles` field with a default value of 4, controlling the maximum number of concurrent profiling threads.
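The fan-out/merge behavior can be sketched with `ThreadPoolExecutor`. The `profile_one` callable and the `## <path>` section format are stand-ins for the real profiler and profile layout:

```python
from concurrent.futures import ThreadPoolExecutor

def profile_files_parallel(paths: list, profile_one, max_workers: int = 4) -> str:
    def safe(path):
        try:
            return f"## {path}\n{profile_one(path)}"
        except Exception as exc:
            # Criterion 3: record the error for this file, keep going.
            return f"## {path}\nERROR: {exc}"
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(safe, paths))  # map preserves input order
    return "\n\n".join(parts)
```

`pool.map` returns results in submission order, so the merged profile keeps the same file ordering as the current sequential code.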
74
.kiro/specs/agent-robustness-optimization/tasks.md
Normal file
@@ -0,0 +1,74 @@
# Tasks — Agent Robustness Optimization

## Priority 1: Configuration Foundation

- [ ] 1. Add new config fields to AppConfig
  - [-] 1.1 Add `max_data_context_retries` field (default=2) with `APP_MAX_DATA_CONTEXT_RETRIES` env override to `config/app_config.py`
  - [-] 1.2 Add `conversation_window_size` field (default=10) with `APP_CONVERSATION_WINDOW_SIZE` env override to `config/app_config.py`
  - [-] 1.3 Add `max_parallel_profiles` field (default=4) with `APP_MAX_PARALLEL_PROFILES` env override to `config/app_config.py`
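One way the env-override pattern for these fields could look; the real `config/app_config.py` may use Pydantic or another mechanism instead, so the dataclass form here is an assumption:

```python
import os
from dataclasses import dataclass, field

def _env_int(name: str, default: int) -> int:
    # Environment variable wins over the built-in default when set.
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

@dataclass
class AppConfig:
    max_data_context_retries: int = field(
        default_factory=lambda: _env_int("APP_MAX_DATA_CONTEXT_RETRIES", 2))
    conversation_window_size: int = field(
        default_factory=lambda: _env_int("APP_CONVERSATION_WINDOW_SIZE", 10))
    max_parallel_profiles: int = field(
        default_factory=lambda: _env_int("APP_MAX_PARALLEL_PROFILES", 4))
```

`default_factory` (rather than a plain default) ensures the environment is re-read each time a config object is constructed, which keeps tests that patch `os.environ` honest.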
## Priority 2: Data Privacy Fallback (R1–R3)

- [ ] 2. Implement error classification
  - [~] 2.1 Add `_classify_error(error_message: str) -> str` method to `DataAnalysisAgent` in `data_analysis_agent.py` with regex patterns for KeyError, ValueError, NameError, and empty-DataFrame errors
  - [~] 2.2 Add `_extract_column_from_error(error_message: str) -> Optional[str]` function to `utils/data_privacy.py`
  - [~] 2.3 Add `_lookup_column_in_profile(column_name, safe_profile) -> Optional[dict]` function to `utils/data_privacy.py`
- [ ] 3. Implement enriched hint generation
  - [~] 3.1 Add `generate_enriched_hint(error_message: str, safe_profile: str) -> str` function to `utils/data_privacy.py`
  - [~] 3.2 Integrate retry logic into the `analyze()` loop in `data_analysis_agent.py`: add a per-round retry counter, call `_classify_error` on failures, generate an enriched hint while below the retry limit, and fall back to normal error handling once the limit is reached
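A sketch of what tasks 2.1 and 2.2 might produce. The regex patterns and category labels are assumptions modeled on typical Python traceback text, not the project's actual classifier:

```python
import re

# Ordered (pattern, label) pairs; first match wins.
ERROR_PATTERNS = [
    (r"KeyError: ['\"]?([^'\"\n]+)", "missing_column"),
    (r"NameError: name '([^']+)' is not defined", "undefined_name"),
    (r"ValueError", "value_error"),
    (r"empty DataFrame|No objects to concatenate", "empty_dataframe"),
]

def classify_error(error_message: str) -> str:
    for pattern, label in ERROR_PATTERNS:
        if re.search(pattern, error_message):
            return label
    return "unknown"

def extract_column_from_error(error_message: str):
    # KeyError text usually quotes the offending column name.
    m = re.search(r"KeyError: ['\"]?([^'\"\n]+)", error_message)
    return m.group(1) if m else None
```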
## Priority 3: Conversation History Trimming (R4–R5)

- [ ] 4. Implement conversation trimming
  - [~] 4.1 Add `_trim_conversation_history()` method to `DataAnalysisAgent` implementing a sliding window with first-message preservation
  - [~] 4.2 Add `_compress_trimmed_messages(messages: list) -> str` method to `DataAnalysisAgent` that generates a summary with action types and success/failure, excluding code blocks and raw output
  - [~] 4.3 Call `_trim_conversation_history()` at the start of each round in the `analyze()` loop, after the first round
## Priority 4: Analysis Template System (R6–R8)

- [ ] 5. Backend template integration
  - [~] 5.1 Add optional `template_name` parameter to `DataAnalysisAgent.analyze()` method; retrieve template via `get_template()`, prepend `get_full_prompt()` to the user requirement
  - [~] 5.2 Add `GET /api/templates` endpoint to `web/main.py` returning the `list_templates()` result
  - [~] 5.3 Add optional `template` field to `StartRequest` model in `web/main.py`; pass the template name to the agent in `run_analysis_task`
- [ ] 6. Frontend template selector
  - [~] 6.1 Add template selector HTML section (cards above the requirement input) to `web/static/index.html`
  - [~] 6.2 Add template fetching, selection logic, and "No Template" default to `web/static/script.js`
  - [~] 6.3 Add template card styles (`.template-card`, `.template-card.selected`) to `web/static/clean_style.css`
## Priority 5: Frontend Progress Bar (R9)

- [ ] 7. Backend progress updates
  - [~] 7.1 Add `set_progress_callback(callback)` method to `DataAnalysisAgent`; call the callback at the start of each round in the `analyze()` loop
  - [~] 7.2 Wire the progress callback in `run_analysis_task` in `web/main.py` to update the `SessionData` progress fields
  - [~] 7.3 Add `current_round`, `max_rounds`, `progress_percentage`, `status_message` to the `GET /api/status` response in `web/main.py`
- [ ] 8. Frontend progress bar
  - [~] 8.1 Add progress bar HTML element below the status bar area in `web/static/index.html`
  - [~] 8.2 Add `updateProgressBar(percentage, message)` function to `web/static/script.js`; call it during polling when `is_running` is true; set to 100% on completion
  - [~] 8.3 Add progress bar styles with CSS transition animation to `web/static/clean_style.css`
## Priority 6: Multi-File Chunked & Parallel Loading (R10–R11)

- [ ] 9. Chunked loading enhancement
  - [~] 9.1 Add `_profile_chunked(file_path: str) -> str` function to `utils/data_loader.py` that profiles using the first chunk plus sampled subsequent chunks
  - [~] 9.2 Add `load_and_profile_data_smart(file_paths, max_file_size_mb) -> str` function to `utils/data_loader.py` that selects chunked vs. full loading based on the file size threshold
  - [~] 9.3 Update `DataAnalysisAgent.analyze()` to use the smart loader and expose the chunked iterator in the Code_Executor namespace for large files
- [ ] 10. Parallel profiling
  - [~] 10.1 Add `_profile_files_parallel(file_paths: list) -> tuple[str, str]` method to `DataAnalysisAgent` using `ThreadPoolExecutor` with `max_parallel_profiles` workers
  - [~] 10.2 Update `DataAnalysisAgent.analyze()` to call `_profile_files_parallel` when multiple files are provided, replacing the sequential `build_safe_profile` + `build_local_profile` calls
## Priority 7: Testing

- [ ] 11. Write property-based tests
  - [ ] 11.1 ~PBT~ Property test for error classification correctness (Property 1) using `hypothesis`
  - [ ] 11.2 ~PBT~ Property test for enriched hint content and privacy (Property 3) using `hypothesis`
  - [ ] 11.3 ~PBT~ Property test for env var config override (Property 4) using `hypothesis`
  - [ ] 11.4 ~PBT~ Property test for sliding window trimming invariants (Property 5) using `hypothesis`
  - [ ] 11.5 ~PBT~ Property test for trimming summary content (Property 6) using `hypothesis`
  - [ ] 11.6 ~PBT~ Property test for template prompt integration (Property 7) using `hypothesis`
  - [ ] 11.7 ~PBT~ Property test for invalid template error (Property 8) using `hypothesis`
  - [ ] 11.8 ~PBT~ Property test for parallel profile merge with error resilience (Property 11) using `hypothesis`
- [ ] 12. Write unit and integration tests
  - [ ] 12.1 Unit tests for the error classifier with known error messages
  - [ ] 12.2 Unit tests for conversation trimming at boundary conditions
  - [ ] 12.3 Integration tests for `GET /api/templates` and `POST /api/start` with the template field
  - [ ] 12.4 Integration tests for `GET /api/status` progress fields
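The shape of the trimming-invariant property in task 11.4 can be sketched without `hypothesis` using stdlib random sampling; the invariants checked (first message preserved, bounded window, suffix order intact) are the ones implied by Requirements 4–5, while the `trim` helper here is a simplified stand-in:

```python
import random

def trim(messages: list, window_size: int) -> list:
    # Simplified stand-in for _trim_conversation_history.
    if len(messages) <= 2 * window_size:
        return messages
    return messages[:1] + messages[-2 * window_size:]

def check_invariants(trials: int = 200) -> None:
    rng = random.Random(0)
    for _ in range(trials):
        n, w = rng.randint(1, 60), rng.randint(1, 10)
        msgs = list(range(n))
        kept = trim(msgs, w)
        assert kept[0] == msgs[0]                 # first message preserved
        assert len(kept) <= 2 * w + 1             # bounded by window + head
        assert kept[1:] == msgs[len(msgs) - len(kept) + 1:]  # suffix intact
```

With `hypothesis`, the same checks would become a `@given(lists(...), integers(...))` test rather than a hand-rolled loop.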