# Design Document: Agent Robustness Optimization

## Overview

This design addresses five areas of improvement for the AI Data Analysis Agent: data privacy fallback recovery, conversation history trimming, analysis template integration, frontend progress display, and multi-file chunked/parallel loading. The changes span the Python backend (`data_analysis_agent.py`, `config/app_config.py`, `utils/data_privacy.py`, `utils/data_loader.py`, `web/main.py`) and the vanilla JS frontend (`web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css`).

The core design principle is **minimal invasiveness**: each feature is implemented as a composable module or method that plugs into the existing agent loop, avoiding large-scale refactors of the `DataAnalysisAgent.analyze()` main loop.

## Architecture

The system follows a layered architecture: the `DataAnalysisAgent` orchestrates LLM calls and code execution, the FastAPI server manages sessions and exposes APIs, and the frontend polls for status updates.

```mermaid
graph TD
    subgraph Frontend
        UI[script.js + index.html]
    end

    subgraph FastAPI Server
        API[web/main.py]
        SM[SessionManager]
    end

    subgraph Agent Core
        DA[DataAnalysisAgent]
        EC[ErrorClassifier]
        HG[HintGenerator]
        HT[HistoryTrimmer]
        TI[TemplateIntegration]
    end

    subgraph Utilities
        DP[data_privacy.py]
        DL[data_loader.py]
        AT[analysis_templates.py]
        CE[code_executor.py]
    end

    subgraph Config
        AC[app_config.py]
    end

    UI -->|POST /api/start, GET /api/status, GET /api/templates| API
    API --> SM
    API --> DA
    DA --> EC
    DA --> HG
    DA --> HT
    DA --> TI
    DA --> CE
    HG --> DP
    DL --> AC
    DA --> DL
    TI --> AT
    EC --> AC
    HT --> AC
```

### Change Impact Summary

| Area | Files Modified | New Files |
|------|---------------|-----------|
| Data Privacy Fallback | `data_analysis_agent.py`, `utils/data_privacy.py`, `config/app_config.py` | None |
| Conversation Trimming | `data_analysis_agent.py`, `config/app_config.py` | None |
| Template System | `data_analysis_agent.py`, `web/main.py`, `web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css` | None |
| Progress Bar | `web/main.py`, `web/static/script.js`, `web/static/index.html`, `web/static/clean_style.css` | None |
| Multi-File Loading | `utils/data_loader.py`, `data_analysis_agent.py`, `config/app_config.py` | None |

## Components and Interfaces

### 1. Error Classifier (`data_analysis_agent.py`)

A new method `_classify_error(error_message: str) -> str` on `DataAnalysisAgent` inspects error messages and returns `"data_context"` or `"other"`.

```python
import re

# Patterns indicating the error stems from missing data context
# (wrong column name, empty DataFrame, undefined data variable, etc.)
DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def _classify_error(self, error_message: str) -> str:
    """Classify an execution error as data-context or other."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"
```

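As a quick sanity check, the classification logic can be exercised standalone. This sketch copies the patterns from above into a free function (the real method lives on `DataAnalysisAgent`):

```python
import re

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def classify_error(error_message: str) -> str:
    # Standalone mirror of DataAnalysisAgent._classify_error
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"

print(classify_error("KeyError: 'revenue'"))                  # data_context
print(classify_error("ValueError: unknown column 'x'"))       # data_context
print(classify_error("ZeroDivisionError: division by zero"))  # other
```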
### 2. Enriched Hint Generator (`utils/data_privacy.py`)

A new function `generate_enriched_hint(error_message: str, safe_profile: str) -> str` extracts the referenced column name from the error, looks it up in the safe profile, and returns a hint string containing only schema-level metadata.

```python
import re
from typing import Optional

def generate_enriched_hint(error_message: str, safe_profile: str) -> str:
    """
    Generate an enriched hint from the safe profile for a data-context error.

    Returns schema-level metadata only — no real data values.
    """
    column_name = _extract_column_from_error(error_message)
    column_meta = _lookup_column_in_profile(column_name, safe_profile)

    # "[RETRY CONTEXT] The previous code execution failed due to a data-context error."
    hint = "[RETRY CONTEXT] 上一次代码执行因数据上下文错误失败。\n"
    hint += f"错误信息: {error_message}\n"  # "Error message: ..."
    if column_meta:
        # Schema-level info for the referenced column:
        # dtype, unique count, null rate, categorical description
        hint += f"相关列 '{column_name}' 的结构信息:\n"
        hint += f"  - 数据类型: {column_meta['dtype']}\n"
        hint += f"  - 唯一值数量: {column_meta['unique_count']}\n"
        hint += f"  - 空值率: {column_meta['null_rate']}\n"
        hint += f"  - 特征描述: {column_meta['description']}\n"
        # "Fix the code based on the structural info above;
        #  do not assume concrete data values."
        hint += "请根据以上结构信息修正代码,不要假设具体的数据值。"
    return hint

def _extract_column_from_error(error_message: str) -> Optional[str]:
    """Extract a column name from patterns like KeyError: 'col_name'."""
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    if match:
        return match.group(1)
    match = re.search(r"column\s+['\"](.+?)['\"]", error_message, re.IGNORECASE)
    if match:
        return match.group(1)
    return None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    """Look up column metadata in the safe profile's markdown table."""
    if not column_name:
        return None
    # Parse the markdown table rows for the matching column
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {
                    "dtype": parts[1],
                    "null_rate": parts[2],
                    "unique_count": parts[3],
                    "description": parts[4],
                }
    return None
```

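To illustrate the table lookup, here is a standalone sketch run against a made-up Safe_Profile table (the column names and values are invented for the example):

```python
# Hypothetical Safe_Profile markdown table:
# columns are name | dtype | null rate | unique count | description
SAFE_PROFILE = """
| 列名 | 数据类型 | 空值率 | 唯一值数量 | 特征描述 |
|------|---------|--------|-----------|---------|
| order_id | int64 | 0.0% | 10000 | 高基数标识列 |
| region | object | 1.2% | 6 | 低基数分类列 |
"""

def lookup_column(column_name: str, safe_profile: str):
    # Standalone mirror of _lookup_column_in_profile
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {"dtype": parts[1], "null_rate": parts[2],
                        "unique_count": parts[3], "description": parts[4]}
    return None

meta = lookup_column("region", SAFE_PROFILE)
print(meta["dtype"], meta["unique_count"])  # object 6
```

Note that only schema-level strings from the table ever reach the hint; no cell of real data is read.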
### 3. Conversation History Trimmer (`data_analysis_agent.py`)

A new method `_trim_conversation_history()` on `DataAnalysisAgent` implements sliding-window trimming with summary compression.

```python
def _trim_conversation_history(self):
    """Apply sliding-window trimming to the conversation history."""
    window_size = app_config.conversation_window_size
    max_messages = window_size * 2  # pairs of user+assistant messages

    if len(self.conversation_history) <= max_messages:
        return  # No trimming needed

    first_message = self.conversation_history[0]  # Always retain

    # Determine trim boundary: skip first message + possible existing summary
    start_idx = 1
    has_existing_summary = (
        len(self.conversation_history) > 1
        and self.conversation_history[1]["role"] == "user"
        and self.conversation_history[1]["content"].startswith("[分析摘要]")  # "[Analysis summary]"
    )
    if has_existing_summary:
        start_idx = 2

    # Messages to trim vs. keep
    messages_to_consider = self.conversation_history[start_idx:]
    messages_to_trim = messages_to_consider[:-max_messages]
    messages_to_keep = messages_to_consider[-max_messages:]

    if not messages_to_trim:
        return

    # Generate a summary of the trimmed messages
    summary = self._compress_trimmed_messages(messages_to_trim)

    # Rebuild history: first_message + summary + recent messages
    self.conversation_history = [first_message]
    if summary:
        self.conversation_history.append({"role": "user", "content": summary})
    self.conversation_history.extend(messages_to_keep)

def _compress_trimmed_messages(self, messages: list) -> str:
    """Compress trimmed messages into a summary string."""
    # Header: "[Analysis summary] Outline of earlier analysis rounds:"
    summary_parts = ["[分析摘要] 以下是之前分析轮次的概要:"]
    round_num = 0

    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant":
            round_num += 1
            # Extract the action type from the YAML-like content
            action = "generate_code"
            if 'action: "collect_figures"' in content or "action: collect_figures" in content:
                action = "collect_figures"
            elif 'action: "analysis_complete"' in content or "action: analysis_complete" in content:
                action = "analysis_complete"
            summary_parts.append(f"- 轮次{round_num}: 动作={action}")  # "Round N: action=..."
        elif msg["role"] == "user" and "代码执行反馈" in content and round_num > 0:
            # "代码执行反馈" = execution feedback; record 成功/失败 (success/failure)
            success = "失败" if "[ERROR]" in content or "执行错误" in content else "成功"
            summary_parts[-1] += f", 执行结果={success}"

    return "\n".join(summary_parts)
```

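A standalone sketch of the window arithmetic (with summary generation stubbed to a marker) shows the intended invariant: the first message survives at index 0 and exactly `2 * window_size` recent messages are kept in full:

```python
def trim(history: list, window_size: int) -> list:
    # Mirror of _trim_conversation_history, summary stubbed out
    max_messages = window_size * 2
    if len(history) <= max_messages:
        return history
    first, rest = history[0], history[1:]
    trimmed, kept = rest[:-max_messages], rest[-max_messages:]
    if not trimmed:
        return history
    return [first, {"role": "user", "content": "[分析摘要] ..."}] + kept

history = [{"role": "user", "content": f"msg{i}"} for i in range(11)]
result = trim(history, window_size=2)
print(len(result))           # 6: first + summary + 4 recent messages
print(result[0]["content"])  # msg0
print(result[-1]["content"]) # msg10
```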
### 4. Template Integration (`data_analysis_agent.py` + `web/main.py`)

The `analyze()` method gains an optional `template_name` parameter. When provided, the template prompt is prepended to the user requirement.

**Agent side:**

```python
def analyze(self, user_input: str, files=None, session_output_dir=None,
            reset_session=True, max_rounds=None, template_name=None):
    # ... existing init code ...
    if template_name:
        from utils.analysis_templates import get_template
        template = get_template(template_name)  # Raises ValueError if invalid
        template_prompt = template.get_full_prompt()
        user_input = f"{template_prompt}\n\n{user_input}"
    # ... rest of analyze ...
```

**API side (`web/main.py`):**

```python
# New endpoint
@app.get("/api/templates")
async def list_available_templates():
    from utils.analysis_templates import list_templates
    return {"templates": list_templates()}

# Modified StartRequest
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None
```

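A minimal sketch of the registry lookup the agent relies on, including the descriptive failure required by Property 8. The registry keys and values here are invented for illustration; the real `TEMPLATE_REGISTRY` lives in `utils/analysis_templates.py`:

```python
# Illustrative registry; real entries are template objects, not strings
TEMPLATE_REGISTRY = {"sales_analysis": "...", "cohort_analysis": "..."}

def get_template(name: str):
    # Property 8: an unknown name raises ValueError listing available templates
    if name not in TEMPLATE_REGISTRY:
        available = ", ".join(sorted(TEMPLATE_REGISTRY))
        raise ValueError(f"Unknown template '{name}'. Available: {available}")
    return TEMPLATE_REGISTRY[name]

try:
    get_template("nonexistent")
except ValueError as e:
    print(e)  # Unknown template 'nonexistent'. Available: cohort_analysis, sales_analysis
```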
### 5. Progress Bar Integration

**Backend (`web/main.py`):** Update `run_analysis_task` to set progress fields on `SessionData` via a callback or by polling the agent's `current_round`. The simplest approach is to add a progress callback to the agent.

```python
# In DataAnalysisAgent
def set_progress_callback(self, callback):
    """Set a callback function(current_round, max_rounds, message) for progress updates."""
    self._progress_callback = callback

# Called at the start of each round in the analyze() loop:
if getattr(self, '_progress_callback', None):
    self._progress_callback(self.current_round, self.max_rounds,
                            f"第{self.current_round}轮分析中...")  # "Round N in progress..."
```

**Backend (`web/main.py`):** In `run_analysis_task`, wire the callback:

```python
def progress_cb(current, total, message):
    session.current_round = current
    session.max_rounds = total
    session.progress_percentage = round((current / total) * 100, 1) if total > 0 else 0
    session.status_message = message

agent.set_progress_callback(progress_cb)
```

**API response:** Add progress fields to `GET /api/status`:

```python
return {
    "is_running": session.is_running,
    "log": log_content,
    "has_report": ...,
    "current_round": session.current_round,
    "max_rounds": session.max_rounds,
    "progress_percentage": session.progress_percentage,
    "status_message": session.status_message,
    ...
}
```

**Frontend (`script.js`):** During polling, render a progress bar while `is_running` is true:

```javascript
// In the polling callback:
if (data.is_running) {
    updateProgressBar(data.progress_percentage, data.status_message);
}
```

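The percentage computation in `progress_cb` can be checked in isolation, including the guard against division by zero before the first round:

```python
def progress_percentage(current: int, total: int) -> float:
    # Same formula as progress_cb in run_analysis_task
    return round((current / total) * 100, 1) if total > 0 else 0

print(progress_percentage(3, 8))  # 37.5
print(progress_percentage(0, 0))  # 0
```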
### 6. Multi-File Chunked & Parallel Loading

**Chunked loading enhancement (`utils/data_loader.py`):**

```python
import os
import pandas as pd

def load_and_profile_data_smart(file_paths: list, max_file_size_mb: int = None) -> str:
    """Smart loader: chunked reading for large files, regular loading for small ones."""
    if max_file_size_mb is None:
        max_file_size_mb = app_config.max_file_size_mb

    profile_summary = "# 数据画像报告 (Data Profile)\n\n"
    for file_path in file_paths:
        file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
        if file_size_mb > max_file_size_mb:
            profile_summary += _profile_chunked(file_path)
        else:
            profile_summary += _profile_full(file_path)
    return profile_summary

def _profile_chunked(file_path: str) -> str:
    """Profile a large file from its first chunk plus samples of later chunks."""
    chunks = load_data_chunked(file_path)
    first_chunk = next(chunks, None)
    if first_chunk is None:
        return f"[ERROR] 无法读取文件: {file_path}\n"  # "Unable to read file"

    # Keep the first chunk in full, then sample from subsequent chunks
    sample_chunks = [first_chunk]
    for i, chunk in enumerate(chunks):
        if i % 5 == 0:  # Sample every 5th chunk
            sample_chunks.append(chunk.sample(min(100, len(chunk))))

    combined = pd.concat(sample_chunks, ignore_index=True)
    # Generate the profile from the combined sample
    return _generate_profile_for_df(combined, file_path, sampled=True)
```

**Parallel profiling (`data_analysis_agent.py`):**

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def _profile_files_parallel(self, file_paths: list) -> tuple[str, str]:
    """Profile multiple files concurrently."""
    max_workers = app_config.max_parallel_profiles
    safe_profiles = []
    local_profiles = []

    def profile_single(path):
        safe = build_safe_profile([path])
        local = build_local_profile([path])
        return path, safe, local

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(profile_single, p): p for p in file_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                _, safe, local = future.result()
                safe_profiles.append(safe)
                local_profiles.append(local)
            except Exception as e:
                # "File: ... [ERROR] profiling failed: ..."
                error_entry = f"## 文件: {os.path.basename(path)}\n[ERROR] 分析失败: {e}\n\n"
                safe_profiles.append(error_entry)
                local_profiles.append(error_entry)

    return "\n".join(safe_profiles), "\n".join(local_profiles)
```

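The loader-selection branch reduces to a pure size comparison (Property 9), sketched here with an invented helper name so it can be tested without touching the filesystem:

```python
def choose_loader(file_size_bytes: int, max_file_size_mb: int) -> str:
    # Mirrors the branch in load_and_profile_data_smart
    file_size_mb = file_size_bytes / (1024 * 1024)
    return "chunked" if file_size_mb > max_file_size_mb else "full"

print(choose_loader(300 * 1024 * 1024, 100))  # chunked
print(choose_loader(5 * 1024 * 1024, 100))    # full
```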
## Data Models

### AppConfig Extensions (`config/app_config.py`)

```python
import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    # ... existing fields ...

    # New fields
    max_data_context_retries: int = field(default=2)
    conversation_window_size: int = field(default=10)
    max_parallel_profiles: int = field(default=4)

    @classmethod
    def from_env(cls) -> 'AppConfig':
        config = cls()
        # ... existing env overrides ...
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        if val := os.getenv("APP_MAX_PARALLEL_PROFILES"):
            config.max_parallel_profiles = int(val)
        return config
```

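The env-override behavior (Property 4) can be demonstrated with a minimal stand-in dataclass; `MiniConfig` is invented here purely to keep the sketch self-contained:

```python
import os
from dataclasses import dataclass, field

@dataclass
class MiniConfig:  # stand-in for AppConfig
    max_data_context_retries: int = field(default=2)

    @classmethod
    def from_env(cls) -> "MiniConfig":
        config = cls()
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        return config

os.environ["APP_MAX_DATA_CONTEXT_RETRIES"] = "5"
print(MiniConfig.from_env().max_data_context_retries)  # 5
```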
### StartRequest Extension (`web/main.py`)

```python
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None  # New field
```

### SessionData Progress Fields (already exist, just need wiring)

The `SessionData` class already has `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields; they only need to be updated during analysis and included in the `/api/status` response.

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Error Classification Correctness

*For any* error message string, if it contains a data-context pattern (KeyError on a column name, ValueError on column values, NameError for data variables, or empty DataFrame conditions), `_classify_error` SHALL return `"data_context"`; otherwise it SHALL return `"other"`.

**Validates: Requirements 1.1**

### Property 2: Retry Below Limit Produces Enriched Hint

*For any* `max_data_context_retries` value and any current retry count strictly less than that value, when a data-context error is detected, the agent SHALL produce an enriched hint message rather than forwarding the raw error.

**Validates: Requirements 1.3**

### Property 3: Enriched Hint Contains Correct Column Metadata Without Real Data

*For any* error message referencing a column name present in the Safe_Profile, the generated enriched hint SHALL contain that column's data type, unique value count, null rate, and categorical description, and SHALL NOT contain any real data values (min, max, mean, sample rows) from the Local_Profile.

**Validates: Requirements 2.1, 2.2, 2.4**

### Property 4: Environment Variable Override for Config Fields

*For any* positive integer value set as the `APP_MAX_DATA_CONTEXT_RETRIES` environment variable, `AppConfig.from_env()` SHALL produce a config where `max_data_context_retries` equals that integer value.

**Validates: Requirements 3.2**

### Property 5: Sliding Window Trimming Preserves First Message and Retains Recent Pairs

*For any* conversation history whose length exceeds `2 * conversation_window_size` and any `conversation_window_size >= 1`, after trimming: (a) the first user message is always retained at index 0, and (b) the most recent `conversation_window_size` message pairs are retained in full.

**Validates: Requirements 4.2, 4.3**

### Property 6: Trimming Summary Contains Round Info and Excludes Code/Raw Output

*For any* set of trimmed conversation messages, the generated summary SHALL list each trimmed round's action type and execution success/failure, and SHALL NOT contain any code blocks (`` ``` `` markers) or raw execution output.

**Validates: Requirements 4.4, 5.1, 5.2**

### Property 7: Template Prompt Integration

*For any* valid template name in `TEMPLATE_REGISTRY` and any user requirement string, the initial conversation message SHALL contain the template's `get_full_prompt()` output prepended to the user requirement.

**Validates: Requirements 6.1, 6.2**

### Property 8: Invalid Template Name Raises Descriptive Error

*For any* string that is not a key in `TEMPLATE_REGISTRY`, calling `get_template()` SHALL raise a `ValueError` whose message contains the list of available template names.

**Validates: Requirements 6.3**

### Property 9: Chunked Loading Threshold

*For any* file path and `max_file_size_mb` threshold, if the file's size in MB exceeds the threshold, the smart loader SHALL use chunked loading; otherwise it SHALL use full loading.

**Validates: Requirements 10.1**

### Property 10: Chunked Profiling Uses First Chunk Plus Samples

*For any* file loaded in chunked mode, the generated profile SHALL be based on the first chunk plus sampled rows from subsequent chunks, not on the entire file loaded into memory.

**Validates: Requirements 10.3**

### Property 11: Parallel Profile Merge With Error Resilience

*For any* set of file paths where some are valid and some are invalid/corrupted, the merged profile output SHALL contain valid profile entries for successful files and error entries for failed files, with no files missing from the output.

**Validates: Requirements 11.2, 11.3**

## Error Handling

| Scenario | Handling Strategy |
|----------|------------------|
| Data-context error below retry limit | Generate enriched hint, retry with LLM |
| Data-context error at retry limit | Fall back to normal sanitized error forwarding |
| Invalid template name | Raise `ValueError` with available template list |
| File too large for memory | Automatically switch to chunked loading |
| Chunked loading fails | Return descriptive error, continue with other files |
| Single file profiling fails in parallel | Include error entry, continue profiling remaining files |
| Conversation history exceeds window | Trim old messages, generate compressed summary |
| Summary generation fails | Log warning, proceed without summary (graceful degradation) |
| Progress callback fails | Log warning, analysis continues without progress updates |

## Testing Strategy

### Property-Based Tests (using `hypothesis`)

Each correctness property maps to a property-based test with a minimum of 100 iterations. The test library is `hypothesis` (Python).

- **Property 1**: Generate random error strings with/without data-context patterns → verify classification
- **Property 2**: Generate random retry counts and limits → verify hint vs. raw-error behavior
- **Property 3**: Generate random Safe_Profile tables and error messages → verify hint content and absence of real data
- **Property 4**: Generate random positive integers → set env var → verify config
- **Property 5**: Generate random conversation histories and window sizes → verify trimming invariants
- **Property 6**: Generate random trimmed message sets → verify summary content and absence of code blocks
- **Property 7**: Pick random valid template names and requirement strings → verify prompt construction
- **Property 8**: Generate random strings not in registry → verify `ValueError`
- **Property 9**: Generate random file sizes and thresholds → verify loading-method selection
- **Property 10**: Generate random chunked data → verify profile source
- **Property 11**: Generate random file sets with failures → verify merged output

Tag format: `Feature: agent-robustness-optimization, Property {N}: {title}`

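A dependency-free sketch of the Property 1 test, using stdlib `random` in place of `hypothesis` so it runs without extra packages (the real tests should use `@given` strategies):

```python
import random
import re

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def classify_error(error_message: str) -> str:
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"

random.seed(0)
for _ in range(100):
    # Errors built from a data-context pattern must classify as data_context;
    # unrelated runtime errors must classify as other
    col = random.choice(["sales", "日期", "user_id"])
    assert classify_error(f"KeyError: '{col}'") == "data_context"
    assert classify_error(f"TimeoutError: request {random.randint(1, 999)} timed out") == "other"
print("100 iterations passed")
```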
### Unit Tests

- Error classifier with specific known error messages (KeyError, ValueError, NameError, generic errors)
- Enriched hint generation with known column profiles
- Conversation trimming with exact message counts at boundary conditions
- Template retrieval for each registered template
- Progress callback wiring
- API endpoint response shapes (`GET /api/templates`, `GET /api/status` with progress fields)

### Integration Tests

- `GET /api/templates` returns all registered templates
- `POST /api/start` with `template` field passes the template to the agent
- `GET /api/status` includes progress fields during analysis
- Multi-file parallel profiling with real CSV files
- End-to-end: start analysis with a template → verify the template prompt appears in the conversation history