Design Document: Agent Robustness Optimization
Overview
This design addresses five areas of improvement for the AI Data Analysis Agent: data privacy fallback recovery, conversation history trimming, analysis template integration, frontend progress display, and multi-file chunked/parallel loading. The changes span the Python backend (data_analysis_agent.py, config/app_config.py, utils/data_privacy.py, utils/data_loader.py, web/main.py) and the vanilla JS frontend (web/static/script.js, web/static/index.html, web/static/clean_style.css).
The core design principle is minimal invasiveness: each feature is implemented as a composable module or method that plugs into the existing agent loop, avoiding large-scale refactors of the DataAnalysisAgent.analyze() main loop.
Architecture
The system follows a layered architecture where the DataAnalysisAgent orchestrates LLM calls and code execution, the FastAPI server manages sessions and exposes APIs, and the frontend polls for status updates.
graph TD
subgraph Frontend
UI[script.js + index.html]
end
subgraph FastAPI Server
API[web/main.py]
SM[SessionManager]
end
subgraph Agent Core
DA[DataAnalysisAgent]
EC[ErrorClassifier]
HG[HintGenerator]
HT[HistoryTrimmer]
TI[TemplateIntegration]
end
subgraph Utilities
DP[data_privacy.py]
DL[data_loader.py]
AT[analysis_templates.py]
CE[code_executor.py]
end
subgraph Config
AC[app_config.py]
end
UI -->|POST /api/start, GET /api/status, GET /api/templates| API
API --> SM
API --> DA
DA --> EC
DA --> HG
DA --> HT
DA --> TI
DA --> CE
HG --> DP
DL --> AC
DA --> DL
TI --> AT
EC --> AC
HT --> AC
Change Impact Summary
| Area | Files Modified | New Files |
|---|---|---|
| Data Privacy Fallback | data_analysis_agent.py, utils/data_privacy.py, config/app_config.py | None |
| Conversation Trimming | data_analysis_agent.py, config/app_config.py | None |
| Template System | data_analysis_agent.py, web/main.py, web/static/script.js, web/static/index.html, web/static/clean_style.css | None |
| Progress Bar | web/main.py, web/static/script.js, web/static/index.html, web/static/clean_style.css | None |
| Multi-File Loading | utils/data_loader.py, data_analysis_agent.py, config/app_config.py | None |
Components and Interfaces
1. Error Classifier (data_analysis_agent.py)
A new method _classify_error(error_message: str) -> str on DataAnalysisAgent that inspects error messages and returns "data_context" or "other".
import re

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def _classify_error(self, error_message: str) -> str:
    """Classify execution error as data-context or other."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"
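As a quick illustration, the classifier can be exercised standalone. This sketch inlines the same patterns; the sample error strings are hypothetical:

```python
import re

# Same patterns as DATA_CONTEXT_PATTERNS in the design above
DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def classify_error(error_message: str) -> str:
    """Standalone version of _classify_error for demonstration."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"

print(classify_error("KeyError: 'sales_amount'"))             # data_context
print(classify_error("NameError: name 'df' is not defined"))  # data_context
print(classify_error("ZeroDivisionError: division by zero"))  # other
```

Unmatched error types fall through to "other", which triggers the existing sanitized error-forwarding path.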
2. Enriched Hint Generator (utils/data_privacy.py)
A new function generate_enriched_hint(error_message: str, safe_profile: str) -> str that extracts the referenced column name from the error, looks it up in the safe profile, and returns a hint string containing only schema-level metadata.
import re
from typing import Optional

def generate_enriched_hint(error_message: str, safe_profile: str) -> str:
    """
    Generate an enriched hint from the safe profile for a data-context error.
    Returns schema-level metadata only; no real data values.
    """
    column_name = _extract_column_from_error(error_message)
    column_meta = _lookup_column_in_profile(column_name, safe_profile)
    hint = "[RETRY CONTEXT] 上一次代码执行因数据上下文错误失败。\n"
    hint += f"错误信息: {error_message}\n"
    if column_meta:
        hint += f"相关列 '{column_name}' 的结构信息:\n"
        hint += f" - 数据类型: {column_meta['dtype']}\n"
        hint += f" - 唯一值数量: {column_meta['unique_count']}\n"
        hint += f" - 空值率: {column_meta['null_rate']}\n"
        hint += f" - 特征描述: {column_meta['description']}\n"
    hint += "请根据以上结构信息修正代码,不要假设具体的数据值。"
    return hint

def _extract_column_from_error(error_message: str) -> Optional[str]:
    """Extract column name from error message patterns like KeyError: 'col_name'."""
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    if match:
        return match.group(1)
    match = re.search(r"column\s+['\"](.+?)['\"]", error_message, re.IGNORECASE)
    if match:
        return match.group(1)
    return None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    """Look up column metadata in the safe profile markdown table."""
    if not column_name:
        return None
    # Parse the markdown table rows for the matching column
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {
                    "dtype": parts[1],
                    "null_rate": parts[2],
                    "unique_count": parts[3],
                    "description": parts[4],
                }
    return None
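For example, given a safe-profile row for a hypothetical `age` column, the two helpers resolve a KeyError to schema-level metadata. This sketch repeats the helpers so it runs standalone; the profile contents are made up:

```python
import re
from typing import Optional

def _extract_column_from_error(error_message: str) -> Optional[str]:
    # Matches patterns like: KeyError: 'col_name'
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    return match.group(1) if match else None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    # Scan markdown table rows for a first-cell match on the column name
    if not column_name:
        return None
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {"dtype": parts[1], "null_rate": parts[2],
                        "unique_count": parts[3], "description": parts[4]}
    return None

SAFE_PROFILE = """\
| column | dtype | null_rate | unique_count | description |
|---|---|---|---|---|
| age | int64 | 0.0% | 42 | numeric feature |
"""

col = _extract_column_from_error("KeyError: 'age'")
meta = _lookup_column_in_profile(col, SAFE_PROFILE)
print(col, meta["dtype"], meta["unique_count"])  # age int64 42
```

Note the hint carries only the metadata row; no values from the Local_Profile ever enter the retry prompt (Property 3).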
3. Conversation History Trimmer (data_analysis_agent.py)
A new method _trim_conversation_history() on DataAnalysisAgent that implements sliding window trimming with summary compression.
def _trim_conversation_history(self):
    """Apply sliding window trimming to conversation history."""
    window_size = app_config.conversation_window_size
    max_messages = window_size * 2  # pairs of user+assistant messages
    if len(self.conversation_history) <= max_messages:
        return  # No trimming needed
    first_message = self.conversation_history[0]  # Always retain
    # Determine trim boundary: skip first message + possible existing summary
    start_idx = 1
    has_existing_summary = (
        len(self.conversation_history) > 1
        and self.conversation_history[1]["role"] == "user"
        and self.conversation_history[1]["content"].startswith("[分析摘要]")
    )
    if has_existing_summary:
        start_idx = 2
    # Messages to trim vs keep
    messages_to_consider = self.conversation_history[start_idx:]
    messages_to_trim = messages_to_consider[:-max_messages]
    messages_to_keep = messages_to_consider[-max_messages:]
    if not messages_to_trim:
        return
    # Generate summary of trimmed messages
    summary = self._compress_trimmed_messages(messages_to_trim)
    # Rebuild history: first_message + summary + recent messages
    self.conversation_history = [first_message]
    if summary:
        self.conversation_history.append({"role": "user", "content": summary})
    self.conversation_history.extend(messages_to_keep)

def _compress_trimmed_messages(self, messages: list) -> str:
    """Compress trimmed messages into a summary string."""
    summary_parts = ["[分析摘要] 以下是之前分析轮次的概要:"]
    round_num = 0
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant":
            round_num += 1
            # Extract action type from YAML-like content
            action = "generate_code"
            if 'action: "collect_figures"' in content or "action: collect_figures" in content:
                action = "collect_figures"
            elif 'action: "analysis_complete"' in content or "action: analysis_complete" in content:
                action = "analysis_complete"
            summary_parts.append(f"- 轮次{round_num}: 动作={action}")
        elif msg["role"] == "user" and "代码执行反馈" in content:
            success = "失败" if "[ERROR]" in content or "执行错误" in content else "成功"
            summary_parts[-1] += f", 执行结果={success}"
    return "\n".join(summary_parts)
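The trimming behavior can be sketched end to end with a synthetic history. This is a simplified standalone version of the two methods: the window size is passed as a parameter instead of read from app_config, and the summary is collapsed to a round count:

```python
def trim_history(history: list, window_size: int) -> list:
    """Standalone sketch of _trim_conversation_history."""
    max_messages = window_size * 2  # user+assistant pairs
    if len(history) <= max_messages:
        return history
    first = history[0]  # Always retain the initial user requirement
    start_idx = 2 if (len(history) > 1 and history[1]["role"] == "user"
                      and history[1]["content"].startswith("[分析摘要]")) else 1
    rest = history[start_idx:]
    to_trim, to_keep = rest[:-max_messages], rest[-max_messages:]
    if not to_trim:
        return history
    rounds = sum(1 for m in to_trim if m["role"] == "assistant")
    summary = f"[分析摘要] 以下是之前分析轮次的概要: 共{rounds}轮"
    return [first, {"role": "user", "content": summary}] + to_keep

# First message + 4 assistant/user round pairs, with a window of 2 pairs
history = [{"role": "user", "content": "分析这份数据"}]
for i in range(4):
    history.append({"role": "assistant", "content": f"action: generate_code (round {i+1})"})
    history.append({"role": "user", "content": f"代码执行反馈: round {i+1} ok"})

trimmed = trim_history(history, window_size=2)
print(len(trimmed))  # 6: first message + summary + last 4 messages
```

The invariants of Property 5 are visible here: index 0 still holds the original requirement, and the last `window_size * 2` messages survive untouched.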
4. Template Integration (data_analysis_agent.py + web/main.py)
The analyze() method gains an optional template_name parameter. When provided, the template prompt is prepended to the user requirement.
Agent side:
def analyze(self, user_input: str, files=None, session_output_dir=None,
            reset_session=True, max_rounds=None, template_name=None):
    # ... existing init code ...
    if template_name:
        from utils.analysis_templates import get_template
        template = get_template(template_name)  # Raises ValueError if invalid
        template_prompt = template.get_full_prompt()
        user_input = f"{template_prompt}\n\n{user_input}"
    # ... rest of analyze ...
API side (web/main.py):
# New endpoint
@app.get("/api/templates")
async def list_available_templates():
    from utils.analysis_templates import list_templates
    return {"templates": list_templates()}

# Modified StartRequest
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None
5. Progress Bar Integration
Backend (web/main.py): Update run_analysis_task to set progress fields on SessionData via a callback or by polling the agent's current_round. The simplest approach is to add a progress callback to the agent.
# In DataAnalysisAgent
def set_progress_callback(self, callback):
    """Set a callback function(current_round, max_rounds, message) for progress updates."""
    self._progress_callback = callback

# Called at the start of each round in the analyze() loop:
if hasattr(self, '_progress_callback') and self._progress_callback:
    self._progress_callback(self.current_round, self.max_rounds,
                            f"第{self.current_round}轮分析中...")
Backend (web/main.py): In run_analysis_task, wire the callback:
def progress_cb(current, total, message):
    session.current_round = current
    session.max_rounds = total
    session.progress_percentage = round((current / total) * 100, 1) if total > 0 else 0
    session.status_message = message

agent.set_progress_callback(progress_cb)
API response: Add progress fields to GET /api/status:
return {
    "is_running": session.is_running,
    "log": log_content,
    "has_report": ...,
    "current_round": session.current_round,
    "max_rounds": session.max_rounds,
    "progress_percentage": session.progress_percentage,
    "status_message": session.status_message,
    ...
}
Frontend (script.js): During polling, render a progress bar when is_running is true:
// In the polling callback:
if (data.is_running) {
  updateProgressBar(data.progress_percentage, data.status_message);
}
6. Multi-File Chunked & Parallel Loading
Chunked loading enhancement (utils/data_loader.py):
def load_and_profile_data_smart(file_paths: list, max_file_size_mb: int = None) -> str:
    """Smart loader: uses chunked reading for large files, regular for small."""
    if max_file_size_mb is None:
        max_file_size_mb = app_config.max_file_size_mb
    profile_summary = "# 数据画像报告 (Data Profile)\n\n"
    for file_path in file_paths:
        file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
        if file_size_mb > max_file_size_mb:
            profile_summary += _profile_chunked(file_path)
        else:
            profile_summary += _profile_full(file_path)
    return profile_summary

def _profile_chunked(file_path: str) -> str:
    """Profile a large file by reading first chunk + sampling subsequent chunks."""
    chunks = load_data_chunked(file_path)
    first_chunk = next(chunks, None)
    if first_chunk is None:
        return f"[ERROR] 无法读取文件: {file_path}\n"
    # Sample from subsequent chunks
    sample_rows = [first_chunk]
    for i, chunk in enumerate(chunks):
        if i % 5 == 0:  # Sample every 5th chunk
            sample_rows.append(chunk.sample(min(100, len(chunk))))
    combined = pd.concat(sample_rows, ignore_index=True)
    # Generate profile from combined sample
    return _generate_profile_for_df(combined, file_path, sampled=True)
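The first-chunk-plus-samples strategy reduces to a simple pattern over any chunk iterator. A pandas-free sketch, with chunks modeled as lists of rows and the sample size and stride taken from the design above:

```python
import random

def sample_chunks(chunks, sample_size=100, every_nth=5):
    """Keep the first chunk whole, sample from every Nth subsequent chunk."""
    it = iter(chunks)
    first = next(it, None)
    if first is None:
        return []
    rows = list(first)
    for i, chunk in enumerate(it):
        if i % every_nth == 0:
            rows.extend(random.sample(chunk, min(sample_size, len(chunk))))
    return rows

# 11 chunks of 10 rows each: the first chunk is kept whole,
# and of the remaining 10 chunks only i=0 and i=5 are sampled
chunks = [[(c, r) for r in range(10)] for c in range(11)]
combined = sample_chunks(chunks)
print(len(combined))  # 30 rows, far fewer than the 110 in the "file"
```

This bounds memory at roughly one chunk plus the accumulated samples, which is what Property 10 asserts.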
Parallel profiling (data_analysis_agent.py):
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def _profile_files_parallel(self, file_paths: list) -> tuple[str, str]:
    """Profile multiple files concurrently."""
    max_workers = app_config.max_parallel_profiles
    safe_profiles = []
    local_profiles = []

    def profile_single(path):
        safe = build_safe_profile([path])
        local = build_local_profile([path])
        return path, safe, local

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(profile_single, p): p for p in file_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                _, safe, local = future.result()
                safe_profiles.append(safe)
                local_profiles.append(local)
            except Exception as e:
                error_entry = f"## 文件: {os.path.basename(path)}\n[ERROR] 分析失败: {e}\n\n"
                safe_profiles.append(error_entry)
                local_profiles.append(error_entry)
    return "\n".join(safe_profiles), "\n".join(local_profiles)
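The error-resilience behavior (Property 11) can be demonstrated with a fake profiler that fails for some paths. The profile strings and failure condition here are placeholders; the executor wiring mirrors the method above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_profile(path: str) -> str:
    # Stand-in for build_safe_profile; fails for "corrupted" files
    if "bad" in path:
        raise ValueError("corrupted file")
    return f"## {path}: ok"

def profile_parallel(paths, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fake_profile, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as e:
                results[path] = f"## {path}: [ERROR] {e}"
    # Merge in input order so no file is missing from the output
    return "\n".join(results[p] for p in paths)

merged = profile_parallel(["a.csv", "bad.csv", "c.csv"])
print(merged)
```

One design note: because `as_completed` yields futures in completion order, collecting into a dict keyed by path and joining in input order keeps the merged profile deterministic regardless of thread scheduling.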
Data Models
AppConfig Extensions (config/app_config.py)
import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    # ... existing fields ...
    # New fields
    max_data_context_retries: int = field(default=2)
    conversation_window_size: int = field(default=10)
    max_parallel_profiles: int = field(default=4)

    @classmethod
    def from_env(cls) -> 'AppConfig':
        config = cls()
        # ... existing env overrides ...
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        if val := os.getenv("APP_MAX_PARALLEL_PROFILES"):
            config.max_parallel_profiles = int(val)
        return config
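Usage of the override mechanism (Property 4) looks like the following; `MiniConfig` is a stripped-down stand-in for AppConfig so the example runs standalone:

```python
import os
from dataclasses import dataclass

@dataclass
class MiniConfig:
    max_data_context_retries: int = 2
    conversation_window_size: int = 10

    @classmethod
    def from_env(cls) -> "MiniConfig":
        config = cls()
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        return config

os.environ["APP_MAX_DATA_CONTEXT_RETRIES"] = "5"
os.environ.pop("APP_CONVERSATION_WINDOW_SIZE", None)  # ensure unset for the demo
config = MiniConfig.from_env()
print(config.max_data_context_retries)  # 5 (from env var)
print(config.conversation_window_size)  # 10 (default, env var unset)
```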
StartRequest Extension (web/main.py)
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None  # New field
SessionData Progress Fields (already exist, just need wiring)
The SessionData class already has current_round, max_rounds, progress_percentage, and status_message fields. These just need to be updated during analysis and included in the /api/status response.
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Error Classification Correctness
For any error message string, if it contains a data-context pattern (KeyError on a column name, ValueError on column values, NameError for data variables, or empty DataFrame conditions), _classify_error SHALL return "data_context"; otherwise it SHALL return "other".
Validates: Requirements 1.1
Property 2: Retry Below Limit Produces Enriched Hint
For any max_data_context_retries value and any current retry count strictly less than that value, when a data-context error is detected, the agent SHALL produce an enriched hint message rather than forwarding the raw error.
Validates: Requirements 1.3
Property 3: Enriched Hint Contains Correct Column Metadata Without Real Data
For any error message referencing a column name present in the Safe_Profile, the generated enriched hint SHALL contain that column's data type, unique value count, null rate, and categorical description, and SHALL NOT contain any real data values (min, max, mean, sample rows) from the Local_Profile.
Validates: Requirements 2.1, 2.2, 2.4
Property 4: Environment Variable Override for Config Fields
For any positive integer value set as the APP_MAX_DATA_CONTEXT_RETRIES environment variable, AppConfig.from_env() SHALL produce a config where max_data_context_retries equals that integer value.
Validates: Requirements 3.2
Property 5: Sliding Window Trimming Preserves First Message and Retains Recent Pairs
For any conversation history whose length exceeds 2 * conversation_window_size and any conversation_window_size >= 1, after trimming: (a) the first user message is always retained at index 0, and (b) the most recent conversation_window_size message pairs are retained in full.
Validates: Requirements 4.2, 4.3
Property 6: Trimming Summary Contains Round Info and Excludes Code/Raw Output
For any set of trimmed conversation messages, the generated summary SHALL list each trimmed round's action type and execution success/failure, and SHALL NOT contain any code blocks (``` markers) or raw execution output.
Validates: Requirements 4.4, 5.1, 5.2
Property 7: Template Prompt Integration
For any valid template name in TEMPLATE_REGISTRY and any user requirement string, the initial conversation message SHALL contain the template's get_full_prompt() output prepended to the user requirement.
Validates: Requirements 6.1, 6.2
Property 8: Invalid Template Name Raises Descriptive Error
For any string that is not a key in TEMPLATE_REGISTRY, calling get_template() SHALL raise a ValueError whose message contains the list of available template names.
Validates: Requirements 6.3
Property 9: Chunked Loading Threshold
For any file path and max_file_size_mb threshold, if the file's size in MB exceeds the threshold, the smart loader SHALL use chunked loading; otherwise it SHALL use full loading.
Validates: Requirements 10.1
Property 10: Chunked Profiling Uses First Chunk Plus Samples
For any file loaded in chunked mode, the generated profile SHALL be based on the first chunk plus sampled rows from subsequent chunks, not from the entire file loaded into memory.
Validates: Requirements 10.3
Property 11: Parallel Profile Merge With Error Resilience
For any set of file paths where some are valid and some are invalid/corrupted, the merged profile output SHALL contain valid profile entries for successful files and error entries for failed files, with no files missing from the output.
Validates: Requirements 11.2, 11.3
Error Handling
| Scenario | Handling Strategy |
|---|---|
| Data-context error below retry limit | Generate enriched hint, retry with LLM |
| Data-context error at retry limit | Fall back to normal sanitized error forwarding |
| Invalid template name | Raise ValueError with available template list |
| File too large for memory | Automatically switch to chunked loading |
| Chunked loading fails | Return descriptive error, continue with other files |
| Single file profiling fails in parallel | Include error entry, continue profiling remaining files |
| Conversation history exceeds window | Trim old messages, generate compressed summary |
| Summary generation fails | Log warning, proceed without summary (graceful degradation) |
| Progress callback fails | Log warning, analysis continues without progress updates |
Testing Strategy
Property-Based Tests (using hypothesis)
Each correctness property maps to a property-based test with a minimum of 100 iterations. The test library is hypothesis (Python).
- Property 1: Generate random error strings with/without data-context patterns → verify classification
- Property 2: Generate random retry counts and limits → verify hint vs raw error behavior
- Property 3: Generate random Safe_Profile tables and error messages → verify hint content and absence of real data
- Property 4: Generate random positive integers → set env var → verify config
- Property 5: Generate random conversation histories and window sizes → verify trimming invariants
- Property 6: Generate random trimmed message sets → verify summary content and absence of code blocks
- Property 7: Pick random valid template names and requirement strings → verify prompt construction
- Property 8: Generate random strings not in registry → verify ValueError
- Property 9: Generate random file sizes and thresholds → verify loading method selection
- Property 10: Generate random chunked data → verify profile source
- Property 11: Generate random file sets with failures → verify merged output
Tag format: Feature: agent-robustness-optimization, Property {N}: {title}
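In the real suite these tests use hypothesis @given generators; the shape of such a test for Property 1 can be sketched with stdlib random (the generators here are deliberately simplified placeholders, and the pattern list is abbreviated):

```python
import random
import re
import string

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"(?:empty|no\s+data|0\s+rows)",
]

def classify_error(msg: str) -> str:
    return "data_context" if any(
        re.search(p, msg, re.IGNORECASE) for p in DATA_CONTEXT_PATTERNS) else "other"

random.seed(0)
for _ in range(100):  # hypothesis would run >= 100 examples per property
    col = "".join(random.choices(string.ascii_lowercase, k=8))
    # Positive case: a message with an injected pattern must be data_context
    assert classify_error(f"KeyError: '{col}'") == "data_context"
    # Negative case: a message with no pattern must be classified as other
    assert classify_error(f"RuntimeError: {col} failed") == "other"
print("Property 1 held for 100 random examples")
```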
Unit Tests
- Error classifier with specific known error messages (KeyError, ValueError, NameError, generic errors)
- Enriched hint generation with known column profiles
- Conversation trimming with exact message counts at boundary conditions
- Template retrieval for each registered template
- Progress callback wiring
- API endpoint response shapes (GET /api/templates, GET /api/status with progress fields)
Integration Tests
- GET /api/templates returns all registered templates
- POST /api/start with template field passes template to agent
- GET /api/status includes progress fields during analysis
- Multi-file parallel profiling with real CSV files
- End-to-end: start analysis with template → verify template prompt in conversation history