Design Document: Agent Robustness Optimization
Overview
This design addresses five areas of improvement for the AI Data Analysis Agent: data privacy fallback recovery, conversation history trimming, analysis template integration, frontend progress display, and multi-file chunked/parallel loading. The changes span the Python backend (data_analysis_agent.py, config/app_config.py, utils/data_privacy.py, utils/data_loader.py, web/main.py) and the vanilla JS frontend (web/static/script.js, web/static/index.html, web/static/clean_style.css).
The core design principle is minimal invasiveness: each feature is implemented as a composable module or method that plugs into the existing agent loop, avoiding large-scale refactors of the DataAnalysisAgent.analyze() main loop.
Architecture
The system follows a layered architecture where the DataAnalysisAgent orchestrates LLM calls and code execution, the FastAPI server manages sessions and exposes APIs, and the frontend polls for status updates.
graph TD
subgraph Frontend
UI[script.js + index.html]
end
subgraph FastAPI Server
API[web/main.py]
SM[SessionManager]
end
subgraph Agent Core
DA[DataAnalysisAgent]
EC[ErrorClassifier]
HG[HintGenerator]
HT[HistoryTrimmer]
TI[TemplateIntegration]
end
subgraph Utilities
DP[data_privacy.py]
DL[data_loader.py]
AT[analysis_templates.py]
CE[code_executor.py]
end
subgraph Config
AC[app_config.py]
end
UI -->|POST /api/start, GET /api/status, GET /api/templates| API
API --> SM
API --> DA
DA --> EC
DA --> HG
DA --> HT
DA --> TI
DA --> CE
HG --> DP
DL --> AC
DA --> DL
TI --> AT
EC --> AC
HT --> AC
Change Impact Summary
| Area | Files Modified | New Files |
|---|---|---|
| Data Privacy Fallback | data_analysis_agent.py, utils/data_privacy.py, config/app_config.py | None |
| Conversation Trimming | data_analysis_agent.py, config/app_config.py | None |
| Template System | data_analysis_agent.py, web/main.py, web/static/script.js, web/static/index.html, web/static/clean_style.css | None |
| Progress Bar | web/main.py, web/static/script.js, web/static/index.html, web/static/clean_style.css | None |
| Multi-File Loading | utils/data_loader.py, data_analysis_agent.py, config/app_config.py | None |
Components and Interfaces
1. Error Classifier (data_analysis_agent.py)
A new method _classify_error(error_message: str) -> str on DataAnalysisAgent that inspects error messages and returns "data_context" or "other".
import re

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def _classify_error(self, error_message: str) -> str:
    """Classify execution error as data-context or other."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"
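As a quick illustration, the classifier can be exercised standalone. This sketch inlines the same patterns; the sample error strings are hypothetical:

```python
import re

# Same patterns as DATA_CONTEXT_PATTERNS in the design above
DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"ValueError.*(?:column|col|field)",
    r"NameError.*(?:df|data|frame)",
    r"(?:empty|no\s+data|0\s+rows)",
    r"IndexError.*(?:out of range|out of bounds)",
]

def classify_error(error_message: str) -> str:
    """Standalone version of _classify_error for demonstration."""
    for pattern in DATA_CONTEXT_PATTERNS:
        if re.search(pattern, error_message, re.IGNORECASE):
            return "data_context"
    return "other"

print(classify_error("KeyError: 'sales_amount'"))             # data_context
print(classify_error("NameError: name 'df' is not defined"))  # data_context
print(classify_error("ZeroDivisionError: division by zero"))  # other
```

Unmatched error types fall through to "other", which triggers the existing sanitized error-forwarding path.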
2. Enriched Hint Generator (utils/data_privacy.py)
A new function generate_enriched_hint(error_message: str, safe_profile: str) -> str that extracts the referenced column name from the error, looks it up in the safe profile, and returns a hint string containing only schema-level metadata.
import re
from typing import Optional

def generate_enriched_hint(error_message: str, safe_profile: str) -> str:
    """
    Generate an enriched hint from the safe profile for a data-context error.
    Returns schema-level metadata only; no real data values.
    """
    column_name = _extract_column_from_error(error_message)
    column_meta = _lookup_column_in_profile(column_name, safe_profile)
    hint = "[RETRY CONTEXT] 上一次代码执行因数据上下文错误失败。\n"
    hint += f"错误信息: {error_message}\n"
    if column_meta:
        hint += f"相关列 '{column_name}' 的结构信息:\n"
        hint += f" - 数据类型: {column_meta['dtype']}\n"
        hint += f" - 唯一值数量: {column_meta['unique_count']}\n"
        hint += f" - 空值率: {column_meta['null_rate']}\n"
        hint += f" - 特征描述: {column_meta['description']}\n"
    hint += "请根据以上结构信息修正代码,不要假设具体的数据值。"
    return hint

def _extract_column_from_error(error_message: str) -> Optional[str]:
    """Extract column name from error message patterns like KeyError: 'col_name'."""
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    if match:
        return match.group(1)
    match = re.search(r"column\s+['\"](.+?)['\"]", error_message, re.IGNORECASE)
    if match:
        return match.group(1)
    return None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    """Look up column metadata in the safe profile markdown table."""
    if not column_name:
        return None
    # Parse the markdown table rows for the matching column
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {
                    "dtype": parts[1],
                    "null_rate": parts[2],
                    "unique_count": parts[3],
                    "description": parts[4],
                }
    return None
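For example, given a safe-profile row for a hypothetical `age` column, the two helpers resolve a KeyError to schema-level metadata. This sketch repeats the helpers so it runs standalone; the profile contents are made up:

```python
import re
from typing import Optional

def _extract_column_from_error(error_message: str) -> Optional[str]:
    # Matches patterns like: KeyError: 'col_name'
    match = re.search(r"KeyError:\s*['\"](.+?)['\"]", error_message)
    return match.group(1) if match else None

def _lookup_column_in_profile(column_name: Optional[str], safe_profile: str) -> Optional[dict]:
    # Scan markdown table rows for a first-cell match on the column name
    if not column_name:
        return None
    for line in safe_profile.split("\n"):
        if line.startswith("|") and column_name in line:
            parts = [p.strip() for p in line.split("|") if p.strip()]
            if len(parts) >= 5 and parts[0] == column_name:
                return {"dtype": parts[1], "null_rate": parts[2],
                        "unique_count": parts[3], "description": parts[4]}
    return None

SAFE_PROFILE = """\
| column | dtype | null_rate | unique_count | description |
|---|---|---|---|---|
| age | int64 | 0.0% | 42 | numeric feature |
"""

col = _extract_column_from_error("KeyError: 'age'")
meta = _lookup_column_in_profile(col, SAFE_PROFILE)
print(col, meta["dtype"], meta["unique_count"])  # age int64 42
```

Note the hint carries only the metadata row; no values from the Local_Profile ever enter the retry prompt (Property 3).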
3. Conversation History Trimmer (data_analysis_agent.py)
A new method _trim_conversation_history() on DataAnalysisAgent that implements sliding window trimming with summary compression.
def _trim_conversation_history(self):
    """Apply sliding window trimming to conversation history."""
    window_size = app_config.conversation_window_size
    max_messages = window_size * 2  # pairs of user+assistant messages
    if len(self.conversation_history) <= max_messages:
        return  # No trimming needed
    first_message = self.conversation_history[0]  # Always retain
    # Determine trim boundary: skip first message + possible existing summary
    start_idx = 1
    has_existing_summary = (
        len(self.conversation_history) > 1
        and self.conversation_history[1]["role"] == "user"
        and self.conversation_history[1]["content"].startswith("[分析摘要]")
    )
    if has_existing_summary:
        start_idx = 2
    # Messages to trim vs keep
    messages_to_consider = self.conversation_history[start_idx:]
    messages_to_trim = messages_to_consider[:-max_messages]
    messages_to_keep = messages_to_consider[-max_messages:]
    if not messages_to_trim:
        return
    # Generate summary of trimmed messages
    summary = self._compress_trimmed_messages(messages_to_trim)
    # Rebuild history: first_message + summary + recent messages
    self.conversation_history = [first_message]
    if summary:
        self.conversation_history.append({"role": "user", "content": summary})
    self.conversation_history.extend(messages_to_keep)

def _compress_trimmed_messages(self, messages: list) -> str:
    """Compress trimmed messages into a summary string."""
    summary_parts = ["[分析摘要] 以下是之前分析轮次的概要:"]
    round_num = 0
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant":
            round_num += 1
            # Extract action type from YAML-like content
            action = "generate_code"
            if 'action: "collect_figures"' in content or "action: collect_figures" in content:
                action = "collect_figures"
            elif 'action: "analysis_complete"' in content or "action: analysis_complete" in content:
                action = "analysis_complete"
            summary_parts.append(f"- 轮次{round_num}: 动作={action}")
        elif msg["role"] == "user" and "代码执行反馈" in content:
            success = "失败" if "[ERROR]" in content or "执行错误" in content else "成功"
            summary_parts[-1] += f", 执行结果={success}"
    return "\n".join(summary_parts)
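The trimming behavior can be sketched end to end with a synthetic history. This is a simplified standalone version of the two methods: the window size is passed as a parameter instead of read from app_config, and the summary is collapsed to a round count:

```python
def trim_history(history: list, window_size: int) -> list:
    """Standalone sketch of _trim_conversation_history."""
    max_messages = window_size * 2  # user+assistant pairs
    if len(history) <= max_messages:
        return history
    first = history[0]  # Always retain the initial user requirement
    start_idx = 2 if (len(history) > 1 and history[1]["role"] == "user"
                      and history[1]["content"].startswith("[分析摘要]")) else 1
    rest = history[start_idx:]
    to_trim, to_keep = rest[:-max_messages], rest[-max_messages:]
    if not to_trim:
        return history
    rounds = sum(1 for m in to_trim if m["role"] == "assistant")
    summary = f"[分析摘要] 以下是之前分析轮次的概要: 共{rounds}轮"
    return [first, {"role": "user", "content": summary}] + to_keep

# First message + 4 assistant/user round pairs, with a window of 2 pairs
history = [{"role": "user", "content": "分析这份数据"}]
for i in range(4):
    history.append({"role": "assistant", "content": f"action: generate_code (round {i+1})"})
    history.append({"role": "user", "content": f"代码执行反馈: round {i+1} ok"})

trimmed = trim_history(history, window_size=2)
print(len(trimmed))  # 6: first message + summary + last 4 messages
```

The invariants of Property 5 are visible here: index 0 still holds the original requirement, and the last `window_size * 2` messages survive untouched.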
4. Template Integration (data_analysis_agent.py + web/main.py)
The analyze() method gains an optional template_name parameter. When provided, the template prompt is prepended to the user requirement.
Agent side:
def analyze(self, user_input: str, files=None, session_output_dir=None,
            reset_session=True, max_rounds=None, template_name=None):
    # ... existing init code ...
    if template_name:
        from utils.analysis_templates import get_template
        template = get_template(template_name)  # Raises ValueError if invalid
        template_prompt = template.get_full_prompt()
        user_input = f"{template_prompt}\n\n{user_input}"
    # ... rest of analyze ...
API side (web/main.py):
# New endpoint
@app.get("/api/templates")
async def list_available_templates():
    from utils.analysis_templates import list_templates
    return {"templates": list_templates()}

# Modified StartRequest
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None
5. Progress Bar Integration
Backend (web/main.py): Update run_analysis_task to set progress fields on SessionData via a callback or by polling the agent's current_round. The simplest approach is to add a progress callback to the agent.
# In DataAnalysisAgent
def set_progress_callback(self, callback):
    """Set a callback function(current_round, max_rounds, message) for progress updates."""
    self._progress_callback = callback

# Called at the start of each round in the analyze() loop:
if hasattr(self, '_progress_callback') and self._progress_callback:
    self._progress_callback(self.current_round, self.max_rounds,
                            f"第{self.current_round}轮分析中...")
Backend (web/main.py): In run_analysis_task, wire the callback:
def progress_cb(current, total, message):
    session.current_round = current
    session.max_rounds = total
    session.progress_percentage = round((current / total) * 100, 1) if total > 0 else 0
    session.status_message = message

agent.set_progress_callback(progress_cb)
API response: Add progress fields to GET /api/status:
return {
    "is_running": session.is_running,
    "log": log_content,
    "has_report": ...,
    "current_round": session.current_round,
    "max_rounds": session.max_rounds,
    "progress_percentage": session.progress_percentage,
    "status_message": session.status_message,
    ...
}
Frontend (script.js): During polling, render a progress bar when is_running is true:
// In the polling callback:
if (data.is_running) {
  updateProgressBar(data.progress_percentage, data.status_message);
}
6. Multi-File Chunked & Parallel Loading
Chunked loading enhancement (utils/data_loader.py):
def load_and_profile_data_smart(file_paths: list, max_file_size_mb: int = None) -> str:
    """Smart loader: uses chunked reading for large files, regular for small."""
    if max_file_size_mb is None:
        max_file_size_mb = app_config.max_file_size_mb
    profile_summary = "# 数据画像报告 (Data Profile)\n\n"
    for file_path in file_paths:
        file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
        if file_size_mb > max_file_size_mb:
            profile_summary += _profile_chunked(file_path)
        else:
            profile_summary += _profile_full(file_path)
    return profile_summary

def _profile_chunked(file_path: str) -> str:
    """Profile a large file by reading first chunk + sampling subsequent chunks."""
    chunks = load_data_chunked(file_path)
    first_chunk = next(chunks, None)
    if first_chunk is None:
        return f"[ERROR] 无法读取文件: {file_path}\n"
    # Sample from subsequent chunks
    sample_rows = [first_chunk]
    for i, chunk in enumerate(chunks):
        if i % 5 == 0:  # Sample every 5th chunk
            sample_rows.append(chunk.sample(min(100, len(chunk))))
    combined = pd.concat(sample_rows, ignore_index=True)
    # Generate profile from combined sample
    return _generate_profile_for_df(combined, file_path, sampled=True)
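The first-chunk-plus-samples strategy reduces to a simple pattern over any chunk iterator. A pandas-free sketch, with chunks modeled as lists of rows and the sample size and stride taken from the design above:

```python
import random

def sample_chunks(chunks, sample_size=100, every_nth=5):
    """Keep the first chunk whole, sample from every Nth subsequent chunk."""
    it = iter(chunks)
    first = next(it, None)
    if first is None:
        return []
    rows = list(first)
    for i, chunk in enumerate(it):
        if i % every_nth == 0:
            rows.extend(random.sample(chunk, min(sample_size, len(chunk))))
    return rows

# 11 chunks of 10 rows each: the first chunk is kept whole,
# and of the remaining 10 chunks only i=0 and i=5 are sampled
chunks = [[(c, r) for r in range(10)] for c in range(11)]
combined = sample_chunks(chunks)
print(len(combined))  # 30 rows, far fewer than the 110 in the "file"
```

This bounds memory at roughly one chunk plus the accumulated samples, which is what Property 10 asserts.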
Parallel profiling (data_analysis_agent.py):
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def _profile_files_parallel(self, file_paths: list) -> tuple[str, str]:
    """Profile multiple files concurrently."""
    max_workers = app_config.max_parallel_profiles
    safe_profiles = []
    local_profiles = []

    def profile_single(path):
        safe = build_safe_profile([path])
        local = build_local_profile([path])
        return path, safe, local

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(profile_single, p): p for p in file_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                _, safe, local = future.result()
                safe_profiles.append(safe)
                local_profiles.append(local)
            except Exception as e:
                error_entry = f"## 文件: {os.path.basename(path)}\n[ERROR] 分析失败: {e}\n\n"
                safe_profiles.append(error_entry)
                local_profiles.append(error_entry)
    return "\n".join(safe_profiles), "\n".join(local_profiles)
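The error-resilience behavior (Property 11) can be demonstrated with a fake profiler that fails for some paths. The profile strings and failure condition here are placeholders; the executor wiring mirrors the method above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_profile(path: str) -> str:
    # Stand-in for build_safe_profile; fails for "corrupted" files
    if "bad" in path:
        raise ValueError("corrupted file")
    return f"## {path}: ok"

def profile_parallel(paths, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fake_profile, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as e:
                results[path] = f"## {path}: [ERROR] {e}"
    # Merge in input order so no file is missing from the output
    return "\n".join(results[p] for p in paths)

merged = profile_parallel(["a.csv", "bad.csv", "c.csv"])
print(merged)
```

One design note: because `as_completed` yields futures in completion order, collecting into a dict keyed by path and joining in input order keeps the merged profile deterministic regardless of thread scheduling.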
Data Models
AppConfig Extensions (config/app_config.py)
import os
from dataclasses import dataclass, field

@dataclass
class AppConfig:
    # ... existing fields ...
    # New fields
    max_data_context_retries: int = field(default=2)
    conversation_window_size: int = field(default=10)
    max_parallel_profiles: int = field(default=4)

    @classmethod
    def from_env(cls) -> 'AppConfig':
        config = cls()
        # ... existing env overrides ...
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        if val := os.getenv("APP_MAX_PARALLEL_PROFILES"):
            config.max_parallel_profiles = int(val)
        return config
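Usage of the override mechanism (Property 4) looks like the following; `MiniConfig` is a stripped-down stand-in for AppConfig so the example runs standalone:

```python
import os
from dataclasses import dataclass

@dataclass
class MiniConfig:
    max_data_context_retries: int = 2
    conversation_window_size: int = 10

    @classmethod
    def from_env(cls) -> "MiniConfig":
        config = cls()
        if val := os.getenv("APP_MAX_DATA_CONTEXT_RETRIES"):
            config.max_data_context_retries = int(val)
        if val := os.getenv("APP_CONVERSATION_WINDOW_SIZE"):
            config.conversation_window_size = int(val)
        return config

os.environ["APP_MAX_DATA_CONTEXT_RETRIES"] = "5"
os.environ.pop("APP_CONVERSATION_WINDOW_SIZE", None)  # ensure unset for the demo
config = MiniConfig.from_env()
print(config.max_data_context_retries)  # 5 (from env var)
print(config.conversation_window_size)  # 10 (default, env var unset)
```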
StartRequest Extension (web/main.py)
class StartRequest(BaseModel):
    requirement: str
    template: Optional[str] = None  # New field
SessionData Progress Fields (already exist, just need wiring)
The SessionData class already has current_round, max_rounds, progress_percentage, and status_message fields. These just need to be updated during analysis and included in the /api/status response.
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Error Classification Correctness
For any error message string, if it contains a data-context pattern (KeyError on a column name, ValueError on column values, NameError for data variables, or empty DataFrame conditions), _classify_error SHALL return "data_context"; otherwise it SHALL return "other".
Validates: Requirements 1.1
Property 2: Retry Below Limit Produces Enriched Hint
For any max_data_context_retries value and any current retry count strictly less than that value, when a data-context error is detected, the agent SHALL produce an enriched hint message rather than forwarding the raw error.
Validates: Requirements 1.3
Property 3: Enriched Hint Contains Correct Column Metadata Without Real Data
For any error message referencing a column name present in the Safe_Profile, the generated enriched hint SHALL contain that column's data type, unique value count, null rate, and categorical description, and SHALL NOT contain any real data values (min, max, mean, sample rows) from the Local_Profile.
Validates: Requirements 2.1, 2.2, 2.4
Property 4: Environment Variable Override for Config Fields
For any positive integer value set as the APP_MAX_DATA_CONTEXT_RETRIES environment variable, AppConfig.from_env() SHALL produce a config where max_data_context_retries equals that integer value.
Validates: Requirements 3.2
Property 5: Sliding Window Trimming Preserves First Message and Retains Recent Pairs
For any conversation history whose length exceeds 2 * conversation_window_size and any conversation_window_size >= 1, after trimming: (a) the first user message is always retained at index 0, and (b) the most recent conversation_window_size message pairs are retained in full.
Validates: Requirements 4.2, 4.3
Property 6: Trimming Summary Contains Round Info and Excludes Code/Raw Output
For any set of trimmed conversation messages, the generated summary SHALL list each trimmed round's action type and execution success/failure, and SHALL NOT contain any code blocks (``` markers) or raw execution output.
Validates: Requirements 4.4, 5.1, 5.2
Property 7: Template Prompt Integration
For any valid template name in TEMPLATE_REGISTRY and any user requirement string, the initial conversation message SHALL contain the template's get_full_prompt() output prepended to the user requirement.
Validates: Requirements 6.1, 6.2
Property 8: Invalid Template Name Raises Descriptive Error
For any string that is not a key in TEMPLATE_REGISTRY, calling get_template() SHALL raise a ValueError whose message contains the list of available template names.
Validates: Requirements 6.3
Property 9: Chunked Loading Threshold
For any file path and max_file_size_mb threshold, if the file's size in MB exceeds the threshold, the smart loader SHALL use chunked loading; otherwise it SHALL use full loading.
Validates: Requirements 10.1
Property 10: Chunked Profiling Uses First Chunk Plus Samples
For any file loaded in chunked mode, the generated profile SHALL be based on the first chunk plus sampled rows from subsequent chunks, not from the entire file loaded into memory.
Validates: Requirements 10.3
Property 11: Parallel Profile Merge With Error Resilience
For any set of file paths where some are valid and some are invalid/corrupted, the merged profile output SHALL contain valid profile entries for successful files and error entries for failed files, with no files missing from the output.
Validates: Requirements 11.2, 11.3
Error Handling
| Scenario | Handling Strategy |
|---|---|
| Data-context error below retry limit | Generate enriched hint, retry with LLM |
| Data-context error at retry limit | Fall back to normal sanitized error forwarding |
| Invalid template name | Raise ValueError with available template list |
| File too large for memory | Automatically switch to chunked loading |
| Chunked loading fails | Return descriptive error, continue with other files |
| Single file profiling fails in parallel | Include error entry, continue profiling remaining files |
| Conversation history exceeds window | Trim old messages, generate compressed summary |
| Summary generation fails | Log warning, proceed without summary (graceful degradation) |
| Progress callback fails | Log warning, analysis continues without progress updates |
Testing Strategy
Property-Based Tests (using hypothesis)
Each correctness property maps to a property-based test with a minimum of 100 iterations. The test library is hypothesis (Python).
- Property 1: Generate random error strings with/without data-context patterns → verify classification
- Property 2: Generate random retry counts and limits → verify hint vs raw error behavior
- Property 3: Generate random Safe_Profile tables and error messages → verify hint content and absence of real data
- Property 4: Generate random positive integers → set env var → verify config
- Property 5: Generate random conversation histories and window sizes → verify trimming invariants
- Property 6: Generate random trimmed message sets → verify summary content and absence of code blocks
- Property 7: Pick random valid template names and requirement strings → verify prompt construction
- Property 8: Generate random strings not in registry → verify ValueError
- Property 9: Generate random file sizes and thresholds → verify loading method selection
- Property 10: Generate random chunked data → verify profile source
- Property 11: Generate random file sets with failures → verify merged output
Tag format: Feature: agent-robustness-optimization, Property {N}: {title}
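In the real suite these tests use hypothesis @given generators; the shape of such a test for Property 1 can be sketched with stdlib random (the generators here are deliberately simplified placeholders, and the pattern list is abbreviated):

```python
import random
import re
import string

DATA_CONTEXT_PATTERNS = [
    r"KeyError:\s*['\"](.+?)['\"]",
    r"(?:empty|no\s+data|0\s+rows)",
]

def classify_error(msg: str) -> str:
    return "data_context" if any(
        re.search(p, msg, re.IGNORECASE) for p in DATA_CONTEXT_PATTERNS) else "other"

random.seed(0)
for _ in range(100):  # hypothesis would run >= 100 examples per property
    col = "".join(random.choices(string.ascii_lowercase, k=8))
    # Positive case: a message with an injected pattern must be data_context
    assert classify_error(f"KeyError: '{col}'") == "data_context"
    # Negative case: a message with no pattern must be classified as other
    assert classify_error(f"RuntimeError: {col} failed") == "other"
print("Property 1 held for 100 random examples")
```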
Unit Tests
- Error classifier with specific known error messages (KeyError, ValueError, NameError, generic errors)
- Enriched hint generation with known column profiles
- Conversation trimming with exact message counts at boundary conditions
- Template retrieval for each registered template
- Progress callback wiring
- API endpoint response shapes (GET /api/templates, GET /api/status with progress fields)
Integration Tests
- GET /api/templates returns all registered templates
- POST /api/start with template field passes template to agent
- GET /api/status includes progress fields during analysis
- Multi-file parallel profiling with real CSV files
- End-to-end: start analysis with template → verify template prompt in conversation history