File: iov_data_analysis_agent/.kiro/specs/agent-robustness-optimization/requirements.md

# Requirements Document
## Introduction
This document specifies the requirements for improving the robustness, efficiency, and usability of the AI Data Analysis Agent. The improvements span five areas: a data privacy fallback mechanism for recovering from LLM-generated code failures when real data is unavailable, conversation history trimming to reduce token consumption and prevent data leakage, integration of the existing analysis template system, frontend progress bar display, and multi-file parallel/chunked analysis support.
## Glossary
- **Agent**: The `DataAnalysisAgent` class in `data_analysis_agent.py` that orchestrates LLM calls and IPython code execution for data analysis.
- **Safe_Profile**: The schema-only data description generated by `build_safe_profile()` in `utils/data_privacy.py`, containing column names, data types, null rates, and unique value counts — but no real data values.
- **Local_Profile**: The full data profile generated by `build_local_profile()` containing real data values, statistics, and sample rows — used only in the local execution environment.
- **Code_Executor**: The `CodeExecutor` class in `utils/code_executor.py` that runs Python code in an IPython sandbox and returns execution results.
- **Conversation_History**: The list of `{"role": ..., "content": ...}` message dictionaries maintained by the Agent across analysis rounds.
- **Feedback_Sanitizer**: The `sanitize_execution_feedback()` function in `utils/data_privacy.py` that removes real data values from execution output before sending to the LLM.
- **Template_Registry**: The `TEMPLATE_REGISTRY` dictionary in `utils/analysis_templates.py` mapping template names to template classes.
- **Session_Data**: The `SessionData` class in `web/main.py` that tracks session state including `progress_percentage`, `current_round`, `max_rounds`, and `status_message`.
- **Polling_Loop**: The `setInterval`-based polling mechanism in `web/static/script.js` that fetches `/api/status` every 2 seconds.
- **Data_Loader**: The module `utils/data_loader.py` providing `load_and_profile_data`, `load_data_chunked`, and `load_data_with_cache` functions.
- **AppConfig**: The `AppConfig` dataclass in `config/app_config.py` holding configuration values such as `max_rounds`, `chunk_size`, and `max_file_size_mb`.
## Requirements
### Requirement 1: Data Privacy Fallback — Error Detection
**User Story:** As a system operator, I want the Agent to detect when LLM-generated code fails due to missing real data context, so that the system can attempt intelligent recovery instead of wasting an analysis round.
#### Acceptance Criteria
1. WHEN the Code_Executor returns a failed execution result, THE Agent SHALL classify the error as either a data-context error or a non-data error by inspecting the error message for patterns such as `KeyError`, `ValueError` on column values, `NameError` for undefined data variables, or empty DataFrame conditions.
2. WHEN a data-context error is detected, THE Agent SHALL increment a per-round retry counter for the current analysis round.
3. WHILE the retry counter for a given round is below the configured maximum retry limit, THE Agent SHALL attempt recovery by generating an enriched hint prompt rather than forwarding the raw error to the LLM as a normal failure.
4. IF the retry counter reaches the configured maximum retry limit, THEN THE Agent SHALL fall back to normal error handling by forwarding the sanitized error feedback to the LLM and proceeding to the next round.
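The classification step above could be sketched as a pattern match over the error message. This is an illustrative sketch only: the pattern list and the function name `is_data_context_error` are assumptions, not the repo's actual implementation.

```python
import re

# Hypothetical patterns for data-context failures (assumption, not the
# repo's real list); anything unmatched is treated as a non-data error.
DATA_CONTEXT_PATTERNS = [
    r"KeyError",                       # missing column name
    r"NameError.*not defined",         # undefined data variable
    r"ValueError.*could not convert",  # value-level type mismatch
    r"empty DataFrame",                # no rows to operate on
]

def is_data_context_error(error_message: str) -> bool:
    """Return True if the failure looks like a missing-data-context error."""
    return any(re.search(p, error_message) for p in DATA_CONTEXT_PATTERNS)
```

A non-data error (e.g. a `SyntaxError`) would bypass the retry path and go straight to normal sanitized-feedback handling.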
### Requirement 2: Data Privacy Fallback — Enriched Hint Generation
**User Story:** As a system operator, I want the Agent to provide the LLM with enriched schema hints when data-context errors occur, so that the LLM can generate corrected code without receiving raw data values.
#### Acceptance Criteria
1. WHEN a data-context error is detected and retry is permitted, THE Agent SHALL generate an enriched hint containing the relevant column's data type, unique value count, null rate, and a categorical description (e.g., "low-cardinality category with 5 classes") extracted from the Safe_Profile.
2. WHEN the error involves a specific column name referenced in the error message, THE Agent SHALL include that column's schema metadata in the enriched hint.
3. THE Agent SHALL append the enriched hint to the conversation history as a user message with a prefix indicating it is a retry context, before requesting a new LLM response.
4. THE Agent SHALL NOT include any real data values, sample rows, or statistical values (min, max, mean) from the Local_Profile in the enriched hint sent to the LLM.
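A minimal sketch of hint generation under these criteria, assuming the Safe_Profile exposes per-column metadata as a dict (the dict shape and function name are illustrative, not taken from `utils/data_privacy.py`):

```python
# Hypothetical per-column metadata shape: {"dtype": ..., "unique_count": ...,
# "null_rate": ...} — schema facts only, never real values.
def build_enriched_hint(column: str, safe_profile: dict) -> str:
    meta = safe_profile.get(column)
    if meta is None:
        return f"[RETRY CONTEXT] Column '{column}' does not exist in the data."
    cardinality = ("low-cardinality category"
                   if meta["unique_count"] <= 10 else "high-cardinality column")
    return (
        f"[RETRY CONTEXT] Column '{column}': dtype={meta['dtype']}, "
        f"{meta['unique_count']} unique values, {meta['null_rate']:.0%} nulls "
        f"({cardinality} with {meta['unique_count']} classes). "
        f"No real data values are included."
    )
```

Note the `[RETRY CONTEXT]` prefix satisfies criterion 3, and the hint draws only on Safe_Profile fields, satisfying criterion 4.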
### Requirement 3: Data Privacy Fallback — Configuration
**User Story:** As a system operator, I want to configure the maximum number of data-context retries, so that I can balance between recovery attempts and analysis throughput.
#### Acceptance Criteria
1. THE AppConfig SHALL include a `max_data_context_retries` field with a default value of 2.
2. WHEN the `APP_MAX_DATA_CONTEXT_RETRIES` environment variable is set, THE AppConfig SHALL use its integer value to override the default.
3. THE Agent SHALL read the `max_data_context_retries` value from AppConfig during initialization.
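The field and its environment override could look like the following sketch. The field and variable names come from the criteria above; the loader function is an assumption.

```python
import os
from dataclasses import dataclass

@dataclass
class AppConfig:
    # Default per criterion 1; overridable via environment per criterion 2.
    max_data_context_retries: int = 2

def load_config() -> AppConfig:
    """Illustrative loader: apply the env-var override if present."""
    cfg = AppConfig()
    env = os.environ.get("APP_MAX_DATA_CONTEXT_RETRIES")
    if env is not None:
        cfg.max_data_context_retries = int(env)
    return cfg
```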
### Requirement 4: Conversation History Trimming — Sliding Window
**User Story:** As a system operator, I want the conversation history to be trimmed using a sliding window, so that token consumption stays bounded and early execution results containing potential data leakage are removed.
#### Acceptance Criteria
1. THE AppConfig SHALL include a `conversation_window_size` field with a default value of 10, representing the maximum number of recent message pairs to retain in full.
2. WHEN the Conversation_History length exceeds twice the `conversation_window_size` (counting individual messages), THE Agent SHALL retain only the most recent `conversation_window_size` pairs of messages in full detail.
3. THE Agent SHALL always retain the first user message (containing the original requirement and Safe_Profile) regardless of window trimming.
4. WHEN messages are trimmed from the Conversation_History, THE Agent SHALL generate a compressed summary of the trimmed messages and insert it immediately after the first user message.
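The window rule above can be sketched as follows, assuming messages after the first user message alternate in pairs (the helper name and return shape are illustrative only):

```python
def trim_history(history: list[dict], window_size: int) -> tuple[list[dict], list[dict]]:
    """Return (trimmed_history, removed_messages).

    Always keeps history[0] (the original requirement + Safe_Profile) and
    the most recent `window_size` message pairs in full detail.
    """
    keep_tail = 2 * window_size                     # pairs -> individual messages
    if len(history) - 1 <= keep_tail:               # within the window: no trim
        return history, []
    removed = history[1:len(history) - keep_tail]   # middle messages to compress
    return [history[0]] + history[-keep_tail:], removed
```

The `removed` list is what Requirement 5's summary compression would consume.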
### Requirement 5: Conversation History Trimming — Summary Compression
**User Story:** As a system operator, I want trimmed conversation rounds to be compressed into a summary, so that the LLM retains awareness of prior analysis steps without consuming excessive tokens.
#### Acceptance Criteria
1. WHEN conversation messages are trimmed, THE Agent SHALL produce a summary string that lists each trimmed round's action type (generate_code, collect_figures), a one-line description of what was done, and whether execution succeeded or failed.
2. THE summary SHALL NOT contain any code blocks, raw execution output, or data values from prior rounds.
3. THE summary SHALL be inserted into the Conversation_History as a single user message immediately after the first user message, replacing any previous summary message.
4. IF no messages have been trimmed, THEN THE Agent SHALL NOT insert a summary message.
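A sketch of the summary format, assuming each trimmed round is recorded as a small dict (the record shape is an assumption for illustration):

```python
def summarize_trimmed_rounds(rounds: list[dict]) -> str:
    """Produce a compact, value-free summary of trimmed rounds."""
    lines = ["[Summary of earlier analysis rounds]"]
    for r in rounds:
        status = "succeeded" if r["success"] else "failed"
        # One line per round: action type, short description, outcome only —
        # no code blocks, raw output, or data values (criterion 2).
        lines.append(f"- Round {r['round']} ({r['action']}): {r['description']} [{status}]")
    return "\n".join(lines)
```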
### Requirement 6: Analysis Template System — Backend Integration
**User Story:** As a user, I want to select a predefined analysis template when starting an analysis, so that the Agent follows a structured analysis plan tailored to my scenario.
#### Acceptance Criteria
1. WHEN a template name is provided in the analysis request, THE Agent SHALL retrieve the corresponding template from the Template_Registry using the `get_template()` function.
2. WHEN a valid template is retrieved, THE Agent SHALL call `get_full_prompt()` on the template and prepend the resulting structured prompt to the user's requirement in the initial conversation message.
3. IF an invalid template name is provided, THEN THE Agent SHALL raise a descriptive error listing available template names.
4. WHEN no template name is provided, THE Agent SHALL proceed with the default unstructured analysis flow.
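The lookup-and-prepend flow could be sketched like this; `get_full_prompt()` is named in the criteria, while the stub template class and `resolve_template_prompt` helper are illustrative stand-ins for the real registry in `utils/analysis_templates.py`:

```python
class _StubTemplate:
    """Stand-in for a real template class registered in TEMPLATE_REGISTRY."""
    def get_full_prompt(self) -> str:
        return "## Analysis Plan\n1. Data overview\n2. Key trends"

TEMPLATE_REGISTRY = {"sales_overview": _StubTemplate()}

def resolve_template_prompt(template_name, requirement: str) -> str:
    if not template_name:
        return requirement                       # criterion 4: default flow
    if template_name not in TEMPLATE_REGISTRY:
        # Criterion 3: descriptive error listing available names.
        raise ValueError(f"Unknown template '{template_name}'. "
                         f"Available: {sorted(TEMPLATE_REGISTRY)}")
    # Criterion 2: structured prompt prepended to the user's requirement.
    return TEMPLATE_REGISTRY[template_name].get_full_prompt() + "\n\n" + requirement
```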
### Requirement 7: Analysis Template System — API Endpoint
**User Story:** As a frontend developer, I want API endpoints to list available templates and to accept a template selection when starting analysis, so that the frontend can offer template choices to users.
#### Acceptance Criteria
1. THE FastAPI server SHALL expose a `GET /api/templates` endpoint that returns the list of available templates by calling `list_templates()`, with each entry containing `name`, `display_name`, and `description`.
2. THE `POST /api/start` request body SHALL accept an optional `template` field containing the template name string.
3. WHEN the `template` field is present in the start request, THE FastAPI server SHALL pass the template name to the Agent's `analyze()` method.
4. WHEN the `template` field is absent or empty, THE FastAPI server SHALL start analysis without a template.
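The endpoint behavior can be sketched as plain functions (the real server registers these with FastAPI; `list_templates()` is stubbed here and the handler names are illustrative):

```python
def list_templates() -> list[dict]:
    # Stub entry with the three fields required by criterion 1.
    return [{"name": "sales_overview",
             "display_name": "Sales Overview",
             "description": "Structured plan for sales data"}]

def get_templates_endpoint() -> dict:
    # Behavior of GET /api/templates
    return {"templates": list_templates()}

def start_endpoint(body: dict) -> dict:
    # Behavior of POST /api/start: `template` is optional; an absent or
    # empty value means analysis starts without a template (criterion 4).
    template = body.get("template") or None
    return {"started": True, "template": template}
```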
### Requirement 8: Analysis Template System — Frontend Template Selector
**User Story:** As a user, I want to see and select analysis templates in the web interface before starting analysis, so that I can choose a structured analysis approach.
#### Acceptance Criteria
1. WHEN the web page loads, THE frontend SHALL fetch the template list from `GET /api/templates` and render selectable template cards above the requirement input area.
2. WHEN a user selects a template card, THE frontend SHALL visually highlight the selected template and store the template name.
3. WHEN the user clicks "Start Analysis" with a template selected, THE frontend SHALL include the template name in the `POST /api/start` request body.
4. THE frontend SHALL provide a "No Template (Free Analysis)" option that is selected by default, allowing users to proceed without a template.
### Requirement 9: Frontend Progress Bar Display
**User Story:** As a user, I want to see a real-time progress bar during analysis, so that I can understand how far the analysis has progressed.
#### Acceptance Criteria
1. THE FastAPI server SHALL update the Session_Data's `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields during each analysis round in the `run_analysis_task` function.
2. THE `GET /api/status` response SHALL include `current_round`, `max_rounds`, `progress_percentage`, and `status_message` fields.
3. WHEN the Polling_Loop receives status data with `is_running` equal to true, THE frontend SHALL render a progress bar element showing the `progress_percentage` value and the `status_message` text.
4. WHEN `progress_percentage` changes between polls, THE frontend SHALL animate the progress bar width transition smoothly.
5. WHEN `is_running` becomes false, THE frontend SHALL set the progress bar to 100% and display a completion message.
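The per-round update in `run_analysis_task` could look like the following sketch; the Session_Data fields are named in the criteria, while this dataclass stub and the helper are assumptions:

```python
from dataclasses import dataclass

@dataclass
class SessionData:
    current_round: int = 0
    max_rounds: int = 10
    progress_percentage: float = 0.0
    status_message: str = ""

def update_progress(session: SessionData, round_no: int, message: str) -> None:
    """Update the fields that GET /api/status exposes to the Polling_Loop."""
    session.current_round = round_no
    session.progress_percentage = round(100.0 * round_no / session.max_rounds, 1)
    session.status_message = message
```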
### Requirement 10: Multi-File Chunked Loading
**User Story:** As a user, I want large data files to be loaded in chunks, so that the system can handle files that exceed available memory.
#### Acceptance Criteria
1. WHEN a data file's size exceeds the `max_file_size_mb` threshold in AppConfig, THE Data_Loader SHALL use `load_data_chunked()` to stream the file in chunks of `chunk_size` rows instead of loading the entire file into memory.
2. WHEN chunked loading is used, THE Agent SHALL instruct the Code_Executor to make the chunked iterator available in the notebook environment as a variable, so that LLM-generated code can process data in chunks.
3. WHEN chunked loading is used for profiling, THE Agent SHALL generate the Safe_Profile by reading only the first chunk plus sampling from subsequent chunks, rather than loading the entire file.
4. IF a file cannot be loaded even in chunked mode, THEN THE Data_Loader SHALL return a descriptive error message indicating the failure reason.
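A stdlib sketch of the size-gated chunked path (the real `load_data_chunked()` would likely use pandas' `chunksize` iterator; the helper names and CSV-only handling here are assumptions):

```python
import csv
import itertools
import os

def iter_chunks(path: str, chunk_size: int):
    """Yield lists of up to `chunk_size` rows without loading the whole file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk

def load_rows(path: str, max_file_size_mb: int, chunk_size: int):
    # Criterion 1: stream when the file exceeds the configured threshold.
    if os.path.getsize(path) > max_file_size_mb * 1024 * 1024:
        return iter_chunks(path, chunk_size)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))  # small file: load fully
```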
### Requirement 11: Multi-File Parallel Profiling
**User Story:** As a user, I want multiple data files to be profiled concurrently, so that the initial data exploration phase completes faster when multiple files are uploaded.
#### Acceptance Criteria
1. WHEN multiple files are provided for analysis, THE Agent SHALL profile each file concurrently using thread-based parallelism rather than sequentially.
2. THE Agent SHALL collect all profiling results and merge them into a single Safe_Profile string and a single Local_Profile string, maintaining the same format as the current sequential output.
3. IF any individual file profiling fails, THEN THE Agent SHALL include an error entry for that file in the profile output and continue profiling the remaining files.
4. THE AppConfig SHALL include a `max_parallel_profiles` field with a default value of 4, controlling the maximum number of concurrent profiling threads.
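Thread-based profiling with per-file error capture could be sketched as follows; `profile_file` is a stand-in for the real profiling routine, and the merge format is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def profile_file(path: str) -> str:
    """Stand-in for the real per-file profiling routine."""
    return f"## {path}\n(schema summary here)"

def profile_all(paths: list[str], max_parallel_profiles: int = 4) -> str:
    def safe(path: str) -> str:
        try:
            return profile_file(path)
        except Exception as exc:   # criterion 3: record the error, keep going
            return f"## {path}\nERROR: {exc}"
    # Criterion 1/4: bounded thread pool; pool.map preserves input order,
    # so the merged output matches the sequential format (criterion 2).
    with ThreadPoolExecutor(max_workers=max_parallel_profiles) as pool:
        results = list(pool.map(safe, paths))
    return "\n\n".join(results)
```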