# Requirements Document
## Introduction
This feature redesigns the Analysis Dashboard from the current 3-tab layout (Live Log, Report, Gallery) to a new 3-tab layout (Execution Process, Data Files, Report) with richer functionality. The redesign introduces structured round-by-round execution cards, intermediate data file browsing, inline image display within the report, and a data evidence/supporting data feature that links analysis conclusions to the specific data rows that support them. The Gallery tab is removed; its functionality is absorbed into the Report tab.
## Glossary
- **Dashboard**: The main analysis output panel in the web frontend (`index.html`) containing tabs for viewing analysis results.
- **Execution_Process_Tab**: The new first tab ("执行过程", Execution Process) replacing the Live Log tab, displaying analysis rounds as collapsible cards.
- **Round_Card**: A collapsible UI card within the Execution_Process_Tab representing one analysis round, containing reasoning, code, result summary, data evidence, and raw log.
- **Data_Files_Tab**: The new second tab ("数据文件", Data Files) showing intermediate data files produced during analysis.
- **Report_Tab**: The enhanced third tab ("报告", Report) with inline images and supporting data links.
- **Data_Evidence**: Specific data rows extracted during analysis that support a particular analytical conclusion or claim.
- **CodeExecutor**: The Python class (`utils/code_executor.py`) responsible for executing generated analysis code in an IPython environment.
- **DataAnalysisAgent**: The Python class (`data_analysis_agent.py`) orchestrating the multi-round LLM-driven analysis workflow.
- **SessionData**: The Python class (`web/main.py`) tracking per-session state including running status, output directory, and analysis results.
- **Status_API**: The `GET /api/status` endpoint polled every 2 seconds by the frontend to retrieve analysis progress.
- **Data_Files_API**: The new set of API endpoints (`GET /api/data-files`, `GET /api/data-files/preview`, `GET /api/data-files/download`) for listing, previewing, and downloading intermediate data files.
- **Round_Data**: A structured JSON object representing one analysis round, containing fields for reasoning, code, execution result summary, data evidence rows, and raw log output.
- **Auto_Detection**: The mechanism by which CodeExecutor automatically detects new DataFrames created during code execution and exports them as files.
- **Prompt_Guidance**: Instructions embedded in the system prompt that direct the LLM to proactively save intermediate analysis results as files.
## Requirements
### Requirement 1: Structured Round Data Capture
**User Story:** As a user, I want each analysis round's data to be captured in a structured format, so that the frontend can render rich execution cards instead of raw log text.
#### Acceptance Criteria
1. WHEN an analysis round completes, THE DataAnalysisAgent SHALL produce a Round_Data object containing the following fields: round number, AI reasoning text, generated code, execution result summary, data evidence rows (list of dictionaries), and raw log output.
2. WHEN the DataAnalysisAgent processes an LLM response with a YAML `reasoning` field, THE DataAnalysisAgent SHALL extract and store the reasoning text in the Round_Data object for that round.
3. THE DataAnalysisAgent SHALL append each completed Round_Data object to a list stored on the SessionData instance, preserving insertion order.
4. IF the LLM response does not contain a parseable `reasoning` field, THEN THE DataAnalysisAgent SHALL store an empty string as the reasoning text in the Round_Data object.
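The criteria above can be sketched as a small data structure. This is illustrative only: the class and field names are assumptions, not the final schema — only the six fields themselves come from criterion 1.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RoundData:
    """One analysis round, per Requirement 1 (names are hypothetical)."""
    round_number: int
    reasoning: str = ""        # empty string when no parseable `reasoning` field (criterion 4)
    code: str = ""             # generated analysis code
    result_summary: str = ""   # execution result summary
    data_evidence: list = field(default_factory=list)  # list of row dicts
    raw_log: str = ""          # raw log output for this round


def round_to_dict(rd: RoundData) -> dict:
    """Serialize for storage on SessionData and for the Status_API `rounds` array."""
    return asdict(rd)
```

Appending `round_to_dict(rd)` to a plain list on the session object satisfies the insertion-order requirement of criterion 3 without any extra machinery.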
### Requirement 2: Structured Status API Response
**User Story:** As a frontend developer, I want the status API to return structured round data, so that I can render execution cards in real time.
#### Acceptance Criteria
1. WHEN the frontend polls `GET /api/status`, THE Status_API SHALL return a JSON response containing a `rounds` array of Round_Data objects in addition to the existing fields (`is_running`, `has_report`, `progress_percentage`, `current_round`, `max_rounds`, `status_message`).
2. WHEN a new analysis round completes between two polling intervals, THE Status_API SHALL include the newly completed Round_Data object in the `rounds` array on the next poll response.
3. THE Status_API SHALL continue to return the `log` field containing raw log text for backward compatibility.
### Requirement 3: Execution Process Tab UI
**User Story:** As a user, I want to see each analysis round as a collapsible card with reasoning, code, results, and data evidence, so that I can understand the step-by-step analysis process.
#### Acceptance Criteria
1. THE Dashboard SHALL display an "执行过程" (Execution Process) tab as the first tab, replacing the current "Live Log" tab.
2. WHEN the Execution_Process_Tab is active, THE Dashboard SHALL render one Round_Card for each entry in the `rounds` array returned by the Status_API.
3. THE Round_Card SHALL default to a collapsed state showing only the round number and a one-line execution result summary.
4. WHEN a user clicks on a collapsed Round_Card, THE Dashboard SHALL expand the card to reveal: AI reasoning text, generated code (in a collapsible sub-section), execution result summary, data evidence section (labeled "本轮数据案例", Data Examples for This Round), and raw log output (in a collapsible sub-section).
5. WHEN a new Round_Data object appears in the polling response, THE Dashboard SHALL append a new Round_Card to the Execution_Process_Tab without removing or re-rendering existing cards.
6. WHILE analysis is running, THE Dashboard SHALL auto-scroll the Execution_Process_Tab to keep the latest Round_Card visible.
### Requirement 4: Data Evidence Capture
**User Story:** As a user, I want to see the specific data rows that support each analytical conclusion, so that I can verify claims made by the AI agent.
#### Acceptance Criteria
1. WHEN the CodeExecutor executes code that produces a DataFrame result, THE CodeExecutor SHALL capture up to 10 representative rows from that DataFrame as the data evidence for the current round.
2. THE CodeExecutor SHALL serialize data evidence rows as a list of dictionaries (one dictionary per row, keys being column names) and include the list in the execution result returned to the DataAnalysisAgent.
3. IF the code execution does not produce a DataFrame result, THEN THE CodeExecutor SHALL return an empty list as the data evidence.
4. THE DataAnalysisAgent SHALL include the data evidence list in the Round_Data object for the corresponding round.
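The target serialization shape can be shown with a pure-Python helper. With pandas, criteria 1–2 reduce to `df.head(10).to_dict(orient="records")`; the function below reproduces that shape from plain columns/rows so the contract is explicit (the function name and signature are illustrative):

```python
MAX_EVIDENCE_ROWS = 10  # cap from criterion 1


def rows_to_evidence(columns, rows, limit=MAX_EVIDENCE_ROWS):
    """Serialize tabular data as a list of dicts, one per row, keyed by
    column name, capped at `limit` rows. Equivalent to
    df.head(limit).to_dict(orient="records") for a pandas DataFrame."""
    return [dict(zip(columns, row)) for row in rows[:limit]]
```

When the execution result is not a DataFrame, the CodeExecutor simply returns `[]` (criterion 3), so downstream consumers can always iterate the evidence list without type checks.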
### Requirement 5: DataFrame Auto-Detection and Export
**User Story:** As a user, I want intermediate DataFrames created during analysis to be automatically saved as files, so that I can browse and download them from the Data Files tab.
#### Acceptance Criteria
1. WHEN code execution completes, THE CodeExecutor SHALL compare the set of DataFrame variables in the IPython namespace before and after execution to detect newly created DataFrames.
2. WHEN a new DataFrame variable is detected, THE CodeExecutor SHALL export the DataFrame to the session output directory as a CSV file named `{variable_name}.csv`.
3. IF a file with the same name already exists in the session output directory, THEN THE CodeExecutor SHALL append a numeric suffix (e.g., `_1`, `_2`) to avoid overwriting.
4. THE CodeExecutor SHALL record metadata for each auto-exported file: variable name, filename, row count, column count, and column names.
5. WHEN auto-export completes, THE CodeExecutor SHALL include the exported file metadata in the execution result returned to the DataAnalysisAgent.
### Requirement 6: Prompt Guidance for Intermediate File Saving
**User Story:** As a user, I want the LLM to proactively save intermediate analysis results as files, so that important intermediate datasets are available for review.
#### Acceptance Criteria
1. THE system prompt (`prompts.py`) SHALL include instructions directing the LLM to save intermediate analysis results (filtered subsets, aggregation tables, clustering results) as CSV or XLSX files in the `session_output_dir`.
2. THE system prompt SHALL instruct the LLM to print a standardized marker line after saving each file, in the format: `[DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}`.
3. WHEN the CodeExecutor detects a `[DATA_FILE_SAVED]` marker in the execution output, THE CodeExecutor SHALL parse the marker and record the file metadata (filename, row count, description).
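Parsing the marker line from criterion 2 is a straightforward regex scan over the execution output; a sketch (the regex tolerates extra whitespace, which is an assumption about how forgiving the parser should be):

```python
import re

# Matches the marker format mandated by criterion 2:
# [DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}
MARKER_RE = re.compile(
    r"\[DATA_FILE_SAVED\]\s*filename:\s*(?P<name>[^,]+),\s*"
    r"rows:\s*(?P<rows>\d+),\s*description:\s*(?P<desc>.*)"
)


def parse_saved_file_markers(output: str) -> list:
    """Return one metadata dict per marker line found (criterion 3)."""
    files = []
    for line in output.splitlines():
        m = MARKER_RE.search(line)
        if m:
            files.append({
                "filename": m.group("name").strip(),
                "rows": int(m.group("rows")),
                "description": m.group("desc").strip(),
            })
    return files
```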
### Requirement 7: Data Files API
**User Story:** As a frontend developer, I want API endpoints to list, preview, and download intermediate data files, so that the Data Files tab can display and serve them.
#### Acceptance Criteria
1. WHEN the frontend requests `GET /api/data-files?session_id={id}`, THE Data_Files_API SHALL return a JSON array of file entries, each containing: filename, description, row count, column count, and file size in bytes.
2. WHEN the frontend requests `GET /api/data-files/preview?session_id={id}&filename={name}`, THE Data_Files_API SHALL return a JSON object containing: column names (list of strings), and up to 5 data rows (list of dictionaries).
3. WHEN the frontend requests `GET /api/data-files/download?session_id={id}&filename={name}`, THE Data_Files_API SHALL return the file as a downloadable attachment with the appropriate MIME type (`text/csv` for CSV, `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` for XLSX).
4. IF the requested file does not exist, THEN THE Data_Files_API SHALL return HTTP 404 with a descriptive error message.
### Requirement 8: Data Files Tab UI
**User Story:** As a user, I want to browse intermediate data files produced during analysis, preview their contents, and download them individually, so that I can verify and reuse intermediate results.
#### Acceptance Criteria
1. THE Dashboard SHALL display a "数据文件" (Data Files) tab as the second tab.
2. WHEN the Data_Files_Tab is active, THE Dashboard SHALL fetch the file list from `GET /api/data-files` and render each file as a card showing: filename, description, and row count.
3. WHEN a user clicks on a file card, THE Dashboard SHALL fetch the preview from `GET /api/data-files/preview` and display a table showing column headers and up to 5 data rows.
4. WHEN a user clicks the download button on a file card, THE Dashboard SHALL initiate a file download via `GET /api/data-files/download`.
5. WHILE analysis is running, THE Dashboard SHALL refresh the file list on each polling cycle to show newly created files.
### Requirement 9: Gallery Removal and Inline Images in Report
**User Story:** As a user, I want images displayed inline within report paragraphs instead of in a separate Gallery tab, so that visual evidence is presented in context.
#### Acceptance Criteria
1. THE Dashboard SHALL remove the "Gallery" tab from the tab bar.
2. THE Dashboard SHALL remove the gallery carousel UI (carousel container, navigation buttons, image info panel) from the HTML.
3. THE Report_Tab SHALL render images inline within report paragraphs using standard Markdown image syntax (`![alt](url)`), as already supported by the existing `marked.js` rendering.
4. THE `switchTab` function in `script.js` SHALL handle only the three new tab identifiers: `execution`, `datafiles`, and `report`.
5. THE frontend SHALL remove all gallery-related JavaScript functions (`loadGallery`, `renderGalleryImage`, `prevImage`, `nextImage`) and associated state variables (`galleryImages`, `currentImageIndex`).
### Requirement 10: Supporting Data Button in Report
**User Story:** As a user, I want report paragraphs that make data-driven claims to have a "查看支撑数据" button, so that I can view the evidence data that supports each conclusion.
#### Acceptance Criteria
1. WHEN the Report_Tab renders a paragraph of type `text` that has associated data evidence, THE Dashboard SHALL display a "查看支撑数据" (View Supporting Data) button below the paragraph content.
2. WHEN a user clicks the "查看支撑数据" button, THE Dashboard SHALL display a popover or modal showing the associated data evidence rows in a table format.
3. THE `GET /api/report` response SHALL include a `supporting_data` mapping (keyed by paragraph ID) containing the data evidence rows relevant to each paragraph.
4. IF a paragraph has no associated data evidence, THEN THE Dashboard SHALL NOT display the "查看支撑数据" button for that paragraph.
### Requirement 11: Report-to-Evidence Linking in Backend
**User Story:** As a backend developer, I want the system to associate data evidence from execution rounds with report paragraphs, so that the frontend can display supporting data buttons.
#### Acceptance Criteria
1. WHEN generating the final report, THE DataAnalysisAgent SHALL pass the collected data evidence from all rounds to the report generation prompt.
2. THE final report generation prompt SHALL instruct the LLM to annotate report paragraphs with round references (e.g., `<!-- evidence:round_3 -->`) when a paragraph's content is derived from a specific analysis round.
3. WHEN the `GET /api/report` endpoint parses the report, THE backend SHALL extract evidence annotations and build a `supporting_data` mapping by looking up the referenced round's data evidence from the SessionData.
4. IF a paragraph contains no evidence annotation, THEN THE backend SHALL exclude that paragraph from the `supporting_data` mapping.
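Resolving the annotations into the `supporting_data` mapping can be sketched as follows. The paragraph shape (`{"id": ..., "text": ...}`) and the assumption that `rounds` is ordered starting from round 1 are illustrative, not confirmed by the spec:

```python
import re

# Matches the annotation format from criterion 2, e.g. <!-- evidence:round_3 -->
EVIDENCE_RE = re.compile(r"<!--\s*evidence:round_(\d+)\s*-->")


def build_supporting_data(paragraphs: list, rounds: list) -> dict:
    """Map paragraph IDs to the referenced round's data evidence
    (criterion 3); unannotated paragraphs are excluded (criterion 4)."""
    mapping = {}
    for para in paragraphs:
        m = EVIDENCE_RE.search(para["text"])
        if not m:
            continue  # no annotation -> not in the mapping
        idx = int(m.group(1)) - 1  # round numbers assumed 1-based
        if 0 <= idx < len(rounds):
            mapping[para["id"]] = rounds[idx].get("data_evidence", [])
    return mapping
```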
### Requirement 12: Session Data Model Extension
**User Story:** As a backend developer, I want the SessionData model to store structured round data and data file metadata, so that the new API endpoints can serve this information.
#### Acceptance Criteria
1. THE SessionData class SHALL include a `rounds` attribute (list of Round_Data dictionaries) to store structured data for each completed analysis round.
2. THE SessionData class SHALL include a `data_files` attribute (list of file metadata dictionaries) to store information about intermediate data files.
3. WHEN a new data file is detected (via auto-detection or prompt-guided saving), THE DataAnalysisAgent SHALL append the file metadata to the SessionData `data_files` list.
4. THE SessionData class SHALL persist the `rounds` and `data_files` attributes to the session's `results.json` file upon analysis completion.