# Requirements Document

## Introduction

This feature redesigns the Analysis Dashboard from the current 3-tab layout (Live Log, Report, Gallery) to a new 3-tab layout (Execution Process, Data Files, Report) with richer functionality. The redesign introduces structured round-by-round execution cards, intermediate data file browsing, inline image display within the report, and a data evidence (supporting data) feature that links analysis conclusions to the specific data rows that support them. The Gallery tab is removed; its functionality is absorbed into the Report tab.

## Glossary

- **Dashboard**: The main analysis output panel in the web frontend (`index.html`) containing tabs for viewing analysis results.
- **Execution_Process_Tab**: The new first tab ("执行过程" / Execution Process) replacing the Live Log tab, displaying analysis rounds as collapsible cards.
- **Round_Card**: A collapsible UI card within the Execution_Process_Tab representing one analysis round, containing reasoning, code, result summary, data evidence, and raw log.
- **Data_Files_Tab**: The new second tab ("数据文件" / Data Files) showing intermediate data files produced during analysis.
- **Report_Tab**: The enhanced third tab ("报告" / Report) with inline images and supporting data links.
- **Data_Evidence**: Specific data rows extracted during analysis that support a particular analytical conclusion or claim.
- **CodeExecutor**: The Python class (`utils/code_executor.py`) responsible for executing generated analysis code in an IPython environment.
- **DataAnalysisAgent**: The Python class (`data_analysis_agent.py`) orchestrating the multi-round LLM-driven analysis workflow.
- **SessionData**: The Python class (`web/main.py`) tracking per-session state including running status, output directory, and analysis results.
- **Status_API**: The `GET /api/status` endpoint polled every 2 seconds by the frontend to retrieve analysis progress.
- **Data_Files_API**: The new set of API endpoints (`GET /api/data-files`, `GET /api/data-files/preview`, `GET /api/data-files/download`) for listing, previewing, and downloading intermediate data files.
- **Round_Data**: A structured JSON object representing one analysis round, containing fields for reasoning, code, execution result summary, data evidence rows, and raw log output.
- **Auto_Detection**: The mechanism by which CodeExecutor automatically detects new DataFrames created during code execution and exports them as files.
- **Prompt_Guidance**: Instructions embedded in the system prompt that direct the LLM to proactively save intermediate analysis results as files.

## Requirements

### Requirement 1: Structured Round Data Capture

**User Story:** As a user, I want each analysis round's data to be captured in a structured format, so that the frontend can render rich execution cards instead of raw log text.

#### Acceptance Criteria

1. WHEN an analysis round completes, THE DataAnalysisAgent SHALL produce a Round_Data object containing the following fields: round number, AI reasoning text, generated code, execution result summary, data evidence rows (list of dictionaries), and raw log output.
2. WHEN the DataAnalysisAgent processes an LLM response with a YAML `reasoning` field, THE DataAnalysisAgent SHALL extract and store the reasoning text in the Round_Data object for that round.
3. THE DataAnalysisAgent SHALL append each completed Round_Data object to a list stored on the SessionData instance, preserving insertion order.
4. IF the LLM response does not contain a parseable `reasoning` field, THEN THE DataAnalysisAgent SHALL store an empty string as the reasoning text in the Round_Data object.
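
The Round_Data fields above could be modeled as a small dataclass; the field names here are illustrative, not a mandated schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class RoundData:
    """One analysis round, as served to the frontend (field names are illustrative)."""
    round: int
    reasoning: str = ""       # extracted from the YAML `reasoning` field; "" if unparseable
    code: str = ""            # generated analysis code
    result_summary: str = ""  # one-line execution result summary
    data_evidence: list = field(default_factory=list)  # list of row dicts
    raw_log: str = ""         # raw log output for the collapsible sub-section

# Completed rounds are appended in order, preserving insertion (Criterion 3).
rounds = []
rounds.append(asdict(RoundData(round=1, reasoning="Checked nulls", code="df.isna().sum()")))
```

Serializing with `asdict` keeps the session-level list as plain dictionaries, ready for the JSON status response.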

### Requirement 2: Structured Status API Response

**User Story:** As a frontend developer, I want the status API to return structured round data, so that I can render execution cards in real time.

#### Acceptance Criteria

1. WHEN the frontend polls `GET /api/status`, THE Status_API SHALL return a JSON response containing a `rounds` array of Round_Data objects in addition to the existing fields (`is_running`, `has_report`, `progress_percentage`, `current_round`, `max_rounds`, `status_message`).
2. WHEN a new analysis round completes between two polling intervals, THE Status_API SHALL include the newly completed Round_Data object in the `rounds` array on the next poll response.
3. THE Status_API SHALL continue to return the `log` field containing raw log text for backward compatibility.
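
A minimal sketch of the extended payload, assuming the session state is a dictionary (the actual SessionData attribute access may differ):

```python
def build_status_response(session: dict) -> dict:
    """Assemble the /api/status payload: existing fields plus the new `rounds` array."""
    return {
        # existing fields, unchanged
        "is_running": session["is_running"],
        "has_report": session["has_report"],
        "progress_percentage": session["progress_percentage"],
        "current_round": session["current_round"],
        "max_rounds": session["max_rounds"],
        "status_message": session["status_message"],
        "log": session["log"],        # raw log text, kept for backward compatibility
        # new field: structured data for every completed round so far
        "rounds": session["rounds"],
    }

# Hypothetical session state, mid-analysis:
session = {
    "is_running": True, "has_report": False, "progress_percentage": 40,
    "current_round": 2, "max_rounds": 5, "status_message": "Running round 2",
    "log": "raw log text...", "rounds": [{"round": 1}],
}
payload = build_status_response(session)
```

Because `rounds` is rebuilt from session state on every poll, a round completed between polls appears automatically on the next response (Criterion 2).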

### Requirement 3: Execution Process Tab UI

**User Story:** As a user, I want to see each analysis round as a collapsible card with reasoning, code, results, and data evidence, so that I can understand the step-by-step analysis process.

#### Acceptance Criteria

1. THE Dashboard SHALL display an "执行过程" (Execution Process) tab as the first tab, replacing the current "Live Log" tab.
2. WHEN the Execution_Process_Tab is active, THE Dashboard SHALL render one Round_Card for each entry in the `rounds` array returned by the Status_API.
3. THE Round_Card SHALL default to a collapsed state showing only the round number and a one-line execution result summary.
4. WHEN a user clicks on a collapsed Round_Card, THE Dashboard SHALL expand the card to reveal: AI reasoning text, generated code (in a collapsible sub-section), execution result summary, data evidence section (labeled "本轮数据案例", i.e. data examples for this round), and raw log output (in a collapsible sub-section).
5. WHEN a new Round_Data object appears in the polling response, THE Dashboard SHALL append a new Round_Card to the Execution_Process_Tab without removing or re-rendering existing cards.
6. WHILE analysis is running, THE Dashboard SHALL auto-scroll the Execution_Process_Tab to keep the latest Round_Card visible.

### Requirement 4: Data Evidence Capture

**User Story:** As a user, I want to see the specific data rows that support each analytical conclusion, so that I can verify claims made by the AI agent.

#### Acceptance Criteria

1. WHEN the CodeExecutor executes code that produces a DataFrame result, THE CodeExecutor SHALL capture up to 10 representative rows from that DataFrame as the data evidence for the current round.
2. THE CodeExecutor SHALL serialize data evidence rows as a list of dictionaries (one dictionary per row, keys being column names) and include the list in the execution result returned to the DataAnalysisAgent.
3. IF the code execution does not produce a DataFrame result, THEN THE CodeExecutor SHALL return an empty list as the data evidence.
4. THE DataAnalysisAgent SHALL include the data evidence list in the Round_Data object for the corresponding round.
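
The capture and serialization steps above amount to a short helper; this is a sketch, not the CodeExecutor's actual implementation:

```python
import pandas as pd

def capture_evidence(result, limit: int = 10) -> list:
    """Return up to `limit` rows as a list of dicts (keys = column names),
    or [] when the execution result is not a DataFrame."""
    if not isinstance(result, pd.DataFrame):
        return []
    return result.head(limit).to_dict(orient="records")
```

`to_dict(orient="records")` produces exactly the row-dictionary shape Criterion 2 requires. Using `head` takes the first rows as the "representative" sample; a sampling strategy could be substituted without changing the output shape.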

### Requirement 5: DataFrame Auto-Detection and Export

**User Story:** As a user, I want intermediate DataFrames created during analysis to be automatically saved as files, so that I can browse and download them from the Data Files tab.

#### Acceptance Criteria

1. WHEN code execution completes, THE CodeExecutor SHALL compare the set of DataFrame variables in the IPython namespace before and after execution to detect newly created DataFrames.
2. WHEN a new DataFrame variable is detected, THE CodeExecutor SHALL export the DataFrame to the session output directory as a CSV file named `{variable_name}.csv`.
3. IF a file with the same name already exists in the session output directory, THEN THE CodeExecutor SHALL append a numeric suffix (e.g., `_1`, `_2`) to avoid overwriting.
4. THE CodeExecutor SHALL record metadata for each auto-exported file: variable name, filename, row count, column count, and column names.
5. WHEN auto-export completes, THE CodeExecutor SHALL include the exported file metadata in the execution result returned to the DataAnalysisAgent.
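
The namespace diff and export could look like the following sketch (function names and the `_`-prefix filter for IPython internals are assumptions):

```python
import os
import pandas as pd

def detect_new_dataframes(ns_before: dict, ns_after: dict) -> dict:
    """Names -> DataFrames created by the executed code (Criterion 1)."""
    def frames(ns):
        return {k for k, v in ns.items()
                if isinstance(v, pd.DataFrame) and not k.startswith("_")}
    return {name: ns_after[name] for name in frames(ns_after) - frames(ns_before)}

def export_frames(new_frames: dict, out_dir: str) -> list:
    """Export each new DataFrame as CSV, dedup filenames, return metadata."""
    exported = []
    for name, df in new_frames.items():
        path, n = os.path.join(out_dir, f"{name}.csv"), 1
        while os.path.exists(path):  # append _1, _2, ... to avoid overwriting
            path = os.path.join(out_dir, f"{name}_{n}.csv")
            n += 1
        df.to_csv(path, index=False)
        exported.append({
            "variable": name,
            "filename": os.path.basename(path),
            "rows": len(df),
            "columns": len(df.columns),
            "column_names": list(df.columns),
        })
    return exported
```

The returned metadata list is what Criterion 5 would attach to the execution result.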

### Requirement 6: Prompt Guidance for Intermediate File Saving

**User Story:** As a user, I want the LLM to proactively save intermediate analysis results as files, so that important intermediate datasets are available for review.

#### Acceptance Criteria

1. THE system prompt (`prompts.py`) SHALL include instructions directing the LLM to save intermediate analysis results (filtered subsets, aggregation tables, clustering results) as CSV or XLSX files in the `session_output_dir`.
2. THE system prompt SHALL instruct the LLM to print a standardized marker line after saving each file, in the format: `[DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}`.
3. WHEN the CodeExecutor detects a `[DATA_FILE_SAVED]` marker in the execution output, THE CodeExecutor SHALL parse the marker and record the file metadata (filename, row count, description).
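
Parsing the marker format defined in Criterion 2 is a single regular expression; a sketch:

```python
import re

# Matches: [DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}
MARKER = re.compile(
    r"\[DATA_FILE_SAVED\]\s*filename:\s*(?P<filename>[^,]+),"
    r"\s*rows:\s*(?P<rows>\d+),\s*description:\s*(?P<desc>.*)"
)

def parse_saved_file_markers(output: str) -> list:
    """Extract file metadata dicts from every marker line in the execution output."""
    return [
        {
            "filename": m.group("filename").strip(),
            "rows": int(m.group("rows")),
            "description": m.group("desc").strip(),
        }
        for m in MARKER.finditer(output)
    ]
```

Without `re.DOTALL`, the trailing `.*` stops at the end of the marker line, so markers embedded in multi-line logs parse cleanly.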

### Requirement 7: Data Files API

**User Story:** As a frontend developer, I want API endpoints to list, preview, and download intermediate data files, so that the Data Files tab can display and serve them.

#### Acceptance Criteria

1. WHEN the frontend requests `GET /api/data-files?session_id={id}`, THE Data_Files_API SHALL return a JSON array of file entries, each containing: filename, description, row count, column count, and file size in bytes.
2. WHEN the frontend requests `GET /api/data-files/preview?session_id={id}&filename={name}`, THE Data_Files_API SHALL return a JSON object containing: column names (list of strings), and up to 5 data rows (list of dictionaries).
3. WHEN the frontend requests `GET /api/data-files/download?session_id={id}&filename={name}`, THE Data_Files_API SHALL return the file as a downloadable attachment with the appropriate MIME type (`text/csv` for CSV, `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` for XLSX).
4. IF the requested file does not exist, THEN THE Data_Files_API SHALL return HTTP 404 with a descriptive error message.
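
The preview endpoint's response body (Criterion 2) can be built with the standard library alone; a sketch, assuming CSV files (XLSX previews would need a separate reader):

```python
import csv
from itertools import islice

def preview_csv(path: str, limit: int = 5) -> dict:
    """Column names plus up to `limit` data rows, matching the preview response shape."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = [dict(r) for r in islice(reader, limit)]
        return {"columns": reader.fieldnames or [], "rows": rows}
```

`csv.DictReader` yields one dictionary per row keyed by the header, which is exactly the "list of dictionaries" shape the endpoint must return; note that all values come back as strings.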

### Requirement 8: Data Files Tab UI

**User Story:** As a user, I want to browse intermediate data files produced during analysis, preview their contents, and download them individually, so that I can inspect and reuse intermediate results.

#### Acceptance Criteria

1. THE Dashboard SHALL display a "数据文件" (Data Files) tab as the second tab.
2. WHEN the Data_Files_Tab is active, THE Dashboard SHALL fetch the file list from `GET /api/data-files` and render each file as a card showing: filename, description, and row count.
3. WHEN a user clicks on a file card, THE Dashboard SHALL fetch the preview from `GET /api/data-files/preview` and display a table showing column headers and up to 5 data rows.
4. WHEN a user clicks the download button on a file card, THE Dashboard SHALL initiate a file download via `GET /api/data-files/download`.
5. WHILE analysis is running, THE Dashboard SHALL refresh the file list on each polling cycle to show newly created files.

### Requirement 9: Gallery Removal and Inline Images in Report

**User Story:** As a user, I want images displayed inline within report paragraphs instead of in a separate Gallery tab, so that visual evidence is presented in context.

#### Acceptance Criteria

1. THE Dashboard SHALL remove the "Gallery" tab from the tab bar.
2. THE Dashboard SHALL remove the gallery carousel UI (carousel container, navigation buttons, image info panel) from the HTML.
3. THE Report_Tab SHALL render images inline within report paragraphs using standard Markdown image syntax (`![alt](path)`), as already supported by the existing `marked.js` rendering.
4. THE `switchTab` function in `script.js` SHALL handle only the three new tab identifiers: `execution`, `datafiles`, and `report`.
5. THE frontend SHALL remove all gallery-related JavaScript functions (`loadGallery`, `renderGalleryImage`, `prevImage`, `nextImage`) and associated state variables (`galleryImages`, `currentImageIndex`).

### Requirement 10: Supporting Data Button in Report

**User Story:** As a user, I want report paragraphs that make data-driven claims to have a "查看支撑数据" (View Supporting Data) button, so that I can view the evidence data that supports each conclusion.

#### Acceptance Criteria

1. WHEN the Report_Tab renders a paragraph of type `text` that has associated data evidence, THE Dashboard SHALL display a "查看支撑数据" (View Supporting Data) button below the paragraph content.
2. WHEN a user clicks the "查看支撑数据" button, THE Dashboard SHALL display a popover or modal showing the associated data evidence rows in a table format.
3. THE `GET /api/report` response SHALL include a `supporting_data` mapping (keyed by paragraph ID) containing the data evidence rows relevant to each paragraph.
4. IF a paragraph has no associated data evidence, THEN THE Dashboard SHALL NOT display the "查看支撑数据" button for that paragraph.

### Requirement 11: Report-to-Evidence Linking in Backend

**User Story:** As a backend developer, I want the system to associate data evidence from execution rounds with report paragraphs, so that the frontend can display supporting data buttons.

#### Acceptance Criteria

1. WHEN generating the final report, THE DataAnalysisAgent SHALL pass the collected data evidence from all rounds to the report generation prompt.
2. THE final report generation prompt SHALL instruct the LLM to annotate report paragraphs with round references (e.g., `<!-- evidence:round_3 -->`) when a paragraph's content is derived from a specific analysis round.
3. WHEN the `GET /api/report` endpoint parses the report, THE backend SHALL extract evidence annotations and build a `supporting_data` mapping by looking up the referenced round's data evidence from the SessionData.
4. IF a paragraph contains no evidence annotation, THEN THE backend SHALL exclude that paragraph from the `supporting_data` mapping.
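
Extracting the annotations and building the mapping could look like this sketch; the paragraph shape (`{"id", "content"}` dicts) is an assumption about the report parser's output:

```python
import re

EVIDENCE = re.compile(r"<!--\s*evidence:round_(\d+)\s*-->")

def build_supporting_data(paragraphs: list, rounds: list) -> dict:
    """Map paragraph IDs to the data evidence rows of the annotated round."""
    by_round = {r["round"]: r.get("data_evidence", []) for r in rounds}
    supporting = {}
    for p in paragraphs:
        m = EVIDENCE.search(p["content"])
        if not m:
            continue  # unannotated paragraphs are excluded (Criterion 4)
        evidence = by_round.get(int(m.group(1)))
        if evidence:
            supporting[p["id"]] = evidence
    return supporting
```

Paragraphs whose annotation references a round with no captured evidence are also omitted here, which keeps the frontend rule simple: button shown if and only if the paragraph ID appears in `supporting_data`.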

### Requirement 12: Session Data Model Extension

**User Story:** As a backend developer, I want the SessionData model to store structured round data and data file metadata, so that the new API endpoints can serve this information.

#### Acceptance Criteria

1. THE SessionData class SHALL include a `rounds` attribute (list of Round_Data dictionaries) to store structured data for each completed analysis round.
2. THE SessionData class SHALL include a `data_files` attribute (list of file metadata dictionaries) to store information about intermediate data files.
3. WHEN a new data file is detected (via auto-detection or prompt-guided saving), THE DataAnalysisAgent SHALL append the file metadata to the SessionData `data_files` list.
4. THE SessionData class SHALL persist the `rounds` and `data_files` attributes to the session's `results.json` file upon analysis completion.
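
A minimal sketch of the extension, showing only the new attributes and the persistence step (the real class in `web/main.py` carries additional state such as running status):

```python
import json
import os

class SessionData:
    """Sketch of the extended session model; only the new attributes are shown."""

    def __init__(self, output_dir: str):
        self.output_dir = output_dir
        self.rounds = []       # Round_Data dicts, in completion order (Criterion 1)
        self.data_files = []   # intermediate file metadata dicts (Criterion 2)

    def persist(self):
        """Write rounds and data_files into the session's results.json (Criterion 4)."""
        path = os.path.join(self.output_dir, "results.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(
                {"rounds": self.rounds, "data_files": self.data_files},
                f, ensure_ascii=False, indent=2,
            )
```

`ensure_ascii=False` keeps any Chinese text in reasoning or descriptions readable in the saved JSON.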