Requirements Document
Introduction
This feature redesigns the Analysis Dashboard from the current 3-tab layout (Live Log, Report, Gallery) to a new 3-tab layout (Execution Process, Data Files, Report) with richer functionality. The redesign introduces structured round-by-round execution cards, intermediate data file browsing, inline image display within the report, and a data evidence/supporting data feature that links analysis conclusions to the specific data rows that support them. The Gallery tab is removed; its functionality is absorbed into the Report tab.
Glossary
- Dashboard: The main analysis output panel in the web frontend (`index.html`) containing tabs for viewing analysis results.
- Execution_Process_Tab: The new first tab (执行过程) replacing the Live Log tab, displaying analysis rounds as collapsible cards.
- Round_Card: A collapsible UI card within the Execution_Process_Tab representing one analysis round, containing reasoning, code, result summary, data evidence, and raw log.
- Data_Files_Tab: The new second tab (数据文件) showing intermediate data files produced during analysis.
- Report_Tab: The enhanced third tab (报告) with inline images and supporting data links.
- Data_Evidence: Specific data rows extracted during analysis that support a particular analytical conclusion or claim.
- CodeExecutor: The Python class (`utils/code_executor.py`) responsible for executing generated analysis code in an IPython environment.
- DataAnalysisAgent: The Python class (`data_analysis_agent.py`) orchestrating the multi-round LLM-driven analysis workflow.
- SessionData: The Python class (`web/main.py`) tracking per-session state including running status, output directory, and analysis results.
- Status_API: The `GET /api/status` endpoint polled every 2 seconds by the frontend to retrieve analysis progress.
- Data_Files_API: The new set of API endpoints (`GET /api/data-files`, `GET /api/data-files/preview`, `GET /api/data-files/download`) for listing, previewing, and downloading intermediate data files.
- Round_Data: A structured JSON object representing one analysis round, containing fields for reasoning, code, execution result summary, data evidence rows, and raw log output.
- Auto_Detection: The mechanism by which CodeExecutor automatically detects new DataFrames created during code execution and exports them as files.
- Prompt_Guidance: Instructions embedded in the system prompt that direct the LLM to proactively save intermediate analysis results as files.
Requirements
Requirement 1: Structured Round Data Capture
User Story: As a user, I want each analysis round's data to be captured in a structured format, so that the frontend can render rich execution cards instead of raw log text.
Acceptance Criteria
- WHEN an analysis round completes, THE DataAnalysisAgent SHALL produce a Round_Data object containing the following fields: round number, AI reasoning text, generated code, execution result summary, data evidence rows (list of dictionaries), and raw log output.
- WHEN the DataAnalysisAgent processes an LLM response with a YAML `reasoning` field, THE DataAnalysisAgent SHALL extract and store the reasoning text in the Round_Data object for that round.
- THE DataAnalysisAgent SHALL append each completed Round_Data object to a list stored on the SessionData instance, preserving insertion order.
- IF the LLM response does not contain a parseable `reasoning` field, THEN THE DataAnalysisAgent SHALL store an empty string as the reasoning text in the Round_Data object.
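The Round_Data fields listed above could be captured in a structure like the following. This is a minimal sketch; the field names (`reasoning`, `result_summary`, etc.) are illustrative assumptions, not a fixed schema from the codebase:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class RoundData:
    """Illustrative structured record of one analysis round."""
    round_number: int
    reasoning: str = ""      # from the LLM response's YAML `reasoning` field; "" if unparseable
    code: str = ""           # generated analysis code for this round
    result_summary: str = "" # one-line execution result summary
    data_evidence: list = field(default_factory=list)  # row dicts keyed by column name
    raw_log: str = ""        # raw log output for this round

# A round whose reasoning could not be parsed falls back to the empty string.
rd = RoundData(round_number=1, result_summary="Loaded 1,000 rows")
```

Using `asdict(rd)` makes each round directly JSON-serializable for the status API and for `results.json` persistence.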
Requirement 2: Structured Status API Response
User Story: As a frontend developer, I want the status API to return structured round data, so that I can render execution cards in real time.
Acceptance Criteria
- WHEN the frontend polls
GET /api/status, THE Status_API SHALL return a JSON response containing aroundsarray of Round_Data objects in addition to the existing fields (is_running,has_report,progress_percentage,current_round,max_rounds,status_message). - WHEN a new analysis round completes between two polling intervals, THE Status_API SHALL include the newly completed Round_Data object in the
roundsarray on the next poll response. - THE Status_API SHALL continue to return the
logfield containing raw log text for backward compatibility.
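Putting the criteria above together, the payload shape could look like the following sketch (the top-level keys come from the requirement; the shape of each `rounds` entry is an assumption matching Round_Data):

```python
# Illustrative shape of a GET /api/status response body.
status_payload = {
    "is_running": True,
    "has_report": False,
    "progress_percentage": 40,
    "current_round": 2,
    "max_rounds": 5,
    "status_message": "Executing round 2",
    "log": "raw log text ...",  # retained for backward compatibility
    "rounds": [
        {
            "round_number": 1,
            "reasoning": "Inspect null distribution first.",
            "code": "df.isna().sum()",
            "result_summary": "3 columns contain nulls",
            "data_evidence": [],
            "raw_log": "...",
        },
    ],
}
```

Because `rounds` is append-only and ordered, the frontend can diff it against its rendered cards by index on each poll.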
Requirement 3: Execution Process Tab UI
User Story: As a user, I want to see each analysis round as a collapsible card with reasoning, code, results, and data evidence, so that I can understand the step-by-step analysis process.
Acceptance Criteria
- THE Dashboard SHALL display an "执行过程" (Execution Process) tab as the first tab, replacing the current "Live Log" tab.
- WHEN the Execution_Process_Tab is active, THE Dashboard SHALL render one Round_Card for each entry in the
roundsarray returned by the Status_API. - THE Round_Card SHALL default to a collapsed state showing only the round number and a one-line execution result summary.
- WHEN a user clicks on a collapsed Round_Card, THE Dashboard SHALL expand the card to reveal: AI reasoning text, generated code (in a collapsible sub-section), execution result summary, data evidence section (labeled "本轮数据案例"), and raw log output (in a collapsible sub-section).
- WHEN a new Round_Data object appears in the polling response, THE Dashboard SHALL append a new Round_Card to the Execution_Process_Tab without removing or re-rendering existing cards.
- WHILE analysis is running, THE Dashboard SHALL auto-scroll the Execution_Process_Tab to keep the latest Round_Card visible.
Requirement 4: Data Evidence Capture
User Story: As a user, I want to see the specific data rows that support each analytical conclusion, so that I can verify claims made by the AI agent.
Acceptance Criteria
- WHEN the CodeExecutor executes code that produces a DataFrame result, THE CodeExecutor SHALL capture up to 10 representative rows from that DataFrame as the data evidence for the current round.
- THE CodeExecutor SHALL serialize data evidence rows as a list of dictionaries (one dictionary per row, keys being column names) and include the list in the execution result returned to the DataAnalysisAgent.
- IF the code execution does not produce a DataFrame result, THEN THE CodeExecutor SHALL return an empty list as the data evidence.
- THE DataAnalysisAgent SHALL include the data evidence list in the Round_Data object for the corresponding round.
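The capture contract in Requirement 4 can be sketched as follows. This is a hedged illustration (the function name `extract_evidence` is hypothetical); it takes the first 10 rows as the "representative" sample, which is one possible policy, not a mandated one:

```python
import pandas as pd

def extract_evidence(result, max_rows=10):
    """Return up to `max_rows` rows as a list of dicts if `result` is a
    DataFrame; otherwise an empty list, per Requirement 4."""
    if isinstance(result, pd.DataFrame):
        # orient="records" yields one dict per row, keyed by column name
        return result.head(max_rows).to_dict(orient="records")
    return []

df = pd.DataFrame({"city": ["Beijing", "Shanghai"], "sales": [120, 95]})
evidence = extract_evidence(df)
```

The `records` orientation matches the "list of dictionaries, keys being column names" serialization the requirement specifies.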
Requirement 5: DataFrame Auto-Detection and Export
User Story: As a user, I want intermediate DataFrames created during analysis to be automatically saved as files, so that I can browse and download them from the Data Files tab.
Acceptance Criteria
- WHEN code execution completes, THE CodeExecutor SHALL compare the set of DataFrame variables in the IPython namespace before and after execution to detect newly created DataFrames.
- WHEN a new DataFrame variable is detected, THE CodeExecutor SHALL export the DataFrame to the session output directory as a CSV file named `{variable_name}.csv`.
- IF a file with the same name already exists in the session output directory, THEN THE CodeExecutor SHALL append a numeric suffix (e.g., `_1`, `_2`) to avoid overwriting.
- THE CodeExecutor SHALL record metadata for each auto-exported file: variable name, filename, row count, column count, and column names.
- WHEN auto-export completes, THE CodeExecutor SHALL include the exported file metadata in the execution result returned to the DataAnalysisAgent.
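The detect-diff-export flow above could be implemented along these lines. A minimal sketch under the assumption that the executor snapshots the IPython namespace as plain dicts before and after execution; "new" is judged by variable name, so a re-assigned existing name is not re-exported here:

```python
import os
import pandas as pd

def export_new_dataframes(ns_before, ns_after, out_dir):
    """Diff two namespace snapshots, export each newly created DataFrame as
    CSV with suffix-based dedup, and return metadata per Requirement 5."""
    new_names = {
        name for name, val in ns_after.items()
        if isinstance(val, pd.DataFrame)
        and not isinstance(ns_before.get(name), pd.DataFrame)
    }
    exported = []
    for name in sorted(new_names):
        df = ns_after[name]
        filename, n = f"{name}.csv", 0
        # Dedupe with _1, _2, ... rather than overwriting an existing file.
        while os.path.exists(os.path.join(out_dir, filename)):
            n += 1
            filename = f"{name}_{n}.csv"
        df.to_csv(os.path.join(out_dir, filename), index=False)
        exported.append({
            "variable": name,
            "filename": filename,
            "rows": len(df),
            "cols": len(df.columns),
            "columns": list(df.columns),
        })
    return exported
```

The returned metadata list is what the execution result would carry back to the DataAnalysisAgent for the `data_files` bookkeeping.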
Requirement 6: Prompt Guidance for Intermediate File Saving
User Story: As a user, I want the LLM to proactively save intermediate analysis results as files, so that important intermediate datasets are available for review.
Acceptance Criteria
- THE system prompt (
prompts.py) SHALL include instructions directing the LLM to save intermediate analysis results (filtered subsets, aggregation tables, clustering results) as CSV or XLSX files in thesession_output_dir. - THE system prompt SHALL instruct the LLM to print a standardized marker line after saving each file, in the format:
[DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}. - WHEN the CodeExecutor detects a
[DATA_FILE_SAVED]marker in the execution output, THE CodeExecutor SHALL parse the marker and record the file metadata (filename, row count, description).
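Parsing the marker format defined above is a simple line scan. A sketch (the regex tolerates extra whitespace around the field separators; whether the real executor is stricter is an open implementation choice):

```python
import re

MARKER_RE = re.compile(
    r"\[DATA_FILE_SAVED\]\s*filename:\s*(?P<name>[^,]+),"
    r"\s*rows:\s*(?P<rows>\d+),\s*description:\s*(?P<desc>.+)"
)

def parse_data_file_markers(output):
    """Return one metadata dict per [DATA_FILE_SAVED] line in the output."""
    return [
        {
            "filename": m["name"].strip(),
            "rows": int(m["rows"]),
            "description": m["desc"].strip(),
        }
        for m in MARKER_RE.finditer(output)
    ]

line = "[DATA_FILE_SAVED] filename: top_cities.csv, rows: 10, description: Top 10 cities by sales"
```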
Requirement 7: Data Files API
User Story: As a frontend developer, I want API endpoints to list, preview, and download intermediate data files, so that the Data Files tab can display and serve them.
Acceptance Criteria
- WHEN the frontend requests
GET /api/data-files?session_id={id}, THE Data_Files_API SHALL return a JSON array of file entries, each containing: filename, description, row count, column count, and file size in bytes. - WHEN the frontend requests
GET /api/data-files/preview?session_id={id}&filename={name}, THE Data_Files_API SHALL return a JSON object containing: column names (list of strings), and up to 5 data rows (list of dictionaries). - WHEN the frontend requests
GET /api/data-files/download?session_id={id}&filename={name}, THE Data_Files_API SHALL return the file as a downloadable attachment with the appropriate MIME type (text/csvfor CSV,application/vnd.openxmlformats-officedocument.spreadsheetml.sheetfor XLSX). - IF the requested file does not exist, THEN THE Data_Files_API SHALL return HTTP 404 with a descriptive error message.
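The preview contract could look like this framework-agnostic sketch (the helper name `preview_data_file` and the error shape are illustrative; in the real endpoint the 404 would be raised through the web framework):

```python
import os
import pandas as pd

def preview_data_file(out_dir, filename, max_rows=5):
    """Return column names plus up to 5 rows for a data file, or a
    404-style error dict when the file is missing."""
    # basename() guards against path traversal via the filename parameter.
    path = os.path.join(out_dir, os.path.basename(filename))
    if not os.path.isfile(path):
        return {"status": 404, "error": f"file not found: {filename}"}
    if filename.lower().endswith(".xlsx"):
        df = pd.read_excel(path)
    else:
        df = pd.read_csv(path)
    return {
        "columns": list(df.columns),
        "rows": df.head(max_rows).to_dict(orient="records"),
    }
```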
Requirement 8: Data Files Tab UI
User Story: As a user, I want to browse intermediate data files produced during analysis, preview their contents, and download them individually.
Acceptance Criteria
- THE Dashboard SHALL display a "数据文件" (Data Files) tab as the second tab.
- WHEN the Data_Files_Tab is active, THE Dashboard SHALL fetch the file list from
GET /api/data-filesand render each file as a card showing: filename, description, and row count. - WHEN a user clicks on a file card, THE Dashboard SHALL fetch the preview from
GET /api/data-files/previewand display a table showing column headers and up to 5 data rows. - WHEN a user clicks the download button on a file card, THE Dashboard SHALL initiate a file download via
GET /api/data-files/download. - WHILE analysis is running, THE Dashboard SHALL refresh the file list on each polling cycle to show newly created files.
Requirement 9: Gallery Removal and Inline Images in Report
User Story: As a user, I want images displayed inline within report paragraphs instead of in a separate Gallery tab, so that visual evidence is presented in context.
Acceptance Criteria
- THE Dashboard SHALL remove the "Gallery" tab from the tab bar.
- THE Dashboard SHALL remove the gallery carousel UI (carousel container, navigation buttons, image info panel) from the HTML.
- THE Report_Tab SHALL render images inline within report paragraphs using standard Markdown image syntax (
), as already supported by the existingmarked.jsrendering. - THE
switchTabfunction inscript.jsSHALL handle only the three new tab identifiers:execution,datafiles, andreport. - THE frontend SHALL remove all gallery-related JavaScript functions (
loadGallery,renderGalleryImage,prevImage,nextImage) and associated state variables (galleryImages,currentImageIndex).
Requirement 10: Supporting Data Button in Report
User Story: As a user, I want report paragraphs that make data-driven claims to have a "查看支撑数据" button, so that I can view the evidence data that supports each conclusion.
Acceptance Criteria
- WHEN the Report_Tab renders a paragraph of type
textthat has associated data evidence, THE Dashboard SHALL display a "查看支撑数据" (View Supporting Data) button below the paragraph content. - WHEN a user clicks the "查看支撑数据" button, THE Dashboard SHALL display a popover or modal showing the associated data evidence rows in a table format.
- THE
GET /api/reportresponse SHALL include asupporting_datamapping (keyed by paragraph ID) containing the data evidence rows relevant to each paragraph. - IF a paragraph has no associated data evidence, THEN THE Dashboard SHALL not display the "查看支撑数据" button for that paragraph.
Requirement 11: Report-to-Evidence Linking in Backend
User Story: As a backend developer, I want the system to associate data evidence from execution rounds with report paragraphs, so that the frontend can display supporting data buttons.
Acceptance Criteria
- WHEN generating the final report, THE DataAnalysisAgent SHALL pass the collected data evidence from all rounds to the report generation prompt.
- THE final report generation prompt SHALL instruct the LLM to annotate report paragraphs with round references (e.g.,
<!-- evidence:round_3 -->) when a paragraph's content is derived from a specific analysis round. - WHEN the
GET /api/reportendpoint parses the report, THE backend SHALL extract evidence annotations and build asupporting_datamapping by looking up the referenced round's data evidence from the SessionData. - IF a paragraph contains no evidence annotation, THEN THE backend SHALL exclude that paragraph from the
supporting_datamapping.
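The annotation-to-evidence lookup could be sketched as follows. Assumptions: paragraphs arrive as an id-to-text mapping and rounds as Round_Data-shaped dicts; the function name and shapes are illustrative:

```python
import re

EVIDENCE_RE = re.compile(r"<!--\s*evidence:round_(\d+)\s*-->")

def build_supporting_data(paragraphs, rounds):
    """Map paragraph IDs to the data evidence of the round referenced by an
    `<!-- evidence:round_N -->` annotation; unannotated paragraphs (and
    references to unknown rounds) are omitted, per Requirement 11."""
    evidence_by_round = {r["round_number"]: r["data_evidence"] for r in rounds}
    supporting = {}
    for para_id, text in paragraphs.items():
        m = EVIDENCE_RE.search(text)
        if m and int(m.group(1)) in evidence_by_round:
            supporting[para_id] = evidence_by_round[int(m.group(1))]
    return supporting
```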
Requirement 12: Session Data Model Extension
User Story: As a backend developer, I want the SessionData model to store structured round data and data file metadata, so that the new API endpoints can serve this information.
Acceptance Criteria
- THE SessionData class SHALL include a
roundsattribute (list of Round_Data dictionaries) to store structured data for each completed analysis round. - THE SessionData class SHALL include a
data_filesattribute (list of file metadata dictionaries) to store information about intermediate data files. - WHEN a new data file is detected (via auto-detection or prompt-guided saving), THE DataAnalysisAgent SHALL append the file metadata to the SessionData
data_fileslist. - THE SessionData class SHALL persist the
roundsanddata_filesattributes to the session'sresults.jsonfile upon analysis completion.