Requirements Document
Introduction
This feature redesigns the Analysis Dashboard from the current 3-tab layout (Live Log, Report, Gallery) to a new 3-tab layout (Execution Process, Data Files, Report) with richer functionality. The redesign introduces structured round-by-round execution cards, intermediate data file browsing, inline image display within the report, and a data evidence/supporting data feature that links analysis conclusions to the specific data rows that support them. The Gallery tab is removed; its functionality is absorbed into the Report tab.
Glossary
- Dashboard: The main analysis output panel in the web frontend (`index.html`) containing tabs for viewing analysis results.
- Execution_Process_Tab: The new first tab (执行过程) replacing the Live Log tab, displaying analysis rounds as collapsible cards.
- Round_Card: A collapsible UI card within the Execution_Process_Tab representing one analysis round, containing reasoning, code, result summary, data evidence, and raw log.
- Data_Files_Tab: The new second tab (数据文件) showing intermediate data files produced during analysis.
- Report_Tab: The enhanced third tab (报告) with inline images and supporting data links.
- Data_Evidence: Specific data rows extracted during analysis that support a particular analytical conclusion or claim.
- CodeExecutor: The Python class (`utils/code_executor.py`) responsible for executing generated analysis code in an IPython environment.
- DataAnalysisAgent: The Python class (`data_analysis_agent.py`) orchestrating the multi-round LLM-driven analysis workflow.
- SessionData: The Python class (`web/main.py`) tracking per-session state including running status, output directory, and analysis results.
- Status_API: The `GET /api/status` endpoint polled every 2 seconds by the frontend to retrieve analysis progress.
- Data_Files_API: The new set of API endpoints (`GET /api/data-files`, `GET /api/data-files/preview`, `GET /api/data-files/download`) for listing, previewing, and downloading intermediate data files.
- Round_Data: A structured JSON object representing one analysis round, containing fields for reasoning, code, execution result summary, data evidence rows, and raw log output.
- Auto_Detection: The mechanism by which CodeExecutor automatically detects new DataFrames created during code execution and exports them as files.
- Prompt_Guidance: Instructions embedded in the system prompt that direct the LLM to proactively save intermediate analysis results as files.
Requirements
Requirement 1: Structured Round Data Capture
User Story: As a user, I want each analysis round's data to be captured in a structured format, so that the frontend can render rich execution cards instead of raw log text.
Acceptance Criteria
- WHEN an analysis round completes, THE DataAnalysisAgent SHALL produce a Round_Data object containing the following fields: round number, AI reasoning text, generated code, execution result summary, data evidence rows (list of dictionaries), and raw log output.
- WHEN the DataAnalysisAgent processes an LLM response with a YAML `reasoning` field, THE DataAnalysisAgent SHALL extract and store the reasoning text in the Round_Data object for that round.
- THE DataAnalysisAgent SHALL append each completed Round_Data object to a list stored on the SessionData instance, preserving insertion order.
- IF the LLM response does not contain a parseable `reasoning` field, THEN THE DataAnalysisAgent SHALL store an empty string as the reasoning text in the Round_Data object.
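The Round_Data fields listed above could be captured in a structure like the following. This is a minimal sketch; the field names (`reasoning`, `result_summary`, etc.) are illustrative assumptions, not a fixed schema from the codebase:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class RoundData:
    """Illustrative structured record of one analysis round."""
    round_number: int
    reasoning: str = ""      # from the LLM response's YAML `reasoning` field; "" if unparseable
    code: str = ""           # generated analysis code for this round
    result_summary: str = "" # one-line execution result summary
    data_evidence: list = field(default_factory=list)  # row dicts keyed by column name
    raw_log: str = ""        # raw log output for this round

# A round whose reasoning could not be parsed falls back to the empty string.
rd = RoundData(round_number=1, result_summary="Loaded 1,000 rows")
```

Using `asdict(rd)` makes each round directly JSON-serializable for the status API and for `results.json` persistence.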
Requirement 2: Structured Status API Response
User Story: As a frontend developer, I want the status API to return structured round data, so that I can render execution cards in real time.
Acceptance Criteria
- WHEN the frontend polls
GET /api/status, THE Status_API SHALL return a JSON response containing aroundsarray of Round_Data objects in addition to the existing fields (is_running,has_report,progress_percentage,current_round,max_rounds,status_message). - WHEN a new analysis round completes between two polling intervals, THE Status_API SHALL include the newly completed Round_Data object in the
roundsarray on the next poll response. - THE Status_API SHALL continue to return the
logfield containing raw log text for backward compatibility.
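Putting the criteria above together, the payload shape could look like the following sketch (the top-level keys come from the requirement; the shape of each `rounds` entry is an assumption matching Round_Data):

```python
# Illustrative shape of a GET /api/status response body.
status_payload = {
    "is_running": True,
    "has_report": False,
    "progress_percentage": 40,
    "current_round": 2,
    "max_rounds": 5,
    "status_message": "Executing round 2",
    "log": "raw log text ...",  # retained for backward compatibility
    "rounds": [
        {
            "round_number": 1,
            "reasoning": "Inspect null distribution first.",
            "code": "df.isna().sum()",
            "result_summary": "3 columns contain nulls",
            "data_evidence": [],
            "raw_log": "...",
        },
    ],
}
```

Because `rounds` is append-only and ordered, the frontend can diff it against its rendered cards by index on each poll.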
Requirement 3: Execution Process Tab UI
User Story: As a user, I want to see each analysis round as a collapsible card with reasoning, code, results, and data evidence, so that I can understand the step-by-step analysis process.
Acceptance Criteria
- THE Dashboard SHALL display an "执行过程" (Execution Process) tab as the first tab, replacing the current "Live Log" tab.
- WHEN the Execution_Process_Tab is active, THE Dashboard SHALL render one Round_Card for each entry in the
roundsarray returned by the Status_API. - THE Round_Card SHALL default to a collapsed state showing only the round number and a one-line execution result summary.
- WHEN a user clicks on a collapsed Round_Card, THE Dashboard SHALL expand the card to reveal: AI reasoning text, generated code (in a collapsible sub-section), execution result summary, data evidence section (labeled "本轮数据案例"), and raw log output (in a collapsible sub-section).
- WHEN a new Round_Data object appears in the polling response, THE Dashboard SHALL append a new Round_Card to the Execution_Process_Tab without removing or re-rendering existing cards.
- WHILE analysis is running, THE Dashboard SHALL auto-scroll the Execution_Process_Tab to keep the latest Round_Card visible.
Requirement 4: Data Evidence Capture
User Story: As a user, I want to see the specific data rows that support each analytical conclusion, so that I can verify claims made by the AI agent.
Acceptance Criteria
- WHEN the CodeExecutor executes code that produces a DataFrame result, THE CodeExecutor SHALL capture up to 10 representative rows from that DataFrame as the data evidence for the current round.
- THE CodeExecutor SHALL serialize data evidence rows as a list of dictionaries (one dictionary per row, keys being column names) and include the list in the execution result returned to the DataAnalysisAgent.
- IF the code execution does not produce a DataFrame result, THEN THE CodeExecutor SHALL return an empty list as the data evidence.
- THE DataAnalysisAgent SHALL include the data evidence list in the Round_Data object for the corresponding round.
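The capture contract in Requirement 4 can be sketched as follows. This is a hedged illustration (the function name `extract_evidence` is hypothetical); it takes the first 10 rows as the "representative" sample, which is one possible policy, not a mandated one:

```python
import pandas as pd

def extract_evidence(result, max_rows=10):
    """Return up to `max_rows` rows as a list of dicts if `result` is a
    DataFrame; otherwise an empty list, per Requirement 4."""
    if isinstance(result, pd.DataFrame):
        # orient="records" yields one dict per row, keyed by column name
        return result.head(max_rows).to_dict(orient="records")
    return []

df = pd.DataFrame({"city": ["Beijing", "Shanghai"], "sales": [120, 95]})
evidence = extract_evidence(df)
```

The `records` orientation matches the "list of dictionaries, keys being column names" serialization the requirement specifies.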
Requirement 5: DataFrame Auto-Detection and Export
User Story: As a user, I want intermediate DataFrames created during analysis to be automatically saved as files, so that I can browse and download them from the Data Files tab.
Acceptance Criteria
- WHEN code execution completes, THE CodeExecutor SHALL compare the set of DataFrame variables in the IPython namespace before and after execution to detect newly created DataFrames.
- WHEN a new DataFrame variable is detected, THE CodeExecutor SHALL export the DataFrame to the session output directory as a CSV file named `{variable_name}.csv`.
- IF a file with the same name already exists in the session output directory, THEN THE CodeExecutor SHALL append a numeric suffix (e.g., `_1`, `_2`) to avoid overwriting.
- THE CodeExecutor SHALL record metadata for each auto-exported file: variable name, filename, row count, column count, and column names.
- WHEN auto-export completes, THE CodeExecutor SHALL include the exported file metadata in the execution result returned to the DataAnalysisAgent.
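The detect-diff-export flow above could be implemented along these lines. A minimal sketch under the assumption that the executor snapshots the IPython namespace as plain dicts before and after execution; "new" is judged by variable name, so a re-assigned existing name is not re-exported here:

```python
import os
import pandas as pd

def export_new_dataframes(ns_before, ns_after, out_dir):
    """Diff two namespace snapshots, export each newly created DataFrame as
    CSV with suffix-based dedup, and return metadata per Requirement 5."""
    new_names = {
        name for name, val in ns_after.items()
        if isinstance(val, pd.DataFrame)
        and not isinstance(ns_before.get(name), pd.DataFrame)
    }
    exported = []
    for name in sorted(new_names):
        df = ns_after[name]
        filename, n = f"{name}.csv", 0
        # Dedupe with _1, _2, ... rather than overwriting an existing file.
        while os.path.exists(os.path.join(out_dir, filename)):
            n += 1
            filename = f"{name}_{n}.csv"
        df.to_csv(os.path.join(out_dir, filename), index=False)
        exported.append({
            "variable": name,
            "filename": filename,
            "rows": len(df),
            "cols": len(df.columns),
            "columns": list(df.columns),
        })
    return exported
```

The returned metadata list is what the execution result would carry back to the DataAnalysisAgent for the `data_files` bookkeeping.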
Requirement 6: Prompt Guidance for Intermediate File Saving
User Story: As a user, I want the LLM to proactively save intermediate analysis results as files, so that important intermediate datasets are available for review.
Acceptance Criteria
- THE system prompt (
prompts.py) SHALL include instructions directing the LLM to save intermediate analysis results (filtered subsets, aggregation tables, clustering results) as CSV or XLSX files in thesession_output_dir. - THE system prompt SHALL instruct the LLM to print a standardized marker line after saving each file, in the format:
[DATA_FILE_SAVED] filename: {name}, rows: {count}, description: {desc}. - WHEN the CodeExecutor detects a
[DATA_FILE_SAVED]marker in the execution output, THE CodeExecutor SHALL parse the marker and record the file metadata (filename, row count, description).
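Parsing the marker format defined above is a simple line scan. A sketch (the regex tolerates extra whitespace around the field separators; whether the real executor is stricter is an open implementation choice):

```python
import re

MARKER_RE = re.compile(
    r"\[DATA_FILE_SAVED\]\s*filename:\s*(?P<name>[^,]+),"
    r"\s*rows:\s*(?P<rows>\d+),\s*description:\s*(?P<desc>.+)"
)

def parse_data_file_markers(output):
    """Return one metadata dict per [DATA_FILE_SAVED] line in the output."""
    return [
        {
            "filename": m["name"].strip(),
            "rows": int(m["rows"]),
            "description": m["desc"].strip(),
        }
        for m in MARKER_RE.finditer(output)
    ]

line = "[DATA_FILE_SAVED] filename: top_cities.csv, rows: 10, description: Top 10 cities by sales"
```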
Requirement 7: Data Files API
User Story: As a frontend developer, I want API endpoints to list, preview, and download intermediate data files, so that the Data Files tab can display and serve them.
Acceptance Criteria
- WHEN the frontend requests
GET /api/data-files?session_id={id}, THE Data_Files_API SHALL return a JSON array of file entries, each containing: filename, description, row count, column count, and file size in bytes. - WHEN the frontend requests
GET /api/data-files/preview?session_id={id}&filename={name}, THE Data_Files_API SHALL return a JSON object containing: column names (list of strings), and up to 5 data rows (list of dictionaries). - WHEN the frontend requests
GET /api/data-files/download?session_id={id}&filename={name}, THE Data_Files_API SHALL return the file as a downloadable attachment with the appropriate MIME type (text/csvfor CSV,application/vnd.openxmlformats-officedocument.spreadsheetml.sheetfor XLSX). - IF the requested file does not exist, THEN THE Data_Files_API SHALL return HTTP 404 with a descriptive error message.
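The preview contract could look like this framework-agnostic sketch (the helper name `preview_data_file` and the error shape are illustrative; in the real endpoint the 404 would be raised through the web framework):

```python
import os
import pandas as pd

def preview_data_file(out_dir, filename, max_rows=5):
    """Return column names plus up to 5 rows for a data file, or a
    404-style error dict when the file is missing."""
    # basename() guards against path traversal via the filename parameter.
    path = os.path.join(out_dir, os.path.basename(filename))
    if not os.path.isfile(path):
        return {"status": 404, "error": f"file not found: {filename}"}
    if filename.lower().endswith(".xlsx"):
        df = pd.read_excel(path)
    else:
        df = pd.read_csv(path)
    return {
        "columns": list(df.columns),
        "rows": df.head(max_rows).to_dict(orient="records"),
    }
```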
Requirement 8: Data Files Tab UI
User Story: As a user, I want to browse intermediate data files produced during analysis, preview their contents, and download them individually.
Acceptance Criteria
- THE Dashboard SHALL display a "数据文件" (Data Files) tab as the second tab.
- WHEN the Data_Files_Tab is active, THE Dashboard SHALL fetch the file list from
GET /api/data-filesand render each file as a card showing: filename, description, and row count. - WHEN a user clicks on a file card, THE Dashboard SHALL fetch the preview from
GET /api/data-files/previewand display a table showing column headers and up to 5 data rows. - WHEN a user clicks the download button on a file card, THE Dashboard SHALL initiate a file download via
GET /api/data-files/download. - WHILE analysis is running, THE Dashboard SHALL refresh the file list on each polling cycle to show newly created files.
Requirement 9: Gallery Removal and Inline Images in Report
User Story: As a user, I want images displayed inline within report paragraphs instead of in a separate Gallery tab, so that visual evidence is presented in context.
Acceptance Criteria
- THE Dashboard SHALL remove the "Gallery" tab from the tab bar.
- THE Dashboard SHALL remove the gallery carousel UI (carousel container, navigation buttons, image info panel) from the HTML.
- THE Report_Tab SHALL render images inline within report paragraphs using standard Markdown image syntax (
), as already supported by the existingmarked.jsrendering. - THE
switchTabfunction inscript.jsSHALL handle only the three new tab identifiers:execution,datafiles, andreport. - THE frontend SHALL remove all gallery-related JavaScript functions (
loadGallery,renderGalleryImage,prevImage,nextImage) and associated state variables (galleryImages,currentImageIndex).
Requirement 10: Supporting Data Button in Report
User Story: As a user, I want report paragraphs that make data-driven claims to have a "查看支撑数据" button, so that I can view the evidence data that supports each conclusion.
Acceptance Criteria
- WHEN the Report_Tab renders a paragraph of type
textthat has associated data evidence, THE Dashboard SHALL display a "查看支撑数据" (View Supporting Data) button below the paragraph content. - WHEN a user clicks the "查看支撑数据" button, THE Dashboard SHALL display a popover or modal showing the associated data evidence rows in a table format.
- THE
GET /api/reportresponse SHALL include asupporting_datamapping (keyed by paragraph ID) containing the data evidence rows relevant to each paragraph. - IF a paragraph has no associated data evidence, THEN THE Dashboard SHALL not display the "查看支撑数据" button for that paragraph.
Requirement 11: Report-to-Evidence Linking in Backend
User Story: As a backend developer, I want the system to associate data evidence from execution rounds with report paragraphs, so that the frontend can display supporting data buttons.
Acceptance Criteria
- WHEN generating the final report, THE DataAnalysisAgent SHALL pass the collected data evidence from all rounds to the report generation prompt.
- THE final report generation prompt SHALL instruct the LLM to annotate report paragraphs with round references (e.g.,
<!-- evidence:round_3 -->) when a paragraph's content is derived from a specific analysis round. - WHEN the
GET /api/reportendpoint parses the report, THE backend SHALL extract evidence annotations and build asupporting_datamapping by looking up the referenced round's data evidence from the SessionData. - IF a paragraph contains no evidence annotation, THEN THE backend SHALL exclude that paragraph from the
supporting_datamapping.
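The annotation-to-evidence lookup could be sketched as follows. Assumptions: paragraphs arrive as an id-to-text mapping and rounds as Round_Data-shaped dicts; the function name and shapes are illustrative:

```python
import re

EVIDENCE_RE = re.compile(r"<!--\s*evidence:round_(\d+)\s*-->")

def build_supporting_data(paragraphs, rounds):
    """Map paragraph IDs to the data evidence of the round referenced by an
    `<!-- evidence:round_N -->` annotation; unannotated paragraphs (and
    references to unknown rounds) are omitted, per Requirement 11."""
    evidence_by_round = {r["round_number"]: r["data_evidence"] for r in rounds}
    supporting = {}
    for para_id, text in paragraphs.items():
        m = EVIDENCE_RE.search(text)
        if m and int(m.group(1)) in evidence_by_round:
            supporting[para_id] = evidence_by_round[int(m.group(1))]
    return supporting
```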
Requirement 12: Session Data Model Extension
User Story: As a backend developer, I want the SessionData model to store structured round data and data file metadata, so that the new API endpoints can serve this information.
Acceptance Criteria
- THE SessionData class SHALL include a
roundsattribute (list of Round_Data dictionaries) to store structured data for each completed analysis round. - THE SessionData class SHALL include a
data_filesattribute (list of file metadata dictionaries) to store information about intermediate data files. - WHEN a new data file is detected (via auto-detection or prompt-guided saving), THE DataAnalysisAgent SHALL append the file metadata to the SessionData
data_fileslist. - THE SessionData class SHALL persist the
roundsanddata_filesattributes to the session'sresults.jsonfile upon analysis completion.