Sync front-end and back-end pages; support hot-editing of analysis templates via YAML config; fix prompt encoding and placeholder issues; optimize file scanning
new file: config/templates/anomaly_detection.yaml (+22 lines)
@@ -0,0 +1,22 @@
+name: 异常值检测分析
+description: 识别数据中的异常值和离群点
+steps:
+  - name: 数值列统计分析
+    description: 计算数值列的统计特征
+    prompt: 计算所有数值列的均值、标准差、四分位数等统计量
+
+  - name: 箱线图可视化
+    description: 使用箱线图识别异常值
+    prompt: 为每个数值列绘制箱线图,直观展示异常值分布
+
+  - name: Z-Score异常检测
+    description: 使用Z-Score方法检测异常值
+    prompt: 计算每个数值的Z-Score,标记|Z|>3的异常值
+
+  - name: IQR异常检测
+    description: 使用四分位距方法检测异常值
+    prompt: 使用IQR方法(Q1-1.5*IQR, Q3+1.5*IQR)检测异常值
+
+  - name: 异常值汇总报告
+    description: 整理所有检测到的异常值
+    prompt: 汇总所有异常值,分析其特征和可能原因,提供处理建议
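The Z-Score and IQR steps in this template reduce to two standard outlier rules. A minimal stdlib sketch of both (illustrative only; in the real flow the LLM generates pandas code from these prompts):

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Values whose |Z| exceeds the threshold (the template's |Z|>3 rule)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > threshold]

def iqr_outliers(values):
    """Values outside the Tukey fences Q1-1.5*IQR .. Q3+1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

print(iqr_outliers([10, 11, 9, 10, 12, 11, 10, 200]))  # → [200]
```

Note that a single extreme value inflates the standard deviation, so the IQR rule often catches outliers the Z-Score rule misses; the template runs both for that reason.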
new file: config/templates/comparison.yaml (+18 lines)
@@ -0,0 +1,18 @@
+name: 分组对比分析
+description: 对比不同分组之间的差异和特征
+steps:
+  - name: 分组统计
+    description: 计算各组的统计指标
+    prompt: 按分组列分组,计算数值列的均值、中位数、标准差
+
+  - name: 分组可视化对比
+    description: 绘制对比图表
+    prompt: 绘制各组的柱状图和箱线图,直观对比差异
+
+  - name: 差异显著性检验
+    description: 统计检验组间差异
+    prompt: 进行t检验或方差分析,判断组间差异是否显著
+
+  - name: 对比结论
+    description: 总结对比结果
+    prompt: 总结各组特征、主要差异和业务洞察
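The first step of this template is a plain group-by aggregation. A stdlib sketch of the same computation (the generated pandas code would instead use `df.groupby(...).agg(['count','mean','median'])`):

```python
from statistics import mean, median

def group_stats(rows, group_key, value_key):
    """Per-group count/mean/median, mirroring a pandas groupby-agg."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {
        g: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for g, vals in groups.items()
    }

rows = [
    {"group": "A", "value": 10}, {"group": "A", "value": 14},
    {"group": "B", "value": 30}, {"group": "B", "value": 34}, {"group": "B", "value": 38},
]
print(group_stats(rows, "group", "value"))
```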
new file: config/templates/health_report.yaml (+50 lines)
@@ -0,0 +1,50 @@
+name: 车联网工单健康度报告
+description: 全面分析车联网技术支持工单的健康状况,从多个维度评估工单处理效率和质量
+steps:
+  - name: 数据概览与质量检查
+    description: 检查数据完整性、缺失值、异常值等
+    prompt: 加载数据并进行质量检查,输出数据概况和潜在问题
+
+  - name: 工单总量分析
+    description: 统计总工单数、时间分布、趋势变化
+    prompt: 计算总工单数,按时间维度统计工单量,绘制时间序列趋势图
+
+  - name: 车型维度分析
+    description: 分析不同车型的工单分布和问题特征
+    prompt: 统计各车型工单数量,绘制车型分布图,识别高风险车型
+
+  - name: 模块维度分析
+    description: 分析工单涉及的技术模块分布
+    prompt: 统计各技术模块的工单量,绘制模块分布图,识别高频问题模块
+
+  - name: 功能维度分析
+    description: 分析具体功能点的问题分布
+    prompt: 统计各功能的工单量,绘制TOP功能问题排行,分析功能稳定性
+
+  - name: 问题严重程度分析
+    description: 分析工单的严重程度分布
+    prompt: 统计不同严重程度的工单比例,绘制严重程度分布图
+
+  - name: 处理时长分析
+    description: 分析工单处理时效性
+    prompt: 计算平均处理时长、SLA达成率,识别超时工单,绘制时长分布图
+
+  - name: 责任人工作负载分析
+    description: 分析各责任人的工单负载和处理效率
+    prompt: 统计各责任人的工单数和处理效率,绘制负载分布图
+
+  - name: 来源渠道分析
+    description: 分析工单来源渠道分布
+    prompt: 统计各来源渠道的工单量,绘制渠道分布图
+
+  - name: 高频问题深度分析
+    description: 识别并深入分析高频问题
+    prompt: 提取TOP10高频问题,分析问题原因、影响范围和解决方案
+
+  - name: 综合健康度评分
+    description: 基于多个维度计算综合健康度评分
+    prompt: 综合考虑工单量、处理时长、问题严重度等指标,计算健康度评分
+
+  - name: 生成最终报告
+    description: 整合所有分析结果,生成完整报告
+    prompt: 整合所有图表和分析结论,生成一份完整的车联网工单健康度报告
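The final scoring step combines several sub-metrics into one number. A hypothetical weighted-average sketch (the metric names and weights below are illustrative assumptions, not defined by the template):

```python
def health_score(metrics: dict, weights: dict) -> float:
    """Weighted average of 0-100 sub-scores; higher is healthier."""
    total = sum(weights.values())
    return sum(metrics[k] * w for k, w in weights.items()) / total

score = health_score(
    {"volume": 80, "duration": 90, "severity": 70},   # hypothetical sub-scores
    {"volume": 1, "duration": 2, "severity": 1},      # hypothetical weights
)
print(score)  # → 82.5
```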
new file: config/templates/trend_analysis.yaml (+22 lines)
@@ -0,0 +1,22 @@
+name: 时间序列趋势分析
+description: 分析数据的时间趋势、季节性和周期性特征
+steps:
+  - name: 时间序列数据准备
+    description: 将数据转换为时间序列格式
+    prompt: 将时间列转换为日期格式,按时间排序数据
+
+  - name: 趋势可视化
+    description: 绘制时间序列图
+    prompt: 绘制数值随时间的变化趋势图,添加移动平均线
+
+  - name: 趋势分析
+    description: 识别上升、下降或平稳趋势
+    prompt: 计算趋势线斜率,判断整体趋势方向和变化速率
+
+  - name: 季节性分析
+    description: 检测季节性模式
+    prompt: 分析月度、季度等周期性模式,绘制季节性分解图
+
+  - name: 异常点检测
+    description: 识别时间序列中的异常点
+    prompt: 使用统计方法检测时间序列中的异常值,标注在图表上
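The moving-average overlay mentioned in the trend step is a trailing window mean. A minimal sketch:

```python
def moving_average(series, window=3):
    """Trailing moving average; positions before a full window yield None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

print(moving_average([1, 2, 3, 4, 5], window=3))  # → [None, None, 2.0, 3.0, 4.0]
```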
@@ -352,9 +352,13 @@ class DataAnalysisAgent:
     def _compress_trimmed_messages(self, messages: list) -> str:
         """Compress trimmed messages into a concise summary string.
 
-        Extracts the action type from each assistant message and the execution
-        outcome (success / failure) from the subsequent user feedback message.
-        Code blocks and raw execution output are excluded.
+        Extracts the action type from each assistant message, the execution
+        outcome (success / failure), and completed SOP stages from the
+        subsequent user feedback message. Code blocks and raw execution
+        output are excluded.
+
+        The summary explicitly lists completed SOP stages so the LLM does
+        not restart from stage 1 after conversation trimming.
 
         Args:
             messages: List of conversation message dicts to compress.
||||||
@@ -364,6 +368,17 @@ class DataAnalysisAgent:
         """
         summary_parts = ["[分析摘要] 以下是之前分析轮次的概要:"]
         round_num = 0
+        completed_stages = set()
+
+        # SOP stage keywords to detect from assistant messages
+        stage_keywords = {
+            "阶段1": "数据探索与加载",
+            "阶段2": "基础分布分析",
+            "阶段3": "时序与来源分析",
+            "阶段4": "深度交叉分析",
+            "阶段5": "效率分析",
+            "阶段6": "高级挖掘",
+        }
+
         for msg in messages:
             content = msg["content"]
@@ -375,12 +390,27 @@ class DataAnalysisAgent:
                     action = "collect_figures"
                 elif "action: \"analysis_complete\"" in content or "action: analysis_complete" in content:
                     action = "analysis_complete"
+
+                # Detect completed SOP stages
+                for stage_key, stage_name in stage_keywords.items():
+                    if stage_key in content or stage_name in content:
+                        completed_stages.add(f"{stage_key}: {stage_name}")
+
                 summary_parts.append(f"- 轮次{round_num}: 动作={action}")
             elif msg["role"] == "user" and "代码执行反馈" in content:
                 success = "失败" if "[ERROR]" in content or "执行错误" in content else "成功"
                 if summary_parts and summary_parts[-1].startswith("- 轮次"):
                     summary_parts[-1] += f", 执行结果={success}"
+
+        # Append completed stages so the LLM knows where to continue
+        if completed_stages:
+            summary_parts.append("")
+            summary_parts.append("**已完成的SOP阶段** (请勿重复执行):")
+            for stage in sorted(completed_stages):
+                summary_parts.append(f"  ✓ {stage}")
+            summary_parts.append("")
+            summary_parts.append("请从下一个未完成的阶段继续,不要重新执行已完成的阶段。")
+
         return "\n".join(summary_parts)
 
     def _profile_files_parallel(self, file_paths: list) -> tuple:
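The stage detection added above is a plain keyword scan over message text; a standalone sketch of the same idea (with only two of the six stages, for brevity):

```python
STAGE_KEYWORDS = {
    "阶段1": "数据探索与加载",
    "阶段2": "基础分布分析",
}

def completed_stages(messages):
    """Collect stages whose key or name appears in any message."""
    done = set()
    for content in messages:
        for key, name in STAGE_KEYWORDS.items():
            if key in content or name in content:
                done.add(f"{key}: {name}")
    return sorted(done)

print(completed_stages(["已完成阶段1的加载", "基础分布分析结果如下"]))
```

Matching on either the stage key ("阶段1") or its full name keeps the detector robust to how the LLM phrases its progress reports.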
@@ -948,6 +978,17 @@ class DataAnalysisAgent:
        - 注意:必须使用实际生成的图片文件名,严禁使用占位符
        """
+
+        # Append actual data files list so the LLM uses real filenames in the report
+        if self._session_ref and self._session_ref.data_files:
+            data_files_summary = "\n**已生成的数据文件列表** (请在报告中使用这些实际文件名,替换模板中的占位文件名如 [4-1TSP问题聚类.xlsx]):\n"
+            for df_meta in self._session_ref.data_files:
+                fname = df_meta.get("filename", "")
+                desc = df_meta.get("description", "")
+                rows = df_meta.get("rows", 0)
+                data_files_summary += f"- {fname} ({rows}行): {desc}\n"
+            data_files_summary += "\n注意:报告模板中的 `[4-1TSP问题聚类.xlsx]` 等占位文件名必须替换为上述实际文件名。如果某类聚类文件未生成,请说明原因(如数据量不足或该分类不适用),不要保留占位符。\n"
+            prompt += data_files_summary
+
         return prompt
 
     def reset(self):
@@ -89,6 +89,8 @@ jupyter notebook环境当前变量:
 - 生成 `模块_严重程度堆叠图.png` (Stacked Bar)
 
 **阶段5:效率分析**
+- **必做:按一级分类分组统计**:对每个一级分类(如TSP、APP、DK、咨询等)分别计算工单数量、平均时长、中位数时长,输出汇总表并保存为CSV。
+  示例代码:`df.groupby('一级分类')['解决时长/h'].agg(['count','mean','median'])`
 - 生成 `处理时长分布.png` (直方图)
 - 生成 `责任人效率分析.png` (散点图: 工单量 vs 平均时长)
 
@@ -179,6 +181,7 @@ final_report_system_prompt = """你是一位**资深数据分析专家 (Senior D
 ### 报告结构模板使用说明 (Template Instructions)
 - **固定格式 (Format)**:所有的 Markdown 标题 (`#`, `##`)、列表项前缀 (`- **...**`)、表格表头是必须保留的**骨架**。
 - **写作指引 (Prompts)**:方括号 `[...]` 内的文字是给你的**写作提示**,请根据实际分析将其**替换**为具体内容,**不要**在最终报告中保留方括号。
+- **数据文件引用规则**:模板中的 `[4-1TSP问题聚类.xlsx]` 等占位文件名**必须替换**为实际生成的文件名(见下方传入的已生成数据文件列表)。如果某类文件未生成,请注明原因(如"数据量不足,未执行聚类"或"该分类无对应数据"),不要保留占位符。
 - **直接输出Markdown**:不要使用JSON或YAML包裹,直接输出Markdown内容。
 
 ---
@@ -248,7 +248,7 @@ def test_prop7_template_prompt_prepended(name):
     template = get_template(name)
     prompt = template.get_full_prompt()
     assert len(prompt) > 0
-    assert template.name in prompt
+    assert template.display_name in prompt
 
 
 # ===========================================================================
@@ -1,289 +1,153 @@
 # -*- coding: utf-8 -*-
 """
-分析模板系统 - 提供预定义的分析场景
+分析模板系统 - 从 config/templates/*.yaml 加载模板
+
+模板文件格式:
+name: 模板显示名称
+description: 模板描述
+steps:
+  - name: 步骤名称
+    description: 步骤描述
+    prompt: 给LLM的指令
 """
 
-from abc import ABC, abstractmethod
+import os
+import glob
+import yaml
 from typing import List, Dict, Any
 from dataclasses import dataclass
 
+TEMPLATES_DIR = os.path.join(os.path.dirname(os.path.dirname(__file__)), "config", "templates")
+
 
 @dataclass
 class AnalysisStep:
     """分析步骤"""
     name: str
     description: str
-    analysis_type: str  # explore, visualize, calculate, report
     prompt: str
 
 
-class AnalysisTemplate(ABC):
-    """分析模板基类"""
+class AnalysisTemplate:
+    """从 YAML 文件加载的分析模板"""
 
-    def __init__(self, name: str, description: str):
+    def __init__(self, name: str, display_name: str, description: str, steps: List[AnalysisStep], filepath: str = ""):
         self.name = name
+        self.display_name = display_name
         self.description = description
-        self.steps: List[AnalysisStep] = []
+        self.steps = steps
+        self.filepath = filepath
 
-    @abstractmethod
-    def build_steps(self, **kwargs) -> List[AnalysisStep]:
-        """构建分析步骤"""
-        pass
-
-    def get_full_prompt(self, **kwargs) -> str:
-        """获取完整的分析提示词"""
-        steps = self.build_steps(**kwargs)
-
-        prompt = f"# {self.name}\n\n{self.description}\n\n"
+    def get_full_prompt(self) -> str:
+        prompt = f"# {self.display_name}\n\n{self.description}\n\n"
         prompt += "## 分析步骤:\n\n"
-        for i, step in enumerate(steps, 1):
+        for i, step in enumerate(self.steps, 1):
             prompt += f"### {i}. {step.name}\n"
             prompt += f"{step.description}\n\n"
             prompt += f"```\n{step.prompt}\n```\n\n"
         return prompt
 
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "name": self.name,
+            "display_name": self.display_name,
+            "description": self.description,
+            "steps": [{"name": s.name, "description": s.description, "prompt": s.prompt} for s in self.steps],
+        }
 
 
-class HealthReportTemplate(AnalysisTemplate):
-    """健康度报告模板 - 专门用于车联网工单健康度分析"""
-
-    def __init__(self):
-        super().__init__(
-            name="车联网工单健康度报告",
-            description="全面分析车联网技术支持工单的健康状况,从多个维度评估工单处理效率和质量"
-        )
-
-    def build_steps(self, **kwargs) -> List[AnalysisStep]:
-        """构建健康度报告的分析步骤"""
-        return [
-            AnalysisStep(
-                name="数据概览与质量检查",
-                description="检查数据完整性、缺失值、异常值等",
-                analysis_type="explore",
-                prompt="加载数据并进行质量检查,输出数据概况和潜在问题"
-            ),
-            AnalysisStep(
-                name="工单总量分析",
-                description="统计总工单数、时间分布、趋势变化",
-                analysis_type="calculate",
-                prompt="计算总工单数,按时间维度统计工单量,绘制时间序列趋势图"
-            ),
+def _load_template_from_file(filepath: str) -> AnalysisTemplate:
+    """从单个 YAML 文件加载模板"""
+    with open(filepath, "r", encoding="utf-8") as f:
+        data = yaml.safe_load(f)
+
+    template_name = os.path.splitext(os.path.basename(filepath))[0]
+    steps = []
+    for s in data.get("steps", []):
+        steps.append(AnalysisStep(
+            name=s.get("name", ""),
+            description=s.get("description", ""),
+            prompt=s.get("prompt", ""),
+        ))
+
+    return AnalysisTemplate(
+        name=template_name,
+        display_name=data.get("name", template_name),
+        description=data.get("description", ""),
+        steps=steps,
+        filepath=filepath,
+    )
+
+
+def _scan_templates() -> Dict[str, AnalysisTemplate]:
+    """扫描 config/templates/ 目录加载所有模板"""
+    registry = {}
+    if not os.path.exists(TEMPLATES_DIR):
+        os.makedirs(TEMPLATES_DIR, exist_ok=True)
+        return registry
+
+    for fpath in sorted(glob.glob(os.path.join(TEMPLATES_DIR, "*.yaml"))):
+        try:
+            tpl = _load_template_from_file(fpath)
+            registry[tpl.name] = tpl
+        except Exception as e:
+            print(f"[WARN] 加载模板失败 {fpath}: {e}")
+    return registry
-            AnalysisStep(
-                name="车型维度分析",
-                description="分析不同车型的工单分布和问题特征",
-                analysis_type="visualize",
-                prompt="统计各车型工单数量,绘制车型分布饼图和柱状图,识别高风险车型"
-            ),
-            AnalysisStep(
-                name="模块维度分析",
-                description="分析工单涉及的技术模块分布",
-                analysis_type="visualize",
-                prompt="统计各技术模块的工单量,绘制模块分布图,识别高频问题模块"
-            ),
-            AnalysisStep(
-                name="功能维度分析",
-                description="分析具体功能点的问题分布",
-                analysis_type="visualize",
-                prompt="统计各功能的工单量,绘制TOP功能问题排行,分析功能稳定性"
-            ),
-            AnalysisStep(
-                name="问题严重程度分析",
-                description="分析工单的严重程度分布",
-                analysis_type="visualize",
-                prompt="统计不同严重程度的工单比例,绘制严重程度分布图"
-            ),
-            AnalysisStep(
-                name="处理时长分析",
-                description="分析工单处理时效性",
-                analysis_type="calculate",
-                prompt="计算平均处理时长、SLA达成率,识别超时工单,绘制时长分布图"
-            ),
-            AnalysisStep(
-                name="责任人工作负载分析",
-                description="分析各责任人的工单负载和处理效率",
-                analysis_type="visualize",
-                prompt="统计各责任人的工单数和处理效率,绘制负载分布图,识别超负荷人员"
-            ),
-            AnalysisStep(
-                name="来源渠道分析",
-                description="分析工单来源渠道分布",
-                analysis_type="visualize",
-                prompt="统计各来源渠道的工单量,绘制渠道分布图"
-            ),
-            AnalysisStep(
-                name="高频问题深度分析",
-                description="识别并深入分析高频问题",
-                analysis_type="explore",
-                prompt="提取TOP10高频问题,分析问题原因、影响范围和解决方案"
-            ),
-            AnalysisStep(
-                name="综合健康度评分",
-                description="基于多个维度计算综合健康度评分",
-                analysis_type="calculate",
-                prompt="综合考虑工单量、处理时长、问题严重度等指标,计算健康度评分"
-            ),
-            AnalysisStep(
-                name="生成最终报告",
-                description="整合所有分析结果,生成完整报告",
-                analysis_type="report",
-                prompt="整合所有图表和分析结论,生成一份完整的车联网工单健康度报告"
-            )
-        ]
-
-
-class TrendAnalysisTemplate(AnalysisTemplate):
-    """趋势分析模板"""
-
-    def __init__(self):
-        super().__init__(
-            name="时间序列趋势分析",
-            description="分析数据的时间趋势、季节性和周期性特征"
-        )
-
-    def build_steps(self, time_column: str = "日期", value_column: str = "数值", **kwargs) -> List[AnalysisStep]:
-        return [
-            AnalysisStep(
-                name="时间序列数据准备",
-                description="将数据转换为时间序列格式",
-                analysis_type="explore",
-                prompt=f"将 '{time_column}' 列转换为日期格式,按时间排序数据"
-            ),
-            AnalysisStep(
-                name="趋势可视化",
-                description="绘制时间序列图",
-                analysis_type="visualize",
-                prompt=f"绘制 '{value_column}' 随 '{time_column}' 的变化趋势图,添加移动平均线"
-            ),
-            AnalysisStep(
-                name="趋势分析",
-                description="识别上升、下降或平稳趋势",
-                analysis_type="calculate",
-                prompt="计算趋势线斜率,判断整体趋势方向和变化速率"
-            ),
-            AnalysisStep(
-                name="季节性分析",
-                description="检测季节性模式",
-                analysis_type="visualize",
-                prompt="分析月度、季度等周期性模式,绘制季节性分解图"
-            ),
-            AnalysisStep(
-                name="异常点检测",
-                description="识别时间序列中的异常点",
-                analysis_type="calculate",
-                prompt="使用统计方法检测时间序列中的异常值,标注在图表上"
-            )
-        ]
-
-
+# Module-level registry, refreshed on each call to support hot-editing
+def _get_registry() -> Dict[str, AnalysisTemplate]:
+    return _scan_templates()
-class AnomalyDetectionTemplate(AnalysisTemplate):
-    """异常检测模板"""
-
-    def __init__(self):
-        super().__init__(
-            name="异常值检测分析",
-            description="识别数据中的异常值和离群点"
-        )
-
-    def build_steps(self, **kwargs) -> List[AnalysisStep]:
-        return [
-            AnalysisStep(
-                name="数值列统计分析",
-                description="计算数值列的统计特征",
-                analysis_type="calculate",
-                prompt="计算所有数值列的均值、标准差、四分位数等统计量"
-            ),
-            AnalysisStep(
-                name="箱线图可视化",
-                description="使用箱线图识别异常值",
-                analysis_type="visualize",
-                prompt="为每个数值列绘制箱线图,直观展示异常值分布"
-            ),
-            AnalysisStep(
-                name="Z-Score异常检测",
-                description="使用Z-Score方法检测异常值",
-                analysis_type="calculate",
-                prompt="计算每个数值的Z-Score,标记|Z|>3的异常值"
-            ),
-            AnalysisStep(
-                name="IQR异常检测",
-                description="使用四分位距方法检测异常值",
-                analysis_type="calculate",
-                prompt="使用IQR方法(Q1-1.5*IQR, Q3+1.5*IQR)检测异常值"
-            ),
-            AnalysisStep(
-                name="异常值汇总报告",
-                description="整理所有检测到的异常值",
-                analysis_type="report",
-                prompt="汇总所有异常值,分析其特征和可能原因,提供处理建议"
-            )
-        ]
-
-
-class ComparisonAnalysisTemplate(AnalysisTemplate):
-    """对比分析模板"""
-
-    def __init__(self):
-        super().__init__(
-            name="分组对比分析",
-            description="对比不同分组之间的差异和特征"
-        )
-
-    def build_steps(self, group_column: str = "分组", value_column: str = "数值", **kwargs) -> List[AnalysisStep]:
-        return [
-            AnalysisStep(
-                name="分组统计",
-                description="计算各组的统计指标",
-                analysis_type="calculate",
-                prompt=f"按 '{group_column}' 分组,计算 '{value_column}' 的均值、中位数、标准差"
-            ),
-            AnalysisStep(
-                name="分组可视化对比",
-                description="绘制对比图表",
-                analysis_type="visualize",
-                prompt=f"绘制各组的柱状图和箱线图,直观对比差异"
-            ),
-            AnalysisStep(
-                name="差异显著性检验",
-                description="统计检验组间差异",
-                analysis_type="calculate",
-                prompt="进行t检验或方差分析,判断组间差异是否显著"
-            ),
-            AnalysisStep(
-                name="对比结论",
-                description="总结对比结果",
-                analysis_type="report",
-                prompt="总结各组特征、主要差异和业务洞察"
-            )
-        ]
-
-
-# 模板注册表
-TEMPLATE_REGISTRY = {
-    "health_report": HealthReportTemplate,
-    "trend_analysis": TrendAnalysisTemplate,
-    "anomaly_detection": AnomalyDetectionTemplate,
-    "comparison": ComparisonAnalysisTemplate
-}
+# Keep TEMPLATE_REGISTRY as a lazy property for backward compatibility with tests
+TEMPLATE_REGISTRY = _scan_templates()
 
 
 def get_template(template_name: str) -> AnalysisTemplate:
-    """获取分析模板"""
-    template_class = TEMPLATE_REGISTRY.get(template_name)
-    if template_class:
-        return template_class()
-    else:
-        raise ValueError(f"未找到模板: {template_name}。可用模板: {list(TEMPLATE_REGISTRY.keys())}")
+    """获取分析模板(每次从磁盘重新加载以支持热编辑)"""
+    registry = _get_registry()
+    if template_name in registry:
+        return registry[template_name]
+    raise ValueError(f"未找到模板: {template_name}。可用模板: {list(registry.keys())}")
 
 
 def list_templates() -> List[Dict[str, str]]:
     """列出所有可用模板"""
-    templates = []
-    for name, template_class in TEMPLATE_REGISTRY.items():
-        template = template_class()
-        templates.append({
-            "name": name,
-            "display_name": template.name,
-            "description": template.description
-        })
-    return templates
+    registry = _get_registry()
+    return [
+        {"name": tpl.name, "display_name": tpl.display_name, "description": tpl.description}
+        for tpl in registry.values()
+    ]
+
+
+def save_template(template_name: str, data: Dict[str, Any]) -> str:
+    """保存或更新模板到 YAML 文件,返回文件路径"""
+    os.makedirs(TEMPLATES_DIR, exist_ok=True)
+    filepath = os.path.join(TEMPLATES_DIR, f"{template_name}.yaml")
+
+    yaml_data = {
+        "name": data.get("display_name", data.get("name", template_name)),
+        "description": data.get("description", ""),
+        "steps": data.get("steps", []),
+    }
+
+    with open(filepath, "w", encoding="utf-8") as f:
+        yaml.dump(yaml_data, f, allow_unicode=True, default_flow_style=False, sort_keys=False)
+
+    # Refresh global registry
+    global TEMPLATE_REGISTRY
+    TEMPLATE_REGISTRY = _scan_templates()
+
+    return filepath
+
+
+def delete_template(template_name: str) -> bool:
+    """删除模板文件"""
+    filepath = os.path.join(TEMPLATES_DIR, f"{template_name}.yaml")
+    if os.path.exists(filepath):
+        os.remove(filepath)
+        global TEMPLATE_REGISTRY
+        TEMPLATE_REGISTRY = _scan_templates()
+        return True
+    return False
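The save/load pair above is a straightforward PyYAML round trip, with the registry key taken from the filename and the display name from the YAML body. A self-contained sketch (using a temp directory in place of config/templates/):

```python
import os
import tempfile
import yaml  # PyYAML, the same dependency the module above uses

template = {
    "name": "示例模板",
    "description": "演示用",
    "steps": [{"name": "步骤A", "description": "描述", "prompt": "指令"}],
}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.yaml")
    with open(path, "w", encoding="utf-8") as f:
        yaml.dump(template, f, allow_unicode=True, default_flow_style=False, sort_keys=False)
    with open(path, "r", encoding="utf-8") as f:
        loaded = yaml.safe_load(f)
    # Registry key comes from the filename, display name from the YAML body
    registry_key = os.path.splitext(os.path.basename(path))[0]

print(registry_key, loaded["name"])
```

Because `get_template` rescans the directory on every call, editing a YAML file on disk takes effect immediately; that is the hot-editing behavior the commit message refers to.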
@@ -92,12 +92,29 @@ class CodeExecutor:
     AUTO_EXPORT_MAX_ROWS = 50000
 
     # Variable names to skip during DataFrame auto-export
-    # (common import aliases and built-in namespace names)
+    # (common import aliases, built-in namespace names, and typical
+    # temporary/intermediate variable names that shouldn't be persisted)
     _SKIP_EXPORT_NAMES = {
+        # Import aliases
         "pd", "np", "plt", "sns", "os", "json", "sys", "re", "io",
         "csv", "glob", "duckdb", "display", "math", "datetime", "time",
         "warnings", "logging", "copy", "pickle", "pathlib", "collections",
         "itertools", "functools", "operator", "random", "networkx",
+        # Common data variable — the main loaded DataFrame should not be
+        # auto-exported every round; the LLM can save it explicitly via
+        # DATA_FILE_SAVED if needed.
+        "df",
+        # Typical intermediate/temporary variable names from analysis code
+        "cross_table", "cross_table_filtered",
+        "module_issue_table", "module_issue_filtered",
+        "correlation_matrix",
+        "feature_data", "person_stats", "top_persons",
+        "abnormal_durations", "abnormal_orders",
+        "missing_df", "missing_values", "missing_percent",
+        "monthly_counts", "monthly_summary",
+        "distribution_results", "phrase_freq",
+        "normal_durations",
+        "df_check", "df_temp",
     }
 
     # Regex for parsing DATA_FILE_SAVED markers
@@ -341,15 +358,31 @@ from IPython.display import display
 
     @staticmethod
     def _sanitize_for_json(rows: List[Dict]) -> List[Dict]:
-        """Replace NaN/inf/-inf with None so the data is JSON-serializable."""
+        """Make evidence row values JSON-serializable.
+
+        Handles NaN/inf → None, Timestamp/datetime → isoformat string,
+        numpy scalars → Python native types.
+        """
         import math
         sanitized = []
         for row in rows:
             clean = {}
             for k, v in row.items():
-                if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
+                if v is None:
                     clean[k] = None
+                elif isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
+                    clean[k] = None
+                elif hasattr(v, 'isoformat'):  # Timestamp, datetime
+                    clean[k] = v.isoformat()
+                elif hasattr(v, 'item'):  # numpy scalar
+                    clean[k] = v.item()
                 else:
+                    try:
+                        if pd.isna(v):
+                            clean[k] = None
+                            continue
+                    except (TypeError, ValueError):
+                        pass
                     clean[k] = v
             sanitized.append(clean)
         return sanitized
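The widened sanitizer follows a simple per-value cascade. A stdlib-only sketch of the same rules (the pandas `pd.isna` fallback is omitted here to stay dependency-free):

```python
import math
from datetime import datetime

def sanitize_value(v):
    """None stays None; NaN/inf → None; datetime-likes → ISO string; numpy-like scalars → .item()."""
    if v is None:
        return None
    if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
        return None
    if hasattr(v, "isoformat"):  # pandas Timestamp, datetime, date
        return v.isoformat()
    if hasattr(v, "item"):       # numpy scalar
        return v.item()
    return v

print(sanitize_value(float("nan")), sanitize_value(datetime(2024, 1, 2)))
```

Duck-typing on `isoformat`/`item` covers pandas Timestamps and numpy scalars without importing either library at sanitize time.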
@@ -405,12 +438,17 @@ from IPython.display import display
     def _detect_new_dataframes(
         self, before: Dict[str, int], after: Dict[str, int]
     ) -> List[str]:
-        """Return variable names of new or changed DataFrames."""
-        new_or_changed = []
+        """Return variable names of truly NEW DataFrames only.
+
+        Only returns names that did not exist in the before-snapshot.
+        Changed DataFrames (same name, different id) are excluded to avoid
+        re-exporting the main 'df' or other modified variables every round.
+        """
+        new_only = []
         for name, obj_id in after.items():
-            if name not in before or before[name] != obj_id:
-                new_or_changed.append(name)
-        return new_or_changed
+            if name not in before:
+                new_only.append(name)
+        return new_only
 
     def _export_dataframe(self, var_name: str, df) -> Optional[Dict[str, Any]]:
         """
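The narrowed detection rule is easy to see in isolation: only names absent from the before-snapshot qualify, and rebound names (same name, new object id) are ignored. A sketch:

```python
def detect_new_names(before: dict, after: dict) -> list:
    """Names in `after` that were not present in `before`; changed ids are ignored."""
    return [name for name in after if name not in before]

before = {"df": 101}                 # main DataFrame already loaded
after = {"df": 202, "summary": 303}  # df rebound (new id), summary newly created
print(detect_new_names(before, after))  # → ['summary']
```

Under the old rule, the rebound `df` would have been re-exported every round; the new rule exports only `summary`.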
@@ -84,6 +84,20 @@ class LLMHelper:
             else:
                 yaml_content = response.strip()
 
+            # Strip language identifier if LLM used ```python instead of ```yaml
+            # e.g. "python\naction: ..." → "action: ..."
+            import re
+            if re.match(r'^[a-zA-Z]+\n', yaml_content):
+                yaml_content = yaml_content.split('\n', 1)[1]
+
+            # Fix Windows backslash paths that break YAML double-quoted strings.
+            # e.g. "D:\code\iov..." → "D:/code/iov..." inside quoted values
+            yaml_content = re.sub(
+                r'"([A-Za-z]:\\[^"]*)"',
+                lambda m: '"' + m.group(1).replace('\\', '/') + '"',
+                yaml_content,
+            )
+
             parsed = yaml.safe_load(yaml_content)
             return parsed if parsed is not None else {}
         except Exception as e:
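The backslash repair can be exercised on its own; a sketch of the same regex rewrite applied to a quoted Windows drive path inside YAML text:

```python
import re

def fix_windows_paths(yaml_content: str) -> str:
    """Rewrite backslashes to forward slashes inside double-quoted drive paths."""
    return re.sub(
        r'"([A-Za-z]:\\[^"]*)"',
        lambda m: '"' + m.group(1).replace('\\', '/') + '"',
        yaml_content,
    )

print(fix_windows_paths('file: "D:\\code\\iov\\data.csv"'))  # → file: "D:/code/iov/data.csv"
```

This matters because YAML treats `\c` inside a double-quoted scalar as an invalid escape sequence, so unfixed Windows paths make `yaml.safe_load` raise; forward slashes are valid in both YAML and Windows file APIs.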
|
|||||||
@@ -71,6 +71,59 @@ def clean_code_block(code: str) -> str:
|
|||||||
return '\n'.join(result_lines)
|
return '\n'.join(result_lines)
|
||||||
|
|
||||||
|
|
||||||
|
def _is_verification_code(code: str) -> bool:
|
||||||
|
"""Detect code blocks that only check/list files without doing real analysis.
|
||||||
|
|
||||||
|
These are typically generated when the LLM runs os.listdir / os.path.exists
|
||||||
|
loops to verify outputs, and should not appear in the reusable script.
|
||||||
|
"""
|
||||||
|
lines = [l.strip() for l in code.strip().splitlines() if l.strip() and not l.strip().startswith('#')]
|
||||||
|
if not lines:
|
||||||
|
return True
|
||||||
|
|
||||||
|
verification_indicators = 0
|
||||||
|
analysis_indicators = 0
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
# Verification patterns
|
||||||
|
if any(kw in line for kw in [
|
||||||
|
'os.listdir(', 'os.path.exists(', 'os.path.getsize(',
|
||||||
|
'os.path.isfile(', '✓', '✗', 'all_exist',
|
||||||
|
]):
|
||||||
|
verification_indicators += 1
|
||||||
|
# Analysis patterns (actual computation / plotting / saving)
|
||||||
|
if any(kw in line for kw in [
|
||||||
|
'.plot(', 'plt.', '.to_csv(', '.value_counts()',
|
||||||
|
'.groupby(', '.corr(', '.fit_transform(', '.fit_predict(',
|
||||||
|
'pd.read_csv(', 'pd.crosstab(', '.describe()',
|
||||||
|
]):
|
||||||
|
analysis_indicators += 1
|
||||||
|
|
||||||
|
# If the block is dominated by verification with no real analysis, skip it
|
||||||
|
return verification_indicators > 0 and analysis_indicators == 0
|
||||||
|
|
||||||
|
|
||||||
|
def _is_duplicate_data_load(code: str, seen_load_blocks: set) -> bool:
|
||||||
|
"""Detect duplicate data loading blocks (LLM 'amnesia' repeats).
|
||||||
|
|
||||||
|
Computes a fingerprint from the code's structural lines (ignoring
|
||||||
|
whitespace and comments) and returns True if we've seen it before.
|
||||||
|
"""
|
||||||
|
# Extract structural fingerprint: non-empty, non-comment lines
|
||||||
|
structural_lines = []
|
||||||
|
for line in code.splitlines():
|
||||||
|
stripped = line.strip()
|
||||||
|
if stripped and not stripped.startswith('#'):
|
||||||
|
structural_lines.append(stripped)
|
||||||
|
|
||||||
|
fingerprint = '\n'.join(structural_lines[:30]) # First 30 lines are enough
|
||||||
|
|
||||||
|
if fingerprint in seen_load_blocks:
|
||||||
|
return True
|
||||||
|
seen_load_blocks.add(fingerprint)
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
def generate_reusable_script(
    analysis_results: List[Dict[str, Any]],
    data_files: List[str],
@@ -92,17 +145,29 @@ def generate_reusable_script(
    # Collect all successfully executed code
    all_imports = set()
    code_blocks = []
    seen_load_blocks: Set[str] = set()

    for result in analysis_results:
        # Only process generate_code-type results
        if result.get("action") == "collect_figures":
            continue
        # Skip retry attempts
        if result.get("retry"):
            continue

        code = result.get("code", "")
        exec_result = result.get("result", {})

        # Only collect code that executed successfully
        if code and exec_result.get("success", False):
            # Skip pure verification/file-check code (e.g. os.listdir loops)
            if _is_verification_code(code):
                continue

            # Skip duplicate data-loading blocks (LLM amnesia repeats)
            if _is_duplicate_data_load(code, seen_load_blocks):
                continue

            # Extract imports
            imports = extract_imports(code)
            all_imports.update(imports)
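Taken together, the loop above applies four filters before a block enters the reusable script. This sketch restates that pipeline with a hypothetical stand-in for the verification check (`looks_like_verification` and `collect_script_blocks` are illustrative names, not the project's; the real logic lives in `_is_verification_code` / `_is_duplicate_data_load`):

```python
def looks_like_verification(code):
    # Crude stand-in for _is_verification_code
    return 'os.path.exists(' in code or 'os.listdir(' in code

def collect_script_blocks(analysis_results):
    seen = set()
    blocks = []
    for result in analysis_results:
        if result.get("action") == "collect_figures" or result.get("retry"):
            continue  # non-code rounds and retry attempts
        code = result.get("code", "")
        if not (code and result.get("result", {}).get("success", False)):
            continue  # only successfully executed code
        if looks_like_verification(code):
            continue  # pure file-check loops
        if code.strip() in seen:
            continue  # duplicate block (amnesia repeat)
        seen.add(code.strip())
        blocks.append(code)
    return blocks

results = [
    {"action": "collect_figures"},
    {"code": "df.describe()", "result": {"success": True}, "retry": True},
    {"code": "import os\nos.path.exists('out.png')", "result": {"success": True}},
    {"code": "df.groupby('a').mean()", "result": {"success": True}},
    {"code": "df.groupby('a').mean()", "result": {"success": True}},
]
print(collect_script_blocks(results))  # ["df.groupby('a').mean()"]
```

Only one of the five rounds survives, which is the intended outcome: the generated script keeps real analysis steps and drops scaffolding.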
133
web/main.py
@@ -85,10 +85,29 @@ class SessionManager:
            return self.sessions[session_id]

        # Fallback: Try to reconstruct from disk for history sessions
        # First try the old convention: outputs/session_{uuid}
        output_dir = os.path.join("outputs", f"session_{session_id}")
        if os.path.exists(output_dir) and os.path.isdir(output_dir):
            return self._reconstruct_session(session_id, output_dir)

        # Scan all session directories for session_meta.json matching this session_id
        # This handles the case where output_dir uses a timestamp name, not the UUID
        outputs_root = "outputs"
        if os.path.exists(outputs_root):
            for dirname in os.listdir(outputs_root):
                dir_path = os.path.join(outputs_root, dirname)
                if not os.path.isdir(dir_path) or not dirname.startswith("session_"):
                    continue
                meta_path = os.path.join(dir_path, "session_meta.json")
                if os.path.exists(meta_path):
                    try:
                        with open(meta_path, "r", encoding="utf-8") as f:
                            meta = json.load(f)
                        if meta.get("session_id") == session_id:
                            return self._reconstruct_session(session_id, dir_path)
                    except Exception:
                        continue

        return None

    def _reconstruct_session(self, session_id: str, output_dir: str) -> SessionData:
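The directory scan above is the key recovery path when a session's output directory is named by timestamp rather than by UUID. A minimal standalone sketch of the same lookup (the function name `find_session_dir` is illustrative; in the commit this logic is inline in `SessionManager`):

```python
import json
import os
import tempfile

def find_session_dir(outputs_root: str, session_id: str):
    """Scan session_* directories for a session_meta.json naming this session."""
    for dirname in sorted(os.listdir(outputs_root)):
        dir_path = os.path.join(outputs_root, dirname)
        if not os.path.isdir(dir_path) or not dirname.startswith("session_"):
            continue
        meta_path = os.path.join(dir_path, "session_meta.json")
        if not os.path.exists(meta_path):
            continue
        try:
            with open(meta_path, "r", encoding="utf-8") as f:
                meta = json.load(f)
        except Exception:
            continue  # unreadable meta file: keep scanning
        if meta.get("session_id") == session_id:
            return dir_path
    return None

# Demo on a throwaway outputs tree: the directory is named by timestamp,
# not by the UUID, so only the meta file links the two.
root = tempfile.mkdtemp()
d = os.path.join(root, "session_20240101_120000")
os.makedirs(d)
with open(os.path.join(d, "session_meta.json"), "w", encoding="utf-8") as f:
    json.dump({"session_id": "abc-123"}, f)

found = find_session_dir(root, "abc-123")
print(found == d)  # True
```

This is why the commit also writes `session_meta.json` at analysis start: without it, a timestamp-named directory is unreachable after a restart.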
@@ -100,18 +119,29 @@ class SessionManager:
        session.progress_percentage = 100.0
        session.status_message = "已完成 (历史记录)"

        # Read session_meta.json if available
        meta = {}
        meta_path = os.path.join(output_dir, "session_meta.json")
        if os.path.exists(meta_path):
            try:
                with open(meta_path, "r", encoding="utf-8") as f:
                    meta = json.load(f)
            except Exception:
                pass

        # Recover Log
        log_path = os.path.join(output_dir, "process.log")
        if os.path.exists(log_path):
            session.log_file = log_path

        # Recover Report: prefer meta, then scan .md files
        report_path = meta.get("report_path")
        if report_path and os.path.exists(report_path):
            session.generated_report = report_path
        else:
            md_files = glob.glob(os.path.join(output_dir, "*.md"))
            if md_files:
                chosen = md_files[0]
                for md in md_files:
                    fname = os.path.basename(md).lower()
                    if "report" in fname or "报告" in fname:
@@ -119,13 +149,21 @@ class SessionManager:
                        break
                session.generated_report = chosen

        # Recover Script: prefer meta, then scan for 分析脚本_*.py or other patterns
        script_path = meta.get("script_path")
        if script_path and os.path.exists(script_path):
            session.reusable_script = script_path
        else:
            # Try Chinese-named scripts first (generated by this system)
            script_files = glob.glob(os.path.join(output_dir, "分析脚本_*.py"))
            if not script_files:
                for s in ["data_analysis_script.py", "script.py", "analysis_script.py"]:
                    p = os.path.join(output_dir, s)
                    if os.path.exists(p):
                        script_files = [p]
                        break
            if script_files:
                session.reusable_script = script_files[0]

        # Recover Results (images etc)
        results_json = os.path.join(output_dir, "results.json")
@@ -219,6 +257,14 @@ def run_analysis_task(session_id: str, files: list, user_requirement: str, is_fo
    session_output_dir = session.output_dir
    session.log_file = os.path.join(session_output_dir, "process.log")

    # Persist session-to-directory mapping immediately so recovery works
    # even if the server restarts mid-analysis
    try:
        with open(os.path.join(session_output_dir, "session_meta.json"), "w") as f:
            json.dump({"session_id": session_id, "user_requirement": user_requirement}, f, default=str)
    except Exception:
        pass

    # Use PrintCapture instead of a global FileLogger; stdout is restored automatically on exiting the with block
    with PrintCapture(session.log_file):
        if is_followup:
@@ -285,6 +331,18 @@ def run_analysis_task(session_id: str, files: list, user_requirement: str, is_fo
                "data_files": session.data_files,
            }, f, default=str)

        # Persist session-to-directory mapping for recovery after server restart
        try:
            with open(os.path.join(session_output_dir, "session_meta.json"), "w") as f:
                json.dump({
                    "session_id": session_id,
                    "user_requirement": user_requirement,
                    "report_path": session.generated_report,
                    "script_path": session.reusable_script,
                }, f, default=str)
        except Exception:
            pass

    except Exception as e:
        print(f"Error during analysis: {e}")
@@ -350,14 +408,36 @@ async def chat_analysis(request: ChatRequest, background_tasks: BackgroundTasks)
    import math as _math

    def _sanitize_value(v):
        """Make any value JSON-serializable.

        Handles: NaN/inf floats → None, pandas Timestamp/Timedelta → str,
        numpy integers/floats → Python int/float, dicts and lists recursively.
        """
        if v is None:
            return None
        if isinstance(v, float):
            if _math.isnan(v) or _math.isinf(v):
                return None
            return v
        if isinstance(v, (int, bool, str)):
            return v
        if isinstance(v, dict):
            return {k: _sanitize_value(val) for k, val in v.items()}
        if isinstance(v, list):
            return [_sanitize_value(item) for item in v]
        # pandas Timestamp, Timedelta, NaT
        try:
            if pd.isna(v):
                return None
        except (TypeError, ValueError):
            pass
        if hasattr(v, 'isoformat'):  # datetime, Timestamp
            return v.isoformat()
        # numpy scalar types
        if hasattr(v, 'item'):
            return v.item()
        # Fallback: convert to string
        return str(v)


@app.get("/api/status")
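The widened `_sanitize_value` can be demonstrated with stdlib values alone. This sketch restates the same dispatch order without the pandas/numpy-specific branches (`sanitize` is an illustrative name; the `pd.isna` and `.item()` branches are omitted here since they need third-party types to trigger):

```python
import math
from datetime import datetime

def sanitize(v):
    """Stdlib-only restatement of _sanitize_value's dispatch order."""
    if v is None:
        return None
    if isinstance(v, float):
        # NaN and ±inf are not valid JSON numbers
        return None if (math.isnan(v) or math.isinf(v)) else v
    if isinstance(v, (int, bool, str)):
        return v
    if isinstance(v, dict):
        return {k: sanitize(val) for k, val in v.items()}
    if isinstance(v, list):
        return [sanitize(item) for item in v]
    if hasattr(v, 'isoformat'):  # datetime-like (incl. pandas Timestamp)
        return v.isoformat()
    if hasattr(v, 'item'):       # numpy-like scalar
        return v.item()
    return str(v)                # last-resort fallback

payload = {
    "mean": float("nan"),
    "max": float("inf"),
    "count": 42,
    "when": datetime(2024, 1, 1),
    "rows": [1.5, float("nan")],
}
print(sanitize(payload))
# {'mean': None, 'max': None, 'count': 42, 'when': '2024-01-01T00:00:00', 'rows': [1.5, None]}
```

Checking `float` before the generic branches matters: `json.dumps` would otherwise emit `NaN`/`Infinity`, which strict JSON parsers (and browsers' `JSON.parse`) reject.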
@@ -533,6 +613,37 @@ async def list_available_templates():
    return {"templates": list_templates()}


@app.get("/api/templates/{template_name}")
async def get_template_detail(template_name: str):
    """Return the full content of a single template, including its steps."""
    from utils.analysis_templates import get_template
    try:
        tpl = get_template(template_name)
        return tpl.to_dict()
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))


@app.put("/api/templates/{template_name}")
async def update_template(template_name: str, body: dict):
    """Create or update a template."""
    from utils.analysis_templates import save_template
    try:
        filepath = save_template(template_name, body)
        return {"status": "saved", "filepath": filepath}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.delete("/api/templates/{template_name}")
async def remove_template(template_name: str):
    """Delete a template."""
    from utils.analysis_templates import delete_template
    if delete_template(template_name):
        return {"status": "deleted"}
    raise HTTPException(status_code=404, detail=f"Template not found: {template_name}")


# --- Data Files API ---

@app.get("/api/data-files")
@@ -209,12 +209,16 @@ function startPolling() {
    loadDataFiles();

    // Update progress bar during analysis.
    // Use rounds.length (actual completed analysis rounds) for display
    // instead of current_round (which includes non-code rounds like collect_figures).
    if (data.is_running && data.progress_percentage !== undefined) {
        const displayRound = rounds.length || data.current_round || 0;
        updateProgressBar(data.progress_percentage, data.status_message, displayRound, data.max_rounds);
    }

    if (!data.is_running && isRunning) {
        const displayRound = rounds.length || data.current_round || data.max_rounds;
        updateProgressBar(100, 'Analysis complete', displayRound, data.max_rounds);
        setTimeout(hideProgressBar, 3000);

        setRunningState(false);