Lesson 2 · Neo4j Modeling and Graph Retrieval
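Lesson goals
Write the extracted entities and relationships into Neo4j, implement graph retrieval with Cypher, and understand what a graph database adds in a RAG setting.
One thing to remember: in Neo4j, relationships are first-class citizens. Like nodes, they can carry properties and can be queried directly.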
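As a sketch of what "first-class relationships" means, the toy TypeScript model below gives each relationship its own type, direction and property map, and lets you filter relationships on those properties directly. The names `PropertyGraph`, `GraphRel` and the `confidence` property are hypothetical, chosen for illustration; this is not the neo4j-driver API and not how Neo4j stores data. In Cypher the same filter would read `MATCH (a)-[r:BASED_ON]->(b) WHERE r.confidence > 0.5 RETURN a.name, b.name`.

```typescript
// Hypothetical in-memory model of a property graph, for illustration only
// (not the neo4j-driver API and not Neo4j's storage format).
type Props = Record<string, string | number | boolean>

interface GraphNode {
  id: string
  labels: string[]
  props: Props
}

// A relationship is a record in its own right: it has a type, a direction
// (from -> to) and its own properties, just like a node does.
interface GraphRel {
  id: string
  type: string
  from: string
  to: string
  props: Props
}

class PropertyGraph {
  private nodes = new Map<string, GraphNode>()
  private rels: GraphRel[] = []

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node)
  }

  addRel(rel: GraphRel): void {
    // A relationship always connects two existing nodes.
    if (!this.nodes.has(rel.from) || !this.nodes.has(rel.to)) {
      throw new Error(`Both endpoints of ${rel.id} must exist`)
    }
    this.rels.push(rel)
  }

  // Query relationships directly: filter by type and by a predicate on the
  // relationship's own properties, then return the endpoint names.
  findRels(type: string, where: (props: Props) => boolean): Array<[string, string]> {
    return this.rels
      .filter((r) => r.type === type && where(r.props))
      .map((r): [string, string] => [
        String(this.nodes.get(r.from)?.props.name),
        String(this.nodes.get(r.to)?.props.name),
      ])
  }
}

const g = new PropertyGraph()
g.addNode({ id: 'n1', labels: ['Entity'], props: { name: 'GPT-4', type: 'Product' } })
g.addNode({ id: 'n2', labels: ['Entity'], props: { name: 'Transformer', type: 'Technology' } })
// `confidence` is a made-up property stored on the relationship itself
g.addRel({ id: 'r1', type: 'BASED_ON', from: 'n1', to: 'n2', props: { confidence: 0.9 } })

const strong = g.findRels('BASED_ON', (p) => Number(p.confidence) > 0.5)
// strong is [['GPT-4', 'Transformer']]
```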
Starting Neo4j
Module 1 already prepared the Docker Compose configuration; just enable the Neo4j profile:
```bash
pnpm dev:infra:graph
# equivalent to: docker compose -f infra/docker-compose.dev.yml --profile graph up -d
```
Open http://localhost:7474 to reach Neo4j Browser and log in with neo4j/password.
Installing the driver
```bash
pnpm add neo4j-driver
```
Connection wrapper
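```typescript
// packages/shared/src/lib/neo4j.ts
import neo4j, { type Driver } from 'neo4j-driver'

// One Driver per process: it owns the connection pool and is meant to be shared.
let driver: Driver | null = null

export function getNeo4j(): Driver {
  if (!driver) {
    driver = neo4j.driver(
      process.env.NEO4J_URL ?? 'bolt://localhost:7687',
      neo4j.auth.basic(
        process.env.NEO4J_USER ?? 'neo4j',
        process.env.NEO4J_PASSWORD ?? 'password',
      ),
    )
  }
  return driver
}

// Run one query in a short-lived session and return each record as a plain object.
// Note: nodes come back as Node objects (fields under `.properties`), and integers
// come back as neo4j Integer objects unless the driver uses disableLosslessIntegers.
export async function runQuery<T = unknown>(
  cypher: string,
  params: Record<string, unknown> = {},
): Promise<T[]> {
  const session = getNeo4j().session()
  try {
    const result = await session.run(cypher, params)
    return result.records.map((r) => r.toObject() as T)
  } finally {
    // Always release the session, even if the query throws.
    await session.close()
  }
}
```
Graph modeling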
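```typescript
// packages/rag-core/src/graph/neo4j-store.ts
// runQuery comes from packages/shared/src/lib/neo4j.ts; Entity and Relationship are
// the extraction types defined elsewhere in this project (import paths omitted).

// Create constraints and indexes (safe to re-run thanks to IF NOT EXISTS).
// A uniqueness constraint is backed by an index, so it also speeds up MERGE/MATCH by id.
export async function ensureConstraints() {
  await runQuery('CREATE CONSTRAINT entity_id IF NOT EXISTS FOR (e:Entity) REQUIRE e.id IS UNIQUE')
  await runQuery('CREATE CONSTRAINT chunk_id IF NOT EXISTS FOR (c:Chunk) REQUIRE c.id IS UNIQUE')
  // graph-search.ts looks entities up by name, so index that property too
  await runQuery('CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name)')
}

// Write an entity. MERGE makes this idempotent: re-running extraction updates
// the existing node instead of creating a duplicate.
export async function upsertEntity(entity: Entity) {
  await runQuery(
    `
    MERGE (e:Entity { id: $id })
    SET e.name = $name,
        e.type = $type,
        e.description = $description
    WITH e
    // Link the entity to every chunk it was extracted from
    UNWIND $sourceChunkIds AS chunkId
    MERGE (c:Chunk { id: chunkId })
    MERGE (e)-[:MENTIONED_IN]->(c)
    `,
    {
      id: entity.id,
      name: entity.name,
      type: entity.type,
      description: entity.description ?? '',
      sourceChunkIds: entity.sourceChunkIds,
    },
  )
}

// In Cypher a relationship type can't be an ordinary $parameter, so the type is
// interpolated into the query text. Validate it first to rule out Cypher injection.
const REL_TYPE_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/

// Write a relationship. If either endpoint entity is missing, MATCH finds
// nothing and no relationship is written.
export async function upsertRelationship(rel: Relationship) {
  if (!REL_TYPE_PATTERN.test(rel.type)) {
    throw new Error(`Invalid relationship type: ${rel.type}`)
  }
  await runQuery(
    `
    MATCH (from:Entity { id: $fromId })
    MATCH (to:Entity { id: $toId })
    MERGE (from)-[r:${rel.type} { id: $relId }]->(to)
    SET r.description = $description
    `,
    {
      fromId: rel.fromEntityId,
      toId: rel.toEntityId,
      relId: rel.id,
      description: rel.description ?? '',
    },
  )
}
```
Cypher graph retrieval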
Cypher is Neo4j's query language; its syntax is designed to look like the shape of the graph you want to match:
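```typescript
// packages/rag-core/src/graph/graph-search.ts
import { generateObject } from 'ai'
import { z } from 'zod'
// runQuery, getModel and the Entity type come from this project's own packages (import paths omitted).

type Related = { entity: Entity; relType: string; outgoing: boolean; related: Entity }

/**
 * Find every entity directly connected to the named entity (1 hop, either direction).
 */
export async function findRelatedEntities(entityName: string): Promise<Related[]> {
  // One undirected pattern covers both directions, so LIMIT caps the whole result
  // (with UNION, LIMIT would only apply to the second branch).
  const results = await runQuery<{
    entity: { properties: Entity }
    relType: string
    outgoing: boolean
    related: { properties: Entity }
  }>(
    `
    MATCH (e:Entity { name: $name })-[r]-(related:Entity)
    RETURN e AS entity, type(r) AS relType, startNode(r) = e AS outgoing, related
    LIMIT 20
    `,
    { name: entityName },
  )
  return results.map((r) => ({
    entity: r.entity.properties,
    relType: r.relType,
    outgoing: r.outgoing,
    related: r.related.properties,
  }))
}

/**
 * Extract entity names from the question, then run graph retrieval for each one.
 */
export async function graphSearch(query: string): Promise<Related[]> {
  // Use the LLM to pull entity names out of the question
  const { object } = await generateObject({
    model: getModel(),
    schema: z.object({ entities: z.array(z.string()) }),
    prompt: `Extract the key entity names (people, organizations, products, technologies, etc.) from the following question:\n${query}`,
  })
  // Exact-name lookup: the names returned by the LLM must match the stored `name` property
  const perEntity = await Promise.all(object.entities.map((name) => findRelatedEntities(name)))
  return perEntity.flat()
}
```
Path query examples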
Multi-hop path queries, which vector retrieval cannot express:
```cypher
// Shortest path between two entities (at most 3 hops, any relationship type or direction)
MATCH path = shortestPath((a:Entity { name: "OpenAI" })-[*..3]-(b:Entity { name: "GPT-4" }))
RETURN path;

// People behind products built on a given technology (two-hop reasoning over relationships)
MATCH (t:Entity { name: "Transformer" })<-[:BASED_ON]-(product:Entity { type: "Product" })
MATCH (product)-[:AUTHORED_BY]->(person:Entity { type: "Person" })
RETURN person.name, product.name;
```
Outputs of this lesson
```text
packages/shared/src/lib/
  neo4j.ts          # Neo4j connection wrapper
packages/rag-core/src/graph/
  neo4j-store.ts    # entity/relationship writes
  graph-search.ts   # graph retrieval queries
```
Interview follow-up questions
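How does storing a graph in Neo4j differ from storing it in a relational database such as PostgreSQL?
A relational database can store a graph in an edge (join) table, but every extra hop adds another self-join, and the intermediate result set can grow quickly as the number of hops increases. Neo4j's native storage is optimized for traversal, a design often called index-free adjacency: each node record links directly to its relationships, so following a hop means chasing stored pointers rather than performing a join or index lookup. The cost of a traversal therefore grows with the relationships actually visited, not with the total size of the data. For the 1-3 hop queries typical of RAG, this usually makes Neo4j faster and simpler to query than chains of SQL JOINs, though the actual gap depends on the data and indexes and should be measured rather than assumed.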
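A rough way to see the difference is the toy comparison below: one k-hop expansion follows per-node adjacency lists (the idea behind index-free adjacency), the other rescans a flat edge table on every hop, the way a self-join without a usable index would. It is plain TypeScript written for illustration, not how Neo4j or PostgreSQL is implemented; the `work` counter simply counts edges examined.

```typescript
// Toy comparison (not real database internals): expand k hops from a start node.
type Edge = { from: string; to: string }

// Adjacency style: each node keeps a direct list of its outgoing neighbours.
function buildAdjacency(edges: Edge[]): Map<string, string[]> {
  const adj = new Map<string, string[]>()
  for (const { from, to } of edges) {
    if (!adj.has(from)) adj.set(from, [])
    adj.get(from)!.push(to)
  }
  return adj
}

function expandWithAdjacency(adj: Map<string, string[]>, start: string, hops: number) {
  let frontier = [start]
  const seen = new Set([start])
  let work = 0
  for (let h = 0; h < hops; h++) {
    const next: string[] = []
    for (const node of frontier) {
      for (const nb of adj.get(node) ?? []) {
        work++ // only edges attached to the current frontier are touched
        if (!seen.has(nb)) {
          seen.add(nb)
          next.push(nb)
        }
      }
    }
    frontier = next
  }
  seen.delete(start)
  return { reached: [...seen].sort(), work }
}

// Join style without an index: every hop scans the entire edge table.
function expandWithTableScan(edges: Edge[], start: string, hops: number) {
  let frontier = new Set([start])
  const seen = new Set([start])
  let work = 0
  for (let h = 0; h < hops; h++) {
    const next = new Set<string>()
    for (const e of edges) {
      work++ // every row is examined on every hop
      if (frontier.has(e.from) && !seen.has(e.to)) {
        seen.add(e.to)
        next.add(e.to)
      }
    }
    frontier = next
  }
  seen.delete(start)
  return { reached: [...seen].sort(), work }
}

// A chain A -> B -> C -> D plus 1,000 unrelated edges elsewhere in the graph.
const edges: Edge[] = [
  { from: 'A', to: 'B' },
  { from: 'B', to: 'C' },
  { from: 'C', to: 'D' },
  ...Array.from({ length: 1000 }, (_, i) => ({ from: `x${i}`, to: `y${i}` })),
]

const viaAdj = expandWithAdjacency(buildAdjacency(edges), 'A', 3)
const viaScan = expandWithTableScan(edges, 'A', 3)
console.log(viaAdj.reached, viaAdj.work) // [ 'B', 'C', 'D' ] 3
console.log(viaScan.reached, viaScan.work) // [ 'B', 'C', 'D' ] 3009
```

Both reach the same nodes, but the adjacency version's work tracks the edges around the frontier while the scan's work tracks table size times hops. A real SQL database would put an index on the `from` column, which replaces each full scan with a logarithmic lookup per frontier node, so the real-world gap is much smaller than this caricature suggests.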
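What is distinctive about Cypher's syntax?
Cypher's core design idea is "draw the graph you want". `(a)-[:KNOWS]->(b)` looks like an arrow drawn from a to b: parentheses are nodes, square brackets are relationships, and the arrow gives the direction. This visual, ASCII-art syntax makes graph queries more intuitive to read and write than the equivalent SQL.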
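The mapping from "drawing" to syntax is mechanical enough to write down. The helper below, `patternText`, is hypothetical (not part of neo4j-driver or any Neo4j library); it renders a node-relationship-node triple as Cypher pattern text so the role of each symbol is explicit.

```typescript
// Hypothetical helper (not from any Neo4j library) that renders a
// node-relationship-node triple as Cypher pattern text.
type NodePattern = { variable: string; label?: string }
type Direction = 'out' | 'in' | 'either'

// Parentheses draw a node; ":Label" is optional.
function nodeText({ variable, label }: NodePattern): string {
  return label ? `(${variable}:${label})` : `(${variable})`
}

// Square brackets draw the relationship; the arrow (or its absence) draws the direction.
function patternText(a: NodePattern, relType: string, b: NodePattern, dir: Direction = 'out'): string {
  const rel = `[:${relType}]`
  if (dir === 'out') return `${nodeText(a)}-${rel}->${nodeText(b)}`
  if (dir === 'in') return `${nodeText(a)}<-${rel}-${nodeText(b)}`
  return `${nodeText(a)}-${rel}-${nodeText(b)}`
}

console.log(patternText({ variable: 'a' }, 'KNOWS', { variable: 'b' }))
// (a)-[:KNOWS]->(b)
console.log(patternText({ variable: 't', label: 'Entity' }, 'BASED_ON', { variable: 'p', label: 'Entity' }, 'in'))
// (t:Entity)<-[:BASED_ON]-(p:Entity)
```

The `'either'` form corresponds to an undirected pattern such as `(a)-[:KNOWS]-(b)`, which matches the relationship in both directions.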