
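Lesson 2 · Neo4j Modeling and Graph Retrieval

Lesson goals

Write the extracted entities and relationships into Neo4j, implement graph retrieval with Cypher, and understand what a graph database adds to a RAG pipeline.

One line to remember: in Neo4j, relationships are first-class citizens. Like nodes, they can carry properties and can be queried directly.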
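The difference is easiest to see in a data structure. In a relational join table a relationship is just a pair of foreign keys. In a property graph a relationship is its own record, with an id, a type, two endpoints and a property map. The sketch below is a hypothetical in-memory model built only to illustrate that idea. `GraphNode`, `GraphRel` and `relsWhere` are illustrative names and are not part of Neo4j or `neo4j-driver`.

```typescript
// Hypothetical in-memory property-graph model, for illustration only.
// Both nodes and relationships carry their own property maps.
type Props = Record<string, string | number | boolean>

interface GraphNode {
  id: string
  labels: string[]
  props: Props
}

interface GraphRel {
  id: string
  type: string // e.g. 'DEVELOPED'
  from: string // start node id
  to: string // end node id
  props: Props // relationships have properties too
}

// Query relationships directly by one of their own properties
function relsWhere(rels: GraphRel[], key: string, value: Props[string]): GraphRel[] {
  return rels.filter((r) => r.props[key] === value)
}

const nodes: GraphNode[] = [
  { id: 'openai', labels: ['Entity'], props: { name: 'OpenAI', type: 'Organization' } },
  { id: 'gpt4', labels: ['Entity'], props: { name: 'GPT-4', type: 'Product' } },
]

const rels: GraphRel[] = [
  { id: 'r1', type: 'DEVELOPED', from: 'openai', to: 'gpt4', props: { year: 2023 } },
]

const released2023 = relsWhere(rels, 'year', 2023)
const nameOf = (id: string) => nodes.find((n) => n.id === id)?.props.name
console.log(released2023.map((r) => `${nameOf(r.from)} -[:${r.type}]-> ${nameOf(r.to)}`))
// prints [ 'OpenAI -[:DEVELOPED]-> GPT-4' ]
```

In Cypher the equivalent filter is a pattern with a property on the relationship itself, such as `-[r:DEVELOPED { year: 2023 }]->`.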

Start Neo4j

Module 1 already ships a Docker Compose configuration. Enable its Neo4j profile:

```bash
pnpm dev:infra:graph
# Equivalent to: docker compose -f infra/docker-compose.dev.yml --profile graph up -d
```

Open http://localhost:7474 to reach Neo4j Browser, then log in with username neo4j and password password.

Install the driver

```bash
pnpm add neo4j-driver
```

Connection wrapper

```typescript
// packages/shared/src/lib/neo4j.ts
import neo4j, { type Driver } from 'neo4j-driver'

// One driver per process: it owns the connection pool and is expensive to create
let driver: Driver | null = null

export function getNeo4j(): Driver {
  if (!driver) {
    driver = neo4j.driver(
      process.env.NEO4J_URL ?? 'bolt://localhost:7687',
      neo4j.auth.basic(
        process.env.NEO4J_USER ?? 'neo4j',
        process.env.NEO4J_PASSWORD ?? 'password',
      ),
    )
  }
  return driver
}

// Sessions are cheap: open one per query and always close it, even if the query throws
export async function runQuery<T = unknown>(
  cypher: string,
  params: Record<string, unknown> = {},
): Promise<T[]> {
  const session = getNeo4j().session()
  try {
    const result = await session.run(cypher, params)
    // toObject() turns each record into { returnedKey: value }
    return result.records.map((r) => r.toObject() as T)
  } finally {
    await session.close()
  }
}
```
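`runQuery` opens a fresh session for every call and closes it in `finally`, so a failing query cannot leak a session. The driver-free sketch below isolates that pattern so it can run without a database. `SessionLike`, `FakeSession` and `runWith` are hypothetical stand-ins and are not part of `neo4j-driver`.

```typescript
// Minimal session shape; the real neo4j-driver Session has a much larger API
interface SessionLike {
  run(cypher: string): Promise<string[]>
  close(): Promise<void>
}

// Hypothetical fake that records whether close() was called
class FakeSession implements SessionLike {
  closed = false
  private readonly fail: boolean

  constructor(fail: boolean) {
    this.fail = fail
  }

  async run(cypher: string): Promise<string[]> {
    if (this.fail) throw new Error(`query failed: ${cypher}`)
    return ['row']
  }

  async close(): Promise<void> {
    this.closed = true
  }
}

// Same shape as runQuery: run, then always close in finally
async function runWith(session: SessionLike, cypher: string): Promise<string[]> {
  try {
    return await session.run(cypher)
  } finally {
    await session.close()
  }
}
```

The `await` inside `try` matters. It makes a rejected query surface as an error from `runWith` only after `finally` has closed the session.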

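Graph modeling

```typescript
// packages/rag-core/src/graph/neo4j-store.ts
// The import paths below are assumptions; adjust them to your workspace's package names
import { runQuery } from '@shared/lib/neo4j'
import type { Entity, Relationship } from './types' // types from the extraction step (path assumed)

// Create constraints (they enforce uniqueness, and their backing index speeds up MERGE/MATCH by id)
export async function ensureConstraints() {
  await runQuery('CREATE CONSTRAINT entity_id IF NOT EXISTS FOR (e:Entity) REQUIRE e.id IS UNIQUE')
  // Chunk nodes are MERGEd by id as well
  await runQuery('CREATE CONSTRAINT chunk_id IF NOT EXISTS FOR (c:Chunk) REQUIRE c.id IS UNIQUE')
  // Graph retrieval looks entities up by name
  await runQuery('CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name)')
}

// Write an entity: MERGE finds the node by id (or creates it) and SET refreshes its properties;
// then link it to every chunk it was extracted from via MENTIONED_IN
export async function upsertEntity(entity: Entity) {
  await runQuery(
    `
    MERGE (e:Entity { id: $id })
    SET e.name = $name,
        e.type = $type,
        e.description = $description
    WITH e
    UNWIND $sourceChunkIds AS chunkId
    MERGE (c:Chunk { id: chunkId })
    MERGE (e)-[:MENTIONED_IN]->(c)
    `,
    {
      id: entity.id,
      name: entity.name,
      type: entity.type,
      description: entity.description ?? '',
      sourceChunkIds: entity.sourceChunkIds,
    },
  )
}

// The relationship type is spliced into the query text, so only plain identifiers are allowed
const REL_TYPE_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/

// Write a relationship
export async function upsertRelationship(rel: Relationship) {
  // Note: a Cypher pattern cannot take the relationship type as a parameter, so it is
  // interpolated through the template string; validate it first to rule out Cypher injection
  if (!REL_TYPE_PATTERN.test(rel.type)) {
    throw new Error(`Invalid relationship type: ${rel.type}`)
  }
  await runQuery(
    `
    MATCH (src:Entity { id: $fromId })
    MATCH (dst:Entity { id: $toId })
    MERGE (src)-[r:${rel.type} { id: $relId }]->(dst)
    SET r.description = $description
    `,
    {
      fromId: rel.fromEntityId,
      toId: rel.toEntityId,
      relId: rel.id,
      description: rel.description ?? '',
    },
  )
}
```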
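`upsertRelationship` splices `rel.type` into the query text. Relation names extracted by an LLM often come back as free text, such as "works for", "foundedBy", or labels in another language. One option is to normalize them to UPPER_SNAKE_CASE before writing. The sketch below is minimal: `toRelType` and the `RELATED_TO` fallback are assumptions of this example. Non-ASCII labels (Chinese, for example) collapse to the fallback here, so a hand-maintained mapping table may suit a real schema better.

```typescript
// Hypothetical fallback used when a label has no usable ASCII letters/digits
const FALLBACK_REL_TYPE = 'RELATED_TO'

// Map a free-form relation label to an UPPER_SNAKE_CASE Cypher relationship type
function toRelType(raw: string): string {
  const snake = raw
    .trim()
    // Split camelCase boundaries: "foundedBy" -> "founded_By"
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
    // Collapse every run of non-alphanumeric characters into one underscore
    .replace(/[^A-Za-z0-9]+/g, '_')
    // Drop leading/trailing underscores
    .replace(/^_+|_+$/g, '')
    .toUpperCase()
  // Unquoted Cypher identifiers must not start with a digit
  if (snake === '' || /^[0-9]/.test(snake)) return FALLBACK_REL_TYPE
  return snake
}
```

Usage: `await upsertRelationship({ ...rel, type: toRelType(rel.type) })`.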

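Graph retrieval with Cypher

Cypher is Neo4j's query language. Its syntax reads like drawing the shape of the graph you are looking for:

```typescript
// packages/rag-core/src/graph/graph-search.ts
// The import paths below (including where getModel lives) are assumptions; adjust to your workspace
import { generateObject } from 'ai'
import { z } from 'zod'
import { runQuery } from '@shared/lib/neo4j'
import { getModel } from '../llm/model'
import type { Entity } from './types'

/**
 * Find every entity directly connected to the named entity (1 hop, either direction)
 */
export async function findRelatedEntities(entityName: string) {
  // The undirected pattern -[r]- covers outgoing and incoming relationships in one query,
  // so LIMIT caps the whole result (after a UNION it would only cap the second branch)
  const results = await runQuery<{
    entity: { properties: Entity }
    relType: string
    related: { properties: Entity }
  }>(
    `
    MATCH (e:Entity { name: $name })-[r]-(related:Entity)
    RETURN e AS entity, type(r) AS relType, related
    LIMIT 20
    `,
    { name: entityName },
  )

  // The driver returns Node objects; keep only their property maps
  return results.map((r) => ({
    entity: r.entity.properties,
    relType: r.relType,
    related: r.related.properties,
  }))
}

/**
 * Extract entity names from the question, then run graph retrieval for each one
 */
export async function graphSearch(query: string) {
  // Ask the LLM to pull entity names out of the question
  const { object } = await generateObject({
    model: getModel(),
    schema: z.object({ entities: z.array(z.string()) }),
    prompt: `Extract the key entity names (people, organizations, products, technologies, etc.) from the following question, exactly as written:\n${query}`,
  })

  const allRelated: Array<{ entity: Entity; relType: string; related: Entity }> = []
  for (const entityName of object.entities) {
    const related = await findRelatedEntities(entityName)
    allRelated.push(...related)
  }

  return allRelated
}
```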
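In a RAG pipeline, the triples returned by `graphSearch` usually end up in the LLM prompt next to the vector-search chunks. The sketch below is a minimal version of that formatting step. It assumes hits shaped like `graphSearch`'s output, and `hitsToContext` and the 30-line cap are illustrative choices. The results carry no direction, so each fact is rendered undirected. It is de-duplicated whichever endpoint it was found from.

```typescript
// Hypothetical shapes mirroring graphSearch's output (only the fields used here)
interface EntityLite {
  name: string
}

interface GraphHit {
  entity: EntityLite
  relType: string
  related: EntityLite
}

// Render hits as "A -[TYPE]- B" lines, skipping facts already seen from the other endpoint
function hitsToContext(hits: GraphHit[], maxLines = 30): string {
  const seen = new Set<string>()
  const lines: string[] = []
  for (const h of hits) {
    // Order-independent key: (A, TYPE, B) and (B, TYPE, A) count as the same fact
    const [x, y] = [h.entity.name, h.related.name].sort()
    const key = `${x}|${h.relType}|${y}`
    if (seen.has(key)) continue
    seen.add(key)
    lines.push(`${h.entity.name} -[${h.relType}]- ${h.related.name}`)
    if (lines.length >= maxLines) break
  }
  return lines.join('\n')
}
```

Usage: `const graphContext = hitsToContext(await graphSearch(question))`.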

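Path query examples

Multi-hop path queries like these are a Neo4j strength that vector similarity search alone cannot express:

```cypher
// Shortest path between two entities (at most 3 hops, in either direction)
MATCH path = shortestPath((a:Entity { name: "OpenAI" })-[*..3]-(b:Entity { name: "GPT-4" }))
RETURN path

// People who authored products built on a given technology (two hops of relational reasoning)
MATCH (t:Entity { name: "Transformer" })<-[:BASED_ON]-(product:Entity { type: "Product" })
MATCH (product)-[:AUTHORED_BY]->(person:Entity { type: "Person" })
RETURN person.name, product.name
```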
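The first query asks for one path with the fewest hops between a and b. It ignores relationship direction and gives up beyond 3 hops. Breadth-first search computes the same thing over an in-memory edge list because it explores nodes in order of hop count. This is an illustration of the query's semantics, not a description of how Neo4j executes it. `shortestPathUpTo` and the sample edges are hypothetical.

```typescript
// An edge as (start, relationship type, end)
type Edge = [from: string, type: string, to: string]

// Fewest-hop path from start to goal, ignoring direction, capped at maxHops; null if none
function shortestPathUpTo(edges: Edge[], start: string, goal: string, maxHops: number): string[] | null {
  // Undirected adjacency list, because the Cypher pattern uses -[]- rather than ->
  const adj = new Map<string, string[]>()
  const link = (a: string, b: string) => {
    const list = adj.get(a) ?? []
    list.push(b)
    adj.set(a, list)
  }
  for (const [from, , to] of edges) {
    link(from, to)
    link(to, from)
  }

  // prev records how each node was first reached; the first visit is along a shortest path
  const prev = new Map<string, string | null>([[start, null]])
  let frontier = [start]
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = []
    for (const node of frontier) {
      for (const nb of adj.get(node) ?? []) {
        if (prev.has(nb)) continue
        prev.set(nb, node)
        if (nb === goal) {
          // Walk predecessors back to the start to rebuild the path
          const path = [goal]
          let cur = prev.get(goal) ?? null
          while (cur !== null) {
            path.unshift(cur)
            cur = prev.get(cur) ?? null
          }
          return path
        }
        next.push(nb)
      }
    }
    frontier = next
  }
  return null
}
```

With edges OpenAI→GPT-4, GPT-4→Transformer and Google→Transformer, a 3-hop limit finds OpenAI, GPT-4, Transformer, Google. A 2-hop limit finds nothing, just as `[*..2]` would.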

What this lesson produces

packages/shared/src/lib/
  neo4j.ts              # Neo4j connection wrapper
packages/rag-core/src/graph/
  neo4j-store.ts        # entity/relationship writes
  graph-search.ts       # graph retrieval queries

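Interview follow-ups

How does storing a graph in Neo4j differ from storing it in a relational database such as PostgreSQL?

A relational database can store a graph in an edge (join) table, but every extra hop is another JOIN. Each JOIN finds neighbours through an index lookup on the whole table, typically O(log n) with a B-tree, and those lookups pile up as hops and fan-out grow. Neo4j's storage is built for traversal (so-called index-free adjacency): each node record points to its own chain of relationship records. Expanding a node therefore costs roughly its number of relationships, independent of total graph size, and a k-hop traversal only touches the part of the graph it actually visits. For the 1-3 hop queries typical in RAG this usually makes Neo4j noticeably faster than the equivalent SQL JOINs. The real gap depends on data size and query shape, so it is worth benchmarking on your own data.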
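Here is a toy model of that argument. It is not a benchmark: it only counts how many edge records each strategy reads during a k-hop expansion. The scan version models a JOIN with no index at all, which is a deliberate worst case since real databases use indexes. The adjacency version models index-free adjacency, where each hop reads only the edges of the current frontier. All names here are hypothetical.

```typescript
interface EdgeRow {
  src: string
  dst: string
}

interface HopResult {
  reached: Set<string> // nodes reachable in 1..k hops
  rowsRead: number // edge records touched
}

// JOIN-style expansion WITHOUT an index: every hop scans the whole edge table
function kHopByScan(edges: EdgeRow[], start: string, k: number): HopResult {
  let frontier = new Set<string>([start])
  const reached = new Set<string>()
  let rowsRead = 0
  for (let hop = 0; hop < k; hop++) {
    const next = new Set<string>()
    for (const e of edges) {
      rowsRead++
      if (frontier.has(e.src)) next.add(e.dst)
    }
    next.forEach((n) => reached.add(n))
    frontier = next
  }
  return { reached, rowsRead }
}

// Adjacency-style expansion: each hop reads only the edge lists of frontier nodes.
// Building the map stands in for the storage layout and is not counted.
function kHopByAdjacency(edges: EdgeRow[], start: string, k: number): HopResult {
  const adj = new Map<string, string[]>()
  for (const e of edges) {
    const list = adj.get(e.src) ?? []
    list.push(e.dst)
    adj.set(e.src, list)
  }
  let frontier = new Set<string>([start])
  const reached = new Set<string>()
  let rowsRead = 0
  for (let hop = 0; hop < k; hop++) {
    const next = new Set<string>()
    frontier.forEach((node) => {
      for (const dst of adj.get(node) ?? []) {
        rowsRead++
        next.add(dst)
      }
    })
    next.forEach((n) => reached.add(n))
    frontier = next
  }
  return { reached, rowsRead }
}
```

Both functions return the same `reached` set. On a graph with many edges unrelated to the start node, the scan count grows with table size times k, while the adjacency count stays proportional to the neighbourhood actually visited.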

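What is distinctive about Cypher's syntax design?

Cypher's core design idea is "draw the pattern you want." `(a)-[:KNOWS]->(b)` looks like an arrow drawn from a to b: parentheses are nodes, square brackets are relationships, and the arrow gives the direction. This visual, ASCII-art style makes graph queries more intuitive to read and write than the equivalent SQL.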

An AI application engineering course for frontend engineers and indie developers