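sakana.ai's AI agent: notes from reading the AI Scientist code

Cover image borrowed from sakana.ai.

I ran AI Scientist, sakana.ai's AI agent, and then read through its code.

My impression up front: the code is very simple, and nothing it does is particularly novel. If anything, it takes basic techniques and applies them head-on, as far as they will go. What matters most is Reflection and Chain-of-Thought.

The walkthrough below is based on the following commit.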

GitHub - SakanaAI/AI-Scientist at 3274a3c242108e4ae1eaf41f75211f97adaafd08

github.com

AI Scientist

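AI Scientist is an AI agent application developed by sakana.ai. It automates the scientific research process, carrying out the whole sequence from idea generation to writing the paper.

Explainer article by shi3z: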

ついにご家庭にやってきたシンギュラリティ。AIサイエンティストが勝手に仮説を立て、実験して、論文を書く｜shi3z

note.com

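Blog:

sakana.ai

Paper:

arxiv.org

AI Scientist works roughly like this:

First it comes up with ideas
It goes off on its own to search for papers and assess novelty
It actually runs the experiments
It writes the results up as a paper

Now let's look at the code.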

Reflection, Reflection, Reflection!

Reflection is everywhere. By Reflection I mean the Reflection pattern in agentic workflows that Andrew Ng has described.

Agentic Design Patterns Part 2: Reflection

www.deeplearning.ai

Reflection example: generating ideas

For example, take a look at generate_ideas.py.

AI-Scientist/ai_scientist/generate_ideas.py at 3274a3c242108e4ae1eaf41f75211f97adaafd08 · …

github.com

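After generating an idea, this script puts it through several rounds of reflection.

Specifically:

Generate the initial idea (round 1)
Reflect on it and improve it, up to round 5
Stop early once nothing is left to improve

The instructions for how to reflect are defined in idea_reflection_prompt. Here is what it contains: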

idea_reflection_prompt = """Round {current_round}/{num_reflections}.
In your thoughts, first carefully consider the quality, novelty, and feasibility of the idea you just created.
Include any other factors that you think are important in evaluating the idea.
Ensure the idea is clear and concise, and the JSON is the correct format.
Do not make things overly complicated.
In the next attempt, try and refine and improve your idea.
Stick to the spirit of the original idea unless there are glaring issues.

Respond in the same format as before:
THOUGHT:
<THOUGHT>

NEW IDEA JSON:

```json
<JSON>
\```

If there is nothing to improve, simply repeat the previous JSON EXACTLY after the thought and include "I am done" at the end of the thoughts but before the JSON.
ONLY INCLUDE "I am done" IF YOU ARE MAKING NO MORE CHANGES."""

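The prompt asks the model to:

Carefully consider the quality, novelty, and feasibility of the idea
Take into account any other factors that matter for evaluating the idea
Make sure the idea is clear and concise
Make sure the JSON format is correct
Avoid making things overly complicated
Refine and improve the idea in the next attempt
Keep to the spirit of the original idea unless there are glaring issues
Repeat the previous JSON and write "I am done" when nothing is left to improve

The actual reflection loop is implemented in generate_ideas: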

# Iteratively improve task.
if num_reflections > 1:
    for j in range(num_reflections - 1):
        print(f"Iteration {j + 2}/{num_reflections}")
        text, msg_history = get_response_from_llm(
            idea_reflection_prompt.format(
                current_round=j + 2, num_reflections=num_reflections
            ),
            client=client,
            model=model,
            system_message=idea_system_prompt,
            msg_history=msg_history,
        )
        ## PARSE OUTPUT
        json_output = extract_json_between_markers(text)
        assert (
            json_output is not None
        ), "Failed to extract JSON from LLM output"
        print(json_output)

        if "I am done" in text:
            print(f"Idea generation converged after {j + 2} iterations.")
            break

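This loop runs at most num_reflections - 1 reflection rounds (normally 4), because the initial generation counts as round 1. Each iteration:

Sends the reflection prompt to the AI
Extracts the JSON from the AI's response
Stops early if the response contains "I am done"

This is how the AI refines its initial idea, and how the loop ends early once nothing is left to improve.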
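To see the control flow without calling a real model, here is a minimal sketch of the same loop. The names reflect, fake_llm, and REFLECTION_PROMPT are mine, not the repository's, and fake_llm is a hypothetical stand-in that simply declares itself done on its second reflection.

```python
from typing import Callable, List, Tuple

# Shortened stand-in for idea_reflection_prompt (assumption, not the real text).
REFLECTION_PROMPT = (
    "Round {current_round}/{num_reflections}. Reflect on the idea and improve it."
)
DONE_MARKER = "I am done"


def reflect(
    first_answer: str,
    ask_llm: Callable[[str, List[str]], str],
    num_reflections: int = 5,
) -> Tuple[str, int]:
    """Refine an answer; round 1 is the initial answer, so at most
    num_reflections - 1 reflection calls are made."""
    history = [first_answer]
    answer = first_answer
    rounds_used = 1
    for j in range(num_reflections - 1):
        prompt = REFLECTION_PROMPT.format(
            current_round=j + 2, num_reflections=num_reflections
        )
        answer = ask_llm(prompt, history)
        history.append(answer)
        rounds_used = j + 2
        if DONE_MARKER in answer:
            break  # the model reports that nothing is left to improve
    return answer, rounds_used


def fake_llm(prompt: str, history: List[str]) -> str:
    """Hypothetical model: improves once, then converges."""
    if len(history) >= 2:
        return "THOUGHT: looks good. I am done\nIDEA: v2"
    return "THOUGHT: tighten the scope.\nIDEA: v2"


final_answer, rounds = reflect("IDEA: v1", fake_llm)
print(rounds)
```

The marker check is a plain substring test, which is why the prompt insists that the phrase appear only when the model is making no more changes.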

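Reflection example: actually writing the paper in TeX

The other file worth reading is perform_writeup.py. It is the very last step, run after everything else has finished, and it writes the paper in TeX.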

AI-Scientist/ai_scientist/perform_writeup.py at 3274a3c242108e4ae1eaf41f75211f97adaafd08 · …

github.com

The perform_writeup function in particular runs the writing process in these steps:

Write the paper's title and Abstract
Write the following sections in order:
- Introduction
- Background
- Method
- Experimental Setup
- Results
- Conclusion
Sketch out the Related Work section
Revise the paper over several rounds to add citations
Refine each section once more
Finally, generate the LaTeX and build the PDF file
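The ordering above can be sketched with a stand-in coder that only records what it is asked to do. RecordingCoder, write_paper, and the one-line prompts are hypothetical names and placeholders of mine; the real prompts are far longer and the real coder edits files.

```python
from typing import List

SECTIONS = [
    "Introduction",
    "Background",
    "Method",
    "Experimental Setup",
    "Results",
    "Conclusion",
]


class RecordingCoder:
    """Hypothetical stand-in for the real coder: it only records prompts."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def run(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return "ok"


def write_paper(coder: RecordingCoder, num_cite_rounds: int = 2) -> None:
    """Issue the writing steps in the same order as the list above."""
    coder.run("Write the Title and Abstract")
    for section in SECTIONS:
        coder.run(f"Write the {section}")
    coder.run("Sketch the Related Work")
    for round_index in range(num_cite_rounds):
        coder.run(f"Add a citation (round {round_index + 1}/{num_cite_rounds})")
    # Second pass: every section is revisited, now with the full draft available.
    for section in ["Abstract", "Related Work"] + SECTIONS:
        coder.run(f"Refine the {section}")
    # The real pipeline would compile the LaTeX into a PDF at this point.


coder = RecordingCoder()
write_paper(coder)
print(len(coder.prompts))
```

Seen this way, the second pass is just Reflection again, applied section by section.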

In more detail:

for section in [
    "Introduction",
    "Background",
    "Method",
    "Experimental Setup",
    "Results",
    "Conclusion",
]:
    section_prompt = f"""Please fill in the {section} of the writeup. Some tips are provided below:
{per_section_tips[section]}

Be sure to use \cite or \citet where relevant, referring to the works provided in the file.
Do not cite anything that is not already in `references.bib`. Do not add any new entries to this.

Keep the experimental results (figures and tables) only in the Results section, and make sure that any captions are filled in.
In this pass, do not reference anything in later sections of the paper.

Before every paragraph, please include a brief description of what you plan to write in that paragraph in a comment.

Be sure to first name the file and use *SEARCH/REPLACE* blocks to perform these edits.
"""
    coder_out = coder.run(section_prompt)

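This code prepares a dedicated prompt for each section and hands it to the AI coder. The interesting part is that the key points for each section are supplied through per_section_tips. For example, the tips for Introduction look like this: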

    "Introduction": """
- Longer version of the Abstract, i.e. of the entire paper
- What are we trying to do and why is it relevant?
- Why is this hard?
- How do we solve it (i.e. our contribution!)
- How do we verify that we solved it (e.g. Experiments and results)
- New trend: specifically list your contributions as bullet points
- Extra space? Future work!
""",

Next comes adding citations. The following code uses the Semantic Scholar API to search for related papers and add citations:

for _ in range(num_cite_rounds):
    with open(osp.join(folder_name, "latex", "template.tex"), "r") as f:
        draft = f.read()
    prompt, done = get_citation_aider_prompt(
        cite_client, cite_model, draft, _, num_cite_rounds
    )
    if done:
        break
    if prompt is not None:
        # extract bibtex string
        bibtex_string = prompt.split('"""')[1]
        # insert this into draft before the "\end{filecontents}" line
        search_str = r"\end{filecontents}"
        draft = draft.replace(search_str, f"{bibtex_string}{search_str}")
        with open(osp.join(folder_name, "latex", "template.tex"), "w") as f:
            f.write(draft)
        coder_out = coder.run(prompt)

This code repeats the citation step up to the specified number of times (num_cite_rounds). In each round it calls get_citation_aider_prompt to look for a new citation and adds it to the paper.
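Here is a minimal sketch of the BibTeX splice in isolation, assuming the prompt carries the entry between triple double quotes as in the loop above. The helper names extract_bibtex and insert_bibtex and the sample strings are mine, not the repository's. Unlike the original, this version replaces only the first marker and raises when something is missing.

```python
END_MARKER = "\\end{filecontents}"


def extract_bibtex(prompt: str) -> str:
    """Return the text between the first pair of triple double quotes."""
    parts = prompt.split('"""')
    if len(parts) < 3:
        raise ValueError("prompt does not contain a triple-quoted block")
    return parts[1]


def insert_bibtex(draft: str, bibtex: str) -> str:
    """Insert the entry just before the end of the filecontents block."""
    if END_MARKER not in draft:
        raise ValueError("draft has no filecontents block")
    return draft.replace(END_MARKER, bibtex + END_MARKER, 1)


# Hand-written sample inputs (assumptions for illustration only).
draft = (
    "\\begin{filecontents}{references.bib}\n"
    "@article{a, title={A}}\n"
    "\\end{filecontents}\n"
    "\\begin{document}\n"
)
prompt = 'Add this entry:\n"""\n@article{b, title={B}}\n"""\nThen cite it.'

new_draft = insert_bibtex(draft, extract_bibtex(prompt))
print(new_draft)
```

Writing the entry into the embedded references.bib before calling the coder means the coder only has to add the \cite, which is consistent with the earlier instruction not to add new entries itself.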

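I think something quite impressive is going on here. In this process the AI understands the content of the paper and proposes appropriate citations. The trick is to hand information retrieval over to the LLM, and I really like it. Andrew Ng has said something along the lines of "give an LLM the kind of work you would give a college intern," and this is exactly that.

Finally, it reviews the whole paper and improves each section: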

for section in [
    "Abstract",
    "Related Work",
    "Introduction",
    "Background",
    "Method",
    "Experimental Setup",
    "Results",
    "Conclusion",
]:
    coder_out = coder.run(
        second_refinement_prompt.format(
            section=section, tips=per_section_tips[section]
        )
        .replace(r"{{", "{")
        .replace(r"}}", "}")
    )

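Implementation of the citation handling

Let's look more closely at get_citation_aider_prompt, the important function here.

This is what a textbook RAG implementation looks like.

The main responsibilities of this function are:

Analyze the current paper draft and identify where a new citation is needed
Generate a suitable search query
Search for related papers with the Semantic Scholar API
Select the best papers from the search results
Produce the citation information for the selected papers in BibTeX format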
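The search step in the list above could look roughly like the sketch below. It is not the repository's search_for_papers: the function names, the field list, and the output format are my assumptions, built on the public Semantic Scholar Graph API paper search endpoint. search_papers needs network access and is defined but not called here.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query: str, limit: int = 10) -> str:
    """Build the request URL for a keyword search."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,authors,venue,year,abstract",
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def search_papers(query: str, limit: int = 10) -> List[Dict]:
    """Call the API; unauthenticated requests are rate limited."""
    with urllib.request.urlopen(build_search_url(query, limit), timeout=30) as response:
        payload = json.load(response)
    return payload.get("data", [])


def format_papers(papers: List[Dict]) -> str:
    """Render results as an indexed list so the model can answer with indices."""
    lines = []
    for index, paper in enumerate(papers):
        authors = ", ".join(author["name"] for author in paper.get("authors", []))
        lines.append(
            f"{index}: {paper.get('title')}. {authors}. "
            f"{paper.get('venue')}, {paper.get('year')}."
        )
    return "\n".join(lines)


# Hand-written sample in the shape of an API result, so no request is made.
sample = [
    {
        "title": "Attention Is All You Need",
        "authors": [{"name": "Ashish Vaswani"}, {"name": "Noam Shazeer"}],
        "venue": "NeurIPS",
        "year": 2017,
    }
]
url = build_search_url("attention is all you need", limit=5)
listing = format_papers(sample)
print(url)
print(listing)
```

Numbering the results matters because the follow-up prompt asks the model to answer with indices rather than titles.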

Part of the function is shown below:

def get_citation_aider_prompt(
    client, model, draft, current_round, total_rounds
) -> Tuple[Optional[str], bool]:
    msg_history = []
    try:
        text, msg_history = get_response_from_llm(
            citation_first_prompt.format(
                draft=draft, current_round=current_round, total_rounds=total_rounds
            ),
            client=client,
            model=model,
            system_message=citation_system_msg.format(total_rounds=total_rounds),
            msg_history=msg_history,
        )
        if "No more citations needed" in text:
            print("No more citations needed.")
            return None, True

        ## PARSE OUTPUT
        json_output = extract_json_between_markers(text)
        assert json_output is not None, "Failed to extract JSON from LLM output"
        query = json_output["Query"]
        papers = search_for_papers(query)
    except Exception as e:
        print(f"Error: {e}")
        return None, False

    # ... (the rest processes the search results and selects the best papers)

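In this function, the LLM analyzes the paper draft and identifies where a new citation is needed. At the same time, it generates a suitable search query. The following prompt is used for this: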


citation_first_prompt = '''Round {current_round}/{total_rounds}:

You have written this LaTeX draft so far:

"""
{draft}
"""

Identify the most important citation that you still need to add, and the query to find the paper.

Respond in the following format:

THOUGHT:
<THOUGHT>

RESPONSE:
```json
<JSON>
\```

In <THOUGHT>, first briefly reason over the paper and identify where citations should be added.
If there are more citations needed, add "No more citations needed" to your thoughts.
Do not add "No more citations needed" if you are adding citations this round.

In <JSON>, respond in JSON format with the following fields:
- "Description": A precise description of the required edit, along with the proposed text and location where it should be made.
- "Query": The search query to find the paper (e.g. attention is all you need).

Ensure the description is sufficient to make the change without further context. Someone else will make the change.
The query will work best if you are able to recall the exact name of the paper you are looking for, or the authors.
This JSON will be automatically parsed, so ensure the format is precise.'''

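Main contents of this prompt:

Round information: the current round and the total number of rounds
Current LaTeX draft: the paper draft written so far
Task: identify the most important citation still missing, and write a query to find that paper
Response format:
- THOUGHT: the reasoning behind the paper analysis and the choice of where to add a citation
- RESPONSE: the concrete edit and the query, as JSON
Instructions for the THOUGHT section:
- Briefly analyze the paper and identify where citations should be added
- Write "No more citations needed" when no further citations are required
- Do not write "No more citations needed" when adding a citation in this round
Instructions for the RESPONSE section:
- "Description": a precise description of the required edit, with the proposed text and its location
- "Query": the search query for finding the paper
Additional notes:
- The description must be detailed enough for someone else to make the edit
- The query works best with the exact paper title or the authors' names
- The JSON must be precisely formatted because it is parsed automatically

Having the model supply the Query itself is the distinctive part.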
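To make the THOUGHT / RESPONSE contract concrete, here is a minimal sketch of parsing such a reply. It is my own approximation, not the repository's extract_json_between_markers; the function name extract_json and the sample reply are assumptions.

```python
import json
import re
from typing import Optional

# Three backticks, built at runtime so the sample needs no literal fence lines.
FENCE = "`" * 3
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[dict]:
    """Return the first parseable JSON value found in a json fence, else None."""
    for candidate in JSON_BLOCK.findall(text):
        try:
            return json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue  # try the next fenced block, if any
    return None


# Hand-written reply in the format the prompt asks for.
reply = (
    "THOUGHT:\nThe Transformer is mentioned without a citation.\n\n"
    "RESPONSE:\n" + FENCE + "json\n"
    '{"Description": "Cite the Transformer paper in the Background section.", '
    '"Query": "attention is all you need"}\n' + FENCE
)

thought = reply.split("RESPONSE:")[0].replace("THOUGHT:", "", 1).strip()
parsed = extract_json(reply)
print(thought)
print(parsed["Query"])
```

Returning None instead of raising lets the caller decide what to do with a malformed reply; the original asserts and lets the surrounding try/except skip the round.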

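Once the search query has been generated, the search_for_papers function searches for papers through the Semantic Scholar API. The search results are then shown to the AI, which picks the best papers: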

citation_second_prompt = """Search has recovered the following articles:

{papers}

Respond in the following format:

THOUGHT:
<THOUGHT>

RESPONSE:
```json
<JSON>
\```

In <THOUGHT>, first briefly reason over the search results and identify which citation best fits your paper and the location is to be added at.
If none are appropriate, add "Do not add any" to your thoughts.

In <JSON>, respond in JSON format with the following fields:
- "Selected": A list of the indices of the selected papers to be cited, e.g. "[0, 1]". Can be "[]" if no papers are selected. This must be a string.
- "Description": Update the previous description of the required edit if needed. Ensure that any cites precisely match the name in the bibtex!!!

Do not select papers that are already in the `references.bib` file at the top of the draft, or if the same citation exists under a different name.
This JSON will be automatically parsed, so ensure the format is precise."""

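Main contents of this prompt:

Presentation of the search results
- The list of papers returned by the search
Response format
- THOUGHT: the reasoning
- RESPONSE: the answer, as JSON
Instructions for the THOUGHT section
- Analyze the search results
- Identify the best citation and decide where to add it
- Write "Do not add any" when no result is appropriate
Instructions for the RESPONSE section
- Answer in JSON format
- Required fields
  - "Selected": a list of the indices of the papers to cite (as a string)
  - "Description": an updated description of the edit, if needed
Notes on selecting citations
- Exclude papers already in the references.bib file
- Exclude papers whose citation already exists under a different name
Strictness of the JSON format
- The format must be precise because it is parsed automatically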
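Because "Selected" arrives as a string such as "[0, 1]", it has to be decoded and checked before it can index into the search results. The sketch below shows one defensive way to do that; parse_selected and pick_papers are hypothetical helpers of mine, and the repository handles this step in its own way.

```python
import json
from typing import Dict, List


def parse_selected(selected: str, num_papers: int) -> List[int]:
    """Turn a string such as "[0, 1]" into validated, de-duplicated indices."""
    indices = json.loads(selected)
    if not isinstance(indices, list):
        raise ValueError("Selected must encode a list")
    valid: List[int] = []
    for index in indices:
        is_int = isinstance(index, int) and not isinstance(index, bool)
        if is_int and 0 <= index < num_papers and index not in valid:
            valid.append(index)  # silently drop out-of-range or repeated entries
    return valid


def pick_papers(response: Dict[str, str], papers: List[Dict]) -> List[Dict]:
    """Return the papers the model selected, in the order it listed them."""
    return [papers[i] for i in parse_selected(response["Selected"], len(papers))]


# Hand-written stand-ins for search results and a model response.
papers = [{"title": "Paper A"}, {"title": "Paper B"}, {"title": "Paper C"}]
response = {"Selected": "[2, 0, 2, 7]", "Description": "Cite A and C in Related Work."}

chosen = pick_papers(response, papers)
print([paper["title"] for paper in chosen])
```

Validating the indices matters because the model can return a number that is not in the list it was shown.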

This THOUGHT section is interesting. What the code actually needs is the RESPONSE section, but the model is made to write out its reasoning first. That is Chain of Thought in action.

arxiv.org

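Closing: the small details

Environment setup

The README describes a conda-based setup, but I don't want to use conda, so I use rye instead.

Running rye add on everything in requirements.txt fails with errors, so the version constraints have to be resolved carefully, one by one. This is a good illustration of why relying on a bare requirements.txt is a bad idea.

For now, rye sync works with the following versions.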

dependencies = [
    "openai>=1.40.6",
    "anthropic[bedrock]>=0.34.0",
    "aider-chat>=0.50.1",
    "backoff>=2.2.1",
    "matplotlib>=3.9.2",
    "pypdf>=4.3.1",
    "pymupdf4llm>=0.0.10",
    "torch>=2.4.0",
    "numpy>=1.26.4",
    "transformers>=4.44.0",
    "datasets>=2.21.0",
    "tiktoken>=0.7.0",
    "wandb>=0.17.7",
    "tqdm>=4.66.5",
]

For the LLM, I use OpenAI for now.

export OPENAI_API_KEY="YOUR KEY HERE"

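The README lists various other setup steps, but they can be skipped for now. This is enough to run as far as idea generation. The experiment and TeX-writing steps after that will not run with this setup.

Note that getting it to actually write a paper takes a fair amount of work: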

SakanaAIのAIサイエンティストで独自の実験テンプレートを作ってみる(実験中)｜shi3z

note.com

Running it

Follow the README:

python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2

Checking things while debugging

import sys

from launch_scientist import main

if __name__ == "__main__":
    # Set the command-line arguments
    sys.argv = [
        "launch_scientist.py",
        "--model",
        "gpt-4o-2024-05-13",
        "--experiment",
        "nanoGPT_lite",
        "--num-ideas",
        "2",
    ]

    # Run the main function
    main()