```
├── .gitattributes
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── config.yml
│   │   ├── request_new_features.yaml
│   │   └── show_me_the_bug.yaml
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── build-package.yaml
│       ├── environment-corrupt-check.yaml
│       ├── pr-autodiff.yaml
│       ├── pre-commit.yaml
│       ├── stale.yaml
│       └── top-issues.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── .vscode/
│   ├── extensions.json
│   └── settings.json
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── LICENSE
├── README.md
├── README_ja.md
├── README_ko.md
├── README_zh.md
├── app/
│   ├── __init__.py
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── browser.py
│   │   ├── data_analysis.py
│   │   ├── manus.py
│   │   ├── mcp.py
│   │   ├── react.py
│   │   ├── swe.py
│   │   └── toolcall.py
│   ├── bedrock.py
│   ├── config.py
│   ├── exceptions.py
│   ├── flow/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── flow_factory.py
│   │   └── planning.py
│   ├── llm.py
│   ├── logger.py
│   ├── mcp/
│   │   ├── __init__.py
│   │   └── server.py
│   ├── prompt/
│   │   ├── __init__.py
│   │   ├── browser.py
│   │   ├── manus.py
│   │   ├── mcp.py
│   │   ├── planning.py
│   │   ├── swe.py
│   │   ├── toolcall.py
│   │   └── visualization.py
│   ├── sandbox/
│   │   ├── __init__.py
│   │   ├── client.py
│   │   └── core/
│   │       ├── exceptions.py
│   │       ├── manager.py
│   │       ├── sandbox.py
│   │       └── terminal.py
│   ├── schema.py
│   ├── tool/
│   │   ├── __init__.py
│   │   ├── ask_human.py
│   │   ├── base.py
│   │   ├── bash.py
```
## /.gitattributes
```gitattributes path="/.gitattributes"
# HTML code is incorrectly counted in the language statistics, so exclude it
*.html linguist-detectable=false
# Auto detect text files and perform LF normalization
* text=auto eol=lf
# Ensure shell scripts use LF (Linux style) line endings on Windows
*.sh text eol=lf
# Treat specific binary files as binary and prevent line ending conversion
*.png binary
*.jpg binary
*.gif binary
*.ico binary
*.jpeg binary
*.mp3 binary
*.zip binary
*.bin binary
# Preserve original line endings for specific document files
*.doc text eol=crlf
*.docx text eol=crlf
*.pdf binary
# Ensure source code and script files use LF line endings
*.py text eol=lf
*.js text eol=lf
*.html text eol=lf
*.css text eol=lf
# Specify custom diff driver for specific file types
*.md diff=markdown
*.json diff=json
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text
```
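Note that the `filter=lfs` rules above for `*.mp4`, `*.mov`, and `*.webm` only take effect when Git LFS is installed on the machine; a minimal setup sketch:
```bash
git lfs install   # register the LFS filters in your Git config (once per machine)
git lfs pull      # fetch any LFS-tracked media after cloning
```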
## /.github/ISSUE_TEMPLATE/config.yml
```yml path="/.github/ISSUE_TEMPLATE/config.yml"
blank_issues_enabled: false
contact_links:
- name: "Join the Community Group"
about: Join the OpenManus community to discuss and get help from others
url: https://github.com/mannaandpoem/OpenManus?tab=readme-ov-file#community-group
```
## /.github/ISSUE_TEMPLATE/request_new_features.yaml
```yaml path="/.github/ISSUE_TEMPLATE/request_new_features.yaml"
name: "🤔 Request new features"
description: Suggest ideas or features you’d like to see implemented in OpenManus.
labels: enhancement
body:
- type: textarea
id: feature-description
attributes:
label: Feature description
description: |
Provide a clear and concise description of the proposed feature
validations:
required: true
- type: textarea
id: your-feature
attributes:
label: Your Feature
description: |
Explain your idea or implementation process, if any. Optionally, include a Pull Request URL.
Ensure accompanying docs/tests/examples are provided for review.
validations:
required: false
```
## /.github/ISSUE_TEMPLATE/show_me_the_bug.yaml
```yaml path="/.github/ISSUE_TEMPLATE/show_me_the_bug.yaml"
name: "🪲 Show me the Bug"
description: Report a bug encountered while using OpenManus and seek assistance.
labels: bug
body:
- type: textarea
id: bug-description
attributes:
label: Bug Description
description: |
Clearly describe the bug you encountered
validations:
required: true
- type: textarea
id: solve-method
attributes:
label: Bug solved method
description: |
If resolved, explain the solution. Optionally, include a Pull Request URL.
If unresolved, provide additional details to aid investigation
validations:
required: true
- type: textarea
id: environment-information
attributes:
label: Environment information
description: |
System: e.g., Ubuntu 22.04
Python: e.g., 3.12
OpenManus version: e.g., 0.1.0
value: |
- System version:
- Python version:
- OpenManus version or branch:
- Installation method (e.g., `pip install -r requirements.txt` or `pip install -e .`):
validations:
required: true
- type: textarea
id: extra-information
attributes:
label: Extra information
description: |
For example, attach screenshots or logs to help diagnose the issue
validations:
required: false
```
## /.github/PULL_REQUEST_TEMPLATE.md
**Features**
- Feature 1
- Feature 2
**Feature Docs**
**Influence**
**Result**
**Other**
## /.github/dependabot.yml
```yml path="/.github/dependabot.yml"
version: 2
updates:
- package-ecosystem: "pip"
directory: "/"
schedule:
interval: "weekly"
open-pull-requests-limit: 4
groups:
# Group critical packages that might need careful review
core-dependencies:
patterns:
- "pydantic*"
- "openai"
- "fastapi"
- "tiktoken"
browsergym-related:
patterns:
- "browsergym*"
- "browser-use"
- "playwright"
search-tools:
patterns:
- "googlesearch-python"
- "baidusearch"
- "duckduckgo_search"
pre-commit:
patterns:
- "pre-commit"
security-all:
applies-to: "security-updates"
patterns:
- "*"
version-all:
applies-to: "version-updates"
patterns:
- "*"
exclude-patterns:
- "pydantic*"
- "openai"
- "fastapi"
- "tiktoken"
- "browsergym*"
- "browser-use"
- "playwright"
- "googlesearch-python"
- "baidusearch"
- "duckduckgo_search"
- "pre-commit"
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
open-pull-requests-limit: 4
groups:
actions:
patterns:
- "*"
```
## /.github/workflows/build-package.yaml
```yaml path="/.github/workflows/build-package.yaml"
name: Build and upload Python package
on:
workflow_dispatch:
release:
types: [created, published]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
cache: 'pip'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install setuptools wheel twine
- name: Set package version
run: |
export VERSION="${GITHUB_REF#refs/tags/v}"
sed -i "s/version=.*/version=\"${VERSION}\",/" setup.py
- name: Build and publish
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: |
python setup.py bdist_wheel sdist
twine upload dist/*
```
## /.github/workflows/environment-corrupt-check.yaml
```yaml path="/.github/workflows/environment-corrupt-check.yaml"
name: Environment Corruption Check
on:
push:
branches: ["main"]
paths:
- requirements.txt
pull_request:
branches: ["main"]
paths:
- requirements.txt
concurrency:
group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }}
cancel-in-progress: true
jobs:
test-python-versions:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11.11", "3.12.8", "3.13.2"]
fail-fast: false
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Upgrade pip
run: |
python -m pip install --upgrade pip
- name: Install dependencies
run: |
pip install -r requirements.txt
```
## /.github/workflows/pr-autodiff.yaml
```yaml path="/.github/workflows/pr-autodiff.yaml"
name: PR Diff Summarization
on:
# pull_request:
# branches: [main]
# types: [opened, ready_for_review, reopened]
issue_comment:
types: [created]
permissions:
contents: read
pull-requests: write
jobs:
pr-diff-summarization:
runs-on: ubuntu-latest
if: |
(github.event_name == 'pull_request') ||
(github.event_name == 'issue_comment' &&
contains(github.event.comment.body, '!pr-diff') &&
(github.event.comment.author_association == 'CONTRIBUTOR' || github.event.comment.author_association == 'COLLABORATOR' || github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') &&
github.event.issue.pull_request)
steps:
- name: Get PR head SHA
id: get-pr-sha
run: |
PR_URL="${{ github.event.issue.pull_request.url || github.event.pull_request.url }}"
# https://api.github.com/repos/OpenManus/pulls/1
RESPONSE=$(curl -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" $PR_URL)
SHA=$(echo $RESPONSE | jq -r '.head.sha')
TARGET_BRANCH=$(echo $RESPONSE | jq -r '.base.ref')
echo "pr_sha=$SHA" >> $GITHUB_OUTPUT
echo "target_branch=$TARGET_BRANCH" >> $GITHUB_OUTPUT
echo "Retrieved PR head SHA from API: $SHA, target branch: $TARGET_BRANCH"
- name: Check out code
uses: actions/checkout@v4
with:
ref: ${{ steps.get-pr-sha.outputs.pr_sha }}
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install openai requests
- name: Create and run Python script
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
GH_TOKEN: ${{ github.token }}
PR_NUMBER: ${{ github.event.pull_request.number || github.event.issue.number }}
TARGET_BRANCH: ${{ steps.get-pr-sha.outputs.target_branch }}
run: |-
cat << 'EOF' > /tmp/_workflow_core.py
import os
import subprocess
import json
import requests
from openai import OpenAI
def get_diff():
result = subprocess.run(
['git', 'diff', 'origin/' + os.getenv('TARGET_BRANCH') + '...HEAD'],
capture_output=True, text=True, check=True)
return '\n'.join(
line for line in result.stdout.split('\n')
if any(line.startswith(c) for c in ('+', '-'))
and not line.startswith(('---', '+++'))
)[:round(200000 * 0.4)] # Truncate to prevent overflow
def generate_comment(diff_content):
client = OpenAI(
base_url=os.getenv("OPENAI_BASE_URL"),
api_key=os.getenv("OPENAI_API_KEY")
)
guidelines = '''
1. English version first, Chinese Simplified version after
2. Example format:
# Diff Report
## English
- Added `ABC` class
- Fixed `f()` behavior in `foo` module
### Comments Highlight
- `config.toml` needs to be configured properly to make sure new features work as expected.
### Spelling/Offensive Content Check
- No spelling mistakes or offensive content found in the code or comments.
## 中文(简体)
- 新增了 `ABC` 类
- `foo` 模块中的 `f()` 行为已修复
### 评论高亮
- `config.toml` 需要正确配置才能确保新功能正常运行。
### 内容检查
- 没有发现代码或注释中的拼写错误或不当措辞。
3. Highlight non-English comments
4. Check for spelling/offensive content'''
response = client.chat.completions.create(
model="o3-mini",
messages=[{
"role": "system",
"content": "Generate bilingual code review feedback."
}, {
"role": "user",
"content": f"Review these changes per guidelines:\n{guidelines}\n\nDIFF:\n{diff_content}"
}]
)
return response.choices[0].message.content
def post_comment(comment):
repo = os.getenv("GITHUB_REPOSITORY")
pr_number = os.getenv("PR_NUMBER")
headers = {
"Authorization": f"Bearer {os.getenv('GH_TOKEN')}",
"Accept": "application/vnd.github.v3+json"
}
url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
requests.post(url, json={"body": comment}, headers=headers)
if __name__ == "__main__":
diff_content = get_diff()
if not diff_content.strip():
print("No meaningful diff detected.")
exit(0)
comment = generate_comment(diff_content)
post_comment(comment)
print("Comment posted successfully.")
EOF
python /tmp/_workflow_core.py
```
## /.github/workflows/pre-commit.yaml
```yaml path="/.github/workflows/pre-commit.yaml"
name: Pre-commit checks
on:
pull_request:
branches:
- '**'
push:
branches:
- '**'
jobs:
pre-commit-check:
runs-on: ubuntu-latest
steps:
- name: Checkout Source Code
uses: actions/checkout@v4
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install pre-commit and tools
run: |
python -m pip install --upgrade pip
pip install pre-commit black==23.1.0 isort==5.12.0 autoflake==2.0.1
- name: Run pre-commit hooks
run: pre-commit run --all-files
```
## /.github/workflows/stale.yaml
```yaml path="/.github/workflows/stale.yaml"
name: Close inactive issues
on:
schedule:
- cron: "5 0 * * *"
jobs:
close-issues:
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v9
with:
days-before-issue-stale: 30
days-before-issue-close: 14
stale-issue-label: "inactive"
stale-issue-message: "This issue has been inactive for 30 days. Please comment if you have updates."
close-issue-message: "This issue was closed due to 45 days of inactivity. Reopen if still relevant."
days-before-pr-stale: -1
days-before-pr-close: -1
repo-token: ${{ secrets.GITHUB_TOKEN }}
```
## /.github/workflows/top-issues.yaml
```yaml path="/.github/workflows/top-issues.yaml"
name: Top issues
on:
schedule:
- cron: '0 0/2 * * *'
workflow_dispatch:
jobs:
ShowAndLabelTopIssues:
permissions:
issues: write
pull-requests: write
actions: read
contents: read
name: Display and label top issues
runs-on: ubuntu-latest
if: github.repository == 'mannaandpoem/OpenManus'
steps:
- name: Run top issues action
uses: rickstaa/top-issues-action@7e8dda5d5ae3087670f9094b9724a9a091fc3ba1 # v1.3.101
env:
github_token: ${{ secrets.GITHUB_TOKEN }}
with:
label: true
dashboard: true
dashboard_show_total_reactions: true
top_issues: true
top_features: true
top_bugs: true
top_pull_requests: true
top_list_size: 14
```
## /.gitignore
```gitignore path="/.gitignore"
### Project-specific ###
# Logs
logs/
# Data
data/
# Workspace
workspace/
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
# PyPI configuration file
.pypirc
### Visual Studio Code ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
# OSX
.DS_Store
# node
node_modules
```
## /.pre-commit-config.yaml
```yaml path="/.pre-commit-config.yaml"
repos:
- repo: https://github.com/psf/black
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/PyCQA/autoflake
rev: v2.0.1
hooks:
- id: autoflake
args: [
--remove-all-unused-imports,
--ignore-init-module-imports,
--expand-star-imports,
--remove-duplicate-keys,
--remove-unused-variables,
--recursive,
--in-place,
--exclude=__init__.py,
]
files: \.py$
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
args: [
"--profile", "black",
"--filter-files",
"--lines-after-imports=2",
]
```
## /.vscode/extensions.json
```json path="/.vscode/extensions.json"
{
"recommendations": [
"tamasfe.even-better-toml",
"ms-python.black-formatter",
"ms-python.isort"
],
"unwantedRecommendations": []
}
```
## /.vscode/settings.json
```json path="/.vscode/settings.json"
{
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.codeActionsOnSave": {
"source.organizeImports": "always"
}
},
"[toml]": {
"editor.defaultFormatter": "tamasfe.even-better-toml",
},
"pre-commit-helper.runOnSave": "none",
"pre-commit-helper.config": ".pre-commit-config.yaml",
"evenBetterToml.schema.enabled": true,
"evenBetterToml.schema.associations": {
"^.+config[/\\\\].+\\.toml$": "../config/schema.config.json"
},
"files.insertFinalNewline": true,
"files.trimTrailingWhitespace": true,
"editor.formatOnSave": true
}
```
## /CODE_OF_CONDUCT.md
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people.
* Being respectful of differing opinions, viewpoints, and experiences.
* Giving and gracefully accepting constructive feedback.
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience.
* Focusing on what is best not just for us as individuals, but for the overall
community.
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind.
* Trolling, insulting or derogatory comments, and personal or political attacks.
* Public or private harassment.
* Publishing others' private information, such as a physical or email address,
without their explicit permission.
* Other conduct which could reasonably be considered inappropriate in a
professional setting.
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official email address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
mannaandpoem@gmail.com
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
### Slack and Discord Etiquette
These Slack and Discord etiquette guidelines are designed to foster an inclusive, respectful, and productive environment
for all community members. By following these best practices, we ensure effective communication and collaboration while
minimizing disruptions. Let’s work together to build a supportive and welcoming community!
- Communicate respectfully and professionally, avoiding sarcasm or harsh language, and remember that tone can be
difficult to interpret in text.
- Use threads for specific discussions to keep channels organized and easier to follow.
- Tag others only when their input is critical or urgent, and use @here, @channel or @everyone sparingly to minimize
disruptions.
- Be patient, as open-source contributors and maintainers often have other commitments and may need time to respond.
- Post questions or discussions in the most relevant
channel ([discord - #general](https://discord.com/channels/1125308739348594758/1138430348557025341)).
- When asking for help or raising issues, include necessary details like links, screenshots, or clear explanations to
provide context.
- Keep discussions in public channels whenever possible to allow others to benefit from the conversation, unless the
matter is sensitive or private.
- Always adhere to [our standards](https://github.com/mannaandpoem/OpenManus/blob/main/CODE_OF_CONDUCT.md#our-standards)
to ensure a welcoming and collaborative environment.
- If you choose to mute a channel, consider setting up alerts for topics that still interest you to stay engaged. For
Slack, Go to Settings → Notifications → My Keywords to add specific keywords that will notify you when mentioned. For
example, if you're here for discussions about LLMs, mute the channel if it’s too busy, but set notifications to alert
you only when “LLMs” appears in messages. Also for Discord, go to the channel notifications and choose the option that
best describes your need.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
## /Dockerfile
``` path="/Dockerfile"
FROM python:3.12-slim
WORKDIR /app/OpenManus
RUN apt-get update && apt-get install -y --no-install-recommends git curl \
&& rm -rf /var/lib/apt/lists/* \
&& (command -v uv >/dev/null 2>&1 || pip install --no-cache-dir uv)
COPY . .
RUN uv pip install --system -r requirements.txt
CMD ["bash"]
```
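A quick usage sketch for the image above (the `openmanus` tag is just an example name; you will still need a `config/config.toml` inside the container before running the agent):
```bash
# Build the image from the repository root, then start an interactive shell inside it
docker build -t openmanus .
docker run -it --rm openmanus
```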
## /LICENSE
``` path="/LICENSE"
MIT License
Copyright (c) 2025 manna_and_poem
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
## /README.md
English | [中文](README_zh.md) | [한국어](README_ko.md) | [日本語](README_ja.md)
[GitHub stars](https://github.com/mannaandpoem/OpenManus/stargazers)
[License: MIT](https://opensource.org/licenses/MIT)
[Discord](https://discord.gg/DYn29wFk9z)
[Demo on Hugging Face](https://huggingface.co/spaces/lyh-917/OpenManusDemo)
[DOI](https://doi.org/10.5281/zenodo.15186407)
# 👋 OpenManus
Manus is incredible, but OpenManus can bring any idea to life without an *Invite Code* 🛫!
Our team members [@Xinbin Liang](https://github.com/mannaandpoem) and [@Jinyu Xiang](https://github.com/XiangJinyu) (core authors), along with [@Zhaoyang Yu](https://github.com/MoshiQAQ), [@Jiayi Zhang](https://github.com/didiforgithub), and [@Sirui Hong](https://github.com/stellaHSR), are from [@MetaGPT](https://github.com/geekan/MetaGPT). The prototype was launched within 3 hours, and we keep building!
It's a simple implementation, so we welcome any suggestions, contributions, and feedback!
Enjoy your own agent with OpenManus!
We're also excited to introduce [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL), an open-source project dedicated to reinforcement learning (RL)-based tuning methods (such as GRPO) for LLM agents, developed collaboratively by researchers from UIUC and OpenManus.
## Project Demo
## Installation
We provide two installation methods. Method 2 (using uv) is recommended for faster installation and better dependency management.
### Method 1: Using conda
1. Create a new conda environment:
```bash
conda create -n open_manus python=3.12
conda activate open_manus
```
2. Clone the repository:
```bash
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
### Method 2: Using uv (Recommended)
1. Install uv (A fast Python package installer and resolver):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Clone the repository:
```bash
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
```
3. Create a new virtual environment and activate it:
```bash
uv venv --python 3.12
source .venv/bin/activate # On Unix/macOS
# Or on Windows:
# .venv\Scripts\activate
```
4. Install dependencies:
```bash
uv pip install -r requirements.txt
```
### Browser Automation Tool (Optional)
```bash
playwright install
```
## Configuration
OpenManus requires configuration for the LLM APIs it uses. Follow these steps to set up your configuration:
1. Create a `config.toml` file in the `config` directory (you can copy from the example):
```bash
cp config/config.example.toml config/config.toml
```
2. Edit `config/config.toml` to add your API keys and customize settings:
```toml
# Global LLM configuration
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your actual API key
max_tokens = 4096
temperature = 0.0
# Optional configuration for specific LLM models
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your actual API key
```
## Quick Start
One line to run OpenManus:
```bash
python main.py
```
Then input your idea via terminal!
For the MCP tool version, you can run:
```bash
python run_mcp.py
```
For the unstable multi-agent version, you can also run:
```bash
python run_flow.py
```
## How to contribute
We welcome any friendly suggestions and helpful contributions! Just create issues or submit pull requests.
Or contact @mannaandpoem via 📧email: mannaandpoem@gmail.com
**Note**: Before submitting a pull request, please use the pre-commit tool to check your changes. Run `pre-commit run --all-files` to execute the checks.
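For example, a typical local setup looks like this:
```bash
pip install pre-commit
pre-commit install            # run the hooks automatically on every commit
pre-commit run --all-files    # run all hooks once over the whole repository
```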
## Community Group
Join our networking group on Feishu and share your experience with other developers!
## Star History
[Star History Chart](https://star-history.com/#mannaandpoem/OpenManus&Date)
## Sponsors
Thanks to [PPIO](https://ppinfra.com/user/register?invited_by=OCPKCN&utm_source=github_openmanus&utm_medium=github_readme&utm_campaign=link) for computing resource support.
> PPIO: The most affordable and easily-integrated MaaS and GPU cloud solution.
## Acknowledgement
Thanks to [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)
and [browser-use](https://github.com/browser-use/browser-use) for providing basic support for this project!
Additionally, we are grateful to [AAAJ](https://github.com/metauto-ai/agent-as-a-judge), [MetaGPT](https://github.com/geekan/MetaGPT), [OpenHands](https://github.com/All-Hands-AI/OpenHands) and [SWE-agent](https://github.com/SWE-agent/SWE-agent).
We also thank stepfun (阶跃星辰) for supporting our Hugging Face demo space.
OpenManus is built by contributors from MetaGPT. Huge thanks to this agent community!
## Cite
```bibtex
@misc{openmanus2025,
author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong and Sheng Fan and Xiao Tang},
title = {OpenManus: An open-source framework for building general AI agents},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.15186407},
url = {https://doi.org/10.5281/zenodo.15186407},
}
```
## /README_ja.md
## スター履歴
[Star History Chart](https://star-history.com/#mannaandpoem/OpenManus&Date)
## 謝辞
このプロジェクトの基本的なサポートを提供してくれた[anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)
と[browser-use](https://github.com/browser-use/browser-use)に感謝します!
さらに、[AAAJ](https://github.com/metauto-ai/agent-as-a-judge)、[MetaGPT](https://github.com/geekan/MetaGPT)、[OpenHands](https://github.com/All-Hands-AI/OpenHands)、[SWE-agent](https://github.com/SWE-agent/SWE-agent)にも感謝します。
また、Hugging Face デモスペースをサポートしてくださった阶跃星辰 (stepfun)にも感謝いたします。
OpenManusはMetaGPTのコントリビューターによって構築されました。このエージェントコミュニティに大きな感謝を!
## 引用
```bibtex
@misc{openmanus2025,
author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong and Sheng Fan and Xiao Tang},
title = {OpenManus: An open-source framework for building general AI agents},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.15186407},
url = {https://doi.org/10.5281/zenodo.15186407},
}
```
## /README_ko.md
[English](README.md) | [中文](README_zh.md) | 한국어 | [日本語](README_ja.md)
[GitHub stars](https://github.com/mannaandpoem/OpenManus/stargazers)
[License: MIT](https://opensource.org/licenses/MIT)
[Discord](https://discord.gg/DYn29wFk9z)
[Demo on Hugging Face](https://huggingface.co/spaces/lyh-917/OpenManusDemo)
[DOI](https://doi.org/10.5281/zenodo.15186407)
# 👋 OpenManus
Manus는 놀라운 도구지만, OpenManus는 *초대 코드* 없이도 모든 아이디어를 실현할 수 있습니다! 🛫
우리 팀의 멤버인 [@Xinbin Liang](https://github.com/mannaandpoem)와 [@Jinyu Xiang](https://github.com/XiangJinyu) (핵심 작성자), 그리고 [@Zhaoyang Yu](https://github.com/MoshiQAQ), [@Jiayi Zhang](https://github.com/didiforgithub), [@Sirui Hong](https://github.com/stellaHSR)이 함께 했습니다. 우리는 [@MetaGPT](https://github.com/geekan/MetaGPT)로부터 왔습니다. 프로토타입은 단 3시간 만에 출시되었으며, 계속해서 발전하고 있습니다!
이 프로젝트는 간단한 구현에서 시작되었으며, 여러분의 제안, 기여 및 피드백을 환영합니다!
OpenManus를 통해 여러분만의 에이전트를 즐겨보세요!
또한 [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL)을 소개하게 되어 기쁩니다. OpenManus와 UIUC 연구자들이 공동 개발한 이 오픈소스 프로젝트는 LLM 에이전트에 대해 강화 학습(RL) 기반 (예: GRPO) 튜닝 방법을 제공합니다.
## 프로젝트 데모
## 설치 방법
두 가지 설치 방법을 제공합니다. **방법 2 (uv 사용)** 이 더 빠른 설치와 효율적인 종속성 관리를 위해 권장됩니다.
### 방법 1: conda 사용
1. 새로운 conda 환경을 생성합니다:
```bash
conda create -n open_manus python=3.12
conda activate open_manus
```
2. 저장소를 클론합니다:
```bash
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
```
3. 종속성을 설치합니다:
```bash
pip install -r requirements.txt
```
### 방법 2: uv 사용 (권장)
1. uv를 설치합니다. (빠른 Python 패키지 설치 및 종속성 관리 도구):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. 저장소를 클론합니다:
```bash
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
```
3. 새로운 가상 환경을 생성하고 활성화합니다:
```bash
uv venv --python 3.12
source .venv/bin/activate # Unix/macOS의 경우
# Windows의 경우:
# .venv\Scripts\activate
```
4. 종속성을 설치합니다:
```bash
uv pip install -r requirements.txt
```
### 브라우저 자동화 도구 (선택사항)
```bash
playwright install
```
## 설정 방법
OpenManus를 사용하려면 사용하는 LLM API에 대한 설정이 필요합니다. 아래 단계를 따라 설정을 완료하세요:
1. `config` 디렉토리에 `config.toml` 파일을 생성하세요 (예제 파일을 복사하여 사용할 수 있습니다):
```bash
cp config/config.example.toml config/config.toml
```
2. `config/config.toml` 파일을 편집하여 API 키를 추가하고 설정을 커스터마이징하세요:
```toml
# 전역 LLM 설정
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # 실제 API 키로 변경하세요
max_tokens = 4096
temperature = 0.0
# 특정 LLM 모델에 대한 선택적 설정
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # 실제 API 키로 변경하세요
```
## 빠른 시작
OpenManus를 실행하는 한 줄 명령어:
```bash
python main.py
```
이후 터미널에서 아이디어를 작성하세요!
MCP 도구 버전을 사용하려면 다음을 실행하세요:
```bash
python run_mcp.py
```
불안정한 멀티 에이전트 버전을 실행하려면 다음을 실행할 수 있습니다:
```bash
python run_flow.py
```
## 기여 방법
모든 친절한 제안과 유용한 기여를 환영합니다! 이슈를 생성하거나 풀 리퀘스트를 제출해 주세요.
또는 📧 메일로 연락주세요. @mannaandpoem : mannaandpoem@gmail.com
**참고**: pull request를 제출하기 전에 pre-commit 도구를 사용하여 변경 사항을 확인하십시오. `pre-commit run --all-files`를 실행하여 검사를 실행합니다.
## 커뮤니티 그룹
Feishu 네트워킹 그룹에 참여하여 다른 개발자들과 경험을 공유하세요!
## Star History
[Star History Chart](https://star-history.com/#mannaandpoem/OpenManus&Date)
## 감사의 글
이 프로젝트에 기본적인 지원을 제공해 주신 [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)와
[browser-use](https://github.com/browser-use/browser-use)에게 감사드립니다!
또한, [AAAJ](https://github.com/metauto-ai/agent-as-a-judge), [MetaGPT](https://github.com/geekan/MetaGPT), [OpenHands](https://github.com/All-Hands-AI/OpenHands), [SWE-agent](https://github.com/SWE-agent/SWE-agent)에 깊은 감사를 드립니다.
또한 Hugging Face 데모 공간을 지원해 주신 阶跃星辰 (stepfun)에게 감사드립니다.
OpenManus는 MetaGPT 기여자들에 의해 개발되었습니다. 이 에이전트 커뮤니티에 깊은 감사를 전합니다!
## 인용
```bibtex
@misc{openmanus2025,
author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong and Sheng Fan and Xiao Tang},
title = {OpenManus: An open-source framework for building general AI agents},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.15186407},
url = {https://doi.org/10.5281/zenodo.15186407},
}
```
## /README_zh.md
## Star 数量
[Star History Chart](https://star-history.com/#mannaandpoem/OpenManus&Date)
## 赞助商
感谢[PPIO](https://ppinfra.com/user/register?invited_by=OCPKCN&utm_source=github_openmanus&utm_medium=github_readme&utm_campaign=link) 提供的算力支持。
> PPIO派欧云:一键调用高性价比的开源模型API和GPU容器
## 致谢
特别感谢 [anthropic-computer-use](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)
和 [browser-use](https://github.com/browser-use/browser-use) 为本项目提供的基础支持!
此外,我们感谢 [AAAJ](https://github.com/metauto-ai/agent-as-a-judge),[MetaGPT](https://github.com/geekan/MetaGPT),[OpenHands](https://github.com/All-Hands-AI/OpenHands) 和 [SWE-agent](https://github.com/SWE-agent/SWE-agent).
我们也感谢阶跃星辰 (stepfun) 提供的 Hugging Face 演示空间支持。
OpenManus 由 MetaGPT 社区的贡献者共同构建,感谢这个充满活力的智能体开发者社区!
## 引用
```bibtex
@misc{openmanus2025,
author = {Xinbin Liang and Jinyu Xiang and Zhaoyang Yu and Jiayi Zhang and Sirui Hong and Sheng Fan and Xiao Tang},
title = {OpenManus: An open-source framework for building general AI agents},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.15186407},
url = {https://doi.org/10.5281/zenodo.15186407},
}
```
## /app/__init__.py
```py path="/app/__init__.py"
# Python version check: 3.11-3.13
import sys
if sys.version_info < (3, 11) or sys.version_info >= (3, 14):
print(
"Warning: Unsupported Python version {ver}, please use 3.11-3.13".format(
ver=".".join(map(str, sys.version_info))
)
)
```
## /app/agent/__init__.py
```py path="/app/agent/__init__.py"
from app.agent.base import BaseAgent
from app.agent.browser import BrowserAgent
from app.agent.mcp import MCPAgent
from app.agent.react import ReActAgent
from app.agent.swe import SWEAgent
from app.agent.toolcall import ToolCallAgent
__all__ = [
"BaseAgent",
"BrowserAgent",
"ReActAgent",
"SWEAgent",
"ToolCallAgent",
"MCPAgent",
]
```
## /app/agent/base.py
```py path="/app/agent/base.py"
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import List, Optional
from pydantic import BaseModel, Field, model_validator
from app.llm import LLM
from app.logger import logger
from app.sandbox.client import SANDBOX_CLIENT
from app.schema import ROLE_TYPE, AgentState, Memory, Message
class BaseAgent(BaseModel, ABC):
"""Abstract base class for managing agent state and execution.
Provides foundational functionality for state transitions, memory management,
and a step-based execution loop. Subclasses must implement the `step` method.
"""
# Core attributes
name: str = Field(..., description="Unique name of the agent")
description: Optional[str] = Field(None, description="Optional agent description")
# Prompts
system_prompt: Optional[str] = Field(
None, description="System-level instruction prompt"
)
next_step_prompt: Optional[str] = Field(
None, description="Prompt for determining next action"
)
# Dependencies
llm: LLM = Field(default_factory=LLM, description="Language model instance")
memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
state: AgentState = Field(
default=AgentState.IDLE, description="Current agent state"
)
# Execution control
max_steps: int = Field(default=10, description="Maximum steps before termination")
current_step: int = Field(default=0, description="Current step in execution")
duplicate_threshold: int = 2
class Config:
arbitrary_types_allowed = True
extra = "allow" # Allow extra fields for flexibility in subclasses
@model_validator(mode="after")
def initialize_agent(self) -> "BaseAgent":
"""Initialize agent with default settings if not provided."""
if self.llm is None or not isinstance(self.llm, LLM):
self.llm = LLM(config_name=self.name.lower())
if not isinstance(self.memory, Memory):
self.memory = Memory()
return self
@asynccontextmanager
async def state_context(self, new_state: AgentState):
"""Context manager for safe agent state transitions.
Args:
new_state: The state to transition to during the context.
Yields:
None: Allows execution within the new state.
Raises:
ValueError: If the new_state is invalid.
"""
if not isinstance(new_state, AgentState):
raise ValueError(f"Invalid state: {new_state}")
previous_state = self.state
self.state = new_state
try:
yield
except Exception as e:
self.state = AgentState.ERROR # Transition to ERROR on failure
raise e
finally:
self.state = previous_state # Revert to previous state
def update_memory(
self,
role: ROLE_TYPE, # type: ignore
content: str,
base64_image: Optional[str] = None,
**kwargs,
) -> None:
"""Add a message to the agent's memory.
Args:
role: The role of the message sender (user, system, assistant, tool).
content: The message content.
base64_image: Optional base64 encoded image.
**kwargs: Additional arguments (e.g., tool_call_id for tool messages).
Raises:
ValueError: If the role is unsupported.
"""
message_map = {
"user": Message.user_message,
"system": Message.system_message,
"assistant": Message.assistant_message,
"tool": lambda content, **kw: Message.tool_message(content, **kw),
}
if role not in message_map:
raise ValueError(f"Unsupported message role: {role}")
# Create message with appropriate parameters based on role
kwargs = {"base64_image": base64_image, **(kwargs if role == "tool" else {})}
self.memory.add_message(message_map[role](content, **kwargs))
async def run(self, request: Optional[str] = None) -> str:
"""Execute the agent's main loop asynchronously.
Args:
request: Optional initial user request to process.
Returns:
A string summarizing the execution results.
Raises:
RuntimeError: If the agent is not in IDLE state at start.
"""
if self.state != AgentState.IDLE:
raise RuntimeError(f"Cannot run agent from state: {self.state}")
if request:
self.update_memory("user", request)
results: List[str] = []
async with self.state_context(AgentState.RUNNING):
while (
self.current_step < self.max_steps and self.state != AgentState.FINISHED
):
self.current_step += 1
logger.info(f"Executing step {self.current_step}/{self.max_steps}")
step_result = await self.step()
# Check for stuck state
if self.is_stuck():
self.handle_stuck_state()
results.append(f"Step {self.current_step}: {step_result}")
if self.current_step >= self.max_steps:
self.current_step = 0
self.state = AgentState.IDLE
results.append(f"Terminated: Reached max steps ({self.max_steps})")
await SANDBOX_CLIENT.cleanup()
return "\n".join(results) if results else "No steps executed"
@abstractmethod
async def step(self) -> str:
"""Execute a single step in the agent's workflow.
Must be implemented by subclasses to define specific behavior.
"""
def handle_stuck_state(self):
"""Handle stuck state by adding a prompt to change strategy"""
stuck_prompt = "\
Observed duplicate responses. Consider new strategies and avoid repeating ineffective paths already attempted."
self.next_step_prompt = f"{stuck_prompt}\n{self.next_step_prompt}"
logger.warning(f"Agent detected stuck state. Added prompt: {stuck_prompt}")
def is_stuck(self) -> bool:
"""Check if the agent is stuck in a loop by detecting duplicate content"""
if len(self.memory.messages) < 2:
return False
last_message = self.memory.messages[-1]
if not last_message.content:
return False
# Count identical content occurrences
duplicate_count = sum(
1
for msg in reversed(self.memory.messages[:-1])
if msg.role == "assistant" and msg.content == last_message.content
)
return duplicate_count >= self.duplicate_threshold
@property
def messages(self) -> List[Message]:
"""Retrieve a list of messages from the agent's memory."""
return self.memory.messages
@messages.setter
def messages(self, value: List[Message]):
"""Set the list of messages in the agent's memory."""
self.memory.messages = value
```
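As the `BaseAgent` docstring notes, subclasses only need to implement `step()`; state transitions, memory, and the run loop are inherited. A minimal illustrative sketch, assuming the OpenManus package and a valid `config/config.toml` are available (`EchoAgent` is hypothetical and not part of the repository):
```py
import asyncio

from app.agent.base import BaseAgent


class EchoAgent(BaseAgent):
    """Hypothetical agent that just echoes the latest message each step."""

    name: str = "echo"
    max_steps: int = 1  # stop after a single step for this demo

    async def step(self) -> str:
        # A real agent would call self.llm and its tools here; this demo only reads memory.
        last = self.messages[-1].content if self.messages else ""
        return f"echo: {last}"


async def main() -> None:
    agent = EchoAgent()
    # run() adds the request to memory, executes step(), and reports termination
    print(await agent.run("hello"))


asyncio.run(main())
```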
## /app/agent/browser.py
```py path="/app/agent/browser.py"
import json
from typing import TYPE_CHECKING, Optional
from pydantic import Field, model_validator
from app.agent.toolcall import ToolCallAgent
from app.logger import logger
from app.prompt.browser import NEXT_STEP_PROMPT, SYSTEM_PROMPT
from app.schema import Message, ToolChoice
from app.tool import BrowserUseTool, Terminate, ToolCollection
# Avoid circular import if BrowserAgent needs BrowserContextHelper
if TYPE_CHECKING:
from app.agent.base import BaseAgent # Or wherever memory is defined
class BrowserContextHelper:
def __init__(self, agent: "BaseAgent"):
self.agent = agent
self._current_base64_image: Optional[str] = None
async def get_browser_state(self) -> Optional[dict]:
browser_tool = self.agent.available_tools.get_tool(BrowserUseTool().name)
if not browser_tool or not hasattr(browser_tool, "get_current_state"):
logger.warning("BrowserUseTool not found or doesn't have get_current_state")
return None
try:
result = await browser_tool.get_current_state()
if result.error:
logger.debug(f"Browser state error: {result.error}")
return None
if hasattr(result, "base64_image") and result.base64_image:
self._current_base64_image = result.base64_image
else:
self._current_base64_image = None
return json.loads(result.output)
except Exception as e:
logger.debug(f"Failed to get browser state: {str(e)}")
return None
async def format_next_step_prompt(self) -> str:
"""Gets browser state and formats the browser prompt."""
browser_state = await self.get_browser_state()
url_info, tabs_info, content_above_info, content_below_info = "", "", "", ""
results_info = "" # Or get from agent if needed elsewhere
if browser_state and not browser_state.get("error"):
url_info = f"\n URL: {browser_state.get('url', 'N/A')}\n Title: {browser_state.get('title', 'N/A')}"
tabs = browser_state.get("tabs", [])
if tabs:
tabs_info = f"\n {len(tabs)} tab(s) available"
pixels_above = browser_state.get("pixels_above", 0)
pixels_below = browser_state.get("pixels_below", 0)
if pixels_above > 0:
content_above_info = f" ({pixels_above} pixels)"
if pixels_below > 0:
content_below_info = f" ({pixels_below} pixels)"
if self._current_base64_image:
image_message = Message.user_message(
content="Current browser screenshot:",
base64_image=self._current_base64_image,
)
self.agent.memory.add_message(image_message)
self._current_base64_image = None # Consume the image after adding
return NEXT_STEP_PROMPT.format(
url_placeholder=url_info,
tabs_placeholder=tabs_info,
content_above_placeholder=content_above_info,
content_below_placeholder=content_below_info,
results_placeholder=results_info,
)
async def cleanup_browser(self):
browser_tool = self.agent.available_tools.get_tool(BrowserUseTool().name)
if browser_tool and hasattr(browser_tool, "cleanup"):
await browser_tool.cleanup()
class BrowserAgent(ToolCallAgent):
"""
A browser agent that uses the browser_use library to control a browser.
This agent can navigate web pages, interact with elements, fill forms,
extract content, and perform other browser-based actions to accomplish tasks.
"""
name: str = "browser"
description: str = "A browser agent that can control a browser to accomplish tasks"
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
max_observe: int = 10000
max_steps: int = 20
# Configure the available tools
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(BrowserUseTool(), Terminate())
)
# Use Auto for tool choice to allow both tool usage and free-form responses
tool_choices: ToolChoice = ToolChoice.AUTO
special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name])
browser_context_helper: Optional[BrowserContextHelper] = None
@model_validator(mode="after")
def initialize_helper(self) -> "BrowserAgent":
self.browser_context_helper = BrowserContextHelper(self)
return self
async def think(self) -> bool:
"""Process current state and decide next actions using tools, with browser state info added"""
self.next_step_prompt = (
await self.browser_context_helper.format_next_step_prompt()
)
return await super().think()
async def cleanup(self):
"""Clean up browser agent resources by calling parent cleanup."""
await self.browser_context_helper.cleanup_browser()
```
## /app/agent/data_analysis.py
```py path="/app/agent/data_analysis.py"
from pydantic import Field
from app.agent.toolcall import ToolCallAgent
from app.config import config
from app.prompt.visualization import NEXT_STEP_PROMPT, SYSTEM_PROMPT
from app.tool import Terminate, ToolCollection
from app.tool.chart_visualization.chart_prepare import VisualizationPrepare
from app.tool.chart_visualization.data_visualization import DataVisualization
from app.tool.chart_visualization.python_execute import NormalPythonExecute
class DataAnalysis(ToolCallAgent):
"""
A data analysis agent that uses planning to solve various data analysis tasks.
This agent extends ToolCallAgent with a comprehensive set of tools and capabilities,
including Data Analysis, Chart Visualization, Data Report.
"""
name: str = "DataAnalysis"
description: str = "An analytical agent that utilizes multiple tools to solve diverse data analysis tasks"
system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)
next_step_prompt: str = NEXT_STEP_PROMPT
max_observe: int = 15000
max_steps: int = 20
# Add general-purpose tools to the tool collection
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(
NormalPythonExecute(),
VisualizationPrepare(),
DataVisualization(),
Terminate(),
)
)
```
## /app/agent/manus.py
```py path="/app/agent/manus.py"
from typing import Dict, List, Optional
from pydantic import Field, model_validator
from app.agent.browser import BrowserContextHelper
from app.agent.toolcall import ToolCallAgent
from app.config import config
from app.logger import logger
from app.prompt.manus import NEXT_STEP_PROMPT, SYSTEM_PROMPT
from app.tool import Terminate, ToolCollection
from app.tool.ask_human import AskHuman
from app.tool.browser_use_tool import BrowserUseTool
from app.tool.mcp import MCPClients, MCPClientTool
from app.tool.python_execute import PythonExecute
from app.tool.str_replace_editor import StrReplaceEditor
class Manus(ToolCallAgent):
"""A versatile general-purpose agent with support for both local and MCP tools."""
name: str = "Manus"
description: str = "A versatile agent that can solve various tasks using multiple tools including MCP-based tools"
system_prompt: str = SYSTEM_PROMPT.format(directory=config.workspace_root)
next_step_prompt: str = NEXT_STEP_PROMPT
max_observe: int = 10000
max_steps: int = 20
# MCP clients for remote tool access
mcp_clients: MCPClients = Field(default_factory=MCPClients)
# Add general-purpose tools to the tool collection
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(
PythonExecute(),
BrowserUseTool(),
StrReplaceEditor(),
AskHuman(),
Terminate(),
)
)
special_tool_names: list[str] = Field(default_factory=lambda: [Terminate().name])
browser_context_helper: Optional[BrowserContextHelper] = None
# Track connected MCP servers
connected_servers: Dict[str, str] = Field(
default_factory=dict
) # server_id -> url/command
_initialized: bool = False
@model_validator(mode="after")
def initialize_helper(self) -> "Manus":
"""Initialize basic components synchronously."""
self.browser_context_helper = BrowserContextHelper(self)
return self
@classmethod
async def create(cls, **kwargs) -> "Manus":
"""Factory method to create and properly initialize a Manus instance."""
instance = cls(**kwargs)
await instance.initialize_mcp_servers()
instance._initialized = True
return instance
async def initialize_mcp_servers(self) -> None:
"""Initialize connections to configured MCP servers."""
for server_id, server_config in config.mcp_config.servers.items():
try:
if server_config.type == "sse":
if server_config.url:
await self.connect_mcp_server(server_config.url, server_id)
logger.info(
f"Connected to MCP server {server_id} at {server_config.url}"
)
elif server_config.type == "stdio":
if server_config.command:
await self.connect_mcp_server(
server_config.command,
server_id,
use_stdio=True,
stdio_args=server_config.args,
)
logger.info(
f"Connected to MCP server {server_id} using command {server_config.command}"
)
except Exception as e:
logger.error(f"Failed to connect to MCP server {server_id}: {e}")
async def connect_mcp_server(
self,
server_url: str,
server_id: str = "",
use_stdio: bool = False,
stdio_args: List[str] = None,
) -> None:
"""Connect to an MCP server and add its tools."""
if use_stdio:
await self.mcp_clients.connect_stdio(
server_url, stdio_args or [], server_id
)
self.connected_servers[server_id or server_url] = server_url
else:
await self.mcp_clients.connect_sse(server_url, server_id)
self.connected_servers[server_id or server_url] = server_url
# Update available tools with only the new tools from this server
new_tools = [
tool for tool in self.mcp_clients.tools if tool.server_id == server_id
]
self.available_tools.add_tools(*new_tools)
async def disconnect_mcp_server(self, server_id: str = "") -> None:
"""Disconnect from an MCP server and remove its tools."""
await self.mcp_clients.disconnect(server_id)
if server_id:
self.connected_servers.pop(server_id, None)
else:
self.connected_servers.clear()
# Rebuild available tools without the disconnected server's tools
base_tools = [
tool
for tool in self.available_tools.tools
if not isinstance(tool, MCPClientTool)
]
self.available_tools = ToolCollection(*base_tools)
self.available_tools.add_tools(*self.mcp_clients.tools)
async def cleanup(self):
"""Clean up Manus agent resources."""
if self.browser_context_helper:
await self.browser_context_helper.cleanup_browser()
# Disconnect from all MCP servers only if we were initialized
if self._initialized:
await self.disconnect_mcp_server()
self._initialized = False
async def think(self) -> bool:
"""Process current state and decide next actions with appropriate context."""
if not self._initialized:
await self.initialize_mcp_servers()
self._initialized = True
original_prompt = self.next_step_prompt
recent_messages = self.memory.messages[-3:] if self.memory.messages else []
browser_in_use = any(
tc.function.name == BrowserUseTool().name
for msg in recent_messages
if msg.tool_calls
for tc in msg.tool_calls
)
if browser_in_use:
self.next_step_prompt = (
await self.browser_context_helper.format_next_step_prompt()
)
result = await super().think()
# Restore original prompt
self.next_step_prompt = original_prompt
return result
```
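Because MCP connections are asynchronous, `Manus` is intended to be constructed through the `create()` factory rather than instantiated directly. A minimal usage sketch (the prompt string is illustrative, and a valid `config/config.toml` with any MCP server settings is assumed):
```py
import asyncio

from app.agent.manus import Manus


async def main() -> None:
    # create() connects to any configured MCP servers before the agent runs
    agent = await Manus.create()
    try:
        result = await agent.run("Summarize the files in the current workspace")
        print(result)
    finally:
        # Release browser and MCP resources even if run() raises
        await agent.cleanup()


asyncio.run(main())
```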
## /app/agent/mcp.py
```py path="/app/agent/mcp.py"
from typing import Any, Dict, List, Optional, Tuple
from pydantic import Field
from app.agent.toolcall import ToolCallAgent
from app.logger import logger
from app.prompt.mcp import MULTIMEDIA_RESPONSE_PROMPT, NEXT_STEP_PROMPT, SYSTEM_PROMPT
from app.schema import AgentState, Message
from app.tool.base import ToolResult
from app.tool.mcp import MCPClients
class MCPAgent(ToolCallAgent):
"""Agent for interacting with MCP (Model Context Protocol) servers.
This agent connects to an MCP server using either SSE or stdio transport
and makes the server's tools available through the agent's tool interface.
"""
name: str = "mcp_agent"
description: str = "An agent that connects to an MCP server and uses its tools."
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
# Initialize MCP tool collection
mcp_clients: MCPClients = Field(default_factory=MCPClients)
available_tools: MCPClients = None # Will be set in initialize()
max_steps: int = 20
connection_type: str = "stdio" # "stdio" or "sse"
# Track tool schemas to detect changes
tool_schemas: Dict[str, Dict[str, Any]] = Field(default_factory=dict)
_refresh_tools_interval: int = 5 # Refresh tools every N steps
# Special tool names that should trigger termination
special_tool_names: List[str] = Field(default_factory=lambda: ["terminate"])
async def initialize(
self,
connection_type: Optional[str] = None,
server_url: Optional[str] = None,
command: Optional[str] = None,
args: Optional[List[str]] = None,
) -> None:
"""Initialize the MCP connection.
Args:
connection_type: Type of connection to use ("stdio" or "sse")
server_url: URL of the MCP server (for SSE connection)
command: Command to run (for stdio connection)
args: Arguments for the command (for stdio connection)
"""
if connection_type:
self.connection_type = connection_type
# Connect to the MCP server based on connection type
if self.connection_type == "sse":
if not server_url:
raise ValueError("Server URL is required for SSE connection")
await self.mcp_clients.connect_sse(server_url=server_url)
elif self.connection_type == "stdio":
if not command:
raise ValueError("Command is required for stdio connection")
await self.mcp_clients.connect_stdio(command=command, args=args or [])
else:
raise ValueError(f"Unsupported connection type: {self.connection_type}")
# Set available_tools to our MCP instance
self.available_tools = self.mcp_clients
# Store initial tool schemas
await self._refresh_tools()
# Add system message about available tools
tool_names = list(self.mcp_clients.tool_map.keys())
tools_info = ", ".join(tool_names)
# Add system prompt and available tools information
self.memory.add_message(
Message.system_message(
f"{self.system_prompt}\n\nAvailable MCP tools: {tools_info}"
)
)
async def _refresh_tools(self) -> Tuple[List[str], List[str]]:
"""Refresh the list of available tools from the MCP server.
Returns:
A tuple of (added_tools, removed_tools)
"""
if not self.mcp_clients.session:
return [], []
# Get current tool schemas directly from the server
response = await self.mcp_clients.session.list_tools()
current_tools = {tool.name: tool.inputSchema for tool in response.tools}
# Determine added, removed, and changed tools
current_names = set(current_tools.keys())
previous_names = set(self.tool_schemas.keys())
added_tools = list(current_names - previous_names)
removed_tools = list(previous_names - current_names)
# Check for schema changes in existing tools
changed_tools = []
for name in current_names.intersection(previous_names):
if current_tools[name] != self.tool_schemas.get(name):
changed_tools.append(name)
# Update stored schemas
self.tool_schemas = current_tools
# Log and notify about changes
if added_tools:
logger.info(f"Added MCP tools: {added_tools}")
self.memory.add_message(
Message.system_message(f"New tools available: {', '.join(added_tools)}")
)
if removed_tools:
logger.info(f"Removed MCP tools: {removed_tools}")
self.memory.add_message(
Message.system_message(
f"Tools no longer available: {', '.join(removed_tools)}"
)
)
if changed_tools:
logger.info(f"Changed MCP tools: {changed_tools}")
return added_tools, removed_tools
async def think(self) -> bool:
"""Process current state and decide next action."""
# Check MCP session and tools availability
if not self.mcp_clients.session or not self.mcp_clients.tool_map:
logger.info("MCP service is no longer available, ending interaction")
self.state = AgentState.FINISHED
return False
# Refresh tools periodically
if self.current_step % self._refresh_tools_interval == 0:
await self._refresh_tools()
# All tools removed indicates shutdown
if not self.mcp_clients.tool_map:
logger.info("MCP service has shut down, ending interaction")
self.state = AgentState.FINISHED
return False
# Use the parent class's think method
return await super().think()
async def _handle_special_tool(self, name: str, result: Any, **kwargs) -> None:
"""Handle special tool execution and state changes"""
# First process with parent handler
await super()._handle_special_tool(name, result, **kwargs)
# Handle multimedia responses
if isinstance(result, ToolResult) and result.base64_image:
self.memory.add_message(
Message.system_message(
MULTIMEDIA_RESPONSE_PROMPT.format(tool_name=name)
)
)
def _should_finish_execution(self, name: str, **kwargs) -> bool:
"""Determine if tool execution should finish the agent"""
# Terminate if the tool name is 'terminate'
return name.lower() == "terminate"
async def cleanup(self) -> None:
"""Clean up MCP connection when done."""
if self.mcp_clients.session:
await self.mcp_clients.disconnect()
logger.info("MCP connection closed")
async def run(self, request: Optional[str] = None) -> str:
"""Run the agent with cleanup when done."""
try:
result = await super().run(request)
return result
finally:
# Ensure cleanup happens even if there's an error
await self.cleanup()
```
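`MCPAgent` bundles connection setup, periodic tool refresh, and teardown; `run()` guarantees `cleanup()` even on errors. A minimal sketch of both transports, assuming the stdio server is launched via the bundled `app.mcp.server` module (the exact command and the SSE endpoint are placeholders):
```py
import asyncio

from app.agent.mcp import MCPAgent


async def main():
    agent = MCPAgent()
    # stdio transport: spawn the MCP server as a subprocess.
    await agent.initialize(
        connection_type="stdio",
        command="python",
        args=["-m", "app.mcp.server"],
    )
    # SSE alternative (assumed local endpoint):
    # await agent.initialize(connection_type="sse", server_url="http://localhost:8000/sse")
    print(await agent.run("Summarize the tools you can call."))


asyncio.run(main())
```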
## /app/agent/react.py
```py path="/app/agent/react.py"
from abc import ABC, abstractmethod
from typing import Optional
from pydantic import Field
from app.agent.base import BaseAgent
from app.llm import LLM
from app.schema import AgentState, Memory
class ReActAgent(BaseAgent, ABC):
name: str
description: Optional[str] = None
system_prompt: Optional[str] = None
next_step_prompt: Optional[str] = None
llm: Optional[LLM] = Field(default_factory=LLM)
memory: Memory = Field(default_factory=Memory)
state: AgentState = AgentState.IDLE
max_steps: int = 10
current_step: int = 0
@abstractmethod
async def think(self) -> bool:
"""Process current state and decide next action"""
@abstractmethod
async def act(self) -> str:
"""Execute decided actions"""
async def step(self) -> str:
"""Execute a single step: think and act."""
should_act = await self.think()
if not should_act:
return "Thinking complete - no action needed"
return await self.act()
```
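`ReActAgent` only fixes the think/act split; `step()` calls `think()` and, when it returns `True`, `act()`. A contrived subclass sketch showing the contract (no LLM involved, purely illustrative):
```py
from app.agent.react import ReActAgent


class EchoAgent(ReActAgent):
    """Toy agent: acts only when there is something in memory, then echoes it."""

    name: str = "echo"

    async def think(self) -> bool:
        # Decide whether an action is needed on this step.
        return bool(self.memory.messages)

    async def act(self) -> str:
        # Echo the most recent message as the "action" result.
        return f"echo: {self.memory.messages[-1].content}"
```
The inherited `step()` then drives the loop: it returns "Thinking complete - no action needed" when `think()` yields `False`, otherwise whatever `act()` returns.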
## /app/agent/swe.py
```py path="/app/agent/swe.py"
from typing import List
from pydantic import Field
from app.agent.toolcall import ToolCallAgent
from app.prompt.swe import SYSTEM_PROMPT
from app.tool import Bash, StrReplaceEditor, Terminate, ToolCollection
class SWEAgent(ToolCallAgent):
"""An agent that implements the SWEAgent paradigm for executing code and natural conversations."""
name: str = "swe"
description: str = "an autonomous AI programmer that interacts directly with the computer to solve tasks."
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = ""
available_tools: ToolCollection = ToolCollection(
Bash(), StrReplaceEditor(), Terminate()
)
special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])
max_steps: int = 20
```
## /app/agent/toolcall.py
```py path="/app/agent/toolcall.py"
import asyncio
import json
from typing import Any, List, Optional, Union
from pydantic import Field
from app.agent.react import ReActAgent
from app.exceptions import TokenLimitExceeded
from app.logger import logger
from app.prompt.toolcall import NEXT_STEP_PROMPT, SYSTEM_PROMPT
from app.schema import TOOL_CHOICE_TYPE, AgentState, Message, ToolCall, ToolChoice
from app.tool import CreateChatCompletion, Terminate, ToolCollection
TOOL_CALL_REQUIRED = "Tool calls required but none provided"
class ToolCallAgent(ReActAgent):
"""Base agent class for handling tool/function calls with enhanced abstraction"""
name: str = "toolcall"
description: str = "an agent that can execute tool calls."
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = ToolCollection(
CreateChatCompletion(), Terminate()
)
tool_choices: TOOL_CHOICE_TYPE = ToolChoice.AUTO # type: ignore
special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])
tool_calls: List[ToolCall] = Field(default_factory=list)
_current_base64_image: Optional[str] = None
max_steps: int = 30
max_observe: Optional[Union[int, bool]] = None
async def think(self) -> bool:
"""Process current state and decide next actions using tools"""
if self.next_step_prompt:
user_msg = Message.user_message(self.next_step_prompt)
self.messages += [user_msg]
try:
# Get response with tool options
response = await self.llm.ask_tool(
messages=self.messages,
system_msgs=(
[Message.system_message(self.system_prompt)]
if self.system_prompt
else None
),
tools=self.available_tools.to_params(),
tool_choice=self.tool_choices,
)
except ValueError:
raise
except Exception as e:
# Check if this is a RetryError containing TokenLimitExceeded
if hasattr(e, "__cause__") and isinstance(e.__cause__, TokenLimitExceeded):
token_limit_error = e.__cause__
logger.error(
f"🚨 Token limit error (from RetryError): {token_limit_error}"
)
self.memory.add_message(
Message.assistant_message(
f"Maximum token limit reached, cannot continue execution: {str(token_limit_error)}"
)
)
self.state = AgentState.FINISHED
return False
raise
self.tool_calls = tool_calls = (
response.tool_calls if response and response.tool_calls else []
)
content = response.content if response and response.content else ""
# Log response info
logger.info(f"✨ {self.name}'s thoughts: {content}")
logger.info(
f"🛠️ {self.name} selected {len(tool_calls) if tool_calls else 0} tools to use"
)
if tool_calls:
logger.info(
f"🧰 Tools being prepared: {[call.function.name for call in tool_calls]}"
)
logger.info(f"🔧 Tool arguments: {tool_calls[0].function.arguments}")
try:
if response is None:
raise RuntimeError("No response received from the LLM")
# Handle different tool_choices modes
if self.tool_choices == ToolChoice.NONE:
if tool_calls:
logger.warning(
f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
)
if content:
self.memory.add_message(Message.assistant_message(content))
return True
return False
# Create and add assistant message
assistant_msg = (
Message.from_tool_calls(content=content, tool_calls=self.tool_calls)
if self.tool_calls
else Message.assistant_message(content)
)
self.memory.add_message(assistant_msg)
if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
return True # Will be handled in act()
# For 'auto' mode, continue with content if no commands but content exists
if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
return bool(content)
return bool(self.tool_calls)
except Exception as e:
logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
self.memory.add_message(
Message.assistant_message(
f"Error encountered while processing: {str(e)}"
)
)
return False
async def act(self) -> str:
"""Execute tool calls and handle their results"""
if not self.tool_calls:
if self.tool_choices == ToolChoice.REQUIRED:
raise ValueError(TOOL_CALL_REQUIRED)
# Return last message content if no tool calls
return self.messages[-1].content or "No content or commands to execute"
results = []
for command in self.tool_calls:
# Reset base64_image for each tool call
self._current_base64_image = None
result = await self.execute_tool(command)
if self.max_observe:
result = result[: self.max_observe]
logger.info(
f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
)
# Add tool response to memory
tool_msg = Message.tool_message(
content=result,
tool_call_id=command.id,
name=command.function.name,
base64_image=self._current_base64_image,
)
self.memory.add_message(tool_msg)
results.append(result)
return "\n\n".join(results)
async def execute_tool(self, command: ToolCall) -> str:
"""Execute a single tool call with robust error handling"""
if not command or not command.function or not command.function.name:
return "Error: Invalid command format"
name = command.function.name
if name not in self.available_tools.tool_map:
return f"Error: Unknown tool '{name}'"
try:
# Parse arguments
args = json.loads(command.function.arguments or "{}")
# Execute the tool
logger.info(f"🔧 Activating tool: '{name}'...")
result = await self.available_tools.execute(name=name, tool_input=args)
# Handle special tools
await self._handle_special_tool(name=name, result=result)
# Check if result is a ToolResult with base64_image
if hasattr(result, "base64_image") and result.base64_image:
# Store the base64_image for later use in tool_message
self._current_base64_image = result.base64_image
# Format result for display (standard case)
observation = (
f"Observed output of cmd `{name}` executed:\n{str(result)}"
if result
else f"Cmd `{name}` completed with no output"
)
return observation
except json.JSONDecodeError:
error_msg = f"Error parsing arguments for {name}: Invalid JSON format"
logger.error(
f"📝 Oops! The arguments for '{name}' don't make sense - invalid JSON, arguments:{command.function.arguments}"
)
return f"Error: {error_msg}"
except Exception as e:
error_msg = f"⚠️ Tool '{name}' encountered a problem: {str(e)}"
logger.exception(error_msg)
return f"Error: {error_msg}"
async def _handle_special_tool(self, name: str, result: Any, **kwargs):
"""Handle special tool execution and state changes"""
if not self._is_special_tool(name):
return
if self._should_finish_execution(name=name, result=result, **kwargs):
# Set agent state to finished
logger.info(f"🏁 Special tool '{name}' has completed the task!")
self.state = AgentState.FINISHED
@staticmethod
def _should_finish_execution(**kwargs) -> bool:
"""Determine if tool execution should finish the agent"""
return True
def _is_special_tool(self, name: str) -> bool:
"""Check if tool name is in special tools list"""
return name.lower() in [n.lower() for n in self.special_tool_names]
async def cleanup(self):
"""Clean up resources used by the agent's tools."""
logger.info(f"🧹 Cleaning up resources for agent '{self.name}'...")
for tool_name, tool_instance in self.available_tools.tool_map.items():
if hasattr(tool_instance, "cleanup") and asyncio.iscoroutinefunction(
tool_instance.cleanup
):
try:
logger.debug(f"🧼 Cleaning up tool: {tool_name}")
await tool_instance.cleanup()
except Exception as e:
logger.error(
f"🚨 Error cleaning up tool '{tool_name}': {e}", exc_info=True
)
logger.info(f"✨ Cleanup complete for agent '{self.name}'.")
async def run(self, request: Optional[str] = None) -> str:
"""Run the agent with cleanup when done."""
try:
return await super().run(request)
finally:
await self.cleanup()
```
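`ToolCallAgent` is the generic tool-calling loop: `think()` sends the current tool schemas to the LLM, and `act()` executes each returned call and records the observation as a tool message. Most specialization is declarative, as `SWEAgent` above shows; a hedged sketch of the main knobs (this subclass and its values are illustrative, not part of the repository):
```py
from typing import List

from pydantic import Field

from app.agent.toolcall import ToolCallAgent
from app.schema import TOOL_CHOICE_TYPE, ToolChoice
from app.tool import Bash, Terminate, ToolCollection


class ShellOnlyAgent(ToolCallAgent):
    """Hypothetical agent limited to bash plus an explicit terminate tool."""

    name: str = "shell_only"
    description: str = "an agent that solves tasks using only shell commands."

    available_tools: ToolCollection = ToolCollection(Bash(), Terminate())
    special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])

    tool_choices: TOOL_CHOICE_TYPE = ToolChoice.REQUIRED  # type: ignore  # force a tool call each step
    max_observe: int = 2000  # truncate each tool observation to 2000 characters
    max_steps: int = 15
```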
## /app/bedrock.py
```py path="/app/bedrock.py"
import json
import sys
import time
import uuid
from datetime import datetime
from typing import Dict, List, Literal, Optional
import boto3
# Global variable to track the current tool use ID across function calls (temporary solution)
CURRENT_TOOLUSE_ID = None
# Class to handle OpenAI-style response formatting
class OpenAIResponse:
def __init__(self, data):
# Recursively convert nested dicts and lists to OpenAIResponse objects
for key, value in data.items():
if isinstance(value, dict):
value = OpenAIResponse(value)
elif isinstance(value, list):
value = [
OpenAIResponse(item) if isinstance(item, dict) else item
for item in value
]
setattr(self, key, value)
def model_dump(self, *args, **kwargs):
# Convert object to dict and add timestamp
data = self.__dict__
data["created_at"] = datetime.now().isoformat()
return data
# Main client class for interacting with Amazon Bedrock
class BedrockClient:
def __init__(self):
        # Initialize the Bedrock client; AWS credentials must be configured in the environment first
try:
self.client = boto3.client("bedrock-runtime")
self.chat = Chat(self.client)
except Exception as e:
print(f"Error initializing Bedrock client: {e}")
sys.exit(1)
# Chat interface class
class Chat:
def __init__(self, client):
self.completions = ChatCompletions(client)
# Core class handling chat completions functionality
class ChatCompletions:
def __init__(self, client):
self.client = client
def _convert_openai_tools_to_bedrock_format(self, tools):
# Convert OpenAI function calling format to Bedrock tool format
bedrock_tools = []
for tool in tools:
if tool.get("type") == "function":
function = tool.get("function", {})
bedrock_tool = {
"toolSpec": {
"name": function.get("name", ""),
"description": function.get("description", ""),
"inputSchema": {
"json": {
"type": "object",
"properties": function.get("parameters", {}).get(
"properties", {}
),
"required": function.get("parameters", {}).get(
"required", []
),
}
},
}
}
bedrock_tools.append(bedrock_tool)
return bedrock_tools
def _convert_openai_messages_to_bedrock_format(self, messages):
# Convert OpenAI message format to Bedrock message format
bedrock_messages = []
system_prompt = []
for message in messages:
if message.get("role") == "system":
system_prompt = [{"text": message.get("content")}]
elif message.get("role") == "user":
bedrock_message = {
"role": message.get("role", "user"),
"content": [{"text": message.get("content")}],
}
bedrock_messages.append(bedrock_message)
elif message.get("role") == "assistant":
bedrock_message = {
"role": "assistant",
"content": [{"text": message.get("content")}],
}
openai_tool_calls = message.get("tool_calls", [])
if openai_tool_calls:
bedrock_tool_use = {
"toolUseId": openai_tool_calls[0]["id"],
"name": openai_tool_calls[0]["function"]["name"],
"input": json.loads(
openai_tool_calls[0]["function"]["arguments"]
),
}
bedrock_message["content"].append({"toolUse": bedrock_tool_use})
global CURRENT_TOOLUSE_ID
CURRENT_TOOLUSE_ID = openai_tool_calls[0]["id"]
bedrock_messages.append(bedrock_message)
elif message.get("role") == "tool":
bedrock_message = {
"role": "user",
"content": [
{
"toolResult": {
"toolUseId": CURRENT_TOOLUSE_ID,
"content": [{"text": message.get("content")}],
}
}
],
}
bedrock_messages.append(bedrock_message)
else:
raise ValueError(f"Invalid role: {message.get('role')}")
return system_prompt, bedrock_messages
def _convert_bedrock_response_to_openai_format(self, bedrock_response):
# Convert Bedrock response format to OpenAI format
content = ""
if bedrock_response.get("output", {}).get("message", {}).get("content"):
content_array = bedrock_response["output"]["message"]["content"]
content = "".join(item.get("text", "") for item in content_array)
if content == "":
content = "."
# Handle tool calls in response
openai_tool_calls = []
if bedrock_response.get("output", {}).get("message", {}).get("content"):
for content_item in bedrock_response["output"]["message"]["content"]:
if content_item.get("toolUse"):
bedrock_tool_use = content_item["toolUse"]
global CURRENT_TOOLUSE_ID
CURRENT_TOOLUSE_ID = bedrock_tool_use["toolUseId"]
openai_tool_call = {
"id": CURRENT_TOOLUSE_ID,
"type": "function",
"function": {
"name": bedrock_tool_use["name"],
"arguments": json.dumps(bedrock_tool_use["input"]),
},
}
openai_tool_calls.append(openai_tool_call)
# Construct final OpenAI format response
openai_format = {
"id": f"chatcmpl-{uuid.uuid4()}",
"created": int(time.time()),
"object": "chat.completion",
"system_fingerprint": None,
"choices": [
{
"finish_reason": bedrock_response.get("stopReason", "end_turn"),
"index": 0,
"message": {
"content": content,
"role": bedrock_response.get("output", {})
.get("message", {})
.get("role", "assistant"),
"tool_calls": openai_tool_calls
if openai_tool_calls != []
else None,
"function_call": None,
},
}
],
"usage": {
"completion_tokens": bedrock_response.get("usage", {}).get(
"outputTokens", 0
),
"prompt_tokens": bedrock_response.get("usage", {}).get(
"inputTokens", 0
),
"total_tokens": bedrock_response.get("usage", {}).get("totalTokens", 0),
},
}
return OpenAIResponse(openai_format)
async def _invoke_bedrock(
self,
model: str,
messages: List[Dict[str, str]],
max_tokens: int,
temperature: float,
tools: Optional[List[dict]] = None,
tool_choice: Literal["none", "auto", "required"] = "auto",
**kwargs,
) -> OpenAIResponse:
# Non-streaming invocation of Bedrock model
(
system_prompt,
bedrock_messages,
) = self._convert_openai_messages_to_bedrock_format(messages)
response = self.client.converse(
modelId=model,
system=system_prompt,
messages=bedrock_messages,
inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
toolConfig={"tools": tools} if tools else None,
)
openai_response = self._convert_bedrock_response_to_openai_format(response)
return openai_response
async def _invoke_bedrock_stream(
self,
model: str,
messages: List[Dict[str, str]],
max_tokens: int,
temperature: float,
tools: Optional[List[dict]] = None,
tool_choice: Literal["none", "auto", "required"] = "auto",
**kwargs,
) -> OpenAIResponse:
# Streaming invocation of Bedrock model
(
system_prompt,
bedrock_messages,
) = self._convert_openai_messages_to_bedrock_format(messages)
response = self.client.converse_stream(
modelId=model,
system=system_prompt,
messages=bedrock_messages,
inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
toolConfig={"tools": tools} if tools else None,
)
# Initialize response structure
bedrock_response = {
"output": {"message": {"role": "", "content": []}},
"stopReason": "",
"usage": {},
"metrics": {},
}
bedrock_response_text = ""
bedrock_response_tool_input = ""
# Process streaming response
stream = response.get("stream")
if stream:
for event in stream:
if event.get("messageStart", {}).get("role"):
bedrock_response["output"]["message"]["role"] = event[
"messageStart"
]["role"]
if event.get("contentBlockDelta", {}).get("delta", {}).get("text"):
bedrock_response_text += event["contentBlockDelta"]["delta"]["text"]
print(
event["contentBlockDelta"]["delta"]["text"], end="", flush=True
)
if event.get("contentBlockStop", {}).get("contentBlockIndex") == 0:
bedrock_response["output"]["message"]["content"].append(
{"text": bedrock_response_text}
)
if event.get("contentBlockStart", {}).get("start", {}).get("toolUse"):
bedrock_tool_use = event["contentBlockStart"]["start"]["toolUse"]
tool_use = {
"toolUseId": bedrock_tool_use["toolUseId"],
"name": bedrock_tool_use["name"],
}
bedrock_response["output"]["message"]["content"].append(
{"toolUse": tool_use}
)
global CURRENT_TOOLUSE_ID
CURRENT_TOOLUSE_ID = bedrock_tool_use["toolUseId"]
if event.get("contentBlockDelta", {}).get("delta", {}).get("toolUse"):
bedrock_response_tool_input += event["contentBlockDelta"]["delta"][
"toolUse"
]["input"]
print(
event["contentBlockDelta"]["delta"]["toolUse"]["input"],
end="",
flush=True,
)
if event.get("contentBlockStop", {}).get("contentBlockIndex") == 1:
bedrock_response["output"]["message"]["content"][1]["toolUse"][
"input"
] = json.loads(bedrock_response_tool_input)
print()
openai_response = self._convert_bedrock_response_to_openai_format(
bedrock_response
)
return openai_response
def create(
self,
model: str,
messages: List[Dict[str, str]],
max_tokens: int,
temperature: float,
stream: Optional[bool] = True,
tools: Optional[List[dict]] = None,
tool_choice: Literal["none", "auto", "required"] = "auto",
**kwargs,
) -> OpenAIResponse:
# Main entry point for chat completion
bedrock_tools = []
if tools is not None:
bedrock_tools = self._convert_openai_tools_to_bedrock_format(tools)
if stream:
return self._invoke_bedrock_stream(
model,
messages,
max_tokens,
temperature,
bedrock_tools,
tool_choice,
**kwargs,
)
else:
return self._invoke_bedrock(
model,
messages,
max_tokens,
temperature,
bedrock_tools,
tool_choice,
**kwargs,
)
```
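`BedrockClient` mirrors the OpenAI client surface (`client.chat.completions.create(...)` returning an `OpenAIResponse`), which is why `LLM` in `app/llm.py` below can use it when `api_type` is `"aws"`. A minimal sketch of calling it directly; the model ID and AWS credentials are assumptions, `create()` returns a coroutine and must be awaited, and a placeholder tool is passed because the code above sets `toolConfig=None` when no tools are supplied, which boto3 may reject:
```py
import asyncio

from app.bedrock import BedrockClient

# Placeholder tool so that toolConfig is populated (see note above).
NOOP_TOOL = {
    "type": "function",
    "function": {
        "name": "noop",
        "description": "Placeholder tool; not expected to be called.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}


async def main():
    client = BedrockClient()  # requires AWS credentials/region in the environment
    response = await client.chat.completions.create(
        model="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Say hello in one sentence."},
        ],
        max_tokens=256,
        temperature=0.7,
        stream=False,
        tools=[NOOP_TOOL],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```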
## /app/config.py
```py path="/app/config.py"
import json
import threading
import tomllib
from pathlib import Path
from typing import Dict, List, Optional
from pydantic import BaseModel, Field
def get_project_root() -> Path:
"""Get the project root directory"""
return Path(__file__).resolve().parent.parent
PROJECT_ROOT = get_project_root()
WORKSPACE_ROOT = PROJECT_ROOT / "workspace"
class LLMSettings(BaseModel):
model: str = Field(..., description="Model name")
base_url: str = Field(..., description="API base URL")
api_key: str = Field(..., description="API key")
max_tokens: int = Field(4096, description="Maximum number of tokens per request")
max_input_tokens: Optional[int] = Field(
None,
description="Maximum input tokens to use across all requests (None for unlimited)",
)
temperature: float = Field(1.0, description="Sampling temperature")
api_type: str = Field(..., description="Azure, Openai, or Ollama")
api_version: str = Field(..., description="Azure Openai version if AzureOpenai")
class ProxySettings(BaseModel):
server: str = Field(None, description="Proxy server address")
username: Optional[str] = Field(None, description="Proxy username")
password: Optional[str] = Field(None, description="Proxy password")
class SearchSettings(BaseModel):
    engine: str = Field(default="Google", description="Search engine for the LLM to use")
fallback_engines: List[str] = Field(
default_factory=lambda: ["DuckDuckGo", "Baidu", "Bing"],
description="Fallback search engines to try if the primary engine fails",
)
retry_delay: int = Field(
default=60,
description="Seconds to wait before retrying all engines again after they all fail",
)
max_retries: int = Field(
default=3,
description="Maximum number of times to retry all engines when all fail",
)
lang: str = Field(
default="en",
description="Language code for search results (e.g., en, zh, fr)",
)
country: str = Field(
default="us",
description="Country code for search results (e.g., us, cn, uk)",
)
class BrowserSettings(BaseModel):
headless: bool = Field(False, description="Whether to run browser in headless mode")
disable_security: bool = Field(
True, description="Disable browser security features"
)
extra_chromium_args: List[str] = Field(
default_factory=list, description="Extra arguments to pass to the browser"
)
chrome_instance_path: Optional[str] = Field(
None, description="Path to a Chrome instance to use"
)
wss_url: Optional[str] = Field(
None, description="Connect to a browser instance via WebSocket"
)
cdp_url: Optional[str] = Field(
None, description="Connect to a browser instance via CDP"
)
proxy: Optional[ProxySettings] = Field(
None, description="Proxy settings for the browser"
)
max_content_length: int = Field(
2000, description="Maximum length for content retrieval operations"
)
class SandboxSettings(BaseModel):
"""Configuration for the execution sandbox"""
use_sandbox: bool = Field(False, description="Whether to use the sandbox")
image: str = Field("python:3.12-slim", description="Base image")
work_dir: str = Field("/workspace", description="Container working directory")
memory_limit: str = Field("512m", description="Memory limit")
cpu_limit: float = Field(1.0, description="CPU limit")
timeout: int = Field(300, description="Default command timeout (seconds)")
network_enabled: bool = Field(
False, description="Whether network access is allowed"
)
class MCPServerConfig(BaseModel):
"""Configuration for a single MCP server"""
type: str = Field(..., description="Server connection type (sse or stdio)")
url: Optional[str] = Field(None, description="Server URL for SSE connections")
command: Optional[str] = Field(None, description="Command for stdio connections")
args: List[str] = Field(
default_factory=list, description="Arguments for stdio command"
)
class MCPSettings(BaseModel):
"""Configuration for MCP (Model Context Protocol)"""
server_reference: str = Field(
"app.mcp.server", description="Module reference for the MCP server"
)
servers: Dict[str, MCPServerConfig] = Field(
default_factory=dict, description="MCP server configurations"
)
@classmethod
def load_server_config(cls) -> Dict[str, MCPServerConfig]:
"""Load MCP server configuration from JSON file"""
config_path = PROJECT_ROOT / "config" / "mcp.json"
try:
config_file = config_path if config_path.exists() else None
if not config_file:
return {}
with config_file.open() as f:
data = json.load(f)
servers = {}
for server_id, server_config in data.get("mcpServers", {}).items():
servers[server_id] = MCPServerConfig(
type=server_config["type"],
url=server_config.get("url"),
command=server_config.get("command"),
args=server_config.get("args", []),
)
return servers
except Exception as e:
raise ValueError(f"Failed to load MCP server config: {e}")
class AppConfig(BaseModel):
llm: Dict[str, LLMSettings]
sandbox: Optional[SandboxSettings] = Field(
None, description="Sandbox configuration"
)
browser_config: Optional[BrowserSettings] = Field(
None, description="Browser configuration"
)
search_config: Optional[SearchSettings] = Field(
None, description="Search configuration"
)
mcp_config: Optional[MCPSettings] = Field(None, description="MCP configuration")
class Config:
arbitrary_types_allowed = True
class Config:
_instance = None
_lock = threading.Lock()
_initialized = False
def __new__(cls):
if cls._instance is None:
with cls._lock:
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance
def __init__(self):
if not self._initialized:
with self._lock:
if not self._initialized:
self._config = None
self._load_initial_config()
self._initialized = True
@staticmethod
def _get_config_path() -> Path:
root = PROJECT_ROOT
config_path = root / "config" / "config.toml"
if config_path.exists():
return config_path
example_path = root / "config" / "config.example.toml"
if example_path.exists():
return example_path
raise FileNotFoundError("No configuration file found in config directory")
def _load_config(self) -> dict:
config_path = self._get_config_path()
with config_path.open("rb") as f:
return tomllib.load(f)
def _load_initial_config(self):
raw_config = self._load_config()
base_llm = raw_config.get("llm", {})
llm_overrides = {
k: v for k, v in raw_config.get("llm", {}).items() if isinstance(v, dict)
}
default_settings = {
"model": base_llm.get("model"),
"base_url": base_llm.get("base_url"),
"api_key": base_llm.get("api_key"),
"max_tokens": base_llm.get("max_tokens", 4096),
"max_input_tokens": base_llm.get("max_input_tokens"),
"temperature": base_llm.get("temperature", 1.0),
"api_type": base_llm.get("api_type", ""),
"api_version": base_llm.get("api_version", ""),
}
# handle browser config.
browser_config = raw_config.get("browser", {})
browser_settings = None
if browser_config:
# handle proxy settings.
proxy_config = browser_config.get("proxy", {})
proxy_settings = None
if proxy_config and proxy_config.get("server"):
proxy_settings = ProxySettings(
**{
k: v
for k, v in proxy_config.items()
if k in ["server", "username", "password"] and v
}
)
# filter valid browser config parameters.
valid_browser_params = {
k: v
for k, v in browser_config.items()
if k in BrowserSettings.__annotations__ and v is not None
}
            # if there are proxy settings, add them to the parameters.
if proxy_settings:
valid_browser_params["proxy"] = proxy_settings
# only create BrowserSettings when there are valid parameters.
if valid_browser_params:
browser_settings = BrowserSettings(**valid_browser_params)
search_config = raw_config.get("search", {})
search_settings = None
if search_config:
search_settings = SearchSettings(**search_config)
sandbox_config = raw_config.get("sandbox", {})
if sandbox_config:
sandbox_settings = SandboxSettings(**sandbox_config)
else:
sandbox_settings = SandboxSettings()
mcp_config = raw_config.get("mcp", {})
mcp_settings = None
if mcp_config:
# Load server configurations from JSON
mcp_config["servers"] = MCPSettings.load_server_config()
mcp_settings = MCPSettings(**mcp_config)
else:
mcp_settings = MCPSettings(servers=MCPSettings.load_server_config())
config_dict = {
"llm": {
"default": default_settings,
**{
name: {**default_settings, **override_config}
for name, override_config in llm_overrides.items()
},
},
"sandbox": sandbox_settings,
"browser_config": browser_settings,
"search_config": search_settings,
"mcp_config": mcp_settings,
}
self._config = AppConfig(**config_dict)
@property
def llm(self) -> Dict[str, LLMSettings]:
return self._config.llm
@property
def sandbox(self) -> SandboxSettings:
return self._config.sandbox
@property
def browser_config(self) -> Optional[BrowserSettings]:
return self._config.browser_config
@property
def search_config(self) -> Optional[SearchSettings]:
return self._config.search_config
@property
def mcp_config(self) -> MCPSettings:
"""Get the MCP configuration"""
return self._config.mcp_config
@property
def workspace_root(self) -> Path:
"""Get the workspace root directory"""
return WORKSPACE_ROOT
@property
def root_path(self) -> Path:
"""Get the root path of the application"""
return PROJECT_ROOT
config = Config()
```
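`Config` is a thread-safe singleton loaded once from `config/config.toml` (falling back to `config.example.toml`), with any `[llm.<name>]` table layered over the base `[llm]` settings. A sketch of consuming it; the TOML keys in the comment are only those read by `_load_initial_config`, and all values are placeholders:
```py
from app.config import config

# Expected config/config.toml shape (placeholder values):
#   [llm]                  # base settings -> config.llm["default"]
#   model = "gpt-4o"
#   base_url = "https://api.openai.com/v1"
#   api_key = "sk-..."
#   [llm.vision]           # any [llm.<name>] table overrides the base settings
#   model = "gpt-4o"
#   [browser]              # optional sections: browser, search, sandbox, mcp
#   headless = false

default_llm = config.llm["default"]
print(default_llm.model, default_llm.max_tokens)
print(config.sandbox.image)    # SandboxSettings() defaults apply when [sandbox] is absent
print(config.workspace_root)   # PROJECT_ROOT / "workspace"
```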
## /app/exceptions.py
```py path="/app/exceptions.py"
class ToolError(Exception):
"""Raised when a tool encounters an error."""
def __init__(self, message):
self.message = message
class OpenManusError(Exception):
"""Base exception for all OpenManus errors"""
class TokenLimitExceeded(OpenManusError):
"""Exception raised when the token limit is exceeded"""
```
## /app/flow/__init__.py
```py path="/app/flow/__init__.py"
```
## /app/flow/base.py
```py path="/app/flow/base.py"
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Union
from pydantic import BaseModel
from app.agent.base import BaseAgent
class BaseFlow(BaseModel, ABC):
"""Base class for execution flows supporting multiple agents"""
agents: Dict[str, BaseAgent]
tools: Optional[List] = None
primary_agent_key: Optional[str] = None
class Config:
arbitrary_types_allowed = True
def __init__(
self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data
):
# Handle different ways of providing agents
if isinstance(agents, BaseAgent):
agents_dict = {"default": agents}
elif isinstance(agents, list):
agents_dict = {f"agent_{i}": agent for i, agent in enumerate(agents)}
else:
agents_dict = agents
# If primary agent not specified, use first agent
primary_key = data.get("primary_agent_key")
if not primary_key and agents_dict:
primary_key = next(iter(agents_dict))
data["primary_agent_key"] = primary_key
# Set the agents dictionary
data["agents"] = agents_dict
# Initialize using BaseModel's init
super().__init__(**data)
@property
def primary_agent(self) -> Optional[BaseAgent]:
"""Get the primary agent for the flow"""
return self.agents.get(self.primary_agent_key)
def get_agent(self, key: str) -> Optional[BaseAgent]:
"""Get a specific agent by key"""
return self.agents.get(key)
def add_agent(self, key: str, agent: BaseAgent) -> None:
"""Add a new agent to the flow"""
self.agents[key] = agent
@abstractmethod
async def execute(self, input_text: str) -> str:
"""Execute the flow with given input"""
```
## /app/flow/flow_factory.py
```py path="/app/flow/flow_factory.py"
from enum import Enum
from typing import Dict, List, Union
from app.agent.base import BaseAgent
from app.flow.base import BaseFlow
from app.flow.planning import PlanningFlow
class FlowType(str, Enum):
PLANNING = "planning"
class FlowFactory:
"""Factory for creating different types of flows with support for multiple agents"""
@staticmethod
def create_flow(
flow_type: FlowType,
agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]],
**kwargs,
) -> BaseFlow:
flows = {
FlowType.PLANNING: PlanningFlow,
}
flow_class = flows.get(flow_type)
if not flow_class:
raise ValueError(f"Unknown flow type: {flow_type}")
return flow_class(agents, **kwargs)
```
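`FlowFactory` is the entry point for multi-agent execution; `FlowType.PLANNING` is currently the only registered flow. A minimal wiring sketch (the agent choice is illustrative, and a valid `config/config.toml` is assumed for the default LLM):
```py
import asyncio

from app.agent.toolcall import ToolCallAgent
from app.flow.flow_factory import FlowFactory, FlowType


async def main():
    # A single agent becomes {"default": agent}; a dict of agents lets you name executors.
    flow = FlowFactory.create_flow(FlowType.PLANNING, ToolCallAgent())
    result = await flow.execute("Research the topic and write a short summary.")
    print(result)


asyncio.run(main())
```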
## /app/flow/planning.py
```py path="/app/flow/planning.py"
import json
import time
from enum import Enum
from typing import Dict, List, Optional, Union
from pydantic import Field
from app.agent.base import BaseAgent
from app.flow.base import BaseFlow
from app.llm import LLM
from app.logger import logger
from app.schema import AgentState, Message, ToolChoice
from app.tool import PlanningTool
class PlanStepStatus(str, Enum):
"""Enum class defining possible statuses of a plan step"""
NOT_STARTED = "not_started"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
BLOCKED = "blocked"
@classmethod
def get_all_statuses(cls) -> list[str]:
"""Return a list of all possible step status values"""
return [status.value for status in cls]
@classmethod
def get_active_statuses(cls) -> list[str]:
"""Return a list of values representing active statuses (not started or in progress)"""
return [cls.NOT_STARTED.value, cls.IN_PROGRESS.value]
@classmethod
def get_status_marks(cls) -> Dict[str, str]:
"""Return a mapping of statuses to their marker symbols"""
return {
cls.COMPLETED.value: "[✓]",
cls.IN_PROGRESS.value: "[→]",
cls.BLOCKED.value: "[!]",
cls.NOT_STARTED.value: "[ ]",
}
class PlanningFlow(BaseFlow):
"""A flow that manages planning and execution of tasks using agents."""
llm: LLM = Field(default_factory=lambda: LLM())
planning_tool: PlanningTool = Field(default_factory=PlanningTool)
executor_keys: List[str] = Field(default_factory=list)
active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
current_step_index: Optional[int] = None
def __init__(
self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data
):
# Set executor keys before super().__init__
if "executors" in data:
data["executor_keys"] = data.pop("executors")
# Set plan ID if provided
if "plan_id" in data:
data["active_plan_id"] = data.pop("plan_id")
# Initialize the planning tool if not provided
if "planning_tool" not in data:
planning_tool = PlanningTool()
data["planning_tool"] = planning_tool
# Call parent's init with the processed data
super().__init__(agents, **data)
# Set executor_keys to all agent keys if not specified
if not self.executor_keys:
self.executor_keys = list(self.agents.keys())
def get_executor(self, step_type: Optional[str] = None) -> BaseAgent:
"""
Get an appropriate executor agent for the current step.
Can be extended to select agents based on step type/requirements.
"""
# If step type is provided and matches an agent key, use that agent
if step_type and step_type in self.agents:
return self.agents[step_type]
# Otherwise use the first available executor or fall back to primary agent
for key in self.executor_keys:
if key in self.agents:
return self.agents[key]
# Fallback to primary agent
return self.primary_agent
async def execute(self, input_text: str) -> str:
"""Execute the planning flow with agents."""
try:
if not self.primary_agent:
raise ValueError("No primary agent available")
# Create initial plan if input provided
if input_text:
await self._create_initial_plan(input_text)
# Verify plan was created successfully
if self.active_plan_id not in self.planning_tool.plans:
logger.error(
f"Plan creation failed. Plan ID {self.active_plan_id} not found in planning tool."
)
return f"Failed to create plan for: {input_text}"
result = ""
while True:
# Get current step to execute
self.current_step_index, step_info = await self._get_current_step_info()
# Exit if no more steps or plan completed
if self.current_step_index is None:
result += await self._finalize_plan()
break
# Execute current step with appropriate agent
step_type = step_info.get("type") if step_info else None
executor = self.get_executor(step_type)
step_result = await self._execute_step(executor, step_info)
result += step_result + "\n"
# Check if agent wants to terminate
if hasattr(executor, "state") and executor.state == AgentState.FINISHED:
break
return result
except Exception as e:
logger.error(f"Error in PlanningFlow: {str(e)}")
return f"Execution failed: {str(e)}"
async def _create_initial_plan(self, request: str) -> None:
"""Create an initial plan based on the request using the flow's LLM and PlanningTool."""
logger.info(f"Creating initial plan with ID: {self.active_plan_id}")
# Create a system message for plan creation
system_message = Message.system_message(
"You are a planning assistant. Create a concise, actionable plan with clear steps. "
"Focus on key milestones rather than detailed sub-steps. "
"Optimize for clarity and efficiency."
)
# Create a user message with the request
user_message = Message.user_message(
f"Create a reasonable plan with clear steps to accomplish the task: {request}"
)
# Call LLM with PlanningTool
response = await self.llm.ask_tool(
messages=[user_message],
system_msgs=[system_message],
tools=[self.planning_tool.to_param()],
tool_choice=ToolChoice.AUTO,
)
# Process tool calls if present
if response.tool_calls:
for tool_call in response.tool_calls:
if tool_call.function.name == "planning":
# Parse the arguments
args = tool_call.function.arguments
if isinstance(args, str):
try:
args = json.loads(args)
except json.JSONDecodeError:
logger.error(f"Failed to parse tool arguments: {args}")
continue
# Ensure plan_id is set correctly and execute the tool
args["plan_id"] = self.active_plan_id
                    # Execute the planning tool with the parsed arguments
result = await self.planning_tool.execute(**args)
logger.info(f"Plan creation result: {str(result)}")
return
# If execution reached here, create a default plan
logger.warning("Creating default plan")
        # Create a default plan directly with the planning tool
await self.planning_tool.execute(
**{
"command": "create",
"plan_id": self.active_plan_id,
"title": f"Plan for: {request[:50]}{'...' if len(request) > 50 else ''}",
"steps": ["Analyze request", "Execute task", "Verify results"],
}
)
async def _get_current_step_info(self) -> tuple[Optional[int], Optional[dict]]:
"""
Parse the current plan to identify the first non-completed step's index and info.
Returns (None, None) if no active step is found.
"""
if (
not self.active_plan_id
or self.active_plan_id not in self.planning_tool.plans
):
logger.error(f"Plan with ID {self.active_plan_id} not found")
return None, None
try:
# Direct access to plan data from planning tool storage
plan_data = self.planning_tool.plans[self.active_plan_id]
steps = plan_data.get("steps", [])
step_statuses = plan_data.get("step_statuses", [])
# Find first non-completed step
for i, step in enumerate(steps):
if i >= len(step_statuses):
status = PlanStepStatus.NOT_STARTED.value
else:
status = step_statuses[i]
if status in PlanStepStatus.get_active_statuses():
# Extract step type/category if available
step_info = {"text": step}
# Try to extract step type from the text (e.g., [SEARCH] or [CODE])
import re
type_match = re.search(r"\[([A-Z_]+)\]", step)
if type_match:
step_info["type"] = type_match.group(1).lower()
# Mark current step as in_progress
try:
await self.planning_tool.execute(
command="mark_step",
plan_id=self.active_plan_id,
step_index=i,
step_status=PlanStepStatus.IN_PROGRESS.value,
)
except Exception as e:
logger.warning(f"Error marking step as in_progress: {e}")
# Update step status directly if needed
if i < len(step_statuses):
step_statuses[i] = PlanStepStatus.IN_PROGRESS.value
else:
while len(step_statuses) < i:
step_statuses.append(PlanStepStatus.NOT_STARTED.value)
step_statuses.append(PlanStepStatus.IN_PROGRESS.value)
plan_data["step_statuses"] = step_statuses
return i, step_info
return None, None # No active step found
except Exception as e:
logger.warning(f"Error finding current step index: {e}")
return None, None
async def _execute_step(self, executor: BaseAgent, step_info: dict) -> str:
"""Execute the current step with the specified agent using agent.run()."""
# Prepare context for the agent with current plan status
plan_status = await self._get_plan_text()
step_text = step_info.get("text", f"Step {self.current_step_index}")
# Create a prompt for the agent to execute the current step
step_prompt = f"""
CURRENT PLAN STATUS:
{plan_status}
YOUR CURRENT TASK:
You are now working on step {self.current_step_index}: "{step_text}"
Please execute this step using the appropriate tools. When you're done, provide a summary of what you accomplished.
"""
# Use agent.run() to execute the step
try:
step_result = await executor.run(step_prompt)
# Mark the step as completed after successful execution
await self._mark_step_completed()
return step_result
except Exception as e:
logger.error(f"Error executing step {self.current_step_index}: {e}")
return f"Error executing step {self.current_step_index}: {str(e)}"
async def _mark_step_completed(self) -> None:
"""Mark the current step as completed."""
if self.current_step_index is None:
return
try:
# Mark the step as completed
await self.planning_tool.execute(
command="mark_step",
plan_id=self.active_plan_id,
step_index=self.current_step_index,
step_status=PlanStepStatus.COMPLETED.value,
)
logger.info(
f"Marked step {self.current_step_index} as completed in plan {self.active_plan_id}"
)
except Exception as e:
logger.warning(f"Failed to update plan status: {e}")
# Update step status directly in planning tool storage
if self.active_plan_id in self.planning_tool.plans:
plan_data = self.planning_tool.plans[self.active_plan_id]
step_statuses = plan_data.get("step_statuses", [])
# Ensure the step_statuses list is long enough
while len(step_statuses) <= self.current_step_index:
step_statuses.append(PlanStepStatus.NOT_STARTED.value)
# Update the status
step_statuses[self.current_step_index] = PlanStepStatus.COMPLETED.value
plan_data["step_statuses"] = step_statuses
async def _get_plan_text(self) -> str:
"""Get the current plan as formatted text."""
try:
result = await self.planning_tool.execute(
command="get", plan_id=self.active_plan_id
)
return result.output if hasattr(result, "output") else str(result)
except Exception as e:
logger.error(f"Error getting plan: {e}")
return self._generate_plan_text_from_storage()
def _generate_plan_text_from_storage(self) -> str:
"""Generate plan text directly from storage if the planning tool fails."""
try:
if self.active_plan_id not in self.planning_tool.plans:
return f"Error: Plan with ID {self.active_plan_id} not found"
plan_data = self.planning_tool.plans[self.active_plan_id]
title = plan_data.get("title", "Untitled Plan")
steps = plan_data.get("steps", [])
step_statuses = plan_data.get("step_statuses", [])
step_notes = plan_data.get("step_notes", [])
# Ensure step_statuses and step_notes match the number of steps
while len(step_statuses) < len(steps):
step_statuses.append(PlanStepStatus.NOT_STARTED.value)
while len(step_notes) < len(steps):
step_notes.append("")
# Count steps by status
status_counts = {status: 0 for status in PlanStepStatus.get_all_statuses()}
for status in step_statuses:
if status in status_counts:
status_counts[status] += 1
completed = status_counts[PlanStepStatus.COMPLETED.value]
total = len(steps)
progress = (completed / total) * 100 if total > 0 else 0
plan_text = f"Plan: {title} (ID: {self.active_plan_id})\n"
plan_text += "=" * len(plan_text) + "\n\n"
plan_text += (
f"Progress: {completed}/{total} steps completed ({progress:.1f}%)\n"
)
plan_text += f"Status: {status_counts[PlanStepStatus.COMPLETED.value]} completed, {status_counts[PlanStepStatus.IN_PROGRESS.value]} in progress, "
plan_text += f"{status_counts[PlanStepStatus.BLOCKED.value]} blocked, {status_counts[PlanStepStatus.NOT_STARTED.value]} not started\n\n"
plan_text += "Steps:\n"
status_marks = PlanStepStatus.get_status_marks()
for i, (step, status, notes) in enumerate(
zip(steps, step_statuses, step_notes)
):
# Use status marks to indicate step status
status_mark = status_marks.get(
status, status_marks[PlanStepStatus.NOT_STARTED.value]
)
plan_text += f"{i}. {status_mark} {step}\n"
if notes:
plan_text += f" Notes: {notes}\n"
return plan_text
except Exception as e:
logger.error(f"Error generating plan text from storage: {e}")
return f"Error: Unable to retrieve plan with ID {self.active_plan_id}"
async def _finalize_plan(self) -> str:
"""Finalize the plan and provide a summary using the flow's LLM directly."""
plan_text = await self._get_plan_text()
# Create a summary using the flow's LLM directly
try:
system_message = Message.system_message(
"You are a planning assistant. Your task is to summarize the completed plan."
)
user_message = Message.user_message(
f"The plan has been completed. Here is the final plan status:\n\n{plan_text}\n\nPlease provide a summary of what was accomplished and any final thoughts."
)
response = await self.llm.ask(
messages=[user_message], system_msgs=[system_message]
)
return f"Plan completed:\n\n{response}"
except Exception as e:
logger.error(f"Error finalizing plan with LLM: {e}")
# Fallback to using an agent for the summary
try:
agent = self.primary_agent
summary_prompt = f"""
The plan has been completed. Here is the final plan status:
{plan_text}
Please provide a summary of what was accomplished and any final thoughts.
"""
summary = await agent.run(summary_prompt)
return f"Plan completed:\n\n{summary}"
except Exception as e2:
logger.error(f"Error finalizing plan with agent: {e2}")
return "Plan completed. Error generating summary."
```
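Executor selection in `get_executor` first tries the step's bracketed type tag, extracted by the `\[([A-Z_]+)\]` regex in `_get_current_step_info`, then walks `executor_keys`, and finally falls back to the primary agent. A sketch of how agent keys and step tags could line up (the agent names and step text are assumptions):
```py
import asyncio

from app.agent.swe import SWEAgent
from app.agent.toolcall import ToolCallAgent
from app.flow.planning import PlanningFlow


async def main():
    flow = PlanningFlow(
        agents={"swe": SWEAgent(), "default": ToolCallAgent()},
        executors=["default"],  # becomes executor_keys in __init__
    )
    # A generated step like "[SWE] implement the parser" routes to agents["swe"];
    # untyped steps go to the first available key in executor_keys, here "default".
    print(await flow.execute("Build a small command-line tool"))


asyncio.run(main())
```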
## /app/llm.py
```py path="/app/llm.py"
import math
from typing import Dict, List, Optional, Union
import tiktoken
from openai import (
APIError,
AsyncAzureOpenAI,
AsyncOpenAI,
AuthenticationError,
OpenAIError,
RateLimitError,
)
from openai.types.chat import ChatCompletion, ChatCompletionMessage
from tenacity import (
retry,
retry_if_exception_type,
stop_after_attempt,
wait_random_exponential,
)
from app.bedrock import BedrockClient
from app.config import LLMSettings, config
from app.exceptions import TokenLimitExceeded
from app.logger import logger # Assuming a logger is set up in your app
from app.schema import (
ROLE_VALUES,
TOOL_CHOICE_TYPE,
TOOL_CHOICE_VALUES,
Message,
ToolChoice,
)
REASONING_MODELS = ["o1", "o3-mini"]
MULTIMODAL_MODELS = [
"gpt-4-vision-preview",
"gpt-4o",
"gpt-4o-mini",
"claude-3-opus-20240229",
"claude-3-sonnet-20240229",
"claude-3-haiku-20240307",
]
class TokenCounter:
# Token constants
BASE_MESSAGE_TOKENS = 4
FORMAT_TOKENS = 2
LOW_DETAIL_IMAGE_TOKENS = 85
HIGH_DETAIL_TILE_TOKENS = 170
# Image processing constants
MAX_SIZE = 2048
HIGH_DETAIL_TARGET_SHORT_SIDE = 768
TILE_SIZE = 512
def __init__(self, tokenizer):
self.tokenizer = tokenizer
def count_text(self, text: str) -> int:
"""Calculate tokens for a text string"""
return 0 if not text else len(self.tokenizer.encode(text))
def count_image(self, image_item: dict) -> int:
"""
Calculate tokens for an image based on detail level and dimensions
For "low" detail: fixed 85 tokens
For "high" detail:
1. Scale to fit in 2048x2048 square
2. Scale shortest side to 768px
3. Count 512px tiles (170 tokens each)
4. Add 85 tokens
"""
detail = image_item.get("detail", "medium")
# For low detail, always return fixed token count
if detail == "low":
return self.LOW_DETAIL_IMAGE_TOKENS
# For medium detail (default in OpenAI), use high detail calculation
# OpenAI doesn't specify a separate calculation for medium
# For high detail, calculate based on dimensions if available
if detail == "high" or detail == "medium":
# If dimensions are provided in the image_item
if "dimensions" in image_item:
width, height = image_item["dimensions"]
return self._calculate_high_detail_tokens(width, height)
return (
self._calculate_high_detail_tokens(1024, 1024) if detail == "high" else 1024
)
def _calculate_high_detail_tokens(self, width: int, height: int) -> int:
"""Calculate tokens for high detail images based on dimensions"""
# Step 1: Scale to fit in MAX_SIZE x MAX_SIZE square
if width > self.MAX_SIZE or height > self.MAX_SIZE:
scale = self.MAX_SIZE / max(width, height)
width = int(width * scale)
height = int(height * scale)
# Step 2: Scale so shortest side is HIGH_DETAIL_TARGET_SHORT_SIDE
scale = self.HIGH_DETAIL_TARGET_SHORT_SIDE / min(width, height)
scaled_width = int(width * scale)
scaled_height = int(height * scale)
# Step 3: Count number of 512px tiles
tiles_x = math.ceil(scaled_width / self.TILE_SIZE)
tiles_y = math.ceil(scaled_height / self.TILE_SIZE)
total_tiles = tiles_x * tiles_y
# Step 4: Calculate final token count
return (
total_tiles * self.HIGH_DETAIL_TILE_TOKENS
) + self.LOW_DETAIL_IMAGE_TOKENS
def count_content(self, content: Union[str, List[Union[str, dict]]]) -> int:
"""Calculate tokens for message content"""
if not content:
return 0
if isinstance(content, str):
return self.count_text(content)
token_count = 0
for item in content:
if isinstance(item, str):
token_count += self.count_text(item)
elif isinstance(item, dict):
if "text" in item:
token_count += self.count_text(item["text"])
elif "image_url" in item:
token_count += self.count_image(item)
return token_count
def count_tool_calls(self, tool_calls: List[dict]) -> int:
"""Calculate tokens for tool calls"""
token_count = 0
for tool_call in tool_calls:
if "function" in tool_call:
function = tool_call["function"]
token_count += self.count_text(function.get("name", ""))
token_count += self.count_text(function.get("arguments", ""))
return token_count
def count_message_tokens(self, messages: List[dict]) -> int:
"""Calculate the total number of tokens in a message list"""
total_tokens = self.FORMAT_TOKENS # Base format tokens
for message in messages:
tokens = self.BASE_MESSAGE_TOKENS # Base tokens per message
# Add role tokens
tokens += self.count_text(message.get("role", ""))
# Add content tokens
if "content" in message:
tokens += self.count_content(message["content"])
# Add tool calls tokens
if "tool_calls" in message:
tokens += self.count_tool_calls(message["tool_calls"])
# Add name and tool_call_id tokens
tokens += self.count_text(message.get("name", ""))
tokens += self.count_text(message.get("tool_call_id", ""))
total_tokens += tokens
return total_tokens
class LLM:
_instances: Dict[str, "LLM"] = {}
def __new__(
cls, config_name: str = "default", llm_config: Optional[LLMSettings] = None
):
if config_name not in cls._instances:
instance = super().__new__(cls)
instance.__init__(config_name, llm_config)
cls._instances[config_name] = instance
return cls._instances[config_name]
def __init__(
self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
):
if not hasattr(self, "client"): # Only initialize if not already initialized
llm_config = llm_config or config.llm
llm_config = llm_config.get(config_name, llm_config["default"])
self.model = llm_config.model
self.max_tokens = llm_config.max_tokens
self.temperature = llm_config.temperature
self.api_type = llm_config.api_type
self.api_key = llm_config.api_key
self.api_version = llm_config.api_version
self.base_url = llm_config.base_url
# Add token counting related attributes
self.total_input_tokens = 0
self.total_completion_tokens = 0
self.max_input_tokens = (
llm_config.max_input_tokens
if hasattr(llm_config, "max_input_tokens")
else None
)
# Initialize tokenizer
try:
self.tokenizer = tiktoken.encoding_for_model(self.model)
except KeyError:
# If the model is not in tiktoken's presets, use cl100k_base as default
self.tokenizer = tiktoken.get_encoding("cl100k_base")
if self.api_type == "azure":
self.client = AsyncAzureOpenAI(
base_url=self.base_url,
api_key=self.api_key,
api_version=self.api_version,
)
elif self.api_type == "aws":
self.client = BedrockClient()
else:
self.client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
self.token_counter = TokenCounter(self.tokenizer)
def count_tokens(self, text: str) -> int:
"""Calculate the number of tokens in a text"""
if not text:
return 0
return len(self.tokenizer.encode(text))
def count_message_tokens(self, messages: List[dict]) -> int:
return self.token_counter.count_message_tokens(messages)
def update_token_count(self, input_tokens: int, completion_tokens: int = 0) -> None:
"""Update token counts"""
        # Track cumulative token usage across requests
self.total_input_tokens += input_tokens
self.total_completion_tokens += completion_tokens
logger.info(
f"Token usage: Input={input_tokens}, Completion={completion_tokens}, "
f"Cumulative Input={self.total_input_tokens}, Cumulative Completion={self.total_completion_tokens}, "
f"Total={input_tokens + completion_tokens}, Cumulative Total={self.total_input_tokens + self.total_completion_tokens}"
)
def check_token_limit(self, input_tokens: int) -> bool:
"""Check if token limits are exceeded"""
if self.max_input_tokens is not None:
return (self.total_input_tokens + input_tokens) <= self.max_input_tokens
# If max_input_tokens is not set, always return True
return True
def get_limit_error_message(self, input_tokens: int) -> str:
"""Generate error message for token limit exceeded"""
if (
self.max_input_tokens is not None
and (self.total_input_tokens + input_tokens) > self.max_input_tokens
):
return f"Request may exceed input token limit (Current: {self.total_input_tokens}, Needed: {input_tokens}, Max: {self.max_input_tokens})"
return "Token limit exceeded"
@staticmethod
def format_messages(
messages: List[Union[dict, Message]], supports_images: bool = False
) -> List[dict]:
"""
Format messages for LLM by converting them to OpenAI message format.
Args:
messages: List of messages that can be either dict or Message objects
supports_images: Flag indicating if the target model supports image inputs
Returns:
List[dict]: List of formatted messages in OpenAI format
Raises:
ValueError: If messages are invalid or missing required fields
TypeError: If unsupported message types are provided
Examples:
>>> msgs = [
... Message.system_message("You are a helpful assistant"),
... {"role": "user", "content": "Hello"},
... Message.user_message("How are you?")
... ]
>>> formatted = LLM.format_messages(msgs)
"""
formatted_messages = []
for message in messages:
# Convert Message objects to dictionaries
if isinstance(message, Message):
message = message.to_dict()
if isinstance(message, dict):
# If message is a dict, ensure it has required fields
if "role" not in message:
raise ValueError("Message dict must contain 'role' field")
# Process base64 images if present and model supports images
if supports_images and message.get("base64_image"):
# Initialize or convert content to appropriate format
if not message.get("content"):
message["content"] = []
elif isinstance(message["content"], str):
message["content"] = [
{"type": "text", "text": message["content"]}
]
elif isinstance(message["content"], list):
# Convert string items to proper text objects
message["content"] = [
(
{"type": "text", "text": item}
if isinstance(item, str)
else item
)
for item in message["content"]
]
# Add the image to content
message["content"].append(
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{message['base64_image']}"
},
}
)
# Remove the base64_image field
del message["base64_image"]
# If model doesn't support images but message has base64_image, handle gracefully
elif not supports_images and message.get("base64_image"):
# Just remove the base64_image field and keep the text content
del message["base64_image"]
if "content" in message or "tool_calls" in message:
formatted_messages.append(message)
# else: do not include the message
else:
raise TypeError(f"Unsupported message type: {type(message)}")
# Validate all messages have required fields
for msg in formatted_messages:
if msg["role"] not in ROLE_VALUES:
raise ValueError(f"Invalid role: {msg['role']}")
return formatted_messages
@retry(
wait=wait_random_exponential(min=1, max=60),
stop=stop_after_attempt(6),
retry=retry_if_exception_type(
(OpenAIError, Exception, ValueError)
), # Don't retry TokenLimitExceeded
)
async def ask(
self,
messages: List[Union[dict, Message]],
system_msgs: Optional[List[Union[dict, Message]]] = None,
stream: bool = True,
temperature: Optional[float] = None,
) -> str:
"""
Send a prompt to the LLM and get the response.
Args:
messages: List of conversation messages
system_msgs: Optional system messages to prepend
stream (bool): Whether to stream the response
temperature (float): Sampling temperature for the response
Returns:
str: The generated response
Raises:
TokenLimitExceeded: If token limits are exceeded
ValueError: If messages are invalid or response is empty
OpenAIError: If API call fails after retries
Exception: For unexpected errors
"""
try:
# Check if the model supports images
supports_images = self.model in MULTIMODAL_MODELS
# Format system and user messages with image support check
if system_msgs:
system_msgs = self.format_messages(system_msgs, supports_images)
messages = system_msgs + self.format_messages(messages, supports_images)
else:
messages = self.format_messages(messages, supports_images)
# Calculate input token count
input_tokens = self.count_message_tokens(messages)
# Check if token limits are exceeded
if not self.check_token_limit(input_tokens):
error_message = self.get_limit_error_message(input_tokens)
# Raise a special exception that won't be retried
raise TokenLimitExceeded(error_message)
params = {
"model": self.model,
"messages": messages,
}
if self.model in REASONING_MODELS:
params["max_completion_tokens"] = self.max_tokens
else:
params["max_tokens"] = self.max_tokens
params["temperature"] = (
temperature if temperature is not None else self.temperature
)
if not stream:
# Non-streaming request
response = await self.client.chat.completions.create(
**params, stream=False
)
if not response.choices or not response.choices[0].message.content:
raise ValueError("Empty or invalid response from LLM")
# Update token counts
self.update_token_count(
response.usage.prompt_tokens, response.usage.completion_tokens
)
return response.choices[0].message.content
            # Streaming request: update the estimated input token count before making the call
self.update_token_count(input_tokens)
response = await self.client.chat.completions.create(**params, stream=True)
collected_messages = []
completion_text = ""
async for chunk in response:
chunk_message = chunk.choices[0].delta.content or ""
collected_messages.append(chunk_message)
completion_text += chunk_message
print(chunk_message, end="", flush=True)
print() # Newline after streaming
full_response = "".join(collected_messages).strip()
if not full_response:
raise ValueError("Empty response from streaming LLM")
# estimate completion tokens for streaming response
completion_tokens = self.count_tokens(completion_text)
logger.info(
f"Estimated completion tokens for streaming response: {completion_tokens}"
)
self.total_completion_tokens += completion_tokens
return full_response
except TokenLimitExceeded:
# Re-raise token limit errors without logging
raise
except ValueError:
logger.exception(f"Validation error")
raise
except OpenAIError as oe:
logger.exception(f"OpenAI API error")
if isinstance(oe, AuthenticationError):
logger.error("Authentication failed. Check API key.")
elif isinstance(oe, RateLimitError):
logger.error("Rate limit exceeded. Consider increasing retry attempts.")
elif isinstance(oe, APIError):
logger.error(f"API error: {oe}")
raise
except Exception:
logger.exception(f"Unexpected error in ask")
raise
@retry(
wait=wait_random_exponential(min=1, max=60),
stop=stop_after_attempt(6),
retry=retry_if_exception_type(
(OpenAIError, Exception, ValueError)
), # Don't retry TokenLimitExceeded
)
async def ask_with_images(
self,
messages: List[Union[dict, Message]],
images: List[Union[str, dict]],
system_msgs: Optional[List[Union[dict, Message]]] = None,
stream: bool = False,
temperature: Optional[float] = None,
) -> str:
"""
Send a prompt with images to the LLM and get the response.
Args:
messages: List of conversation messages
images: List of image URLs or image data dictionaries
system_msgs: Optional system messages to prepend
stream (bool): Whether to stream the response
temperature (float): Sampling temperature for the response
Returns:
str: The generated response
Raises:
TokenLimitExceeded: If token limits are exceeded
ValueError: If messages are invalid or response is empty
OpenAIError: If API call fails after retries
Exception: For unexpected errors
"""
try:
# For ask_with_images, we always set supports_images to True because
# this method should only be called with models that support images
if self.model not in MULTIMODAL_MODELS:
raise ValueError(
f"Model {self.model} does not support images. Use a model from {MULTIMODAL_MODELS}"
)
# Format messages with image support
formatted_messages = self.format_messages(messages, supports_images=True)
# Ensure the last message is from the user to attach images
if not formatted_messages or formatted_messages[-1]["role"] != "user":
raise ValueError(
"The last message must be from the user to attach images"
)
# Process the last user message to include images
last_message = formatted_messages[-1]
# Convert content to multimodal format if needed
content = last_message["content"]
multimodal_content = (
[{"type": "text", "text": content}]
if isinstance(content, str)
else content
if isinstance(content, list)
else []
)
# Add images to content
for image in images:
if isinstance(image, str):
multimodal_content.append(
{"type": "image_url", "image_url": {"url": image}}
)
elif isinstance(image, dict) and "url" in image:
multimodal_content.append({"type": "image_url", "image_url": image})
elif isinstance(image, dict) and "image_url" in image:
multimodal_content.append(image)
else:
raise ValueError(f"Unsupported image format: {image}")
# Update the message with multimodal content
last_message["content"] = multimodal_content
# Add system messages if provided
if system_msgs:
all_messages = (
self.format_messages(system_msgs, supports_images=True)
+ formatted_messages
)
else:
all_messages = formatted_messages
# Calculate tokens and check limits
input_tokens = self.count_message_tokens(all_messages)
if not self.check_token_limit(input_tokens):
raise TokenLimitExceeded(self.get_limit_error_message(input_tokens))
# Set up API parameters
params = {
"model": self.model,
"messages": all_messages,
"stream": stream,
}
# Add model-specific parameters
if self.model in REASONING_MODELS:
params["max_completion_tokens"] = self.max_tokens
else:
params["max_tokens"] = self.max_tokens
params["temperature"] = (
temperature if temperature is not None else self.temperature
)
# Handle non-streaming request
if not stream:
response = await self.client.chat.completions.create(**params)
if not response.choices or not response.choices[0].message.content:
raise ValueError("Empty or invalid response from LLM")
                self.update_token_count(
                    response.usage.prompt_tokens, response.usage.completion_tokens
                )
return response.choices[0].message.content
# Handle streaming request
self.update_token_count(input_tokens)
response = await self.client.chat.completions.create(**params)
collected_messages = []
async for chunk in response:
chunk_message = chunk.choices[0].delta.content or ""
collected_messages.append(chunk_message)
print(chunk_message, end="", flush=True)
print() # Newline after streaming
full_response = "".join(collected_messages).strip()
if not full_response:
raise ValueError("Empty response from streaming LLM")
return full_response
except TokenLimitExceeded:
raise
except ValueError as ve:
logger.error(f"Validation error in ask_with_images: {ve}")
raise
except OpenAIError as oe:
logger.error(f"OpenAI API error: {oe}")
if isinstance(oe, AuthenticationError):
logger.error("Authentication failed. Check API key.")
elif isinstance(oe, RateLimitError):
logger.error("Rate limit exceeded. Consider increasing retry attempts.")
elif isinstance(oe, APIError):
logger.error(f"API error: {oe}")
raise
except Exception as e:
logger.error(f"Unexpected error in ask_with_images: {e}")
raise
@retry(
wait=wait_random_exponential(min=1, max=60),
stop=stop_after_attempt(6),
retry=retry_if_exception_type(
(OpenAIError, Exception, ValueError)
), # Don't retry TokenLimitExceeded
)
async def ask_tool(
self,
messages: List[Union[dict, Message]],
system_msgs: Optional[List[Union[dict, Message]]] = None,
timeout: int = 300,
tools: Optional[List[dict]] = None,
tool_choice: TOOL_CHOICE_TYPE = ToolChoice.AUTO, # type: ignore
temperature: Optional[float] = None,
**kwargs,
) -> ChatCompletionMessage | None:
"""
Ask LLM using functions/tools and return the response.
Args:
messages: List of conversation messages
system_msgs: Optional system messages to prepend
timeout: Request timeout in seconds
tools: List of tools to use
tool_choice: Tool choice strategy
temperature: Sampling temperature for the response
**kwargs: Additional completion arguments
Returns:
ChatCompletionMessage: The model's response
Raises:
TokenLimitExceeded: If token limits are exceeded
ValueError: If tools, tool_choice, or messages are invalid
OpenAIError: If API call fails after retries
Exception: For unexpected errors
"""
try:
# Validate tool_choice
if tool_choice not in TOOL_CHOICE_VALUES:
raise ValueError(f"Invalid tool_choice: {tool_choice}")
# Check if the model supports images
supports_images = self.model in MULTIMODAL_MODELS
# Format messages
if system_msgs:
system_msgs = self.format_messages(system_msgs, supports_images)
messages = system_msgs + self.format_messages(messages, supports_images)
else:
messages = self.format_messages(messages, supports_images)
# Calculate input token count
input_tokens = self.count_message_tokens(messages)
# If there are tools, calculate token count for tool descriptions
tools_tokens = 0
if tools:
for tool in tools:
tools_tokens += self.count_tokens(str(tool))
input_tokens += tools_tokens
# Check if token limits are exceeded
if not self.check_token_limit(input_tokens):
error_message = self.get_limit_error_message(input_tokens)
# Raise a special exception that won't be retried
raise TokenLimitExceeded(error_message)
# Validate tools if provided
if tools:
for tool in tools:
if not isinstance(tool, dict) or "type" not in tool:
raise ValueError("Each tool must be a dict with 'type' field")
# Set up the completion request
params = {
"model": self.model,
"messages": messages,
"tools": tools,
"tool_choice": tool_choice,
"timeout": timeout,
**kwargs,
}
if self.model in REASONING_MODELS:
params["max_completion_tokens"] = self.max_tokens
else:
params["max_tokens"] = self.max_tokens
params["temperature"] = (
temperature if temperature is not None else self.temperature
)
params["stream"] = False # Always use non-streaming for tool requests
response: ChatCompletion = await self.client.chat.completions.create(
**params
)
# Check if response is valid
if not response.choices or not response.choices[0].message:
print(response)
# raise ValueError("Invalid or empty response from LLM")
return None
# Update token counts
self.update_token_count(
response.usage.prompt_tokens, response.usage.completion_tokens
)
return response.choices[0].message
except TokenLimitExceeded:
# Re-raise token limit errors without logging
raise
except ValueError as ve:
logger.error(f"Validation error in ask_tool: {ve}")
raise
except OpenAIError as oe:
logger.error(f"OpenAI API error: {oe}")
if isinstance(oe, AuthenticationError):
logger.error("Authentication failed. Check API key.")
elif isinstance(oe, RateLimitError):
logger.error("Rate limit exceeded. Consider increasing retry attempts.")
elif isinstance(oe, APIError):
logger.error(f"API error: {oe}")
raise
except Exception as e:
logger.error(f"Unexpected error in ask_tool: {e}")
raise
```
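A minimal sketch of driving the `LLM` wrapper above from an async entry point. It assumes a valid `config.toml` is in place, that `LLM()` with no arguments resolves the default model settings, and that `Message` lives in `app.schema` (the import path is inferred from the repository layout):

```py
import asyncio

from app.llm import LLM
from app.schema import Message


async def main() -> None:
    llm = LLM()  # assumed to pick up model/API settings from the project config

    # format_messages() accepts plain dicts and Message objects interchangeably.
    reply = await llm.ask(
        messages=[Message.user_message("Summarize what OpenManus does in one sentence.")],
        system_msgs=[Message.system_message("You are a concise assistant.")],
        stream=False,
    )
    print(reply)


if __name__ == "__main__":
    asyncio.run(main())
```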
## /app/logger.py
```py path="/app/logger.py"
import sys
from datetime import datetime
from loguru import logger as _logger
from app.config import PROJECT_ROOT
_print_level = "INFO"
def define_log_level(print_level="INFO", logfile_level="DEBUG", name: str = None):
"""Adjust the log level to above level"""
global _print_level
_print_level = print_level
current_date = datetime.now()
formatted_date = current_date.strftime("%Y%m%d%H%M%S")
log_name = (
f"{name}_{formatted_date}" if name else formatted_date
    )  # prefix the log file name when a name is given
_logger.remove()
_logger.add(sys.stderr, level=print_level)
_logger.add(PROJECT_ROOT / f"logs/{log_name}.log", level=logfile_level)
return _logger
logger = define_log_level()
if __name__ == "__main__":
logger.info("Starting application")
logger.debug("Debug message")
logger.warning("Warning message")
logger.error("Error message")
logger.critical("Critical message")
try:
raise ValueError("Test error")
except Exception as e:
logger.exception(f"An error occurred: {e}")
```
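Because `define_log_level()` reconfigures the shared loguru logger and returns it, callers can request a per-run log file by passing a name prefix; a small sketch (the `"manus"` prefix is just an illustrative value):

```py
from app.logger import define_log_level

# INFO and above to stderr, DEBUG and above to logs/manus_<timestamp>.log under PROJECT_ROOT.
logger = define_log_level(print_level="INFO", logfile_level="DEBUG", name="manus")
logger.info("Run started")
```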
## /app/mcp/__init__.py
```py path="/app/mcp/__init__.py"
```
## /app/mcp/server.py
```py path="/app/mcp/server.py"
import logging
import sys
logging.basicConfig(level=logging.INFO, handlers=[logging.StreamHandler(sys.stderr)])
import argparse
import asyncio
import atexit
import json
from inspect import Parameter, Signature
from typing import Any, Dict, Optional
from mcp.server.fastmcp import FastMCP
from app.logger import logger
from app.tool.base import BaseTool
from app.tool.bash import Bash
from app.tool.browser_use_tool import BrowserUseTool
from app.tool.str_replace_editor import StrReplaceEditor
from app.tool.terminate import Terminate
class MCPServer:
"""MCP Server implementation with tool registration and management."""
def __init__(self, name: str = "openmanus"):
self.server = FastMCP(name)
self.tools: Dict[str, BaseTool] = {}
# Initialize standard tools
self.tools["bash"] = Bash()
self.tools["browser"] = BrowserUseTool()
self.tools["editor"] = StrReplaceEditor()
self.tools["terminate"] = Terminate()
def register_tool(self, tool: BaseTool, method_name: Optional[str] = None) -> None:
"""Register a tool with parameter validation and documentation."""
tool_name = method_name or tool.name
tool_param = tool.to_param()
tool_function = tool_param["function"]
# Define the async function to be registered
async def tool_method(**kwargs):
logger.info(f"Executing {tool_name}: {kwargs}")
result = await tool.execute(**kwargs)
logger.info(f"Result of {tool_name}: {result}")
# Handle different types of results (match original logic)
if hasattr(result, "model_dump"):
return json.dumps(result.model_dump())
elif isinstance(result, dict):
return json.dumps(result)
return result
# Set method metadata
tool_method.__name__ = tool_name
tool_method.__doc__ = self._build_docstring(tool_function)
tool_method.__signature__ = self._build_signature(tool_function)
# Store parameter schema (important for tools that access it programmatically)
param_props = tool_function.get("parameters", {}).get("properties", {})
required_params = tool_function.get("parameters", {}).get("required", [])
tool_method._parameter_schema = {
param_name: {
"description": param_details.get("description", ""),
"type": param_details.get("type", "any"),
"required": param_name in required_params,
}
for param_name, param_details in param_props.items()
}
# Register with server
self.server.tool()(tool_method)
logger.info(f"Registered tool: {tool_name}")
def _build_docstring(self, tool_function: dict) -> str:
"""Build a formatted docstring from tool function metadata."""
description = tool_function.get("description", "")
param_props = tool_function.get("parameters", {}).get("properties", {})
required_params = tool_function.get("parameters", {}).get("required", [])
# Build docstring (match original format)
docstring = description
if param_props:
docstring += "\n\nParameters:\n"
for param_name, param_details in param_props.items():
required_str = (
"(required)" if param_name in required_params else "(optional)"
)
param_type = param_details.get("type", "any")
param_desc = param_details.get("description", "")
docstring += (
f" {param_name} ({param_type}) {required_str}: {param_desc}\n"
)
return docstring
def _build_signature(self, tool_function: dict) -> Signature:
"""Build a function signature from tool function metadata."""
param_props = tool_function.get("parameters", {}).get("properties", {})
required_params = tool_function.get("parameters", {}).get("required", [])
parameters = []
# Follow original type mapping
for param_name, param_details in param_props.items():
param_type = param_details.get("type", "")
default = Parameter.empty if param_name in required_params else None
# Map JSON Schema types to Python types (same as original)
annotation = Any
if param_type == "string":
annotation = str
elif param_type == "integer":
annotation = int
elif param_type == "number":
annotation = float
elif param_type == "boolean":
annotation = bool
elif param_type == "object":
annotation = dict
elif param_type == "array":
annotation = list
# Create parameter with same structure as original
param = Parameter(
name=param_name,
kind=Parameter.KEYWORD_ONLY,
default=default,
annotation=annotation,
)
parameters.append(param)
return Signature(parameters=parameters)
async def cleanup(self) -> None:
"""Clean up server resources."""
logger.info("Cleaning up resources")
# Follow original cleanup logic - only clean browser tool
if "browser" in self.tools and hasattr(self.tools["browser"], "cleanup"):
await self.tools["browser"].cleanup()
def register_all_tools(self) -> None:
"""Register all tools with the server."""
for tool in self.tools.values():
self.register_tool(tool)
def run(self, transport: str = "stdio") -> None:
"""Run the MCP server."""
# Register all tools
self.register_all_tools()
# Register cleanup function (match original behavior)
atexit.register(lambda: asyncio.run(self.cleanup()))
# Start server (with same logging as original)
logger.info(f"Starting OpenManus server ({transport} mode)")
self.server.run(transport=transport)
def parse_args() -> argparse.Namespace:
"""Parse command line arguments."""
parser = argparse.ArgumentParser(description="OpenManus MCP Server")
parser.add_argument(
"--transport",
choices=["stdio"],
default="stdio",
help="Communication method: stdio or http (default: stdio)",
)
return parser.parse_args()
if __name__ == "__main__":
args = parse_args()
# Create and run server (maintaining original flow)
server = MCPServer()
server.run(transport=args.transport)
```
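`register_tool()` accepts any `BaseTool`, so a caller can extend the server before `run()` registers the built-ins; a hedged sketch (registering a second `Bash` instance under the name `shell` is purely illustrative, not something the project itself does):

```py
from app.mcp.server import MCPServer
from app.tool.bash import Bash

server = MCPServer(name="openmanus")

# Extra tool under an explicit method name; the four default tools are still
# registered by run() via register_all_tools().
server.register_tool(Bash(), method_name="shell")

# Blocks and serves over stdio, mirroring the __main__ entry point above.
server.run(transport="stdio")
```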
## /app/prompt/__init__.py
```py path="/app/prompt/__init__.py"
```
## /app/prompt/browser.py
```py path="/app/prompt/browser.py"
SYSTEM_PROMPT = """\
You are an AI agent designed to automate browser tasks. Your goal is to accomplish the ultimate task following the rules.
# Input Format
Task
Previous steps
Current URL
Open Tabs
Interactive Elements
[index]<type>text</type>
- index: Numeric identifier for interaction
- type: HTML element type (button, input, etc.)
- text: Element description
Example:
[33]<button>Submit Form</button>
- Only elements with numeric indexes in [] are interactive
- elements without [] provide only context
# Response Rules
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{{"current_state": {{"evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Mention if something unexpected happened. Shortly state why/why not",
"memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
"next_goal": "What needs to be done with the next immediate action"}},
"action":[{{"one_action_name": {{// action-specific parameter}}}}, // ... more actions in sequence]}}
2. ACTIONS: You can specify multiple actions in the list to be executed in sequence. But always specify only one action name per item. Use maximum {{max_actions}} actions per sequence.
Common action sequences:
- Form filling: [{{"input_text": {{"index": 1, "text": "username"}}}}, {{"input_text": {{"index": 2, "text": "password"}}}}, {{"click_element": {{"index": 3}}}}]
- Navigation and extraction: [{{"go_to_url": {{"url": "https://example.com"}}}}, {{"extract_content": {{"goal": "extract the names"}}}}]
- Actions are executed in the given order
- If the page changes after an action, the sequence is interrupted and you get the new state.
- Only provide the action sequence until an action which changes the page state significantly.
- Try to be efficient, e.g. fill forms at once, or chain actions where nothing changes on the page
- only use multiple actions if it makes sense.
3. ELEMENT INTERACTION:
- Only use indexes of the interactive elements
- Elements marked with "[]Non-interactive text" are non-interactive
4. NAVIGATION & ERROR HANDLING:
- If no suitable elements exist, use other functions to complete the task
- If stuck, try alternative approaches - like going back to a previous page, new search, new tab etc.
- Handle popups/cookies by accepting or closing them
- Use scroll to find elements you are looking for
- If you want to research something, open a new tab instead of using the current tab
- If captcha pops up, try to solve it - else try a different approach
- If the page is not fully loaded, use wait action
5. TASK COMPLETION:
- Use the done action as the last action as soon as the ultimate task is complete
- Dont use "done" before you are done with everything the user asked you, except you reach the last step of max_steps.
- If you reach your last step, use the done action even if the task is not fully finished. Provide all the information you have gathered so far. If the ultimate task is completly finished set success to true. If not everything the user asked for is completed set success in done to false!
- If you have to do something repeatedly for example the task says for "each", or "for all", or "x times", count always inside "memory" how many times you have done it and how many remain. Don't stop until you have completed like the task asked you. Only call done after the last step.
- Don't hallucinate actions
- Make sure you include everything you found out for the ultimate task in the done text parameter. Do not just say you are done, but include the requested information of the task.
6. VISUAL CONTEXT:
- When an image is provided, use it to understand the page layout
- Bounding boxes with labels on their top right corner correspond to element indexes
7. Form filling:
- If you fill an input field and your action sequence is interrupted, most often something changed e.g. suggestions popped up under the field.
8. Long tasks:
- Keep track of the status and subresults in the memory.
9. Extraction:
- If your task is to find information - call extract_content on the specific pages to get and store the information.
Your responses must always be JSON in the specified format.
"""
NEXT_STEP_PROMPT = """
What should I do next to achieve my goal?
When you see [Current state starts here], focus on the following:
- Current URL and page title{url_placeholder}
- Available tabs{tabs_placeholder}
- Interactive elements and their indices
- Content above{content_above_placeholder} or below{content_below_placeholder} the viewport (if indicated)
- Any action results or errors{results_placeholder}
For browser interactions:
- To navigate: browser_use with action="go_to_url", url="..."
- To click: browser_use with action="click_element", index=N
- To type: browser_use with action="input_text", index=N, text="..."
- To extract: browser_use with action="extract_content", goal="..."
- To scroll: browser_use with action="scroll_down" or "scroll_up"
Consider both what's visible and what might be beyond the current viewport.
Be methodical - remember your progress and what you've learned so far.
If you want to stop the interaction at any point, use the `terminate` tool/function call.
"""
```
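`NEXT_STEP_PROMPT` carries single-brace placeholders that are filled in per step; a minimal sketch of rendering it (the placeholder values are made up, and whether the browser agent fills them exactly this way is an assumption):

```py
from app.prompt.browser import NEXT_STEP_PROMPT

# Illustrative values only; the browser agent supplies real page state at runtime.
next_step = NEXT_STEP_PROMPT.format(
    url_placeholder=" (currently at https://example.com)",
    tabs_placeholder=" (1 tab open)",
    content_above_placeholder="",
    content_below_placeholder="",
    results_placeholder="",
)
print(next_step)
```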
## /app/prompt/manus.py
```py path="/app/prompt/manus.py"
SYSTEM_PROMPT = (
"You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, web browsing, or human interaction (only for extreme cases), you can handle it all."
"The initial directory is: {directory}"
)
NEXT_STEP_PROMPT = """
Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.
If you want to stop the interaction at any point, use the `terminate` tool/function call.
"""
```
## /app/prompt/mcp.py
```py path="/app/prompt/mcp.py"
"""Prompts for the MCP Agent."""
SYSTEM_PROMPT = """You are an AI assistant with access to a Model Context Protocol (MCP) server.
You can use the tools provided by the MCP server to complete tasks.
The MCP server will dynamically expose tools that you can use - always check the available tools first.
When using an MCP tool:
1. Choose the appropriate tool based on your task requirements
2. Provide properly formatted arguments as required by the tool
3. Observe the results and use them to determine next steps
4. Tools may change during operation - new tools might appear or existing ones might disappear
Follow these guidelines:
- Call tools with valid parameters as documented in their schemas
- Handle errors gracefully by understanding what went wrong and trying again with corrected parameters
- For multimedia responses (like images), you'll receive a description of the content
- Complete user requests step by step, using the most appropriate tools
- If multiple tools need to be called in sequence, make one call at a time and wait for results
Remember to clearly explain your reasoning and actions to the user.
"""
NEXT_STEP_PROMPT = """Based on the current state and available tools, what should be done next?
Think step by step about the problem and identify which MCP tool would be most helpful for the current stage.
If you've already made progress, consider what additional information you need or what actions would move you closer to completing the task.
"""
# Additional specialized prompts
TOOL_ERROR_PROMPT = """You encountered an error with the tool '{tool_name}'.
Try to understand what went wrong and correct your approach.
Common issues include:
- Missing or incorrect parameters
- Invalid parameter formats
- Using a tool that's no longer available
- Attempting an operation that's not supported
Please check the tool specifications and try again with corrected parameters.
"""
MULTIMEDIA_RESPONSE_PROMPT = """You've received a multimedia response (image, audio, etc.) from the tool '{tool_name}'.
This content has been processed and described for you.
Use this information to continue the task or provide insights to the user.
"""
```
## /app/prompt/planning.py
```py path="/app/prompt/planning.py"
PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans when necessary
5. Use `finish` to conclude immediately when the task is complete
Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete
Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
Think about dependencies and verification methods.
Know when to conclude - don't continue thinking once objectives are met.
"""
NEXT_STEP_PROMPT = """
Based on the current state, what's your next action?
Choose the most efficient path forward:
1. Is the plan sufficient, or does it need refinement?
2. Can you execute the next step immediately?
3. Is the task complete? If so, use `finish` right away.
Be concise in your reasoning, then select the appropriate tool or action.
"""
```
## /app/prompt/swe.py
```py path="/app/prompt/swe.py"
SYSTEM_PROMPT = """SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface.
The special interface consists of a file editor that shows you {{WINDOW}} lines of a file at a time.
In addition to typical bash commands, you can also use specific commands to help you navigate and edit files.
To call a command, you need to invoke it with a function call/tool call.
Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION.
If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
RESPONSE FORMAT:
Your shell prompt is formatted as follows:
(Open file: <path>)
(Current directory: <cwd>)
bash-$
First, you should _always_ include a general thought about what you're going to do next.
Then, for every response, you must include exactly _ONE_ tool call/function call.
Remember, you should always include a _SINGLE_ tool call/function call and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference.
If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first tool call, and then after receiving a response you'll be able to issue the second tool call.
Note that the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them.
"""
```
## /app/prompt/toolcall.py
```py path="/app/prompt/toolcall.py"
SYSTEM_PROMPT = "You are an agent that can execute tool calls"
NEXT_STEP_PROMPT = (
"If you want to stop interaction, use `terminate` tool/function call."
)
```
## /app/prompt/visualization.py
```py path="/app/prompt/visualization.py"
SYSTEM_PROMPT = """You are an AI agent designed to data analysis / visualization task. You have various tools at your disposal that you can call upon to efficiently complete complex requests.
# Note:
1. The workspace directory is: {directory}; read / write files in the workspace
2. Generate an analysis conclusion report at the end"""
NEXT_STEP_PROMPT = """Based on user needs, break down the problem and use different tools step by step to solve it.
# Note
1. At each step, proactively select the most appropriate tool (ONLY ONE).
2. After using each tool, clearly explain the execution results and suggest the next steps.
3. When an observation reports an error, review and fix it."""
```
## /app/sandbox/__init__.py
```py path="/app/sandbox/__init__.py"
"""
Docker Sandbox Module
Provides secure containerized execution environment with resource limits
and isolation for running untrusted code.
"""
from app.sandbox.client import (
BaseSandboxClient,
LocalSandboxClient,
create_sandbox_client,
)
from app.sandbox.core.exceptions import (
SandboxError,
SandboxResourceError,
SandboxTimeoutError,
)
from app.sandbox.core.manager import SandboxManager
from app.sandbox.core.sandbox import DockerSandbox
__all__ = [
"DockerSandbox",
"SandboxManager",
"BaseSandboxClient",
"LocalSandboxClient",
"create_sandbox_client",
"SandboxError",
"SandboxTimeoutError",
"SandboxResourceError",
]
```
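A hedged end-to-end sketch of the public surface listed in `__all__`, assuming Docker is running locally and the default `SandboxSettings` image can be pulled:

```py
import asyncio

from app.sandbox import DockerSandbox, SandboxTimeoutError


async def main() -> None:
    # DockerSandbox is an async context manager: create on enter, cleanup on exit.
    async with DockerSandbox() as sandbox:
        try:
            output = await sandbox.run_command("echo hello from the sandbox", timeout=30)
            print(output)
        except SandboxTimeoutError as exc:
            print(f"Command timed out: {exc}")


if __name__ == "__main__":
    asyncio.run(main())
```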
## /app/sandbox/client.py
```py path="/app/sandbox/client.py"
from abc import ABC, abstractmethod
from typing import Dict, Optional, Protocol
from app.config import SandboxSettings
from app.sandbox.core.sandbox import DockerSandbox
class SandboxFileOperations(Protocol):
"""Protocol for sandbox file operations."""
async def copy_from(self, container_path: str, local_path: str) -> None:
"""Copies file from container to local.
Args:
container_path: File path in container.
local_path: Local destination path.
"""
...
async def copy_to(self, local_path: str, container_path: str) -> None:
"""Copies file from local to container.
Args:
local_path: Local source file path.
container_path: Destination path in container.
"""
...
async def read_file(self, path: str) -> str:
"""Reads file content from container.
Args:
path: File path in container.
Returns:
str: File content.
"""
...
async def write_file(self, path: str, content: str) -> None:
"""Writes content to file in container.
Args:
path: File path in container.
content: Content to write.
"""
...
class BaseSandboxClient(ABC):
"""Base sandbox client interface."""
@abstractmethod
async def create(
self,
config: Optional[SandboxSettings] = None,
volume_bindings: Optional[Dict[str, str]] = None,
) -> None:
"""Creates sandbox."""
@abstractmethod
async def run_command(self, command: str, timeout: Optional[int] = None) -> str:
"""Executes command."""
@abstractmethod
async def copy_from(self, container_path: str, local_path: str) -> None:
"""Copies file from container."""
@abstractmethod
async def copy_to(self, local_path: str, container_path: str) -> None:
"""Copies file to container."""
@abstractmethod
async def read_file(self, path: str) -> str:
"""Reads file."""
@abstractmethod
async def write_file(self, path: str, content: str) -> None:
"""Writes file."""
@abstractmethod
async def cleanup(self) -> None:
"""Cleans up resources."""
class LocalSandboxClient(BaseSandboxClient):
"""Local sandbox client implementation."""
def __init__(self):
"""Initializes local sandbox client."""
self.sandbox: Optional[DockerSandbox] = None
async def create(
self,
config: Optional[SandboxSettings] = None,
volume_bindings: Optional[Dict[str, str]] = None,
) -> None:
"""Creates a sandbox.
Args:
config: Sandbox configuration.
volume_bindings: Volume mappings.
Raises:
RuntimeError: If sandbox creation fails.
"""
self.sandbox = DockerSandbox(config, volume_bindings)
await self.sandbox.create()
async def run_command(self, command: str, timeout: Optional[int] = None) -> str:
"""Runs command in sandbox.
Args:
command: Command to execute.
timeout: Execution timeout in seconds.
Returns:
Command output.
Raises:
RuntimeError: If sandbox not initialized.
"""
if not self.sandbox:
raise RuntimeError("Sandbox not initialized")
return await self.sandbox.run_command(command, timeout)
async def copy_from(self, container_path: str, local_path: str) -> None:
"""Copies file from container to local.
Args:
container_path: File path in container.
local_path: Local destination path.
Raises:
RuntimeError: If sandbox not initialized.
"""
if not self.sandbox:
raise RuntimeError("Sandbox not initialized")
await self.sandbox.copy_from(container_path, local_path)
async def copy_to(self, local_path: str, container_path: str) -> None:
"""Copies file from local to container.
Args:
local_path: Local source file path.
container_path: Destination path in container.
Raises:
RuntimeError: If sandbox not initialized.
"""
if not self.sandbox:
raise RuntimeError("Sandbox not initialized")
await self.sandbox.copy_to(local_path, container_path)
async def read_file(self, path: str) -> str:
"""Reads file from container.
Args:
path: File path in container.
Returns:
File content.
Raises:
RuntimeError: If sandbox not initialized.
"""
if not self.sandbox:
raise RuntimeError("Sandbox not initialized")
return await self.sandbox.read_file(path)
async def write_file(self, path: str, content: str) -> None:
"""Writes file to container.
Args:
path: File path in container.
content: File content.
Raises:
RuntimeError: If sandbox not initialized.
"""
if not self.sandbox:
raise RuntimeError("Sandbox not initialized")
await self.sandbox.write_file(path, content)
async def cleanup(self) -> None:
"""Cleans up resources."""
if self.sandbox:
await self.sandbox.cleanup()
self.sandbox = None
def create_sandbox_client() -> LocalSandboxClient:
"""Creates a sandbox client.
Returns:
LocalSandboxClient: Sandbox client instance.
"""
return LocalSandboxClient()
SANDBOX_CLIENT = create_sandbox_client()
```
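A minimal sketch of driving the module-level `SANDBOX_CLIENT` defined above; it assumes Docker is reachable and uses the default `SandboxSettings`. Relative paths are resolved against the sandbox working directory:

```py
import asyncio

from app.sandbox.client import SANDBOX_CLIENT


async def main() -> None:
    await SANDBOX_CLIENT.create()
    try:
        print(await SANDBOX_CLIENT.run_command("python3 --version"))
        await SANDBOX_CLIENT.write_file("hello.txt", "hello from the host")
        print(await SANDBOX_CLIENT.read_file("hello.txt"))
    finally:
        await SANDBOX_CLIENT.cleanup()


if __name__ == "__main__":
    asyncio.run(main())
```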
## /app/sandbox/core/exceptions.py
```py path="/app/sandbox/core/exceptions.py"
"""Exception classes for the sandbox system.
This module defines custom exceptions used throughout the sandbox system to
handle various error conditions in a structured way.
"""
class SandboxError(Exception):
"""Base exception for sandbox-related errors."""
class SandboxTimeoutError(SandboxError):
"""Exception raised when a sandbox operation times out."""
class SandboxResourceError(SandboxError):
"""Exception raised for resource-related errors."""
```
## /app/sandbox/core/manager.py
```py path="/app/sandbox/core/manager.py"
import asyncio
import uuid
from contextlib import asynccontextmanager
from typing import Dict, Optional, Set
import docker
from docker.errors import APIError, ImageNotFound
from app.config import SandboxSettings
from app.logger import logger
from app.sandbox.core.sandbox import DockerSandbox
class SandboxManager:
"""Docker sandbox manager.
Manages multiple DockerSandbox instances lifecycle including creation,
monitoring, and cleanup. Provides concurrent access control and automatic
cleanup mechanisms for sandbox resources.
Attributes:
max_sandboxes: Maximum allowed number of sandboxes.
idle_timeout: Sandbox idle timeout in seconds.
cleanup_interval: Cleanup check interval in seconds.
_sandboxes: Active sandbox instance mapping.
_last_used: Last used time record for sandboxes.
"""
def __init__(
self,
max_sandboxes: int = 100,
idle_timeout: int = 3600,
cleanup_interval: int = 300,
):
"""Initializes sandbox manager.
Args:
max_sandboxes: Maximum sandbox count limit.
idle_timeout: Idle timeout in seconds.
cleanup_interval: Cleanup check interval in seconds.
"""
self.max_sandboxes = max_sandboxes
self.idle_timeout = idle_timeout
self.cleanup_interval = cleanup_interval
# Docker client
self._client = docker.from_env()
# Resource mappings
self._sandboxes: Dict[str, DockerSandbox] = {}
self._last_used: Dict[str, float] = {}
# Concurrency control
self._locks: Dict[str, asyncio.Lock] = {}
self._global_lock = asyncio.Lock()
self._active_operations: Set[str] = set()
# Cleanup task
self._cleanup_task: Optional[asyncio.Task] = None
self._is_shutting_down = False
# Start automatic cleanup
self.start_cleanup_task()
async def ensure_image(self, image: str) -> bool:
"""Ensures Docker image is available.
Args:
image: Image name.
Returns:
bool: Whether image is available.
"""
try:
self._client.images.get(image)
return True
except ImageNotFound:
try:
logger.info(f"Pulling image {image}...")
await asyncio.get_event_loop().run_in_executor(
None, self._client.images.pull, image
)
return True
except (APIError, Exception) as e:
logger.error(f"Failed to pull image {image}: {e}")
return False
@asynccontextmanager
async def sandbox_operation(self, sandbox_id: str):
"""Context manager for sandbox operations.
Provides concurrency control and usage time updates.
Args:
sandbox_id: Sandbox ID.
Raises:
KeyError: If sandbox not found.
"""
if sandbox_id not in self._locks:
self._locks[sandbox_id] = asyncio.Lock()
async with self._locks[sandbox_id]:
if sandbox_id not in self._sandboxes:
raise KeyError(f"Sandbox {sandbox_id} not found")
self._active_operations.add(sandbox_id)
try:
self._last_used[sandbox_id] = asyncio.get_event_loop().time()
yield self._sandboxes[sandbox_id]
finally:
self._active_operations.remove(sandbox_id)
async def create_sandbox(
self,
config: Optional[SandboxSettings] = None,
volume_bindings: Optional[Dict[str, str]] = None,
) -> str:
"""Creates a new sandbox instance.
Args:
config: Sandbox configuration.
volume_bindings: Volume mapping configuration.
Returns:
str: Sandbox ID.
Raises:
RuntimeError: If max sandbox count reached or creation fails.
"""
async with self._global_lock:
if len(self._sandboxes) >= self.max_sandboxes:
raise RuntimeError(
f"Maximum number of sandboxes ({self.max_sandboxes}) reached"
)
config = config or SandboxSettings()
if not await self.ensure_image(config.image):
raise RuntimeError(f"Failed to ensure Docker image: {config.image}")
sandbox_id = str(uuid.uuid4())
try:
sandbox = DockerSandbox(config, volume_bindings)
await sandbox.create()
self._sandboxes[sandbox_id] = sandbox
self._last_used[sandbox_id] = asyncio.get_event_loop().time()
self._locks[sandbox_id] = asyncio.Lock()
logger.info(f"Created sandbox {sandbox_id}")
return sandbox_id
except Exception as e:
logger.error(f"Failed to create sandbox: {e}")
if sandbox_id in self._sandboxes:
await self.delete_sandbox(sandbox_id)
raise RuntimeError(f"Failed to create sandbox: {e}")
async def get_sandbox(self, sandbox_id: str) -> DockerSandbox:
"""Gets a sandbox instance.
Args:
sandbox_id: Sandbox ID.
Returns:
DockerSandbox: Sandbox instance.
Raises:
KeyError: If sandbox does not exist.
"""
async with self.sandbox_operation(sandbox_id) as sandbox:
return sandbox
def start_cleanup_task(self) -> None:
"""Starts automatic cleanup task."""
async def cleanup_loop():
while not self._is_shutting_down:
try:
await self._cleanup_idle_sandboxes()
except Exception as e:
logger.error(f"Error in cleanup loop: {e}")
await asyncio.sleep(self.cleanup_interval)
self._cleanup_task = asyncio.create_task(cleanup_loop())
async def _cleanup_idle_sandboxes(self) -> None:
"""Cleans up idle sandboxes."""
current_time = asyncio.get_event_loop().time()
to_cleanup = []
async with self._global_lock:
for sandbox_id, last_used in self._last_used.items():
if (
sandbox_id not in self._active_operations
and current_time - last_used > self.idle_timeout
):
to_cleanup.append(sandbox_id)
for sandbox_id in to_cleanup:
try:
await self.delete_sandbox(sandbox_id)
except Exception as e:
logger.error(f"Error cleaning up sandbox {sandbox_id}: {e}")
async def cleanup(self) -> None:
"""Cleans up all resources."""
logger.info("Starting manager cleanup...")
self._is_shutting_down = True
# Cancel cleanup task
if self._cleanup_task:
self._cleanup_task.cancel()
try:
await asyncio.wait_for(self._cleanup_task, timeout=1.0)
except (asyncio.CancelledError, asyncio.TimeoutError):
pass
# Get all sandbox IDs to clean up
async with self._global_lock:
sandbox_ids = list(self._sandboxes.keys())
# Concurrently clean up all sandboxes
cleanup_tasks = []
for sandbox_id in sandbox_ids:
task = asyncio.create_task(self._safe_delete_sandbox(sandbox_id))
cleanup_tasks.append(task)
if cleanup_tasks:
# Wait for all cleanup tasks to complete, with timeout to avoid infinite waiting
try:
await asyncio.wait(cleanup_tasks, timeout=30.0)
except asyncio.TimeoutError:
logger.error("Sandbox cleanup timed out")
# Clean up remaining references
self._sandboxes.clear()
self._last_used.clear()
self._locks.clear()
self._active_operations.clear()
logger.info("Manager cleanup completed")
async def _safe_delete_sandbox(self, sandbox_id: str) -> None:
"""Safely deletes a single sandbox.
Args:
sandbox_id: Sandbox ID to delete.
"""
try:
if sandbox_id in self._active_operations:
logger.warning(
f"Sandbox {sandbox_id} has active operations, waiting for completion"
)
for _ in range(10): # Wait at most 10 times
await asyncio.sleep(0.5)
if sandbox_id not in self._active_operations:
break
else:
logger.warning(
f"Timeout waiting for sandbox {sandbox_id} operations to complete"
)
# Get reference to sandbox object
sandbox = self._sandboxes.get(sandbox_id)
if sandbox:
await sandbox.cleanup()
# Remove sandbox record from manager
async with self._global_lock:
self._sandboxes.pop(sandbox_id, None)
self._last_used.pop(sandbox_id, None)
self._locks.pop(sandbox_id, None)
logger.info(f"Deleted sandbox {sandbox_id}")
except Exception as e:
logger.error(f"Error during cleanup of sandbox {sandbox_id}: {e}")
async def delete_sandbox(self, sandbox_id: str) -> None:
"""Deletes specified sandbox.
Args:
sandbox_id: Sandbox ID.
"""
if sandbox_id not in self._sandboxes:
return
try:
await self._safe_delete_sandbox(sandbox_id)
except Exception as e:
logger.error(f"Failed to delete sandbox {sandbox_id}: {e}")
async def __aenter__(self) -> "SandboxManager":
"""Async context manager entry."""
return self
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
"""Async context manager exit."""
await self.cleanup()
def get_stats(self) -> Dict:
"""Gets manager statistics.
Returns:
Dict: Statistics information.
"""
return {
"total_sandboxes": len(self._sandboxes),
"active_operations": len(self._active_operations),
"max_sandboxes": self.max_sandboxes,
"idle_timeout": self.idle_timeout,
"cleanup_interval": self.cleanup_interval,
"is_shutting_down": self._is_shutting_down,
}
```
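`SandboxManager` is meant to be used as an async context manager so `cleanup()` always runs, with per-sandbox access funnelled through `sandbox_operation()` for locking and idle-time bookkeeping. A sketch under those assumptions (the manager must be constructed inside a running event loop, since `__init__` starts the cleanup task):

```py
import asyncio

from app.sandbox.core.manager import SandboxManager


async def main() -> None:
    async with SandboxManager(max_sandboxes=10, idle_timeout=600) as manager:
        sandbox_id = await manager.create_sandbox()

        # sandbox_operation() serializes access and refreshes the last-used timestamp.
        async with manager.sandbox_operation(sandbox_id) as sandbox:
            print(await sandbox.run_command("uname -a"))

        print(manager.get_stats())
        await manager.delete_sandbox(sandbox_id)


if __name__ == "__main__":
    asyncio.run(main())
```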
## /app/sandbox/core/sandbox.py
```py path="/app/sandbox/core/sandbox.py"
import asyncio
import io
import os
import tarfile
import tempfile
import uuid
from typing import Dict, Optional
import docker
from docker.errors import NotFound
from docker.models.containers import Container
from app.config import SandboxSettings
from app.sandbox.core.exceptions import SandboxTimeoutError
from app.sandbox.core.terminal import AsyncDockerizedTerminal
class DockerSandbox:
"""Docker sandbox environment.
Provides a containerized execution environment with resource limits,
file operations, and command execution capabilities.
Attributes:
config: Sandbox configuration.
volume_bindings: Volume mapping configuration.
client: Docker client.
container: Docker container instance.
terminal: Container terminal interface.
"""
def __init__(
self,
config: Optional[SandboxSettings] = None,
volume_bindings: Optional[Dict[str, str]] = None,
):
"""Initializes a sandbox instance.
Args:
config: Sandbox configuration. Default configuration used if None.
volume_bindings: Volume mappings in {host_path: container_path} format.
"""
self.config = config or SandboxSettings()
self.volume_bindings = volume_bindings or {}
self.client = docker.from_env()
self.container: Optional[Container] = None
self.terminal: Optional[AsyncDockerizedTerminal] = None
async def create(self) -> "DockerSandbox":
"""Creates and starts the sandbox container.
Returns:
Current sandbox instance.
Raises:
docker.errors.APIError: If Docker API call fails.
RuntimeError: If container creation or startup fails.
"""
try:
# Prepare container config
host_config = self.client.api.create_host_config(
mem_limit=self.config.memory_limit,
cpu_period=100000,
cpu_quota=int(100000 * self.config.cpu_limit),
network_mode="none" if not self.config.network_enabled else "bridge",
binds=self._prepare_volume_bindings(),
)
# Generate unique container name with sandbox_ prefix
container_name = f"sandbox_{uuid.uuid4().hex[:8]}"
# Create container
container = await asyncio.to_thread(
self.client.api.create_container,
image=self.config.image,
command="tail -f /dev/null",
hostname="sandbox",
working_dir=self.config.work_dir,
host_config=host_config,
name=container_name,
tty=True,
detach=True,
)
self.container = self.client.containers.get(container["Id"])
# Start container
await asyncio.to_thread(self.container.start)
# Initialize terminal
self.terminal = AsyncDockerizedTerminal(
container["Id"],
self.config.work_dir,
env_vars={"PYTHONUNBUFFERED": "1"}
# Ensure Python output is not buffered
)
await self.terminal.init()
return self
except Exception as e:
await self.cleanup() # Ensure resources are cleaned up
raise RuntimeError(f"Failed to create sandbox: {e}") from e
def _prepare_volume_bindings(self) -> Dict[str, Dict[str, str]]:
"""Prepares volume binding configuration.
Returns:
Volume binding configuration dictionary.
"""
bindings = {}
# Create and add working directory mapping
work_dir = self._ensure_host_dir(self.config.work_dir)
bindings[work_dir] = {"bind": self.config.work_dir, "mode": "rw"}
# Add custom volume bindings
for host_path, container_path in self.volume_bindings.items():
bindings[host_path] = {"bind": container_path, "mode": "rw"}
return bindings
@staticmethod
def _ensure_host_dir(path: str) -> str:
"""Ensures directory exists on the host.
Args:
path: Directory path.
Returns:
Actual path on the host.
"""
host_path = os.path.join(
tempfile.gettempdir(),
f"sandbox_{os.path.basename(path)}_{os.urandom(4).hex()}",
)
os.makedirs(host_path, exist_ok=True)
return host_path
async def run_command(self, cmd: str, timeout: Optional[int] = None) -> str:
"""Runs a command in the sandbox.
Args:
cmd: Command to execute.
timeout: Timeout in seconds.
Returns:
Command output as string.
Raises:
RuntimeError: If sandbox not initialized or command execution fails.
TimeoutError: If command execution times out.
"""
if not self.terminal:
raise RuntimeError("Sandbox not initialized")
try:
return await self.terminal.run_command(
cmd, timeout=timeout or self.config.timeout
)
except TimeoutError:
raise SandboxTimeoutError(
f"Command execution timed out after {timeout or self.config.timeout} seconds"
)
async def read_file(self, path: str) -> str:
"""Reads a file from the container.
Args:
path: File path.
Returns:
File contents as string.
Raises:
FileNotFoundError: If file does not exist.
RuntimeError: If read operation fails.
"""
if not self.container:
raise RuntimeError("Sandbox not initialized")
try:
# Get file archive
resolved_path = self._safe_resolve_path(path)
tar_stream, _ = await asyncio.to_thread(
self.container.get_archive, resolved_path
)
# Read file content from tar stream
content = await self._read_from_tar(tar_stream)
return content.decode("utf-8")
except NotFound:
raise FileNotFoundError(f"File not found: {path}")
except Exception as e:
raise RuntimeError(f"Failed to read file: {e}")
async def write_file(self, path: str, content: str) -> None:
"""Writes content to a file in the container.
Args:
path: Target path.
content: File content.
Raises:
RuntimeError: If write operation fails.
"""
if not self.container:
raise RuntimeError("Sandbox not initialized")
try:
resolved_path = self._safe_resolve_path(path)
parent_dir = os.path.dirname(resolved_path)
# Create parent directory
if parent_dir:
await self.run_command(f"mkdir -p {parent_dir}")
# Prepare file data
tar_stream = await self._create_tar_stream(
os.path.basename(path), content.encode("utf-8")
)
# Write file
await asyncio.to_thread(
self.container.put_archive, parent_dir or "/", tar_stream
)
except Exception as e:
raise RuntimeError(f"Failed to write file: {e}")
def _safe_resolve_path(self, path: str) -> str:
"""Safely resolves container path, preventing path traversal.
Args:
path: Original path.
Returns:
Resolved absolute path.
Raises:
ValueError: If path contains potentially unsafe patterns.
"""
# Check for path traversal attempts
if ".." in path.split("/"):
raise ValueError("Path contains potentially unsafe patterns")
resolved = (
os.path.join(self.config.work_dir, path)
if not os.path.isabs(path)
else path
)
return resolved
async def copy_from(self, src_path: str, dst_path: str) -> None:
"""Copies a file from the container.
Args:
src_path: Source file path (container).
dst_path: Destination path (host).
Raises:
FileNotFoundError: If source file does not exist.
RuntimeError: If copy operation fails.
"""
try:
# Ensure destination file's parent directory exists
parent_dir = os.path.dirname(dst_path)
if parent_dir:
os.makedirs(parent_dir, exist_ok=True)
# Get file stream
resolved_src = self._safe_resolve_path(src_path)
stream, stat = await asyncio.to_thread(
self.container.get_archive, resolved_src
)
# Create temporary directory to extract file
with tempfile.TemporaryDirectory() as tmp_dir:
# Write stream to temporary file
tar_path = os.path.join(tmp_dir, "temp.tar")
with open(tar_path, "wb") as f:
for chunk in stream:
f.write(chunk)
# Extract file
with tarfile.open(tar_path) as tar:
members = tar.getmembers()
if not members:
raise FileNotFoundError(f"Source file is empty: {src_path}")
# If destination is a directory, we should preserve relative path structure
if os.path.isdir(dst_path):
tar.extractall(dst_path)
else:
# If destination is a file, we only extract the source file's content
if len(members) > 1:
raise RuntimeError(
f"Source path is a directory but destination is a file: {src_path}"
)
with open(dst_path, "wb") as dst:
src_file = tar.extractfile(members[0])
if src_file is None:
raise RuntimeError(
f"Failed to extract file: {src_path}"
)
dst.write(src_file.read())
except docker.errors.NotFound:
raise FileNotFoundError(f"Source file not found: {src_path}")
except Exception as e:
raise RuntimeError(f"Failed to copy file: {e}")
async def copy_to(self, src_path: str, dst_path: str) -> None:
"""Copies a file to the container.
Args:
src_path: Source file path (host).
dst_path: Destination path (container).
Raises:
FileNotFoundError: If source file does not exist.
RuntimeError: If copy operation fails.
"""
try:
if not os.path.exists(src_path):
raise FileNotFoundError(f"Source file not found: {src_path}")
# Create destination directory in container
resolved_dst = self._safe_resolve_path(dst_path)
container_dir = os.path.dirname(resolved_dst)
if container_dir:
await self.run_command(f"mkdir -p {container_dir}")
# Create tar file to upload
with tempfile.TemporaryDirectory() as tmp_dir:
tar_path = os.path.join(tmp_dir, "temp.tar")
with tarfile.open(tar_path, "w") as tar:
# Handle directory source path
if os.path.isdir(src_path):
os.path.basename(src_path.rstrip("/"))
for root, _, files in os.walk(src_path):
for file in files:
file_path = os.path.join(root, file)
arcname = os.path.join(
os.path.basename(dst_path),
os.path.relpath(file_path, src_path),
)
tar.add(file_path, arcname=arcname)
else:
# Add single file to tar
tar.add(src_path, arcname=os.path.basename(dst_path))
# Read tar file content
with open(tar_path, "rb") as f:
data = f.read()
# Upload to container
await asyncio.to_thread(
self.container.put_archive,
os.path.dirname(resolved_dst) or "/",
data,
)
# Verify file was created successfully
try:
await self.run_command(f"test -e {resolved_dst}")
except Exception:
raise RuntimeError(f"Failed to verify file creation: {dst_path}")
except FileNotFoundError:
raise
except Exception as e:
raise RuntimeError(f"Failed to copy file: {e}")
@staticmethod
async def _create_tar_stream(name: str, content: bytes) -> io.BytesIO:
"""Creates a tar file stream.
Args:
name: Filename.
content: File content.
Returns:
Tar file stream.
"""
tar_stream = io.BytesIO()
with tarfile.open(fileobj=tar_stream, mode="w") as tar:
tarinfo = tarfile.TarInfo(name=name)
tarinfo.size = len(content)
tar.addfile(tarinfo, io.BytesIO(content))
tar_stream.seek(0)
return tar_stream
@staticmethod
async def _read_from_tar(tar_stream) -> bytes:
"""Reads file content from a tar stream.
Args:
tar_stream: Tar file stream.
Returns:
File content.
Raises:
RuntimeError: If read operation fails.
"""
with tempfile.NamedTemporaryFile() as tmp:
for chunk in tar_stream:
tmp.write(chunk)
tmp.seek(0)
with tarfile.open(fileobj=tmp) as tar:
member = tar.next()
if not member:
raise RuntimeError("Empty tar archive")
file_content = tar.extractfile(member)
if not file_content:
raise RuntimeError("Failed to extract file content")
return file_content.read()
async def cleanup(self) -> None:
"""Cleans up sandbox resources."""
errors = []
try:
if self.terminal:
try:
await self.terminal.close()
except Exception as e:
errors.append(f"Terminal cleanup error: {e}")
finally:
self.terminal = None
if self.container:
try:
await asyncio.to_thread(self.container.stop, timeout=5)
except Exception as e:
errors.append(f"Container stop error: {e}")
try:
await asyncio.to_thread(self.container.remove, force=True)
except Exception as e:
errors.append(f"Container remove error: {e}")
finally:
self.container = None
except Exception as e:
errors.append(f"General cleanup error: {e}")
if errors:
print(f"Warning: Errors during cleanup: {', '.join(errors)}")
async def __aenter__(self) -> "DockerSandbox":
"""Async context manager entry."""
return await self.create()
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
"""Async context manager exit."""
await self.cleanup()
```
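A small sketch of the file-oriented API on `DockerSandbox`, again assuming a local Docker daemon; relative paths land under the configured `work_dir` via `_safe_resolve_path()`:

```py
import asyncio

from app.sandbox.core.sandbox import DockerSandbox


async def main() -> None:
    async with DockerSandbox() as sandbox:
        # write_file() creates parent directories inside the container as needed.
        await sandbox.write_file("notes/readme.txt", "written from the host")
        print(await sandbox.read_file("notes/readme.txt"))

        # Copy the same file back out to the host.
        await sandbox.copy_from("notes/readme.txt", "/tmp/readme.txt")


if __name__ == "__main__":
    asyncio.run(main())
```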
## /app/sandbox/core/terminal.py
```py path="/app/sandbox/core/terminal.py"
"""
Asynchronous Docker Terminal
This module provides asynchronous terminal functionality for Docker containers,
allowing interactive command execution with timeout control.
"""
import asyncio
import re
import socket
from typing import Dict, Optional, Tuple, Union
import docker
from docker import APIClient
from docker.errors import APIError
from docker.models.containers import Container
class DockerSession:
def __init__(self, container_id: str) -> None:
"""Initializes a Docker session.
Args:
container_id: ID of the Docker container.
"""
self.api = APIClient()
self.container_id = container_id
self.exec_id = None
self.socket = None
async def create(self, working_dir: str, env_vars: Dict[str, str]) -> None:
"""Creates an interactive session with the container.
Args:
working_dir: Working directory inside the container.
env_vars: Environment variables to set.
Raises:
RuntimeError: If socket connection fails.
"""
startup_command = [
"bash",
"-c",
f"cd {working_dir} && "
"PROMPT_COMMAND='' "
"PS1='$ ' "
"exec bash --norc --noprofile",
]
exec_data = self.api.exec_create(
self.container_id,
startup_command,
stdin=True,
tty=True,
stdout=True,
stderr=True,
privileged=True,
user="root",
environment={**env_vars, "TERM": "dumb", "PS1": "$ ", "PROMPT_COMMAND": ""},
)
self.exec_id = exec_data["Id"]
socket_data = self.api.exec_start(
self.exec_id, socket=True, tty=True, stream=True, demux=True
)
if hasattr(socket_data, "_sock"):
self.socket = socket_data._sock
self.socket.setblocking(False)
else:
raise RuntimeError("Failed to get socket connection")
await self._read_until_prompt()
async def close(self) -> None:
"""Cleans up session resources.
1. Sends exit command
2. Closes socket connection
3. Checks and cleans up exec instance
"""
try:
if self.socket:
# Send exit command to close bash session
try:
self.socket.sendall(b"exit\n")
# Allow time for command execution
await asyncio.sleep(0.1)
except:
pass # Ignore sending errors, continue cleanup
# Close socket connection
try:
self.socket.shutdown(socket.SHUT_RDWR)
except:
pass # Some platforms may not support shutdown
self.socket.close()
self.socket = None
if self.exec_id:
try:
# Check exec instance status
exec_inspect = self.api.exec_inspect(self.exec_id)
if exec_inspect.get("Running", False):
# If still running, wait for it to complete
await asyncio.sleep(0.5)
except:
pass # Ignore inspection errors, continue cleanup
self.exec_id = None
except Exception as e:
# Log error but don't raise, ensure cleanup continues
print(f"Warning: Error during session cleanup: {e}")
async def _read_until_prompt(self) -> str:
"""Reads output until prompt is found.
Returns:
String containing output up to the prompt.
Raises:
socket.error: If socket communication fails.
"""
buffer = b""
while b"$ " not in buffer:
try:
chunk = self.socket.recv(4096)
if chunk:
buffer += chunk
except socket.error as e:
if e.errno == socket.EWOULDBLOCK:
await asyncio.sleep(0.1)
continue
raise
return buffer.decode("utf-8")
async def execute(self, command: str, timeout: Optional[int] = None) -> str:
"""Executes a command and returns cleaned output.
Args:
command: Shell command to execute.
timeout: Maximum execution time in seconds.
Returns:
Command output as string with prompt markers removed.
Raises:
RuntimeError: If session not initialized or execution fails.
TimeoutError: If command execution exceeds timeout.
"""
if not self.socket:
raise RuntimeError("Session not initialized")
try:
# Sanitize command to prevent shell injection
sanitized_command = self._sanitize_command(command)
full_command = f"{sanitized_command}\necho $?\n"
self.socket.sendall(full_command.encode())
async def read_output() -> str:
buffer = b""
result_lines = []
command_sent = False
while True:
try:
chunk = self.socket.recv(4096)
if not chunk:
break
buffer += chunk
lines = buffer.split(b"\n")
buffer = lines[-1]
lines = lines[:-1]
for line in lines:
line = line.rstrip(b"\r")
if not command_sent:
command_sent = True
continue
if line.strip() == b"echo $?" or line.strip().isdigit():
continue
if line.strip():
result_lines.append(line)
if buffer.endswith(b"$ "):
break
except socket.error as e:
if e.errno == socket.EWOULDBLOCK:
await asyncio.sleep(0.1)
continue
raise
output = b"\n".join(result_lines).decode("utf-8")
output = re.sub(r"\n\$ echo \$\$?.*$", "", output)
return output
if timeout:
result = await asyncio.wait_for(read_output(), timeout)
else:
result = await read_output()
return result.strip()
except asyncio.TimeoutError:
raise TimeoutError(f"Command execution timed out after {timeout} seconds")
except Exception as e:
raise RuntimeError(f"Failed to execute command: {e}")
def _sanitize_command(self, command: str) -> str:
"""Sanitizes the command string to prevent shell injection.
Args:
command: Raw command string.
Returns:
Sanitized command string.
Raises:
ValueError: If command contains potentially dangerous patterns.
"""
# Additional checks for specific risky commands
risky_commands = [
"rm -rf /",
"rm -rf /*",
"mkfs",
"dd if=/dev/zero",
":(){:|:&};:",
"chmod -R 777 /",
"chown -R",
]
for risky in risky_commands:
if risky in command.lower():
raise ValueError(
f"Command contains potentially dangerous operation: {risky}"
)
return command
class AsyncDockerizedTerminal:
def __init__(
self,
container: Union[str, Container],
working_dir: str = "/workspace",
env_vars: Optional[Dict[str, str]] = None,
default_timeout: int = 60,
) -> None:
"""Initializes an asynchronous terminal for Docker containers.
Args:
container: Docker container ID or Container object.
working_dir: Working directory inside the container.
env_vars: Environment variables to set.
default_timeout: Default command execution timeout in seconds.
"""
self.client = docker.from_env()
self.container = (
container
if isinstance(container, Container)
else self.client.containers.get(container)
)
self.working_dir = working_dir
self.env_vars = env_vars or {}
self.default_timeout = default_timeout
self.session = None
async def init(self) -> None:
"""Initializes the terminal environment.
Ensures working directory exists and creates an interactive session.
Raises:
RuntimeError: If initialization fails.
"""
await self._ensure_workdir()
self.session = DockerSession(self.container.id)
await self.session.create(self.working_dir, self.env_vars)
async def _ensure_workdir(self) -> None:
"""Ensures working directory exists in container.
Raises:
RuntimeError: If directory creation fails.
"""
try:
await self._exec_simple(f"mkdir -p {self.working_dir}")
except APIError as e:
raise RuntimeError(f"Failed to create working directory: {e}")
async def _exec_simple(self, cmd: str) -> Tuple[int, str]:
"""Executes a simple command using Docker's exec_run.
Args:
cmd: Command to execute.
Returns:
Tuple of (exit_code, output).
"""
result = await asyncio.to_thread(
self.container.exec_run, cmd, environment=self.env_vars
)
return result.exit_code, result.output.decode("utf-8")
async def run_command(self, cmd: str, timeout: Optional[int] = None) -> str:
"""Runs a command in the container with timeout.
Args:
cmd: Shell command to execute.
timeout: Maximum execution time in seconds.
Returns:
Command output as string.
Raises:
RuntimeError: If terminal not initialized.
"""
if not self.session:
raise RuntimeError("Terminal not initialized")
return await self.session.execute(cmd, timeout=timeout or self.default_timeout)
async def close(self) -> None:
"""Closes the terminal session."""
if self.session:
await self.session.close()
async def __aenter__(self) -> "AsyncDockerizedTerminal":
"""Async context manager entry."""
await self.init()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
"""Async context manager exit."""
await self.close()
```
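A short sketch of how the terminal above is driven. The container name `my_container` is a placeholder; any running container ID or `Container` object works, and the keyword arguments mirror the `AsyncDockerizedTerminal` constructor shown above.

```py
import asyncio

from app.sandbox.core.terminal import AsyncDockerizedTerminal


async def main():
    # "my_container" is a placeholder container ID/name for illustration.
    async with AsyncDockerizedTerminal(
        "my_container",
        working_dir="/workspace",
        env_vars={"PYTHONUNBUFFERED": "1"},
        default_timeout=30,
    ) as terminal:
        # run_command() falls back to default_timeout when timeout is None.
        print(await terminal.run_command("ls -la", timeout=10))


asyncio.run(main())
```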
## /app/schema.py
```py path="/app/schema.py"
from enum import Enum
from typing import Any, List, Literal, Optional, Union
from pydantic import BaseModel, Field
class Role(str, Enum):
"""Message role options"""
SYSTEM = "system"
USER = "user"
ASSISTANT = "assistant"
TOOL = "tool"
ROLE_VALUES = tuple(role.value for role in Role)
ROLE_TYPE = Literal[ROLE_VALUES] # type: ignore
class ToolChoice(str, Enum):
"""Tool choice options"""
NONE = "none"
AUTO = "auto"
REQUIRED = "required"
TOOL_CHOICE_VALUES = tuple(choice.value for choice in ToolChoice)
TOOL_CHOICE_TYPE = Literal[TOOL_CHOICE_VALUES] # type: ignore
class AgentState(str, Enum):
"""Agent execution states"""
IDLE = "IDLE"
RUNNING = "RUNNING"
FINISHED = "FINISHED"
ERROR = "ERROR"
class Function(BaseModel):
name: str
arguments: str
class ToolCall(BaseModel):
"""Represents a tool/function call in a message"""
id: str
type: str = "function"
function: Function
class Message(BaseModel):
"""Represents a chat message in the conversation"""
role: ROLE_TYPE = Field(...) # type: ignore
content: Optional[str] = Field(default=None)
tool_calls: Optional[List[ToolCall]] = Field(default=None)
name: Optional[str] = Field(default=None)
tool_call_id: Optional[str] = Field(default=None)
base64_image: Optional[str] = Field(default=None)
def __add__(self, other) -> List["Message"]:
"""支持 Message + list 或 Message + Message 的操作"""
if isinstance(other, list):
return [self] + other
elif isinstance(other, Message):
return [self, other]
else:
raise TypeError(
f"unsupported operand type(s) for +: '{type(self).__name__}' and '{type(other).__name__}'"
)
def __radd__(self, other) -> List["Message"]:
"""支持 list + Message 的操作"""
if isinstance(other, list):
return other + [self]
else:
raise TypeError(
f"unsupported operand type(s) for +: '{type(other).__name__}' and '{type(self).__name__}'"
)
def to_dict(self) -> dict:
"""Convert message to dictionary format"""
message = {"role": self.role}
if self.content is not None:
message["content"] = self.content
if self.tool_calls is not None:
message["tool_calls"] = [tool_call.dict() for tool_call in self.tool_calls]
if self.name is not None:
message["name"] = self.name
if self.tool_call_id is not None:
message["tool_call_id"] = self.tool_call_id
if self.base64_image is not None:
message["base64_image"] = self.base64_image
return message
@classmethod
def user_message(
cls, content: str, base64_image: Optional[str] = None
) -> "Message":
"""Create a user message"""
return cls(role=Role.USER, content=content, base64_image=base64_image)
@classmethod
def system_message(cls, content: str) -> "Message":
"""Create a system message"""
return cls(role=Role.SYSTEM, content=content)
@classmethod
def assistant_message(
cls, content: Optional[str] = None, base64_image: Optional[str] = None
) -> "Message":
"""Create an assistant message"""
return cls(role=Role.ASSISTANT, content=content, base64_image=base64_image)
@classmethod
def tool_message(
cls, content: str, name, tool_call_id: str, base64_image: Optional[str] = None
) -> "Message":
"""Create a tool message"""
return cls(
role=Role.TOOL,
content=content,
name=name,
tool_call_id=tool_call_id,
base64_image=base64_image,
)
@classmethod
def from_tool_calls(
cls,
tool_calls: List[Any],
content: Union[str, List[str]] = "",
base64_image: Optional[str] = None,
**kwargs,
):
"""Create ToolCallsMessage from raw tool calls.
Args:
tool_calls: Raw tool calls from LLM
content: Optional message content
base64_image: Optional base64 encoded image
"""
formatted_calls = [
{"id": call.id, "function": call.function.model_dump(), "type": "function"}
for call in tool_calls
]
return cls(
role=Role.ASSISTANT,
content=content,
tool_calls=formatted_calls,
base64_image=base64_image,
**kwargs,
)
class Memory(BaseModel):
messages: List[Message] = Field(default_factory=list)
max_messages: int = Field(default=100)
def add_message(self, message: Message) -> None:
"""Add a message to memory"""
self.messages.append(message)
# Optional: Implement message limit
if len(self.messages) > self.max_messages:
self.messages = self.messages[-self.max_messages :]
def add_messages(self, messages: List[Message]) -> None:
"""Add multiple messages to memory"""
self.messages.extend(messages)
# Optional: Implement message limit
if len(self.messages) > self.max_messages:
self.messages = self.messages[-self.max_messages :]
def clear(self) -> None:
"""Clear all messages"""
self.messages.clear()
def get_recent_messages(self, n: int) -> List[Message]:
"""Get n most recent messages"""
return self.messages[-n:]
def to_dict_list(self) -> List[dict]:
"""Convert messages to list of dicts"""
return [msg.to_dict() for msg in self.messages]
```
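The schema types compose naturally; the sketch below exercises the `Message` factory helpers, the `+`/`__radd__` operators, and `Memory`'s `max_messages` trimming, using only the API defined above.

```py
from app.schema import Memory, Message

memory = Memory(max_messages=3)

# Factory helpers set the role for you.
memory.add_message(Message.system_message("You are a helpful assistant."))
memory.add_messages(
    Message.user_message("What's 2 + 2?")
    + Message.assistant_message("4")  # Message + Message -> List[Message]
)
memory.add_message(Message.user_message("Thanks!"))

# Only the 3 most recent messages survive the max_messages trim.
print([m.role for m in memory.get_recent_messages(3)])
print(memory.to_dict_list())
```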
## /app/tool/__init__.py
```py path="/app/tool/__init__.py"
from app.tool.base import BaseTool
from app.tool.bash import Bash
from app.tool.browser_use_tool import BrowserUseTool
from app.tool.create_chat_completion import CreateChatCompletion
from app.tool.planning import PlanningTool
from app.tool.str_replace_editor import StrReplaceEditor
from app.tool.terminate import Terminate
from app.tool.tool_collection import ToolCollection
from app.tool.web_search import WebSearch
__all__ = [
"BaseTool",
"Bash",
"BrowserUseTool",
"Terminate",
"StrReplaceEditor",
"WebSearch",
"ToolCollection",
"CreateChatCompletion",
"PlanningTool",
]
```
## /app/tool/ask_human.py
```py path="/app/tool/ask_human.py"
from app.tool import BaseTool
class AskHuman(BaseTool):
"""Add a tool to ask human for help."""
name: str = "ask_human"
description: str = "Use this tool to ask human for help."
parameters: dict = {
"type": "object",
"properties": {
"inquire": {
"type": "string",
"description": "The question you want to ask human.",
}
},
"required": ["inquire"],
}
async def execute(self, inquire: str) -> str:
return input(f"""Bot: {inquire}\n\nYou: """).strip()
```
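Because `BaseTool.__call__` (defined in `app/tool/base.py` below) forwards keyword arguments to `execute()`, the tool can be awaited directly. A minimal sketch:

```py
import asyncio

from app.tool.ask_human import AskHuman


async def main():
    # __call__ delegates to execute(), so the instance is awaited directly.
    answer = await AskHuman()(inquire="Which branch should I deploy?")
    print(f"Human said: {answer}")


asyncio.run(main())
```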
## /app/tool/base.py
```py path="/app/tool/base.py"
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional
from pydantic import BaseModel, Field
class BaseTool(ABC, BaseModel):
name: str
description: str
parameters: Optional[dict] = None
class Config:
arbitrary_types_allowed = True
async def __call__(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
return await self.execute(**kwargs)
@abstractmethod
async def execute(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
def to_param(self) -> Dict:
"""Convert tool to function call format."""
return {
"type": "function",
"function": {
"name": self.name,
"description": self.description,
"parameters": self.parameters,
},
}
class ToolResult(BaseModel):
"""Represents the result of a tool execution."""
output: Any = Field(default=None)
error: Optional[str] = Field(default=None)
base64_image: Optional[str] = Field(default=None)
system: Optional[str] = Field(default=None)
class Config:
arbitrary_types_allowed = True
def __bool__(self):
return any(getattr(self, field) for field in self.__fields__)
def __add__(self, other: "ToolResult"):
def combine_fields(
field: Optional[str], other_field: Optional[str], concatenate: bool = True
):
if field and other_field:
if concatenate:
return field + other_field
raise ValueError("Cannot combine tool results")
return field or other_field
return ToolResult(
output=combine_fields(self.output, other.output),
error=combine_fields(self.error, other.error),
base64_image=combine_fields(self.base64_image, other.base64_image, False),
system=combine_fields(self.system, other.system),
)
def __str__(self):
return f"Error: {self.error}" if self.error else self.output
def replace(self, **kwargs):
"""Returns a new ToolResult with the given fields replaced."""
# return self.copy(update=kwargs)
return type(self)(**{**self.dict(), **kwargs})
class CLIResult(ToolResult):
"""A ToolResult that can be rendered as a CLI output."""
class ToolFailure(ToolResult):
"""A ToolResult that represents a failure."""
```
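To show the `BaseTool` contract end to end, here is a small hypothetical `EchoTool` (not part of the repository) that declares the three fields, implements `execute()`, and returns a `ToolResult`:

```py
import asyncio

from app.tool.base import BaseTool, ToolResult


class EchoTool(BaseTool):
    """Hypothetical tool, used only to illustrate the BaseTool contract."""

    name: str = "echo"
    description: str = "Echo back the provided text."
    parameters: dict = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to echo."}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> ToolResult:
        return ToolResult(output=text)


async def main():
    tool = EchoTool()
    print(tool.to_param())          # OpenAI-style function-call schema
    result = await tool(text="hi")  # __call__ delegates to execute()
    print(str(result), bool(result))


asyncio.run(main())
```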