```
├── .devcontainer/
│   ├── README.md
│   ├── devcontainer.json
│   └── docker-compose.yaml
├── .gitattributes
├── .github/
│   ├── CODEOWNERS
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── DISCUSSION_TEMPLATE/
│   │   ├── ideas.yml
│   │   └── q-a.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yml
│   │   ├── config.yml
│   │   ├── documentation.yml
│   │   └── privileged.yml
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── actions/
│   │   ├── people/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── app/
│   │   │       └── main.py
│   │   ├── poetry_setup/
│   │   │   └── action.yml
│   │   └── uv_setup/
│   │       └── action.yml
│   ├── scripts/
│   │   ├── check_diff.py
│   │   ├── check_prerelease_dependencies.py
│   │   ├── get_min_versions.py
│   │   └── prep_api_docs_build.py
│   ├── tools/
│   │   └── git-restore-mtime
│   └── workflows/
│       ├── .codespell-exclude
│       ├── _compile_integration_test.yml
│       ├── _integration_test.yml
│       ├── _lint.yml
│       ├── _release.yml
│       ├── _test.yml
│       ├── _test_doc_imports.yml
│       ├── _test_pydantic.yml
│       ├── _test_release.yml
│       ├── api_doc_build.yml
│       ├── check-broken-links.yml
│       ├── check_core_versions.yml
│       ├── check_diffs.yml
│       ├── check_new_docs.yml
│       ├── codespell.yml
│       ├── codspeed.yml
│       ├── extract_ignored_words_list.py
│       ├── people.yml
│       ├── run_notebooks.yml
│       └── scheduled_test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CITATION.cff
├── LICENSE
├── MIGRATE.md
├── Makefile
├── README.md
├── SECURITY.md
├── cookbook/
│   ├── Gemma_LangChain.ipynb
│   ├── LLaMA2_sql_chat.ipynb
│   └── Multi_modal_RAG.ipynb
```

## /.devcontainer/README.md

# Dev container

This project includes a [dev container](https://containers.dev/), which lets you use a container as a full-featured dev environment.

You can use the dev container configuration in this folder to build and run the app without needing to install any of its tools locally! You can use it in [GitHub Codespaces](https://github.com/features/codespaces) or the [VS Code Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).

## GitHub Codespaces

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)

You may use the button above, or follow these steps to open this repo in a Codespace:

1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
2. Click on the **Codespaces** tab.
3. Click **Create codespace on master**.

For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).

## VS Code Dev Containers

[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)

Note: If you click the link above, you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute, use the link below and replace `<your-username>` and `<your-cloned-repo-name>` with your GitHub username and the name of your cloned repo (a filled-in example follows below):

```
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<your-username>/<your-cloned-repo-name>
```

Then you will have a local cloned repo where you can contribute and create pull requests.

If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
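Following on from the personalized link above: assuming, for example, a GitHub username of `octocat` and a fork that keeps the name `langchain` (both placeholders, substitute your own values), the URL would look like:

```
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/octocat/langchain
```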
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension: 1. If this is your first time using a development container, please ensure your system meets the pre-reqs (i.e. have Docker installed) in the [getting started steps](https://aka.ms/vscode-remote/containers/getting-started). 2. Open a locally cloned copy of the code: - Fork and Clone this repository to your local filesystem. - Press F1 and select the **Dev Containers: Open Folder in Container...** command. - Select the cloned copy of this folder, wait for the container to start, and try things out! You can learn more in the [Dev Containers documentation](https://code.visualstudio.com/docs/devcontainers/containers). ## Tips and tricks * If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info. * If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo. ## /.devcontainer/devcontainer.json ```json path="/.devcontainer/devcontainer.json" // For format details, see https://aka.ms/devcontainer.json. For config options, see the // README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-docker-compose { // Name for the dev container "name": "langchain", // Point to a Docker Compose file "dockerComposeFile": "./docker-compose.yaml", // Required when using Docker Compose. The name of the service to connect to once running "service": "langchain", // The optional 'workspaceFolder' property is the path VS Code should open by default when // connected. This is typically a file mount in .devcontainer/docker-compose.yml "workspaceFolder": "/workspaces/langchain", // Prevent the container from shutting down "overrideCommand": true // Features to add to the dev container. More info: https://containers.dev/features // "features": { // "ghcr.io/devcontainers-contrib/features/poetry:2": {} // } // Use 'forwardPorts' to make a list of ports inside the container available locally. // "forwardPorts": [], // Uncomment the next line to run commands after the container is created. // "postCreateCommand": "cat /etc/os-release", // Configure tool-specific properties. // "customizations": {}, // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root. // "remoteUser": "root" } ``` ## /.devcontainer/docker-compose.yaml ```yaml path="/.devcontainer/docker-compose.yaml" version: '3' services: langchain: build: dockerfile: libs/langchain/dev.Dockerfile context: .. 
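      # The dev image is built from libs/langchain/dev.Dockerfile, using the
      # repository root (..) as the build context.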
volumes: # Update this to wherever you want VS Code to mount the folder of your project - ..:/workspaces/langchain:cached networks: - langchain-network # environment: # MONGO_ROOT_USERNAME: root # MONGO_ROOT_PASSWORD: example123 # depends_on: # - mongo # mongo: # image: mongo # restart: unless-stopped # environment: # MONGO_INITDB_ROOT_USERNAME: root # MONGO_INITDB_ROOT_PASSWORD: example123 # ports: # - "27017:27017" # networks: # - langchain-network networks: langchain-network: driver: bridge ``` ## /.gitattributes ```gitattributes path="/.gitattributes" * text=auto eol=lf *.{cmd,[cC][mM][dD]} text eol=crlf *.{bat,[bB][aA][tT]} text eol=crlf ``` ## /.github/CODEOWNERS ```github/CODEOWNERS path="/.github/CODEOWNERS" /.github/ @baskaryan @ccurme /libs/packages.yml @ccurme ``` ## /.github/CODE_OF_CONDUCT.md # Contributor Covenant Code of Conduct ## Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. ## Our Standards Examples of behavior that contributes to a positive environment for our community include: * Demonstrating empathy and kindness toward other people * Being respectful of differing opinions, viewpoints, and experiences * Giving and gracefully accepting constructive feedback * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience * Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: * The use of sexualized language or imagery, and sexual attention or advances of any kind * Trolling, insulting or derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or email address, without their explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. ## Scope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at conduct@langchain.dev. All complaints will be reviewed and investigated promptly and fairly. 
All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning **Community Impact**: A violation through a single incident or series of actions. **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. Permanent Ban **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. **Consequence**: A permanent ban from any sort of public interaction within the community. ## Attribution This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1]. Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC]. For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations]. [homepage]: https://www.contributor-covenant.org [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html [Mozilla CoC]: https://github.com/mozilla/diversity [FAQ]: https://www.contributor-covenant.org/faq [translations]: https://www.contributor-covenant.org/translations ## /.github/CONTRIBUTING.md # Contributing to LangChain Hi there! Thank you for even being interested in contributing to LangChain. As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes. To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/). ## /.github/DISCUSSION_TEMPLATE/ideas.yml ```yml path="/.github/DISCUSSION_TEMPLATE/ideas.yml" labels: [idea] body: - type: checkboxes id: checks attributes: label: Checked description: Please confirm and check all the following options. 
options: - label: I searched existing ideas and did not find a similar one required: true - label: I added a very descriptive title required: true - label: I've clearly described the feature request and motivation for it required: true - type: textarea id: feature-request validations: required: true attributes: label: Feature request description: | A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant. - type: textarea id: motivation validations: required: true attributes: label: Motivation description: | Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too. - type: textarea id: proposal validations: required: false attributes: label: Proposal (If applicable) description: | If you would like to propose a solution, please describe it here. ``` ## /.github/DISCUSSION_TEMPLATE/q-a.yml ```yml path="/.github/DISCUSSION_TEMPLATE/q-a.yml" labels: [Question] body: - type: markdown attributes: value: | Thanks for your interest in LangChain 🦜️🔗! Please follow these instructions, fill every question, and do every step. 🙏 We're asking for this because answering questions and solving problems in GitHub takes a lot of time -- this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests. By asking questions in a structured way (following this) it will be much easier for us to help you. There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎 As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones. That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓 Relevant links to check before opening a question to see if your question has already been answered, fixed or if there's another way to solve your problem: [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction), [API Reference](https://python.langchain.com/api_reference/), [GitHub search](https://github.com/langchain-ai/langchain), [LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions), [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue), [LangChain ChatBot](https://chat.langchain.com/) - type: checkboxes id: checks attributes: label: Checked other resources description: Please confirm and check all the following options. options: - label: I added a very descriptive title to this question. required: true - label: I searched the LangChain documentation with the integrated search. required: true - label: I used the GitHub search to find a similar question and didn't find it. required: true - type: checkboxes id: help attributes: label: Commit to Help description: | After submitting this, I commit to one of: * Read open questions until I find 2 where I can help someone and add a comment to help there. * I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future. * Once my question is answered, I will mark the answer as "accepted". 
options: - label: I commit to help with one of those options 👆 required: true - type: textarea id: example attributes: label: Example Code description: | Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case. If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help. **Important!** * Use code tags (e.g., \`\`\`python ... \`\`\`) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting). * INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., \`\`\`python rather than \`\`\`). * Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you. * Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code. placeholder: | from langchain_core.runnables import RunnableLambda def bad_code(inputs) -> int: raise NotImplementedError('For demo purpose') chain = RunnableLambda(bad_code) chain.invoke('Hello!') render: python validations: required: true - type: textarea id: description attributes: label: Description description: | What is the problem, question, or error? Write a short description explaining what you are doing, what you expect to happen, and what is currently happening. placeholder: | * I'm trying to use the `langchain` library to do X. * I expect to see Y. * Instead, it does Z. validations: required: true - type: textarea id: system-info attributes: label: System Info description: | Please share your system info with us. "pip freeze | grep langchain" platform (windows / linux / mac) python version OR if you're on a recent version of langchain-core you can paste the output of: python -m langchain_core.sys_info placeholder: | "pip freeze | grep langchain" platform python version Alternatively, if you're on a recent version of langchain-core you can paste the output of: python -m langchain_core.sys_info These will only surface LangChain packages, don't forget to include any other relevant packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`). validations: required: true ``` ## /.github/ISSUE_TEMPLATE/bug-report.yml ```yml path="/.github/ISSUE_TEMPLATE/bug-report.yml" name: "\U0001F41B Bug Report" description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions. labels: ["02 Bug Report"] body: - type: markdown attributes: value: > Thank you for taking the time to file a bug report. Use this to report bugs in LangChain. If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions) to ask for help with your issue. 
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or if there's another way to solve your problem: [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction), [API Reference](https://python.langchain.com/api_reference/), [GitHub search](https://github.com/langchain-ai/langchain), [LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions), [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue), [LangChain ChatBot](https://chat.langchain.com/) - type: checkboxes id: checks attributes: label: Checked other resources description: Please confirm and check all the following options. options: - label: I added a very descriptive title to this issue. required: true - label: I used the GitHub search to find a similar question and didn't find it. required: true - label: I am sure that this is a bug in LangChain rather than my code. required: true - label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package). required: true - label: I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS. required: true - type: textarea id: reproduction validations: required: true attributes: label: Example Code description: | Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case. If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help. **Important!** * Use code tags (e.g., \`\`\`python ... \`\`\`) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting). * INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., \`\`\`python rather than \`\`\`). * Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you. * Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code. placeholder: | The following code: \`\`\`python from langchain_core.runnables import RunnableLambda def bad_code(inputs) -> int: raise NotImplementedError('For demo purpose') chain = RunnableLambda(bad_code) chain.invoke('Hello!') \`\`\` - type: textarea id: error validations: required: false attributes: label: Error Message and Stack Trace (if applicable) description: | If you are reporting an error, please include the full error message and stack trace. placeholder: | Exception + full stack trace - type: textarea id: description attributes: label: Description description: | What is the problem, question, or error? Write a short description telling what you are doing, what you expect to happen, and what is currently happening. placeholder: | * I'm trying to use the `langchain` library to do X. * I expect to see Y. * Instead, it does Z. validations: required: true - type: textarea id: system-info attributes: label: System Info description: | Please share your system info with us. Do NOT skip this step and please don't trim the output. Most users don't include enough information here and it makes it harder for us to help you. 
Run the following command in your terminal and paste the output here: python -m langchain_core.sys_info or if you have an existing python interpreter running: from langchain_core import sys_info sys_info.print_sys_info() alternatively, put the entire output of `pip freeze` here. placeholder: | python -m langchain_core.sys_info validations: required: true ``` ## /.github/ISSUE_TEMPLATE/config.yml ```yml path="/.github/ISSUE_TEMPLATE/config.yml" blank_issues_enabled: false version: 2.1 contact_links: - name: 🤔 Question or Problem about: Ask a question or ask about a problem in GitHub Discussions. url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a - name: Feature Request url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas about: Suggest a feature or an idea - name: Show and tell about: Show what you built with LangChain url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell ``` ## /.github/ISSUE_TEMPLATE/documentation.yml ```yml path="/.github/ISSUE_TEMPLATE/documentation.yml" name: Documentation description: Report an issue related to the LangChain documentation. title: "DOC: " labels: [03 - Documentation] body: - type: markdown attributes: value: > Thank you for taking the time to report an issue in the documentation. Only report issues with documentation here, explain if there are any missing topics or if you found a mistake in the documentation. Do **NOT** use this to ask usage questions or reporting issues with your code. If you have usage questions or need help solving some problem, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions). If you're in the wrong place, here are some helpful links to find a better place to ask your question: [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction), [API Reference](https://python.langchain.com/api_reference/), [GitHub search](https://github.com/langchain-ai/langchain), [LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions), [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue), [LangChain ChatBot](https://chat.langchain.com/) - type: input id: url attributes: label: URL description: URL to documentation validations: required: false - type: checkboxes id: checks attributes: label: Checklist description: Please confirm and check all the following options. options: - label: I added a very descriptive title to this issue. required: true - label: I included a link to the documentation page I am referring to (if applicable). required: true - type: textarea attributes: label: "Issue with current documentation:" description: > Please make sure to leave a reference to the document/code you're referring to. Feel free to include names of classes, functions, methods or concepts you'd like to see documented more. - type: textarea attributes: label: "Idea or request for content:" description: > Please describe as clearly as possible what topics you think are missing from the current documentation. ``` ## /.github/ISSUE_TEMPLATE/privileged.yml ```yml path="/.github/ISSUE_TEMPLATE/privileged.yml" name: 🔒 Privileged description: You are a LangChain maintainer, or was asked directly by a maintainer to create an issue here. If not, check the other options. body: - type: markdown attributes: value: | Thanks for your interest in LangChain! 
🚀 If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead. You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository or are a regular contributor to LangChain with previous merged pull requests. - type: checkboxes id: privileged attributes: label: Privileged issue description: Confirm that you are allowed to create an issue here. options: - label: I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here. required: true - type: textarea id: content attributes: label: Issue Content description: Add the content of the issue here. ``` ## /.github/PULL_REQUEST_TEMPLATE.md Thank you for contributing to LangChain! - [ ] **PR title**: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] **PR message**: ***Delete this entire checklist*** and replace with - **Description:** a description of the change - **Issue:** the issue # it fixes, if applicable - **Dependencies:** any dependencies required for this change - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, eyurtsev, ccurme, vbarda, hwchase17. ## /.github/actions/people/Dockerfile ```github/actions/people/Dockerfile path="/.github/actions/people/Dockerfile" FROM python:3.9 RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.1,<6.0.0" COPY ./app /app CMD ["python", "/app/main.py"] ``` ## /.github/actions/people/action.yml ```yml path="/.github/actions/people/action.yml" # Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/action.yml name: "Generate LangChain People" description: "Generate the data for the LangChain People page" author: "Jacob Lee " inputs: token: description: 'User token, to read the GitHub API. 
Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}' required: true runs: using: 'docker' image: 'Dockerfile' ``` ## /.github/actions/people/app/main.py ```py path="/.github/actions/people/app/main.py" # Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/app/main.py import logging import subprocess import sys from collections import Counter from datetime import datetime, timedelta, timezone from pathlib import Path from typing import Any, Container, Dict, List, Set, Union import httpx import yaml from github import Github from pydantic import BaseModel, SecretStr from pydantic_settings import BaseSettings github_graphql_url = "https://api.github.com/graphql" questions_category_id = "DIC_kwDOIPDwls4CS6Ve" # discussions_query = """ # query Q($after: String, $category_id: ID) { # repository(name: "langchain", owner: "langchain-ai") { # discussions(first: 100, after: $after, categoryId: $category_id) { # edges { # cursor # node { # number # author { # login # avatarUrl # url # } # title # createdAt # comments(first: 100) { # nodes { # createdAt # author { # login # avatarUrl # url # } # isAnswer # replies(first: 10) { # nodes { # createdAt # author { # login # avatarUrl # url # } # } # } # } # } # } # } # } # } # } # """ # issues_query = """ # query Q($after: String) { # repository(name: "langchain", owner: "langchain-ai") { # issues(first: 100, after: $after) { # edges { # cursor # node { # number # author { # login # avatarUrl # url # } # title # createdAt # state # comments(first: 100) { # nodes { # createdAt # author { # login # avatarUrl # url # } # } # } # } # } # } # } # } # """ prs_query = """ query Q($after: String) { repository(name: "langchain", owner: "langchain-ai") { pullRequests(first: 100, after: $after, states: MERGED) { edges { cursor node { changedFiles additions deletions number labels(first: 100) { nodes { name } } author { login avatarUrl url ... on User { twitterUsername } } title createdAt state reviews(first:100) { nodes { author { login avatarUrl url ... 
on User { twitterUsername } } state } } } } } } } """ class Author(BaseModel): login: str avatarUrl: str url: str twitterUsername: Union[str, None] = None # Issues and Discussions class CommentsNode(BaseModel): createdAt: datetime author: Union[Author, None] = None class Replies(BaseModel): nodes: List[CommentsNode] class DiscussionsCommentsNode(CommentsNode): replies: Replies class Comments(BaseModel): nodes: List[CommentsNode] class DiscussionsComments(BaseModel): nodes: List[DiscussionsCommentsNode] class IssuesNode(BaseModel): number: int author: Union[Author, None] = None title: str createdAt: datetime state: str comments: Comments class DiscussionsNode(BaseModel): number: int author: Union[Author, None] = None title: str createdAt: datetime comments: DiscussionsComments class IssuesEdge(BaseModel): cursor: str node: IssuesNode class DiscussionsEdge(BaseModel): cursor: str node: DiscussionsNode class Issues(BaseModel): edges: List[IssuesEdge] class Discussions(BaseModel): edges: List[DiscussionsEdge] class IssuesRepository(BaseModel): issues: Issues class DiscussionsRepository(BaseModel): discussions: Discussions class IssuesResponseData(BaseModel): repository: IssuesRepository class DiscussionsResponseData(BaseModel): repository: DiscussionsRepository class IssuesResponse(BaseModel): data: IssuesResponseData class DiscussionsResponse(BaseModel): data: DiscussionsResponseData # PRs class LabelNode(BaseModel): name: str class Labels(BaseModel): nodes: List[LabelNode] class ReviewNode(BaseModel): author: Union[Author, None] = None state: str class Reviews(BaseModel): nodes: List[ReviewNode] class PullRequestNode(BaseModel): number: int labels: Labels author: Union[Author, None] = None changedFiles: int additions: int deletions: int title: str createdAt: datetime state: str reviews: Reviews # comments: Comments class PullRequestEdge(BaseModel): cursor: str node: PullRequestNode class PullRequests(BaseModel): edges: List[PullRequestEdge] class PRsRepository(BaseModel): pullRequests: PullRequests class PRsResponseData(BaseModel): repository: PRsRepository class PRsResponse(BaseModel): data: PRsResponseData class Settings(BaseSettings): input_token: SecretStr github_repository: str httpx_timeout: int = 30 def get_graphql_response( *, settings: Settings, query: str, after: Union[str, None] = None, category_id: Union[str, None] = None, ) -> Dict[str, Any]: headers = {"Authorization": f"token {settings.input_token.get_secret_value()}"} # category_id is only used by one query, but GraphQL allows unused variables, so # keep it here for simplicity variables = {"after": after, "category_id": category_id} response = httpx.post( github_graphql_url, headers=headers, timeout=settings.httpx_timeout, json={"query": query, "variables": variables, "operationName": "Q"}, ) if response.status_code != 200: logging.error( f"Response was not 200, after: {after}, category_id: {category_id}" ) logging.error(response.text) raise RuntimeError(response.text) data = response.json() if "errors" in data: logging.error(f"Errors in response, after: {after}, category_id: {category_id}") logging.error(data["errors"]) logging.error(response.text) raise RuntimeError(response.text) return data # def get_graphql_issue_edges(*, settings: Settings, after: Union[str, None] = None): # data = get_graphql_response(settings=settings, query=issues_query, after=after) # graphql_response = IssuesResponse.model_validate(data) # return graphql_response.data.repository.issues.edges # def get_graphql_question_discussion_edges( # *, # 
settings: Settings, # after: Union[str, None] = None, # ): # data = get_graphql_response( # settings=settings, # query=discussions_query, # after=after, # category_id=questions_category_id, # ) # graphql_response = DiscussionsResponse.model_validate(data) # return graphql_response.data.repository.discussions.edges def get_graphql_pr_edges(*, settings: Settings, after: Union[str, None] = None): if after is None: print("Querying PRs...") else: print(f"Querying PRs with cursor {after}...") data = get_graphql_response(settings=settings, query=prs_query, after=after) graphql_response = PRsResponse.model_validate(data) return graphql_response.data.repository.pullRequests.edges # def get_issues_experts(settings: Settings): # issue_nodes: List[IssuesNode] = [] # issue_edges = get_graphql_issue_edges(settings=settings) # while issue_edges: # for edge in issue_edges: # issue_nodes.append(edge.node) # last_edge = issue_edges[-1] # issue_edges = get_graphql_issue_edges(settings=settings, after=last_edge.cursor) # commentors = Counter() # last_month_commentors = Counter() # authors: Dict[str, Author] = {} # now = datetime.now(tz=timezone.utc) # one_month_ago = now - timedelta(days=30) # for issue in issue_nodes: # issue_author_name = None # if issue.author: # authors[issue.author.login] = issue.author # issue_author_name = issue.author.login # issue_commentors = set() # for comment in issue.comments.nodes: # if comment.author: # authors[comment.author.login] = comment.author # if comment.author.login != issue_author_name: # issue_commentors.add(comment.author.login) # for author_name in issue_commentors: # commentors[author_name] += 1 # if issue.createdAt > one_month_ago: # last_month_commentors[author_name] += 1 # return commentors, last_month_commentors, authors # def get_discussions_experts(settings: Settings): # discussion_nodes: List[DiscussionsNode] = [] # discussion_edges = get_graphql_question_discussion_edges(settings=settings) # while discussion_edges: # for discussion_edge in discussion_edges: # discussion_nodes.append(discussion_edge.node) # last_edge = discussion_edges[-1] # discussion_edges = get_graphql_question_discussion_edges( # settings=settings, after=last_edge.cursor # ) # commentors = Counter() # last_month_commentors = Counter() # authors: Dict[str, Author] = {} # now = datetime.now(tz=timezone.utc) # one_month_ago = now - timedelta(days=30) # for discussion in discussion_nodes: # discussion_author_name = None # if discussion.author: # authors[discussion.author.login] = discussion.author # discussion_author_name = discussion.author.login # discussion_commentors = set() # for comment in discussion.comments.nodes: # if comment.author: # authors[comment.author.login] = comment.author # if comment.author.login != discussion_author_name: # discussion_commentors.add(comment.author.login) # for reply in comment.replies.nodes: # if reply.author: # authors[reply.author.login] = reply.author # if reply.author.login != discussion_author_name: # discussion_commentors.add(reply.author.login) # for author_name in discussion_commentors: # commentors[author_name] += 1 # if discussion.createdAt > one_month_ago: # last_month_commentors[author_name] += 1 # return commentors, last_month_commentors, authors # def get_experts(settings: Settings): # ( # discussions_commentors, # discussions_last_month_commentors, # discussions_authors, # ) = get_discussions_experts(settings=settings) # commentors = discussions_commentors # last_month_commentors = discussions_last_month_commentors # authors = 
{**discussions_authors} # return commentors, last_month_commentors, authors def _logistic(x, k): return x / (x + k) def get_contributors(settings: Settings): pr_nodes: List[PullRequestNode] = [] pr_edges = get_graphql_pr_edges(settings=settings) while pr_edges: for edge in pr_edges: pr_nodes.append(edge.node) last_edge = pr_edges[-1] pr_edges = get_graphql_pr_edges(settings=settings, after=last_edge.cursor) contributors = Counter() contributor_scores = Counter() recent_contributor_scores = Counter() reviewers = Counter() authors: Dict[str, Author] = {} for pr in pr_nodes: pr_reviewers: Set[str] = set() for review in pr.reviews.nodes: if review.author: authors[review.author.login] = review.author pr_reviewers.add(review.author.login) for reviewer in pr_reviewers: reviewers[reviewer] += 1 if pr.author: authors[pr.author.login] = pr.author contributors[pr.author.login] += 1 files_changed = pr.changedFiles lines_changed = pr.additions + pr.deletions score = _logistic(files_changed, 20) + _logistic(lines_changed, 100) contributor_scores[pr.author.login] += score three_months_ago = datetime.now(timezone.utc) - timedelta(days=3 * 30) if pr.createdAt > three_months_ago: recent_contributor_scores[pr.author.login] += score return ( contributors, contributor_scores, recent_contributor_scores, reviewers, authors, ) def get_top_users( *, counter: Counter, min_count: int, authors: Dict[str, Author], skip_users: Container[str], ): users = [] for commentor, count in counter.most_common(): if commentor in skip_users: continue if count >= min_count: author = authors[commentor] users.append( { "login": commentor, "count": count, "avatarUrl": author.avatarUrl, "twitterUsername": author.twitterUsername, "url": author.url, } ) return users if __name__ == "__main__": logging.basicConfig(level=logging.INFO) settings = Settings() logging.info(f"Using config: {settings.model_dump_json()}") g = Github(settings.input_token.get_secret_value()) repo = g.get_repo(settings.github_repository) # question_commentors, question_last_month_commentors, question_authors = get_experts( # settings=settings # ) ( contributors, contributor_scores, recent_contributor_scores, reviewers, pr_authors, ) = get_contributors(settings=settings) # authors = {**question_authors, **pr_authors} authors = {**pr_authors} maintainers_logins = { "hwchase17", "agola11", "baskaryan", "hinthornw", "nfcampos", "efriis", "eyurtsev", "rlancemartin", "ccurme", "vbarda", } hidden_logins = { "dev2049", "vowelparrot", "obi1kenobi", "langchain-infra", "jacoblee93", "isahers1", "dqbd", "bracesproul", "akira", } bot_names = {"dosubot", "github-actions", "CodiumAI-Agent"} maintainers = [] for login in maintainers_logins: user = authors[login] maintainers.append( { "login": login, "count": contributors[login], # + question_commentors[login], "avatarUrl": user.avatarUrl, "twitterUsername": user.twitterUsername, "url": user.url, } ) # min_count_expert = 10 # min_count_last_month = 3 min_score_contributor = 1 min_count_reviewer = 5 skip_users = maintainers_logins | bot_names | hidden_logins # experts = get_top_users( # counter=question_commentors, # min_count=min_count_expert, # authors=authors, # skip_users=skip_users, # ) # last_month_active = get_top_users( # counter=question_last_month_commentors, # min_count=min_count_last_month, # authors=authors, # skip_users=skip_users, # ) top_recent_contributors = get_top_users( counter=recent_contributor_scores, min_count=min_score_contributor, authors=authors, skip_users=skip_users, ) top_contributors = get_top_users( 
counter=contributor_scores, min_count=min_score_contributor, authors=authors, skip_users=skip_users, ) top_reviewers = get_top_users( counter=reviewers, min_count=min_count_reviewer, authors=authors, skip_users=skip_users, ) people = { "maintainers": maintainers, # "experts": experts, # "last_month_active": last_month_active, "top_recent_contributors": top_recent_contributors, "top_contributors": top_contributors, "top_reviewers": top_reviewers, } people_path = Path("./docs/data/people.yml") people_old_content = people_path.read_text(encoding="utf-8") new_people_content = yaml.dump( people, sort_keys=False, width=200, allow_unicode=True ) if people_old_content == new_people_content: logging.info("The LangChain People data hasn't changed, finishing.") sys.exit(0) people_path.write_text(new_people_content, encoding="utf-8") logging.info("Setting up GitHub Actions git user") subprocess.run(["git", "config", "user.name", "github-actions"], check=True) subprocess.run( ["git", "config", "user.email", "github-actions@github.com"], check=True ) branch_name = "langchain/langchain-people" logging.info(f"Creating a new branch {branch_name}") subprocess.run(["git", "checkout", "-B", branch_name], check=True) logging.info("Adding updated file") subprocess.run(["git", "add", str(people_path)], check=True) logging.info("Committing updated file") message = "👥 Update LangChain people data" result = subprocess.run(["git", "commit", "-m", message], check=True) logging.info("Pushing branch") subprocess.run(["git", "push", "origin", branch_name, "-f"], check=True) logging.info("Creating PR") pr = repo.create_pull(title=message, body=message, base="master", head=branch_name) logging.info(f"Created PR: {pr.number}") logging.info("Finished") ``` ## /.github/actions/poetry_setup/action.yml ```yml path="/.github/actions/poetry_setup/action.yml" # An action for setting up poetry install with caching. # Using a custom action since the default action does not # take poetry install groups into account. # Action code from: # https://github.com/actions/setup-python/issues/505#issuecomment-1273013236 name: poetry-install-with-caching description: Poetry install with support for caching of dependency groups. inputs: python-version: description: Python version, supporting MAJOR.MINOR only required: true poetry-version: description: Poetry version required: true cache-key: description: Cache key to use for manual handling of caching required: true working-directory: description: Directory whose poetry.lock file should be cached required: true runs: using: composite steps: - uses: actions/setup-python@v5 name: Setup python ${{ inputs.python-version }} id: setup-python with: python-version: ${{ inputs.python-version }} - uses: actions/cache@v4 id: cache-bin-poetry name: Cache Poetry binary - Python ${{ inputs.python-version }} env: SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1" with: path: | /opt/pipx/venvs/poetry # This step caches the poetry installation, so make sure it's keyed on the poetry version as well. key: bin-poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-${{ inputs.poetry-version }} - name: Refresh shell hashtable and fixup softlinks if: steps.cache-bin-poetry.outputs.cache-hit == 'true' shell: bash env: POETRY_VERSION: ${{ inputs.poetry-version }} PYTHON_VERSION: ${{ inputs.python-version }} run: | set -eux # Refresh the shell hashtable, to ensure correct `which` output. hash -r # `actions/cache@v3` doesn't always seem able to correctly unpack softlinks. 
# Delete and recreate the softlinks pipx expects to have. rm /opt/pipx/venvs/poetry/bin/python cd /opt/pipx/venvs/poetry/bin ln -s "$(which "python$PYTHON_VERSION")" python chmod +x python cd /opt/pipx_bin/ ln -s /opt/pipx/venvs/poetry/bin/poetry poetry chmod +x poetry # Ensure everything got set up correctly. /opt/pipx/venvs/poetry/bin/python --version /opt/pipx_bin/poetry --version - name: Install poetry if: steps.cache-bin-poetry.outputs.cache-hit != 'true' shell: bash env: POETRY_VERSION: ${{ inputs.poetry-version }} PYTHON_VERSION: ${{ inputs.python-version }} # Install poetry using the python version installed by setup-python step. run: pipx install "poetry==$POETRY_VERSION" --python '${{ steps.setup-python.outputs.python-path }}' --verbose - name: Restore pip and poetry cached dependencies uses: actions/cache@v4 env: SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4" WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }} with: path: | ~/.cache/pip ~/.cache/pypoetry/virtualenvs ~/.cache/pypoetry/cache ~/.cache/pypoetry/artifacts ${{ env.WORKDIR }}/.venv key: py-deps-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles(format('{0}/**/poetry.lock', env.WORKDIR)) }} ``` ## /.github/actions/uv_setup/action.yml ```yml path="/.github/actions/uv_setup/action.yml" # TODO: https://docs.astral.sh/uv/guides/integration/github/#caching name: uv-install description: Set up Python and uv inputs: python-version: description: Python version, supporting MAJOR.MINOR only required: true env: UV_VERSION: "0.5.25" runs: using: composite steps: - name: Install uv and set the python version uses: astral-sh/setup-uv@v5 with: version: ${{ env.UV_VERSION }} python-version: ${{ inputs.python-version }} ``` ## /.github/scripts/check_diff.py ```py path="/.github/scripts/check_diff.py" import glob import json import os import sys from collections import defaultdict from typing import Dict, List, Set from pathlib import Path import tomllib from packaging.requirements import Requirement from get_min_versions import get_min_version_from_toml LANGCHAIN_DIRS = [ "libs/core", "libs/text-splitters", "libs/langchain", "libs/community", ] # when set to True, we are ignoring core dependents # in order to be able to get CI to pass for each individual # package that depends on core # e.g. if you touch core, we don't then add textsplitters/etc to CI IGNORE_CORE_DEPENDENTS = False # ignored partners are removed from dependents # but still run if directly edited IGNORED_PARTNERS = [ # remove huggingface from dependents because of CI instability # specifically in huggingface jobs # https://github.com/langchain-ai/langchain/issues/25558 "huggingface", # prompty exhibiting issues with numpy for Python 3.13 # https://github.com/langchain-ai/langchain/actions/runs/12651104685/job/35251034969?pr=29065 "prompty", ] PY_312_MAX_PACKAGES = [ "libs/partners/huggingface", # https://github.com/pytorch/pytorch/issues/130249 "libs/partners/voyageai", ] def all_package_dirs() -> Set[str]: return { "/".join(path.split("/")[:-1]).lstrip("./") for path in glob.glob("./libs/**/pyproject.toml", recursive=True) if "libs/cli" not in path and "libs/standard-tests" not in path } def dependents_graph() -> dict: """ Construct a mapping of package -> dependents, such that we can run tests on all dependents of a package when a change is made. 
""" dependents = defaultdict(set) for path in glob.glob("./libs/**/pyproject.toml", recursive=True): if "template" in path: continue # load regular and test deps from pyproject.toml with open(path, "rb") as f: pyproject = tomllib.load(f) pkg_dir = "libs" + "/".join(path.split("libs")[1].split("/")[:-1]) for dep in [ *pyproject["project"]["dependencies"], *pyproject["dependency-groups"]["test"], ]: requirement = Requirement(dep) package_name = requirement.name if "langchain" in dep: dependents[package_name].add(pkg_dir) continue # load extended deps from extended_testing_deps.txt package_path = Path(path).parent extended_requirement_path = package_path / "extended_testing_deps.txt" if extended_requirement_path.exists(): with open(extended_requirement_path, "r") as f: extended_deps = f.read().splitlines() for depline in extended_deps: if depline.startswith("-e "): # editable dependency assert depline.startswith( "-e ../partners/" ), "Extended test deps should only editable install partner packages" partner = depline.split("partners/")[1] dep = f"langchain-{partner}" else: dep = depline.split("==")[0] if "langchain" in dep: dependents[dep].add(pkg_dir) for k in dependents: for partner in IGNORED_PARTNERS: if f"libs/partners/{partner}" in dependents[k]: dependents[k].remove(f"libs/partners/{partner}") return dependents def add_dependents(dirs_to_eval: Set[str], dependents: dict) -> List[str]: updated = set() for dir_ in dirs_to_eval: # handle core manually because it has so many dependents if "core" in dir_: updated.add(dir_) continue pkg = "langchain-" + dir_.split("/")[-1] updated.update(dependents[pkg]) updated.add(dir_) return list(updated) def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]: if job == "test-pydantic": return _get_pydantic_test_configs(dir_) if dir_ == "libs/core": py_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"] # custom logic for specific directories elif dir_ == "libs/partners/milvus": # milvus doesn't allow 3.12 because they declare deps in funny way py_versions = ["3.9", "3.11"] elif dir_ in PY_312_MAX_PACKAGES: py_versions = ["3.9", "3.12"] elif dir_ == "libs/langchain" and job == "extended-tests": py_versions = ["3.9", "3.13"] elif dir_ == "libs/community" and job == "extended-tests": py_versions = ["3.9", "3.12"] elif dir_ == "libs/community" and job == "compile-integration-tests": # community integration deps are slow in 3.12 py_versions = ["3.9", "3.11"] elif dir_ == ".": # unable to install with 3.13 because tokenizers doesn't support 3.13 yet py_versions = ["3.9", "3.12"] else: py_versions = ["3.9", "3.13"] return [{"working-directory": dir_, "python-version": py_v} for py_v in py_versions] def _get_pydantic_test_configs( dir_: str, *, python_version: str = "3.11" ) -> List[Dict[str, str]]: with open("./libs/core/uv.lock", "rb") as f: core_uv_lock_data = tomllib.load(f) for package in core_uv_lock_data["package"]: if package["name"] == "pydantic": core_max_pydantic_minor = package["version"].split(".")[1] break with open(f"./{dir_}/uv.lock", "rb") as f: dir_uv_lock_data = tomllib.load(f) for package in dir_uv_lock_data["package"]: if package["name"] == "pydantic": dir_max_pydantic_minor = package["version"].split(".")[1] break core_min_pydantic_version = get_min_version_from_toml( "./libs/core/pyproject.toml", "release", python_version, include=["pydantic"] )["pydantic"] core_min_pydantic_minor = ( core_min_pydantic_version.split(".")[1] if "." 
in core_min_pydantic_version else "0" ) dir_min_pydantic_version = get_min_version_from_toml( f"./{dir_}/pyproject.toml", "release", python_version, include=["pydantic"] ).get("pydantic", "0.0.0") dir_min_pydantic_minor = ( dir_min_pydantic_version.split(".")[1] if "." in dir_min_pydantic_version else "0" ) custom_mins = { # depends on pydantic-settings 2.4 which requires pydantic 2.7 "libs/community": 7, } max_pydantic_minor = min( int(dir_max_pydantic_minor), int(core_max_pydantic_minor), ) min_pydantic_minor = max( int(dir_min_pydantic_minor), int(core_min_pydantic_minor), custom_mins.get(dir_, 0), ) configs = [ { "working-directory": dir_, "pydantic-version": f"2.{v}.0", "python-version": python_version, } for v in range(min_pydantic_minor, max_pydantic_minor + 1) ] return configs def _get_configs_for_multi_dirs( job: str, dirs_to_run: Dict[str, Set[str]], dependents: dict ) -> List[Dict[str, str]]: if job == "lint": dirs = add_dependents( dirs_to_run["lint"] | dirs_to_run["test"] | dirs_to_run["extended-test"], dependents, ) elif job in ["test", "compile-integration-tests", "dependencies", "test-pydantic"]: dirs = add_dependents( dirs_to_run["test"] | dirs_to_run["extended-test"], dependents ) elif job == "extended-tests": dirs = list(dirs_to_run["extended-test"]) else: raise ValueError(f"Unknown job: {job}") return [ config for dir_ in dirs for config in _get_configs_for_single_dir(job, dir_) ] if __name__ == "__main__": files = sys.argv[1:] dirs_to_run: Dict[str, set] = { "lint": set(), "test": set(), "extended-test": set(), } docs_edited = False if len(files) >= 300: # max diff length is 300 files - there are likely files missing dirs_to_run["lint"] = all_package_dirs() dirs_to_run["test"] = all_package_dirs() dirs_to_run["extended-test"] = set(LANGCHAIN_DIRS) for file in files: if any( file.startswith(dir_) for dir_ in ( ".github/workflows", ".github/tools", ".github/actions", ".github/scripts/check_diff.py", ) ): # add all LANGCHAIN_DIRS for infra changes dirs_to_run["extended-test"].update(LANGCHAIN_DIRS) dirs_to_run["lint"].add(".") if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS): # add that dir and all dirs after in LANGCHAIN_DIRS # for extended testing found = False for dir_ in LANGCHAIN_DIRS: if dir_ == "libs/core" and IGNORE_CORE_DEPENDENTS: dirs_to_run["extended-test"].add(dir_) continue if file.startswith(dir_): found = True if found: dirs_to_run["extended-test"].add(dir_) elif file.startswith("libs/standard-tests"): # TODO: update to include all packages that rely on standard-tests (all partner packages) # note: won't run on external repo partners dirs_to_run["lint"].add("libs/standard-tests") dirs_to_run["test"].add("libs/standard-tests") dirs_to_run["lint"].add("libs/cli") dirs_to_run["test"].add("libs/cli") dirs_to_run["test"].add("libs/partners/mistralai") dirs_to_run["test"].add("libs/partners/openai") dirs_to_run["test"].add("libs/partners/anthropic") dirs_to_run["test"].add("libs/partners/fireworks") dirs_to_run["test"].add("libs/partners/groq") elif file.startswith("libs/cli"): dirs_to_run["lint"].add("libs/cli") dirs_to_run["test"].add("libs/cli") elif file.startswith("libs/partners"): partner_dir = file.split("/")[2] if os.path.isdir(f"libs/partners/{partner_dir}") and [ filename for filename in os.listdir(f"libs/partners/{partner_dir}") if not filename.startswith(".") ] != ["README.md"]: dirs_to_run["test"].add(f"libs/partners/{partner_dir}") # Skip if the directory was deleted or is just a tombstone readme elif file == "libs/packages.yml": continue 
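        # libs/packages.yml is the package registry (including packages that live in
        # external repos); editing it alone maps to no testable package directory here,
        # so no CI jobs are added for it.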
elif file.startswith("libs/"): raise ValueError( f"Unknown lib: {file}. check_diff.py likely needs " "an update for this new library!" ) elif file.startswith("docs/") or file in ["pyproject.toml", "uv.lock"]: # docs or root uv files docs_edited = True dirs_to_run["lint"].add(".") dependents = dependents_graph() # we now have dirs_by_job # todo: clean this up map_job_to_configs = { job: _get_configs_for_multi_dirs(job, dirs_to_run, dependents) for job in [ "lint", "test", "extended-tests", "compile-integration-tests", "dependencies", "test-pydantic", ] } map_job_to_configs["test-doc-imports"] = ( [{"python-version": "3.12"}] if docs_edited else [] ) for key, value in map_job_to_configs.items(): json_output = json.dumps(value) print(f"{key}={json_output}") ``` ## /.github/scripts/check_prerelease_dependencies.py ```py path="/.github/scripts/check_prerelease_dependencies.py" import sys import tomllib if __name__ == "__main__": # Get the TOML file path from the command line argument toml_file = sys.argv[1] # read toml file with open(toml_file, "rb") as file: toml_data = tomllib.load(file) # see if we're releasing an rc version = toml_data["project"]["version"] releasing_rc = "rc" in version or "dev" in version # if not, iterate through dependencies and make sure none allow prereleases if not releasing_rc: dependencies = toml_data["project"]["dependencies"] for dep_version in dependencies: dep_version_string = ( dep_version["version"] if isinstance(dep_version, dict) else dep_version ) if "rc" in dep_version_string: raise ValueError( f"Dependency {dep_version} has a prerelease version. Please remove this." ) if isinstance(dep_version, dict) and dep_version.get( "allow-prereleases", False ): raise ValueError( f"Dependency {dep_version} has allow-prereleases set to true. Please remove this." ) ``` ## /.github/scripts/get_min_versions.py ```py path="/.github/scripts/get_min_versions.py" from collections import defaultdict import sys from typing import Optional if sys.version_info >= (3, 11): import tomllib else: # for python 3.10 and below, which doesnt have stdlib tomllib import tomli as tomllib from packaging.requirements import Requirement from packaging.specifiers import SpecifierSet from packaging.version import Version import requests from packaging.version import parse from typing import List import re MIN_VERSION_LIBS = [ "langchain-core", "langchain-community", "langchain", "langchain-text-splitters", "numpy", "SQLAlchemy", ] # some libs only get checked on release because of simultaneous changes in # multiple libs SKIP_IF_PULL_REQUEST = [ "langchain-core", "langchain-text-splitters", "langchain", "langchain-community", ] def get_pypi_versions(package_name: str) -> List[str]: """ Fetch all available versions for a package from PyPI. Args: package_name (str): Name of the package Returns: List[str]: List of all available versions Raises: requests.exceptions.RequestException: If PyPI API request fails KeyError: If package not found or response format unexpected """ pypi_url = f"https://pypi.org/pypi/{package_name}/json" response = requests.get(pypi_url) response.raise_for_status() return list(response.json()["releases"].keys()) def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]: """ Find the minimum published version that satisfies the given constraints. 
Args: package_name (str): Name of the package spec_string (str): Version specification string (e.g., ">=0.2.43,<0.4.0,!=0.3.0") Returns: Optional[str]: Minimum compatible version or None if no compatible version found """ # rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string) spec_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", spec_string) # rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string) for y in range(1, 10): spec_string = re.sub(rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y+1}", spec_string) # rewrite occurrences of ^x.y.z to >=x.y.z,={x}.\1.\2,<{x+1}", spec_string ) spec_set = SpecifierSet(spec_string) all_versions = get_pypi_versions(package_name) valid_versions = [] for version_str in all_versions: try: version = parse(version_str) if spec_set.contains(version): valid_versions.append(version) except ValueError: continue return str(min(valid_versions)) if valid_versions else None def _check_python_version_from_requirement( requirement: Requirement, python_version: str ) -> bool: if not requirement.marker: return True else: marker_str = str(requirement.marker) if "python_version" or "python_full_version" in marker_str: python_version_str = "".join( char for char in marker_str if char.isdigit() or char in (".", "<", ">", "=", ",") ) return check_python_version(python_version, python_version_str) return True def get_min_version_from_toml( toml_path: str, versions_for: str, python_version: str, *, include: Optional[list] = None, ): # Parse the TOML file with open(toml_path, "rb") as file: toml_data = tomllib.load(file) dependencies = defaultdict(list) for dep in toml_data["project"]["dependencies"]: requirement = Requirement(dep) dependencies[requirement.name].append(requirement) # Initialize a dictionary to store the minimum versions min_versions = {} # Iterate over the libs in MIN_VERSION_LIBS for lib in set(MIN_VERSION_LIBS + (include or [])): if versions_for == "pull_request" and lib in SKIP_IF_PULL_REQUEST: # some libs only get checked on release because of simultaneous # changes in multiple libs continue # Check if the lib is present in the dependencies if lib in dependencies: if include and lib not in include: continue requirements = dependencies[lib] for requirement in requirements: if _check_python_version_from_requirement(requirement, python_version): version_string = str(requirement.specifier) break # Use parse_version to get the minimum supported version from version_string min_version = get_minimum_version(lib, version_string) # Store the minimum version in the min_versions dictionary min_versions[lib] = min_version return min_versions def check_python_version(version_string, constraint_string): """ Check if the given Python version matches the given constraints. :param version_string: A string representing the Python version (e.g. "3.8.5"). :param constraint_string: A string representing the package's Python version constraints (e.g. ">=3.6, <4.0"). :return: True if the version matches the constraints, False otherwise. 
""" # rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string) constraint_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", constraint_string) # rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string) for y in range(1, 10): constraint_string = re.sub( rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y+1}.0", constraint_string ) # rewrite occurrences of ^x.y.z to >=x.y.z,={x}.0.\1,<{x+1}.0.0", constraint_string ) try: version = Version(version_string) constraints = SpecifierSet(constraint_string) return version in constraints except Exception as e: print(f"Error: {e}") return False if __name__ == "__main__": # Get the TOML file path from the command line argument toml_file = sys.argv[1] versions_for = sys.argv[2] python_version = sys.argv[3] assert versions_for in ["release", "pull_request"] # Call the function to get the minimum versions min_versions = get_min_version_from_toml(toml_file, versions_for, python_version) print(" ".join([f"{lib}=={version}" for lib, version in min_versions.items()])) ``` ## /.github/scripts/prep_api_docs_build.py ```py path="/.github/scripts/prep_api_docs_build.py" #!/usr/bin/env python """Script to sync libraries from various repositories into the main langchain repository.""" import os import shutil import yaml from pathlib import Path from typing import Dict, Any def load_packages_yaml() -> Dict[str, Any]: """Load and parse the packages.yml file.""" with open("langchain/libs/packages.yml", "r") as f: return yaml.safe_load(f) def get_target_dir(package_name: str) -> Path: """Get the target directory for a given package.""" package_name_short = package_name.replace("langchain-", "") base_path = Path("langchain/libs") if package_name_short == "experimental": return base_path / "experimental" return base_path / "partners" / package_name_short def clean_target_directories(packages: list) -> None: """Remove old directories that will be replaced.""" for package in packages: target_dir = get_target_dir(package["name"]) if target_dir.exists(): print(f"Removing {target_dir}") shutil.rmtree(target_dir) def move_libraries(packages: list) -> None: """Move libraries from their source locations to the target directories.""" for package in packages: repo_name = package["repo"].split("/")[1] source_path = package["path"] target_dir = get_target_dir(package["name"]) # Handle root path case if source_path == ".": source_dir = repo_name else: source_dir = f"{repo_name}/{source_path}" print(f"Moving {source_dir} to {target_dir}") # Ensure target directory exists os.makedirs(os.path.dirname(target_dir), exist_ok=True) try: # Move the directory shutil.move(source_dir, target_dir) except Exception as e: print(f"Error moving {source_dir} to {target_dir}: {e}") def main(): """Main function to orchestrate the library sync process.""" try: # Load packages configuration package_yaml = load_packages_yaml() # Clean target directories clean_target_directories([ p for p in package_yaml["packages"] if (p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")) and p["repo"] != "langchain-ai/langchain" ]) # Move libraries to their new locations move_libraries([ p for p in package_yaml["packages"] if not p.get("disabled", False) and (p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")) and p["repo"] != "langchain-ai/langchain" ]) # Delete ones without a pyproject.toml for partner in Path("langchain/libs/partners").iterdir(): if partner.is_dir() and not (partner / "pyproject.toml").exists(): print(f"Removing {partner} as it 
does not have a pyproject.toml") shutil.rmtree(partner) print("Library sync completed successfully!") except Exception as e: print(f"Error during library sync: {e}") raise if __name__ == "__main__": main() ``` ## /.github/tools/git-restore-mtime ```github/tools/git-restore-mtime path="/.github/tools/git-restore-mtime" #!/usr/bin/env python3 # # git-restore-mtime - Change mtime of files based on commit date of last change # # Copyright (C) 2012 Rodrigo Silva (MestreLion) # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. See # # Source: https://github.com/MestreLion/git-tools # Version: July 13, 2023 (commit hash 5f832e72453e035fccae9d63a5056918d64476a2) """ Change the modification time (mtime) of files in work tree, based on the date of the most recent commit that modified the file, including renames. Ignores untracked files and uncommitted deletions, additions and renames, and by default modifications too. --- Useful prior to generating release tarballs, so each file is archived with a date that is similar to the date when the file was actually last modified, assuming the actual modification date and its commit date are close. """ # TODO: # - Add -z on git whatchanged/ls-files, so we don't deal with filename decoding # - When Python is bumped to 3.7, use text instead of universal_newlines on subprocess # - Update "Statistics for some large projects" with modern hardware and repositories. # - Create a README.md for git-restore-mtime alone. It deserves extensive documentation # - Move Statistics there # - See git-extras as a good example on project structure and documentation # FIXME: # - When current dir is outside the worktree, e.g. using --work-tree, `git ls-files` # assume any relative pathspecs are to worktree root, not the current dir. As such, # relative pathspecs may not work. # - Renames are tricky: # - R100 should not change mtime, but original name is not on filelist. Should # track renames until a valid (A, M) mtime found and then set on current name. # - Should set mtime for both current and original directories. # - Check mode changes with unchanged blobs? # - Check file (A, D) for the directory mtime is not sufficient: # - Renames also change dir mtime, unless rename was on a parent dir # - If most recent change of all files in a dir was a Modification (M), # dir might not be touched at all. # - Dirs containing only subdirectories but no direct files will also # not be touched. They're files' [grand]parent dir, but never their dirname(). # - Some solutions: # - After files done, perform some dir processing for missing dirs, finding latest # file (A, D, R) # - Simple approach: dir mtime is the most recent child (dir or file) mtime # - Use a virtual concept of "created at most at" to fill missing info, bubble up # to parents and grandparents # - When handling [grand]parent dirs, stay inside # - Better handling of merge commits. `-m` is plain *wrong*. `-c/--cc` is perfect, but # painfully slow. First pass without merge commits is not accurate. 
Maybe add a new # `--accurate` mode for `--cc`? if __name__ != "__main__": raise ImportError("{} should not be used as a module.".format(__name__)) import argparse import datetime import logging import os.path import shlex import signal import subprocess import sys import time __version__ = "2022.12+dev" # Update symlinks only if the platform supports not following them UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', [])) # Call os.path.normpath() only if not in a POSIX platform (Windows) NORMALIZE_PATHS = (os.path.sep != '/') # How many files to process in each batch when re-trying merge commits STEPMISSING = 100 # (Extra) keywords for the os.utime() call performed by touch() UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False} # Command-line interface ###################################################### def parse_args(): parser = argparse.ArgumentParser( description=__doc__.split('\n---')[0]) group = parser.add_mutually_exclusive_group() group.add_argument('--quiet', '-q', dest='loglevel', action="store_const", const=logging.WARNING, default=logging.INFO, help="Suppress informative messages and summary statistics.") group.add_argument('--verbose', '-v', action="count", help=""" Print additional information for each processed file. Specify twice to further increase verbosity. """) parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help=""" Run as if %(prog)s was started in directory %(metavar)s. This affects how --work-tree, --git-dir and PATHSPEC arguments are handled. See 'man 1 git' or 'git --help' for more information. """) parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help=""" Path to the git repository, by default auto-discovered by searching the current directory and its parents for a .git/ subdirectory. """) parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help=""" Path to the work tree root, by default the parent of GITDIR if it's automatically discovered, or the current directory if GITDIR is set. """) parser.add_argument('--force', '-f', default=False, action="store_true", help=""" Force updating files with uncommitted modifications. Untracked files and uncommitted deletions, renames and additions are always ignored. """) parser.add_argument('--merge', '-m', default=False, action="store_true", help=""" Include merge commits. Leads to more recent times and more files per commit, thus with the same time, which may or may not be what you want. Including merge commits may lead to fewer commits being evaluated as files are found sooner, which can improve performance, sometimes substantially. But as merge commits are usually huge, processing them may also take longer. By default, merge commits are only used for files missing from regular commits. """) parser.add_argument('--first-parent', default=False, action="store_true", help=""" Consider only the first parent, the "main branch", when evaluating merge commits. Only effective when merge commits are processed, either when --merge is used or when finding missing files after the first regular log search. See --skip-missing. """) parser.add_argument('--skip-missing', '-s', dest="missing", default=True, action="store_false", help=""" Do not try to find missing files. If merge commits were not evaluated with --merge and some files were not found in regular commits, by default %(prog)s searches for these files again in the merge commits. This option disables this retry, so files found only in merge commits will not have their timestamp updated. 
""") parser.add_argument('--no-directories', '-D', dest='dirs', default=True, action="store_false", help=""" Do not update directory timestamps. By default, use the time of its most recently created, renamed or deleted file. Note that just modifying a file will NOT update its directory time. """) parser.add_argument('--test', '-t', default=False, action="store_true", help="Test run: do not actually update any file timestamp.") parser.add_argument('--commit-time', '-c', dest='commit_time', default=False, action='store_true', help="Use commit time instead of author time.") parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False, action='store_true', help=""" Update times based on the oldest, instead of the most recent commit of a file. This reverses the order in which the git log is processed to emulate a file "creation" date. Note this will be inaccurate for files deleted and re-created at later dates. """) parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help=""" Ignore files that are currently older than %(metavar)s. Useful in workflows that assume such files already have a correct timestamp, as it may improve performance by processing fewer files. """) parser.add_argument('--skip-older-than-commit', '-N', default=False, action='store_true', help=""" Ignore files older than the timestamp it would be updated to. Such files may be considered "original", likely in the author's repository. """) parser.add_argument('--unique-times', default=False, action="store_true", help=""" Set the microseconds to a unique value per commit. Allows telling apart changes that would otherwise have identical timestamps, as git's time accuracy is in seconds. """) parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help=""" Only modify paths matching %(metavar)s, relative to current directory. By default, update all but untracked files and submodules. """) parser.add_argument('--version', '-V', action='version', version='%(prog)s version {version}'.format(version=get_version())) args_ = parser.parse_args() if args_.verbose: args_.loglevel = max(logging.TRACE, logging.DEBUG // args_.verbose) args_.debug = args_.loglevel <= logging.DEBUG return args_ def get_version(version=__version__): if not version.endswith('+dev'): return version try: cwd = os.path.dirname(os.path.realpath(__file__)) return Git(cwd=cwd, errors=False).describe().lstrip('v') except Git.Error: return '-'.join((version, "unknown")) # Helper functions ############################################################ def setup_logging(): """Add TRACE logging level and corresponding method, return the root logger""" logging.TRACE = TRACE = logging.DEBUG // 2 logging.Logger.trace = lambda _, m, *a, **k: _.log(TRACE, m, *a, **k) return logging.getLogger() def normalize(path): r"""Normalize paths from git, handling non-ASCII characters. Git stores paths as UTF-8 normalization form C. If path contains non-ASCII or non-printable characters, git outputs the UTF-8 in octal-escaped notation, escaping double-quotes and backslashes, and then double-quoting the whole path. https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath This function reverts this encoding, so: normalize(r'"Back\\slash_double\"quote_a\303\247a\303\255"') => r'Back\slash_double"quote_açaí') Paths with invalid UTF-8 encoding, such as single 0x80-0xFF bytes (e.g, from Latin1/Windows-1251 encoding) are decoded using surrogate escape, the same method used by Python for filesystem paths. 
So 0xE6 ("æ" in Latin1, r'\\346' from Git) is decoded as "\udce6". See https://peps.python.org/pep-0383/ and https://vstinner.github.io/painful-history-python-filesystem-encoding.html Also see notes on `windows/non-ascii-paths.txt` about path encodings on non-UTF-8 platforms and filesystems. """ if path and path[0] == '"': # Python 2: path = path[1:-1].decode("string-escape") # Python 3: https://stackoverflow.com/a/46650050/624066 path = (path[1:-1] # Remove enclosing double quotes .encode('latin1') # Convert to bytes, required by 'unicode-escape' .decode('unicode-escape') # Perform the actual octal-escaping decode .encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded .decode('utf8', 'surrogateescape')) # Decode from UTF-8 if NORMALIZE_PATHS: # Make sure the slash matches the OS; for Windows we need a backslash path = os.path.normpath(path) return path def dummy(*_args, **_kwargs): """No-op function used in dry-run tests""" def touch(path, mtime): """The actual mtime update""" os.utime(path, (mtime, mtime), **UTIME_KWS) def touch_ns(path, mtime_ns): """The actual mtime update, using nanoseconds for unique timestamps""" os.utime(path, None, ns=(mtime_ns, mtime_ns), **UTIME_KWS) def isodate(secs: int): # time.localtime() accepts floats, but discards fractional part return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs)) def isodate_ns(ns: int): # for integers fromtimestamp() is equivalent and ~16% slower than isodate() return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ') def get_mtime_ns(secs: int, idx: int): # Time resolution for filesystems and functions: # ext-4 and other POSIX filesystems: 1 nanosecond # NTFS (Windows default): 100 nanoseconds # datetime.datetime() (due to 64-bit float epoch): 1 microsecond us = idx % 1000000 # 10**6 return 1000 * (1000000 * secs + us) def get_mtime_path(path): return os.path.getmtime(path) # Git class and parse_log(), the heart of the script ########################## class Git: def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True): self.gitcmd = ['git'] self.errors = errors self._proc = None if workdir: self.gitcmd.extend(('--work-tree', workdir)) if gitdir: self.gitcmd.extend(('--git-dir', gitdir)) if cwd: self.gitcmd.extend(('-C', cwd)) self.workdir, self.gitdir = self._get_repo_dirs() def ls_files(self, paths: list = None): return (normalize(_) for _ in self._run('ls-files --full-name', paths)) def ls_dirty(self, force=False): return (normalize(_[3:].split(' -> ', 1)[-1]) for _ in self._run('status --porcelain') if _[:2] != '??' 
and (not force or (_[0] in ('R', 'A') or _[1] == 'D'))) def log(self, merge=False, first_parent=False, commit_time=False, reverse_order=False, paths: list = None): cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at') if merge: cmd += ' -m' if first_parent: cmd += ' --first-parent' if reverse_order: cmd += ' --reverse' return self._run(cmd, paths) def describe(self): return self._run('describe --tags', check=True)[0] def terminate(self): if self._proc is None: return try: self._proc.terminate() except OSError: # Avoid errors on OpenBSD pass def _get_repo_dirs(self): return (os.path.normpath(_) for _ in self._run('rev-parse --show-toplevel --absolute-git-dir', check=True)) def _run(self, cmdstr: str, paths: list = None, output=True, check=False): cmdlist = self.gitcmd + shlex.split(cmdstr) if paths: cmdlist.append('--') cmdlist.extend(paths) popen_args = dict(universal_newlines=True, encoding='utf8') if not self.errors: popen_args['stderr'] = subprocess.DEVNULL log.trace("Executing: %s", ' '.join(cmdlist)) if not output: return subprocess.call(cmdlist, **popen_args) if check: try: stdout: str = subprocess.check_output(cmdlist, **popen_args) return stdout.splitlines() except subprocess.CalledProcessError as e: raise self.Error(e.returncode, e.cmd, e.output, e.stderr) self._proc = subprocess.Popen(cmdlist, stdout=subprocess.PIPE, **popen_args) return (_.rstrip() for _ in self._proc.stdout) def __del__(self): self.terminate() class Error(subprocess.CalledProcessError): """Error from git executable""" def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None): mtime = 0 datestr = isodate(0) for line in git.log( merge, args.first_parent, args.commit_time, args.reverse_order, filterlist ): stats['loglines'] += 1 # Blank line between Date and list of files if not line: continue # Date line if line[0] != ':': # Faster than `not line.startswith(':')` stats['commits'] += 1 mtime = int(line) if args.unique_times: mtime = get_mtime_ns(mtime, stats['commits']) if args.debug: datestr = isodate(mtime) continue # File line: three tokens if it describes a renaming, otherwise two tokens = line.split('\t') # Possible statuses: # M: Modified (content changed) # A: Added (created) # D: Deleted # T: Type changed: to/from regular file, symlinks, submodules # R099: Renamed (moved), with % of unchanged content. 100 = pure rename # Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken status = tokens[0].split(' ')[-1] file = tokens[-1] # Handles non-ASCII chars and OS path separator file = normalize(file) def do_file(): if args.skip_older_than_commit and get_mtime_path(file) <= mtime: stats['skip'] += 1 return if args.debug: log.debug("%d\t%d\t%d\t%s\t%s", stats['loglines'], stats['commits'], stats['files'], datestr, file) try: touch(os.path.join(git.workdir, file), mtime) stats['touches'] += 1 except Exception as e: log.error("ERROR: %s: %s", e, file) stats['errors'] += 1 def do_dir(): if args.debug: log.debug("%d\t%d\t-\t%s\t%s", stats['loglines'], stats['commits'], datestr, "{}/".format(dirname or '.')) try: touch(os.path.join(git.workdir, dirname), mtime) stats['dirtouches'] += 1 except Exception as e: log.error("ERROR: %s: %s", e, dirname) stats['direrrors'] += 1 if file in filelist: stats['files'] -= 1 filelist.remove(file) do_file() if args.dirs and status in ('A', 'D'): dirname = os.path.dirname(file) if dirname in dirlist: dirlist.remove(dirname) do_dir() # All files done? 
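        # Once every tracked file has been touched there is nothing left to do:
        # terminating git below stops the streaming `whatchanged` subprocess so the
        # rest of a potentially huge log is never read.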
if not stats['files']: git.terminate() return # Main Logic ################################################################## def main(): start = time.time() # yes, Wall time. CPU time is not realistic for users. stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors', 'dirtouches', 'direrrors')} logging.basicConfig(level=args.loglevel, format='%(message)s') log.trace("Arguments: %s", args) # First things first: Where and Who are we? if args.cwd: log.debug("Changing directory: %s", args.cwd) try: os.chdir(args.cwd) except OSError as e: log.critical(e) return e.errno # Using both os.chdir() and `git -C` is redundant, but might prevent side effects # `git -C` alone could be enough if we make sure that: # - all paths, including args.pathspec, are processed by git: ls-files, rev-parse # - touch() / os.utime() path argument is always prepended with git.workdir try: git = Git(workdir=args.workdir, gitdir=args.gitdir, cwd=args.cwd) except Git.Error as e: # Not in a git repository, and git already informed user on stderr. So we just... return e.returncode # Get the files managed by git and build file list to be processed if UPDATE_SYMLINKS and not args.skip_older_than: filelist = set(git.ls_files(args.pathspec)) else: filelist = set() for path in git.ls_files(args.pathspec): fullpath = os.path.join(git.workdir, path) # Symlink (to file, to dir or broken - git handles the same way) if not UPDATE_SYMLINKS and os.path.islink(fullpath): log.warning("WARNING: Skipping symlink, no OS support for updates: %s", path) continue # skip files which are older than given threshold if (args.skip_older_than and start - get_mtime_path(fullpath) > args.skip_older_than): continue # Always add files relative to worktree root filelist.add(path) # If --force, silently ignore uncommitted deletions (not in the filesystem) # and renames / additions (will not be found in log anyway) if args.force: filelist -= set(git.ls_dirty(force=True)) # Otherwise, ignore any dirty files else: dirty = set(git.ls_dirty()) if dirty: log.warning("WARNING: Modified files in the working directory were ignored." "\nTo include such files, commit your changes or use --force.") filelist -= dirty # Build dir list to be processed dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set() stats['totalfiles'] = stats['files'] = len(filelist) log.info("{0:,} files to be processed in work dir".format(stats['totalfiles'])) if not filelist: # Nothing to do. Exit silently and without errors, just like git does return # Process the log until all files are 'touched' log.debug("Line #\tLog #\tF.Left\tModification Time\tFile Name") parse_log(filelist, dirlist, stats, git, args.merge, args.pathspec) # Missing files if filelist: # Try to find them in merge logs, if not done already # (usually HUGE, thus MUCH slower!) if args.missing and not args.merge: filterlist = list(filelist) missing = len(filterlist) log.info("{0:,} files not found in log, trying merge commits".format(missing)) for i in range(0, missing, STEPMISSING): parse_log(filelist, dirlist, stats, git, merge=True, filterlist=filterlist[i:i + STEPMISSING]) # Still missing some? 
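    # Anything still left in filelist was never seen in the processed log
    # (for example, files whose only history is in merge commits when the
    # merge-commit retry is skipped), so just warn about each of them.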
for file in filelist: log.warning("WARNING: not found in the log: %s", file) # Final statistics # Suggestion: use git-log --before=mtime to brag about skipped log entries def log_info(msg, *a, width=13): ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt ffmt = '{:%d,.2f}' % (width,) # %-formatting lacks a thousand separator, must pre-render with .format() log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a)) log_info( "Statistics:\n" "%f seconds\n" "%d log lines processed\n" "%d commits evaluated", time.time() - start, stats['loglines'], stats['commits']) if args.dirs: if stats['direrrors']: log_info("%d directory update errors", stats['direrrors']) log_info("%d directories updated", stats['dirtouches']) if stats['touches'] != stats['totalfiles']: log_info("%d files", stats['totalfiles']) if stats['skip']: log_info("%d files skipped", stats['skip']) if stats['files']: log_info("%d files missing", stats['files']) if stats['errors']: log_info("%d file update errors", stats['errors']) log_info("%d files updated", stats['touches']) if args.test: log.info("TEST RUN - No files modified!") # Keep only essential, global assignments here. Any other logic must be in main() log = setup_logging() args = parse_args() # Set the actual touch() and other functions based on command-line arguments if args.unique_times: touch = touch_ns isodate = isodate_ns # Make sure this is always set last to ensure --test behaves as intended if args.test: touch = dummy # UI done, it's showtime! try: sys.exit(main()) except KeyboardInterrupt: log.info("\nAborting") signal.signal(signal.SIGINT, signal.SIG_DFL) os.kill(os.getpid(), signal.SIGINT) ``` ## /.github/workflows/.codespell-exclude ```codespell-exclude path="/.github/workflows/.codespell-exclude" libs/community/langchain_community/llms/yuan2.py "NotIn": "not in", - `/checkin`: Check-in docs/docs/integrations/providers/trulens.mdx self.assertIn( from trulens_eval import Tru tru = Tru() ``` ## /.github/workflows/_compile_integration_test.yml ```yml path="/.github/workflows/_compile_integration_test.yml" name: compile-integration-test on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" python-version: required: true type: string description: "Python version to use" env: UV_FROZEN: "true" jobs: build: defaults: run: working-directory: ${{ inputs.working-directory }} runs-on: ubuntu-latest timeout-minutes: 20 name: "uv run pytest -m compile tests/integration_tests #${{ inputs.python-version }}" steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ inputs.python-version }} - name: Install integration dependencies shell: bash run: uv sync --group test --group test_integration - name: Check integration tests compile shell: bash run: uv run pytest -m compile tests/integration_tests - name: Ensure the tests did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. 
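          # (A locale-independent alternative to grepping the human-readable
          # output would be: test -z "$(git status --porcelain)")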
echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.github/workflows/_integration_test.yml ```yml path="/.github/workflows/_integration_test.yml" name: Integration tests on: workflow_dispatch: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" python-version: required: true type: string description: "Python version to use" env: UV_FROZEN: "true" jobs: build: defaults: run: working-directory: ${{ inputs.working-directory }} runs-on: ubuntu-latest name: Python ${{ inputs.python-version }} steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ inputs.python-version }} - name: Install dependencies shell: bash run: uv sync --group test --group test_integration - name: Install deps outside pyproject if: ${{ startsWith(inputs.working-directory, 'libs/community/') }} shell: bash run: VIRTUAL_ENV=.venv uv pip install "boto3<2" "google-cloud-aiplatform<2" - name: Run integration tests shell: bash env: AI21_API_KEY: ${{ secrets.AI21_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }} AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }} AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }} AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }} AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }} GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }} GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }} HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }} EXA_API_KEY: ${{ secrets.EXA_API_KEY }} NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }} WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }} WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }} ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }} ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }} ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }} ES_URL: ${{ secrets.ES_URL }} ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }} ES_API_KEY: ${{ secrets.ES_API_KEY }} MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }} VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }} COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }} run: | make integration_tests - name: Ensure the tests did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. 
echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.github/workflows/_lint.yml ```yml path="/.github/workflows/_lint.yml" name: lint on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" python-version: required: true type: string description: "Python version to use" env: WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }} # This env var allows us to get inline annotations when ruff has complaints. RUFF_OUTPUT_FORMAT: github UV_FROZEN: "true" jobs: build: name: "make lint #${{ inputs.python-version }}" runs-on: ubuntu-latest timeout-minutes: 20 steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ inputs.python-version }} - name: Install dependencies # Also installs dev/lint/test/typing dependencies, to ensure we have # type hints for as many of our libraries as possible. # This helps catch errors that require dependencies to be spotted, for example: # https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341 # # If you change this configuration, make sure to change the `cache-key` # in the `poetry_setup` action above to stop using the old cache. # It doesn't matter how you change it, any change will cause a cache-bust. working-directory: ${{ inputs.working-directory }} run: | uv sync --group lint --group typing - name: Analysing the code with our lint working-directory: ${{ inputs.working-directory }} run: | make lint_package - name: Install unit test dependencies # Also installs dev/lint/test/typing dependencies, to ensure we have # type hints for as many of our libraries as possible. # This helps catch errors that require dependencies to be spotted, for example: # https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341 # # If you change this configuration, make sure to change the `cache-key` # in the `poetry_setup` action above to stop using the old cache. # It doesn't matter how you change it, any change will cause a cache-bust. if: ${{ ! 
startsWith(inputs.working-directory, 'libs/partners/') }} working-directory: ${{ inputs.working-directory }} run: | uv sync --inexact --group test - name: Install unit+integration test dependencies if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }} working-directory: ${{ inputs.working-directory }} run: | uv sync --inexact --group test --group test_integration - name: Analysing the code with our lint working-directory: ${{ inputs.working-directory }} run: | make lint_tests ``` ## /.github/workflows/_release.yml ```yml path="/.github/workflows/_release.yml" name: release run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }} on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" workflow_dispatch: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" default: 'libs/langchain' dangerous-nonmaster-release: required: false type: boolean default: false description: "Release from a non-master branch (danger!)" env: PYTHON_VERSION: "3.11" UV_FROZEN: "true" UV_NO_SYNC: "true" jobs: build: if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release environment: Scheduled testing runs-on: ubuntu-latest outputs: pkg-name: ${{ steps.check-version.outputs.pkg-name }} version: ${{ steps.check-version.outputs.version }} steps: - uses: actions/checkout@v4 - name: Set up Python + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ env.PYTHON_VERSION }} # We want to keep this build stage *separate* from the release stage, # so that there's no sharing of permissions between them. # The release stage has trusted publishing and GitHub repo contents write access, # and we want to keep the scope of that access limited just to the release job. # Otherwise, a malicious `build` step (e.g. via a compromised dependency) # could get access to our GitHub or PyPI credentials. # # Per the trusted publishing GitHub Action: # > It is strongly advised to separate jobs for building [...] # > from the publish job. 
# https://github.com/pypa/gh-action-pypi-publish#non-goals - name: Build project for distribution run: uv build working-directory: ${{ inputs.working-directory }} - name: Upload build uses: actions/upload-artifact@v4 with: name: dist path: ${{ inputs.working-directory }}/dist/ - name: Check Version id: check-version shell: python working-directory: ${{ inputs.working-directory }} run: | import os import tomllib with open("pyproject.toml", "rb") as f: data = tomllib.load(f) pkg_name = data["project"]["name"] version = data["project"]["version"] with open(os.environ["GITHUB_OUTPUT"], "a") as f: f.write(f"pkg-name={pkg_name}\n") f.write(f"version={version}\n") release-notes: needs: - build runs-on: ubuntu-latest outputs: release-body: ${{ steps.generate-release-body.outputs.release-body }} steps: - uses: actions/checkout@v4 with: repository: langchain-ai/langchain path: langchain sparse-checkout: | # this only grabs files for relevant dir ${{ inputs.working-directory }} ref: ${{ github.ref }} # this scopes to just ref'd branch fetch-depth: 0 # this fetches entire commit history - name: Check Tags id: check-tags shell: bash working-directory: langchain/${{ inputs.working-directory }} env: PKG_NAME: ${{ needs.build.outputs.pkg-name }} VERSION: ${{ needs.build.outputs.version }} run: | # Handle regular versions and pre-release versions differently if [[ "$VERSION" == *"-"* ]]; then # This is a pre-release version (contains a hyphen) # Extract the base version without the pre-release suffix BASE_VERSION=${VERSION%%-*} # Look for the latest release of the same base version REGEX="^$PKG_NAME==$BASE_VERSION\$" PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1) # If no exact base version match, look for the latest release of any kind if [ -z "$PREV_TAG" ]; then REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$" PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1) fi else # Regular version handling PREV_TAG="$PKG_NAME==${VERSION%.*}.$(( ${VERSION##*.} - 1 ))"; [[ "${VERSION##*.}" -eq 0 ]] && PREV_TAG="" # backup case if releasing e.g. 0.3.0, looks up last release # note if last release (chronologically) was e.g. 
0.1.47 it will get # that instead of the last 0.2 release if [ -z "$PREV_TAG" ]; then REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$" echo $REGEX PREV_TAG=$(git tag --sort=-creatordate | (grep -P $REGEX || true) | head -1) fi fi # if PREV_TAG is empty, let it be empty if [ -z "$PREV_TAG" ]; then echo "No previous tag found - first release" else # confirm prev-tag actually exists in git repo with git tag GIT_TAG_RESULT=$(git tag -l "$PREV_TAG") if [ -z "$GIT_TAG_RESULT" ]; then echo "Previous tag $PREV_TAG not found in git repo" exit 1 fi fi TAG="${PKG_NAME}==${VERSION}" if [ "$TAG" == "$PREV_TAG" ]; then echo "No new version to release" exit 1 fi echo tag="$TAG" >> $GITHUB_OUTPUT echo prev-tag="$PREV_TAG" >> $GITHUB_OUTPUT - name: Generate release body id: generate-release-body working-directory: langchain env: WORKING_DIR: ${{ inputs.working-directory }} PKG_NAME: ${{ needs.build.outputs.pkg-name }} TAG: ${{ steps.check-tags.outputs.tag }} PREV_TAG: ${{ steps.check-tags.outputs.prev-tag }} run: | PREAMBLE="Changes since $PREV_TAG" # if PREV_TAG is empty, then we are releasing the first version if [ -z "$PREV_TAG" ]; then PREAMBLE="Initial release" PREV_TAG=$(git rev-list --max-parents=0 HEAD) fi { echo 'release-body<> "$GITHUB_OUTPUT" test-pypi-publish: needs: - build - release-notes uses: ./.github/workflows/_test_release.yml permissions: write-all with: working-directory: ${{ inputs.working-directory }} dangerous-nonmaster-release: ${{ inputs.dangerous-nonmaster-release }} secrets: inherit pre-release-checks: needs: - build - release-notes - test-pypi-publish runs-on: ubuntu-latest timeout-minutes: 20 steps: - uses: actions/checkout@v4 # We explicitly *don't* set up caching here. This ensures our tests are # maximally sensitive to catching breakage. # # For example, here's a way that caching can cause a falsely-passing test: # - Make the langchain package manifest no longer list a dependency package # as a requirement. This means it won't be installed by `pip install`, # and attempting to use it would cause a crash. # - That dependency used to be required, so it may have been cached. # When restoring the venv packages from cache, that dependency gets included. # - Tests pass, because the dependency is present even though it wasn't specified. # - The package is published, and it breaks on the missing dependency when # used in the real world. - name: Set up Python + uv uses: "./.github/actions/uv_setup" id: setup-python with: python-version: ${{ env.PYTHON_VERSION }} - uses: actions/download-artifact@v4 with: name: dist path: ${{ inputs.working-directory }}/dist/ - name: Import dist package shell: bash working-directory: ${{ inputs.working-directory }} env: PKG_NAME: ${{ needs.build.outputs.pkg-name }} VERSION: ${{ needs.build.outputs.version }} # Here we use: # - The default regular PyPI index as the *primary* index, meaning # that it takes priority (https://pypi.org/simple) # - The test PyPI index as an extra index, so that any dependencies that # are not found on test PyPI can be resolved and installed anyway. # (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION # package because VERSION will not have been uploaded to regular PyPI yet. # - attempt install again after 5 seconds if it fails because there is # sometimes a delay in availability on test pypi run: | uv venv VIRTUAL_ENV=.venv uv pip install dist/*.whl # Replace all dashes in the package name with underscores, # since that's how Python imports packages with dashes in the name. 
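          # (e.g. "langchain-openai" becomes importable as "langchain_openai")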
# also remove _official suffix IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g | sed s/_official//g)" uv run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))" - name: Import test dependencies run: uv sync --group test working-directory: ${{ inputs.working-directory }} # Overwrite the local version of the package with the built version - name: Import published package (again) working-directory: ${{ inputs.working-directory }} shell: bash env: PKG_NAME: ${{ needs.build.outputs.pkg-name }} VERSION: ${{ needs.build.outputs.version }} run: | VIRTUAL_ENV=.venv uv pip install dist/*.whl - name: Run unit tests run: make tests working-directory: ${{ inputs.working-directory }} - name: Check for prerelease versions working-directory: ${{ inputs.working-directory }} run: | uv run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml - name: Get minimum versions working-directory: ${{ inputs.working-directory }} id: min-version run: | VIRTUAL_ENV=.venv uv pip install packaging requests python_version="$(uv run python --version | awk '{print $2}')" min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)" echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT" echo "min-versions=$min_versions" - name: Run unit tests with minimum dependency versions if: ${{ steps.min-version.outputs.min-versions != '' }} env: MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }} run: | VIRTUAL_ENV=.venv uv pip install --force-reinstall $MIN_VERSIONS --editable . make tests working-directory: ${{ inputs.working-directory }} - name: Import integration test dependencies run: uv sync --group test --group test_integration working-directory: ${{ inputs.working-directory }} - name: Run integration tests if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }} env: AI21_API_KEY: ${{ secrets.AI21_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }} AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }} AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }} AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }} AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }} NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }} GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }} GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }} GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }} EXA_API_KEY: ${{ secrets.EXA_API_KEY }} NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }} WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }} WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }} ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }} ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }} ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }} ES_URL: ${{ secrets.ES_URL }} ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }} ES_API_KEY: ${{ secrets.ES_API_KEY }} MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }} VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }} UPSTAGE_API_KEY: 
${{ secrets.UPSTAGE_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }} PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }} run: make integration_tests working-directory: ${{ inputs.working-directory }} # Test select published packages against new core test-prior-published-packages-against-new-core: needs: - build - release-notes - test-pypi-publish - pre-release-checks runs-on: ubuntu-latest strategy: matrix: partner: [openai, anthropic] fail-fast: false # Continue testing other partners if one fails env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }} AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }} AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }} AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }} AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }} steps: - uses: actions/checkout@v4 # We implement this conditional as Github Actions does not have good support # for conditionally needing steps. https://github.com/actions/runner/issues/491 - name: Check if libs/core run: | if [ "${{ startsWith(inputs.working-directory, 'libs/core') }}" != "true" ]; then echo "Not in libs/core. Exiting successfully." exit 0 fi - name: Set up Python + uv if: startsWith(inputs.working-directory, 'libs/core') uses: "./.github/actions/uv_setup" with: python-version: ${{ env.PYTHON_VERSION }} - uses: actions/download-artifact@v4 if: startsWith(inputs.working-directory, 'libs/core') with: name: dist path: ${{ inputs.working-directory }}/dist/ - name: Test against ${{ matrix.partner }} if: startsWith(inputs.working-directory, 'libs/core') run: | # Identify latest tag LATEST_PACKAGE_TAG="$( git ls-remote --tags origin "langchain-${{ matrix.partner }}*" \ | awk '{print $2}' \ | sed 's|refs/tags/||' \ | sort -Vr \ | head -n 1 )" echo "Latest package tag: $LATEST_PACKAGE_TAG" # Shallow-fetch just that single tag git fetch --depth=1 origin tag "$LATEST_PACKAGE_TAG" # Checkout the latest package files rm -rf $GITHUB_WORKSPACE/libs/partners/${{ matrix.partner }}/* rm -rf $GITHUB_WORKSPACE/libs/standard-tests/* cd $GITHUB_WORKSPACE/libs/ git checkout "$LATEST_PACKAGE_TAG" -- standard-tests/ git checkout "$LATEST_PACKAGE_TAG" -- partners/${{ matrix.partner }}/ cd partners/${{ matrix.partner }} # Print as a sanity check echo "Version number from pyproject.toml: " cat pyproject.toml | grep "version = " # Run tests uv sync --group test --group test_integration uv pip install ../../core/dist/*.whl make integration_tests publish: needs: - build - release-notes - test-pypi-publish - pre-release-checks - test-prior-published-packages-against-new-core runs-on: ubuntu-latest permissions: # This permission is used for trusted publishing: # https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/ # # Trusted publishing has to also be configured on PyPI for each package: # https://docs.pypi.org/trusted-publishers/adding-a-publisher/ id-token: write defaults: run: working-directory: ${{ inputs.working-directory }} steps: - uses: actions/checkout@v4 - name: Set up Python + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ 
env.PYTHON_VERSION }} - uses: actions/download-artifact@v4 with: name: dist path: ${{ inputs.working-directory }}/dist/ - name: Publish package distributions to PyPI uses: pypa/gh-action-pypi-publish@release/v1 with: packages-dir: ${{ inputs.working-directory }}/dist/ verbose: true print-hash: true # Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0 attestations: false mark-release: needs: - build - release-notes - test-pypi-publish - pre-release-checks - publish runs-on: ubuntu-latest permissions: # This permission is needed by `ncipollo/release-action` to # create the GitHub release. contents: write defaults: run: working-directory: ${{ inputs.working-directory }} steps: - uses: actions/checkout@v4 - name: Set up Python + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ env.PYTHON_VERSION }} - uses: actions/download-artifact@v4 with: name: dist path: ${{ inputs.working-directory }}/dist/ - name: Create Tag uses: ncipollo/release-action@v1 with: artifacts: "dist/*" token: ${{ secrets.GITHUB_TOKEN }} generateReleaseNotes: false tag: ${{needs.build.outputs.pkg-name}}==${{ needs.build.outputs.version }} body: ${{ needs.release-notes.outputs.release-body }} commit: ${{ github.sha }} makeLatest: ${{ needs.build.outputs.pkg-name == 'langchain-core'}} ``` ## /.github/workflows/_test.yml ```yml path="/.github/workflows/_test.yml" name: test on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" python-version: required: true type: string description: "Python version to use" env: UV_FROZEN: "true" UV_NO_SYNC: "true" jobs: build: defaults: run: working-directory: ${{ inputs.working-directory }} runs-on: ubuntu-latest timeout-minutes: 20 name: "make test #${{ inputs.python-version }}" steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" id: setup-python with: python-version: ${{ inputs.python-version }} - name: Install dependencies shell: bash run: uv sync --group test --dev - name: Run core tests shell: bash run: | make test - name: Get minimum versions working-directory: ${{ inputs.working-directory }} id: min-version shell: bash run: | VIRTUAL_ENV=.venv uv pip install packaging tomli requests python_version="$(uv run python --version | awk '{print $2}')" min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)" echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT" echo "min-versions=$min_versions" - name: Run unit tests with minimum dependency versions if: ${{ steps.min-version.outputs.min-versions != '' }} env: MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }} run: | VIRTUAL_ENV=.venv uv pip install $MIN_VERSIONS make tests working-directory: ${{ inputs.working-directory }} - name: Ensure the tests did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. 
echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.github/workflows/_test_doc_imports.yml ```yml path="/.github/workflows/_test_doc_imports.yml" name: test_doc_imports on: workflow_call: inputs: python-version: required: true type: string description: "Python version to use" env: UV_FROZEN: "true" jobs: build: runs-on: ubuntu-latest timeout-minutes: 20 name: "check doc imports #${{ inputs.python-version }}" steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ inputs.python-version }} - name: Install dependencies shell: bash run: uv sync --group test - name: Install langchain editable run: | VIRTUAL_ENV=.venv uv pip install langchain-experimental -e libs/core libs/langchain libs/community - name: Check doc imports shell: bash run: | uv run python docs/scripts/check_imports.py - name: Ensure the test did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.github/workflows/_test_pydantic.yml ```yml path="/.github/workflows/_test_pydantic.yml" name: test pydantic intermediate versions on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" python-version: required: false type: string description: "Python version to use" default: "3.11" pydantic-version: required: true type: string description: "Pydantic version to test." env: UV_FROZEN: "true" UV_NO_SYNC: "true" jobs: build: defaults: run: working-directory: ${{ inputs.working-directory }} runs-on: ubuntu-latest timeout-minutes: 20 name: "make test # pydantic: ~=${{ inputs.pydantic-version }}, python: ${{ inputs.python-version }}, " steps: - uses: actions/checkout@v4 - name: Set up Python ${{ inputs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ inputs.python-version }} - name: Install dependencies shell: bash run: uv sync --group test - name: Overwrite pydantic version shell: bash run: VIRTUAL_ENV=.venv uv pip install pydantic~=${{ inputs.pydantic-version }} - name: Run core tests shell: bash run: | make test - name: Ensure the tests did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. 
echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.github/workflows/_test_release.yml ```yml path="/.github/workflows/_test_release.yml" name: test-release on: workflow_call: inputs: working-directory: required: true type: string description: "From which folder this pipeline executes" dangerous-nonmaster-release: required: false type: boolean default: false description: "Release from a non-master branch (danger!)" env: PYTHON_VERSION: "3.11" UV_FROZEN: "true" jobs: build: if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release runs-on: ubuntu-latest outputs: pkg-name: ${{ steps.check-version.outputs.pkg-name }} version: ${{ steps.check-version.outputs.version }} steps: - uses: actions/checkout@v4 - name: Set up Python + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ env.PYTHON_VERSION }} # We want to keep this build stage *separate* from the release stage, # so that there's no sharing of permissions between them. # The release stage has trusted publishing and GitHub repo contents write access, # and we want to keep the scope of that access limited just to the release job. # Otherwise, a malicious `build` step (e.g. via a compromised dependency) # could get access to our GitHub or PyPI credentials. # # Per the trusted publishing GitHub Action: # > It is strongly advised to separate jobs for building [...] # > from the publish job. # https://github.com/pypa/gh-action-pypi-publish#non-goals - name: Build project for distribution run: uv build working-directory: ${{ inputs.working-directory }} - name: Upload build uses: actions/upload-artifact@v4 with: name: test-dist path: ${{ inputs.working-directory }}/dist/ - name: Check Version id: check-version shell: python working-directory: ${{ inputs.working-directory }} run: | import os import tomllib with open("pyproject.toml", "rb") as f: data = tomllib.load(f) pkg_name = data["project"]["name"] version = data["project"]["version"] with open(os.environ["GITHUB_OUTPUT"], "a") as f: f.write(f"pkg-name={pkg_name}\n") f.write(f"version={version}\n") publish: needs: - build runs-on: ubuntu-latest permissions: # This permission is used for trusted publishing: # https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/ # # Trusted publishing has to also be configured on PyPI for each package: # https://docs.pypi.org/trusted-publishers/adding-a-publisher/ id-token: write steps: - uses: actions/checkout@v4 - uses: actions/download-artifact@v4 with: name: test-dist path: ${{ inputs.working-directory }}/dist/ - name: Publish to test PyPI uses: pypa/gh-action-pypi-publish@release/v1 with: packages-dir: ${{ inputs.working-directory }}/dist/ verbose: true print-hash: true repository-url: https://test.pypi.org/legacy/ # We overwrite any existing distributions with the same name and version. # This is *only for CI use* and is *extremely dangerous* otherwise! 
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates skip-existing: true # Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0 attestations: false ``` ## /.github/workflows/api_doc_build.yml ```yml path="/.github/workflows/api_doc_build.yml" name: API docs build on: workflow_dispatch: schedule: - cron: '0 13 * * *' env: PYTHON_VERSION: "3.11" jobs: build: if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule' runs-on: ubuntu-latest permissions: write-all steps: - uses: actions/checkout@v4 with: path: langchain - uses: actions/checkout@v4 with: repository: langchain-ai/langchain-api-docs-html path: langchain-api-docs-html token: ${{ secrets.TOKEN_GITHUB_API_DOCS_HTML }} - name: Get repos with yq id: get-unsorted-repos uses: mikefarah/yq@master with: cmd: | yq ' .packages[] | select( ( (.repo | test("^langchain-ai/")) and (.repo != "langchain-ai/langchain") ) or (.include_in_api_ref // false) ) | .repo ' langchain/libs/packages.yml - name: Parse YAML and checkout repos env: REPOS_UNSORTED: ${{ steps.get-unsorted-repos.outputs.result }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} run: | # Get unique repositories REPOS=$(echo "$REPOS_UNSORTED" | sort -u) # Checkout each unique repository that is in langchain-ai org for repo in $REPOS; do REPO_NAME=$(echo $repo | cut -d'/' -f2) echo "Checking out $repo to $REPO_NAME" git clone --depth 1 https://github.com/$repo.git $REPO_NAME done - name: Setup python ${{ env.PYTHON_VERSION }} uses: actions/setup-python@v5 id: setup-python with: python-version: ${{ env.PYTHON_VERSION }} - name: Install initial py deps working-directory: langchain run: | python -m pip install -U uv python -m uv pip install --upgrade --no-cache-dir pip setuptools pyyaml - name: Move libs with script run: python langchain/.github/scripts/prep_api_docs_build.py env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - name: Rm old html run: rm -rf langchain-api-docs-html/api_reference_build/html - name: Install dependencies working-directory: langchain run: | python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests python -m uv pip install -r docs/api_reference/requirements.txt - name: Set Git config working-directory: langchain run: | git config --local user.email "actions@github.com" git config --local user.name "Github Actions" - name: Build docs working-directory: langchain run: | python docs/api_reference/create_api_rst.py python -m sphinx -T -E -b html -d ../langchain-api-docs-html/_build/doctrees -c docs/api_reference docs/api_reference ../langchain-api-docs-html/api_reference_build/html -j auto python docs/api_reference/scripts/custom_formatter.py ../langchain-api-docs-html/api_reference_build/html # Default index page is blank so we copy in the actual home page. 
cp ../langchain-api-docs-html/api_reference_build/html/{reference,index}.html rm -rf ../langchain-api-docs-html/_build/ # https://github.com/marketplace/actions/add-commit - uses: EndBug/add-and-commit@v9 with: cwd: langchain-api-docs-html message: 'Update API docs build' ``` ## /.github/workflows/check-broken-links.yml ```yml path="/.github/workflows/check-broken-links.yml" name: Check Broken Links on: workflow_dispatch: schedule: - cron: '0 13 * * *' jobs: check-links: if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule' runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Use Node.js 18.x uses: actions/setup-node@v3 with: node-version: 18.x cache: "yarn" cache-dependency-path: ./docs/yarn.lock - name: Install dependencies run: yarn install --immutable --mode=skip-build working-directory: ./docs - name: Check broken links run: yarn check-broken-links working-directory: ./docs ``` ## /.github/workflows/check_core_versions.yml ```yml path="/.github/workflows/check_core_versions.yml" name: Check `langchain-core` version equality on: pull_request: paths: - 'libs/core/pyproject.toml' - 'libs/core/langchain_core/version.py' jobs: check_version_equality: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Check version equality run: | PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' libs/core/pyproject.toml) VERSION_PY_VERSION=$(grep -Po '(?<=^VERSION = ")[^"]*' libs/core/langchain_core/version.py) # Compare the two versions if [ "$PYPROJECT_VERSION" != "$VERSION_PY_VERSION" ]; then echo "langchain-core versions in pyproject.toml and version.py do not match!" echo "pyproject.toml version: $PYPROJECT_VERSION" echo "version.py version: $VERSION_PY_VERSION" exit 1 else echo "Versions match: $PYPROJECT_VERSION" fi ``` ## /.github/workflows/check_diffs.yml ```yml path="/.github/workflows/check_diffs.yml" name: CI on: push: branches: [master] pull_request: merge_group: # If another push to the same PR or branch happens while this workflow is still running, # cancel the earlier run in favor of the next run. # # There's no point in testing an outdated version of the code. GitHub only allows # a limited number of job runners to be active at the same time, so it's better to cancel # pointless jobs early so that more useful jobs can run sooner. 
concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true env: UV_FROZEN: "true" UV_NO_SYNC: "true" jobs: build: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: python-version: '3.11' - id: files uses: Ana06/get-changed-files@v2.2.0 - id: set-matrix run: | python -m pip install packaging requests python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT outputs: lint: ${{ steps.set-matrix.outputs.lint }} test: ${{ steps.set-matrix.outputs.test }} extended-tests: ${{ steps.set-matrix.outputs.extended-tests }} compile-integration-tests: ${{ steps.set-matrix.outputs.compile-integration-tests }} dependencies: ${{ steps.set-matrix.outputs.dependencies }} test-doc-imports: ${{ steps.set-matrix.outputs.test-doc-imports }} test-pydantic: ${{ steps.set-matrix.outputs.test-pydantic }} lint: name: cd ${{ matrix.job-configs.working-directory }} needs: [ build ] if: ${{ needs.build.outputs.lint != '[]' }} strategy: matrix: job-configs: ${{ fromJson(needs.build.outputs.lint) }} fail-fast: false uses: ./.github/workflows/_lint.yml with: working-directory: ${{ matrix.job-configs.working-directory }} python-version: ${{ matrix.job-configs.python-version }} secrets: inherit test: name: cd ${{ matrix.job-configs.working-directory }} needs: [ build ] if: ${{ needs.build.outputs.test != '[]' }} strategy: matrix: job-configs: ${{ fromJson(needs.build.outputs.test) }} fail-fast: false uses: ./.github/workflows/_test.yml with: working-directory: ${{ matrix.job-configs.working-directory }} python-version: ${{ matrix.job-configs.python-version }} secrets: inherit test-pydantic: name: cd ${{ matrix.job-configs.working-directory }} needs: [ build ] if: ${{ needs.build.outputs.test-pydantic != '[]' }} strategy: matrix: job-configs: ${{ fromJson(needs.build.outputs.test-pydantic) }} fail-fast: false uses: ./.github/workflows/_test_pydantic.yml with: working-directory: ${{ matrix.job-configs.working-directory }} pydantic-version: ${{ matrix.job-configs.pydantic-version }} secrets: inherit test-doc-imports: needs: [ build ] if: ${{ needs.build.outputs.test-doc-imports != '[]' }} strategy: matrix: job-configs: ${{ fromJson(needs.build.outputs.test-doc-imports) }} fail-fast: false uses: ./.github/workflows/_test_doc_imports.yml secrets: inherit with: python-version: ${{ matrix.job-configs.python-version }} compile-integration-tests: name: cd ${{ matrix.job-configs.working-directory }} needs: [ build ] if: ${{ needs.build.outputs.compile-integration-tests != '[]' }} strategy: matrix: job-configs: ${{ fromJson(needs.build.outputs.compile-integration-tests) }} fail-fast: false uses: ./.github/workflows/_compile_integration_test.yml with: working-directory: ${{ matrix.job-configs.working-directory }} python-version: ${{ matrix.job-configs.python-version }} secrets: inherit extended-tests: name: "cd ${{ matrix.job-configs.working-directory }} / make extended_tests #${{ matrix.job-configs.python-version }}" needs: [ build ] if: ${{ needs.build.outputs.extended-tests != '[]' }} strategy: matrix: # note different variable for extended test dirs job-configs: ${{ fromJson(needs.build.outputs.extended-tests) }} fail-fast: false runs-on: ubuntu-latest timeout-minutes: 20 defaults: run: working-directory: ${{ matrix.job-configs.working-directory }} steps: - uses: actions/checkout@v4 - name: Set up Python ${{ matrix.job-configs.python-version }} + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ 
matrix.job-configs.python-version }} - name: Install dependencies and run extended tests shell: bash run: | echo "Running extended tests, installing dependencies with uv..." uv venv uv sync --group test VIRTUAL_ENV=.venv uv pip install -r extended_testing_deps.txt VIRTUAL_ENV=.venv make extended_tests - name: Ensure the tests did not create any additional files shell: bash run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. echo "$STATUS" | grep 'nothing to commit, working tree clean' ci_success: name: "CI Success" needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic] if: | always() runs-on: ubuntu-latest env: JOBS_JSON: ${{ toJSON(needs) }} RESULTS_JSON: ${{ toJSON(needs.*.result) }} EXIT_CODE: ${{!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') && '0' || '1'}} steps: - name: "CI Success" run: | echo $JOBS_JSON echo $RESULTS_JSON echo "Exiting with $EXIT_CODE" exit $EXIT_CODE ``` ## /.github/workflows/check_new_docs.yml ```yml path="/.github/workflows/check_new_docs.yml" name: Integration docs lint on: push: branches: [master] pull_request: # If another push to the same PR or branch happens while this workflow is still running, # cancel the earlier run in favor of the next run. # # There's no point in testing an outdated version of the code. GitHub only allows # a limited number of job runners to be active at the same time, so it's better to cancel # pointless jobs early so that more useful jobs can run sooner. concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true jobs: build: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: python-version: '3.10' - id: files uses: Ana06/get-changed-files@v2.2.0 with: filter: | *.ipynb *.md *.mdx - name: Check new docs run: | python docs/scripts/check_templates.py ${{ steps.files.outputs.added }} ``` ## /.github/workflows/codespell.yml ```yml path="/.github/workflows/codespell.yml" name: CI / cd . / make spell_check on: push: branches: [master, v0.1, v0.2] pull_request: permissions: contents: read jobs: codespell: name: (Check for spelling errors) runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 - name: Install Dependencies run: | pip install toml - name: Extract Ignore Words List run: | # Use a Python script to extract the ignore words list from pyproject.toml python .github/workflows/extract_ignored_words_list.py id: extract_ignore_words # - name: Codespell # uses: codespell-project/actions-codespell@v2 # with: # skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv,*.lock # ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }} # exclude_file: ./.github/workflows/codespell-exclude ``` ## /.github/workflows/codspeed.yml ```yml path="/.github/workflows/codspeed.yml" name: CodSpeed on: push: branches: - master pull_request: paths: - 'libs/core/**' # `workflow_dispatch` allows CodSpeed to trigger backtest # performance analysis in order to generate initial data. 
workflow_dispatch: jobs: codspeed: name: Run benchmarks if: (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-codspeed-benchmarks')) || github.event_name == 'workflow_dispatch' || github.event_name == 'push' runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 # We have to use 3.12, 3.13 is not yet supported - name: Install uv uses: astral-sh/setup-uv@v5 with: python-version: "3.12" # Using this action is still necessary for CodSpeed to work - uses: actions/setup-python@v3 with: python-version: "3.12" - name: install deps run: uv sync --group test working-directory: ./libs/core - name: Run benchmarks uses: CodSpeedHQ/action@v3 with: token: ${{ secrets.CODSPEED_TOKEN }} run: | cd libs/core uv run --no-sync pytest ./tests/benchmarks --codspeed mode: walltime ``` ## /.github/workflows/extract_ignored_words_list.py ```py path="/.github/workflows/extract_ignored_words_list.py" import toml pyproject_toml = toml.load("pyproject.toml") # Extract the ignore words list (adjust the key as per your TOML structure) ignore_words_list = ( pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list") ) print(f"::set-output name=ignore_words_list::{ignore_words_list}") ``` ## /.github/workflows/people.yml ```yml path="/.github/workflows/people.yml" name: LangChain People on: schedule: - cron: "0 14 1 * *" push: branches: [jacob/people] workflow_dispatch: jobs: langchain-people: if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule' runs-on: ubuntu-latest permissions: write-all steps: - name: Dump GitHub context env: GITHUB_CONTEXT: ${{ toJson(github) }} run: echo "$GITHUB_CONTEXT" - uses: actions/checkout@v4 # Ref: https://github.com/actions/runner/issues/2033 - name: Fix git safe.directory in container run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig - uses: ./.github/actions/people with: token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }} ``` ## /.github/workflows/run_notebooks.yml ```yml path="/.github/workflows/run_notebooks.yml" name: Run notebooks on: workflow_dispatch: inputs: python_version: description: 'Python version' required: false default: '3.11' working-directory: description: 'Working directory or subset (e.g., docs/docs/tutorials/llm_chain.ipynb or docs/docs/how_to)' required: false default: 'all' schedule: - cron: '0 13 * * *' env: UV_FROZEN: "true" jobs: build: runs-on: ubuntu-latest if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule' name: "Test docs" steps: - uses: actions/checkout@v4 - name: Set up Python + uv uses: "./.github/actions/uv_setup" with: python-version: ${{ github.event.inputs.python_version || '3.11' }} - name: 'Authenticate to Google Cloud' id: 'auth' uses: google-github-actions/auth@v2 with: credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}' - name: Configure AWS Credentials uses: aws-actions/configure-aws-credentials@v4 with: aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} aws-region: ${{ secrets.AWS_REGION }} - name: Install dependencies run: | uv sync --group dev --group test - name: Pre-download files run: | uv run python docs/scripts/cache_data.py curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 docs/docs/how_to/Chinook.db cp docs/docs/how_to/Chinook.db docs/docs/tutorials/Chinook.db - name: 
Prepare notebooks run: | uv run python docs/scripts/prepare_notebooks_for_ci.py --comment-install-cells --working-directory ${{ github.event.inputs.working-directory || 'all' }} - name: Run notebooks env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }} TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }} WORKING_DIRECTORY: ${{ github.event.inputs.working-directory || 'all' }} run: | ./docs/scripts/execute_notebooks.sh $WORKING_DIRECTORY ``` ## /.github/workflows/scheduled_test.yml ```yml path="/.github/workflows/scheduled_test.yml" name: Scheduled tests on: workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI inputs: working-directory-force: type: string description: "From which folder this pipeline executes - defaults to all in matrix - example value: libs/partners/anthropic" python-version-force: type: string description: "Python version to use - defaults to 3.9 and 3.11 in matrix - example value: 3.9" schedule: - cron: '0 13 * * *' env: POETRY_VERSION: "1.8.4" UV_FROZEN: "true" DEFAULT_LIBS: '["libs/partners/openai", "libs/partners/anthropic", "libs/partners/fireworks", "libs/partners/groq", "libs/partners/mistralai", "libs/partners/xai", "libs/partners/google-vertexai", "libs/partners/google-genai", "libs/partners/aws"]' POETRY_LIBS: ("libs/partners/google-vertexai" "libs/partners/google-genai" "libs/partners/aws") jobs: compute-matrix: if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule' runs-on: ubuntu-latest name: Compute matrix outputs: matrix: ${{ steps.set-matrix.outputs.matrix }} steps: - name: Set matrix id: set-matrix env: DEFAULT_LIBS: ${{ env.DEFAULT_LIBS }} WORKING_DIRECTORY_FORCE: ${{ github.event.inputs.working-directory-force || '' }} PYTHON_VERSION_FORCE: ${{ github.event.inputs.python-version-force || '' }} run: | # echo "matrix=..." 
where matrix is a json formatted str with keys python-version and working-directory # python-version should default to 3.9 and 3.11, but is overridden to [PYTHON_VERSION_FORCE] if set # working-directory should default to DEFAULT_LIBS, but is overridden to [WORKING_DIRECTORY_FORCE] if set python_version='["3.9", "3.11"]' working_directory="$DEFAULT_LIBS" if [ -n "$PYTHON_VERSION_FORCE" ]; then python_version="[\"$PYTHON_VERSION_FORCE\"]" fi if [ -n "$WORKING_DIRECTORY_FORCE" ]; then working_directory="[\"$WORKING_DIRECTORY_FORCE\"]" fi matrix="{\"python-version\": $python_version, \"working-directory\": $working_directory}" echo $matrix echo "matrix=$matrix" >> $GITHUB_OUTPUT build: if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule' name: Python ${{ matrix.python-version }} - ${{ matrix.working-directory }} runs-on: ubuntu-latest needs: [compute-matrix] timeout-minutes: 20 strategy: fail-fast: false matrix: python-version: ${{ fromJSON(needs.compute-matrix.outputs.matrix).python-version }} working-directory: ${{ fromJSON(needs.compute-matrix.outputs.matrix).working-directory }} steps: - uses: actions/checkout@v4 with: path: langchain - uses: actions/checkout@v4 with: repository: langchain-ai/langchain-google path: langchain-google - uses: actions/checkout@v4 with: repository: langchain-ai/langchain-aws path: langchain-aws - name: Move libs run: | rm -rf \ langchain/libs/partners/google-genai \ langchain/libs/partners/google-vertexai mv langchain-google/libs/genai langchain/libs/partners/google-genai mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai mv langchain-aws/libs/aws langchain/libs/partners/aws - name: Set up Python ${{ matrix.python-version }} with poetry if: contains(env.POETRY_LIBS, matrix.working-directory) uses: "./langchain/.github/actions/poetry_setup" with: python-version: ${{ matrix.python-version }} poetry-version: ${{ env.POETRY_VERSION }} working-directory: langchain/${{ matrix.working-directory }} cache-key: scheduled - name: Set up Python ${{ matrix.python-version }} + uv if: "!contains(env.POETRY_LIBS, matrix.working-directory)" uses: "./langchain/.github/actions/uv_setup" with: python-version: ${{ matrix.python-version }} - name: 'Authenticate to Google Cloud' id: 'auth' uses: google-github-actions/auth@v2 with: credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}' - name: Configure AWS Credentials uses: aws-actions/configure-aws-credentials@v4 with: aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} aws-region: ${{ secrets.AWS_REGION }} - name: Install dependencies (poetry) if: contains(env.POETRY_LIBS, matrix.working-directory) run: | echo "Running scheduled tests, installing dependencies with poetry..." cd langchain/${{ matrix.working-directory }} poetry install --with=test_integration,test - name: Install dependencies (uv) if: "!contains(env.POETRY_LIBS, matrix.working-directory)" run: | echo "Running scheduled tests, installing dependencies with uv..." 
cd langchain/${{ matrix.working-directory }} uv sync --group test --group test_integration - name: Run integration tests env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }} AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }} AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }} AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }} AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }} AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }} DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }} GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }} PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }} run: | cd langchain/${{ matrix.working-directory }} make integration_tests - name: Remove external libraries run: | rm -rf \ langchain/libs/partners/google-genai \ langchain/libs/partners/google-vertexai \ langchain/libs/partners/aws - name: Ensure the tests did not create any additional files working-directory: langchain run: | set -eu STATUS="$(git status)" echo "$STATUS" # grep will exit non-zero if the target message isn't found, # and `set -e` above will cause the step to fail. echo "$STATUS" | grep 'nothing to commit, working tree clean' ``` ## /.gitignore ```gitignore path="/.gitignore" .vs/ .vscode/ .idea/ # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # Google GitHub Actions credentials files created by: # https://github.com/google-github-actions/auth # # That action recommends adding this gitignore to prevent accidentally committing keys. gha-creds-*.json # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ .codspeed/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ docs/docs/_build/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints notebooks/ # IPython profile_default/ ipython_config.py # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
# However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # PEP 582; used by e.g. github.com/David-OConnor/pyflow __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .envrc .venv* venv* env/ ENV/ env.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .mypy_cache_test/ .dmypy.json dmypy.json # Pyre type checker .pyre/ # macOS display setting files .DS_Store # Wandb directory wandb/ # asdf tool versions .tool-versions /.ruff_cache/ *.pkl *.bin # integration test artifacts data_map* \[('_type', 'fake'), ('stop', None)] # Replit files *replit* node_modules docs/.yarn/ docs/node_modules/ docs/.docusaurus/ docs/.cache-loader/ docs/_dist docs/api_reference/*api_reference.rst docs/api_reference/*.md docs/api_reference/_build docs/api_reference/*/ !docs/api_reference/_static/ !docs/api_reference/templates/ !docs/api_reference/themes/ !docs/api_reference/_extensions/ !docs/api_reference/scripts/ docs/docs/build docs/docs/node_modules docs/docs/yarn.lock _dist docs/docs/templates prof virtualenv/ ``` ## /.pre-commit-config.yaml ```yaml path="/.pre-commit-config.yaml" repos: - repo: local hooks: - id: core name: format core language: system entry: make -C libs/core format files: ^libs/core/ pass_filenames: false - id: community name: format community language: system entry: make -C libs/community format files: ^libs/community/ pass_filenames: false - id: langchain name: format langchain language: system entry: make -C libs/langchain format files: ^libs/langchain/ pass_filenames: false - id: standard-tests name: format standard-tests language: system entry: make -C libs/standard-tests format files: ^libs/standard-tests/ pass_filenames: false - id: text-splitters name: format text-splitters language: system entry: make -C libs/text-splitters format files: ^libs/text-splitters/ pass_filenames: false - id: anthropic name: format partners/anthropic language: system entry: make -C libs/partners/anthropic format files: ^libs/partners/anthropic/ pass_filenames: false - id: chroma name: format partners/chroma language: system entry: make -C libs/partners/chroma format files: ^libs/partners/chroma/ pass_filenames: false - id: couchbase name: format partners/couchbase language: system entry: make -C libs/partners/couchbase format files: ^libs/partners/couchbase/ pass_filenames: false - id: exa name: format partners/exa language: system entry: make -C libs/partners/exa format files: ^libs/partners/exa/ pass_filenames: false - id: fireworks name: format partners/fireworks language: system entry: make -C libs/partners/fireworks format files: ^libs/partners/fireworks/ pass_filenames: false - id: groq name: format partners/groq language: system entry: make -C libs/partners/groq format files: ^libs/partners/groq/ pass_filenames: false - id: huggingface name: format partners/huggingface language: system entry: make -C libs/partners/huggingface format files: ^libs/partners/huggingface/ pass_filenames: false - id: mistralai name: format partners/mistralai language: system entry: make -C libs/partners/mistralai format files: ^libs/partners/mistralai/ pass_filenames: false - id: nomic name: format partners/nomic language: system entry: make -C libs/partners/nomic format files: 
^libs/partners/nomic/ pass_filenames: false - id: ollama name: format partners/ollama language: system entry: make -C libs/partners/ollama format files: ^libs/partners/ollama/ pass_filenames: false - id: openai name: format partners/openai language: system entry: make -C libs/partners/openai format files: ^libs/partners/openai/ pass_filenames: false - id: prompty name: format partners/prompty language: system entry: make -C libs/partners/prompty format files: ^libs/partners/prompty/ pass_filenames: false - id: qdrant name: format partners/qdrant language: system entry: make -C libs/partners/qdrant format files: ^libs/partners/qdrant/ pass_filenames: false - id: voyageai name: format partners/voyageai language: system entry: make -C libs/partners/voyageai format files: ^libs/partners/voyageai/ pass_filenames: false - id: root name: format docs, cookbook language: system entry: make format files: ^(docs|cookbook)/ pass_filenames: false ``` ## /.readthedocs.yaml ```yaml path="/.readthedocs.yaml" # Read the Docs configuration file # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details version: 2 # Set the version of Python and other tools you might need build: os: ubuntu-22.04 tools: python: "3.11" commands: - mkdir -p $READTHEDOCS_OUTPUT - cp -r api_reference_build/* $READTHEDOCS_OUTPUT # Build documentation in the docs/ directory with Sphinx sphinx: configuration: docs/api_reference/conf.py # If using Sphinx, optionally build your docs in additional formats such as PDF formats: - pdf # Optionally declare the Python requirements required to build your docs python: install: - requirements: docs/api_reference/requirements.txt ``` ## /CITATION.cff ```cff path="/CITATION.cff" cff-version: 1.2.0 message: "If you use this software, please cite it as below." authors: - family-names: "Chase" given-names: "Harrison" title: "LangChain" date-released: 2022-10-17 url: "https://github.com/langchain-ai/langchain" ``` ## /LICENSE ``` path="/LICENSE" MIT License Copyright (c) LangChain, Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
``` ## /MIGRATE.md # Migrating Please see the following guides for migrating LangChain code: * Migrate to [LangChain v0.3](https://python.langchain.com/docs/versions/v0_3/) * Migrate to [LangChain v0.2](https://python.langchain.com/docs/versions/v0_2/) * Migrating from [LangChain 0.0.x Chains](https://python.langchain.com/docs/versions/migrating_chains/) * Upgrade to [LangGraph Memory](https://python.langchain.com/docs/versions/migrating_memory/) The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports. This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain. ## /Makefile ``` path="/Makefile" .PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck spell_check spell_fix lint lint_package lint_tests format format_diff .EXPORT_ALL_VARIABLES: UV_FROZEN = true ## help: Show this help info. help: Makefile @printf "\n\033[1mUsage: make ...\033[0m\n\n\033[1mTargets:\033[0m\n\n" @sed -n 's/^## //p' $< | awk -F':' '{printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort | sed -e 's/^/ /' ## all: Default target, shows help. all: help ## clean: Clean documentation and API documentation artifacts. clean: docs_clean api_docs_clean ###################### # DOCUMENTATION ###################### ## docs_build: Build the documentation. docs_build: cd docs && make build ## docs_clean: Clean the documentation build artifacts. docs_clean: cd docs && make clean ## docs_linkcheck: Run linkchecker on the documentation. docs_linkcheck: uv run --no-group test linkchecker _dist/docs/ --ignore-url node_modules ## api_docs_build: Build the API Reference documentation. api_docs_build: uv run --no-group test python docs/api_reference/create_api_rst.py cd docs/api_reference && uv run --no-group test make html uv run --no-group test python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/ API_PKG ?= text-splitters api_docs_quick_preview: uv run --no-group test python docs/api_reference/create_api_rst.py $(API_PKG) cd docs/api_reference && uv run make html uv run --no-group test python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/ open docs/api_reference/_build/html/reference.html ## api_docs_clean: Clean the API Reference documentation build artifacts. api_docs_clean: find ./docs/api_reference -name '*_api_reference.rst' -delete git clean -fdX ./docs/api_reference rm docs/api_reference/index.md ## api_docs_linkcheck: Run linkchecker on the API Reference documentation. api_docs_linkcheck: uv run --no-group test linkchecker docs/api_reference/_build/html/index.html ## spell_check: Run codespell on the project. spell_check: uv run --no-group test codespell --toml pyproject.toml ## spell_fix: Run codespell on the project and fix the errors. spell_fix: uv run --no-group test codespell --toml pyproject.toml -w ###################### # LINTING AND FORMATTING ###################### ## lint: Run linting on the project. 
lint lint_package lint_tests: uv run --group lint ruff check docs cookbook uv run --group lint ruff format docs cookbook --diff uv run --group lint ruff check --select I docs cookbook git --no-pager grep 'from langchain import' docs cookbook | grep -vE 'from langchain import (hub)' && echo "Error: no importing langchain from root in docs, except for hub" && exit 1 || exit 0 git --no-pager grep 'api.python.langchain.com' -- docs/docs ':!docs/docs/additional_resources/arxiv_references.mdx' ':!docs/docs/integrations/document_loaders/sitemap.ipynb' || exit 0 && \ echo "Error: you should link python.langchain.com/api_reference, not api.python.langchain.com in the docs" && \ exit 1 ## format: Format the project files. format format_diff: uv run --group lint ruff format docs cookbook uv run --group lint ruff check --select I --fix docs cookbook update-package-downloads: uv run python docs/scripts/packages_yml_get_downloads.py ``` ## /README.md LangChain Logo

[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain?style=flat-square)](https://github.com/langchain-ai/langchain/releases) [![CI](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml) [![PyPI - License](https://img.shields.io/pypi/l/langchain-core?style=flat-square)](https://opensource.org/licenses/MIT) [![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-core?style=flat-square)](https://pypistats.org/packages/langchain-core) [![GitHub star chart](https://img.shields.io/github/stars/langchain-ai/langchain?style=flat-square)](https://star-history.com/#langchain-ai/langchain) [![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain?style=flat-square)](https://github.com/langchain-ai/langchain/issues) [![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain) [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/langchain-ai/langchain) > [!NOTE] > Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs). LangChain is a framework for building LLM-powered applications. It helps you chain together interoperable components and third-party integrations to simplify AI application development — all while future-proofing decisions as the underlying technology evolves. ```bash pip install -U langchain ``` To learn more about LangChain, check out [the docs](https://python.langchain.com/docs/introduction/). If you’re looking for more advanced customization or agent orchestration, check out [LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building controllable agent workflows. ## Why use LangChain? LangChain helps developers build applications powered by LLMs through a standard interface for models, embeddings, vector stores, and more. Use LangChain for: - **Real-time data augmentation**. Easily connect LLMs to diverse data sources and external / internal systems, drawing from LangChain’s vast library of integrations with model providers, tools, vector stores, retrievers, and more. - **Model interoperability**. Swap models in and out as your engineering team experiments to find the best choice for your application’s needs. As the industry frontier evolves, adapt quickly — LangChain’s abstractions keep you moving without losing momentum. ## LangChain’s ecosystem While the LangChain framework can be used standalone, it also integrates seamlessly with any LangChain product, giving developers a full suite of tools when building LLM applications. To improve your LLM application development, pair LangChain with: - [LangSmith](http://www.langchain.com/langsmith) - Helpful for agent evals and observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain visibility in production, and improve performance over time.
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can reliably handle complex tasks with LangGraph, our low-level agent orchestration framework. LangGraph offers customizable architecture, long-term memory, and human-in-the-loop workflows — and is trusted in production by companies like LinkedIn, Uber, Klarna, and GitLab. - [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#langgraph-platform) - Deploy and scale agents effortlessly with a purpose-built deployment platform for long running, stateful workflows. Discover, reuse, configure, and share agents across teams — and iterate quickly with visual prototyping in [LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/). ## Additional resources - [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with guided examples on getting started with LangChain. - [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code snippets for topics such as tool calling, RAG use cases, and more. - [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key concepts behind the LangChain framework. - [API Reference](https://python.langchain.com/api_reference/): Detailed reference on navigating base packages and integrations for LangChain. ## /SECURITY.md # Security Policy LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources. ## Best practices When building such applications developers should remember to follow good security practices: * [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc. as appropriate for your application. * **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it’s safest to assume that any LLM able to use those credentials may in fact delete data. * [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It’s best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use. Risks of not doing so include, but are not limited to: * Data corruption or loss. * Unauthorized access to confidential information. * Compromised performance or availability of critical resources. 
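To make the *Limit Permissions* practice above concrete, here is a minimal sketch (not part of LangChain itself) of how a file-reading helper handed to an agent can be confined to a single allow-listed directory. The `SAFE_ROOT` location and the `read_text_file` function are hypothetical names used purely for illustration.

```python
from pathlib import Path

# Hypothetical allow-listed directory; scope it to the one location the agent actually needs.
SAFE_ROOT = Path("/srv/agent-data").resolve()


def read_text_file(requested_path: str) -> str:
    """Read a file only if it resolves to a location inside SAFE_ROOT."""
    resolved = (SAFE_ROOT / requested_path).resolve()
    # Resolving first defeats traversal attempts such as "../../etc/passwd".
    if not resolved.is_relative_to(SAFE_ROOT):
        raise PermissionError(f"Refusing to read outside {SAFE_ROOT}: {requested_path}")
    return resolved.read_text()
```

The same pattern (resolve the request, then check it against an explicit allow-list before acting) also applies to write operations, API endpoints, and database credentials in the scenarios that follow.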
Example scenarios with mitigation strategies: * A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container. * A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse. * A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials. If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your company's security team to determine how to best design and secure your applications. ## Reporting OSS Vulnerabilities LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide a bounty program for our open source projects. Please report security vulnerabilities associated with the LangChain open source projects by visiting the following link: [https://huntr.com/bounties/disclose/](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true) Before reporting a vulnerability, please review: 1) In-Scope Targets and Out-of-Scope Targets below. 2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure. 3) The [Best practices](#best-practices) above to understand what we consider to be a security vulnerability vs. developer responsibility. ### In-Scope Targets The following packages and repositories are eligible for bug bounties: - langchain-core - langchain (see exceptions) - langchain-community (see exceptions) - langgraph - langserve ### Out of Scope Targets All out-of-scope targets defined by huntr, as well as: - **langchain-experimental**: This repository is for experimental code and is not eligible for bug bounties (see [package warning](https://pypi.org/project/langchain-experimental/)); bug reports for it will be marked as *interesting* or *waste of time* and published with no bounty attached. - **tools**: Tools in either langchain or langchain-community are not eligible for bug bounties. This includes the following directories: - libs/langchain/langchain/tools - libs/community/langchain_community/tools - Please review the [best practices](#best-practices) for more details, but generally tools interact with the real world. Developers are expected to understand the security implications of their code and are responsible for the security of their tools. - Code documented with security notices. This will be decided on a case-by-case basis, but such code will likely not be eligible for a bounty, as it is already documented with guidelines for developers that should be followed for making their application secure. - Any LangSmith-related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)). ## Reporting LangSmith Vulnerabilities Please report security vulnerabilities associated with LangSmith by email to `security@langchain.dev`.
- LangSmith site: https://smith.langchain.com - SDK client: https://github.com/langchain-ai/langsmith-sdk ### Other Security Concerns For any other security concerns, please contact us at `security@langchain.dev`. ## /cookbook/Gemma_LangChain.ipynb ```ipynb path="/cookbook/Gemma_LangChain.ipynb" { "cells": [ { "cell_type": "markdown", "metadata": { "id": "BYejgj8Zf-LG", "tags": [] }, "source": [ "## Getting started with LangChain and Gemma, running locally or in the Cloud" ] }, { "cell_type": "markdown", "metadata": { "id": "2IxjMb9-jIJ8" }, "source": [ "### Installing dependencies" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 9436, "status": "ok", "timestamp": 1708975187360, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "XZaTsXfcheTF", "outputId": "eb21d603-d824-46c5-f99f-087fb2f618b1", "tags": [] }, "outputs": [], "source": [ "!pip install --upgrade langchain langchain-google-vertexai" ] }, { "cell_type": "markdown", "metadata": { "id": "IXmAujvC3Kwp" }, "source": [ "### Running the model" ] }, { "cell_type": "markdown", "metadata": { "id": "CI8Elyc5gBQF" }, "source": [ "Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "gv1j8FrVftsC" }, "outputs": [], "source": [ "# @title Basic parameters\n", "project: str = \"PUT_YOUR_PROJECT_ID_HERE\" # @param {type:\"string\"}\n", "endpoint_id: str = \"PUT_YOUR_ENDPOINT_ID_HERE\" # @param {type:\"string\"}\n", "location: str = \"PUT_YOUR_ENDPOINT_LOCAtION_HERE\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1708975440503, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "bhIHsFGYjtFt", "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 17:15:10.457149: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-02-27 17:15:10.508925: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-02-27 17:15:10.508957: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-02-27 17:15:10.510289: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-02-27 17:15:10.518898: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] } ], "source": [ "from langchain_google_vertexai import (\n", " GemmaChatVertexAIModelGarden,\n", " GemmaVertexAIModelGarden,\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "executionInfo": { "elapsed": 351, "status": "ok", "timestamp": 1708975440852, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "WJv-UVWwh0lk", "tags": [] }, "outputs": [], "source": [ "llm = GemmaVertexAIModelGarden(\n", " endpoint_id=endpoint_id,\n", " project=project,\n", " location=location,\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 714, "status": "ok", "timestamp": 1708975441564, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "6kM7cEFdiN9h", "outputId": "fb420c56-5614-4745-cda8-0ee450a3e539", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Prompt:\n", "What is the meaning of life?\n", "Output:\n", " Who am I? Why do I exist? These are questions I have struggled with\n" ] } ], "source": [ "output = llm.invoke(\"What is the meaning of life?\")\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": { "id": "zzep9nfmuUcO" }, "source": [ "We can also use Gemma as a multi-turn chat model:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 964, "status": "ok", "timestamp": 1708976298189, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "8tPHoM5XiZOl", "outputId": "7b8fb652-9aed-47b0-c096-aa1abfc3a2a9", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "content='Prompt:\\nuser\\nHow much is 2+2?\\nmodel\\nOutput:\\n8-years old.\\n\\n=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0m" ] } ], "source": [ "!pip install keras>=3 keras_nlp" ] }, { "cell_type": "markdown", "metadata": { "id": "E9zn8nYpv3QZ" }, "source": [ "### Usage" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "executionInfo": { "elapsed": 8536, "status": "ok", "timestamp": 1708976601206, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "0LFRmY8TjCkI", "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 16:38:40.797559: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. 
You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-02-27 16:38:40.848444: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-02-27 16:38:40.848478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-02-27 16:38:40.849728: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-02-27 16:38:40.857936: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] } ], "source": [ "from langchain_google_vertexai import GemmaLocalKaggle" ] }, { "cell_type": "markdown", "metadata": { "id": "v-o7oXVavdMQ" }, "source": [ "You can specify the keras backend (by default it's `tensorflow`, but you can change it be `jax` or `torch`)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1708976601206, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "vvTUH8DNj5SF", "tags": [] }, "outputs": [], "source": [ "# @title Basic parameters\n", "keras_backend: str = \"jax\" # @param {type:\"string\"}\n", "model_name: str = \"gemma_2b_en\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "executionInfo": { "elapsed": 40836, "status": "ok", "timestamp": 1708976761257, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "YOmrqxo5kHXK", "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 16:23:14.661164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9\n", "normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.\n" ] } ], "source": [ "llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "Zu6yPDUgkQtQ", "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "W0000 00:00:1709051129.518076 774855 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "What is the meaning of life?\n", "\n", "The question is one of the most important questions in the world.\n", "\n", "It’s the question that has\n" ] } ], "source": [ "output = llm.invoke(\"What is the meaning of life?\", max_tokens=30)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ChatModel" ] }, { "cell_type": "markdown", "metadata": { "id": "MSctpRE4u43N" }, "source": [ "Same as above, using Gemma locally as a multi-turn chat model. 
You might need to re-start the notebook and clean your GPU memory in order to avoid OOM errors:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 16:58:22.331067: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-02-27 16:58:22.382948: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-02-27 16:58:22.382978: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-02-27 16:58:22.384312: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-02-27 16:58:22.392767: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] } ], "source": [ "from langchain_google_vertexai import GemmaChatLocalKaggle" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "# @title Basic parameters\n", "keras_backend: str = \"jax\" # @param {type:\"string\"}\n", "model_name: str = \"gemma_2b_en\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 16:58:29.001922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9\n", "normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.\n" ] } ], "source": [ "llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "executionInfo": { "elapsed": 3, "status": "aborted", "timestamp": 1708976382957, "user": { "displayName": "", "userId": "" }, "user_tz": -60 }, "id": "JrJmvZqwwLqj" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 16:58:49.848412: I external/local_xla/xla/service/service.cc:168] XLA service 0x55adc0cf2c10 initialized for platform CUDA (this does not guarantee that XLA will be used). 
Devices:\n", "2024-02-27 16:58:49.848458: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA L4, Compute Capability 8.9\n", "2024-02-27 16:58:50.116614: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n", "2024-02-27 16:58:54.389324: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8900\n", "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "I0000 00:00:1709053145.225207 784891 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n", "W0000 00:00:1709053145.284227 784891 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "content=\"user\\nHi! Who are you?\\nmodel\\nI'm a model.\\n Tampoco\\nI'm a model.\"\n" ] } ], "source": [ "from langchain_core.messages import HumanMessage\n", "\n", "message1 = HumanMessage(content=\"Hi! Who are you?\")\n", "answer1 = llm.invoke([message1], max_tokens=30)\n", "print(answer1)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "content=\"user\\nHi! Who are you?\\nmodel\\nuser\\nHi! Who are you?\\nmodel\\nI'm a model.\\n Tampoco\\nI'm a model.\\nuser\\nWhat can you help me with?\\nmodel\"\n" ] } ], "source": [ "message2 = HumanMessage(content=\"What can you help me with?\")\n", "answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)\n", "\n", "print(answer2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can post-process the response if you want to avoid multi-turn statements:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "content=\"I'm a model.\\n Tampoco\\nI'm a model.\"\n", "content='I can help you with your modeling.\\n Tampoco\\nI can'\n" ] } ], "source": [ "answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)\n", "print(answer1)\n", "\n", "answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)\n", "print(answer2)" ] }, { "cell_type": "markdown", "metadata": { "id": "EiZnztso7hyF" }, "source": [ "## Running Gemma locally from HuggingFace" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "qqAqsz5R7nKf", "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-02-27 17:02:21.832409: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-02-27 17:02:21.883625: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-02-27 17:02:21.883656: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-02-27 17:02:21.884987: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-02-27 17:02:21.893340: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] } ], "source": [ "from langchain_google_vertexai import GemmaChatLocalHF, GemmaLocalHF" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "tsyntzI08cOr", "tags": [] }, "outputs": [], "source": [ "# @title Basic parameters\n", "hf_access_token: str = \"PUT_YOUR_TOKEN_HERE\" # @param {type:\"string\"}\n", "model_name: str = \"google/gemma-2b\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "JWrqEkOo8sm9", "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a0d6de5542254ed1b6d3ba65465e050e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/2 [00:00user\\nHi! Who are you?\\nmodel\\nI'm a model.\\n\\nuser\\nWhat do you mean\"\n" ] } ], "source": [ "from langchain_core.messages import HumanMessage\n", "\n", "message1 = HumanMessage(content=\"Hi! Who are you?\")\n", "answer1 = llm.invoke([message1], max_tokens=60)\n", "print(answer1)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "content=\"user\\nHi! Who are you?\\nmodel\\nuser\\nHi! 
Who are you?\\nmodel\\nI'm a model.\\n\\nuser\\nWhat do you mean\\nuser\\nWhat can you help me with?\\nmodel\\nI can help you with anything.\\n<\"\n" ] } ], "source": [ "message2 = HumanMessage(content=\"What can you help me with?\")\n", "answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)\n", "\n", "print(answer2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the same with post-processing:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "content=\"I'm a model.\\n\\n\"\n", "content='I can help you with anything.\\n\\n\\n'\n" ] } ], "source": [ "answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)\n", "print(answer1)\n", "\n", "answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)\n", "print(answer2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "environment": { "kernel": "python3", "name": ".m116", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/:m116" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 } ``` ## /cookbook/LLaMA2_sql_chat.ipynb ```ipynb path="/cookbook/LLaMA2_sql_chat.ipynb" { "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "fc935871-7640-41c6-b798-58514d860fe0", "metadata": {}, "source": [ "## LLaMA2 chat with SQL\n", "\n", "Open source, local LLMs are great to consider for any application that demands data privacy.\n", "\n", "SQL is one good example. \n", "\n", "This cookbook shows how to perform text-to-SQL using various versions of LLaMA2 run locally.\n", "\n", "## Packages" ] }, { "cell_type": "code", "execution_count": null, "id": "81adcf8b-395a-4f02-8749-ac976942b446", "metadata": {}, "outputs": [], "source": [ "! pip install langchain replicate" ] }, { "cell_type": "markdown", "id": "8e13ed66-300b-4a23-b8ac-44df68ee4733", "metadata": {}, "source": [ "## LLM\n", "\n", "There are a few ways to access LLaMA2.\n", "\n", "To run locally, we use Ollama.ai. \n", "\n", "See [here](/docs/integrations/chat/ollama) for details on installation and setup.\n", "\n", "Also, see [here](/docs/guides/development/local_llms) for our full guide on local LLMs.\n", " \n", "To use an external API, which is not private, we can use Replicate."
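] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you take the local path, the Ollama models used in the next cell generally need to be pulled first (a minimal sketch, assuming the `ollama` CLI is installed and the Ollama server is running):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Assumes a local Ollama install; pulls the model tags referenced in the next cell\n", "! ollama pull llama2:13b-chat\n", "! ollama pull codellama:7b-instruct"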
] }, { "cell_type": "code", "execution_count": 1, "id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Init param `input` is deprecated, please use `model_kwargs` instead.\n" ] } ], "source": [ "# Local\n", "from langchain_community.chat_models import ChatOllama\n", "\n", "llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n", "llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n", "\n", "# API\n", "from langchain_community.llms import Replicate\n", "\n", "# REPLICATE_API_TOKEN = getpass()\n", "# os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n", "replicate_id = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n", "llama2_chat_replicate = Replicate(\n", " model=replicate_id, input={\"temperature\": 0.01, \"max_length\": 500, \"top_p\": 1}\n", ")" ] }, { "cell_type": "code", "execution_count": 2, "id": "ce96f7ea-b3d5-44e1-9fa5-a79e04a9e1fb", "metadata": {}, "outputs": [], "source": [ "# Simply set the LLM we want to use\n", "llm = llama2_chat" ] }, { "cell_type": "markdown", "id": "80222165-f353-4e35-a123-5f70fd70c6c8", "metadata": {}, "source": [ "## DB\n", "\n", "Connect to a SQLite DB.\n", "\n", "To create this particular DB, you can use the code and follow the steps shown [here](https://github.com/facebookresearch/llama-recipes/blob/main/demo_apps/StructuredLlama.ipynb)." ] }, { "cell_type": "code", "execution_count": 3, "id": "025bdd82-3bb1-4948-bc7c-c3ccd94fd05c", "metadata": {}, "outputs": [], "source": [ "from langchain_community.utilities import SQLDatabase\n", "\n", "db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info=0)\n", "\n", "\n", "def get_schema(_):\n", " return db.get_table_info()\n", "\n", "\n", "def run_query(query):\n", " return db.run(query)" ] }, { "cell_type": "markdown", "id": "654b3577-baa2-4e12-a393-f40e5db49ac7", "metadata": {}, "source": [ "## Query a SQL Database \n", "\n", "Follow the runnables workflow [here](https://python.langchain.com/docs/expression_language/cookbook/sql_db)." ] }, { "cell_type": "code", "execution_count": 4, "id": "5a4933ea-d9c0-4b0a-8177-ba4490c6532b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Prompt\n", "from langchain_core.prompts import ChatPromptTemplate\n", "\n", "# Update the template based on the type of SQL Database like MySQL, Microsoft SQL Server and so on\n", "template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n", "{schema}\n", "\n", "Question: {question}\n", "SQL Query:\"\"\"\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", \"Given an input question, convert it to a SQL query. 
No pre-amble.\"),\n", " (\"human\", template),\n", " ]\n", ")\n", "\n", "# Chain to query\n", "from langchain_core.output_parsers import StrOutputParser\n", "from langchain_core.runnables import RunnablePassthrough\n", "\n", "sql_response = (\n", " RunnablePassthrough.assign(schema=get_schema)\n", " | prompt\n", " | llm.bind(stop=[\"\\nSQLResult:\"])\n", " | StrOutputParser()\n", ")\n", "\n", "sql_response.invoke({\"question\": \"What team is Klay Thompson on?\"})" ] }, { "cell_type": "markdown", "id": "a0e9e2c8-9b88-4853-ac86-001bc6cc6695", "metadata": {}, "source": [ "We can review the results:\n", "\n", "* [LangSmith trace](https://smith.langchain.com/public/afa56a06-b4e2-469a-a60f-c1746e75e42b/r) LLaMA2-13 Replicate API\n", "* [LangSmith trace](https://smith.langchain.com/public/2d4ecc72-6b8f-4523-8f0b-ea95c6b54a1d/r) LLaMA2-13 local \n" ] }, { "cell_type": "code", "execution_count": 15, "id": "2a2825e3-c1b6-4f7d-b9c9-d9835de323bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AIMessage(content=' Based on the table schema and SQL query, there are 30 unique teams in the NBA.')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Chain to answer\n", "template = \"\"\"Based on the table schema below, question, sql query, and sql response, write a natural language response:\n", "{schema}\n", "\n", "Question: {question}\n", "SQL Query: {query}\n", "SQL Response: {response}\"\"\"\n", "prompt_response = ChatPromptTemplate.from_messages(\n", " [\n", " (\n", " \"system\",\n", " \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n", " ),\n", " (\"human\", template),\n", " ]\n", ")\n", "\n", "full_chain = (\n", " RunnablePassthrough.assign(query=sql_response)\n", " | RunnablePassthrough.assign(\n", " schema=get_schema,\n", " response=lambda x: db.run(x[\"query\"]),\n", " )\n", " | prompt_response\n", " | llm\n", ")\n", "\n", "full_chain.invoke({\"question\": \"How many unique teams are there?\"})" ] }, { "cell_type": "markdown", "id": "ec17b3ee-6618-4681-b6df-089bbb5ffcd7", "metadata": {}, "source": [ "We can review the results:\n", "\n", "* [LangSmith trace](https://smith.langchain.com/public/10420721-746a-4806-8ecf-d6dc6399d739/r) LLaMA2-13 Replicate API\n", "* [LangSmith trace](https://smith.langchain.com/public/5265ebab-0a22-4f37-936b-3300f2dfa1c1/r) LLaMA2-13 local " ] }, { "cell_type": "markdown", "id": "1e85381b-1edc-4bb3-a7bd-2ab23f81e54d", "metadata": {}, "source": [ "## Chat with a SQL DB \n", "\n", "Next, we can add memory." ] }, { "cell_type": "code", "execution_count": 7, "id": "022868f2-128e-42f5-8d90-d3bb2f11d994", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Prompt\n", "from langchain.memory import ConversationBufferMemory\n", "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", "\n", "template = \"\"\"Given an input question, convert it to a SQL query. No pre-amble. 
Based on the table schema below, write a SQL query that would answer the user's question:\n", "{schema}\n", "\"\"\"\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", template),\n", " MessagesPlaceholder(variable_name=\"history\"),\n", " (\"human\", \"{question}\"),\n", " ]\n", ")\n", "\n", "memory = ConversationBufferMemory(return_messages=True)\n", "\n", "# Chain to query with memory\n", "from langchain_core.runnables import RunnableLambda\n", "\n", "sql_chain = (\n", " RunnablePassthrough.assign(\n", " schema=get_schema,\n", " history=RunnableLambda(lambda x: memory.load_memory_variables(x)[\"history\"]),\n", " )\n", " | prompt\n", " | llm.bind(stop=[\"\\nSQLResult:\"])\n", " | StrOutputParser()\n", ")\n", "\n", "\n", "def save(input_output):\n", " output = {\"output\": input_output.pop(\"output\")}\n", " memory.save_context(input_output, output)\n", " return output[\"output\"]\n", "\n", "\n", "sql_response_memory = RunnablePassthrough.assign(output=sql_chain) | save\n", "sql_response_memory.invoke({\"question\": \"What team is Klay Thompson on?\"})" ] }, { "cell_type": "code", "execution_count": 21, "id": "800a7a3b-f411-478b-af51-2310cd6e0425", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AIMessage(content=' Sure! Here\\'s the natural language response based on the given input:\\n\\n\"Klay Thompson\\'s salary is $43,219,440.\"')" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Chain to answer\n", "template = \"\"\"Based on the table schema below, question, sql query, and sql response, write a natural language response:\n", "{schema}\n", "\n", "Question: {question}\n", "SQL Query: {query}\n", "SQL Response: {response}\"\"\"\n", "prompt_response = ChatPromptTemplate.from_messages(\n", " [\n", " (\n", " \"system\",\n", " \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n", " ),\n", " (\"human\", template),\n", " ]\n", ")\n", "\n", "full_chain = (\n", " RunnablePassthrough.assign(query=sql_response_memory)\n", " | RunnablePassthrough.assign(\n", " schema=get_schema,\n", " response=lambda x: db.run(x[\"query\"]),\n", " )\n", " | prompt_response\n", " | llm\n", ")\n", "\n", "full_chain.invoke({\"question\": \"What is his salary?\"})" ] }, { "cell_type": "markdown", "id": "b77fee61-f4da-4bb1-8285-14101e505518", "metadata": {}, "source": [ "Here is the [trace](https://smith.langchain.com/public/54794d18-2337-4ce2-8b9f-3d8a2df89e51/r)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 5 } ```
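For quick reference, the two-step text-to-SQL flow that `cookbook/LLaMA2_sql_chat.ipynb` walks through condenses to the sketch below. This is a minimal, hedged sketch rather than part of the repository: it assumes a running Ollama server with `llama2:13b-chat` already pulled and an `nba_roster.db` SQLite file in the working directory.

```python
# Condensed sketch of the two-step text-to-SQL pattern from cookbook/LLaMA2_sql_chat.ipynb.
# Assumptions: `pip install langchain langchain-community`, a local Ollama server with the
# llama2:13b-chat model pulled, and an nba_roster.db SQLite file in the working directory.
from langchain_community.chat_models import ChatOllama
from langchain_community.utilities import SQLDatabase
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

llm = ChatOllama(model="llama2:13b-chat")
db = SQLDatabase.from_uri("sqlite:///nba_roster.db", sample_rows_in_table_info=0)

# Step 1: question -> SQL query (stop before the model starts inventing SQL results).
sql_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Given an input question, convert it to a SQL query. No pre-amble."),
        (
            "human",
            "Based on the table schema below, write a SQL query that would answer "
            "the user's question:\n{schema}\n\nQuestion: {question}\nSQL Query:",
        ),
    ]
)
sql_chain = (
    RunnablePassthrough.assign(schema=lambda _: db.get_table_info())
    | sql_prompt
    | llm.bind(stop=["\nSQLResult:"])
    | StrOutputParser()
)

# Step 2: run the generated query and turn question + query + result into an answer.
answer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and SQL response, convert it to a natural "
            "language answer. No pre-amble.",
        ),
        ("human", "Question: {question}\nSQL Query: {query}\nSQL Response: {response}"),
    ]
)
full_chain = (
    RunnablePassthrough.assign(query=sql_chain)
    | RunnablePassthrough.assign(response=lambda x: db.run(x["query"]))
    | answer_prompt
    | llm
    | StrOutputParser()
)

print(full_chain.invoke({"question": "What team is Klay Thompson on?"}))
```

The memory-enabled variant in the notebook wraps these same two steps with `ConversationBufferMemory` and a `MessagesPlaceholder`, so follow-up questions such as "What is his salary?" resolve against the prior turn.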