```
├── .cursor/
│   └── rules/
│       └── python.mdc
├── .env.example
├── .gitignore
├── .python-version
├── LICENSE
├── README.md
├── cursor-rules-cli/
│   ├── .pypirc
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── PUBLISHING.md
│   ├── README.md
│   ├── cursor-rules-cli-logo.jpeg
│   ├── cursor-rules-cli.cast
│   ├── cursor-rules-cli.gif
│   ├── cursor-rules-cli.png
│   ├── pyproject.toml
│   ├── rules.json
│   ├── setup.py
│   └── src/
│       ├── __init__.py
│       ├── downloader.py
│       ├── installer.py
│       ├── main.py
│       ├── matcher.py
│       ├── scanner.py
│       └── utils.py
├── pyproject.toml
├── requirements.txt
└── rules-mdc/
    ├── actix-web.mdc
    ├── aiohttp.mdc
    ├── amazon-ec2.mdc
    ├── amazon-s3.mdc
    ├── android-sdk.mdc
```

## /.cursor/rules/python.mdc

```mdc path="/.cursor/rules/python.mdc"
---
description: package and dependency management
globs:
alwaysApply: true
---

Use uv instead of pip.

uv add library
uv run script.py

DO NOT create requirements.txt. Use uv add commands.
```
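The rule above leaves `globs` empty and sets `alwaysApply: true`, so it is attached to every request. For contrast, a rule scoped to matching files only might look like the following — an illustrative sketch, not a file that exists in this repo:

```mdc
---
description: Python style conventions
globs: **/*.py
alwaysApply: false
---

Follow PEP 8 naming conventions.
Prefer pathlib.Path over os.path for filesystem work.
```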
## /.env.example

```example path="/.env.example"
# API Keys for MDC Rules Generator
# Copy this file to .env and fill in your API keys

# Required for Exa semantic search
EXA_API_KEY=your_exa_api_key_here

# Choose one of the following based on your LLM provider:

# For Gemini (default in config.yaml)
GOOGLE_API_KEY=your_google_api_key_here

# For OpenAI models (uncomment if using)
# OPENAI_API_KEY=your_openai_api_key_here

# For Anthropic Claude models (uncomment if using)
# ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

## /.gitignore

```gitignore path="/.gitignore"
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

awesome-cursorrules/
.env
exa_results/
logs/
.cache/
.DS_Store
```

## /.python-version

```python-version path="/.python-version"
3.11
```

## /LICENSE

``` path="/LICENSE"
Creative Commons Legal Code

CC0 1.0 Universal

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.

For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:

i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data in a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose.
In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.

b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.

c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.

d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
```

## /README.md

# MDC Rules Generator

> **Disclaimer:** This project is not officially associated with or endorsed by Cursor. It is a community-driven initiative to enhance the Cursor experience.

Cursor Rules CLI - Auto-install relevant Cursor rules with one simple command | Product Hunt

This project generates Cursor MDC (Markdown Cursor) rule files from a structured JSON file containing library information. It uses Exa for semantic search and an LLM (Gemini by default) for content generation.

## Features

- Generates comprehensive MDC rule files for libraries
- Uses Exa for semantic web search to gather best practices
- Leverages an LLM to create detailed, structured content
- Supports parallel processing for efficiency
- Tracks progress to allow resuming interrupted runs
- Smart retry system that focuses on failed libraries by default (see the sketch below)
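Progress tracking and the retry system work together: each run records per-library status, and the next run skips anything already completed unless `--regenerate-all` is passed. A minimal sketch of that pattern — the file name and keys here are illustrative, not taken from `generate_mdc_files.py`:

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical location -- the real script keeps its own state under logs/.
PROGRESS_FILE = Path("logs/progress.json")

def load_progress() -> Dict[str, str]:
    """Return {library_name: "completed" | "failed"} recorded by a previous run."""
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {}

def libraries_to_process(all_libraries: List[Dict[str, Any]], regenerate_all: bool) -> List[Dict[str, Any]]:
    """Default run retries anything not marked completed; --regenerate-all ignores history."""
    progress = load_progress()
    if regenerate_all:
        return all_libraries
    return [lib for lib in all_libraries if progress.get(lib["name"]) != "completed"]
```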
## Prerequisites

- Python 3.8+
- [uv](https://github.com/astral-sh/uv) for dependency management
- API keys for:
  - Exa (for semantic search)
  - An LLM provider (Gemini, OpenAI, or Anthropic)

## Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/sanjeed5/awesome-cursor-rules-mdc.git
   cd awesome-cursor-rules-mdc
   ```

2. Install dependencies using uv:

   ```bash
   uv sync
   ```

3. Set up environment variables:

   Create a `.env` file in the project root with your API keys (see `.env.example`):

   ```
   EXA_API_KEY=your_exa_api_key
   GOOGLE_API_KEY=your_google_api_key  # For Gemini
   # Or use one of these depending on your LLM choice:
   # OPENAI_API_KEY=your_openai_api_key
   # ANTHROPIC_API_KEY=your_anthropic_api_key
   ```

## Usage

Run the generator script with:

```bash
uv run src/generate_mdc_files.py
```

By default, the script will only process libraries that failed in previous runs.

### Command-line Options

- `--test`: Run in test mode (process only one library)
- `--tag TAG`: Process only libraries with a specific tag
- `--library LIBRARY`: Process only a specific library
- `--output OUTPUT_DIR`: Specify output directory for MDC files
- `--verbose`: Enable verbose logging
- `--workers N`: Set number of parallel workers
- `--rate-limit N`: Set API rate limit calls per minute
- `--regenerate-all`: Process all libraries, including previously completed ones

### Examples

```bash
# Process failed libraries (default behavior)
uv run src/generate_mdc_files.py

# Regenerate all libraries
uv run src/generate_mdc_files.py --regenerate-all

# Process only Python libraries
uv run src/generate_mdc_files.py --tag python

# Process a specific library
uv run src/generate_mdc_files.py --library react
```

## Adding New Rules

Adding support for new libraries is simple:

1. **Edit the rules.json file**:
   - Add a new entry to the `libraries` array:

   ```json
   {
     "name": "your-library-name",
     "tags": ["relevant-tag1", "relevant-tag2"]
   }
   ```

2. **Generate the MDC files**:
   - Run the generator script:

   ```bash
   uv run src/generate_mdc_files.py
   ```

   - The script automatically detects and processes new libraries

3. **Contribute back**:
   - Test your new rules with real projects
   - Consider raising a PR to contribute your additions back to the community

## Configuration

The script uses a `config.yaml` file for configuration. You can modify this file to adjust:

- API rate limits
- Output directories
- LLM model selection
- Processing parameters
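The repository's actual `src/config.yaml` is not reproduced in this section, so the snippet below is only an illustrative sketch of what such a file might contain — the key names are assumptions, not the real schema:

```yaml
# Illustrative sketch only -- consult src/config.yaml for the real keys.
llm:
  provider: gemini            # default; OpenAI or Anthropic also work per the README
  model: your-model-name      # placeholder
api:
  rate_limit_calls_per_minute: 60
processing:
  max_workers: 4
output:
  rules_dir: rules-mdc
  exa_results_dir: src/exa_results
```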
## Project Structure

```
.
├── cursor-rules-cli/          # CLI tool for finding and installing rules (deprecated)
│   ├── src/                   # CLI source code
│   ├── docs/                  # CLI documentation
│   └── README.md              # CLI usage instructions
├── src/                       # Main source code directory
│   ├── generate_mdc_files.py  # Main generator script
│   ├── config.yaml            # Configuration file
│   ├── mdc-instructions.txt   # Instructions for MDC generation
│   ├── logs/                  # Log files directory
│   └── exa_results/           # Directory for Exa search results
├── rules-mdc/                 # Output directory for generated MDC files
├── rules.json                 # Input file with library information
├── pyproject.toml             # Project dependencies and metadata
├── .env.example               # Example environment variables
└── LICENSE                    # CC0 1.0 Universal license
```

## License

[CC0 1.0 Universal](LICENSE)

## /cursor-rules-cli/.pypirc

```pypirc path="/cursor-rules-cli/.pypirc"
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
password = your_pypi_token

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = your_testpypi_token
```

## /cursor-rules-cli/LICENSE

``` path="/cursor-rules-cli/LICENSE"
MIT License

Copyright (c) 2025 Sanjeed

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

## /cursor-rules-cli/MANIFEST.in

```in path="/cursor-rules-cli/MANIFEST.in"
include LICENSE
include README.md
include rules.json
recursive-include src *.py
recursive-include src *.json
```

## /cursor-rules-cli/PUBLISHING.md

# Publishing to PyPI

This document provides instructions for publishing the `cursor-rules` package to PyPI.

## Prerequisites

1. Create a PyPI account at https://pypi.org/account/register/
2. Generate an API token at https://pypi.org/manage/account/token/
3. Install required tools:

   ```bash
   pip install build twine
   ```

## Publishing Steps

1. Update the version in `src/__init__.py`
2. Build the package:

   ```bash
   python -m build
   ```

3. Test the package locally:

   ```bash
   pip install --force-reinstall dist/cursor_rules-*.whl
   cursor-rules --help
   ```

4. Upload to TestPyPI (optional):

   ```bash
   python -m twine upload --repository testpypi dist/*
   ```

5. Install from TestPyPI (optional):

   ```bash
   pip install --index-url https://test.pypi.org/simple/ cursor-rules
   ```

6. Upload to PyPI:

   ```bash
   python -m twine upload dist/*
   ```

## Using API Tokens

When using twine, you can either:

1. Create a `.pypirc` file in your home directory:

   ```
   [distutils]
   index-servers =
       pypi
       testpypi

   [pypi]
   username = __token__
   password = your_pypi_token

   [testpypi]
   repository = https://test.pypi.org/legacy/
   username = __token__
   password = your_testpypi_token
   ```
2. Or provide credentials via environment variables:

   ```bash
   export TWINE_USERNAME=__token__
   export TWINE_PASSWORD=your_pypi_token
   ```

3. Or enter them when prompted by twine.

## Updating the Package

1. Make your changes
2. Update the version in `src/__init__.py`
3. Rebuild and upload following the steps above

## /cursor-rules-cli/README.md

# Cursor Rules CLI

> **Disclaimer:** This project is not officially associated with or endorsed by Cursor. It is a community-driven initiative to enhance the Cursor experience.

Cursor Rules CLI - Auto-install relevant Cursor rules with one simple command | Product Hunt

A simple tool that helps you find and install the right Cursor rules for your project. It scans your project to identify libraries and frameworks you're using and suggests matching rules.

![Cursor Rules CLI Demo](cursor-rules-cli.gif)

## Features

- 🔍 Auto-detects libraries in your project
- 📝 Supports direct library specification
- 📥 Downloads and installs rules into Cursor
- 🎨 Provides a colorful, user-friendly interface
- 🔀 Works with custom rule repositories
- 🔒 100% privacy-focused (all scanning happens locally)
- 🔄 GitHub API integration for reliable downloads

## Installation

```bash
pip install cursor-rules
```

## Basic Usage

```bash
# Scan current project and install matching rules
cursor-rules

# Specify libraries directly (skips project scanning)
cursor-rules --libraries "react,tailwind,typescript"

# Scan a specific project directory
cursor-rules -d /path/to/my/project
```

## Common Options

| Option | Description |
|--------|-------------|
| `--dry-run` | Preview without installing anything |
| `--force` | Replace existing rules |
| `-v, --verbose` | Show detailed output |
| `--quick-scan` | Faster scan (checks package files only) |
| `--max-results N` | Show top N results (default: 20) |

## Custom Repositories

```bash
# Use rules from your own GitHub repository
cursor-rules --source https://github.com/your-username/your-repo

# Save repository setting for future use
cursor-rules --source https://github.com/your-username/your-repo --save-config
```

## Repository URL Format

The tool now uses the GitHub API to reliably download rules. You can specify the repository URL in several formats:

```bash
# Standard GitHub repository URL (recommended)
cursor-rules --source https://github.com/username/repo

# With a specific branch
cursor-rules --source https://github.com/username/repo/tree/branch-name

# Legacy raw content URL will also work
cursor-rules --source https://raw.githubusercontent.com/username/repo/branch
```

## Configuration

```bash
# View current settings
cursor-rules --show-config

# Save settings globally
cursor-rules --save-config

# Save settings for current project only
cursor-rules --save-project-config
```

## Full Options Reference

Run `cursor-rules --help` to see all available options.
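A custom rules repository (see "Custom Repositories" above) is expected to mirror this project's layout: the downloader verifies that the target repo contains a `rules-mdc/` directory of `.mdc` files, and matching loads a `rules.json` index (see `downloader.py` later in this section). A sketch of the minimal layout:

```
your-repo/
├── rules.json        # index of rule names and tags (same schema as this repo's rules.json)
└── rules-mdc/
    ├── react.mdc
    ├── tailwind.mdc
    └── ...
```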
## License MIT ## Todo: - [ ] Test the custom repo feature ## /cursor-rules-cli/cursor-rules-cli-logo.jpeg Binary file available at https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/refs/heads/main/cursor-rules-cli/cursor-rules-cli-logo.jpeg ## /cursor-rules-cli/cursor-rules-cli.cast ```cast path="/cursor-rules-cli/cursor-rules-cli.cast" {"version": 2, "width": 93, "height": 21, "timestamp": 1741028347, "env": {"SHELL": "/bin/zsh", "TERM": "xterm-256color"}} [0.984595, "o", "\u001b[1m\u001b[7m%\u001b[27m\u001b[1m\u001b[0m \r \r"] [0.985302, "o", "\u001b]2;stylumia@Sanjeeds-MacBook-Air:~/projects/agents/smolagents-tutorials\u0007\u001b]1;..nts-tutorials\u0007"] [0.987528, "o", "\u001b]7;file://Sanjeeds-MacBook-Air.local/Users/stylumia/projects/agents/smolagents-tutorials\u001b\\"] [1.005076, "o", "\u001b]697;OSCUnlock=7b6883b11bfa4f3a909885153dc9f047\u0007\u001b]697;Dir=/Users/stylumia/projects/agents/smolagents-tutorials\u0007\u001b]697;Shell=zsh\u0007"] [1.005108, "o", "\u001b]697;ShellPath=/bin/zsh\u0007\u001b]697;PID=89174\u0007\u001b]697;ExitCode=0\u0007"] [1.005248, "o", "\u001b]697;TTY=/dev/ttys141\u0007\u001b]697;Log=\u0007\u001b]697;ZshAutosuggestionColor=fg=8\u0007"] [1.00539, "o", "\u001b]697;FigAutosuggestionColor=\u0007\u001b]697;User=stylumia\u0007"] [1.007315, "o", "\r\u001b[0m\u001b[27m\u001b[24m\u001b[J\u001b]697;StartPrompt\u0007\u001b[01;32m➜ \u001b[36msmolagents-tutorials\u001b[00m \u001b]697;EndPrompt\u0007\u001b]697;NewCmd=7b6883b11bfa4f3a909885153dc9f047\u0007"] [1.00733, "o", "\u001b[K\u001b[68C\u001b]697;StartPrompt\u0007\u001b]697;EndPrompt\u0007\u001b[68D"] [1.007538, "o", "\u001b[?1h\u001b=\u001b[?2004h"] [1.059751, "o", "\r\r\u001b[0m\u001b[27m\u001b[24m\u001b[J\u001b]697;StartPrompt\u0007\u001b[01;32m➜ \u001b[36msmolagents-tutorials\u001b[00m \u001b[01;34mgit:(\u001b[31mmain\u001b[34m) \u001b[33m✗\u001b[00m \u001b]697;EndPrompt\u0007\u001b]697;NewCmd=7b6883b11bfa4f3a909885153dc9f047\u0007\u001b[K\u001b[55C\u001b]697;StartPrompt\u0007\u001b]697;EndPrompt\u0007\u001b[55D"] [1.754231, "o", "c"] [1.766068, "o", "\b\u001b[1m\u001b[31mc\u001b[0m\u001b[39m"] [1.766439, "o", "\b\u001b[1m\u001b[31mc\u001b[0m\u001b[39m\u001b[90mlear\u001b[39m\b\b\b\b"] [1.939123, "o", "\b\u001b[1m\u001b[31mc\u001b[1m\u001b[31mu\u001b[0m\u001b[39m\u001b[39m \u001b[39m \u001b[39m \b\b\b"] [1.941497, "o", "\b\b\u001b[0m\u001b[32mc\u001b[0m\u001b[32mu\u001b[39m"] [1.941812, "o", "\u001b[90mrsor-rules\u001b[39m\u001b[10D"] [2.120794, "o", "\b\b\u001b[32mc\u001b[32mu\u001b[32mr\u001b[39m"] [2.126876, "o", "\b\b\b\u001b[1m\u001b[31mc\u001b[1m\u001b[31mu\u001b[1m\u001b[31mr\u001b[0m\u001b[39m"] [2.484178, "o", "\u001b[39ms\u001b[39mo\u001b[39mr\u001b[39m-\u001b[39mr\u001b[39mu\u001b[39ml\u001b[39me\u001b[39ms"] [2.486415, "o", "\u001b[12D\u001b[0m\u001b[32mc\u001b[0m\u001b[32mu\u001b[0m\u001b[32mr\u001b[32ms\u001b[32mo\u001b[32mr\u001b[32m-\u001b[32mr\u001b[32mu\u001b[32ml\u001b[32me\u001b[32ms\u001b[39m"] [2.939399, "o", "\u001b[?1l\u001b>"] [2.939717, "o", "\u001b[?2004l"] [2.941204, "o", "\r\r\n"] [2.942476, "o", "\u001b]697;OSCLock=7b6883b11bfa4f3a909885153dc9f047\u0007"] [2.942588, "o", "\u001b]697;PreExec\u0007"] [2.942643, "o", "\u001b]2;cursor-rules\u0007\u001b]1;cursor-rules\u0007"] [3.164394, "o", "\u001b[32mINFO\u001b[0m: Scanning for libraries and frameworks...\r\n"] [3.164481, "o", "\u001b[32mINFO\u001b[0m: Scanning for libraries and frameworks...\r\n"] [3.166587, "o", "\u001b[32mINFO\u001b[0m: Detected 135 libraries/frameworks.\r\n"] [3.166609, "o", 
"\u001b[32mINFO\u001b[0m: Finding relevant rules...\r\n"] [3.167137, "o", "\u001b[32mINFO\u001b[0m: Successfully loaded rules.json from /Users/stylumia/projects/awesome-cursor-rules-mdc/cursor-rules-cli/rules.json\r\n"] [3.168112, "o", "\u001b[32mINFO\u001b[0m: Found \u001b[32m20\u001b[0m relevant rule files.\r\n"] [3.16821, "o", "\r\n\u001b[1m\u001b[34mAvailable Cursor rules for your project:\u001b[0m\r\n\r\n\u001b[1m\u001b[32mDirect Dependencies:\u001b[0m\r\n\u001b[32m1.\u001b[0m \u001b[36mscikit-learn\u001b[0m [ai, ml, machine-learning, python, data-science] (0.87)\r\n\u001b[32m2.\u001b[0m \u001b[36mpandas\u001b[0m [ai, ml, data-science, python, data-analysis] (0.87)\r\n\u001b[32m3.\u001b[0m \u001b[36mnumpy\u001b[0m [ai, ml, data-science, python, numerical-computing] (0.87)\r\n\u001b[32m4.\u001b[0m \u001b[36mscipy\u001b[0m [ai, ml, data-science, python, scientific-computing] (0.87)\r\n\u001b[32m5.\u001b[0m \u001b[36mtornado\u001b[0m [backend, framework, python, async] (0.83)\r\n\u001b[32m6.\u001b[0m \u001b[36msmolagents\u001b[0m [ai, ml, llm, python, agent-framework, lightweight] (0.82)\r\n\u001b[32m7.\u001b[0m \u001b[36msqlalchemy\u001b[0m [database, orm, python, sql] (0.82)\r\n\u001b[32m8.\u001b[0m \u001b[36msetuptools\u001b[0m [development, build-tool, python, packaging] (0.82)\r\n\u001b[32m9.\u001b[0m \u001b[36mpydantic\u001b[0m [development, python, data-validation, type-checking] (0.82)\r\n\u001b[32m10.\u001b[0m \u001b[36mlangchain\u001b[0m [ai, ml, llm, python] (0.82)\r\n\u001b[32m11.\u001b[0m \u001b[36mhttpx\u001b[0m [web, python, http-client, async] (0.82)\r\n\u001b[32m12.\u001b[0m \u001b[36maiohttp\u001b[0m [web, pyt"] [3.168309, "o", "hon, http-client, async] (0.82)\r\n\u001b[32m13.\u001b[0m \u001b[36mtransformers\u001b[0m [python, nlp, deep-learning, huggingface] (0.82)\r\n\u001b[32m14.\u001b[0m \u001b[36mrich\u001b[0m [python, utilities, terminal, formatting] (0.82)\r\n\u001b[32m15.\u001b[0m \u001b[36mrequests\u001b[0m [web, python, http-client] (0.81)\r\n\u001b[32m16.\u001b[0m \u001b[36mbeautifulsoup4\u001b[0m [python, web-scraping, html-parsing] (0.81)\r\n\u001b[32m17.\u001b[0m \u001b[36manyio\u001b[0m [python, async, compatibility-layer] (0.81)\r\n\u001b[32m18.\u001b[0m \u001b[36mtqdm\u001b[0m [python, utilities, progress-bar] (0.81)\r\n\u001b[32m19.\u001b[0m \u001b[36mclick\u001b[0m [python, utilities, cli] (0.81)\r\n"] [3.168328, "o", "\r\n\u001b[1m\u001b[33mOther Relevant Rules:\u001b[0m\r\n\u001b[32m20.\u001b[0m \u001b[36mpytorch\u001b[0m [ai, ml, machine-learning, python, deep-learning] (0.82)\r\n\r\n\u001b[1mSelect rules to install:\u001b[0m\r\n \u001b[33m* Enter comma-separated numbers (e.g., 1,3,5)\u001b[0m\r\n \u001b[33m* Type 'all' to select all rules\u001b[0m\r\n \u001b[33m* Type 'category:name' to select all rules in a category (e.g., 'category:development')\u001b[0m\r\n \u001b[33m* Type 'none' to cancel\u001b[0m\r\n\u001b[32m> \u001b[0m"] [4.962026, "o", "6"] [5.205157, "o", ","] [5.755048, "o", "9"] [6.427783, "o", ","] [6.737927, "o", "1"] [7.081027, "o", "6"] [7.560083, "o", "\r\n"] [7.633727, "o", "\u001b[32mINFO\u001b[0m: Downloaded smolagents\r\n"] [7.847926, "o", "\u001b[32mINFO\u001b[0m: Downloaded beautifulsoup4\r\n"] [7.942039, "o", "\u001b[32mINFO\u001b[0m: Downloaded pydantic\r\n"] [7.94271, "o", "\u001b[32mINFO\u001b[0m: Successfully downloaded all 3 rules\r\n"] [7.944752, "o", "\u001b[32mINFO\u001b[0m: Backed up existing rules to /Users/stylumia/projects/agents/smolagents-tutorials/.cursor/backups/rules_backup_20250304_002915\r\n"] [7.944899, 
"o", "\u001b[33mWARNING\u001b[0m: \u001b[33mSkipping smolagents: Rule already exists (use --force to overwrite)\u001b[0m\r\n"] [7.946423, "o", "\u001b[32mINFO\u001b[0m: Installed 2/3 rules to /Users/stylumia/projects/agents/smolagents-tutorials/.cursor/rules\r\n"] [7.946532, "o", "\u001b[32mINFO\u001b[0m: \u001b[32m✅ Successfully installed 2 rules!\u001b[0m\r\n\u001b[33mWARNING\u001b[0m: \u001b[33m\u001b[33m⚠️ Failed to install 1 rules:\u001b[0m\u001b[0m\r\n"] [7.946792, "o", "\u001b[0m"] [7.976129, "o", "\u001b[1m\u001b[7m%\u001b[27m\u001b[1m\u001b[0m \r \r"] [7.977311, "o", "\u001b]2;stylumia@Sanjeeds-MacBook-Air:~/projects/agents/smolagents-tutorials\u0007\u001b]1;..nts-tutorials\u0007"] [7.980009, "o", "\u001b]7;file://Sanjeeds-MacBook-Air.local/Users/stylumia/projects/agents/smolagents-tutorials\u001b\\"] [7.992933, "o", "\u001b]697;OSCUnlock=7b6883b11bfa4f3a909885153dc9f047\u0007\u001b]697;Dir=/Users/stylumia/projects/agents/smolagents-tutorials\u0007"] [7.993049, "o", "\u001b]697;Shell=zsh\u0007\u001b]697;ShellPath=/bin/zsh\u0007"] [7.993191, "o", "\u001b]697;PID=89174\u0007\u001b]697;ExitCode=0\u0007\u001b]697;TTY=/dev/ttys141\u0007\u001b]697;Log=\u0007\u001b]697;ZshAutosuggestionColor=fg=8\u0007\u001b]697;FigAutosuggestionColor=\u0007\u001b]697;User=stylumia\u0007"] [7.99525, "o", "\r\u001b[0m\u001b[27m\u001b[24m\u001b[J\u001b]697;StartPrompt\u0007\u001b[01;32m➜ \u001b[36msmolagents-tutorials\u001b[00m \u001b[01;34mgit:(\u001b[31mmain\u001b[34m) \u001b[33m✗\u001b[00m \u001b]697;EndPrompt\u0007\u001b]697;NewCmd=7b6883b11bfa4f3a909885153dc9f047\u0007"] [7.995282, "o", "\u001b[K\u001b[55C\u001b]697;StartPrompt\u0007\u001b]697;EndPrompt\u0007\u001b[55D"] [7.995396, "o", "\u001b[?1h\u001b=\u001b[?2004h"] [9.5534, "o", "e"] [9.558133, "o", "\b\u001b[1m\u001b[31me\u001b[0m\u001b[39m"] [9.558402, "o", "\b\u001b[1m\u001b[31me\u001b[0m\u001b[39m\u001b[90mxit\u001b[39m\b\b\b"] [9.752584, "o", "\b\u001b[1m\u001b[31me\u001b[1m\u001b[31mx\u001b[0m\u001b[39m"] [9.754766, "o", "\b\b\u001b[0m\u001b[32me\u001b[0m\u001b[32mx\u001b[39m"] [9.912208, "o", "\b\b\u001b[32me\u001b[32mx\u001b[32mi\u001b[39m"] [9.916324, "o", "\b\b\b\u001b[1m\u001b[31me\u001b[1m\u001b[31mx\u001b[1m\u001b[31mi\u001b[0m\u001b[39m"] [10.150086, "o", "\b\u001b[1m\u001b[31mi\u001b[1m\u001b[31mt\u001b[0m\u001b[39m"] [10.152603, "o", "\b\b\b\b\u001b[0m\u001b[32me\u001b[0m\u001b[32mx\u001b[0m\u001b[32mi\u001b[0m\u001b[32mt\u001b[39m"] [10.996501, "o", "\u001b[?1l\u001b>"] [10.996806, "o", "\u001b[?2004l"] [10.998773, "o", "\r\r\n"] [11.000081, "o", "\u001b]697;OSCLock=7b6883b11bfa4f3a909885153dc9f047\u0007"] [11.000111, "o", "\u001b]697;PreExec\u0007"] [11.000504, "o", "\u001b]2;exit\u0007\u001b]1;exit\u0007"] ``` ## /cursor-rules-cli/cursor-rules-cli.gif Binary file available at https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/refs/heads/main/cursor-rules-cli/cursor-rules-cli.gif ## /cursor-rules-cli/cursor-rules-cli.png Binary file available at https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/refs/heads/main/cursor-rules-cli/cursor-rules-cli.png ## /cursor-rules-cli/pyproject.toml ```toml path="/cursor-rules-cli/pyproject.toml" [build-system] requires = ["setuptools>=42", "wheel"] build-backend = "setuptools.build_meta" ``` ## /cursor-rules-cli/rules.json ```json path="/cursor-rules-cli/rules.json" { "libraries": [ { "name": "react", "tags": ["frontend", "framework", "javascript"] }, { "name": "react-native", "tags": ["frontend", "framework", "javascript", "mobile", "cross-platform"] }, { 
"name": "react-query", "tags": ["frontend", "javascript", "data-fetching"] }, { "name": "react-redux", "tags": ["frontend", "javascript", "state-management"] }, { "name": "react-mobx", "tags": ["frontend", "javascript", "state-management"] }, { "name": "next-js", "tags": ["frontend", "framework", "javascript", "react", "ssr"] }, { "name": "vue", "tags": ["frontend", "framework", "javascript"] }, { "name": "vue3", "tags": ["frontend", "framework", "javascript"] }, { "name": "nuxt", "tags": ["frontend", "framework", "javascript", "vue", "ssr"] }, { "name": "angular", "tags": ["frontend", "framework", "javascript", "typescript"] }, { "name": "svelte", "tags": ["frontend", "framework", "javascript"] }, { "name": "sveltekit", "tags": ["frontend", "framework", "javascript", "svelte", "ssr"] }, { "name": "solidjs", "tags": ["frontend", "framework", "javascript"] }, { "name": "qwik", "tags": ["frontend", "framework", "javascript"] }, { "name": "express", "tags": ["backend", "framework", "javascript", "nodejs"] }, { "name": "nestjs", "tags": ["backend", "framework", "javascript", "typescript", "nodejs"] }, { "name": "bun", "tags": ["backend", "javascript", "runtime", "nodejs-alternative"] }, { "name": "django", "tags": ["backend", "framework", "python", "orm", "full-stack"] }, { "name": "flask", "tags": ["backend", "framework", "python", "microframework"] }, { "name": "fastapi", "tags": ["backend", "framework", "python", "api", "async"] }, { "name": "pyramid", "tags": ["backend", "framework", "python"] }, { "name": "tornado", "tags": ["backend", "framework", "python", "async"] }, { "name": "sanic", "tags": ["backend", "framework", "python", "async"] }, { "name": "bottle", "tags": ["backend", "framework", "python", "microframework"] }, { "name": "laravel", "tags": ["backend", "framework", "php"] }, { "name": "springboot", "tags": ["backend", "framework", "java"] }, { "name": "fiber", "tags": ["backend", "framework", "go"] }, { "name": "servemux", "tags": ["backend", "framework", "go"] }, { "name": "phoenix", "tags": ["backend", "framework", "elixir"] }, { "name": "actix-web", "tags": ["backend", "framework", "rust"] }, { "name": "rocket", "tags": ["backend", "framework", "rust"] }, { "name": "shadcn", "tags": ["ui", "component-library", "react"] }, { "name": "chakra-ui", "tags": ["ui", "component-library", "react"] }, { "name": "material-ui", "tags": ["ui", "component-library", "react"] }, { "name": "tailwind", "tags": ["ui", "css", "utility-first"] }, { "name": "jetpack-compose", "tags": ["ui", "mobile", "android", "kotlin"] }, { "name": "tkinter", "tags": ["ui", "gui", "python", "desktop"] }, { "name": "pyqt", "tags": ["ui", "gui", "python", "desktop", "qt"] }, { "name": "pyside", "tags": ["ui", "gui", "python", "desktop", "qt"] }, { "name": "kivy", "tags": ["ui", "gui", "python", "cross-platform", "mobile"] }, { "name": "pygame", "tags": ["ui", "gui", "python", "game-development"] }, { "name": "customtkinter", "tags": ["ui", "gui", "python", "desktop", "tkinter"] }, { "name": "redux", "tags": ["state-management", "javascript", "react"] }, { "name": "mobx", "tags": ["state-management", "javascript", "react"] }, { "name": "zustand", "tags": ["state-management", "javascript", "react"] }, { "name": "riverpod", "tags": ["state-management", "flutter", "dart"] }, { "name": "supabase", "tags": ["database", "sql", "postgresql", "backend-as-service"] }, { "name": "postgresql", "tags": ["database", "sql", "relational"] }, { "name": "prisma", "tags": ["database", "orm", "typescript", "javascript"] }, { 
"name": "mongodb", "tags": ["database", "nosql", "document"] }, { "name": "redis", "tags": ["database", "nosql", "key-value", "in-memory"] }, { "name": "duckdb", "tags": ["database", "analytics", "sql", "olap"] }, { "name": "sqlalchemy", "tags": ["database", "orm", "python", "sql"] }, { "name": "peewee", "tags": ["database", "orm", "python", "sql"] }, { "name": "pony", "tags": ["database", "orm", "python", "sql"] }, { "name": "tortoise-orm", "tags": ["database", "orm", "python", "sql", "async"] }, { "name": "django-orm", "tags": ["database", "orm", "python", "sql", "django"] }, { "name": "vite", "tags": ["development", "build-tool", "javascript"] }, { "name": "webpack", "tags": ["development", "build-tool", "javascript"] }, { "name": "turbopack", "tags": ["development", "build-tool", "javascript"] }, { "name": "poetry", "tags": ["development", "build-tool", "python", "dependency-management"] }, { "name": "setuptools", "tags": ["development", "build-tool", "python", "packaging"] }, { "name": "jest", "tags": ["development", "testing", "javascript"] }, { "name": "detox", "tags": ["development", "testing", "javascript", "react-native", "e2e"] }, { "name": "playwright", "tags": ["development", "testing", "javascript", "e2e", "browser"] }, { "name": "vitest", "tags": ["development", "testing", "javascript", "vite"] }, { "name": "python", "tags": ["python"] }, { "name": "pytest", "tags": ["development", "testing", "python"] }, { "name": "unittest", "tags": ["development", "testing", "python", "standard-library"] }, { "name": "nose2", "tags": ["development", "testing", "python"] }, { "name": "hypothesis", "tags": ["development", "testing", "python", "property-based"] }, { "name": "behave", "tags": ["development", "testing", "python", "bdd"] }, { "name": "docker", "tags": ["development", "containerization", "devops"] }, { "name": "kubernetes", "tags": ["development", "containerization", "orchestration", "devops"] }, { "name": "git", "tags": ["development", "version-control"] }, { "name": "mkdocs", "tags": ["development", "documentation", "markdown"] }, { "name": "sphinx", "tags": ["development", "documentation", "python", "rst"] }, { "name": "pdoc", "tags": ["development", "documentation", "python", "auto-generation"] }, { "name": "github-actions", "tags": ["development", "ci-cd", "devops"] }, { "name": "terraform", "tags": ["development", "infrastructure", "iac", "devops"] }, { "name": "black", "tags": ["development", "python", "formatter", "linting"] }, { "name": "flake8", "tags": ["development", "python", "linting"] }, { "name": "pylint", "tags": ["development", "python", "linting", "static-analysis"] }, { "name": "mypy", "tags": ["development", "python", "type-checking", "static-analysis"] }, { "name": "isort", "tags": ["development", "python", "formatter", "imports"] }, { "name": "pydantic", "tags": ["development", "python", "data-validation", "type-checking"] }, { "name": "pyright", "tags": ["development", "python", "type-checking", "static-analysis"] }, { "name": "tauri", "tags": ["cross-platform", "desktop", "rust", "javascript"] }, { "name": "electron", "tags": ["cross-platform", "desktop", "javascript"] }, { "name": "expo", "tags": ["cross-platform", "mobile", "react-native"] }, { "name": "flutter", "tags": ["cross-platform", "mobile", "dart"] }, { "name": "pytorch", "tags": ["ai", "ml", "machine-learning", "python", "deep-learning"] }, { "name": "scikit-learn", "tags": ["ai", "ml", "machine-learning", "python", "data-science"] }, { "name": "pandas", "tags": ["ai", "ml", "data-science", 
"python", "data-analysis"] }, { "name": "tensorflow", "tags": ["ai", "ml", "machine-learning", "python", "deep-learning"] }, { "name": "keras", "tags": ["ai", "ml", "machine-learning", "python", "deep-learning"] }, { "name": "xgboost", "tags": ["ai", "ml", "machine-learning", "python", "gradient-boosting"] }, { "name": "lightgbm", "tags": ["ai", "ml", "machine-learning", "python", "gradient-boosting"] }, { "name": "cuda", "tags": ["ai", "ml", "gpu-computing", "parallel-computing"] }, { "name": "numba", "tags": ["ai", "ml", "gpu-computing", "python", "jit-compiler"] }, { "name": "langchain", "tags": ["ai", "ml", "llm", "python"] }, { "name": "huggingface", "tags": ["ai", "ml", "llm", "python", "transformers"] }, { "name": "vllm", "tags": ["ai", "ml", "llm", "python", "inference"] }, { "name": "llama-index", "tags": ["ai", "ml", "llm", "python", "rag"] }, { "name": "modal", "tags": ["ai", "ml", "cloud-inference", "serverless", "deployment"] }, { "name": "numpy", "tags": ["ai", "ml", "data-science", "python", "numerical-computing"] }, { "name": "scipy", "tags": ["ai", "ml", "data-science", "python", "scientific-computing"] }, { "name": "matplotlib", "tags": ["ai", "ml", "data-science", "python", "data-visualization"] }, { "name": "seaborn", "tags": ["ai", "ml", "data-science", "python", "data-visualization"] }, { "name": "plotly", "tags": ["ai", "ml", "data-science", "python", "interactive-visualization"] }, { "name": "statsmodels", "tags": ["ai", "ml", "data-science", "python", "statistics"] }, { "name": "dask", "tags": ["ai", "ml", "data-science", "python", "parallel-computing", "big-data"] }, { "name": "htmx", "tags": ["web", "javascript", "modern-patterns"] }, { "name": "trpc", "tags": ["web", "typescript", "api", "modern-patterns"] }, { "name": "typescript", "tags": ["web", "javascript", "type-checking", "language"] }, { "name": "zod", "tags": ["web", "typescript", "validation", "type-checking"] }, { "name": "axios", "tags": ["web", "javascript", "http-client"] }, { "name": "guzzle", "tags": ["web", "php", "http-client"] }, { "name": "requests", "tags": ["web", "python", "http-client"] }, { "name": "httpx", "tags": ["web", "python", "http-client", "async"] }, { "name": "aiohttp", "tags": ["web", "python", "http-client", "async"] }, { "name": "graphql", "tags": ["web", "api", "query-language"] }, { "name": "apollo-client", "tags": ["web", "api", "graphql", "javascript"] }, { "name": "flask-restful", "tags": ["web", "api", "python", "flask"] }, { "name": "solidity", "tags": ["blockchain", "ethereum", "smart-contracts", "language"] }, { "name": "hardhat", "tags": ["blockchain", "ethereum", "development", "javascript"] }, { "name": "vercel", "tags": ["cloud", "deployment", "serverless", "frontend"] }, { "name": "cloudflare", "tags": ["cloud", "deployment", "edge-computing", "cdn"] }, { "name": "aws-lambda", "tags": ["cloud", "serverless", "aws"] }, { "name": "aws", "tags": ["cloud", "major-platform"] }, { "name": "gcp", "tags": ["cloud", "major-platform"] }, { "name": "azure", "tags": ["cloud", "major-platform"] }, { "name": "beautifulsoup4", "tags": ["python", "web-scraping", "html-parsing"] }, { "name": "scrapy", "tags": ["python", "web-scraping", "crawler", "framework"] }, { "name": "selenium", "tags": ["python", "web-scraping", "browser-automation", "testing"] }, { "name": "asyncio", "tags": ["python", "async", "standard-library"] }, { "name": "trio", "tags": ["python", "async"] }, { "name": "anyio", "tags": ["python", "async", "compatibility-layer"] }, { "name": "nltk", "tags": 
["python", "nlp", "text-processing"] }, { "name": "spacy", "tags": ["python", "nlp", "text-processing"] }, { "name": "gensim", "tags": ["python", "nlp", "topic-modeling"] }, { "name": "transformers", "tags": ["python", "nlp", "deep-learning", "huggingface"] }, { "name": "pillow", "tags": ["python", "image-processing"] }, { "name": "opencv-python", "tags": ["python", "image-processing", "computer-vision"] }, { "name": "scikit-image", "tags": ["python", "image-processing", "scientific-computing"] }, { "name": "tqdm", "tags": ["python", "utilities", "progress-bar"] }, { "name": "rich", "tags": ["python", "utilities", "terminal", "formatting"] }, { "name": "click", "tags": ["python", "utilities", "cli"] }, { "name": "typer", "tags": ["python", "utilities", "cli"] }, { "name": "streamlit", "tags": ["python", "utilities", "data-apps", "dashboard"] }, { "name": "css", "tags": ["web", "frontend", "styling", "language"] }, { "name": "crewai", "tags": ["ai", "ml", "llm", "python", "agent-framework", "multi-agent"] }, { "name": "smolagents", "tags": ["ai", "ml", "llm", "python", "agent-framework", "lightweight"] }, { "name": "langgraph", "tags": ["ai", "ml", "llm", "python", "agent-framework", "workflow"] }, { "name": "autogen", "tags": ["ai", "ml", "llm", "python", "agent-framework", "multi-agent"] }, { "name": "llamaindex-js", "tags": ["ai", "ml", "llm", "javascript", "rag"] }, { "name": "langchain-js", "tags": ["ai", "ml", "llm", "javascript"] }, { "name": "asp-net", "tags": ["backend", "framework", "csharp", "microsoft"] }, { "name": "aws-amplify", "tags": ["cloud", "frontend", "aws", "full-stack"] }, { "name": "aws-cli", "tags": ["cloud", "devops", "aws", "command-line"] }, { "name": "aws-dynamodb", "tags": ["database", "nosql", "aws", "cloud"] }, { "name": "aws-ecs", "tags": ["cloud", "containerization", "aws", "orchestration"] }, { "name": "aws-rds", "tags": ["database", "sql", "aws", "cloud"] }, { "name": "amazon-ec2", "tags": ["cloud", "infrastructure", "aws", "virtual-machines"] }, { "name": "amazon-s3", "tags": ["cloud", "storage", "aws", "object-storage"] }, { "name": "android-sdk", "tags": ["mobile", "framework", "java", "kotlin"] }, { "name": "ansible", "tags": ["devops", "infrastructure", "automation", "configuration-management"] }, { "name": "ant-design", "tags": ["ui", "component-library", "react", "design-system"] }, { "name": "apollo-graphql", "tags": ["web", "api", "graphql", "javascript"] }, { "name": "astro", "tags": ["frontend", "framework", "javascript", "static-site"] }, { "name": "auth0", "tags": ["authentication", "security", "identity", "saas"] }, { "name": "azure-pipelines", "tags": ["devops", "ci-cd", "microsoft", "cloud"] }, { "name": "bash", "tags": ["shell", "scripting", "unix", "command-line"] }, { "name": "boto3", "tags": ["cloud", "aws", "python", "sdk"] }, { "name": "c-sharp", "tags": ["language", "microsoft", "dotnet", "backend"] }, { "name": "cheerio", "tags": ["web-scraping", "javascript", "html-parsing", "nodejs"] }, { "name": "circleci", "tags": ["devops", "ci-cd", "cloud", "automation"] }, { "name": "clerk", "tags": ["authentication", "security", "identity", "saas"] }, { "name": "codemirror", "tags": ["editor", "javascript", "text-editor", "code-editor"] }, { "name": "cypress", "tags": ["testing", "e2e", "javascript", "browser"] }, { "name": "d3", "tags": ["data-visualization", "javascript", "svg", "charts"] }, { "name": "datadog", "tags": ["monitoring", "observability", "devops", "cloud"] }, { "name": "deno", "tags": ["javascript", "runtime", "typescript", 
"nodejs-alternative"] }, { "name": "digitalocean", "tags": ["cloud", "hosting", "infrastructure", "paas"] }, { "name": "discord-api", "tags": ["api", "messaging", "gaming", "communication"] }, { "name": "django-rest-framework", "tags": ["api", "python", "django", "rest"] }, { "name": "drizzle", "tags": ["database", "orm", "typescript", "sql"] }, { "name": "elk-stack", "tags": ["logging", "monitoring", "search", "analytics"] }, { "name": "esbuild", "tags": ["build-tool", "javascript", "bundler", "performance"] }, { "name": "eslint", "tags": ["linting", "javascript", "static-analysis", "code-quality"] }, { "name": "elasticsearch", "tags": ["search", "database", "full-text", "analytics"] }, { "name": "emacs", "tags": ["editor", "text-editor", "lisp", "extensible"] }, { "name": "ffmpeg", "tags": ["multimedia", "video", "audio", "conversion"] }, { "name": "fabric-js", "tags": ["canvas", "graphics", "javascript", "interactive"] }, { "name": "firebase", "tags": ["backend-as-service", "database", "authentication", "google"] }, { "name": "fontawesome", "tags": ["icons", "ui", "web", "design"] }, { "name": "gcp-cli", "tags": ["cloud", "google", "command-line", "devops"] }, { "name": "gitlab-ci", "tags": ["devops", "ci-cd", "automation", "git"] }, { "name": "go", "tags": ["language", "backend", "performance", "google"] }, { "name": "godot", "tags": ["game-development", "engine", "cross-platform", "open-source"] }, { "name": "google-maps-js", "tags": ["maps", "geolocation", "javascript", "api"] }, { "name": "gradle", "tags": ["build-tool", "java", "android", "automation"] }, { "name": "grafana", "tags": ["monitoring", "visualization", "dashboards", "observability"] }, { "name": "heroku", "tags": ["cloud", "paas", "hosting", "deployment"] }, { "name": "insomnia", "tags": ["api", "testing", "development", "http-client"] }, { "name": "ionic", "tags": ["mobile", "framework", "cross-platform", "javascript"] }, { "name": "jax", "tags": ["ai", "ml", "numerical-computing", "python"] }, { "name": "junit", "tags": ["testing", "java", "unit-testing", "framework"] }, { "name": "java", "tags": ["language", "backend", "enterprise", "jvm"] }, { "name": "jenkins", "tags": ["devops", "ci-cd", "automation", "build"] }, { "name": "jquery", "tags": ["javascript", "dom", "library", "frontend"] }, { "name": "llvm", "tags": ["compiler", "infrastructure", "optimization", "toolchain"] }, { "name": "mlx", "tags": ["ai", "ml", "apple", "deep-learning"] }, { "name": "maven", "tags": ["build-tool", "java", "dependency-management", "project-management"] }, { "name": "microsoft-teams", "tags": ["collaboration", "communication", "microsoft", "enterprise"] }, { "name": "mockito", "tags": ["testing", "java", "mocking", "unit-testing"] }, { "name": "neo4j", "tags": ["database", "graph", "nosql", "relationships"] }, { "name": "netlify", "tags": ["hosting", "deployment", "jamstack", "frontend"] }, { "name": "nginx", "tags": ["web-server", "proxy", "load-balancer", "performance"] }, { "name": "notion-api", "tags": ["api", "productivity", "collaboration", "integration"] }, { "name": "openai", "tags": ["ai", "ml", "llm", "api"] }, { "name": "php", "tags": ["language", "backend", "web", "server-side"] }, { "name": "postman", "tags": ["api", "testing", "development", "http-client"] }, { "name": "puppeteer", "tags": ["web-scraping", "browser-automation", "testing", "javascript"] }, { "name": "ros", "tags": ["robotics", "framework", "middleware", "distributed-systems"] }, { "name": "railway", "tags": ["hosting", "deployment", "paas", "devops"] 
}, { "name": "remix", "tags": ["frontend", "framework", "react", "javascript"] }, { "name": "ruby", "tags": ["language", "backend", "web", "scripting"] }, { "name": "rust", "tags": ["language", "systems", "performance", "safety"] }, { "name": "sqlite", "tags": ["database", "sql", "embedded", "lightweight"] }, { "name": "sentry", "tags": ["error-tracking", "monitoring", "debugging", "observability"] }, { "name": "socket-io", "tags": ["websockets", "real-time", "javascript", "communication"] }, { "name": "spring", "tags": ["backend", "framework", "java", "enterprise"] }, { "name": "stripe", "tags": ["payments", "api", "e-commerce", "financial"] }, { "name": "three-js", "tags": ["3d", "graphics", "webgl", "javascript"] }, { "name": "tinygrad", "tags": ["ai", "ml", "deep-learning", "lightweight"] }, { "name": "unity", "tags": ["game-development", "engine", "cross-platform", "c-sharp"] }, { "name": "unreal-engine", "tags": ["game-development", "engine", "cross-platform", "c++"] }, { "name": "vim", "tags": ["editor", "text-editor", "terminal", "productivity"] }, { "name": "zsh", "tags": ["shell", "command-line", "unix", "terminal"] } ] } ``` ## /cursor-rules-cli/setup.py ```py path="/cursor-rules-cli/setup.py" #!/usr/bin/env python """ Setup script for cursor-rules. """ from setuptools import setup, find_packages import os import shutil from pathlib import Path # Read version from __init__.py with open(os.path.join("src", "__init__.py"), "r") as f: for line in f: if line.startswith("__version__"): version = line.split("=")[1].strip().strip('"').strip("'") break else: version = "0.1.0" # Read long description from README.md long_description = "A CLI tool to scan projects and install relevant Cursor rules (.mdc files)." readme_path = Path("README.md") if readme_path.exists(): with open(readme_path, "r", encoding="utf-8") as f: long_description = f.read() # Always copy the latest rules.json from project root to ensure consistency project_root = Path(__file__).parent.parent root_rules_json = project_root / "rules.json" package_rules_json = Path("rules.json") if root_rules_json.exists(): # Copy to package root only shutil.copy2(root_rules_json, package_rules_json) print(f"Copied rules.json from project root to package root") else: print("Warning: rules.json not found in project root") setup( name="cursor-rules", version=version, description="A CLI tool to scan projects and install relevant Cursor rules", long_description=long_description, long_description_content_type="text/markdown", author="sanjeed5", author_email="hi@sanjeed.in", url="https://github.com/sanjeed5/awesome-cursor-rules-mdc", package_dir={"cursor_rules_cli": "src"}, packages=["cursor_rules_cli"], include_package_data=True, package_data={ "cursor_rules_cli": ["*.json"], }, entry_points={ "console_scripts": [ "cursor-rules=cursor_rules_cli.main:main", ], }, python_requires=">=3.8", keywords="cursor, rules, mdc, cli", classifiers=[ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", ], install_requires=[ "requests>=2.25.0", "colorama>=0.4.4", "tqdm>=4.62.0", "urllib3>=2.0.0", "validators>=0.20.0", ] ) ``` ## /cursor-rules-cli/src/__init__.py ```py path="/cursor-rules-cli/src/__init__.py" """ Cursor Rules CLI - A tool to scan projects and suggest relevant Cursor rules """ 
__version__ = "0.5.2" ``` ## /cursor-rules-cli/src/downloader.py ```py path="/cursor-rules-cli/src/downloader.py" """ Downloader module for downloading MDC rule files. This module handles downloading the selected MDC rule files from the repository. """ import os import time import logging import requests import re import base64 from pathlib import Path from typing import Dict, List, Any, Optional, Tuple from concurrent.futures import ThreadPoolExecutor, as_completed from urllib3.util.retry import Retry from requests.adapters import HTTPAdapter from cursor_rules_cli import utils logger = logging.getLogger(__name__) # Rate limiting settings DEFAULT_RATE_LIMIT = 10 # requests per second DEFAULT_MAX_RETRIES = 3 DEFAULT_RETRY_DELAY = 2 # seconds DEFAULT_TIMEOUT = 10 # seconds class DownloadError(Exception): """Custom exception for download errors.""" pass class ValidationError(Exception): """Custom exception for validation errors.""" pass def extract_repo_info(source_url: str) -> Tuple[str, str, str]: """ Extract owner and repo name from GitHub URL. Args: source_url: GitHub URL Returns: Tuple of (owner, repo, branch) Raises: ValueError: If URL is not a valid GitHub repository URL """ # Handle various GitHub URL formats github_patterns = [ r"https?://github\.com/([^/]+)/([^/]+)(?:/tree/([^/]+))?", # github.com URLs r"https?://raw\.githubusercontent\.com/([^/]+)/([^/]+)/([^/]+)" # raw.githubusercontent.com URLs ] for pattern in github_patterns: match = re.match(pattern, source_url) if match: groups = match.groups() owner = groups[0] repo = groups[1] branch = groups[2] if len(groups) > 2 and groups[2] else "main" return owner, repo, branch raise ValueError(f"Invalid GitHub URL format: {source_url}") def create_session() -> requests.Session: """ Create a requests session with retry configuration. Returns: Configured requests session """ session = requests.Session() # Configure retry strategy retry_strategy = Retry( total=DEFAULT_MAX_RETRIES, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503, 504], ) # Mount the retry adapter adapter = HTTPAdapter(max_retries=retry_strategy) session.mount("https://", adapter) session.mount("http://", adapter) return session def verify_source_url(source_url: str, session: requests.Session = None) -> Tuple[bool, str]: """ Verify that the source URL is a valid GitHub repository. 
Args: source_url: GitHub repository URL session: Optional requests session to use Returns: Tuple of (is_accessible: bool, error_message: str) """ if not session: session = create_session() try: # Extract repo information owner, repo, branch = extract_repo_info(source_url) # Check if the repository exists using GitHub API api_url = f"https://api.github.com/repos/{owner}/{repo}" response = session.get(api_url, timeout=DEFAULT_TIMEOUT) if response.status_code >= 400: return False, f"GitHub repository not found: {owner}/{repo} (Status code: {response.status_code})" # Check if the branch exists branches_url = f"{api_url}/branches/{branch}" response = session.get(branches_url, timeout=DEFAULT_TIMEOUT) if response.status_code >= 400: return False, f"Branch not found: {branch} (Status code: {response.status_code})" # Check if the rules-mdc directory exists contents_url = f"{api_url}/contents/rules-mdc?ref={branch}" response = session.get(contents_url, timeout=DEFAULT_TIMEOUT) if response.status_code >= 400: return False, f"Rules directory not found: rules-mdc (Status code: {response.status_code})" return True, "" except ValueError as e: return False, str(e) except requests.RequestException as e: return False, f"Failed to connect to GitHub: {e}" except Exception as e: return False, f"Unexpected error verifying source URL: {e}" def download_rules( rules: List[Dict[str, Any]], source_url: str, temp_dir: Optional[Path] = None, rate_limit: int = DEFAULT_RATE_LIMIT, max_retries: int = DEFAULT_MAX_RETRIES, max_workers: int = 4, ) -> List[Dict[str, Any]]: """ Download selected MDC rule files from GitHub. Args: rules: List of rule metadata to download source_url: GitHub repository URL temp_dir: Temporary directory to store downloaded files rate_limit: Maximum requests per second max_retries: Maximum number of retries for failed downloads max_workers: Maximum number of concurrent downloads Returns: List of downloaded rule metadata with local file paths Raises: DownloadError: If there are critical download failures """ if not rules: logger.warning("No rules to download") return [] # Create temporary directory if not provided if temp_dir is None: temp_dir = Path.home() / ".cursor-rules-cli" / "temp" temp_dir.mkdir(parents=True, exist_ok=True) logger.debug(f"Using temporary directory: {temp_dir}") # Create rate limiter and session rate_limiter = utils.RateLimiter(rate_limit) session = create_session() # Verify source URL is accessible is_accessible, error_msg = verify_source_url(source_url, session) if not is_accessible: logger.error(f"Source URL verification failed: {error_msg}") logger.error(f"Please check if the source URL is correct: {source_url}") raise DownloadError(f"Source URL is not accessible: {error_msg}") # Get repository information try: owner, repo, branch = extract_repo_info(source_url) logger.info(f"Using GitHub repository: {owner}/{repo}, branch: {branch}") except ValueError as e: logger.error(f"Invalid GitHub URL: {str(e)}") raise DownloadError(f"Invalid GitHub URL: {str(e)}") # Download rules in parallel downloaded_rules = [] failed_downloads = [] with ThreadPoolExecutor(max_workers=max_workers) as executor: # Submit download tasks future_to_rule = { executor.submit( download_rule_from_github, rule, owner, repo, branch, temp_dir, rate_limiter, session, max_retries ): rule for rule in rules } # Process results as they complete for future in as_completed(future_to_rule): rule = future_to_rule[future] try: result = future.result() if result: downloaded_rules.append(result) 
logger.info(f"Downloaded {rule['name']}") else: failed_downloads.append(rule) logger.error(f"Failed to download {rule['name']}") except Exception as e: failed_downloads.append(rule) logger.error(f"Error downloading {rule['name']}: {str(e)}") # Close the session session.close() # Report download statistics total_rules = len(rules) success_count = len(downloaded_rules) failed_count = len(failed_downloads) if failed_count > 0: failed_names = [rule['name'] for rule in failed_downloads] logger.warning( f"Downloaded {success_count}/{total_rules} rules. " f"Failed to download {failed_count} rules: {', '.join(failed_names)}" ) if failed_count == total_rules: logger.error("All downloads failed. Please check your internet connection and the source URL.") logger.error(f"Source URL: {source_url}") raise DownloadError("All downloads failed. Check internet connection and source URL.") else: logger.info(f"Successfully downloaded all {success_count} rules") return downloaded_rules def download_rule_from_github( rule: Dict[str, Any], owner: str, repo: str, branch: str, temp_dir: Path, rate_limiter: utils.RateLimiter, session: requests.Session, max_retries: int, ) -> Optional[Dict[str, Any]]: """ Download a single MDC rule file from GitHub with validation. Args: rule: Rule metadata owner: GitHub repository owner repo: GitHub repository name branch: GitHub repository branch temp_dir: Temporary directory to store downloaded file rate_limiter: Rate limiter instance session: Requests session max_retries: Maximum number of retries Returns: Updated rule metadata with local file path or None if failed Raises: ValidationError: If the downloaded content fails validation """ name = rule["name"] file_path = f"rules-mdc/{name}.mdc" # Create the GitHub API URL for the file api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{file_path}?ref={branch}" # Create local file path local_path = temp_dir / f"{name}.mdc" # Try to download the file for attempt in range(max_retries + 1): try: # Respect rate limit rate_limiter.wait() # Download the file using GitHub API response = session.get(api_url, timeout=DEFAULT_TIMEOUT) response.raise_for_status() # Extract content from GitHub API response data = response.json() if "content" not in data: raise ValidationError(f"GitHub API response doesn't contain file content for {name}") # Decode base64 content content = base64.b64decode(data["content"].replace("\n", "")).decode("utf-8") # Validate content is_valid, error_msg = utils.validate_mdc_content(content) if not is_valid: logger.error(f"Content validation failed for {name}: {error_msg}") raise ValidationError(f"Content validation failed: {error_msg}") # Calculate content hash before saving content_hash = utils.calculate_content_hash(content) logger.debug(f"Content hash for {name}: {content_hash}") # Save the file with open(local_path, "w", encoding="utf-8") as f: f.write(content) # Verify the saved file saved_hash = utils.calculate_file_hash(local_path) logger.debug(f"Saved file hash for {name}: {saved_hash}") if saved_hash != content_hash: logger.error(f"File integrity check failed for {name}. Content hash: {content_hash}, File hash: {saved_hash}") # Fix: Read the file back and compare the content with open(local_path, "r", encoding="utf-8") as f: saved_content = f.read() if content == saved_content: logger.info(f"Content matches but hashes differ for {name}. This may be due to line ending differences. 
Proceeding anyway.") # Update rule metadata and continue rule["local_path"] = str(local_path) rule["content"] = content rule["hash"] = saved_hash # Use the file hash since that's what we'll verify against later return rule else: logger.error(f"Content mismatch for {name}") raise ValidationError("File integrity check failed") # Update rule metadata rule["local_path"] = str(local_path) rule["content"] = content rule["hash"] = content_hash return rule except requests.RequestException as e: if attempt < max_retries: delay = DEFAULT_RETRY_DELAY * (attempt + 1) logger.warning(f"Attempt {attempt + 1}/{max_retries + 1} failed for {name}: {e}") logger.warning(f"Retrying in {delay} seconds...") time.sleep(delay) else: logger.error(f"Failed to download {name} after {max_retries + 1} attempts: {e}") return None except ValidationError as e: logger.error(f"Validation failed for {name}: {e}") return None except Exception as e: logger.error(f"Unexpected error downloading {name}: {e}") return None return None def preview_rule_content(rule: Dict[str, Any], max_lines: int = 10) -> str: """ Generate a preview of the rule content. Args: rule: Rule metadata with content max_lines: Maximum number of lines to include Returns: Preview of the rule content """ if "content" not in rule: return "Content not available" lines = rule["content"].splitlines() if len(lines) <= max_lines: return rule["content"] # Show first few lines return "\n".join(lines[:max_lines]) + f"\n... (and {len(lines) - max_lines} more lines)" if __name__ == "__main__": # For testing import json logging.basicConfig(level=logging.DEBUG) # Example rule test_rule = { "name": "react", "tags": ["frontend", "framework", "javascript"], "path": "rules-mdc/react.mdc", "url": "https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/main/rules-mdc/react.mdc", "description": "react (frontend, framework, javascript)", } # Test download against the default source repository try: downloaded = download_rules([test_rule], "https://github.com/sanjeed5/awesome-cursor-rules-mdc") except DownloadError as e: print(f"Download failed: {e}") downloaded = [] if downloaded: print(f"Downloaded rule: {downloaded[0]['name']}") print("Preview:") print(preview_rule_content(downloaded[0])) ``` ## /cursor-rules-cli/src/installer.py ```py path="/cursor-rules-cli/src/installer.py" """ Installer module for installing MDC rule files. This module handles installing the downloaded MDC rule files to the project's .cursor/rules directory. """ import os import shutil import logging from pathlib import Path from typing import Dict, List, Any, Optional from datetime import datetime logger = logging.getLogger(__name__) def install_rules( rules: List[Dict[str, Any]], force: bool = False, cursor_dir: Optional[Path] = None, backup: bool = True, ) -> Dict[str, List]: """ Install downloaded MDC rule files to the project's .cursor/rules directory.
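    A minimal sketch of the expected call shape (paths are hypothetical):

        rules = [{"name": "react", "local_path": "/tmp/react.mdc"}]
        result = install_rules(rules, force=True)
        # result == {"installed": [...], "failed": [...]}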
Args: rules: List of rule metadata with local file paths force: Whether to overwrite existing rules cursor_dir: Path to .cursor directory (defaults to ./.cursor in current directory) backup: Whether to backup existing rules Returns: Dictionary with 'installed' and 'failed' lists """ result = { "installed": [], "failed": [] } if not rules: logger.warning("No rules to install") return result # Determine .cursor directory - use project local directory if cursor_dir is None: cursor_dir = Path.cwd() / ".cursor" # Create rules directory if it doesn't exist rules_dir = cursor_dir / "rules" rules_dir.mkdir(parents=True, exist_ok=True) logger.debug(f"Installing rules to {rules_dir}") # Backup existing rules if needed if backup and any(rules_dir.glob("*.mdc")): backup_dir = create_backup(rules_dir) if backup_dir: logger.info(f"Backed up existing rules to {backup_dir}") # Install each rule for rule in rules: if "local_path" not in rule: failure = { "rule": rule, "error": "No local file path" } result["failed"].append(failure) logger.warning(f"Skipping {rule['name']}: No local file path") continue # Determine target path target_path = rules_dir / f"{rule['name']}.mdc" # Check if rule already exists if target_path.exists() and not force: failure = { "rule": rule, "error": "Rule already exists (use --force to overwrite)" } result["failed"].append(failure) logger.warning(f"Skipping {rule['name']}: Rule already exists (use --force to overwrite)") continue # Copy the rule file try: shutil.copy2(rule["local_path"], target_path) logger.debug(f"Installed {rule['name']} to {target_path}") result["installed"].append(rule) except IOError as e: failure = { "rule": rule, "error": str(e) } result["failed"].append(failure) logger.error(f"Failed to install {rule['name']}: {e}") logger.info(f"Installed {len(result['installed'])}/{len(rules)} rules to {rules_dir}") return result def create_backup(rules_dir: Path) -> Optional[Path]: """ Create a backup of existing rules in the project directory. Args: rules_dir: Path to .cursor/rules directory Returns: Path to backup directory or None if failed """ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") # Keep backups in the project directory under .cursor/backups backup_dir = rules_dir.parent / "backups" / f"rules_backup_{timestamp}" try: # Create backup directory backup_dir.mkdir(parents=True, exist_ok=True) # Copy existing rules for rule_file in rules_dir.glob("*.mdc"): shutil.copy2(rule_file, backup_dir / rule_file.name) return backup_dir except IOError as e: logger.error(f"Failed to create backup: {e}") return None def list_installed_rules(cursor_dir: Optional[Path] = None) -> List[Dict[str, Any]]: """ List installed MDC rule files. 
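    Illustrative return shape (values hypothetical):

        [{"name": "react",
          "path": ".cursor/rules/react.mdc",
          "description": "react (frontend, framework, javascript)"}]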
Args: cursor_dir: Path to .cursor directory (defaults to ./.cursor in current directory) Returns: List of installed rule metadata """ # Determine .cursor directory - use project local directory if cursor_dir is None: cursor_dir = Path.cwd() / ".cursor" rules_dir = cursor_dir / "rules" if not rules_dir.exists(): logger.debug(f"Rules directory not found: {rules_dir}") return [] installed_rules = [] for rule_file in rules_dir.glob("*.mdc"): # Extract rule name from filename name = rule_file.stem # Read first few lines to extract description try: with open(rule_file, "r", encoding="utf-8") as f: content = f.read(1000) # Read first 1000 chars # Try to extract description from frontmatter description = name if "description:" in content: desc_line = [line for line in content.split("\n") if "description:" in line] if desc_line: description = desc_line[0].split("description:")[1].strip() except IOError: description = name installed_rules.append({ "name": name, "path": str(rule_file), "description": description, }) return installed_rules if __name__ == "__main__": # For testing import sys logging.basicConfig(level=logging.DEBUG) # List installed rules rules = list_installed_rules() print(f"Installed rules: {len(rules)}") for rule in rules: print(f" - {rule['name']}: {rule['description']}") ``` ## /cursor-rules-cli/src/main.py ```py path="/cursor-rules-cli/src/main.py" #!/usr/bin/env python """ cursor-rules-cli: A tool to scan projects and suggest relevant Cursor rules This module serves as the entry point for the CLI tool. """ import os import sys import logging import argparse from pathlib import Path from typing import Dict, Any, List import json from colorama import Fore, Style, init as init_colorama # Initialize colorama init_colorama() # Import local modules from cursor_rules_cli.scanner import scan_project, scan_package_files from cursor_rules_cli.matcher import match_libraries from cursor_rules_cli.downloader import download_rules from cursor_rules_cli.installer import install_rules from cursor_rules_cli.utils import ( load_config, save_config, get_config_file, load_project_config, save_project_config, get_project_config_file, merge_configs, validate_github_repo, DEFAULT_RULES_PATH ) # Configure logging with colors class ColoredFormatter(logging.Formatter): """Custom formatter to add colors to log messages.""" COLORS = { 'DEBUG': Fore.CYAN, 'INFO': Fore.GREEN, 'WARNING': Fore.YELLOW, 'ERROR': Fore.RED, 'CRITICAL': Fore.RED + Style.BRIGHT } def format(self, record): levelname = record.levelname if levelname in self.COLORS: record.levelname = f"{self.COLORS[levelname]}{levelname}{Style.RESET_ALL}" if record.levelno >= logging.WARNING: record.msg = f"{self.COLORS[levelname]}{record.msg}{Style.RESET_ALL}" return super().format(record) # Configure logging handler = logging.StreamHandler() handler.setFormatter(ColoredFormatter("%(levelname)s: %(message)s")) logging.basicConfig( level=logging.INFO, handlers=[handler] ) logger = logging.getLogger(__name__) def parse_args(): """Parse command line arguments.""" parser = argparse.ArgumentParser( description="Scan your project and install relevant Cursor rules (.mdc files)." 
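        # Illustrative invocations (console-script name assumed):
        #   cursor-rules-cli --dry-run
        #   cursor-rules-cli --libraries react,vue --min-score 0.6 --force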
) parser.add_argument( "-d", "--directory", default=".", help="Project directory to scan (default: current directory)" ) parser.add_argument( "--dry-run", action="store_true", help="Show what would be done without making changes" ) parser.add_argument( "--force", action="store_true", help="Force overwrite existing rules" ) parser.add_argument( "--source", default="https://github.com/sanjeed5/awesome-cursor-rules-mdc", help="GitHub repository URL for downloading rules" ) parser.add_argument( "--custom-repo", default=None, help="GitHub username/repo for a forked repository (e.g., 'username/repo')" ) parser.add_argument( "--set-repo", action="store_true", help="Set custom repository without running scan" ) parser.add_argument( "--rules-json", default=None, help="Path to custom rules.json file" ) parser.add_argument( "--save-config", action="store_true", help="Save current settings as default configuration" ) parser.add_argument( "--save-project-config", action="store_true", help="Save current settings as project-specific configuration" ) parser.add_argument( "--show-config", action="store_true", help="Show current configuration" ) parser.add_argument( "--quick-scan", action="store_true", help="Perform a quick scan (only check package files, not imports)" ) parser.add_argument( "--max-results", type=int, default=20, help="Maximum number of rules to display (default: 20)" ) parser.add_argument( "--min-score", type=float, default=0.5, help="Minimum relevance score for rules (0-1, default: 0.5)" ) parser.add_argument( "--libraries", type=str, help="Comma-separated list of libraries to match directly (e.g., 'react,vue,django')" ) parser.add_argument( "-v", "--verbose", action="store_true", help="Enable verbose output" ) return parser.parse_args() def display_config(config: Dict[str, Any], global_config: Dict[str, Any], project_config: Dict[str, Any]): """Display the current configuration.""" print(f"\n{Style.BRIGHT}{Fore.BLUE}Current Configuration:{Style.RESET_ALL}") # First display CLI-wide settings cli_wide_settings = ["custom_repo", "source"] if any(setting in config for setting in cli_wide_settings): print(f"\n{Fore.BLUE}CLI-wide settings:{Style.RESET_ALL}") for key in cli_wide_settings: if key in config: source = f" {Fore.GREEN}(global){Style.RESET_ALL}" if key in global_config else f" {Fore.YELLOW}(default){Style.RESET_ALL}" print(f" {Fore.BLUE}{key}{Style.RESET_ALL}: {config[key]}{source}") # Then display project-specific settings project_settings = [k for k in config.keys() if k not in cli_wide_settings] if project_settings: print(f"\n{Fore.BLUE}Project-specific settings:{Style.RESET_ALL}") for key in project_settings: source = "" if key in project_config: source = f" {Fore.CYAN}(project){Style.RESET_ALL}" elif key in global_config: source = f" {Fore.GREEN}(global){Style.RESET_ALL}" else: source = f" {Fore.YELLOW}(default){Style.RESET_ALL}" print(f" {Fore.BLUE}{key}{Style.RESET_ALL}: {config[key]}{source}") print() def main(): """Main entry point for the CLI.""" # Parse command line arguments args = parse_args() # Convert directory to Path project_dir = Path(args.directory).resolve() # Load configurations global_config = load_config() project_config = load_project_config(project_dir) # Merge configurations (project config takes precedence) config = merge_configs(global_config, project_config) # Ensure rules_json is in config if "rules_json" not in config: config["rules_json"] = str(DEFAULT_RULES_PATH) # Ensure source is in config if "source" not in config: config["source"] = 
"https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/main" # Handle direct library input if provided libraries_directly_provided = False if args.libraries: libraries = [lib.strip() for lib in args.libraries.split(",") if lib.strip()] if not libraries: logger.error("No valid libraries provided") return 1 logger.info(f"Using directly provided libraries: {', '.join(libraries)}") detected_libraries = libraries libraries_directly_provided = True else: # Run the scanning phase try: # Use quick scan if requested scan_start_msg = "Quick scanning" if args.quick_scan else "Scanning" logger.info(f"{scan_start_msg} for libraries and frameworks...") # Scan project for libraries logger.info("Scanning for libraries and frameworks...") detected_libraries = scan_project( project_dir=project_dir, quick_scan=args.quick_scan, rules_path=config["rules_json"], use_cache=not args.force ) # Get direct match libraries from package files direct_match_libraries = scan_package_files(Path(project_dir)) logger.info(f"Detected {len(detected_libraries)} libraries/frameworks.") # Match libraries with rules logger.info("Finding relevant rules...") matching_rules = match_libraries( detected_libraries=detected_libraries, source_url=config["source"], direct_match_libraries=direct_match_libraries, custom_json_path=config["rules_json"], max_results=args.max_results, min_score=args.min_score ) if not matching_rules: logger.warning("No matching libraries found for your project.") return 0 logger.info(f"Found {Fore.GREEN}{len(matching_rules)}{Style.RESET_ALL} relevant rule files.") # Display and select rules to download selected_rules = display_matched_rules(matching_rules, args.max_results) if not selected_rules: logger.info("No rules selected. Exiting.") return 0 # Download selected rules if args.dry_run: logger.info(f"{Fore.YELLOW}DRY RUN:{Style.RESET_ALL} Would download the following rules:") for rule in selected_rules: logger.info(f" - {Fore.CYAN}{rule}{Style.RESET_ALL}") else: try: # Normalize source URL if needed (remove trailing slashes) source_url = config["source"].rstrip('/') # Log source information logger.info(f"Using source URL: {source_url}") # Download selected rules downloaded_rules = download_rules(selected_rules, source_url) # Install downloaded rules result = install_rules(downloaded_rules, force=args.force) if result["installed"]: logger.info(f"{Fore.GREEN}✅ Successfully installed {len(result['installed'])} rules!{Style.RESET_ALL}") if result["failed"]: logger.warning(f"{Fore.YELLOW}⚠️ Failed to install {len(result['failed'])} rules:{Style.RESET_ALL}") for rule in result["failed"]: logger.warning(f" - {Fore.CYAN}{rule}{Style.RESET_ALL}") except Exception as e: logger.error(f"An error occurred: {str(e)}") return 1 except KeyboardInterrupt: logger.info(f"\n{Fore.YELLOW}Operation cancelled by user.{Style.RESET_ALL}") return 130 except Exception as e: logger.error(f"An error occurred: {e}") if args.verbose: import traceback traceback.print_exc() return 1 # Override config with command line arguments if args.custom_repo is not None: # Validate custom repo if provided if args.custom_repo and not validate_github_repo(args.custom_repo): logger.error(f"{Fore.RED}Invalid GitHub repository: {args.custom_repo}{Style.RESET_ALL}") logger.error(f"{Fore.RED}Repository must exist and contain a rules.json file.{Style.RESET_ALL}") return 1 config["custom_repo"] = args.custom_repo elif "custom_repo" not in config: config["custom_repo"] = None if args.rules_json is not None: config["rules_json"] = args.rules_json 
elif "rules_json" not in config: config["rules_json"] = str(DEFAULT_RULES_PATH) if args.source != "https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/main": config["source"] = args.source elif "source" not in config: config["source"] = "https://raw.githubusercontent.com/sanjeed5/awesome-cursor-rules-mdc/main" # Set custom repository without running scan if requested if args.set_repo: if args.custom_repo is None: logger.error(f"{Fore.RED}Please specify a custom repository with --custom-repo.{Style.RESET_ALL}") return 1 global_config["custom_repo"] = config["custom_repo"] save_config(global_config) logger.info(f"{Fore.GREEN}Custom repository set to: {config['custom_repo']}{Style.RESET_ALL}") return 0 # Show configuration if requested if args.show_config: display_config(config, global_config, project_config) return 0 # Save configuration if requested if args.save_config: # For custom repo, only save to global config, not project config global_config_to_save = global_config.copy() if "custom_repo" in config: global_config_to_save["custom_repo"] = config["custom_repo"] if "source" in config: global_config_to_save["source"] = config["source"] save_config(global_config_to_save) logger.info(f"{Fore.GREEN}Global configuration saved successfully.{Style.RESET_ALL}") if not args.directory or args.directory == ".": return 0 if args.save_project_config: # Don't include custom_repo in project config project_config_to_save = {k: v for k, v in config.items() if k not in ["custom_repo", "source"]} save_project_config(project_dir, project_config_to_save) logger.info(f"{Fore.GREEN}Project configuration saved to {project_dir / '.cursor-rules-cli.json'}{Style.RESET_ALL}") if not args.directory or args.directory == ".": return 0 # Set log level based on verbosity if args.verbose: logging.getLogger().setLevel(logging.DEBUG) logger.info(f"{Style.BRIGHT}{Fore.BLUE}Cursor Rules CLI{Style.RESET_ALL}") logger.info(f"Scanning project directory: {Fore.CYAN}{os.path.abspath(args.directory)}{Style.RESET_ALL}") # Handle custom repository if specified source_url = config["source"] if config["custom_repo"]: source_url = f"https://raw.githubusercontent.com/{config['custom_repo']}/main" logger.info(f"Using custom repository: {Fore.CYAN}{config['custom_repo']}{Style.RESET_ALL}") # Run the scanning phase only if libraries were not directly provided try: # Skip scanning if libraries were directly provided if not libraries_directly_provided: # Use quick scan if requested scan_start_msg = "Quick scanning" if args.quick_scan else "Scanning" logger.info(f"{scan_start_msg} for libraries and frameworks...") # Scan project for libraries logger.info("Scanning for libraries and frameworks...") detected_libraries = scan_project( project_dir=project_dir, quick_scan=args.quick_scan, rules_path=config["rules_json"], use_cache=not args.force ) # Get direct match libraries from package files direct_match_libraries = scan_package_files(Path(project_dir)) logger.info(f"Detected {len(detected_libraries)} libraries/frameworks.") else: # For directly provided libraries, we don't need to scan package files direct_match_libraries = set(detected_libraries) logger.info(f"{Fore.CYAN}Skipping project scan - using directly provided libraries only{Style.RESET_ALL}") # Match libraries with rules logger.info("Finding relevant rules...") matching_rules = match_libraries( detected_libraries=detected_libraries, source_url=source_url, direct_match_libraries=direct_match_libraries, custom_json_path=config["rules_json"], 
max_results=args.max_results, min_score=args.min_score ) if not matching_rules: logger.warning("No matching libraries found for your project.") return 0 logger.info(f"Found {Fore.GREEN}{len(matching_rules)}{Style.RESET_ALL} relevant rule files.") # Display and select rules to download selected_rules = display_matched_rules(matching_rules, args.max_results) if not selected_rules: logger.info("No rules selected. Exiting.") return 0 # Download selected rules if args.dry_run: logger.info(f"{Fore.YELLOW}DRY RUN:{Style.RESET_ALL} Would download the following rules:") for rule in selected_rules: logger.info(f" - {Fore.CYAN}{rule}{Style.RESET_ALL}") else: downloaded_rules = download_rules(selected_rules, source_url) # Install downloaded rules result = install_rules(downloaded_rules, force=args.force) if result["installed"]: logger.info(f"{Fore.GREEN}✅ Successfully installed {len(result['installed'])} rules!{Style.RESET_ALL}") if result["failed"]: logger.warning(f"{Fore.YELLOW}⚠️ Failed to install {len(result['failed'])} rules:{Style.RESET_ALL}") for rule in result["failed"]: logger.warning(f" - {Fore.CYAN}{rule}{Style.RESET_ALL}") return 0 except KeyboardInterrupt: logger.info(f"\n{Fore.YELLOW}Operation cancelled by user.{Style.RESET_ALL}") return 130 except Exception as e: logger.error(f"An error occurred: {e}") if args.verbose: import traceback traceback.print_exc() return 1 def group_rules_by_category(rules: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]: """ Group rules by their category. Args: rules: List of rule dictionaries with category information Returns: Dictionary mapping categories to lists of rules """ categories = {} for rule in rules: category = rule.get("category", "other") if category not in categories: categories[category] = [] categories[category].append(rule) return categories def get_category_display_name(category: str) -> str: """ Get a display name for a category. Args: category: Category key Returns: Display name for the category """ category_names = { "development": "Development Tools", "frontend": "Frontend Frameworks & Libraries", "backend": "Backend Frameworks & Libraries", "database": "Database & ORM", "ai_ml": "AI & Machine Learning", "devops": "DevOps & Cloud", "utilities": "Utilities & CLI Tools", "other": "Other Libraries", } return category_names.get(category, category.title()) def display_matched_rules(matched_rules: List[Dict[str, Any]], max_results: int = 20) -> List[Dict[str, Any]]: """ Display matched rules and return selected rule objects. 
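    Illustrative session (rule names and scores hypothetical):

        Available Cursor rules for your project:

        Direct Dependencies:
        1. react [frontend, framework] (0.95)

        Other Relevant Rules:
        2. tailwindcss [frontend, css] (0.61)

        > 1,2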
Args: matched_rules: List of matched rules max_results: Maximum number of rules to display Returns: List of selected rule objects """ if not matched_rules: logger.info("No relevant rules found for your project.") return [] # Group rules by category (direct_match vs others) direct_matches = [] other_matches = [] for rule in matched_rules: # Check if this is a direct match from package files if rule.get("is_direct_match", False): direct_matches.append(rule) else: other_matches.append(rule) # Sort each group by relevance score direct_matches.sort(key=lambda x: x["relevance_score"], reverse=True) other_matches.sort(key=lambda x: x["relevance_score"], reverse=True) # Combine the lists with direct matches first sorted_rules = direct_matches + other_matches # Limit to max_results display_rules = sorted_rules[:max_results] # Display rules print(f"\n{Style.BRIGHT}{Fore.BLUE}Available Cursor rules for your project:{Style.RESET_ALL}\n") # Display direct matches first if direct_matches: print(f"{Style.BRIGHT}{Fore.GREEN}Direct Dependencies:{Style.RESET_ALL}") for i, rule in enumerate([r for r in display_rules if r.get("is_direct_match", False)], 1): tags = f"[{', '.join(rule.get('tags', []))}]" if rule.get('tags') else "" score = f"({rule['relevance_score']:.2f})" print(f"{Fore.GREEN}{i}.{Style.RESET_ALL} {Fore.CYAN}{rule['rule']}{Style.RESET_ALL} {tags} {score}") # Display other matches if other_matches and any(not r.get("is_direct_match", False) for r in display_rules): print(f"\n{Style.BRIGHT}{Fore.YELLOW}Other Relevant Rules:{Style.RESET_ALL}") # Continue numbering from where direct matches left off start_idx = len([r for r in display_rules if r.get("is_direct_match", False)]) + 1 for i, rule in enumerate([r for r in display_rules if not r.get("is_direct_match", False)], start_idx): tags = f"[{', '.join(rule.get('tags', []))}]" if rule.get('tags') else "" score = f"({rule['relevance_score']:.2f})" print(f"{Fore.GREEN}{i}.{Style.RESET_ALL} {Fore.CYAN}{rule['rule']}{Style.RESET_ALL} {tags} {score}") # Get user selection print(f"\n{Style.BRIGHT}Select rules to install:{Style.RESET_ALL}") print(f" {Fore.YELLOW}* Enter comma-separated numbers (e.g., 1,3,5){Style.RESET_ALL}") print(f" {Fore.YELLOW}* Type 'all' to select all rules{Style.RESET_ALL}") print(f" {Fore.YELLOW}* Type 'category:name' to select all rules in a category (e.g., 'category:development'){Style.RESET_ALL}") print(f" {Fore.YELLOW}* Type 'none' to cancel{Style.RESET_ALL}") selection = input(f"{Fore.GREEN}> {Style.RESET_ALL}").strip().lower() if selection == "none": logger.info("No rules selected. Exiting.") return [] if selection == "all": return display_rules if selection.startswith("category:"): category = selection.split(":", 1)[1] return [ rule for rule in display_rules if category in rule.get("tags", []) ] try: indices = [int(idx.strip()) for idx in selection.split(",") if idx.strip()] return [display_rules[idx - 1] for idx in indices if 1 <= idx <= len(display_rules)] except (ValueError, IndexError): logger.error(f"{Fore.RED}Invalid selection. Please try again.{Style.RESET_ALL}") return display_matched_rules(matched_rules, max_results) if __name__ == "__main__": sys.exit(main()) ``` ## /cursor-rules-cli/src/matcher.py ```py path="/cursor-rules-cli/src/matcher.py" """ Matcher module for matching detected libraries with MDC rules. This module matches detected libraries with MDC rules based on relevance scores, library relationships, and project context. 
""" import os import json import logging from pathlib import Path from typing import Dict, List, Set, Optional, Any, Tuple from cursor_rules_cli import utils logger = logging.getLogger(__name__) # Minimum relevance score for a rule to be considered MIN_RELEVANCE_SCORE = 0.5 # Maximum number of rules to return MAX_RULES = 10 def match_libraries( detected_libraries: List[str], source_url: str, direct_match_libraries: Optional[Set[str]] = None, custom_json_path: Optional[Path] = None, max_results: int = MAX_RULES, min_score: float = MIN_RELEVANCE_SCORE ) -> List[Dict[str, Any]]: """ Match detected libraries with available rules. Args: detected_libraries: List of detected libraries source_url: Base URL for the repository direct_match_libraries: Set of libraries that are direct matches from package files custom_json_path: Path to custom rules.json file max_results: Maximum number of rules to return min_score: Minimum relevance score for rules Returns: List of matched rules with metadata """ # Create a RuleMatcher instance matcher = RuleMatcher( rules_path=str(custom_json_path) if custom_json_path else None, min_relevance_score=min_score, max_rules=max_results ) # Match rules matched_rules = matcher.match_rules(detected_libraries) # Add URL and other metadata to each rule for rule in matched_rules: rule_name = rule.get("rule") rule["name"] = rule_name # We don't construct a URL here anymore - the downloader will handle this using GitHub API # Mark direct matches if direct_match_libraries and rule_name.lower() in (lib.lower() for lib in direct_match_libraries): rule["is_direct_match"] = True else: rule["is_direct_match"] = False return matched_rules class RuleMatcher: """ Class for matching detected libraries with MDC rules. """ def __init__( self, rules_path: str = None, use_cache: bool = True, min_relevance_score: float = MIN_RELEVANCE_SCORE, max_rules: int = MAX_RULES ): """ Initialize the RuleMatcher. Args: rules_path: Path to rules.json file use_cache: Whether to use caching min_relevance_score: Minimum relevance score for a rule max_rules: Maximum number of rules to return """ self.rules_path = rules_path self.use_cache = use_cache self.min_relevance_score = min_relevance_score self.max_rules = max_rules # Load library data from rules.json self.library_data = utils.load_library_data(rules_path) # Create library mappings self._create_library_mappings() def _create_library_mappings(self): """Create mappings for efficient library lookups.""" self.lib_to_tags = {} self.tag_to_libs = {} self.lib_to_related = {} if not self.library_data or "libraries" not in self.library_data: return for lib in self.library_data["libraries"]: lib_name = lib["name"].lower() # Map library to its tags tags = lib.get("tags", []) self.lib_to_tags[lib_name] = set(tags) # Map tags to libraries for tag in tags: if tag not in self.tag_to_libs: self.tag_to_libs[tag] = set() self.tag_to_libs[tag].add(lib_name) # Map library to related libraries related = lib.get("related", []) self.lib_to_related[lib_name] = set(related) def match_rules( self, detected_libraries: List[str], project_context: Optional[Dict[str, float]] = None ) -> List[Dict[str, Any]]: """ Match detected libraries with MDC rules. 
Args: detected_libraries: List of detected libraries project_context: Optional project context scores Returns: List of matched rules with relevance scores """ if not self.library_data or "libraries" not in self.library_data: logger.warning("No libraries found in rules.json") return [] # Check cache first if self.use_cache: cache_key = utils.create_cache_key( ",".join(sorted(detected_libraries)), str(project_context), self.min_relevance_score, self.max_rules ) cached_data = utils.get_cached_data(cache_key) if cached_data: logger.debug("Using cached rule matches") return cached_data # Normalize library names normalized_libs = { utils.normalize_library_name(lib, self.library_data) for lib in detected_libraries } # Get project context if not provided if project_context is None: project_context = utils.get_project_context(normalized_libs, self.library_data) # Calculate relevance scores for each library in rules.json library_scores = [] for library in self.library_data["libraries"]: score = self._calculate_library_relevance( library, normalized_libs, project_context ) if score >= self.min_relevance_score: library_scores.append((library, score)) # Sort libraries by relevance score library_scores.sort(key=lambda x: x[1], reverse=True) # Format results results = [] for library, score in library_scores[:self.max_rules]: result = { "rule": library["name"], "relevance_score": round(score, 3), "description": f"{library['name']} ({', '.join(library.get('tags', []))})", "tags": library.get("tags", []), "libraries": [library["name"]], "category": self._categorize_library(library, normalized_libs) } results.append(result) # Cache results if self.use_cache: utils.set_cached_data(cache_key, results) return results def _calculate_library_relevance( self, library: Dict[str, Any], detected_libs: Set[str], project_context: Dict[str, float] ) -> float: """ Calculate relevance score for a library. Args: library: Library data detected_libs: Set of detected libraries project_context: Project context scores Returns: Relevance score between 0 and 1 """ # Direct match score lib_name = library["name"].lower() direct_match = 1.0 if lib_name in detected_libs else 0.0 # Tag similarity score tag_score = self._calculate_tag_similarity_score(library, detected_libs) # Context score from project type and tags context_score = self._calculate_context_score(library, project_context) # Combine scores with weights weights = { "direct_match": 0.8, "tag_similarity": 0.15, "context": 0.05 } total_score = ( weights["direct_match"] * direct_match + weights["tag_similarity"] * tag_score + weights["context"] * context_score ) return total_score def _calculate_context_score( self, library: Dict[str, Any], project_context: Dict[str, float] ) -> float: """ Calculate context match score for a library. Args: library: Library data project_context: Project context scores Returns: Score between 0 and 1 """ library_tags = set(library.get("tags", [])) if not library_tags or not project_context: return 0 # Calculate weighted average of context scores for matching tags total_score = 0 total_weight = 0 for tag in library_tags: if tag in project_context: weight = project_context[tag] total_score += weight total_weight += 1 return total_score / total_weight if total_weight > 0 else 0 def _calculate_tag_similarity_score( self, library: Dict[str, Any], detected_libs: Set[str] ) -> float: """ Calculate tag similarity score for a library. 
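    Worked example: if this library is tagged {"frontend", "ui", "react"}
    and the detected libraries' tags are {"frontend", "ui", "testing"},
    the intersection has 2 tags and the union has 4, so the Jaccard
    score is 2 / 4 = 0.5.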
Args: library: Library data detected_libs: Set of detected libraries Returns: Score between 0 and 1 """ library_tags = set(library.get("tags", [])) if not library_tags: return 0 # Get all tags from detected libraries lib_tags = set() for lib in detected_libs: if lib in self.lib_to_tags: lib_tags.update(self.lib_to_tags[lib]) if not lib_tags: return 0 # Calculate Jaccard similarity intersection = library_tags & lib_tags union = library_tags | lib_tags return len(intersection) / len(union) def _categorize_library( self, library: Dict[str, Any], detected_libs: Set[str] ) -> str: """ Categorize a library based on its relationship to detected libraries. Args: library: Library data detected_libs: Set of detected libraries Returns: Category string """ lib_name = library["name"].lower() # Check for direct matches if lib_name in detected_libs: return "direct_match" # Check for tag matches library_tags = set(library.get("tags", [])) lib_tags = set() for lib in detected_libs: if lib in self.lib_to_tags: lib_tags.update(self.lib_to_tags[lib]) if library_tags & lib_tags: return "tag_match" return "suggested" if __name__ == "__main__": # For testing import sys logging.basicConfig(level=logging.DEBUG) if len(sys.argv) > 1: rules_path = sys.argv[1] else: rules_path = None # Example usage matcher = RuleMatcher(rules_path) detected_libs = ["react", "next-js", "tailwindcss"] matched_rules = matcher.match_rules(detected_libs) print("\nDetected libraries:", detected_libs) print("\nMatched rules:") for rule in matched_rules: print(f"\n{rule['rule']} (score: {rule['relevance_score']}):") print(f" Category: {rule['category']}") print(f" Description: {rule['description']}") print(f" Tags: {', '.join(rule['tags'])}") print(f" Libraries: {', '.join(rule['libraries'])}") ``` ## /cursor-rules-cli/src/scanner.py ```py path="/cursor-rules-cli/src/scanner.py" """ Scanner module for detecting libraries and frameworks in a project. This module scans a project directory to identify which libraries and frameworks are being used based on package manager files, import statements, and framework-specific file patterns. 
""" import os import json import logging import re from pathlib import Path from typing import Dict, List, Set, Optional, Tuple, Any from concurrent.futures import ThreadPoolExecutor, as_completed from cursor_rules_cli import utils logger = logging.getLogger(__name__) # File patterns to look for PACKAGE_PATTERNS = { "node": [ "package.json", "yarn.lock", "pnpm-lock.yaml", "package-lock.json" ], "python": [ "requirements.txt", "pyproject.toml", "Pipfile", "setup.py", "uv.lock", "poetry.lock", "conda.yaml", "environment.yml" ], "php": ["composer.json", "composer.lock"], "rust": ["Cargo.toml", "Cargo.lock"], "go": ["go.mod", "go.sum"], "ruby": ["Gemfile", "Gemfile.lock"], "java": ["pom.xml", "build.gradle", "build.gradle.kts"], "dotnet": ["*.csproj", "*.fsproj", "*.vbproj", "packages.config"], } # Framework-specific file patterns FRAMEWORK_PATTERNS = { "react": ["src/App.jsx", "src/App.tsx", "src/App.js", "public/index.html"], "vue": ["src/App.vue", "src/main.js", "public/index.html"], "angular": ["angular.json", "src/app/app.module.ts"], "next-js": ["next.config.js", "pages/_app.js", "pages/_app.tsx"], "nuxt": ["nuxt.config.js", "nuxt.config.ts"], "svelte": ["svelte.config.js", "src/App.svelte"], "django": ["manage.py", "wsgi.py", "asgi.py"], "flask": ["app.py", "wsgi.py", "application.py"], "fastapi": ["main.py"], "express": ["app.js", "server.js"], "nestjs": ["nest-cli.json", "src/main.ts"], "laravel": ["artisan", "composer.json"], "spring-boot": ["src/main/java", "src/main/resources/application.properties"], } # Import patterns for different languages IMPORT_PATTERNS = { "python": { "files": ["*.py"], "regex": [ r"(?:^|\n)\s*(?:import|from)\s+([a-zA-Z0-9_.]+)", r"(?:^|\n)\s*from\s+([a-zA-Z0-9_.]+)\s+import", r"(?:^|\n)\s*__import__\(['\"]([a-zA-Z0-9_.]+)['\"]\)", r"(?:^|\n)\s*importlib\.import_module\(['\"]([a-zA-Z0-9_.]+)['\"]\)" ] }, "javascript": { "files": ["*.js", "*.jsx", "*.ts", "*.tsx"], "regex": [ r"(?:^|\n)\s*import\s+.*?(?:from\s+['\"]([^'\"]+)['\"]|['\"]([^'\"]+)['\"])", r"(?:^|\n)\s*require\(['\"]([^'\"]+)['\"]\)", r"(?:^|\n)\s*import\(['\"]([^'\"]+)['\"]\)" ] }, "php": { "files": ["*.php"], "regex": [ r"(?:^|\n)\s*(?:use|require|include|require_once|include_once)\s+['\"]?([a-zA-Z0-9_\\/.]+)", r"(?:^|\n)\s*namespace\s+([a-zA-Z0-9_\\/.]+)" ] }, "java": { "files": ["*.java"], "regex": [ r"(?:^|\n)\s*import\s+([a-zA-Z0-9_.]+)", r"(?:^|\n)\s*package\s+([a-zA-Z0-9_.]+)" ] }, "rust": { "files": ["*.rs"], "regex": [ r"(?:^|\n)\s*(?:use|extern\s+crate)\s+([a-zA-Z0-9_:]+)", r"(?:^|\n)\s*mod\s+([a-zA-Z0-9_]+)" ] }, } # Directories to exclude from scanning EXCLUDED_DIRS = [ "node_modules", "venv", ".venv", "env", ".env", "__pycache__", ".git", ".github", ".idea", ".vscode", "dist", "build", "target", "out", "bin", "obj", ".next", ".nuxt", ".svelte-kit", ".cache", ".pytest_cache", ".mypy_cache", ".ruff_cache", "site-packages", "lib/python*", ] # Maximum directory depth for import scanning MAX_SCAN_DEPTH = 5 def scan_project( project_dir: str, quick_scan: bool = False, max_depth: int = MAX_SCAN_DEPTH, rules_path: str = None, max_workers: int = None, use_cache: bool = True ) -> List[str]: """ Scan a project directory to detect libraries and frameworks. 
Args: project_dir: Path to the project directory quick_scan: If True, only scan package files, not imports max_depth: Maximum directory depth for scanning rules_path: Path to rules.json file max_workers: Maximum number of worker threads (None for CPU count) use_cache: Whether to use caching Returns: List of detected libraries and frameworks """ project_path = Path(project_dir).resolve() logger.debug(f"Scanning project at {project_path}") # Check cache first if use_cache: cache_key = utils.create_cache_key( str(project_path), quick_scan, max_depth, rules_path ) cached_data = utils.get_cached_data(cache_key) if cached_data: logger.debug("Using cached scan results") return cached_data # Load library data from rules.json library_data = utils.load_library_data(rules_path) # Track both the libraries and their sources detected_libraries = set() direct_match_libraries = set() # Track direct matches separately with ThreadPoolExecutor(max_workers=max_workers) as executor: # Submit package file scanning first to identify direct dependencies package_files_future = executor.submit(scan_package_files, project_path) # Submit other scanning tasks future_to_task = { executor.submit(scan_docker_files, project_path): "docker_files", executor.submit(scan_github_actions, project_path): "github_actions", executor.submit(detect_frameworks, project_path): "frameworks" } # Process package files result first to identify direct dependencies try: direct_matches = package_files_future.result() detected_libraries.update(direct_matches) direct_match_libraries.update(direct_matches) # Mark as direct matches logger.debug(f"Completed package_files scan, found {len(direct_matches)} direct dependencies") except Exception as e: logger.error(f"Error in package_files scan: {e}") # Add import scanning if not quick scan if not quick_scan: future_to_task[executor.submit(scan_imports, project_path, max_depth)] = "imports" # Process results from other scanning tasks for future in as_completed(future_to_task): task_name = future_to_task[future] try: result = future.result() detected_libraries.update(result) logger.debug(f"Completed {task_name} scan") except Exception as e: logger.error(f"Error in {task_name} scan: {e}") # Normalize library names normalized_libraries = { utils.normalize_library_name(lib, library_data) for lib in detected_libraries } normalized_direct_matches = { utils.normalize_library_name(lib, library_data) for lib in direct_match_libraries } # Detect additional frameworks based on rules.json if library_data: framework_libs = detect_frameworks_from_rules(normalized_libraries, library_data) normalized_libraries.update(framework_libs) # Sort libraries, prioritizing direct matches first, then by popularity sorted_libraries = sorted( normalized_libraries, key=lambda x: ( x in normalized_direct_matches, # Direct matches first utils.calculate_library_popularity(x, library_data) # Then by popularity ), reverse=True ) # Cache results if use_cache: utils.set_cached_data(cache_key, sorted_libraries) return sorted_libraries def scan_package_files(project_path: Path) -> Set[str]: """ Scan package manager files to detect libraries. 
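    Illustrative example: a package.json containing
    {"dependencies": {"react": "^18.0.0", "next": "14.0.0"}} yields
    {"react", "next", "next-js"}, since the "next" dependency also maps
    to the "next-js" framework name.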
Args: project_path: Path to the project directory Returns: Set of detected libraries """ detected_libs = set() # Check for Node.js package files for node_file in ["package.json", "yarn.lock", "pnpm-lock.yaml"]: file_path = project_path / node_file if file_path.exists(): logger.debug(f"Found {node_file}") try: if node_file == "package.json": with open(file_path, 'r') as f: data = json.load(f) # Add dependencies deps = data.get("dependencies", {}) dev_deps = data.get("devDependencies", {}) all_deps = {**deps, **dev_deps} # Add detected libraries detected_libs.update(all_deps.keys()) # Detect framework from dependencies framework_deps = { "react": "react", "vue": "vue", "next": "next-js", "nuxt": "nuxt", "svelte": "svelte", "@angular/core": "angular", "express": "express", "@nestjs/core": "nestjs" } for dep, framework in framework_deps.items(): if dep in deps: detected_libs.add(framework) elif node_file == "yarn.lock": with open(file_path, 'r') as f: content = f.read() # Extract package names from yarn.lock packages = re.findall(r'^"?([^@\s"]+)@', content, re.MULTILINE) detected_libs.update(packages) elif node_file == "pnpm-lock.yaml": with open(file_path, 'r') as f: content = f.read() # Extract package names from pnpm-lock.yaml packages = re.findall(r'(?:^|\n)\s*/([^/:]+):', content) detected_libs.update(packages) except (json.JSONDecodeError, IOError) as e: logger.warning(f"Error parsing {node_file}: {e}") # Check for Python package files python_files = { "requirements.txt": r'^([a-zA-Z0-9_.-]+)', "pyproject.toml": None, # Pattern not needed, handled specially "Pipfile": r'(?:^|\n)\s*([a-zA-Z0-9_.-]+)\s*=', "setup.py": r'install_requires=\[([^\]]+)\]', # Keep uv.lock for compatibility with uv (modern Python package manager) # Only check for it if it exists to avoid unnecessary file operations "uv.lock": r'name\s*=\s*"([^"]+)"' if os.path.exists(project_path / "uv.lock") else None } for file_name, pattern in python_files.items(): file_path = project_path / file_name if file_path.exists(): logger.debug(f"Found {file_name}") try: with open(file_path, 'r') as f: content = f.read() if file_name == "setup.py": # Special handling for setup.py matches = re.search(pattern, content) if matches: packages = re.findall(r'[\'"]([^\'\"]+)[\'"]', matches.group(1)) detected_libs.update(p.split('>=')[0].split('==')[0].strip() for p in packages) elif file_name == "pyproject.toml": # Special handling for pyproject.toml # Look for dependencies section in PEP 621 format pep621_deps_match = re.search(r'\[project\].*?dependencies\s*=\s*\[(.*?)\]', content, re.DOTALL) if pep621_deps_match: deps_content = pep621_deps_match.group(1) # Extract package names from dependencies packages = re.findall(r'[\'"]([a-zA-Z0-9_.-]+)(?:>=|==|>|<|~=|!=|@|$)', deps_content) detected_libs.update(packages) # Look for dependencies section in Poetry format poetry_deps_match = re.search(r'\[tool\.poetry\.dependencies\](.*?)(?:\[|\Z)', content, re.DOTALL) if poetry_deps_match: deps_content = poetry_deps_match.group(1) # Extract package names from Poetry dependencies packages = re.findall(r'([a-zA-Z0-9_.-]+)\s*=', deps_content) detected_libs.update(packages) # Also check for dev-dependencies in Poetry format dev_deps_match = re.search(r'\[tool\.poetry\.dev-dependencies\](.*?)(?:\[|\Z)', content, re.DOTALL) if dev_deps_match: dev_deps_content = dev_deps_match.group(1) dev_packages = re.findall(r'([a-zA-Z0-9_.-]+)\s*=', dev_deps_content) detected_libs.update(dev_packages) else: # General pattern matching packages = re.findall(pattern, 
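                        # e.g. a requirements.txt line "requests>=2.31.0"
                        # captures "requests"; the version-specifier split
                        # below is then a no-op for it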
content, re.MULTILINE) detected_libs.update(p.split('>=')[0].split('==')[0].strip() for p in packages) except IOError as e: logger.warning(f"Error reading {file_name}: {e}") return detected_libs def scan_docker_files(project_path: Path) -> Set[str]: """ Scan Dockerfile and docker-compose files for libraries. Args: project_path: Path to the project directory Returns: Set of detected libraries """ detected_libs = set() docker_files = ["Dockerfile", "docker-compose.yml", "docker-compose.yaml"] for file_name in docker_files: file_path = project_path / file_name if file_path.exists(): logger.debug(f"Found {file_name}") try: with open(file_path, 'r') as f: content = f.read() # Look for common package installations pip_packages = re.findall(r'pip\s+install\s+([^\s&|;]+)', content) npm_packages = re.findall(r'npm\s+install\s+([^\s&|;]+)', content) apt_packages = re.findall(r'apt-get\s+install\s+([^\s&|;]+)', content) detected_libs.update(pip_packages) detected_libs.update(npm_packages) detected_libs.update(apt_packages) # Look for base images base_images = re.findall(r'FROM\s+([^\s:]+)', content) detected_libs.update(base_images) except IOError as e: logger.warning(f"Error reading {file_name}: {e}") return detected_libs def scan_github_actions(project_path: Path) -> Set[str]: """ Scan GitHub Actions workflow files for libraries. Args: project_path: Path to the project directory Returns: Set of detected libraries """ detected_libs = set() workflows_dir = project_path / ".github" / "workflows" if not workflows_dir.exists(): return detected_libs # Workflow files may use either the .yml or the .yaml extension for workflow_file in list(workflows_dir.glob("*.yml")) + list(workflows_dir.glob("*.yaml")): logger.debug(f"Found workflow file: {workflow_file}") try: with open(workflow_file, 'r') as f: content = f.read() # Look for common actions and tools actions = re.findall(r'uses:\s+([^\s@]+)', content) detected_libs.update(actions) # Look for package installations pip_packages = re.findall(r'pip\s+install\s+([^\s&|;]+)', content) npm_packages = re.findall(r'npm\s+install\s+([^\s&|;]+)', content) detected_libs.update(pip_packages) detected_libs.update(npm_packages) except IOError as e: logger.warning(f"Error reading workflow file {workflow_file}: {e}") return detected_libs def detect_frameworks(project_path: Path) -> Set[str]: """ Detect frameworks based on specific file patterns. Args: project_path: Path to the project directory Returns: Set of detected frameworks """ detected_frameworks = set() for framework, patterns in FRAMEWORK_PATTERNS.items(): for pattern in patterns: # Check if the pattern is a directory if not pattern.endswith(('/', '\\')) and not os.path.splitext(pattern)[1]: if (project_path / pattern).is_dir(): logger.debug(f"Found framework directory pattern: {pattern}") detected_frameworks.add(framework) break # Check for file patterns matches = list(project_path.glob(pattern)) if matches: logger.debug(f"Found framework file pattern: {pattern}") detected_frameworks.add(framework) break return detected_frameworks def detect_frameworks_from_rules(detected_libs: Set[str], library_data: Dict[str, Any]) -> Set[str]: """ Detect frameworks based on detected libraries and rules.json data.
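    Illustrative example (names depend on rules.json): detecting {"react"}
    can also pull in related frontend entries such as "react-router" when
    rules.json tags them accordingly.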
Args: detected_libs: Set of detected libraries library_data: Library data from rules.json Returns: Set of detected frameworks """ detected_frameworks = set() if not library_data or "libraries" not in library_data: return detected_frameworks # Create a mapping of library names to their data lib_map = {lib["name"].lower(): lib for lib in library_data["libraries"]} # Check detected libraries against rules.json for lib in detected_libs: lib_lower = lib.lower() if lib_lower in lib_map: # Add the library itself detected_frameworks.add(lib_lower) # Check if this library is a framework tags = lib_map[lib_lower].get("tags", []) if "framework" in tags: detected_frameworks.add(lib_lower) # Check for related libraries based on tags # For example, if we detect "react", we might want to check for "react-router" if "react" in lib_lower and "frontend" in tags: for related_lib, related_data in lib_map.items(): if "react" in related_lib and related_lib != lib_lower: if any(tag in related_data.get("tags", []) for tag in ["frontend", "ui"]): detected_frameworks.add(related_lib) return detected_frameworks def scan_imports(project_path: Path, max_depth: int = MAX_SCAN_DEPTH) -> Set[str]: """ Scan source files for import statements to detect libraries. Args: project_path: Path to the project directory max_depth: Maximum directory depth for scanning Returns: Set of detected libraries from imports """ detected_imports = set() for lang, pattern_info in IMPORT_PATTERNS.items(): file_patterns = pattern_info["files"] import_regexes = pattern_info["regex"] # Use a more efficient file traversal with depth limit and exclusions for file_pattern in file_patterns: for file_path in find_files(project_path, file_pattern, max_depth): try: with open(file_path, 'r', encoding='utf-8', errors='ignore') as f: content = f.read() # Find all imports using multiple regex patterns for import_regex in import_regexes: imports = re.findall(import_regex, content) # Process matches for imp in imports: if isinstance(imp, tuple): # Some regex patterns might have multiple capture groups imp = next((i for i in imp if i), "") if imp: # Extract the top-level package name top_level = imp.split('.')[0].split('/')[0] if top_level and not top_level.startswith(('.', '_')): detected_imports.add(top_level.lower()) except (IOError, UnicodeDecodeError) as e: logger.debug(f"Error reading {file_path}: {e}") return detected_imports def find_files(root_dir: Path, pattern: str, max_depth: int, current_depth: int = 0) -> List[Path]: """ Find files matching a pattern with depth limit and directory exclusions. Args: root_dir: Root directory to start searching from pattern: File pattern to match max_depth: Maximum directory depth to search current_depth: Current depth in the directory tree Returns: List of file paths matching the pattern """ if current_depth > max_depth: return [] matching_files = [] try: for item in root_dir.iterdir(): if item.is_file() and item.match(pattern): matching_files.append(item) elif item.is_dir() and not should_exclude_dir(item): matching_files.extend(find_files(item, pattern, max_depth, current_depth + 1)) except (PermissionError, OSError) as e: logger.debug(f"Error accessing {root_dir}: {e}") return matching_files def should_exclude_dir(dir_path: Path) -> bool: """ Check if a directory should be excluded from scanning. 
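    Examples: Path("node_modules") and Path(".git") are excluded;
    Path("src") is not.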
Args: dir_path: Path to the directory Returns: True if the directory should be excluded, False otherwise """ dir_name = dir_path.name return dir_name in EXCLUDED_DIRS or dir_name.startswith('.') if __name__ == "__main__": # For testing import sys logging.basicConfig(level=logging.DEBUG) if len(sys.argv) > 1: project_dir = sys.argv[1] else: project_dir = "." libraries = scan_project(project_dir) print(f"Detected libraries: {libraries}") ``` ## /cursor-rules-cli/src/utils.py ```py path="/cursor-rules-cli/src/utils.py" """ Utils module with helper functions for the Cursor Rules CLI. This module provides utility functions for file operations, security verification, and other common tasks. """ import os import re import hashlib import logging import platform import json import time from threading import Lock from pathlib import Path from typing import Dict, List, Any, Optional, Tuple, Set import requests from functools import lru_cache from urllib.parse import urlparse import validators logger = logging.getLogger(__name__) class RateLimiter: """Rate limiter to avoid overwhelming servers.""" def __init__(self, rate_limit: int): """ Initialize rate limiter. Args: rate_limit: Maximum requests per second """ self.rate_limit = rate_limit self.last_request_time = 0 self._lock = Lock() # Thread-safe locking def wait(self): """ Wait if necessary to respect the rate limit. Thread-safe implementation. """ with self._lock: current_time = time.time() elapsed = current_time - self.last_request_time # If we've made a request recently, wait if elapsed < (1.0 / self.rate_limit): sleep_time = (1.0 / self.rate_limit) - elapsed time.sleep(sleep_time) self.last_request_time = time.time() def calculate_content_hash(content: str) -> str: """ Calculate SHA-256 hash of content string. Args: content: Content to hash Returns: SHA-256 hash as hex string """ # Normalize line endings to '\n' content = content.replace('\r\n', '\n').replace('\r', '\n') return hashlib.sha256(content.encode('utf-8')).hexdigest() def get_cursor_dir() -> Path: """ Get the path to the project's .cursor directory. Returns: Path to the project's .cursor directory """ return Path.cwd() / ".cursor" def get_rules_dir() -> Path: """ Get the path to the project's .cursor/rules directory. Returns: Path to the project's .cursor/rules directory """ return get_cursor_dir() / "rules" def get_config_file() -> Path: """ Get the path to the configuration file. Returns: Path to the configuration file in the project's .cursor directory """ return get_cursor_dir() / "rules-cli-config.json" def load_config() -> Dict[str, Any]: """ Load configuration from the config file. Returns: Configuration dictionary """ config_file = get_config_file() if not config_file.exists(): return {} try: with open(config_file, 'r') as f: return json.load(f) except (IOError, json.JSONDecodeError) as e: logger.error(f"Failed to load config file: {e}") return {} def save_config(config: Dict[str, Any]) -> bool: """ Save configuration to the config file. Args: config: Configuration dictionary Returns: True if successful, False otherwise """ config_file = get_config_file() try: # Ensure the directory exists ensure_dir_exists(config_file.parent) with open(config_file, 'w') as f: json.dump(config, f, indent=2) return True except IOError as e: logger.error(f"Failed to save config file: {e}") return False def ensure_dir_exists(path: Path) -> bool: """ Ensure a directory exists, creating it if necessary. 
Args: path: Path to the directory Returns: True if the directory exists or was created, False otherwise """ try: path.mkdir(parents=True, exist_ok=True) return True except OSError as e: logger.error(f"Failed to create directory {path}: {e}") return False def is_url_trusted(url: str) -> Tuple[bool, str]: """ Check if a URL is from a trusted source using proper URL parsing. Args: url: URL to check Returns: Tuple of (is_trusted: bool, error_message: str) """ # First validate URL format if not validators.url(url): return False, "Invalid URL format" try: parsed_url = urlparse(url) # Check for HTTPS if parsed_url.scheme != "https": return False, "URL must use HTTPS" # List of trusted domains and their subdomains trusted_domains = [ "raw.githubusercontent.com", "github.com", ] # Extract domain from URL domain = parsed_url.netloc.lower() # Check if domain exactly matches or is subdomain of trusted domains is_trusted = any( domain == trusted_domain or domain.endswith(f".{trusted_domain}") for trusted_domain in trusted_domains ) if not is_trusted: return False, f"Domain {domain} is not in trusted list" # Additional security checks for GitHub URLs if "github" in domain: # Validate path format for raw.githubusercontent.com if domain == "raw.githubusercontent.com": path_parts = [p for p in parsed_url.path.split("/") if p] if len(path_parts) < 4: # username/repo/branch/path return False, "Invalid GitHub raw URL format" # Validate path format for github.com elif domain == "github.com": path_parts = [p for p in parsed_url.path.split("/") if p] if len(path_parts) < 2: # username/repo return False, "Invalid GitHub repository URL format" return True, "" except Exception as e: return False, f"URL validation error: {str(e)}" def validate_mdc_content(content: str) -> Tuple[bool, str]: """ Validate MDC file content more thoroughly. Args: content: Content to validate Returns: Tuple of (is_valid: bool, error_message: str) """ if not content: return False, "Empty content" # Check for frontmatter if not content.startswith("---"): return False, "Missing frontmatter start" # Find end of frontmatter frontmatter_end = content.find("---", 3) if frontmatter_end == -1: return False, "Missing frontmatter end" # Extract frontmatter frontmatter = content[3:frontmatter_end].strip() # Required fields in frontmatter required_fields = ["description", "globs"] # Check for required fields for field in required_fields: if f"{field}:" not in frontmatter: return False, f"Missing required field: {field}" # Check for content after frontmatter content_after_frontmatter = content[frontmatter_end + 3:].strip() if not content_after_frontmatter: return False, "No content after frontmatter" # Check for potentially malicious content (representative patterns) suspicious_patterns = [ r"<script", r"javascript:", r"eval\(", ] for pattern in suspicious_patterns: if re.search(pattern, content, re.IGNORECASE): return False, f"Suspicious content detected: {pattern}" return True, "" def calculate_file_hash(file_path: Path) -> Optional[str]: """ Calculate the SHA-256 hash of a file. Args: file_path: Path to the file Returns: SHA-256 hash as a hex string, or None if failed """ try: # Read the file as text and normalize line endings with open(file_path, "r", encoding="utf-8") as f: content = f.read() # Normalize line endings to '\n' content = content.replace('\r\n', '\n').replace('\r', '\n') # Calculate hash from normalized content return hashlib.sha256(content.encode('utf-8')).hexdigest() except IOError as e: logger.error(f"Failed to calculate hash for {file_path}: {e}") return None def is_valid_mdc_file(content: str) -> bool: """ Check if the content is a valid MDC file.
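    A minimal document that passes every check (field values illustrative):

        ---
        description: React best practices
        globs: **/*.tsx
        ---
        Prefer functional components.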
Args: content: File content to check Returns: True if valid, False otherwise """ # Check for frontmatter if not content.startswith("---"): return False # Check for description if "description:" not in content: return False # Check for globs if "globs:" not in content: return False # Check for closing frontmatter frontmatter_end = content.find("---", 3) if frontmatter_end == -1: return False # Check for content after frontmatter if len(content) <= frontmatter_end + 3: return False # Check if there's actual content after the frontmatter content_after_frontmatter = content[frontmatter_end + 3:].strip() if not content_after_frontmatter: return False return True def sanitize_filename(name: str) -> str: """ Sanitize a filename to ensure it's valid. Args: name: Filename to sanitize Returns: Sanitized filename """ # Replace invalid characters with underscores invalid_chars = r'[<>:"/\\|?*]' sanitized = re.sub(invalid_chars, "_", name) # Ensure it's not too long max_length = 255 if platform.system() != "Windows" else 240 if len(sanitized) > max_length: sanitized = sanitized[:max_length] return sanitized def preview_content(content: str, max_lines: int = 10) -> str: """ Generate a preview of content. Args: content: Content to preview max_lines: Maximum number of lines to include Returns: Preview of the content """ lines = content.split("\n") if len(lines) <= max_lines: return content # Show first few lines and indicate there's more preview_lines = lines[:max_lines] preview = "\n".join(preview_lines) preview += f"\n... ({len(lines) - max_lines} more lines)" return preview def validate_github_repo(repo: str) -> bool: """ Validate a GitHub repository string. Args: repo: GitHub repository string (username/repo) Returns: True if valid, False otherwise """ if not repo: return False # Check format (username/repo) if not re.match(r'^[a-zA-Z0-9_-]+/[a-zA-Z0-9_.-]+$', repo): return False # Check if the repository exists and has a rules.json file try: url = f"https://raw.githubusercontent.com/{repo}/main/rules.json" response = requests.head(url, timeout=5) return response.status_code == 200 except requests.RequestException: return False def get_project_config_file(project_dir: Path) -> Path: """ Get the path to the project-specific configuration file. Args: project_dir: Path to the project directory Returns: Path to the project configuration file """ return project_dir / ".cursor-rules-cli.json" def load_project_config(project_dir: Path) -> Dict[str, Any]: """ Load configuration from the project-specific config file. Args: project_dir: Path to the project directory Returns: Project configuration dictionary """ config_file = get_project_config_file(project_dir) if not config_file.exists(): return {} try: with open(config_file, 'r') as f: return json.load(f) except (IOError, json.JSONDecodeError) as e: logger.error(f"Failed to load project config file: {e}") return {} def save_project_config(project_dir: Path, config: Dict[str, Any]) -> bool: """ Save configuration to the project-specific config file. Args: project_dir: Path to the project directory config: Configuration dictionary Returns: True if successful, False otherwise """ config_file = get_project_config_file(project_dir) try: with open(config_file, 'w') as f: json.dump(config, f, indent=2) return True except IOError as e: logger.error(f"Failed to save project config file: {e}") return False def merge_configs(global_config: Dict[str, Any], project_config: Dict[str, Any]) -> Dict[str, Any]: """ Merge global and project-specific configurations. 
Project configuration takes precedence over global configuration for project-specific settings. CLI-wide settings (like custom_repo and source) are always taken from global config. Args: global_config: Global configuration dictionary project_config: Project-specific configuration dictionary Returns: Merged configuration dictionary """ # CLI-wide settings that should only come from global config cli_wide_settings = ["custom_repo", "source"] # Start with a copy of the global config merged = global_config.copy() # Update with project config, but exclude CLI-wide settings project_config_filtered = {k: v for k, v in project_config.items() if k not in cli_wide_settings} merged.update(project_config_filtered) return merged # Default paths DEFAULT_RULES_PATH = Path(__file__).parent.parent / "rules.json" DEFAULT_CACHE_DIR = Path(__file__).parent.parent / ".cache" @lru_cache(maxsize=1) def load_library_data(rules_path: Optional[str] = None) -> Dict[str, Any]: """ Load and cache library data from rules.json. Args: rules_path: Optional path to rules.json Returns: Dictionary of library data """ # If a specific path is provided, use it first if rules_path and Path(rules_path).exists(): logger.debug(f"Using specified rules.json at {rules_path}") path_to_use = Path(rules_path) else: # Search for rules.json in priority order possible_paths = [ Path(__file__).parent.parent / "rules.json", # package root rules.json Path.cwd() / "rules.json" # current directory rules.json ] path_to_use = None for path in possible_paths: if path.exists(): path_to_use = path logger.debug(f"Found rules.json at {path}") break if not path_to_use or not path_to_use.exists(): logger.warning("rules.json not found in any standard location, using default library detection") return {} try: with open(path_to_use, 'r') as f: data = json.load(f) logger.info(f"Successfully loaded rules.json from {path_to_use}") return data except (json.JSONDecodeError, IOError) as e: logger.warning(f"Error loading rules.json from {path_to_use}: {e}") return {} def normalize_library_name(name: str, library_data: Dict[str, Any]) -> str: """ Normalize a library name to match rules.json conventions. Args: name: Library name to normalize library_data: Library data from rules.json Returns: Normalized library name """ if not library_data or "libraries" not in library_data: return name.lower() name_lower = name.lower() lib_map = {lib["name"].lower(): lib["name"] for lib in library_data["libraries"]} # Handle special cases and common aliases special_cases = { "torch": "pytorch", "tf": "tensorflow", "bs4": "beautifulsoup4", "plt": "matplotlib", "np": "numpy", "pd": "pandas" } if name_lower in special_cases and special_cases[name_lower] in lib_map: return lib_map[special_cases[name_lower]] return lib_map.get(name_lower, name_lower) def calculate_library_popularity(lib_name: str, library_data: Dict[str, Any]) -> float: """ Calculate a library's popularity score based on its tags and relationships. 
Args: lib_name: Library name library_data: Library data from rules.json Returns: Popularity score between 0 and 1 """ if not library_data or "libraries" not in library_data: return 0.5 # Default score lib_map = {lib["name"].lower(): lib for lib in library_data["libraries"]} lib_name_lower = lib_name.lower() if lib_name_lower not in lib_map: return 0.5 lib_info = lib_map[lib_name_lower] tags = lib_info.get("tags", []) # Base score from number of tags (more tags = more versatile) tag_score = min(len(tags) / 10, 0.5) # Up to 0.5 from tags # Additional score from important tags important_tags = {"framework", "language", "major-platform"} tag_importance = sum(0.1 for tag in tags if tag in important_tags) # Calculate related libraries score related_count = sum(1 for lib in library_data["libraries"] if any(tag in lib.get("tags", []) for tag in tags)) relationship_score = min(related_count / len(library_data["libraries"]), 0.3) total_score = tag_score + tag_importance + relationship_score return min(total_score, 1.0) def get_project_context(detected_libs: Set[str], library_data: Dict[str, Any]) -> Dict[str, float]: """ Determine project context based on detected libraries. Args: detected_libs: Set of detected library names library_data: Library data from rules.json Returns: Dictionary of context scores (e.g., {'frontend': 0.8, 'backend': 0.3}) """ contexts = { "frontend": 0.0, "backend": 0.0, "data-science": 0.0, "devops": 0.0, "mobile": 0.0 } if not library_data or "libraries" not in library_data: return contexts lib_map = {lib["name"].lower(): lib for lib in library_data["libraries"]} total_libs = len(detected_libs) if total_libs == 0: return contexts # Context indicators in tags context_tags = { "frontend": {"frontend", "ui", "javascript", "css", "html"}, "backend": {"backend", "api", "server", "database"}, "data-science": {"data-science", "machine-learning", "ai", "analytics"}, "devops": {"devops", "ci-cd", "containerization", "cloud"}, "mobile": {"mobile", "ios", "android", "cross-platform"} } # Calculate scores based on detected libraries for lib in detected_libs: lib_lower = lib.lower() if lib_lower in lib_map: lib_tags = set(lib_map[lib_lower].get("tags", [])) for context, indicators in context_tags.items(): if lib_tags & indicators: # If there's any overlap contexts[context] += 1 / total_libs # Normalize scores max_score = max(contexts.values()) if max_score > 0: contexts = {k: v / max_score for k, v in contexts.items()} return contexts def create_cache_key(*args) -> str: """ Create a cache key from arguments. Args: *args: Arguments to create key from Returns: Cache key string """ key = ":".join(str(arg) for arg in args) # If the key is too long, hash it to avoid file name length issues if len(key) > 100: return hashlib.md5(key.encode()).hexdigest() return key def get_cached_data(cache_key: str) -> Optional[Any]: """ Get data from cache. Args: cache_key: Cache key Returns: Cached data or None if not found """ cache_file = DEFAULT_CACHE_DIR / f"{cache_key}.json" if cache_file.exists(): try: with open(cache_file, 'r') as f: return json.load(f) except (json.JSONDecodeError, IOError): return None return None def set_cached_data(cache_key: str, data: Any) -> bool: """ Save data to cache. 
    Args:
        cache_key: Cache key
        data: Data to cache

    Returns:
        True if successful, False otherwise
    """
    try:
        DEFAULT_CACHE_DIR.mkdir(parents=True, exist_ok=True)
        cache_file = DEFAULT_CACHE_DIR / f"{cache_key}.json"
        with open(cache_file, 'w') as f:
            json.dump(data, f)
        return True
    except (IOError, OSError):
        return False


if __name__ == "__main__":
    # For testing
    logging.basicConfig(level=logging.DEBUG)

    cursor_dir = get_cursor_dir()
    rules_dir = get_rules_dir()

    print(f"Cursor directory: {cursor_dir}")
    print(f"Rules directory: {rules_dir}")

    # Test directory creation
    if ensure_dir_exists(rules_dir):
        print(f"Created rules directory: {rules_dir}")

    # Test filename sanitization
    print(f"Sanitized filename: {sanitize_filename('invalid:file*name.txt')}")
```
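
Taken together, these helpers back the CLI's matching flow. The following sketch is illustrative only (import paths depend on how the package is installed, and results depend on the local rules.json):

```python
from utils import (  # assumes imports resolve as in the package layout above (src/utils.py)
    calculate_library_popularity,
    get_project_context,
    load_library_data,
    normalize_library_name,
)

data = load_library_data()                    # cached after the first call (lru_cache)
name = normalize_library_name("torch", data)  # -> "pytorch" when aliased in rules.json
score = calculate_library_popularity(name, data)
contexts = get_project_context({name, "fastapi"}, data)
print(name, score, contexts)
```

## /pyproject.toml

```toml path="/pyproject.toml"
[project]
name = "mdc-rules-generator"
version = "0.1.0"
description = "Generate Cursor MDC rule files from a structured JSON file"
authors = [
    {name = "Sanjeed", email = "hi@sanjeed.in"},
]
requires-python = ">=3.8"
readme = "README.md"
license = {text = "MIT"}
dependencies = [
    "python-dotenv>=1.0.0",
    "litellm>=1.0.0",
    "tenacity>=8.2.3",
    "ratelimit>=2.2.1",
    "pydantic>=2.0.0",
    "exa-py>=1.0.0",
    "pyyaml>=6.0.0",
    "build>=1.2.2.post1",
    "twine>=6.1.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]
```

## /requirements.txt

``` path="/requirements.txt"
litellm>=1.30.3
python-dotenv>=1.0.1
tenacity>=8.2.3
ratelimit>=2.2.1
pydantic>=2.6.1
```

## /rules-mdc/actix-web.mdc

```mdc path="/rules-mdc/actix-web.mdc"
---
description: Comprehensive best practices for developing robust, efficient, and maintainable applications using the actix-web framework in Rust. This rule covers coding standards, project structure, performance, security, testing, and common pitfalls.
globs: **/*.rs
---

# Actix-web Best Practices: A Comprehensive Guide

This guide provides a comprehensive overview of best practices for developing applications using the actix-web framework in Rust. It covers various aspects of development, including code organization, performance optimization, security, testing, and common pitfalls.

## 1. Code Organization and Structure

A well-organized codebase is crucial for maintainability and scalability. Here's how to structure your actix-web project effectively:

### 1.1. Directory Structure Best Practices

Adopt a modular and layered architecture.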
A common and recommended directory structure is as follows: project_root/ ├── src/ │ ├── main.rs # Entry point of the application │ ├── lib.rs # Library file if extracting common functionality │ ├── modules/ │ │ ├── mod.rs # Module declaration │ │ ├── auth/ # Authentication module │ │ │ ├── mod.rs # Auth module declaration │ │ │ ├── models.rs # Auth models │ │ │ ├── routes.rs # Auth routes │ │ │ ├── handlers.rs # Auth handlers │ │ ├── users/ # User management module │ │ │ ├── mod.rs # User module declaration │ │ │ ├── models.rs # User models │ │ │ ├── routes.rs # User routes │ │ │ ├── handlers.rs # User handlers │ ├── models/ # Data models │ │ ├── mod.rs │ │ ├── user.rs # User model │ │ ├── post.rs # Post model │ ├── routes/ # Route configurations │ │ ├── mod.rs │ │ ├── auth_routes.rs # Authentication routes │ │ ├── user_routes.rs # User routes │ ├── handlers/ # Request handlers (controllers) │ │ ├── mod.rs │ │ ├── auth_handlers.rs # Authentication handlers │ │ ├── user_handlers.rs # User handlers │ ├── middleware/ # Custom middleware components │ │ ├── mod.rs │ │ ├── logger.rs # Logging middleware │ │ ├── auth.rs # Authentication middleware │ ├── utils/ # Utility functions and modules │ │ ├── mod.rs │ │ ├── db.rs # Database connection utility │ ├── errors/ # Custom error definitions │ │ ├── mod.rs │ │ ├── app_error.rs # Application-specific error types ├── tests/ # Integration and unit tests │ ├── mod.rs │ ├── api_tests.rs # Integration tests for API endpoints ├── .env # Environment variables ├── Cargo.toml # Project dependencies and metadata ├── Cargo.lock # Dependency lockfile ### 1.2. File Naming Conventions * **Modules:** Use lowercase, descriptive names (e.g., `auth`, `users`, `posts`). * **Files:** Use lowercase with underscores (e.g., `user_routes.rs`, `auth_handlers.rs`). * **Models:** Use singular nouns (e.g., `user.rs`, `post.rs`). * **Handlers:** Use descriptive names indicating the action performed (e.g., `create_user`, `get_user`). * **Routes:** Use names indicating the resource they handle (e.g., `user_routes`, `auth_routes`). ### 1.3. Module Organization * **Explicit Module Declarations:** Always declare submodules in `mod.rs` files. This ensures proper module resolution and prevents naming conflicts. * **Clear Boundaries:** Each module should have a well-defined responsibility. Avoid mixing unrelated functionalities within the same module. * **Public vs. Private:** Use `pub` keyword judiciously to control visibility. Keep implementation details private to modules to prevent accidental external dependencies. ### 1.4. Component Architecture * **Layered Architecture:** Separate concerns into distinct layers (e.g., data access, business logic, presentation). This improves testability and maintainability. * **Dependency Injection:** Use dependency injection to provide dependencies to handlers. This makes it easier to test and configure your application. * **Services:** Encapsulate business logic into services. Handlers should primarily focus on request/response handling and delegate business logic to services. ### 1.5. Code Splitting Strategies * **Feature-Based Splitting:** Group code based on features (e.g., authentication, user management). This makes it easier to understand and maintain related code. * **Module-Based Splitting:** Split code into modules based on functionality. This improves code organization and reusability. * **Lazy Loading (Future Enhancement):** For very large applications, consider lazy loading modules or features to reduce initial startup time. 
This can be accomplished by dynamically enabling parts of your application based on configuration or runtime conditions.

## 2. Common Patterns and Anti-patterns

### 2.1. Design Patterns Specific to Actix-web

* **Extractor Pattern:** Use extractors to handle different types of incoming data (e.g., `Path`, `Query`, `Json`, `Form`). Extractors simplify handler logic and provide type safety.
* **Middleware Pattern:** Implement custom middleware for tasks like logging, authentication, and request modification. Middleware allows you to intercept and process requests before they reach the handlers.
* **State Management Pattern:** Use `web::Data` to share application state across handlers. This provides a thread-safe way to access shared resources like database connections and configuration settings.
* **Error Handling Pattern:** Define custom error types and implement the `ResponseError` trait for centralized error handling and consistent error responses.

### 2.2. Recommended Approaches for Common Tasks

* **Database Integration:** Use an asynchronous database driver like `tokio-postgres` or `sqlx` for efficient database interactions.
* **Authentication:** Implement authentication using JWT (JSON Web Tokens) or other secure authentication mechanisms.
* **Authorization:** Implement role-based access control (RBAC) or attribute-based access control (ABAC) to restrict access to resources based on user roles or attributes.
* **Logging:** Use a logging framework like `tracing` or `log` for structured logging and monitoring.
* **Configuration Management:** Use a configuration library like `config` or `dotenv` to manage application settings from environment variables and configuration files.

### 2.3. Anti-patterns and Code Smells to Avoid

* **Long Handler Functions:** Keep handler functions short and focused. Delegate complex logic to services or helper functions.
* **Tight Coupling:** Avoid tight coupling between modules. Use interfaces and dependency injection to decouple components.
* **Ignoring Errors:** Always handle errors gracefully and provide informative error messages to the client.
* **Blocking Operations in Handlers:** Avoid performing blocking operations (e.g., synchronous I/O) in handler functions. Use asynchronous operations to prevent blocking the event loop.
* **Overusing Global State:** Minimize the use of global state. Prefer passing state as dependencies to handler functions.

### 2.4. State Management Best Practices

* **Immutable State:** Prefer immutable state whenever possible. This reduces the risk of race conditions and makes it easier to reason about the application.
* **Thread-Safe Data Structures:** Use thread-safe data structures like `Arc<Mutex<T>>` or `RwLock` to share mutable state across threads.
* **Avoid Direct Mutability:** Avoid directly mutating shared state. Instead, use atomic operations or message passing to coordinate state updates.
* **Dependency Injection:** Use dependency injection to provide state to handler functions. This makes it easier to test and configure the application.

### 2.5. Error Handling Patterns

* **Custom Error Types:** Define custom error types to represent different error scenarios in your application.
* **`ResponseError` Trait:** Implement the `ResponseError` trait for custom error types to generate appropriate HTTP responses.
* **Centralized Error Handling:** Use a centralized error handling mechanism (e.g., middleware) to catch and process errors consistently.
* **Informative Error Messages:** Provide informative error messages to the client to help them understand and resolve the issue. * **Logging Errors:** Log errors with sufficient detail to help diagnose and debug issues. * **`Result` Type:** Leverage the `Result` type effectively, propagating errors up the call stack using the `?` operator, and handle them at the appropriate level. ## 3. Performance Considerations Optimizing performance is crucial for building scalable and responsive actix-web applications. ### 3.1. Optimization Techniques * **Asynchronous Operations:** Use asynchronous operations for I/O-bound tasks (e.g., database access, network requests) to prevent blocking the event loop. * **Connection Pooling:** Use connection pooling for database connections to reduce the overhead of establishing new connections. * **Caching:** Implement caching for frequently accessed data to reduce database load and improve response times. * **Compression:** Enable compression (e.g., gzip) for responses to reduce the amount of data transmitted over the network. * **Keep-Alive Connections:** Use keep-alive connections to reuse existing TCP connections and reduce connection establishment overhead. ### 3.2. Memory Management * **Avoid Unnecessary Cloning:** Minimize cloning of data to reduce memory allocations and copying. * **Use References:** Use references instead of copying data whenever possible. * **Smart Pointers:** Use smart pointers (e.g., `Box`, `Arc`, `Rc`) to manage memory efficiently. * **String Handling:** Be mindful of string handling. Use `String` when ownership is needed, and `&str` when a read-only view is sufficient. ### 3.3. Rendering Optimization * **Template Caching:** Cache templates to reduce the overhead of parsing and compiling templates on each request. * **Minimize DOM Updates:** Minimize DOM updates in the client-side JavaScript code to improve rendering performance. * **Efficient Serialization:** Ensure your data serialization is efficient, using appropriate data structures and serialization libraries (e.g., `serde_json`). ### 3.4. Bundle Size Optimization * **Dependency Pruning:** Remove unused dependencies from your `Cargo.toml` file to reduce the bundle size. * **Feature Flags:** Use feature flags to enable or disable optional features at compile time. * **Code Minification:** Use code minification to reduce the size of your JavaScript and CSS files. ### 3.5. Lazy Loading Strategies * **Lazy Initialization:** Use lazy initialization for expensive resources to defer their creation until they are actually needed. * **On-Demand Loading:** Load resources on demand (e.g., images, data) to reduce the initial load time. ## 4. Security Best Practices Security is paramount for building robust and reliable actix-web applications. ### 4.1. Common Vulnerabilities and How to Prevent Them * **SQL Injection:** Use parameterized queries or ORMs to prevent SQL injection attacks. * **Cross-Site Scripting (XSS):** Sanitize user input and escape output to prevent XSS attacks. * **Cross-Site Request Forgery (CSRF):** Implement CSRF protection to prevent unauthorized requests from other websites. * **Authentication and Authorization:** Use strong authentication and authorization mechanisms to protect sensitive resources. * **Denial-of-Service (DoS):** Implement rate limiting and other defense mechanisms to prevent DoS attacks. ### 4.2. Input Validation * **Validate All Input:** Validate all user input to ensure that it conforms to the expected format and range. 
* **Use Type Safety:** Use type safety to prevent invalid data from being processed. * **Regular Expressions:** Use regular expressions to validate complex input patterns. * **Whitelist vs. Blacklist:** Prefer whitelisting valid input over blacklisting invalid input. ### 4.3. Authentication and Authorization Patterns * **JWT (JSON Web Tokens):** Use JWT for stateless authentication and authorization. * **OAuth 2.0:** Use OAuth 2.0 for delegated authorization. * **RBAC (Role-Based Access Control):** Use RBAC to restrict access to resources based on user roles. * **ABAC (Attribute-Based Access Control):** Use ABAC to restrict access to resources based on user attributes. * **Password Hashing:** Always hash passwords using a strong hashing algorithm (e.g., bcrypt, Argon2) and store them securely. ### 4.4. Data Protection Strategies * **Encryption:** Encrypt sensitive data at rest and in transit. * **Data Masking:** Mask sensitive data to prevent unauthorized access. * **Data Anonymization:** Anonymize data to protect user privacy. * **Access Control:** Implement strict access control policies to restrict access to sensitive data. ### 4.5. Secure API Communication * **HTTPS:** Use HTTPS for all API communication to encrypt data in transit. * **TLS Certificates:** Use valid TLS certificates from a trusted certificate authority. * **API Keys:** Use API keys to authenticate API clients. * **Rate Limiting:** Implement rate limiting to prevent abuse and DoS attacks. ## 5. Testing Approaches Thorough testing is essential for ensuring the quality and reliability of actix-web applications. ### 5.1. Unit Testing Strategies * **Test Individual Modules:** Unit test individual modules and functions in isolation. * **Mock Dependencies:** Use mocking to isolate units from external dependencies (e.g., database, API). * **Test Edge Cases:** Test edge cases and boundary conditions to ensure that the code handles them correctly. * **Table-Driven Tests:** Use table-driven tests to test multiple scenarios with different inputs and expected outputs. ### 5.2. Integration Testing * **Test API Endpoints:** Integration test API endpoints to ensure that they function correctly together. * **Test Database Interactions:** Test database interactions to ensure that data is read and written correctly. * **Test Middleware:** Test middleware to ensure that they correctly process requests and responses. ### 5.3. End-to-End Testing * **Simulate User Interactions:** End-to-end tests simulate user interactions to test the entire application flow. * **Use a Testing Framework:** Use a testing framework (e.g., Selenium, Cypress) to automate end-to-end tests. ### 5.4. Test Organization * **Test Directory:** Keep tests in a separate `tests` directory. * **Test Modules:** Organize tests into modules that mirror the application structure. * **Test Naming:** Use descriptive names for test functions to indicate what they are testing. ### 5.5. Mocking and Stubbing * **Mock External Dependencies:** Mock external dependencies (e.g., database, API) to isolate units from external factors. * **Use Mocking Libraries:** Use mocking libraries (e.g., `mockall`) to create mock objects and define their behavior. * **Stub Data:** Use stub data to simulate different scenarios and test edge cases. ## 6. Common Pitfalls and Gotchas Be aware of common pitfalls and gotchas when developing actix-web applications. ### 6.1. 
Frequent Mistakes Developers Make * **Blocking Operations:** Performing blocking operations in handler functions can block the event loop and degrade performance. * **Incorrect Error Handling:** Ignoring errors or not handling them correctly can lead to unexpected behavior and security vulnerabilities. * **Not Validating Input:** Not validating user input can lead to security vulnerabilities and data corruption. * **Overusing Global State:** Overusing global state can make the application difficult to reason about and test. * **Not Using Asynchronous Operations:** Not using asynchronous operations for I/O-bound tasks can degrade performance. ### 6.2. Edge Cases to be Aware Of * **Handling Large Requests:** Be mindful of handling large requests and implement appropriate size limits to prevent DoS attacks. * **Handling Concurrent Requests:** Ensure that the application can handle concurrent requests efficiently and without race conditions. * **Handling Network Errors:** Handle network errors gracefully and provide informative error messages to the client. * **Handling Database Connection Errors:** Handle database connection errors gracefully and implement retry mechanisms. ### 6.3. Version-Specific Issues * **Breaking Changes:** Be aware of breaking changes in actix-web and its dependencies. * **Deprecated Features:** Avoid using deprecated features and migrate to the recommended alternatives. * **Compatibility:** Ensure that the application is compatible with the target Rust version and operating system. ### 6.4. Compatibility Concerns * **Rust Version:** Ensure compatibility with the supported Rust versions. * **Operating System:** Test on different operating systems (Linux, macOS, Windows). * **Browser Compatibility (if applicable):** If the application includes a front-end, test with various browsers. ### 6.5. Debugging Strategies * **Logging:** Use logging to track the application's execution flow and identify potential issues. * **Debugging Tools:** Use debugging tools (e.g., `gdb`, `lldb`) to inspect the application's state and step through the code. * **Unit Tests:** Write unit tests to isolate and debug individual modules and functions. * **Profiling:** Use profiling tools to identify performance bottlenecks. ## 7. Tooling and Environment Using the right tools and environment can significantly improve the development experience and productivity. ### 7.1. Recommended Development Tools * **Rust IDE:** Use a Rust IDE (e.g., Visual Studio Code with the Rust extension, IntelliJ Rust) for code completion, syntax highlighting, and debugging. * **Cargo:** Use Cargo, the Rust package manager, for managing dependencies and building the application. * **Rustup:** Use Rustup for managing Rust toolchains and versions. * **Clippy:** Use Clippy, a Rust linter, for identifying potential code quality issues. * **Formatter:** Use rustfmt to automatically format the code according to the Rust style guide. ### 7.2. Build Configuration * **Release Mode:** Build the application in release mode for optimized performance. * **Link-Time Optimization (LTO):** Enable link-time optimization to improve performance. * **Codegen Units:** Experiment with different codegen unit settings to optimize compilation time and code size. ### 7.3. Linting and Formatting * **Clippy:** Use Clippy to identify potential code quality issues and enforce coding standards. * **Rustfmt:** Use rustfmt to automatically format the code according to the Rust style guide. 
* **Pre-commit Hooks:** Use pre-commit hooks to automatically run Clippy and rustfmt before committing changes. ### 7.4. Deployment Best Practices * **Containerization:** Use containerization (e.g., Docker) to package the application and its dependencies into a portable container. * **Orchestration:** Use container orchestration (e.g., Kubernetes) to manage and scale the application. * **Reverse Proxy:** Use a reverse proxy (e.g., Nginx, Apache) to handle incoming requests and route them to the application. * **Load Balancing:** Use load balancing to distribute traffic across multiple instances of the application. * **Monitoring:** Implement monitoring to track the application's health and performance. ### 7.5. CI/CD Integration * **Continuous Integration (CI):** Use a CI system (e.g., GitHub Actions, GitLab CI, Jenkins) to automatically build, test, and lint the code on every commit. * **Continuous Delivery (CD):** Use a CD system to automatically deploy the application to production after it passes all tests. * **Automated Testing:** Automate unit, integration, and end-to-end tests in the CI/CD pipeline. By following these best practices, you can build robust, efficient, and maintainable actix-web applications that meet the highest standards of quality and security. Remember to stay up-to-date with the latest recommendations and adapt them to your specific project needs. ``` ## /rules-mdc/aiohttp.mdc ```mdc path="/rules-mdc/aiohttp.mdc" --- description: Comprehensive guide for aiohttp development covering code organization, performance, security, testing, and deployment best practices. Provides actionable guidance for developers to build robust and maintainable aiohttp applications. globs: **/*.py --- # Aiohttp Best Practices This document provides a comprehensive guide to aiohttp development, covering code organization, performance, security, testing, and deployment. Library Information: - Name: aiohttp - Tags: web, python, http-client, async ## 1. Code Organization and Structure ### 1.1. Directory Structure Best Practices: * **Project Root:** * `src/`: Contains the main application code. * `main.py`: Entry point of the application. * `app.py`: Application factory and setup. * `routes.py`: Defines application routes. * `handlers/`: Contains request handlers. * `user_handlers.py`: User-related handlers. * `product_handlers.py`: Product-related handlers. * `middlewares/`: Custom middleware components. * `logging_middleware.py`: Logging middleware. * `auth_middleware.py`: Authentication middleware. * `utils/`: Utility modules. * `db.py`: Database connection and utilities. * `config.py`: Configuration management. * `tests/`: Contains unit and integration tests. * `conftest.py`: Pytest configuration file. * `unit/`: Unit tests. * `integration/`: Integration tests. * `static/`: Static files (CSS, JavaScript, images). * `templates/`: Jinja2 or other template files. * `docs/`: Project documentation. * `requirements.txt`: Python dependencies. * `Dockerfile`: Docker configuration file. * `docker-compose.yml`: Docker Compose configuration. * `.env`: Environment variables. * `README.md`: Project description and instructions. * `.gitignore`: Specifies intentionally untracked files that Git should ignore. * `.cursor/rules/`: Project specific Cursor AI rules. ### 1.2. File Naming Conventions: * Python files: `snake_case.py` (e.g., `user_handlers.py`, `database_utils.py`). * Class names: `CamelCase` (e.g., `UserHandler`, `DatabaseConnection`). 
* Function names: `snake_case` (e.g., `get_user`, `create_product`).
* Variables: `snake_case` (e.g., `user_id`, `product_name`).
* Constants: `UPPER_SNAKE_CASE` (e.g., `DEFAULT_PORT`, `MAX_CONNECTIONS`).

### 1.3. Module Organization:

* Group related functionality into modules.
* Use clear and descriptive module names.
* Avoid circular dependencies.
* Keep modules focused and concise.
* Use packages to organize modules into a hierarchical structure.

### 1.4. Component Architecture:

* **Layered Architecture:** Separate the application into distinct layers (e.g., presentation, business logic, data access).
* **Microservices Architecture:** Decompose the application into small, independent services.
* **Hexagonal Architecture (Ports and Adapters):** Decouple the application core from external dependencies.
* **MVC (Model-View-Controller):** Organize the application into models (data), views (presentation), and controllers (logic).

### 1.5. Code Splitting Strategies:

* **Route-based splitting:** Load modules based on the requested route.
* **Feature-based splitting:** Divide the application into feature modules.
* **Component-based splitting:** Split the application into reusable components.
* **On-demand loading:** Load modules only when they are needed.
* **Asynchronous loading:** Use `asyncio.gather` or similar techniques to load modules concurrently.

## 2. Common Patterns and Anti-patterns

### 2.1. Design Patterns:

* **Singleton:** For managing shared resources like database connections or configuration objects.
* **Factory:** For creating instances of classes with complex initialization logic.
* **Strategy:** For implementing different algorithms or behaviors.
* **Observer:** For implementing event-driven systems.
* **Middleware:** For handling cross-cutting concerns like logging, authentication, and error handling.

### 2.2. Recommended Approaches for Common Tasks:

* **Request Handling:** Use request handlers to process incoming requests.
* **Routing:** Use `aiohttp.web.RouteTableDef` for defining routes.
* **Middleware:** Implement middleware for request pre-processing and response post-processing.
* **Data Serialization:** Use `aiohttp.web.json_response` for serializing data to JSON.
* **Error Handling:** Implement custom error handlers to handle exceptions gracefully.
* **Session Management:** Use `aiohttp-session` for managing user sessions.
* **WebSockets:** Utilize `aiohttp.web.WebSocketResponse` for handling WebSocket connections.

### 2.3. Anti-patterns and Code Smells:

* **Creating a new `ClientSession` for each request:** This is a performance bottleneck. Reuse a single `ClientSession`.
* **Blocking operations in asynchronous code:** Avoid using blocking operations (e.g., `time.sleep`) in asynchronous code.
* **Ignoring exceptions:** Always handle exceptions properly to prevent unexpected behavior.
* **Overusing global variables:** Avoid using global variables as much as possible to maintain code clarity and testability.
* **Tight coupling:** Decouple components to improve maintainability and reusability.
* **Hardcoding configuration:** Use environment variables or configuration files to manage configuration settings.
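
Several of the recommended approaches above combine naturally. The sketch below is illustrative only (handler, route, and middleware names are arbitrary) and shows `RouteTableDef`, `json_response`, and a logging middleware together:

    import logging
    from aiohttp import web

    logger = logging.getLogger(__name__)
    routes = web.RouteTableDef()

    @routes.get("/users/{user_id}")
    async def get_user(request: web.Request) -> web.Response:
        # Access the matched URL segment and serialize the reply as JSON
        user_id = request.match_info["user_id"]
        return web.json_response({"id": user_id})

    @web.middleware
    async def logging_middleware(request: web.Request, handler):
        # Pre-process the request, then delegate to the next handler
        logger.info("%s %s", request.method, request.path)
        return await handler(request)

    app = web.Application(middlewares=[logging_middleware])
    app.add_routes(routes)

### 2.4. State Management:

* **Application State:** Store application-level state in the `aiohttp.web.Application` instance.
* **Request State:** Store request-specific state in the `aiohttp.web.Request` instance.
* **Session State:** Use `aiohttp-session` to manage user session data.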
* **Database:** Use a database like PostgreSQL, MySQL, or MongoDB to store persistent state. * **Redis/Memcached:** Use in-memory data stores for caching frequently accessed data. ### 2.5. Error Handling: * **Use `try-except` blocks:** Wrap code that may raise exceptions in `try-except` blocks. * **Handle specific exceptions:** Catch specific exception types instead of using a generic `except Exception`. * **Log exceptions:** Log exceptions with detailed information for debugging. * **Return informative error responses:** Return appropriate HTTP status codes and error messages to the client. * **Implement custom error handlers:** Create custom error handlers to handle specific exception types. * **Use `aiohttp.web.HTTPException`:** Raise `aiohttp.web.HTTPException` to return HTTP error responses. ## 3. Performance Considerations ### 3.1. Optimization Techniques: * **Reuse `ClientSession`:** Always reuse a single `ClientSession` instance for making multiple requests. * **Connection Pooling:** aiohttp automatically uses connection pooling, so reuse your session. * **Keep-Alive Connections:** Keep-alive connections are enabled by default, reducing connection overhead. * **Gzip Compression:** Enable Gzip compression for responses to reduce bandwidth usage. * **Caching:** Implement caching for frequently accessed data to reduce database load. * **Optimize Database Queries:** Optimize database queries to improve response times. * **Use Indexes:** Use indexes in your database tables to speed up queries. * **Limit Payload Size:** Keep request and response payloads as small as possible. * **Background Tasks:** Use `asyncio.create_task` to offload long-running tasks to the background. * **Profiling:** Use profiling tools to identify performance bottlenecks. ### 3.2. Memory Management: * **Avoid Memory Leaks:** Ensure that all resources are properly released to prevent memory leaks. * **Use Generators:** Use generators to process large datasets in chunks. * **Limit Object Creation:** Minimize the creation of objects to reduce memory overhead. * **Use Data Structures Efficiently:** Choose appropriate data structures to optimize memory usage. * **Garbage Collection:** Understand how Python's garbage collection works and optimize your code accordingly. ### 3.3. Rendering Optimization: * **Template Caching:** Cache templates to reduce rendering time. * **Minimize Template Logic:** Keep template logic simple and move complex logic to request handlers. * **Use Asynchronous Templates:** Use asynchronous template engines like `aiohttp-jinja2`. * **Optimize Static Files:** Optimize static files (CSS, JavaScript, images) to reduce page load times. ### 3.4. Bundle Size Optimization: * **Minimize Dependencies:** Reduce the number of dependencies in your project. * **Tree Shaking:** Use tree shaking to remove unused code from your bundles. * **Code Minification:** Minify your code to reduce bundle sizes. * **Code Compression:** Compress your code to further reduce bundle sizes. ### 3.5. Lazy Loading: * **Lazy Load Modules:** Load modules only when they are needed. * **Lazy Load Images:** Load images only when they are visible in the viewport. * **Use Asynchronous Loading:** Use `asyncio.gather` or similar techniques to load resources concurrently. ## 4. Security Best Practices ### 4.1. Common Vulnerabilities: * **SQL Injection:** Prevent SQL injection by using parameterized queries or an ORM. * **Cross-Site Scripting (XSS):** Prevent XSS by escaping user input in templates. 
* **Cross-Site Request Forgery (CSRF):** Prevent CSRF by using CSRF tokens. * **Authentication and Authorization Issues:** Implement secure authentication and authorization mechanisms. * **Denial-of-Service (DoS) Attacks:** Implement rate limiting and other measures to prevent DoS attacks. * **Insecure Dependencies:** Keep your dependencies up to date to prevent vulnerabilities. ### 4.2. Input Validation: * **Validate all user input:** Validate all user input to prevent malicious data from entering your application. * **Use a validation library:** Use a validation library like `marshmallow` or `voluptuous` to simplify input validation. * **Sanitize user input:** Sanitize user input to remove potentially harmful characters. * **Limit input length:** Limit the length of input fields to prevent buffer overflows. * **Use regular expressions:** Use regular expressions to validate input patterns. ### 4.3. Authentication and Authorization: * **Use a strong authentication scheme:** Use a strong authentication scheme like OAuth 2.0 or JWT. * **Store passwords securely:** Store passwords securely using a hashing algorithm like bcrypt. * **Implement role-based access control (RBAC):** Use RBAC to control access to resources based on user roles. * **Use secure cookies:** Use secure cookies to protect session data. * **Implement multi-factor authentication (MFA):** Use MFA to add an extra layer of security. ### 4.4. Data Protection: * **Encrypt sensitive data:** Encrypt sensitive data at rest and in transit. * **Use HTTPS:** Use HTTPS to encrypt communication between the client and the server. * **Store data securely:** Store data in a secure location with appropriate access controls. * **Regularly back up data:** Regularly back up data to prevent data loss. * **Comply with data privacy regulations:** Comply with data privacy regulations like GDPR and CCPA. ### 4.5. Secure API Communication: * **Use HTTPS:** Always use HTTPS for API communication. * **Implement API authentication:** Use API keys or tokens to authenticate API requests. * **Rate limit API requests:** Implement rate limiting to prevent abuse. * **Validate API requests:** Validate API requests to prevent malicious data from entering your application. * **Log API requests:** Log API requests for auditing and debugging. ## 5. Testing Approaches ### 5.1. Unit Testing: * **Test individual components:** Unit tests should test individual components in isolation. * **Use a testing framework:** Use a testing framework like `pytest` or `unittest`. * **Write clear and concise tests:** Write clear and concise tests that are easy to understand. * **Test edge cases:** Test edge cases and boundary conditions. * **Use mocks and stubs:** Use mocks and stubs to isolate components under test. ### 5.2. Integration Testing: * **Test interactions between components:** Integration tests should test interactions between different components. * **Test with real dependencies:** Integration tests should use real dependencies whenever possible. * **Test the entire application flow:** Integration tests should test the entire application flow. * **Use a testing database:** Use a testing database to isolate integration tests from the production database. ### 5.3. End-to-End Testing: * **Test the entire system:** End-to-end tests should test the entire system from end to end. * **Use a testing environment:** Use a testing environment that mimics the production environment. 
* **Automate end-to-end tests:** Automate end-to-end tests to ensure that the system is working correctly. * **Use a browser automation tool:** Use a browser automation tool like Selenium or Puppeteer. ### 5.4. Test Organization: * **Organize tests by module:** Organize tests by module to improve test discovery and maintainability. * **Use descriptive test names:** Use descriptive test names that clearly indicate what the test is verifying. * **Use test fixtures:** Use test fixtures to set up and tear down test environments. * **Use test markers:** Use test markers to categorize tests and run specific test suites. ### 5.5. Mocking and Stubbing: * **Use mocks to simulate dependencies:** Use mocks to simulate the behavior of dependencies. * **Use stubs to provide predefined responses:** Use stubs to provide predefined responses to API calls. * **Use mocking libraries:** Use mocking libraries like `unittest.mock` or `pytest-mock`. * **Avoid over-mocking:** Avoid over-mocking, as it can make tests less reliable. ## 6. Common Pitfalls and Gotchas ### 6.1. Frequent Mistakes: * **Not handling exceptions properly:** Always handle exceptions to prevent unexpected behavior. * **Using blocking operations in asynchronous code:** Avoid using blocking operations in asynchronous code. * **Not closing `ClientSession`:** Always close the `ClientSession` to release resources. * **Not validating user input:** Always validate user input to prevent security vulnerabilities. * **Not using HTTPS:** Always use HTTPS for secure communication. ### 6.2. Edge Cases: * **Handling timeouts:** Implement proper timeout handling to prevent requests from hanging indefinitely. * **Handling connection errors:** Handle connection errors gracefully to prevent application crashes. * **Handling large payloads:** Handle large payloads efficiently to prevent memory issues. * **Handling concurrent requests:** Handle concurrent requests properly to prevent race conditions. * **Handling Unicode encoding:** Be aware of Unicode encoding issues when processing text data. ### 6.3. Version-Specific Issues: * **aiohttp version compatibility:** Be aware of compatibility issues between different aiohttp versions. * **asyncio version compatibility:** Be aware of compatibility issues between aiohttp and different asyncio versions. * **Python version compatibility:** Be aware of compatibility issues between aiohttp and different Python versions. ### 6.4. Compatibility Concerns: * **Compatibility with other libraries:** Be aware of compatibility issues between aiohttp and other libraries. * **Compatibility with different operating systems:** Be aware of compatibility issues between aiohttp and different operating systems. * **Compatibility with different web servers:** Be aware of compatibility issues between aiohttp and different web servers. ### 6.5. Debugging Strategies: * **Use logging:** Use logging to track application behavior and identify issues. * **Use a debugger:** Use a debugger to step through code and examine variables. * **Use a profiler:** Use a profiler to identify performance bottlenecks. * **Use error reporting tools:** Use error reporting tools to track and fix errors in production. * **Use a network analyzer:** Use a network analyzer like Wireshark to capture and analyze network traffic. ## 7. Tooling and Environment ### 7.1. Recommended Development Tools: * **IDE:** Use an IDE like VS Code, PyCharm, or Sublime Text. * **Virtual Environment:** Use a virtual environment to isolate project dependencies. 
* **Package Manager:** Use a package manager like pip or poetry to manage dependencies. * **Testing Framework:** Use a testing framework like pytest or unittest. * **Linting Tool:** Use a linting tool like pylint or flake8 to enforce code style. * **Formatting Tool:** Use a formatting tool like black or autopep8 to format code automatically. ### 7.2. Build Configuration: * **Use a build system:** Use a build system like Make or tox to automate build tasks. * **Define dependencies in `requirements.txt` or `pyproject.toml`:** Specify all project dependencies in a `requirements.txt` or `pyproject.toml` file. * **Use a Dockerfile:** Use a Dockerfile to create a containerized build environment. * **Use Docker Compose:** Use Docker Compose to manage multi-container applications. ### 7.3. Linting and Formatting: * **Use a consistent code style:** Use a consistent code style throughout the project. * **Configure linting tools:** Configure linting tools to enforce code style rules. * **Configure formatting tools:** Configure formatting tools to format code automatically. * **Use pre-commit hooks:** Use pre-commit hooks to run linters and formatters before committing code. ### 7.4. Deployment: * **Use a web server:** Use a web server like Nginx or Apache to serve the application. * **Use a process manager:** Use a process manager like Supervisor or systemd to manage the application process. * **Use a reverse proxy:** Use a reverse proxy to improve security and performance. * **Use a load balancer:** Use a load balancer to distribute traffic across multiple servers. * **Use a monitoring system:** Use a monitoring system to track application health and performance. * **Standalone Server:** aiohttp.web.run_app(), simple but doesn't utilize all CPU cores. * **Nginx + Supervisord:** Nginx prevents attacks, allows utilizing all CPU cores, and serves static files faster. * **Nginx + Gunicorn:** Gunicorn launches the app as worker processes, simplifying deployment compared to bare Nginx. ### 7.5. CI/CD Integration: * **Use a CI/CD pipeline:** Use a CI/CD pipeline to automate the build, test, and deployment process. * **Use a CI/CD tool:** Use a CI/CD tool like Jenkins, GitLab CI, or GitHub Actions. * **Run tests in the CI/CD pipeline:** Run tests in the CI/CD pipeline to ensure that code changes don't break the application. * **Automate deployment:** Automate deployment to reduce manual effort and improve consistency. ## Additional Best Practices: * **Session Management:** Always create a `ClientSession` for making requests and reuse it across multiple requests to benefit from connection pooling. Avoid creating a new session for each request, as this can lead to performance issues. * **Error Handling:** Implement robust error handling in your request handlers. Use try-except blocks to manage exceptions, particularly for network-related errors. For example, handle `ConnectionResetError` to manage client disconnections gracefully. * **Middleware Usage:** Utilize middleware for cross-cutting concerns such as logging, error handling, and modifying requests/responses. Define middleware functions that accept a request and a handler, allowing you to process requests before they reach your main handler. * **Graceful Shutdown:** Implement graceful shutdown procedures for your server to ensure that ongoing requests are completed before the application exits. This can be achieved by registering shutdown signals and cleaning up resources. 
* **Security Practices:** When deploying, consider using a reverse proxy like Nginx for added security and performance. Configure SSL/TLS correctly to secure your application.
* **Character Set Detection:** If a response does not include the charset needed to decode the body, `ClientSession` accepts a `fallback_charset_resolver` parameter that can be used to introduce charset-guessing functionality.
* **Persistent Session:** Use a cleanup context (`Application.cleanup_ctx`) when creating a persistent session, so the session is reliably closed on shutdown.
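* **Example (illustrative):** The session-reuse and cleanup advice above fits in a few lines; names such as `client_session_ctx` and the `"client_session"` app key are arbitrary:

      import aiohttp
      from aiohttp import web

      async def client_session_ctx(app: web.Application):
          # Created once at startup and shared by all handlers (connection pooling).
          app["client_session"] = aiohttp.ClientSession()
          yield
          # Closed exactly once at shutdown.
          await app["client_session"].close()

      async def fetch_example(request: web.Request) -> web.Response:
          session = request.app["client_session"]  # reuse; never create per request
          async with session.get("https://example.com") as resp:
              return web.Response(text=await resp.text())

      app = web.Application()
      app.cleanup_ctx.append(client_session_ctx)
      app.add_routes([web.get("/", fetch_example)])

By adhering to these practices, developers can enhance the reliability, performance, and security of their `aiohttp` applications.
```

## /rules-mdc/amazon-ec2.mdc

```mdc path="/rules-mdc/amazon-ec2.mdc"
---
description: This rule file provides best practices, coding standards, and security guidelines for developing, deploying, and maintaining applications using the amazon-ec2 library within the AWS ecosystem. It focuses on infrastructure as code (IaC), resource management, performance, and security considerations for robust and scalable EC2-based solutions.
globs: **/*.{tf,json,yml,yaml,py,js,ts,sh,java,go,rb,m}
---

- ## General Principles
  - **Infrastructure as Code (IaC):** Treat your infrastructure as code. Define and provision AWS resources (EC2 instances, security groups, networks) using code (e.g., AWS CloudFormation, AWS CDK, Terraform). This ensures consistency, repeatability, and version control.
  - **Security First:** Integrate security best practices into every stage of development, from IaC template creation to instance configuration. Implement the principle of least privilege, regularly patch instances, and utilize security assessment tools.
  - **Modularity and Reusability:** Design your infrastructure and application code in modular components that can be reused across multiple projects or environments.
  - **Automation:** Automate as much of the infrastructure provisioning, deployment, and management processes as possible. Use CI/CD pipelines for automated testing and deployment.
  - **Monitoring and Logging:** Implement comprehensive monitoring and logging to track the health, performance, and security of your EC2 instances and applications.
- ## 1. Code Organization and Structure
  - **Directory Structure Best Practices:**
    - Adopt a logical directory structure that reflects the architecture of your application and infrastructure.
    - Example:

        project-root/
        ├── modules/              # Reusable infrastructure modules (e.g., VPC, security groups)
        │   ├── vpc/              # VPC module
        │   │   ├── main.tf       # Terraform configuration for the VPC
        │   │   ├── variables.tf  # Input variables for the VPC module
        │   │   ├── outputs.tf    # Output values for the VPC module
        │   ├── security_group/   # Security Group module
        │   │   ├── ...
        ├── environments/         # Environment-specific configurations
        │   ├── dev/              # Development environment
        │   │   ├── main.tf       # Terraform configuration for the Dev environment
        │   │   ├── variables.tf  # Environment specific variables
        │   ├── prod/             # Production environment
        │   │   ├── ...
        ├── scripts/              # Utility scripts (e.g., deployment scripts, automation scripts)
        │   ├── deploy.sh         # Deployment script
        │   ├── update_ami.py     # Python script to update AMI
        ├── application/          # Application code
        │   ├── src/              # Source code
        │   ├── tests/            # Unit and integration tests
        ├── README.md
        └── ...
  - **File Naming Conventions:**
    - Use consistent and descriptive file names.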
    - Examples:
      - `main.tf`: Main Terraform configuration file
      - `variables.tf`: Terraform variables file
      - `outputs.tf`: Terraform output values file
      - `deploy.sh`: Deployment script
      - `instance.py`: Python module for instance management
  - **Module Organization:**
    - Encapsulate reusable infrastructure components into modules (e.g., VPC, security groups, load balancers).
    - Each module should have:
      - A clear purpose.
      - Well-defined input variables and output values.
      - Comprehensive documentation.
    - Keep modules small and focused.
  - **Component Architecture:**
    - Design your application as a collection of loosely coupled components.
    - Each component should have:
      - A well-defined interface.
      - Clear responsibilities.
      - Independent deployment lifecycle.
  - **Code Splitting:**
    - Break down large application codebases into smaller, manageable modules.
    - Use lazy loading to load modules on demand, reducing initial load time.
    - Example (Python):

        # main.py
        import importlib

        def load_module(module_name):
            module = importlib.import_module(module_name)
            return module

        # Load the module when needed
        my_module = load_module('my_module')
        my_module.my_function()
- ## 2. Common Patterns and Anti-patterns
  - **Design Patterns:**
    - **Singleton:** Use when exactly one instance of a class is needed (e.g., a configuration manager).
    - **Factory:** Use to create objects without specifying their concrete classes (e.g., creating different types of EC2 instances).
    - **Strategy:** Use to define a family of algorithms, encapsulate each one, and make them interchangeable (e.g., different instance termination strategies).
  - **Common Tasks:**
    - **Creating an EC2 Instance (AWS CLI):**

        aws ec2 run-instances --image-id ami-xxxxxxxxxxxxxxxxx --instance-type t2.micro --key-name MyKeyPair --security-group-ids sg-xxxxxxxxxxxxxxxxx
    - **Creating an EC2 Instance (AWS CDK):**

        import * as ec2 from 'aws-cdk-lib/aws-ec2';

        const vpc = new ec2.Vpc(this, 'TheVPC', { maxAzs: 3 });

        const instance = new ec2.Instance(this, 'EC2Instance', {
          vpc: vpc,
          instanceType: ec2.InstanceType.of(ec2.InstanceClass.T2, ec2.InstanceSize.MICRO),
          machineImage: new ec2.AmazonLinuxImage({ generation: ec2.AmazonLinuxGeneration.AMAZON_LINUX_2 }),
        });
    - **Attaching an EBS Volume:**
      - Ensure the EBS volume is in the same Availability Zone as the EC2 instance.
      - Use the `aws ec2 attach-volume` command or the equivalent SDK call.
  - **Anti-patterns:**
    - **Hardcoding AWS Credentials:** Never hardcode AWS credentials in your code. Use IAM roles for EC2 instances and IAM users with restricted permissions for local development.
    - **Creating Publicly Accessible S3 Buckets:** Avoid creating S3 buckets that are publicly accessible without proper security controls.
    - **Ignoring Error Handling:** Always handle exceptions and errors gracefully. Provide meaningful error messages and logging.
    - **Over-Permissive Security Groups:** Implement the principle of least privilege. Grant only the minimum necessary permissions to your security groups.
  - **State Management:**
    - Use state files (e.g., Terraform state) to track the current state of your infrastructure.
    - Store state files securely (e.g., in an S3 bucket with encryption and versioning).
    - Use locking mechanisms to prevent concurrent modifications to the state file.
  - **Error Handling:**
    - Implement robust error handling to catch exceptions and prevent application crashes.
    - Use try-except blocks to handle potential errors.
    - Log error messages with sufficient detail for debugging.
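    - Example (Python, illustrative; assumes `boto3` as the SDK, with a placeholder AMI ID):

        import logging

        import boto3
        from botocore.exceptions import ClientError

        logger = logging.getLogger(__name__)

        def launch_instance(image_id, instance_type="t2.micro"):
            """Launch a single EC2 instance, returning its ID or None on failure."""
            ec2 = boto3.client("ec2")
            try:
                response = ec2.run_instances(
                    ImageId=image_id,          # e.g., a validated, known-good AMI
                    InstanceType=instance_type,
                    MinCount=1,
                    MaxCount=1,
                )
                return response["Instances"][0]["InstanceId"]
            except ClientError as e:
                # Log the AWS error with enough detail for debugging.
                logger.error("run_instances failed: %s", e.response["Error"]["Message"])
                return None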
## 3. Performance Considerations

- **Optimization Techniques:**
  - **Instance Type Selection:** Choose the appropriate EC2 instance type based on your application's requirements (CPU, memory, network).
  - **EBS Optimization:** Use Provisioned IOPS (PIOPS) EBS volumes for high-performance applications.
  - **Caching:** Implement caching mechanisms to reduce database load and improve response times (e.g., using Amazon ElastiCache).
  - **Load Balancing:** Distribute traffic across multiple EC2 instances using an Elastic Load Balancer (ELB).
  - **Auto Scaling:** Use Auto Scaling groups to automatically scale your EC2 instances based on demand.
- **Memory Management:**
  - Monitor memory usage on your EC2 instances.
  - Optimize application code to reduce memory consumption.
  - Use memory profiling tools to identify memory leaks.
- **Bundle Size Optimization:**
  - Minimize the size of your application's deployment package.
  - Remove unnecessary dependencies.
  - Use code minification and compression.
  - Example (bash):

```bash
# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install only necessary dependencies
pip install --no-cache-dir -r requirements.txt

# Create deployment package
zip -r deployment_package.zip *
```

- **Lazy Loading:**
  - Load application modules on demand to reduce initial load time.
  - Use code splitting to break down large modules into smaller chunks.
  - Example (JavaScript):

```javascript
// main.js
async function loadModule() {
  const module = await import('./my_module.js');
  module.myFunction();
}

loadModule();
```

## 4. Security Best Practices

- **Common Vulnerabilities:**
  - **SQL Injection:** Prevent SQL injection by using parameterized queries and input validation (see the sketch at the end of this section).
  - **Cross-Site Scripting (XSS):** Prevent XSS by sanitizing user input and encoding output.
  - **Remote Code Execution (RCE):** Prevent RCE by validating user input and using secure coding practices.
  - **Unsecured API Endpoints:** Secure API endpoints using authentication and authorization mechanisms.
- **Input Validation:**
  - Validate all user input to prevent malicious code from being injected into your application.
  - Use regular expressions and data type validation.
- **Authentication and Authorization:**
  - Use strong authentication mechanisms (e.g., multi-factor authentication).
  - Implement role-based access control (RBAC) to restrict access to sensitive resources.
  - Use AWS IAM roles for EC2 instances to grant access to AWS resources.
- **Data Protection:**
  - Encrypt sensitive data at rest and in transit.
  - Use HTTPS for all API communication.
  - Store sensitive data in secure storage (e.g., AWS Secrets Manager).
- **Secure API Communication:**
  - Use HTTPS for all API communication.
  - Validate API requests and responses.
  - Implement rate limiting to prevent abuse.
  - Use AWS API Gateway to manage and secure your APIs.
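The SQL-injection rule above is language-agnostic; a minimal sketch in Python, using the standard-library `sqlite3` driver purely for illustration (the same placeholder-binding style applies to most database drivers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

hostile_input = "alice'; DROP TABLE users; --"  # stays inert
# The driver binds the value; it is never interpolated into the SQL string.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (hostile_input,)
).fetchall()
print(rows)  # [] -- no rows match, and no injection occurred
```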
## 5. Testing Approaches

- **Unit Testing:**
  - Write unit tests for individual components to verify their functionality.
  - Use mocking and stubbing to isolate components from external dependencies.
  - Example (Python):

```python
import unittest
from unittest.mock import Mock

class MyComponent:
    def __init__(self, external_dependency):
        self.dependency = external_dependency

    def my_function(self, input_data):
        result = self.dependency.process_data(input_data)
        return result

class TestMyComponent(unittest.TestCase):
    def test_my_function(self):
        # Create a mock for the external dependency
        mock_dependency = Mock()
        mock_dependency.process_data.return_value = "Mocked Result"

        # Create an instance of MyComponent with the mock dependency
        component = MyComponent(mock_dependency)

        # Call the function to be tested
        result = component.my_function("Test Input")

        # Assert the expected behavior
        self.assertEqual(result, "Mocked Result")
        mock_dependency.process_data.assert_called_once_with("Test Input")

if __name__ == '__main__':
    unittest.main()
```

- **Integration Testing:**
  - Write integration tests to verify the interaction between components.
  - Test the integration of your application with AWS services.
- **End-to-End Testing:**
  - Write end-to-end tests to verify the entire application flow.
  - Simulate real user scenarios.
- **Test Organization:**
  - Organize your tests into a logical directory structure.
  - Use meaningful test names.
  - Keep tests independent of each other.
- **Mocking and Stubbing:**
  - Use mocking and stubbing to isolate components from external dependencies.
  - Example (TypeScript, mocking the AWS SDK in Jest):

```typescript
// Mocking AWS SDK calls in Jest
jest.mock('aws-sdk', () => {
  const mEC2 = {
    describeInstances: jest.fn().mockReturnValue({
      promise: jest.fn().mockResolvedValue({ Reservations: [] }),
    }),
  };
  return { EC2: jest.fn().mockImplementation(() => mEC2) };
});
```

## 6. Common Pitfalls and Gotchas

- **Frequent Mistakes:**
  - **Incorrect Security Group Configuration:** Incorrectly configured security groups can expose your EC2 instances to security risks.
  - **Insufficient Resource Limits:** Exceeding AWS resource limits can cause application failures.
  - **Not Using Auto Scaling:** Skipping Auto Scaling can lead to performance bottlenecks and outages during periods of high demand.
  - **Forgetting to Terminate Unused Instances:** Leaving unused EC2 instances running leads to unnecessary costs.
- **Edge Cases:**
  - **Spot Instance Interruptions:** Spot instances can be interrupted with short notice. Design your application to handle spot instance interruptions gracefully (a metadata-polling sketch follows this section).
  - **Network Connectivity Issues:** Network connectivity issues can prevent your application from accessing AWS services or other resources.
- **Version-Specific Issues:**
  - Be aware of version-specific issues with the amazon-ec2 library or AWS services.
  - Consult the documentation for the specific versions you are using.
- **Compatibility Concerns:**
  - Ensure compatibility between your application and the underlying operating system and libraries.
  - Test your application on different operating systems and browsers.
- **Debugging Strategies:**
  - Use logging and monitoring to track the behavior of your application.
  - Use debugging tools to identify and fix errors.
  - Consult the AWS documentation and community forums for help.
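A minimal sketch of polling for a spot interruption notice via the instance metadata service (only meaningful on an EC2 spot instance; the endpoint paths are AWS's documented IMDSv2 routes, while the polling strategy itself is an illustrative assumption):

```python
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def spot_interruption_pending() -> bool:
    """True if AWS has scheduled an interruption for this spot instance."""
    # IMDSv2 requires a short-lived session token.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=2).read().decode()
    try:
        # Returns 200 with an action payload once an interruption is scheduled.
        urllib.request.urlopen(
            urllib.request.Request(
                f"{IMDS}/meta-data/spot/instance-action",
                headers={"X-aws-ec2-metadata-token": token},
            ),
            timeout=2,
        )
        return True  # Start draining work and checkpointing state.
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False  # No interruption scheduled.
        raise
```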
## 7. Tooling and Environment

- **Recommended Tools:**
  - **AWS CLI:** Command-line interface for interacting with AWS services.
  - **AWS Management Console:** Web-based interface for managing AWS resources.
  - **AWS CloudFormation:** Infrastructure as code service for provisioning and managing AWS resources.
  - **AWS CDK:** Cloud Development Kit for defining cloud infrastructure in code.
  - **Terraform:** Infrastructure as code tool for provisioning and managing cloud resources.
  - **Packer:** Tool for creating machine images.
  - **Ansible:** Configuration management tool.
- **Build Configuration:**
  - Use a build tool (e.g., Make, Gradle, Maven) to automate the build process.
  - Define dependencies and build steps in a build file.
  - Example (Makefile; note that each recipe line runs in its own shell, so the venv's tools are invoked by path rather than activated):

```makefile
# Makefile
venv:
	python3 -m venv .venv
	.venv/bin/pip install -r requirements.txt

deploy:
	zip -r deployment_package.zip *
	aws s3 cp deployment_package.zip s3://my-bucket/deployment_package.zip
	aws lambda update-function-code --function-name my-function --s3-bucket my-bucket --s3-key deployment_package.zip
```

- **Linting and Formatting:**
  - Use a linter (e.g., pylint, eslint) to enforce code style and identify potential errors.
  - Use a formatter (e.g., black, prettier) to automatically format your code.
- **Deployment Best Practices:**
  - Use a deployment pipeline to automate the deployment process.
  - Deploy to a staging environment before deploying to production.
  - Use blue/green deployments to minimize downtime.
- **CI/CD Integration:**
  - Integrate your application with a CI/CD system (e.g., Jenkins, CircleCI, GitLab CI).
  - Automate testing, building, and deployment.

## Additional Considerations

- **Cost Optimization:** Regularly review your AWS resource usage and identify opportunities for cost savings. Consider using Reserved Instances or Spot Instances to reduce costs.
- **Disaster Recovery:** Implement a disaster recovery plan to ensure business continuity in the event of an outage. Use AWS Backup or other backup solutions to protect your data.
- **Compliance:** Ensure that your application complies with relevant regulations and standards (e.g., PCI DSS, HIPAA).

## References

- [AWS EC2 Best Practices](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-best-practices.html)
- [AWS CDK Best Practices](https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html)
- [Terraform AWS Provider Best Practices](https://docs.aws.amazon.com/prescriptive-guidance/latest/terraform-aws-provider-best-practices/structure.html)
```

## /rules-mdc/amazon-s3.mdc

```mdc path="/rules-mdc/amazon-s3.mdc"
---
description: This rule file provides comprehensive best practices, coding standards, and security guidelines for developing applications using Amazon S3. It aims to ensure secure, performant, and maintainable S3 integrations.
globs: **/*S3*.{js,ts,jsx,tsx,py,java,go,csharp}
---

- Always disable public access to S3 buckets unless explicitly needed. Use AWS Identity and Access Management (IAM) policies and bucket policies for access control instead of Access Control Lists (ACLs), which are now generally deprecated.
- Implement encryption for data at rest using Server-Side Encryption (SSE), preferably with AWS Key Management Service (KMS) for enhanced security (a boto3 sketch of these first two rules follows below).
- Use S3 Transfer Acceleration for faster uploads over long distances and enable versioning to protect against accidental deletions. Monitor performance using Amazon CloudWatch and enable logging for auditing. Additionally, consider using S3 Storage Lens for insights into storage usage and activity trends.
- Leverage S3's lifecycle policies to transition objects to cheaper storage classes based on access patterns, and regularly review your storage usage to optimize costs. Utilize S3 Intelligent-Tiering for automatic cost savings based on changing access patterns.
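A minimal sketch applying the first two rules above with boto3 (an assumption; the bucket name is a placeholder). It blocks all public access and sets SSE-KMS as the bucket default so every new object is encrypted at rest:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # placeholder

# Rule 1: block every form of public access to the bucket.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Rule 2: default every new object to SSE-KMS encryption at rest.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                # "KMSMasterKeyID": "alias/my-key",  # optional customer-managed key
            }
        }]
    },
)
```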
## Amazon S3 Best Practices and Coding Standards

This document provides comprehensive best practices, coding standards, and security guidelines for developing applications using Amazon S3. Following these guidelines will help ensure that your S3 integrations are secure, performant, maintainable, and cost-effective.

### 1. Code Organization and Structure

#### 1.1. Directory Structure Best Practices

Organize your code related to Amazon S3 into logical directories based on functionality.

```
project/
├── src/
│   ├── s3/
│   │   ├── utils.js       # Utility functions for S3 operations
│   │   ├── uploader.js    # Handles uploading files to S3
│   │   ├── downloader.js  # Handles downloading files from S3
│   │   ├── config.js      # Configuration for S3 (bucket name, region, etc.)
│   │   ├── errors.js      # Custom error handling for S3 operations
│   │   └── index.js       # Entry point for S3 module
│   ├── ...
│   └── tests/
│       ├── s3/
│       │   ├── uploader.test.js  # Unit tests for uploader.js
│       │   └── ...
│       └── ...
├── ...
```

#### 1.2. File Naming Conventions

Use descriptive and consistent file names.

* `uploader.js`: Module for uploading files to S3.
* `downloader.js`: Module for downloading files from S3.
* `s3_service.py`: (Python example) Defines S3-related services.
* `S3Manager.java`: (Java example) Manages the S3 client and configurations.

#### 1.3. Module Organization

* **Single Responsibility Principle:** Each module should have a clear and specific purpose (e.g., uploading, downloading, managing bucket lifecycle).
* **Abstraction:** Hide complex S3 operations behind simpler interfaces.
* **Configuration:** Store S3 configuration details (bucket name, region, credentials) in a separate configuration file.

Example (JavaScript):

```javascript
// s3/uploader.js
import AWS from 'aws-sdk';
import config from './config';

const s3 = new AWS.S3(config.s3);

export const uploadFile = async (file, key) => {
  const params = { Bucket: config.s3.bucketName, Key: key, Body: file };
  try {
    await s3.upload(params).promise();
    console.log(`File uploaded successfully: ${key}`);
  } catch (error) {
    console.error('Error uploading file:', error);
    throw error; // Re-throw for handling in the caller.
  }
};
```

#### 1.4. Component Architecture

For larger applications, consider a component-based architecture. This can involve creating distinct components for different S3-related tasks. For example:

* **Upload Component:** Handles file uploads, progress tracking, and error handling.
* **Download Component:** Handles file downloads, progress tracking, and caching.
* **Management Component:** Manages bucket creation, deletion, and configuration.

#### 1.5. Code Splitting

If you have a large application using S3, consider using code splitting to reduce the initial load time. This involves breaking your code into smaller chunks that can be loaded on demand, and is especially relevant for front-end applications using S3 for asset storage.

* **Dynamic Imports:** Use dynamic imports to load S3-related modules only when needed.
* **Webpack/Rollup:** Configure your bundler to create separate chunks for S3 code.

### 2. Common Patterns and Anti-patterns

#### 2.1. Design Patterns

* **Strategy Pattern:** Use a strategy pattern to handle different storage classes or encryption methods.
* **Factory Pattern:** Use a factory pattern to create S3 clients with different configurations.
* **Singleton Pattern:** Use a singleton pattern if you want to share one S3 client instance across all S3 interactions (a Python client-factory sketch follows this section).
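A minimal sketch of the singleton idea in Python (an assumption; the JavaScript examples in this file would achieve the same thing with a module-level client). `functools.lru_cache` memoizes the factory, so repeated calls with the same region share one configured client:

```python
from functools import lru_cache

import boto3

@lru_cache(maxsize=None)
def s3_client(region: str = "us-east-1"):
    # One client per region, created on first use and then reused everywhere.
    return boto3.client("s3", region_name=region)

assert s3_client() is s3_client()  # same instance on every call
```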
#### 2.2. Recommended Approaches for Common Tasks

* **Uploading large files:** Use multipart upload for large objects; AWS recommends it once objects approach 100 MB, and each part (except the last) must be at least 5 MB. Multipart upload lets you upload parts in parallel and resume interrupted uploads.
* **Downloading large files:** Use byte-range fetches to download files in chunks (see the byte-range sketch at the end of this section). This is useful for resuming interrupted downloads and for accessing specific portions of a file.
* **Deleting multiple objects:** Use the `deleteObjects` API to delete multiple objects in a single request. This is more efficient than deleting objects one by one.

#### 2.3. Anti-patterns and Code Smells

* **Hardcoding credentials:** Never hardcode AWS credentials in your code. Use IAM roles or environment variables.
* **Insufficient error handling:** Always handle errors from S3 operations gracefully. Provide informative error messages and retry failed operations.
* **Ignoring bucket access control:** Properly configure bucket policies and IAM roles to restrict access to your S3 buckets.
* **Overly permissive bucket policies:** Avoid granting overly broad permissions in your bucket policies. Follow the principle of least privilege.
* **Not using versioning:** Failing to enable versioning can lead to data loss if objects are accidentally deleted or overwritten.
* **Assuming immediate consistency everywhere:** S3 has offered strong read-after-write consistency since December 2020, but replicated copies (e.g., Cross-Region Replication) and caches can still serve stale data. Design your application accordingly.
* **Polling for object existence:** Instead of polling, use S3 events to trigger actions when objects are created or modified.
* **Inefficient data retrieval:** Avoid retrieving entire objects when only a portion of the data is needed. Use byte-range fetches or S3 Select to retrieve only the necessary data.

#### 2.4. State Management

* **Stateless operations:** Design your S3 operations to be stateless whenever possible. This makes your application more scalable and resilient.
* **Caching:** Use caching to reduce the number of requests to S3. Consider using a CDN (Content Delivery Network) to cache frequently accessed objects.
* **Session management:** If you need to maintain state, store session data in a separate database or cache, not in S3.

#### 2.5. Error Handling

* **Retry mechanism:** Implement retry logic with exponential backoff for transient errors.
* **Specific error handling:** Handle different S3 errors differently (e.g., retry 503 errors, log 403 errors).
* **Centralized error logging:** Log all S3 errors to a centralized logging system for monitoring and analysis.

Example (Python):

```python
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket, retrying transient errors."""
    if object_name is None:
        object_name = file_name

    for attempt in range(3):  # Retry up to 3 times
        try:
            s3.upload_file(file_name, bucket, object_name)
            return True
        except ClientError as e:
            if e.response['Error']['Code'] == 'NoSuchBucket':
                print(f"The bucket {bucket} does not exist.")
                return False
            elif e.response['Error']['Code'] == 'AccessDenied':
                print("Access denied. Check your credentials and permissions.")
                return False
            else:
                print(f"An error occurred: {e}")
                if attempt < 2:
                    time.sleep(2 ** attempt)  # Exponential backoff, then retry
                else:
                    return False
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return False
    return False  # Reached max retries and failed
```
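A minimal sketch of the byte-range fetch mentioned above, using boto3 (bucket and key names are placeholders). The `Range` parameter uses standard HTTP semantics, so `bytes=0-1023` returns the first KiB of the object:

```python
import boto3

s3 = boto3.client("s3")
resp = s3.get_object(
    Bucket="example-bucket",
    Key="large-file.bin",
    Range="bytes=0-1023",  # fetch only the first 1024 bytes
)
chunk = resp["Body"].read()
print(len(chunk))  # up to 1024 bytes
```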
### 3. Performance Considerations

#### 3.1. Optimization Techniques

* **Use S3 Transfer Acceleration:** If you are uploading or downloading files from a geographically distant location, use S3 Transfer Acceleration to improve performance. It routes traffic through Amazon CloudFront's globally distributed edge locations.
* **Use multipart upload:** For large files, use multipart upload to upload parts in parallel. AWS recommends it for objects of roughly 100 MB and larger; 5 MB is the minimum part size, not the threshold.
* **Enable gzip compression:** Compress objects before uploading them to S3 to reduce storage costs and improve download times. Set the `Content-Encoding` header to `gzip` when uploading compressed objects.
* **Use HTTP/2 where it is available:** S3 endpoints serve requests over HTTP/1.1; to benefit from HTTP/2 (and HTTP/3), serve objects through Amazon CloudFront.
* **Optimize object sizes:** Store related data in a single object to reduce the number of requests to S3.

#### 3.2. Memory Management

* **Stream data:** Avoid loading entire files into memory. Use streams to process data in chunks.
* **Release resources:** Release S3 client objects when they are no longer needed.

#### 3.3. Bundle Size Optimization

* **Tree shaking:** Use a bundler that supports tree shaking to remove unused code from your bundle.
* **Code splitting:** Split your code into smaller chunks that can be loaded on demand.

#### 3.4. Lazy Loading

* **Load images on demand:** Load images from S3 only when they are visible on the screen.
* **Lazy load data:** Load data from S3 only when it is needed.

### 4. Security Best Practices

#### 4.1. Common Vulnerabilities

* **Publicly accessible buckets:** Ensure that your S3 buckets are not publicly accessible.
* **Insufficient access control:** Properly configure bucket policies and IAM roles to restrict access to your S3 buckets.
* **Cross-site scripting (XSS):** Sanitize user input to prevent XSS attacks if you are serving content directly from S3.
* **Data injection:** Validate all data before storing it in S3 to prevent data injection attacks.

#### 4.2. Input Validation

* **Validate file types:** Validate the file types of uploaded objects to prevent malicious files from being stored in S3.
* **Validate file sizes:** Limit the file sizes of uploaded objects to prevent denial-of-service attacks.
* **Sanitize file names:** Sanitize file names to prevent directory traversal attacks.

#### 4.3. Authentication and Authorization

* **Use IAM roles:** Use IAM roles to grant permissions to applications running on EC2 instances or other AWS services.
* **Use temporary credentials:** Use temporary credentials for applications that need to access S3 from outside of AWS. You can use AWS STS (Security Token Service) to generate temporary credentials.
* **Principle of least privilege:** Grant only the minimum permissions required for each user or application.

#### 4.4. Data Protection

* **Encrypt data at rest:** Use server-side encryption (SSE) or client-side encryption to encrypt data at rest in S3 (a per-object SSE-KMS sketch follows this section).
* **Encrypt data in transit:** Use HTTPS to encrypt data in transit between your application and S3.
* **Enable versioning:** Enable versioning to protect against accidental data loss.
* **Enable MFA Delete:** Require multi-factor authentication to delete objects from S3.
* **Object locking:** Use S3 Object Lock to prevent objects from being deleted or overwritten for a specified period of time.
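A minimal sketch of encrypting a single object at rest with SSE-KMS via boto3 (bucket and key names are placeholders). With a bucket-level default in place, as in the earlier sketch, this per-request setting becomes optional:

```python
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-bucket",
    Key="reports/2024-summary.json",
    Body=b'{"status": "ok"}',
    # SSE-KMS; omit SSEKMSKeyId to use the AWS-managed key for S3.
    ServerSideEncryption="aws:kms",
)
```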
#### 4.5. Secure API Communication

* **Use HTTPS:** Always use HTTPS to communicate with the S3 API.
* **Validate certificates:** Validate the SSL/TLS certificates of the S3 endpoints.
* **Restrict access:** Restrict access to the S3 API using IAM policies and bucket policies.

### 5. Testing Approaches

#### 5.1. Unit Testing

* **Mock the S3 client:** Mock the S3 client to isolate your unit tests.
* **Test individual functions:** Test individual functions that interact with S3.
* **Verify error handling:** Verify that your code handles S3 errors correctly.

Example (JavaScript with Jest):

```javascript
// s3/uploader.test.js
import { uploadFile } from './uploader';
import AWS from 'aws-sdk';

jest.mock('aws-sdk', () => {
  const mS3 = {
    upload: jest.fn().mockReturnThis(),
    promise: jest.fn(),
  };
  return { S3: jest.fn(() => mS3) };
});

describe('uploadFile', () => {
  it('should upload file successfully', async () => {
    const mockS3 = new AWS.S3();
    mockS3.promise.mockResolvedValue({});

    const file = 'test file content';
    const key = 'test.txt';

    await uploadFile(file, key);

    expect(mockS3.upload).toHaveBeenCalledWith({
      Bucket: 'your-bucket-name',
      Key: key,
      Body: file,
    });
  });

  it('should handle upload error', async () => {
    const mockS3 = new AWS.S3();
    mockS3.promise.mockRejectedValue(new Error('Upload failed'));

    const file = 'test file content';
    const key = 'test.txt';

    await expect(uploadFile(file, key)).rejects.toThrow('Upload failed');
  });
});
```

#### 5.2. Integration Testing

* **Test with real S3 buckets:** Create a dedicated S3 bucket for integration tests.
* **Test end-to-end flows:** Test complete workflows that involve S3 operations.
* **Verify data integrity:** Verify that data is correctly stored and retrieved from S3.

#### 5.3. End-to-End Testing

* **Simulate user scenarios:** Simulate real user scenarios to test your application's S3 integration.
* **Monitor performance:** Monitor the performance of your S3 integration under load.

#### 5.4. Test Organization

* **Separate test directories:** Create separate test directories for unit tests, integration tests, and end-to-end tests.
* **Descriptive test names:** Use descriptive test names that clearly indicate what is being tested.

#### 5.5. Mocking and Stubbing

* **Mock the S3 client:** Use a mocking library (e.g., Jest, Mockito) to mock the S3 client.
* **Stub S3 responses:** Stub S3 API responses to simulate different scenarios (a Python `Stubber` sketch follows this section).
* **Use dependency injection:** Use dependency injection to inject mocked S3 clients into your components.
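A minimal Python sketch of response stubbing, using botocore's built-in `Stubber` (one option among the mocking approaches named above; the bucket name is a placeholder). `add_response` registers both the canned response and the exact parameters the call must be made with:

```python
import boto3
from botocore.stub import Stubber

s3 = boto3.client("s3", region_name="us-east-1")
stubber = Stubber(s3)
stubber.add_response(
    "list_objects_v2",
    {"KeyCount": 0},               # canned response the client will return
    {"Bucket": "example-bucket"},  # parameters the call is expected to use
)

with stubber:
    resp = s3.list_objects_v2(Bucket="example-bucket")
    assert resp["KeyCount"] == 0  # no network call was made
```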
### 6. Common Pitfalls and Gotchas

#### 6.1. Frequent Mistakes

* **Forgetting to handle errors:** Failing to handle S3 errors can lead to unexpected behavior and data loss.
* **Using the incorrect region:** Using the wrong region can result in connection errors and data transfer costs.
* **Exposing sensitive data:** Storing sensitive data in S3 without proper encryption can lead to security breaches.
* **Not cleaning up temporary files:** Failing to delete temporary files after uploading them to S3 leads to storage waste.
* **Overusing public read access:** Granting public read access to S3 buckets can expose sensitive data to unauthorized users.

#### 6.2. Edge Cases

* **Consistency:** S3 has provided strong read-after-write consistency since December 2020, but replicated or cached copies can still lag. Be aware of this and design your application accordingly.
* **Object size limits:** Be aware of the object size limits for S3.
* **Request rate limits:** Be aware of the request rate limits for S3.
* **Special characters in object keys:** Handle special characters in object keys correctly.

#### 6.3. Version-Specific Issues

* **SDK compatibility:** Ensure that your AWS SDK is compatible with the S3 API version.
* **API changes:** Be aware of any API changes that may affect your application.

#### 6.4. Compatibility Concerns

* **Browser compatibility:** Ensure that your application is compatible with different browsers if you are using S3 directly from the browser.
* **Serverless environments:** Be aware of any limitations when using S3 in serverless environments (e.g., Lambda).

#### 6.5. Debugging Strategies

* **Enable logging:** Enable logging to track S3 API calls and errors.
* **Use S3 monitoring tools:** Use S3 monitoring tools to monitor the performance and health of your S3 buckets.
* **Check S3 access logs:** Analyze S3 access logs to identify potential security issues.
* **Use AWS CloudTrail:** Use AWS CloudTrail to track API calls to S3.
* **Use AWS X-Ray:** Use AWS X-Ray to trace requests through your application and identify performance bottlenecks.

### 7. Tooling and Environment

#### 7.1. Recommended Development Tools

* **AWS CLI:** The AWS Command Line Interface (CLI) is a powerful tool for managing S3 resources.
* **AWS SDK:** The AWS SDK provides libraries for interacting with S3 from various programming languages.
* **S3 Browser:** A Windows client for managing S3 buckets and objects.
* **Cyberduck:** A cross-platform client for managing S3 buckets and objects.
* **Cloudberry Explorer:** A Windows client for managing S3 buckets and objects.

#### 7.2. Build Configuration

* **Use environment variables:** Store S3 configuration details (bucket name, region, credentials) in environment variables.
* **Use a build tool:** Use a build tool (e.g., Maven, Gradle, Webpack) to manage your project dependencies and build your application.

#### 7.3. Linting and Formatting

* **Use a linter:** Use a linter (e.g., ESLint, PyLint) to enforce code style and best practices.
* **Use a formatter:** Use a code formatter (e.g., Prettier, Black) to automatically format your code.

#### 7.4. Deployment

* **Use infrastructure as code:** Use infrastructure as code (e.g., CloudFormation, Terraform) to automate the deployment of your S3 resources.
* **Use a deployment pipeline:** Use a deployment pipeline to automate the deployment of your application.
* **Use blue/green deployments:** Use blue/green deployments to minimize downtime during deployments.

#### 7.5. CI/CD Integration

* **Integrate with CI/CD tools:** Integrate your S3 deployment process with CI/CD tools (e.g., Jenkins, CircleCI, Travis CI).
* **Automate testing:** Automate your unit tests, integration tests, and end-to-end tests as part of your CI/CD pipeline.
* **Automate deployments:** Automate the deployment of your application to S3 as part of your CI/CD pipeline.

By following these best practices and coding standards, you can ensure that your Amazon S3 integrations are secure, performant, maintainable, and cost-effective.
```