```
├── .github/
│   └── ISSUE_TEMPLATE/
│       └── bug_report.md
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── environment.yml
├── remwm.py
├── remwmgui.py
├── setup.sh
└── utils.py
```

## /.github/ISSUE_TEMPLATE/bug_report.md

---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Smartphone (please complete the following information):**
 - Device: [e.g. iPhone6]
 - OS: [e.g. iOS8.1]
 - Browser [e.g. stock browser, safari]
 - Version [e.g. 22]

**Additional context**
Add any other context about the problem here.

## /CODE_OF_CONDUCT.md

# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at https://github.com/D-Ogi/WatermarkRemover-AI/issues. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of actions.

**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.

## /LICENSE

``` path="/LICENSE"
MIT License

Copyright (c) 2025 D-Ogi

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

## /README.md

# WatermarkRemover-AI

**AI-Powered Watermark Removal Tool using Florence-2 and LaMA Models**

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Python: 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

![image](https://github.com/user-attachments/assets/8f7fb600-695f-4dd7-958c-0cff516b5c7a)

Example of watermark removal with LaMa inpainting:

![image](https://github.com/user-attachments/assets/e89825fb-3b14-4358-96f3-feb526908ad3)
![image](https://github.com/user-attachments/assets/64e63d5c-4ecc-4fe0-954d-1b72e6a29580)

## Overview

`WatermarkRemover-AI` is an application that leverages AI models for precise watermark detection and seamless removal. It uses Microsoft's Florence-2 for watermark identification and LaMA for inpainting to fill in the removed regions naturally. The software offers both a command-line interface (CLI) and a PyQt6-based graphical user interface (GUI), making it accessible to both casual and advanced users.

## Table of Contents

- [Features](#features)
- [Technical Overview](#technical-overview)
- [Installation](#installation)
- [Usage](#usage)
  - [Preferred Way: Setup Script](#preferred-way-setup-script)
  - [Manual Way](#manual-way)
  - [Using the GUI](#using-the-gui)
  - [Using the CLI](#using-the-cli)
  - [Upgrade Notes](#upgrade-notes)
- [Alpha Masking](#alpha-masking)
- [Contributing](#contributing)
- [License](#license)

---

## Features

- **Dual Modes**: Process individual images or entire directories of images.
- **Advanced Watermark Detection**: Utilizes Florence-2's open-vocabulary detection for accurate watermark identification.
- **Seamless Inpainting**: Employs LaMA for high-quality, context-aware inpainting.
- **Customizable Output**:
  - Configure the maximum bounding box size for watermark detection.
  - Set transparency for watermark regions.
  - Force specific output formats (PNG, WEBP, JPG).
- **Progress Tracking**: Real-time progress updates in both GUI and CLI modes.
- **Dark Mode Support**: GUI automatically adapts to system dark mode settings.
- **Efficient Resource Management**: Optimized for GPU acceleration using CUDA (optional).

---

## Technical Overview

### Florence-2 for Watermark Detection

- Florence-2 detects watermarks using open-vocabulary object detection (a minimal sketch follows this overview).
- Bounding boxes are filtered so that only small regions (configurable by the user) are processed.

### LaMA for Inpainting

- The LaMA model seamlessly fills in watermark regions with context-aware content.
- Supports high-resolution inpainting via cropping and resizing strategies.

### PyQt6 GUI

- User-friendly interface for selecting input/output paths, configuring settings, and tracking progress.
- Dark mode and customization options enhance the user experience.
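To make the detection step concrete, here is a minimal sketch of prompting Florence-2 for watermark boxes. It mirrors the calls made in `remwm.py` further below (same checkpoint and `<OPEN_VOCABULARY_DETECTION>` task token); treat it as an illustration rather than a supported API of this project, and note that `example.jpg` is a hypothetical file name.

```py
# Sketch: open-vocabulary watermark detection with Florence-2,
# mirroring the identify() helper in remwm.py.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
).to(device).eval()
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input file
prompt = "<OPEN_VOCABULARY_DETECTION>watermark"
inputs = {k: v.to(device) for k, v in
          processor(text=prompt, images=image, return_tensors="pt").items()}

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(
    text, task="<OPEN_VOCABULARY_DETECTION>", image_size=image.size
)
print(parsed)  # includes "bboxes" for the detected watermark regions
```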
---

## Installation

### Prerequisites

- Conda/Miniconda installed.
- CUDA (optional, for GPU acceleration; the application runs well on CPUs too).

### Steps

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/D-Ogi/WatermarkRemover-AI.git
   cd WatermarkRemover-AI
   ```

2. **Run the Setup Script:**

   ```bash
   bash setup.sh
   ```

   The `setup.sh` script sets up the environment, installs dependencies, and launches the GUI application. It also provides convenient options for CLI usage.

3. **Download the LaMA Model:**

   ```bash
   conda activate py312aiwatermark
   iopaint download --model lama
   ```

   The LaMA inpainting model files aren't included in the repository and must be downloaded separately (approximately 196 MB).

4. **Fast-Track Options:**

   - **To use the CLI immediately**: After running `setup.sh`, you can use the CLI directly without activating the environment manually:

     ```bash
     ./setup.sh input_path output_path [options]
     ```

     Example:

     ```bash
     ./setup.sh ./input_images ./output_images --overwrite --transparent
     ```

   - **To activate the environment without starting the application**, use:

     ```bash
     conda activate py312aiwatermark
     ```

---

## Usage

### Preferred Way: Setup Script

1. **Run the Setup Script**:

   ```bash
   bash setup.sh
   ```

   - The GUI will launch automatically, and the environment will be ready for immediate CLI or GUI use.
   - For CLI use, run:

     ```bash
     ./setup.sh input_path output_path [options]
     ```

     Example:

     ```bash
     ./setup.sh ./input_images ./output_images --overwrite --transparent
     ```

### Manual Way

1. **Activate the Environment**:

   ```bash
   conda activate py312aiwatermark
   ```

2. **Launch GUI or CLI**:
   - **GUI**:

     ```bash
     python remwmgui.py
     ```

   - **CLI**:

     ```bash
     python remwm.py input_path output_path [options]
     ```

### Using the GUI

1. **Launch the GUI**: If not launched automatically, start it with:

   ```bash
   python remwmgui.py
   ```

2. **Configure Settings**:
   - **Mode**: Select "Process Single Image" or "Process Directory".
   - **Paths**: Browse and set the input/output directories.
   - **Options**:
     - Enable overwriting of existing files (directory processing only; single-image processing always overwrites).
     - Enable transparency for watermark regions.
     - Adjust the maximum bounding box size for watermark detection.
   - **Output Format**: Choose between PNG, WEBP, JPG, or retain the original format.

3. **Start Processing**:
   - Click "Start" to begin processing.
   - Monitor progress and logs in the GUI.

### Using the CLI

1. **Basic Command**:

   ```bash
   python remwm.py input_path output_path
   ```

2. **Options**:
   - `--overwrite`: Overwrite existing files.
   - `--transparent`: Make watermark regions transparent instead of removing them.
   - `--max-bbox-percent`: Set the maximum bounding box size for watermark detection (default: 10%).
   - `--force-format`: Force the output format (PNG, WEBP, or JPG).

3. **Example**:

   ```bash
   python remwm.py ./input_images ./output_images --overwrite --max-bbox-percent=15 --force-format=PNG
   ```

---

### Upgrade Notes

If you have previously used an older version of the repository or set up an incorrect Conda environment, follow these steps to upgrade:

1. **Update the Repository**:

   ```bash
   git pull
   ```

2. **Remove the Old Environment**:

   ```bash
   conda deactivate
   conda env remove -n py312
   ```

3. **Run the Setup Script**:

   ```bash
   bash setup.sh
   ```

   This recreates the correct environment (`py312aiwatermark`) and ensures all dependencies are up to date.

---

## Alpha Masking

We implemented alpha masking to allow selective manipulation of watermark regions without altering other parts of the image.

### Why Alpha Masking?

- **Precision**: Enables box-targeted watermark removal by isolating specific regions.
- **Flexibility**: By controlling opacity in alpha layers, we can achieve a range of effects, from complete removal to transparency.
- **Minimal Impact**: This method ensures that areas outside the watermark remain untouched, preserving image quality. A short sketch of the idea follows this list.
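As an illustration only (not the exact code path in `remwm.py`, which sets pixels in a per-pixel loop), here is a minimal vectorized sketch of alpha masking with Pillow and NumPy; the file names are hypothetical:

```py
# Alpha-masking sketch: pixels where the mask is nonzero become fully
# transparent; all other pixels keep their original color and alpha.
import numpy as np
from PIL import Image

image = Image.open("input.png").convert("RGBA")  # hypothetical input
mask = Image.open("mask.png").convert("L")       # white = watermark region

rgba = np.array(image)
mask_arr = np.array(mask)
# Zero the alpha channel wherever the mask marks a watermark.
rgba[..., 3] = np.where(mask_arr > 0, 0, rgba[..., 3]).astype(np.uint8)

Image.fromarray(rgba, mode="RGBA").save("output.png")
```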
---

## Contributing

Contributions are welcome! To contribute:

1. Fork the repository.
2. Create a new branch for your feature.
3. Submit a pull request detailing your changes.

---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## /environment.yml

```yml path="/environment.yml"
name: py312aiwatermark
channels:
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.12
  - pip
  - numpy
  - tqdm
  - loguru
  - click
  - pillow
  - opencv
  - pip:
    - --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
```
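The `pip:` entry above pulls a nightly CUDA 12.6 PyTorch build. After the environment is created, a quick sanity check (standard PyTorch calls only, nothing project-specific) confirms whether the GPU is actually visible; the tool falls back to CPU otherwise:

```py
# Environment sanity check: report the installed PyTorch build and
# whether CUDA is available to it.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```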
## /remwm.py

```py path="/remwm.py"
import sys
import click
from pathlib import Path
import cv2
import numpy as np
from PIL import Image, ImageDraw
from transformers import AutoProcessor, AutoModelForCausalLM
from iopaint.model_manager import ModelManager
from iopaint.schema import HDStrategy, LDMSampler, InpaintRequest as Config
import torch
from torch.nn import Module
import tqdm
from loguru import logger
from enum import Enum

try:
    from cv2.typing import MatLike
except ImportError:
    MatLike = np.ndarray

class TaskType(str, Enum):
    OPEN_VOCAB_DETECTION = "<OPEN_VOCABULARY_DETECTION>"
    """Detect bounding box for objects and OCR text"""

def identify(task_prompt: TaskType, image: MatLike, text_input: str, model: AutoModelForCausalLM, processor: AutoProcessor, device: str):
    if not isinstance(task_prompt, TaskType):
        raise ValueError(f"task_prompt must be a TaskType, but {task_prompt} is of type {type(task_prompt)}")
    prompt = task_prompt.value if text_input is None else task_prompt.value + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt.value, image_size=(image.width, image.height)
    )

def get_watermark_mask(image: MatLike, model: AutoModelForCausalLM, processor: AutoProcessor, device: str, max_bbox_percent: float):
    text_input = "watermark"
    task_prompt = TaskType.OPEN_VOCAB_DETECTION
    parsed_answer = identify(task_prompt, image, text_input, model, processor, device)

    mask = Image.new("L", image.size, 0)
    draw = ImageDraw.Draw(mask)

    detection_key = "<OPEN_VOCABULARY_DETECTION>"
    if detection_key in parsed_answer and "bboxes" in parsed_answer[detection_key]:
        image_area = image.width * image.height
        for bbox in parsed_answer[detection_key]["bboxes"]:
            x1, y1, x2, y2 = map(int, bbox)
            bbox_area = (x2 - x1) * (y2 - y1)
            if (bbox_area / image_area) * 100 <= max_bbox_percent:
                draw.rectangle([x1, y1, x2, y2], fill=255)
            else:
                logger.warning(f"Skipping large bounding box: {bbox} covering {bbox_area / image_area:.2%} of the image")

    return mask

def process_image_with_lama(image: MatLike, mask: MatLike, model_manager: ModelManager):
    config = Config(
        ldm_steps=50,
        ldm_sampler=LDMSampler.ddim,
        hd_strategy=HDStrategy.CROP,
        hd_strategy_crop_margin=64,
        hd_strategy_crop_trigger_size=800,
        hd_strategy_resize_limit=1600,
    )
    result = model_manager(image, mask, config)

    if result.dtype in [np.float64, np.float32]:
        result = np.clip(result, 0, 255).astype(np.uint8)

    return result

def make_region_transparent(image: Image.Image, mask: Image.Image):
    image = image.convert("RGBA")
    mask = mask.convert("L")
    transparent_image = Image.new("RGBA", image.size)
    for x in range(image.width):
        for y in range(image.height):
            if mask.getpixel((x, y)) > 0:
                transparent_image.putpixel((x, y), (0, 0, 0, 0))
            else:
                transparent_image.putpixel((x, y), image.getpixel((x, y)))
    return transparent_image

@click.command()
@click.argument("input_path", type=click.Path(exists=True))
@click.argument("output_path", type=click.Path())
@click.option("--overwrite", is_flag=True, help="Overwrite existing files in bulk mode.")
@click.option("--transparent", is_flag=True, help="Make watermark regions transparent instead of removing.")
@click.option("--max-bbox-percent", default=10.0, help="Maximum percentage of the image that a bounding box can cover.")
@click.option("--force-format", type=click.Choice(["PNG", "WEBP", "JPG"], case_sensitive=False), default=None, help="Force output format. Defaults to input format.")
def main(input_path: str, output_path: str, overwrite: bool, transparent: bool, max_bbox_percent: float, force_format: str):
    input_path = Path(input_path)
    output_path = Path(output_path)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    florence_model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True).to(device).eval()
    florence_processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
    logger.info("Florence-2 Model loaded")

    if not transparent:
        model_manager = ModelManager(name="lama", device=device)
        logger.info("LaMa model loaded")

    def handle_one(image_path: Path, output_path: Path):
        if output_path.exists() and not overwrite:
            logger.info(f"Skipping existing file: {output_path}")
            return

        image = Image.open(image_path).convert("RGB")
        mask_image = get_watermark_mask(image, florence_model, florence_processor, device, max_bbox_percent)

        if transparent:
            result_image = make_region_transparent(image, mask_image)
        else:
            lama_result = process_image_with_lama(np.array(image), np.array(mask_image), model_manager)
            result_image = Image.fromarray(cv2.cvtColor(lama_result, cv2.COLOR_BGR2RGB))

        # Determine output format
        if force_format:
            output_format = force_format.upper()
        elif transparent:
            output_format = "PNG"
        else:
            output_format = image_path.suffix[1:].upper()
            if output_format not in ["PNG", "WEBP", "JPG"]:
                output_format = "PNG"

        # Map JPG to JPEG for PIL compatibility
        if output_format == "JPG":
            output_format = "JPEG"
        if transparent and output_format == "JPG":
            logger.warning("Transparency detected. Defaulting to PNG for transparency support.")
            output_format = "PNG"

        new_output_path = output_path.with_suffix(f".{output_format.lower()}")
        result_image.save(new_output_path, format=output_format)
        logger.info(f"input_path:{image_path}, output_path:{new_output_path}")

    if input_path.is_dir():
        if not output_path.exists():
            output_path.mkdir(parents=True)

        images = list(input_path.glob("*.[jp][pn]g")) + list(input_path.glob("*.webp"))
        total_images = len(images)

        for idx, image_path in enumerate(tqdm.tqdm(images, desc="Processing images")):
            output_file = output_path / image_path.name
            handle_one(image_path, output_file)
            progress = int((idx + 1) / total_images * 100)
            print(f"input_path:{image_path}, output_path:{output_file}, overall_progress:{progress}")
    else:
        output_file = output_path.with_suffix(".webp" if transparent else output_path.suffix)
        handle_one(input_path, output_file)
        print(f"input_path:{input_path}, output_path:{output_file}, overall_progress:100")

if __name__ == "__main__":
    main()
```
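For readers who want to reuse these functions from Python rather than through the CLI, a minimal, hedged sketch follows. Model loading mirrors `main()` above; `photo.jpg` and `photo_clean.png` are hypothetical file names.

```py
# Sketch: programmatic use of remwm.py's helpers on a single image.
import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from iopaint.model_manager import ModelManager

from remwm import get_watermark_mask, process_image_with_lama

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
).to(device).eval()
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
lama = ModelManager(name="lama", device=device)

image = Image.open("photo.jpg").convert("RGB")
mask = get_watermark_mask(image, model, processor, device, max_bbox_percent=10.0)
result = process_image_with_lama(np.array(image), np.array(mask), lama)
# The LaMa result is BGR; flip channels to RGB as main() does via cv2.
Image.fromarray(result[..., ::-1]).save("photo_clean.png")
```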
## /remwmgui.py

```py path="/remwmgui.py"
import os
import sys
import subprocess
import psutil
import yaml
import torch
from pathlib import Path
from PyQt6.QtWidgets import (
    QApplication, QMainWindow, QFileDialog, QLabel, QLineEdit, QPushButton,
    QVBoxLayout, QHBoxLayout, QWidget, QTextEdit, QProgressBar, QComboBox,
    QMessageBox, QRadioButton, QButtonGroup, QSlider, QCheckBox, QStatusBar
)
from PyQt6.QtCore import Qt, pyqtSignal, QObject, QThread, QTimer
from PyQt6.QtGui import QPalette, QColor
from loguru import logger

CONFIG_FILE = "ui.yml"

class Worker(QObject):
    log_signal = pyqtSignal(str)
    progress_signal = pyqtSignal(int)
    finished_signal = pyqtSignal()

    def __init__(self, process):
        super().__init__()
        self.process = process

    def run(self):
        try:
            for line in iter(self.process.stdout.readline, ""):
                self.log_signal.emit(line)
                if "overall_progress:" in line:
                    progress = int(line.strip().split("overall_progress:")[1].strip())
                    self.progress_signal.emit(progress)
            self.process.stdout.close()
        finally:
            self.finished_signal.emit()

class WatermarkRemoverGUI(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Watermark Remover GUI")
        self.setGeometry(100, 100, 800, 600)

        # Initialize UI elements
        self.radio_single = QRadioButton("Process Single Image")
        self.radio_batch = QRadioButton("Process Directory")
        self.radio_single.setChecked(True)
        self.mode_group = QButtonGroup()
        self.mode_group.addButton(self.radio_single)
        self.mode_group.addButton(self.radio_batch)

        self.input_path = QLineEdit(self)
        self.output_path = QLineEdit(self)

        self.overwrite_checkbox = QCheckBox("Overwrite Existing Files", self)
        self.transparent_checkbox = QCheckBox("Make Watermark Transparent", self)

        self.max_bbox_percent_slider = QSlider(Qt.Orientation.Horizontal, self)
        self.max_bbox_percent_slider.setRange(1, 100)
        self.max_bbox_percent_slider.setValue(10)
        self.max_bbox_percent_label = QLabel(f"Max BBox Percent: 10%", self)
        self.max_bbox_percent_slider.valueChanged.connect(self.update_bbox_label)

        self.force_format_png = QRadioButton("PNG")
        self.force_format_webp = QRadioButton("WEBP")
        self.force_format_jpg = QRadioButton("JPG")
        self.force_format_none = QRadioButton("None")
        self.force_format_none.setChecked(True)
        self.force_format_group = QButtonGroup()
        self.force_format_group.addButton(self.force_format_png)
        self.force_format_group.addButton(self.force_format_webp)
        self.force_format_group.addButton(self.force_format_jpg)
        self.force_format_group.addButton(self.force_format_none)

        self.progress_bar = QProgressBar(self)
        self.logs = QTextEdit(self)
        self.logs.setReadOnly(True)
        self.logs.setVisible(False)

        self.start_button = QPushButton("Start", self)
        self.stop_button = QPushButton("Stop", self)
        self.toggle_logs_button = QPushButton("Show Logs", self)
        self.toggle_logs_button.setCheckable(True)
        self.stop_button.setDisabled(True)

        # Status bar for system info
        self.status_bar = QStatusBar()
        self.setStatusBar(self.status_bar)
        self.timer = QTimer()
        self.timer.timeout.connect(self.update_system_info)
        self.timer.start(1000)  # Update every second

        self.process = None
        self.thread = None

        # Layout
        layout = QVBoxLayout()

        # Mode selection
        mode_layout = QHBoxLayout()
        mode_layout.addWidget(self.radio_single)
        mode_layout.addWidget(self.radio_batch)

        # Input and output paths
        path_layout = QVBoxLayout()
        path_layout.addWidget(QLabel("Input Path:"))
        path_layout.addWidget(self.input_path)
        path_layout.addWidget(QPushButton("Browse", clicked=self.browse_input))
        path_layout.addWidget(QLabel("Output Path:"))
        path_layout.addWidget(self.output_path)
        path_layout.addWidget(QPushButton("Browse", clicked=self.browse_output))

        # Options
        options_layout = QVBoxLayout()
        options_layout.addWidget(self.overwrite_checkbox)
        options_layout.addWidget(self.transparent_checkbox)

        bbox_layout = QVBoxLayout()
        bbox_layout.addWidget(self.max_bbox_percent_label)
        bbox_layout.addWidget(self.max_bbox_percent_slider)
        options_layout.addLayout(bbox_layout)

        force_format_layout = QHBoxLayout()
        force_format_layout.addWidget(QLabel("Force Format:"))
        force_format_layout.addWidget(self.force_format_png)
        force_format_layout.addWidget(self.force_format_webp)
        force_format_layout.addWidget(self.force_format_jpg)
        force_format_layout.addWidget(self.force_format_none)
        options_layout.addLayout(force_format_layout)

        # Logs and progress
        progress_layout = QVBoxLayout()
        progress_layout.addWidget(QLabel("Progress:"))
        progress_layout.addWidget(self.progress_bar)
        progress_layout.addWidget(self.toggle_logs_button)
        progress_layout.addWidget(self.logs)

        # Buttons
        button_layout = QHBoxLayout()
        button_layout.addWidget(self.start_button)
        button_layout.addWidget(self.stop_button)

        # Final assembly
        layout.addLayout(mode_layout)
        layout.addLayout(path_layout)
        layout.addLayout(options_layout)
        layout.addLayout(progress_layout)
        layout.addLayout(button_layout)

        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

        # Connect buttons
        self.start_button.clicked.connect(self.start_processing)
        self.stop_button.clicked.connect(self.stop_processing)
        self.toggle_logs_button.toggled.connect(self.toggle_logs)

        self.apply_dark_mode_if_needed()

        # Load configuration
        self.load_config()

    def update_bbox_label(self, value):
        self.max_bbox_percent_label.setText(f"Max BBox Percent: {value}%")

    def toggle_logs(self, checked):
        self.logs.setVisible(checked)
        self.toggle_logs_button.setText("Hide Logs" if checked else "Show Logs")

    def apply_dark_mode_if_needed(self):
        if QApplication.instance().styleHints().colorScheme() == Qt.ColorScheme.Dark:
            dark_palette = QPalette()
            dark_palette.setColor(QPalette.ColorRole.Window, QColor(53, 53, 53))
            dark_palette.setColor(QPalette.ColorRole.WindowText, QColor(255, 255, 255))
            dark_palette.setColor(QPalette.ColorRole.Base, QColor(25, 25, 25))
            dark_palette.setColor(QPalette.ColorRole.AlternateBase, QColor(53, 53, 53))
            dark_palette.setColor(QPalette.ColorRole.ToolTipBase, QColor(255, 255, 255))
            dark_palette.setColor(QPalette.ColorRole.ToolTipText, QColor(255, 255, 255))
            dark_palette.setColor(QPalette.ColorRole.Text, QColor(255, 255, 255))
            dark_palette.setColor(QPalette.ColorRole.Button, QColor(53, 53, 53))
            dark_palette.setColor(QPalette.ColorRole.ButtonText, QColor(255, 255, 255))
            dark_palette.setColor(QPalette.ColorRole.BrightText, QColor(255, 0, 0))
            dark_palette.setColor(QPalette.ColorRole.Link, QColor(42, 130, 218))
            dark_palette.setColor(QPalette.ColorRole.Highlight, QColor(42, 130, 218))
            dark_palette.setColor(QPalette.ColorRole.HighlightedText, QColor(0, 0, 0))
            QApplication.instance().setPalette(dark_palette)

    def update_system_info(self):
        cuda_available = "CUDA: Available" if torch.cuda.is_available() else "CUDA: Not Available"
        ram = psutil.virtual_memory()
        ram_usage = ram.used // (1024 ** 2)
        ram_total = ram.total // (1024 ** 2)
        ram_percentage = ram.percent

        vram_status = "Not Available"
        if torch.cuda.is_available():
            gpu_info = torch.cuda.get_device_properties(0)
            vram_total = gpu_info.total_memory // (1024 ** 2)
            vram_used = vram_total - (torch.cuda.memory_reserved(0) // (1024 ** 2))
            vram_percentage = (vram_used / vram_total) * 100
            vram_status = f"VRAM: {vram_used} MB / {vram_total} MB ({vram_percentage:.2f}%)"

        status_text = (
            f"{cuda_available} | RAM: {ram_usage} MB / {ram_total} MB ({ram_percentage}%) | {vram_status} | CPU Load: {psutil.cpu_percent()}%"
        )
        self.status_bar.showMessage(status_text)

    def browse_input(self):
        if self.radio_single.isChecked():
            path, _ = QFileDialog.getOpenFileName(self, "Select Input Image", "", "Images (*.png *.jpg *.jpeg *.webp)")
        else:
            path = QFileDialog.getExistingDirectory(self, "Select Input Directory")
        if path:
            self.input_path.setText(path)

    def browse_output(self):
        path = QFileDialog.getExistingDirectory(self, "Select Output Directory")
        if path:
            self.output_path.setText(path)

    def start_processing(self):
        input_path = self.input_path.text()
        output_path = self.output_path.text()

        if not input_path or not output_path:
            QMessageBox.critical(self, "Error", "Input and Output paths are required.")
            return

        overwrite = "--overwrite" if self.overwrite_checkbox.isChecked() else ""
        transparent = "--transparent" if self.transparent_checkbox.isChecked() else ""
        max_bbox_percent = self.max_bbox_percent_slider.value()

        force_format = "None"
        if self.force_format_png.isChecked():
            force_format = "PNG"
        elif self.force_format_webp.isChecked():
            force_format = "WEBP"
        elif self.force_format_jpg.isChecked():
            force_format = "JPG"

        force_format_option = f"--force-format={force_format}" if force_format != "None" else ""

        command = [
            "python", "remwm.py", input_path, output_path, overwrite, transparent,
            f"--max-bbox-percent={max_bbox_percent}", force_format_option
        ]
        command = [arg for arg in command if arg]  # Remove empty strings

        self.process = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
            env={**os.environ, "PYTHONUNBUFFERED": "1"}
        )

        self.worker = Worker(self.process)
        self.worker.log_signal.connect(self.update_logs)
        self.worker.progress_signal.connect(self.update_progress_bar)
        self.worker.finished_signal.connect(self.reset_ui)

        self.thread = QThread()
        self.worker.moveToThread(self.thread)
        self.thread.started.connect(self.worker.run)
        self.thread.start()

        self.stop_button.setDisabled(False)
        self.start_button.setDisabled(True)

    def update_logs(self, line):
        self.logs.append(line)

    def update_progress_bar(self, progress):
        self.progress_bar.setValue(progress)

    def stop_processing(self):
        if self.process:
            self.process.terminate()
            self.process.wait()
            self.reset_ui()

    def reset_ui(self):
        self.stop_button.setDisabled(True)
        self.start_button.setDisabled(False)
        if self.thread and self.thread.isRunning():
            self.thread.quit()
            self.thread.wait()
        self.process = None
        self.thread = None

    def save_config(self):
        config = {
            "input_path": self.input_path.text(),
            "output_path": self.output_path.text(),
            "overwrite": self.overwrite_checkbox.isChecked(),
            "transparent": self.transparent_checkbox.isChecked(),
            "max_bbox_percent": self.max_bbox_percent_slider.value(),
            "force_format": "PNG" if self.force_format_png.isChecked() else
                            "WEBP" if self.force_format_webp.isChecked() else
                            "JPG" if self.force_format_jpg.isChecked() else "None",
            "mode": "single" if self.radio_single.isChecked() else "batch"
        }
        with open(CONFIG_FILE, "w") as f:
            yaml.dump(config, f)

    def load_config(self):
        if os.path.exists(CONFIG_FILE):
            with open(CONFIG_FILE, "r") as f:
                config = yaml.safe_load(f)
            self.input_path.setText(config.get("input_path", ""))
            self.output_path.setText(config.get("output_path", ""))
            self.overwrite_checkbox.setChecked(config.get("overwrite", False))
            self.transparent_checkbox.setChecked(config.get("transparent", False))
            self.max_bbox_percent_slider.setValue(config.get("max_bbox_percent", 10))
            force_format = config.get("force_format", "None")
            if force_format == "PNG":
                self.force_format_png.setChecked(True)
            elif force_format == "WEBP":
                self.force_format_webp.setChecked(True)
            elif force_format == "JPG":
                self.force_format_jpg.setChecked(True)
            else:
                self.force_format_none.setChecked(True)
            mode = config.get("mode", "single")
            if mode == "single":
                self.radio_single.setChecked(True)
            else:
                self.radio_batch.setChecked(True)

    def closeEvent(self, event):
        self.save_config()
        event.accept()

if __name__ == "__main__":
    app = QApplication(sys.argv)
    gui = WatermarkRemoverGUI()
    gui.show()
    sys.exit(app.exec())
```

## /setup.sh

```sh path="/setup.sh"
#!/usr/bin/env bash

# Exit immediately if a command exits with a non-zero status
set -e

# Default values
ENV_NAME="py312aiwatermark"
ENV_FILE="environment.yml"
INSTALL_DIR=""
FORCE_REINSTALL=false
USE_DEFAULT_DIR=true

# Function to detect previously used directory
find_existing_install_dir() {
    if [ -d "$PWD/conda_envs" ] && conda env list | grep -q "^${ENV_NAME}\\s"; then
        echo "$PWD/conda_envs"
    else
        echo ""
    fi
}

# Function to display usage
usage() {
    echo "Usage: $0 [options] -- [script arguments]"
    echo "Options:"
    echo "  --activate      Activate the Conda environment and provide instructions to deactivate it."
    echo "  --reinstall     Reinstall the Conda environment."
    echo "  --current-dir   Use the current directory as the root for the Conda environment."
    echo "  --help          Show this help message."
    echo "Script Arguments:"
    echo "  Any arguments after -- will be passed directly to remwm.py."
    exit 1
}

# Parse arguments
ACTIVATE_ONLY=false
SCRIPT_ARGS=()
while [[ $# -gt 0 ]]; do
    case "$1" in
        --activate)
            ACTIVATE_ONLY=true
            shift
            ;;
        --reinstall)
            FORCE_REINSTALL=true
            shift
            ;;
        --current-dir)
            USE_DEFAULT_DIR=false
            INSTALL_DIR="$PWD/conda_envs"
            shift
            ;;
        --help)
            usage
            ;;
        --)
            shift
            SCRIPT_ARGS=("$@")
            break
            ;;
        *)
            echo "Unknown option: $1"
            usage
            ;;
    esac
done

# Check if Conda is installed
if ! command -v conda &> /dev/null; then
    echo "Conda could not be found. Please install Conda or Miniconda and try again."
    exit 1
fi

# Determine installation directory
if [ "$USE_DEFAULT_DIR" = false ]; then
    echo "Using current directory as the root for Conda environments: $INSTALL_DIR"
    export CONDA_ENVS_PATH="$INSTALL_DIR"
    export CONDA_PKGS_DIRS="$INSTALL_DIR/pkgs"
else
    EXISTING_DIR=$(find_existing_install_dir)
    if [ -n "$EXISTING_DIR" ]; then
        echo "Detected existing installation in: $EXISTING_DIR"
        export CONDA_ENVS_PATH="$EXISTING_DIR"
        export CONDA_PKGS_DIRS="$EXISTING_DIR/pkgs"
    else
        echo "Using default Conda directory."
    fi
fi

# Check if the environment already exists or needs to be reinstalled
if conda env list | grep -q "^${ENV_NAME}\\s"; then
    if [ "$FORCE_REINSTALL" = true ]; then
        echo "Reinstalling environment '${ENV_NAME}'..."
        conda env remove -n "${ENV_NAME}"
    else
        echo "Environment '${ENV_NAME}' already exists. Activating it..."
        eval "$(conda shell.bash hook)"
        conda activate "${ENV_NAME}"
    fi
fi

if ! conda env list | grep -q "^${ENV_NAME}\\s"; then
    # Create the Conda environment
    echo "Creating Conda environment '${ENV_NAME}' from '${ENV_FILE}'..."
    conda env create -f "${ENV_FILE}" || {
        echo "Failed to create the Conda environment."
        exit 1
    }
    eval "$(conda shell.bash hook)"
    conda activate "${ENV_NAME}"
fi

if [ "$ACTIVATE_ONLY" = true ]; then
    echo "Environment '${ENV_NAME}' activated. To deactivate, run 'conda deactivate'."
    exit 0
fi

# Ensure required dependencies are installed
pip list | grep -q PyQt6 || pip install PyQt6
pip list | grep -q transformers || pip install transformers
pip list | grep -q iopaint || pip install iopaint
pip list | grep -q opencv-python-headless || pip install opencv-python-headless

# Run remwm.py with passed arguments
python remwm.py "${SCRIPT_ARGS[@]}"
```
## /utils.py

```py path="/utils.py"
from enum import Enum
import random
import matplotlib.patches as patches
import numpy as np
from PIL import ImageDraw

# Constants
colormap = ['blue', 'orange', 'green', 'purple', 'brown', 'pink', 'gray', 'olive', 'cyan', 'red',
            'lime', 'indigo', 'violet', 'aqua', 'magenta', 'coral', 'gold', 'tan', 'skyblue']

# To be set
model = None
processor = None

def set_model_info(model_, processor_):
    global model, processor
    model = model_
    processor = processor_

class TaskType(str, Enum):
    """The types of tasks supported"""
    CAPTION = '<CAPTION>'
    DETAILED_CAPTION = '<DETAILED_CAPTION>'
    MORE_DETAILED_CAPTION = '<MORE_DETAILED_CAPTION>'

def run_example(task_prompt: TaskType, image, text_input=None):
    """Runs an inference task using the model."""
    if not isinstance(task_prompt, TaskType):
        raise ValueError(f"task_prompt must be a TaskType, but {task_prompt} is of type {type(task_prompt)}")
    prompt = task_prompt.value if text_input is None else task_prompt.value + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"].cuda(),
        pixel_values=inputs["pixel_values"].cuda(),
        max_new_tokens=1024,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text, task=task_prompt.value, image_size=(image.width, image.height)
    )
    return parsed_answer

def draw_polygons(image, prediction, fill_mask=False):
    """Draws segmentation masks with polygons on an image."""
    draw = ImageDraw.Draw(image)
    for polygons, label in zip(prediction['polygons'], prediction['labels']):
        color = random.choice(colormap)
        fill_color = random.choice(colormap) if fill_mask else None
        for polygon in polygons:
            polygon = np.array(polygon).reshape(-1, 2)
            if len(polygon) < 3:
                print('Invalid polygon:', polygon)
                continue
            polygon = (polygon * 1).reshape(-1).tolist()  # No scaling
            draw.polygon(polygon, outline=color, fill=fill_color)
            draw.text((polygon[0] + 8, polygon[1] + 2), label, fill=color)
    return image

def draw_ocr_bboxes(image, prediction):
    """Draws OCR bounding boxes on an image."""
    draw = ImageDraw.Draw(image)
    bboxes, labels = prediction['quad_boxes'], prediction['labels']
    for box, label in zip(bboxes, labels):
        color = random.choice(colormap)
        new_box = (np.array(box) * 1).tolist()  # No scaling
        draw.polygon(new_box, width=3, outline=color)
        draw.text((new_box[0] + 8, new_box[1] + 2), "{}".format(label), align="right", fill=color)
    return image

def convert_bbox_to_relative(box, image):
    """Converts bounding box pixel coordinates to relative coordinates in the range 0-999."""
    return [
        (box[0] / image.width) * 999,
        (box[1] / image.height) * 999,
        (box[2] / image.width) * 999,
        (box[3] / image.height) * 999,
    ]

def convert_relative_to_bbox(relative, image):
    """Converts list of relative coordinates to pixel coordinates."""
    return [
        (relative[0] / 999) * image.width,
        (relative[1] / 999) * image.height,
        (relative[2] / 999) * image.width,
        (relative[3] / 999) * image.height,
    ]

def convert_bbox_to_loc(box, image):
    """Converts bounding box pixel coordinates to position tokens."""
    relative_coordinates = convert_bbox_to_relative(box, image)
    return ''.join([f'<loc_{int(i)}>' for i in relative_coordinates])
```
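As a closing illustration, here is a small, hedged usage sketch of the coordinate helpers above (assuming `utils.py`'s own imports, including matplotlib, are installed; the 800x600 size and box values are arbitrary examples). Converting a pixel bounding box to Florence-2's 0-999 relative scale and back should round-trip up to integer truncation in the `<loc_...>` tokens:

```py
# Round-trip sketch for utils.py's coordinate helpers.
from PIL import Image
from utils import convert_bbox_to_relative, convert_relative_to_bbox

image = Image.new("RGB", (800, 600))  # arbitrary example size
box = [80, 60, 400, 300]              # x1, y1, x2, y2 in pixels

relative = convert_bbox_to_relative(box, image)   # scaled to the 0-999 range
restored = convert_relative_to_bbox(relative, image)

print(relative)  # [99.9, 99.9, 499.5, 499.5]
print(restored)  # [80.0, 60.0, 400.0, 300.0] -- round-trips exactly here
```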