```
├── .gitignore
├── LICENSE
├── README.md
├── app/
│   ├── .expo-shared/
│   │   ├── assets.json
│   ├── .gitignore
│   ├── App.tsx
│   ├── README.md
│   ├── app.json
│   ├── assets/
│   │   ├── icon.png
│   │   ├── splash.png
│   ├── babel.config.js
│   ├── components/
│   │   ├── Base64.tsx
│   │   ├── ProgressIndicator.tsx
│   │   ├── Server.tsx
│   ├── package-lock.json
```

## /.gitignore

```gitignore path="/.gitignore"
venv
.vscode
.DS_Store
.pyc
node_modules
__pycache__
server/*.png
server/*.jpg
*.psd
```

## /LICENSE

``` path="/LICENSE"
MIT License

Copyright (c) 2020 Cyril Diagne

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

## /README.md

# AR Cut & Paste

An AR+ML prototype that lets you cut elements from your surroundings and paste them into image editing software. Only Photoshop is supported at the moment, but other outputs may be handled in the future.

Demo & more info: [Thread](https://twitter.com/cyrildiagne/status/1256916982764646402)

⚠️ This is a research prototype and not a consumer / Photoshop user tool.

**Update 2020.05.11:** If you're looking for an easy-to-use app based on this research, head over to https://clipdrop.co

## Modules

This prototype runs as 3 independent modules:

- **The mobile app**
  - Check out the [/app](/app) folder for instructions on how to deploy the app to your mobile.
- **The local server**
  - The interface between the mobile app and Photoshop.
  - It finds the position pointed at on screen by the camera using [screenpoint](https://github.com/cyrildiagne/screenpoint).
  - Check out the [/server](/server) folder for instructions on configuring the local server.
- **The object detection / background removal service**
  - For now, salience detection and background removal are delegated to an external service.
  - It would be a lot simpler to use something like [DeepLab](https://github.com/shaqian/tflite-react-native) directly within the mobile app, but that hasn't been implemented in this repo yet.

## Usage

### 1 - Configure Photoshop

- Go to "Preferences > Plug-ins", enable "Remote Connection" and set a friendly password that you'll need later.
- Make sure that your PS document settings match those in `server/src/ps.py`, otherwise only an empty layer will be pasted.
- Also make sure that your document has some sort of background. If the background is just blank, SIFT will probably not have enough features to do a correct match.
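Since the local server drives Photoshop through this Remote Connection, it can save time to confirm that Photoshop is actually listening before debugging anything else. Below is a minimal sketch (not part of this repo) that only checks TCP reachability; it assumes Photoshop's default Remote Connection port 49494 and that Photoshop runs on the same machine. Adjust the host or port if your setup differs.

```python
# Sketch: verify that Photoshop's "Remote Connection" is reachable.
# HOST and PORT are assumptions (localhost + the default Remote Connection port),
# not values taken from this repo.
import socket

HOST = "127.0.0.1"
PORT = 49494

def photoshop_reachable(host: str = HOST, port: int = PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to Photoshop's Remote Connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Photoshop Remote Connection reachable:", photoshop_reachable())
```

If this prints `False`, re-check the "Remote Connection" checkbox and the password before looking at the server itself.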
### 2 - Setup the external salience object detection service

#### Option 1: Set up your own model service (requires a CUDA GPU)

- As mentioned above, for the time being, you must deploy the BASNet model (Qin et al., CVPR 2019) as an external HTTP service using this [BASNet-HTTP wrapper](https://github.com/cyrildiagne/basnet-http) (requires a CUDA GPU).
- You will need the deployed service URL to configure the local server.
- Make sure to configure a different port if you're running BASNet on the same computer as the local server.

#### Option 2: Use a community-provided endpoint

A public endpoint has been provided by members of the community. This is useful if you don't have your own CUDA GPU or don't want to go through the process of running the service on your own.

Use this endpoint by launching the local server with `--basnet_service_ip http://u2net-predictor.tenant-compass.global.coreweave.com`

### 3 - Configure and run the local server

- Follow the instructions in [/server](/server) to set up & run the local server.

### 4 - Configure and run the mobile app

- Follow the instructions in [/app](/app) to set up & deploy the mobile app.

## Thanks and Acknowledgements

- '[*BASNet: Boundary-Aware Salient Object Detection*](http://openaccess.thecvf.com/content_CVPR_2019/html/Qin_BASNet_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.html)' ([code](https://github.com/NathanUA/BASNet)) by [Xuebin Qin](https://webdocs.cs.ualberta.ca/~xuebin/), [Zichen Zhang](https://webdocs.cs.ualberta.ca/~zichen2/), [Chenyang Huang](https://chenyangh.com/), [Chao Gao](https://cgao3.github.io/), [Masood Dehghan](https://sites.google.com/view/masoodd) and [Martin Jagersand](https://webdocs.cs.ualberta.ca/~jag/)
- RunwayML for the [Photoshop paste code](https://github.com/runwayml/RunwayML-for-Photoshop/blob/master/host/index.jsx)
- [CoreWeave](https://www.coreweave.com) for hosting the public U^2Net model endpoint on Tesla V100s

## /app/.expo-shared/assets.json

```json path="/app/.expo-shared/assets.json"
{
  "12bb71342c6255bbf50437ec8f4441c083f47cdb74bd89160c15e4f43e52a1cb": true,
  "40b842e832070c58deac6aa9e08fa459302ee3f9da492c7e77d93d2fbf4a56fd": true
}
```

## /app/.gitignore

```gitignore path="/app/.gitignore"
node_modules/**/*
.expo/*
npm-debug.*
*.jks
*.p8
*.p12
*.key
*.mobileprovision
*.orig.*
web-build/
web-report/

# macOS
.DS_Store
```

## /app/App.tsx

```tsx path="/app/App.tsx"
import React, { useState, useEffect } from "react";
import {
  Text,
  View,
  Image,
  TouchableWithoutFeedback,
  StyleSheet,
} from "react-native";
import * as ImageManipulator from "expo-image-manipulator";
import { Camera } from "expo-camera";

import ProgressIndicator from "./components/ProgressIndicator";
import server from "./components/Server";

const styles = StyleSheet.create({
  resultImgView: {
    position: "absolute",
    zIndex: 200,
    top: 0,
    left: 0,
    width: "100%",
    height: "100%",
  },
  resultImg: {
    position: "absolute",
    zIndex: 300,
    top: "25%",
    left: 0,
    width: "100%",
    height: "50%",
  },
});

interface State {
  hasPermission: boolean;
  type: any;
  camera: any;
  currImgSrc: string | null;
}

export default function App() {
  const [state, setState] = useState({
    hasPermission: false,
    type: Camera.Constants.Type.back,
    camera: null,
    currImgSrc: "",
  } as State);

  const [pressed, setPressed] = useState(false);
  const [pasting, setPasting] = useState(false);

  let camera: any = null;

  useEffect(() => {
    (async () => {
      // Ping the server on startup.
      server.ping();

      // Request permission.
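      // expo-camera shows the system permission prompt here and resolves with a
      // response whose `status` is "granted" only if the user accepted. (In newer
      // expo-camera releases the equivalent call is Camera.requestCameraPermissionsAsync.)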
      const { status } = await Camera.requestPermissionsAsync();
      const hasPermission = status === "granted" ? true : false;
      setState({ ...state, hasPermission });
    })();
  }, []);

  async function cut(): Promise<string> {
    const start = Date.now();

    console.log("");
    console.log("Cut");
    console.log(camera.pictureSize);
    // const ratios = await camera.getSupportedRatiosAsync()
    // console.log(ratios)
    // const sizes = await camera.getAvailablePictureSizeAsync("2:1")
    // console.log(sizes)

    console.log("> taking image...");
    const opts = { skipProcessing: true, exif: false, quality: 0 };
    // const opts = {};
    let photo = await camera.takePictureAsync(opts);

    console.log("> resizing...");
    const { uri } = await ImageManipulator.manipulateAsync(
      photo.uri,
      [
        { resize: { width: 256, height: 512 } },
        { crop: { originX: 0, originY: 128, width: 256, height: 256 } },
        // { resize: { width: 256, height: 457 } },
        // { crop: { originX: 0, originY: 99, width: 256, height: 256 } },
        // { resize: { width: 256, height: 341 } },
        // { crop: { originX: 0, originY: 42, width: 256, height: 256 } },
      ]
      // { compress: 0, format: ImageManipulator.SaveFormat.JPEG, base64: false }
    );

    console.log("> sending to /cut...");
    const resp = await server.cut(uri);

    console.log(`Done in ${((Date.now() - start) / 1000).toFixed(3)}s`);
    return resp;
  }

  async function paste() {
    const start = Date.now();

    console.log("");
    console.log("Paste");

    console.log("> taking image...");
    // const opts = { skipProcessing: true, exif: false };
    const opts = {};
    let photo = await camera.takePictureAsync(opts);

    console.log("> resizing...");
    const { uri } = await ImageManipulator.manipulateAsync(photo.uri, [
      // { resize: { width: 512, height: 1024 } },
      { resize: { width: 350, height: 700 } },
    ]);

    console.log("> sending to /paste...");
    try {
      const resp = await server.paste(uri);
      if (resp.status !== "ok") {
        if (resp.status === "screen not found") {
          console.log("screen not found");
        } else {
          throw new Error(resp);
        }
      }
    } catch (e) {
      console.error("error pasting:", e);
    }

    console.log(`Done in ${((Date.now() - start) / 1000).toFixed(3)}s`);
  }

  async function onPressIn() {
    setPressed(true);
    const resp = await cut();
    // Check if we're still pressed.
    // if (pressed) {
    setState({ ...state, currImgSrc: resp });
    // }
  }

  async function onPressOut() {
    setPressed(false);
    setPasting(true);
    if (state.currImgSrc !== "") {
      await paste();
      setState({ ...state, currImgSrc: "" });
      setPasting(false);
    }
  }

  if (state.hasPermission === null) {
    return <View />;
  }
  if (state.hasPermission === false) {
    return <Text>No access to camera</Text>;
  }

  let camOpacity = 1;
  if (pressed && state.currImgSrc !== "") {
    camOpacity = 0.8;
  }

  return (
    (camera = ref)} >
      {pressed && state.currImgSrc !== "" ? (
        <>
      ) : null}

      {(pressed && state.currImgSrc === "") || pasting ?  : null}

  );
}
```

## /app/README.md

# AR Cut Paste Mobile App

An [Expo](https://expo.io) / [React Native](#) mobile application.

Please follow the instructions on the [Expo website](https://expo.io/learn) to see how to preview the app on your phone using the Expo app.
## Setup

```bash
npm install
```

Then update the `URL` constant in `components/Server.tsx` so it points to the IP address of the computer running the local server:

```js
3: const URL = "http://192.168.1.29:8080";
```

## Run

```bash
npm start
```

## /app/app.json

```json path="/app/app.json"
{
  "expo": {
    "name": "app",
    "slug": "app",
    "platforms": [
      "ios",
      "android",
      "web"
    ],
    "version": "1.0.0",
    "orientation": "portrait",
    "icon": "./assets/icon.png",
    "splash": {
      "image": "./assets/splash.png",
      "resizeMode": "contain",
      "backgroundColor": "#ffffff"
    },
    "updates": {
      "fallbackToCacheTimeout": 0
    },
    "assetBundlePatterns": [
      "**/*"
    ],
    "ios": {
      "supportsTablet": true
    }
  }
}
```

## /app/assets/icon.png

Binary file available at https://raw.githubusercontent.com/cyrildiagne/ar-cutpaste/refs/heads/main/app/assets/icon.png

## /app/assets/splash.png

Binary file available at https://raw.githubusercontent.com/cyrildiagne/ar-cutpaste/refs/heads/main/app/assets/splash.png

## /app/babel.config.js

```js path="/app/babel.config.js"
module.exports = function(api) {
  api.cache(true);
  return {
    presets: ['babel-preset-expo'],
  };
};
```

## /app/components/Base64.tsx

```tsx path="/app/components/Base64.tsx"
// https://stackoverflow.com/questions/42829838/react-native-atob-btoa-not-working-without-remote-js-debugging
const chars =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

const Base64 = {
  btoa: (input: string = "") => {
    let str = input;
    let output = "";

    for (
      let block = 0, charCode, i = 0, map = chars;
      str.charAt(i | 0) || ((map = "="), i % 1);
      output += map.charAt(63 & (block >> (8 - (i % 1) * 8)))
    ) {
      charCode = str.charCodeAt((i += 3 / 4));

      if (charCode > 0xff) {
        throw new Error(
          "'btoa' failed: The string to be encoded contains characters outside of the Latin1 range."
        );
      }

      block = (block << 8) | charCode;
    }

    return output;
  },

  atob: (input: string = "") => {
    let str = input.replace(/=+$/, "");
    let output = "";

    if (str.length % 4 == 1) {
      throw new Error(
        "'atob' failed: The string to be decoded is not correctly encoded."
      );
    }
    for (
      let bc = 0, bs = 0, buffer, i = 0;
      (buffer = str.charAt(i++));
      ~buffer && ((bs = bc % 4 ? bs * 64 + buffer : buffer), bc++ % 4)
        ? (output += String.fromCharCode(255 & (bs >> ((-2 * bc) & 6))))
        : 0
    ) {
      buffer = chars.indexOf(buffer);
    }

    return output;
  },
};

FileReader.prototype.readAsArrayBuffer = function (blob) {
  if (this.readyState === this.LOADING) throw new Error("InvalidStateError");
  this._setReadyState(this.LOADING);
  this._result = null;
  this._error = null;
  const fr = new FileReader();
  fr.onloadend = () => {
    const content = Base64.atob(
      fr.result.substr(fr.result.indexOf(',') + 1)
    );
    const buffer = new ArrayBuffer(content.length);
    const view = new Uint8Array(buffer);
    view.set(Array.from(content).map((c) => c.charCodeAt(0)));
    this._result = buffer;
    this._setReadyState(this.DONE);
  };
  fr.readAsDataURL(blob);
};

// from: https://stackoverflow.com/questions/42829838/react-native-atob-btoa-not-working-without-remote-js-debugging
// const chars =
//   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

// const atob = (input = "") => {
//   let str = input.replace(/=+$/, "");
//   let output = "";

//   if (str.length % 4 == 1) {
//     throw new Error(
//       "'atob' failed: The string to be decoded is not correctly encoded."
//     );
//   }
//   for (
//     let bc = 0, bs = 0, buffer, i = 0;
//     (buffer = str.charAt(i++));
//     ~buffer && ((bs = bc % 4 ? bs * 64 + buffer : buffer), bc++ % 4)
//       ?
//         (output += String.fromCharCode(255 & (bs >> ((-2 * bc) & 6))))
//       : 0
//   ) {
//     buffer = chars.indexOf(buffer);
//   }

//   return output;
// };

export default Base64;
```

## /app/components/ProgressIndicator.tsx

```tsx path="/app/components/ProgressIndicator.tsx"
// @refresh reset
import React, { useState, useEffect } from "react";
import { View, Animated, StyleSheet } from "react-native";
import Svg, { Circle } from "react-native-svg";

const AnimatedCircle = Animated.createAnimatedComponent(Circle);

const numX = 4;
const numY = 5;
const total = numX * numY;

const styles = StyleSheet.create({
  container: {
    ...StyleSheet.absoluteFillObject,
    alignItems: "center",
    justifyContent: "center",
  },
});

export default function ProgressIndicator() {
  const init = Array(total)
    .fill(1)
    .map((x) => ({ r: new Animated.Value(1), a: new Animated.Value(1) }));

  const [anim, setAnim] = useState(init);

  useEffect(() => {
    console.log("update");
    const c = anim.map((v, i: number) => {
      const t = 400 + Math.random() * 300;
      const seq = Animated.parallel([
        Animated.sequence([
          Animated.timing(anim[i].r, { toValue: 3, duration: t - 50 }),
          Animated.timing(anim[i].r, { toValue: 1, duration: t }),
        ]),
        Animated.sequence([
          Animated.timing(anim[i].a, { toValue: 0.1, duration: t - 50 }),
          Animated.timing(anim[i].a, { toValue: 1, duration: t }),
        ]),
      ]);
      return Animated.loop(seq);
    });
    // console.log(c)
    Animated.parallel(c).start();
  }, []);

  let circles = [];
  const margin = 100 / (numX);
  for (let x = 0; x < numX; x++) {
    for (let y = 0; y < numY; y++) {
      const i = y * numX + x;
      circles.push({
        x: (x + 0.5) * margin,
        y: (y) * margin,
        r: anim[i].r,
        a: anim[i].a,
      });
    }
  }

  return (
    {circles.map((c) => (
    ))}
  );
}
```

## /app/components/Server.tsx

```tsx path="/app/components/Server.tsx"
import Base64 from "./Base64";

const URL = "http://192.168.1.29:8080";

function arrayBufferToBase64(buffer: ArrayBuffer) {
  let binary = "";
  const bytes = [].slice.call(new Uint8Array(buffer));
  bytes.forEach((b) => (binary += String.fromCharCode(b)));
  return Base64.btoa(binary);
}

function ping() {
  fetch(URL + "/ping").catch((e) => console.error(e));
}

async function cut(imageURI: string) {
  const formData = new FormData();
  formData.append("data", {
    uri: imageURI,
    name: "photo",
    type: "image/jpg",
  });

  const resp = await fetch(URL + "/cut", {
    method: "POST",
    body: formData,
  }).then(async (res) => {
    console.log("> converting...");
    const buffer = await res.arrayBuffer();
    const base64Flag = "data:image/png;base64,";
    const imageStr = arrayBufferToBase64(buffer);
    return base64Flag + imageStr;
  });

  return resp;
}

async function paste(imageURI: string) {
  const formData = new FormData();
  formData.append("data", {
    uri: imageURI,
    name: "photo",
    type: "image/jpg",
  });

  const resp = await fetch(URL + "/paste", {
    method: "POST",
    body: formData,
  }).then((r) => r.json());

  return resp;
}

export default {
  ping,
  cut,
  paste,
};
```
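When debugging the local server without the phone in the loop, the same endpoints that `Server.tsx` calls can be exercised from a short script. The sketch below simply mirrors the client code above; it assumes the Python `requests` package, a test photo named `view.jpg`, and the same host/port as the `URL` constant (all placeholders rather than values defined by this repo).

```python
# Sketch: exercise the local server's endpoints the same way the app does.
# Assumes `pip install requests`, a local test photo "view.jpg", and that the
# server is reachable at the address configured in components/Server.tsx.
import requests

URL = "http://192.168.1.29:8080"  # must match the computer running the local server

# /ping - simple reachability check (mirrors server.ping()).
requests.get(f"{URL}/ping", timeout=5)

# /cut - multipart upload under the field name "data"; the response body is the
# cut-out image (PNG bytes), which the app turns into a base64 data URI.
with open("view.jpg", "rb") as f:
    r = requests.post(f"{URL}/cut", files={"data": ("photo", f, "image/jpg")}, timeout=60)
r.raise_for_status()
with open("cut.png", "wb") as out:
    out.write(r.content)

# /paste - same multipart shape; the response is JSON with a "status" field
# ("ok" or "screen not found" in App.tsx's handling).
with open("view.jpg", "rb") as f:
    r = requests.post(f"{URL}/paste", files={"data": ("photo", f, "image/jpg")}, timeout=60)
print(r.json())
```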