Segment Anything in High Quality Using Deep Learning
HQ-SAM is designed to improve the quality of SAM's mask prediction, particularly for objects with intricate structures.
Our implementation retains the original prompt design, efficiency, and zero-shot generalizability of SAM, with minimal additional parameters and computation. We introduce a High-Quality Output Token into SAM’s mask decoder to predict high-quality masks. This token is not just applied to mask-decoder features but is first fused with early and final Vision Transformer (ViT) features for improved mask details. The resulting HQ-SAM model is trained on a dataset of 44K fine-grained masks from various sources. It exhibits high performance across 9 diverse segmentation datasets for different downstream tasks.
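To make the fusion idea above concrete, here is a toy numpy illustration (not the actual HQ-SAM code): the HQ output token produces a mask via a dot product against decoder features that have been combined with early and final ViT features. All shapes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 32, 64, 64

early_vit = rng.standard_normal((C, H, W))   # early ViT feature map (toy)
final_vit = rng.standard_normal((C, H, W))   # final ViT feature map (toy)
decoder   = rng.standard_normal((C, H, W))   # mask-decoder feature map (toy)
hq_token  = rng.standard_normal(C)           # learned HQ output token (toy)

# Simple additive fusion stands in for the learned fusion in the real model.
fused = decoder + early_vit + final_vit

# The token is dotted against every spatial location to produce mask logits.
mask_logits = np.einsum("c,chw->hw", hq_token, fused)
mask = mask_logits > 0                       # binary mask (toy threshold)
```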
Repository Structure
This repository is structured as follows:
- `demo`: Contains two demo scripts (`demo_hqsam.py` and `demo_sam.py`) for running HQ-SAM and SAM, respectively. The `input_imgs` directory within `demo` houses several example images for testing.
- `segment_anything`: The core directory of the project. It includes scripts for building the SAM and HQ-SAM models (`build_sam.py` and `build_sam_baseline.py`), an automatic mask generator (`automatic_mask_generator.py`), and a predictor script (`predictor.py`). The `modeling` subdirectory contains the model components, including the image encoder, mask decoder, prompt encoder, and transformer. The `utils` subdirectory provides utility scripts for mask generation, ONNX model conversion, and image transformations.
- `visual_demo`: Contains several GIFs that visually demonstrate the capabilities of HQ-SAM.
- `setup.cfg` and `setup.py`: Files for setting up the project environment.
Getting Started
To get started with HQ-SAM, ensure you have Python 3.8 or higher, along with PyTorch 1.7 or higher and TorchVision 0.8 or higher. The codebase also requires several other dependencies for mask post-processing, saving masks in COCO format, running the example notebooks, and exporting the model in ONNX format. You can clone this repository and install it using pip.
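As a concrete illustration of saving masks in COCO format, here is a minimal sketch of uncompressed COCO-style RLE encoding for a binary mask, in the spirit of the helpers in `segment_anything/utils/amg.py`. It assumes column-major flattening with counts that start from the number of leading zeros, as the COCO format specifies; the function name is ours, not the repository's.

```python
import numpy as np

def mask_to_rle(mask: np.ndarray) -> dict:
    """Encode a 2-D binary mask as an uncompressed COCO-style RLE dict."""
    h, w = mask.shape
    flat = mask.flatten(order="F").astype(np.uint8)  # column-major, per COCO
    # Indices where the run value changes.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    points = np.concatenate([[0], change, [flat.size]])
    counts = np.diff(points).tolist()
    # COCO RLE always starts by counting zeros; prepend 0 if mask starts with 1.
    if flat[0] == 1:
        counts = [0] + counts
    return {"size": [h, w], "counts": counts}
```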
After the environment is set up, you can run the demos or use the model components for your own segmentation tasks. You can also view the GIFs in the `visual_demo` directory to see HQ-SAM in action. The model has been trained on a dataset of 44K masks, and its efficacy has been demonstrated across various segmentation tasks.
Here is the file tree of the repository:
.
├── demo
│ ├── demo_hqsam.py
│ ├── demo_sam.py
│ └── input_imgs
│ ├── example0.png
│ ├── example1.png
│ ├── example2.png
│ ├── example3.png
│ ├── example4.png
│ ├── example5.png
│ ├── example6.png
│ ├── example7.png
│ └── example8.png
├── segment_anything
│ ├── __init__.py
│ ├── automatic_mask_generator.py
│ ├── build_sam.py
│ ├── build_sam_baseline.py
│ ├── modeling
│ │ ├── __init__.py
│ │ ├── common.py
│ │ ├── image_encoder.py
│ │ ├── mask_decoder.py
│ │ ├── mask_decoder_hq.py
│ │ ├── prompt_encoder.py
│ │ ├── sam.py
│ │ └── transformer.py
│ ├── predictor.py
│ └── utils
│ ├── __init__.py
│ ├── amg.py
│ ├── onnx.py
│ └── transforms.py
├── setup.cfg
├── setup.py
└── visual_demo
├── 1.gif
├── 2.gif
├── 3.gif
├── 4.gif
├── 5.gif
└── 6.gif