Segment Anything in High Quality Using Deep Learning
HQ-SAM is designed to improve the quality of SAM's mask prediction, particularly for objects with intricate structures.
Our implementation retains the original prompt design, efficiency, and zero-shot generalizability of SAM, with minimal additional parameters and computation. We introduce a High-Quality Output Token into SAM’s mask decoder to predict high-quality masks. This token is not just applied to mask-decoder features but is first fused with early and final Vision Transformer (ViT) features for improved mask details. The resulting HQ-SAM model is trained on a dataset of 44K fine-grained masks from various sources. It exhibits high performance across 9 diverse segmentation datasets for different downstream tasks.
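To make the fusion idea above concrete, here is a toy numpy illustration (not the actual HQ-SAM code): the HQ output token produces a mask via a dot product against decoder features that have been combined with early and final ViT features. All shapes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 32, 64, 64

early_vit = rng.standard_normal((C, H, W))   # early ViT feature map (toy)
final_vit = rng.standard_normal((C, H, W))   # final ViT feature map (toy)
decoder   = rng.standard_normal((C, H, W))   # mask-decoder feature map (toy)
hq_token  = rng.standard_normal(C)           # learned HQ output token (toy)

# Simple additive fusion stands in for the learned fusion in the real model.
fused = decoder + early_vit + final_vit

# The token is dotted against every spatial location to produce mask logits.
mask_logits = np.einsum("c,chw->hw", hq_token, fused)
mask = mask_logits > 0                       # binary mask (toy threshold)
```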
Repository Structure
This repository is structured as follows:
- `demo`: Contains two demo scripts (`demo_hqsam.py` and `demo_sam.py`) for running HQ-SAM and SAM, respectively. The `input_imgs` directory within `demo` houses several example images for testing.
- `segment_anything`: The core directory of the project. It includes scripts for building the SAM and HQ-SAM models (`build_sam.py` and `build_sam_baseline.py`), an automatic mask generator (`automatic_mask_generator.py`), and a predictor script (`predictor.py`). The `modeling` subdirectory contains the model components, including the image encoder, mask decoder, prompt encoder, and transformer. The `utils` subdirectory provides utility scripts for mask generation, ONNX model conversion, and image transformations.
- `visual_demo`: Contains several GIFs that visually demonstrate the capabilities of HQ-SAM.
- `setup.cfg` and `setup.py`: Files for setting up the project environment.
Getting Started
To get started with HQ-SAM, ensure you have Python 3.8 or higher, along with PyTorch 1.7 or higher and TorchVision 0.8 or higher. The codebase also requires several other dependencies for mask post-processing, saving masks in COCO format, running the example notebooks, and exporting the model in ONNX format. You can clone this repository and install it using pip.
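As a concrete illustration of saving masks in COCO format, here is a minimal sketch of uncompressed COCO-style RLE encoding for a binary mask, in the spirit of the helpers in `segment_anything/utils/amg.py`. It assumes column-major flattening with counts that start from the number of leading zeros, as the COCO format specifies; the function name is ours, not the repository's.

```python
import numpy as np

def mask_to_rle(mask: np.ndarray) -> dict:
    """Encode a 2-D binary mask as an uncompressed COCO-style RLE dict."""
    h, w = mask.shape
    flat = mask.flatten(order="F").astype(np.uint8)  # column-major, per COCO
    # Indices where the run value changes.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    points = np.concatenate([[0], change, [flat.size]])
    counts = np.diff(points).tolist()
    # COCO RLE always starts by counting zeros; prepend 0 if mask starts with 1.
    if flat[0] == 1:
        counts = [0] + counts
    return {"size": [h, w], "counts": counts}
```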
After the environment is set up, you can run the demos or use the model components for your own segmentation tasks. You can also view the GIFs in the `visual_demo` directory to see HQ-SAM in action. The model has been trained on a dataset of 44K masks, and its efficacy has been demonstrated across various segmentation tasks.
Here is the file tree of the repository:
.
├── demo
│ ├── demo_hqsam.py
│ ├── demo_sam.py
│ └── input_imgs
│ ├── example0.png
│ ├── example1.png
│ ├── example2.png
│ ├── example3.png
│ ├── example4.png
│ ├── example5.png
│ ├── example6.png
│ ├── example7.png
│ └── example8.png
├── segment_anything
│ ├── __init__.py
│ ├── automatic_mask_generator.py
│ ├── build_sam.py
│ ├── build_sam_baseline.py
│ ├── modeling
│ │ ├── __init__.py
│ │ ├── common.py
│ │ ├── image_encoder.py
│ │ ├── mask_decoder.py
│ │ ├── mask_decoder_hq.py
│ │ ├── prompt_encoder.py
│ │ ├── sam.py
│ │ └── transformer.py
│ ├── predictor.py
│ └── utils
│ ├── __init__.py
│ ├── amg.py
│ ├── onnx.py
│ └── transforms.py
├── setup.cfg
├── setup.py
└── visual_demo
├── 1.gif
├── 2.gif
├── 3.gif
├── 4.gif
├── 5.gif
└── 6.gif