The official implementation of SAGA (Segment Any 3D GAussians).
The installation of SAGA is similar to that of 3D Gaussian Splatting. First, clone the repository:
git clone git@github.com:Jumpat/SegAnyGAussians.git
or
git clone https://github.com/Jumpat/SegAnyGAussians.git
Then install the dependencies:
conda env create --file environment.yml
conda activate gaussian_splatting
By default, we use the public ViT-H model for SAM. You can download the pre-trained checkpoint from the Segment Anything repository and put it under ./third_party/segment-anything/sam_ckpt.
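For example, the ViT-H checkpoint can be fetched directly; the URL below is the one published in the Segment Anything repository, so verify it there if the download fails:
mkdir -p ./third_party/segment-anything/sam_ckpt
wget -P ./third_party/segment-anything/sam_ckpt https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth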
The datasets used are 360_v2, nerf_llff_data, and LERF.
SAGA expects the following data structure:
./data
    /360_v2
        /garden
            /images
            /images_2
            /images_4
            /images_8
            /sparse
            /features
            /sam_masks
            /mask_scales
        ...
    /nerf_llff_data
        /fern
            /images
            /poses_bounds.npy
            /sparse
            /features
            /sam_masks
            /mask_scales
        /horns
            ...
        ...
    /lerf_data
        ...
SAGA inherits all attributes from 3DGS; more information about training the Gaussians can be found in their repo. Since the pre-trained 3D-GS model is needed for mask scale extraction, the first step is to train the 3D Gaussians:
python train_scene.py -s <path to COLMAP or NeRF Synthetic dataset>
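For example, to train on the garden scene from the data layout above:
python train_scene.py -s ./data/360_v2/garden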
Then, to extract the SAM masks and their corresponding mask scales, run the following commands:
python extract_segment_everything_masks.py --image_root <path to the scene data> --sam_checkpoint_path <path to the pre-trained SAM model> --downsample <1/2/4/8>
python get_scale.py --image_root <path to the scene data> --model_path <path to the pre-trained 3DGS model>
Note that downsampling is sometimes essential due to limited GPU memory.
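For instance, for the garden scene with the ViT-H checkpoint and 4x downsampling (the paths are illustrative; we assume here that the trained 3DGS model was saved to ./output/garden):
python extract_segment_everything_masks.py --image_root ./data/360_v2/garden --sam_checkpoint_path ./third_party/segment-anything/sam_ckpt/sam_vit_h_4b8939.pth --downsample 4
python get_scale.py --image_root ./data/360_v2/garden --model_path ./output/garden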
If you want to try open-vocabulary segmentation, extract the CLIP features first:
python get_clip_features.py --image_root <path to the scene data>
python train_contrastive_feature.py -m <path to the pre-trained 3DGS model> --iterations 10000 --num_sampled_rays 1000
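Continuing the illustrative garden example from above:
python get_clip_features.py --image_root ./data/360_v2/garden
python train_contrastive_feature.py -m ./output/garden --iterations 10000 --num_sampled_rays 1000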
SAGA currently provides an interactive GUI (saga_gui.py), implemented with dearpygui, and a Jupyter notebook (prompt_segmenting.ipynb). To run the GUI:
python saga_gui.py --model_path <path to the pre-trained 3DGS model>
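For example, with the illustrative model directory used above:
python saga_gui.py --model_path ./output/garden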
For now, open-vocabulary segmentation is implemented only in the Jupyter notebook. Please refer to prompt_segmenting.ipynb for detailed instructions.
After saving segmentation results in the interactive GUI or running the scripts in prompt_segmenting.ipynb, the bitmap of the Gaussians will be saved in ./segmentation_res/<name>.pt (you can choose the name yourself).
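The saved file is the per-Gaussian bitmap described above; as a minimal sketch (the file name my_object.pt is hypothetical, and we assume the standard torch serialization used by the scripts), you can inspect it with:
import torch

# Load the saved per-Gaussian segmentation bitmap (hypothetical file name).
mask = torch.load("./segmentation_res/my_object.pt")
print(mask.shape)       # one entry per Gaussian in the scene
print(int(mask.sum()))  # number of Gaussians selected as foreground
To render the segmentation results on training views (i.e., get the segmented object by removing the background), run the following command: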
python render.py -m <path to the pre-trained 3DGS model> --precomputed_mask <path to the segmentation results> --target scene --segment
To get the 2D rendered masks, run the following command:
python render.py -m <path to the pre-trained 3DGS model> --precomputed_mask <path to the segmentation results> --target seg
You can also render the pre-trained 3DGS model without segmentation:
python render.py -m <path to the pre-trained 3DGS model> --target scene
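Putting the rendering commands together with the illustrative paths used above:
python render.py -m ./output/garden --precomputed_mask ./segmentation_res/my_object.pt --target scene --segment
python render.py -m ./output/garden --precomputed_mask ./segmentation_res/my_object.pt --target seg
python render.py -m ./output/garden --target scene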
If you find this project helpful for your research, please consider citing the report and giving a ⭐.
@article{cen2023saga,
    title={Segment Any 3D Gaussians},
    author={Jiazhong Cen and Jiemin Fang and Chen Yang and Lingxi Xie and Xiaopeng Zhang and Wei Shen and Qi Tian},
    year={2023},
    journal={arXiv preprint arXiv:2312.00860},
}
The implementation of SAGA refers to GARField, OmniSeg3D, and Gaussian Splatting. We sincerely thank the authors for their contributions to the community.