HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound-probe pose estimation

University College London (UCL)
27th International Conference on Medical Image Computing and Computer Assisted Intervention
MICCAI 2024

🔄 Update (July 2025)

We've released an extended version of the HUP-3D dataset, called HUP-3D-v2, which includes the original 31,680 frames plus an additional 31,680 frames of grasps using the wireless Clarius C3 ultrasound probe. This extension increases data diversity and supports improved model generalisation when training 3D pose prediction models. We've also released an updated version of the HUP-3D model code for 3D joint hand-probe pose estimation, now with support for multi-modal (RGB-D) input. A separate dataset containing only the 31,680 new frames of Clarius C3 grasps is available as well. You can access the datasets and the updated 3D pose estimation evaluation code with RGB-D support here:
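As a quick, hedged illustration of what multi-modal (RGB-D) input can look like in practice, the sketch below loads an RGB frame together with its depth map and stacks them into a single 4-channel array. The file names and the per-frame depth normalisation are assumptions made for this example; the released evaluation code defines the actual preprocessing.

```python
import numpy as np
from PIL import Image

def load_rgbd_frame(rgb_path: str, depth_path: str) -> np.ndarray:
    """Stack an RGB frame and its depth map into an (H, W, 4) RGB-D array."""
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"), dtype=np.float32) / 255.0
    depth = np.asarray(Image.open(depth_path), dtype=np.float32)
    if depth.max() > 0:                      # simple per-frame normalisation to [0, 1]
        depth = depth / depth.max()
    return np.concatenate([rgb, depth[..., None]], axis=-1)

# Hypothetical file names -- the real dataset layout may differ.
frame = load_rgbd_frame("frame_000001_rgb.png", "frame_000001_depth.png")
print(frame.shape)  # (H, W, 4)
```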

This video shows the full sphere coverage for a sample of the generated RGB, depth, and segmentation maps in our HUP-3D dataset.

Abstract

We present HUP-3D, a 3D multi-view, multi-modal synthetic dataset for hand-ultrasound (US) probe pose estimation in the context of obstetric ultrasound. Egocentric markerless 3D joint pose estimation has potential applications in mixed reality medical education. The ability to understand hand and probe movements opens the door to tailored guidance and mentoring applications. Our dataset consists of over 31k sets of RGB, depth, and segmentation mask frames, including pose-related reference data, with an emphasis on image diversity and complexity. Adopting a camera viewpoint-based sphere concept allows us to capture a variety of views and generate multiple hand grasp poses using a pre-trained network. Additionally, our approach includes a software-based image rendering concept that enhances diversity through various hand and arm textures, lighting conditions, and background images. We validated our proposed dataset with state-of-the-art learning models and obtained the lowest hand-object keypoint errors. The supplementary material details the parameters for the sphere-based camera view angles and the configuration of the grasp generation and rendering pipeline. The source code for our grasp generation and rendering pipeline, along with the dataset, is publicly available at https://manuelbirlo.github.io/HUP-3D/.
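To make the camera viewpoint sphere concept from the abstract more concrete, the sketch below samples camera positions roughly uniformly over a sphere centred on the hand-probe grasp and orients each camera towards the sphere centre. The Fibonacci-lattice sampling, the radius, and the look-at convention are illustrative assumptions only; the exact sphere-based view angles used for HUP-3D are documented in the paper's supplementary material.

```python
import numpy as np

def fibonacci_sphere_viewpoints(n_views: int, radius: float = 0.5) -> np.ndarray:
    """Return n_views camera positions spread roughly evenly over a sphere
    of the given radius, centred on the hand-probe grasp at the origin."""
    i = np.arange(n_views)
    golden = (1 + 5 ** 0.5) / 2
    z = 1 - 2 * (i + 0.5) / n_views           # evenly spaced heights in (-1, 1)
    theta = 2 * np.pi * i / golden            # golden-angle steps around the vertical axis
    r = np.sqrt(1 - z ** 2)
    return radius * np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def look_at_rotation(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 3x3 rotation whose -Z axis points from the camera towards the target."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward], axis=1)

positions = fibonacci_sphere_viewpoints(64)
rotations = [look_at_rotation(p) for p in positions]
```

Each (position, rotation) pair can then be supplied to a renderer as an extrinsic camera pose, one render per viewpoint.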

Video Presentation

MICCAI Presentation Video

Poster

BibTeX

@InProceedings{Bir_HUP3D_MICCAI2024,
        author = {Birlo, Manuel and Caramalau, Razvan and Edwards, Philip J. "Eddie" and Dromey, Brian and Clarkson, Matthew J. and Stoyanov, Danail},
        title = {{HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound-probe pose estimation}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}