Towards a General-Purpose Video Model for Primate Behavior in the Wild
Pretraining on PriVi improves the state of the art across four primate behavior benchmarks. Frozen evaluation of a V-JEPA model pretrained on PriVi consistently outperforms prior work, including fully finetuned baselines. Additionally, continued pretraining on the target datasets improves performance further.
Non-human primates are our closest living relatives, and analyzing their behavior is central to research in cognition, evolution, and conservation. Computer vision could greatly aid this research, but existing methods often rely on human-centric pretrained models and focus on single datasets, which limits generalization. We address this limitation by shifting from a model-centric to a data-centric approach and introduce PriVi, a large-scale primate-centric video pretraining dataset. PriVi contains 424 hours of curated video, combining 174 hours from behavioral research across 11 settings with 250 hours of diverse web-sourced footage, assembled through a scalable data curation pipeline. We continue pretraining V-JEPA, a large-scale video model, on PriVi to learn primate-specific representations and evaluate it using a lightweight frozen classifier. Across four benchmark datasets — ChimpACT, PanAf500, BaboonLand, and ChimpBehave — our approach consistently outperforms prior work, including fully finetuned baselines, and scales favorably with fewer labels. These results demonstrate for the first time that domain-level pretraining, where pretraining is conducted on similar data but not the target dataset itself, works for video models. Our primate-centric pretraining substantially improves data efficiency and generalization, making it a promising approach for low-label applications.
PriVi spans a wide range of primate species, environments, and recording conditions — from camera traps in the wild to controlled lab settings.
PriVi combines two complementary subsets: R&O (174h) consists of research video from 11 behavioral studies spanning lemurs, baboons, chimpanzees, macaques, marmosets, and more. YT-Filtered (250h) adds diverse web-sourced footage curated through our automated pipeline to maximize coverage of species and settings.
| Genus / Family | YT-Filt. | R&O | PriVi |
|---|---|---|---|
| Macaques | 63.1% | 14.1% | 43.0% |
| Chimpanzees | 7.8% | 35.7% | 19.3% |
| Baboons | 1.3% | 16.2% | 7.4% |
| True Lemurs | <1% | 22.7% | 9.8% |
| Marmosets | <1% | 5.7% | 2.1% |
| Squirrel monkeys | <1% | 5.4% | 2.0% |
| Orangutans | 4.1% | 0% | 2.4% |
| Others | 8.1% | 0% | 4.8% |
| Setting | YT-Filt. | R&O | PriVi |
|---|---|---|---|
| In the wild | 59.6% | 62.7% | 60.9% |
| Wild-like | 27.8% | 22.4% | 25.6% |
| Indoors | 4.1% | 14.6% | 8.4% |
| Not identifiable | 8.6% | 0% | 5.1% |
Data curation pipeline. Our automated pipeline filters, deduplicates, and curates web-sourced primate videos at scale.
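The full pipeline (filtering models, thresholds, keyword lists) is described in the paper. As one illustrative component, near-duplicate video removal can be sketched with a perceptual average hash over sampled frames; the function names, 8×8 hash size, and 5-bit threshold below are our own assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def average_hash(frame: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Perceptual average hash of a grayscale frame (H, W) -> hash_size**2 bits."""
    h, w = frame.shape
    # Block-mean downsample to hash_size x hash_size (simple stand-in for a resize).
    ys = np.linspace(0, h, hash_size + 1, dtype=int)
    xs = np.linspace(0, w, hash_size + 1, dtype=int)
    small = np.array([[frame[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)] for i in range(hash_size)])
    # Threshold each block at the global mean and flatten to a bit vector.
    return (small > small.mean()).flatten()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))

def deduplicate(frames, threshold: int = 5) -> list:
    """Keep indices of frames whose hash differs from all kept hashes
    by more than `threshold` bits; near-duplicates are dropped."""
    kept, hashes = [], []
    for idx, f in enumerate(frames):
        h = average_hash(f)
        if all(hamming(h, k) > threshold for k in hashes):
            kept.append(idx)
            hashes.append(h)
    return kept
```

In practice the same idea extends from single frames to clips by hashing a few sampled frames per video and comparing hash sets.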
We continue pretraining V-JEPA (ViT-L) on PriVi to learn primate-specific video representations. For evaluation, we train a lightweight attentive classifier with only 220K trainable parameters on top of the frozen encoder. This enables efficient adaptation to diverse primate behavior benchmarks without finetuning the backbone.
Model overview. We continue pretraining V-JEPA on PriVi and optionally the target dataset, then evaluate with a frozen backbone and a lightweight attentive classifier.
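To make the frozen-evaluation setup concrete, the NumPy sketch below implements attentive pooling over frozen encoder tokens followed by a linear head: a learned query scores each spatio-temporal token, a softmax-weighted sum pools them, and a linear layer produces class logits. This is a minimal single-query stand-in, not the paper's exact module (whose cross-attention design accounts for the ~220K trainable parameters).

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentiveClassifier:
    """Minimal attentive probe over frozen encoder tokens (illustrative sketch)."""

    def __init__(self, dim: int, num_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Learned query that scores tokens, plus a linear classification head.
        self.query = rng.normal(scale=dim ** -0.5, size=(dim,))
        self.w_head = rng.normal(scale=dim ** -0.5, size=(dim, num_classes))
        self.b_head = np.zeros(num_classes)

    def __call__(self, tokens: np.ndarray) -> np.ndarray:
        # tokens: (num_tokens, dim) output of the frozen backbone for one clip.
        attn = softmax(tokens @ self.query / np.sqrt(tokens.shape[-1]))
        pooled = attn @ tokens                      # (dim,) weighted token average
        return pooled @ self.w_head + self.b_head   # (num_classes,) logits
```

Only the query and head would be trained; the backbone stays frozen, which is what keeps per-benchmark adaptation cheap.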
PriVi consistently outperforms prior methods across four primate behavior benchmarks, including fully finetuned baselines with orders of magnitude more trainable parameters.
| Method | ChimpACT mAP | ChimpACT mAPw | PanAf500 Acc | PanAf500 B-Acc | BaboonLand Acc | BaboonLand B-Acc | ChimpBehave Acc | ChimpBehave B-Acc |
|---|---|---|---|---|---|---|---|---|
| X3D | 27.05 | 51.60 | 80.00 | 50.35 | 64.89 | 31.41 | 89.3 | 62.8 |
| VideoMAEv2 | — | — | — | — | — | — | 92.3 | <u>74.8</u> |
| UniformerV2-B | — | — | — | — | 63.45 | 28.67 | — | — |
| InternVideo-L | 25.7 | — | 78.57 | 54.01 | — | — | — | — |
| ChimpVLM | — | — | 84.91 | 61.94 | — | — | — | — |
| VideoPrism-g | 31.5 | — | — | — | — | — | — | — |
| Our Classifier | | | | | | | | |
| V-JEPA | 36.33 | 55.50 | 82.96 | 56.69 | 74.91 | 26.99 | 94.99 | 68.41 |
| PriVi | <u>39.25</u> | <u>58.16</u> | **86.74** | <u>62.75</u> | <u>75.43</u> | <u>33.99</u> | <u>95.58</u> | 71.30 |
| PriVi + DaLP | **40.00** | **59.29** | <u>85.01</u> | **62.96** | **76.42** | **38.57** | **96.02** | **75.14** |
Bold = best, underline = second best. Results on test sets.
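For reference, B-Acc in the table is balanced accuracy, commonly defined as the mean of per-class recalls; on imbalanced benchmarks it is more informative than plain accuracy. A minimal sketch:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recalls (the standard balanced-accuracy definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

For example, predicting the majority class on a 3:1 split yields 75% accuracy but only 50% balanced accuracy, which is why the two columns can diverge sharply above.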
PriVi scales favorably with fewer labeled data. Error bars are 95% confidence intervals estimated over three different subsets. Results on validation sets.
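The exact interval estimator is not spelled out here; one common choice for a mean over three subsets is a Student-t interval with df = 2, sketched below (the table of critical values and the function name are our own assumptions).

```python
import numpy as np

# Two-sided 95% Student-t critical values for small df (df = n - 1 subsets).
T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def mean_ci95(scores):
    """Mean and 95% confidence-interval half-width over a few subset scores."""
    x = np.asarray(scores, dtype=float)
    sem = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
    return x.mean(), T_CRIT[len(x) - 1] * sem
```

With only three subsets the t critical value (4.303) is much larger than the normal 1.96, so such error bars are deliberately conservative.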
Sample frames from each of the 11 R&O research subsets and the YT-Filtered web data.
PriVi was made possible by contributions from researchers across multiple institutions. The table below lists the contributors of each dataset subset.
| Dataset | Contributors |
|---|---|
| eulemur | Elif Karakoc (DPZ-BE), Claudia Fichtel (DPZ-BE), Peter Kappeler (UGoe) |
| baboon_w | William J. O'Hearn (UGoe/DPZ-CE), Julia Fischer (UGoe/DPZ-CE) |
| baboon_a | Julia Fischer (UGoe/DPZ-CE) |
| lemur | Kaja Wierucka (DPZ-BE) |
| assamese | Sofia M. Pereira (UGoe/DPZ-SE), Julia Ostner (UGoe/DPZ-SE), Oliver Schülke (UGoe/DPZ-SE) |
| barbary_a | Julia Fischer (UGoe/DPZ-CE) |
| barbary_t | Tiffany Bosshard (DPZ-CE), Julia Fischer (UGoe/DPZ-CE) |
| saimiri | Sandro Sehner (UZH), Judith Burkart (UZH) |
| marmoset | Sandro Sehner (UZH), Judith Burkart (UZH) |
| rhesus | Alexander Gail (UGoe/DPZ-CN), Neda Shahidi (UGoe/DPZ-CN) |
UGoe = University of Göttingen · DPZ-BE = DPZ, Behavioral Ecology & Sociobiology · DPZ-CE = DPZ, Cognitive Ethology · DPZ-SE = DPZ, Social Evolution in Primates · DPZ-CN = DPZ, Cognitive Neuroscience · UZH = University of Zurich · DPZ = German Primate Center — Leibniz Institute for Primate Research
@article{mueller2025privi,
title = {Towards a General-Purpose Video Model for Primate
Behavior in the Wild},
author = {Mueller, Felix B. and Meier, Jan F. and Lueddecke, Timo
and Vogg, Richard and Freixanet, Roger L. and Hassler,
Valentin and Bosshard, Tiffany and Karakoc, Elif and
O'Hearn, William J. and Pereira, Sofia M. and Sehner,
Sandro and Wierucka, Kaja and Burkart, Judith and
Fichtel, Claudia and Fischer, Julia and Gail, Alexander
and Hobaiter, Catherine and Ostner, Julia and Samuni,
Liran and Sch{\"u}lke, Oliver and Shahidi, Neda and
Wessling, Erin G. and Ecker, Alexander S.},
journal = {arXiv preprint arXiv:2511.09675},
year = {2025}
}