| Type | Model | Download | Note |
| --- | --- | --- | --- |
| Foundation Model | InternVideo-MM-L-14 | 🤗 HF link | WebVid10M + self-collected (14M) |
| | VideoMAEv2-B | 🤗 HF link | UnlabeledHybrid (1M) |
| | VideoMAEv2-L | 🤗 HF link | UnlabeledHybrid (1M) |
| | VideoMAEv2-H | 🤗 HF link | UnlabeledHybrid (1M) |
| Classification Model | VideoMAEv2-B | 🤗 HF link | Fine-tuned on K400 |
| | VideoMAEv2-B | 🤗 HF link | Fine-tuned on K710 |
| | VideoMAEv2-B | 🤗 HF link | Fine-tuned on SSv2 |
| | VideoMAEv2-L | 🤗 HF link | Fine-tuned on K400 |
| | VideoMAEv2-L | 🤗 HF link | Fine-tuned on K700 |
| | VideoMAE-L | 🤗 HF link | Fine-tuned on SSv2 |
| | VideoMAEv2-H | 🤗 ckpt | Fine-tuned on K400 |
| | VideoMAEv2-H | 🤗 ckpt | Fine-tuned on SSv1 |
| | VideoMAEv2-H | 🤗 ckpt_split1 | Fine-tuned on HMDB51 |
| Retrieval Model | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on ActivityNet |
| | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on DiDeMo |
| | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on LSMDC |
| | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on MSR-VTT |
| | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on MSVD |
| | InternVideo-MM-L-14 | 🤗 HF link · 🤗 LOG · 🤗 OPT | Fine-tuned on VATEX |
| VideoQA Model | InternVideo-MM-L-14 | 🤗 HF link | Fine-tuned on MSRVTT |
| | InternVideo-MM-L-14 | 🤗 HF link | Fine-tuned on MSVD |
| | InternVideo-MM-L-14 | 🤗 HF link | Fine-tuned on TGIF-QA |
| STAL Model | VideoMAE-H | 🤗 HF link | Fine-tuned on AVA-Kinetics |

Citation


    @article{wang2022internvideo,
        title={InternVideo: General Video Foundation Models via Generative and Discriminative Learning},
        author={Wang, Yi and Li, Kunchang and Li, Yizhuo and He, Yinan and Huang, Bingkun and Zhao, Zhiyu and Zhang, Hongjie and Xu, Jilan and Liu, Yi and Wang, Zun and Xing, Sen and Chen, Guo and Pan, Junting and Yu, Jiashuo and Wang, Yali and Wang, Limin and Qiao, Yu},
        journal={arXiv preprint arXiv:2212.03191},
        year={2022}
    }
  
