InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Type	Model	Download	Note
Foundation Model	InternVideo-MM-L-14	🤗 HF link	WebVid10M+Self-collected (14M)
	VideoMAEv2-B	🤗 HF link	UnlabeledHybrid (1M)
	VideoMAEv2-L	🤗 HF link	UnlabeledHybrid (1M)
	VideoMAEv2-H	🤗 HF link	UnlabeledHybrid (1M)
Classification Model	VideoMAEv2-B	🤗 HF link	Fine-tuned on K400
	VideoMAEv2-B	🤗 HF link	Fine-tuned on K710
	VideoMAEv2-B	🤗 HF link	Fine-tuned on SSv2
	VideoMAEv2-L	🤗 HF link	Fine-tuned on K400
	VideoMAEv2-L	🤗 HF link	Fine-tuned on K700
	VideoMAE-L	🤗 HF link	Fine-tuned on SSv2
	VideoMAEv2-H	🤗 ckpt	Fine-tuned on K400
	VideoMAEv2-H	🤗 ckpt	Fine-tuned on SSv1
	VideoMAEv2-H	🤗 ckpt_split1	Fine-tuned on HMDB51
Retrieval Model	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on ActivityNet
	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on DiDeMo
	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on LSMDC
	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on MSR-VTT
	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on MSVD
	InternVideo-MM-L-14	🤗 HF link 🤗 LOG 🤗 OPT	Fine-tuned on VATEX
VideoQA Model	InternVideo-MM-L-14	🤗 HF link	Fine-tuned on MSR-VTT
	InternVideo-MM-L-14	🤗 HF link	Fine-tuned on MSVD
	InternVideo-MM-L-14	🤗 HF link	Fine-tuned on TGIF-QA
Spatio-Temporal Action Localization (STAL) Model	VideoMAE-H	🤗 HF link	Fine-tuned on AVA-Kinetics

Citation


    @article{wang2022internvideo,
        title={InternVideo: General Video Foundation Models via Generative and Discriminative Learning},
        author={Wang, Yi and Li, Kunchang and Li, Yizhuo and He, Yinan and Huang, Bingkun and Zhao, Zhiyu and Zhang, Hongjie and Xu, Jilan and Liu, Yi and Wang, Zun and Xing, Sen and Chen, Guo and Pan, Junting and Yu, Jiashuo and Wang, Yali and Wang, Limin and Qiao, Yu},
        journal={arXiv preprint arXiv:2212.03191},
        year={2022}
    }
