Smoothed Gaussian Mixture Models for Video Classification and Recommendation

Kafle, Sirjan; Gupta, Aman; Xia, Xue; Sankar, Ananth; Chen, Xi; Wen, Di; Zhang, Liang

arXiv:2012.11673 (cs)

[Submitted on 17 Dec 2020]

Abstract: Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD) and their end-to-end discriminatively trained equivalents, such as NetVLAD, have recently been popular for video classification and action recognition tasks. These techniques assign video frames to clusters and then represent the video by aggregating the residuals of the frames with respect to the mean of each cluster. Since some clusters may see very little video-specific data, these features can be noisy. In this paper, we propose a new cluster-and-aggregate method, which we call the smoothed Gaussian mixture model (SGMM), and its end-to-end discriminatively trained equivalent, which we call the deep smoothed Gaussian mixture model (DSGMM). SGMM represents each video by the parameters of a Gaussian mixture model (GMM) trained for that video. Low-count clusters are addressed by smoothing the video-specific estimates with a universal background model (UBM) trained on a large number of videos. The primary benefit of SGMM over VLAD is smoothing, which makes it less sensitive to a small number of training samples. We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin. We also show results on a dataset created at LinkedIn to predict whether a member will watch an uploaded video.
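
The abstract does not give the smoothing formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the classic relevance-factor MAP adaptation of GMM means toward a UBM: each video-specific cluster mean is interpolated with the UBM mean, weighted by how many frames the cluster softly received. The function names, the diagonal covariances, and the relevance value of 16 are all illustrative assumptions.

```python
import math

def soft_assign(x, weights, means, variances):
    """Posterior of each diagonal-covariance UBM component given one frame x."""
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_p.append(ll)
    top = max(log_p)  # subtract the max for numerical stability
    exp_p = [math.exp(l - top) for l in log_p]
    total = sum(exp_p)
    return [p / total for p in exp_p]

def smoothed_means(frames, weights, means, variances, relevance=16.0):
    """Hypothetical helper: MAP-smooth per-video cluster means toward the UBM.

    relevance is an assumed hyperparameter; larger values trust the UBM more.
    Returns (smoothed means, soft frame counts per cluster).
    """
    num_clusters, dim = len(means), len(means[0])
    counts = [0.0] * num_clusters
    sums = [[0.0] * dim for _ in range(num_clusters)]
    for x in frames:
        post = soft_assign(x, weights, means, variances)
        for k, gamma in enumerate(post):
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    result = []
    for k in range(num_clusters):
        # alpha -> 0 for low-count clusters, so they fall back to the UBM mean.
        alpha = counts[k] / (counts[k] + relevance)
        if counts[k] > 0:
            video_mean = [s / counts[k] for s in sums[k]]
        else:
            video_mean = list(means[k])
        result.append([alpha * v + (1 - alpha) * m
                       for v, m in zip(video_mean, means[k])])
    return result, counts

# Toy usage: a two-component UBM and a video whose frames all sit near cluster 0.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
video_frames = [[0.5], [0.4], [0.6]]
video_repr, frame_counts = smoothed_means(video_frames, ubm_weights, ubm_means, ubm_vars)
```

In this toy case, cluster 1 receives almost no frames, so its smoothed mean stays essentially at the UBM value, whereas a plain per-video estimate for that cluster would rest on negligible evidence. That is the sensitivity to low-count clusters the abstract describes.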

Comments:	11 pages, 3 figures, 7 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
ACM classes:	I.2.10
Cite as:	arXiv:2012.11673 [cs.CV]
	(or arXiv:2012.11673v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2012.11673

Submission history

From: Aman Gupta
[v1] Thu, 17 Dec 2020 06:52:41 UTC (266 KB)
