🤖🏓ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis

Zhejiang University
(* denotes equal contribution; each may be listed first)

Abstract

The immense popularity of racket sports has fueled substantial demand for tactical analysis of broadcast videos. However, existing manual methods require laborious annotation, and recent attempts that leverage video perception models are limited to low-level annotations such as ball trajectories, overlooking tactics that require an understanding of stroke techniques. State-of-the-art action segmentation models also struggle with technique recognition because of frequent occlusions and motion blur in racket sports videos. To address these challenges, we propose ViSTec, a Video-based Sports Technique recognition model inspired by human cognition that combines sparse visual data with rich contextual insights. Our approach integrates a graph to explicitly model strategic knowledge in stroke sequences and to enhance technique recognition with a contextual inductive bias. A two-stage action perception model is jointly trained to align with the contextual knowledge in the graph. Experiments demonstrate that our method outperforms existing models by a significant margin. Case studies with experts from the Chinese national table tennis team validate our model's capacity to automate the analysis of technical actions and tactical strategies.

Background: Racket Sports Analysis

Game Videos -> Technique Sequences -> Tactical Analysis

Racket sports, including tennis, badminton, and table tennis, are distinguished by their highly strategic nature, drawing millions of players and fans to explore tactical analysis. A game in racket sports consists of rallies, which are sequences of strokes executed by players alternately from both sides. Here, a stroke refers to the action of hitting the ball with a racket, and each stroke can employ a certain technique, such as "topspin" and "push." A tactic is characterized by a series of consecutive stroke techniques.
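The definitions above map naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the `Stroke` class, the `tactics` helper, and the toy rally are hypothetical names and data introduced only for illustration. It treats a tactic as a run of `n` consecutive stroke techniques within one rally.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stroke:
    player: str      # hypothetical player identifier, e.g. "A" or "B"
    technique: str   # technique label, e.g. "Serve", "Topspin", "Push"


def tactics(rally: List[Stroke], n: int = 3) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive stroke techniques in a rally."""
    labels = [s.technique for s in rally]
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]


# Toy rally (invented for illustration): players alternate strokes.
rally = [
    Stroke("A", "Serve"),
    Stroke("B", "Short"),
    Stroke("A", "Topspin"),
    Stroke("B", "Block"),
]
print(tactics(rally))
```

A rally shorter than `n` strokes simply yields no tactics, which keeps downstream counting code free of special cases.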

Background illustration



Experiments

Qualitative Demos

Ground Truth

ViSTec

Baseline

Result image.

Our Result

Quantitative Results

SOTA Accuracy

Our proposed approach outperforms prior methods on all common evaluation metrics in video action segmentation, achieving state-of-the-art results in sports technique segmentation from broadcast game videos.
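This page does not name the metrics. As an assumption, the sketch below shows two measures that are commonly reported for action segmentation: frame-wise accuracy and the segmental edit score. The function names and the toy label sequences are illustrative and are not taken from the paper.

```python
from typing import List


def frame_accuracy(pred: List[str], truth: List[str]) -> float:
    """Fraction of frames whose predicted label matches the ground truth."""
    assert len(pred) == len(truth) and truth
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def to_segments(labels: List[str]) -> List[str]:
    """Collapse runs of identical frame labels into one segment label each."""
    return [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]


def edit_score(pred: List[str], truth: List[str]) -> float:
    """Segmental edit score: 1 - normalized Levenshtein distance, in [0, 1]."""
    p, t = to_segments(pred), to_segments(truth)
    # Single-row dynamic-programming Levenshtein distance over segment labels.
    dist = list(range(len(t) + 1))
    for i in range(1, len(p) + 1):
        prev, dist[0] = dist[0], i
        for j in range(1, len(t) + 1):
            cur = dist[j]
            dist[j] = min(dist[j] + 1, dist[j - 1] + 1, prev + (p[i - 1] != t[j - 1]))
            prev = cur
    return 1.0 - dist[len(t)] / max(len(p), len(t), 1)


# Toy frame labels (invented): the prediction misplaces one boundary by a frame.
truth = ["Serve"] * 3 + ["Push"] * 3 + ["Topspin"] * 2
pred = ["Serve"] * 3 + ["Push"] * 2 + ["Topspin"] * 3
print(frame_accuracy(pred, truth), edit_score(pred, truth))
```

Frame-wise accuracy penalizes the shifted boundary, while the edit score only checks the order of segments, so the two measures capture different kinds of error.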

Framework of ViSTec.

Real-time Performance

ViSTec runs at approximately 39.3 frames per second (fps) in offline tests on a single A100 GPU, which makes real-time processing of broadcast game videos feasible.
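The arithmetic behind the real-time claim can be checked with a few lines. This is a rough sketch: the 39.3 fps figure comes from the text above, while the broadcast frame rates of 25 and 30 fps are assumptions and are not stated in the source.

```python
def per_frame_latency_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def realtime_factor(inference_fps: float, video_fps: float) -> float:
    """Values above 1.0 mean inference keeps up with the video stream."""
    return inference_fps / video_fps


VISTEC_FPS = 39.3  # reported in the text above

print(f"{per_frame_latency_ms(VISTEC_FPS):.1f} ms per frame")
for video_fps in (25.0, 30.0):  # assumed broadcast frame rates, not from the source
    print(video_fps, round(realtime_factor(VISTEC_FPS, video_fps), 2))
```

Under these assumptions the model processes frames faster than they arrive. Streams with higher frame rates would need frame skipping or faster hardware.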



Case Studies

We present two case studies conducted with senior analysts from the Chinese national table tennis team. They address the most critical and complex facets of sports analysis that require domain knowledge: technique analysis and tactical analysis.

Technical Level Analysis

Case 1 image.

Tactical Level Analysis

Case 2 image.

Case 1: Analyzing Personalized Characteristics of Technical Actions

As shown in the left figure, for Japanese players the techniques "Block" and "Topspin" are strikingly similar, as are "Push" and "Short". According to the experts, these similarities in the deep visual features "reveal the high consistency and deceptive nature of their certain techniques." This observation helps opponents prepare for and anticipate these players' moves in specific techniques, a critical factor in fast-paced racket sports. The same analysis can be applied to other players in real time to reveal the unique characteristics of an opponent's actions.

Case 2: Discovering Optimal Tactical Choices

As shown in the right figure (B), the sequence "Serve Short Topspin" has the highest scoring rate. This suggests that when we serve and the opponent returns with a "Short" technique, responding with a "Topspin" stroke is the best choice in terms of scoring rate. The right figure (C) shows that after the "Serve" and "Short" strokes, playing another "Short" stroke or responding with "Others" drops the scoring rate sharply to around 0.43. This indicates that taking the initiative early in the game and launching an offensive increases our likelihood of winning.
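A scoring-rate comparison of this kind can be sketched from recognized technique sequences. The example below rests on two assumptions: the scoring rate of a tactic is the fraction of rallies opening with that sequence that the server wins, and each rally is stored as a list of technique labels plus an outcome flag. The rallies are toy data invented for illustration and do not reproduce the figures from the case study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each toy rally: (technique sequence, whether the server won the point).
# The data is invented for illustration and is not from the paper.
Rally = Tuple[List[str], bool]


def scoring_rate_by_opening(rallies: List[Rally], n: int = 3) -> Dict[Tuple[str, ...], float]:
    """Scoring rate of the server for each opening sequence of n techniques."""
    wins: Dict[Tuple[str, ...], int] = defaultdict(int)
    total: Dict[Tuple[str, ...], int] = defaultdict(int)
    for techniques, server_won in rallies:
        if len(techniques) < n:
            continue  # rally ended before the tactic was complete
        key = tuple(techniques[:n])
        total[key] += 1
        wins[key] += int(server_won)
    return {k: wins[k] / total[k] for k in total}


rallies = [
    (["Serve", "Short", "Topspin", "Block"], True),
    (["Serve", "Short", "Topspin"], True),
    (["Serve", "Short", "Topspin", "Block", "Topspin"], False),
    (["Serve", "Short", "Short", "Topspin"], False),
    (["Serve", "Short", "Short"], True),
    (["Serve"], True),
]
rates = scoring_rate_by_opening(rallies)
best = max(rates, key=rates.get)
print(best, rates[best])
```

With real data, each rate should be read together with its rally count, since a tactic observed only a handful of times gives an unreliable estimate.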

BibTeX

@article{2024vistec,
      title     = {ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis},
      author    = {Yuchen He and Zeqing Yuan and Yihong Wu and Liqi Cheng and Dazhen Deng and Yingcai Wu},
      journal   = {AAAI Conference on Artificial Intelligence},
      year      = {2024}
}