I have an iOS/macOS tennis app that now lets the user import video, and I would like to add the ability to automatically edit out the significant amount of downtime where players are not in a rally or point. To start, I considered filtering the video down to segments where tennis ball trajectories are detected, but I don't think it's possible to get the associated video times for the trajectory occurrences.
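For reference, here is a rough sketch of how I imagine running trajectory detection over an imported video. The AVAssetReader walk and the idea of recording each frame's presentation timestamp myself are my own assumptions, and the confidence threshold and trajectory length are guesses I haven't tuned:

```swift
import AVFoundation
import Vision

// Sketch only: read the video's frames, run VNDetectTrajectoriesRequest on each
// sample buffer, and record the presentation timestamps of frames that produced
// a confident trajectory observation.
func trajectoryTimestamps(for asset: AVAsset) throws -> [CMTime] {
    var hitTimes: [CMTime] = []
    var currentTime = CMTime.zero

    // The completion handler is invoked during perform(_:) with any trajectories
    // found in the frame currently being analyzed.
    let request = VNDetectTrajectoriesRequest(frameAnalysisSpacing: .zero,
                                              trajectoryLength: 6) { request, _ in
        guard let results = request.results as? [VNTrajectoryObservation],
              results.contains(where: { $0.confidence > 0.9 }) else { return }
        hitTimes.append(currentTime)
    }

    guard let track = asset.tracks(withMediaType: .video).first else { return [] }
    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
    ])
    reader.add(output)
    guard reader.startReading() else { return [] }

    while let sampleBuffer = output.copyNextSampleBuffer() {
        // Remember this frame's time so the completion handler can tag detections with it.
        currentTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        let handler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, options: [:])
        try handler.perform([request])
    }
    return hitTimes
}
```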
If I'm making two splits for each segment of downtime, I could use pose detection (which works on video as well as photos) to find all the serve motions (how each point starts) or feeds (how each practice rally starts) and use those as the split that ends each downtime segment.
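For that serve/feed side, I was picturing a per-frame check along these lines with VNDetectHumanBodyPoseRequest. The "wrist above the nose" heuristic for an overhead motion and the confidence thresholds are just my assumptions, not something I've validated on real footage:

```swift
import Vision

// Sketch only: flag a frame as a possible serve/overhead motion when the detected
// player's wrist is higher in the frame than their nose.
func looksLikeServe(in pixelBuffer: CVPixelBuffer) throws -> Bool {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])

    guard let body = request.results?.first as? VNHumanBodyPoseObservation else { return false }

    // Vision points are normalized with the origin at the bottom-left,
    // so a larger y value means higher in the frame.
    let wrist = try body.recognizedPoint(.rightWrist)
    let nose = try body.recognizedPoint(.nose)
    guard wrist.confidence > 0.3, nose.confidence > 0.3 else { return false }

    return wrist.location.y > nose.location.y
}
```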
For the starting split that marks the beginning of each downtime segment, though, I'm not sure what the simplest approach would be to detect that a point or rally has ended. Detecting hand signals for "out" wouldn't cover many of the ways a point can end, and neither would audio analysis for verbal out calls (and certainly not for practice sessions). Any guidance on a relatively simple yet comprehensive approach here (likely using a machine learning framework) would be greatly appreciated.
The expected video comes from everyday tennis players filmed from behind the baseline against the fence, so their own court would be the dominant part of the frame.