Advanced Computer Vision Techniques
Explore top LinkedIn content from expert professionals.
-
Presenting FEELTHEFORCE (FTF): a robot learning system that models human tactile behavior to learn force-sensitive manipulation. Using a tactile glove to measure contact forces and a vision-based model to estimate hand pose, the authors train a closed-loop policy that continuously predicts the forces needed for manipulation. This policy is retargeted to a Franka Panda robot with tactile gripper sensors using shared visual and action representations. At execution, a PD controller modulates gripper closure to track the predicted forces, enabling precise, force-aware control. This approach grounds robust low-level force control in scalable human supervision, achieving a 77% success rate across 5 force-sensitive manipulation tasks.
#research: https://lnkd.in/dXxX7Enw #github: https://lnkd.in/dQVuYTDJ
#authors: Ademi Adeniji, Zhuoran (Jolia) Chen, Vincent Liu, Venkatesh Pattabiraman, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto, Siddhant Haldar
New York University, University of California, Berkeley, NYU Shanghai
Controlling fine-grained forces during manipulation remains a core challenge in robotics. While robot policies learned from robot-collected data or simulation show promise, they struggle to generalize across the diverse range of real-world interactions. Learning directly from humans offers a scalable solution, enabling demonstrators to perform skills in their natural embodiment and in everyday environments. However, visual demonstrations alone lack the information needed to infer precise contact forces.
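To make the force-tracking step concrete, here is a minimal Python sketch of a PD loop that nudges gripper width toward a predicted force. It is purely illustrative, not the authors' implementation; the gains, the 100 Hz rate, the gripper interface, and the 0-8 cm width clamp are all assumptions.

```python
# Minimal PD force-tracking sketch (illustrative only; not the FTF code).
# Assumed interfaces: policy_target_force() returns the force predicted by
# the learned policy (N); read_tactile_force() returns the measured force (N);
# gripper.width is the commanded jaw opening in meters.
import time

KP, KD = 0.002, 0.0005   # assumed PD gains, meters of closure per newton
DT = 0.01                # assumed 100 Hz control loop

def track_force(policy_target_force, read_tactile_force, gripper, steps=500):
    prev_error = 0.0
    for _ in range(steps):
        error = policy_target_force() - read_tactile_force()
        derivative = (error - prev_error) / DT
        # too little measured force -> close the gripper a bit more
        gripper.width -= KP * error + KD * derivative
        gripper.width = min(max(gripper.width, 0.0), 0.08)  # clamp to 0-8 cm jaws
        prev_error = error
        time.sleep(DT)
```
-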
Robots on the pitch... You better believe it. Will you be able to play with this one? No more standing cones or passive drills. Athletes today are dodging dynamic robots—machines that track, move, and react in real time. These aren't gimmicks; they're next-gen training partners.
⚽ In football, systems like SKILLSLAB, Rezzil, and Trailblazer Training Bots are already used by top clubs to simulate high-pressure situations, improve decision-making, and measure milliseconds of reaction time.
🏀 In basketball, robotic arms help perfect shooting arcs, while AI vision tools break down footwork frame by frame.
🎾 In tennis, smart ball machines adjust spin, speed, and placement in unpredictable sequences—training the brain as much as the body.
Why it matters:
+ Athletes improve reaction speed by up to 20% using adaptive robotic drills.
+ Training bots allow 3x more touches per minute compared to traditional drills.
+ Machine-learning platforms track thousands of data points per session—customizing feedback instantly.
This isn't just tech—it's transformation. Robots are helping players train faster, smarter, and with a grin on their faces. #Innovation #Tech #Robots
-
For most of football's history, much of what we watched on the field went unmeasured. Today, nearly every player and ball movement throughout the game is measured, modeled, and analyzed in real time. This data is improving fan experiences and giving fans richer insights into the sport. It's also changing how professionals approach the game—from improving player safety to unlocking new training environments. The results speak for themselves: a 35% reduction in lower-extremity injuries from the redesigned kickoff format, informed by Next Gen Stats data. Innovations like completion probability and rush yards over expectation that make broadcasts more engaging. And now, pose-tracking technology that captures full skeletal data 60 times per second is opening doors to VR training that could accelerate player development from years to months. I'm proud of how we've expanded our partnership with the NFL on Next Gen Stats, powered by AI tools like Amazon SageMaker and Amazon Quick. What started as a tracking experiment in 2015 has become a critical part of the NFL's infrastructure, using machine learning models on AWS to process data from 22 players and generate 500-1,000 stats per play, instantly. What a win for the Hawks last night! If you're still riding the excitement, take a few minutes to read through this deep dive into the science that powers the complex stats you see on screen throughout the season. Cool look at the history of our partnership with the NFL through Next Gen Stats! https://lnkd.in/gX8Mpe7T
-
Video understanding has been lagging behind text, image, and audio modalities—until now. Meta and Stanford researchers unveiled Apollo, a new family of state-of-the-art video-centric large multimodal models (video-LMMs) designed to close this gap. Unlike prior efforts, Apollo sets a new standard by efficiently analyzing hour-long videos and achieving breakthrough results on multiple benchmarks.
Paper highlights:
(1) Scaling Consistency - design decisions made with smaller models transfer reliably to larger ones, drastically cutting computational costs
(2) Advanced video sampling techniques - Apollo uses FPS sampling, which outperforms traditional uniform sampling methods (see the sketch after this post)
(3) Streamlined evaluation - the new ApolloBench benchmark evaluates video-LMMs efficiently, reducing evaluation time by 41x while maintaining accuracy
Apollo's superior video comprehension capabilities pave the way for breakthroughs like real-time video summarization for content creators, better temporal reasoning for medical diagnostics, and enhanced video analytics for autonomous driving. With Apollo, video understanding might finally catch up to its multimodal counterparts.
Project page: https://lnkd.in/gSbxE9gS
— Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI http://aitidbits.ai
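To make highlight (2) concrete, here is a small Python sketch contrasting uniform sampling with FPS sampling; it is my own illustration of the general idea, not Apollo's code.

```python
def uniform_sample(num_frames, k):
    # k indices evenly spaced across the clip, regardless of its duration
    return [round(i * (num_frames - 1) / (k - 1)) for i in range(k)]

def fps_sample(num_frames, native_fps, target_fps):
    # indices at a fixed temporal rate, so longer clips yield more frames
    step = native_fps / target_fps
    return [int(i * step) for i in range(int(num_frames / step))]

# A 10-minute clip at 30 fps: uniform sampling still returns only 32 frames,
# while 1-fps sampling returns 600, preserving temporal coverage.
print(len(uniform_sample(18000, 32)))      # 32
print(len(fps_sample(18000, 30.0, 1.0)))   # 600
```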
-
#MIT's new "Radial Attention" makes generative video 4.4x cheaper to train and 3.7x faster to run. Here's why:
The problem with current AI video? It's BRUTALLY expensive. Every frame must "pay attention" to every other frame. With thousands of frames, costs explode quadratically. Training one model? $100K+ Running it? Painfully slow.
Massachusetts Institute of Technology, NVIDIA, Princeton, UC Berkeley, Stanford, and First Intelligence just changed the game. Their breakthrough insight: video attention works like physics.
- Sound gets quieter with distance
- Light dims as it travels
- Heat dissipates over space
Turns out, AI video tokens follow the same rules. Why waste compute power on distant, irrelevant connections?
Enter Radial Attention (toy sketch after this post). Instead of checking EVERY connection:
• Nearby frames → full attention
• Distant frames → sparse attention
• Computation scales log-linearly, not quadratically
Technical result: O(n log n) vs O(n²). Translation: MASSIVE efficiency gains.
Real-world results on production models:
📊 HunyuanVideo (Tencent):
• 2.78x training speedup
• 2.35x inference speedup
📊 Mochi 1:
• 1.78x training speedup
• 1.63x inference speedup
Quality? Maintained or IMPROVED.
What this unlocks:
4x longer videos, same resources
4.4x cheaper training costs
3.7x faster generation
Works with existing models (no retraining!)
And MIT open-sourced everything: https://lnkd.in/gETYw8eT
The bigger picture: the internet is transforming.
BEFORE: a place to store videos from the real world
NOW: a machine that generates synthetic content on demand
Think about it:
• TikTok filled with AI-generated content
• YouTube creators using AI for entire videos
• Streaming services producing personalized shows
• Educational content generated for each student
This changes everything. Remember when only big tech could afford image AI?
2020: GPT-3 → only OpenAI
2022: Stable Diffusion → everyone
2024: Midjourney everywhere
Video AI is next. Radial Attention probably just accelerated the timeline. The future isn't coming. It's here. And it's more accessible than ever.
Want to ride this wave?
→ Follow me for weekly AI breakthroughs
→ Share if this opened your eyes
→ Try the code: https://lnkd.in/gETYw8eT
What will YOU create when video AI costs 4x less? #AI #VideoGeneration #MachineLearning #TechInnovation #FutureOfContent
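As a toy illustration of the nearby-dense, distant-sparse idea (my own simplification, not the paper's exact mask or decay schedule):

```python
import numpy as np

def radial_mask(n_tokens, base_window=8):
    # Boolean attention mask: dense inside a local window, exponentially
    # sparser as temporal distance grows (toy version of the decay idea).
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for i in range(n_tokens):
        for j in range(n_tokens):
            d = abs(i - j)
            if d < base_window:
                mask[i, j] = True                  # nearby: full attention
            else:
                stride = 2 ** (d // base_window).bit_length()
                mask[i, j] = (j % stride == 0)     # distant: sparse attention
    return mask

m = radial_mask(256)
print(m.mean())  # fraction of attended pairs; shrinks as n_tokens grows
```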
-
Streaming 3D reconstruction is fundamentally a memory problem. How do you map a massive, multi-room environment without blowing up your compute budget as the sequence gets longer? Lingbo-Map just introduced a highly elegant architectural solution to this exact bottleneck: Geometric Context Attention (GCA). Instead of brute-forcing the entire scene history into memory, GCA splits the streaming state into three lightweight buckets: an anchor for global coordinate grounding, a local reference window for dense geometry, and a compressed trajectory memory. By squashing the full sequence history into compact per-frame tokens, memory and compute requirements remain nearly constant. Built on a DINO backbone, the pipeline predicts camera poses and depth maps at ~20 FPS—even on continuous 10,000+ frame sequences. This is how you scale real-time spatial computing and large-scale digital twins without needing infinite VRAM.
Models: https://lnkd.in/dxY7D4Ar
Project page: https://lnkd.in/dKRUEQaq
Code: https://lnkd.in/dXQSJB7u
Paper: https://lnkd.in/diPQk3Ki
#SpatialComputing #3DReconstruction #ComputerVision #MachineLearning #SLAM #DevRel
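Here is a rough Python sketch of what such a three-bucket streaming state could look like; the class and all names are hypothetical, not Lingbo-Map's API.

```python
from collections import deque

class StreamingState:
    """Toy three-bucket memory: anchor + local window + compressed history."""
    def __init__(self, window=8, token_dim=256):
        self.anchor = None                   # global coordinate grounding
        self.local = deque(maxlen=window)    # dense features for recent frames
        self.history = []                    # one compact token per older frame
        self.token_dim = token_dim

    def add_frame(self, features):
        # features: (num_patches, feat_dim) NumPy array from the image backbone
        if self.anchor is None:
            self.anchor = features           # first frame anchors the map
        if len(self.local) == self.local.maxlen:
            oldest = self.local[0]
            # squash a full feature map into a single per-frame token
            self.history.append(oldest.mean(axis=0)[: self.token_dim])
        self.local.append(features)          # deque evicts the oldest frame

# Memory grows by one small token per frame instead of a full feature map,
# which is how compute can stay nearly constant over 10,000+ frame sequences.
```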
-
One of the latest applications for artificial intelligence could be a game changer — literally. Scientists at Google DeepMind in London have teamed up with the UK's Liverpool Football Club to create TacticAI, a model that can provide insights on corner kicks. The tool has been trained on a dataset of 7,176 corner kicks from Premier League matches and uses a technique called 'geometric deep learning' to identify key strategic patterns that could prove critical in tight matches. "Predicting the outcomes of corner kicks is particularly complex due to the randomness in gameplay from individual players and the dynamics between them," Colin Murdoch, DeepMind's chief business officer, explained on LinkedIn. "TacticAI can model how players interact on the pitch, offering coaches advanced strategies to improve game outcomes." The research was published in a paper in Nature this week. But AI isn't exactly new to sport, writes Edith Cowan University lecturer Mark Scanlan in The Conversation Australia + NZ. He says it was used in the men's and women's World Cups in 2022 and 2023, in conjunction with advanced ball-tracking technology, to produce semi-automated offside decisions, and is a powerful tool for organisations. "Professional football clubs have analytical departments using AI at every level of the game, predominantly in the areas of scouting, recruitment and athlete monitoring. Other research has also tried to predict players' shots on goal, or guess from a video what off-screen players are doing," he writes. But, while he says AI promises to "offer coaches a more objective and analytical approach to the game", it cannot make decisions on the fly, which is often where matches are won and lost. What do you think of the use of tech in sport? Could AI assistants give some coaches an unfair advantage or is it the future of competitions? Comment below. By Sam Shead and Cathy Anderson #sport #ParisOlympics Sources: The Conversation Australia + NZ: https://lnkd.in/g7y35qC4 Financial Times: https://lnkd.in/gY6UENpR Nature: https://lnkd.in/gVsUwJRy
-
I built a set of command-line tools that let you generate, edit, and analyze images through Unix pipes - beautifully simple on Mac and Linux, and probably works on Windows too. These tools work perfectly with Google's brand new Gemini 2.5 Flash Image (nicely codenamed nano-banana). And at ~$0.039 per image through OpenRouter, you can actually afford to experiment and benchmark these models.
Here's the simple case - generate a new image:
graft -p "An HD photo of a cyberpunk street market at night"
To make it more interesting, we can grab an image from the web and modify it:
curl https://cdn.naida.ai/misc/sg2049.png | graft -p "add some flying drones"
That's a futuristic Singapore skyline, and now it has drones. Pipe it through glimpse to verify what changed, chain multiple edits, build entire workflows.
Want to test if an AI model really understands photography styles? Run this:
for decade in 1950 1960 1970 1980 1990 2000; do
  graft -p "street scene, authentic ${decade}s photograph" -o - | glimpse -m <some_other_model> -p "what decade was this photo taken?"
done
You now have data on whether the model actually knows what makes a 1970s photo look like the 1970s. Run it 100 times with different temperatures, build a confusion matrix, find the edge cases where models hallucinate or ignore instructions. Configure glimpse to use high-end vision models like Gemini 2.5 Pro, GPT-5 or Claude 4 Sonnet to evaluate the outputs from smaller, cheaper, faster generation models - proper benchmarking without breaking the bank.
For researchers evaluating image models, this beats clicking through web interfaces or writing complex evaluation scripts. Everything is scriptable, reproducible, and measurable. Export to CSV, track model performance over time, integrate into your CI/CD pipeline to catch regressions.
The Unix philosophy wins big here: small tools that do one thing well, composed into powerful pipelines = rapid research & benchmarking.
Code is on GitHub at u1i/graft if you want to try it yourself.
-
Real-time football analytics running on the MemryX Inc. MX3 accelerator card ⚽
Over the weekend, I built a real-time football analytics pipeline that tracks players, analyzes movement, and extracts tactical insights while running efficiently at the edge.
How it works:
✅ Detects all players on the pitch using a YOLOv8 detection model.
✅ Identifies team jersey colors to separate players automatically (see the sketch after this post).
✅ Draws color-coded ellipses around player footprints for clear visual tracking.
✅ Tracks players across frames to estimate movement speed and trajectories.
✅ Calculates camera location using 2D coordinate measurements for spatial analysis.
⚡ Pipeline flow: YOLO PyTorch → ONNX → DFP compilation → Inference on MX3 → Player tracking & analytics
🔗 Check the code here ➡️ https://lnkd.in/djmpYK8A
#computervision #edgeai #sports
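As an illustration of the jersey-color step above, here is a minimal sketch using k-means on crop colors (my own simplification; the author's actual pipeline may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_teams(player_crops):
    # player_crops: list of (H, W, 3) image patches from YOLO bounding boxes.
    # Average the upper half of each crop, where the jersey dominates,
    # then split the resulting colors into two clusters (one per team).
    colors = np.array([crop[: crop.shape[0] // 2].reshape(-1, 3).mean(axis=0)
                       for crop in player_crops])
    return KMeans(n_clusters=2, n_init=10).fit_predict(colors)  # team 0 or 1
```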
-
You can now generate infinite-length videos!? Yes, literally infinite.
Let me quickly explain why it's a problem to begin with: AI models generate videos frame by frame, and each new frame depends on the previous one. The problem? Tiny errors stack up. By frame 100, your subject starts distorting. By frame 500, everything's a mess 💩
This happens because the model was trained on clean data, but during generation, it has to build on top of its own imperfect outputs. That train-test gap (often called exposure bias) kills quality over time. Plus, existing methods only handle one prompt, so you get repetitive scenes with no real story progression.
Here's where Stable Video Infinity from EPFL shines 💡: instead of fighting errors, it learns from them. The breakthrough is Error-Recycling Fine-Tuning. During training, the model deliberately injects its own past errors into clean frames, watches what goes wrong, and figures out how to fix it.
Here's the process (sketched in code after this post):
→ inject historical errors to simulate real generation conditions
→ predict where drift will happen
→ bank those errors in memory
→ learn to correct them before they compound
This creates three powerful results:
• Videos can extend infinitely without quality collapse
• Scene transitions happen naturally with controllable storylines
• It works with multiple conditions like audio, skeleton poses, and text streams
They've generated 10-minute Tom & Jerry videos from a single image. Not stitched clips, but continuous generation. The efficiency comes from training only LoRA adapters, not the full model, so you can customise it without massive computing.
The challenges? Real-time streaming isn't there yet. The model generates clip by clip with bidirectional attention for quality, which means you can't stream live outputs instantly. You still need decent hardware to train custom versions, though inference is manageable. And while error recycling is clever, the model needs to bank enough error patterns during training to handle diverse scenarios.
But the future's interesting. They're working on a Wan 2.2 5B-based SVI and true streaming generation. If they can achieve real-time inference while maintaining quality, this becomes viable for live content creation and gaming.
The bigger idea here is training models on their own mistakes, rather than just on clean data. That could apply beyond video to any autoregressive generation task.
What's the longest AI-generated video you've successfully created without quality degradation, and what method did you use?
Follow me, Bhavishya Pandit, for honest takes on AI breakthroughs that actually work 🔥
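A toy PyTorch sketch of the error-recycling loop as I understand it from the description above (not the official SVI code; the model interface and tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def error_recycling_step(model, clean_frames, error_bank, optimizer):
    # clean_frames: (T, C, H, W); the model predicts frame T-1 from frames 0..T-2.
    inputs, target = clean_frames[:-1].clone(), clean_frames[-1]
    if error_bank:
        # inject a banked past error into the last context frame to
        # simulate the drift the model will face at generation time
        idx = torch.randint(len(error_bank), (1,)).item()
        inputs[-1] = inputs[-1] + error_bank[idx]
    pred = model(inputs)                 # predicted next frame, (C, H, W)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    error_bank.append((pred - target).detach())  # bank this mistake for reuse
    return loss.item()
```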