Top LinkedIn Content on Reviewing Progress Regularly

Sales org underperforming despite trying everything? I help CEOs, founders & B2B sales leaders use their own data to pinpoint the best 3 moves to grow revenue | $195M ex-Fortune 500 leader | WSJ bestseller | 700+ Clients

101,954 followers 8mo

I just watched an AE lose a $1.2M deal after running a "successful" product trial that the prospect LOVED. After 8 weeks of work, the CFO killed it with five words: "Let's try our current vendor."

After analyzing 200+ enterprise sales cycles at companies including Salesforce, HubSpot, Thomson Reuters, and Workday, I've identified the exact framework that separates 80%+ trial conversion rates from the industry average of 30%.

The psychological shift required… Stop treating trials as product demos and start treating them as RISK ELIMINATION EXERCISES.

After being promoted 12 times and hitting #1 in every role before leading a 110-person team to $190M+ annually, I've developed a framework that's transformed how top companies run trials.

THE 5 POINT TRIAL QUALIFICATION SYSTEM:

1. 𝗣𝗥𝗢𝗕𝗟𝗘𝗠 𝗩𝗔𝗟𝗜𝗗𝗔𝗧𝗜𝗢𝗡
Ask these 3 questions before any trial:
→ "What happens if you don't solve this in 90 days?" (quantify impact)
→ "How have you tried solving this before?" (establishes solution gap)
→ "Who else is affected?" (identifies stakeholders)
These eliminate 68% of unqualified trials before they start.

2. 𝗦𝗨𝗖𝗖𝗘𝗦𝗦 𝗗𝗘𝗙𝗜𝗡𝗜𝗧𝗜𝗢𝗡
Document these 4 criteria:
→ Technical requirements (features that must work)
→ Business metrics (quantifiable outcomes)
→ Timeline requirements (implementation speed)
→ User adoption requirements (usage patterns)
Get confirmation: "If we demonstrate [criteria], you'd move forward with purchase by [date]. Correct?"

3. 𝗦𝗧𝗔𝗞𝗘𝗛𝗢𝗟𝗗𝗘𝗥 𝗠𝗔𝗣𝗣𝗜𝗡𝗚
Create a "Decision Matrix" for:
→ Technical buyers (every trial user)
→ Economic buyers (CFO/budget holder)
→ Political influencers (who can kill it)
→ Current solution advocates (status quo beneficiaries)
Document each person's personal win/loss if change happens.

4. 𝗣𝗥𝗘-𝗧𝗥𝗜𝗔𝗟 𝗔𝗚𝗥𝗘𝗘𝗠𝗘𝗡𝗧
Have legal review BEFORE starting: "We typically have legal review the agreement structure ahead of time so there are no surprises and to save us both time so we can hit the deadline of December 1st you set. Would you be open to this during the trial?"

5. 𝗖𝗨𝗥𝗥𝗘𝗡𝗧 𝗩𝗘𝗡𝗗𝗢𝗥 𝗦𝗧𝗥𝗔𝗧𝗘𝗚𝗬
Ask:
→ "Have you discussed these challenges with your current vendor?"
→ "What was their response?"
→ "What specific capabilities do they lack?"
Document these to prevent the "let's try our current vendor" objection.

RESULTS from this framework:
✅ Trial conversion: 32% to 83% in 60 days
✅ Average deal size: +40%
✅ Sales cycle: -37%
✅ Forecast accuracy: +92%
✅ Time on unsuccessful trials: -43%

Hey Sales Leaders! Want to see how we can install these kinds of results into your org? Go here: https://lnkd.in/ghh8VCaf

31 Comments

Sohrab Rahimi

Director, AI/ML Lead @ Google

23,995 followers 11mo

Evaluating LLMs is hard. Evaluating agents is even harder.

This is one of the most common challenges I see when teams move from using LLMs in isolation to deploying agents that act over time, use tools, interact with APIs, and coordinate across roles. These systems make a series of decisions, not just a single prediction. As a result, success or failure depends on more than whether the final answer is correct.

Despite this, many teams still rely on basic task success metrics or manual reviews. Some build internal evaluation dashboards, but most of these efforts are narrowly scoped and miss the bigger picture.

Observability tools exist, but they are not enough on their own. Google’s ADK telemetry provides traces of tool use and reasoning chains. LangSmith gives structured logging for LangChain-based workflows. Frameworks like CrewAI, AutoGen, and OpenAgents expose role-specific actions and memory updates. These are helpful for debugging, but they do not tell you how well the agent performed across dimensions like coordination, learning, or adaptability.

Two recent research directions offer much-needed structure. One proposes breaking down agent evaluation into behavioral components like plan quality, adaptability, and inter-agent coordination. Another argues for longitudinal tracking, focusing on how agents evolve over time, whether they drift or stabilize, and whether they generalize or forget.

If you are evaluating agents today, here are the most important criteria to measure:
• 𝗧𝗮𝘀𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀: Did the agent complete the task, and was the outcome verifiable?
• 𝗣𝗹𝗮𝗻 𝗾𝘂𝗮𝗹𝗶𝘁𝘆: Was the initial strategy reasonable and efficient?
• 𝗔𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻: Did the agent handle tool failures, retry intelligently, or escalate when needed?
• 𝗠𝗲𝗺𝗼𝗿𝘆 𝘂𝘀𝗮𝗴𝗲: Was memory referenced meaningfully, or ignored?
• 𝗖𝗼𝗼𝗿𝗱𝗶𝗻𝗮𝘁𝗶𝗼𝗻 (𝗳𝗼𝗿 𝗺𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝘀𝘆𝘀𝘁𝗲𝗺𝘀): Did agents delegate, share information, and avoid redundancy?
• 𝗦𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗼𝘃𝗲𝗿 𝘁𝗶𝗺𝗲: Did behavior remain consistent across runs or drift unpredictably?

For adaptive agents or those in production, this becomes even more critical. Evaluation systems should be time-aware, tracking changes in behavior, error rates, and success patterns over time. Static accuracy alone will not explain why an agent performs well one day and fails the next.

Structured evaluation is not just about dashboards. It is the foundation for improving agent design. Without clear signals, you cannot diagnose whether failure came from the LLM, the plan, the tool, or the orchestration logic.

If your agents are planning, adapting, or coordinating across steps or roles, now is the time to move past simple correctness checks and build a robust, multi-dimensional evaluation framework. It is the only way to scale intelligent behavior with confidence.
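One way to make the criteria above time-aware is to score every run on each dimension and compare recent runs with earlier ones. The Python sketch below is only an illustration: the dimension keys, the `RunScore` record, the window size, and the 0.15 threshold are all assumptions made for the example, not part of any framework named in the post.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical dimension keys, mirroring the criteria listed in the post.
DIMENSIONS = ("task_success", "plan_quality", "adaptation",
              "memory_usage", "coordination")


@dataclass
class RunScore:
    day: int                  # when the run happened (illustrative time unit)
    scores: dict[str, float]  # dimension -> score in [0, 1], however it was graded


def drift_by_dimension(runs, window=3):
    """Mean of the latest `window` runs minus mean of the earliest `window` runs."""
    ordered = sorted(runs, key=lambda r: r.day)
    first, last = ordered[:window], ordered[-window:]
    return {
        d: mean(r.scores[d] for r in last) - mean(r.scores[d] for r in first)
        for d in DIMENSIONS
    }


def flag_drift(drift, threshold=0.15):
    """Names of dimensions whose score moved by at least `threshold` in either direction."""
    return sorted(d for d, delta in drift.items() if abs(delta) >= threshold)
```

Keeping one score per dimension, rather than a single pass/fail bit, is what lets a regression be traced to planning, tool handling, or coordination instead of just "the agent got worse".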

25 Comments

Akhil Mishra

Tech Lawyer for Fintech, SaaS & IT | Contracts, Compliance & Strategy to Keep You 3 Steps Ahead | Book a Call Today

11,155 followers 11mo

When founders don’t trust their team, they start hovering. Every update is a red flag. Every task feels like a risk. And the worst part? They justify it. "I just want to make sure it’s done right."

But micromanagement doesn’t fix problems. It creates new ones. Especially in high-stakes industries like Fintech.

Let’s say you’re outsourcing the development of a digital lending app. If there’s no structure:
No system for deliverables
No timeline
No feedback loop
Then micromanagement becomes the default.
• You follow up
• You second-guess
• You slow everything down

The real solution isn’t tighter control. That's the last thing. It’s clearer processes.

Now, you might have also been told to do this:
• Define ownership
• Use milestone-based contracts
• Set communication cadences
• Track what matters - not every single step

Sure, that helps. But it’s not enough. Because micromanagement is what fills the void when structure is missing. Don’t patch the symptoms. Fix the foundation.

So, to make delegation and outsourcing work, here’s what I suggest to my clients:

1 // Milestone-Based Deliverables with Acceptance Criteria
• Break the project into clear milestones (UI prototype, backend integration, UAT, go-live)
• Define what “done” means for each milestone
• Link payments to milestone approvals - not just dates
Examples:
"UI prototype approved by client within 3 business days of delivery"
"Lending workflow passes all test cases as per attached checklist"

2 // Progress Reporting & Demo Cadence
• Include weekly or bi-weekly reports (written or demo)
• Cover status, blockers, next steps, and demo of completed features
• Lack of updates can trigger escalation or pause payments

3 // Feedback & Review Windows
• Define time limits for feedback (e.g., 5 business days)
• No feedback = auto-approval to keep things moving

4 // Issue Escalation & Dispute Resolution
• Add process to resolve rejected deliverables
• Example: “Meet within 3 business days to resolve”
• Use mediation/arbitration under Indian law for unresolved issues

5 // Ownership, Access & Handover
• All code, docs, and credentials handed over at each milestone
• Add interim access clauses for termination or delay

6 // Confidentiality & Compliance
• NDAs and data protection must comply with Indian fintech laws
• Follow DPDP Act, RBI guidelines, and security best practices

When these structures are in your contract:
• You create accountability without micromanagement
• You get transparency and control - without the stress
• Your team knows what’s expected, and you know what’s coming next

Fix the foundation, and trust (plus results) will follow.

---
✍ Tell me below: What’s one process you added that helped reduce micromanagement in your team?

40 Comments

Pan Wu

Senior Data Science Manager at Meta

51,645 followers 2y

A "sampled success metric" is a performance measure or evaluation criterion calculated from a sample or subset of data rather than the entire population. Each sample is often expensive to obtain (for example, because it needs manual review), which creates a trade-off between sample size and metric accuracy/sensitivity.

In this tech blog, the data science team at Shopify describes how it uses Monte Carlo simulation to understand metric variability under various scenarios and make the right trade-offs.

First, the team defines simulation metrics that describe the variability of the sampled success metric. For instance, if the actual success metric is decreasing over time, a simulation metric could count how many months of the sampled success metric would show a decrease, termed "1-month decreases observed".

Then, the team defines the distribution used to run the Monte Carlo simulation. Monte Carlo simulation is a computational technique that uses random sampling to estimate the outcomes of complex systems or processes with uncertain inputs; it draws samples from a distribution chosen to match the business case. Based on past observations, the team’s application follows a Poisson distribution.

Next comes the large-scale simulation phase, where the team runs many simulations for one parameter setting and then varies the parameters to simulate different scenarios. The goal is to quantify how much the sample mean will differ from the underlying population mean given realistic assumptions.

The final result is a clear statistical picture of how much a larger sample size reduces metric variability and increases accuracy. This case study demonstrates that Monte Carlo simulation can be a valuable addition to your decision-making and data science toolkit.
#datascience #analytics #metrics #algorithms #simulation #montecarlo #decisionmaking – – – Check out the "Snacks Weekly on Data Science" podcast and subscribe, where I explain in more detail the concepts discussed in this and future posts: -- Spotify: https://lnkd.in/gKgaMvbh -- Apple Podcast: https://lnkd.in/gj6aPBBY -- Youtube: https://lnkd.in/gcwPeBmR https://lnkd.in/dKnrZzzV
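A rough Python sketch of the idea described in the post: simulate a truly declining success rate, measure it each month from a limited sample, and count how often the sampled metric actually shows a month-over-month decrease. Every number here (starting rate, monthly decline, sample sizes, trial count) is an assumption for illustration, and each reviewed item is modelled as a simple pass/fail draw rather than the Poisson-distributed data the Shopify team reports using.

```python
import random


def sampled_rate(true_rate, sample_size, rng):
    """Success rate measured on `sample_size` reviewed items (pass/fail draws)."""
    hits = sum(rng.random() < true_rate for _ in range(sample_size))
    return hits / sample_size


def share_of_decreases_observed(sample_size, start=0.80, step=0.01,
                                months=6, trials=300, seed=0):
    """Fraction of month-over-month comparisons in which the sampled metric
    drops, when the true rate falls by `step` every month (assumed scenario)."""
    rng = random.Random(seed)
    observed = 0
    for _ in range(trials):
        rates = [sampled_rate(start - step * m, sample_size, rng)
                 for m in range(months)]
        observed += sum(later < earlier
                        for earlier, later in zip(rates, rates[1:]))
    return observed / (trials * (months - 1))


if __name__ == "__main__":
    for n in (50, 500, 2000):
        print(n, round(share_of_decreases_observed(n), 3))
```

Running it for several sample sizes shows the trade-off directly: small samples detect a real decline only a little more often than a coin flip would, while larger samples detect it far more reliably at a higher review cost.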

Monte Carlo Simulations: Separating Signal from Noise in Sampled Success Metrics shopify.engineering

2 Comments

Niels Corsten

Sr. Manager Service Design, CX & Journey Management @ Deloitte Digital

5,639 followers 7mo

A critical part of journey management in any large organisation is measuring how your journeys perform. 📊 By setting clear goals, monitoring performance, identifying gaps, and measuring improvement impact, you create a continuous cycle of management and enhancement. Measurement surfaces opportunities and kickstarts improvements. 🚀

Yet many organisations struggle: data sits in silos, teams measure inconsistently, and dashboards report numbers without a coherent story. Product, marketing, sales, service, and digital teams collect valuable insights, but without a common language, they never combine into a unified performance view. The result? Plenty of activity, little clarity on what actually improves customer experience and business performance.

Measuring performance along specific journeys, rather than as isolated KPIs, provides the right context: the journey itself. 🗺️ This approach transforms your journey framework into an engine for improving both customer experience and business performance holistically, creating a shared structure and language where different KPIs unite. 🧭

Inspired by the Balanced Scorecard, this pragmatic 3x3 Matrix structures performance measurement across two dimensions:

👉 First, it distinguishes three performance metric categories:
- Customer performance (behavior and sentiment)
- Commercial performance (conversion, customer base, revenue)
- Operational performance (cost, efficiency, reliability)

👉 Second, it distinguishes three journey hierarchy levels:
- Overall customer lifecycle
- End-to-end product or service journey
- Individual customer tasks

These intersecting dimensions ensure each metric sits logically within a complete, coherent view. The visual below shows example metrics for all nine sections, helping you build a balanced measurement framework for journeys.

This matrix delivers three immediate benefits: ✨
1. It aligns siloed KPIs and contextualizes them into a shared journey
2. It enables drill-down and aggregation through connected KPIs across journey levels
3. It surfaces trade-offs and synergies between performance metrics

A few quick tips to take into account when drafting or structuring your own journey-driven measurement framework 👇👇👇
🐌 Consider both leading and lagging indicators for a robust measurement approach that balances early warning signs with outcome metrics.
🤲 Don’t collect everything. Start with a North Star KPI for each journey, and add a small set of supporting metrics. Less is more.
💬 Always mix performance metrics with more qualitative feedback and insights that will help you determine why performance is down and how to fix it.

Happy measuring! 🎉

60 Comments

Michael Ross

8,304 followers 7mo

𝗜𝗱𝗲𝗮 #𝟭𝟲: 𝗠𝗲𝘁𝗿𝗶𝗰𝘀 𝘁𝗵𝗮𝘁 𝗺𝗮𝘁𝘁𝗲𝗿: 𝘁𝗵𝗲 𝗯𝗲𝗮𝘂𝘁𝘆 𝗼𝗳 𝘀𝗽𝗶𝗹𝗹 𝗮𝗻𝗱 𝘀𝗽𝗼𝗶𝗹

I worked with a hotel chain that was focused on two high-level KPIs: 𝗮𝘃𝗲𝗿𝗮𝗴𝗲 𝗿𝗼𝗼𝗺 𝗿𝗮𝘁𝗲 (𝗔𝗥𝗥) and 𝗼𝗰𝗰𝘂𝗽𝗮𝗻𝗰𝘆 (%). Occupancy was around 80% and had increased year on year, but this aggregate average was hiding significant opportunities.

When we de-averaged the overall occupancy by hotel and night, we discovered that very few hotels were 80% full: most were either completely full or only half full.

We reframed performance using two “failure metrics” (see illustration):
• 𝗦𝗽𝗼𝗶𝗹: measured empty rooms (by hotel, by night).
• 𝗦𝗽𝗶𝗹𝗹: measured “lost trading days” when a hotel reached full occupancy too early.

By analysing 𝘀𝗽𝗶𝗹𝗹 𝗮𝗻𝗱 𝘀𝗽𝗼𝗶𝗹 𝗮𝘁 𝗮 𝘀𝗶𝘁𝗲-𝗻𝗶𝗴𝗵𝘁 𝗹𝗲𝘃𝗲𝗹, we uncovered significant value:
• Spoil caused by pricing too high or insufficient marketing.
• Spill caused by pricing too low or overmarketing.

𝗦𝗽𝗼𝗶𝗹 𝗶𝘀 𝗮 𝗳𝗮𝗰𝘁. 𝗦𝗽𝗶𝗹𝗹 𝗶𝘀 𝗮 𝗺𝗼𝗱𝗲𝗹. One measures what you wasted; the other estimates what you missed.

The principle applies to almost any decision made under uncertainty: where there’s finite capacity and variable demand, there’s always a 𝘀𝗽𝗶𝗹𝗹-𝘀𝗽𝗼𝗶𝗹 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳.

I’ve applied this framework across a diverse range of businesses:
• 𝗖𝗮𝗹𝗹 𝗰𝗲𝗻𝘁𝗿𝗲𝘀: spill = calls with no agents (missed sales); spoil = agents with no calls (wasted labour).
• 𝗥𝗲𝘀𝘁𝗮𝘂𝗿𝗮𝗻𝘁𝘀: spill = understaffed hours (poor service); spoil = overstaffed hours (low productivity).
• 𝗦𝘂𝗽𝗲𝗿𝗺𝗮𝗿𝗸𝗲𝘁𝘀: spill = missed sales (poor availability); spoil = waste (over-stocking).

Every business wrestles with these two-sided costs – the 𝗰𝗼𝘀𝘁 𝗼𝗳 𝗲𝘅𝗰𝗲𝘀𝘀 and the 𝗰𝗼𝘀𝘁 𝗼𝗳 𝗺𝗶𝘀𝘀𝗲𝗱 𝗼𝗽𝗽𝗼𝗿𝘁𝘂𝗻𝗶𝘁𝘆. Once you measure both, you can manage the balance intelligently.

The best metrics don’t just describe performance – they expose 𝘧𝘢𝘪𝘭𝘶𝘳𝘦 𝘮𝘰𝘥𝘦𝘴 that can actually be fixed.

Key takeaways:
• Analyse at the most atomic level that could be actionable (hour, site-night, SKU-store, agent, keyword etc.)
• Define the acceptable 𝗴𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀 for that atomic outcome.
• Systematically analyse the distribution of performance outside guardrails.
• Recognise that averages hide opportunities where good and bad performance offset each other.

There’s a fascinating 140-year history of optimising these decisions, which are commonly referred to as Newsvendor problems – but that story deserves its own post.
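To make the de-averaging point concrete, here is a small Python sketch of spoil and spill computed per site-night. The `SiteNight` record and its fields are hypothetical, and "lost trading days" is approximated as the number of days before the night on which the last room sold, which is only one possible way to model spill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteNight:
    hotel: str
    night: str
    capacity: int
    rooms_sold: int
    # Days before the night on which the last room sold; None if never full.
    sold_out_days_before: Optional[int] = None


def spoil(sn: SiteNight) -> int:
    """Empty rooms on that night: a directly observed fact."""
    return max(sn.capacity - sn.rooms_sold, 0)


def spill_days(sn: SiteNight) -> int:
    """Lost trading days: a modelled proxy, counted only when the night sold out."""
    if sn.rooms_sold < sn.capacity or sn.sold_out_days_before is None:
        return 0
    return sn.sold_out_days_before


def average_occupancy(site_nights) -> float:
    """The aggregate KPI that hides both failure modes."""
    sold = sum(sn.rooms_sold for sn in site_nights)
    return sold / sum(sn.capacity for sn in site_nights)
```

With one hotel sold out three weeks early and another 60% full on the same night, the average occupancy is a healthy-looking 80% while both failure metrics are clearly non-zero.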

19 Comments

Cameron R. Wolfe, Ph.D.

Research @ Netflix

24,467 followers 3w

Do you need to learn how to properly evaluate your agent? Here’s a step-by-step guide for how to do this, informed by best practices in recent research…

(1) Define success. We need to first think about what it means for the agent to succeed. We should write clear and detailed criteria such as:
- Outcome goals that verify aspects of the outcome (e.g., whether the expected database entries for the task were created).
- Process goals that verify components of the transcript (e.g., whether certain tools were called).
Recent agent benchmarks are heavily outcome-oriented, as outcome goals provide a reliable and objective mechanism for assessing the success of an agent.

(2) Collect a small task set. Instead of curating a lot of data up front, we can start with a small number of tasks that we manually curate for evaluating the agent. As we use the agent and find new failure cases, we should record these issues and use them to add new tasks to our evaluation suite. Over time, we should continue collecting new (usually more difficult) tasks that challenge the agent. Legacy tasks can be maintained in a regression set.

(3) Create useful tasks. We should create high-quality tasks that test important aspects of agent behavior in a reliable manner. Tasks should be clear enough that repeated evaluations yield consistent results. Ambiguous or noisy tasks complicate the evaluation process with unstable and misleading results that can obfuscate the actual performance of an agent.

(4) Configure graders. We should begin with simple graders like deterministic checks (e.g., check if tools were called or if a final answer matches ground truth) because they are simple and easy to debug. For subjective criteria (e.g., code style) we need model-based graders (LLM-as-a-Judge) or human review. The human evaluation process should be calibrated, and we should monitor the level of agreement between LLM judges and human experts.

(5) Build the evaluation harness. We must be able to execute the evaluation efficiently and repeatably. To do this, we can create an evaluation harness that:
- Runs the agent in a realistic (but controlled) setup.
- Collects the transcript, including tool calls and intermediate outputs.
- Captures the final outcome.
The agent should ideally use the same scaffold, tools, and environment that are used in production during the evaluation process. Each trial should start from a fresh environment to avoid any failures caused by shared state or evaluation infrastructure issues.

(6) Inspect, iterate, and maintain the benchmark. Agent evaluations can become saturated quickly, so we should treat evaluation suites as living artifacts that continually improve in difficulty, diversity, and reliability. The best agent evaluations evolve continuously through new failure cases and ongoing maintenance.
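Steps (1), (4), and (5) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: `Task`, `Transcript`, `run_trial`, and `toy_agent` are invented names, the "database" is just a set inside a dict, and a real harness would run the production scaffold and tools instead of the stand-in agent.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    prompt: str
    expected_rows: set   # outcome goal: rows that must exist after the run
    required_tools: set  # process goal: tools that must appear in the transcript


@dataclass
class Transcript:
    tool_calls: list = field(default_factory=list)
    final_answer: str = ""


def outcome_grader(task: Task, env: dict) -> bool:
    """Deterministic check on the final state of the environment."""
    return task.expected_rows <= env["rows"]


def process_grader(task: Task, transcript: Transcript) -> bool:
    """Deterministic check on what the agent did along the way."""
    return task.required_tools <= set(transcript.tool_calls)


def run_trial(agent, task: Task) -> dict:
    env = {"rows": set()}  # fresh environment per trial, so no shared state
    transcript = agent(task.prompt, env)
    return {
        "task": task.name,
        "outcome": outcome_grader(task, env),
        "process": process_grader(task, transcript),
    }


def toy_agent(prompt: str, env: dict) -> Transcript:
    """Hypothetical stand-in agent: it only understands 'create ... <name>'."""
    transcript = Transcript()
    if "create" in prompt:
        transcript.tool_calls.append("db_insert")
        env["rows"].add(prompt.split()[-1])
    transcript.final_answer = "done"
    return transcript
```

Grading outcome and process separately keeps failures debuggable: a trial that passes the outcome check but fails the process check points at the transcript, not at the environment.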

5 Comments

Johan Baltzar

Co-Founder & CEO @ Steep | Hiring 🚀

7,262 followers 1y

Metric trees – the power tool every data leader needs 🔧

So you’ve defined your company key metrics when one day a big metric drops – revenue is down. What’s going on? Your CEO wants to understand why – ASAP!

Here’s where I found amazing value from using metric trees. Instead of that exercise spiraling into a confusing mess, with analysts and business folks looking into every metric, a few well-designed metric trees will help everyone focus and find the real underlying drivers.

A basic example:
• Your revenue is driven by two key levers: average order value and order volume.
• Order volume depends on how many active users you have and how well they’re converting.
• Conversion rate is impacted by landing page performance, page load speed, and traffic quality.

When revenue dips, this structure gives you a starting point. Instead of poking around in dashboards, you can follow the path: Did order volume drop? Was traffic quality bad, or did conversion dip? Mathematically, there are no other ways for revenue to drop.

Typically, you can find that one underlying driver is the culprit, allowing you to quickly rule out other hypotheses and focus efforts where it matters. It’s a shift from reactive analytics to proactive problem-solving.

Is your org using metric trees?
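The multiplicative part of the example tree can be written down directly. The Python sketch below assumes revenue = average order value × order volume and order volume = active users × conversion rate; the node names, the `TREE` structure, and the sample numbers are illustrative assumptions, and the drivers of conversion rate (landing page, load speed, traffic quality) are left out because they do not combine by simple multiplication.

```python
# Each parent metric is the product of its two children (assumed structure).
TREE = {
    "revenue": ("average_order_value", "order_volume"),
    "order_volume": ("active_users", "conversion_rate"),
}


def evaluate(node, leaves):
    """Value of a metric, computed bottom-up from the leaf metrics."""
    if node not in TREE:
        return leaves[node]
    left, right = TREE[node]
    return evaluate(left, leaves) * evaluate(right, leaves)


def trace_drop(node, before, after):
    """Follow, at each level, the child with the worst after/before ratio."""
    path = [node]
    while node in TREE:
        node = min(TREE[node],
                   key=lambda child: evaluate(child, after) / evaluate(child, before))
        path.append(node)
    return path


if __name__ == "__main__":
    before = {"average_order_value": 50.0, "active_users": 10_000, "conversion_rate": 0.04}
    after = {"average_order_value": 51.0, "active_users": 10_100, "conversion_rate": 0.03}
    print(" -> ".join(trace_drop("revenue", before, after)))
```

Because every parent is an exact product of its children, a drop at the root must show up in at least one child, which is what makes the walk down the tree exhaustive rather than a guess.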

13 Comments

EBINYO JACOB

Founder || Principal Civil Engineer (COREN Certified) || Project Engineer || Consultant Highway and structural || MNSE || Real Estate Consultant || Technical Report Writing Coach #Talk about #roads #culverts #structures

21,067 followers 1y

Final Inspection: The Key to Preparing the Last Milestone

Final inspection is non-negotiable. It is not a formality. It’s how you protect your last milestone as a consultant. Before the final payment drops: inspect, verify, and document, because there is no payment without proof.

We completed the final inspection for the renovation of a 6-classroom secondary school, which also included the staff room, principal’s office, and a detached toilet building. The inspection was a structured, technical review to ensure the contractor's works aligned with what was approved, designed, and paid for.

This is what we focused on:

1️⃣ Work Completed vs BEME Scope: We physically verified each line item on the BEME, from ceiling finishes to windows, doors, wall screeding, tiling, electrical fittings, painting, plumbing, and toilet installation. We ensured every detail was checked and documented.

2️⃣ Dimensions vs Drawings: Measurements were cross-checked against the final drawings submitted. Any discrepancies were flagged for clarification or correction.

3️⃣ Material Quality Assessment: We inspected the materials used, not just visually, but against expected quality standards. We assessed workmanship, alignment, finish level, plumbing flow (by test), and even paint consistency.

Final inspections like this help you achieve two major things:
☑️ Ensure the client is not paying for what was not done.
☑️ Ensure the contractor is paid promptly and fairly for what was done.

In construction supervision, we don’t guess.
🔸We verify.
🔶We document.
We report facts, not feelings.

Thanks again to the team, and to our client - the Local Government Chairman and the State Government, for trusting us with this assignment. As we prepare the last milestone documentation for payment, I’m reminded that diligence at this stage protects both client and contractor.

P.S. Follow EBINYO JACOB for updates on the project handover process and what goes into preparing a complete final milestone payment certificate. Let’s keep learning and raising the bar in our industry.

See more on what we focused on during the final inspection in the comment section.

#JacohdielEngineering #FinalInspection #ConstructionCloseout #SchoolRenovation #MilestoneCertification #BEME #SiteSupervision #EngineeringStandards #ProjectAccountability #LeadershipInConstruction


5 Comments

Wil van der Aalst

Alexander-von-Humboldt professor @ RWTH Aachen, Chief Scientist @ Celonis, Chair Process and Data Science, Fraunhofer FIT, IFIP Fellow, IEEE Fellow, ACM Fellow, also known as the "Godfather of Process Mining" :-)

26,921 followers 2w

How should we assess scientific performance?

For years, academia has struggled with the role of metrics. Citation counts, h-indices, and journal-based indicators have been criticized, often for good reasons. But the opposite extreme, replacing metrics with purely narrative assessments, is not the solution either.

In our new article, “Towards a more informed and balanced use of scientific performance metrics”, Jaap Denissen, Klaas Sijtsma and I argue for a more nuanced approach in the latest issue of #ResearchEvaluation. Download via https://lnkd.in/ekEMRy7r or https://lnkd.in/eqa-yTYx.

Our main message: quantitative and qualitative indicators should not be seen as enemies. They are complementary. Each gives an incomplete picture when used in isolation, and each can introduce biases if used carelessly.

We propose five psychometric criteria for assessing scientific performance:
- Fairness
- Standardization
- Objectivity
- Field comparability
- Predictive validity

These criteria help to evaluate not only bibliometric indicators, but also narrative CVs, peer review, and funding-selection procedures.

A key finding is that abandoning metrics altogether may reduce transparency and increase subjectivity. Instead, we should use better metrics, correct them for known biases, and combine them with structured qualitative information.

This is especially important for high-stakes decisions: hiring, promotion, awards, and research funding. Such decisions shape careers, institutions, and scientific fields for decades.

The article also discusses the Dutch NWO Vici scheme as a case study, showing how assessment procedures changed over time and how our criteria can be used to evaluate and improve them.

The goal is not “more metrics” or “less metrics.” The goal is better assessment.

Open access article: Towards a more informed and balanced use of scientific performance metrics
Research Evaluation, 2026
https://lnkd.in/ekEMRy7r

#ResearchEvaluation #Scientometrics #Bibliometrics #OpenScience #ResearchAssessment #AcademicCareers #SciencePolicy

Profile Areas of RWTH Aachen University Tilburg University

2 Comments
