otel: Export rate metrics as a single counter #5164
Conversation
09cd6ff to 130844d
```diff
@@ -53,12 +54,12 @@ func (o *Output) StopWithTestError(_ error) error {
 	o.logger.Debug("Stopping...")
 	defer o.logger.Debug("Stopped!")

 	o.periodicFlusher.Stop()
```
We have to stop the ingestion before stopping the flushing.
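For context, a minimal sketch of the ordering being described; the struct and field names are hypothetical and simplified, not the actual k6 output code:

```go
package sketch

import "github.com/sirupsen/logrus"

// stopper is a minimal stand-in for the components being shut down.
type stopper interface{ Stop() }

// Output is a simplified, hypothetical model of the otel output; the real
// struct in k6 has more fields and different names.
type Output struct {
	logger          logrus.FieldLogger
	sampleIngester  stopper // hypothetical: whatever consumes incoming samples
	periodicFlusher stopper // flushes buffered samples on a ticker
}

// StopWithTestError illustrates the ordering discussed above: stop ingesting
// new samples first, then stop the flusher, so the final flush can drain the
// buffer without racing against new writes.
func (o *Output) StopWithTestError(_ error) error {
	o.logger.Debug("Stopping...")
	defer o.logger.Debug("Stopped!")

	o.sampleIngester.Stop()  // 1. no new samples enter the buffer
	o.periodicFlusher.Stop() // 2. the final flush drains what is left
	return nil
}
```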
(Note this IS NOT blocking!)
I'm not sure whether it's related to this or not, but from time to time I see the following error:
```
INFO[0010] 2025/09/16 15:39:10 failed to upload metrics: context canceled: rpc error: code = Canceled desc = context canceled
```
This happens especially when using the old method, which I guess is more prone to fail because it needs to send two metrics instead of one.
Have you seen this before? Was that what made you change the order of these calls?
This is definitely NOT an issue introduced by your changes, because it happens with master as well, but I'm wondering whether there's something within this shutdown process we could do to completely avoid it.
Perhaps we should create an issue and investigate it later, and consider if we should include it as part of #5173.
Honestly, I'm not aware of it. Did you see it on local tests or across networks? I've run multiple local tests and never hit it.
I discovered the current bug mostly by writing the unit tests, because they are sensitive to whether the last batch has been correctly flushed or not.
The reported error smells like a bug related to bad management of the context in the recurring flush operation. It's a good idea to open a bug report, and we should tackle it as part of the graduation process.
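To make that hypothesis concrete, here is a minimal sketch (not the actual k6 code) of a recurring flusher where the final flush is detached from the already-canceled run context, which is the kind of context handling that could avoid the error above:

```go
package sketch

import (
	"context"
	"time"
)

// flusher is a hypothetical periodic flusher used only to illustrate the
// suspected failure mode: if the final flush reuses the run context, which is
// already canceled at shutdown, the exporter reports "context canceled" for
// the last upload.
type flusher struct {
	flush func(ctx context.Context) error
}

func (f *flusher) run(ctx context.Context, every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = f.flush(ctx)
		case <-ctx.Done():
			// Final flush: detach from the canceled run context so the last
			// batch can still be uploaded (context.WithoutCancel, Go 1.21+).
			_ = f.flush(context.WithoutCancel(ctx))
			return
		}
	}
}
```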
```go
	default:
		o.logger.Warnf("metric %q has unsupported metric type", entry.Metric.Name)
		return fmt.Errorf("metric %q has unsupported metric type", entry.Metric.Name)
```
If this happens, it's an error. There isn't a way to create an untyped metric.
Good catch! 👏🏻
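For illustration, a sketch of the kind of dispatch under discussion; the helper name and branch bodies are hypothetical, and only the default-as-error behaviour mirrors the diff above:

```go
package sketch

import (
	"fmt"

	"go.k6.io/k6/metrics"
)

// dispatchSample is a hypothetical helper showing the shape of the switch
// being discussed: every concrete k6 metric type gets its own branch, so the
// default branch is only reachable for malformed data and is treated as an
// error rather than a warning.
func dispatchSample(m *metrics.Metric) error {
	switch m.Type {
	case metrics.Counter:
		// emit as an OTel counter
	case metrics.Gauge:
		// emit as an OTel gauge
	case metrics.Trend:
		// emit as an OTel histogram
	case metrics.Rate:
		// emit as a single counter with the zero/nonzero condition attribute
	default:
		return fmt.Errorf("metric %q has unsupported metric type", m.Name)
	}
	return nil
}
```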
@oleiade @inancgumus I've decided to be less brutal and make this change a bit more gradual. We will have a feature flag for the next release, so if we overlooked any topic, or if users need more time, they can still use the previous method without breaking their workflow.
8f35f70 to ae9506c
joanlopez left a comment
I haven't carefully looked at all the changes in this PR, and I haven't had time to test it end-to-end yet either, but I'm wondering how this approach may help with one of the problems mentioned in issue #4573 (the one I personally reported), which is the lack of initialization, also discussed within k6 and around thresholds: #4787.
Beyond that, and thinking about the practicality of this approach, I'm not quite sure it really brings much value. Although it's true that using labels is likely a better approach than having two "different" counters, I'm not quite sure how, generally speaking, having a statically defined attribute with a pair of values is going to improve the user experience overall.
Let's suppose we have a metric that tracks (using Prometheus notation for simplicity):
- the "rate" of HTTP errors, like: http_requests_failed{type="zero|non-zero"}
- the "rate" of hit/miss in cache, like: cache_reads_miss{type="zero|non-zero"}

Do we really think that this really improves the experience much, versus:
- HTTP requests failed: http_requests_errors_occurred{} 0 and http_requests_errors_total{} 0
- Cache hits/misses: cache_reads_miss_occurred{} 0 and cache_reads_miss_total{} 0
? 🤔
Because, despite that not being perfect (I don't like it a lot either), semantically the former (the current one) looks a bit nicer to me than the new one. And note that this is NOT only a "naming" thing, because if you want to track "all" vs "some" (as "occurred" vs "total" aims to), I think you shouldn't use OTEL attributes, whose purpose I think is very close to Prometheus labels, usually used to categorize rather than to track rates (i.e. in what % of the total amount of events did X happen?).
As said, I don't like either option, and I don't have a better proposal, but I have doubts about changing what already exists in favor of something that doesn't really look much better. Instead, perhaps we could try to gather feedback and knowledge from more experienced users who are also more familiar with OTEL.
```diff
@@ -5,7 +5,7 @@ networks:

 services:
   prometheus:
-    image: prom/prometheus:v3.2.0
+    image: prom/prometheus:v3.5.0
```
Thanks for keeping this file up-to-date. Do you think we can also get rid of the version attribute, as per https://docs.docker.com/reference/compose-file/version-and-name/?
Done
Rate metrics are now exported as a single counter with a label that can have two values: zero and nonzero.
3441fe5 to edea45f
examples/docker-compose/opentelemetry/grafana/dashboards/k6-prometheus.json
Outdated
edea45f to 1766f00
| "editorMode": "code", | ||
| "exemplar": false, | ||
| "expr": "k6_checks_occurred{testid=~\"$testid\"} / k6_checks_total{testid=~\"$testid\"}", | ||
| "expr": "rate(k6_checks_total{testid=~\"$testid\", condition=\"nonzero\"}[$__rate_interval]) / rate(k6_checks_total{testid=~\"$testid\"}[$__rate_interval])", |
I think this is wrong, as I have tested it with successful and unsuccessful checks and it still displays 100%.
It might be something along the lines of:
```promql
sum by (check) (k6_checks_total{testid=~"$testid", condition="nonzero"}) / sum by (check) (k6_checks_total{testid=~"$testid"})
```
Do we have a test that verifies this behavior (whether it's incorrect or not)?
@inancgumus Do you mean an automated or a manual test? In terms of automation, we don't have one. If you want to test it manually, you can go to examples/docker-compose/opentelemetry/*, which is the path where this dashboard lives.
Yeah, I basically tested this manually.
I'm not sure if there's an easy way to automate that, or if it's worth the effort of automating such a thing (probably not?).
Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>
Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>
Generally looks good, great job @codebien! 👏🏻
I left multiple (minor) comments, which aren't really blocking, except for the query in the dashboard, which I think is wrong and could be misleading.
In general, I'm quite satisfied with the approach, and it really solves the reported issue 👍🏻 Although I have to admit that condition="zero|nonzero" still resonates in my head as unlikely to be the best/most intuitive candidate.
In other words, I'm not super convinced that, if you asked a large population of k6 users whether k6_http_req_failed{condition="zero"} reflects the number of HTTP requests that failed or the number that did not, a vast majority would answer correctly.
But overall it's true that this fixes the underlying issue, so I'm 👍🏻 for merging it, as it improves the current status 🙌🏻
After an internal chat, we decided to stick with it. If we ever find a better candidate, we can always add the new attribute and eventually get rid of the old one 👍🏻
mstoykov left a comment
LGTM 👍
Co-authored-by: Mihail Stoykov <312246+mstoykov@users.noreply.github.com>
inancgumus left a comment
LGTM
What?
Rate metrics are now exported as a single counter with a label that can have two values: zero and nonzero.
Why?
Most of the time, a single metric should be enough to query the main stats for this kind of metric. We are aware that it sometimes requires a bit more complexity in the case of aggregation, but that seems acceptable.
It automatically solves the issue with the cold start of the metric, because we always expect a value for at least one of nonzero and zero.
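To make the new export shape concrete, here is a rough sketch of recording a rate sample as a single counter with a condition attribute using the OpenTelemetry Go API; the helper and the metric-creation flow are illustrative, not the exact k6 implementation:

```go
package sketch

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordRate is a hypothetical helper: each k6 rate sample becomes a single
// counter increment whose "condition" attribute is "nonzero" when the sample
// value is non-zero and "zero" otherwise. The rate is then the ratio of the
// condition="nonzero" series over the sum of both series.
// (In a real exporter the counter would be created once and reused.)
func recordRate(ctx context.Context, name string, value float64) error {
	counter, err := otel.Meter("k6").Int64Counter(name)
	if err != nil {
		return err
	}
	condition := "zero"
	if value != 0 {
		condition = "nonzero"
	}
	counter.Add(ctx, 1, metric.WithAttributes(attribute.String("condition", condition)))
	return nil
}
```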
Checklist
I have run the linter locally (make check) and all pass.
Checklist: Documentation (only for k6 maintainers and if relevant)
Please do not merge this PR until the following items are filled out.
Related PR(s)/Issue(s)
Closes #4573