feat: add opentelemetry counters for sent and acked messages #2532
agrawal-siddharth merged 1 commit into googleapis:main from
Conversation
| };
| static AttributeKey<String> telemetryKeyErrorCode = AttributeKey.stringKey("error_code");
| private Attributes telemetryAttributes;
| private long incomingRequestCountBuffered;
nit, I would group all of them under telemetryMetrics.
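For illustration, one way to do that grouping (class and field names in this sketch are hypothetical; the PR ends up introducing a nested OpenTelemetryMetrics class):

```java
// Class-body fragment of the writer/connection class (sketch only).
// All instrument handles live in one holder, so the outer class keeps a single
// telemetryMetrics field instead of many loose instrument fields.
private static final class TelemetryMetrics {
  io.opentelemetry.api.metrics.LongCounter ackedRequestCount;
  io.opentelemetry.api.metrics.LongCounter ackedRequestBytes;
  io.opentelemetry.api.metrics.LongCounter ackedRequestRows;
  io.opentelemetry.api.metrics.LongHistogram networkResponseLatency;
}

private final TelemetryMetrics telemetryMetrics = new TelemetryMetrics();
```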
| "Reports time taken in milliseconds for a response to arrive once a message has been sent over the network.") | ||
| .setExplicitBucketBoundariesAdvice(METRICS_LATENCY_BUCKETS) | ||
| .build(); | ||
| instrumentConnectionEstablishCount = |
I believe this can be derived from network_response_latency if you put connection_id as a metric field.
I have added writer_id as an attribute. I still have this metric, however, as it directly provides information about establishing a connection.
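A sketch of the recording path with that attribute (the helper method name is illustrative; telemetryKeyWriterId is the attribute key added in the diff, using io.opentelemetry.api.common types):

```java
// Record the round-trip latency tagged with writer_id, so per-writer (and
// effectively per-connection) latency series can be derived downstream.
static final AttributeKey<String> telemetryKeyWriterId = AttributeKey.stringKey("writer_id");

private void recordNetworkResponseLatency(long latencyMillis) {
  // networkResponseLatency is the LongHistogram built from
  // histogramBuilder("network_response_latency").ofLongs() during setup.
  networkResponseLatency.record(latencyMillis, Attributes.of(telemetryKeyWriterId, writerId));
}
```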
| measurement.record(length, getTelemetryAttributes());
| });
| writeMeter
| .gaugeBuilder("inflight_queue_length")
It will need connection_id as a metric field, otherwise it doesn't make much sense?
I have added writer_id as an attribute.
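A sketch of the callback-based gauge registration (inflightRequests stands in for however the worker tracks its queue; getTelemetryAttributes() is the shared attribute set, which now carries writer_id):

```java
// Registered once when the instruments are created; the SDK invokes the callback
// at collection time, so the queue length is sampled rather than pushed per event.
writeMeter
    .gaugeBuilder("inflight_queue_length")
    .ofLongs()
    .buildWithCallback(
        measurement -> measurement.record(inflightRequests.size(), getTelemetryAttributes()));
```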
| .build();
| instrumentSentRequestRows =
| writeMeter
| .counterBuilder("append_rows_sent")
I think we should maintain fewer metrics. Can we just add a "result" field to append_requests/rows/bytes?
Done. I have removed the following metrics: append_requests, append_request_bytes, append_rows, waiting_queue_length, connection_retry_count, append_requests_error, append_request_bytes_error, append_rows_error.
I now use the "error_code" attribute on each of the following metrics: append_requests_acked, append_request_bytes_acked, append_rows_acked, connection_end_count.
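As a rough illustration of that shape (the method and counter names below are stand-ins, not necessarily the PR's; telemetryKeyErrorCode is the key declared in the diff):

```java
// One set of "acked" counters, with error_code as an attribute, replaces the
// separate *_error counters. "OK" marks a successful append.
private void recordAppendAcked(long rows, long bytes, io.grpc.Status status) {
  Attributes attrs =
      getTelemetryAttributes().toBuilder()
          .put(telemetryKeyErrorCode, status.isOk() ? "OK" : status.getCode().name())
          .build();
  instrumentAckedRequestCount.add(1, attrs);
  instrumentAckedRequestRows.add(rows, attrs);
  instrumentAckedRequestBytes.add(bytes, attrs);
}
```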
| private LongCounter instrumentErrorRequestCount;
| private LongCounter instrumentErrorRequestSize;
| private LongCounter instrumentErrorRequestRows;
| private static final List<Long> METRICS_LATENCY_BUCKETS =
Are these millis/micros/nanos?
Renamed this to METRICS_MILLISECONDS_LATENCY_BUCKETS.
|
| private void periodicallyReportOpenTelemetryMetrics() {
| Duration durationSinceLastRefresh = Duration.between(instantLastSentMetrics, Instant.now());
| if (durationSinceLastRefresh.compareTo(METRICS_UPDATE_INTERVAL) > 0) {
Are metrics updates really that costly on the producer side that you don't just update metrics at time of event? In OpenCensus, flushing/updates were mostly an exporter concern.
I am testing with an exporter to Google Cloud Monitoring and encountered "exceeded max frequency" errors. To resolve this, I have switched to updating the instruments only once every second, which I believe should be sufficient for our needs.
Upon further inspection, I narrowed the issue down to the frequency of the exporter. I have restored all metrics to be instrumented in real time.
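For reference, that rate limit is usually handled on the SDK/exporter side rather than in the instrumentation, so instruments can keep recording at event time. A minimal sketch, assuming the google-cloud-opentelemetry metric exporter and an illustrative 60-second interval:

```java
// The PeriodicMetricReader controls how often points are pushed to Cloud Monitoring;
// tuning its interval avoids "exceeded max frequency" errors without throttling
// the instrumentation itself.
SdkMeterProvider meterProvider =
    SdkMeterProvider.builder()
        .registerMetricReader(
            PeriodicMetricReader.builder(
                    GoogleCloudMetricExporter.createWithDefaultConfiguration())
                .setInterval(Duration.ofSeconds(60))
                .build())
        .build();
```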
Force-pushed from 3caa290 to b6b30cf
Force-pushed from b6b30cf to 796ae3e
Force-pushed from 0b87d97 to ad164fb
| private LongCounter instrumentIncomingRequestSize;
| private LongCounter instrumentIncomingRequestRows;
| private static final List<Long> METRICS_MILLISECONDS_LATENCY_BUCKETS =
| ImmutableList.of(0L, 50L, 100L, 500L, 1000L, 5000L, 10000L, 20000L, 30000L, 60000L, 120000L);
@GaoleMeng Do these buckets look good to you? Do we need a bucket at 50L? Maybe add a 2000L?
This is too sparse. In the backend we use power-of-1.5 buckets, i.e. 1, 1.5, 1.5^2, 1.5^3, ... milliseconds. We once used powers of 4, but found that too sparse, so we reduced it to powers of 1.5. Could we do similar bucketing here?
The power of 1.5 sequence looks like this:
1, 2, 3, 5, 8, 11, 17, 26, 38, 58, 86, 130, 195, 292, 438, 657, 985, 1478, 2217, 3325, 4988, 7482, 11223, 16834, 25251, 37877, 56815, 85223, 127834, 191751, 287627, 431440, 647160, 970740, 1456110
Would it be useful to provide all of these buckets? Alternatively, we could just provide every other bucket, so the list looks like this:
1, 3, 8, 17, 38, 86, 195, 438, 985, 2217, 4988, 11223, 25251, 56815, 127834, 287627, 647160, 1456110
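A list like that does not have to be maintained by hand; a small sketch (using Guava's ImmutableList and java.util.LinkedHashSet) that produces the thinned sequence above, i.e. powers of 1.5 rounded to whole milliseconds, de-duplicated, with every other value kept:

```java
// Not the PR's code, just an illustration of how the boundaries above are derived.
static ImmutableList<Long> powerOf15Buckets(long maxMillis) {
  Set<Long> rounded = new LinkedHashSet<>();
  for (double boundary = 1; boundary <= maxMillis; boundary *= 1.5) {
    rounded.add(Math.round(boundary)); // 1, 2, 3, 5, 8, 11, 17, ...
  }
  ImmutableList.Builder<Long> thinned = ImmutableList.builder();
  int i = 0;
  for (Long value : rounded) {
    if (i++ % 2 == 0) { // keep every other boundary to halve the list
      thinned.add(value);
    }
  }
  return thinned.build();
}
```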
| if (!tableName.isEmpty()) {
| builder.put(telemetryKeyTableId, tableName);
| }
| builder.put(telemetryKeyWriterId, writerId);
Add some comment to buildOpenTelemetryAttributes, what kind of attributes it is building and does this apply to all metrics?
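Something along these lines would cover it (the Javadoc text is a suggested wording; the method body mirrors the quoted diff):

```java
/**
 * Builds the attribute set attached to every metric recorded by this writer:
 * the destination table id (when the writer was created with a table name) and
 * the writer_id identifying this stream connection.
 */
private Attributes buildOpenTelemetryAttributes() {
  AttributesBuilder builder = Attributes.builder();
  if (!tableName.isEmpty()) {
    builder.put(telemetryKeyTableId, tableName);
  }
  builder.put(telemetryKeyWriterId, writerId);
  return builder.build();
}
```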
| ImmutableList.of(0L, 50L, 100L, 500L, 1000L, 5000L, 10000L, 20000L, 30000L, 60000L, 120000L);
|
| private static final class OpenTelemetryMetrics {
| private LongCounter instrumentSentRequestCount;
Discussed with Gaole: we think the separate Sent and Ack counts may not make a significant difference. Let's just record Ack for now, for simplicity.
| writeMeter
| .counterBuilder("append_requests")
| .setDescription("Counts number of incoming requests")
| .counterBuilder("append_requests_acked")
This can be a TODO: I am wondering if it is possible to add a Retry attribute to the metric.
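A purely hypothetical sketch of what that could look like if it is added later (the is_retry key and helper below are not part of this PR):

```java
// Tag acked requests with whether they were retries, alongside error_code, so
// retried appends can be filtered out or compared later.
static final AttributeKey<Boolean> telemetryKeyIsRetry = AttributeKey.booleanKey("is_retry");

private void recordAppendAcked(boolean wasRetry, String errorCode) {
  instrumentAckedRequestCount.add(
      1,
      getTelemetryAttributes().toBuilder()
          .put(telemetryKeyErrorCode, errorCode)
          .put(telemetryKeyIsRetry, wasRetry)
          .build());
}
```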
Force-pushed from ad164fb to 73c9193
yirutang left a comment:
LGTM, please address the bucket length issue.
Force-pushed from 73c9193 to 0572a47
| // Buckets are based on a list of 1.5 ^ n
| private static final List<Long> METRICS_MILLISECONDS_LATENCY_BUCKETS =
| ImmutableList.of(
| 1L, 3L, 8L, 17L, 38L, 86L, 195L, 438L, 985L, 2217L, 4988L, 11223L, 25251L, 56815L,
Do we need to have 1L, 3L and 8L?
I have removed these. I now start at 0 (as the lowest bucket boundary) and end at 647160, which represents about 10 minutes.
Also add network latency, queue length and error counts. The metrics (other than error counts) are now reported periodically, every second.
Force-pushed from 0572a47 to 8ac0ed2
…pis#2532) Also add network latency, queue length and error counts. The metrics (other than error counts) are now reported periodically, every second.