Skip to content

Conversation

@clayrosenthal
Copy link
Contributor

Adding AWS cloudwatch as an option for external logging

closes #6167

added tests for the config and logging in tests/sky/logs/test_aws.py

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@clayrosenthal
Copy link
Contributor Author

@aylei I saw you assigned the original issue, hopefully you hadn't spent much time on it, but figured I'd take a crack

@aylei
Copy link
Collaborator

aylei commented Jul 22, 2025

Thanks! I have not looked into the issue yet😉

Copy link
Collaborator

@aylei aylei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @clayrosenthal ! The changes look good to me, just left some nits

sky/logs/aws.py Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 'tag':

Nit: looks like we don't need tag in inputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my bad, can remove this

sky/logs/aws.py Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shall we keep the defaults in config init for coherence?

sky/logs/aws.py Outdated
Comment on lines 17 to 18
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log_group_name: Optional[str] = None
log_stream_prefix: Optional[str] = 'skypilot-'
log_group_name: str = 'skypilot-logs'
log_stream_prefix: str = 'skypilot-'

Will this work?

@aylei
Copy link
Collaborator

aylei commented Jul 22, 2025

/smoke-test -k test_log_collection_to_aws_cloudwatch

@aylei
Copy link
Collaborator

aylei commented Jul 22, 2025

The UT failure looks irrelevant, I will take a look.

Could you fix the document build @clayrosenthal ?

@clayrosenthal clayrosenthal force-pushed the external-logs-to-cloudwatch branch from ddc69aa to c19f6d0 Compare July 22, 2025 19:05
Signed-off-by: Clay Rosenthal <clayros@amazon.com>
@clayrosenthal clayrosenthal force-pushed the external-logs-to-cloudwatch branch from c19f6d0 to 5f6bb7a Compare July 22, 2025 19:27
@clayrosenthal
Copy link
Contributor Author

Ok should be all fixed now, thanks @aylei !

@aylei
Copy link
Collaborator

aylei commented Jul 23, 2025

/smoke-test -k test_log_collection_to_aws_cloudwatch

Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Collaborator

aylei commented Jul 23, 2025

Hi @clayrosenthal ! I notice that there is an issue of fallback logs:

$ cat /tmp/fluentbit.log
...
[2025/07/23 12:08:26] [error] [/tmp/fluent-bit/plugins/out_file/file.c:553 errno=2] No such file or directory
[2025/07/23 12:08:26] [error] [output:file:file.1] error opening: /tmp/skypilot_logs_fallback.log/tail.

And looks like the logs will always get output to fallback logs:

$ cat /home/ubuntu/.sky/logging/fluentbit.yaml
pipeline:
  inputs:
  - name: tail
    path: ~/sky_logs/*/*.log
    path_key: log_path
    refresh_interval: 1
    processors:
      logs:
      - name: content_modifier
        action: upsert
        key: skypilot.cluster_name
        value: t-log-collection-1q-ca
      - name: content_modifier
        action: upsert
        key: skypilot.cluster_id
        value: t-log-collection-1q-ca-b673d4fd
  outputs:
  - name: cloudwatch_logs
    match: '*'
    region: us-east-1
    log_group_name: skypilot-logs
    log_stream_prefix: skypilot-t-log-collection-1q-ca-b673d4fd-
    auto_create_group: 'true'
  - name: file
    match: '*'
    path: /tmp/skypilot_logs_fallback.log
    format: out_file

Do you have any ideas about this? @clayrosenthal

@clayrosenthal
Copy link
Contributor Author

Huh, I can try and change the fallback log options, but maybe not necessary as the log files should already exist. But do your AWS access have permissions to write to cloudwatch?

@aylei
Copy link
Collaborator

aylei commented Jul 24, 2025

Huh, I can try and change the fallback log options, but maybe not necessary as the log files should already exist. But do your AWS access have permissions to write to cloudwatch?

Yes, the log was also sent to cloudwatch in this case.

@aylei
Copy link
Collaborator

aylei commented Jul 24, 2025

/smoke-test -k test_log_collection_to_aws_cloudwatch

Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Collaborator

aylei commented Jul 24, 2025

/smoke-test -k test_log_collection_to_aws_cloudwatch

@aylei
Copy link
Collaborator

aylei commented Jul 24, 2025

Let's merge this first and keep the fallback logs issue as a follow-up, thanks for the making this happen @clayrosenthal !

@aylei aylei merged commit 96bb129 into skypilot-org:master Jul 24, 2025
16 checks passed
@clayrosenthal clayrosenthal deleted the external-logs-to-cloudwatch branch July 24, 2025 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[External Logging] Add AWS cloudwatch as external logging destination

2 participants