[eagle overlap spec] WIP: implement top-k > 1 in the overlap EAGLE worker (v2) #11839
Summary
Try to implement top_k > 1 in the EAGLE overlap speculative-decoding worker (v2), which is meant to replace v1 (the non-overlap worker).
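For context on what top_k > 1 changes: in the usual EAGLE tree-drafting scheme, each draft step keeps the `topk` best branches instead of a single chain, so the draft tokens form a tree. The sketch below is a minimal, hypothetical illustration of that scheme in plain Python; `expand_draft_tree` and `toy_probs` are made-up names, and this is not the code in this PR or sglang's implementation. It assumes candidate scores are cumulative products of draft probabilities and that, after `num_steps` steps, the best `num_draft_tokens - 1` candidates are kept, with the verified root token filling the remaining slot.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# A node is the path of draft token ids below the root (the last verified token).
Node = Tuple[int, ...]


def expand_draft_tree(
    next_probs: Callable[[Node], Sequence[float]],
    num_steps: int,
    topk: int,
    num_draft_tokens: int,
) -> Tuple[List[Node], Dict[Node, float]]:
    """Hypothetical sketch of EAGLE-style top-k tree drafting (not this PR's code)."""
    frontier: List[Tuple[float, Node]] = [(1.0, ())]  # start from the root only
    scores: Dict[Node, float] = {}  # every drafted candidate -> cumulative probability
    for _ in range(num_steps):
        children: List[Tuple[float, Node]] = []
        for parent_score, path in frontier:
            probs = next_probs(path)
            # Each frontier node proposes its own top-k next tokens.
            for tok in heapq.nlargest(topk, range(len(probs)), key=probs.__getitem__):
                children.append((parent_score * probs[tok], path + (tok,)))
        scores.update((path, score) for score, path in children)
        # Only the top-k children (by cumulative score) are expanded at the next step.
        frontier = heapq.nlargest(topk, children, key=lambda c: c[0])
    # Keep the best (num_draft_tokens - 1) candidates; the root fills the last slot.
    selected = heapq.nlargest(num_draft_tokens - 1, scores, key=scores.__getitem__)
    return selected, scores


def toy_probs(path: Node) -> List[float]:
    # Hypothetical stand-in for the draft model: the same 3-token distribution
    # regardless of the path, just to keep the tree shape easy to follow.
    return [0.6, 0.3, 0.1]


tree_selected, tree_scores = expand_draft_tree(toy_probs, num_steps=3, topk=2, num_draft_tokens=4)
chain_selected, chain_scores = expand_draft_tree(toy_probs, num_steps=3, topk=1, num_draft_tokens=4)
print(len(tree_scores), tree_selected)    # 10 [(0,), (0, 0), (1,)]
print(len(chain_scores), chain_selected)  # 3 [(0,), (0, 0), (0, 0, 0)]
```

With the accuracy-test settings used below (3 steps, topk 2, 4 draft tokens), this toy run drafts topk + (num_steps - 1) * topk² = 10 candidates and keeps 3 of them: both children of the root plus one grandchild. With topk 1 the same procedure degenerates to a single chain.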
Accuracy Tests
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server --dtype float16 --model-path unsloth/Meta-llama-3.1-8b-instruct --attention-backend triton --decode-log-interval 1 --disable-cuda-graph --speculative-algorithm EAGLE --speculative-draft-model-path lmsys/sglang-EAGLE-LLAMA3-instruct-8B --mem-fraction-static 0.8 --speculative-num-steps 3 --speculative-eagle-topk 2 --speculative-num-draft-tokens 4 --page-size 1 --disable-radix-cache --enable-beta-spec
Right now it still produces gibberish.
Baseline result
Current Status
Output is still gibberish; currently focusing on the Triton attention backend.
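Since the current focus is the Triton backend: one thing top_k > 1 changes on the attention side is that the draft tokens being verified form a tree, so a plain causal mask over them is no longer correct (a token must not attend to a sibling branch). The sketch below builds such a tree mask in plain Python, assuming draft tokens are given as token-id paths from the root with all ancestors present. `build_tree_mask` and `causal_mask` are hypothetical helpers for illustration, not sglang's Triton kernel or its actual mask layout.

```python
from typing import Dict, List, Sequence, Tuple

# A node is the path of draft token ids below the root; () is the root itself.
Node = Tuple[int, ...]


def build_tree_mask(nodes: Sequence[Node]) -> List[List[bool]]:
    """mask[i][j] is True when draft token i may attend to draft token j.

    Assumes every ancestor of each node is also in `nodes`. Covers only the
    draft tokens; the already-cached prefix stays fully visible to all of them.
    """
    index: Dict[Node, int] = {path: i for i, path in enumerate(nodes)}
    mask = [[False] * len(nodes) for _ in nodes]
    for i, path in enumerate(nodes):
        # A token sees every prefix of its own path: the root, its ancestors, itself.
        for depth in range(len(path) + 1):
            mask[i][index[path[:depth]]] = True
    return mask


def causal_mask(n: int) -> List[List[bool]]:
    # Mask for a plain chain of draft tokens: token i sees tokens 0..i.
    return [[j <= i for j in range(n)] for i in range(n)]


draft_nodes = [(), (0,), (0, 0), (1,)]  # root, two children, one grandchild
tree = build_tree_mask(draft_nodes)
for row in tree:
    print("".join("1" if v else "." for v in row))
# 1...
# 11..
# 111.
# 1..1
print(tree == causal_mask(len(draft_nodes)))  # False
```

The last row shows the difference: the second child of the root sees only the root and itself, whereas a causal mask would also let it see the first branch. For a pure chain (top_k == 1) the tree mask reduces to the causal one.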