
Conversation

@mernst
Contributor

@mernst mernst commented Aug 3, 2019

No description provided.

@bridgekeeper

bridgekeeper bot commented Aug 3, 2019

Welcome to the OpenJDK organization on GitHub!

This repository is currently a read-only git mirror of the official Mercurial repository located at http://hg.openjdk.java.net/jdk/jdk. As such, we are not currently accepting pull requests here. If you would like to contribute to the OpenJDK project, please see http://openjdk.java.net/contribute/ on how to proceed.

This pull request will be automatically closed.

@bridgekeeper bridgekeeper bot closed this Aug 3, 2019
@mernst
Contributor Author

mernst commented Aug 3, 2019

Sorry, pull request against wrong fork.

tschatzl added a commit to tschatzl/jdk that referenced this pull request Oct 16, 2020
plevart pushed a commit to plevart/jdk that referenced this pull request Oct 16, 2020
…mance

Performance improvements for Proxy::invokeDefaultMethod
valeriepeng added a commit to valeriepeng/jdk that referenced this pull request Dec 15, 2020
…KCS openjdk#1

Enhanced RSA KeyFactory impl of SunRsaSign and SunPKCS11 providers to accept RSA keys in PKCS#1 format and encoding
@wangweij wangweij mentioned this pull request Jan 21, 2021
openjdk-notifier bot pushed a commit that referenced this pull request Apr 15, 2021
Update JohnTortugo fork
openjdk-notifier bot pushed a commit that referenced this pull request Apr 29, 2021
snake66 pushed a commit to snake66/jdk that referenced this pull request Jan 3, 2025
Use .dt for generated dtrace file name suffix
snake66 pushed a commit to snake66/jdk that referenced this pull request Jan 3, 2025
Fix running tests and build for gcc
oraluben pushed a commit to oraluben/jdk that referenced this pull request Feb 5, 2025
oraluben pushed a commit to oraluben/jdk that referenced this pull request Feb 5, 2025
openjdk-notifier bot pushed a commit that referenced this pull request Feb 27, 2025
Fix so skipped tests are not considered failures
oraluben pushed a commit to oraluben/jdk that referenced this pull request Apr 30, 2025
oraluben pushed a commit to oraluben/jdk that referenced this pull request Apr 30, 2025
pf0n referenced this pull request in pf0n/jdk Jul 9, 2025
Initial source import from internal repo.
openjdk-notifier bot pushed a commit that referenced this pull request Aug 14, 2025
openjdk-notifier bot pushed a commit that referenced this pull request Aug 15, 2025
coleenp added a commit to coleenp/jdk that referenced this pull request Sep 19, 2025
jdksjolen added a commit to jdksjolen/jdk that referenced this pull request Oct 16, 2025
franferrax added a commit to franferrax/jdk that referenced this pull request Oct 16, 2025
RH2023467: Enable FIPS keys export

Co-Authored-By: Martin Balao <mbalao@redhat.com>
Co-Authored-By: Alex Kashchenko <akashche@redhat.com>
erifan added a commit to erifan/jdk that referenced this pull request Jan 22, 2026
While optimizing some VectorMask-related APIs, we found an optimization
opportunity involving the `cpy (immediate, zeroing)` instruction [1].
Implementing the functionality of this instruction with the `cpy
(immediate, merging)` instruction [2] instead leads to better performance.
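
A minimal lane-level sketch of why the substitution is safe, assuming vectors and predicates can be modelled as Python lists (the function names are hypothetical, not HotSpot or Arm APIs): the zeroing form writes 0 to inactive lanes, while the merging form leaves inactive lanes untouched, so the merging form only matches once the destination register has been cleared.

```python
# Hypothetical lane-level model of SVE CPY (immediate); not HotSpot code.
# A vector is a list of lane values, a predicate is a list of booleans.


def cpy_zeroing(pred, imm):
    """cpy zd.T, pg/z, #imm: active lanes get imm, inactive lanes get 0."""
    return [imm if active else 0 for active in pred]


def cpy_merging(dst, pred, imm):
    """cpy zd.T, pg/m, #imm: active lanes get imm, inactive lanes keep dst."""
    return [imm if active else old for active, old in zip(pred, dst)]


def movi_zero(num_lanes):
    """movi vd.2d, #0, modelled as clearing every lane of the register."""
    return [0] * num_lanes


def movi_then_cpy_merging(pred, imm):
    """Replacement sequence: clear the register, then merge imm into it."""
    return cpy_merging(movi_zero(len(pred)), pred, imm)


pred = [True, False, True, False]
print(cpy_zeroing(pred, 1))                # [1, 0, 1, 0]
print(movi_then_cpy_merging(pred, 1))      # [1, 0, 1, 0]
print(cpy_merging([7, 7, 7, 7], pred, 1))  # [1, 7, 1, 7]: stale lanes survive
```

The last line shows why the `movi` is required: without it, inactive lanes would keep whatever the register held before. Modelling `movi` on a V register as clearing every lane relies on AdvSIMD writes zeroing the upper bits of the corresponding Z register; on the 128-bit SVE machines measured below, the two registers are the same width anyway.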

Currently the `cpy (imm, zeroing)` instruction is used in code generated
for `VectorStoreMaskNode` and `VectorReinterpretNode`. This optimization
therefore potentially benefits every Vector API method that generates
these two IR nodes, such as `VectorMask.intoArray()` and `VectorMask.toLong()`.
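
To make the connection to the Java API concrete, here is a hedged model, assuming `VectorStoreMaskNode` materializes a mask as a vector with 1 in active lanes and 0 elsewhere (the effect of `cpy ..., pg/z, #1`). `VectorMask.toLong()` packs lane *n* into bit *n* of the result; the helper names below are hypothetical.

```python
# Hypothetical model of mask materialization; not HotSpot or JDK code.


def store_mask(pred):
    """0/1 lanes, as produced by `cpy zd.T, pg/z, #1` (assumed lowering)."""
    return [1 if active else 0 for active in pred]


def mask_into_array(pred):
    """Model of VectorMask.intoArray(boolean[], int): one boolean per lane."""
    return [lane == 1 for lane in store_mask(pred)]


def mask_to_long(pred):
    """Model of VectorMask.toLong(): lane n sets bit n of the result."""
    assert len(pred) <= 64, "this sketch only handles masks of up to 64 lanes"
    bits = 0
    for n, lane in enumerate(store_mask(pred)):
        bits |= lane << n
    return bits


pred = [True, False, True, True]
print(mask_into_array(pred))  # [True, False, True, True]
print(mask_to_long(pred))     # 13 (binary 1101)
```

Under this assumption, any speedup in producing the 0/1 lanes carries through to both results, which is presumably why these two APIs are named as beneficiaries.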

Microbenchmarks show that this change brings a performance uplift of
**11%** to **33%**, depending on the specific operation and data type.

The specific changes in this PR:
1. Achieve the functionality of the `cpy (imm, zeroing)` instruction
with the `movi + cpy (imm, merging)` instructions in assembler:
```
cpy  z17.d, p1/z, #1 =>

movi v17.2d, #0       // this instruction is zero cost
cpy  z17.d, p1/m, #1
```

2. Add a new option `PreferSVEMergingModeCPY` that controls whether
this optimization is applied.
- This option belongs to the Arch product category.
- The default value is true on Neoverse-V1/V2, where the improvement
  has been confirmed, and false elsewhere.
- When its value is true, the change is applied.

3. Add a jtreg test to verify the behavior of this option.
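
The choice that `PreferSVEMergingModeCPY` controls can be sketched as below. This is a hypothetical Python model that only produces assembly text; HotSpot's real assembler is C++, and the function names and CPU-model strings are illustrative assumptions.

```python
# Hypothetical model of the PreferSVEMergingModeCPY switch; not HotSpot code.


def default_prefer_merging(cpu_model):
    """Default described in the PR: true on Neoverse-V1/V2, false elsewhere."""
    return cpu_model in {"Neoverse-V1", "Neoverse-V2"}


def emit_cpy_imm_zeroing(zd, size, pg, imm, prefer_merging):
    """Return instructions computing zd = pg ? imm : 0 for each lane."""
    if prefer_merging:
        vd = "v" + zd[1:]  # z17 and v17 name the same physical register
        return [f"movi {vd}.2d, #0", f"cpy  {zd}.{size}, {pg}/m, #{imm}"]
    return [f"cpy  {zd}.{size}, {pg}/z, #{imm}"]


for cpu in ("Neoverse-V2", "Neoverse-N2"):
    flag = default_prefer_merging(cpu)
    print(cpu, emit_cpy_imm_zeroing("z17", "d", "p1", 1, flag))
```

As a boolean product flag, it would presumably be overridden with the usual `-XX:+PreferSVEMergingModeCPY` / `-XX:-PreferSVEMergingModeCPY` syntax, which is also what a jtreg test would pass to exercise both code paths.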

This PR was tested on aarch64 and x86 machines with different
configurations, and all tests passed.

JMH benchmarks:

On an NVIDIA Grace (Neoverse-V2) machine with 128-bit SVE2:
```
Benchmark           Unit    size    Before      Error     After       Error     Uplift
byteIndexInRange    ops/ms  7.00    471816.15   1125.96   473237.77   1593.92   1.00
byteIndexInRange    ops/ms  256.00  149654.21   416.57    149259.95   116.59    1.00
byteIndexInRange    ops/ms  259.00  177850.31   991.13    179785.19   1110.07   1.01
byteIndexInRange    ops/ms  512.00  133393.26   167.26    133484.61   281.83    1.00
doubleIndexInRange  ops/ms  7.00    302176.39   12848.8   299813.02   37.76     0.99
doubleIndexInRange  ops/ms  256.00  47831.93    56.70     46708.70    56.11     0.98
doubleIndexInRange  ops/ms  259.00  11550.02    27.95     15333.50    10.40     1.33
doubleIndexInRange  ops/ms  512.00  23687.76    61.65     23996.08    69.52     1.01
floatIndexInRange   ops/ms  7.00    412195.79   124.71    411770.23   78.73     1.00
floatIndexInRange   ops/ms  256.00  84479.98    70.69     84237.31    70.15     1.00
floatIndexInRange   ops/ms  259.00  22585.65    80.07     28296.21    7.98      1.25
floatIndexInRange   ops/ms  512.00  46902.99    51.60     46686.68    66.01     1.00
intIndexInRange     ops/ms  7.00    413411.70   50.59     420684.66   253.55    1.02
intIndexInRange     ops/ms  256.00  84652.41    191.45    86758.74    193.66    1.02
intIndexInRange     ops/ms  259.00  61825.20    291.71    62037.58    2355.43   1.00
intIndexInRange     ops/ms  512.00  46754.89    149.72    46972.06    40.13     1.00
longIndexInRange    ops/ms  7.00    329385.10   3292.7    318538.75   11103.9   0.97
longIndexInRange    ops/ms  256.00  46910.36    53.41     46927.82    138.29    1.00
longIndexInRange    ops/ms  259.00  33126.45    3210.07   32245.59    1347.58   0.97
longIndexInRange    ops/ms  512.00  23931.64    215.55    23805.65    312.39    0.99
shortIndexInRange   ops/ms  7.00    479265.67   1055.89   468452.89   433.15    0.98
shortIndexInRange   ops/ms  256.00  138657.38   317.72    138695.29   505.69    1.00
shortIndexInRange   ops/ms  259.00  113353.87   913.13    108912.75   1125.60   0.96
shortIndexInRange   ops/ms  512.00  84652.74    171.37    84447.01    91.99     1.00
```

On an AWS Graviton3 (Neoverse-V1) machine with 128-bit SVE1:
```
Benchmark           Unit    size    Before      Error     After       Error     Uplift
byteIndexInRange    ops/ms  7.00    320073.86   669.91    318557.87   1285.42   1.00
byteIndexInRange    ops/ms  256.00  119246.71   43.13     120658.01   28.27     1.01
byteIndexInRange    ops/ms  259.00  137664.23   12001.6   150378.59   70.41     1.09
byteIndexInRange    ops/ms  512.00  97187.13    18.60     95356.43    78.60     0.98
doubleIndexInRange  ops/ms  7.00    291076.68   603.08    287383.75   518.59    0.99
doubleIndexInRange  ops/ms  256.00  57473.11    123.34    61559.58    687.21    1.07
doubleIndexInRange  ops/ms  259.00  19396.73    40.03     22046.65    8.66      1.14
doubleIndexInRange  ops/ms  512.00  33619.28    33.58     34715.40    157.72    1.03
floatIndexInRange   ops/ms  7.00    317295.18   627.76    303857.78   465.78    0.96
floatIndexInRange   ops/ms  256.00  91734.27    183.61    91851.31    394.35    1.00
floatIndexInRange   ops/ms  259.00  38103.12    129.44    42237.38    92.17     1.11
floatIndexInRange   ops/ms  512.00  57219.58    366.00    57769.07    264.71    1.01
intIndexInRange     ops/ms  7.00    317063.25   830.81    304289.56   541.12    0.96
intIndexInRange     ops/ms  256.00  91535.60    315.36    98143.40    142.44    1.07
intIndexInRange     ops/ms  259.00  73827.89    472.28    73781.80    21.53     1.00
intIndexInRange     ops/ms  512.00  57552.09    20.19     62348.87    37.45     1.08
longIndexInRange    ops/ms  7.00    301886.14   381.89    301636.82   184.80    1.00
longIndexInRange    ops/ms  256.00  62246.77    69.29     62093.75    88.72     1.00
longIndexInRange    ops/ms  259.00  40642.36    861.47    41566.43    256.04    1.02
longIndexInRange    ops/ms  512.00  34850.70    154.39    34884.42    149.17    1.00
shortIndexInRange   ops/ms  7.00    318133.03   593.20    313469.12   528.73    0.99
shortIndexInRange   ops/ms  256.00  105019.58   21.38     105014.90   21.81     1.00
shortIndexInRange   ops/ms  259.00  116235.93   1985.27   118697.74   48.41     1.02
shortIndexInRange   ops/ms  512.00  91981.84    166.84    91874.82    78.28     1.00
```
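
The Uplift column is After divided by Before, rounded to two decimals. A quick recomputation for a few rows copied from the tables (reported ops/ms, nothing re-measured) confirms that the **33%** peak is Grace `doubleIndexInRange` at size 259 and that a value below 1.00 is a slowdown.

```python
# Recompute Uplift = After / Before for rows copied from the JMH tables above.

ROWS = {
    ("Grace", "doubleIndexInRange", 259): (11550.02, 15333.50),
    ("Grace", "floatIndexInRange", 259): (22585.65, 28296.21),
    ("Grace", "shortIndexInRange", 259): (113353.87, 108912.75),
    ("Graviton3", "doubleIndexInRange", 259): (19396.73, 22046.65),
    ("Graviton3", "floatIndexInRange", 259): (38103.12, 42237.38),
}


def uplift(before, after):
    """Throughput ratio rounded like the tables; above 1.00 means faster."""
    return round(after / before, 2)


for (machine, bench, size), (before, after) in ROWS.items():
    print(f"{machine:10} {bench:20} {size:4} {uplift(before, after):.2f}")
```

This prints 1.33, 1.25, 0.96, 1.14 and 1.11 for the five rows, matching the Uplift column.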

[1] https://developer.arm.com/documentation/ddi0602/2025-06/SVE-Instructions/CPY--immediate--zeroing---Copy-signed-integer-immediate-to-vector-elements--zeroing--?lang=en
[2] https://developer.arm.com/documentation/ddi0602/2025-12/SVE-Instructions/CPY--immediate--merging---Copy-signed-integer-immediate-to-vector-elements--merging--?lang=en