Add more tests for sizeAdd node.#4258
Conversation
vanbasten23
commented
Dec 1, 2022
- Test SizeAdd::Lower function.
- Fix an issue with another existing test "test_simple_expand".
| # Exercise SizeAdd::Lower. | ||
| t4 = t3.expand(dyn_size) | ||
| self.assertEqual(t4.size(0), 3) | ||
| print(torch_xla._XLAC._get_xla_tensors_text([t4])) |
There was a problem hiding this comment.
I wasn't able to test SizeAdd::ToString() method. Let me try to look further.
There was a problem hiding this comment.
what's the output of torch_xla._XLAC._get_xla_tensors_text([t4]) here? we want to do a self.assertIn for the IR here.
There was a problem hiding this comment.
The output of torch_xla._XLAC._get_xla_tensors_text([t4]) here is
IR {
%0 = s32[] prim::Constant(), value=2
%1 = f32[] prim::Constant(), value=1
%2 = f32[1]{0} aten::view(%1), output_size=(1)
%3 = f32[] prim::Constant(), value=0
%4 = f32[5,2]{1,0} aten::expand(%3), size=(5, 2)
%5 = f32[1,2]{1,0} xla::generic_slice(%4), base_indices=(3, 0), sizes=(1, 2)
%6 = f32[2]{0} aten::view(%5), output_size=(2)
%7 = f32[2]{0} xla::update_slice(%6, %2), base_indices=(0)
%8 = f32[1,2]{1,0} aten::view(%7), output_size=(1, 2)
%9 = f32[5,2]{1,0} xla::update_slice(%4, %8), base_indices=(3, 0)
%10 = (s32[<=10,2]{1,0}, s32[]) aten::nonzero(%9), num_outputs=2
%11 = s32[] aten::size(%10.0)
%12 = s64[] aten::add(%11, %0)
%13 = f32[] prim::Constant(), value=1
%14 = f32[1]{0} aten::expand(%13), size=(1)
%15 = f32[<=12]{0} aten::expand(%14, %12), size=(12), dynamic_dims=(1), ROOT=0
}
It seems hard to test in the python code. But how about let me test in
Line 165 in 1685a28
There was a problem hiding this comment.
You can just assert if f32[<=12]{0} aten::expand is in the torch_xla._XLAC._get_xla_tensors_text([t4])
There was a problem hiding this comment.
What I am trying to test is SizeAdd::ToString() instead of expand. I set SizeAdd::String() to return "SizeAdd for op aten::add" but as you can see the string does not appear in torch_xla._XLAC._get_xla_tensors_text([t4]).
So I don't think torch_xla._XLAC._get_xla_tensors_text([t4]) will exercise SizeAdd::ToString(). That's why I suggest to test in test/cpp/test_ir.cpp in this pr.
|
Ok, now the test that I added The hlo and the ir graph. The full error message also doesn't show anymore useful info. Via gdb, I was able to break at xla/torch_xla/csrc/ops/dynamic_ir.cpp Line 90 in 1685a28 input1 has type s32 and input2 has type s64, hence the type mismatch. However, I couldn't find where input1 and input2 come from because p input1 doesn't reveal any useful info.
So my question is:
|
JackCaoG
left a comment
There was a problem hiding this comment.
Let's dig a bit into the
======================================================================
ERROR: test_sizeAdd (__main__.TestDynamicShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/pytorch/xla/test/test_dynamic_shapes.py", line 48, in test_sizeAdd
self.assertEqual(t4.size(0), 3)
File "/opt/conda/lib/python3.7/unittest/case.py", line 852, in assertEqual
assertion_func(first, second, msg=msg)
File "/opt/conda/lib/python3.7/unittest/case.py", line 842, in _baseAssertEqual
if not first == second:
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 212, in __bool__
return self.node.bool_()
RuntimeError: Error while lowering: SizeAdd for op aten::add
XLA builder error: INVALID_ARGUMENT: Binary op add with different element types: s32[] and s64[].:
We need to identify where the s32 coming from. Dumping the IR of t4 should help. Maybe we should make SizeAdd to handle the casting.
I already did. Here is the IR of t4. Specifically Both operands of aten::add are type s32. Via gdb, I was able to find #4258 (comment). |
|
@JackCaoG I followed your suggestion to create a shorter repo |
|
I will try to repo sometime today |
|
I am able to repo the error, will post my debugging steps tonight. |
|
After verify that expand is the source of the error, I dump the IR of the weirdly both input seems to be s64, and this is the only One possibility is that and I saw and if I do so I was wrong, it is actually coming from |
looks suspious, if I have to guess, |
|
ok confirmed, if I do (which avoids I am able to see which suggest Size is actually a s32 even on CPU, but we force it to be S64 in xla/torch_xla/csrc/tensor_util.cpp Lines 1253 to 1259 in d7d0479 |
|
with issue is gone. That being said |
|
Here is the full diff |
|
Thanks for looking into it. So to confirm, the |
|
my patch will force all size to be s32, as long as make sure |
But regarding your comment "looks suspious, if I have to guess, GetDimensionSize also returns s32, but we incorrectly flag it to be s64.", we don't have to change the part that flags it to be s64, right? |
|
no, we need to make sure dimensionType is always s32. However for the case of SizeAdd, if we have a |
|
Recap: What we concluded is "Size is actually a s32 even on CPU, but we force it to be S64 in here. To confirm:
just so that it'll always return s32 for size operation? |
89475b0 to
70a9a79
Compare
JackCaoG
left a comment
There was a problem hiding this comment.
mostly lgtm, minor nits