Subtract operation with broadcasting support
type A = TensorStorage<Float32, [2, 3]>;
type B = TensorStorage<Float32, [1, 3]>;
type Result = Sub<A, B>; // Output shape: [2, 3]
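The output shape in the example follows the usual broadcasting rule: shapes are aligned from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The runtime sketch below illustrates that rule only. `broadcastShape` is a hypothetical helper written for this illustration; it is not part of the library, and `Sub` itself may well resolve shapes differently at the type level.

```typescript
// Hypothetical runtime helper illustrating the broadcasting rule.
// Not the library's implementation of Sub.
function broadcastShape(a: number[], b: number[]): number[] {
  const rank = Math.max(a.length, b.length);
  const out: number[] = new Array<number>(rank).fill(1);
  for (let i = 0; i < rank; i++) {
    // Align from the trailing dimension; a missing leading dimension counts as 1.
    const da = a[a.length - 1 - i] ?? 1;
    const db = b[b.length - 1 - i] ?? 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error(
        `Cannot broadcast shapes [${a.join(", ")}] and [${b.join(", ")}]`
      );
    }
    // A dimension of size 1 stretches to match the other operand.
    out[rank - 1 - i] = da === 1 ? db : da;
  }
  return out;
}

// Shapes from the example above: [2, 3] and [1, 3] broadcast to [2, 3].
const resultShape = broadcastShape([2, 3], [1, 3]);
console.log(resultShape);
```

Under this rule, subtracting a `[1, 3]` tensor from a `[2, 3]` tensor reuses the single row of the second operand for each row of the first, while shapes such as `[2, 3]` and `[4, 3]` are rejected.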