Multiply operation with broadcasting support
type A = TensorStorage<Float32, [2, 3]>;
type B = TensorStorage<Float32, [1, 3]>;
type Result = Mul<A, B>; // Output shape: [2, 3]
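`Mul<A, B>` resolves the output shape at the type level. The runtime sketch below illustrates the shape rule, assuming NumPy-style broadcasting: dimensions are aligned from the right, missing leading dimensions count as size 1, and a size-1 dimension is stretched to match the other operand. The helpers `broadcastShape`, `sourceOffset`, and `mulBroadcast` are hypothetical illustrations and are not part of this library's API.

```typescript
// Hypothetical runtime sketch of NumPy-style broadcasting for element-wise multiply.
// None of these helpers are part of the library; they only illustrate the shape rule.

type Shape = number[];

// Size of axis i when `shape` is right-aligned to `rank` dimensions
// (missing leading dimensions are treated as size 1).
function dimAt(shape: Shape, rank: number, i: number): number {
  const offset = rank - shape.length;
  return i < offset ? 1 : shape[i - offset];
}

// Compute the broadcast output shape, or throw if the shapes are incompatible.
function broadcastShape(a: Shape, b: Shape): Shape {
  const rank = Math.max(a.length, b.length);
  const out: Shape = [];
  for (let i = 0; i < rank; i++) {
    const da = dimAt(a, rank, i);
    const db = dimAt(b, rank, i);
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error(`Incompatible dimensions ${da} and ${db} at axis ${i}`);
    }
    // A size-1 dimension takes the size of the other operand.
    out.push(da === 1 ? db : da);
  }
  return out;
}

// Map an output multi-index to a flat row-major offset in a source tensor.
// Size-1 source dimensions always read index 0, so their element is repeated.
function sourceOffset(idx: number[], srcShape: Shape, rank: number): number {
  const offset = rank - srcShape.length;
  let flat = 0;
  for (let ax = 0; ax < srcShape.length; ax++) {
    const i = srcShape[ax] === 1 ? 0 : idx[ax + offset];
    flat = flat * srcShape[ax] + i;
  }
  return flat;
}

// Element-wise multiply of two flat row-major buffers with broadcasting.
function mulBroadcast(
  a: number[], aShape: Shape,
  b: number[], bShape: Shape,
): { data: number[]; shape: Shape } {
  const shape = broadcastShape(aShape, bShape);
  const rank = shape.length;
  const size = shape.reduce((p, d) => p * d, 1);
  const data: number[] = new Array<number>(size);
  const idx: number[] = new Array<number>(rank).fill(0); // current output multi-index
  for (let flat = 0; flat < size; flat++) {
    data[flat] = a[sourceOffset(idx, aShape, rank)] * b[sourceOffset(idx, bShape, rank)];
    // Advance the multi-index in row-major order (last axis fastest).
    for (let ax = rank - 1; ax >= 0; ax--) {
      idx[ax]++;
      if (idx[ax] < shape[ax]) break;
      idx[ax] = 0;
    }
  }
  return { data, shape };
}

// Usage mirroring the shapes above: [2, 3] * [1, 3] -> [2, 3].
const result = mulBroadcast([1, 2, 3, 4, 5, 6], [2, 3], [10, 20, 30], [1, 3]);
// result.shape is [2, 3]; result.data is [10, 40, 90, 40, 100, 180]
```

The single row of `B` is reused for every row of `A`, which is why the output keeps `A`'s shape of `[2, 3]`. Under the same assumed rules, shapes such as `[2, 3]` and `[2, 2]` cannot be broadcast and `broadcastShape` throws.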