Krzysztof Drewniak

Contact

Email: krzysdrewniak@gmail.com
Work Email: Krzysztof.Drewniak@amd.com
Github: krzysz00
Resume: pdf
PGP: 837E 880C 11FF 99D9 F0C0 9BA1 2A14 2308 2388 E924
Ham radio: KF5SOQ

About me

I am a senior engineer on the machine learning compiler team at AMD. There, I am one of the technical leads on rocMLIR, an MLIR-based generator for high-performance machine learning kernels. See Software development for more details on my work and the contributions I have made to the MLIR and LLVM projects as a result.

I’m not looking for new work right now.

Previously, I was a PhD student at the University of Washington, as detailed in Research.

In addition to my software development work, I have written several published short stories in the Post-Self setting, as detailed in Writing.

Software development

My main role at AMD since I joined the company has been work on the rocMLIR project. Over that time, I have become one of the tech leads on the project, which is a generator for high-performance implementations of matrix multiplication and convolution on AMD GPUs, primarily for use in machine learning.

My work on rocMLIR has ranged from overseeing our integration into the MIGraphX graph inference engine, to substantial performance improvements, to critical refactorings. For example, I made our existing coordinate transformations concept a first-class IR object, which enabled powerful analyses such as determining how accesses to the memory underlying a tensor could best be vectorized.

I have made various substantial contributions to the wider MLIR and LLVM community, including:

Providing ergonomic wrappers around our matrix multiplication instructions
Adding support for gpu.printf
Integer range analysis for MLIR
The ptr addrspace(8) (buffer resource) and ptr addrspace(7) (buffer fat pointer) representations for AMD buffer resources in LLVM
Substantially expanding how the new properties system in MLIR could be used in declarative operation specifications

Writing

In addition to working in software, I have taken up science fiction writing.

My published work mainly consists of works in the Post-Self setting.

My 12,000-word short story “Sentences” is included in Marsh, and I am the primary author of two sections of the upcoming Idumea, namely those about The Dog and The Rabbit-Chaser.

I have written other short stories in the setting, including (but not limited to):

“Coffee Leak”, where we see the consequences of physics being less fixed in the System
“The Party”, where there is a dog at The Party that never ends
“Arise to Oath And Office”, showing another angle of the inciting incident of Marsh (and used as promotional material)

News

2024-06-12 Some of my Post-Self short stories will be incorporated into the upcoming book Idumea.

2024-02-01 My short story “Sentences” will be appearing in the upcoming Post-Self book Marsh.

2023-07-01 I have been promoted to the position of Senior Member of Technical Staff at AMD.

2022-10-06 I have presented rocMLIR’s coordinate transformations system to the MLIR Open Design Meeting. See the slides and recording.

2021-03-09 I’ve accepted a position as a Machine Learning Compiler engineer at AMD in Austin.

2020-04-09 The grant proposal “Multiscale Synthesis for Tensor Programs” for Facebook’s “Towards On-Device AI” RFP, which I helped write, was funded.

Research

I was a PhD student at the University of Washington Paul G. Allen School of Computer Science & Engineering for three years, from 2018 to 2021. I worked in the Programming Languages and Software Engineering group and was advised by Rastislav Bodik. My research focused on using program synthesis to improve the performance of numerical computations, such as matrix multiplication and convolution, that are used in machine learning and scientific computing.

I have developed a new synthesis technique for fixed-sized mathematical operators on accelerators (such as GPUs) that reduces the problem to synthesis over functional array programs and uses an abstract reachability analysis to quickly prune most incorrect partial candidates. This has the added advantage of reducing a large amount of the computation to Boolean matrix multiplication, enabling synthesis over larger spaces of functions as compared to previous work. In the future, this work will be integrated into a larger framework for synthesizing efficient code for machine learning models on accelerators.