Allan Lin's Work Portfolio

CentML

May 2023 - Dec 2023

I interned at CentML working on their open source compiler machine learning compiler, Hidet. Here is a record of my contributions to the project: All PRs.

Some highlights of my time there are:

UofT EcoSys Lab

Jan 2024 - Mar 2024

I worked with them a bit part-time before I realized that I can't multitask very well. I implemented VLLM inference kernels in CUDA for them, for LLM stuff. PR

UWaterloo Applied and Principled PL Group

May 2024 - Aug 2024

I wrote a prototype compiler for the Familia programming language, using LLVM with Rust. What's cool with Familia is that it solves the expression problem, the feature is that classes can be arbitrarily nested within other classes kind of like modules, and the inheritance relations between classes is preserved. Another feature is that classes do not hold data like traditional OOP but are witnesses to interfaces, so kind of like one more layer of indirection where the class is a name that references an implementation of an interface for a data-type.

I significantly over-estimated my abilities and 4 months is no where enough time to finish a compiler for such a complex language. Some lessons were that Rust is not a good language for compilers, since the abstract syntax tree and internal representations have cyclic references that Rust's ownership system cannot model well. I've settled on an ID system that I've observed other compilers in Rust using, sort of like an entity-component-system design in game engines where IDs are integers that index into a global data-store. This incurred a lot of boiler-plate and productivity overhead though.

Waabi

Jan 2025 - Aug 2025

This is a recap of what I did (can't show PRs cuz NDA):

Essentially Waabi uses the Detra model in their autonomous system, with some proprietary enhancements of course. I evaluated several alternative architectures for the camera backbone and feature fusion mechanism that combines the LiDAR features with the camera features. Some problems with their old architecture is that it's not very scalable in terms of receptive field; the 3D features around the vehicle is projected into a dense birds-eye-view (BEV) feature map that does not scale well with respects to receptive field increases, as each xy region of space is represented by a pixel. I had some ideas with multi-scale feature maps, dynamic voxelization, and integrating a new LiDAR backbone Point Transformer V3.

One of the team's sources of pain was the sparse-convolution library Torchsparse, which took a lot of developer time migrating across CUDA versions and inference time. I made a small but performant sparse convolution library that is both faster for the GPUs that we were using and much more maintainable, by leveraging Triton.

Sparse-conv performance forward pass Sparse-conv performance backward pass

I open sourced the work here: https://github.com/Aalanli/sparse-conv.

I used TensorRT to compile a large portion of Waabi's perception model, which was a huge pain in the ass as the python tooling was horrible; random segfaults, silent numerical errors, and very very unhelpful compiler errors.

I used the cuda-events API and hooked onto Waabi's onboard runtime environment to generate profile traces, which was kind of non-trivial given their build system and onboard tech mish-mash.

The perception model had some training instability issues where losses would diverge non-deterministically, eg. only for some runs. This was very challenging to root-cause as training runs were long, the training environment was a bit black-box and non-reproducible. I tracked down the issue to arise from unbounded activation spikes in the BEV, which is caused by sparse and irregular data on the road.

FPN debug

And probably an unstable training objective.

stability hypothesis

The fix involved a hard-clamp on the activations for key parts of the model which prevented these irregularities from occurring, and as a bonus we increased the learning rate and improved metrics. In the course of model debugging I developed a tool, the Dash app, that visualized PyTorch model graphs and helped a lot with finding a fix. The visualizer tool uses torch dynamo to capture the model graph, interprets it, and visualizes the connections.


About me

home