About the Author

... and soon-to-be collaborators

Photo from before the three-month sprint to get the project going. So far, this has been a single-person operation by:

Tomasz Kolinko.

I spend most of my time in Warsaw, building cool things, mentoring startups and so on. A few times a year I visit Berlin and San Francisco - feel free to chat me up if you're from one of these places.

I've also built a few other projects in the past:

- AppCodes - App Store SEO tool

- Orisi - possibly the first whitepaper on decentralised oracles

- Eveem.org - a smart contract decompiler.

- Freespace Warsaw - a short-lived climate tech venture builder, now an awesome mancave.

You can contact me via e-mail: kolinko@gmail.com,
and find me on Twitter/X: @kolinko.

Future hall of fame

This place is reserved for people brave enough to improve some of the following:

Fix the inference bug

Fix the 15ms overhead for each token

Wrap up the Q8 implementation

Reimplement Mixtral

Implement Effort in Llama.cpp / Ollama / MLX and other projects

Test on HellaSWAG and HumanEval

Scale up to preprocessing / parallelize

Special Thanks

Remco Bloemen

for building Cria, which was the starting-point implementation.

Przemysław "Psyho" Dębiak

for the tips on performance improvements and benchmarking

Kacper Wikieł

for support and last minute testing

Anna Kubasiak

for putting up with me and saying I'm so smart

the rest of my friends

for still inviting me to parties after a 3-month disappearance from the face of the planet.

and the open source community in general

for providing the resources to develop things like this.

Dave Liepmann

for making Tufte CSS, which allowed me to deploy this whole site in just one day - edwardtufte.github.io/tufte-css/

(the list will surely be updated soon)