It’s ternary, and I understand why they say “1-bit” instead, but it still bugs me that they call it that.
I’d love to see how low they can push this and still get spooky results. Something with ten million parameters could fit on a Macintosh Classic II - and if it ran at any speed worth calling interactive, it’d undercut a lot of loud complaints about energy use. Training takes a zillion watts. Using the model is like running a video game.
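For a rough sense of scale, here’s a back-of-envelope sketch, assuming the simple 2-bits-per-weight packing rather than the theoretical log2(3) ≈ 1.58-bit floor (the ten-million figure and the Classic II’s roughly 10 MB RAM ceiling are just the numbers from this thread):

    // Hypothetical back-of-envelope: 10M ternary weights packed 4 per byte.
    #include <cstdio>

    int main() {
        const long long params = 10000000;             // assumed model size
        const long long bytes  = params * 2 / 8;       // 2 bits per ternary weight
        std::printf("%.2f MB\n", bytes / (1024.0 * 1024.0));  // ~2.38 MB
    }

Activations and the runtime itself would add to that, of course, so treat it as a lower bound rather than a claim that it would actually run well on that hardware.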
Can someone tell me what’s meant by “kernel” here?
Does it mean you need to run your OS with a specific kernel from bitnet.cpp, or is it a different kind of ‘kernel’?
I think they mean whatever’s handling the model: a program you feed this inherently restricted format into, so it can take advantage of those limitations and run more efficiently.
Like if every number’s magnitude is 1 or 0, you don’t need to do floating-point multiplication.
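A minimal sketch of that idea (a toy illustration, not bitnet.cpp’s actual kernel; ternary_dot and the int8 weight layout are made up for the example): with weights restricted to -1, 0, or +1, each step of a dot product becomes add, subtract, or skip.

    // Toy illustration: weights in {-1, 0, +1} turn multiply-accumulate
    // into plain adds and subtracts.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    float ternary_dot(const std::vector<int8_t>& w, const std::vector<float>& x) {
        float acc = 0.0f;
        for (size_t i = 0; i < w.size(); ++i) {
            if (w[i] == 1)       acc += x[i];   // +1: add the activation
            else if (w[i] == -1) acc -= x[i];   // -1: subtract it
            // 0: skip, contributes nothing
        }
        return acc;  // a real kernel would also fold in a per-tensor scale factor
    }

    int main() {
        std::vector<int8_t> w = {1, 0, -1, 1};
        std::vector<float>  x = {0.5f, 2.0f, 1.5f, -1.0f};
        std::printf("%f\n", ternary_dot(w, x));  // 0.5 - 1.5 - 1.0 = -2.0
    }

Real kernels presumably also pack several weights per byte and vectorize the loop, but the no-multiplication idea is the same.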