• j4k3@lemmy.world
    5 months ago

    The only real choke point for present CPUs is the on-chip cache bus width. Increase the size of all three cache levels, L1 through L3, and add a few instructions that load bigger words across a wider bus, and suddenly the CPU can handle matrix math just fine: not max optimization, but like 80% fine. Hardware just moves slowly. For the bleeding edge, going from drawing board to consumer takes about 10 years. It is the most expensive commercial venture in all of human history.
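    To put rough numbers on the bandwidth argument, here's a back-of-envelope roofline sketch in Python. It is only an illustration under stated assumptions: the machine figures (PEAK_GFLOPS, BANDWIDTH_GBS) are made up, and it assumes an idealized cache where each matrix element crosses the bus exactly once, which real kernels don't achieve. Matrix-vector work does very little arithmetic per byte moved, so it is capped by how fast data moves. Big matrix-matrix products are capped by compute instead.

```python
def arithmetic_intensity(m, k, n, bytes_per_elem=4):
    """FLOPs per byte for C = A @ B with A (m x k) and B (k x n).

    Assumes an idealized cache: every element of A, B and C crosses
    the memory bus exactly once (float32 by default).
    """
    flops = 2 * m * k * n  # one multiply + one add per (i, j, p) triple
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: throughput is capped by compute or by bandwidth."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical machine numbers, chosen only for illustration.
PEAK_GFLOPS = 1000.0   # assumed peak compute
BANDWIDTH_GBS = 100.0  # assumed cache/memory bandwidth

matvec = arithmetic_intensity(4096, 4096, 1)     # about 0.5 FLOP/byte
matmul = arithmetic_intensity(4096, 4096, 4096)  # about 683 FLOP/byte

print(f"matrix-vector: {matvec:.2f} FLOP/B -> "
      f"{attainable_gflops(matvec, PEAK_GFLOPS, BANDWIDTH_GBS):.0f} GFLOP/s")
print(f"matrix-matrix: {matmul:.2f} FLOP/B -> "
      f"{attainable_gflops(matmul, PEAK_GFLOPS, BANDWIDTH_GBS):.0f} GFLOP/s")
```

    Doubling BANDWIDTH_GBS in this toy model doubles the matrix-vector figure but leaves the matrix-matrix one pinned at peak compute, which is the sense in which a wider bus is what matters for bandwidth-bound work.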

    I think the future is not going to be in the giant separate math coprocessor paradigm. It is kinda sad to see Intel pursuing this route again, but maybe I still lack the context to understand UALink’s intended scope. In the long term, integrating the changes needed to run matrix math efficiently on the CPU will win on the consumer front, and I imagine that flexibility would win in the data center too. Why have dedicated hardware when the same hardware could be used flexibly in any application space?