What about HuggingFace? It has basically everything. Kimi-k2-thinking is available, along with a config and a modeling class that appear to implement the model. The HuggingFace model page doesn't say whether training is supported, but the Transformers library does support models in the same architecture family, such as DeepSeek-V3. The fundamentals seem to be there; we might need some small changes, but how hard can it be?
Flatten it into a vector the same length as the number of blocks. m←∊exposed
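For readers unfamiliar with APL, `∊` (enlist) flattens a nested array into a flat vector. A rough Python analogue of `m←∊exposed` might look like this; the `exposed` data here is a made-up example, not taken from the original:

```python
def flatten(nested):
    """Recursively flatten nested lists into a single flat list,
    mimicking APL's enlist (∊)."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))  # descend into sub-lists
        else:
            flat.append(item)           # leaf element: keep as-is
    return flat

# Hypothetical nested structure of per-block flags
exposed = [[1, 0], [1], [0, 1, 1]]
m = flatten(exposed)
print(m)  # [1, 0, 1, 0, 1, 1]
```

In APL this is a single primitive; the recursive helper above is only a sketch of what that primitive does to arbitrarily nested data.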
There was a thread on Mastodon recently where Conor McBride was discussing some really cool stuff. I think the intent was to get at something more interesting, but this is the first place I've seen thinnings explained without too many complex trappings around them, and it really clicked with me. I think my mind was primed to see something useful to my current set of problems and solutions there. It unlocked a torrent of ideas related to lambda e-graphs and generalized union-finds that I'm quite excited to talk about in the next couple of posts.