Honorable mention
Memory-efficient CUDA inference kernels for quantized Llama-class models—popular in consumer GPU chat UIs.
Tags: llm, inference, cuda, quantization, local
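To make the "memory-efficient" claim concrete, here is a minimal sketch of the core idea such kernels rely on: weights stay packed as 4-bit integers in GPU memory and are dequantized on the fly inside a fused matrix-vector kernel, so a full fp16/fp32 weight matrix never materializes. The kernel name, packing layout, and per-row symmetric scale below are illustrative assumptions, not the tool's actual format.

```cuda
// Sketch: fused int4 dequantize + matrix-vector product.
// Assumed layout (hypothetical): two 4-bit weights per byte, one fp32 scale per row.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread block per output row; threads cooperatively reduce the dot product.
__global__ void int4_matvec(const uint8_t* packed,  // [rows, cols/2] packed 4-bit weights
                            const float*   scales,  // [rows] per-row dequant scale
                            const float*   x,       // [cols] input activations
                            float*         y,       // [rows] output
                            int cols) {
    int row = blockIdx.x;
    const uint8_t* w = packed + (size_t)row * (cols / 2);

    float partial = 0.0f;
    // Each thread handles a strided slice of the packed bytes.
    for (int i = threadIdx.x; i < cols / 2; i += blockDim.x) {
        uint8_t byte = w[i];
        // Unpack two 4-bit weights (offset-binary: 0..15 maps to -8..7).
        float w0 = (float)((int)(byte & 0x0F) - 8);
        float w1 = (float)((int)(byte >> 4)   - 8);
        partial += w0 * x[2 * i] + w1 * x[2 * i + 1];
    }

    // Block-wide tree reduction in shared memory.
    __shared__ float buf[256];
    buf[threadIdx.x] = partial;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) y[row] = buf[0] * scales[row];
}

int main() {
    const int rows = 4, cols = 8;
    // Toy data: all-ones activations; byte 0x99 packs two weights of +1 each.
    std::vector<uint8_t> h_packed(rows * cols / 2, 0x99);
    std::vector<float> h_scales(rows, 0.5f), h_x(cols, 1.0f), h_y(rows);

    uint8_t* d_packed; float *d_scales, *d_x, *d_y;
    cudaMalloc(&d_packed, h_packed.size());
    cudaMalloc(&d_scales, rows * sizeof(float));
    cudaMalloc(&d_x, cols * sizeof(float));
    cudaMalloc(&d_y, rows * sizeof(float));
    cudaMemcpy(d_packed, h_packed.data(), h_packed.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_scales, h_scales.data(), rows * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x.data(), cols * sizeof(float), cudaMemcpyHostToDevice);

    int4_matvec<<<rows, 256>>>(d_packed, d_scales, d_x, d_y, cols);
    cudaMemcpy(h_y.data(), d_y, rows * sizeof(float), cudaMemcpyDeviceToHost);
    for (int r = 0; r < rows; ++r) printf("y[%d] = %g\n", r, h_y[r]);  // expect 4.0

    cudaFree(d_packed); cudaFree(d_scales); cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

The design choice that matters for consumer GPUs is that the packed weights are read once at 4 bits per value and expanded only in registers, cutting weight memory and bandwidth roughly 4x versus fp16; production kernels layer on grouped scales, tensor-core paths, and better reductions, but the fused dequantize-inside-the-matvec structure is the same.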