This custom_node for ComfyUI adds one-click "Virtual VRAM" for any UNet and CLIP loader, as well as MultiGPU integration in WanVideoWrapper, managing the offload/block swap of layers to DRAM *or* VRAM to maximize the latent space available on your card. Also includes nodes for loading entire components (UNet, CLIP, VAE) directly onto the device you choose.
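The block-swap idea above can be sketched in plain Python. This is a toy simulation of the scheduling pattern only, not the node's actual implementation: keep a budget of layer "blocks" resident on the fast device (VRAM), paging the rest out to slow memory (DRAM). All names and sizes are illustrative.

```python
# Toy block-swap scheduler: only `vram_budget` layer blocks may be
# resident at once; the oldest resident block is evicted to DRAM when
# a new one must be loaded. Purely illustrative, no real devices.

def run_with_block_swap(num_layers, vram_budget):
    """Return the ordered schedule of (load, compute, evict) events."""
    resident = []   # layers currently "in VRAM", oldest first
    events = []
    for layer in range(num_layers):
        if layer not in resident:
            if len(resident) >= vram_budget:
                evicted = resident.pop(0)      # evict oldest block to DRAM
                events.append(("evict", evicted))
            resident.append(layer)             # copy block DRAM -> VRAM
            events.append(("load", layer))
        events.append(("compute", layer))
    return events

events = run_with_block_swap(num_layers=4, vram_budget=2)
```

The trade-off the real node manages is the same: a smaller resident budget frees VRAM for latents at the cost of extra transfer events.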
Privacy-first AI ecosystem for Android. Run GGUF models offline or access 100+ cloud models via OpenRouter. Features 11 premium offline voices, extensible plugins, and dynamic DataHub for context injection. No subscriptions, no data harvesting—just AI on your terms.
Run Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1, and other state-of-the-art language models locally with scorching-fast performance. Inferno provides an intuitive CLI and an OpenAI/Ollama-compatible API, putting the inferno of AI innovation directly in your hands.
Creates symbolic links to the GGUF model files in Ollama's blob store so they can be used in other applications such as llama.cpp, Jan, LM Studio, etc.
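The core trick such a tool relies on can be sketched as follows. It assumes the common Ollama layout (`~/.ollama/models/{manifests,blobs}`), where a manifest layer whose mediaType ends in `.image.model` points at the GGUF blob, stored as `sha256-<hex>` for digest `sha256:<hex>`; verify these assumptions against your own installation.

```python
# Sketch: expose an Ollama model blob as a .gguf symlink.
# The manifest/blob layout described in the comments is an assumption
# about Ollama's on-disk format, not a documented stable API.
import json
import os
from pathlib import Path

def link_gguf(manifest_path, blobs_dir, out_path):
    """Symlink the model blob named by a manifest to out_path."""
    manifest = json.loads(Path(manifest_path).read_text())
    for layer in manifest["layers"]:
        if layer["mediaType"].endswith(".image.model"):
            # Blob files are named "sha256-<hex>" for digest "sha256:<hex>"
            blob = Path(blobs_dir) / layer["digest"].replace(":", "-")
            out = Path(out_path)
            if out.is_symlink() or out.exists():
                out.unlink()
            os.symlink(blob, out)
            return out
    raise ValueError("no model layer found in manifest")
```

Pointing llama.cpp, Jan, or LM Studio at the resulting `.gguf` link avoids duplicating multi-gigabyte model files on disk.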
🤖 Serverless AI on Hedera! Smart contracts run LLMs ⚙️ Distilled GGUF on IPFS. Pay crypto 🪙 → get replies 💬. 99.9% cheaper 💸 than big-tech APIs. Decentralized ⛓️ no censorship. “Gov can’t stop it. Nobody can.” 🧠 – Sir Charles Spikes, Cincinnati, Ohio | TELEGRAM: @SirGODSATANAGI ✉️ SirCharlesspikes5@gmail.com #AGI #NoPC
A high-performance local service engine for large language models, supporting various open-source models and providing OpenAI-compatible API interfaces with streaming output support.
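An OpenAI-compatible streaming endpoint delivers tokens as server-sent events. The helper below parses that wire format (the `data: {...}` / `data: [DONE]` framing is the OpenAI convention); any endpoint URL or model name you pair it with is your own deployment detail.

```python
# Sketch: extract content deltas from OpenAI-style streaming (SSE)
# chat-completion events. Works on the raw event lines, so it can be
# exercised without a running server.
import json

def iter_stream_tokens(sse_lines):
    """Yield content deltas from OpenAI-style SSE event lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                      # skip comments / blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

With the official `openai` client you would instead pass `stream=True` and iterate the returned chunks; this helper only shows what crosses the wire.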
Automate bulk downloads of Hugging Face LLMs with retry logic, manifest export, checksum validation, and usage reporting. Ideal for managing GGUF models at scale.
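The retry-plus-checksum pattern at the heart of such a tool can be sketched as below. `hf_hub_download` is the real `huggingface_hub` entry point; the repo id and filename in the guarded section are placeholders.

```python
# Sketch: download with retries, then verify the file by SHA-256.
# The helpers are generic; only the __main__ block touches the Hub.
import hashlib
import time

def sha256_of(path):
    """Hash a file in 1 MiB blocks to avoid loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def with_retries(fn, attempts=3, delay=0.0):
    """Call fn(), retrying on any exception up to `attempts` times."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

if __name__ == "__main__":
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    path = with_retries(lambda: hf_hub_download(
        "some-org/Some-Model-GGUF",     # placeholder repo id
        "model.Q4_K_M.gguf"))           # placeholder filename
    print(path, sha256_of(path))
```

Recording each file's digest in a manifest at download time is what makes later checksum validation and usage reporting possible.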
This repository demonstrates structured JSON generation with streaming output using outlines and llama-cpp-python: llama.cpp handles local model inference, and outlines constrains the output to a schema.
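The shape of that setup is sketched below. The schema is a plain JSON Schema; the guarded section follows the outlines 0.x style, but the exact API differs across outlines versions and the repo id/filename are placeholders, so treat it as a hedged outline rather than a drop-in example.

```python
# Sketch: a JSON schema plus hedged 0.x-style outlines usage.
# Only the schema is exercised here; model loading needs a GGUF file.
import json

# JSON Schema the generator is constrained to satisfy
PERSON_SCHEMA = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
})

if __name__ == "__main__":
    # pip install outlines llama-cpp-python; API varies by version
    from outlines import models, generate
    model = models.llamacpp(
        "some-org/Some-Model-GGUF",    # placeholder repo id
        "model.Q4_K_M.gguf")           # placeholder filename
    generator = generate.json(model, PERSON_SCHEMA)
    # Streaming variants (e.g. generator.stream(...)) depend on version
    print(generator("Describe a person named Ada."))
```

Because the schema constrains decoding itself, the streamed output is guaranteed to parse as the declared object once complete.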