multimodal · GitHub Topics · GitHub

multimodal

Here are 817 public repositories matching this topic...

big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

  • Updated Oct 11, 2024
  • TypeScript
Fengshenbang-LM

Fengshenbang-LM is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, aiming to become the infrastructure for Chinese AIGC and cognitive intelligence.

  • Updated Aug 13, 2024
  • Python

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

  • Updated Oct 11, 2024
  • Python
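The entry above mentions PEFT (parameter-efficient finetuning) as an alternative to full-parameter finetuning. A minimal NumPy sketch of LoRA, one common PEFT method, illustrates the core idea: the pretrained weight matrix stays frozen while only a small low-rank update is trained. This is a generic illustration of the technique, not code from the listed repository; all matrix sizes here are arbitrary.

```python
import numpy as np

# LoRA sketch: freeze W (d_out x d_in), train a rank-r update B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def forward(x):
    # Base path plus scaled low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter is a no-op at the start of training.
assert np.allclose(forward(x), W @ x)

# Only A and B are trained, a small fraction of the full weight count.
lora_params = A.size + B.size
full_params = W.size
print(f"trainable: {lora_params} vs {full_params} ({lora_params / full_params:.1%})")
```

Because only `A` and `B` receive gradients, memory and compute for optimizer state drop sharply, which is what makes finetuning very large LLMs and MLLMs practical on modest hardware.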
