Today's Open Source (2024-10-18): Fudan, Baidu, and Nanjing University Open-Source Hallo2
Explore cutting-edge AI open-source projects like Hallo2 for audio-driven animations, Align Anything for multimodal alignment, and more.
Here are some interesting AI open-source models and frameworks I wanted to share today:
Project: Hallo2
Hallo2 is an audio-driven portrait animation project capable of generating high-resolution, long-duration portrait animations.
It uses deep learning to animate a portrait so that it moves in sync with an input audio clip, with the goal of producing realistic results.
Developed by researchers from Fudan University, Baidu, and Nanjing University, the project releases its code and pre-trained models as open source for further development and use.
https://github.com/fudan-generative-vision/hallo2
Project: Align Anything
The Align Anything project aims to align large multimodal models with human intent and values through feedback-based training.
This modular framework supports fine-tuning models across various modalities, including text, images, audio, and video.
The project offers multiple alignment algorithms, and its code is designed so users can easily modify and customize it for different tasks.
https://github.com/PKU-Alignment/align-anything
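To make "feedback-based training" concrete, here is a minimal sketch of Direct Preference Optimization (DPO), one widely used objective that learns from pairs of preferred and rejected responses. This is an illustration of the general technique only; it assumes nothing about which algorithms Align Anything ships or how its code is organized, and the function names are hypothetical. Inputs are summed log-probabilities of each response under the model being trained (policy) and under a frozen reference model.

```python
import math

def softplus(x: float) -> float:
    # log(1 + e^x), written to avoid overflow for large |x|
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Hypothetical helper: DPO loss for one (chosen, rejected) preference pair."""
    # How much more the policy favors the chosen answer than the reference does
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) is equal to softplus(-beta * margin)
    return softplus(-beta * margin)

# With no preference shift, the loss is log(2); favoring the chosen answer lowers it.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(round(neutral, 4), round(improved, 4))
```

Minimizing this loss pushes the policy to raise the likelihood of human-preferred responses relative to the reference model, without training a separate reward model.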
Project: CtrLoRA
CtrLoRA is a scalable and efficient framework designed for controllable image generation.
It trains a shared base ControlNet on large datasets together with lightweight, condition-specific LoRAs. To support a new condition, only a new LoRA needs to be trained, which takes just a few images and a short time on a single GPU.
The project supports multi-condition generation and style transfer, offering pre-trained models and an online demo via Gradio.
https://github.com/xyfJASON/ctrlora
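The base-plus-LoRA design rests on the standard LoRA idea: keep a weight matrix W frozen and learn a low-rank update B·A, so the output becomes W·x + (alpha / r)·B·(A·x). The pure-Python sketch below shows that computation and how one frozen W can be shared by several per-condition adapters. It is a hypothetical illustration, not CtrLoRA's code; the condition names and function names are made up for the example.

```python
def matvec(m, v):
    # Multiply matrix m (list of rows) by vector v
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0):
    """LoRA-adapted linear layer: W stays frozen, only A (r x d_in) and B (d_out x r) are trained."""
    r = len(A)                      # rank = number of rows in A
    base = matvec(W, x)             # output of the frozen, shared weights
    delta = matvec(B, matvec(A, x)) # low-rank correction
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def controlled_forward(W, adapters, condition, x, alpha=1.0):
    # Hypothetical: pick the small adapter for the requested condition
    A, B = adapters[condition]
    return lora_forward(W, A, B, x, alpha)

W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # frozen base weights (identity for clarity)
adapters = {
    "canny": ([[1, 0, 0]], [[0], [1], [0]]),  # rank-1 adapter
    "depth": ([[0, 0, 1]], [[1], [0], [0]]),  # another rank-1 adapter
}
print(controlled_forward(W, adapters, "canny", [1, 2, 3]))
print(controlled_forward(W, adapters, "depth", [1, 2, 3]))
```

Each rank-r adapter adds only r·(d_in + d_out) parameters instead of d_in·d_out, which is why adding a new condition is cheap compared with retraining a full ControlNet.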
Project: VisRAG
VisRAG is a retrieval-augmented generation (RAG) pipeline built on vision-language models (VLMs).
Instead of first parsing documents into text, VisRAG uses a VLM to embed document pages directly as images, retrieves the most relevant pages, and passes them to a VLM for generation.
Because the original pages are never converted to text, this approach retains more of the information in the source documents and avoids the losses that parsing introduces.
https://github.com/OpenBMB/VisRAG
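The retrieval step can be sketched as ranking page embeddings by cosine similarity to a query embedding. In VisRAG those embeddings come from a VLM applied to page images; here small hand-written vectors stand in for them, and the page IDs and function names are hypothetical. This shows the general dense-retrieval idea, not VisRAG's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_emb, page_embs, k=2):
    """Return the IDs of the k pages whose embeddings are closest to the query."""
    ranked = sorted(page_embs.items(),
                    key=lambda item: cosine(query_emb, item[1]),
                    reverse=True)
    return [page_id for page_id, _ in ranked[:k]]

# Toy stand-ins for VLM embeddings of page images (hypothetical values)
page_embs = {
    "page_1": [1.0, 0.0, 0.0],
    "page_2": [0.0, 1.0, 0.0],
    "page_3": [0.7, 0.7, 0.0],
}
query_emb = [1.0, 0.1, 0.0]
top_pages = retrieve(query_emb, page_embs, k=2)
print(top_pages)  # these page images would then be handed to the generator VLM
```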
Project: Adaline Gateway
Adaline Gateway is a production-grade "super SDK" that runs fully locally and offers a simple, unified interface for calling more than 200 large language models (LLMs).
It includes batching, retries, caching, callbacks, and OpenTelemetry integration, making it suitable for a wide range of enterprise applications.
Users can flexibly integrate it into existing infrastructures through custom plugins and providers.
https://github.com/adaline/gateway
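Two of the listed features, retries and caching, follow a common pattern that can be sketched independently of any SDK: check a cache keyed by (model, prompt), and on a transient failure retry with exponential backoff. The class below is a hypothetical illustration of that pattern and is not Adaline Gateway's API; `call_model` stands in for whatever provider call a real gateway would make.

```python
import time

class SimpleGateway:
    """Hypothetical wrapper adding caching and retries around a model-calling function."""

    def __init__(self, call_model, max_retries=3, base_delay=0.5):
        self.call_model = call_model  # function (model, prompt) -> str
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.cache = {}

    def complete(self, model, prompt):
        key = (model, prompt)
        if key in self.cache:  # cache hit: skip the provider entirely
            return self.cache[key]
        for attempt in range(self.max_retries + 1):
            try:
                result = self.call_model(model, prompt)
                self.cache[key] = result
                return result
            except ConnectionError:
                if attempt == self.max_retries:
                    raise  # out of retries: surface the error
                time.sleep(self.base_delay * (2 ** attempt))  # exponential backoff

# Usage with a fake provider that fails twice before succeeding
calls = {"n": 0}

def flaky_provider(model, prompt):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return f"{model}: {prompt.upper()}"

gateway = SimpleGateway(flaky_provider, max_retries=3, base_delay=0.0)
print(gateway.complete("demo-model", "hello"))
print(gateway.complete("demo-model", "hello"), calls["n"])  # second call is served from cache
```

Retrying only on `ConnectionError` keeps permanent errors (for example, bad arguments) from being retried pointlessly; a real gateway would also distinguish provider-specific error types.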
Project: Ditto
Ditto is a user-friendly tool that allows users to generate multi-file Flask applications through simple natural language descriptions.
It runs a simple LLM loop with a small set of tools to write the code automatically, turning a user's idea into a working web application.
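An "LLM loop with a few tools" generally means: show the model the task and history, let it choose a tool call, execute the call, append the result, and repeat until the model says it is done. The sketch below illustrates that loop with a scripted stand-in for the model so it runs offline. It is a hypothetical illustration, not Ditto's code; the tool names, action format, and function names are assumptions.

```python
def run_agent(llm, task, max_steps=10):
    """Hypothetical agent loop: the LLM picks tools until it calls 'finish'."""
    files = {}  # in-memory "project directory": path -> contents
    history = [f"TASK: {task}"]

    def write_file(args):
        files[args["path"]] = args["content"]
        return f"wrote {args['path']}"

    tools = {"write_file": write_file}

    for _ in range(max_steps):
        action = llm(history)  # model decides the next step from the history
        if action["tool"] == "finish":
            break
        result = tools[action["tool"]](action["args"])
        history.append(result)  # feed the tool result back to the model
    return files

def scripted_llm(history):
    # Stand-in for a real model: returns a fixed sequence of actions
    steps = [
        {"tool": "write_file", "args": {"path": "app.py",
         "content": "from flask import Flask\napp = Flask(__name__)\n"}},
        {"tool": "write_file", "args": {"path": "templates/index.html",
         "content": "<h1>Hello</h1>\n"}},
        {"tool": "finish", "args": {}},
    ]
    return steps[len(history) - 1]

project = run_agent(scripted_llm, "A Flask app with a home page")
print(sorted(project))
```

Swapping `scripted_llm` for a call to a real model, and writing to disk instead of a dictionary, would turn this into a minimal multi-file code generator.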