Today's Open Source (2024-10-18): Fudan, Baidu, and Nanjing University Open-Source Hallo2

Explore cutting-edge AI open-source projects like Hallo2 for audio-driven animations, Align Anything for multimodal alignment, and more.

Oct 18, 2024

Here are some interesting AI open-source models and frameworks I wanted to share today:

Project: Hallo2

Hallo2 is an audio-driven portrait animation project capable of generating high-resolution, long-duration portrait animations.

This project combines various advanced deep learning technologies, aiming to produce realistic portrait animations through audio input.

Developed in collaboration with researchers from Fudan University, Baidu, and Nanjing University, the project provides open-source code and pre-trained models for users to further develop and apply.

https://github.com/fudan-generative-vision/hallo2

Project: Align Anything

The Align Anything project aims to align large multimodal models with human intent and values through feedback-based training.

This modular framework supports fine-tuning models across various modalities, including text, images, audio, and video.

The project offers multiple alignment algorithms, allowing users to easily modify and customize the code for different tasks.

https://github.com/PKU-Alignment/align-anything

Project: CtrLoRA

CtrLoRA is a scalable and efficient framework designed for controllable image generation.

By training a foundational ControlNet and condition-specific LoRAs, the project enables training on large datasets and adaptation to new conditions with just a few images and a short time using a single GPU.

The project supports multi-condition generation and style transfer, offering pre-trained models and an online demo via Gradio.

https://github.com/xyfJASON/ctrlora

Project: VisRAG

VisRAG is a novel retrieval-augmented generation (RAG) pipeline based on vision-language models (VLM).

Unlike traditional text-parsing methods, VisRAG directly embeds documents as images and retrieves relevant information through VLM to enhance the generation process.

This approach maximizes the retention and utilization of data from the original documents, eliminating information loss introduced during parsing.

https://github.com/OpenBMB/VisRAG

Project: Adaline Gateway

Adaline Gateway is a fully localized, production-grade super SDK, offering a simple, unified, and powerful interface to call over 200 large language models (LLM).

The project supports various features, including batch processing, retries, caching, callbacks, and OpenTelemetry support, making it suitable for a wide range of enterprise-level applications.

Users can flexibly integrate it into existing infrastructures through custom plugins and providers.

https://github.com/adaline/gateway

Project: Ditto

Ditto is a user-friendly tool that allows users to generate multi-file Flask applications through simple natural language descriptions.

By leveraging simple LLM loops and a few tools, Ditto automates the coding process, turning user ideas into functional web applications.

https://github.com/yoheinakajima/ditto

Today's Open Source (2024-10-17): NVIDIA Open-Sources Llama 3.1 Nemotron 70B

Meng Li

October 17, 2024

Today's Open Source (2024-10-17): NVIDIA Open-Sources Llama 3.1 Nemotron 70B

Here are some interesting AI open-source models and frameworks I wanted to share today:

Read full story

AI Disruption

Today's Open Source (2024-10-17): NVIDIA Open-Sources Llama 3.1 Nemotron 70B

Discussion about this post