multimodal AI models AI News List | Blockchain.News

List of AI News about multimodal AI models

Time Details
2025-12-20 14:59
Amazon Nova 2 Family Launch: Competitive Multimodal AI Models and Custom Training with Nova Forge

According to DeepLearning.AI, Amazon has introduced the Nova 2 family, comprising the Pro, Omni, Lite, and Sonic models, with competitive multimodal reasoning and generation capabilities. Nova Forge lets customers blend their proprietary data with Amazon checkpoints to train models tailored to enterprise needs. Nova Act provides browser-automation agents that can navigate websites, fill in forms, and extract data. Early benchmarks indicate that Nova 2 Pro performs on par with top-tier AI models across multiple evaluations, giving businesses an option for building custom solutions and automating workflows on Amazon’s AI infrastructure (source: DeepLearning.AI, The Batch).

2025-12-18 16:58
Meta Open-Sources PE-AV Model: Advanced Audio-Visual AI Integration for State-of-the-Art Audio Separation

According to @AIatMeta, Meta has open-sourced Perception Encoder Audiovisual (PE-AV), the model underlying SAM Audio’s state-of-the-art audio separation technology (source: @AIatMeta, Dec 18, 2025). PE-AV builds on the earlier Perception Encoder model and integrates audio with visual perception, setting new benchmarks on audio and video analysis tasks. Its native multimodal capabilities improve sound detection and scene understanding, with potential applications in audio forensics, video content analysis, and accessibility. Meta has released both the code and the research paper, which lets startups and enterprises build on the model in commercial audio-visual products (source: https://go.meta.me/e541b6, https://go.meta.me/7fbef0).

2025-10-13 22:15
Agentic AI Course by Andrew Ng: Hands-On Guide with Design Patterns, Plus Anthropic Claude Sonnet 4.5, OpenAI and Meta AI Product Expansion

According to DeepLearning.AI, Andrew Ng has announced a new course titled Agentic AI, which teaches practical skills for building AI agents using four key design patterns: reflection, tool use, planning, and multi-agent collaboration (source: DeepLearning.AI, The Batch, Oct 13, 2025). The course aims to give professionals working frameworks for deploying AI agents in business environments. The same report notes that Anthropic has launched Claude Sonnet 4.5 and updated Claude Code, with enhanced capabilities for enterprise automation and AI-driven development. OpenAI and Meta are expanding their AI product portfolios to target broader industry needs, while Alibaba has introduced Qwen3-Max and open-sourced its multimodal Qwen3-VL and Qwen3-Omni models. The availability of LoRA adapters further streamlines fine-tuning for industry-specific tasks. Together, these developments point toward more accessible, customizable, and collaborative AI tools, with business opportunities in agentic AI, enterprise automation, and multimodal applications (source: DeepLearning.AI, The Batch).

2025-09-04 07:07
Gemini 2.5 Flash Image AI Model 'nanobanana' Now Powers Text-Guided Photo Editing on PicLumen

According to PicLumen AI (@PicLumen), the 'nanobanana' (Gemini 2.5 Flash Image) model is now available on the PicLumen platform, bringing text-guided image editing to its users. The model handles restoration, stylization, and creative reimagining of images in a single step, with high consistency and efficiency (source: @PicLumen, September 4, 2025). For AI companies, this points to a business opportunity in the growing visual content enhancement market. It also reflects the trend of integrating large multimodal models into consumer-facing applications used by creative professionals, marketers, and e-commerce businesses. As demand for intuitive image editing tools grows, platforms built on models like Gemini 2.5 are positioned to reach new market segments and try new monetization models.
