14 May 2024 / NEWS

Google Announcements

Just a day after OpenAI wowed us with GPT-4o, Google decided it’s their turn to dazzle! Let's dive into the goodies unveiled at the Google IO conference.

Among the exciting announcements, we have:

Veo: Google’s star video generation model.
Project Astra: The futuristic AI assistant in the making.
Gemini 1.5 Pro Updates: Introducing two new versions of their flagship model – one sleeker, the other with a whopping 2M token context length.

Now, let’s break these down one by one:

Veo

Meet Veo, Google DeepMind's newest and shiniest video generation model. Here's what it can do:

Produce high-quality videos in 1080p
> Extend beyond a minute of runtime
Deliver a spectrum of cinematic and visual styles

Veo lets you input an image or video paired with a textual prompt. It can animate the image or edit the video as needed. Plus, it supports masked editing, so you can tweak specific areas by adding a mask and text prompt. On the technical side, Google enhanced video captions in Veo’s training data and uses high-quality, compressed video representations (latents) to boost performance, speed, and efficiency.

Project Astra

Enter Astra, Google’s new project aimed at crafting the AI assistant of tomorrow, hot on the heels of OpenAI's GPT-4o demo. Powered by Gemini, Astra supports real-time audio, text, video, and image interactions. Although it's still a prototype, Astra’s demo came through pre-recorded videos since it’s not yet widely available. Early testers noted a bit of lag, less emotional intelligence, and tone compared to GPT-4o, but praised its text-to-speech prowess and potentially superior long-context video support.

Gemini 1.5 Pro

Google rolled out two new iterations of their flagship model, Gemini 1.5 Pro:

Gemini 1.5 Pro Flash: A nimble, fast, and cost-efficient version, still multimodal with a 1M token context length. It boasts an MMLU of 78.9% vs. 81.9% for the original Gemini 1.5 Pro.
Gemini 1.5 Pro: This powerhouse’s context length is doubled to 2M tokens and is currently accessible via a waitlist for select API developers.

Other Announcements

Google also revealed:

Imagen 3: Their most advanced image generation model, available in versions tailored for tasks ranging from quick sketches to high-res images.
Gemma 2 and PaliGemma New open-source models. PaliGemma is Google’s inaugural vision-language model and is available now. Gemma 2, with 27B parameters, outperforms its predecessor and will be available in June.

Access

The Gemini API and Google AI Studio are now live in over 200 countries. Gemini 1.5 Flash is priced at $0.35 per 1M tokens, with context caching launching next month. While Veo, Astra, and the 2M context Gemini 1.5 Pro aren’t available yet, you can join the waitlist for access. But no worries, Gemini 1.5 Pro Flash is ready for you through the API, and PaliGemma is freely available on Kaggle. Time to explore these futuristic tools and see where they can take us!

Google Announcements

Veo

Project Astra

Gemini 1.5 Pro

Other Announcements

Access

LoRA Learns Less and Forgets Less

DeepSeekMath Pushing the Limits of Mathematical Reasoning in Open Language Models

Veo

Project Astra

Gemini 1.5 Pro

Other Announcements

Access

Subscribe to Kavour

Subscribe to Kavour