Sesame AI presents a voice companion that simulates human conversation astonishingly well.


Update – March 14, 2025

The artificial intelligence company Sesame has just dropped a bombshell: its base model, CSM-1B, is now open source. Yes, you read that right. Anyone can download it from GitHub and start experimenting (we will surely soon see it retrained for other languages, such as Spanish).

This model, with one billion parameters, is released under the Apache 2.0 license, which means its commercial use is practically unrestricted. The interesting part is that anyone can now test its audio generation capabilities directly, which makes Sesame's decision hard to ignore. Moreover, a fine-tuned version of this model powers the voice system of Maya, the AI we described in the original article (below).
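If you want to experiment with it, the published repository ships a small Python example along these lines. This is a minimal sketch based on the README at release time: the loader name load_csm_1b and the generate() signature are taken from that example and may have changed in current versions of the repo.

```python
# Minimal sketch based on the CSM repository's published example.
# load_csm_1b and the generate() signature come from the README at
# release time and may differ in current versions of the repo.
import torchaudio
from generator import load_csm_1b  # module shipped inside the CSM repo

generator = load_csm_1b(device="cuda")  # loads the 1B checkpoint

audio = generator.generate(
    text="Hello from Sesame.",   # text to speak
    speaker=0,                   # speaker id; no voice-cloning context here
    context=[],                  # prior utterances can condition the voice
    max_audio_length_ms=10_000,  # cap the length of the generated audio
)

# Save the waveform at the model's native sample rate.
torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
```

As far as the repo documents it, that context list is also what enables the voice cloning discussed below: feed it reference audio with transcripts and the model imitates that voice.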

Safety? Just enough

Sesame’s stance on safety is, let’s say, quite relaxed. Its “safety approach” basically consists of general guidelines asking developers and users not to use the AI to clone voices without authorization, create misleading content, or do “harmful” things. Nothing more.

The problem is that CSM-1B can clone voices with just one minute of original audio, which opens the door to voice-based fraud and scams. Imagine receiving a call from a supposed family member asking for help… and the voice is identical.

The struggle between open and safe

This decision by Sesame brings back the dilemma of open source in AI. Companies like OpenAI have chosen not to release similar models due to safety concerns, but the speed at which open source is advancing makes these measures increasingly ineffective. In other words: while some companies try to maintain control, the openness of models like CSM-1B demonstrates that the AI race is unstoppable, for better or worse.

Original article from March 5, 2025:

Artificial intelligence has made a significant leap with the arrival of Sesame, a startup co-founded by Brendan Iribe, one of the creators of Oculus. Its new voice companions, Maya and Miles, are transforming the way we interact with chatbots. Unlike other voice assistants we have tried, such as OpenAI's Advanced Voice Mode for ChatGPT (which is not bad at all), Sesame has managed to create an experience that truly feels human.

Instead of treating them as mere voice assistants, Sesame labels Maya and Miles “conversationalists” and “voice companions”. This distinction is key, because their approach aims to generate deeper and more meaningful interactions. During my test with Maya, the female voice of the duo, I was surprised by how natural she sounded. She did not just speak: she added breathing sounds, micro-pauses, and variations in tone that made the conversation flow organically. When I laughed, Maya promptly asked me, “Why are you laughing?”, creating the feel of an authentic chat.

Captivating interaction with Maya

One of the things that impressed me most was how Maya left space to think before responding. This small detail, which seems insignificant, makes the conversation feel much more natural. Imagine a dialogue where your interlocutor not only listens but also seems to reflect on what you say. Although Sesame clarifies that its technology is not fully two-way, since it processes what you said only after you finish speaking, the experience is remarkably smooth.

Compared to other AIs that tend to sound robotic, Sesame’s voice companion approaches a level of interaction that challenges the famous “uncanny valley.” It is designed not only to speak but to engage the user through a tone and contextual awareness that add layers to the conversation.

Technology behind Sesame’s voice companion

Sesame is still in the early stages of development, and what we have seen so far is just a demonstration of initial research. Backed by the venture capital firm Andreessen Horowitz, the company uses a Conversational Speech Model (CSM), a multimodal transformer architecture for speech generation.
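To make the “multimodal transformer” idea concrete: text tokens and audio tokens are embedded into a single sequence, and one transformer attends over both to predict the next audio token. The toy sketch below is our illustration of that concept, not Sesame's code; every dimension and vocabulary size in it is made up.

```python
# Toy illustration (not Sesame's code) of a transformer that attends
# over interleaved text and audio tokens to predict the next audio token.
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    def __init__(self, text_vocab=32_000, audio_vocab=2_048, dim=512, layers=4):
        super().__init__()
        # Separate embedding tables project both token types into one space.
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.audio_emb = nn.Embedding(audio_vocab, dim)
        block = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, num_layers=layers)
        self.audio_head = nn.Linear(dim, audio_vocab)  # next-audio-token logits

    def forward(self, text_ids, audio_ids):
        # Concatenate text and audio embeddings into a single sequence.
        x = torch.cat([self.text_emb(text_ids), self.audio_emb(audio_ids)], dim=1)
        # Causal mask: each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=mask)
        return self.audio_head(h[:, -1])  # logits for the next audio token

model = TinyMultimodalLM()
text = torch.randint(0, 32_000, (1, 12))  # fake tokenized text
audio = torch.randint(0, 2_048, (1, 50))  # fake audio codec tokens
logits = model(text, audio)               # shape: (1, 2048)
```

A production system like CSM adds more machinery around this (an audio codec to turn waveforms into tokens, a decoder to turn tokens back into sound), but the core idea of one backbone over mixed modalities is the same.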

They have trained three models of different sizes: Tiny (1B parameters), Small (3B), and Medium (8B), using nearly a million hours of audio, mostly in English, although the models also show some multilingual ability. Sesame's goal is to develop a fully two-way model with long-term memory and an adaptable personality, which promises even more in the future.

For those interested in trying this revolutionary technology, Sesame plans to launch lightweight glasses that will allow interaction with Maya or Miles throughout the day, like in the movie ‘Her’. With vision capabilities possibly arriving soon, the future of interactions with AI seems more exciting than ever. Oh, I almost forgot: you can try it yourself too. Keep in mind that although it understands Spanish, it will respond in English, so ideally keep the conversation entirely in English. Go ahead, give it a try, but be warned that going back to other assistants will be disappointing. Let us know your experience.

Written by Miguel Ángel G.P.

IT Manager | More than 15 years of experience in corporate IT. Expert in Apple, systems, networking, cloud, virtualization, big data, web design...
Published on March 23, 2025.