Couch Fashion

Elevating Couch Fashion's Styling Assistant Through Audio and Text-to-Speech Synergy

Couch Fashion is a pioneering force in the fashion and styling industry, leading the way with its AI Styling Assistant. Its mission is clear: to empower individuals with effortless style and confidence. The platform caters to both fashion enthusiasts and trendsetters. Its primary objectives are to provide users with expert styling advice, personalized outfit recommendations, and an extensive source of fashion inspiration. Shopping is integrated directly into the platform, which makes discovering and purchasing fashion convenient. Couch Fashion also aims to keep users informed about the latest fashion trends so that they consistently project a stylish, contemporary image.

Challenges

Our challenges centered on integrating audio input for voice interaction with Dialogflow, which meant enabling devices such as computers and phones to capture and understand spoken commands, much like adding a virtual conversational partner. We then tackled text-to-speech conversion so that responses would be delivered in a human-like voice rather than a robotic tone. Both were crucial to delivering a seamless, user-friendly experience.

1. Implementing Audio Input in Meteor JS for Dialogflow Integration

To enable voice interaction and natural language processing through Dialogflow, we had to integrate audio input capabilities into the Meteor JS platform. This involved not only the technical implementation but also efficient handling of audio data, so that Dialogflow received input it could interpret accurately.
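As a rough illustration of what "handling audio data for Dialogflow" can look like on the server, the sketch below builds an audio request for the Dialogflow ES `detectIntent` call. It assumes the official Node.js client (`@google-cloud/dialogflow`) and 16 kHz, 16-bit mono PCM input; the case study does not state which client, sample rate, or language was used, and the helper names `buildAudioIntentRequest` and `detectIntentFromAudio` are hypothetical.

```javascript
// Sketch only: helper names, sample rate, and language code are assumptions.

// Build the request body for a Dialogflow ES detectIntent call with audio input.
// `sessionPath` looks like "projects/<project-id>/agent/sessions/<session-id>".
// `pcmAudio` is a Buffer or Uint8Array holding 16-bit mono PCM samples.
function buildAudioIntentRequest(sessionPath, pcmAudio, options = {}) {
  const { sampleRateHertz = 16000, languageCode = 'en-US' } = options;
  if (!(pcmAudio instanceof Uint8Array) || pcmAudio.length === 0) {
    throw new Error('pcmAudio must be a non-empty Buffer or Uint8Array');
  }
  return {
    session: sessionPath,
    queryInput: {
      audioConfig: {
        audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
        sampleRateHertz,
        languageCode,
      },
    },
    inputAudio: pcmAudio,
  };
}

// Send the request with an injected client and return the agent's text reply.
// With the official client, `sessionsClient` would be created as
// `new (require('@google-cloud/dialogflow').SessionsClient)()`.
async function detectIntentFromAudio(sessionsClient, request) {
  const [response] = await sessionsClient.detectIntent(request);
  return response.queryResult.fulfillmentText;
}
```

In a Meteor app, a function like `detectIntentFromAudio` would typically sit behind a server-side method that receives the recorded audio from the client; passing the client in as a parameter keeps the request-building logic testable without credentials.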

2. Converting Textual Output to Human Voice for Enhanced User Experience

Another critical challenge was converting the textual output generated by the system into a natural and human-like voice. This conversion was essential to deliver an engaging user experience, requiring the integration of voice synthesis technologies to ensure the system's responses were easily understandable and conversational.

Solutions

Both challenges came down to audio processing. The first required converting stereo audio streams into a mono format; the second required producing lifelike text-to-speech output. The solutions below streamlined audio handling and made interaction with the assistant more natural and engaging.

1. Strategies for Stereo-to-Mono Audio Conversion

To convert stereo audio streams into a mono format suitable for Dialogflow processing, we evaluated three strategies. The first was Recorder.js, a prebuilt recording library that automates recording and conversion, but it had compatibility issues with our Meteor JS system. The second was to transmit only one channel, either left or right, but this risked delivering less clear audio for language processing. Ultimately, we combined both channels through a series of transformations, producing a mono audio stream well suited to Dialogflow.
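The exact transformations are not described here, so the following is only a minimal sketch of one common way to combine two channels: average the left and right samples, then convert the result to 16-bit PCM. The function names `downmixToMono` and `floatTo16BitPcm` are hypothetical, and averaging is an assumption rather than the confirmed method.

```javascript
// Sketch only: averaging is one common downmix; the project's actual
// transformation chain is not specified in the case study.

// Average two equally long channels of float samples into one mono channel.
// In a browser, the inputs would typically come from
// audioBuffer.getChannelData(0) and audioBuffer.getChannelData(1).
function downmixToMono(left, right) {
  if (left.length !== right.length) {
    throw new Error('Channels must have the same number of samples');
  }
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}

// Convert float samples in [-1, 1] to signed 16-bit PCM.
// Values outside the range are clamped; fractions are truncated toward zero.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Example: a four-sample stereo frame reduced to mono PCM.
const monoPcm = floatTo16BitPcm(
  downmixToMono(
    new Float32Array([1, -1, 0.5, 0]),
    new Float32Array([1, -1, -0.5, 1])
  )
);
```

Averaging keeps the mono signal within the original amplitude range, whereas summing the channels without scaling could clip when both are loud at once.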

2. Enhancing Natural Text-to-Speech Output

For text-to-speech conversion, our aim was to make the system's text responses sound like a natural voice. One approach used the browser's built-in speech synthesis (the Web Speech API) for client-side text-to-speech conversion. However, this method had limitations: it produced a somewhat robotic voice, and the available voices depended on the client's operating system. As an alternative, we used the Google Text-to-Speech API to convert text into audio files, such as MP3 or WAV, on the server and then transmitted these files to the client for playback. Because the audio is generated server-side, it sounds the same on every client, and the result closely resembles human speech, which improved overall user interaction with the system.
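The server-side flow described above could be sketched roughly as follows, assuming the official Node.js client (`@google-cloud/text-to-speech`). The voice settings, the base64 transport, and the helper names `buildSynthesisRequest` and `synthesizeToBase64` are illustrative assumptions; the case study does not say which voice was used or how the files were delivered to the client.

```javascript
// Sketch only: helper names, voice settings, and base64 transport are assumptions.

// Build a request body for the Text-to-Speech synthesizeSpeech call.
// audioEncoding 'MP3' yields MP3 audio; 'LINEAR16' yields WAV-style PCM audio.
function buildSynthesisRequest(text, options = {}) {
  const {
    languageCode = 'en-US',
    ssmlGender = 'NEUTRAL',
    audioEncoding = 'MP3',
  } = options;
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('text must be a non-empty string');
  }
  return {
    input: { text },
    voice: { languageCode, ssmlGender },
    audioConfig: { audioEncoding },
  };
}

// Synthesize speech with an injected client and return the audio as base64,
// which is convenient to send back from a server method.
// With the official client, `ttsClient` would be created as
// `new (require('@google-cloud/text-to-speech').TextToSpeechClient)()`.
// This assumes `audioContent` arrives as binary data (Buffer or Uint8Array).
async function synthesizeToBase64(ttsClient, text, options) {
  const request = buildSynthesisRequest(text, options);
  const [response] = await ttsClient.synthesizeSpeech(request);
  return Buffer.from(response.audioContent).toString('base64');
}
```

On the client, one option is to play the returned string through a data URL, for example `new Audio('data:audio/mpeg;base64,' + audioBase64).play()`, where `audioBase64` is the value returned by the server.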