Vaani: The Voice of IoT

Project Vaani Moves to Prototyping Phase

August 17, 2016

As you may already know, Project Vaani aims to bring a voice interface to connected devices using open, Mozilla-backed technologies that, unlike some existing offerings, do not tie you to siloed services.

Our Approach: Products Before Platform

Before deciding to build a platform or a set of enabling technologies, we are first focusing on releasing a narrowly defined product that solves a specific consumer problem. So far, we have conducted several market surveys and user studies with initial concepts. In these studies and surveys, we consistently heard about one common problem: keeping track of things. Here is one verbatim quote:

“...The stress of keeping track of a hundred things to do – errands like groceries and household items… I mentally note those when I am in the middle of something at home...
But when I am at the store, I am scratching my head about what all I needed to buy… I forget buying some items… When I need them next, the cycle continues”
We distilled this pain point into a narrow goal:

Goal: Remove friction from creating lists
(and provide reminders when it matters)

From this goal, we formed the following hypothesis:

We believe that...
  • (target audience) smartphone-equipped grocery decision makers (women between 25 and 40 years old, living in urban areas of North America, who work full time and run a household of 3 to 5 people)
  • (current solution) who currently make shopping lists with pen and paper or a smartphone application
  • (problem) find it cumbersome to create and use these lists when shopping
We can provide a solution and benefit...
  • by building a device that takes voice commands and creates lists in a note-taking app
  • that provides a faster and easier way to create shopping lists than what they currently use.
We will know this to be true when a majority of participants who use the prototype for (..x.. days)...
  • prefer this device to their current solution, and
  • believe this device provides a faster and easier way of creating lists.

For now, we are calling this Minimum Viable Product (MVP) “Vaani Local”, as it focuses on local shopping.

[Photo: Pixabay photo by larsen9236, shared under a Creative Commons (CC0) license]

There are a couple of assumptions here that we are validating with user and market surveys. We are also building a quick hardware prototype and plan to reach out to “Mozilla friends and family” to test it. Each user-study participant will place a small hardware unit in their home (e.g. on a kitchen counter) and use their voice to add items to a shopping list. These items will be automatically converted to text line items in a smartphone app; for the prototype, we are starting with Evernote. We will review the results during August and analyze what we learn to fine-tune the MVP further. This will help us validate the problem space, the target user demographic, the use cases, and the right approach to solving the problem.
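To make the prototype flow a little more concrete, here is a minimal Python sketch of the step between speech recognition and the list app: turning an already-transcribed utterance into a list line item. It is an illustration under stated assumptions, not the prototype's code. It recognizes only one hypothetical phrasing (“add <item> to my <name> list”), and `ShoppingLists` and `handle_transcript` are hypothetical names; `ShoppingLists` is an in-memory stand-in for the note-taking app and does not use the Evernote API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical phrasing: "add <item> to [my|the] <list name> list".
# A real system would need many more phrasings; this covers only one.
ADD_ITEM = re.compile(
    r"^\s*add\s+(?P<item>.+?)\s+to\s+(?:my\s+|the\s+)?(?P<list>[\w ]+?)\s+list\s*$",
    re.IGNORECASE,
)


@dataclass
class ShoppingLists:
    """Hypothetical in-memory stand-in for the smartphone note-taking app."""

    lists: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, list_name: str, item: str) -> None:
        # Create the list on first use, then append the new line item.
        self.lists.setdefault(list_name, []).append(item)


def handle_transcript(transcript: str, store: ShoppingLists) -> bool:
    """Add the spoken item to the named list; return False if not understood."""
    match = ADD_ITEM.match(transcript)
    if match is None:
        return False
    store.add(match.group("list").strip().lower(), match.group("item").strip().lower())
    return True


store = ShoppingLists()
handle_transcript("Add milk to my shopping list", store)
handle_transcript("add two cartons of eggs to the shopping list", store)
print(store.lists)  # {'shopping': ['milk', 'two cartons of eggs']}
```

Returning `False` for anything the pattern does not recognize leaves room for the device to ask the user to repeat, rather than silently adding a garbled item to the list.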

That’s all for now. We will keep our community posted with regular updates.

Please continue following us on our wiki for further updates.

Vaani Kick-Off in Berlin

February 16, 2016

Here is a brief update from last week's Vaani team meetup in Berlin on prototype scoping and planning, following the gate 0 go-ahead.


First, some quick background. Over the past few weeks, the team studied the existing offerings in this space, talked to industry experts, and gathered feedback from Mozillians to refine the ideas further. We started with a few basic constraints we set for ourselves:

  • This is a broad space, and one that is maturing fast. Because we are working with limited resources (and time), we need to start small and make product and architectural decisions for the prototype that match the technical maturity we can realistically achieve in the next few months.
  • Where available, we will use carefully assessed open source building blocks (e.g. speech engines) rather than starting from scratch.
  • Finally, our end goal and focus must be end users. Since a voice interface is an ingredient rather than a final product by itself, we need to bridge the gap to a usable prototype that demonstrates real value. Our approach is therefore to create an “IoT enabler package” that is useful for developers, early adopters/makers, and device makers: something they can use to control IoT devices by voice out of the box and to showcase future possibilities.


At last week's Berlin meeting, we went deeper into scope and planning. Here are the highlights:

  • Prototype: The first prototype target is a Vaani SDK based on the Raspberry Pi 2 and a sample smart home SDK (e.g. openHAB), so that we can enable voice control in specific devices such as SmartHome hubs, thermostats, and music players.
  • Architecture: The Vaani SDK will use Pocketsphinx and Kaldi for speech-to-text (STT) analysis and MaryTTS for text-to-speech (TTS) synthesis. A purchased speech corpus will be used to train our STT model. A voice talent (to be contracted) will do recording sessions to train the TTS model. (Note: If you know a female voice talent or a good agency for voice talents, please contact us.)
  • UX: Connected devices will bring new and exciting UX paradigms. We have started compiling a list of all the voice commands we need for these devices, focused on SmartHome hubs, thermostats, and music players; our next steps are to explore, refine, and create user flows. We have also started discussing ambient audio indicators and the “personality” (or “soul”) of Vaani, and we began exploring a Mozilla-designed user interface for the openHAB-based prototype.
  • openHAB & market validation: We met with the openHAB CEO to validate our concepts and to learn about their architecture and plans. openHAB is an open source initiative with over 120 IoT hardware protocol bindings enabled so far. (It is a very interesting initiative that we should look into more closely; more on that later.) In general, the openHAB team feels that an open source voice option at scale, like Vaani, is essential. At our request, they are looking into ways to help us validate this further with some device makers (potentially via the QIVICON Alliance). We will follow up with them on this and work with them to finalize the integration details of the prototype.
  • User research: We are looking for opportunities to do high-level user validation with the User Insights/Research team. More on this in the coming weeks.
  • QA update: We created an initial test plan for Vaani, discussed future automation and Continuous Integration, and began discussing community involvement in QA and testing.
  • Community: Vaani was introduced as a high-level concept at the Community Leadership Summit in Singapore in January 2016. We are currently working with George Roter to identify opportunities and timing for broader community participation.
  • Differentiation: While we are focused on a specific prototype as a proof point for now, we also looked at several long-term opportunities to take Vaani much further than what existing products on the market, such as Amazon Alexa (Echo), offer today. For example:
    • With Echo, Amazon is trying to lock users into its own and its partners' services (shopping with Amazon, taxis with Uber, pizza with Domino's, music from Spotify, etc.), and Amazon decides these partnerships for its silo. We can break that model by letting users choose any service they want directly from the web: they might shop on Flipkart, get a taxi from Lyft, play music from Spotify, and order pizza from Pizza Hut.
    • Users could even get web-based information directly in Vaani's answers, e.g. “Where is the cheapest X?” answered from online providers (or local ones in the neighborhood). We can help connect users with everything the open web offers through such services, without gatekeeping.
    • Other ideas involve integrating speaker identification (“who is speaking”) with a Firefox Account to load and access user preferences directly. Based on users' settings and preferences, we could offer personalized in-home or remote services from various providers.
    • Some ideas involve “locating” the user inside the home to support context-sensitive commands such as “Turn the lights off here” (or “in this room”).
    • Another idea is to keep track of context within an ongoing activity (e.g. while the user is cooking from a recipe, answering context-sensitive questions like “How many spoons of that?”).
    • (This discussion continues)
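The “Turn the lights off here” idea hinges on resolving a word like “here” against the speaker's location. The following Python sketch shows one way that resolution could work, assuming some hypothetical locating mechanism already reports which room the speaker is in (or `None` when it cannot tell). The room registry, light IDs, and function names are illustrative only and are not part of openHAB or any Vaani code.

```python
import re
from typing import Dict, List, Optional

# Hypothetical device registry: room name -> IDs of the lights in that room.
LIGHTS_BY_ROOM: Dict[str, List[str]] = {
    "kitchen": ["kitchen-ceiling", "kitchen-counter"],
    "living room": ["living-room-lamp"],
    "bedroom": ["bedroom-ceiling"],
}

# Words that mean "wherever the speaker is right now".
HERE = re.compile(r"\b(here|this room)\b", re.IGNORECASE)


def resolve_room(command: str, speaker_room: Optional[str]) -> Optional[str]:
    """Return the room a command refers to, or None if it cannot be resolved."""
    text = command.lower()
    # 1. An explicitly named room wins, e.g. "... in the kitchen".
    for room in LIGHTS_BY_ROOM:
        if room in text:
            return room
    # 2. Otherwise "here" / "this room" falls back to the speaker's location,
    #    which is None when the (hypothetical) locating mechanism has no estimate.
    if HERE.search(text):
        return speaker_room
    return None


def lights_to_turn_off(command: str, speaker_room: Optional[str]) -> List[str]:
    """IDs of the lights a 'turn the lights off ...' command should switch off."""
    room = resolve_room(command, speaker_room)
    if room is None:
        return []
    return list(LIGHTS_BY_ROOM.get(room, []))


print(lights_to_turn_off("Turn the lights off here", "kitchen"))
# ['kitchen-ceiling', 'kitchen-counter']
print(lights_to_turn_off("Turn the lights off here", None))
# []
```

In this sketch an explicitly named room takes priority over the speaker's location, and when the location is unknown the command resolves to no lights rather than to a guess.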

While our main focus right now is building the prototype, we will develop the backlog of differentiation ideas in parallel. Please keep sending feedback and ideas.