Azure's Immersive Reader 💬

This article discusses a basic setup for Microsoft's Immersive Reader, which is part of the Cognitive Services in Azure. It provides a full-screen reader that will speak a given block of text and offers a number of other features to help someone read the text. Microsoft identifies that it can help "emerging readers, language learners, and people with learning difficulties such as dyslexia" [1].

There is a bit of setup in Azure and then configuration in the application. Microsoft has examples for ASP.NET Core MVC (C#), Node.js, Android (Kotlin and Java) and iOS (Swift). I tried the ASP.NET Core version of Microsoft's instructions, which took me about an hour to complete. It isn't complex, but there are a few steps to complete.

This article comments on the setup process. The instructions from Microsoft are clear, so I won't replicate them here; I will summarise my experience of following them.

When the ASP.NET application runs, you will see a web page with text and the Immersive Reader button (Figure 1).

Figure 1: Display in the ASP.NET web application that will load the Immersive Reader.

Pressing the Immersive Reader button changes the screen to show an Immersive Reader view (Figure 2). There is a short pause as the reader is prepared. The text is shown in a large font with a lot of space between the different lines.

Figure 2: The Immersive Reader view for the example application.

Pressing the play button at the bottom of the screen reads the text aloud, as shown in the video below (Figure 3).

Figure 3: Immersive Reader example for ASP.NET Core

Main steps

  1. Create a free Azure account or use your existing account.
  2. Set up the Immersive Reader on Azure.
  3. Set up the MVC app.

Setting up the Immersive Reader Resource

The longest part was setting up the resource on Azure. There aren't many steps, but it took a while to check what options might be relevant for my situation. I think doing this in future will be much quicker.

Microsoft's article Create an Immersive Reader resource and configure Azure Active Directory authentication covers the details. The steps are:

  1. Log in to the Azure Portal in a web browser.
  2. In the portal, access the Cloud Shell, which opens in the bottom part of the browser window.
  3. Copy a function definition from the article and paste it into the Cloud Shell.
  4. Run that function. The example in the article shows doing this with a lot of options. I found that it was easier to just type the function name, Create-ImmersiveReaderResource, and then you are in an interactive mode.
  5. Below the example set of values there is a description of the different values, which helps when trying to identify names.

On my first attempt, I made a mistake with the date, which caused the creation process to fail. It was a simple mistake: the date needed to be provided in single quotes, unlike the other values entered in interactive mode.

Re-running the script then failed because some of the resources had been partially created. I expect that there is a way to fix that within the Azure portal, but I don't have enough experience with the portal yet. Deleting the partially created resource in the portal worked. I could then run the function again and it worked.

Once the resource is set up, there is a JSON string that you need to copy for use in the ASP.NET Core app. An example is shown below. The values on the right will be specific to your setup.

{
  "ClientSecret": "value-that-you-specify-in-function",
  "ClientId": "id-generated-for-you",
  "Subdomain": "name-you-specify-in-function",
  "TenantId": "id-generated-for-you"
}

ASP.NET Core app

Setting up the ASP.NET Core app was straightforward. The ASP.NET Core Quickstart example page has instructions for creating the application using Visual Studio. I am using Visual Studio Code and the dotnet CLI, so I needed to make a few adjustments.

Using the dotnet CLI, I created the application with the following line.

dotnet new mvc -o ImmersiveReader

The -o option creates the project in a new folder with the name that follows the option. You could leave that out and create the files in your current directory.

The app uses the values from the JSON string generated when the Immersive Reader resource was created. In the Microsoft example, there is a feature in Visual Studio that makes it easy to paste the values. For my setup, I used the user-secrets CLI tool.

# set up user-secrets
# run these commands in the same folder as the .NET project file
dotnet user-secrets init

# add the secrets one at a time with the following format
dotnet user-secrets set "ClientSecret" "value-from-the-json-string"
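
Since there are four values in the JSON string, the same command has to be repeated for each key. A minimal sketch of a helper that does this is shown below. The function name set_reader_secrets is hypothetical (it is not part of Microsoft's instructions), and the key names are assumed to match the JSON string shown earlier.

```shell
# Store the four values from the JSON string as user secrets.
# Assumes `dotnet user-secrets init` has already been run in the
# folder containing the .NET project file.
# set_reader_secrets is a hypothetical helper, not part of the dotnet CLI.
set_reader_secrets() {
  # Arguments, in order: ClientSecret ClientId Subdomain TenantId
  if [ "$#" -ne 4 ]; then
    echo "usage: set_reader_secrets CLIENT_SECRET CLIENT_ID SUBDOMAIN TENANT_ID" >&2
    return 1
  fi
  # Stop at the first failure so a partial setup is noticed.
  dotnet user-secrets set "ClientSecret" "$1" &&
  dotnet user-secrets set "ClientId" "$2" &&
  dotnet user-secrets set "Subdomain" "$3" &&
  dotnet user-secrets set "TenantId" "$4"
}

# Example call, using the placeholder values from the JSON string:
# set_reader_secrets "value-that-you-specify-in-function" "id-generated-for-you" \
#   "name-you-specify-in-function" "id-generated-for-you"
```

Running dotnet user-secrets list afterwards should show the stored keys, which is a quick way to check that nothing was missed.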

Once that is done, the rest of the setup in Microsoft's article is straightforward if you are familiar with an ASP.NET Core MVC project.

With everything in place, there was a final oddity. The ASP.NET application ran, but the JavaScript failed to connect to the MVC action. Restarting and refreshing things didn't seem to make a difference. Running the debugger in Visual Studio Code seemed to fix the issue without needing to use any debugging features. I don't understand what happened, but it now runs successfully. If I have the same problem later on, I will investigate further.

The video in Figure 4 shows some of the options to configure the tool, including background colour, font and font size, parts of speech and whether to focus on a line at a time or a few lines at a time.

Figure 4: Video of some options to configure the tool.

Pricing

Pricing is based on the number of characters processed by the reader. The first 3 million characters per month are free and then there is a set price for each additional 1 million characters. There are two pricing levels, which cover Standard and Education/Charity use, where the Education/Charity price is about half of the Standard price.

Looking in the console, it is worth noting that the count of the characters processed is taken when the reader is started. The graph in the Azure console shows that all of the characters on the page, about 700, are processed in one go when the Immersive Reader button is pressed.

In one way, that seems reasonable, but it does mean that your free characters can be used even if a user only launches the reader and does not press the Play button. When the reader is open, it doesn't look as though any more character processing is done even if you stop, start and move around the text.
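
As a rough illustration of what that means for the free allowance, the sketch below estimates how many launches of the reader fit into a month. It assumes every launch processes the same page of about 700 characters, as in my example; real pages will vary, and the variable names are only for illustration.

```shell
# Rough estimate of how many reader launches fit in the free monthly allowance.
free_chars=3000000   # free characters per month, from the pricing above
page_chars=700       # approximate characters on the example page (assumption)

# Each launch processes the whole page once, so use integer division.
launches=$(( free_chars / page_chars ))
echo "About $launches launches per month before charges apply"
```

With these numbers the script prints "About 4285 launches per month before charges apply". A longer page reduces that figure in proportion.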

Figure 5: Example report for the number of Processed Characters, shown in the Azure Portal

Summary

This looks like an interesting resource from Microsoft, and it could be a useful aid that doesn't rely on screen readers being installed on machines. It will be interesting to use this further.

The technology looks great. Like all cloud features, there is a slight concern about keeping control of costs. I suspect that in real use the free limit works quite well, and the cost per extra 1 million characters is pretty fair.

Resources

[1] Microsoft's Immersive Reader Documentation