Transcript
Can you guess how much data is created every day in 2022? According to earthweb.com, that would be 2.5 quintillion bytes of data. A lot of this is just unstructured text-based data like documents and reports. Now this data is valuable only if we are able to generate meaningful insights from it. Historically, extracting meaningful information from text meant building computer programs around very powerful algorithms, which required machine learning expertise, enormous training datasets and compute power. What if we could do this in one API call? This technique of identifying and extracting the essential entities from any text-based document is called Named Entity Recognition. It is one of several natural language processing features offered by Azure Cognitive Service for Language.
Now we discussed that using named entity recognition you can identify and categorize entities in unstructured text. Let me tell you why you might need this capability. There are so many everyday use cases where named entity recognition can be helpful. Let's say you have an online business. You may have multiple avenues generating data, for example, customer reviews. What good are those thousands of reviews if you can't extract insights from them? Now what you could do is allow the data to be processed by entity recognition algorithms. They will help you extract specific information like geographical data or invoice details. This will help you improve the service for your customers. So named entity recognition is an AI technique that will help you mine for gold. It can extract so many key entities from your data and classify these into appropriate categories like person, person type, location, organization, event. It could save a lot of time and improve the efficiency of your team. So, if you'd rather let an AI service read all the user reviews and return the results to you in a matter of seconds than manually sift through them for days and weeks, let me show you how.
The named entity recognition feature can identify and categorize entities in unstructured text. For example, people, places, organizations, quantities. All we need to do is send in our request through an API and it will return the desired results. So, first step, we need to create a language resource in Azure. If you have an Azure subscription today, or you sign up for a trial subscription of Azure, you can look for the language resource. Just search for language in the marketplace and you will find the language service. As you proceed to create the language service, you will see that it supports a variety of features out of the box, one of which is the named entity recognition feature, which we discussed in this episode. As you continue to create your resource, you need to provide your subscription, your resource group and the location of your resource in Azure, and in one or two minutes your language resource will be created. From there, there are two ways in which you can proceed. One is through a programming language of your choice: C#, Java, Python. You can use client libraries, or you can invoke the REST APIs, and you call the API to provide your unstructured text and obtain the results in the output. The only thing to note is that you will need two values that you obtain by creating this resource. One is the key, or the password, and the other one is the endpoint URL. These are vital to authenticate your API calls. So that's one way of doing it. Another way of trying out the capabilities of this cognitive service is through a web portal called the Language Studio. So, this is a fairly new feature, a web-based platform through which, without writing any code, you can still try out named entity recognition and several other features as well. Let's try that out.
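Before we move on to the Language Studio, here is a rough idea of what the REST route with the key and the endpoint could look like in Python. This is only a sketch under stated assumptions: the route `/language/:analyze-text`, the API version `2022-05-01`, the `Ocp-Apim-Subscription-Key` header and the request body shape are my assumptions about the service and are not stated in this episode, and the function names `build_ner_request` and `recognize_entities` are hypothetical. Please check the current documentation before relying on any of it.

```python
import json
import urllib.request

# Assumption: API version and route are not given in the transcript.
API_VERSION = "2022-05-01"


def build_ner_request(endpoint: str, key: str, text: str,
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a named entity recognition request."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    body = {
        "kind": "EntityRecognition",  # assumed task name for NER
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [{"id": "1", "language": language, "text": text}]
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The key obtained when the language resource was created.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def recognize_entities(endpoint: str, key: str, text: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_ner_request(endpoint, key, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

You would call `recognize_entities` with the endpoint URL and key of your own language resource. Building the request separately from sending it makes it easy to inspect the URL, headers and body without touching the network.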
For the sake of time, I've already created a demo language resource, and this would be the Azure resource that I will refer to in my Language Studio. So, the Language Studio is available for you to try out today, and if you log into the Language Studio, just make sure you have a language resource that you can reference to try out each of these natural language processing capabilities. Let's try out the named entity recognition feature that we discussed in this episode. So, the extract named entities feature. I'm going to try that out, and you can see that it asks for my text language, which I will choose as English, and it asks me for my Azure resource. So, this is the one that I just created. I have the option to provide it any piece of text, so I have some text files that are available out of the box. I can also upload a text file of my own. So, let's choose one that's available here. This piece of text is interesting. It reads: the Moon orbits the Earth at a distance of 384,400 kilometers, about 30 times the diameter of the Earth, and its gravity influences the Earth's tides. Interesting. A very simple piece of text, but if I were to run the named entity recognition service on that, it would immediately identify two locations, Moon and Earth, and two quantities in that one line of text.
Let's try something more complex. Let's try a restaurant review. Restaurant reviews usually have a load of information in them, sort of like Amazon reviews as well. If I were to run the same feature on that piece of text, you would see that in that one paragraph there are so many insights that we obtain in a matter of seconds. It has identified an organization, a location, a couple of locations in fact, a datetime and even a person type, a person, a product, a phone number, an e-mail, a URL and whatnot. So, this is the power of the named entity recognition feature, and you can try this out today.
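If you go the API route instead of the Language Studio, you would get the same kind of result back as JSON and could summarize it yourself. The sketch below groups entity text by category. The response shape (`results`, `documents`, `entities`, `text`, `category`, `confidenceScore`) is my assumption about the service's output, and the sample response is hand-written for illustration; it is not real output from the service, and the review it describes is hypothetical.

```python
from collections import defaultdict


def group_by_category(response: dict, min_confidence: float = 0.0) -> dict:
    """Collect entity text per category, skipping low-confidence entities."""
    grouped = defaultdict(list)
    for document in response["results"]["documents"]:
        for entity in document["entities"]:
            if entity["confidenceScore"] >= min_confidence:
                grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Hand-written, illustrative response for a made-up restaurant review.
sample_response = {
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"text": "Contoso Bistro", "category": "Organization", "confidenceScore": 0.95},
                    {"text": "Seattle", "category": "Location", "confidenceScore": 0.99},
                    {"text": "last Friday", "category": "DateTime", "confidenceScore": 0.80},
                    {"text": "waiter", "category": "PersonType", "confidenceScore": 0.60},
                ],
            }
        ]
    }
}

summary = group_by_category(sample_response, min_confidence=0.7)
```

Raising `min_confidence` is one simple way to keep only the entities the service is fairly sure about; with the threshold of 0.7 used here, the hand-written "waiter" entry is left out of `summary`.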
While we are here, let me show you some of the associated features as well. What we tried out right now was the extract named entities feature. A related feature you might observe is extract PII. PII stands for personally identifiable information. As you might have noticed from the previous example, this is a pre-built capability built over the named entity recognition feature. If I were to use the same example as we used previously and run this feature, you would see that it has been able to identify several pieces of PII, which include date range, person type, person, URL, phone number, e-mail. The power of this feature is that if you are able to identify and extract PII, you are able to redact, obscure or mask that information for security reasons.
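To make the masking idea concrete, here is a minimal sketch that replaces each detected entity with asterisks. It assumes each entity carries an `offset` and a `length` that line up with Python string indices; whether that holds for real responses depends on how offsets are counted, so treat it as an assumption. The function name `redact`, the sample sentence and the entity list are all hypothetical and hand-written, not output from the service.

```python
def redact(text: str, entities: list, mask_char: str = "*") -> str:
    """Mask every entity span in text, keeping the overall length unchanged."""
    characters = list(text)
    for entity in entities:
        start = entity["offset"]
        end = start + entity["length"]
        # Overwrite the span with one mask character per original character.
        characters[start:end] = mask_char * (end - start)
    return "".join(characters)


# Hand-written example: a person name and a phone number.
sample_text = "Call Ana at 555-0100."
sample_entities = [
    {"text": "Ana", "category": "Person", "offset": 5, "length": 3},
    {"text": "555-0100", "category": "PhoneNumber", "offset": 12, "length": 8},
]

masked = redact(sample_text, sample_entities)
print(masked)  # prints "Call *** at ********."
```

Masking character by character keeps every other offset valid, so the order in which entities are processed does not matter.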
Likewise, another related feature is custom named entity recognition. This is a very powerful feature. If you need to extract very specific domain entities or industry terminology from your organizational documents or financial contracts, you still have the option to create custom AI models. So, what you would need for that is to create a project, as I have created here in Language Studio. This whole web portal makes the experience so much easier for us. You can import all your data. I had a list of text files provided on GitHub as the training data set. For that you have the option to manually label the data, or you can upload pre-labeled data as well, but it all starts with labeled data, and obviously the quality of the labeling matters as well. You can go ahead and train the model. You can evaluate the performance of the model and, once you're happy with that, you can deploy the model, test it and improve it. Using that model, as you can see on screen, you will be able to extract custom entities that are very specific to your organization or to your industry. So, these are some of the features that are available in named entity recognition, and there are some great samples, GitHub code, quick-starts and documentation from Microsoft to help you get started on the API side and the client library side as well.
How does Netflix know to suggest comedy to you? Think YouTube, think any service that sends you personalized recommendations. Under the hood, these content recommendation systems use a technique called entity recognition. So, when techniques like entity recognition reveal from your user history that you enjoy comedy, they create a list called Comedy and recommend it to you. With Azure Cognitive Service for Language, you can do this and more today.