I recently came across an interesting requirement involving OCR. After investigating several solutions, I implemented it with the Azure Search Service, and I thought I would share the approach with the community.
In this article, let us see how to use the Azure Search Service to read and search content. The raw data is stored in Azure Blob Storage.
For a better understanding of how the storage is created, please read the article about creating Azure Blob Storage.
I am assuming that Azure Blob Storage is already available in the Azure Portal. If it is not, the steps are, in brief:
1. Go to the Azure Portal and add a new resource: Storage Account (blob, data lake).
2. Once the validation succeeds, click Create.
3. Click on Blobs.
4. Add a container.
5. Give a name and click OK.
6. Upload Files.
With this, the container is ready.
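The upload in step 6 can also be done without the portal. Here is a minimal sketch, assuming a SAS token with write permission on the container. It builds a Put Blob request using only the Python standard library. The account, container, blob name and token are hypothetical placeholders, not values from this walkthrough.

```python
import urllib.parse
import urllib.request


def put_blob_request(account, container, blob_name, data, sas_token):
    """Build (but do not send) a Put Blob request for a block blob.

    Assumption: `sas_token` is a SAS query string with write permission.
    """
    # Keep "/" so that virtual folders in the blob name are preserved.
    path = urllib.parse.quote(f"{container}/{blob_name}")
    url = f"https://{account}.blob.core.windows.net/{path}?{sas_token.lstrip('?')}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # Put Blob requires the blob type to be stated explicitly.
        headers={"x-ms-blob-type": "BlockBlob"},
    )


# Hypothetical values, for illustration only.
request = put_blob_request(
    "mystorage", "docs", "scans/invoice 1.png", b"<file bytes>", "?sv=x&sig=y"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the actual upload. The sketch stops before that so it can be read and run without a storage account.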
Now, let us go back to the Azure Portal home page and create an Azure Search Service.
1. Click Create a resource.
2. Search for Azure Search and click on the resource.
3. Click on Create.
4. Enter the appropriate values and click Create.
5. Once it is created, we have two resources: the storage account and the search service.
6. Go to Search Service.
7. In the Search Service, there are four important concepts to note:
a. Data Source
b. Index
c. Skillset
d. Indexer
8. We can discuss these concepts in detail in another article. For now, let us see how to create them.
9. Click on Import Data.
10. As mentioned above, there are four things which we are going to create.
11. Let us create the Data Source first. Select Azure Blob Storage.
12. Give it a name and click on Storage Container.
13. Select the Storage Account which we created.
14. Select the Container.
15. If we want to index only a specific folder, enter the folder name on this screen. Otherwise, leave it empty and the indexer will crawl all the files and folders.
16. Now, create the Skillset. Enter the skillset name and select the option that enables OCR on the content.
17. Now, create the Index. Give an appropriate index name and make sure that the fields are marked “Retrievable” and “Searchable” based on our requirement.
18. One important thing: do not mark the content and merged_content fields as “Filterable”, “Sortable” or “Facetable”. These fields may hold very large text, and those attributes index the whole value as a single term, which is subject to a size limit. Keep them “Searchable”, so that the keyword search can match the extracted text.
19. Now create the Indexer.
20. After selecting all the parameters properly, click OK.
21. Back on the Search Service, we can see the newly created objects.
22. Go to the Indexer; it will be in the In Progress state.
23. It will take a few minutes to index the content, depending on the size of the blob files.
24. Once the Indexer has run successfully, we can search the content.
25. Enter the Keywords and click on Search.
26. We will get the results in JSON format.
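The four objects the wizard creates, and the keyword search at the end, correspond to REST resources on the search service. Below is a minimal sketch of how those definitions and requests could look in Python, not a description of what the portal does internally. The service name, keys, resource names, the API version and the choice of metadata_storage_path as the key field are all assumptions made for illustration. The requests are only built, never sent.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2020-06-30"  # assumption: use an api-version your service supports


def data_source(name, connection_string, container, folder=None):
    """Steps 11-15: a blob data source; `folder` limits the crawl to one folder."""
    container_def = {"name": container}
    if folder:
        container_def["query"] = folder
    return {"name": name, "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            "container": container_def}


def skillset(name):
    """Step 16: OCR every extracted image, then merge that text into merged_content."""
    images = "/document/normalized_images/*"
    return {"name": name, "skills": [
        {"@odata.type": "#Microsoft.Skills.Vision.OcrSkill", "context": images,
         "inputs": [{"name": "image", "source": images}],
         "outputs": [{"name": "text", "targetName": "text"}]},
        {"@odata.type": "#Microsoft.Skills.Text.MergeSkill", "context": "/document",
         "inputs": [{"name": "text", "source": "/document/content"},
                    {"name": "itemsToInsert", "source": images + "/text"},
                    {"name": "offsets", "source": images + "/contentOffset"}],
         "outputs": [{"name": "mergedText", "targetName": "merged_content"}]}]}


def text_field(name, key=False):
    """Large text: searchable and retrievable, never filterable/sortable/facetable."""
    return {"name": name, "type": "Edm.String", "key": key, "searchable": not key,
            "retrievable": True, "filterable": False, "sortable": False,
            "facetable": False}


def index(name):
    """Steps 17-18: key field (assumed) plus the two large text fields."""
    return {"name": name, "fields": [text_field("metadata_storage_path", key=True),
                                     text_field("content"),
                                     text_field("merged_content")]}


def indexer(name, data_source_name, index_name, skillset_name):
    """Steps 19-20: connects data source, skillset and index."""
    return {"name": name, "dataSourceName": data_source_name,
            "targetIndexName": index_name, "skillsetName": skillset_name,
            # Image extraction must be switched on, otherwise OCR has no input.
            "parameters": {"configuration": {
                "dataToExtract": "contentAndMetadata",
                "imageAction": "generateNormalizedImages"}},
            # Blob paths contain "/", which is not allowed in a document key.
            "fieldMappings": [{"sourceFieldName": "metadata_storage_path",
                               "targetFieldName": "metadata_storage_path",
                               "mappingFunction": {"name": "base64Encode"}}],
            "outputFieldMappings": [{"sourceFieldName": "/document/merged_content",
                                     "targetFieldName": "merged_content"}]}


def put_request(service, kind, body, admin_key):
    """Build a create-or-update request; kind is datasources, skillsets, indexes or indexers."""
    url = (f"https://{service}.search.windows.net/{kind}/{body['name']}"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key})


def search_request(service, index_name, keywords, query_key):
    """Steps 25-26: keyword search; the response body is JSON."""
    query = urllib.parse.urlencode({"api-version": API_VERSION, "search": keywords})
    url = f"https://{service}.search.windows.net/indexes/{index_name}/docs?{query}"
    return urllib.request.Request(url, headers={"api-key": query_key})
```

A possible order of use is data source, skillset, index and then indexer, each passed through put_request, followed by search_request once the indexer has finished. Sending any of these with urllib.request.urlopen requires a real service name and key.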
In this article, we saw how to use the Blob Storage service and the Search Service with OCR enabled in Azure. In an upcoming article, we will see how to do the same programmatically.
Happy Coding,
Sathish Nadarajan.