Metinvest Enhances Document Search on Microsoft SharePoint Using Machine Learning

Learn how a mining and metals company uses machine learning to improve document management and search.
Organization:
Metinvest
Metinvest is an international, vertically integrated mining and metals company. The Group comprises mining and metallurgical assets in Ukraine, Europe and the US, complemented by a global sales network.
Ukraine
Metinvest, an international steel and mining group, faced significant challenges managing and searching through terabytes of data on Microsoft SharePoint, where access management, security, and data governance are critical requirements for such extensive data solutions.
In response, we integrated an advanced multilingual search system enhanced with machine learning. The solution identifies linked documents, detects duplicates, extracts key data from documents, and automates document categorization. It reduced the time spent on document management and improved document discovery and search relevance.
Metinvest solution overview
Here are the key features:
Full-text, attribute-based, and category-based multilingual search. Users can search by document content, by attributes, or by category across multiple languages, so they can find relevant documents quickly regardless of the language in which they are written.
Linked documents identification. The system can identify and link related documents, providing users with a more connected and contextual understanding of their data. By recognizing relationships between documents, it helps users navigate through relevant materials seamlessly, enhancing the overall document management experience.
Document duplicates discovery. To optimize storage and streamline data management, our solution discovers and flags duplicate documents. This reduces redundancy, saves storage space, and helps users work with the most current version of each document.
Meaningful information extraction from documents. Leveraging advanced machine learning algorithms, the system can extract key data insights from documents. This feature allows users to quickly identify and utilize important information without manually sifting through extensive content, thereby increasing productivity and decision-making efficiency.
Automated document categorization. Our solution automates the categorization of documents based on their content, significantly reducing the time and effort required for manual sorting. This automated process ensures that documents are organized logically and consistently, facilitating easier retrieval and management.
Data access management based on the organization’s Microsoft Entra ID (formerly Azure Active Directory) permission model. The solution integrates with Microsoft Entra ID so that document access follows the organization’s permission model. Sensitive information stays protected and accessible only to authorized personnel.
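The case study does not describe how duplicates are detected, so the following is only a minimal sketch of one common approach, not the method used in this project. It assumes document text has already been extracted, flags exact duplicates by hashing normalized text, and flags near duplicates by Jaccard similarity over word shingles. All function names, the shingle size, and the 0.8 threshold are illustrative assumptions.

```python
import hashlib
import re
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (Unicode-aware)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def content_hash(text: str) -> str:
    """SHA-256 of the normalized text; equal hashes indicate exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word n-grams ("shingles") for a document."""
    words = normalize(text).split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs: dict, threshold: float = 0.8) -> list:
    """Return (id_a, id_b, kind) tuples, where kind is 'exact' or 'near'.

    The threshold is an illustrative assumption and would need tuning.
    """
    results = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(docs.items()), 2):
        if content_hash(text_a) == content_hash(text_b):
            results.append((id_a, id_b, "exact"))
        elif jaccard(shingles(text_a), shingles(text_b)) >= threshold:
            results.append((id_a, id_b, "near"))
    return results
```

This sketch compares every pair of documents, which is only practical for small collections. At the terabyte scale mentioned above, a system would more likely rely on an approximate technique such as MinHash with locality-sensitive hashing, or on embedding similarity in a vector database.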
The initial solution was developed before OpenAI's technologies became available, so most models and algorithms had to be trained from scratch. Since then, large language models have brought significant improvements in context understanding and document processing, which can substantially enhance systems of this kind.
For more information, please refer to this page for additional case studies on the use of generative AI technologies.

Technologies

.NET Core
A free and open-source, managed computer software framework for Windows, Linux, and macOS operating systems.
Azure AI Document Intelligence
A collection of services that utilizes machine learning to extract, analyze, and understand data from documents.
Azure AI Language
A suite of natural language processing services that enables developers to analyze, understand, and generate human language text.
Azure Machine Learning
An enterprise-grade AI service for the end-to-end machine learning lifecycle.
Blazor
A modern front-end web framework based on HTML, CSS, and C#.
Microsoft Azure
A cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services.
Microsoft Graph SDKs
Designed to simplify building high-quality, efficient, resilient applications that access Microsoft Graph.
Microsoft SharePoint
A collaboration platform allowing organizations to create, manage, and share documents, information, and resources within a secure and centralized environment.
Milvus
An open-source vector database built for GenAI applications.