Developing an AI-based data management solution to remove redundancy from data points
The client is a leading provider of CRM, CMS, digital marketing, and revenue generation solutions for Destination Marketing Organizations (DMOs). The company serves 1,000+ clients in the travel, tourism, and convention marketing industry across six continents. It partners with destinations and their agencies to engage stakeholders, attract visitors, and win bids for conventions and events.
- 300,000+ records analyzed
- 98.8% accuracy rate
- 80% more efficient than manual work
Data is arguably the most important tool available to a business, provided that it is accurate, up to date, non-redundant, and usable. The client had one of the largest repositories of information on organizations and their meetings and events. The database held 150,000+ meeting histories and data on 160,000+ organizations, which help DMOs generate leads for future events.
However, many records in this database were of little use because they were redundant, with multiple entries relating to similar organizations. A record linkage solution was therefore required to compare records on their common data points and determine whether or not they were linked.
The client wanted to build a NodeJS-based solution to identify and bring together duplicate records in the database. With this requirement, they reached out to team Daffodil to build a solution that looks for duplicate organization and meeting records.
Daffodil Software, on analyzing the requirement, proposed building a self-learning, AI-based record linkage solution. The team shared a technical proposal that illustrated how the idea could be executed using the BERT model. The advantages of an AI-based solution were demonstrated through a result comparison with a NodeJS application.
Team Daffodil developed two different BERT models for merging organizations and meeting records with similar entities. Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning technique for natural language processing (NLP) pre-training.
Building the BERT Model for Organizations
Daffodil started by analyzing the forms in which the organization data could exist. The organization names contained text, numbers, and Unicode characters, and some entries had been made in short form. To remove unwanted entries from the database, text analysis and data cleaning were performed on it. The cleaned data was then fed to different BERT models to determine which one gave the most accurate output.
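The cleaning step described above can be illustrated with a minimal sketch. It is not Daffodil's actual pipeline: the function name `normalize_org_name` and the `ABBREVIATIONS` map are hypothetical, chosen only to show how names containing Unicode characters, punctuation, and short forms might be brought to a comparable form before being passed to a model.

```python
import re
import unicodedata

# Hypothetical short-form expansions; the real mapping is not given in the case study.
ABBREVIATIONS = {
    "assn": "association",
    "assoc": "association",
    "intl": "international",
    "natl": "national",
    "inc": "incorporated",
}


def normalize_org_name(name: str) -> str:
    """Return a cleaned, comparable form of an organization name."""
    # Fold Unicode compatibility forms (e.g. full-width letters) and letter case.
    text = unicodedata.normalize("NFKC", name).casefold()
    # Spell out "&" so that "A & B" and "A and B" compare equal.
    text = text.replace("&", " and ")
    # Replace punctuation and underscores with spaces; keep letters and digits.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Expand known short forms token by token and collapse repeated whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_org_name("Intl. Assn of Widgets, Inc."))
```

Normalizing before comparison means that trivial differences in spelling convention do not have to be learned by the model at all.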
Manually, this task would have taken 60+ days to complete. With the NodeJS solution, the process would take 2-3 hours but with no measure of accuracy (as the solution won’t grow with data and time). With the AI solution, identifying and merging similar entities took 4-6 hours with 99% accuracy. The best part about the BERT model was that its accuracy would remain consistent, irrespective of the database’s growth.
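Identifying and merging similar entities can be sketched as scoring pairs of records and grouping those that score above a threshold. The example below is an illustration only: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the BERT-based similarity score, and the function names and the `0.85` threshold are assumptions, not values from the project.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the BERT score used in the project."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group names whose pairwise similarity reaches the (assumed) threshold."""
    parent = list(range(len(names)))  # union-find forest, one node per record

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that looks like the same entity.
    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect records by the root of their group.
    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    records = [
        "American Dental Association",
        "American Dental Assocation",  # misspelled duplicate
        "World Travel Council",
    ]
    for group in group_duplicates(records):
        print(group)
```

Comparing every pair grows quadratically with the number of records, so at the scale described here a real system would typically restrict comparisons to plausible candidates first.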
Building the BERT Model for Meetings
For every meeting, the date and time would vary, and these were the only fields available for all the records. The meeting database was fed to a variety of models: Convolutional Neural Network (CNN), Sequential Model, Random Forest, and Decision Trees. These models were tested with 30,000 records to determine which one produced the most accurate output. With the NodeJS solution, this task would have taken 10 hours to complete; with the AI technology, the same task took over 24 hours with an accuracy of 98.8%.
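Comparing several candidate models on the same labeled records comes down to scoring each one and keeping the best. The sketch below assumes a set of record pairs already labeled as duplicate or not. The two toy rules stand in for the trained models named above, and the field names `org` and `date` are hypothetical.

```python
from typing import Callable

Record = dict[str, str]
Model = Callable[[Record, Record], bool]  # True when the pair is judged a duplicate


def accuracy(model: Model, pairs: list[tuple[Record, Record]], labels: list[bool]) -> float:
    """Fraction of labeled pairs that the model classifies correctly."""
    if not labels:
        raise ValueError("at least one labeled pair is required")
    correct = sum(model(a, b) == label for (a, b), label in zip(pairs, labels))
    return correct / len(labels)


def pick_best(models: dict[str, Model], pairs, labels) -> tuple[str, float]:
    """Score every candidate on the same pairs and return the top one."""
    scores = {name: accuracy(model, pairs, labels) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy stand-ins for trained models (assumptions for illustration only).
def same_date(a: Record, b: Record) -> bool:
    return a["date"] == b["date"]


def same_date_and_org(a: Record, b: Record) -> bool:
    return a["date"] == b["date"] and a["org"] == b["org"]


if __name__ == "__main__":
    pairs = [
        ({"org": "A", "date": "2020-01-01"}, {"org": "A", "date": "2020-01-01"}),
        ({"org": "A", "date": "2020-01-01"}, {"org": "B", "date": "2020-01-01"}),
        ({"org": "A", "date": "2020-01-01"}, {"org": "A", "date": "2021-05-05"}),
    ]
    labels = [True, False, False]
    candidates = {"same_date": same_date, "same_date_and_org": same_date_and_org}
    print(pick_best(candidates, pairs, labels))
```

Scoring all candidates against one held-out set keeps the comparison fair, since each model sees exactly the same cases.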
The record linkage solution built using Artificial Intelligence has proven to be time-efficient, highly accurate, and self-learning. This means that even if the size of the database increases over time, the functionality of the solution won’t be affected, which wouldn’t have been the case with a static, rule-based algorithm built using NodeJS.
The AI-based data management solution saved 80% of the time that it took to manually identify the duplicates and merge them. Unlike the static, rule-based NodeJS solution, the AI-based solution delivers a 98.8% accuracy rate, as the AI models were trained with more than 30,000 records to ensure their accuracy.