The client is an emerging label maker with a YoY growth rate of approximately 30%-40%. They sell their labels to Retail Brand Owners (RBOs) or export houses who incorporate them into finished goods such as apparels, carpets, shoes, and other products for export. To ensure direct sales, the client’s sales force is distributed nationwide and actively approaches RBOs and export houses. Recently, the client purchased export data from the database supplier to identify other exporters in similar lines of business and engage them for new business opportunities, while also exploring untapped potential in their current business with existing export houses customers.
1. The data received from the database supplier was too large (around 1–2 GB) for the client’s existing on-prem infrastructure, making data processing in Microsoft excel impossible.
2. The data format varied month over month due to constant changes in infrastructure and regulations from the database provider, but a few key required fields were always present.
3. Government norms prohibited API connections to the database, so manual upload/download was required from the client’s end.
4. Mapping the master list of RBOs received to the RBOs present in the internally maintained ERP tool was not possible due to the lack of a common identifier, and the naming convention between the two files required manual name matching.
5. The inability to map the datasets together, combined with the large data size, prevented the client from generating any insights from the data.
To solve the challenges faced by the client, a three-step approach was taken:
- A one-time historical data cleaning process was undertaken, including name matching and mapping to enable future analysis.
- An ETL (extract, transform, load) job was implemented to ensure ongoing naming correction, making the data consistent over time.
- A visualization dashboard was created to enable the client to derive insights from the data and make informed decisions.
Historical data cleaning process:
To streamline the data processing, all the data was first manually uploaded to Amazon Simple Storage Service (S3). Using Amazon Athena, we then appended only the consistent fields across all datasets that were required for analysis. This step ensured that we obtained a single database for further analysis, making it easier to identify patterns and draw conclusions.
Upon analyzing the appended database, we discovered that similar RBOs/Exporters names were written in an inconsistent manner (sample in figure below) due to possible incorrect data entry into the system. Our first step was to identify these RBO with similar names and tag them to a single unique identifier.
To ensure consistency in the RBO/Exporter names, we took the following steps:
- We converted all letters within the RBOs to lowercase to avoid differences due to capital and small letters.
- Spaces between the words were removed to make the data consistent.
- Special characters like “.”, “,”, “#”, and many others were removed to avoid inconsistencies.
- Abbreviations like PVT, LTD, and Exp were expanded to their full names to ensure continuity.
- FindMatches transform in AWS Glue was used to generate labeling file for preparing the sample test dataset. The labeling file was then used to train the actual data.
- The output from the FindMatches transform was aggregated to identify the percentage match between each name flagged by the algorithm.
- Combinations with a 95% or higher match were analysed, and a unique identifier was created by keeping the most consistent name.
As a result of this process, approximately 70% of the total unique exporter names were tagged with a unique identifier. The remaining names were either tagged manually, or dropped in concurrence with business requirements.
To map the remaining 85% of the RBOs from the database master, which were grouped by identifier, to the naming nomenclature in the client’s ERP system, we followed the following steps:
- We extracted the list of RBO names from the ERP system and created a master list.
- We then manually reviewed the RBO names in the master list and mapped them to the RBO identifiers we had created in the previous step.
- We used the mapping between RBO names and identifiers to create a lookup table that could be used to automate the mapping process in the future.
- We used this lookup table to automatically map the RBO names in the appended database to the RBO names in the client’s ERP system.
- We validated the mapping by manually reviewing a sample of the mapped RBOs.
This process ensured that all the RBOs in the appended database were mapped to the correct
RBO names in the client’s ERP system, which would allow for easier analysis and reporting in the future.
Automated process for future:
During the historical data cleaning process, a database consisting of mapping files and identifiers was created and stored in Amazon Relational Database Service (RDS) for future mapping. This step ensured that all historical mappings were directly mapped to the new files received from the database provider. If there was a mismatch, the process to make BO names consistent was followed, followed by re-mapping using the RDS service. If some RBOs remained un-mapped, then a mail with the file attached was sent to the business user for manual tagging. The manual tags were appended to the RDS via an S3 job triggered using AWS Lambda. This process ensured that RBOs were mapped correctly on a recurring basis, and the database remained up-to-date with new mappings.
Note: Client already had license to Tableau for visualization & hence Tableau was chosen as the visualization layer for the analysis.
Building an interactive visualization layer for analysis:
To prepare a decision board, we first needed to identify the key insights from the data that would be most relevant for our client’s sales operations. Based on the client’s requirements, we identified the following key insights:
- Sales trends by region: Analyzing the sales trends by region would help the client identify which regions were performing well and which regions needed more attention in terms of sales efforts.
- Popular product categories: Understanding which product categories were most popular among RBOs would help the client focus their sales efforts on these categories and potentially expand their product offerings in those areas.
- Top RBOs by revenue: Identifying the top RBOs by revenue would help the client focus on building and maintaining strong relationships with these key customers.
- Salesperson performance: Analysing the performance of the client’s sales force would help identify areas of strength and areas that need improvement.
- Exporter identification: Using the export data purchased from the government, we can identify other exporters in similar lines of business and actively engage them for new business.
Once we had identified these key insights, we created a decision board that would allow the client to easily visualize and interact with the data. The decision board included interactive charts and tables that could be filtered based on various criteria such as region, product category, and salesperson. This would allow the client to quickly identify areas that needed attention and make data-driven decisions to improve their sales operations
Using the decision board, the client was able to :
A) Identify untapped RBOs, which provided new opportunities for sales growth. By identifying RBOs that they had not previously worked with, the client could potentially increase their market share and reach new customers.
B) Analyse the sales data for their existing customers and identify areas where they could improve sales. This allowed them to focus their efforts on specific product categories or customer segments, resulting in increased revenue and profitability.
To further enhance the value of the decision board, the client could leverage various services provided by AWS. For example, they could use Amazon QuickSight to create interactive dashboards that enable users to explore the data in real-time, providing greater insights into customer behaviour and market trends. The client could also use AWS Lambda to automate data processing tasks, enabling them to quickly and efficiently extract insights from large data sets. Furthermore, AWS Machine Learning services such as Amazon SageMaker could be used to develop predictive models that can help the client identify future opportunities for growth and optimize their sales strategy.
In summary, the decision board enabled the client to identify opportunities for growth from untapped RBOs and optimize their sales strategy with existing customers. By leveraging AWS services, the client could unlock additional value from the data and work towards increasing their market share in the country.
Gaurav H Kankaria is the Assistant Vice President — EXL Services. He has over 9+ years of experience designing and implementing solutions to help organizations in the Retail Domain. He is an AWS certified Solution Architect — Professional and Data Analytics — Specialty.
Vaishnavi B is an Apprentice Leader at Ganit Business Solutions Private Limited. She has years of experience in building strategic decision boards and demand forecasting solutions to help the supply chain team of Retail organizations. She is an AWS certified Data Analytics — Specialty.
CA Gopal Goyal, is a chartered accountant by profession with over 7+ years of experience. He is currently working in Amazon and is responsible for Finance business partnering for Fulfillment center operations. He has working experience in Apparels, FMCG and Metal manugacturing industry.