Voice of Customer through Data Harvesting and Storage in Data Lakes

Consumer Research of the Future

Imagine a world where all of your consumer review data sits in one place. You log on, run a query, and within 10 minutes review everything posted in the last 24 hours across all of your e-commerce channels, brands, and products: star ratings rolled up with comments, consumer challenges, and consumer wins. This is the future.

Brands operating in retail are highly consumer-focused because they compete in a hyper-competitive environment. In such an environment, understanding the voice of the customer is critical for consumer brands. To deepen their understanding of consumer behavior, they constantly need consumer-generated data about their products. The traditional approach to getting such data has been to use surveys or research reports from market research firms such as Nielsen and Kantar. However, the emergence of e-commerce websites, social media platforms, and other online sales channels has significantly increased the avenues for understanding consumers. To make such large amounts of data readily accessible, brands can harvest data from these websites and store it in data lakes for future use.

Benefits of Data Harvesting and Data Lake Storage

With consumer data scattered across various online mediums, the first benefit is reach: harvesting collects data from a wide variety of sources. Second, once collected, the data is housed in one central repository where it is available for further refinement, which significantly reduces the time cross-functional teams need to source data as needs arise. Finally, the established data pipelines provide a continuous flow of data from each source to the data lake.

The data is diverse, yet relevant to various internal teams. For example, likes on social media validate the marketing message, while dislikes and negative reviews serve as a metric for improving customer experience and product quality.

Challenges in Building a Data Lake

While data lakes provide instant access to relevant data, harvesting consumer-generated data for a data lake is not without its challenges. Three key challenges to plan for when building a consumer reviews data lake are:

  • Data Dispersion: Consumer commentary is dispersed across thousands of sources, making it difficult to collect relevant data efficiently. Public sources include review sites, discussion boards, blogs, and social media comments; private sources include CRM systems, survey platforms, and chat systems.
  • Limited API Access: While many sources provide APIs, others need to be scraped with custom-built extractors.
  • Schema Variety: Each website has a different schema, so scrapers must be adapted to each schema before scraping can begin. All of the collected data is then stored in a data lake to make it usable for downstream processes.
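
To make the schema variety challenge concrete, here is a minimal Python sketch of one way to map differently shaped records onto a single common review record before loading them into a data lake. It is an illustration under stated assumptions, not the pipeline described in this article: the source names (retailer_a, forum_b), their field names, and the Review record are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Review:
    """Hypothetical common schema that every source is mapped onto."""
    source: str
    product_id: str
    rating: float        # normalized to a 1-5 star scale
    text: str
    posted_at: datetime


def from_retailer_a(raw: Dict[str, Any]) -> Review:
    # Assumed shape: 5-star scale, ISO 8601 timestamp string.
    return Review(
        source="retailer_a",
        product_id=str(raw["sku"]),
        rating=float(raw["stars"]),
        text=raw["body"].strip(),
        posted_at=datetime.fromisoformat(raw["created"]),
    )


def from_forum_b(raw: Dict[str, Any]) -> Review:
    # Assumed shape: 10-point score, Unix epoch seconds, nested item id.
    return Review(
        source="forum_b",
        product_id=str(raw["item"]["id"]),
        rating=float(raw["score"]) / 2.0,
        text=raw["comment"].strip(),
        posted_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


# One adapter per source; supporting a new website means adding one entry.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Review]] = {
    "retailer_a": from_retailer_a,
    "forum_b": from_forum_b,
}


def normalize(source: str, raw: Dict[str, Any]) -> Review:
    """Convert a raw record from a named source into the common schema."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for source {source!r}") from None
    return adapter(raw)
```

Keeping the per-source logic in small adapter functions means a change to one website's layout touches only that website's adapter, while downstream processes keep reading the same record shape.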

A Consumer Reviews Data Lake in Play

A large CPG company that has leaped into the future of consumer reviews market research is the world’s leading consumer health and hygiene company, with 100+ brands and operations in 60+ countries. The company's key motivation was to improve consumer satisfaction and to engage directly with dissatisfied consumers to address critical concerns.

Operating geographies

The team at SetuServ harvested consumer reviews data at the product level from 80+ sources across 29+ countries. The extracted data was then cleansed and aggregated into MongoDB and PostgreSQL databases. In addition, non-English reviews were translated into English. The collected information was then disseminated through APIs and Power BI dashboards, with access across the global organization.

Data Sources for harvesting

Country managers across the organization used the extracted data to:

  • Assess trends over time by brand/product
  • Identify low-rated reviews that they could respond to
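
As a rough illustration of these two uses, the Python sketch below computes a monthly average rating per brand and pulls out low-rated reviews for follow-up. The records, field names, brand names, and the two-star threshold are all invented for the example; they are not the company's data or SetuServ's actual implementation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical, hand-written sample records in an assumed common format.
SAMPLE_REVIEWS = [
    {"brand": "BrandX", "rating": 2, "posted": date(2024, 1, 5), "text": "Pump broke"},
    {"brand": "BrandX", "rating": 4, "posted": date(2024, 1, 20), "text": "Works well"},
    {"brand": "BrandX", "rating": 5, "posted": date(2024, 2, 3), "text": "Love it"},
    {"brand": "BrandY", "rating": 1, "posted": date(2024, 2, 10), "text": "Leaked in transit"},
]


def monthly_average_rating(reviews):
    """Average star rating per (brand, year-month), to show trends over time."""
    buckets = defaultdict(list)
    for review in reviews:
        key = (review["brand"], review["posted"].strftime("%Y-%m"))
        buckets[key].append(review["rating"])
    return {key: round(mean(ratings), 2) for key, ratings in sorted(buckets.items())}


def low_rated(reviews, threshold=2):
    """Reviews at or below the threshold, newest first, as a response queue."""
    flagged = [r for r in reviews if r["rating"] <= threshold]
    return sorted(flagged, key=lambda r: r["posted"], reverse=True)


if __name__ == "__main__":
    for (brand, month), avg in monthly_average_rating(SAMPLE_REVIEWS).items():
        print(brand, month, avg)
    for review in low_rated(SAMPLE_REVIEWS):
        print(review["posted"], review["brand"], review["text"])
```

In practice the same aggregation would more likely run as a database query or a dashboard measure; the sketch only shows the shape of the calculation.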

Because data collection is ongoing, the company can mitigate negative sentiment faster and stay up to date with market issues in real time. This project offers a glimpse of how data harvesting and data lakes can benefit consumer brands. To learn more about data harvesting and building data lakes, reach out to us at [email protected] or visit https://www.setuserv.com/ for more information.