The Spark Blog

Welcome to the Era of Free Knowledge!

The Winners Take It All: Data Lake vs Unlimited Data Retrieval


About a decade ago, visionaries began harnessing the power of AI to generate business intelligence (BI). They collected data from various sources, cleaned and refined it into a polished repository, and fed it into AI systems. Surprisingly, this reduced dataset yielded valuable insights that supported senior management and boardroom decisions.


Over time, the lesson became clear: the better the quality of the data stored in the data lake, the better the BI outcomes.

 

Now, with the rise of Generative AI (GenAI), many organizations are adapting their BI strategies to incorporate it.


Naturally, they’re applying their hard-earned lessons—once again focusing on reduced datasets stored in data lakes.


But is this approach still effective?


I don’t think so.


In the GenAI era, limiting the input data to human-selected and processed information constrains the insights GenAI can generate. It caps the system’s potential by imposing human biases and assumptions.


Why limit GenAI's capabilities?
Generative AI thrives on vast, unprocessed datasets from diverse sources. When fed more comprehensive and interconnected data, Large Language Models (LLMs) can uncover unexpected patterns and deliver game-changing insights—insights we didn’t even know to ask for.


The shift in mindset is clear:
To unleash the true power of GenAI, we need to abandon the foundational assumptions of the "old AI days" and embrace new paradigms.



An Example:
Imagine a head of sales tasked with presenting the Board of Directors (BOD) with the following insights:


  1. Which products generate the best revenue-to-cost ratio and achieve the highest lead-to-opportunity conversion rate.
  2. Where the company’s marketing dollars are yielding the most effective results, and which strategy is being used there.
  3. Strategies for optimizing performance in untapped markets.


This would require pulling data from Salesforce, ServiceNow, HubSpot, and SAP.
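To make the first question concrete, here is a minimal sketch of the two metrics it mentions. The records, field names, and numbers are invented for illustration; they are not real Salesforce, ServiceNow, HubSpot, or SAP schemas, and the sketch assumes the per-product figures have already been retrieved from those systems.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    """Hypothetical per-product figures gathered from the source systems."""
    product: str
    revenue: float       # assumed to come from a finance/ERP source
    cost: float          # assumed to come from a finance/ERP source
    leads: int           # assumed to come from a CRM/marketing source
    opportunities: int   # assumed to come from a CRM source


def revenue_to_cost(p: ProductStats) -> float:
    # A zero-cost row has no usable ratio in this sketch, so it scores 0.0.
    return p.revenue / p.cost if p.cost > 0 else 0.0


def lead_to_opportunity(p: ProductStats) -> float:
    # Share of leads that became opportunities; 0.0 when there were no leads.
    return p.opportunities / p.leads if p.leads > 0 else 0.0


def best_by_ratio(stats: list[ProductStats]) -> ProductStats:
    return max(stats, key=revenue_to_cost)


def best_by_conversion(stats: list[ProductStats]) -> ProductStats:
    return max(stats, key=lead_to_opportunity)


# Made-up sample data, for illustration only.
SAMPLE = [
    ProductStats("A", revenue=120_000, cost=40_000, leads=200, opportunities=50),
    ProductStats("B", revenue=90_000, cost=45_000, leads=100, opportunities=40),
    ProductStats("C", revenue=50_000, cost=10_000, leads=80, opportunities=8),
]

if __name__ == "__main__":
    print("Best revenue-to-cost:", best_by_ratio(SAMPLE).product)
    print("Best lead-to-opportunity:", best_by_conversion(SAMPLE).product)
```

In this made-up sample the two metrics point to different products, which is the kind of tension the head of sales would need to explain to the board.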


Should data scientists first create a data lake?



Not necessarily.

