The dominant narrative suggests that synthetic data will further advance AI. Color me skeptical.
Too many artificial intelligence correspondents extol the merits of synthetic data without going into detail. Synthetic data is a subset of one of the three driving factors that will determine how fast AI models advance. These factors are:
- Model architecture
- Compute (or energy)
- Data (synthetic and real)
So far, more computation spent training on more data within the same transformer architecture has correlated with a more powerful model. If current models have already been trained on the entire internet, then data could become the first factor to limit the advancement of AI models. This is where synthetic data comes into the equation.
Synthetic data is artificially generated data that mimics real data. It is fake data. It is NOT real. If synthetic data isn’t real, then why do we care? Even before the proliferation of large language models (LLMs), generating synthetic data was much easier and less expensive than collecting real data. It takes less work. You don’t need any special instruments. And it’s much faster. If you want to create a model that can predict future sales growth based on past sales growth for a…