In a recent blog post, I discussed many challenges of modern data management, including architecture, quality management, data modeling, data governance, and curation and cataloging. Although much of data modernization focuses on data at rest, data in motion is equally important and arguably more challenging. Data pipelines are the means by which we move data through today’s complex analytics ecosystems.
Data movement has stretched well beyond the simple, linear batch ETL that was the standard of early data warehousing. Abundant data sources and multiple use cases result in many data pipelines – possibly as many as one distinct pipeline for each use case. Capabilities to find the right data, manage data flow and workflow, and deliver the right data in the right forms for analysis are essential for all data-driven organizations.