Apache Griffin is a data quality service platform built on Apache Hadoop and Apache Spark. It provides a framework for defining data quality models, executing data quality measurements, automating data profiling and validation, and unifying data quality visualization across multiple data systems. It addresses data quality challenges in big data and streaming contexts. Griffin covers the following data quality dimensions:
- Accuracy - Does the data reflect real-world objects or a verifiable source?
- Profiling - Apply statistical analysis and assessment of data values within a dataset for consistency, uniqueness, and logic.
- Completeness - Is all necessary data present?
- Timeliness - Is the data available at the time it is needed?
- Anomaly detection - Pre-built algorithms for identifying items, events, or observations that do not conform to an expected pattern or to other items in the dataset.
- Validity - Are all data values within the domains specified by the business?
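To make two of these dimensions concrete, the sketch below (plain Python, not Griffin's actual API; the field names and reference data are hypothetical) computes a completeness ratio (required fields present) and an accuracy ratio (values matching a verifiable source) over a small in-memory dataset:

```python
# Hypothetical records from a target table; record 2 is missing its email.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

# A trusted source table used to verify accuracy (assumed for illustration).
source_of_truth = {1: "a@example.com", 2: "b@example.com", 3: "c@example.com"}

# Completeness: fraction of records whose required field is present.
completeness = sum(r["email"] is not None for r in records) / len(records)

# Accuracy: fraction of records whose value matches the verifiable source.
accuracy = sum(r["email"] == source_of_truth[r["id"]] for r in records) / len(records)

print(f"completeness={completeness:.2f}, accuracy={accuracy:.2f}")
```

In Griffin itself, measurements like these are expressed as data quality models and executed on Spark at scale, but the underlying metrics are ratios of this form.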