Skip to main content
Version: 16.4

Overview

Data Quality

One of the primary purposes of conducting workforce analytics is to engage in data-driven decision-making about the workforce. Decisions cannot be effective if they are based on poor-quality data. Thus, it is important to ensure that any data that will be used for workforce analytics are accurate and can be trusted.

from qic-wd.org

DQC has several modules,

DQC Kernel

DQC provides two features mainly,

  • Monitor rules for topic data,
  • And topic data profile.

Monitor Rules​

A set of monitor rules are built-in, they are on difference level,

  • Factor mismatch enumeration: on single factor value,
  • Row not exists: on single topic,
  • Rows not change: on single topic,
  • Rows count mismatch with another topic: on single topic,
  • Factor value is empty: on single factor value,
  • Factor value is blank: on single factor value,
  • Factor string value length mismatch: on single factor value,
  • Factor string value length not in range: on single factor value,
  • Factor value matches Regex: on single factor value,
  • Factor value mismatches Regex: on single value,
  • Factor empty value over coverage: on factor values,
  • Factor value mismatches type: on single factor value,
  • Factor not in range: on single factor value,
  • Factor max value not in range: on factor values,
  • Factor min value not in range: on factor values,
  • Factor avg value not in range: on factor values,
  • Factor median value not in range: on factor values,
  • Factor quantile value not in range: on factor values,
  • Factor stdev value not in range: on factor values,
  • Factor most common value not in range: on factor values,
  • Factor most common value over coverage: on factor values,
  • Factor value not equals another factor's: on single factor value.

Monitor rules can be trigger by predefined jobs,

  • Daily,
  • Weekly,
  • Monthly.

For one specific topic and factor, rules only can be triggered by one of above options.

Trigger by API​

Monitor rules also can be triggered by api,

curl \
--location \
--request GET 'http://host:port/dqc/monitor/rules/run?topic_name=a_topic&frequency=daily&process_date=20200816&tenant_id=1' \
--header 'Authorization: Bearer ...'
  • topic_name: run monitor rules on given topic, it is case sensitive,
    • run rules on all topics when topic_name is not provided,
  • frequency: one of daily, weekly and monthly,
  • process_date: server will compute the date range which data changed,
    • daily: exactly the process date, whole day, 00:00:00.000 ~ 23:59:59.999,
    • weekly: week of process date. A week is from Sunday to Saturday,
    • monthly: month of process date. A month is from 1st to last day of month,
  • tenant_id: required when current user is super admin.

Result of Monitor Rules​

There is no response body via trigger api, result of monitor rules can be reviewed by web client, visit here for more details.

Topic Profile​

Profile data of topic is a real-time computation, it is invoked by web client in topic and pipeline pages, visit here for more details.