Database Management

Data Quality

Definition: Data quality describes how reliable and effective data are. To ensure that data are of high quality, they must be cleaned and checked for problems.

Reference: [[1]] 10/27/08

The following list describes five areas that help maintain high-quality data.

Relevance- To judge whether data are relevant, we ask ourselves whether the data meet the specific needs of a particular company. All data are collected for a purpose, and data that do not serve that purpose are irrelevant. We can also ask whether the data could serve other purposes that are not currently addressed. Another way to improve data relevance is to train the people who enter the data so that they know the standards for their work.

Accuracy- Inaccurate data lead to poor decision making, so inaccurate data must be either discarded or cleaned up. To keep data as accurate as possible, we have to devise methods of testing the data for accuracy.
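One common way to test data for accuracy is rule-based validation: each field gets a check that a plausible value must pass, and records that fail are flagged for cleanup or removal. The following Python sketch is illustrative only; the fields `email` and `age`, the loose email pattern, and the 0 to 130 age range are hypothetical rules, and real rules would come from the company's own data standards.

```python
import re
from typing import Callable, Dict, List, Tuple

# A rule takes a field value and returns True if the value looks plausible.
Rule = Callable[[object], bool]

# Hypothetical, deliberately loose email pattern (not a full RFC check).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical validation rules, keyed by field name.
RULES: Dict[str, Rule] = {
    "email": lambda v: isinstance(v, str) and EMAIL_PATTERN.match(v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
}


def find_inaccurate(records: List[dict], rules: Dict[str, Rule]) -> List[Tuple[int, str]]:
    """Return (record index, field name) for every present value that fails its rule.

    Absent fields are skipped here; missing data is a completeness issue.
    """
    problems = []
    for i, record in enumerate(records):
        for field, rule in rules.items():
            if field in record and not rule(record[field]):
                problems.append((i, field))
    return problems


records = [
    {"email": "ann@example.com", "age": 34},
    {"email": "bob-at-example", "age": 29},
    {"email": "cy@example.org", "age": -5},
]
print(find_inaccurate(records, RULES))  # [(1, 'email'), (2, 'age')]
```

Passing every rule only shows that a value is plausible, not that it is correct, so checks like these complement, rather than replace, careful data entry.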

Normalization- Data that are not normalized are prone to one or more of the three types of modification anomalies: update, deletion, and insertion anomalies. Normalizing data means making sure that each fact is stored once, in the relation where it belongs. This prevents those anomalies from occurring and reducing the integrity of the database. Relations should be normalized to at least Boyce-Codd Normal Form (BCNF).
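Whether a relation meets BCNF can be checked mechanically. Below is a minimal Python sketch of the standard BCNF test: a relation is in BCNF when, for every nontrivial functional dependency X -> Y, X is a superkey, meaning its attribute closure covers the whole relation. The Employee relation, its attributes, and the helper names `fd`, `closure`, and `bcnf_violations` are hypothetical. The sketch assumes the listed dependencies are the complete set for that relation; testing a relation produced by decomposition would require its projected dependencies instead.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A functional dependency X -> Y stored as (X, Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Compute the attribute closure X+: every attribute that X determines."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Return every nontrivial FD whose left side is not a superkey.

    An empty list means the relation is in BCNF with respect to these FDs.
    """
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = relation <= closure(lhs, fds)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation: Employee(emp_id, dept_id, dept_name)
employee = frozenset({"emp_id", "dept_id", "dept_name"})
employee_fds = [fd("emp_id", "dept_id"), fd("dept_id", "dept_name")]

for lhs, rhs in bcnf_violations(employee, employee_fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['dept_id'] -> ['dept_name']
```

In this example dept_name depends on dept_id, which is not a key of Employee, so a department's name is repeated for every employee in it, and changing the name in only one row would cause an update anomaly. Splitting the relation into (emp_id, dept_id) and (dept_id, dept_name) removes the violation.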

Timeliness- Data that are not received in time to make a decision are useless. Data must be available when they are required; if a database cannot produce useful data in a timely fashion, it is worthless for decision making. Some applications need data faster than others, but in every case data must be ready before they are needed.
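The rule that data must be ready before they are needed can be monitored by recording, for each data set, when it became available and when the decision that depends on it was due. The Python sketch below assumes a hypothetical `Delivery` record and hypothetical data set names and times; applications that need data faster simply have earlier `needed_at` times.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Delivery(NamedTuple):
    """Hypothetical record of when a data set was ready and when it was needed."""
    name: str
    ready_at: datetime
    needed_at: datetime


def late_deliveries(deliveries: List[Delivery]) -> List[Tuple[str, timedelta]]:
    """Return (name, lateness) for each data set that was not ready in time."""
    return [
        (d.name, d.ready_at - d.needed_at)
        for d in deliveries
        if d.ready_at > d.needed_at
    ]


deliveries = [
    Delivery("daily_sales", datetime(2008, 10, 27, 6, 0), datetime(2008, 10, 27, 8, 0)),
    Delivery("fraud_alerts", datetime(2008, 10, 27, 9, 5), datetime(2008, 10, 27, 9, 0)),
]
for name, lateness in late_deliveries(deliveries):
    print(name, "late by", lateness)  # fraud_alerts late by 0:05:00
```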

Completeness- For data to be considered complete, there can be no missing records or elements. To check completeness we need a target: if there are no standards for what data will be needed, we cannot know whether the data we currently have are sufficient. Fields that are consistently left unused can probably be omitted from the database. If there are not enough data to support decisions, we must determine which data should be added to better support business decision making.
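Once a target exists, the three completeness checks described above (missing elements, missing records, and consistently unused fields) can be automated. In the Python sketch below, the target is expressed as a hypothetical list of required fields and a hypothetical set of expected record keys; the store fields and the 95% "unused" threshold are illustrative assumptions, not standards from the source. A value counts as missing when it is absent, `None`, or an empty string.

```python
from typing import Dict, Iterable, List, Set


def is_empty(value: object) -> bool:
    """Treat None and the empty string as missing values."""
    return value is None or value == ""


def missing_elements(records: List[dict], required: Iterable[str]) -> Dict[int, List[str]]:
    """Map each record index to the required fields it lacks."""
    required = list(required)
    report = {}
    for i, record in enumerate(records):
        gaps = [f for f in required if is_empty(record.get(f))]
        if gaps:
            report[i] = gaps
    return report


def missing_records(records: List[dict], key: str, expected_keys: Set) -> Set:
    """Return expected key values that have no matching record."""
    present = {r.get(key) for r in records}
    return expected_keys - present


def unused_fields(records: List[dict], threshold: float = 0.95) -> List[str]:
    """Return fields that are empty in at least `threshold` of the records."""
    if not records:
        return []
    fields = sorted({f for r in records for f in r})
    return [
        f for f in fields
        if sum(1 for r in records if is_empty(r.get(f))) / len(records) >= threshold
    ]


# Hypothetical store records; store 3 is expected but missing.
records = [
    {"store_id": 1, "revenue": 1200, "fax": None},
    {"store_id": 2, "revenue": None, "fax": ""},
    {"store_id": 4, "revenue": 950, "fax": None},
]
print(missing_elements(records, ["store_id", "revenue"]))  # {1: ['revenue']}
print(missing_records(records, "store_id", {1, 2, 3, 4}))  # {3}
print(unused_fields(records))  # ['fax']
```

A field flagged by `unused_fields` is only a candidate for removal; whether it can actually be omitted depends on whether any future decision will need it.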

References: [[2]] 10/27/08 [[3]] 10/27/08 [[4]] 10/27/08

--Nels5093 01:10, 28 October 2008 (UTC)