best practices of data modeling


best practices of data modeling

 

Contact for any Data Architecture related queries or assistance <br /> retail.architect@wipro.com <br /> OR <br /> dataarchitect-retail@wipro.com

Data Modeling ? Best Practices Author: Krishna R Revankar Technology Architecture Group (Data Architecture) Retail, CPG & Distribution Vertical CONFIDENTIAL© Copyright 2006 Wipro Technologies 2 Data Architecture: Data Modeling Best Practices DMBP1: Analyze the Data requirements, Data sample in detail before starting the data modeling activity. It is very important to analyze the data requirements and sample data to understand the data relations, data patterns. The analytical information obtained here forms the base to derive the entities, attributes and their relationship. Hence examine each and every element of the data and its usage requirements before starting data modeling activity. DMBP2: Prepare a data dictionary document A data dictionary document is a glossary of the data requirements. This will provide information about various business entities, attributes, business rules and enumerations or static values available for certain set of attributes. This document will act as a single repository for the data modeling activity and hence saves major part of the information gathering and analysis time during the data model design and also during the change management process. Best Practices - Architecture DMBP ? Data Modeling Best Practice CONFIDENTIAL© Copyright 2006 Wipro Technologies 3 Data Architecture: Data Modeling Best Practices DMBP3: Draw a Conceptual Data model (CDM) during the requirement analysis The most critical step in data model design is to conceptualize the business requirements and business rules and establish data skeleton which represents the business process of an organization. This act as a blue print for the business users. This can be achieved by means of a graphical representation called the Conceptual Data Model. Conceptual model is an E-R diagram which provides a high level view of the data requirements and association between various data elements. Hence it is essential that a Conceptual Data Model be established and verified from the business users to ensure that the data modeling activity is right on the track. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 4 Data Architecture: Data Modeling Best Practices DMBP4: Drive the Conceptual and Logical data models based on business requirements, not by application implementation or physical implementation point of view. The conceptual data model (CDM) is the business requirement blue print and logical data models (LDM) creates a data presentation blueprint which articulates the business requirements in technical terms ? entities, attributes, relationships etc. Hence for both CDM and LDM to be acceptable and successful in their implementation, they should be driven by the business requirements, business rules and dimensions. The data model should get an approval from the business user community before it goes for an implementation. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 5 Data Architecture: Data Modeling Best Practices DMBP5: Use right data modeling tool Use of right data modeling tool facilitates a major productivity improvement. Make sure that the tool provides the features like ? Strong ER diagram features ? Color coding features for entities ? Reverse engineering ? Schema comparison ? DDL Script generation ? Logical to physical model conversion ? Data Dictionary ? Data repository for configuration management DMBP6: Start data modeling in the early stage of the project Start the data model design in the early stage of the project once all the key requirements are identified. This will help the project stake holders to clearly define the scope from business perspective. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 6 Data Architecture: Data Modeling Best Practices DMBP7: Consider the data quality and data existence as one of the key driver for data model design A good data model is one which covers the key business functionalities and provides room for good quality data and revels the data gaps. Hence design a data model which can be loaded with 90 ? 95% of the data when deployed. To achieve this the data quality should be one of the key driver for the data model design. Collect the sample data, analyze the data, identify the gaps and map the data with the business rules / business requirements. This will give clear picture of how the data quality is and how should the model look like from the Business perspective. DMBP8: Involve the business users and end users during conceptual and logical data modeling It is very important keep business users and end users of the system during the data model design rather at the end of the design phase. These are the users who provide the actual view of the real world complexities associated with their business data and functionality. Make sure that the Conceptual and Logical data model are presented to the key business users and they are involved in all the decision making business meetings / discussions which influence the data model and also make sure that they approve the model. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 7 Data Architecture: Data Modeling Best Practices DMBP9: Create source-target attribute mapping during Logical/physical data model design In the scenario where source system(s) already exists and the proposed target system will be fed with the data from these one or more source systems then it is essential to establish the source and target attribute mapping during the logical data model design rather than doing it at the end of data modeling and during the ETL process design, as this could lead to changes in the data model. Hence start the data analysis activity in parallel to the data model design activity. The data analysts working on the ETL process and data modelers should work closely to come out with the appropriate source- target attribute mapping. DMBP10: One table be associated with one subject area only Make sure that a table will not contain data about more than one subject. Having data about more than one subject area in a table could lead to data integrity issues. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 8 Data Architecture: Data Modeling Best Practices DMBP11: Associate ?type? information with ?typecode? columns Any record that has some form of type information (enumerations) associated with it should contain a type code column, which itself is a foreign key referencing a type code table. The foreign key constraint ensures that no records with invalid types exist and hence it gets validated at the database level itself. DMBP12: Normalization Though higher forms of normalizations are available it is recommended that initially base the ER Design at the third normal form and would consider de-normalization, as may be the case during physical implementation. The physical implementation is mainly driven by the performance requirements, scalability and application nature. The level of normalization mainly based on ? The scalability requirement of the model (the model should be generic to accommodate any application requirement ) ? There is a scope for keeping physical implementation separate from the conceptual / logical model. ? Higher level of data integrity requirements Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 9 Data Architecture: Data Modeling Best Practices DMBP13: De-normalization De-normalization should be applied during physical data model stage or on implementation specific data model. The level of de-normalization differs from case-to-case from the implementation point of view. Some possible scenarios where de-normalization can be applied are: ? Too many joins to fetch data from tables. ? Critical performance requirements. ? Requirement to store historic / summary data. ? When you know that some level of redundancy can bring in remarkable performance advantages then de-normalization is a must. Hence having data redundancy may not be a problem in such instances, but not knowing that there is data redundancy could be a major problem. ? Storing periodic running totals, aggregation of child values in their respective parent tables. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 10 Data Architecture: Data Modeling Best Practices DMBP14: Avoid composite Primary Key in a table Every table should have a primary key. Select a key as primary key which does not change. Avoid a composite primary key. ? A Composite Primary key with too many columns results in longer updates in a table. ? This will also have an impact on query performance due to slower joins to these keys. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 11 Data Architecture: Data Modeling Best Practices DMBP15: Adopt a surrogate key When a need for composite primary key arise, then adopt a non-natural key which is alternate and artificial (called as Surrogate Key). ? Surrogate keys help to mitigate the risk of real business data not fulfilling these requirements. ? They are easy to maintain, as the surrogate key typically is a single field (not a compound key), which simplifies the schema (particularly when the key is used in other tables as a foreign key). ? Make sure that these system generated identifiers should necessarily have a unique constraint. ? Make sure to follow a rule that all the composite keys having more than 3 attributes/columns and those keys of the leaf tables (tables which can not have child tables) will be replaced with Surrogate keys. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 12 Data Architecture: Data Modeling Best Practices DMBP16: Create right index on the table Creating a bad index on a table is worse than not creating an index on a table. Hence care should be taken while creating an index on a table. Hence make sure to create an index on the child tables attribute which matches the corresponding attributes of the parent table, as these are the one?s most frequently used in query joins. DMBP17: Create index on the most restrictive table A table is called most restrictive when there is a possibility that only small number of rows are retrieved with respect to the other table with which it is joined. It is essential to create index on the most restrictive table to avoid the full table scans during the query joins. DMBP18: Define the data defaults and approach for Null values during data model design Make sure that the data defaults required for the business are defined during the data model design stage and get an approval for these values. Define a strategy to handle Null / Missing values during the data model design and enforce the appropriate constraints and business rules accordingly. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 13 Data Architecture: Data Modeling Best Practices DMBP19: Model for clearly visible requirements Design data model taking into considerations the future requirements which are clearly visible, not more than that. Don?t include too many things in the data model taking the too far futuristic view which might increase the data model complexity and also might result into data quality issues. Hence take into consideration the clearly visible business requirements over a short span of time than designing it for assuming things which even may not exist. DMBP20: Recursive relationship only if it is essential Make sure that the data model doesn?t have a recursive relationship which might lead to indefinite query loops. Have a recursive relationship only if it is essential with clearly defined relationship established. Best Practices - Architecture CONFIDENTIAL© Copyright 2006 Wipro Technologies 14 Thank You Contact for any Data Architecture related queries or assistance retail.architect@wipro.com OR dataarchitect-retail@wipro.com

PARTAGER SUR

Envoyer le lien par email
1492
READS
9
DOWN
7
FOLLOW
4
EMBED
DOCUMENT # TAGS
#datawarehouse 

licence non indiquée


DOCUMENT # INDEX
Database 
img

Partagé par  santiago

 Suivre

Auteur:Krishna R Revankar
Source:Non communiquée

 

 

santiago a également publié

*   best practices of data modeling