Meljun Cortes Database System Instructional Manual

DATABASE FUNDAMENTAL, 2ND Semester 2014-2015
MELJUN P. CORTES

PRELIM PERIOD

Lecture no. 1: DATABASE SYSTEMS

1.1 Introduction to Database Systems

Database (DB) - an integrated collection of related data.
- By related data we mean that the data represent logically coherent facts about some aspect of the real world that are required by an application.
- Universe of discourse, or mini-world - the part of the real world that a database is designed to model within a computer.
- By integrated we mean that the data for multiple applications are stored together and manipulated in a uniform way on secondary storage, such as a magnetic or an optical disk. The primary goal of integration is to support information sharing across multiple applications.

A database system consists of 1) an application-specific database, 2) the DBMS that maintains that database, and 3) the application software that manipulates the database.

Database Systems and Database Management Systems

A Database Management System (DBMS) is a collection of programs that controls a database. Specifically, it provides an interface to create, maintain, and manipulate multiple databases. A DBMS is a general-purpose software system: we can use it not only to create and maintain multiple databases but also to implement database systems for different applications. As opposed to a DBMS, which is general-purpose, a database system is developed to support the operations of a specific organization or a specific set of applications.

THE DATABASE APPROACHES (Ways of Handling Databases)
1. Manual - manual manipulation of data. Ex. a manual card catalog
2. Computerized - electronic data handling
   - Traditional File Processing System (TFPS)
   - Database Management System (DBMS)

DBMS vs TFPS

TFPS - application programs access data files directly (filenames and data definitions are embedded in each program).
- When data are integrated in a single, shared data file, all application programs that share the data file must be aware of all the data in the file, including those data items that they do not make use of or need to know about.
- The problem gets worse when a new field is added to the data file.

MELJUN P. CORTES, mba,mpa,bscs,mscs in progress

Disadvantages of TFPS
1. Uncontrolled Redundancy
2. Inconsistent Data
3. Inflexibility
4. Limited Data Sharing
5. Poor Enforcement of Standards
6. Low Programmer Productivity
7. Excessive Program Maintenance

DBMS - stores the structure of the data as part of the description of the database in the system catalog, separately from the application programs.

Characteristics of a DBMS

Data Abstraction
DBMSs allow data to be structured in ways that make them more understandable and meaningful to applications than the ways the data are physically stored on disk. They provide users with high-level, conceptual representations of the data (a table in relational DBMSs, or an object in object-oriented DBMSs, to give two examples) while hiding storage details that are not of interest to most database users.
- Program-data independence - the physical organization of data can be changed without affecting the application programs.
- Program-operation independence - the implementation of abstract operations can be changed without affecting the code of the application programs, as long as their calling interface stays the same.

Data abstraction, and in particular data independence, is what facilitates data sharing and integration.
This is the main advantage of a DBMS over traditional file processing, whose application programs depend on the low-level structure of the data or storage organization, with each program storing its data in a separate data file.

Reliability
DBMSs provide high reliability by 1) enforcing integrity constraints and 2) ensuring data consistency despite hardware or software failures.
- Integrity constraints reflect the meaning (the semantics) of the data and of the application (ex. a data type).
- Constraints - conditions, restrictions.
- Data consistency means that interrupted update operations do not corrupt the database with values that violate the integrity constraints, and that no data in the database is lost. After a failure, a DBMS automatically recovers, restoring the database to the consistent state in which it existed just prior to the interruption. This consistent state is constructed as follows: during recovery, the DBMS rolls back all interrupted transactions, obliterating their updates from the database, and re-executes successfully terminated transactions as necessary, restoring their updates to the database.

Efficiency
DBMSs support both efficient space utilization and efficient access to data. By making use of the data description in the catalog, DBMSs are able to minimize data redundancy, which in turn saves both space, by storing each data item only once, and processing time, by eliminating the need for multiple updates to keep replicas consistent and up-to-date. DBMSs enhance the performance of queries by means of optimizations and the use of access methods based on data values. Optimizations simplify queries so that they execute faster, and access methods allow direct access to the locations where relevant data are stored, in a way similar to the access provided by the index in the back of a book. DBMSs also decrease the response time of transactions by allowing multiple users to access the database concurrently.
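The reliability behaviour described above (constraint enforcement plus restoration of a consistent state) can be sketched concretely. The following is a minimal example using Python's sqlite3 module, with an invented ACCOUNT table whose CHECK constraint encodes the rule that a balance cannot be negative; it is an illustration, not part of the original lecture.

```python
import sqlite3

# Illustrative ACCOUNT table; the CHECK constraint is a semantic
# (application-specific) integrity constraint.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ACCOUNT (
        AcctNo  INTEGER PRIMARY KEY,
        Balance INTEGER CHECK (Balance >= 0)
    )
""")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 500)")
conn.commit()

# An update that violates the constraint is rejected outright,
# and rolling back restores the previous consistent state.
try:
    conn.execute("UPDATE ACCOUNT SET Balance = -100 WHERE AcctNo = 1")
except sqlite3.IntegrityError:
    conn.rollback()

balance = conn.execute(
    "SELECT Balance FROM ACCOUNT WHERE AcctNo = 1").fetchone()[0]
print(balance)  # 500: the failed update did not corrupt the database
```

The same behaviour is what a full DBMS guarantees automatically during recovery, on a much larger scale.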
1.2 Relational Databases

Relational Database Schema
A relational database schema is a set of table schemas and a set of integrity constraints. Integrity constraints can be sorted into two kinds:
- structural (model-specific) integrity constraints, which are imposed by the model as discussed below, and
- semantic (application-specific) integrity constraints, imposed by the application, such as the constraint, for example, that the balance of a savings account cannot be negative.

Keys - columns whose values are sufficient to uniquely identify a row.

TYPES OF KEYS
1. Primary Key - uniquely identifies a record
2. Secondary Key - used to access a group of records with common attributes
3. Alternate Key - a candidate to be the Primary Key
4. Composite Key - composed of two or more columns used to access a unique record
5. Foreign Key - a non-key attribute (an ordinary column) in one table that is a primary key in another; establishes associations (relationships) among tables within one database (in a relational database schema)

DDL (Data Definition Language)
The command to create a table in SQL is the CREATE TABLE command. SQL supports all the basic data types found in most programming languages: integer, float, character, and character string. SQL commands are not case sensitive.

CREATE TABLE MEMBER (
    MemNo       INTEGER,
    DriverLic   INTEGER,
    Fname       CHAR(10),
    MI          CHAR,
    Lname       CHAR(15),
    PhoneNumber CHAR(14),
    PRIMARY KEY (MemNo),
    UNIQUE (DriverLic)
);

The primary key is specified using the PRIMARY KEY directive, alternate keys using the UNIQUE directive.

DML (Data Manipulation Language)

Update Operations
The relational DML allows us to insert and delete rows in a table as well as to update the values of one or more columns in a row.
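The CREATE TABLE statement above can be checked against a live engine. The following is a minimal sketch using Python's sqlite3 module, an illustrative stand-in for a full SQL DBMS (SQLite does not enforce CHAR length limits, but it accepts the PRIMARY KEY and UNIQUE directives as written):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The MEMBER schema from the text: MemNo is the primary key,
# DriverLic an alternate key declared with UNIQUE.
conn.execute("""
    CREATE TABLE MEMBER (
        MemNo       INTEGER,
        DriverLic   INTEGER,
        Fname       CHAR(10),
        MI          CHAR,
        Lname       CHAR(15),
        PhoneNumber CHAR(14),
        PRIMARY KEY (MemNo),
        UNIQUE (DriverLic)
    )
""")

# The system catalog now holds the table structure, separately
# from any application program.
cols = [row[1] for row in conn.execute("PRAGMA table_info(MEMBER)")]
print(cols)  # ['MemNo', 'DriverLic', 'Fname', 'MI', 'Lname', 'PhoneNumber']
```

Reading the column list back from the catalog (here, SQLite's PRAGMA table_info) illustrates the earlier point that a DBMS stores the data description apart from the programs that use it.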
In SQL, only one row can be inserted at a time, by specifying the values of each column, as in the following example:

INSERT INTO MEMBER
VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888');

This statement inserts a new row for Susan W. Jones in the MEMBER table. In SQL, strings are enclosed within single quotes.

Delete and update can be applied to multiple rows that satisfy a selection condition. In SQL, the selection condition of a deletion is specified by a WHERE clause. In the simplest case, a row is selected by specifying the value of its primary key. For example, the statement

DELETE FROM MEMBER WHERE MemNo = 102;

deletes the row with member number 102 from the MEMBER table. The following statement changes the middle initial of member 101 in the MEMBER table:

UPDATE MEMBER SET MI = 'S' WHERE MemNo = 101;

An update operation succeeds if it does not violate any integrity constraints. For example, an insert operation will not succeed if it attempts to insert a row whose keys, primary and alternate, conflict with existing keys; that is, if the row were to be inserted, the property that keys must be unique would be violated. On the other hand, deleting a row never violates a key constraint, unless the deleted row is referenced by a foreign key. In that case, deleting the row might violate a referential integrity constraint.

TOOLS FOR QUERIES

1) QBE (Query By Example)
Query By Example (QBE) is a visual query language developed by IBM [Zloof, 1977] to simplify an average end user's task of retrieving data from a database. QBE saves the user from having to remember the names of tables and columns, and the precise syntax of a query language. The basic idea is to retrieve data with the help of query templates.
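The insert, delete, and update statements above can be exercised end to end. The following is a runnable sketch using Python's sqlite3 module; the second member row is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE MEMBER (
    MemNo INTEGER PRIMARY KEY, DriverLic INTEGER UNIQUE,
    Fname CHAR(10), MI CHAR, Lname CHAR(15), PhoneNumber CHAR(14))""")

# Insert one row at a time; strings (including the MI) in single quotes.
conn.execute("INSERT INTO MEMBER VALUES "
             "(101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888')")
conn.execute("INSERT INTO MEMBER VALUES "
             "(102, 5551234, 'John', 'A', 'Smith', '412-555-1234')")

# Update: change member 101's middle initial.
conn.execute("UPDATE MEMBER SET MI = 'S' WHERE MemNo = 101")

# Delete: the WHERE clause selects the row by its primary key.
conn.execute("DELETE FROM MEMBER WHERE MemNo = 102")

# An insert that re-uses primary key 101 violates the key constraint
# and is rejected, as described in the text.
try:
    conn.execute("INSERT INTO MEMBER (MemNo) VALUES (101)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

rows = conn.execute("SELECT MemNo, MI FROM MEMBER").fetchall()
print(rows)  # [(101, 'S')]
```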
QBE works as follows:
- the system provides the user with a skeleton, or query template, of the tables in the database, and
- the user fills in the tables with examples of what is to be retrieved or updated.

A skeleton of a table is basically a copy of the table without any rows, i.e. an empty table. For simple selection conditions, the examples can be constant values, such as Susan or 100, or comparisons with constant values, such as > 100, specified under a column.

Projection in QBE
Projection is specified by selecting the show button associated with each field, which we denote in our examples with "P.". To print all columns of the retrieved tuples, we only need to put one "P." under the name of the table.

QBE1: Display MemNo, Lname, and PhoneNumber from MEMBER.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.    |           |       |    | P.    |         | P.          |

The result of a query is displayed in a result table, which can subsequently be either stored or manipulated further. In Microsoft Access, the resulting table is called a datasheet.

Selection in QBE

QBE2: Retrieve all members whose first name is John.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
  P.   |       |           | John  |    |       |         |             |

By placing P. under the table name, this query retrieves and displays the data in all the columns.

QBE3: Retrieve the name and member number of all the members whose member number is greater than 100.

MEMBER | MemNo  | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.    |    | P.    |         |             |

A comparison with a constant value (in the above example the constant value is 100) is placed in the appropriate column. The resulting table will have the following columns:

Result | MemNo | Fname | Lname |

In QBE, a disjunction (OR) is expressed by using different examples in different rows of the skeleton.

QBE4: Retrieve the name and member number of all the members whose first name is John or Susan.

MEMBER | MemNo | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.    |           | P.John  |    | P.    |         |             |
       | P.    |           | P.Susan |    | P.    |         |             |
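The QBE templates above have direct SQL equivalents. The following sketch uses Python's sqlite3 module with a simplified MEMBER table and invented sample rows: QBE3 becomes a WHERE comparison, and QBE4's multiple skeleton rows become an OR condition.

```python
import sqlite3

# Simplified MEMBER table with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo INTEGER PRIMARY KEY, "
             "Fname CHAR(10), Lname CHAR(15))")
conn.executemany("INSERT INTO MEMBER VALUES (?, ?, ?)",
                 [(99, 'John', 'Doe'),
                  (101, 'Susan', 'Jones'),
                  (120, 'Alice', 'Lee')])

# QBE3: member number and name where MemNo > 100.
q3 = conn.execute("SELECT MemNo, Fname, Lname FROM MEMBER "
                  "WHERE MemNo > 100 ORDER BY MemNo").fetchall()

# QBE4: different skeleton rows correspond to an OR in SQL.
q4 = conn.execute("SELECT MemNo, Fname, Lname FROM MEMBER "
                  "WHERE Fname = 'John' OR Fname = 'Susan' "
                  "ORDER BY MemNo").fetchall()

print(q3)  # [(101, 'Susan', 'Jones'), (120, 'Alice', 'Lee')]
print(q4)  # [(99, 'John', 'Doe'), (101, 'Susan', 'Jones')]
```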
A conjunction (AND), on the other hand, is expressed in the same row.

QBE5: Retrieve the name and member number of all the members whose first name is Susan and whose member number is greater than 100.

MEMBER | MemNo  | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.Susan |    | P.    |         |             |

If the conjunction involves a single column, the condition can be specified using the AND operator, as in SQL. For example, if MemNo should be greater than 100 and less than 150, this is specified under the MemNo column as:

( _x > 100 ) AND ( _x < 150 )

Join in QBE
Joins can be expressed by using common example variables in multiple tables, in the columns to be joined.

QBE6: List the member number and last name of all the members who currently have a borrowed book.

MEMBER | MemNo   | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P._join |           |       |    | P.    |         |             |

BOOK | Book_id | CallNumber | Edition | BorrowerMemNo | BorrowDueDate |
     |         |            |         | _join         |               |

To express multiple joins you can use several example variables at the same time.

SEATWORK:
1. What is a relational database?
2. Enumerate the different types of keys and give an example of each.

QUIZ # 1: Complete SQL

Lecture no. 2: SQL

2.1 SQL (Structured Query Language)

SQL is the de facto standard query language for relational DBMSs. It is a comprehensive language providing statements for both data definition and data manipulation.

SQL DDL (Data Definition Language) - provides basic commands for defining the conceptual schema of a database.

SQL provides 3 numeric data types:
1.) Exact numbers - integers or whole numbers, which may be positive, negative, or zero. SQL supports 2 integer types:
    1.) INTEGER (INT)
    2.) SMALLINT
2.) Approximate numbers - numbers that cannot be represented exactly, such as real numbers and fractional types.
3.) Formatted numbers - numbers stored in decimal notation.
Formatted numbers can be defined using the following:
1.) DECIMAL(i,j)
2.) DEC(i,j)
3.) NUMERIC(i,j)
where i is the precision, the total number of digits excluding the decimal point, and j is the scale, the number of fractional digits. The default scale is zero (0).

Basic statement forms in the SQL Query Analyzer:
1.) CREATE DATABASE <database_name>
    USE <database_name>
2.) CREATE DATABASE <database_name>
    ON (NAME = <data_file_name>, FILENAME = '<path>')
3.) CREATE TABLE <table_name> (
        <column_name> <data_type>(<size>) PRIMARY KEY,
        <column_name> <data_type>(<size>)
    );
4.) INSERT [INTO] <table_name> [(column_list)] VALUES (data_values)
5.) SELECT * FROM <table_name>

SPECIFIC RELATIONAL OPERATIONS
1.) Projection (Π) - selects an attribute or attribute list from a table r, while discarding the rest.
2.) Selection (σ) - selects some rows of a table r that satisfy a selection condition (alias: predicate).
3.) Join - combines two tables into one, thereby allowing us to obtain more information.

SEATWORK:
1.) What is SQL and what are its capabilities?
2.) Create a database, insert values, and view the data inserted.

QUIZ # 2

PRELIM EXAMINATION

MIDTERM PERIOD

Lecture no. 1: DATABASE DESIGN

1.1 Database System Design

COSTS OF THE DATABASE APPROACH
If you are to implement a DBMS in an organization, you need to consider these 4 things:

1. New specialized personnel - the organization should have or train individuals to:
   a. maintain the new database software
   b. develop and enforce new programming standards
   c. design databases
   d. manage the staff of new people and train the new employees
   This personnel requirement will, or may, increase productivity (it should not minimize skills).

2. Need for explicit backup - provide backup copies of data because:
   a. backups are helpful in restoring damaged data files
   b. they provide validity checks on crucial data

3. Interference with shared data - concurrent access to shared data via several application programs can cause problems:
   a.
When 2 concurrent users both want to change the same or related data, inaccurate results can occur if access to the data is not properly synchronized.
   b. When data are used exclusively for updating, different users can obtain control of different segments of the database and lock up any use of the data.

4. Organizational conflict - a shared database requires a consensus on data definition:
   a. conflicts over how to define data length and coding, rights to update shared data, and associated issues.

TYPES OF DATABASES
1. Operational Database
   - contains business transactions and a history of daily business activities
   - used to support the ongoing daily activities of the organization; summaries of operations
   - used in Transaction Processing Systems
   - Ex. customer orders, purchases, accounting, shipments, and payments
2. Managerial Database
   - used by middle managers for planning and control; summaries of the operational database
   - used in Management Information Systems
3. Strategic Database
   - used by senior managers to develop corporate strategies and seek competitive advantage
   - contains information on competitors and economic factors as well as corporate information
   - used in Decision Support Systems

GENERIC TYPES OF DATABASE APPLICATIONS
1. Data Capture
   - captures transaction data, populates databases, maintains the currency of data, gathers data
2. Data Transfer
   - moves / transfers data from one database to another
   - Ex. from operational to managerial
3. Data Distribution
   - applications resulting from data analysis
   - converts data into readily useful information and presents it to management in a readily understandable form
   - Ex.
Reports, summaries, and graphs

COMPONENTS OF THE DATABASE ENVIRONMENT

[Figure: the database environment - data administrators, system developers, and end users interact through CASE tools, the user interface, and application programs with the DBMS, the repository, and the database.]

1. CASE Tools
   - Computer-Aided Software Engineering (CASE) tools: automated tools used to design databases and application programs
2. Repository
   - a centralized knowledge base containing all data definitions, screen and report formats, and definitions of other organization and system components
3. DBMS
   - a commercial software system used to provide access to the database and repository
4. Database
   - an integrated collection of data, organized to meet the information needs of multiple users in an organization
   - contains occurrences of data (the values themselves)
5. Application Programs
   - computer programs used to create and maintain the database and provide information to users
6. User Interface
   - languages, menus, and other facilities with which users interact (front-end support)
   - menu-driven systems, the mouse, and voice recognition promote end-user computing: users who are not experts can define their own reports, displays, and applications
7. Data Administrators
   - persons responsible for designing databases and for developing policies regarding database security and integrity
   - they use CASE tools to improve the productivity of database planning and design
8. System Developers
   - persons, such as systems analysts and programmers, who design new application programs
   - they use CASE tools for system requirements analysis and program design
9. End Users
   - persons throughout the organization who add, edit, delete, and receive information
   - ex. encoders

Lecture no. 2: ENTITY-RELATIONSHIP MODELS

2.1 Entity-Relationship Models

An entity-relationship model describes the relationships between two or more entities.

CATEGORIES OF ASSOCIATION

I.
ASSOCIATION BETWEEN DATA ITEMS
Represents the relationship of data items, or shows how each data item is related to another. Each type of data item is represented by an ellipse or bubble with the data item enclosed. An association between data items is represented by an arrow connecting the data item bubbles.

Example of data items that have no meaningful association:
   STUD #     EMPLOY #

Example of data items that have a meaningful association:
   STUD # --> STUDNME

Types of Association
1. One-association - means that at any point in time, a given value of A has one and only one value of B associated with it; if we know the value of A, then the value of B is implicitly known. Implicitly known means that it can be understood though not plainly expressed. We represent a one-association with a single-headed arrow.
   A --> B
   Ex. EMPLOYEE --> ADDRESS
2. Many-association - means that at any point in time, a given value of A has one or many values of B associated with it. We represent a many-association with a double-headed arrow.
   A ->> B
   Ex. STUD # ->> SUBJECTS
   Multivalued attribute - occurs potentially multiple times for each item of A.
3. Conditional association - for a given value of data item A there are two possibilities: either there is no value of data item B, or there is one (or many) value(s) of data item B. A conditional association is represented by a zero recorded on the arrow near the conditional item.
   A -0-> B (B is the conditional item)
   Ex. BED -0-> PATIENT

Cardinality - the term used by analysts for what the arrowheads and zeros on the arrows represent; cardinalities can be thought of as having minimum and maximum values.

Reverse Association
If there is an association from data item A to data item B, there is also a reverse association from B to A.

Types of Reverse Association
1.
One-to-one association - means that at any point in time, each value of data item A is associated with zero or exactly one value of data item B. Conversely, each value of B is associated with one value of A.
   A <--> B
   Ex. STUD # <--> STUDNME
2. One-to-many association - means that at any point in time, each value of data item A is associated with zero, one, or many values of data item B. However, each value of B is associated with exactly one value of A. The mapping from B to A is said to be many-to-one, since there may be many values of B associated with one value of A.
   A ->> B
   Ex. STUD # ->> EXAM
3. Many-to-many association - means that at any point in time, each value of data item A is associated with zero, one, or many values of data item B. Also, each value of B is associated with zero, one, or many values of A.
   A <<->> B
   Ex. STUD # <<->> COURSE

II. ASSOCIATION BETWEEN RECORDS
Shows the relationship between records.
Crow's foot - used to distinguish one- and many-associations between entities and records.
Crow's foot notation - used to represent the association between records.

Types of Association
1. One-association - no crow's foot (one-to-one)
   Ex. HUSBAND - WIFE
       STUDENT - GRADE
2. Many-association - represented by a crow's foot
   Ex. EMPLOYEE -< BENEFICIARY
       STUDENT -< COURSE

DATA MODELS
A data model is a representation of the data about entities, events, activities, and their associations within an organization.

CATEGORIES / GROUPS OF DATA MODELS

I. SEMANTIC DATA MODEL
Used to capture all the meaning of data and to embed this meaning as integrity and structural clauses in the database definitions. Concepts such as class, subclass, aggregation, dynamic properties and structures, and the handling of objects of different types (images, voice prints, as well as text and data) are included in the SDM and other semantically rich data models.

II.
RELATIONAL DATA MODEL
The relational data model uses the concept of a relation to represent what we have previously called a file; that is, a relation represents an entity class. A relation is viewed as a two-dimensional table. The relational data model is the choice of many database builders and users. It differs from other models not only in its architecture but also in the following ways:
1. Implementation independence - it logically represents all relationships implicitly, and hence one does not know which associations are or are not physically represented by an efficient access method. The relational model shares this property with the ER diagram.
2. Terminology - it uses its own terminology, most of which has equivalent terms in other data models.
3. Logical key pointers - it uses primary and secondary keys in records to represent the association between 2 records, whereas an ER diagram uses arcs between entity boxes.
4. Normalization theory - the properties of a database that make it free of certain maintenance problems have been developed within the context of the relational data model (although these properties can also be designed into an ER diagram or a network data model).
5. High-level programming languages - programming languages have been developed specifically to access databases defined via the relational data model; these languages permit data to be manipulated as groups of records rather than procedurally, one record at a time.

III. HIERARCHICAL DATA MODEL
Organizations are usually viewed as a hierarchy of positions and authority. Computer programs can be viewed as a hierarchy of control and operating modules, and various taxonomies of animals and plants view their elements in hierarchical sets of relationships.
The hierarchical data model represents data as a set of nested one-to-many relationships. It is used exclusively with hierarchical database management systems, and such systems are, in general, being phased out.

IV. NETWORK DATA MODEL
The network data model permits as much or as little structure as is desired. We can even create a hierarchy (a special case of a network) if that is what is needed. As with the hierarchical data model, if a certain relationship is not explicitly included in the database definition, then it cannot be used by the DBMS in processing the database.

V. ENTITY-RELATIONSHIP DATA MODEL (ER DIAGRAM)
It is based on the perception of a real world that consists of a set of basic objects, called entities, and relationships among those entities/objects. It is a graphical notation that uses special symbols to indicate relationships among entities, intended primarily for the database design process.

Basic symbols:
- rectangle: entity
- diamond: relationship
- ellipse: data item (underlined: primary key)
- ISA triangle: class - subclass (stands for "is a")

Degree - the number of entities that participate in a relationship.

Most Typical Degrees for Relationships
1. Unary relationship - a relationship between instances of one entity class.
   Ex. EMPLOYEE; PERSON
2. Binary relationship - a relationship between instances of two entity classes.
   Ex. PARENT - CHILD; CUSTOMER - ORDER
3. Ternary relationship - a relationship among instances of three entity classes.
   Ex. PRODUCT - VENDOR - WAREHOUSE

SEATWORK:
1.) Give two examples each of unary, binary, and ternary relationships.

QUIZ # 3

Normalizing a Database

Normalization
- is a process of reducing redundancies of data in a database.
- is a technique that is used when designing and redesigning a database.
- is a process or set of guidelines used to optimally design a database so as to reduce redundant data.

The Raw Database
A database that is not normalized may include data that is contained in one or more different tables for no apparent reason. This could be bad for security, disk space usage, speed of queries, efficiency of database updates, and, maybe most importantly, data integrity. A database before normalization is one that has not been broken down logically into smaller, more manageable tables.

COMPANY_DATABASE (one large table):
emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, date_hire, cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date, prod_id

Logical Database Design
Any database should be designed with the end user in mind. Logical database design, also referred to as the logical model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why store the same data twice? Naming conventions used in a database should also be standard and logical.

What Are the End User's Needs?
The needs of the end user should be one of the top considerations when designing a database. Remember that the end user is the person who ultimately uses the database. There should be ease of use through the user's front-end tool (a client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if the user's needs are not taken into consideration. Some user-related design considerations include the following:
- What data should be stored in the database?
- How will the user access the database?
- What privileges does the user require?
- How should the data be grouped in the database?
- What data is most commonly accessed?
- How is all the data related in the database?
- What measures should be taken to ensure accurate data?

Data Redundancy
Data should not be redundant, meaning that the duplication of data should be kept to a minimum, for several reasons. For example, it is unnecessary to store an employee's home address in more than one table. With duplicate data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one table does not match the address of the same employee in another table. Which table is correct? Do you have documentation to verify the employee's current address? As if data management were not difficult enough, redundancy of data could prove to be a disaster.

The Normal Forms
Normal form is a way of measuring the level, or depth, to which a database has been normalized. A database's level of normalization is determined by the normal form. The following are the three most common normal forms in the normalization process:
- The first normal form
- The second normal form
- The third normal form

Each subsequent normal form depends on normalization steps taken in the previous normal form. For example, to normalize a database using the second normal form, the database must first be in the first normal form.

The First Normal Form
The objective of the first normal form is to divide the base data into logical units called tables. When each table has been designed, a primary key is assigned to most or all tables.
COMPANY_DATABASE after the first normal form:

EMPLOYEE_TBL: emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise

CUSTOMER_TBL: cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date

PRODUCTS_TBL: prod_id, prod_desc, cost

You can see that to achieve the first normal form, the data had to be broken into logical units of related information, each having a primary key and ensuring that there are no repeating groups in any of the tables. Instead of one large table, there are now smaller, more manageable tables: EMPLOYEE_TBL, CUSTOMER_TBL, and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table, in this case EMP_ID, CUST_ID, and PROD_ID.

The Second Normal Form
The objective of the second normal form is to take data that is only partly dependent on the primary key and place it in another table.
First normal form:
EMPLOYEE_TBL: emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise
CUSTOMER_TBL: cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date

Second normal form:
EMPLOYEE_TBL: emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager
EMPLOYEE_PAY_TBL: emp_id, position, position_desc, date_hire, pay_rate, bonus, date_last_raise
CUSTOMER_TBL: cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax
ORDERS_TBL: ord_num, prod_id, qty, ord_date

According to the figure, the second normal form is derived from the first normal form by breaking two tables down into more specific units. EMPLOYEE_TBL is split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal employee information is dependent on the primary key (EMP_ID), so that information remained in EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE, and PAGER). On the other hand, the information that is only partly dependent on EMP_ID (each individual employee) is used to populate EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE, BONUS, and DATE_LAST_RAISE). Notice that both tables contain the column EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL is split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is similar to what occurred with EMPLOYEE_TBL: columns that were only partly dependent on the primary key were moved to another table. The order information for a customer is dependent on each CUST_ID, but does not directly depend on the general customer information in the original table.
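The split from first to second normal form can be carried out with plain SQL. The following is a sketch using Python's sqlite3 module, with one invented employee row and a reduced column set for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First-normal-form table mixing personal and pay data (simplified).
conn.execute("""CREATE TABLE EMPLOYEE_TBL_1NF (
    EMP_ID INTEGER PRIMARY KEY, LAST_NAME TEXT, CITY TEXT,
    POSITION TEXT, PAY_RATE REAL)""")
conn.execute("INSERT INTO EMPLOYEE_TBL_1NF VALUES "
             "(1, 'Jones', 'Pittsburgh', 'Clerk', 12.5)")

# Second normal form: personal data and pay data in separate tables,
# both keyed by EMP_ID.
conn.execute("""CREATE TABLE EMPLOYEE_TBL AS
    SELECT EMP_ID, LAST_NAME, CITY FROM EMPLOYEE_TBL_1NF""")
conn.execute("""CREATE TABLE EMPLOYEE_PAY_TBL AS
    SELECT EMP_ID, POSITION, PAY_RATE FROM EMPLOYEE_TBL_1NF""")

# EMP_ID matches corresponding rows between the two tables.
row = conn.execute("""SELECT e.LAST_NAME, p.POSITION, p.PAY_RATE
    FROM EMPLOYEE_TBL e JOIN EMPLOYEE_PAY_TBL p
    ON e.EMP_ID = p.EMP_ID""").fetchone()
print(row)  # ('Jones', 'Clerk', 12.5)
```

The closing JOIN shows the flip side of normalization discussed later under its drawbacks: once the data is split, queries must join the tables back together.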
The Third Normal Form

The third normal form's objective is to remove data in a table that is not dependent on the primary key. Another table was created to show the use of the third normal form. EMPLOYEE_PAY_TBL is split into two tables, one containing the actual employee pay information and the other containing the position descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column is totally independent of the primary key, EMP_ID.

THIRD NORMAL FORM:

EMPLOYEE_PAY_TBL: emp_id, position, date_hire, pay_rate, bonus, date_last_raise

POSITIONS_TBL: position, position_desc

Benefits of Normalization

Normalization provides numerous benefits to a database. Some of the major benefits include the following:
- Greater overall database organization
- Reduction of redundant data
- Data consistency within the database design
- A much more flexible database design
- A better handle on database security

Drawbacks of Normalization

Although most successful databases are normalized to some degree, there is one substantial drawback of a normalized database: reduced database performance. Accepting reduced performance requires the knowledge that when a query or transaction request is sent to the database, several factors are involved, such as CPU usage, memory usage and input/output (I/O). To make a long story short, a normalized database requires much more CPU, memory and I/O to process transactions and database queries than does a denormalized database. A normalized database must locate the requested tables and then join the data from those tables to either get the requested information or to process the desired data. A more in-depth discussion concerning database performance occurs in Hour 18, "Managing Database Users."

QUIZ # 4
MIDTERM EXAMINATION

FINALS PERIOD

Lecture no. 1: TRANSACTION MANAGEMENT

1.1 Transaction Support

Transaction – an action, or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database.
- A transaction is a logical unit of work on the database. It may be an entire program, part of a program, or a single command.

Properties of Transactions

There are four basic properties that all transactions should possess, the so-called ACID properties:
- Atomicity – the "all or nothing" property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all.
- Consistency – a transaction must transform the database from one consistent state to another consistent state.
- Isolation – transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.
- Durability – the effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure.

THE DBMS TRANSACTION MANAGEMENT

Transaction Manager – coordinates transactions on behalf of application programs.
Scheduler – the module responsible for implementing a particular strategy for concurrency control. Sometimes referred to as the "Lock Manager".
Recovery Manager – ensures that the database is restored to the state it was in before the start of the transaction, and is therefore in a consistent state.
Buffer Manager – responsible for the transfer of data between disk storage and main memory.

Figure 1 The DBMS Transaction Subsystem (components: Transaction Manager, Scheduler, Recovery Manager, Buffer Manager, Access Manager, File Manager and Systems Manager, operating on the Database and System Catalog)
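The atomicity property can be observed directly with Python's sqlite3 module: if any statement in a transaction fails, rolling back leaves the database exactly as it was before the transaction started. The accounts table and figures below are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 100.0), ('B', 50.0)")
conn.commit()

# A transfer is one logical unit of work: both statements succeed, or neither does.
try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 'A'")
    conn.execute("INSERT INTO account VALUES ('A', 0)")  # fails: duplicate key
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # atomicity: the partial debit is undone

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {'A': 100.0, 'B': 50.0}
```

After the rollback the database is back in its previous consistent state, which is the Recovery Manager's job in a full DBMS.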
1.2 Concurrency Control

Concurrency Control – the process of managing simultaneous operations on the database without having them interfere with one another.

A major objective in developing a database is "to enable many users to access shared data concurrently."

Three examples of potential problems caused by concurrency:
1.) The Lost Update Problem – an apparently successfully completed update operation by one user can be overridden by another user.
2.) The Uncommitted Dependency Problem – occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.
3.) The Inconsistent Analysis Problem – occurs when a transaction reads several values from the database while a second transaction updates some of them during the execution of the first.

1.3 Database Recovery

Database Recovery – the process of restoring the database to a correct state in the event of a failure.

Four different types of storage media, with an increasing degree of reliability:
1.) Main Memory – volatile storage that usually does not survive system crashes.
2.) Magnetic Disks – provide online non-volatile storage. Compared with main memory, disks are more reliable and much cheaper, but slower by three to four orders of magnitude.
3.) Magnetic Tape – an offline non-volatile storage medium, which is far more reliable than disk and fairly inexpensive, but slower, providing only sequential access.
4.) Optical Disks – more reliable than tape, generally cheaper, faster, providing random access.

Additional Facts:
- Main memory is also called Primary Storage. Disks and tape are known as Secondary Storage.
- Stable storage represents information that has been replicated in several non-volatile storage media (usually disk) with independent failure modes.

Among the causes of failure are:
1.) System crashes – due to hardware or software errors, resulting in loss of main memory.
2.) Media failures – such as head crashes or unreadable media, resulting in the loss of parts of secondary storage.
3.) Application software errors – such as logical errors in the program that is accessing the database, which cause one or more transactions to fail.
4.) Natural physical disasters – such as fires, floods, earthquakes, or power failures.
5.) Sabotage – the intentional corruption or destruction of data, hardware or software facilities.

Two principal effects that we need to consider:
1.) The loss of main memory, including the database buffers.
2.) The loss of the disk copy of the database.

A DBMS should provide the following facilities to assist with recovery:
- A backup mechanism, which makes periodic backup copies of the database.
- Logging facilities, which keep track of the current state of transactions and database changes.
- A checkpoint facility, which enables updates to the database that are in progress to be made permanent.
- A recovery manager, which allows the system to restore the database to a consistent state following a failure.

Log File

To keep track of database transactions, the DBMS maintains a special file called a log (or journal) that contains information about all updates to the database. The log may contain the following data:
1.) Transaction records, containing:
- Transaction identifier.
- Type of log record.
- Identifier of the data item affected by the database action.
2.) Checkpoint records.

Checkpoint – the point of synchronization between the database and the transaction log file. All buffers are force-written to secondary storage.

Checkpoints are scheduled at predetermined intervals and involve the following operations:
- Writing all log records in main memory to secondary storage.
- Writing the modified blocks in the database buffers to secondary storage.
- Writing a checkpoint record to the log file.
This record contains the identifiers of all transactions that are active at the time of the checkpoint.

Recovery Techniques

1.) Recovery techniques using deferred update
- When a transaction starts, write a transaction start record to the log.
- When any write operation is performed, write a log record containing all the data specified previously (excluding the before-image of the update). Do not actually write the update to the database buffers or the database itself.
- When a transaction is about to commit, write a transaction commit log record, write all the log records for the transaction to disk, and then commit the transaction. Use the log records to perform the actual updates to the database.
- If a transaction aborts, ignore the log records for the transaction and do not perform the writes.
- Any transaction with both transaction start and transaction commit log records should be redone.
- For any transaction with transaction start and transaction abort log records, we do nothing, since no actual writing was done to the database, so these transactions do not have to be undone.

2.) Recovery techniques using immediate update
- When a transaction starts, write a transaction start record to the log.
- When a write operation is performed, write a record containing the necessary data to the log file.
- Once the log record is written, write the update to the database buffers.
- The updates to the database itself are written when the buffers are next flushed to secondary storage.
- When a transaction commits, write a transaction commit record to the log.

Lecture no. 2: IMPROVING QUERY PERFORMANCE

2.1 Hash Files

Hash function – calculates the address of the page in which a record is to be stored, based on one or more of the fields in the record.
- Hash field – the field on which the address calculation is based (also called the base field).
- Hash key – the hash field, if that field is also a key field of the file.
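A hash function of the kind just described can be sketched in plain Python: the page address is computed from the hash field and folded into the range of available pages. Note how two of the invented keys land on the same page address; that is exactly the collision situation discussed next, handled here by keeping a chain of records per page. The folding scheme and page count are assumptions for illustration.

```python
NUM_PAGES = 4

def page_address(hash_key: str) -> int:
    # Address = a simple hash of the chosen field, folded into the page range.
    return sum(ord(c) for c in hash_key) % NUM_PAGES

# Each page holds a chain of records, so colliding keys share a page
# (a simple chained-overflow scheme).
pages = {p: [] for p in range(NUM_PAGES)}
for emp_id in ["E101", "E105", "E210"]:
    pages[page_address(emp_id)].append(emp_id)

print(page_address("E101"), page_address("E105"))  # 3 3  (a collision)
print(page_address("E210"))                        # 0
```

Direct calculation of the page address is what makes hash files fast for exact-match retrieval on the hash field.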
Collision – occurs when the same address is generated for two or more records.

There are several techniques that can be used to manage collisions:
1.) Open addressing
2.) Unchained overflow
3.) Chained overflow
4.) Multiple hashing

2.2 Indexes

Index – a data structure that allows the DBMS to locate particular records in a file more quickly, and thereby speed the response to user queries.
- Data file – the file containing the logical records.
- Index file – the file containing the index records.
- Primary index – if the data file is sequentially ordered and the indexing field is a key field of the file, the index is guaranteed to have a unique value in each record.
- Clustering index – if the indexing field is not a key field of the file, so that there can be more than one record corresponding to a value of the indexing field.
- Secondary index – an index that is defined on a non-ordering field of the data file.

Indexed sequential file – a sorted data file with a primary index. An indexed sequential file is a more versatile structure, which normally has:
- A primary storage area.
- A separate index or indexes.
- An overflow area.

SEATWORK:
1.) Define Transaction
2.) Give other database recovery techniques

QUIZ #5

Lecture no. 3: DATA WAREHOUSING, OLAP and DATA MINING

3.1 Data Warehousing

Data Warehouse – a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
Benefits of Data Warehousing
- Potentially high returns on investment
- Competitive advantage
- Increased productivity of corporate decision-makers

Problems of Data Warehousing
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenization
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration

The Major Components of a Data Warehouse

1. Operational Data. The source data for the data warehouse is supplied from:
- Mainframe operational data held in first-generation hierarchical and network databases.
- Departmental data held in proprietary file systems such as VSAM and RMS, and in relational DBMSs such as Informix and Oracle.
- Private data held on workstations and private servers.
- External systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers.
2. Load Manager – (also called the front-end component) performs all the operations associated with the extraction and loading of data into the warehouse.
3. Warehouse Manager – performs all the operations associated with the management of the data in the warehouse.
4. Query Manager – (also called the back-end component) performs all the operations associated with the management of user queries.
5. End-User Access Tools – provide information to business users for strategic decision-making. They can be categorized into five main groups:
- Reporting and query tools.
- Application development tools.
- Executive information system (EIS) tools.
- Online analytical processing (OLAP) tools.
- Data mining tools.

3.2 Online Analytical Processing (OLAP) – the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
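The "dynamic synthesis, analysis, and consolidation" of multi-dimensional data can be pictured with a tiny roll-up in plain Python: fact rows carry (region, quarter) dimensions and a sales measure, and aggregating collapses one dimension. All names and figures below are invented for illustration.

```python
from collections import defaultdict

# Fact rows: (region, quarter) dimensions with a sales measure.
facts = [
    ("North", "Q1", 120), ("North", "Q2", 150),
    ("South", "Q1", 90),  ("South", "Q2", 110),
]

# Roll up along the quarter dimension: total sales per region.
by_region = defaultdict(int)
for region, quarter, sales in facts:
    by_region[region] += sales

print(dict(by_region))  # {'North': 270, 'South': 200}
```

Real OLAP tools perform this kind of roll-up (and its inverse, drill-down) interactively across many dimensions at once.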
Rules for OLAP Systems
- Multi-dimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting performance
- Client-server architecture
- Generic dimensionality
- Dynamic sparse matrix handling
- Multi-user support
- Unrestricted cross-dimensional operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregations

Categories of OLAP Tools
1. Multi-dimensional OLAP (MOLAP or MD-OLAP) – uses specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.
2. Relational OLAP (ROLAP) – the fastest-growing style of OLAP technology. ROLAP supports RDBMS products through the use of a metadata layer, thus avoiding the requirement to create a static multi-dimensional data structure.
3. Managed Query Environment (MQE) – these tools provide limited analysis capability, either directly against RDBMS products or by using an intermediate MOLAP server.

3.3 Data Mining

Data Mining – the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.

Four main operations associated with data mining techniques:
1. Predictive modeling
2. Database segmentation
3. Link analysis
4. Deviation detection

QUIZ #6
FINAL EXAMINATION