constitute a database. However, the common use of the term database is usually
more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the
miniworld or the universe of discourse (UoD). Changes to the miniworld
are reflected in the database.
■ A database is a logically coherent collection of data with some inherent
meaning. A random assortment of data cannot correctly be referred to as a
database.
■ A database is designed, built, and populated with data for a specific purpose.
It has an intended group of users and some preconceived applications in
which these users are interested.
In other words, a database has some source from which data is derived, some degree
of interaction with events in the real world, and an audience that is actively interested in its contents. The end users of a database may perform business transactions
(for example, a customer buys a camera) or events may happen (for example, an
employee has a baby) that cause the information in the database to change. In order
for a database to be accurate and reliable at all times, it must be a true reflection of
the miniworld that it represents; therefore, changes must be reflected in the database as soon as possible.
A database can be of any size and complexity. For example, the list of names and
addresses referred to earlier may consist of only a few hundred records, each with a
simple structure. On the other hand, the computerized catalog of a large library
may contain half a million entries organized under different categories—by primary author's last name, by subject, by book title—with each category organized
alphabetically. A database of even greater size and complexity would be maintained
by a social media company such as Facebook, which has more than a billion users.
The database has to maintain information on which users are related to one another
as friends, the postings of each user, which users are allowed to see each posting,
and a vast amount of other types of information needed for the correct operation of
their Web site. For such Web sites, a large number of databases are needed to keep
track of the constantly changing information required by the social media Web site.
An example of a large commercial database is Amazon.com. It contains data for
over 60 million active users, and millions of books, CDs, videos, DVDs, games,
electronics, apparel, and other items. The database occupies over 42 terabytes
(a terabyte is 1012 bytes worth of storage) and is stored on hundreds of computers
(called servers). Millions of visitors access Amazon.com each day and use the
database to make purchases. The database is continually updated as new books
and other items are added to the inventory, and stock quantities are updated as
purchases are transacted.
A database may be generated and maintained manually or it may be computerized. For example, a library card catalog is a database that may be created and
maintained manually. A computerized database may be created and maintained
either by a group of application programs written specifically for that task or by a
database management system. Of course, we are only concerned with computerized databases in this text.
A database management system (DBMS) is a computerized system that enables
users to create and maintain a database. The DBMS is a general-purpose software
system that facilitates the processes of defining, constructing, manipulating, and
sharing databases among various users and applications. Defining a database
involves specifying the data types, structures, and constraints of the data to be
stored in the database. The database definition or descriptive information is also
stored by the DBMS in the form of a database catalog or dictionary; it is called
meta-data. Constructing the database is the process of storing the data on some
storage medium that is controlled by the DBMS. Manipulating a database includes
functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data.
Sharing a database allows multiple users and programs to access the database
simultaneously.
An application program accesses the database by sending queries or requests for
data to the DBMS. A query2 typically causes some data to be retrieved; a transaction
may cause some data to be read and some data to be written into the database.
Other important functions provided by the DBMS include protecting the database
and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection
against unauthorized or malicious access. A typical large database may have a life
cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time.
It is not absolutely necessary to use general-purpose DBMS software to implement
a computerized database. It is possible to write a customized set of programs to create and maintain the database, in effect creating a special-purpose DBMS software
for a specific application, such as airlines reservations. In either case—whether we
use a general-purpose DBMS or not—a considerable amount of complex software
is deployed. In fact, most DBMSs are very complex software systems.
To complete our initial definitions, we will call the database and DBMS software
together a database system. Figure 1.1 illustrates some of the concepts we have
discussed so far.
1.2 An Example
Let us consider a simple example that most readers may be familiar with: a
UNIVERSITY database for maintaining information concerning students, courses,
and grades in a university environment. Figure 1.2 shows the database structure
and a few sample data records. The database is organized as five files, each of which
stores data records of the same type.3 The STUDENT file stores data on each student, the COURSE file stores data on each course, the SECTION file stores data on
each section of a course, the GRADE_REPORT file stores the grades that students
receive in the various sections they have completed, and the PREREQUISITE file
stores the prerequisites of each course.
To define this database, we must specify the structure of the records of each file by
specifying the different types of data elements to be stored in each record. In
Figure 1.2, each STUDENT record includes data to represent the student's Name,
Student_number, Class (such as freshman or '1', sophomore or '2', and so forth),
and Major (such as mathematics or 'MATH' and computer science or 'CS'); each
COURSE record includes data to represent the Course_name, Course_number,
Credit_hours, and Department (the department that offers the course), and so
on. We must also specify a data type for each data element within a record. For
example, we can specify that Name of STUDENT is a string of alphabetic characters,
Student_number of STUDENT is an integer, and Grade of GRADE_REPORT is a single character from the set {'A', 'B', 'C', 'D', 'F', 'I'}. We may also use a coding
scheme to represent the values of a data item. For example, in Figure 1.2 we represent the Class of a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior,
4 for senior, and 5 for graduate student.
To construct the UNIVERSITY database, we store data to represent each student,
course, section, grade report, and prerequisite as a record in the appropriate file.
Notice that records in the various files may be related. For example, the record for
Smith in the STUDENT file is related to two records in the GRADE_REPORT file that
specify Smith's grades in two sections. Similarly, each record in the PREREQUISITE
file relates two course records: one representing the course and the other representing the prerequisite. Most medium-size and large databases include many types of
records and have many relationships among the records.
Database manipulation involves querying and updating. Examples of queries are as
follows:
■ Retrieve the transcript—a list of all courses and grades—of 'Smith'
■ List the names of students who took the section of the 'Database' course
offered in fall 2008 and their grades in that section
■ List the prerequisites of the 'Database' course
Examples of updates include the following:
■ Change the class of 'Smith' to sophomore
■ Create a new section for the 'Database' course for this semester
■ Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester
These informal queries and updates must be specified precisely in the query language of the DBMS before they can be processed.
At this stage, it is useful to describe the database as part of a larger undertaking
known as an information system within an organization. The Information Technology (IT) department within an organization designs and maintains an information system consisting of various computers, storage systems, application software,
and databases. Design of a new application for an existing database or design of a
brand new database starts off with a phase called requirements specification and
analysis. These requirements are documented in detail and transformed into a
conceptual design that can be represented and manipulated using some computerized tools so that it can be easily maintained, modified, and transformed into a
database implementation. (We will introduce a model called the Entity-Relationship model in Chapter 3 that is used for this purpose.) The design is then translated
to a logical design that can be expressed in a data model implemented in a commercial DBMS. (Various types of DBMSs are discussed throughout the text, with an
emphasis on relational DBMSs in Chapters 5 through 9.)
The final stage is physical design, during which further specifications are provided for
storing and accessing the database. The database design is implemented, populated
with actual data, and continuously maintained to reflect the state of the miniworld.
1.3 Characteristics of the Database Approach
A number of characteristics distinguish the database approach from the much
older approach of writing customized programs to access data stored in files. In
traditional file processing, each user defines and implements the files needed for a
specific software application as part of programming the application. For example,
one user, the grade reporting office, may keep files on students and their grades.
Programs to print a student's transcript and to enter new grades are implemented
as part of the application. A second user, the accounting office, may keep track of
students' fees and their payments. Although both users are interested in data about
students, each user maintains separate files—and programs to manipulate these
files—because each requires some data not available from the other user's files.
This redundancy in defining and storing data results in wasted storage space and
in redundant efforts to maintain common up-to-date data.
In the database approach, a single repository maintains data that is defined once
and then accessed by various users repeatedly through queries, transactions, and
application programs. The main characteristics of the database approach versus the
file-processing approach are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ Sharing of data and multiuser transaction processing
We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.
1.3.1 Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system
contains not only the database itself but also a complete definition or description of
the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information
stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1). It is important to note that some newer types of database systems, known as NOSQL systems, do not require meta-data. Rather the data
is stored as self-describing data that includes the data item names and data values
together in one structure (see Chapter 24).
The catalog is used by the DBMS software and also by database users who need
information about the database structure. A general-purpose DBMS software
package is not written for a specific database application. Therefore, it must refer
to the catalog to know the structure of the files in a specific database, such as the
type and format of data it will access. The DBMS software must work equally well
with any number of database applications—for example, a university database, a banking database, or a company database—as long as the database definition is
stored in the catalog.
In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one
specific database, whose structure is declared in the application programs. For
example, an application program written in C++ may have struct or class declarations. Whereas file-processing software can access only specific databases, DBMS
software can access diverse databases by extracting the database definitions from
the catalog and using these definitions.
For the example shown in Figure 1.2, the DBMS catalog will store the definitions of
all the files shown. Figure 1.3 shows some entries in a database catalog. Whenever a
request is made to access, say, the Name of a STUDENT record, the DBMS software
refers to the catalog to determine the structure of the STUDENT file and the position
and size of the Name data item within a STUDENT record. By contrast, in a typical
file-processing application, the file structure and, in the extreme case, the exact
location of Name within a STUDENT record are already coded within each program
that accesses this data item.
1.3.2 Insulation between Programs and Data,
and Data Abstraction
In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all
programs that access that file. By contrast, DBMS access programs do not require
such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data
independence.
For example, a file access program may be written in such a way that it can access
only STUDENT records of the structure shown in Figure 1.4. If we want to add
another piece of data to each STUDENT record, say the Birth_date, such a program
will no longer work and must be changed. By contrast, in a DBMS environment, we
only need to change the description of STUDENT records in the catalog (Figure 1.3)
to reflect the inclusion of the new data item Birth_date; no programs are changed.
The next time a DBMS program refers to the catalog, the new structure of
STUDENT records will be accessed and used.
In some types of database systems, such as object-oriented and object-relational
systems (see Chapter 12), users can define operations on data as part of the database
definitions. An operation (also called a function or method) is specified in two
parts. The interface (or signature) of an operation includes the operation name and
the data types of its arguments (or parameters). The implementation (or method) of
the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are
implemented. This may be termed program-operation independence.
The characteristic that allows program-data independence and program-operation
independence is called data abstraction. A DBMS provides users with a conceptual
representation of data that does not include many of the details of how the data is
stored or how the operations are implemented. Informally, a data model is a type of
data abstraction that is used to provide this conceptual representation. The data
model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage
concepts. Hence, the data model hides storage and implementation details that are
not of interest to most database users.
Looking at the example in Figures 1.2 and 1.3, the internal implementation of the
STUDENT file may be defined by its record length—the number of characters
(bytes) in each record—and each data item may be specified by its starting byte
within a record and its length in bytes. The STUDENT record would thus be represented as shown in Figure 1.4. But a typical database user is not concerned with the
location of each data item within a record or its length; rather, the user is concerned
that when a reference is made to Name of STUDENT, the correct value is returned.
A conceptual representation of the STUDENT records is shown in Figure 1.2. Many
other details of file storage organization—such as the access paths specified on a
file—can be hidden from database users by the DBMS; we discuss storage details in
Chapters 16 and 17.
In the database approach, the detailed structure and organization of each file are
stored in the catalog. Database users and application programs refer to the conceptual representation of the files, and the DBMS extracts the details of file storage
from the catalog when these are needed by the DBMS file access modules. Many
data models can be used to provide this data abstraction to database users. A major
part of this text is devoted to presenting various data models and the concepts they
use to abstract the representation of data.
In object-oriented and object-relational databases, the abstraction process includes
not only the data structure but also the operations on the data. These operations
provide an abstraction of miniworld activities commonly understood by the users.
For example, an operation CALCULATE_GPA can be applied to a STUDENT object
to calculate the grade point average. Such operations can be invoked by the user
queries or application programs without having to know the details of how the
operations are implemented.
1.3.3 Support of Multiple Views of the Data
A database typically has many types of users, each of whom may require a different
perspective or view of the database. A view may be a subset of the database or it may
contain virtual data that is derived from the database files but is not explicitly stored.
Some users may not need to be aware of whether the data they refer to is stored or
derived. A multiuser DBMS whose users have a variety of distinct applications must
provide facilities for defining multiple views. For example, one user of the database
of Figure 1.2 may be interested only in accessing and printing the transcript of each
student; the view for this user is shown in Figure 1.5(a). A second user, who is interested only in checking that students have taken all the prerequisites of each course
for which the student registers, may require the view shown in Figure 1.5(b).
1.3.4 Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the
database at the same time. This is essential if data for multiple applications is to be
integrated and maintained in a single database. The DBMS must include concurrency
control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. For example, when several reservation agents try to assign a seat on an airline flight, the
DBMS should ensure that each seat can be accessed by only one agent at a time for
assignment to a passenger. These types of applications are generally called online
transaction processing (OLTP) applications. A fundamental role of multiuser
DBMS software is to ensure that concurrent transactions operate correctly and
efficiently.
The concept of a transaction has become central to many database applications. A
transaction is an executing program or process that includes one or more database
accesses, such as reading or updating of database records. Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions. The DBMS must enforce several
transaction properties. The isolation property ensures that each transaction
appears to execute in isolation from other transactions, even though hundreds of
transactions may be executing concurrently. The atomicity property ensures that
either all the database operations in a transaction are executed or none are. We discuss transactions in detail in Part 9.
The preceding characteristics are important in distinguishing a DBMS from traditional file-processing software. In Section 1.6 we discuss additional features that
characterize a DBMS. First, however, we categorize the different types of people
who work in a database system environment.
1.4 Actors on the Scene
For a small personal database, such as the list of addresses discussed in Section 1.1,
one person typically defines, constructs, and manipulates the database, and there is
no sharing. However, in large organizations, many people are involved in the
design, use, and maintenance of a large database with hundreds or thousands of
users. In this section we identify the people whose jobs involve the day-to-day use
of a large database; we call them the actors on the scene. In Section 1.5 we consider
people who may be called workers behind the scene—those who work to maintain
the database system environment but who are not actively interested in the database contents as part of their daily job.
1.4.1 Database Administrators
In any organization where many people use the same resources, there is a need for
a chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the
DBMS and related software. Administering these resources is the responsibility of
the database administrator (DBA). The DBA is responsible for authorizing access
to the database, coordinating and monitoring its use, and acquiring software and
hardware resources as needed. The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is
assisted by a staff that carries out these functions.
1.4.2 Database Designers
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These
tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with
all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the
staff of the DBA and may be assigned other staff responsibilities after the database
design is completed. Database designers typically interact with each potential group
of users and develop views of the database that meet the data and processing
requirements of these groups. Each view is then analyzed and integrated with the
views of other user groups. The final database design must be capable of supporting
the requirements of all user groups.
1.4.3 End Users
End users are the people whose jobs require access to the database for querying,
updating, and generating reports; the database primarily exists for their use. There
are several categories of end users:
■ Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query interface to specify their requests and are typically middle- or high-level managers or
other occasional browsers.
■ Naive or parametric end users make up a sizable portion of database
end users. Their main job function revolves around constantly querying
and updating the database, using standard types of queries and updates—
called canned transactions—that have been carefully programmed and
tested. Many of these tasks are now available as mobile apps for use with
mobile devices. The tasks that such users perform are varied. A few
examples are:
Bank customers and tellers check account balances and post withdrawals
and deposits.
Reservation agents or customers for airlines, hotels, and car rental companies check availability for a given request and make reservations.
Employees at receiving stations for shipping companies enter package
identifications via bar codes and descriptive information through buttons
to update a central database of received and in-transit packages.
Social media users post and read items on social media Web sites.
■ Sophisticated end users include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS
in order to implement their own applications to meet their complex requirements.
■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based
interfaces. An example is the user of a financial software package that stores
a variety of personal financial data.
A typical DBMS provides multiple facilities to access a database. Naive end users
need to learn very little about the facilities provided by the DBMS; they simply have
to understand the user interfaces of the mobile apps or standard transactions
designed and implemented for their use. Casual users learn only a few facilities that
Pueden usarlo repetidamente. Los usuarios sofisticados intentan aprender la mayoría de las funciones del DBMS
para satisfacer sus complejas necesidades. Los usuarios independientes suelen convertirse en
muy competente en el uso de un paquete de software específico.