base de datos by SolRack_78

constitute a database. However, the common use of the term database is usually

more restricted. A database has the following implicit properties:

■ A database represents some aspect of the real world, sometimes called the

miniworld or the universe of discourse (UoD). Changes to the miniworld

are reflected in the database.

■ A database is a logically coherent collection of data with some inherent

meaning. A random assortment of data cannot correctly be referred to as a

database.

■ A database is designed, built, and populated with data for a specific purpose.

It has an intended group of users and some preconceived applications in

which these users are interested.

In other words, a database has some source from which data is derived, some degree

of interaction with events in the real world, and an audience that is actively interested in its contents. The end users of a database may perform business transactions

(for example, a customer buys a camera) or events may happen (for example, an

employee has a baby) that cause the information in the database to change. In order

for a database to be accurate and reliable at all times, it must be a true reflection of

the miniworld that it represents; therefore, changes must be reflected in the database as soon as possible.

A database can be of any size and complexity. For example, the list of names and

addresses referred to earlier may consist of only a few hundred records, each with a

simple structure. On the other hand, the computerized catalog of a large library

may contain half a million entries organized under different categories—by primary author's last name, by subject, by book title—with each category organized

alphabetically. A database of even greater size and complexity would be maintained

by a social media company such as Facebook, which has more than a billion users.

The database has to maintain information on which users are related to one another

as friends, the postings of each user, which users are allowed to see each posting,

and a vast amount of other types of information needed for the correct operation of

their Web site. For such Web sites, a large number of databases are needed to keep

track of the constantly changing information required by the social media Web site.

An example of a large commercial database is Amazon.com. It contains data for

over 60 million active users, and millions of books, CDs, videos, DVDs, games,

electronics, apparel, and other items. The database occupies over 42 terabytes

(a terabyte is 1012 bytes worth of storage) and is stored on hundreds of computers

(called servers). Millions of visitors access Amazon.com each day and use the

database to make purchases. The database is continually updated as new books

and other items are added to the inventory, and stock quantities are updated as

purchases are transacted.

A database may be generated and maintained manually or it may be computerized. For example, a library card catalog is a database that may be created and

maintained manually. A computerized database may be created and maintained

either by a group of application programs written specifically for that task or by a

database management system. Of course, we are only concerned with computerized databases in this text.

A database management system (DBMS) is a computerized system that enables

users to create and maintain a database. The DBMS is a general-purpose software

system that facilitates the processes of defining, constructing, manipulating, and

sharing databases among various users and applications. Defining a database

involves specifying the data types, structures, and constraints of the data to be

stored in the database. The database definition or descriptive information is also

stored by the DBMS in the form of a database catalog or dictionary; it is called

meta-data. Constructing the database is the process of storing the data on some

storage medium that is controlled by the DBMS. Manipulating a database includes

functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data.

Sharing a database allows multiple users and programs to access the database

simultaneously.

An application program accesses the database by sending queries or requests for

data to the DBMS. A query2 typically causes some data to be retrieved; a transaction

may cause some data to be read and some data to be written into the database.

Other important functions provided by the DBMS include protecting the database

and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection

against unauthorized or malicious access. A typical large database may have a life

cycle of many years, so the DBMS must be able to maintain the database system by

allowing the system to evolve as requirements change over time.

It is not absolutely necessary to use general-purpose DBMS software to implement

a computerized database. It is possible to write a customized set of programs to create and maintain the database, in effect creating a special-purpose DBMS software

for a specific application, such as airlines reservations. In either case—whether we

use a general-purpose DBMS or not—a considerable amount of complex software

is deployed. In fact, most DBMSs are very complex software systems.

To complete our initial definitions, we will call the database and DBMS software

together a database system. Figure 1.1 illustrates some of the concepts we have

discussed so far.

1.2 An Example

Let us consider a simple example that most readers may be familiar with: a

UNIVERSITY database for maintaining information concerning students, courses,

and grades in a university environment. Figure 1.2 shows the database structure

and a few sample data records. The database is organized as five files, each of which

stores data records of the same type.3 The STUDENT file stores data on each student, the COURSE file stores data on each course, the SECTION file stores data on

each section of a course, the GRADE_REPORT file stores the grades that students

receive in the various sections they have completed, and the PREREQUISITE file

stores the prerequisites of each course.

To define this database, we must specify the structure of the records of each file by

specifying the different types of data elements to be stored in each record. In

Figure 1.2, each STUDENT record includes data to represent the student's Name,

Student_number, Class (such as freshman or '1', sophomore or '2', and so forth),

and Major (such as mathematics or 'MATH' and computer science or 'CS'); each

COURSE record includes data to represent the Course_name, Course_number,

Credit_hours, and Department (the department that offers the course), and so

on. We must also specify a data type for each data element within a record. For

example, we can specify that Name of STUDENT is a string of alphabetic characters,

Student_number of STUDENT is an integer, and Grade of GRADE_REPORT is a single character from the set {'A', 'B', 'C', 'D', 'F', 'I'}. We may also use a coding

scheme to represent the values of a data item. For example, in Figure 1.2 we represent the Class of a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior,

4 for senior, and 5 for graduate student.

To construct the UNIVERSITY database, we store data to represent each student,

course, section, grade report, and prerequisite as a record in the appropriate file.

Notice that records in the various files may be related. For example, the record for

Smith in the STUDENT file is related to two records in the GRADE_REPORT file that

specify Smith's grades in two sections. Similarly, each record in the PREREQUISITE

file relates two course records: one representing the course and the other representing the prerequisite. Most medium-size and large databases include many types of

records and have many relationships among the records.

Database manipulation involves querying and updating. Examples of queries are as

follows:

■ Retrieve the transcript—a list of all courses and grades—of 'Smith'

■ List the names of students who took the section of the 'Database' course

offered in fall 2008 and their grades in that section

■ List the prerequisites of the 'Database' course

Examples of updates include the following:

■ Change the class of 'Smith' to sophomore

■ Create a new section for the 'Database' course for this semester

■ Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester

These informal queries and updates must be specified precisely in the query language of the DBMS before they can be processed.

At this stage, it is useful to describe the database as part of a larger undertaking

known as an information system within an organization. The Information Technology (IT) department within an organization designs and maintains an information system consisting of various computers, storage systems, application software,

and databases. Design of a new application for an existing database or design of a

brand new database starts off with a phase called requirements specification and

analysis. These requirements are documented in detail and transformed into a

conceptual design that can be represented and manipulated using some computerized tools so that it can be easily maintained, modified, and transformed into a

database implementation. (We will introduce a model called the Entity-Relationship model in Chapter 3 that is used for this purpose.) The design is then translated

to a logical design that can be expressed in a data model implemented in a commercial DBMS. (Various types of DBMSs are discussed throughout the text, with an

emphasis on relational DBMSs in Chapters 5 through 9.)

The final stage is physical design, during which further specifications are provided for

storing and accessing the database. The database design is implemented, populated

with actual data, and continuously maintained to reflect the state of the miniworld.

1.3 Characteristics of the Database Approach

A number of characteristics distinguish the database approach from the much

older approach of writing customized programs to access data stored in files. In

traditional file processing, each user defines and implements the files needed for a

specific software application as part of programming the application. For example,

one user, the grade reporting office, may keep files on students and their grades.

Programs to print a student's transcript and to enter new grades are implemented

as part of the application. A second user, the accounting office, may keep track of

students' fees and their payments. Although both users are interested in data about

students, each user maintains separate files—and programs to manipulate these

files—because each requires some data not available from the other user's files.

This redundancy in defining and storing data results in wasted storage space and

in redundant efforts to maintain common up-to-date data.

In the database approach, a single repository maintains data that is defined once

and then accessed by various users repeatedly through queries, transactions, and

application programs. The main characteristics of the database approach versus the

file-processing approach are the following:

■ Self-describing nature of a database system

■ Insulation between programs and data, and data abstraction

■ Support of multiple views of the data

■ Sharing of data and multiuser transaction processing

We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.

1.3.1 Self-Describing Nature of a Database System

A fundamental characteristic of the database approach is that the database system

contains not only the database itself but also a complete definition or description of

the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information

stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1). It is important to note that some newer types of database systems, known as NOSQL systems, do not require meta-data. Rather the data

is stored as self-describing data that includes the data item names and data values

together in one structure (see Chapter 24).

The catalog is used by the DBMS software and also by database users who need

information about the database structure. A general-purpose DBMS software

package is not written for a specific database application. Therefore, it must refer

to the catalog to know the structure of the files in a specific database, such as the

type and format of data it will access. The DBMS software must work equally well

with any number of database applications—for example, a university database, a banking database, or a company database—as long as the database definition is

stored in the catalog.

In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one

specific database, whose structure is declared in the application programs. For

example, an application program written in C++ may have struct or class declarations. Whereas file-processing software can access only specific databases, DBMS

software can access diverse databases by extracting the database definitions from

the catalog and using these definitions.

For the example shown in Figure 1.2, the DBMS catalog will store the definitions of

all the files shown. Figure 1.3 shows some entries in a database catalog. Whenever a

request is made to access, say, the Name of a STUDENT record, the DBMS software

refers to the catalog to determine the structure of the STUDENT file and the position

and size of the Name data item within a STUDENT record. By contrast, in a typical

file-processing application, the file structure and, in the extreme case, the exact

location of Name within a STUDENT record are already coded within each program

that accesses this data item.

1.3.2 Insulation between Programs and Data,

and Data Abstraction

In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all

programs that access that file. By contrast, DBMS access programs do not require

such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data

independence.

For example, a file access program may be written in such a way that it can access

only STUDENT records of the structure shown in Figure 1.4. If we want to add

another piece of data to each STUDENT record, say the Birth_date, such a program

will no longer work and must be changed. By contrast, in a DBMS environment, we

only need to change the description of STUDENT records in the catalog (Figure 1.3)

to reflect the inclusion of the new data item Birth_date; no programs are changed.

The next time a DBMS program refers to the catalog, the new structure of

STUDENT records will be accessed and used.

In some types of database systems, such as object-oriented and object-relational

systems (see Chapter 12), users can define operations on data as part of the database

definitions. An operation (also called a function or method) is specified in two

parts. The interface (or signature) of an operation includes the operation name and

the data types of its arguments (or parameters). The implementation (or method) of

the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are

implemented. This may be termed program-operation independence.

The characteristic that allows program-data independence and program-operation

independence is called data abstraction. A DBMS provides users with a conceptual

representation of data that does not include many of the details of how the data is

stored or how the operations are implemented. Informally, a data model is a type of

data abstraction that is used to provide this conceptual representation. The data

model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage

concepts. Hence, the data model hides storage and implementation details that are

not of interest to most database users.

Looking at the example in Figures 1.2 and 1.3, the internal implementation of the

STUDENT file may be defined by its record length—the number of characters

(bytes) in each record—and each data item may be specified by its starting byte

within a record and its length in bytes. The STUDENT record would thus be represented as shown in Figure 1.4. But a typical database user is not concerned with the

location of each data item within a record or its length; rather, the user is concerned

that when a reference is made to Name of STUDENT, the correct value is returned.

A conceptual representation of the STUDENT records is shown in Figure 1.2. Many

other details of file storage organization—such as the access paths specified on a

file—can be hidden from database users by the DBMS; we discuss storage details in

Chapters 16 and 17.

In the database approach, the detailed structure and organization of each file are

stored in the catalog. Database users and application programs refer to the conceptual representation of the files, and the DBMS extracts the details of file storage

from the catalog when these are needed by the DBMS file access modules. Many

data models can be used to provide this data abstraction to database users. A major

part of this text is devoted to presenting various data models and the concepts they

use to abstract the representation of data.

In object-oriented and object-relational databases, the abstraction process includes

not only the data structure but also the operations on the data. These operations

provide an abstraction of miniworld activities commonly understood by the users.

For example, an operation CALCULATE_GPA can be applied to a STUDENT object

to calculate the grade point average. Such operations can be invoked by the user

queries or application programs without having to know the details of how the

operations are implemented.

1.3.3 Support of Multiple Views of the Data

A database typically has many types of users, each of whom may require a different

perspective or view of the database. A view may be a subset of the database or it may

contain virtual data that is derived from the database files but is not explicitly stored.

Some users may not need to be aware of whether the data they refer to is stored or

derived. A multiuser DBMS whose users have a variety of distinct applications must

provide facilities for defining multiple views. For example, one user of the database

of Figure 1.2 may be interested only in accessing and printing the transcript of each

student; the view for this user is shown in Figure 1.5(a). A second user, who is interested only in checking that students have taken all the prerequisites of each course

for which the student registers, may require the view shown in Figure 1.5(b).

1.3.4 Sharing of Data and Multiuser Transaction Processing

A multiuser DBMS, as its name implies, must allow multiple users to access the

database at the same time. This is essential if data for multiple applications is to be

integrated and maintained in a single database. The DBMS must include concurrency

control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct. For example, when several reservation agents try to assign a seat on an airline flight, the

DBMS should ensure that each seat can be accessed by only one agent at a time for

assignment to a passenger. These types of applications are generally called online

transaction processing (OLTP) applications. A fundamental role of multiuser

DBMS software is to ensure that concurrent transactions operate correctly and

efficiently.

The concept of a transaction has become central to many database applications. A

transaction is an executing program or process that includes one or more database

accesses, such as reading or updating of database records. Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions. The DBMS must enforce several

transaction properties. The isolation property ensures that each transaction

appears to execute in isolation from other transactions, even though hundreds of

transactions may be executing concurrently. The atomicity property ensures that

either all the database operations in a transaction are executed or none are. We discuss transactions in detail in Part 9.

The preceding characteristics are important in distinguishing a DBMS from traditional file-processing software. In Section 1.6 we discuss additional features that

characterize a DBMS. First, however, we categorize the different types of people

who work in a database system environment.

1.4 Actors on the Scene

For a small personal database, such as the list of addresses discussed in Section 1.1,

one person typically defines, constructs, and manipulates the database, and there is

no sharing. However, in large organizations, many people are involved in the

design, use, and maintenance of a large database with hundreds or thousands of

users. In this section we identify the people whose jobs involve the day-to-day use

of a large database; we call them the actors on the scene. In Section 1.5 we consider

people who may be called workers behind the scene—those who work to maintain

the database system environment but who are not actively interested in the database contents as part of their daily job.

1.4.1 Database Administrators

In any organization where many people use the same resources, there is a need for

a chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the

DBMS and related software. Administering these resources is the responsibility of

the database administrator (DBA). The DBA is responsible for authorizing access

to the database, coordinating and monitoring its use, and acquiring software and

hardware resources as needed. The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is

assisted by a staff that carries out these functions.

1.4.2 Database Designers

Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These

tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with

all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the

staff of the DBA and may be assigned other staff responsibilities after the database

design is completed. Database designers typically interact with each potential group

of users and develop views of the database that meet the data and processing

requirements of these groups. Each view is then analyzed and integrated with the

views of other user groups. The final database design must be capable of supporting

the requirements of all user groups.

1.4.3 End Users

End users are the people whose jobs require access to the database for querying,

updating, and generating reports; the database primarily exists for their use. There

are several categories of end users:

■ Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query interface to specify their requests and are typically middle- or high-level managers or

other occasional browsers.

■ Naive or parametric end users make up a sizable portion of database

end users. Their main job function revolves around constantly querying

and updating the database, using standard types of queries and updates—

called canned transactions—that have been carefully programmed and

tested. Many of these tasks are now available as mobile apps for use with

mobile devices. The tasks that such users perform are varied. A few

examples are:

Bank customers and tellers check account balances and post withdrawals

and deposits.

Reservation agents or customers for airlines, hotels, and car rental companies check availability for a given request and make reservations.

Employees at receiving stations for shipping companies enter package

identifications via bar codes and descriptive information through buttons

to update a central database of received and in-transit packages.

Social media users post and read items on social media Web sites.

■ Sophisticated end users include engineers, scientists, business analysts, and

others who thoroughly familiarize themselves with the facilities of the DBMS

in order to implement their own applications to meet their complex requirements.

■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based

interfaces. An example is the user of a financial software package that stores

a variety of personal financial data.

A typical DBMS provides multiple facilities to access a database. Naive end users

need to learn very little about the facilities provided by the DBMS; they simply have

to understand the user interfaces of the mobile apps or standard transactions

designed and implemented for their use. Casual users learn only a few facilities that

Pueden usarlo repetidamente. Los usuarios sofisticados intentan aprender la mayoría de las funciones del DBMS

para satisfacer sus complejas necesidades. Los usuarios independientes suelen convertirse en

muy competente en el uso de un paquete de software específico.

base de datos

Volume 1

Chapter 1 - el inicio