Today’s Goals: (Data Management)
First of a two-Lesson sequence
Today we will become familiar with the issues and problems related to
data-intensive computing
We will find out about flat files, the simplest databases
Next time, in our 4th Lesson on productivity software, we will discuss
relational databases and
implement a simple relational database
Keeping track of a few dozen data items is straightforward
However, dealing with situations that involve a significant number of data
items requires more attention to the data-handling process
Dealing with millions - even billions - of inter-related data items
requires even more careful thought
36.1 zainBooks.com :
Consider the situation of a large, online bookstore
They have an inventory of millions of books, with new titles constantly
arriving, and old ones being
phased out on a regular basis
The price of a book is not fixed; it changes from time to time
Thousands of books are shipped each day, changing the inventory
constantly
Some are returned, again changing the inventory situation constantly
The cost of each shipped order depends on:
Prices of individual books
Size of the order
Location of the customer
Mode of shipment
For each order, the customer’s particulars (name, address, phone
number, credit card number) are required
Generally, that data is not deleted after the completion of the
transaction; instead, it is kept for future
reference
All the transaction activity and the inventory changes result in:
Thousands of data items changing every day
Thousands of additional data items being added every day
Keeping track and taking care of (i.e., managing) all that constantly
changing and expanding data is not a trivial task; it requires disciplined
attention and action to ensure the smooth and profitable operation of the
bookstore
36.2 Issues in Data Management:
Data entry
Data updates
Data integrity
Data security
Data accessibility
Data Entry:
New titles are added every day
New customers are being added every day
Some of the above may require manual entry of new data into the
computer systems
That new data needs to be added accurately
One way to achieve that is through user interfaces that prevent the input
of invalid data
Data Updates :
Old titles are deleted on a regular basis
Inventory changes every instant
Book prices change
Shipping costs change
Customers’ personal data change
Various discount schemes are always commencing and concluding
All those actions require updates to existing data
Those changes need to be entered accurately
That can also be achieved by user-interfaces that prevent the input of
invalid data
Data Security :
All the data that zainBooks has in its computer systems is quite critical
to its operation
The security of the customers’ personal data is of utmost importance.
Hackers are always looking for
that type of data, especially for credit card numbers
Enough leaks of that type, and customers will stop doing business with
zainBooks
This problem can be managed by using appropriate security mechanisms
that provide access to
authorized persons/computers only
Security can also be improved through:
Encryption
Private or virtual-private networks
Firewalls
Intrusion detectors
Virus detectors
Data Integrity:
Integrity refers to maintaining the correctness and consistency of the data
Correctness: Free from errors
Consistency: No conflict among related data items
Integrity can be compromised in many ways:
Typing errors
Transmission errors
Hardware malfunctions
Program bugs
Viruses
Fire, flood, etc.
Ensuring Data Integrity:
Type Integrity is implemented by specifying the type of a data item
Example: Suppose a credit card number must consist of exactly 16 digits. An
update attempting to assign a value with more or fewer digits, or one that
includes a non-numeral, should be rejected
Limit Integrity is enforced by limiting the values of data items to
specified ranges to prevent illegal
values
Example: The age of a person should not be negative
Referential Integrity requires that an item referenced by the data for
some other item must itself exist in
the database
Example: If an airline reservation is requested for a particular flight,
then the corresponding flight
number must actually exist
Physical Integrity is ensured through hardware redundancy, backups, etc
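The first three kinds of integrity check can be sketched in a few lines of Python. This is only an illustration: the function names and the sample flight numbers are assumptions made up for this example, and the required number of digits is passed in as a parameter rather than fixed.

```python
# Illustrative sketch only: the function names and the sample flight
# numbers below are assumptions, not part of any real reservation system.

def has_type_integrity(value: str, digits: int) -> bool:
    """Type integrity: value must be exactly `digits` decimal digits."""
    return len(value) == digits and value.isascii() and value.isdigit()


def has_limit_integrity(value: int, low: int) -> bool:
    """Limit integrity: value must not fall below the allowed minimum."""
    return value >= low


def has_referential_integrity(flight_number: str, flights: set) -> bool:
    """Referential integrity: the referenced flight must itself exist."""
    return flight_number in flights


existing_flights = {"ZB101", "ZB202"}  # hypothetical flight numbers

print(has_type_integrity("1234", digits=4))   # right length, digits only
print(has_type_integrity("12a4", digits=4))   # contains a non-numeral
print(has_limit_integrity(-3, low=0))         # a negative age
print(has_referential_integrity("ZB999", existing_flights))  # unknown flight
```

An update is accepted only if every applicable check returns True; a DBMS typically lets such rules be declared once so that they are enforced on every update.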
Data Accessibility:
If the transaction and inventory data are stored in a disorganized fashion
on a hard disk, it becomes very difficult to search for a stored data item
later
What is required is that:
Data be stored in an organized manner
Additional info about the data be stored so that the data access times
are minimized
What if two customers check on the availability of a certain title
simultaneously?
On seeing its availability, they both order the title – for which,
unfortunately, only a single copy is available
The same problem arises when two airline customers try to book the only
available seat
A solution to this concurrency control problem: Lock access to data while
someone is using it
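The locking idea can be sketched with Python's standard threading module. The Inventory class and simulate_orders function are hypothetical names invented for this illustration; a real DBMS provides its own locking, so this only shows the principle.

```python
import threading


class Inventory:
    """Hypothetical in-memory inventory guarded by a single lock."""

    def __init__(self, copies):
        self._copies = dict(copies)
        self._lock = threading.Lock()

    def order(self, title):
        # Only one customer at a time may check and update the stock,
        # so the check and the decrement cannot be interleaved.
        with self._lock:
            if self._copies.get(title, 0) > 0:
                self._copies[title] -= 1
                return True
            return False


def simulate_orders(customers=2):
    """Let several customers order the same single copy at once."""
    inventory = Inventory({"The Terrible Twins": 1})
    results = []

    def customer():
        results.append(inventory.order("The Terrible Twins"))

    threads = [threading.Thread(target=customer) for _ in range(customers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(sorted(simulate_orders()))
```

Whichever customer acquires the lock first gets the copy; the other finds the stock at zero and the order is refused, so the single copy is never sold twice.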
We can write our own SW that can take care of all the issues that we
just discussed
OR
We can save ourselves lots of time, cost, and effort by buying ourselves
a Database Management
System (DBMS) that takes care of most, if not all, of the issues
36.3 DBMS :
DBMSes are popularly, but incorrectly, also known as ‘Databases’
A DBMS is the SW system that operates a database, and is not the
database itself
Some people even consider the database to be a component of the DBMS,
and not an entity outside the
DBMS
A DBMS takes care of the storage, retrieval, and management of large
data sets in a database
It provides SW tools needed to organize & manipulate that data in a
flexible manner
It includes facilities for:
Adding, deleting, and modifying data
Making queries about the stored data
Producing reports summarizing the required contents
[Figure: User/Program, DBMS, Database]
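These facilities can be tried out with SQLite, a small DBMS that ships with Python as the sqlite3 module. The choice of SQLite, the table name books, and its column names are assumptions made for this sketch; the sample titles and prices are taken from the book table used later in this Lesson.

```python
import sqlite3


def demo():
    """Add, modify, delete, query, and summarize data in a tiny database."""
    conn = sqlite3.connect(":memory:")  # temporary database held in RAM
    conn.execute(
        "CREATE TABLE books (title TEXT PRIMARY KEY, author TEXT, "
        "price INTEGER, in_stock TEXT)"
    )

    # Adding data
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            ("Good Bye Mr. kim", "king khan", 1000, "Y"),
            ("The Terrible Twins", "kim Champion", 199, "Y"),
            ("Calculus & Analytical Geometry", "Smith Sahib", 325, "N"),
            ("Accounting Secrets", "Zamin Geoffry", 29, "Y"),
        ],
    )

    # Modifying and deleting data
    conn.execute("UPDATE books SET price = ? WHERE title = ?",
                 (149, "The Terrible Twins"))
    conn.execute("DELETE FROM books WHERE title = ?",
                 ("Accounting Secrets",))

    # Making a query: titles priced below 500, in alphabetical order
    rows = conn.execute(
        "SELECT title FROM books WHERE price < ? ORDER BY title", (500,)
    )
    titles = [row[0] for row in rows]

    # Producing a summary report: number of titles and their total price
    count, total = conn.execute(
        "SELECT COUNT(*), SUM(price) FROM books"
    ).fetchone()

    conn.close()
    return titles, count, total


print(demo())
```

The program never opens or parses a data file itself; it only states what it wants, and the DBMS decides how the data is stored and found.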
Database:
A collection of data organized in such a fashion that the computer can
quickly search for a desired data item
All data items in it are generally related to each other and share a
single domain
They allow for easy manipulation of the data
They are designed for easy modification & reorganization of the
information they contain
They generally consist of a collection of interrelated computer files
Example: University Student Database:
Student's name
Student’s photograph
Father’s name
Phone number
Street address
eMail address
Courses being taken
Courses already taken & grades
Pre-VU educational record
Example: zainBooks’ Customer DB:
Name, address, phone & fax, eMail
Credit card type, number, expiration date
Shipping preference
Books on order
All books that were ever shipped to the customer
Book preference
Example: zainBooks’ Inventory DB:
Book title, author, publisher, binding, date of publication, price
Book summary, table of contents
Customers’, editors’, newspaper reviews
Number in stock
Number on order
Special offer details
36.4 OS Independence:
A DBMS stores data in a database, which is a collection of interrelated
files
Storage of files on the computer is managed by the computer OS’s file
system
Intimate knowledge of the OS & its file system is required to provide
rapid access to the data
The DBMS takes care of those details
It hides the actual storage details of data files from the user
It provides an OS-independent view of the data to the user, making data
manipulation and management
much more convenient
What can be stored in a database?
In the old days, databases were limited to numbers, Booleans, and text
These days, anything goes
As long as it is digital data, it can be stored:
Numbers, Booleans, text
Sounds
Images
Video
In the very, very old days …:
Even large amounts of data were stored in text files, known as flat-file
databases
All related info was stored in a single long, tab- or comma-delimited
text file
Each group of info in that file, called a record, was separated from the
next by a special character; the vertical bar ‘|’ was a popular option
Each record consisted of a group of fields, each field containing some
distinct data item
[Figure: a flat-file database, with a record, a field, and a record delimiter labelled]
36.5 The Trouble with Flat-File Databases:
The text file format makes it hard to search for specific information or
to create reports that include only certain fields from each record
Reason: One has to search sequentially through the entire file to gather
desired info, such as ‘all books by a certain author’
However, for small sets of data (say, several tens of kB) they can
provide reasonable performance
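The sequential search can be sketched in Python. The sample text follows the layout described above (comma-delimited fields, ‘|’ between records, field names in the first record) and reuses three of the book records from this Lesson; the function names are assumptions made for this illustration.

```python
# A small flat file: fields separated by commas, records by '|'.
# The first record holds the field names.
FLAT_FILE = (
    "Title, Author, Publisher, Price, InStock|"
    "Good Bye Mr. kim, king khan, zainBooks, 1000, Y|"
    "The Terrible Twins, kim Champion, zainBooks, 199, Y|"
    "Calculus & Analytical Geometry, Smith Sahib, Good Publishers, 325, N|"
)


def parse_flat_file(text, record_sep="|", field_sep=","):
    """Split the text into records, and each record into named fields."""
    records = [r for r in text.split(record_sep) if r.strip()]
    names = [f.strip() for f in records[0].split(field_sep)]
    return [
        dict(zip(names, (f.strip() for f in record.split(field_sep))))
        for record in records[1:]
    ]


def books_by_author(text, author):
    """Sequential search: every record is examined, one after another."""
    return [r["Title"] for r in parse_flat_file(text) if r["Author"] == author]


print(books_by_author(FLAT_FILE, "kim Champion"))
```

Every search reads and splits the whole file again, which is tolerable for a few records but slow for millions. This simple splitting also breaks if a field itself contains a comma or a vertical bar.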
Consider this tabular approach …
(same records, same fields, but in a different format)
Title                           Author         Publisher                    Price  InStock
Good Bye Mr. kim                king khan      zainBooks                    1000   Y
The Terrible Twins              kim Champion   zainBooks                    199    Y
Calculus & Analytical Geometry  Smith Sahib    Good Publishers              325    N
Accounting Secrets              Zamin Geoffry  Sung-e-Kilometer Publishers  29     Y
Tabular Storage: Features & Possibilities:
Similar items of data form a column
Fields placed in a particular row – same as a flat-file record – are
strongly interrelated
One can sort the table w.r.t. any column
That makes searching – e.g., for all the books written by a certain
author – straightforward
[Figure: the same records stored as a flat file]
Title, Author, Publisher, Price, InStock|
Good Bye Mr. kim, king khan, zainBooks, 1000, Y|
The Terrible Twins, kim Champion, zainBooks, 199, Y|
Calculus & Analytical Geometry, Smith Sahib, Good Publishers, 325, N|
Accounting Secrets, Zamin Geoffry, Sangg-e-Kilometer Publishers, 29, Y|
Similarly, searching for the 10 cheapest/most expensive books can be
easily accomplished through a sort
The effort required to add a new field to all the records of a flat file
is much greater than that of adding a new column to the table
CONCLUSION: Tabular storage is better than flat-file storage
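The sorting and new-column points can be sketched in Python by holding the table as a list of rows. The function names are assumptions made for this illustration, and only three of the columns are shown to keep the example short.

```python
from operator import itemgetter

# The book table, one dictionary per row (Publisher and InStock omitted).
TABLE = [
    {"Title": "Good Bye Mr. kim", "Author": "king khan", "Price": 1000},
    {"Title": "The Terrible Twins", "Author": "kim Champion", "Price": 199},
    {"Title": "Calculus & Analytical Geometry", "Author": "Smith Sahib",
     "Price": 325},
    {"Title": "Accounting Secrets", "Author": "Zamin Geoffry", "Price": 29},
]


def cheapest(table, n):
    """Sort by the Price column and keep the first n rows."""
    return sorted(table, key=itemgetter("Price"))[:n]


def most_expensive(table, n):
    """Same sort in descending order."""
    return sorted(table, key=itemgetter("Price"), reverse=True)[:n]


def add_column(table, name, default):
    """Return a new table in which every row has one more column."""
    return [{**row, name: default} for row in table]


print([row["Title"] for row in cheapest(TABLE, 2)])
print([row["Title"] for row in most_expensive(TABLE, 1)])
print(add_column(TABLE, "Binding", "unknown")[0])
```

Because each value already sits in a named column, sorting on Price or adding a Binding column is a single operation on the table, with no re-parsing of delimited text.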
We will continue on this theme next time
Today’s Summary: (Data Management)
First of a two-Lesson sequence
Today we became familiar with the issues and problems related to
data-intensive computing
We also found out about flat-file and tabular storage
Next Lecture: (Database SW)
Next time, in our 4th Lesson on productivity SW, we will continue our
discussion on data management
We will find out about relational databases
We will also implement a simple relational database