
Wednesday 18 May 2016

DBMS LANGUAGES

DATABASE LANGUAGES

A DBMS provides two types of languages :
  1. Data - Definition Language (DDL) - to define the data in the database.
  2. Data - Manipulation Language (DML) - to manipulate the data in the database.
Data - Definition Language (DDL)
       The language used to define the data in the database is called the Data - Definition Language (DDL). In simple words, the Data-Definition Language is used to create the database, alter the database and delete the database. The result of compiling Data-Definition Language statements is a set of tables that is stored in a special file called the data dictionary or data directory. A data dictionary is a file that contains metadata - that is, data about data. This file is consulted before actual data is read or modified in the database. The DDL is used at the conceptual schema level to define (store) and query (retrieve) the descriptions of records in the database, where these descriptions cover everything, i.e., entities, attributes and relationships.
There is yet another data-definition language - the internal DDL, also known as the data storage and definition language - used by the internal schema, which describes how data is actually stored. The result of compiling the statements of this language specifies the implementation details of the database schemas - details that are usually hidden from the users.
            The basic DDL commands are CREATE, ALTER and DROP.
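A minimal sketch of these three DDL commands, using Python's built-in sqlite3 module; the table and column names are made up for illustration only:

import sqlite3

# Connect to an example database file (created if it does not exist).
conn = sqlite3.connect("college.db")
cur = conn.cursor()

# CREATE: define a new table (its description goes into the data dictionary).
cur.execute("CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT, class TEXT)")

# ALTER: change the definition of an existing table by adding a column.
cur.execute("ALTER TABLE student ADD COLUMN grade TEXT")

# DROP: delete the table definition (and its data) from the database.
cur.execute("DROP TABLE student")

conn.commit()
conn.close()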


Data - Manipulation Language (DML)
The language used to manipulate the data in the database is called the Data - Manipulation Language (DML). In simple words, the Data-Manipulation Language is used to retrieve data from the database, to insert new data into the database, and to delete or modify existing data.
        The operation of retrieving data from the database using the DML is called a query.
Simply, we can say that a query is a statement in the DML that requests the retrieval of data from the database. The portion of the DML used to pose a query is known as the query language, and we can use the terms query language and data-manipulation language synonymously. The query language contains commands to select and retrieve data from the database. Commands are also provided to insert, update and delete records.
     The DBMS also provides pre-defined data-manipulation functions that can be called from our programs directly by name, i.e., as procedure calls.

A DML manipulates the data in basically two ways, so DMLs are of two types :
  1.  Procedural DMLs
  2.  Non-procedural DMLs
  • Procedural DMLs require a user to specify two things : what data is to be retrieved, and the procedure for how to get that data.
  • Non-procedural DMLs require a user to specify only one thing, that is, what data is to be retrieved, without specifying the procedure for how to get that data.
NOTE : Non-procedural DMLs are easier to learn and use than procedural DMLs, BUT a procedural language can be more efficient than a non-procedural language because it specifies the procedure used to retrieve the data (see the sketch below).
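To make the contrast concrete, here is a small, purely illustrative sketch (a hypothetical student table is assumed): a non-procedural statement such as an SQL SELECT states only what is wanted, while the equivalent procedural code spells out how to scan the records.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_number INTEGER, name TEXT, grade TEXT)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Ravi", "A"), (2, "Sita", "B"), (3, "Arun", "A")])

# Non-procedural: state only WHAT data is to be retrieved.
cur.execute("SELECT name FROM student WHERE grade = 'A'")
print(cur.fetchall())

# Procedural: spell out HOW to get it -- fetch every record and filter by hand.
cur.execute("SELECT roll_number, name, grade FROM student")
names = []
for roll_number, name, grade in cur.fetchall():
    if grade == "A":
        names.append(name)
print(names)

conn.close()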

Data Sub - Language 
The DDL and DML are collectively referred to as the data sub-language when embedded within a host language (for example, C or COBOL).

Host Language 
      A relational DBMS supports an interactive SQL interface, and users can directly enter SQL commands. This simple approach is fine as long as the task at hand can be accomplished entirely with SQL commands. In practice we often encounter situations in which we need the greater flexibility of a general-purpose programming language, in addition to the data manipulation facilities provided by SQL. For example, we may want to integrate a database application with a nice graphical user interface. To deal with such situations, the SQL standard defines how SQL commands can be executed from within a program in a host language such as C or Java. The use of SQL commands within a host-language program is called embedded SQL. An embedded SQL statement (i.e., not a declaration) can be used wherever a statement in the host language is allowed (with a few restrictions). Of course, SQL statements must be clearly marked so that a preprocessor can deal with them before invoking the compiler for the host language. Also, any host-language variable used to pass arguments into an SQL command must be declared in SQL.
          SQL and QUEL only provide facilities to define and retrieve data. To extend the data-manipulation operations of these languages, for example to process each tuple of a relation separately, they have to be used with a traditional high-level language. Such a language is called a host language, and the program is called the host program.
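True embedded SQL relies on a preprocessor for a host language such as C. As a rough, assumption-laden analogue, the sketch below shows SQL commands issued from a Python host program through a call-level interface; the table, column and function names are invented, and the parameter marker plays the role that a declared host-language variable plays in embedded SQL.

import sqlite3

def accounts_above(conn, minimum_balance):
    # minimum_balance is a host-language variable whose value is passed into
    # the SQL command through a parameter marker.
    cur = conn.execute(
        "SELECT account_no, balance FROM account WHERE balance > ?",
        (minimum_balance,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500.0), ("A-102", 7000.0)])
print(accounts_above(conn, 1000.0))   # host-language logic wrapped around the query
conn.close()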


BUFFER MANAGEMENT

BUFFER MANAGEMENT

The buffer manager is the software layer that is responsible for bringing pages from the physical disk into main memory as needed. The buffer manager manages the available main memory by dividing it into a collection of page-sized slots, which we call the buffer pool. The main-memory pages in the buffer pool are called frames.
[Figure: Buffer management in a DBMS - page requests from higher-level DBMS layers are served from the buffer pool.]

  • Data must be in RAM for the DBMS to operate on it!
  • The buffer manager hides the fact that not all data is in RAM.
  
       The goal of the buffer manager is to ensure that the data requests made by programs are satisfied by copying data from secondary storage devices into the buffers. In fact, if a program performs an input statement, it calls the buffer manager to satisfy the request by reading from the existing buffers; similarly, if a program performs an output statement, it calls the buffer manager to satisfy the request by writing to the buffers. Therefore, we can say that input and output operations occur between the program and the buffer area only.
If an input operation does not find the requested page in the buffer pool, then the buffer manager (software) has to physically transfer the page from secondary memory (disk) into a free frame in the buffer pool, and then make the requested page available to the program that issued the original input operation. A similar scenario takes place, in the reverse order, for an output operation: the buffer manager makes an empty buffer frame available to the program for outputting the page. If there is no space in the buffer pool, the buffer manager physically transfers one of the pages from the buffer pool to disk (secondary memory) to free space in the buffer pool for the page the program wants to output.
In addition to the buffer pool itself, the buffer manager maintains some bookkeeping information and two variables for each frame in the pool : pin-count and dirty. The pin-count records the number of current requesters of the page in the frame: each time the page is requested, the pin-count for that frame is incremented, and each time a requester releases the page, the pin-count is decremented. In addition, if the page has been modified, the Boolean variable dirty is set to 'on'; otherwise it is 'off'.
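A minimal sketch of this per-frame bookkeeping; the class and field names are illustrative, not taken from any particular DBMS:

from dataclasses import dataclass

@dataclass
class Frame:
    page_id: int          # which disk page currently occupies this frame
    pin_count: int = 0    # number of requesters currently using the page
    dirty: bool = False   # 'on' if the page has been modified since it was read

    def pin(self):
        # A request for the page pins it in the frame.
        self.pin_count += 1

    def unpin(self, modified=False):
        # The requester releases the page and reports whether it changed it.
        self.pin_count -= 1
        if modified:
            self.dirty = True

frame = Frame(page_id=42)
frame.pin()                 # page requested: pin-count goes to 1
frame.unpin(modified=True)  # request satisfied; page was changed, so dirty is set
print(frame.pin_count, frame.dirty)   # 0 True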



PINNED AND UNPINNED RECORDS

Pinned and Unpinned Records 

A record is said to be pinned down, or a pinned record, if there exists a pointer to it somewhere in the database. For example, when a table look-up approach is used to locate a record, the table contains a pointer to the record and the record becomes pinned down. Pinned records cannot be moved about at random, because the pointers pointing to these records would then dangle. Any movement of pinned records must be accompanied by appropriate modification of the pointers. In fact, file organization methods that maintain pointers to pinned records appropriately modify these pointers whenever records are inserted or deleted.
A record is said to be an unpinned record if there does not exist any pointer pointing to it in the database; in effect, it is an independent record.

Pinned Records and Blocks 

  • 'Pinned' means the block cannot be written back to disk safely.
  • Pinning is indicated by a bit located in the header of the block.
  • To write a pinned block back to disk we need to 'unpin' the block,
  • i.e., unswizzle any pointers associated with it.

 

Pinning and Unpinning of Pages 

       If some frame in the buffer pool contains the requested page, the pin-count variable of that frame is incremented; increasing the pin-count is called pinning the requested page in its frame. When the requester's request has been fulfilled, the pin-count variable of that frame is decremented and the page is released. This is called unpinning the page.
       However, it is the buffer manager's (software) responsibility to pin a page, whereas it is the responsibility of the requester of that page to tell the buffer manager to unpin it.
    The buffer manager will not read another page into a frame until its pin-count becomes 0 (zero), that is, until all requesters of the page have unpinned it. If a requester has modified the page, the Boolean variable dirty is set to 'on', and the requester informs the buffer manager of this when it unpins the page, so that the old copy of the page on disk is overwritten with the changes; the dirty bit for the frame is then set 'off' again.

Buffer Replacement Policies 

         As the buffer memory is small, all the data cannot be brought into it at one time. So, when the DBMS needs to bring a page into buffer memory to process it and no frame is free, it must decide which existing page in buffer memory is to be replaced to make space for the new page; the rule used for this decision is known as the replacement policy.
      The policy used to choose the page for replacement can affect the time taken for database operations; of the many alternative policies, each is suitable in different situations.
The buffer replacement policies are of two main types (the first is sketched below):
(i). Least Recently Used (LRU)
(ii). Clock Replacement.
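A rough sketch of the LRU idea, assuming a tiny buffer pool that tracks a pin-count per resident page and replaces only unpinned pages; all names are illustrative, and a real buffer manager would also write back dirty pages before evicting them:

from collections import OrderedDict

class LRUBufferPool:
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = OrderedDict()   # page_id -> pin_count, least recently used first

    def request_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # now the most recently used page
            self.frames[page_id] += 1             # pin it
            return
        if len(self.frames) >= self.num_frames:
            # Choose the least recently used page whose pin-count is 0 as the victim.
            victim = next((pid for pid, pins in self.frames.items() if pins == 0), None)
            if victim is None:
                raise RuntimeError("all pages are pinned; nothing can be replaced")
            del self.frames[victim]               # (a dirty page would be written back first)
        self.frames[page_id] = 1                  # read the page in and pin it

    def release_page(self, page_id):
        self.frames[page_id] -= 1                 # unpin

pool = LRUBufferPool(num_frames=2)
pool.request_page(1); pool.release_page(1)
pool.request_page(2); pool.release_page(2)
pool.request_page(3)                              # evicts page 1, the least recently used
print(list(pool.frames))                          # [2, 3]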

Tuesday 17 May 2016

RECORDS

RECORDS 


Definition : A record is a collection of data items and is the unit of data storage at the logical or file level. 
     A record may consist of different fields and each field corresponds to an attribute (column) of the record. For example, a student record has fields such as student name, courses, class, roll number and grade.

Spanned and Unspanned Records : 
      The records may be of fixed size, or may be variable in length. One block (also sometimes called a physical record) may contain multiple records. 

Unspanned Records Definition : When records are restricted to fit within one block, due to their small size, such records are called unspanned records. 

Spanned Records : When (portions of) a single record may lie in different blocks, due to its large size, such records are called spanned records.

Blocking of Records : The different methods of arranging records in blocks are collectively called blocking of records.
The blocking of records is done in three ways, shown as follows  
1 . Fixed blocking, or fixed-length records 
        i ) . Waste due to the inter-block gap 
       ii ) . Waste due to the track size 
      iii ) . Waste because a record cannot fit in the remaining portion of the block.
2 . Variable blocking for unspanned records 
3 . Variable blocking for spanned records 

BLOCKING FACTOR 
The blocking factor is the ratio B/R, where R is the record size and B is the block length. When B is not an exact multiple of R, a portion of the block is wasted, in addition to the gap G between blocks. For variable-length unspanned records there is likewise wasted space in the blocks. However, the greater the ratio of the block length to the mean record length, the less space is wasted.
                 Thus, for fixed blocking, the average waste per record is

                                               W = G·Rm/B
where W     ----->  average waste per record 
           G      ----->  gap size 
           Rm   ----->  mean record length 
            B     ----->  block length.
While retrieving or storing records in a block, it is necessary to identify where records begin and end. Such blocking information may be part of each block, called the block header. For fixed blocking, only the record length needs to be known to locate a particular record in a block. However, for variable-length records, the data describing the blocking must be available in the block header. Different techniques are available to delimit variable-length records, such as marking the end of each record or the beginning of each record; this is done by storing pointer values corresponding to each record in the block. In the case of spanned records, we also need to store in each block a block pointer which points to the successor block where the remaining portion of the spanned record is stored. Storing such pointers leads to further wastage of block space.
For spanned records, the average waste per record can be expressed as
                                     W = (G + P)·Rm/B + P
where P is the size of a pointer. Spanned records have the advantage that, unlike unspanned blocking, there is no space wasted in the blocks due to the inability to pack variable-length records into a fixed-length block. However, spanned records are usually more difficult to implement, and updating such records creates problems. Hence, unspanned records are generally preferred, and the average wasted space may be reduced by increasing the mean blocking factor, i.e., by using either larger blocks or smaller records.
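A small numeric check of these two formulas; the block length, gap, pointer and record sizes below are made-up values chosen only for illustration:

# Illustrative sizes in bytes (not taken from the text above).
B  = 2400    # block length
G  = 600     # inter-block gap
P  = 4       # pointer size
Rm = 100     # mean record length

blocking_factor = B // Rm                  # records per block, B/R
waste_fixed   = G * Rm / B                 # W = G*Rm/B        (fixed blocking)
waste_spanned = (G + P) * Rm / B + P       # W = (G+P)*Rm/B + P (spanned records)

print(blocking_factor)          # 24 records per block
print(round(waste_fixed, 2))    # 25.0  bytes wasted per record
print(round(waste_spanned, 2))  # 29.17 bytes wasted per record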

                                 

DISK SPACE MANAGEMENT

DISK SPACE MANAGEMENT 

The disk space manager is the lowest level of software in the DBMS architecture; it manages space on disk. In short, the disk space manager supports the concept of a page as a unit of data, and provides commands to allocate or deallocate a page and to read or write a page. The size of a page is chosen to be the size of a disk block, and pages are stored as disk blocks, so that reading or writing a page can be done in one disk Input/Output. 
It is often useful to allocate a sequence of pages as a contiguous sequence of blocks to hold data that is frequently accessed in sequential order. This capability is essential for exploiting the advantages of sequentially accessing disk blocks. Such a capability, if desired, must be provided by the disk space manager to higher-level layers of the DBMS. 
Thus, the disk space manager hides details of the underlying hardware (and possibly the operating system) and allows higher levels of the software to think of the data as a collection of pages.



Track :

A disk surface contains several concentric tracks. In fact, a track contains data which can be read by a single read head without changing its position. The maximum amount of information that can be read by a single read/write head system in one revolution is determined by the track length. The track length is expressed in bytes, words or characters. Each track, in addition to the data, contains some extra information such as the address of the track (i.e., cylinder number and track number), block numbers, gaps between blocks, cyclic check codes, etc. This information is used by the Input/Output control system for proper processing of the data.

Sectors : 

The tracks are sub-divided into smaller regions called sectors. A sector is the smallest addressable segment (part) of a track. The division of a track into sectors is done by hardware or software format operation.


Blocks 
Since the entire track length is too large for data to be transferred by a single Input/Output command, the data is stored on the track in a number of blocks of equal length. A block may be equal to a sector or to a number of sectors. In fact, the block size determines the basic unit of data which is read or written by a single Input/Output command. The blocks are separated by a gap (G), and this gap reduces the storage capacity available for data. A small block size increases the number of gaps, thereby causing wastage of storage space. Very large blocks, on the other hand, create problems for the processor, requiring larger main memory into which the data is to be transferred.


In order to refer to a particular block, an address identifying the track and the position of the block on the track is constructed. This block identification address is known as a block pointer, which is denoted by P. The block pointer (i.e., the pointer to a block) can be its absolute address, consisting of the cylinder number, surface number, sector or block number, etc.


LEVELS OF RAID / LEVEL OF REDUNDANCY

LEVELS OF RAID / LEVEL OF REDUNDANCY 

In the following discussion we present the different RAID levels, considering sample data that would just fit on four disks. That is, without any RAID technology our storage system would consist of exactly four data disks; depending on the RAID level chosen, the number of additional check disks varies from zero to four.

1. Level 0 : Non-redundant :  A RAID Level 0 system uses data striping to increase the maximum bandwidth available. No redundant information is maintained. While being the solution with the lowest cost, reliability is a problem, since the MTTF decreases linearly with the number of disk drives in the array. RAID Level 0 has the best write performance of all RAID levels, because the absence of redundant information implies that no redundant information needs to be updated! However, RAID Level 0 does not have the best read performance of all RAID levels, since systems with redundancy have a choice of scheduling disk accesses. 
     In our example, the RAID Level 0 solution consists of only four data disks. Independent of the number of data disks, the effective space utilization for a RAID Level 0 system is always 100 percent.


2. Level 1 : Mirrored : A RAID Level 1 system is the most expensive solution. Instead of having one copy of the data, two identical copies of the data are maintained on two different disks. This type of redundancy is often called mirroring. Every write of a disk block involves a write on both disks. These writes may not be performed simultaneously, since a global system failure (e.g., due to a power outage) could occur while writing the blocks and leave both copies in an inconsistent state. Therefore, we always write a block on one disk first and then write the other copy on the mirror disk.
       In our example, we need four data and four check disks with mirrored data for a RAID Level 1 implementation. The effective space utilization is 50 percent, independent of the number of data disks.


3. Level 0+1 : Striping and Mirroring : RAID Level 0+1, sometimes also referred to as RAID Level 10, combines striping and mirroring. Thus, as in RAID Level 1, read requests of the size of a disk block can be scheduled both to a disk and to its mirror image. In addition, read requests of the size of several contiguous blocks benefit from the aggregated bandwidth of all disks. The cost of writes is analogous to RAID Level 1. 
           In our example, four data disks require four check disks, and the effective space utilization is always 50 percent.


4. Level 2 : Error-Correcting Codes : In RAID Level 2 the striping unit is a single bit. The redundancy scheme used is Hamming code. In our example with four data disks, only three check disks are needed. In general, the number of check disks grows logarithmically with the number of data disks. Striping at the bit level has the implication that in a disk array with D data disks, the smallest unit of transfer for a read is a set of D blocks. Thus, Level 2 is good for workloads with many large requests, since for each request the aggregated bandwidth of all data disks is used. But RAID Level 2 is bad for small requests of the size of an individual block, for the same reason.
For a RAID Level 2 implementation with four data disks, three check disks are needed. Thus in our example the effective space utilization is about 57 percent. The effective space utilization increases with the number of data disks. For example, in a setup with 10 data disks, four check disks are needed and the effective space utilization is 71 percent; in a setup with 25 data disks, five check disks are required and the effective space utilization grows to 83 percent.



5. Level 3 : Bit-Interleaved Parity :  While the redundancy scheme used in RAID Level 2 improves in terms of cost upon RAID Level 1, it keeps more redundant information than is necessary. Hamming code, as used in RAID Level 2, has the advantage of being able to identify which disk has failed. But disk controllers can easily detect which disk has failed, so the check disks do not need to contain information to identify the failed disk; information to recover the lost data is sufficient. Instead of using several disks to store the Hamming code, RAID Level 3 has a single check disk with parity information. Thus, the reliability overhead for RAID Level 3 is a single disk, the lowest overhead possible.
The performance characteristics of RAID Level 2 and RAID Level 3 are very similar. RAID Level 3 can also process only one Input/Output at a time, the minimum transfer unit is D blocks, and a write requires a read-modify-write cycle.


6. Level 4 : Block-Interleaved Parity : RAID Level 4 has a striping unit of a disk block, instead of a single bit as in RAID Level 3. Block-level striping has the advantage that read requests of the size of a disk block can be served entirely by the disk where the requested block resides. Large read requests of several disk blocks can still utilize the aggregated bandwidth of the D disks.
The write of a single block still requires a read-modify-write cycle, but only one data disk and the check disk are involved. The parity on the check disk can be updated without reading all D disk blocks, because the new parity can be obtained by noting the difference between the old data block and the new data block and then applying that difference to the parity block on the check disk :
New parity = (Old Data XOR New Data) XOR (Old parity).
RAID Level 3 and 4 configurations with four data disks require just a single check disk. Thus, in our example, the effective space utilization is 80 percent. The effective space utilization increases with the number of data disks, since always only one check disk is necessary.
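A tiny worked example of that parity-update rule, treating each "block" as a single byte for brevity; the byte values are made up:

# Three data disks plus one parity (check) disk; one byte stands in for a block.
disk0, disk1, disk2 = 0b1011_0010, 0b0110_1001, 0b0001_1111
parity = disk0 ^ disk1 ^ disk2            # initial parity on the check disk

# Update a single block on disk0 without re-reading disk1 and disk2:
new_disk0 = 0b1110_0010
parity = (disk0 ^ new_disk0) ^ parity     # New Parity = (Old Data XOR New Data) XOR Old Parity
disk0 = new_disk0

# Recomputing the parity from scratch gives the same result, confirming the shortcut.
assert parity == disk0 ^ disk1 ^ disk2
print(format(parity, "08b"))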

7. Level 5 : Block-Interleaved Distributed Parity : RAID Level 5 improves upon Level 4 by distributing the parity blocks uniformly over all disks, instead of storing them on a single check disk. This distribution has two advantages. First, several write requests can potentially be processed in parallel, since the bottleneck of a unique check disk has been eliminated. Second, read requests have a higher level of parallelism: since the data is distributed over all disks, read requests involve all disks, whereas in systems with a dedicated check disk the check disk never participates in reads.
A RAID Level 5 system has the best performance of all RAID levels with redundancy for small and large read and large write requests. Small writes still require a read - modify - write cycle and are thus less efficient than in RAID Level 1.
In our example, the corresponding RAID Level 5 system has 5 disks overall and thus the effective space utilization is the same as in RAID level 3 and 4.

8. Level 6 : P + Q Redundancy : A RAID Level 6 system uses Reed-Solomon codes to be able to recover from up to two simultaneous disk failures. RAID Level 6 requires two check disks, but, as in RAID Level 5, it distributes the redundant information uniformly at the block level. Thus, the performance characteristics for small and large read requests and for large write requests are analogous to RAID Level 5. For small writes, the read-modify-write procedure involves six disks instead of four as compared to RAID Level 5, since two blocks with redundant information need to be updated.
   For a RAID Level 6 system with storage capacity equal to four data disks, six disks are required. Thus, in our example, the effective space utilization is 66 percent.


Monday 16 May 2016

REDUNDANCY

REDUNDANCY 

In the redundancy technique, we store a duplicate (copy) of the original information - a copy that is not needed normally, but that can be used in the case of a disk failure to replace the lost information. Thus, even if a disk fails, data is not lost, so the Mean Time To Failure (MTTF) is effectively increased. Moreover, by storing redundant information on disks, the reliability can be improved. 
         When implementing redundancy into disk array design, we have to make two choices. 
        First we have to decide where to store the redundant information i.e., we can either store the redundant information on a single disk (or) we can distribute the redundant information uniformly over all disks. 
Second, we have to decide how to compute the redundant information. To maintain the redundant information, disk arrays use a parity scheme, which adds an extra disk, known as the check disk, that can be used to recover from a failure if any disk fails. This is done by maintaining a parity bit: if the disk contains information such as 1101, where the number of 1s is odd, then the corresponding bit on the check disk, known as the parity bit, is set to 1. Likewise, if the disk contains an even number of 1s, the parity bit on the check disk is set to 0. 
            Therefore, suppose a disk fails. We first count the number of 1s in the failed disk's data; if it is odd, we check the related parity bit on the check disk, which should then be 1. If the two do not match - i.e., there is an odd number of 1s but the parity bit is 0 - then one 1 has been corrupted on the failed disk: a parity bit of 0 says there should be an even number of 1s, and we counted an odd number, so we add one 1 back to recover the lost data.
                    Example : if the disk held 11101 before the failure, its parity bit is 0 because the number of 1s is even. If after the failure it reads 11001, i.e., an odd number of 1s, while the parity bit maintained for the disk is still 0, then one 1 has been corrupted, so we immediately add one 1 back to the failed disk's information to make the number of 1s even again. Thus, with parity we can recover from the failure of any one disk.
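In an actual disk array, the parity is usually computed bit-wise across the disks rather than within one disk: each bit on the check disk is the XOR of the corresponding bits on all data disks, and a failed disk's contents can be rebuilt by XOR-ing the surviving disks with the check disk. A small sketch of that idea, with made-up byte values standing in for disk contents:

from functools import reduce

# Four data disks, each holding one byte in this toy example.
data_disks = [0b1101_0010, 0b0110_1001, 0b0001_1111, 0b1010_1010]

# Check disk: bit-wise parity (XOR) of the corresponding bits of all data disks.
check_disk = reduce(lambda a, b: a ^ b, data_disks)

# Suppose disk 2 fails and its contents are lost.
failed = 2
surviving = [d for i, d in enumerate(data_disks) if i != failed]

# XOR of the surviving data disks and the check disk reconstructs the lost contents.
recovered = reduce(lambda a, b: a ^ b, surviving, check_disk)
assert recovered == data_disks[failed]
print(format(recovered, "08b"))   # 00011111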


RAID

RAID - REDUNDANT ARRAYS OF INDEPENDENT DISKS

A disk array is an arrangement of several disks, organized so as to increase the performance and improve the reliability of the storage system. 
  In simple words, performance is increased through data striping, because striping distributes data over several disks to give the impression of having a single large, very fast disk; reliability is improved through redundancy, because instead of having a single copy of the data, redundant (duplicate) information is maintained, so that in case of a disk failure it can be used to reconstruct the contents of the failed disk.
          Therefore, in short, disk arrays that implement a combination of the two, i.e., data striping and redundancy, are called "Redundant Arrays of Independent Disks", or in short "RAID", improving both performance and reliability respectively.
           Several RAID organizations, referred to as RAID levels, have been proposed. Each RAID level represents a different trade-off between reliability and performance, as follows.


Data Striping 

Data striping is the process whereby a single large file is divided into small, equal-sized pieces and these pieces are stored on different disks.


Data striping uses parallelism to improve disk performance by distributing data over multiple disks to make them appear as a single large, fast disk. Striping improves overall Input/Output
performance by allowing multiple Input/Output requests to be serviced in parallel, thus providing high overall transfer rates. Data striping also accomplishes load balancing among the disks.
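A rough sketch of striping, assuming the file is already split into equal-sized pieces and that piece i simply goes to disk i mod D (a common round-robin layout; real systems differ in details such as the striping unit):

def stripe(pieces, num_disks):
    """Distribute equal-sized pieces of a file over num_disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for i, piece in enumerate(pieces):
        disks[i % num_disks].append(piece)   # piece i goes to disk i mod D
    return disks

# A "file" cut into eight equal pieces, striped across four disks.
pieces = [f"p{i}" for i in range(8)]
for disk_no, contents in enumerate(stripe(pieces, num_disks=4)):
    print(f"disk {disk_no}: {contents}")
# disk 0: ['p0', 'p4']   disk 1: ['p1', 'p5']   disk 2: ['p2', 'p6']   disk 3: ['p3', 'p7']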



DISK BLOCK COMPONENTS

THE TIME TO ACCESS A DISK BLOCK HAS THE FOLLOWING COMPONENTS

  1. Seek time : Seek time is defined as the time taken to move the disk heads to the track on which a desired block is located.  
  2. Rotational delay : Rotational delay is defined as the waiting time for the desired block to rotate under the disk head; it is the time required for half a rotation on average and is usually less than seek time.
  3. Transfer Time : Transfer time is defined as the time to actually read or write the data in the block once the head is positioned, that is, the time for the disk to rotate over the block. 
THE PERFORMANCE OF DISK STRUCTURE DEPENDS UPON THE FOLLOWING THREE PRINCIPLES :
  1. Data must be in memory for the DBMS to operate on it.
  2. The unit for data transfer between disk and main memory is the block; if a single item on a block is needed, the entire block is transferred. Reading or writing a disk block is called an I/O (input/output) operation.
  3. The time to read or write a block varies, depending on the location of the data : 
             access time = seek time + rotational delay + transfer time. 
These observations imply that the time taken for database operations is affected significantly by how data is stored on disks.
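A quick numeric illustration of the access-time formula, with typical figures that are assumed here purely for the example:

# Assumed figures for illustration only.
seek_time_ms        = 9.0                      # move the heads to the right track
rpm                 = 7200                     # disk rotation speed
rotational_delay_ms = 0.5 * (60_000 / rpm)     # half a rotation on average, about 4.17 ms
transfer_rate_mb_s  = 100.0                    # sustained transfer rate
block_kb            = 8                        # block size

transfer_time_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000   # about 0.08 ms

# access time = seek time + rotational delay + transfer time
access_time_ms = seek_time_ms + rotational_delay_ms + transfer_time_ms
print(round(access_time_ms, 2))   # about 13.2 ms, dominated by seek and rotation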




Sunday 15 May 2016

MAGNETIC DISKS

MAGNETIC DISKS

Magnetic disks are used for storing large amounts of data. The storage capacity of a single disk ranges from 10 MB to 20 GB. Magnetic disks support direct access to a desired location and are widely used for database applications. A DBMS provides access to data on disk; applications built on a DBMS need not worry about whether data is in main memory or on disk.
    The most basic unit of data on the disk is a single bit of information. By magnetizing an area on the disk in certain ways, one can make it represent a bit value of either 1 or 0. To code information, bits are grouped into bytes (or characters). Byte sizes are typically 4 to 8 bits, depending on the computer and the device. We assume that one character is stored in a single byte, and we use the terms byte and character interchangeably. The capacity of a disk is the number of bytes it can store, which is usually very large. A small floppy disk used with microcomputers holds from 400 KB to 1.5 MB; hard disks for micros typically hold from several hundred MB to a few GB; and large disk packs used with minicomputers and mainframes have capacities that range up to a few tens or hundreds of GB. Disk capacities continue to grow as technology improves. Data is stored on disk in units called disk blocks. A disk block is a contiguous sequence of bytes and is the unit in which data is written to a disk and read from a disk.


         Blocks are arranged in concentric rings called tracks, on one or more platters. Tracks can be recorded on one or both surfaces of platter.
We refer to platters as single - sided or double - sided accordingly. 
The set of all tracks with the same diameter is called a cylinder, because the space occupied by these tracks is shaped like a cylinder; a cylinder contains one track per platter surface.

DBMS STORAGE MEDIA




DBMS STORAGE MEDIA

FILE ORGANIZATION : STORAGE MEDIA 

Introduction:

The physical or internal level of organization of a database system is concerned with the efficient storage of information on secondary storage devices. At this level we are no longer concerned with the user's view of the database, but with how data is actually stored; the necessary mapping from the physical level to the logical level allows users to retrieve data in logical terms. So, the basic problem in physical database representation is to select a suitable file system to store the desired information and retrieve it later. A file consists of records, and a record may consist of several fields.
          The efficiency of a file system depends on how efficiently the operations Retrieve, Insert, Update and Delete can be performed on the file. However, the efficiency of these operations on a particular file may depend on the type of queries the file system needs to process. For example, a sequential file system is more efficient where the storage and retrieval of records are performed sequentially; the same file system turns out to be inefficient in an environment where records are to be retrieved or updated randomly.


Data in a DBMS is stored on storage devices such as disks and tapes. The disk space manager is responsible for keeping track of available disk space. The file manager, which provides the abstraction of a file of records to higher levels of DBMS code, issues requests to the disk space manager to obtain space on disk. The file management layer requests and frees disk space in units of a page; the size of a page is a DBMS parameter, and typical values are 4 KB or 8 KB. The file management layer is responsible for keeping track of the pages in a file and for arranging records within pages.
      When a record is needed for processing, it must be fetched from disk into main memory. The page on which the record resides is determined by the file manager. Sometimes, the file manager uses supporting data structures to quickly identify the page that contains a desired record. After identifying the required page, the file manager issues a request for the page to a layer of DBMS code called the buffer manager. The buffer manager fetches a requested page from disk into a region of main memory called the buffer pool and tells the file manager the location of the requested page.

Thus, this glimpse of the physical level of a DBMS leads us to study the details of the memory hierarchy, redundant arrays of independent disks (RAID), disk space management, and so on.

MEMORY HIERARCHY

MEMORY HIERARCHY

Memory in a Computer system is arranged in a hierarchy, as follows.



At the top we have primary storage, which consists of cache and main memory, and provides very fast access to data. Then comes secondary storage, which consists of slower devices such as magnetic disks. Tertiary storage is the slowest class of storage devices; examples are optical disks and tapes.
            At the primary storage level, the memory hierarchy includes, at the most expensive end, cache memory, which is static RAM (Random Access Memory). Cache memory is mainly used by the CPU to speed up the execution of programs. The next level of primary storage is DRAM (Dynamic Random Access Memory), which provides the main work area for the CPU for keeping programs and data, and which is popularly called main memory. The advantage of DRAM is its low cost, which continues to decrease; the drawbacks are its volatility and its lower speed compared with static RAM.

       At the secondary storage level, the hierarchy includes magnetic disks, as well as storage in the form of CD-ROM (Compact Disk - Read Only Memory) devices.

      At the tertiary storage level, the hierarchy includes optical disks and tapes at the least expensive end.
              Storage capacity anywhere in the hierarchy is measured in kilobytes (Kbytes or 1000 bytes), megabytes (Mbytes or 1 million bytes), gigabytes (Gbytes or 1 billion bytes), and even terabytes (1000 Gbytes).
  
           Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary storage, and portions of the database are read into and written from buffers in main memory as needed. Now that personal computers and workstations have tens of megabytes of data in DRAM, it has become possible to load a large fraction of the database into main memory. An example is telephone switching applications, which store databases that contain routing and line information in main memory.
     Between DRAM and magnetic disk storage, another form of memory resides: flash memory, which is becoming common, particularly because it is non-volatile. Flash memories are high-density, high-performance memories using EEPROM (Electrically Erasable Programmable Read-Only Memory) technology. The advantage of flash memory is its fast access speed; the disadvantage is that an entire block must be erased and written over at a time.
           
         CD-ROM disks store data optically and are read by a laser. CD-ROMs contain pre-recorded data that cannot be overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage used for archiving data; they allow data to be written once and read any number of times without the possibility of erasing. DVD (Digital Video Disk) is a more recent standard for optical disks, allowing fourteen to fifteen gigabytes of storage per disk.
        Tapes are relatively inexpensive and can store very large amounts of data; they are used when we maintain data for a long period but do not expect to access it very often. A Quantum DLT4000 drive is a typical tape device; it stores 20 GB of data and can store about twice as much by compressing the data. It records data on 128 tape tracks, which can be thought of as a linear sequence of adjacent bytes, and supports a sustained transfer rate of 1.5 MB/sec with uncompressed data (typically 3.0 MB/sec with compressed data). A single DLT4000 tape drive can be used to access up to seven tapes in a stacked configuration, for a maximum compressed-data capacity of about 280 GB.

Friday 4 March 2016

OBJECT - BASED LOGICAL MODELS

   Object-based logical models are used in describing data at the logical level and view level. The logical level and view level are used to retrieve the data. The logical level describes what data are stored in the database and what relationships exist among those data; the logical level is used by the DBA, who must decide what information is to be kept in the database. The view level describes only part of the entire database, to be viewed by a user of the database, hiding the details of the information stored.
OBJECT - BASED LOGICAL MODELS ARE DESCRIBED IN THE FOLLOWING DIFFERENT MODELS:
  • The Entity - Relationship Model
  • The Object - Oriented Model
  • The Semantic Data Model
  • The Functional Data Model.

THE ENTITY - RELATIONSHIP MODEL

   An entity is a "thing" or "object" in the real world that is distinguishable from other objects. The Entity - Relationship Model (E-R Model) is based on a collection of basic objects, called entities, and the relationships among these objects.

* Rectangles represent entities
* Diamonds represent relationships among entities
* Ellipses represent attributes (characteristics of entities)
* Lines represent links of attributes to entities and of entities to relationships
   Here student and college are two different entities ("things" or "objects"), and there is only one relationship between these two entities, i.e., details. It means that the student details are identified by an ID. When the student details are opened, they specify the student-name, student-class and student-city, known as attributes (characteristics possessed by entities). Likewise, college should specify the attributes college-ID and college-name.
   Therefore, we can conclude that the overall logical structure of a database can be expressed graphically by an E-R diagram.

THE OBJECT - ORIENTED MODEL

   Like the E-R model, the Object-Oriented Model is based on a collection of objects. An object contains values stored in instance variables within the object. An object also contains bodies of code that operate on the object; these bodies of code are called methods.
   Objects that contain the same types of values and the same methods are grouped together into classes. A class may be viewed as a type definition for objects. This combination of data and methods comprising a type definition is similar to a programming-language abstract data type.
   The only way in which one object can access the data of another object is by invoking a method of that other object. This action is called sending a message to the object. Thus, the call interface of the methods of an object defines its externally visible part; the internal parts of the object - the instance variables and method code - are not visible externally. The result is two levels of data abstraction.
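A tiny, purely illustrative sketch of these ideas (the class and method names are invented): instance variables hold the object's values, methods are the only way other code reaches them, and calling a method is "sending a message" to the object.

class Account:
    def __init__(self, account_no, balance):
        # Instance variables: the values stored inside the object.
        self._account_no = account_no
        self._balance = balance

    # Methods: the bodies of code that operate on the object.
    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance

acct = Account("A-101", 500)
acct.deposit(200)          # "sending a message" to the object
print(acct.balance())      # 700 -- internal state reached only through methods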

THE SEMANTIC DATA MODEL

   A Semantic Data Model is a more high-level data model that makes it easier for a user to give a starting description of the data in an enterprise. These models contain a wide variety of relationship types that help describe a real application scenario. A DBMS cannot support all these relationship types directly, so it is built with only a few of them - known as the relational model in a DBMS. A widely used semantic data model is the Entity-Relationship (ER) data model, which allows us to graphically denote entities and the relationships between them.

THE FUNCTIONAL DATA MODEL

   The Functional Data Model makes it easier to define functions and call them wherever necessary to process data.

DATA MODELS

   Data model means to model the data, i.e., to give a shape to the data - to give a figure to the stored data. A data model makes it easier to understand the meaning of the data from its figure, and thus we model data to ensure that we understand:
  • The nature of the data itself, independent of its physical representation.
  • Each user's view of the data and its relationships.
  • The use of data across application areas.
   The data model can be used to convey the designer's understanding of the information requirements of the organization. Increasingly, organizations are standardizing the way that they model data by selecting a particular approach to data modelling and using it throughout their database development projects.
   In simple words, we can define a data model as "a collection of high-level data descriptions that hide many low-level storage details". A data model can also be defined as "a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints". Thus, a DBMS allows a user to define the stored data in terms of a data model.
THE DATA MODELS ARE DIVIDED INTO THE THREE DIFFERENT GROUPS:
   (a) Object - Based Logical Models
   (b) Record - Based Logical Models
   (c) Physical Models.
RECORD - BASED LOGICAL MODEL
Record-Based Logical Models describe data at the logical and view levels. Compared with object-based data models, record-based logical models specify the overall logical structure of the database and provide a higher-level description of the implementation.
         Record-based models are so named because the data is kept in the form of records (documents) of several types; each record type has a fixed number of fields (or attributes), and each field is usually of fixed length.
      The Record-Based Models are of three types

  • Relational Model
  • Network Model 
  • Hierarchical Model
Relational Model 
      The relational model represents both data (entities) and relationships among data in the form of tables. Each table has multiple columns, and each column has a unique name. Consider the following relational model :
   The description of data in terms of tables is called relations. From the CUSTOMER and ACCOUNT relations above, we can see that customer details are maintained in the CUSTOMER relation and their deposit details are maintained in the ACCOUNT relation.
Network Model
    Data in the network model are represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers. The records in the database are organized in the form of graphs.
Hierarchical Model
The Hierarchical Model is similar to the Network Model: data in the hierarchical model are also represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers. The difference from the Network Model is that the records in the database are organized in the form of trees rather than graphs.
   

Thursday 3 March 2016

FILE SYSTEMS Vs DBMS

INTRODUCTION TO FILE SYSTEMS

        One way to keep information on a computer is to store it in a permanent system, i.e., FILES. A FILE is a memory block of a secondary-storage device, like a disk, magnetic tape, etc., used to store logically related records permanently. Therefore, in a file management system each kind of record has a separate file: various records are stored in various files, and different programs are written to work with the different files. Thus, as time goes by, more files and more programs are added to the system. This approach to data management, where the organization has many programs with related files, is known as the traditional approach. The traditional approach can cause many problems like data redundancy, data inconsistency, difficulty in accessing data, etc.
      These problems of the traditional file system can be eliminated by using the database approach. With the database approach, all the data resides in the same storage location, rather than residing in many different files across the organization. Unlike the traditional file system, in which different programs access different files, the database is arranged so that one set of programs - the database management system - provides access to all the data. Therefore, data redundancy and data inconsistency are minimized, and data can be shared among all users of the database. In addition, security and data integrity are increased.

STRUCTURES OF DBMS 

The following describes the component parts of a DBMS, with some simplification, based on the relational data model.
  A DBMS is divided into two modules (parts) :
  1. Query processor 
  2. Storage manager 
     The query processor component of a DBMS accepts SQL commands generated from a variety of user interfaces, produces query evaluation plans, executes these plans against the database, and returns the answers. The query processor components are : 
(i) . DML compiler, which translates DML statements in a query language into low-level instructions that the query evaluation engine understands. In addition, the DML compiler transforms a user's request into an equivalent but more efficient form of low-level instructions, thus finding a good strategy for executing the query.


(ii) . Embedded DML precompiler : SQL commands can be embedded in host-language application programs, for example Java or COBOL programs; these embedded commands are converted into ordinary DML statements by the embedded DML precompiler. The precompiler must interact with the DML compiler to generate the appropriate code.

 (iii) . DDL interpreter, which interprets DDL statements and records them in a set of tables containing metadata.
(iv) . Query evaluation engine, which executes the low-level instructions generated by the DML compiler.
(v) . Application program object code, which is the low-level form of the application programs, which the query evaluation engine understands and executes.

       The storage manager component of a DBMS provides the interface between the data physically stored in the database and the programs/queries requesting the stored data from the database. The storage manager components are : 

(i) . Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
(ii) . Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
(iii). File manager, which manages the allocation of space on disk storage and the data structures used to represent the information stored on disk.
(iv). Buffer manager, which is responsible for fetching data from disk storage into main memory, and for deciding what data to cache in memory.
(v). Lock manager, which keeps track of requests for locks and grants locks on the data in the database.
(vi). Recovery manager, which is responsible for maintaining a log and restoring the system to a consistent state after a crash. 
           In addition, the DBMS software deals with the management of space on disk, where the data is stored. The disk storage components are :
(i) . Data files, which store the database itself.
(ii). Data dictionary, which stores metadata about the structure of the database. The data dictionary is used very heavily; therefore, great emphasis should be placed on developing a good design and an efficient implementation of the dictionary.
(iii) . Indices, which provide fast access to data items that hold particular values.
(iv) . Statistical data, which stores statistical information about the data in the database. This information is used by the query processor to select efficient ways to execute a query.

Wednesday 2 March 2016

DISADVANTAGES OF DATABASE MANAGEMENT SYSTEM (DBMS)

CONVERSION COSTS

   The older systems in an organization are typically based on file processing and/or older database technology. The cost of converting these older systems to modern database technology - measured in terms of dollars, time and organizational commitment - may often seem prohibitive to an organization.

INSTALLATION AND MANAGEMENT COST

  • A multiuser database management system is a large and complex piece of software that has a high initial cost, requires a staff of trained personnel (people) to install and operate, needs annual maintenance, etc.
  • Installing such a system may also require upgrades to the hardware and data communication systems in the organization.
  • Substantial training is required on an ongoing basis to keep up with new releases and upgrades.

NEW SPECIALIZED PERSONNEL

  • Organizations that adopt the database approach need to hire or train individuals to design and implement databases, provide database administration services and manage a staff of new people.
  • Further, because of the rapid changes in technology, these new people will have to be retrained or upgraded on a regular basis.

NEED FOR EXPLICIT BACKUP AND RECOVERY

     A shared database must be accurate and available at all times. This requires that procedures be developed and used for providing backup copies of data and for restoring the database when damage occurs.

SECURITY BREACHES

   Centralization also means that the data is accessible from a single source, namely the database. This increases the severity of security breaches (break-ins) and of disruptions to the operation of the organization.