ADD

Wednesday 18 May 2016

DBMS LANGUAGES

DATABASE LANGUAGES

A DBMS Provides two types of languages : 
  1.  Data - Definition Language - to define the data in the database.
  2. Data - Manipulation Language - to manipulate the data in the database.
Data - Definition Language (DDL) 
       The language used to define the data in the database is called as Data - Definition Language (DDL). In simple words, Data Definition Languages is used to create the database, alter the database and delete the database. The result of compilation of Data-Definition Language statements is a set of tables that is stored in a special file called data dictionary or data directory.A data dictionary is a file that contains metadata - that is, data about data. This file is consulted before actual data is read or modified in the database. This DDL is used by the conceptual schema to store (define) or retrieve (query) the records to/from the database respectively, where these records describes everything, i.e., entities, attributes and relationships.
There is yet another Data Definition Language - the Internal DDL or also known as data storage and definition language used by Internal schema which describes how data is actually stored. The result of compilation of the statements of this language specifies the implementation details of the database schemas - details  are usually hidden from the users 
            DDL commands are create,Alter and Drop only.


Data - Manipulation Language (DML)
The language used to manipulate the data in database is called Data - Manipulation Language (DML). In simple words, Data - manipulation Language is used to retrieve the data from the database, insertion of new data into the database & deletion or modification of existing data.
        The operation of retrieving the data from the database using DML is called as a query.
Simply we can say that, A query is a statement in the DML that requests the retrieval of data from the database. The portion of DML used to pose a query is known as query languages. We can use the terms query language and data - manipulation language synonymously.  The query language contains commands to select and retrieve the data from the database. Commands are also provided to insert, update & delete the records.
     Even, the  pre - defined data - manipulation functions are provided by the DBMS which can be called into our programs directly by their names i.e., procedure calls. 

A DML Manipulation the data basically in two types :
  1.      procedural DMLs
  2.  Non - procedural DMLs
  • Procedural DMLs makes a user to specify two things. one is what data is to be retrieved and the second is to tell be procedure to how to get those data. 
  • Non - procedural DMLs makes a user to specify only one thing, that is, what data is to be retrieved i.e., without telling the procedure to how to get that data.  
NOTE : Non - procedural DMLs are easier to learn and use than procedural DMLs BUT procedural language is more efficient than non - procedural language because it specifies the procedure to retrieve the data.

Data Sub - Language 
The DDL and DML are collectively referred to as the data sub - language when embedded within a host language (Example : Cor COBOL etc).

Host Language 
      A relational DBMS supports an interactive SQL interface and users can directly enter SQL commands. This simple approach is fine as long as the task at hand can be accomplished entirely with SQL commands . In practice we often encounter situations in which we need the greater flexibility of a general - purpose programming language, in addition to the data manipulation facilities provided by SQL. For example, we may want to integrate a database application with a nice graphical user interface. To deal with situations, the SQL standard defined how SQL commands can be executed from within a program in a host language such as C or JAVA, the use of SQL commands within host language program is called embedded SQL or the statement (i.e.,not deceleration) can be used wherever a statement in the host language is allowed (with few restrictions). Of course, SQL statement must be clearly marked so that a preprocessor can deal with them before invoking the compiler for the host language. Also, any host language variable used to pass arguments into an SQL command must be declared in SQL. 
          SQL and QUEL only provide facilities to define and retrieve data. To extend the data manipulation operations of this languages, for example to separately process each type of relation, these languages have to used with a traditional high - level language. such a language is called a host language and the program is called the host program.


BUFFER MANAGEMENT

BUFFER MANAGEMENT

The buffer manager is the software layer that is responsible for bringing pages from physical disk to main memory as needed. The buffer manages the available main memory by dividing the main memory into a collection of pages, which we called as buffer pool.  The main memory pages in the buffer pool are called frames.
                                                     Page Requests from Higher Level
Buffer Management in a DBMS

  • Data must be in RAM for DBMS to operate on it!
  • Buffer manager hides the fact that not all data is in RAM.
  
       The goal of the buffer manager is to ensure that the data requests made by programs are satisfied by copying data from secondary storage devices into buffer. Infact, if a program performs reading from existing buffers. similarly, if a program performs an output statement: it calls the buffer manager for output operation - to satisfy the requests by writing to the buffers. Therefore, we can say that input and output operation occurs between the program and the buffer area only.
if an input operation does not find the requested  page in the buffer pool, then the buffer manager (software) will have to do a physical transfer the page from the secondary memory (disk) to a free block in buffer pool and then make the requested page placed in the buffer pool, that is, available to the program requesting the original input operation. A similar scenario will take place in the revers order for an output operation. That is, the buffer manager (software) makes a new empty buffer a available to the program for outputting the page. If there is no space in the buffer pool then the buffer manager (software) physically transfer one of the page from buffer pool to the disk (secondary memory) to provide the empty space in the buffer pool for the output operation of the program to display the page in the buffer pool. 
In addition to the buffer pool itself, the buffer manager maintains same block keeping information and two variable for each frame in the pool ; pin - count and dirty. The number of times the page is requested in the frame - each time the pin - count variable is incremented for that frame (because that page is in this frame). For satisfying each request of the user ; the pin - counter variable is decremented each time for that frame. Thus, if a page is requested the pin - count is incremented ; if it fulfills the request the pin - count is decremented. In addition to this, if the page has been modified the Boolean variable; dirty is set as 'on'. Otherwise 'off '.



PINNED AND UN PINNED RECORDS

Pinned and Unpinned Records 

A record is said to be pinned down or pinned record if there exists a pointer to it somewhere in the database. For example, when a table look up approach is used to locate a record, the table contains a pointer to the record and the record becomes pinned down. The pinned records cannot be moved without reason randomly because in that case the pointers pointing to these records will dangle. Any movement of pinned records should be associated with appropriate modification of the pointers. In fact, the file organization method which maintain pointers to pinned records, appropriate modify these pointer whenever the records are inserted or deleted. 
A record is said to be unpinned record, if there does not exist any pointer pointing to it in the database. In fact, it is the independent record.

Pinned Records and Blocks 

  • 'pinned' means if the block cannot be written back to disk safety
  • indicated by a bit located in the header of the block
  • To write a pinned block back to disk we need to 'unpin' the block
  • Unswizzle any pointers associated to it.

 

Pinning and Unpinning of Pages 

       In buffer pool if some frame contains the requested page, the pin - count variable is incremented of that frame. So, increasing pin- count is called pinning the requested page in its frame. When the request of the requester is fulfilled, the pin - count variable is decrement of that frame and releases the page. This is called unpinning the page. 
       However, it is the buffer manager's (software) responsibility to pin a page. Whereas, it is the responsibility of the requestor of that page to tell the buffer manager to unpin a page.
    The buffer manager will not read another page into a frame unit its pin-count becomes, 0 (zero) that is,until all requestors of the page have unpinned it. If the requestor has modified the page, the Boolean variable : dirty is set as 'on' then it informs the buffer manager of this when it unpins the page to overwrite the old page with page changes and again dirty bit for the frame is set 'off'.

Buffer Replacement Policies 

         As the buffer memory is small, all the data cannot be brought into buffer memory at one time only. So, the DBMS must bring the pages in need, to the buffer memory to process it, decide what existing page buffer memory is to be replace to make space for the new page, collectively known as replacement policy. 
      The policy used to choose the replacement page for replacement can affect the time taken for database operations of the many alternative policies, each is suitable in different situations.
the buffer replacement policies is two types 
(i). Least Recently Used (LRU)
(ii). Clock Replacement.

Tuesday 17 May 2016

RECORDS

RECORDS 


Definition : A record is collection of data items and is the unit for data storage at the logical or file level. 
     A record may consist of different fields and each field corresponds to an attribute (column) of the record. For example, a student record has fields such as student name, courses, class, roll number and grade.

Spanned and Unspanned Records : 
      The records may be of fixed size, or may be variable in length. One block (also sometimes called a physical record) may contain multiple records. 

Unspanned Records Definition : When many records are restricted to fit within one block due to their small size then such records are called Unspanned Records. 

Spanned Records : When (portions of ) a single record may lie in different blocks, due to their large size, then such records are called Spanned Records

Blocking of Records : Different methods of arranging records in blocks, is called as blocking of records.
The blocking of records is done in three ways, shown as follows  
1 . fixed blocking or fixed - length records 
       i ) . Waste due to interblock gap 
      ii ) . Waste due to track size 
     iii ) . Waste due to records cannot fit in remaining portion of the block.
2 . Variable blocking for unspanned records 
3 . Variable blocking for spanned records 

BLOCKING FACTOR 
The ratio B/R where R is the record size and Bis the equal length of number of blocks Then above figures, a portion of the block would be wasted and add to the Gap G For variable length unspanned    records there is wasted space in blocks. However, the greater the radio of block length of the mean record length, the less will be the space that is wasted. 
                 Thus for fixed blocking 

                                               W  = GRm/B           
where W     ----->  calculate average waste per record 
           G      ----->  gap size 
           Rm   ----->  mean record length 
            B     ----->  equal length of number of blocks.
While retrieving or storing records in a block, it is necessary to identify where records begin or and. Such blocking information may be a part of each block, called a block header. For fixed blocking, only record length needs to be known to locate a particular record in block. However, for variable length records, the data describing the blocking must be available in the block header. Different techniques are available to identify variable length records such as making from the end - of - records or from the beginning of record. this is done by storing pointer values corresponding to each records per block. in the case of spanned records, we also need to store in each block a block pointer which points to the successor block where the remaining portion of the spanned record is stored. storing such pointer will load to further wastage of the block space.
For spanned records, the average waste per record can be expressed as
                                     W = (G + P) Rm/B + P
The spanned records have the advantage that,unlike unspanned blocking, there is no space wastage in the blocks due to the inability to pack variable length records into a fixed length block. However, Spanned records are usually difficult to implement and updating of such records creates problems. Hence, unspanned records or generally prepared and the average waste space may be reduced by increasing the mean blocking factor i.e., using either larger blocks or smaller blocks.

                                 

DISK SPACE MANAGEMENT

DISK SPACE MANAGEMENT 

The disk space manager is the lowest level of software in the DBMS architecture, with manages space on disk. In short, the disk space manager supports the concept of a page as a unit of data, and provides commands to allocate or deallocate a page and read or write a page. the size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk Input/Output. 
It is often useful to allocate a sequence of page as  a contiguous sequence of blocks to hold data that is frequently accessed in sequential order This capability is essential for exploiting the advantages of sequentially accessing disk blocks. Such  a capability, if desired, must be provided by the disk space manager to higher - level layers of the DBMS. 
Thus, the disk space manager hides details of the underlying hardware (and possibly the operating system) and allows higher levels of the software to think of the data as a collection of pages.



Track :

A disk surface contains several concentric tracks. In fact, a track contains data which can be read by signal read head without changing its position. the maximum amount of information that can be read by a single read/write head system in one revolution is determined by the track length. The track length is expressed in bytes, words or characters. Each track, in addition to the data, contents some extra information such as : the address of tracks (i.e., cylinder number and track number), block number, gap between blocks cyclic check code etc. information are used by Input/Output controls for proper processing of data.

Sectors : 

The tracks are sub-divided into smaller regions called sectors. A sector is the smallest addressable segment (part) of a track. The division of a track into sectors is done by hardware or software format operation.


Blocks 
Since the entire track length is too large for data to be transferred by a single Input/Output command, the data is stored on the track in a number of blocks of equal length A block may be equal to a sector or a number of sectors. Infact, the block size determines the basic unit of data which is read or written by a single Input/Output command. The blocks are separated by a gap (G) and this gap reduces this storage capacity of data. A small block size increases the number of gaps thereby causing wastage of storage space. Very large blocks on the other hand create problems for processor requiring larger main memory in which data is to be transferred.


In order to refer to a particular block, an address identifying the track and the position of the block on the track is constructed. This block identification address is known as a block pointer, which is donated by P. The block pointer (i.e., the pointer to a block) can be its absolute address consisting of the cylinder number, surface number, sector or block number etc.


LEVELS OF RAID / LEVEL OF REDUNDANCY

LEVELS OF RAID / LEVEL OF REDUNDANCY 

In  the following discussion, we discuss the different RAID levels, where we are considering some sample that would just fit on four disks. It means that,without any RAID technology, our storage system would contain exactly four data disks, BUT,depending on the RAID level chosen in the RAID technology, our storage system would vary from zero disk to four disk.

1. Level 0 : Non- redundant :  A RAID Level 0 system uses data stripping to increase the maximum bandwidth available. No redundant information is maintained. While being the solution with the     lowest cost, reliability is a problem, since the MTTF decreases linearly with the number of disk       drives in the array. RAID level 0 has the best write performance of all RAID levels, because             absence of redundant information implies that no redundant information needs to be updated! RAID level 0 does not have the best read performance of all raid levels,  since systems with redundancy have a choice of scheduling disk accesses. 
     In our example, the RAID Level 0 solution consists of only four data disks. Independent of the number of data disks, the effective space utilization for a RAID Level 0 system is always 100 percent.


2. Level 1 : Mirrored : A RAID Level 1 system is the most expensive solution . Instead of having one copy of the data, two identical copies of the data on two different disks are maintained. This type of redundancy is often called mirroring. Every write of a disk block involves a write on both disks. These writes may not be performed simultaneously, since a global system failure (Ex : due to power outrage) could occur while writing the blocks and then leave both copies in an inconsistent state. Therefore, we always write a block on one disk first and then write the other copy on the mirror disk.
       In our example, we need for data and four check disks with mirrored data for a RAID Level 1 implementation. the effective space utilization is 50 percent independent of the number of data disks.


3. level 0+1 : Stripping and Mirroring : RAID Level 0+1 : Stripping sometimes also referred to as Raid Level 10 - combines stripping and mirroring - Thus, as in RAID Level 1, read requests of the size of a disk block can be scheduled both  to a disk or its mirror image. In addition, read requests of the size of several contiguous blocks benefit from the aggregated bandwidth of all disks. the cost for writes is an analogous to RAID Level 1. 
           In our example, four data disk require four check disks and effective space utilization is always 50 percent.


4. Level 2 : Error- Correcting Codes : In RAID Level 2 the stripping unit is a single bit. The redundancy scheme used in hamming code. in our example with four data disks, only three check disks are needed. In general, the number of check disks grows logarithmically with the number of data disks. Stripping at the bit level has the implication that in a disk array with D data disks, the smallest unit of transfer for a read is a set of D blocks. Thus, Level  2 is good for workloads with many large requests since for each request the aggregated bandwidth of all data disks is used. But RAID Level 2 is bad for small requests of the size of an individual block for the same reason.
For a RAID Level 2 implementation with for data disks, there check disks are needed. Thus in our example the effective space utilization is about 57 percent. The effective space utilization increase with the number of data disks. For example, in a setup with 10 data disks, four check disks are needed and the effective space utilization is 71 percent. in a setup with 25 data disks, five check disks are required and the effective space utilization grows to 83 percent.



5. Level 3 : bit - Interleaved parity :  While the redundancy scheme used in RAID Level 2 improve in terms of cost upon RAID Level 1, it keeps more redundant information than is necessary. Hamming Code, as used in RAID Level 2, has advantage of being able to identify which disk has failed. But disk controllers can easily detect which disk has failed. Thus, the check disks do not need to contain information to identify the failed disk.information to recover the lost data is sufficient. Instead of using several disks with to store hamming Code, RAID Level 3 has a single check disk with parity information. Thus, the reliability overhead for RAID Level 3 is a single disk, the lowest overhead possible.
The performance characteristics of RAID Level 2 and RAID Level 3 are very similar. RAID Level 3 can also process only are Input / Output at a time, the minimum transfer unit is D blocks, and a write requires a read - modify - write cycle.


6. level 4 : Block - Interleaved Parity : RAID Level 4 has a stripping unit of a disk block, instead of a single bit as in RAID level 3, Block - level stripping has the advantage that read requests of the size of a disk block can be served entirely by the disk where the requested block resides. Large read requests of several disk blocks can still utilize the aggregated bandwidth of the D disks.
The write of a single block still requires a read - modify - write cycle, but only one data disk and the check disk are involved. The parity on the check disk can be updated without reading all D disk blocks because the new parity can be obtained by noticing the differences between the old data block and the new data block and them applying the difference to the parity block on the check disk :
New parity = (Old Data XOR New Data) XOR (Old parity).
RAID Level 3 and 4 configurations with four data disks require just a single check disk. Thus, in our example, the effective space utilization is 80 percent. The effective space utilization increases with the number of data disks, since always only one check disk is necessary.

7. Level 5 : Block - Interleaved Distributed Parity : RAID Level 5 improves upon Level 4 by distributing the parity blocks uniformly over all disks, instead of storing them as a single check disk. This distribution has two advantages. First, several write requests can potentially be processed in parallel, since the bottleneck of a unique check disk has been eliminated. second, read request have a higher level of parallelism. since the data is distributed over all disks, read request involve all disks, whereas in systems with a dedicated check disk never participate in reads.
A RAID Level 5 system has the best performance of all RAID levels with redundancy for small and large read and large write requests. Small writes still require a read - modify - write cycle and are thus less efficient than in RAID Level 1.
In our example, the corresponding RAID Level 5 system has 5 disks overall and thus the effective space utilization is the same as in RAID level 3 and 4.

8. Level 6 : P + Q Redundancy : A RAID Level 6 system uses Read - Solomon codes to be able to recover from upto two simultaneous disk failure. RAID Level 6 requires to check disks, but it also uniformly distributes redundant information at block level as in RAID Level 5 - Thus, the performance characteristics per small and large read requests and for large write requests are analogs to RAID Level 5. For small writes, the read - modify - write procedure involves 6 instead for four disks as compared to RAID Level 5, since two blocks with redundant information need to the updated.
   For a RAID Level 6 system with storage capacity equal to four data disks, six disks are required. Thus, in our example, the effective space utilization is 66 percent.


Monday 16 May 2016

REDUNDANCY

REDUNDANCY 

In redundancy technique, we store duplicate (copy) of original information, that is not needed normally, But that can be used in the case of a disk failure so that it can replace the contents of last information. thus, even if a disk fails, data is not lost, so that the Mean Time To Failure (MTTF) is increased. Moreover. by storing redundant information on disks, the reliability can be improved. 
         When implementing redundancy into disk array design, we have to make two choices. 
        First we have to decide where to store the redundant information i.e., we can either store the redundant information on a single disk (or) we can distribute the redundant information uniformly over all disks. 
Second, we have to decide how to retrieve the redundant information. To retrieve the redundant information. the disk arrays uses the parity scheme, which contains an extra disk known as check disk that can be used to recover from failure if any disk fails. This is done, by maintaining a parity bit: that is if the disk contains information such as 1101, here the number of '1' is odd.then the first bit on the check disk known as parity bit is set to 1. Likewise, if the disk contain even number of 1's, then the check disk parity bit is set to 0. 
            Therefore, suppose if a disk fails then first count the number of 1's in failed disk, if it is odd then check its related parity bit in check disk which should be 1 , if both are not matching i.e., if there is odd number of 1's and parity bit is 0, then it means we have corrupted one 1 in failed disk, i.e., if parity bit is 0 there should be even number of 1's and we have first counted it was odd, so add one 1 to recover lost data.
                    Example : if the disk was 11101 before failure, so we have parity bit even as 0 then after failures it was 11001 i.e., odd number of 1's but parity bit maintained for the disk was 0. then we have corrupted one 1 so immediately add one 1 to the failure disk information to make it even number of 1's. Thus, with parity we can recover from failure of any one disk.