Wednesday, 29 October 2014

Introduction To SQL Server Joins with Example

In SQL Server database, if records of two or more tables have to be combined, then the Join clause is used. It is a means that combine same fields of the tables can be combined using their common values. There are five types of join clause available for ANSI standard SQL Server: INNER, LEFT, RIGHT, FULL, and CROSS. A unique kind of Join clause named SELF-JOIN can be used that can combine itself.

Let me explain all joins with an example. Consider two tables below i.e. customer and bank_account

Table customer:


id
Customer_name
business
salary
1
1
jublin
doctor
1000000
2
2
john
Bank manager
500000
3
3
michael
engineer
1000000
4
4
shown
clerk
300000

Table bank-account:


id
Account_name
Account_id
1
1
SBI
3
2
2
RBI
2
3
3
BOI
1
4
4
HDFC
2

Inner Join:
This requires combining matching records of two tables. This is one of the commonly used join operation processed on tables, but it should not be considered as the best solution in every situation. Similar fields of the tables can be combined on the basis of join-predicate. When a query is passed, both the tables are combined with non-NULL values.

For example: We have two tables named Customers and Bank-account. If we want to join the tables that have customer_name, business, and account_name available depending upon the common IDS, the following query has to be run:

select customer_name, business, account_name from customer
inner join Bank_account on customer.id=Bank_account.account_id;

The join here is made on the ID column in both the tables.


Customer_name
business
Account_name
1
michael
engineer
SBI
2
john
Bank manager
RBI
3
jublin
doctor
BOI
4
john
Bank manager
HDFC

Left Join:
In this case, records of the left table will be displayed irrespective of the fact that it does not have a matching entry in the right table. So, in the left join, all data from the ‘Customer’ table will be displayed but only the matching entries of the right table, i.e. Bank_account will be displayed.
It is important to understand here what does this left and right means here. In the query, that way you mention the table name depicts which is the right or left table. For example:

select customer_name, business, account_name from customer
left join Bank_account on customer.id=Bank_account.account_id;

Here the left table is customer because it is on the left side of the statement. In the output, notice that the IDs in both the table for the clerk did not match and thus account name is displayed as NULL.   


Customer_name
Business
Account_name
1
Jublin
Doctor
BOI
2
John
Bank manager
RBI
3
john
Bank manager
HDFC
4
michael
engineer
SBI
5
shown
clerk
NULL

Right Join:
This is opposite of the left join. In this case, all entries in the right table will be displayed in the output irrespective of if the records match with the left table. The standard syntax that has to followed here is:

select customer_name, business, account_name from customer
right join Bank_account on customer.id=Bank_account.account_id;

Here you can notice that all records of the right table are displayed in spite of their non-matching behavior with the left table.



Full Join
Consider as the combination of inner, left, and right join. It will show all the matching, non-matching records from the left as well as right table. You can notice here that all entries of the tables are displayed irrespective of if they have matching values. John, the bank manager has an account in RBI as well as HDFC.  The standard syntax to follow is:


select customer_name, business, account_name from customer
full join Bank_account on customer.id=Bank_account.account_id;


Self join: It is used to join a table to itself if there are two tables so that one table could be renamed in the SQL statement.

I have used two IDs from tables customer and bank_account where one id is less than other tables account_id.

For example- select customer_name, business, account_name from customer, Bank_account where customer.id<Bank_account.account_id;


Cartesian Join:
Also known as the cross join, this does not require any specific condition to be fulfilled. The output of the two tables provided will be the Cartesian product if WHERE clause filters the rows. The rows of the left column are multiplied with the rows of the right column.


Conclusion: I have explained all above different types of SQL Server joins with their examples and use. Further will explain more about SQL server related information.


Wednesday, 15 October 2014

Understanding the Concept Of SQL Server Data Mining

What is Data Mining?


Data mining helps organization having large sized database to explore their data in a simple and interactive way. It is used to find patterns of the data and performs predictions on the process like a transaction on the server will succeed or not. It provides real time predictions and its technologies, help users to analyze data as well as discovers hidden patterns in their data sets. Data mining uses combination of the statistics, probability, artificial intelligence, machine learning and database technologies etc to analyze particular data sets.

We can call data mining as the process of exploring data from large sized databases and extracting required information according to certain rules and patterns. Different analysis schemes are used in order to recognize the patterns or the rules in the historical data according to the provided business scenario. The information thus can be stored in an abstract mathematical model which is termed as the Data Mining Model.  Once this is done, new database is examined using this model to add relevant information according to the rules or patterns. This is done in accordance to improve results for a query for a given business scenario. 

Understanding with an Example: With analysis of recorded database for the number of its items that are purchased from different shops, retail stores, or supermarket chains, it is possible to derive information about the product that are sold most so that its supply can be increased accordingly. In short, Data Mining is an analytical activity where hidden patterns are studied by sorting huge sized database. 

Why Consider Data Mining: There are many advantages that help users to choose data mining techniques. But few are listed below:

  1. It helps to discover reasons for success and failure.
  2. It helps to understand your customers, products etc.
  3. It improves your organization by mining large sized databases.

Know What SQL Data Mining is?


SQL Data Mining is an automatic backend procedure where a set of machine learning algorithms explores the database for the defined patterns. Once designed, these patterns can be a great help to get a better insight to the data and then can be further used for creating predictions that allow exploring different facts based on the defined algorithms.

There are nine mining algorithms for SQL Server and additional tools are required for creating and deploying the data mining models that suits situation of a business. For SQL data mining, free Business Intelligence Development Studio (BIDS) is available free by the Microsoft. Mathematical techniques are applied on a set of data called mining algorithms. .NET framework, BDIS, DMX languages are used as custom Microsoft solutions and this is the reason why data mining is sometimes referred to as machine learning.


SQL Data Mining Algorithms


Data Mining algorithms is a heuristic program that creates a data mining model from the warehouse. The algorithms first examines the provided data, search out for particular rules and patterns. The outcome of this analysis is used by the algorithms to create data mining model (DMM). The parameters defined for DMM are applied to the data warehouse for extracting detailed statistics. The mining model can be converted into:

  1. Set of clusters illustrating how to relate the cases in dataset.
  2. Decision Tree forecasts about the outcome and its after-effects.
  3. Set of Rules explain how to group the products in a transaction.
SQL Server Analysis Services (SAAS) provides variety of algorithms for a perfect data mining solution. All these algorithms are customizable. 

Classification Algorithms: It predicts distinct variables depending upon the different attributes in the dataset. 

Regression Algorithms: It creates a linear equation for different type of variables so that is most suitable for the dataset. 

Segmentation Algorithms: This helps to categorize the database into certain groups or clusters that share similar properties. 

Association Algorithms: It helps to relate different attributes of the database. This is one of the widely accepted algorithms used for market based analysis.

Sequence Analysis Algorithms: This helps to explore the data that is linked by sequences. For Example: Web Path Flow 

One or more algorithms can be adopted as a part of solution for businesses. Experienced analysts adopt only one algorithm to test output after an input and then apply another algorithm to test outcome based on provided data.

Conclusion:


Data mining techniques are helpful for those users who deal with large sized database, analyse event failure and predict about the data etc.

Friday, 10 October 2014

Introduction of Triggers in SQL Server

Triggers are the database objects that are used in relational databases. These are executed or fired automatically, when DDL or DML command statement is executed related to trigger. Triggers are used to assess or evaluate data, preserve data integrity, control server operations, used to implement business logic/rules, or to audit server, etc., very easily. One type of trigger i.e. even, allows modifying a data to multiple base tables of a view.

I will explain the different types of triggers used in SQL Server and more about their respective roles.

Types of Triggers:

There are four types of triggers used in SQL server:
  1. DDL Triggers (Data Definition Language Triggers)
  2. DML Triggers (Data Manipulation Language Triggers)
  3. CLR Triggers
  4. Logon Triggers
DDL Triggers:

Users can create triggers on DDL statements like Alter, Create, Drop, etc., and stored procedures based on a system that perform DDL operations.

Example: Suppos, a user executes the Create Login statement or sp_addlogin stored procedure for user login, then both these procedures can fire a DDL trigger that user can create on the create_login event.

We can also make After Trigger on DDL statements and cannot use Instead of the clause.

DDL triggers are helpful to manage administrator tasks like; regulating auditing database operations used to control actions on the SQL Server.

DML Triggers

In SQL Server we can create triggers on DML statements (like; INSERT, UPDATE, and DELETE) and stored procedures that perform DML-like operations. DML Triggers are of two types:

1. After Trigger:  When execution of the action performed on SQL Server is about to finish, then this trigger is fired. For example: When a user insert record in a table, then the trigger related to this insert event will fire only after the ‘all constraints pass in the row’, like; primary key, rules, etc. If it fails then SQL Server won’t fire this trigger.

2. Instead of Trigger:
This trigger starts before the execution of the action performed by SQL Server. It used before the action is performed opposite to the after trigger. The user can apply Instead of clause like; update, deleted, insert triggers for a table. Actually, it doesn’t include insert and all to the table.

For example: When a user insert record in a table, then the trigger related to this insert event will fire only before the ‘all constraints passed’ in the row like primary key, rules, etc. If it fails then SQL server will fire this trigger.

3. CLR Triggers

These are the special types of triggers that are based on Common Languages Runtime in .net framework. CLR is integrated with the SQL Server 2008 and allows triggers to be programmed in .net languages like; VB, C# etc.

User can write code for both DDL and DML triggers using CLR languages. For more on CLR triggers visit

Logon Triggers

These triggers are used when a LOGON event of the SQL Server occurs. This event gets raised when a user session is being established with the SQL Server after the authentication phase has finished. All the messages defined in the trigger like; error messages will be redirected to the error log of the SQL Server. If authentication fails, then the logon triggers aren’t being fired. To audit server controls and server sessions like login activity, limit the no. of sessions etc., these triggers has used.

Syntax to use The Logon Trigger

Create trigger trigger_name
On all server
[with encryption]
{For|after} logon
As
Sql_statement [1..n]

Syntax for trigger

Create trigger trigger_name
On {table|view}
[with encryption|Execute As] -- this is optional, if specified, then text in the trigger will be encrypted.
{ for|after|instead of } { [create|alter|drop|delete|update|insert] }
[not for replication] – It indicates that trigger shouldn’t be executed when a table is modified by the replication process.
As – After this action and condition are performed.
SQL_statement upto n


Conclusion: All I have explained above is about triggers used in SQL Server and their types. I also described how to code triggers for the default table or view in the SQL Server. For more details about the triggers and their execution, I will explain in detail on my next blog.

Tuesday, 7 October 2014

Brief Introduction of Deadlock Occurrence in SQL Server

A deadlock is defined as the situation where two or more processes permanently block each other by acquiring the lock on a resource that has already been requested by another process. This means one process is trying to lock the resource while the other one has already locked it. The following graph shows the state of deadlock.

  • Process P1 has a lock on resource R1 and has requested a lock on resource R2.
  • Process P2 has a lock on resource R2 and has requested a lock on resource R1.
  • The situation shows that neither the process can continue without a resource being released nor a resource can be available till a process continues, which means a deadlock state exists.
Sometimes, a deadlock state is also called Cyclic Dependency as, process P1 depends on the process P2. In SQL Server, database engine automatically detects deadlock cycles. The SQL Server database engine chooses one of the sessions as the deadlock victim and the current transaction is terminated to break the deadlock.

Here, the process refers to a transaction. By default SQL Server transaction doesn’t stop unless LOCK_TIMEOUT is set.

Deadlock can occur on any system having multiple threads not just on relational database management system. When we talk about database engine, sessions can encounter deadlock when acquiring non database resources like; memory or threads.


The above figure shows that Transaction 1 has acquired a lock on resource Supplier which Transaction 2 wants to acquire. However, Transaction 2 has acquired a lock on resource Part which Transaction 1 wants to acquire. Both are dependent on each other. There is a deadlock between T1 and T2.

In an SQL Server database, deadlocks can also occur in a table when it is partitioned and LOCK_ESCALATION is set to auto. In this case concurrency increases, as database engines allow table partition lock on HoBT level instead of Table Level. Separate transaction holds partition locks in a table and wants a lock on other transactions and results in a deadlock. This deadlock can be avoided by setting LOCK_ESCALATION to TABLE and it will reduce the concurrency.

Resources That Can Cause Deadlock:

Following types of resources can cause deadlock by blocking transactions or processes.

·       Locks: Acquires locks on resources like; pages, objects, rows metadata, and can cause deadlock. If transaction T1 has a shared lock on Table Row r1 and waiting to get the exclusive lock on row r2. Transaction T2 has a shared lock on row r2 and waiting to get the exclusive lock on r1. In this situation both T1 and T2 wait for resources to be released and dependent on each other.

·   Worker Threads: Suppose, one task is in queue and waiting for an available worker thread; this can cause deadlock. Let’s suppose, Session S1 starts a transaction and acquires a shared lock on row r1 and goes to sleep. Meanwhile, the active sessions running on all available worker threads try to acquire exclusive lock (X) on row r1. In this case session s1 cannot acquire worker thread and cannot commit transaction and release the lock on r1, eventually resulting in a deadlock.

·    Memory: Let’s talk about the two concurrent queries, Q1 and Q2 that execute user-defined functions, acquiring 20 MB and 10 MB memory space respectively. If the query needs 30 MB of space and available memory is only 20 MB, then both the queries have to wait for each other to release memory which later results in a deadlock.

·    Parallel Query Execution-Related Resources: In SQL Server producer, coordinator, and consumer threads, associated with exchange port; may block each other and cause deadlock. SQL Server determines the degree of parallelism in case of parallel query execution, based on the current workload on a worker thread.

Multiple Active Result Sets (MARS) Resources. These resources are used to control interleaving of multiple active requests.

1.  User Resource. When a thread is waiting for a resource that is controlled by a user application.
2.    Session Mutex. One task can run under the session at a given time and before the task starts, it must have exclusive access to the session mutex.

Transaction Mutex. One task can run under the transaction at a given time and before the transaction starts, it must have exclusive access to the transaction mutex.

SQL Server Deadlock Detection and Minimization


Deadlock is a state when an executing process is interrupted by subsequent task waiting to be executed. In this situation, the forthcoming process is waiting for the preceding one to be executed and waiting for the resources to be released by the executing process\ threads.

Various Modes of Deadlock

The transaction in deadlock waits until it is broken by some external process. There are distinct states of Deadlock; also known as Coffman Conditions and these include:
  •      Mutual Exclusion: At a given period of time one resource is assigned to a single process.
  •    Resource Holding: A resource is hold by a process which is further waiting for another resources held by other executing process in the loop.
  •    No Preemption: When a process is being completed, it can release the resource that it holds while execution phases.
  •     Circular Wait: In this state, a process is waiting for a resource to be released by another process which is in turn waiting for some other resource.

Process of Detecting Deadlocks

Deadlock Detection scheme first allows the deadlock to occur and then examine it for detecting that it has been occurred; to further correct it. An algorithm is been utilized to detect the allocation of resources that are further restarted or rolled back some of the processes for removing the detected deadlock.

After detecting the deadlock, the Database Engine dismisses the existing batch and the transaction is rolled back and displays error 1205 to the respective application. When a transaction for deadlock is rolled back, the victim then releases all the locks that the specified transaction holds.

Error 1205 records the information in the error log for all the processes\ threads or the resources that are involved; or participated in the formation of particular deadlock. Now, when the deadlock is been detected, it can be corrected by implementing either of the given methods:

· Process Termination: The chain of executing and the waiting processes in the deadlock can be aborted. Either some or entire processes in the queue may be dismissed to ensure fast and speedy resolution.
· Resource Allocation: The allocated resources are successfully released by the processes and are further allocated to the waiting processes in the loop. 

The Concept of Minimizing Deadlocks

Though it is not possible to avoid or eradicate the occurrence of Deadlock but this can be reduced to some extent. The following coding conventions can be used to lower the occurrence of Deadlock state.

View Objects in Actual Order:

The transactions access object in exactly same order in which they were generated. When the executing process is completed or rolled back, the second process in the queue will be started and the same procedure continues for all the following processes.

Avoid User Interference in Transactions:

Avoid transaction that involves user interaction such as the processes that requires user’s manual response. The queries or processes that run without manual interaction are much faster than those that are bounded with user interventions.

Minimizing the Volume of Transactions:

The major cause behind the occurrence of Deadlock situation is the large size of transactions. Minimizing the volume of transactions and retaining them under one batch can help in resolving this issue as the longer transactions holds resources for longer period of time till the process gets completed or rolled back. Keeping the transactions under single batch reduces the load over network as then it does not have to take huge roundtrips.

Ample Resources for the Transactions:

Ensure that there are ample of resources available corresponding to the processes so that each transaction is allocated with at least one resource. More the resources, faster will be the process execution time.

Prevent Locking Contention:

By running the transaction at lower isolation level, the occurrence of deadlock situations can be minimized. Implementation of READ_COMMITTED on transactions allows them to read the data that is previously read but not modified; by succeeding transaction without the completion of previous transaction.

Conclusion:

Deadlock is a critical situation as it affects the execution operation of overall process cycle. It is quite easier to detect the deadlock and to minimize their occurrence for smooth execution of functions. This section includes detailed information over SQL Server Deadlock Detection and Minimization and the measures provided herein can be implemented to rectify such situation.