Wednesday, 15 October 2014

A Truly Random Post

Want to generate a truly random number? It's a little trickier than it first seems.

First off, TSQL has a RAND function. Simple huh? Not so fast. We'll start with some definitions...

RAND Returns a pseudo-random value of the float datatype from 0 through 1, exclusive.
The value returned is based on the seed, hence providing the same seed always produces the same value.
RAND is difficult to use in set-based queries as it is only evaluated once per batch.
It will return the same value for each row in a query if the same seed is used.
Therefore, to use RAND effectively you need to supply a different seed value for each row.
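To see this in action, compare the two queries below; RAND() alone repeats one value for every row, while seeding it per row (as Method 1 below does) varies it. sys.objects is just a convenient table to select from.

-- RAND() with no seed is evaluated once, so every row shows the same value
SELECT name, RAND() AS same_for_every_row
FROM sys.objects

-- Seeding RAND with a per-row value produces a different value per row
SELECT name, RAND(CHECKSUM(NEWID())) AS different_per_row
FROM sys.objects
GO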

CHECKSUM Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.
It returns an integer and is good (meant) for generating a one way hash.

NEWID Returns a unique value of type uniqueidentifier i.e. a 16-byte GUID.
For example: 0EFD9C6D-7FF5-4CB1-A71B-69CF8FF02B4B

ABS Returns the absolute value of the input, i.e. a positive number.


Generating Random Numbers


Method 1 :

This involves using both NEWID and RAND for two levels of randomness.
Using CHECKSUM to turn a string into an integer is repeatable (not random), although the values are far from visually predictable.

Because fractional values are generated in a float i.e. less than 1, the result is multiplied by the maximum value and cast as an integer to generate whole numbers.

Firstly, the workings ...

DECLARE @data_newid  uniqueidentifier
DECLARE @data_int  int
DECLARE @data_float  float
DECLARE @maxvalue  int

SET @maxvalue = 100

-- Generate a unique 16-byte GUID
SET @data_newid = NEWID() 
-- Generate hash value integer of the GUID
SET @data_int = CHECKSUM(@data_newid)
-- Use the hash value to seed the RAND function
SET @data_float = RAND(@data_int)

-- Show Results
SELECT @data_newid as d_uniqueidentifier
SELECT @data_int as d_int
SELECT @data_float as d_float
SELECT CAST(@maxvalue * @data_float AS int) as randomnumber
GO

The abridged version...

DECLARE @maxvalue int
SET @maxvalue = 100
SELECT CAST(@maxvalue * RAND(CHECKSUM(NEWID())) AS int) RandomNumber
GO

Method 2 :

This involves recasting NEWID() first to varbinary, then to int. ABS is used to convert any negative results to positive. Because this method generates large values, we use %, the modulo operator, to reduce the result to the desired range. If we want numbers up to and including 15, we take the value modulo 16, as 0 to 15 are the possible remainders.

-- The Workings

DECLARE @data_newid  uniqueidentifier
DECLARE @data_varbinary varbinary(16) -- a length is needed; a bare varbinary variable defaults to 1 byte and would truncate the GUID
DECLARE @data_int int
DECLARE @maxvalue int

SET @maxvalue = 100

SET @data_newid = NEWID()
SET @data_varbinary = CAST(@data_newid AS varbinary)
SET @data_int = CAST(@data_varbinary as int)

SELECT @data_newid as d_uniqueidentifier
SELECT @data_varbinary as d_varbinary
SELECT @data_int as d_int
SELECT ABS(@data_int) % (@maxvalue + 1) as randomnumber -- ABS guards against a negative result, as in the abridged version

The abridged version...
DECLARE @maxvalue int
SET @maxvalue = 100
SELECT (ABS(CAST(CAST(NEWID() AS varbinary) AS int)) % (@maxvalue + 1)) RandomNumber
GO

Method 3 :

Similar to Method 1, but only using NEWID for the random element. Because a checksum can be negative, we use ABS to ensure the result is positive. Then we use the modulo operator, like Method 2, to get the remainder as an integer.


DECLARE @maxvalue int
SET @maxvalue = 100
SELECT ABS(CHECKSUM(NewId())) % (@maxvalue + 1) RandomNumber
GO


Generating a Random number within a range

The following example adapts Method 1 so that you can provide a lower and upper value for the number you want generated.

DECLARE @MaxValue int
DECLARE @MinValue int
SET @MaxValue = 5000
SET @MinValue = 0

SELECT CAST((((@MaxValue) + 1) - (@MinValue))* RAND(CHECKSUM(NEWID())) + (@MinValue) AS int) RandomNumber
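A quick way to sanity-check the distribution is to generate a batch of values set-based and count the occurrences. A sketch simulating 6,000 die rolls (1 to 6), assuming sys.all_columns has at least 6,000 rows to borrow:

;WITH numbers AS
(
    SELECT TOP (6000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns
)
SELECT RandomNumber, COUNT(*) AS occurrences
FROM (
    SELECT CAST(((6 + 1) - 1) * RAND(CHECKSUM(NEWID())) + 1 AS int) AS RandomNumber
    FROM numbers
) AS rolls
GROUP BY RandomNumber
ORDER BY RandomNumber
GO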


Links 

The HOBT : Check your lucky numbers
Less than dot : SET Based random numbers
Blackwasp : Random numbers in TSQL Queries
Technet : Checksum
Technet : Floor
Technet : Round
Technet : Abs
Technet : % (Modulo)

Wednesday, 8 October 2014

Testing SQL Server Connectivity

Starting with the basics, can we see the server? (run ping from the command line)

PING ipaddress/hostname 

Is a response returned from the host?

This checks the host is present.
Caveats : PING (ICMP requests) might be blocked by a firewall.
If you can ping an IP address but not a hostname, you need to investigate name resolution and DNS.

Check SQL is listening on TCP.

TELNET ipaddress/hostname 1433

Does Telnet open a session successfully? (Does the command window go blank with a flashing cursor in the top left?)

This tests the default instance of SQL Server on the specified (in this case default) port.
The SQL Listener/Browser service is never queried.
Annoyingly, the Telnet client is not installed by default on Windows versions from Vista onwards, so it has to be added as a feature first.

If you don't connect successfully, your first checks should be

  1. Is the SQL Server service running? 
  2. Does the firewall allow TCP 1433?

To connect to a named instance, look up the port it is running on from its TCP/IP Properties (under IP Addresses > IP All > TCP Dynamic Ports).
Then TELNET to the hostname with the discovered port number.

TELNET ipaddress/hostname portnumber
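Alternatively, if you can already open a session on the instance (locally, say), the port in use can be read from a DMV; a minimal query:

-- Returns the TCP port of your current connection
-- (NULL when connected via shared memory or named pipes)
SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID
GO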

The presence of a SQL Named instance is broadcast by the SQL Browser Service which communicates on UDP port 1434.
If you can connect via the port number but not by specifying the instance name:

  1. Check the named instance is running.
  2. Check the browser service is running.
  3. Check UDP port 1434 is allowed in the firewall configuration.


If you want to test connectivity via named pipes or Shared Memory you can use SQLCMD as detailed here... Steps to troubleshoot SQL connectivity issues


Further Reading

SQL Server Connectivity Issues with Named Instances

SQL Server – Finding TCP Port Number SQL Instance is Listening on

Using named instances? Test your DAC connection!

Steps to troubleshoot SQL connectivity issues


Thursday, 2 October 2014

TSQL : Revealing Foreign Key Constraints

Back in the day I blogged on revealing Referential Integrity via Information_Schema views. Here is an updated script to do the same.
SELECT 
    ForeignKeyName   = fk.name
   ,TableName   = OBJECT_NAME(fk.parent_object_id)  
   ,ColumnName   = COL_NAME(fkc.parent_object_id, fkc.parent_column_id)
   ,ReferencesTable  = OBJECT_NAME (fk.referenced_object_id)
   ,ReferencesColumn = COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) 
FROM sys.foreign_keys fk
INNER JOIN sys.foreign_key_columns fkc 
   ON fk.object_id = fkc.constraint_object_id
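sys.foreign_keys also exposes the cascade rules, so the script above is easy to extend; a small variation adding the delete/update actions and ordering:

SELECT 
    ForeignKeyName   = fk.name
   ,TableName        = OBJECT_NAME(fk.parent_object_id)
   ,ReferencesTable  = OBJECT_NAME(fk.referenced_object_id)
   ,DeleteAction     = fk.delete_referential_action_desc
   ,UpdateAction     = fk.update_referential_action_desc
FROM sys.foreign_keys fk
ORDER BY TableName, ForeignKeyName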

Monday, 22 September 2014

Ctrl+E (Execute) Not working?

In SQL 2012 Management Studio, my favourite shortcut for running scripts mysteriously stopped working. A message in the bar at the bottom read ...

 (Ctrl+E) was pressed. Waiting for second key of chord...

The reset instructions are shown below.



The 'fix' for this is:
  1. Go to Tools > Options
  2. Open the 'Environment' section of the tree and select 'Keyboard'
  3. On the right you will see a 'Reset' button.
  4. Click it and Keyboard shortcuts will be returned to their defaults.


Tuesday, 9 September 2014

Windows 8.1 : VMConnect Issue and shortcut creation

I recently set up Hyper-V on Windows 8 and was happily using RDP to connect to my VMs to achieve full screen mode via a short-cut.

It wasn't until I was off my local network that I realised that I had been relying on my network router (to resolve the VM names) for Remote Desktop to work in this way. Double clicking on the VM in Hyper-V manager does indeed bring up the VM, but I ideally don't want to launch this first every time.

VMConnect is the tool that lets you access VMs.

At first I had a small issue when I ran VMConnect...



" You do not have the required permission to complete this task. Contact the administrator of the authorization policy for the computer ‘computername’ "

So I followed the advice on Joseph Turley's blog and added myself to the 'Hyper-V Administrators' group, restarted the machine and VMConnect could now see my VMs...


So, returning to my original problem, how do I quickly open a session to my Hyper-V VM without RDP?

The solution is to create a shortcut to VMConnect itself, passing the PC name and VM name as parameters:

"C:\Windows\System32\vmconnect.exe"    myPCname    myVMname

I managed to make it launch into full-screen mode, as the first time you connect you can select the resolution that it will reuse for future connections.



Monday, 8 September 2014

TIWorker.exe

Today I noticed the hard drive light on my laptop furiously flashing away. Given I had not started any applications yet I pulled up Performance Monitor to investigate. The 'Disk' tab shows 'Processes with Disk Activity' and TIWorker.exe was at the top, merrily reading and writing to my drive.

So, what is it?

My limited research (googling - a verb now!) suggests it is a Windows Installer, connected to the Windows Update Service. Even when you are not updating, it may be tidying and compressing old update files in the background!

There are multiple links to problems that used to be present with tiworker.exe, all reportedly resolved by Windows Updates (Guess what I'm doing next...)


Thursday, 4 September 2014

Cannot set a credential for principal 'sa'

I was setting up a VM for testing purposes today and had cause to change my password on a SQL 2005 instance (admittedly using the 2008R2 tools). A trivial, quick task via management studio, or so I thought. I came up against the following error.

Msg 15535, Level 16, State 1, Line 1 
Cannot set a credential for principal 'sa'.

The 'Credential' tick box was not ticked; in fact it was greyed out, meaning I could not select it either.
The solution was to turn to T-SQL.

Run the following against the master database

ALTER LOGIN [sa] WITH PASSWORD=N'n1ncomp00p'

No, that isn't one of my passwords, but I like it...

Links
Cannot set a credential for principal 'sa'. (SQL Server 2005/2008)


Monday, 11 August 2014

SQL 2012 Wait Type : Dirty Page Poll

Whilst running some tried and tested scripts to look at wait types, I came across one I had not seen before. It is a background thread that looks to see if there are dirty pages that need to be flushed to disk. Given I found it on a barely used test box, I was not worried.

To quote the ToadWorld site, 'It is a benign wait type and shouldn’t be any concern if you happen to see high values for this wait type on your systems'.
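If you want to check the figures for yourself, the wait accumulates in sys.dm_os_wait_stats; a minimal query:

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'DIRTY_PAGE_POLL'
GO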



Toadworld : Dirty Page Poll Wait Type in SQL 2012


Thursday, 7 August 2014

Adventureworks : Large Sample Database

A colleague wanted a 'large database' (he didn't care what) for testing the performance of a backup solution. Given that privacy, security, ethics etc. prevent me from parting with a genuine database, and creating mock/random data is time consuming, I was faced with an issue.

Creating an empty database and simply sizing the files wouldn't work, as the backups would be highly compressible, i.e. would be minuscule. Likewise, the data compression that I apply to my backups by default also needed to be off.

I recalled seeing a tweet about a custom version of AdventureWorks and after a short search I found Enlarging the AdventureWorks Sample Databases. This script adds random data, increasing the size of AdventureWorks by creating SalesOrderHeaderEnlarged and SalesOrderDetailEnlarged tables. It can also easily be adapted to create larger datasets (Hint: increase the number of UNIONed tables in the derived table).



Monday, 7 July 2014

Function dbo.StripUnwantedCharacters - Cleaning up unwanted ASCII characters

Having determined I had some unwanted NUL characters in my data (see ASCII Value Character Counts Procedure) I set out to remove them.

I tried permutations of LIKE and CHAR to find the character, as well as REPLACE and CHARINDEX, all to no avail. Then I spotted this -

 0x0000 (char(0)) is an undefined character in Windows collations and cannot be included in CHARINDEX.

This was on the MSDN page for CHARINDEX

You can read the whole string character by character however and recreate it, removing those characters. This is what my function below does.


CREATE FUNCTION dbo.StripUnwantedCharacters (@InputString VARCHAR(MAX)) 
RETURNS VARCHAR(MAX)
AS 
BEGIN

DECLARE @OutputString varchar(MAX)
DECLARE @CurrentCharacterValue INT
DECLARE @Position INT

SET @OutputString = ''
SET @Position = 1

WHILE @Position < LEN(@InputString)+1
 BEGIN
 SET @CurrentCharacterValue = ASCII(SUBSTRING(@InputString, @Position, 1))

 IF @CurrentCharacterValue >= 9 -- Exclude characters with an ASCII value less than 9
  BEGIN
  SET @OutputString = @OutputString + CHAR(@CurrentCharacterValue)
  END

 SET @Position = @Position + 1
 END
RETURN @OutputString

END
GO

Code to use this procedure is as simple as ...

UPDATE tablename
SET cleandata = dbo.StripUnwantedCharacters(dirtydata)

Or to generate SQL for all columns ...

SELECT 
'UPDATE [' + TABLE_SCHEMA + '].[' + TABLE_NAME + '] SET ' + COLUMN_NAME + ' = RTRIM(dbo.StripUnwantedCharacters(' + COLUMN_NAME + '))' 
 from INFORMATION_SCHEMA.COLUMNS
 WHERE DATA_TYPE IN ('char','varchar')
GO

Sunday, 6 July 2014

ASCII Value Character Counts Procedure

I wrote this to determine the ASCII values being stored inside my data. It builds upon Determine ASCII Values of a string and produces a breakdown of all the ASCII values in a column.
To cut a long story short I had an issue where NUL characters (ASCII value 0) were stored and were producing unexpected results.

Code for the results table and procedure is here -

IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('dbo.charactercount'))
 CREATE TABLE dbo.charactercount
 (
  [id] int identity(1,1)
 ,[table_schema] sysname
 ,[table_name] sysname
 ,[column_name] sysname
 ,[character] int
 ,[column_count] int
 )
GO

IF EXISTS (SELECT 1 FROM sys.procedures WHERE name = 'charactercount_insert')
 DROP PROCEDURE dbo.charactercount_insert
GO
CREATE PROCEDURE dbo.charactercount_insert
(@table_schema sysname
,@table_name sysname
,@column_name sysname
,@sample_rows int = NULL)
AS
BEGIN
SET NOCOUNT ON
DECLARE @ColumnData table
(
UniqueRowID int IDENTITY (1, 1) Primary key NOT NULL ,
testeddata nvarchar(MAX)
)
DECLARE @sql NVARCHAR(MAX)
DECLARE @CurrentRow int  
DECLARE @TotalRows int  
DECLARE @Index INT
DECLARE @IntASCII INT
DECLARE @StatusMessage varchar (100)
DECLARE @currentdata NVARCHAR(MAX)
DECLARE @currentrowlength INT

IF @sample_rows IS NULL 
 SET @sql = 'SELECT [' + @column_name + '] FROM [' + @table_schema + '].[' + @table_name + ']'
ELSE
 SET @sql = 'SELECT TOP (' + CAST(@sample_rows AS VARCHAR(10)) + ') [' + @column_name + '] FROM [' + @table_schema + '].[' + @table_name + ']'
  
SET @StatusMessage = 'Fetching: [' + @column_name + '] FROM [' + @table_schema + '].[' + @table_name + ']'
RAISERROR (@StatusMessage , 10, 1) WITH NOWAIT 


INSERT INTO @ColumnData (testeddata)
EXEC(@sql)

SELECT @TotalRows = MAX(UniqueRowID) FROM @ColumnData

SET @StatusMessage = CAST(@TotalRows AS VARCHAR(20)) + ' rows fetched'
RAISERROR (@StatusMessage , 10, 1) WITH NOWAIT 

SELECT @CurrentRow = 1

SET @StatusMessage = 'Counting Characters'
RAISERROR (@StatusMessage , 10, 1) WITH NOWAIT 

 IF OBJECT_ID('tempdb..#charactercount') IS NOT NULL -- temp table names in sysobjects carry a session suffix, so OBJECT_ID is the reliable check
    DROP TABLE #charactercount

 CREATE TABLE #charactercount
 (
  [id] int identity(1,1)
 ,[table_schema] sysname
 ,[table_name] sysname
 ,[column_name] sysname
 ,[character] int
 ,[column_count] int
 )

WHILE @CurrentRow <= @TotalRows  
 BEGIN   

 SET @Index = 1

 SELECT @currentdata = testeddata
 FROM @ColumnData
 WHERE [UniqueRowID] = @CurrentRow

 SELECT @currentrowlength = LEN(@currentdata)

 IF OBJECT_ID('tempdb..#charactercount') IS NOT NULL
    TRUNCATE TABLE #charactercount

 WHILE @Index < 1 +(@currentrowlength)
  BEGIN

  SELECT @IntASCII = ASCII(SUBSTRING(@currentdata, @Index, 1))
  FROM @ColumnData
  WHERE [UniqueRowID] = @CurrentRow

  INSERT INTO #charactercount
  (table_schema ,table_name ,column_name ,character,column_count )
  SELECT @table_schema AS Table_Schema
    ,@table_name AS Table_Name
    ,@column_name AS Column_Name
    ,@IntASCII AS [Character]
    ,1 AS Number 

  SET @Index = @Index + 1
  END

  -- At end of text, update the character count 
  MERGE [dbo].[charactercount] AS [Target]
  USING (SELECT table_schema AS Table_Schema
      ,table_name AS Table_Name
      ,column_name AS Column_Name
      ,character AS character
      ,SUM(column_count) AS Number
      FROM #charactercount
      GROUP BY table_schema, table_name, column_name,character) AS [Source]
  ON  Source.Table_Schema = Target.Table_Schema
  AND Source.Table_Name   = Target.Table_Name
  AND Source.Column_Name  = Target.Column_Name
  AND Source.character  = Target.Character
  WHEN MATCHED 
   THEN UPDATE SET [Target].[column_count] = [Target].[column_count] + Source.Number
  WHEN NOT MATCHED 
   THEN INSERT ([table_schema] 
      ,[table_name] 
      ,[column_name] 
      ,[character] 
      ,[column_count])
     VALUES
      (Source.table_schema
      ,Source.table_name 
      ,Source.column_name
      ,Source.Character
      ,Source.Number);

 SELECT @CurrentRow = @CurrentRow + 1  
  END


 END 
SET @StatusMessage = 'Done - Check dbo.charactercount table'
RAISERROR (@StatusMessage , 10, 1) WITH NOWAIT 
GO


Code to use this procedure is as follows ...
--Remember to clear down the results table before you run

TRUNCATE TABLE dbo.charactercount
GO


-- Provide the name of a column, like this
-- The final parameter is the number of rows to test.
EXEC charactercount_insert 'schemaname', 'tablename', 'columnname', 200
GO



-- View the data like this 
SELECT * FROM charactercount ORDER BY [character]
GO


-- Dynamically generate statements to interrogate all columns, as follows

SELECT command = 'EXEC charactercount_insert ''' + TABLE_SCHEMA + ''',''' + TABLE_NAME+ ''',''' +  COLUMN_NAME + ''',200'
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE DATA_TYPE LIKE '%char%'

Wednesday, 2 July 2014

Stored Procedure Parameter Validation

This procedure isn't supposed to do anything. It's merely here as a reminder to validate your input parameters!
Here I'm using RAISERROR at severity 18 to report the problem, then RETURN to terminate the stored procedure...
Severity 18 is classed as a nonfatal error; it does not stop execution by itself, hence the explicit RETURN that follows it.

CREATE PROCEDURE [tools].[dosomething]
(
     @databasename  varchar(200) 
    ,@recoverymodel             varchar(11)
 
)
AS

BEGIN

IF @recoverymodel NOT IN ('FULL','SIMPLE','BULK_LOGGED')
 BEGIN
     RAISERROR('Invalid parameter: @recoverymodel should be FULL, SIMPLE or BULK_LOGGED', 18, 0)
  RETURN
 END

-- Do stuff here...

END

Monday, 30 June 2014

TSQL : Finding the last occurrence of a character

DECLARE @fullpath VARCHAR(500)
SET @fullpath = 'E:\Masters\MP3 DJ Masters\MP3 CD 320\Artist Albums\Sonique\Hear My Cry\01 Sonique - It Feels So Good.mp3'
 
SELECT   CHARINDEX('\', REVERSE(@fullpath)) AS [CharPosition] -- Final '\' character in the data
                ,LEFT(@fullpath,LEN(@fullpath) - CHARINDEX('\', REVERSE(@fullpath))) AS [Folder]
                ,RTRIM(RIGHT(@fullpath, CHARINDEX('\', REVERSE(@fullpath)) - 1)) AS [FileName] 

Thursday, 26 June 2014

SQL : Numeric Data Search Script - Version 1

After investigating another unknown database, I have adapted my Data Search Script to search for values in numeric columns.
/*
NUMERIC DATA SEARCH SCRIPT

version 1 :  

YOU MUST RUN THIS IN GRID VIEW!!! 
*/

declare @columncount int
declare @searchdata int
-- set string to search for here
set @searchdata = 200908

select @columncount = count(*) from information_schema.columns
 where data_type in ('int','bigint')
 and table_name not like 'sync%'
 and column_name <> 'order' 
 and table_name in (select name from sys.sysobjects with (nolock)  where type = 'u')
select 'declare @resultslist varchar(max)' + char(10) + 'set @resultslist = '''''
union
select 'if exists (select 1 from [' + 
  table_schema + '].[' + table_name + 
  '] with (nolock) where [' + 
  column_name  + '] = ' + CAST(@searchdata as VARCHAR(15)) + ' ) ' + char(10) +
  ' begin ' + char(10) +
  '  set @resultslist = @resultslist + char(10) + ''select ''''' + table_name + ''''' as [table_name], [' + column_name + '],* from ['+ table_schema + '].[' + table_name + '] with (nolock) where [' + column_name + '] = ' + CAST(@searchdata  as VARCHAR(15)) + ''''+ char(10) +
  '  raiserror( ''' + table_schema + '.' + table_name  + '/' +column_name  + ''', 10, 1) WITH NOWAIT' + char(10) + 
  ' end ' + char(10) + 
  '' + char(10) +
  ' raiserror( ''' + cast(cast(cast(rank() OVER (ORDER BY table_schema, table_name, column_name) as float)/ @columncount *100 as decimal(10,2)) as varchar(10))+ '%%' +
  ''', 10, 1) WITH NOWAIT' + char(10)
 from  information_schema.columns with (nolock)
 where data_type in ('int','bigint')
 and table_name not like 'sync%'
 and column_name <> 'order' 
 and table_name in (select name from sys.sysobjects with (nolock) where type = 'u')
union
select 'print @resultslist' + char(10) 


Tuesday, 27 May 2014

My Perfect World

This is a document on 'approach'. I've composed it for a number of reasons. It covers a lot of things I've done, could do, or will do on future projects. Each area could be massively expanded in terms of detail, but it should be enough to inspire ideas in others and keep me on the right track.

Introduction

The industry has encouraged front end developers to treat databases purely for storage. Sadly the average developer knows little beyond SELECT, INSERT, UPDATE & DELETE.

ORM frameworks are the current trend, marketed as speeding up development, making code portable and encapsulating business rules. This, however, can come at the cost of SQL efficiency and ultimately performance.

On my current project, the following facts are regularly ignored –

  • The database server has far greater processing capability & memory than the client.
  • SQL has a wealth of functionality which (when used correctly) performs fantastically.
  • RDBMS programmability is constantly evolving.


This post, therefore, is to put on record the things I would do in a perfect world. It aims to be all-encompassing for those starting out on a new transactional application.

Database design

Normalise as much as is practical. Aim for 3NF (3rd Normal Form) but be flexible if performance is going to be compromised by this goal. Do not let an ORM tool design the database. The result will have far too many objects and, as a consequence, will be heavy on the number of joins.

Data Types

Closely match the data you want to store with a data type.  For Numeric types this means the precision.  For text this means looking at the length and characters you need to store.

Only use Unicode column types (NCHAR, NVARCHAR) when you need to store Unicode. They require twice the storage yet are frequently used ‘just in case’.

Ensure you don’t use Deprecated data types, i.e. ones that will be removed in future versions of SQL Server. Ntext, text and image were listed as deprecated for SQL 2008, but actually still appeared in both SQL 2012 & 2014.

Data Sizing

Numerical data types should allow for growth but not be wasteful of storage. Choose tinyint, int, bigint etc. appropriately.

Character columns should be sized appropriately for their content. Do not just guess and oversize them for lack of researching the data. Choose between CHAR, VARCHAR and their Unicode equivalents appropriately.

By default CHAR(n) will store n characters and the storage requirement will be n bytes. Unless overridden by ANSI_PADDING settings, CHAR columns containing less data will be padded with blanks up to n characters.

VARCHAR(n) will store up to n characters without padding. There is a 2 byte storage overhead for defining a column as variable length, hence  storage required is n+2 bytes.

Stay away from VARCHAR(MAX) unless you really need more than 8000 characters (the maximum a regular VARCHAR can be defined as). VARCHAR(MAX) can hold up to 2,147,483,647 characters. In reality, the sizes and types of the other columns present in your table will determine whether data is stored 'in row'.


Nullability

NULL is the absence of data and is different from an empty string. NULL values represent data that we do not have. NULLability often provokes strong feelings from those who have not been involved in database design. Some languages (e.g. ColdFusion) treat it as an empty string, which simplifies matters for front-end developers.

  • Numeric NULLs are important as you cannot represent an unknown number using 0 (zero).  Zero is not the same as unknown when dealing with finances or statistics.
  • Character columns could represent missing data as either an empty string ‘’ or as NULL.  This depends on the application that entered the data. Allowing both would mean you must test for both to return empty columns.

There are a lot of theories, personal preferences and corresponding web posts surrounding the use of NULLs. Proving or disproving them is difficult across multiple database platforms, implementations of SQL and table sizes.

Storage

  1. In a fixed size column e.g. INT, storing NULL takes the size of the column. (For an INT, 4 bytes). 
  2. In a fixed length column e.g. CHAR, storing NULL takes the length of the column. (For CHAR(10), 10 bytes.)
  3. In a variable length column e.g. VARCHAR, a NULL is stored as 0 bytes (for the data) + 2 bytes (the storage overhead for defining 'variable' length).

Code Complexity 

Handling NULL values correctly in SQL Server is done via
  1. The WHERE clause e.g. WHERE column IS [NOT] NULL
  2. Inline Functions e.g. COALESCE, ISNULL, NULLIF.
NULL values cannot be compared using equals (=). If an application allows a character column to be either NULL or an empty string  (‘’) then you must account for both.

Performance

If there are many NULL values in a queried column, SQL Server may decide it is easier to ignore an index in place and perform a 'table scan'. A table or clustered index scan means the value of the queried column is compared for each row in turn. This could become an issue as a system grows.
From SQL 2008 onwards a filtered index could be utilised to explicitly ignore NULL values.
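A minimal sketch of such an index; the table and column names are hypothetical:

-- Filtered index (SQL 2008+): only rows with a value are indexed,
-- keeping the index compact and NULL-free
CREATE NONCLUSTERED INDEX IX_Customer_Email_NotNull
ON dbo.Customer (Email)
WHERE Email IS NOT NULL
GO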

NULL By Design

By allowing NULLs in a (traditionally Boolean) BIT column, the data type can store 3 states.
1, 0, NULL representing TRUE, FALSE and UNKNOWN.
1 byte is used to store up to 8 BIT columns; columns 9 to 16 are stored in a second byte, and so on.
Storing multiple BIT columns in the same table occupies a minimal amount of storage.

Some ideas to avoid NULLs
  • Use LEFT JOINs to a second table to return the presence of data, rather than hold NULLs in the source.
  • Horizontally partition a table into mandatory, frequently used & optional columns. Place optional columns where NULLs are frequently present in the second table, with a 1-to-1 mapping on the primary key.
  • Replace NULLs with a fixed value (e.g. -1) to represent unknown. This would be indexable. Use default values when defining columns to populate this value (sketched after this list).
  • Expand upon this idea with further keys for Not Applicable, Not Yet Known, Truly Unknown.
  • Mask NULLs from the application by using SQL programmability objects (views, functions & stored procedures).
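A minimal sketch of the fixed-value idea; the table, column and constraint names are hypothetical:

-- -1 stands in for 'unknown' and, unlike NULL, is indexable and comparable with =
CREATE TABLE dbo.CustomerOrder
(
    OrderID   int IDENTITY(1,1) PRIMARY KEY,
    CourierID int NOT NULL CONSTRAINT DF_CustomerOrder_CourierID DEFAULT (-1)
)
GO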

Constraints 

There are 4 types of constraint that can be defined on a column in a table.

Primary Key Constraint

A Primary Key is a column (or group of columns) that uniquely identifies a row.
A Primary Key cannot be NULL. No duplicates are allowed in primary keys.

Foreign Key Constraint

Foreign Keys enforce integrity of relationships and prevent orphaned records. 
They will have an overhead in terms of enforcing the order in which data is loaded.

Check Constraint

A check constraint can ensure data validity and enforce business logic by only allowing values that match a defined range.
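For instance (hypothetical table and range):

-- Only allow plausible values into the column
ALTER TABLE dbo.Employee
ADD CONSTRAINT CK_Employee_Age CHECK (Age BETWEEN 16 AND 75)
GO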

Unique Constraint

A column can be required to have unique contents by a UNIQUE Constraint in the definition of the table. It is implemented as a UNIQUE Index internally to SQL Server. Personally I think it is better to explicitly define a unique index on a column instead.

Advantages of a unique index (a sketch follows the list below) are that it can be -

  • explicitly named (and therefore tested for in rollout scripts)
  • searched (it is an index after all)
  • disabled (if you want to allow duplicate values temporarily; you cannot disable a unique constraint without altering the table itself)
  •  named in an index hint (if you find it necessary to force index use)
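A minimal sketch, with hypothetical table and column names:

-- An explicitly named unique index in place of a UNIQUE constraint
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customer_Email
ON dbo.Customer (Email)
GO

-- Unlike a constraint, it can be disabled and later rebuilt without altering the table
ALTER INDEX UQ_Customer_Email ON dbo.Customer DISABLE
GO
ALTER INDEX UQ_Customer_Email ON dbo.Customer REBUILD
GO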
In a transactional database, having these constraints in place is best practice and ensures data integrity. Attempts to violate these rules could be captured by a well written application.

In a data warehouse scenario where speed of loading records is paramount, lots of constraints would not be appropriate. Having them would slow down record insertion, but data would also have already been validated at source.



Style

Choose a style and stick to it!
Have a naming convention for objects.
Avoid punctuation or characters in object names that you cannot quickly type.
Avoid Reserved Words, i.e. don’t call a table ‘table’, a procedure ‘procedure’ (or vice versa) or a column ‘datetime’.

Adopt a coding style you can easily maintain.
Place SQL keywords on separate lines.
Use tabs and spaces to align column and table names.
Sensibly alias joined tables with meaningful names.
Format complicated WHERE clauses with brackets to make them more readable and split them over multiple lines. Operator precedence is not an easy thing to memorise.

Schemas 

Schemas were introduced in SQL 2005. Whilst they have management and security benefits their most immediate use is to allow the separation of database objects into logical groups. The AdventureWorks downloads on codeplex are a good place to see this in action.
I would use them to separate Application data into areas of Configuration, Access Control, Auditing as well as the application data itself.
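A short sketch of that separation (the schema and table are illustrative):

CREATE SCHEMA Config AUTHORIZATION dbo
GO

-- Application settings live apart from the application data itself
CREATE TABLE Config.ApplicationSetting
(
    SettingName  varchar(100) NOT NULL PRIMARY KEY,
    SettingValue varchar(500) NULL
)
GO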


Programming SQL - Use the database

At the time of writing, I would approach a new application with the following techniques.
  • Use Table Valued Functions and Stored Procedures to return data.
  • Use Views (carefully) if sql code is reused frequently. Do not build views on views.
  • Use Stored Procedures to perform DML operations (INSERT, UPDATE & DELETE) rather than allowing an application direct access to the tables.
    (ORM Frameworks can still use stored procedures – You can market this as the best of both worlds).
  • Perform error checking with TRY/CATCH blocks.
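On the last point, a minimal TRY/CATCH pattern (procedure body only, object names hypothetical):

BEGIN TRY
    UPDATE dbo.Customer SET CreditLimit = 1000 WHERE CustomerID = 42
END TRY
BEGIN CATCH
    -- Surface what went wrong to the caller, then stop
    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE()
    RAISERROR(@msg, 16, 1)
    RETURN
END CATCH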

I would do my utmost to avoid …

Scalar functions
These are called once for every row in the result set and run on a single thread, so they do not scale.

RBAR Processing
An acronym for 'Row By Agonising Row', this describes the iterative approach of repeating the same code again and again, whether in a CURSOR or a WHILE loop. The thinking here is the opposite of a SQL set-based approach.

Triggers
I am not a massive fan of triggers as they can quickly make a database difficult to maintain.
They hide business logic and need to be documented upfront (which we all do, right?)
One of the first things to do when joining a development project is to look for the existence of triggers.

They do allow some useful functionality, e.g. an INSTEAD OF trigger can make a view updatable by containing code that updates the multiple underlying tables correctly.


Security – Authentication & Authorization
From Day 1, define the necessary security and create:
  1. User accounts the application will require. Use the principle of least privilege to assign permissions to these logins.
  2. A maintenance account for executing jobs that manage the SQL environment e.g. backups, maintenance plans etc.
  3. An administrator account for managing the server and applying updates.
You can choose to tie your application security and SQL security together via Windows Authentication, or develop separate tables for logins & passwords.


Testing

The world of testing has multiple disciplines nowadays. From a database developer's perspective there are a number of specific areas of interest.

Unit Testing
Unit Testing is a method of functional testing that can be applied to small modules of code (units). SQL Stored procedures and functions lend themselves well to Unit Testing as they perform actions and provide outputs according to supplied parameters i.e. their inputs.

Detailed Database Testing can also be done to ensure all aspects of database programming, data validity and referential integrity. I would personally be doing this as I go along but the technique is perfectly valid for checking the work of others. http://en.wikipedia.org/wiki/Database_testing

Several ‘Non Functional’ testing disciplines are essential too.
Performance Testing

These are conducted on designated hardware and can be subdivided into
  • Configuration Testing – How making configuration changes on known hardware affects database performance.
  • Load Testing – How the system behaves under a defined load
  • Stress Testing – Increasing the load to determine the limits of a system.
  • Soak Testing – Endurance testing to ensure a system can cope with a continuous load.
  • Spike Testing – How the system performs when suddenly under a greatly increased load, i.e. number of concurrent users, number of active transactions.
  • Volume Testing – How the size of a database can influence all the above tests.
Scalability Testing adds to the above list by measuring how well a system can –
  • Scale up (improve hardware by adding RAM, CPUs or disks)
  • Scale out (add extra servers)
Both approaches could enable more users and faster response times if the application/database code also scales.


Applied SQL Testing

Database performance testing does not need to become complicated and can be aspired to and repeated as development progresses.

Data Generation

Generating a volume of test data is relatively easy, especially as test data sets and even random data generators are freely available on the web. Random transactional data can be generated ad hoc. If the application uses stored procedures to enter data, calling them again from a TSQL WHILE loop is an easy task and will build a valid data set far faster than the application can.
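A sketch of such a loop; the procedure name and parameters are hypothetical stand-ins for an application's real insert procedures:

DECLARE @i int = 1
DECLARE @surname varchar(20)

WHILE @i <= 10000
BEGIN
    -- Derive some throwaway random data for each iteration
    SET @surname = LEFT(CAST(NEWID() AS varchar(36)), 8)

    EXEC app.Customer_Insert @FirstName = 'Test', @Surname = @surname

    SET @i = @i + 1
END
GO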

Concurrency

Testing concurrency means reproducing a realistic load. In reality, users will be performing different actions simultaneously: User A will be entering records, User B will be searching, User C will be amending data, etc.

Testing an application can be automated with expensive tools, or could be mimicked in SQL. To do this, we need to know patterns of usage i.e. the order in which stored procedures are called for a given application action. These can be reproduced in test scripts and populated with random data at run time. Timings can also be recorded in a table with little overhead. 

Dynamically generating SQL agent jobs is a technique we can draw on to generate multiple requests. We can randomise the number of simultaneous users, the actions they are performing and the timings between requests. This will not reproduce client or network issues as results would not leave the server. It provides a solid benchmark for the capability of the SQL backend.

Data Changes

Knowing what data has changed, when, how and by whom fulfils auditing requirements. It can also be used to intelligently load a data warehouse with incremental changes. To achieve this I advocate ensuring development efforts are compatible with a form of change tracking right from the start.

Change Tracking (SQL 2008+)

Change Tracking provides a synchronous record of data changes. Once enabled on a table, you need to query the functions that return the changes within the configured retention period, else they will be lost.

Change Data Capture (SQL 2008+)

This requires Enterprise Edition of SQL Server and utilises the transaction log and SQL Server Agent to provide an asynchronous (after transaction) record of changes.

Roll your own!

A variety of methods can tell if a row has changed. Using TRIGGERs is the simplest, but deploying a custom auditing solution could also be achieved using either ROWVERSION columns or column CHECKSUMs.
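A minimal roll-your-own sketch using a ROWVERSION column (hypothetical table; the watermark would be persisted between loads):

-- Every insert and update stamps the row automatically
ALTER TABLE dbo.Customer ADD RowVer rowversion
GO

-- Pick up rows changed since the last captured watermark
DECLARE @LastSync binary(8) = 0x0

SELECT CustomerID
FROM dbo.Customer
WHERE RowVer > @LastSync
GO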


Documentation

Document your schema, code, standards and changes from day one.
Tools exist to assist with documentation, but SQL Server’s own ‘Extended Properties’ are a highly flexible way to achieve this too. The end result is a self-documenting database.
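A short example of self-documenting a table via an extended property (object names hypothetical):

EXEC sys.sp_addextendedproperty
     @name       = N'Description'
    ,@value      = N'One row per customer; maintained by the Sales application.'
    ,@level0type = N'SCHEMA', @level0name = N'dbo'
    ,@level1type = N'TABLE',  @level1name = N'Customer'
GO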

Application Design Considerations 

Here are the things I wish applications would do.

Keep application code light  

Push all data manipulation back to the database server.

Perform connectivity handling 

By this I mean handle timeouts by allowing retries, customising retry intervals and the number of retry attempts performed. During this process provide the user with a display e.g. a countdown to the next retry.

Be a true multi user application

Implement checking to see if a record has been changed by another user since it was loaded. It may be a rare event that two users would change the same client record, but I would advocate adding checking to ensure data is not lost in this way.

Centralise Configuration Information

Hold configuration information in SQL tables. Ideally an application configuration file would be a single XML file containing a server name, database name and user name, as well as an (encrypted) password. All other configuration items would be fetched from database tables (in the Config schema!). Login would be permitted by tables in the AccessControl schema, and tables in the Config schema would then be queried again for user-specific settings.

Monday, 26 May 2014

SQL Resources #1 : Community Sites & Bloggers

A couple of colleagues have expressed an interest in learning SQL recently.
Whilst I am helping with specific queries and the basics of many concepts, here are the starting blocks of self-help: the email I send on SQL resources.

SQL Server has a large following. There are lots of great free resources and tools.

The number one site to sign up to is >
It has newsletters, forums etc, I have been a member for years.

Redgate also does another community site with some great articles.

Without getting too deep (yet), these are the main folk to follow.
(Sign up to their newsletters / RSS)


Twitter is a good place to ask for help too using the hashtag #sqlhelp

Sunday, 25 May 2014

Required Reading : Primary Keys (link)

An excellent (and balanced) article about Primary Key implementation in SQL Server.

https://www.simple-talk.com/sql/learn-sql-server/primary-key-primer-for-sql-server/

I'm putting this link here after recommending it to others. Although long, it is by far the best reading on Primary Keys, Clustered Indexes and implementation practicalities.

Monday, 21 April 2014

SQL Development : Query Tips

Here is my checklist for efficient queries!


SELECT

Always minimize the data set being returned
  • Only SELECT the columns you need
  • Use a WHERE clause to only bring back the rows you need.

Never use SELECT * in production code.
  • This is lazy, NAME the columns you need
  • If you do not need all the columns, you are wasting I/O, network traffic and memory by returning them all.
  • Even if you do currently need them all, if columns are added to the queried tables at a later stage, you'll return those too.
  • If your query joins tables, then multiple columns of the same name may be returned e.g. id columns.
  • SELECT * tells the optimizer you need all columns, and it may conclude that a table scan is the most efficient way to return them.
  • The ordinal position of columns should not be relied upon for returning data.

FROM clauses.


Qualify object names with their Owner/Schema name.
  • This prevents SQL looking up the schema of the object (table/view/stored procedure/function)
  • The default schema is dbo, but it can be set on a user basis. 
  • Using fully qualified object names simplifies matters and prevents confusion at implementation.


JOIN

JOIN tables according to the ANSI SQL-92 syntax, with join conditions named in the ON clause.

SELECT [columnlist]  FROM Schema1.Table1
[ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER | CROSS ] JOIN Schema1.Table2
ON Table1.id = Table2.id

Avoid the old ANSI-89 syntax:

SELECT [columnlist] FROM Schema1.Table1, Schema1.Table2 WHERE Table1.id = Table2.id;

Ensure JOINed columns have matching data types


UNION

Use UNION and UNION ALL appropriately; prefer UNION ALL where possible, i.e. when you do not need to test for uniqueness across the result set.


WHERE

  • Ensure WHERE clauses (and JOIN conditions) are sargable (see the sketch after this list).
  • Compare alike data types to prevent implicit conversions.
  • Use the least likely true AND expression first. This prevents the other conditions being evaluated if it is FALSE.
  • Place the most likely value first if providing a list in an IN clause.
  • Adopt Boolean logic if creating conditional WHERE clauses.
  • If doing extensive searching of text data, consider implementing Full-Text Search and CONTAINS.
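A sargable predicate is one that leaves the column bare, so the optimizer can seek an index on it. A quick sketch with a hypothetical table, assuming an index on OrderDate:

-- Non-sargable: the function wrapped around the column forces a scan
SELECT OrderID
FROM dbo.CustomerOrder
WHERE YEAR(OrderDate) = 2014

-- Sargable rewrite: the bare column allows an index seek
SELECT OrderID
FROM dbo.CustomerOrder
WHERE OrderDate >= '20140101'
  AND OrderDate <  '20150101'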

GROUP BY

  • Use DISTINCT instead if no aggregate function is required.
  • Use WHERE to restrict the size of the data set.
  • Use HAVING to manipulate group data.


ORDER BY


If the ordering of results is necessary,
  • Keep the width and/or number of sorted columns to the minimum
  • Keep the number of rows to be sorted to a minimum


Record Existence -

Evaluate how you check for the existence of records -
  • Avoid using COUNT(*)
  • JOINs will return all matching rows, hence only use where a value is unique.
  • IN and EXISTS will only return a matching row once.
EXISTS vs IN (an old argument)
  • EXISTS is traditionally preferred as it returns as soon as it finds the first match.
  • IN continues to evaluate all values.
  • It would appear the query optimizer now evaluates EXISTS and IN similarly.
Beware NULL values in tested columns.
NOT EXISTS ignores NULLs and is therefore preferable to NOT IN or a LEFT OUTER JOIN with a NULL condition check.
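A hedged example with hypothetical tables, finding customers with no orders:

-- NULL-safe anti-join: rows in Customer with no matching CustomerOrder
SELECT c.CustomerID
FROM dbo.Customer AS c
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.CustomerOrder AS o
    WHERE o.CustomerID = c.CustomerID
)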

Ensure the column you are testing is indexed!

Wednesday, 9 April 2014

SQL Development : Implicit Conversions

Implicit Conversions happen when differing data types are compared.

They can occur in WHERE clauses and table JOINs, and show up in the execution plan of a query as a CONVERT_IMPLICIT predicate. Data type precedence often hides these issues, as there are no obvious errors when developing; queries still return results, etc.!
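A hedged repro with hypothetical names, assuming AccountNumber is an indexed varchar column:

-- The N'' literal is nvarchar; data type precedence converts the column on
-- every row (CONVERT_IMPLICIT in the plan) and the index seek is lost
SELECT AccountID
FROM dbo.Account
WHERE AccountNumber = N'AB123'

-- Matching the literal to the column's type avoids the conversion
SELECT AccountID
FROM dbo.Account
WHERE AccountNumber = 'AB123'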



