Stephen A. Fuqua (SAF) is a Bahá'í, software developer, and conservation and interfaith advocate in the DFW area of Texas.

Replacing Ad Hoc Query Text When Fields Change

June 27, 2007

Situation: We have a stored procedure running a query whose WHERE clause is given as a parameter. No, that's not the problem in and of itself, at least not today =). [Treat this as a non-negotiable requirement for now]. Within that WHERE clause there might be a query against a field, call it myField. This field is a varchar and wildcards are not used. Platform: SQL Server 2005.

Problem: myField has been changed to a varbinary field and holds an encrypted value — thus can no longer query directly against it. How do we make this work?

Solution: First a tangent — querying against a hash value. Right now we'd have to decrypt every single value of myField, compare it to the WHERE clause, and see if we had a match. Yuk. Big drawback of encryption. Thankfully, there's another way — save a unique and irreversible hash of the encrypted value and then compare against this.

Thus I have an additional column called myFieldHashed. When I save data to myField, I also take the raw data, run it through a hashing algorithm with the T-SQL HashBytes command, and save that value. Now my query against myField will no longer be DECRYPT_COMMAND(myField) = '', but rather myFieldHash = HashBytes('algo', ''). (DECRYPT_COMMAND is not a real SQL command, just a lazy substitute).

Unfortunately neither methods work right now — hashing or decrypting — because the WHERE clause will not contain either the decrypt command or the HashBytes command.

What we need to do is extract anything that looks like myField = '' and replace it with the proper command. A caveat: if there are nested AND or OR statements, this phrase could be repeated. Repitition indicates that we might want to use a UDF that can be called recursively.

So let�s get to it. First let�s create a temp table upon which to operate:

CREATE TABLE #Temp (myField varchar(6), myFieldHash varchar(256), anotherField varchar(1))
INSERT INTO #Temp (myField, myFieldHash, anotherField) SELECT 'value1', HashBytes('MD2','value1'), '1'
INSERT INTO #Temp (myField, myFieldHash, anotherField) SELECT 'value2', HashBytes('MD2','value2'), '2'
INSERT INTO #Temp (myField, myFieldHash, anotherField) SELECT 'value3', HashBytes('MD2','value4'), '3'
SELECT * FROM #temp

And here�s a WHERE clause that might be passed into our stored procedure:

DECLARE @WHERE as VARCHAR(1000), @stmt as VARCHAR(1000)
SET @WHERE = 'WHERE (myField = ''value1'' AND anotherField = ''1'') OR (myField = ''value3'')'

Need to extract the myField = 'xxx' portion and later insert a new clause in the exact same position. Thus must preserve string before and after this clause, and need to know the position of the word "myField" and of the two apostrophes surrounding the criteria in order to isolate that criteria. Declare some variables.

DECLARE @before as varchar(1000), @after as varchar(1000), @criteria as varchar(1000), @pos_myField as INT, @pos_APOS_1 as INT, @pos_APOS_2 as INT

Find "myField = ", assuming there are spaces around "=". And since we�ll be replacing "myField" with "myFieldHash", make sure to capture the space and equal after "myField" so as to not accidentally match "myFieldHash".

SET @pos_myField = CHARINDEX('myField = ', @WHERE)

First apostrophe after "myField":

SET @pos_APOS_1 = CHARINDEX('''', @WHERE, @pos_myField + 1)

Now the second apostrophe:

SET @pos_APOS_2 = CHARINDEX('''', @WHERE, @pos_APOS_1 + 1)

Now extract the three parts:

SET @before = SUBSTRING(@WHERE, 1, @pos_myField - 1)
SET @criteria = SUBSTRING(@WHERE, @pos_APOS_1 + 1, @pos_APOS_2 - @pos_APOS_1 - 1)
SET @after = SUBSTRING(@WHERE, @pos_APOS_2 + 1, LEN(@WHERE)-@pos_APOS_2)

Rebuild with HASH field name

SET @return = @before + ' myFieldHash = HashBytes(''MD2'', ''' + @criteria + ''') ' + @after

Try that out now, if you like! You�ll find that it works as expected — on the first myField criteria anyway. There might be a better solution out there, but this is the one I�ve come up with and it does the trick =).

Now we need to get the second. While I initially suggested that this might be a recursive function, perhaps it would be better to simply put a loop into the function

WHILE (@pos_myField <> 0)

End result:

CREATE FUNCTION dbo.fnFixMyFieldAdHoc(@WHERE varchar(1000))
RETURNS varchar(1000)
AS

BEGIN

     DECLARE @return as varchar(1000)
     SET @return = @WHERE

     DECLARE @before as varchar(1000), @after as varchar(1000), @criteria as varchar(1000), @pos_myField as INT, @pos_APOS_1 as INT, @pos_APOS_2 as INT

     SET @pos_myField = CHARINDEX('myField = ', @return)

     WHILE (@pos_myField <> 0)
     BEGIN

          SET @pos_APOS_1 = CHARINDEX('''', @return, @pos_myField + 1)

          SET @pos_APOS_2 = CHARINDEX('''', @return, @pos_APOS_1 + 1)

          SET @before = SUBSTRING(@return, 1, @pos_myField - 1)
          SET @criteria = SUBSTRING(@return, @pos_APOS_1 + 1, @pos_APOS_2 - @pos_APOS_1 - 1)
          SET @after = SUBSTRING(@return, @pos_APOS_2 + 1, LEN(@return)-@pos_APOS_2)

          SET @return = @before + ' myFieldHash = HashBytes(''MD2'', ''' + @criteria + ''') ' + @after

          SET @pos_myField = CHARINDEX('myField = ', @return)

     END

     RETURN @return

END
GO

GRANT EXECUTE ON fnFixMyFieldAdHoc TO PUBLIC
GO

Now we can test out the function:

SET @WHERE = dbo.fnFixMyFieldAdHoc(@WHERE)
PRINT @WHERE
SET @stmt = 'SELECT * FROM #Temp ' + @WHERE
PRINT @stmt
exec (@stmt)

And let�s also make sure the function does not error out if there are no instances of "myField":

SET @WHERE = 'WHERE anotherField = ''1'''
SET @WHERE = dbo.fnFixMyFieldAdHoc(@WHERE)
PRINT @WHERE
SET @stmt = 'SELECT * FROM #Temp ' + @WHERE
PRINT @stmt
exec (@stmt)

Now we have our solution. It could be done in a more robust fashion using regular expressions (i.e. would be able to handle "myField = " as well as "myField="), but that would require CLR integration and I�m just not ready to go to that extreme.

Note: MD2 has not been considered secure since 2004. You should choose a better algorithm than this.