
2007/05/07

DFS replication: Inter-site problems

We had DFS replication (the one released with Windows Server 2003 R2) between 3 servers, two of them on the central site (MAIN1 & MAIN2) and the other in the branch site (BRANCH1). The topology we had set up was a hub and spoke with MAIN1 as the center of the star: MAIN2 <-> MAIN1 <-> BRANCH1

One day we decided it would be better to have a full mesh topology, so that replication could survive a failure of MAIN1; with only three members, full mesh is equivalent to hub and spoke with a second, alternative hub server. For every replication group already set up and running, we simply added a new connection in the connections tab, enabling the option 'Create a second connection in the opposite direction', so that we also had a MAIN2 <-> BRANCH1 connection besides the ones that had been running for months. And then we forgot about it.

It seemed so simple... If a less connected topology had been working for such a long time without any problem, why would there be any problem with a more tightly connected topology? As I said, we just added the new connections and moved on to other tasks.

Two new connections were added, one in each direction, between MAIN2 and BRANCH1. After some days, our clients started complaining about files missing (mismatched) between servers. We confirmed they were right and went to check the event logs on the servers. In summary, these were the types of errors, warnings and informational messages:

Events in MAIN1

Event Type: Error
Event Source: DFSR
Event Category: None
Event ID: 5012
Date:  03/05/2007
Time:  16:47:50
User:  N/A
Computer: MAIN1
Description:
 The DFS Replication service failed to communicate with partner MAIN2 for replication 
 group Orders. The partner did not recognize the connection or the replication group 
 configuration.

 Partner DNS Address: main2.my-domain.com

Optional data if available:
 Partner WINS Address: main2
 Partner IP Address: 192.168.1.9

The service will retry the connection periodically.

Additional Information:
 Error: 9026 (The connection is invalid)
 Connection ID: 0A5392D6-D3D5-49BF-9D1A-F4DFF1C4F0F2
 Replication Group ID: 4F725FDD-34A9-42CA-A3D1-194175AD8F23

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Events in MAIN2

Event Type: Error
Event Source: DFSR
Event Category: None
Event ID: 5008
Date:  03/05/2007
Time:  16:43:37
User:  N/A
Computer: MAIN2
Description:
 The DFS Replication service failed to communicate with partner BRANCH1 for replication
 group Orders. This error can occur if the host is unreachable, or if the DFS Replication 
 service is not running on the server.

 Partner DNS Address: branch1.my-domain.com
 Optional data if available:
  Partner WINS Address: branch1
  Partner IP Address: 192.168.3.3
 The service will retry the connection periodically.

 Additional Information:
  Error: 1722 (The RPC server is unavailable.)
  Connection ID: 543733B4-D8CC-4C99-9F03-11635B5BC966
  Replication Group ID: 4F725FDD-34A9-42CA-A3D1-194175AD8F23

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Information
Event Source: DFSR
Event Category: None
Event ID: 5016
Date:  03/05/2007
Time:  16:47:36
User:  N/A
Computer: MAIN2
Description:
 The DFS Replication service detected that the connection with partner branch1.my-domain.com
 for replication group Orders has been removed or disabled.

 Additional Information:
  Connection ID: 543733B4-D8CC-4C99-9F03-11635B5BC966
  Replication Group ID: 4F725FDD-34A9-42CA-A3D1-194175AD8F23

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 6804
Date:  03/05/2007
Time:  16:47:36
User:  N/A
Computer: MAIN2
Description:
 The DFS Replication service has detected that no connections are configured for replication
 group Orders. No data is being replicated for this replication group.

Additional Information:
 Replication Group ID: 4F725FDD-34A9-42CA-A3D1-194175AD8F23
 Member ID: 453E3E21-9989-4795-9C47-5022FC79872A

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Information
Event Source: DFSR
Event Category: None
Event ID: 2010
Date:  03/05/2007
Time:  16:48:11
User:  N/A
Computer: MAIN2
Description:
 The DFS Replication service has detected that all replicated folders on volume E: have been
 disabled or deleted.

Additional Information:
 Volume: AE7DC6D0-1C7C-11DB-A9C6-001372640D81

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Information
Event Source: DFSR
Event Category: None
Event ID: 1206
Date:  03/05/2007
Time:  16:49:11
User:  N/A
Computer: MAIN2
Description:
 The DFS Replication service successfully contacted domain controller main2.my-domain.com to
 access configuration information.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

This last informational message was repeated approximately every 5 minutes, burying the real errors and warnings deep in the logs. Note that when DFS replication is working fine, this message is not logged (at least not so often). One notable detail was that another replication group, called Backups, which kept files in sync between MAIN1 and MAIN2, was working fine.

In summary: intra-site replication groups worked fine, while all inter-site replication groups had been showing errors and not replicating at all since the full-mesh topology change. Just for testing, we created a brand-new replication group between MAIN1 and MAIN2 and it replicated fine.

Then we added BRANCH1 to that replication group with a hub-and-spoke topology centered on MAIN1 and it still replicated fine. Finally, we changed the topology to full mesh and the same faulty behavior appeared. Weird. Googling for information about the problem, I reached How Active Directory Replication Topology Works and read:

Active Directory KCC Architecture and Processes [...] One domain controller in each site is selected as the Intersite Topology Generator (ISTG). To enable replication across site links, the ISTG automatically designates one or more servers to perform site-to-site replication. These servers are called bridgehead servers. A bridgehead is a point where a connection leaves or enters a site. [...]

I noticed they were using the same terminology as DFS: connection, replication... but the document was about AD replication, not DFS replication. I decided to give it a try anyway: I opened Active Directory Sites and Services to check which of my servers was acting as a bridgehead server.

Only MAIN1 was a bridgehead server for IP transport; MAIN2 was NOT configured as a bridgehead server. New questions arose: How had this been configured? Is it set automatically or manually? Should we override the configuration and select two servers as bridgehead servers?

As stated in the Active Directory Operations Guide, Managing Sites:

Bridgehead Server Selection

By default, bridgehead servers are automatically selected by the intersite topology generator (ISTG) in each site. Alternatively, you can use Active Directory Sites and Services to select preferred bridgehead servers. However, it is recommended for Windows 2000 deployments that you do not select preferred bridgehead servers. Selecting preferred bridgehead servers limits the bridgehead servers that the KCC can use to those that you have selected. If you use Active Directory Sites and Services to select any preferred bridgehead servers at all in a site, you must select as many as possible and you must select them for all domains that must be replicated to a different site. If you select preferred bridgehead servers for a domain and all preferred bridgehead servers for that domain become unavailable, replication of that domain to and from that site does not occur. If you have selected one or more bridgehead servers, removing them from the bridgehead servers list restores the automatic selection functionality to the ISTG.

Mhh... you must select as many as possible... It sounded like a good idea to enable MAIN2 as a bridgehead server too: regardless of our DFS problems, the change would strengthen our AD replication topology. So, where only MAIN1 had been a bridgehead-enabled server for IP transport, we made MAIN2 a bridgehead server as well. Luckily, this little change solved all the DFS replication problems.

I have found lots of HOWTOs and technical documents about setting up DFS replication, and none of them says a word about the importance of bridgehead servers for inter-site DFS replication, not only for AD replication. I hope this blog entry helps someone and saves them the days of headaches it cost me.

Keywords: DFS, replication, active directory, ad, bridgehead servers, problems, errors, inter-site, intra-site, active directory sites and services

Links:

Step-by-Step Guide for the Distributed File System Solution in Windows Server 2003 R2
Active Directory Operations Guide
How Active Directory Replication Topology Works

2006/12/28

SQL Server 2005 publishing views to SQL 2000 subscribers

If you are in the process of migrating your SQL Server 2000 environment to SQL Server 2005 and you have replication configured, here is something you must keep in mind before making a SQL Server 2005 instance the publisher: SQL Server 2005 changed the syntax of SELECT clauses that use TOP with ORDER BY, in the following way:
SQL 2000: SELECT TOP 100 PERCENT Field1, Field2 FROM Table1 ORDER BY Field1
SQL 2005: SELECT TOP (100) PERCENT Field1, Field2 FROM Table1 ORDER BY Field1
Note the parentheses. This newer syntax is not recognized by the SQL Server 2000 engine, so you will get errors when the initial snapshot is applied. If this is your case and you need to set up SQL Server 2005 as a publisher with SQL Server 2000 subscriber(s), you will need to change the views to be published so that the ORDER BY is removed and no TOP [n] PERCENT is needed (remove it as well), and modify your application accordingly. Besides, according to ORDER BY Clause (Transact-SQL) in BOL:
When ORDER BY is used in the definition of a view, inline function, derived table, or subquery, the clause is used only to determine the rows returned by the TOP clause. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.
In other words, it is useless to do a SELECT TOP 100 PERCENT... ORDER BY FieldX in a view (let's say View1), since all records will be returned but are not guaranteed to be ordered as expected, unless we do a SELECT * FROM View1 ORDER BY FieldX when querying the view. See also: SQL Server 2005 Ordered View and Inline Function Problems from OakLeaf Systems' blog.
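As a sketch of the rewrite (View1, Table1 and the field names are the made-up names from the example above):

-- Original definition: SQL Server 2005 scripts this as TOP (100) PERCENT,
-- which a SQL Server 2000 subscriber cannot parse.
CREATE VIEW View1 AS
  SELECT TOP 100 PERCENT Field1, Field2
  FROM Table1
  ORDER BY Field1
GO

-- Rewritten definition: no TOP / ORDER BY inside the view...
ALTER VIEW View1 AS
  SELECT Field1, Field2
  FROM Table1
GO

-- ...the ORDER BY moves into the query itself, the only place where
-- ordering is guaranteed anyway.
SELECT Field1, Field2 FROM View1 ORDER BY Field1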

2006/10/02

SQL Server transactional replication issues regarding NOT FOR REPLICATION triggers. Part II

Scenario: Transactional replication with immediate updates between a SQL Server 2005 publisher and its subscribers. Articles/tables have user-defined triggers marked as NOT FOR REPLICATION (NFR), and those triggers are deployed to subscribers for load balancing.

Problem description: Even though the triggers are marked as NOT FOR REPLICATION (NFR), when the origin of the triggering action is a subscriber, the trigger is fired twice: at the subscriber (expected behaviour) and at the publisher (a change in behaviour from SQL Server 2000's point of view).

The solution: This post explains the solution to the problem described in SQL Server transactional replication issues regarding NOT FOR REPLICATION triggers. The main idea is using SET CONTEXT_INFO so that triggers fired at the server can distinguish whether the call was made directly by a user (either at the publisher or at the subscriber) or was made in order to replicate a command.

In the URL above we saw an example of a trigger that should not be executed at all when fired at the publisher as a consequence of an insert/update done at the subscriber. The procedure explained here avoids the execution of those unwanted double-fired triggers.

You must ensure that your application executes:

SET CONTEXT_INFO 0x80

every time it makes a connection to the server. We will use 0x80 (10000000 in binary). In fact, we will only use the first bit of the first byte of CONTEXT_INFO, which is 128 bytes long, so that the rest of the variable remains available in case you need it for other purposes.

If you already use CONTEXT_INFO in your scenario, select another bit position for this purpose and change the following code accordingly.

And now, the other side of the change: you have to include the following code at the very beginning of every NOT FOR REPLICATION trigger deployed to subscribers that might give you problems when fired more than once:

  -- In order to work around the double-triggering issue at both subscribers
  -- and publisher when the action is executed on the subscriber, we need
  -- to check here whether the trigger has been fired by a user action.
  -- For this reason, client applications MUST execute SET CONTEXT_INFO 0x80
  -- to identify themselves. Only the first bit of the first byte of the
  -- 128 bytes of CONTEXT_INFO is used here; the rest of CONTEXT_INFO can
  -- still be used for other purposes. We only check that the first bit of
  -- the first byte is set to 1.
  IF NOT EXISTS(SELECT * FROM master.dbo.sysprocesses
                WHERE spid = @@SPID AND 
                      CONVERT(tinyint, SUBSTRING(context_info, 1, 1)) & 0x80 = 0x80)
       RETURN -- Flag not set: fired by the replication process, so do nothing
  -- If we reach this point, the trigger has correctly identified a user
  -- action, and execution will continue either on the publisher or on the
  -- subscriber, but not on both.

  -- The rest of the trigger follows...
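As a quick sanity check (not part of the trigger itself), you can verify from a session that the flag is visible exactly the way the trigger will read it:

SET CONTEXT_INFO 0x80

SELECT CONVERT(tinyint, SUBSTRING(context_info, 1, 1)) & 0x80 AS FirstBit
FROM master.dbo.sysprocesses
WHERE spid = @@SPID
-- FirstBit returns 128 (0x80) once the session has identified itself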

2006/09/01

How to deploy foreign keys to subscribers that are using queued updates

Scenario: Transactional replication with queued updates and a 'subscriber wins' conflict resolution policy ( exec sp_addpublication ... @conflict_policy = N'sub wins' ... ). Several publications are used to deploy the database instead of a single big one. Since updates are queued, subscribers must know about foreign key (FK) restrictions in order to prevent users from breaking the rules, so referential integrity must be replicated to subscribers. The command to create articles meeting these requirements is:

exec sp_addarticle ... @schema_option = 0x0000000008035FDF ...

and a brief explanation of the @schema_option bitmask is:

0x0000000008035FDF
         0x8000000 Create any schemas not already present on the subscriber.
           0x10000 Replicates CHECK constraints as NOT FOR REPLICATION ...
           0x20000 Replicates FOREIGN KEY constraints as NOT FOR REPLICATION...
            0x4000 Replicates UNIQUE constraints. Any indexes related to the...
            0x1000 Replicates column-level collation.
             0x100 Replicates user triggers on a table article, if defined...
             0x200 Replicates foreign key constraints. If the referenced table ...
             0x400 Replicates check constraints. Not supported for Oracle...
             0x800 Replicates defaults. Not supported for Oracle Publishers.
             etc..

You can read more about sp_addarticle and @schema_option in BOL. If you use Management Studio rather than T-SQL you don't have to worry about this bitmask mess: the GUI does everything for you.
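If you ever need to inspect a mask by hand, a bitwise AND tells you whether a flag is present. A minimal sketch in plain T-SQL, using the mask from above and the FK-replication flag as the example:

DECLARE @schema_option bigint, @flag bigint
SET @schema_option = 0x0000000008035FDF  -- the mask passed to sp_addarticle above
SET @flag = 0x200                        -- replicate foreign key constraints

IF @schema_option & @flag = @flag
  PRINT 'Foreign key constraints will be replicated'
ELSE
  PRINT 'Flag not set'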

Problem: Even though SQL Server 2005 has the option to publish referential integrity among the tables in a publication, this option only works when both tables involved are inside the publication. If a foreign key (FK) goes TO a table outside the publication or comes FROM a table outside the publication, it is not replicated to subscribers.

Besides, if you are using SQL Server 2000, this option is not available at all. If you want to use queued updates, the subscribers must have all foreign key constraints applied in order to avoid problems. The stored procedures and semi-automated way of publishing foreign keys described in this document can also be used if you are still on SQL Server 2000.

Subscribers are unaware of those restrictions, so there is a chance that a subscriber receives a command that would violate a foreign key at the publisher and still executes and queues it without any warning/error to the user/client application. Afterwards, when the queued command is replicated to the publisher, it is rolled back because it goes against a foreign key that only the publisher knows about.

The following picture shows a sample of the problem under discussion. There are 2 publications, QUOTATIONS and ORDERS; some foreign keys lie completely inside one publication, but another one (FK_TOrders_TQuotations) goes beyond the limits of a single publication: it starts in ORDERS and ends in QUOTATIONS.

Orders and Quotations

These foreign keys are not replicated to subscribers even though the option to replicate FKs is enabled. If you are using the 'publisher wins' policy, the offending command is rolled back and the original record is restored at the subscriber (seconds or minutes later). The user there is not warned about this; he may still think his changes were committed (but they were not).

In the worst possible scenario, if you are using the 'subscriber wins' policy, when the offending command is sent to the publisher for replication it cannot be committed because it breaks referential integrity; but since 'subscriber wins' is set, the command is not sent back to the subscriber to be undone. In this case, the subscriber will hold the new (but inconsistent) record, and the publisher will hold the old (but consistent) record.

After some hours, or even days, if the publication is validated on a scheduled basis using sp_publication_validation, the publication will appear as having validation errors. If the publication is then reinitialized, the subscribers will receive the publisher's version of the data (foreign-key consistent).

In any case, the changes initially committed at the subscriber are eventually undone, without any error or warning at the time the statements were executed at the subscriber.

My feedback to Microsoft on this issue: Subscriptions get desynched easily when subscriber uses queued updates and there are foreign keys that point outside the articles in the publication

Solution: The main idea behind this solution is using post-snapshot scripts to create, at the subscribers, the foreign keys that are not replicated automatically by SQL Server.

We must also create pre-snapshot scripts to drop the FKs that would prevent the tables from being dropped and recreated by the initial snapshot at the subscribers.

In our example, in order to successfully initialize the QUOTATIONS publication, we first need to try dropping FK_TOrders_TQuotations: since it points to TQuotations and might already be in place, it would prevent TQuotations from being dropped, recreated and populated by the initial snapshot. (This only applies if the action for an in-use object is drop and recreate: exec sp_addarticle ... @pre_creation_cmd = N'drop' ...)

So, we will need pre-snapshot scripts to drop those FKs that might be already in place and post-snapshot scripts to re-create them.
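For reference, a publication's pre- and post-snapshot scripts are set through the pre_snapshot_script and post_snapshot_script properties of sp_addpublication / sp_changepublication; the UNC paths below are hypothetical examples, and each script would simply invoke the corresponding stored procedure discussed in this post:

-- Attach the scripts to an existing publication (paths are examples)
exec sp_changepublication @publication = N'QUOTATIONS',
     @property = N'pre_snapshot_script',
     @value = N'\\MAIN1\ReplScripts\QUOTATIONS_pre.sql'
exec sp_changepublication @publication = N'QUOTATIONS',
     @property = N'post_snapshot_script',
     @value = N'\\MAIN1\ReplScripts\QUOTATIONS_post.sql'

-- QUOTATIONS_pre.sql would contain something like:
--   EXEC dbo.UseForeignKeysInformationAndDoDropsOnPublication N'QUOTATIONS'
-- and QUOTATIONS_post.sql:
--   EXEC dbo.UseForeignKeysInformationAndDoCreatesOnPublication N'QUOTATIONS'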

Is there any way to automate the creation of those scripts? If the database you are trying to replicate is fairly big, you will surely have hundreds of tables with an average of at least 3 or 4 foreign keys each. Having to manually find and script the creation (and deletion) of the FKs that cross the limits of a publication is a headache (at least).

Besides, you will need to maintain those scripts whenever the schema changes or new tables are created. Manually managing pre-snapshot and post-snapshot scripts is not a solution for real environments.

Prerequisites: As mentioned in the feedback to Microsoft above, there are some conditions that your database must meet in order to use the scripts.

  • The tables must be published using their original names.
  • The scripts will try to create/delete both incoming and outgoing foreign keys for every single article/table. This ensures that if one of the tables involved in a FK is not yet published when the first publication initializes, the FK will still be created during the post-snapshot script of the second publication's initialization.
  • FKs will be created at the subscribers with the same name they have at the publisher; if a FK with that name already exists, the scripts assume it is already in place and will not create it twice, nor under another name.
  • If a FK references two tables that are both already at the subscriber, the scripts will try to create the FK. The schema is not checked; this is why the tables must be deployed with their original names.
  • If horizontal or vertical partitions (row/column filters) are applied to any of the articles, you must make sure not to filter out rows/columns in a way that would make an existing FK at the publisher fail when applied at the subscriber (the scripts do not check this).

Scripting: We will discuss here a procedure to automate the process of creating those scripts. We will use an auxiliary publication SyncForeignKeysForSubscribers that will contain 2 articles (tables) called TSyncForeignKeysForSubscribersCREATE and TSyncForeignKeysForSubscribersDROP. The scripts to create those 2 tables are:

CREATE TABLE [dbo].[TSyncForeignKeysForSubscribersCREATE](
  [Id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
  [PublicationName] [varchar](256) COLLATE Modern_Spanish_CI_AS NOT NULL,
  [TableName] [varchar](1024) COLLATE Modern_Spanish_CI_AS NOT NULL,
  [FKName] [varchar](1024) COLLATE Modern_Spanish_CI_AS NOT NULL,
  [TheRest] [varchar](1024) COLLATE Modern_Spanish_CI_AS NOT NULL,
 CONSTRAINT [PK_TSyncForeignKeysForSubscribersCREATE] PRIMARY KEY CLUSTERED
  ( [Id] ASC )WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]
CREATE TABLE [dbo].[TSyncForeignKeysForSubscribersDROP](
  [Id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
  [PublicationName] [varchar](256) COLLATE Modern_Spanish_CI_AS NOT NULL,
  [TableName] [varchar](1024) COLLATE Modern_Spanish_CI_AS NOT NULL,
  [FKName] [varchar](1024) COLLATE Modern_Spanish_CI_AS NOT NULL,
 CONSTRAINT [PK_TSyncForeignKeysForSubscribersDROP] PRIMARY KEY CLUSTERED
  ( [Id] ASC )WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]

Those two tables are filled at the publisher with information about the FKs and replicated to subscribers. Once that information reaches the subscribers, it is ready to be used by the pre- and post-snapshot scripts to drop and recreate the FKs.

UpdateForeignKeysInformationForReplicationOnAllPublications

This stored procedure is to be called at the publisher. It is a wrapper that calls UpdateForeignKeysInformationForReplicationOnPublication for every publication on the current database.

CREATE PROCEDURE [dbo].[UpdateForeignKeysInformationForReplicationOnAllPublications]
AS
BEGIN
  DECLARE 
    @PublicationName varchar(256)

  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;

  -- We retrieve the list of publications that are published 
  -- inside the database we are running into.
  DECLARE ListOfPublications CURSOR LOCAL FOR
    SELECT publication FROM distribution.dbo.MSpublications
    WHERE publisher_db = (SELECT db_name(dbid) FROM master..sysprocesses WHERE spid=@@SPID)
    ORDER BY publication

  -- Just iterate on them calling to the other SP that we will discuss later
  OPEN ListOfPublications
  FETCH NEXT FROM ListOfPublications INTO @PublicationName
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      PRINT 'Processing publication ' + @PublicationName
      EXEC dbo.UpdateForeignKeysInformationForReplicationOnPublication @PublicationName
    END
    FETCH NEXT FROM ListOfPublications INTO @PublicationName
  END
  CLOSE ListOfPublications
  DEALLOCATE ListOfPublications
END
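Once deployed, refreshing the FK information for the whole database is a single call (to be repeated after any schema change):

EXEC dbo.UpdateForeignKeysInformationForReplicationOnAllPublications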

UpdateForeignKeysInformationForReplicationOnPublication @PublicationName varchar(256)

This stored procedure is also to be called at the publisher and does all the work of retrieving FK information in order to publish it to subscribers. It should be called every time your schema changes or new FKs are created. It iterates over the FKs defined in the database that @PublicationName belongs to, looking for those that go TO a table outside the articles of the publication or come FROM a table not included in it. Once a matching FK is found, its information is saved in the replicated tables so that subscribers can use it later.

CREATE PROCEDURE [dbo].[UpdateForeignKeysInformationForReplicationOnPublication] 
  @PublicationName varchar(256)
AS BEGIN
  DECLARE 
    @TableName varchar(128),
    @PKTABLE_NAME varchar(128),
    @PKCOLUMN_NAME varchar(128),
    @FKTABLE_NAME varchar(128),
    @FKCOLUMN_NAME varchar(128),
    @FK_NAME varchar(128),
    @sql varchar(2048)

  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;

  -- Create a temporary table to store middle calculations
  CREATE TABLE #tmpFKs (
    PKTABLE_QUALIFIER varchar(1024),
    PKTABLE_OWNER varchar(1024),
    PKTABLE_NAME varchar(1024),
    PKCOLUMN_NAME varchar(1024),
FKTABLE_QUALIFIER varchar(1024),
    FKTABLE_OWNER varchar(1024),
    FKTABLE_NAME varchar(1024),
    FKCOLUMN_NAME varchar(1024),
    KEY_SEQ int,
    UPDATE_RULE int,
    DELETE_RULE int,
    FK_NAME varchar(1024),
    PK_NAME varchar(1024),
    DEFERRABILITY int)

  -- 1st step: Retrieve all outgoing FK: from articles within the publication to other tables 
  -- outside this publication. We will use a cursor to iterate through all those articles
  -- inside the publication and retrieve any FK pointing out to any other table. Afterwards, we 
  -- will delete those FK that fall inside the publication articles.
  PRINT 'Step 1: Inside -> Outside FKs from ' + @PublicationName
  DECLARE ListOfArticles CURSOR LOCAL /*SCROLL READ_ONLY*/ FOR
    SELECT DISTINCT A.destination_object
    FROM distribution.dbo.MSpublications P 
      INNER JOIN distribution.dbo.MSarticles A ON P.publication_id = A.publication_id
    WHERE P.publication = @PublicationName
    ORDER BY A.destination_object

  OPEN ListOfArticles
  FETCH NEXT FROM ListOfArticles INTO @TableName
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      SELECT @sql = 'sp_fkeys @fktable_name = N''' + @TableName + ''''
      INSERT INTO #tmpFKs EXEC(@sql)
      -- From those just inserted, delete those FKs that are internal (among articles inside the
      -- publication). At the end, we are just interested in those FKs pointing out to tables 
      -- NOT in this publication.
      DELETE FROM #tmpFKs
      WHERE PKTABLE_NAME IN (  -- Subselect with publication's own articles.
        SELECT A.destination_object
        FROM distribution.dbo.MSpublications P
          INNER JOIN distribution.dbo.MSarticles A ON P.publication_id = A.publication_id
        WHERE P.publication = @PublicationName
      )
    END
    FETCH NEXT FROM ListOfArticles INTO @TableName
  END
  CLOSE ListOfArticles
  DEALLOCATE ListOfArticles

  -- 2nd step: We need to retrieve all incoming FKs: from tables NOT included in the publication
  -- to articles inside it. We will retrieve all incoming FK for the tables inside the publication
  -- and then remove those that fall again within tables inside it.
  PRINT 'Step 2: Inside <- Outside FKs from ' + @PublicationName
  DECLARE ListOfOutterArticles CURSOR LOCAL FOR
    SELECT [name] TableName FROM sys.objects T
    WHERE T.type='U' AND 
          T.is_ms_shipped = 0 AND
          T.[name] NOT LIKE 'conflict_%' /* AND
          T.is_published = 1 */ AND
          T.[name] NOT IN (SELECT A.destination_object
                           FROM distribution.dbo.MSpublications P 
                             INNER JOIN distribution.dbo.MSarticles A 
                               ON P.publication_id = A.publication_id
                           WHERE P.publication = @PublicationName)

  OPEN ListOfOutterArticles
  FETCH NEXT FROM ListOfOutterArticles INTO @TableName
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      SELECT @sql = 'sp_fkeys @fktable_name = N''' + @TableName + ''''
      INSERT INTO #tmpFKs EXEC(@sql)
      -- From those just inserted, delete FKs that are completely out the publication
      -- (from tables outside to tables outside)
      -- We are just interested in those pointing to articles inside this publication
      DELETE FROM #tmpFKs
      WHERE 
        FKTABLE_NAME = @TableName AND
        PKTABLE_NAME NOT IN (  -- Subselect with publication's own articles.
          SELECT A.destination_object
          FROM distribution.dbo.MSpublications P
            INNER JOIN distribution.dbo.MSarticles A ON P.publication_id = A.publication_id
          WHERE P.publication = @PublicationName
        )
    END
    FETCH NEXT FROM ListOfOutterArticles INTO @TableName
  END
  CLOSE ListOfOutterArticles
  DEALLOCATE ListOfOutterArticles

  -- 3rd step: Some format and cleaning of the auxiliary table.
  -- If there are FKs with more than one key column, KEY_SEQ indicates the 
  -- order we need follow when considering FKCOLUMN_NAME and PKCOLUMN_NAME
  -- We are going to combine the records where KEY_SEQ > 1, flattening 
  -- FKCOLUMN_NAME and PKCOLUMN_NAME into the single record where KEY_SEQ=1
  PRINT 'Step 3: Reorganize #tmpFKs    from ' + @PublicationName

  -- First of all, we use QUOTENAME for not having problems afterwards
  UPDATE #tmpFKs 
    SET FKCOLUMN_NAME = QUOTENAME(FKCOLUMN_NAME),
        PKCOLUMN_NAME = QUOTENAME(PKCOLUMN_NAME)

  -- Cursor with records belonging to FKs defined with more than one key column
  -- except this very first column ( > 1 )
  DECLARE ListOfKeys CURSOR LOCAL FOR
    SELECT PKTABLE_NAME, PKCOLUMN_NAME, FKTABLE_NAME, FKCOLUMN_NAME, FK_NAME
    FROM #tmpFKs
    WHERE KEY_SEQ > 1
    ORDER BY PKTABLE_NAME, FK_NAME, KEY_SEQ

  OPEN ListOfKeys
  FETCH NEXT FROM ListOfKeys INTO @PKTABLE_NAME, @PKCOLUMN_NAME, 
                                  @FKTABLE_NAME, @FKCOLUMN_NAME, @FK_NAME
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      -- Update the 'main' record of this FK (the one with KEY_SEQ=1)
      -- and matching the rest of variables...
      UPDATE #tmpFKs
        SET
          PKCOLUMN_NAME = PKCOLUMN_NAME + ', ' + @PKCOLUMN_NAME,
          FKCOLUMN_NAME = FKCOLUMN_NAME + ', ' + @FKCOLUMN_NAME
        WHERE PKTABLE_NAME = @PKTABLE_NAME AND 
              FKTABLE_NAME = @FKTABLE_NAME AND 
              FK_NAME      = @FK_NAME AND
              KEY_SEQ      = 1
    END
    FETCH NEXT FROM ListOfKeys INTO @PKTABLE_NAME, @PKCOLUMN_NAME,
                                    @FKTABLE_NAME, @FKCOLUMN_NAME, @FK_NAME
  END
  CLOSE ListOfKeys
  DEALLOCATE ListOfKeys

  -- 4th Step: Clean and final inserts
  PRINT 'Step 4: Clean & final inserts from ' + @PublicationName

  -- Remove records with KEY_SEQ>1 (we don't need them anymore)
  DELETE FROM #tmpFKs WHERE KEY_SEQ > 1

  -- Now we have all the information we need. We just need to do the inserts
  -- into TSyncForeignKeysForSubscribersDROP, formatting according to the
  -- information in this auxiliary table. That table is read by the stored
  -- procedure UseForeignKeysInformationAndDoDropsOnPublication, invoked by
  -- a pre-snapshot script at the subscriber.
  DELETE TSyncForeignKeysForSubscribersDROP WHERE PublicationName = @PublicationName
  INSERT INTO TSyncForeignKeysForSubscribersDROP (PublicationName, TableName, FKName)
    SELECT @PublicationName, FKTABLE_NAME, FK_NAME FROM #tmpFKs
  
  -- Now the commands to recreate those FKs after the initialization of the
  -- publication is done. This table is to be read by the stored procedure
  -- UseForeignKeysInformationAndDoCreatesOnPublication included in every 
  -- post-snapshot script.
  DELETE TSyncForeignKeysForSubscribersCREATE WHERE PublicationName = @PublicationName
  INSERT INTO
    TSyncForeignKeysForSubscribersCREATE (PublicationName, FKTableName, PKTableName, FKName, TheRest)
    SELECT 
      @PublicationName,
      FKTABLE_NAME,
      PKTABLE_NAME,
      FK_NAME,
      'FOREIGN KEY(' + FKCOLUMN_NAME + ') REFERENCES [dbo].' + QUOTENAME(PKTABLE_NAME) +
        ' (' + PKCOLUMN_NAME + ') NOT FOR REPLICATION'
      FROM #tmpFKs

  -- Finally drop the temporary table.
  DROP TABLE #tmpFKs

  -- Once executed, the tables TSyncForeignKeysForSubscribersCREATE and 
  -- TSyncForeignKeysForSubscribersDROP contain the updated instructions that 
  -- subscribers need to apply to replicate the foreign keys.
  PRINT ''
END

UseForeignKeysInformationAndDoDropsOnPublication @PublicationName varchar(256) This stored procedure should be called only on subscribers, since it DROPs every FK listed in TSyncForeignKeysForSubscribersDROP for @PublicationName. The idea is to include this call in the pre-snapshot script of every publication, so that FKs that would conflict with dropping and recreating the tables are already out of the way when the actual DROP TABLE command arrives (which is why it belongs in a pre-snapshot script).

CREATE PROCEDURE [dbo].[UseForeignKeysInformationAndDoDropsOnPublication]
  @PublicationName varchar(256)
AS
BEGIN
  DECLARE
    @i int,
    @TableName varchar(1024),
    @FKName varchar(1024),
    @sql nvarchar(2048),
    @IsSQL2000 BINARY

  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;

  -- Are we running on a SQL 2000 version?
  -- There must be 2 spaces before 2000%, this is not a typo
  SELECT @IsSQL2000 = COUNT(*) FROM (SELECT @@version AS V) AUX
    WHERE AUX.V LIKE 'Microsoft SQL Server  2000%'

  DECLARE FKsToDROP CURSOR LOCAL FOR
    SELECT TableName, FKName FROM TSyncForeignKeysForSubscribersDROP
      WHERE PublicationName = @PublicationName

  OPEN FKsToDROP
  FETCH NEXT FROM FKsToDROP INTO @TableName, @FKName
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      SET @i = 0
      IF @IsSQL2000<>0
        SELECT @i = COUNT(*) FROM dbo.sysobjects
          WHERE xtype='F' AND type='F' AND Name = @FKName
      ELSE
        SELECT @i = COUNT(*) FROM sys.foreign_keys
          WHERE Name = @FKName

      IF (@i = 1) BEGIN -- The FK is in place, we need to remove it
        SELECT @i = COUNT(*) FROM sysobjects WHERE TYPE = 'U' AND Name = @TableName
        IF @i > 0 BEGIN -- If the table exists, we can go on
          SELECT @sql = N'ALTER TABLE ' + QUOTENAME(@TableName) +
                        ' DROP CONSTRAINT ' + QUOTENAME(@FKName)
          EXEC sp_executesql @sql
          PRINT 'Successfully dropped: ' + QUOTENAME(@FKName)
         END ELSE BEGIN
           PRINT 'No action done:       ' + QUOTENAME(@TableName) + ' doesn''t exist.'
         END
      END ELSE BEGIN -- The FK is not there, do nothing
        PRINT 'No action done:       ' + QUOTENAME(@FKName)
      END
    END
    FETCH NEXT FROM FKsToDROP INTO @TableName, @FKName
  END
  CLOSE FKsToDROP
  DEALLOCATE FKsToDROP
END
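The dynamic SQL the procedure executes is plain string concatenation with bracket quoting. As a language-neutral sketch (a simplified QUOTENAME; the table and FK names are hypothetical examples):

```python
def quotename(name: str) -> str:
    """Simplified T-SQL QUOTENAME: bracket-quote an identifier,
    doubling any embedded closing bracket (the escaping QUOTENAME performs)."""
    return "[" + name.replace("]", "]]") + "]"

def drop_fk_statement(table: str, fk: str) -> str:
    """Mirror the ALTER TABLE ... DROP CONSTRAINT command built inside the cursor."""
    return f"ALTER TABLE {quotename(table)} DROP CONSTRAINT {quotename(fk)}"

# drop_fk_statement("TOrders", "FK_TOrders_TCustomers")
# -> "ALTER TABLE [TOrders] DROP CONSTRAINT [FK_TOrders_TCustomers]"
```

Quoting both identifiers is what lets the procedure cope with table or constraint names containing spaces or other special characters.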

UseForeignKeysInformationAndDoCreatesOnPublication @PublicationName varchar(256) This stored procedure is to be called on subscribers. It uses the information stored in TSyncForeignKeysForSubscribersCREATE to issue CREATE commands for the FKs related to @PublicationName. Include it in every post-snapshot script so that FKs are recreated after the initial snapshot has been applied.

CREATE PROCEDURE [dbo].[UseForeignKeysInformationAndDoCreatesOnPublication]
  @PublicationName varchar(256)
AS
BEGIN
  DECLARE
    @i INT,
    @FKTableName varchar(1024),
    @PKTableName varchar(1024),
    @FKName varchar(1024),
    @TheRest varchar(1024),
    @sql nvarchar(2048),
    @IsSQL2000 BINARY

  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;

  -- Are we running on a SQL 2000 version?
  -- There must be 2 spaces before 2000%, this is not a typo
  SELECT @IsSQL2000 = COUNT(*) FROM (SELECT @@version AS V) AUX
    WHERE AUX.V LIKE 'Microsoft SQL Server  2000%'

  DECLARE FKsToCREATE CURSOR LOCAL FOR
    SELECT FKTableName, PKTableName, FKName, TheRest FROM TSyncForeignKeysForSubscribersCREATE
      WHERE PublicationName = @PublicationName

  OPEN FKsToCREATE
  FETCH NEXT FROM FKsToCREATE INTO @FKTableName, @PKTableName, @FKName, @TheRest
  WHILE (@@FETCH_STATUS <> -1) BEGIN
    IF (@@FETCH_STATUS <> -2) BEGIN
      SET @i = 0
      IF @IsSQL2000<>0
        SELECT @i = COUNT(*) FROM dbo.sysobjects
          WHERE xtype='F' AND type='F' AND Name = @FKName
      ELSE
        SELECT @i = COUNT(*) FROM sys.foreign_keys
          WHERE Name = @FKName
      
      IF (@i = 0) BEGIN -- If the FK is not in place, we will try to create it

        SELECT @i = COUNT(*) FROM sysobjects
          WHERE TYPE = 'U' AND (Name = @FKTableName OR Name = @PKTableName)
        IF @i = 2 BEGIN -- If both tables exist, we can go on
          SELECT @sql = N'ALTER TABLE ' + QUOTENAME(@FKTableName) + 
                        ' WITH CHECK ADD CONSTRAINT ' + QUOTENAME(@FKName) + ' ' + @TheRest
          EXEC sp_executesql @sql
          SELECT @sql = N'ALTER TABLE ' + QUOTENAME(@FKTableName) +
                        ' CHECK CONSTRAINT ' + QUOTENAME(@FKName)
          EXEC sp_executesql @sql
          PRINT 'Successfully created and applied:  ' + QUOTENAME(@FKName)
        END ELSE BEGIN
          PRINT 'Nothing done, tables do not exist: ' + QUOTENAME(@FKTableName) +
                                               ' or ' + QUOTENAME(@PKTableName)
        END
      END ELSE BEGIN -- If it is not there, do nothing
        PRINT 'No action done, already in place:  ' + QUOTENAME(@FKName)
      END
    END
    FETCH NEXT FROM FKsToCREATE INTO @FKTableName, @PKTableName, @FKName, @TheRest
  END
  CLOSE FKsToCREATE
  DEALLOCATE FKsToCREATE
END

Download: You can also download the scripts described in this article.

Keywords: transactional replication, queued updates, foreign keys, SQL Server 2005, Microsoft, bug, error, workaround, resolution, transact-sql, stored procedure, replicate foreign keys to subscribers

2006/07/27

Distribution agents & network efficiency for transactional replication with immediate updates

Scenario: We have a VPN over two xDSL lines between our main office and a branch site, with a SQL Server at each side using transactional replication with immediate updates. As you might imagine, bandwidth is critical (it is not a T1 line; we have only 300 kbps maximum) and the link must also carry other traffic (file transfer/replication using Windows DFS, web browsing, email, etc.). Our database has ~150 tables divided into 28 publications, so we have 28 distribution agents running against the same subscriber. We have also set up network bandwidth monitoring software (http://www.cacti.net/) to check the status of the physical lines and the VPN, and to see their evolution over time, spot bottlenecks, and so on.

Problem description: We have been using SQL Server 2000 for quite a long time, both at the publisher and at the subscriber. About a month ago we installed SQL Server 2005 at the main office as the new publisher (we are migrating step by step; the subscriber will go next), and network traffic has soared.

We are now permanently consuming all 300 kbps of upload (from the main office to the branch site) and 95 kbps of download (from the branch to the main office). We have confirmed that all that traffic is caused by SQL Server and not by anything else.

Furthermore, after some tests, we traced it down to the Distribution Agent. The default agent profile for distribution agents includes a parameter called PollingInterval, set to 5 seconds by default. We created a new user profile based on the default one and changed PollingInterval from 5 to 60 seconds.

The traffic has been reduced by a factor of roughly 12 (60/5 = 12), from 300 kbps to something near 40 kbps (being so small, those 40 kbps may include other kinds of traffic).

The reason the Distribution Agents are to blame is explained later. According to SQL Server 2005 BOL, regarding the PollingInterval parameter:

Is how often, in seconds, the distribution database is queried for replicated transactions. The default is 5 seconds.

Both the agents and the distribution database are on the same computer: the publisher at the main office. That being the case, in theory, changing that parameter from 5 to 60 seconds should only affect local traffic.

Even more, during the night, when there is no activity at either office, no matter how frequently the agent checks the distribution database there should be no pending transactions to apply, and the subscriber should not be contacted.

Our experience shows that there is traffic between publisher and subscriber, even when there are no transactions to be applied, and the bandwidth of that traffic depends directly on the value of Distributor Agents' PollingInterval parameter.

One year of network traffic

In this one we can see how, since mid June, when SQL Server 2005 started to act as publisher, the network traffic has been pegged at the maximum (300 kbps).

One day of network traffic

In this capture we can see the last 24 hours. The dip in traffic around 13:00h corresponds to changing all Distribution Agents' profiles to PollingInterval=60 and restarting the SQL Server Agent service. Roughly half an hour later, as a check, the default agent profile (PollingInterval=5) was set again for every Distribution Agent and the SQL Server Agent service was restarted once more.

It is very strange that the traffic stays permanently around 300 kbps, even during the night when there is no activity (just a few jobs that call stored procedures to update a few administrative tables, scheduled at 00:00 and 5:00am).

Steps to reproduce the problem: In order to find what kind of traffic is being executed against the subscriber, I used Profiler at the subscriber:

  1. Create a publication with transactional replication with immediate updates.
  2. Create a subscription to that publication.
  3. Initialise the publication and wait until the subscriber gets the data.
  4. Start Profiler on the subscriber with a filter:
    HostName:        PUBLISHERNAME
    ApplicationName: PUBLISHERNAME_dbname_publicationname
You will see, every 5 seconds, an RPC:CommandCompleted:
exec sp_executesql
  N'update MSreplication_subscriptions
    set transaction_timestamp = cast(@P1 as binary(15)) +
        cast(substring(transaction_timestamp, 16, 1) as binary(1)), "time" = @P2
    where UPPER(publisher) = UPPER(@P3) and publisher_db = @P4 and
          publication = @P5 and subscription_type = 0 and
          (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)',
  N'@P1 varbinary(14),@P2 datetime,@P3 nvarchar(5),@P4 nvarchar(5),@P5 nvarchar(7)',
  0x00004F3200000032000100000000,
  'Jul 24 2006 9:03:59:000AM',
  N'PUBLISHERNAME',
  N'dbname',
  N'publicationname'

Since our database has ~150 tables in 28 publications (agents), with the default profile (5 secs) a simple calculation gives 60*28/5 = 336 RPC calls per minute to the subscriber, 24/7, without any legitimate user transaction (only this unjustified traffic). This consumes all our upload bandwidth at the main office (300 kbps at the publisher). If there is nothing to replicate, I thought, the subscriber should not be contacted (at least not as frequently as every 5 seconds), because it goes against network efficiency for slow-link configurations.
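The arithmetic above can be checked with a short script (the figures are the ones from this scenario; this only models the idle polling rate, not the payload size of each call):

```python
def polling_calls_per_minute(agents: int, polling_interval_s: int) -> float:
    """Idle RPC calls per minute issued against the subscriber:
    each agent polls once per PollingInterval."""
    return agents * 60 / polling_interval_s

# 28 distribution agents with the default 5-second profile:
default = polling_calls_per_minute(28, 5)   # 336 calls/minute
# The same agents after raising PollingInterval to 60 seconds:
tuned = polling_calls_per_minute(28, 60)    # 28 calls/minute

print(f"traffic reduced by a factor of {default / tuned:.0f}")  # prints 12
```

This matches the observed drop from ~300 kbps to ~40 kbps after switching profiles, within the noise of other traffic on the link.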

Another possibility is using shared agents (the default in SQL Server 2000 for transactional replication) vs. independent agents (the default in SQL Server 2005 for transactional replication). You can read about this in BOL, Administering Replication Agents, in the section on independent and shared agents:

An independent agent is an agent that services one subscription. A shared agent services multiple subscriptions; if multiple subscriptions using the same shared agent need to synchronize, by default they wait in a queue, and the shared agent services them one at a time. Latency is reduced when using independent agents because the agent is ready whenever the subscription needs to be synchronized. Merge replication always uses independent agents, and transactional replication uses independent agents by default for publications created in the New Publication Wizard (in previous versions of SQL Server, transactional replication used shared agents by default).

My feedback to Microsoft on this issue: Distribution agents & network efficiency for transactional replication with immediate updates This issue was resolved as 'By Design' and my suggested workaround confirmed as the solution.

Workaround/Solution: As you can guess, the solution is changing PollingInterval to a value that fits your network bandwidth and desired latency. You need to find a compromise between the two: with higher latency, less bandwidth is used; if you need lower latency, you will consume more bandwidth. Instead of having a single new user profile for all distribution agents, you could create a set of user profiles with PollingInterval set to (for instance) 15, 60 and 300 seconds.

Publications that require real-time latency can keep the default profile (5 seconds); for the rest, depending on the nature of the tables included in each publication, you can assign the 15-second profile for near real time and the 300-second one for subscriptions that are seldom updated.

Keywords: SQL Server 2005, transactional replication with immediate updates, Microsoft, slow link, bandwidth, latency, traffic, distribution agent, nothing to replicate, error, workaround, resolution

2006/07/13

"Row handle is invalid" error in SQL Server's Queue Reader Agent

Today I came to the office and found a bunch of SQL Server alert emails, all of them related to the Queue Reader Agent. I use transactional replication with immediate updates, so... I wondered: what could be queued, and why did this error appear? I went to see the job's history for the agent and found the following (sorry, my SQL Server version is Spanish):
Fecha  12/07/2006 9:00:17
Registro  Historial de trabajos ([MYSERVER].8)

Id. del paso  2
Servidor  MYSERVER
Nombre del trabajo  [MYSERVER].8
Nombre del paso  Ejecutar agente.
Duración  00:18:35
Gravedad de SQL  0
Id. de mensaje de SQL  0
Operador notificado mediante correo electrónico
Operador notificado mediante envío de red
Operador notificado mediante localizador
Reintentos efectuados  0

Mensaje
-XSERVER MYSERVER
   -XCMDLINE 0
   -XCancelEventHandle 000000000000118C
2006-07-12 07:18:03.678 Agente de lectura de cola [MYSERVER].8 (Id. = 1) iniciado
2006-07-12 07:18:03.678 Valores de parámetro obtenidos del perfil de agente:
   -historyverboselevel 1
   -logintimeout 15
   -pollinginterval 5000
   -querytimeout 1800
2006-07-12 07:18:33.724 No hay ninguna transacción en cola disponible
2006-07-12 07:18:48.755 No hay ninguna transacción en cola disponible
2006-07-12 07:18:52.552 El Agente de lectura de cola ha detectado el error 'Identificador de fila
 no válido.' al conectarse a 'mydb' en 'MYSERVER'. Asegúrese de que la publicación y la
 suscripción se han definido correctamente y de que los dos servidores están en ejecución.
2006-07-12 07:18:52.584 Error al aplicar el mensaje en cola al publicador
2006-07-12 07:18:52.599 Subproceso de trabajo 4412 : error en la tarea
2006-07-12 07:18:52.599 No hay ninguna transacción en cola disponible
2006-07-12 07:18:52.599 Anulando lector de colas
Since the error did not make clear where (in which publication) it occurred, nor what I had to do to solve it, I started worrying. The good part was that, since I was not using queued transactions, replication among my servers kept working fine (I checked some inserts/updates at both publisher and subscribers). I first tried restarting the SQL Server service (and related services) at the publisher, without luck: the problem was already in the queue, so when the Queue Reader Agent restarted it began to throw the errors again. I then checked BOL for Replication Queue Reader Agent in order to start it manually from a command line (cmd.exe) and see the errors on my own: I started cmd.exe running as the SQL Server Agent service account, changed dir to C:\Program Files\Microsoft SQL Server\90\COM and launched:
qrdrsvc -Distributor [MYSERVER] -DistributionDB [distribution] -DistributorSecurityMode 1
 -Continuous -OutputVerboseLevel 2 > outputlog.txt
Then I checked the outputlog.txt and found the following (sorry, there are things in spanish here too):
2006-07-12 08:36:34.643 Agente de lectura de cola de réplica de Microsoft SQL Server 9.00.2047.00
2006-07-12 08:36:34.643 Copyright (c) 2005 Microsoft Corporation
2006-07-12 08:36:34.643
2006-07-12 08:36:34.643 Las marcas de hora anexadas al inicio de las líneas de salida se
 expresan como hora UTC.
2006-07-12 08:36:34.643 Valores de parámetro de agente especificados por el usuario:
   -Distributor MYSERVER
   -DistributionDB distribution
   -DistributorSecurityMode 1
   -Continuous
   -OutputVerboseLevel 2
2006-07-12 08:36:34.643 Connecting to OLE DB qrdrsvc at datasource: 'MYSERVER', location: '',
 catalog: 'distribution', providerstring: '' using provider 'SQLNCLI'
2006-07-12 08:36:34.689 OLE DB qrdrsvc: MYSERVER
   DBMS: Microsoft SQL Server
   Version: 09.00.2047
   catalog name: distribution
   user name: dbo
   API conformance: 0
   SQL conformance: 0
   transaction capable: 1
   read only: F
   identifier quote char: "
   non_nullable_columns: 0
   owner usage: 15
   max table name len: 128
   max column name len: 128
   need long data len:
   max columns in table: 1000
   max columns in index: 16
   max char literal len: 131072
   max statement len: 131072
   max row size: 131072

[... A BUNCH OF LINES REMOVED HERE... SCROLL TO THE END OF FILE ...]

2006-07-12 08:37:41.220 OLE DB SUBSCRIBER 'SUBSCRIBER':
 {? = call sp_getsqlqueueversion (?, ?, ?, ?)}
2006-07-12 08:37:41.314 OLE DB SUBSCRIBER 'SUBSCRIBER':
 {? = call sp_replsqlqgetrows (N'MYSERVER', N'mydb', N'Equipos')}
2006-07-12 08:37:41.439 OLE DB qrdrsvc 'MYSERVER':
 exec sp_helpdistpublisher @publisher = N'MYSERVER'
2006-07-12 08:37:41.454 Connecting to OLE DB MYSERVER at datasource:
 'MYSERVER', location: '', catalog: 'mydb', providerstring: '' using provider 'SQLNCLI'
2006-07-12 08:37:41.454 OLE DB MYSERVER: MYSERVER
   DBMS: Microsoft SQL Server
   Version: 09.00.2047
   catalog name: mydb
   user name: dbo
   API conformance: 0
   SQL conformance: 0
   transaction capable: 1
   read only: F
   identifier quote char: "
   non_nullable_columns: 0
   owner usage: 15
   max table name len: 128
   max column name len: 128
   need long data len:
   max columns in table: 1000
   max columns in index: 16
   max char literal len: 131072
   max statement len: 131072
   max row size: 131072
2006-07-12 08:37:41.454 OLE DB MYSERVER 'MYSERVER':
 set arithabort on set numeric_roundabort off set concat_null_yields_null on
 set quoted_identifier on set ansi_nulls on set ansi_padding on set ansi_warnings on
2006-07-12 08:37:41.454 OLE DB MYSERVER 'MYSERVER':
 select pubid, conflict_policy from dbo.syspublications where name = N'Equipos'
2006-07-12 08:37:41.454 OLE DB MYSERVER 'MYSERVER':
 {? = call sp_getqueuedarticlesynctraninfo (N'Equipos', 233)}
2006-07-12 08:37:41.454 El Agente de lectura de cola ha detectado el error 'Identificador de
 fila no válido.' al conectarse a 'mydb' en 'MYSERVER'. Asegúrese de que la publicación y
 la suscripción se han definido correctamente y de que los dos servidores están en ejecución.
2006-07-12 08:37:41.470 Error al aplicar el mensaje en cola al publicador
2006-07-12 08:37:41.485 Subproceso de trabajo 396 : error en la tarea
2006-07-12 08:37:41.485 No hay ninguna transacción en cola disponible
2006-07-12 08:37:41.485 Disconnecting from OLE DB SUBSCRIBER 'SUBSCRIBER'
2006-07-12 08:37:41.485 Disconnecting from OLE DB SUBSCRIBER 'SUBSCRIBER'
2006-07-12 08:37:41.485 Disconnecting from OLE DB MYSERVER 'MYSERVER'
2006-07-12 08:37:41.485 Anulando lector de colas
The translation of the key line is: The Queue Reader Agent has encountered the error 'Row handle is invalid.' when connecting to 'mydb' on 'MYSERVER'. Ensure that the publication and subscription are defined properly and that both servers are running.

So the publication having problems was the one called 'Equipos'. We are a bit nearer to the solution. As a first idea, I tried to reinitialise the publication... and it worked. Reinitialising the publication probably wiped away any commands that were waiting (queued) to be applied at the publisher, because the fresh snapshot contains the current data. After the initial snapshot was applied to the subscribers, I monitored the Queue Reader Agent for some minutes and found it working fine.

The problem was solved, but the big question remains... why did this error appear? I'm still wondering...

2006/06/23

SQL Server transactional replication issues regarding NOT FOR REPLICATION triggers. Part I

Scenario: Transactional replication with immediate updates among SQL Server 2005 publisher and subscribers. Articles/Tables have user defined triggers marked as NOT FOR REPLICATION (NFR) and those triggers are deployed to subscribers for load balancing.

Problem description: Even though the triggers are marked as NOT FOR REPLICATION (NFR), when the origin of the triggering action is a subscriber, the trigger is fired twice, at both the subscriber (expected behaviour) and at the publisher (changed behaviour from SQL Server 2000 point of view).

Side effects: If the double-fired trigger makes any update, that update is executed first at the publisher. When the update is then run at the subscriber, the symptoms described in KB241511 appear:

Server: Msg 20515, Level 16, State 1, Procedure sp_MSreplraiserror, Line 9
Updatable Subscribers: Rows do not match between Publisher and Subscriber. Refresh rows at Subscriber.
Server: Msg 20512, Level 16, State 1, Procedure sp_MSreplraiserror, Line 8
Updatable Subscribers: Rolling back transaction.

This is because the change attempted at the subscriber was already made by the publisher, which changed the msrepl_tran_version field; it no longer matches the version the subscriber holds, so the transaction fails. According to KB241511 - PRB: REPL: Msgs. 20515, 21054, or 20512 with Immediate-Updating Subscribers:

These messages appear if a change has been made at the publisher and that change has not appeared on the subscriber yet. These messages prevent you from making a change that would violate transactional consistency.
More information: This behaviour is new to SQL Server 2005. Database solutions that include transactional replication and worked perfectly with SQL Server 2000 may run into problems when migrating to SQL Server 2005 and will need modifications in order to succeed. According to SQL Server 2005 Books Online: How to: Control the Behavior of Triggers and Constraints During Synchronization (Replication Transact-SQL Programming):
To prevent triggers from executing during synchronization:
  • When creating a new trigger, specify the NOT FOR REPLICATION option of CREATE TRIGGER (Transact-SQL).
  • For an existing trigger, specify the NOT FOR REPLICATION option of ALTER TRIGGER (Transact-SQL).
And also in Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATION:
For example, if you define an insert trigger with the NOT FOR REPLICATION option set, all user inserts fire the trigger, but inserts from replication agents do not. Consider a trigger that inserts data into a tracking table: when the user inserts the row originally, it is appropriate for the trigger to fire and enter a row into the tracking table, but the trigger should not fire when that data is replicated to the Subscriber, because it would result in an unnecessary row being inserted in the tracking table.

As you can read, this comment is written from the publisher's point of view: what should happen on the subscriber when the insertion is done at the publisher? Since the trigger is marked as NFR, it should not fire there, because it would be fired not by a user but by a replication agent. However, what happens when the insertion is done at the subscriber? Well... if you are using transactional replication with immediate updates, the behaviour is not symmetric: the trigger fires twice, at the subscriber and also at the publisher (if more than one subscriber participates in the replication, the trigger will not fire again when the change is replicated to the rest of them, since that change is made by a replication agent). This asymmetry in the behaviour of NOT FOR REPLICATION triggers has been the source of my headaches since we decided to migrate to SQL Server 2005.

Steps to reproduce the problem: Let's imagine a simple pair of master/detail tables: TQuotations and TQuotationsDetails. We will create triggers to update the master quotation record with the new sum of the items' prices every time an insert/update/delete occurs on the details table. The tables' schema will be:

CREATE TABLE [dbo].[TQuotations](
  [Id] [int] NOT NULL,
  [Description] [varchar](50) NOT NULL,
  [Price] [money] NOT NULL CONSTRAINT [DF_TQuotations_Price]  DEFAULT ((0)),
  CONSTRAINT [PK_TQuotations] PRIMARY KEY CLUSTERED ([Id] ASC)
    WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[TQuotationsDetails](
  [QuotationId] [int] NOT NULL,
  [Id] [int] NOT NULL,
  [Description] [varchar](50) COLLATE Modern_Spanish_CI_AS NULL,
  [Quantity] [int] NOT NULL CONSTRAINT [DF_TQuotationsDetails_Quantity]  DEFAULT ((1)),
  [Price] [money] NOT NULL CONSTRAINT [DF_TQuotationsDetails_Price]  DEFAULT ((0)),
  CONSTRAINT [PK_TQuotationsDetails] PRIMARY KEY CLUSTERED ([QuotationId] ASC, [Id] ASC)
    WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
And the triggers, defined at TQuotationsDetails will be:
CREATE TRIGGER [dbo].[trg_Check_ins_upd_TQuotationsDetails] ON [dbo].[TQuotationsDetails]
FOR INSERT, UPDATE
NOT FOR REPLICATION
AS BEGIN
 SET NOCOUNT ON

 IF UPDATE(Quantity) OR UPDATE(Price) BEGIN
  UPDATE dbo.TQuotations
  SET Price = ROUND(AUX.Total, 2)
  FROM dbo.TQuotations INNER JOIN (
   SELECT dbo.TQuotations.Id AS QuotationId,
    SUM(ROUND(dbo.TQuotationsDetails.Quantity * dbo.TQuotationsDetails.Price, 2)) AS Total
   FROM dbo.TQuotations INNER JOIN
    dbo.TQuotationsDetails ON dbo.TQuotations.Id = dbo.TQuotationsDetails.QuotationId INNER JOIN
    (SELECT DISTINCT QuotationId FROM inserted) AS i ON dbo.TQuotations.Id = i.QuotationId
   GROUP BY dbo.TQuotations.Id) AUX ON AUX.QuotationId = dbo.TQuotations.Id
  WHERE dbo.TQuotations.Price <> ROUND(AUX.Total, 2)
 END
END
CREATE TRIGGER [dbo].[trg_Check_del_TQuotationsDetails] ON [dbo].[TQuotationsDetails]
FOR DELETE
NOT FOR REPLICATION
AS BEGIN
 SET NOCOUNT ON

 UPDATE dbo.TQuotations
 SET Price = ROUND(AUX.Total, 2)
 FROM dbo.TQuotations INNER JOIN (
  SELECT dbo.TQuotations.Id AS QuotationId,
   ISNULL(SUM(ROUND(dbo.TQuotationsDetails.Quantity * dbo.TQuotationsDetails.Price, 2)), 0) 
    AS Total
  FROM dbo.TQuotations INNER JOIN
   (SELECT DISTINCT QuotationId FROM deleted) AS d ON dbo.TQuotations.Id = d.QuotationId
   LEFT OUTER JOIN dbo.TQuotationsDetails
    ON dbo.TQuotations.Id = dbo.TQuotationsDetails.QuotationId
  GROUP BY dbo.TQuotations.Id) AUX ON AUX.QuotationId = dbo.TQuotations.Id
 WHERE dbo.TQuotations.Price <> ROUND(AUX.Total, 2)
END
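Outside of SQL, the arithmetic both triggers perform is easy to verify; this sketch mirrors the SUM(ROUND(Quantity * Price, 2)) expression for a single quotation (sample values only):

```python
def quotation_total(details):
    """Mirror the triggers' SUM(ROUND(Quantity * Price, 2)) for one quotation;
    `details` is a list of (Quantity, Price) tuples."""
    return round(sum(round(qty * price, 2) for qty, price in details), 2)

# Two detail rows, (Quantity, Price): 1*20 + 10*5
details = [(1, 20.0), (10, 5.0)]
print(quotation_total(details))  # prints 70.0
```

This is the Price value that should end up in the master TQuotations row after the triggers fire.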
You can now set up a publication including these two tables. Remember to set the option to deploy triggers to subscribers also (this is disabled by default). Configure a subscriber for your new publication and wait until the publication is ready at the subscriber. We will try to do some inserts first at the publisher in order to test that the triggers behave as expected:
INSERT INTO TQuotations (Id, Description) VALUES (1, 'the test1')
SELECT * FROM TQuotations
GO
--Id Description Price msrepl_tran_version
--1 the test1 0,00 75ACD2B8-7A0F-4F15-9F8F-0FA1D8640378
--
INSERT INTO TQuotationsDetails (QuotationId, Id, Description, Quantity, Price)
  VALUES (1, 1, 'Item 1', 1, 20)
INSERT INTO TQuotationsDetails (QuotationId, Id, Description, Quantity, Price)
  VALUES (1, 2, 'Item 2', 10, 5)
SELECT * FROM TQuotations
SELECT * FROM TQuotationsDetails
GO
--Id Description Price msrepl_tran_version
--1 the test1 70,00 80D3E3CB-30D5-42BF-A02A-1F395A94836D
--
--QuotationId Id Description Quantity Price msrepl_tran_version
--1 1 Item 1 1 20,00 CC20554D-E9DE-434D-866C-1A65C6F582A3
--1 2 Item 2 10 5,00 AB690758-ACA9-4D2C-A037-BFCE633A1BEA
As you can see, the triggers run fine (the price of master quotation Id 1 is updated) and the commands are replicated to the subscriber without any problem. You can run updates and deletes on TQuotationsDetails and check that the triggers do their job nicely. Now the problems: let's connect to the subscriber and try to insert a new quotation (id=2) with some details:
INSERT INTO TQuotations (Id, Description) VALUES (2, 'the test2')
SELECT * FROM TQuotations
GO
--Id Description Price msrepl_tran_version
--1 the test1 70,00 80D3E3CB-30D5-42BF-A02A-1F395A94836D
--2 the test2 0,00 7AACDD7F-4A7A-4D0C-93BA-C6C37093FDE6
--
INSERT INTO TQuotationsDetails (QuotationId, Id, Description, Quantity, Price)
  VALUES (2, 1, 'Item 1', 1, 20)
INSERT INTO TQuotationsDetails (QuotationId, Id, Description, Quantity, Price)
  VALUES (2, 2, 'Item 2', 10, 5)
SELECT * FROM TQuotations
SELECT * FROM TQuotationsDetails
GO
--Server: Msg 20515, Level 16, State 1, Procedure sp_MSreplraiserror, Line 9
--Updatable Subscribers: Rows do not match between Publisher and Subscriber.
--Refresh rows at Subscriber.
--Server: Msg 20512, Level 16, State 1, Procedure sp_MSreplraiserror, Line 8
--Updatable Subscribers: Rolling back transaction.

My feedback to Microsoft on this issue: NOT FOR REPLICATION clause in triggers that are replicated to subscribers are fired twice in both publisher and subscriber This issue was resolved as 'By Design'. Please have a look at it to read Microsoft's point of view on this problem. This is still an 'open' issue for me: I have not found a solution yet. As a workaround (suggested by Microsoft) I am forced to use transactional replication with queued updates, which has its own drawbacks (enough of them for another article).

Keywords: SQL Server 2005, transactional replication with immediate updates, Microsoft, NOT FOR REPLICATION, NFR, trigger, subscriber, fired twice, side-effects, KB241511, Rows do not match between Publisher and Subscriber, Updatable Subscribers: Rolling back transaction, bug, error, workaround, resolution

2006-10-02 Update: See the workaround in SQL Server transactional replication issues regarding NOT FOR REPLICATION triggers. Part II

2006/06/16

Problems with identity fields in MS Access when using SQL Server's transactional replication with queued updates

Scenario: Transactional replication with queued updates between a SQL Server 2005 publisher and SQL Server 2000 or 2005 subscribers. Articles (tables) whose primary key is an identity field make MS Access throw errors when rows are inserted at the subscriber (SQL Server 2005 Standard SP1 & MS Access 2003 SP2).

Problem description: @@identity is changed by replication triggers deployed to subscribers.

Side effects: Since MS Access 2003 uses @@identity (see BOL for more information) to retrieve the last inserted record for tables indexed by identity fields, every time you try to insert a new record on a subscriber of transactional replication with queued updates on a table with an identity primary key, the following error message appears: "The data was added to the database but the data won't be displayed in the form because it doesn't satisfy the criteria in the underlying record source."

Good solution (but out of our hands): MS Access 2003 should use SCOPE_IDENTITY() (see BOL) instead of @@identity to retrieve the record it has just inserted. Since @@identity is not scope-aware, if the offending table has triggers that do further inserts into other tables also managed with identities, @@identity returns the wrong value to the client application (MS Access). SCOPE_IDENTITY() solves this problem by returning the last identity inserted within the same scope. Since this behavior is hard-coded inside MS Access there is very little we can do with user-defined code (VB) to change it. Until a MS Office Service Pack solves this issue, the only things programmers can do are workarounds.
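The difference between the two functions is easy to reproduce with a small T-SQL script (table and trigger names below are hypothetical, chosen just for this illustration):

```sql
-- Two hypothetical tables, both keyed by an identity column
CREATE TABLE TMain  (Id int IDENTITY(1,1) PRIMARY KEY, Name varchar(50))
CREATE TABLE TAudit (Id int IDENTITY(1,1) PRIMARY KEY, MainId int)
GO
-- A trigger that inserts into a second identity table,
-- just like the replication sync triggers do behind the scenes
CREATE TRIGGER trg_TMain_ins ON TMain AFTER INSERT AS
    INSERT INTO TAudit (MainId) SELECT Id FROM inserted
GO
INSERT INTO TMain (Name) VALUES ('test')
-- @@IDENTITY returns the value generated inside the trigger (TAudit),
-- while SCOPE_IDENTITY() returns the one generated in this scope (TMain)
SELECT @@IDENTITY AS [at_identity], SCOPE_IDENTITY() AS [scope_identity]
```

Run it on a scratch database: the two columns of the final SELECT differ as soon as the trigger fires, which is exactly what confuses MS Access.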

Workarounds:

a. Do not use identity as PK on tables. Sometimes identities are avoidable, other times they are not. If you have read this far, you probably need them, so not using them is not an option for you.

b. Create triggers on the tables experiencing this problem to discard inserts done by MS Access directly, and do the real inserts through VB code inside a form. First of all, you need a way for the trigger to know when the insertion is done by MS Access directly and when it comes through your user-defined code. For this purpose you can use SET CONTEXT_INFO, as in the following VB code (self-documented):

Private ResyncCommandORI As String
Private Sub Form_BeforeUpdate(Cancel As Integer)
    Dim strSQL As String
    Dim rst As ADODB.Recordset
    Dim NewIdentity As Long

    If (Me.NewRecord) Then ' We are inserting
        ' Set this context variable so that the trigger knows the insertion
        ' is done via VB. If this context variable is not set, the trigger
        ' will delete the inserted record transparently.
        CurrentProject.Connection.Execute "SET CONTEXT_INFO 0x1"

        strSQL = "INSERT INTO TestTable (TestField) " & _
                 "VALUES ('" & TestField.Value &amp;amp;amp;amp; "'); SELECT SCOPE_IDENTITY() AS Id"
        Set rst = CurrentProject.Connection.Execute(strSQL)

        ' Once the insertion is done, we delete the context info
        CurrentProject.Connection.Execute "SET CONTEXT_INFO 0x0"

        ' And now we retrieve the fresh new identity
        ' Skip first recordset, closed and without data (insert statement)
        Set rst = rst.NextRecordset
        If (rst.State <> adStateClosed) Then
            If Not rst.EOF Then
                NewIdentity = rst.Fields(0)
            End If
        End If

        ' So that MS Access can cope with the insertion it thinks
        ' it has just performed without problems, we change
        ' ResyncCommand so that it does not use ? (@@identity) but
        ' the NewIdentity value we have just read.
        ResyncCommandORI = Me.ResyncCommand
        Me.ResyncCommand = Replace(Me.ResyncCommand, "?", Trim(Str(NewIdentity)))
        ' ResyncCommand is ready to be used by MS Access so that it
        ' can read the new record.
    End If
End Sub
Private Sub Form_AfterInsert()
    ' We just leave ResyncCommand as it was originally (with the ?)
    ' so that updates will work also.
    Me.ResyncCommand = ResyncCommandORI
End Sub

Besides this VB user code in every form dealing with tables with identities, we need to create the following trigger on those tables:

CREATE TRIGGER [dbo].[insert_TestTable_trigger] on [dbo].[TestTable]
   AFTER INSERT
   NOT FOR REPLICATION
AS
BEGIN
    DECLARE @v varbinary(128) -- context_info is at most 128 bytes

    -- added to prevent extra result sets from interfering with SELECT statements.
    SET NOCOUNT ON
    SELECT @v = context_info
      FROM master.dbo.sysprocesses
      WHERE spid = @@SPID;

    IF @v <> 0x1 BEGIN
      -- Do not complete the insertion. If we did a ROLLBACK, MS Access
      -- would notice it and raise an error. Instead we just DELETE the
      -- record that was inserted and nothing wrong happens.
      DELETE dbo.TestTable
        FROM dbo.TestTable
        INNER JOIN inserted ON dbo.TestTable.Id = inserted.Id
    END
END

Using this code in every form and on every table that uses identities is hard work and a nasty workaround. You will agree that it is neither an elegant nor a scalable solution, and it is not a good idea to split such complex logic between client (MS Access) and server (SQL Server). If you are not satisfied with this workaround, please keep on reading...

c. Modify SQL Server 2005 core system stored procedures to leave @@identity unchanged in the sync triggers deployed to subscribers. SQL Server has two kinds of transactional replication with updatable subscribers: immediate and queued. In immediate mode, every subscriber connects to the publisher on every insert/update/delete using a two-phase commit protocol (2PC), so the statement is only 'approved' at the subscriber once it has been committed on the publisher. When transactional replication drops into queued mode (because of a network outage, or intentionally because you configured it that way), the changes are queued and applied on a scheduled basis, so some (configurable) latency occurs. No matter which mode you use, identities are generated (by default) at the side doing the insertion, which is why each node needs a reserved space of assigned identities (identity ranges) so that there are no collisions (conflicts). This is another behavior that changed since SQL Server 2000: with SQL Server 2000 and transactional replication with immediate updates, identities were always generated at the publisher even when the insertion was done on a subscriber. In 2005, identity ranges are always used (by default), even with immediate updates. In order to implement replication, SQL Server deploys a series of stored procedures and triggers to the subscribers, with names like:

trg_MSsync_del_TestTable
trg_MSsync_ins_TestTable
trg_MSsync_upd_TestTable
sp_MSdel_dboTestTable
sp_MSins_dboTestTable
sp_MSupd_dboTestTable
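As an aside, the identity range behavior described above is chosen per article when the publication is set up. A sketch of the relevant sp_addarticle parameters (the publication and table names are hypothetical; check Books Online for the exact semantics of each range parameter):

```sql
-- Sketch: publishing an article with automatic identity range management.
-- @identityrangemanagementoption accepts 'none', 'manual' or 'auto';
-- 'auto' makes SQL Server 2005 hand out identity ranges to each node.
EXEC sp_addarticle
    @publication = N'MyPublication',     -- hypothetical publication name
    @article = N'TestTable',
    @source_object = N'TestTable',
    @identityrangemanagementoption = N'auto',
    @identity_range = 1000,              -- range sizes; see BOL for which
    @pub_identity_range = 10000          -- side each of these two covers
```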

It seems that, internally, the *ins* trigger or stored procedure involved in an insertion does another insert into a table with an identity column, thus changing the value of @@identity and making it unusable for MS Access afterwards. As I said before, it is MS Access's fault but, as long as we cannot change MS Access's behavior, we will modify the SQL Server core system stored procedures that generate those triggers, so that @@identity is preserved when they are executed. Our work is based on a trick widely used by programmers to circumvent MS Access's bad behavior (thanks Sylvain Lafontaine for your information on my post at the microsoft.public.access.adp.sqlserver newsgroup). The trick is commented in this post; basically the idea is to save @@identity into a local variable right at the beginning of the trigger and restore it just before exiting, like this:

CREATE TRIGGER [dbo].[trg_MSsync_ins_TestTable] on [dbo].[TestTable]
AFTER INSERT
NOT FOR REPLICATION
AS BEGIN
 DECLARE @identity int, @strsql varchar(128)
 SET @identity=@@identity
   -- Main block of the trigger here
 SET @strsql='select identity (int, ' + cast(@identity as varchar(10)) + ', 1) as id into #tmp'
 EXECUTE(@strsql)
END

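The effect of the trick can be checked in isolation with a script like the following (TOther is a hypothetical scratch table):

```sql
-- A scratch table whose identity seed makes the values easy to follow
CREATE TABLE TOther (Id int IDENTITY(100,1) PRIMARY KEY, Dummy int)
GO
DECLARE @identity int, @strsql varchar(128)
INSERT INTO TOther (Dummy) VALUES (0)   -- sets @@identity (first value: 100)
SET @identity = @@identity              -- save it
INSERT INTO TOther (Dummy) VALUES (0)   -- simulates the hidden insert done by
                                        -- the sync trigger; @@identity moves on
-- Restore @@identity by seeding a temp table identity with the saved value
SET @strsql = 'select identity(int, ' + cast(@identity as varchar(10)) + ', 1) as id into #tmp'
EXECUTE(@strsql)
SELECT @@identity                       -- back to the saved value
```

The EXECUTE runs in a child scope, so #tmp is dropped automatically when it returns, but @@identity is per connection and keeps the restored value.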
This is the trick that people have been using in their own triggers. Doing this goes against the definition of @@identity, which must always return the last identity inserted in any table over the current connection, but as I said, it is a trick and it does the job (MS Access performs the inserts without any error despite its own bug). The main challenge now is that the offending triggers are not user-generated but system-generated. Where do we need to 'touch' SQL Server to change its behavior and include those few lines of code in the triggers it generates? That is the question. And here is the answer: the Resource database (RDB). I first read about it here, and here are some official words from Microsoft:

[...]The Resource database is a read-only database that contains all the system objects that are included with SQL Server 2005. SQL Server system objects, such as sys.objects, are physically persisted in the Resource database, but they logically appear in the sys schema of every database. The Resource database does not contain user data or user metadata.[...]

First of all, I must say that the following procedure is completely unsupported by Microsoft. As stated in the former URL:

[...]The Resource database should only be modified by or at the direction of a Microsoft Customer Support Services (CSS) specialist. The only supported user action is to move the Resource database to the same location as the master database. For more information, see Moving System Databases.[...]

The (unsupported) steps to make things work:

1. Do a backup. As you have probably guessed, the first thing you should do before making any unsupported change is a backup. Back up your databases and (more importantly) back up your original mssqlsystemresource.mdf and mssqlsystemresource.ldf files. They must be in the same directory as your master database files, usually %PROGRAMFILES%\Microsoft SQL Server\MSSQL.1\MSSQL\Data. Since that database is read-only there should be no problem copying the files with the mssqlserver service running but, just to be safe, stop the service first and do the copy afterwards (from a cmd line):

net stop mssqlserver
copy mssqlsystemresource.mdf mssqlsystemresource.mdf.original
copy mssqlsystemresource.ldf mssqlsystemresource.ldf.original

The backup can be as simple as that. For extra safety, you can rehearse this whole procedure in a virtualised/testing environment first.

2. Start SQL Server in single-user mode. In order to modify the Resource database, we need to start SQL Server in single-user mode (/m parameter). We also need to stop SQL Server Agent and any other SQL Server related service (e.g. Reporting Services, Analysis Services, IIS if applicable) so that none of them grabs the single allowed connection.

net stop sqlserveragent
net stop (any other service related)
net start mssqlserver /m

3. Use sqlcmd to make the changes. This is the core of the whole article. You now need to connect to the server and execute ALTER PROCEDURE to patch the system stored procedures sys.sp_MSscript_sync_del_trig, sys.sp_MSscript_sync_ins_trig and sys.sp_MSscript_sync_upd_trig. You can download the full source code for patching SQL Server 2005 SP1 here: sp_MSscript_sync_xxx_trig_for_SQL_Server_2005_SP1_9.00.2047.00.zip. Unzip the files and run the following commands:

sqlcmd -U sa -P yoursapass -A -d master -i sp_MSscript_sync_del_trig.sql
sqlcmd -U sa -P yoursapass -A -d master -i sp_MSscript_sync_ins_trig.sql
sqlcmd -U sa -P yoursapass -A -d master -i sp_MSscript_sync_upd_trig.sql

If everything has gone nicely, you should be back at the command prompt. Now just stop the service and start it again as usual (not in single-user mode).

net stop mssqlserver
net start mssqlserver
net start (anything previously stopped)

Congratulations! The system stored procedures sys.sp_MSscript_sync_xxx_trig now contain our custom pieces of code. You do not need to repeat this procedure on every server involved in replication, as long as they are all SQL Server 2005 with the same service pack applied: just copy the new (patched) mssqlsystemresource.mdf and mssqlsystemresource.ldf files to every server/instance (stopping the service first). Furthermore, the files are the same for the 32-bit and x64 versions of SQL Server 2005 (you can check it with any crc32/md5 file checksum tool, as I did). It is important that every subscriber has the patched Resource database files, since the patched stored procedures are called at the subscribers. When a publication is marked for reinitialization at the publisher and the initial snapshot is run, the scripts sent to the subscribers do not include the offending triggers; they are generated and applied directly at the subscribers. The initial snapshot publication files include:

  • bulk copy files with the contents of the tables (.bcp)
  • DDL commands to recreate the tables, sp_MSxxx_ stored procedures and schema (.sch)
  • Constraints dealing with identity ranges repl_identity_range_tran_xxx (.dri)
  • Indexes and primary key constraints (.idx)
  • Pre-script to drop foreign keys pointing to the table we are publishing and need to drop and recreate (.pre)
  • User defined triggers (.trg)

As you can see, there is no trace of the triggers that finally appear on the subscribers. That is because they are generated and applied locally on the subscribers, which is why every subscriber must have the patched files.

4. The final step. Reinitialize every publication you had already running so that the new patched triggers are generated and applied on every subscriber. After that, MS Access should work nicely, without any problem dealing with identities.

Some final words (disclaimer): As I already stated, this patching procedure is completely unsupported by Microsoft. I shall not be liable for any damage (including, without limitation, damage for loss of business or loss of profits) arising from the use of, or inability to use, this document or any material contained in it, or from any action or decision taken as a result of using this document or any such material. I tested this procedure in my environment; it solved the problem and worked as expected. I encourage you to do your own tests in a virtualised/testing environment first.

Before applying any new SQL Server Service Pack that might come out in the future, I suggest stopping the server, restoring the original files we copied in step 1 of the procedure, and then following the SP instructions. After the SP is applied (also on subscribers), you can reinitialize the publications and see if the MS Access error appears again. If so, I suppose you could re-patch the new Resource database that the SP might have put in place. In my scenario SQL Server 2005 was the Spanish localized version, and a SELECT SERVERPROPERTY('ResourceVersion'), SERVERPROPERTY('ResourceLastUpdateDatetime') returned:

Original RDB: 9.00.2047 2006-04-14 04:17:29.497
Patched RDB: 9.00.2047 2006-06-15 19:08:00.210
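For convenience, here is the version check mentioned above as a ready-to-run snippet:

```sql
-- Version and last update time of the Resource database; run it before
-- and after patching to confirm which mdf/ldf files are actually in place
SELECT SERVERPROPERTY('ResourceVersion')            AS ResourceVersion,
       SERVERPROPERTY('ResourceLastUpdateDatetime') AS ResourceLastUpdate
```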

Note for SQL Server 2000 subscribers: Solving this issue for SQL Server 2000 subscribers is much easier. We need to modify the same dbo.sp_MSscript_sync_xxx_trig system stored procedures, located in the master database. There is no Resource database to deal with, no need to stop and restart the server in single-user mode, etc.: just a simple ALTER PROCEDURE and a reinitialization of the publications. Well, there are some additional steps to take so that those stored procedures remain marked as system objects; if you just run ALTER PROCEDURE, they lose the system-object mark. The complete scripts for patching SQL Server 2000 SP4 are available here.
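For reference, a sketch of the SQL Server 2000 side of the patch. It relies on sp_MS_marksystemobject, an undocumented procedure that restores the system-object mark cleared by ALTER PROCEDURE; since it is undocumented, verify it exists on your build before relying on it:

```sql
USE master
GO
-- After running your ALTER PROCEDURE on the three scripting procedures,
-- re-mark them as system objects (sp_MS_marksystemobject is undocumented)
EXEC sp_MS_marksystemobject 'sp_MSscript_sync_ins_trig'
EXEC sp_MS_marksystemobject 'sp_MSscript_sync_upd_trig'
EXEC sp_MS_marksystemobject 'sp_MSscript_sync_del_trig'
GO
-- Confirm the system bit is set again on all three procedures
SELECT name, OBJECTPROPERTY(id, 'IsMSShipped') AS IsSystem
  FROM sysobjects
 WHERE name LIKE 'sp[_]MSscript[_]sync%trig'
```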

Links:
You receive an error message when you add a record to a subform in an Access 2002 Project
Moving System Databases (SQL Server 2005): http://msdn2.microsoft.com/en-us/library/ms187342.aspx
BOL chapter about the SCOPE_IDENTITY() function

My feedback to Microsoft on this issue: http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125475

Keywords: Identity, @@IDENTITY, transactional replication with queued updates, SQL Server 2005, Microsoft, MS Access, SCOPE_IDENTITY(), Reserved Resource Database, sp_MSscript_sync_ins_trig, bug, error, workaround, resolution

NOTE: Just as the purpose of sp_MSscript_sync_ins_trig is scripting the trg_MSsync_ins_TableName trigger on subscribers, trg_MSsync_upd_TableName is scripted by sp_MSscript_sync_upd_trig and trg_MSsync_del_TableName by sp_MSscript_sync_del_trig, and all three need to be patched the same way. All of them insert into the MSreplication_queue table, which has an identity column called orderkey (you can read about MSreplication_queue in SQL Server 2005 Books Online). Any command that has to be delivered to the publisher is inserted into this table, thus changing the value of @@identity. Now I have the three stored procedures (sp_MSscript_sync_ins_trig, sp_MSscript_sync_upd_trig and sp_MSscript_sync_del_trig) patched and everything is running smoothly.