Singleton pattern, scheduled receive location and suspended orchestrations

Hello,

In our project we have several orchestrations that are triggered every minute with the Scheduler Adapter (business requirement is near real-time).

Each of these orchestrations must have only 1 instance running at a time, so we have implemented the singleton pattern. We followed a design like this one: https://fehlberg.wordpress.com/2008/06/06/biztalk-singleton-orchestration-design/

But we end up having suspended orchestrations with the error "The instance completed without consuming all of its messages. The instance and its unconsumed messages have been suspended."

I know it's a risk with this pattern. I thought it would happen only when an instance ran for more than 1 min (because the scheduler adapter sends a file every minute).

But it happens even with instances that run for about 15 sec, so there is something I don't understand. Here is an example:

Here we can see that the TransferRegion orchestration started at 10:26:00 (which is expected) and failed.

When I open the orchestration debugger, the time is different, the orchestration is shown starting at 10:26:44 (44 sec later):

And the orchestration ended at 10:27:00:453.

The message not consumed that caused the error has been triggered at 10:27:00.

So I understand that the message arrived just after the listen shape but before the orchestration ended, but I don't understand why the orchestration started 44 sec after it was supposed to ...

Any idea? I'm kind of clueless for now ...

February 19th, 2015 6:03pm

Sounds like a zombie message. Research that further. See if you can avoid the delay shape somehow. Try to meet the requirement by keeping track of time some other way (probably by loading it in SQL), or change the design so that it does not depend on time but on some sort of trigger file (for example, you can mark the last file as unique and the orch instance will keep running until it receives that last file).

High-level explanation of a zombie message (also known as an in-flight or orphaned message): it occurs when the delay time is reached and the orch moves toward completion (because the delay has elapsed), but while it is completing, its subscription for new incoming messages is still active. A message routed to the instance in that window has nowhere to go once the instance ends - hence a zombie message.

Thanks

Please vote up if it helped you understand or answered your question

February 19th, 2015 6:14pm

If they're Singletons, why do they need to be triggered?  A Singleton would run continuously.

Can you do the 1 min schedule internally?

February 19th, 2015 7:30pm

Yes they are zombie messages, I agree with that.
To avoid them, I'm working on a change to the delay. As I know when the next trigger message will be sent by the scheduler, I can adapt the delay shape duration to be sure that the message will either be consumed by the singleton loop or "arrive" after the orchestration has completed.
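To make the idea concrete, here is a minimal sketch of that delay calculation, written in Python for readability rather than as an XLANG expression. It assumes the trigger lands on the top of each minute, and `SAFETY_MARGIN` is a made-up value for the time the instance needs to finish after the listen times out. It only narrows the race window; it does not remove it.

```python
from datetime import datetime, timedelta

TRIGGER_INTERVAL = timedelta(minutes=1)  # from the thread: one trigger per minute
SAFETY_MARGIN = timedelta(seconds=5)     # assumption: time needed to complete after the listen


def next_trigger(now: datetime) -> datetime:
    """Next top-of-minute trigger strictly after `now` (assumes triggers fire at :00)."""
    return now.replace(second=0, microsecond=0) + TRIGGER_INTERVAL


def listen_delay(now: datetime) -> timedelta:
    """Delay for the listen so it never times out right around a trigger.

    If there is enough time left, time out SAFETY_MARGIN before the next
    trigger so the instance can complete first. Otherwise wait until
    SAFETY_MARGIN after the trigger so the loop consumes that message.
    """
    until_trigger = next_trigger(now) - now
    if until_trigger > SAFETY_MARGIN:
        return until_trigger - SAFETY_MARGIN
    return until_trigger + SAFETY_MARGIN
```

With the times from my example, an instance reaching the listen at 10:26:44 would wait 11 seconds and time out at 10:26:55, leaving 5 seconds to complete before the 10:27:00 trigger.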

But I'm wondering why the Orchestration start time is not consistent between the BizTalk Admin Console and the Orchestration debugger ... 44 seconds is a lot especially when the Orchestration is triggered every minute.

If both are correct, what do these 2 different times mean?

February 19th, 2015 8:32pm

Good question!

The design was already done when I arrived on the project so I don't really know the reason behind it.

It's a critical application and we can't really have an outage, so I think one of the reasons might be to be sure that if an instance fails, the next one will start at most 1 min later.

If a continuous singleton fails, somebody has to restart it manually, no?

I haven't thought of that, but I guess I could have the 1 minute inside the loop, by checking the process duration at the end, if it's less than 1 min I wait a little bit, otherwise I loop right away and start the process again ...
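Roughly, the wait at the end of each loop iteration would be computed like this. This is only a sketch in Python of the arithmetic, not BizTalk code; `CYCLE` and the function name are mine.

```python
from datetime import timedelta

CYCLE = timedelta(minutes=1)  # from the thread: one run per minute


def wait_before_next_run(process_duration: timedelta) -> timedelta:
    """Time to wait at the end of an iteration so runs start at least CYCLE apart.

    A run shorter than CYCLE waits for the remainder; a run that took CYCLE
    or longer loops right away (zero wait), so runs never overlap.
    """
    return max(CYCLE - process_duration, timedelta(0))
```

So a 15 sec run would be followed by a 45 sec wait, and a 75 sec run by no wait at all.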

But I also don't know the memory/performance impact of a continuous singleton, the orchestration is probably going to be dehydrated or the memory usage is going to grow loop after loop ...

We have about 8 orchestrations working with that same design.


February 19th, 2015 8:44pm

So, are these 8 Orchestrations actually processing messages or are they doing some other work every minute or so?

If they're not processing actual messages, maybe a Windows Service or job scheduled by the SQL Agent or Windows Scheduler would be more...appropriate.  We all love BizTalk, but some parts of the stack do some things better.

If they are processing messages or doing some genuine BPI type work, calling services, transformations etc., then maybe a Singleton+Scheduled Task is not the best solution.

I'm assuming the actual requirement is the process must run with no less than 1 minute between executions, but if an execution takes >1 min, don't overlap.  An internal 1 min timer would probably work very well, but you still have the problem of activating and deactivating the Orchestration.  You'd basically have to build some control infrastructure similar to the EDI Batching Service.

Here's a completely different suggestion:

  1. SQL Table that maintains the state of each process, say Active or Complete.
  2. Poll every 15 seconds a Stored Procedure that tests the status and returns an activation message when that process shows Complete.  It would flip it to Active at the same time.
  3. Orchestration runs and the last shape sends a message to change the status to Complete.
  4. Rinse Repeat.  No long running Orchestrations, Singletons, Correlations, etc...
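
The steps above can be sketched as follows. This is a minimal illustration in Python with an in-memory SQLite database standing in for SQL Server and the stored procedure; the table name `process_state` and the function names are hypothetical. The key point is that the test and the flip to Active happen in a single UPDATE, so two polls cannot both activate the same process.

```python
import sqlite3


def create_state_table(conn: sqlite3.Connection, process_names: list[str]) -> None:
    """Step 1: one row per process, initially Complete (ready to be activated)."""
    conn.execute(
        "CREATE TABLE process_state (name TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO process_state (name, status) VALUES (?, 'Complete')",
        [(name,) for name in process_names],
    )
    conn.commit()


def try_activate(conn: sqlite3.Connection, name: str) -> bool:
    """Step 2: flip Complete -> Active in one statement.

    Returns True when this poll should emit an activation message, and
    False when the process is still Active from a previous run.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE process_state SET status = 'Active' "
            "WHERE name = ? AND status = 'Complete'",
            (name,),
        )
    return cur.rowcount == 1


def mark_complete(conn: sqlite3.Connection, name: str) -> None:
    """Step 3: the last shape of the orchestration reports completion."""
    with conn:
        conn.execute(
            "UPDATE process_state SET status = 'Complete' WHERE name = ?",
            (name,),
        )
```

A poll that finds the process still Active simply returns nothing, so a run that takes longer than the polling interval is never overlapped by a second instance.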
February 19th, 2015 10:09pm

Hi John,

We have 8 independent orchestrations. Each of them is triggered by a different scheduled receive location.

The orchestrations then process messages (calling stored procedures, mapping and calling web services).

You are right about the requirement. I tried to disable the receive location at the beginning of the process and re-enable it at the end, but I ran into some weird errors on the SSO DB and others like "Could not retrieve transport type data for Receive Location 'Trigger_xxxxx_SCHEDULE' from config store. The transaction associated with the current connection has completed but has not been disposed.  The transaction must be disposed before the connection can be used to execute SQL statements."

Thanks for the new design suggestion, I will see the effort required and if the client approves it ;)

February 23rd, 2015 7:18pm

We use a design like the one John suggested; it is the better way to implement this kind of scenario.

On your existing design, if it was working fine before, then check in the SQL DB whether a backup job or any other job with a performance overhead is running at the times when the errors occur.

Network and SQL performance can be a cause of this issue, as your orchestrations depend heavily on SQL connections.

Regards

Suman 

February 24th, 2015 12:31am

Hi Suman,

The "Not consumed" messages started when the client requested to increase the frequency from 2min to 1min.

I hope he will agree / find budget for the new design.

Thanks all for your help.

February 24th, 2015 12:43am
