Error Handling for Dynamics 365 CE ↔ Finance & Operations Dual-write

Design for failure, detect fast, isolate impact, and self-heal. That’s the playbook.

TL;DR

Dual-write breaks at schema, data quality, security, and process timing seams. Architect your flows with pre-validation in CE, business rule parity with F&O, observability (logs + alerts), a quarantine (“parking lot”), deterministic retries, and a clear reprocessing pipeline. Treat dashboards as command centers, not decoration.

Architecture at a glance

Dual-write gives near real-time sync between CE (Dataverse) and F&O via mapped entity pairs, triggers, and runtime handlers on both sides. Treat it like a distributed system: network hops, auth tokens, schema drift, referential integrity, and throughput limits are all real. Build guardrails where data enters, not only where it fails.

Error taxonomy (where it breaks)

Auth & Connectivity

Expired tokens, revoked permissions, broken connection references, network timeouts.

Schema & Mapping

Missing fields, data type mismatches, option set ↔ enum misalignment, length overflows.

Data Quality

Nulls where required, invalid references, company/legal entity mismatch, number sequence rules.

Business Rules

CE allows, F&O rejects (credit hold, status transitions, posting requirements).

Throughput & Ordering

Out-of-order updates, bursts, throttling, long chains of dependent entities.

A–Z of production-grade error handling

A. Alignment & Architecture
Document entity pairings, directionality, and prerequisites. Keep a living data contract.
B. Baseline Auth
Service principals with least privilege; rotate secrets; probe connection health after every deployment.
C. Company Mapping
Validate legal entity/company early; block mismatches before sync.
D. Data Model Parity
Mirror required fields and constraints in CE via business rules/plug-ins.
E. Entity Map Discipline
Keep maps small & cohesive. Version them. Avoid “mega maps.”
F. Field-Level Constraints
Pre-validate lengths, formats, and enums in CE; never let bad data leave.
G. Governance & SLAs
Define RACI, response times, and escalation paths for sync failures.
H. Health Checks
Daily map status, retry queues, last success timestamps, and error rate thresholds.
I. Idempotency
Ensure reprocessing the same record doesn’t double-post or corrupt state.
J. Journaling
Write compact logs with correlation IDs across CE↔F&O to reconstruct timelines.
K. Kill Switch
Circuit breaker to pause affected maps without nuking the environment.
L. Limits & Throttling
Respect API limits; batch initial loads; stagger bursts; back off on 429/5xx.
M. Mapping of Enums
Map by codes, not labels; centralize enum dictionaries; test round-trips.
N. Number Sequences
Guard F&O-owned identifiers; avoid CE generating values F&O expects to own.
O. Orchestration
Order parent→child; enforce dependencies; delay children until parents exist.
P. Parking Lot (Quarantine)
Divert toxic records into a holding table with a reason & fix hints.
Q. Quality Rules
Data quality (DQ) checks in Power Query/CE; reject early; surface friendly errors to users.
R. Reprocessing Pipeline
One-click requeue per record or batch; track attempts & outcomes.
S. Secrets & Certs
Key Vault, managed identities where possible, and audit access.
T. Telemetry
Push CE plug-in traces + F&O logs to a single telemetry sink (e.g., App Insights).
U. Upgrades
Test maps after solution/PU updates; detect schema drift before prod.
V. Validation Parity
Mirror F&O business validation in CE plug-ins to prevent futile trips.
W. War-room Runbooks
Pre-baked steps for auth, mapping, DQ, and throughput incidents.
X. eXceptions Style
Standard error codes, human-readable messages, remediation tips.
Y. Year-end & Close
F&O period close adds rules—expect stricter validations & timing windows.
Z. Zero-Downtime
Blue-green map rollout, feature toggles, and rollback plans.
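
Item O (parent→child ordering) is the one most easily checked mechanically. Below is a minimal, language-neutral sketch in Python, assuming you keep a dependency table of your own; the entity names and the helper names sync_order and ready_children are hypothetical, and nothing here calls a Dual-write API. It uses graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: entity -> parents that must exist first.
DEPENDS_ON = {
    "customer_group": set(),
    "product": set(),
    "customer": {"customer_group"},
    "sales_order": {"customer"},
    "order_line": {"sales_order", "product"},
}

def sync_order(depends_on):
    """Return an order in which every parent precedes its children.

    Raises graphlib.CycleError if the table contains a dependency cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())

def ready_children(depends_on, synced):
    """Entities not yet synced whose parents have all been synced."""
    done = set(synced)
    return sorted(
        entity
        for entity, parents in depends_on.items()
        if entity not in done and set(parents) <= done
    )
```

The same table can drive initial loads (run maps in sync_order) and reprocessing (release a parked child only once ready_children lists it).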

Patterns: quarantine, retries, and runbooks

Quarantine (“Parking Lot”)

Create a custom CE entity (e.g., Dual-write Error) capturing record reference, map name, error class, reason, suggested fix, and retry count. Divert failures here via plug-ins or post-operation handlers. Add a dashboard for Ops to triage.
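
As a rough illustration of what such a quarantine row might carry, here is a small Python sketch. The field names, the FIX_HINTS table, and the park helper are assumptions made for the example; in CE they would become columns on your custom table and logic in a plug-in or flow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical remediation hints, keyed by the error taxonomy buckets.
FIX_HINTS = {
    "auth": "Check the connection reference and service principal permissions.",
    "schema": "Compare field types and lengths on both sides of the map.",
    "data_quality": "Correct the record in CE and resubmit.",
    "business_rule": "Review the F&O validation that rejected the record.",
    "throughput": "Retry later and confirm parent records exist.",
}

@dataclass
class ParkedRecord:
    """One row in a hypothetical 'Dual-write Error' quarantine table."""
    record_ref: str      # e.g. "account:<guid>"
    map_name: str
    error_class: str     # one of the FIX_HINTS keys
    reason: str
    suggested_fix: str
    retry_count: int = 0
    parked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def park(queue, record_ref, map_name, error_class, reason, retry_count=0):
    """Append a failure to the quarantine queue with a fix hint attached."""
    hint = FIX_HINTS.get(error_class, "Escalate to the integration owner.")
    parked = ParkedRecord(record_ref, map_name, error_class, reason, hint, retry_count)
    queue.append(parked)
    return parked
```

Storing the error class and hint at park time means the Ops dashboard can group and triage without re-reading raw error text.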

Deterministic Retries

Use exponential backoff for transient 429/5xx. Hard-fail immediately on schema or DQ errors with actionable messages. Cap attempts, then park.
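
A minimal sketch of that policy in Python, assuming the sync call raises an error that carries an HTTP-like status. SyncError, sync_with_retry, and the park callback are hypothetical names; the base delay, cap, and attempt limit are placeholder values to tune. The delays are deliberately fixed (no jitter) so that a given attempt number always waits the same time.

```python
import time

TRANSIENT_STATUS = {429, 500, 502, 503, 504}

class SyncError(Exception):
    """Hypothetical error raised by the sync call, carrying an HTTP-like status."""
    def __init__(self, status, message):
        super().__init__(message)
        self.status = status

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    return min(cap, base * 2 ** (attempt - 1))

def sync_with_retry(send, record, max_attempts=4, sleep=time.sleep, park=None):
    """Call send(record); retry transient failures, park everything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(record)
        except SyncError as err:
            transient = err.status in TRANSIENT_STATUS
            if transient and attempt < max_attempts:
                sleep(backoff_delay(attempt))
                continue
            # Permanent error, or retries exhausted: quarantine, do not loop.
            if park is not None:
                park(record, err, attempt)
            return None
```

Passing sleep as a parameter keeps the function testable: a test can record the delays instead of actually waiting.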

Selective Resubmission

After fix (e.g., missing parent), reprocess only the impacted records. Avoid mass “select all and pray.” Maintain idempotent upserts.
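
One way to keep resubmission safe is to key every write on a business key plus a version, so replaying a message is a no-op. The Python sketch below assumes in-memory dictionaries stand in for the target table and an applied-version ledger; upsert, reprocess, and the message shape are hypothetical, not part of Dual-write.

```python
def upsert(store, applied, key, version, payload):
    """Apply an update at most once.

    store:   business key -> current payload
    applied: business key -> highest version already applied
    """
    if applied.get(key, -1) >= version:
        return "skipped"  # duplicate or stale replay, state is unchanged
    outcome = "updated" if key in store else "created"
    store[key] = payload
    applied[key] = version
    return outcome

def reprocess(store, applied, parked, fixed_keys):
    """Resubmit only parked messages whose key was fixed; keep the rest parked."""
    results, remaining = {}, []
    for msg in parked:
        if msg["key"] in fixed_keys:
            results[(msg["key"], msg["version"])] = upsert(
                store, applied, msg["key"], msg["version"], msg["payload"]
            )
        else:
            remaining.append(msg)
    return results, remaining
```

Because upsert ignores versions it has already seen, an operator who requeues the same batch twice gets "skipped" the second time rather than a double post.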

War-room Runbook (Skeleton)

  • Identify: Which map, entity, company?
  • Classify: Transient vs permanent.
  • Contain: Pause the map (kill switch) if blast radius is growing.
  • Fix: Data correction / mapping / permission / env health.
  • Reprocess: Targeted retries with logging & confirmation.
  • Review: Root cause, action items, and change ticket.
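
The Contain step can be backed by a simple per-map circuit breaker. This Python sketch only tracks state; MapBreaker and its threshold are assumptions for illustration, and actually pausing a map is still done through whatever mechanism your environment provides.

```python
class MapBreaker:
    """Per-map circuit breaker: trips after N consecutive failures, reset manually."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.failures = {}   # map name -> consecutive failure count
        self.paused = set()  # maps currently held back

    def record(self, map_name, ok):
        """Record one sync outcome; a success resets the failure streak."""
        if ok:
            self.failures[map_name] = 0
            return
        self.failures[map_name] = self.failures.get(map_name, 0) + 1
        if self.failures[map_name] >= self.threshold:
            self.paused.add(map_name)

    def allows(self, map_name):
        """True if traffic for this map should still be sent."""
        return map_name not in self.paused

    def resume(self, map_name):
        """Manual reset after the fix, mirroring the Reprocess step."""
        self.paused.discard(map_name)
        self.failures[map_name] = 0
```

Keeping the breaker per map means one failing entity pair is contained without pausing healthy maps.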

Code snippets: CE plug-in & F&O validation

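// CE (Dataverse) C# plug-in: PreOperation validation for a dual-write-bound table
using System;
using System.Linq;
using Microsoft.Xrm.Sdk;

public class DualWritePreValidate : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    var ctx = (IPluginExecutionContext)serviceProvider.GetService(typeof(IPluginExecutionContext));

    // Only messages such as Create/Update carry an Entity in "Target"; skip anything else.
    var target = ctx.InputParameters.Contains("Target") ? ctx.InputParameters["Target"] as Entity : null;
    if (target == null) return;

    try
    {
      // Example guards: length, required combos, enum codes. Logical names are always lowercase.
      GuardLength(target, "bn_name", 60);
      RequireIf(target, "bn_credithold", true, requiredField: "bn_creditholdreason");
      ValidateEnumCode(target, "bn_customertypecode", new[] { 100000000, 100000001 });

      // Optional: push normalized error to custom log entity instead of letting Dual-write fail downstream
    }
    catch (ValidationException vex)
    {
      // Make it human. Include a code, cause, and remediation hint.
      throw new InvalidPluginExecutionException($"DW-E1001: {vex.Message} Fix the data and save again.");
    }
    catch (Exception ex)
    {
      // Unknowns should surface but not leak internals
      throw new InvalidPluginExecutionException($"DW-E1999: Unexpected error. Contact support with CorrelationId={ctx.CorrelationId}.", ex);
    }
  }

  // Helper sketches. On Update, Target holds only the changed columns, so absent fields are not checked.
  private sealed class ValidationException : Exception
  {
    public ValidationException(string message) : base(message) { }
  }

  private static void GuardLength(Entity e, string field, int max)
  {
    var value = e.GetAttributeValue<string>(field);
    if (value != null && value.Length > max)
      throw new ValidationException($"'{field}' is longer than {max} characters.");
  }

  private static void RequireIf(Entity e, string field, bool when, string requiredField)
  {
    var flag = e.GetAttributeValue<bool?>(field);
    if (flag == when && (!e.Contains(requiredField) || e[requiredField] == null))
      throw new ValidationException($"'{requiredField}' is required when '{field}' is {when}.");
  }

  private static void ValidateEnumCode(Entity e, string field, int[] allowed)
  {
    var code = e.GetAttributeValue<OptionSetValue>(field);
    if (code != null && !allowed.Contains(code.Value))
      throw new ValidationException($"'{field}' has unsupported code {code.Value}.");
  }
}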

// F&O (X++) — Validate write to mirror CE rules and throw friendly messages
public boolean validateWrite()
{
    boolean isValid = super();
    if (!isValid) return false;

    // Example: enforce legal entity & required combo
    if (this.DataAreaId == '' || this.CustomerGroup == '')
    {
        error("DW-F1002: Company and Customer group are required for dual-write.");
        return false;
    }

    if (this.CreditMax < 0)
    {
        error("DW-F1010: Credit limit cannot be negative (check CE 'Credit Limit').");
        return false;
    }
    return true;
}

Observability & alerting blueprint

  • Single pane: CE dashboards (quarantine queue, retry counts), F&O Dual-write workspace, and an aggregated telemetry view.
  • Correlation IDs: Include in CE plug-in trace, quarantine record, and any F&O log entry.
  • Alerts: Error rate > X% over 5 min, consecutive failures on a map, or no successful sync for Y minutes.
  • KPIs: Mean time to detect (MTTD), mean time to repair (MTTR), parked records backlog, top 5 error signatures.
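
The three alert conditions can be evaluated from a plain list of sync outcomes. The Python sketch below assumes events arrive as (timestamp in seconds, ok) tuples sorted by time; evaluate_alerts and its default thresholds (5% error rate, 5 consecutive failures, 15 minutes of silence) are placeholder assumptions standing in for the X and Y you choose.

```python
def evaluate_alerts(events, now, max_error_rate=0.05, window=300,
                    max_consecutive=5, max_silence=900):
    """Return the names of alerts that should fire at time `now`.

    events: list of (timestamp_seconds, ok) tuples, oldest first.
    """
    alerts = []

    # 1. Error rate over the trailing window (default 5 minutes).
    recent = [ok for ts, ok in events if now - ts <= window]
    if recent and recent.count(False) / len(recent) > max_error_rate:
        alerts.append("error_rate")

    # 2. Consecutive failures at the tail of the stream.
    streak = 0
    for _, ok in reversed(events):
        if ok:
            break
        streak += 1
    if streak >= max_consecutive:
        alerts.append("consecutive_failures")

    # 3. No successful sync within the silence window.
    last_ok = max((ts for ts, ok in events if ok), default=None)
    if last_ok is None or now - last_ok > max_silence:
        alerts.append("no_recent_success")

    return alerts
```

Run it per map so that one noisy map cannot hide a silent one in an aggregate rate.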

Governance, SLAs, and change control

SLA tiers

P1 (financial posting blocked): 1-hour response; P2: 4-hour response; P3: next business day. Define and enforce.

Change windows

No map changes during period close or payroll; feature flags for risky toggles.

Post-mortems

Blameless, timestamped, with concrete fixes (tests, monitors, docs updates).

Checklists & go-live guardrails

  • ✅ CE plug-ins replicate F&O required combos & lengths; blocking errors are human-friendly.
  • ✅ Enum/code dictionaries are centralized and tested in both directions.
  • ✅ Quarantine entity, dashboard, and Power Automate reprocess action exist.
  • ✅ Kill switch per map + documented rollback path.
  • ✅ Alerts wired for error rate, no-success window, and backlog growth.
  • ✅ Runbooks for auth failure, mapping drift, data burst, and period close.
  • ✅ Load tests for peak create/update throughput; backoff verified.
  • ✅ Change control for schema updates on either side (contracts reviewed).

Hot take:
Dual-write doesn’t fail in prod—it fails in design. If CE lets bad data exist, F&O will loudly refuse it. Validate earlier, log smarter, and make retries boring.
