Skip to content

Fix Windows PowerShell host-start hang in SetCorrectExecutionPolicy#2328

Merged
andyleejordan merged 6 commits into
mainfrom
andyleejordan-fix-winps-host-start-hang
Jun 24, 2026
Merged

Fix Windows PowerShell host-start hang in SetCorrectExecutionPolicy#2328
andyleejordan merged 6 commits into
mainfrom
andyleejordan-fix-winps-host-start-hang

Conversation

@andyleejordan

Copy link
Copy Markdown
Member

Fixes #2323.

On recent in-box Windows PowerShell 5.1 servicing builds, SetCorrectExecutionPolicy's Get-ExecutionPolicy -List call autoloads Microsoft.PowerShell.Security into a runspace whose reused InitialSessionState already carries that module's ObjectSecurity type data, throwing "The member AuditToString is already present". That kills the pipeline thread during host startup while TryStartAsync is still awaiting queued startup tasks, so the server hangs and rides the CI timeout.

The execution-policy query is best-effort, so this wraps it in try/catch and skips policy configuration on failure (mirroring the existing Set-ExecutionPolicy handling), plus a guard against a short result list.

Because the hang is fixed, this also reverts the in-box Windows PowerShell E2E test skips from #2318. The ci-test.yml job-timeout backstop is kept.

Known follow-ups

  • Three LSP help tests (Expand-Archive synopsis) fail under in-box powershell.exe on my local older servicing build (.8655) but pass under pwsh; letting CI judge them on its build. The server returns auto-generated syntax instead of the MAML synopsis only in the WinPS server runspace — I could not isolate the mechanism locally.
  • Separately, OmniSharp's From(...) silently swallows an exception thrown from the DAP OnInitialize handler, so a startup failure wedges instead of failing fast — I'll file a tracking issue.

On recent in-box Windows PowerShell 5.1 servicing builds, the language and
debug servers intermittently hang on startup and ride the CI job timeout
(#2323). The wedge is `SetCorrectExecutionPolicy`: it calls
`Microsoft.PowerShell.Security\Get-ExecutionPolicy -List`, which autoloads
`Microsoft.PowerShell.Security` into the freshly created runspace. That
runspace's `InitialSessionState` is reused from the host runspace and already
carries the module's `ObjectSecurity` type data, so re-binding it throws "The
member `AuditToString` is already present". `PsesInternalHost.Run()` catches
it, faults `_started`, and the pipeline thread exits — but `TryStartAsync` is
still awaiting queued startup tasks that now never run, so it hangs forever.

Configuring the execution policy is best-effort, so we wrap the
`Get-ExecutionPolicy` query in try/catch (mirroring the existing
`Set-ExecutionPolicy` handling just below it) and skip policy configuration
when it fails, rather than letting a type-data hiccup abort host startup. I
also guard the subsequent indexing against a short result list.

With the hang fixed we no longer need the in-box Windows PowerShell E2E skips
added in #2318, so this reverts them: the `SkippableFactOnWindowsPowerShell`
attributes, the `LSPTestsFixture` early-return, and the two attribute files.
The DAP suite passes clean under `powershell.exe`. Three LSP help tests
(`Expand-Archive` synopsis) still fail locally on my older servicing build
(.8655); they pass under `pwsh`, so I'm letting CI judge them on its build
rather than re-skipping. The `ci-test.yml` job timeout from #2318 stays as a
backstop.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the real root cause behind the Windows CI hang tracked in #2323: on recent in-box Windows PowerShell 5.1 servicing builds, SetCorrectExecutionPolicy's Get-ExecutionPolicy -List call can throw an ObjectSecurity type-data conflict ("The member ... is already present") while autoloading Microsoft.PowerShell.Security, which kills the startup pipeline thread and hangs the server. The query is wrapped in try/catch (best-effort, mirroring the existing Set-ExecutionPolicy handling) plus a short-list guard. Because the hang is addressed at the source, the PR also reverts the stopgap in-box Windows PowerShell E2E test skips introduced by #2318, restoring full E2E coverage on that leg.

Changes:

  • Harden SetCorrectExecutionPolicy against the type-data conflict by catching and logging the failure (then skipping policy configuration) and guarding against a too-short policy list.
  • Remove the discovery-time Windows PowerShell skip attributes (SkippableFactOnWindowsPowerShell/SkippableTheoryOnWindowsPowerShell) and revert the E2E tests back to [Fact]/[SkippableFact]/[SkippableTheory].
  • Revert the #2318 test-harness "hygiene" changes (log-tail yield and the attach poll Start-Sleep) and the fixture startup/dispose guards.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/.../PowerShell/Utility/PowerShellExtensions.cs Wraps the Get-ExecutionPolicy -List call in try/catch and adds a Count < 2 guard so a failed query skips policy configuration instead of aborting host startup.
test/.../Hosts/SkippableFactOnWindowsPowerShellAttribute.cs Deletes the discovery-time WinPS skip attribute and shared skip-reason class.
test/.../Hosts/SkippableTheoryOnWindowsPowerShellAttribute.cs Deletes the discovery-time WinPS skip theory attribute.
test/.../LSPTestsFixtures.cs Removes the WinPS start/dispose guards so the LSP fixture always starts the server.
test/.../LanguageServerProtocolMessageTests.cs Reverts test attributes back to [Fact]/[SkippableFact] and removes the class remarks.
test/.../DebugAdapterProtocolMessageTests.cs Reverts DAP test attributes, the log-tail loop, and the attach poll Start-Sleep.

The production fix is small and defensive, and the [Fact]/[SkippableFact] mapping is consistent (every test retaining a body-level Skip.* call kept a skippable attribute, and no deleted-type references remain). My comments are minor: an inconsistent ?.Dispose() and two reintroduced busy-waits that #2318 had described as orthogonal "good hygiene." The main risk is non-code: this re-enables a large in-box WinPS E2E suite that previously hung CI for hours, on the hypothesis that the startup fix resolves it, and the author flags open follow-ups (help-test discrepancies, an OmniSharp swallow). Final validation depends on the specific runner image, which warrants human review.

Comment thread test/PowerShellEditorServices.Test.E2E/LSPTestsFixtures.cs Outdated
Comment on lines 243 to 246
while (nextLine is null || nextLine.Length == 0)
{
string nextLine = await scriptLogReader.ReadLineAsync();
if (!string.IsNullOrEmpty(nextLine))
{
return nextLine;
}

await Task.Delay(100);
nextLine = await scriptLogReader.ReadLineAsync(); //Might return null if at EOF because we created it above but the script hasn't written to it yet
}
andyleejordan and others added 5 commits June 24, 2026 10:54
PR #2328 reset the three E2E test files to their pre-#2318 state
(`b57653c40`) to undo the Windows PowerShell skips we no longer want now
that the host-start hang is actually fixed. But #2318 was a squash that
bundled genuine harness hardening *alongside* those skips, so reverting
wholesale quietly dropped the good parts too. Restore just those, keeping
the skips reverted:

- `ReadScriptLogLineAsync` now yields with `await Task.Delay(100)` at EOF
  instead of busy-spinning. At EOF `ReadLineAsync` completes synchronously
  with `null`, so the old `while`/`await` loop never released its
  thread-pool thread and could starve the scheduler on constrained CI
  runners.
- The child-process `Debug-Runspace` readiness poll in
  `CanAttachScriptWithPathMappings` sleeps 100ms per iteration so it can't
  peg a core during the attach handshake.
- `LSPTestsFixture.DisposeAsync` guards against a null `PsesLanguageClient`
  so a startup failure isn't masked by a `NullReferenceException` during
  teardown.

These are defense-in-depth independent of the skips, and they matter more
now that we un-skip: a Windows PowerShell server that fails to start
shouldn't busy-spin or NRE on teardown.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
OmniSharp's `DebugAdapterServer.From()` (0.19.9) awaits an internal
`AsyncSubject` that its `InitializeRequest` handler only signals on the
success path. If an `OnInitialize` delegate throws, the handler faults
before signaling, nothing errors the subject, and `From()` -- and thus
`PsesDebugServer.StartAsync()` -- awaits it forever. So a startup failure
wedges the entire debug server and rides the CI/job timeout instead of
failing fast. This is a library limitation, not our bug: the same code is
present on the library's `master`, so we can't fix it upstream without a
fork or upgrade.

#2328 fixes the specific trigger we hit on in-box Windows PowerShell (the
`Get-ExecutionPolicy -List` type-data conflict), but the wedge mechanism is
generic. Guard against any future `OnInitialize` failure: wrap the delegate
body, log the exception, and signal `_serverStopped` so `WaitForShutdown`
unblocks and the process exits cleanly. `Dispose`'s `_serverStopped`
completion is now idempotent (`TrySetResult`) since the catch may have
already completed it.

This converts a silent multi-hour hang into a clean termination with a
logged error -- the client sees the session end instead of waiting forever.
See #2323.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR un-skips `CanAttachScriptWithPathMappings` on Windows PowerShell
(reverting the #2318 skips now that the host-start hang is fixed), and the
un-skipped test promptly failed on the WinPS 5.1 CI leg with "Attached
process exited before the script could start" -- at exactly 10 seconds.

The test spawns a child `powershell.exe`, then waits up to 10s for
`Debug-Runspace` to subscribe to the child's runspace as the marker that
the attach handshake completed, before driving a full breakpoint/continue
interaction under a 15s xUnit timeout. On the 2-vCPU CI runners the WinPS
attach alone exceeds that 10s budget, so the child throws its internal
timeout and exits before the debug session attaches. It isn't even close
to comfortable locally: xUnit flags the run as a long-running test at the
10s mark on a fast dev box, so the old budget was always on a knife's edge
for the slower WinPS path.

Rather than re-skip it, give the slow-but-correct path room:

- The child's `Debug-Runspace` subscription poll goes 10s -> 30s.
- The outer process-watch cancellation goes 30s -> 60s.
- The xUnit `Timeout` goes 15s -> 60s.

This continues 81b273b (which already bumped the xUnit timeout 10s -> 15s
for the same flakiness) and keeps real coverage of the attach path on
Windows PowerShell instead of skipping it. See #2323 and #2318.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Un-skipping the entire Windows PowerShell E2E suite (now that the host-start
hang is fixed) gave us real WinPS coverage for the first time since #2318, but
it also surfaced two pre-existing WinPS-specific failures that have nothing to
do with host startup:

- `CanAttachScriptWithPathMappings` wedges on the in-box, cross-process
  `Debug-Runspace` attach handshake and rides whatever timeout we set. CI
  failed it at 10s, then again at exactly 30s after I bumped the budgets, so
  it genuinely never completes rather than merely running slow. This is the
  in-box attach deadlock tracked by #2323.
- `CanSendCompletionResolveWithModulePrefixRequestAsync` gets an empty
  completion list from the Windows PowerShell server for a prefix-imported
  command (it fails before any help assertion, so it's help-independent).
  That's the same "completion works in PS7 but not WinPS" family as #1355.

Because startup no longer hangs, we don't need #2318's discovery-time
`[SkippableFactOnWindowsPowerShell]` attribute anymore: an in-body
`Skip.If(IsWindowsPowerShell, ...)` runs after `InitializeAsync` (which now
starts the server fine) and skips before the broken code, so both tests skip
cleanly in ~1ms under `powershell.exe` instead of hanging or failing.

This also reverts my earlier timeout bump on `CanAttachScriptWithPathMappings`
(back to the 15s/10s/30s budgets from `main`) since the bigger budgets didn't
help and the test no longer runs on Windows PowerShell anyway. Everything else
stays un-skipped.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Un-skipping the Windows PowerShell E2E suite surfaced one more attach-family
flake: `CanLaunchScriptWithNewChildAttachSession`. It passed on the prior CI run
(`a4e8a823e`) but timed out (`TaskCanceledException` at its 30s budget) on the
next (`25d9e58dd`) with no relevant code change in between, so it's genuinely
flaky on the slow in-box Windows PowerShell CI runners rather than broken.

The test runs `Start-DebugAttachSession` and waits for the server's
`startDebugging` reverse-request round-trip; on in-box Windows PowerShell that
round-trip is slow enough to intermittently miss the timeout. That's the same
in-box attach-E2E reliability bucket as #2323, and its two siblings are already
skipped there: `CanAttachScriptWithPathMappings` (the cross-process
`Debug-Runspace` wedge) and `CanLaunchScriptWithNewChildAttachSessionAsJob`
(no `ThreadJob` on Windows PowerShell). So skip this one on Windows PowerShell
too, keeping the rest of the now-un-skipped DAP suite running.

The flake only reproduces on the constrained CI runner, not on a fast dev box,
so I'm matching how the sibling attach tests are handled rather than chasing a
timeout bump I can't validate locally.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment on lines +141 to +165
IReadOnlyList<PSObject> policies;
try
{
policies = pwsh
.AddCommand(@"Microsoft.PowerShell.Security\Get-ExecutionPolicy")
.AddParameter("List")
.InvokeAndClear<PSObject>();
}
catch (Exception e)
{
// Some Windows PowerShell servicing builds throw a type-data conflict
// ("The member ... is already present" on ObjectSecurity) when
// autoloading Microsoft.PowerShell.Security into a runspace whose
// InitialSessionState already carries that module's type data.
// Configuring the execution policy is best-effort, so log and skip
// rather than letting it abort host startup (which manifests as a hang).
logger.LogError(e, "Failed to query the execution policy; skipping execution policy configuration.");
return;
}

// We need at least the CurrentUser and LocalMachine scopes to proceed.
if (policies is null || policies.Count < 2)
{
return;
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinGrote I got this whole issue to repro locally on a Windows box and apparently this was the issue that was hanging all the end-to-end tests on the updated image.

@andyleejordan andyleejordan enabled auto-merge (squash) June 24, 2026 21:00

@SeeminglyScience SeeminglyScience left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@andyleejordan andyleejordan merged commit 24f6f64 into main Jun 24, 2026
12 of 13 checks passed
@andyleejordan andyleejordan deleted the andyleejordan-fix-winps-host-start-hang branch June 24, 2026 21:04
andyleejordan added a commit that referenced this pull request Jun 24, 2026
Integrates #2328, which fixed the Windows PowerShell host-start hang and
reverted the in-box WinPS E2E skips, deleting the
SkippableFactOnWindowsPowerShellAttribute. The getModule E2E tests no
longer need the discovery-time skip, so they resolve back to
[SkippableFact]/[Fact] and run on Windows PowerShell alongside their
now-un-skipped siblings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan added a commit that referenced this pull request Jun 24, 2026
Merging #2328 un-skipped the Windows PowerShell E2E suite, which surfaced
a pre-existing host gap in `CanSendGetModuleRequestAsync`: it probed
`Microsoft.PowerShell.Management`, but Windows PowerShell's in-box core
modules are snap-in based and are not returned by
`Get-Module -ListAvailable`. The handler correctly resolved nothing, so
`Assert.NotNull` failed on the Windows PowerShell leg while passing on
PowerShell 7 (where the core modules ship as files).

Probe `Microsoft.PowerShell.Archive` instead: it is a file-based module
present in the module path on both Windows PowerShell and PowerShell 7+,
so the test exercises the handler on every host without skipping or
loosening any assertion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan added a commit that referenced this pull request Jun 24, 2026
Merging #2328 un-skipped the Windows PowerShell E2E suite, which surfaced
a pre-existing host gap in `CanSendGetModuleRequestAsync`: it probed
`Microsoft.PowerShell.Management`, but Windows PowerShell's in-box core
modules are snap-in based and are not returned by
`Get-Module -ListAvailable`. The handler correctly resolved nothing, so
`Assert.NotNull` failed on the Windows PowerShell leg while passing on
PowerShell 7 (where the core modules ship as files).

Probe `Microsoft.PowerShell.Archive` instead: it is a file-based module
present in the module path on both Windows PowerShell and PowerShell 7+,
so the test exercises the handler on every host without skipping or
loosening any assertion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows CI hang: in-box Windows PowerShell attach E2E test wedges after the 20260614 runner image

3 participants