Skip to content

llama : synchronize context before backend teardown#24935

Open
krystophny wants to merge 1 commit into
ggml-org:masterfrom
krystophny:fix/cuda-ctx-synchronize-on-destroy
Open

llama : synchronize context before backend teardown#24935
krystophny wants to merge 1 commit into
ggml-org:masterfrom
krystophny:fix/cuda-ctx-synchronize-on-destroy

Conversation

@krystophny

Copy link
Copy Markdown

Overview

~llama_context() releases backend buffers without first draining
outstanding GPU work. On a multi-GPU CUDA build this aborts during
teardown: cudaFree reports unspecified launch failure inside
~ggml_backend_cuda_buffer_context. Calling synchronize() at the
start of the destructor drains pending work before any buffer is freed.

Additional information

Reproduced on clean master (dec5ca557), CUDA build, two RTX 5060 Ti
(sm_120), CUDA 13.3.

Before:

2/2 Test #36: test-thread-safety ...Subprocess aborted
E CUDA error: unspecified launch failure
E   current device: 1, in function ~ggml_backend_cuda_buffer_context
      at ggml/src/ggml-cuda/ggml-cuda.cu:637
E   cudaFree(dev_ptr)

After:

2/2 Test #36: test-thread-safety ...Passed
100% tests passed, 0 tests failed out of 2

The abort surfaces on Blackwell here. The change is a teardown-ordering
fix and does not depend on the GPU.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES. The two-line teardown fix originates in my
    earlier work; AI assisted with bisecting the failure, building, and
    running test-thread-safety. I reviewed every line, own the change,
    and can explain it.

@krystophny krystophny requested a review from ggerganov as a code owner June 23, 2026 07:54
krystophny added a commit to krystophny/llama.cpp that referenced this pull request Jun 23, 2026
The synchronize-on-teardown fix (src/llama-context.cpp) and the phi3
meta split-state fix (src/llama-model.cpp) entered this branch via an
upstream merge and are unrelated to the Responses API work. They are
now filed independently as ggml-org#24935 and ggml-org#24936. Remove them here so this
PR is limited to the server Responses changes.
@ggml-gh-bot

ggml-gh-bot Bot commented Jun 23, 2026

Copy link
Copy Markdown

Hi @krystophny, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 3 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@ggerganov

Copy link
Copy Markdown
Member

It would be better to synchronize the contexts explicitly at the end of test-thread-safety. Something like:

diff --git a/tests/test-thread-safety.cpp b/tests/test-thread-safety.cpp
index acda4aa81..d0b5946e2 100644
--- a/tests/test-thread-safety.cpp
+++ b/tests/test-thread-safety.cpp
@@ -146,6 +146,8 @@ int main(int argc, char ** argv) {
                 }
 
                 LOG_INF("Model %d/%d, Context %d/%d: %s\n\n", m + 1, num_models, c + 1, num_contexts, result.c_str());
+
+                llama_synchronize(ctx.get());
             });
         }
     }

@krystophny krystophny force-pushed the fix/cuda-ctx-synchronize-on-destroy branch from be1fd16 to e654df6 Compare June 23, 2026 08:21
@github-actions github-actions Bot added the testing Everything test related label Jun 23, 2026
@krystophny

Copy link
Copy Markdown
Author

It would be better to synchronize the contexts explicitly at the end of test-thread-safety. Something like:

diff --git a/tests/test-thread-safety.cpp b/tests/test-thread-safety.cpp
index acda4aa81..d0b5946e2 100644
--- a/tests/test-thread-safety.cpp
+++ b/tests/test-thread-safety.cpp
@@ -146,6 +146,8 @@ int main(int argc, char ** argv) {
                 }
 
                 LOG_INF("Model %d/%d, Context %d/%d: %s\n\n", m + 1, num_models, c + 1, num_contexts, result.c_str());
+
+                llama_synchronize(ctx.get());
             });
         }
     }

thx for the fast look, changed it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants