Skip to content

gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957

Merged
vstinner merged 25 commits into
python:mainfrom
cmaloney:textio_nullbuffer
Jun 9, 2026
Merged

gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access#145957
vstinner merged 25 commits into
python:mainfrom
cmaloney:textio_nullbuffer

Conversation

@cmaloney

@cmaloney cmaloney commented Mar 15, 2026

Copy link
Copy Markdown
Contributor

TextIOWrapper keeps its underlying stream in a member called self->buffer. That stream can be detached by user code, such as custom .flush() implementations, resulting in self->buffer being set to NULL. The implementation often checked at the start of functions if self->buffer is in a good state but did not always recheck after other Python code which could modify self->buffer.

The cases which need to be re-checked are hard to spot so rather than rely on reviewer effort make a better safety net by changing all self->buffer access to go through helper functions.

Thank you @yihong0618 for the test, NEWS and an initial implementation in gh-143041.

TextIOWrapper keeps its underlying stream in a member called
`self->buffer`. That stream can be detached by user code, such as custom
`.flush` implementations resulting in `self->buffer` being set to NULL.
The implementation often checked at the start of functions if
`self->buffer` is in a good state, but did not always recheck after
other Python code was called which could modify `self->buffer`.

The cases which need to be re-checked are hard to spot so rather than
rely on reviewer effort create better safety by making all self->buffer
access go through helper functions.

Thank you yihong0618 for the test, NEWS and initial implementation in
pythongh-143041.

Co-authored-by: yihong0618 <zouzou0208@gmail.com>
@yihong0618

Copy link
Copy Markdown
Contributor

wow that is cool thanks

@cmaloney cmaloney added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Mar 15, 2026
@cmaloney cmaloney changed the title gh-143008: Safer TextIOWrapper underlying stream access gh-143008: Fix Null Pointer Dereferences in TextIOWrapper underlying stream access Mar 17, 2026
@cmaloney cmaloney changed the title gh-143008: Fix Null Pointer Dereferences in TextIOWrapper underlying stream access gh-143008: Fix Null pointer dereferences in TextIOWrapper underlying stream access Mar 17, 2026
Comment thread Lib/test/test_io/test_textio.py
@cmaloney

Copy link
Copy Markdown
Contributor Author

@colesbury could you take a look as you've done some recent work on TextIO safety around threading. These cases are generally single threaded but generally the pattern "Implementation usually checked state; then did an operation which could modify it and didn't recheck".

@colesbury

Copy link
Copy Markdown
Contributor

I'm a bit confused about the relationship here between _textiowrapper_buffer_safe and CHECK_ATTACHED. When can buffer be NULL and CHECK_ATTACHED not trigger?

@colesbury colesbury self-requested a review April 17, 2026 17:57
@colesbury

Copy link
Copy Markdown
Contributor

Less important, but please match the CPython style for C code, e.g.:

  • { on new lines for functions definitions
  • when wrapping function calls, indent continuation lines to align with the call site

@cmaloney

Copy link
Copy Markdown
Contributor Author

I'm a bit confused about the relationship here between _textiowrapper_buffer_safe and CHECK_ATTACHED. When can buffer be NULL and CHECK_ATTACHED not trigger?

CHECK_ATTACHED, by way of CHECK_INITIALIZED, checks self->ok first. The self->ok flag is set to 0 at start of __init__, 1 at the end, and cleared/reset to 0 during .clear() as well as at the start of the Py_tp_dealloc. During __init__ there are calls to check the file is in a good state and get some attributes (remember: self->buf is the underlying file-like object not an internal buffer).

The more complicated case is during deletion where there is an interaction with the base class. TextIOWrapper implements Py_tp_dealloc and its base class (TextIOBase which has a base class IOBase) implements a Py_tp_finalize with the function iobase_finalize. That function tries to ensure all data is written in the full I/O "stack" before any part of it is closed out of order. It tries to have Text I/O flush and close, then Buffered I/O (self->buf), then the Raw I/O. If there is an underlying file (self->buf != NULL) during those operations need to try and write data in the TextIOWrapper even though self->ok is 0.

Adding to the complexity here all the objects involved and the multiple operations (flush(), close(), .closed) can be overridden and could call TextIOWrapper.clear() or TextIOWrapper.detach() causing self->buf to become NULL.

Less important, but please match the CPython style for C code, e.g.:

Sorry for missing those bits, doing a review of that and will update for it shortly.

@cmaloney

Copy link
Copy Markdown
Contributor Author

There are a couple cases I'm not sure how to PEP 7, in particular:

/* hits 81 cols */
        PyObject *bytes = _textiowrapper_buffer_callmethod_noargs(self,
                                                                  &_Py_ID(read));
/* hits 81 cols */
            res = _textiowrapper_buffer_callmethod_onearg(self,
                                                          &_Py_ID(_dealloc_warn),
                                                          (PyObject *)self);


PyObject *input_chunk = _textiowrapper_buffer_callmethod_onearg(self,
            &_Py_ID(read), bytes_to_feed);

A couple directions I looked at but don't know how to decide between:

  1. Make the variable name shorter (increases diff)
  2. Make the function name shorter (ex. _textiowrapper_buf_... or _textio_buffer_, _textiowrapper_buf_callmeth_noargs)
  3. Whats the right indentation to just put on the next line?

@colesbury

Copy link
Copy Markdown
Contributor

I think it's fine to make the function names shorter, i.e., buffer_callmethod_onearg instead of _textiowrapper_buffer_callmethod_onearg. File-scope static functions don't need a prefix or underscore.

In general the preference is for wrapping function calls like the following (from PEP 7 example):

PyErr_Format(PyExc_TypeError,
             "cannot create '%.100s' instances",
             type->tp_name);

But if that's not practical due to overly long lines, than just try to fit the style of surround code. For example, from elsewhere in textio.c:

        PyObject *incrementalDecoder = PyObject_CallFunctionObjArgs(
            (PyObject *)state->PyIncrementalNewlineDecoder_Type,
            self->decoder, self->readtranslate ? Py_True : Py_False, NULL);

@cmaloney

cmaloney commented May 6, 2026

Copy link
Copy Markdown
Contributor Author

This should also resolve gh-143007

@cmaloney

Copy link
Copy Markdown
Contributor Author

I think thish is ready for re-review @colesbury

Comment thread Modules/_io/textio.c
Comment thread Modules/_io/textio.c
leading to NULL pointer dereferences (see gh-143008, gh-142594). Protect against
that by using helpers to check self->buffer validity at callsites. */
static PyObject *
buffer_access_safe(textio *self)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename buffer_xxx() functions to textiowrapper_buffer_xxx(), and rename this function to textiowrapper_buffer_get().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started like that (see: d7f14fc) but it made some of the PEP-7 formatting really bad so moved to a shorter name

Comment thread Modules/_io/textio.c
Comment thread Modules/_io/textio.c Outdated
Comment on lines +3324 to +3327
/* During destruction self->ok is 0 but self->buffer is non-NULL and this
needs to error in that case which the safe buffer wrapper does not.

Match original behavior by calling CHECK_ATTACHED explicitly. */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textiowrapper_dealloc() sets ok to 0 and almost immediately sets buffer to NULL. So I don't understand well when it's possible that ok is 0 and buffer is non-NULL.

This comment is unclear to me. I suggest rephrasing it to something like:

"Call CHECK_ATTACHED() to raise an exception if ..., when ... happens."

The "Match original behavior" part is unclear to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to checking self->ok directly and just returning true in that case which differs more in this bug fix but I think is a lot more understandable. Described a couple cases this is hit by the test suite.

@serhiy-storchaka serhiy-storchaka added the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 30, 2026
Co-authored-by: Victor Stinner <vstinner@python.org>
@read-the-docs-community

read-the-docs-community Bot commented Jun 5, 2026

Copy link
Copy Markdown

Documentation build overview

📚 cpython-previews | 🛠️ Build #33001001 | 📁 Comparing e5c014a against main (a00b24e)

  🔍 Preview build  

104 files changed · ± 103 modified · - 1 deleted

± Modified

- Deleted

@vstinner vstinner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I don't think that it's worth it to backport the "check buffer" fixes. #143008 issue was found by fuzzing. Regular Python code should never meet this bug in practice. Triggering the code requires to design a very precise class which misbehaves.

But IMO the critical section on TextIOWrapper.__init__ should be backported. Once this PR is merged, you can create a fix for 3.15 (and then we can backport it).

Comment thread Lib/test/test_io/test_textio.py Outdated
('reconfigure', lambda: wrapper.reconfigure(line_buffering=True)),
]
for name, method in tests:
with self.subTest(name), set_recursion_limit(100):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you reduce the recursion limit?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use unlimited recursion at all? Call the method, detach the buffer, set the flag, call the method yet one time if the flag was not set.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the recursion

Comment thread Lib/test/test_io/test_textio.py Outdated
Comment on lines +1602 to +1603
wrapper_ref = lambda: None
del wrapper

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to just assign wrapper to None (and leave wrapper_ref unchanged)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed wrapper_ref entirely and reworked how the classes work to simplify

Comment thread Modules/_io/textio.c Outdated
Comment thread Modules/_io/textio.c Outdated

@serhiy-storchaka serhiy-storchaka left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, would not it be simpler if set buffer to None instead of NULL? How does it work in the Python implementation?

Comment thread Lib/test/test_io/test_textio.py Outdated
from collections import UserList
from test import support
from test.support import os_helper, threading_helper
from test.support import (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to not mix import of module attributes and submodules in one statement.

Complex multiline import is less readable than separate oneline imports.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed set_recursion_limit entirely

Comment thread Lib/test/test_io/test_textio.py Outdated
('reconfigure', lambda: wrapper.reconfigure(line_buffering=True)),
]
for name, method in tests:
with self.subTest(name), set_recursion_limit(100):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use unlimited recursion at all? Call the method, detach the buffer, set the flag, call the method yet one time if the flag was not set.

Co-authored-by: Victor Stinner <vstinner@python.org>
@cmaloney

cmaloney commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

_pyio uses None to indicate no buffer. Can definitely migrate _io.TextIOWrapper to do that but that's a substantial code delta for an edge case and doesn't simplify fixing the time of check time of use issue here. Moving to None would make it calling methods on None rather than a null pointer de-reference. Would still need most this change of "always check just before using".

re: Recursion most the tests don't rely on it. The one that does is when flush() calls detach itself which results in another call to flush(). gh-143041 resolved that by making a new "state" effectively of "are detaching, don't detach again". Can do a similar pattern here if want to eliminate the recursion potentially: In .detach() change self->detached and/or self->ok before calling _PyFile_Flush(self) and making sure .flush() handles that case appropriately. I'm hesitant to do that as it changes the set of states a TextIOWrapper can be in a bit and I don't have those well mapped out mentally. Limiting recursion means the test exits after some number of recursion instances which validates it no longer segfaults / NULL pointer dereference is resolved.

@cmaloney cmaloney left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a refactoring that removed the recursion while not making too much complex

Comment thread Lib/test/test_io/test_textio.py Outdated
from collections import UserList
from test import support
from test.support import os_helper, threading_helper
from test.support import (

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed set_recursion_limit entirely

Comment thread Lib/test/test_io/test_textio.py Outdated
('reconfigure', lambda: wrapper.reconfigure(line_buffering=True)),
]
for name, method in tests:
with self.subTest(name), set_recursion_limit(100):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the recursion

@cmaloney cmaloney removed needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes needs backport to 3.15 pre-release feature fixes, bugs and security fixes labels Jun 9, 2026
@vstinner vstinner merged commit db4b194 into python:main Jun 9, 2026
61 checks passed
@vstinner

vstinner commented Jun 9, 2026

Copy link
Copy Markdown
Member

Merged, thanks @cmaloney for the fix.

No one should meet this bug in practice, since it requires a custom buffer object which detach() when the stream is not supposed to be detached. But well, it's good to fix the code to support this corner case.

@cmaloney: You can create a change for the 3.15 branch to backport the critical section if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants