Pandas raises a FutureWarning when a synthesizer times out #454

Description

@R-Palazzo

Environment Details

  • SDGym version: 0.10.0

Error Description

When a user sets a timeout value and a synthesizer exceeds that timeout, some columns in the result tables are filled with NaN. As a result, pandas raises a FutureWarning during DataFrame concatenation:

FutureWarning:

The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

The line where the warning is emitted is:

scores = pd.concat(scores, ignore_index=True)
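One possible way to follow the warning's advice ("exclude the relevant entries before the concat operation") is sketched below. It is only an illustration, not SDGym's actual fix: the helper name `concat_without_all_na` and the column names `Synthesizer`, `Quality_Score` and `Train_Time` are hypothetical, and the sketch assumes `scores` is a list of pandas DataFrames in which a timed-out run has all-NaN score columns.

```python
import numpy as np
import pandas as pd


def concat_without_all_na(frames):
    """Concatenate frames after dropping all-NA columns from each one.

    Hypothetical helper, not part of SDGym or pandas.
    """
    # Remember every column in order of first appearance, so the
    # final layout does not depend on which frames kept which columns.
    all_columns = []
    for frame in frames:
        for column in frame.columns:
            if column not in all_columns:
                all_columns.append(column)

    # Drop columns that contain only NA values in a given frame.
    cleaned = [frame.dropna(axis=1, how='all') for frame in frames]
    # Frames with no columns left carry no data; note that their rows are lost.
    cleaned = [frame for frame in cleaned if len(frame.columns) > 0]
    if not cleaned:
        return pd.DataFrame(columns=all_columns)

    combined = pd.concat(cleaned, ignore_index=True)
    # Restore columns that were all-NA everywhere, and the original order.
    return combined.reindex(columns=all_columns)


# Stand-in data: one successful run and one run that timed out.
ok = pd.DataFrame({'Synthesizer': ['A'], 'Quality_Score': [0.9], 'Train_Time': [1.5]})
timed_out = pd.DataFrame(
    {'Synthesizer': ['B'], 'Quality_Score': [np.nan], 'Train_Time': [np.nan]},
    dtype=object,
)
combined = concat_without_all_na([ok, timed_out])
```

With this approach the dtype of each column is decided only by the frames that actually hold values in it, and the timed-out row still appears with NaN in its score columns.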

Steps to reproduce

from sdgym.benchmark import benchmark_single_table

result = benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer'],
    custom_synthesizers=None,
    sdv_datasets=['child'],
    additional_datasets_folder=None,
    limit_dataset_size=True,
    compute_quality_score=True,
    compute_diagnostic_score=True,
    compute_privacy_score=True,
    sdmetrics=None,
    timeout=4,  # put a small timeout to trigger it
    show_progress=False,
    multi_processing_config=None,
    run_on_ec2=False,
)
result
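To check whether a given call emits the warning, the call can be wrapped with the standard library's `warnings.catch_warnings`. The sketch below is a hedged illustration: `collect_future_warnings` is a hypothetical helper, and `noisy_stand_in` only stands in for the benchmark call so the example runs without SDGym installed. It can only record warnings raised in the calling process, so it assumes the concatenation happens there.

```python
import warnings


def collect_future_warnings(func, *args, **kwargs):
    """Call func and return its result plus any FutureWarning messages.

    Hypothetical helper for checking the reproduction steps.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter('always')
        result = func(*args, **kwargs)
    messages = [
        str(item.message)
        for item in caught
        if issubclass(item.category, FutureWarning)
    ]
    return result, messages


def noisy_stand_in():
    """Stand-in for the benchmark call; emits a FutureWarning."""
    warnings.warn('example deprecation', FutureWarning)
    return 42


value, messages = collect_future_warnings(noisy_stand_in)

# With SDGym installed, the reproduction above could be wrapped the same way:
# result, messages = collect_future_warnings(
#     benchmark_single_table,
#     synthesizers=['GaussianCopulaSynthesizer'],
#     sdv_datasets=['child'],
#     timeout=4,
# )
```

An empty `messages` list after a timed-out run would suggest the warning is no longer emitted in the calling process.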

Labels: bug
