feat(rust/sedona-raster-gdal): add RS_AsRaster for rasterizing geometries #956

Open
Kontinuation wants to merge 4 commits into apache:main from Kontinuation:pr-i-rs-as-raster

Kontinuation (Member) commented Jun 14, 2026

Summary

  • add the GDAL-backed rs_asraster raster function
  • normalize the SQL name to rs_asraster
  • add SQL reference docs and a dedicated Criterion benchmark

Dependency

  • Independent branch based on main

Testing

  • cargo clippy -p sedona-raster-gdal -p sedona --all-targets -- -D warnings
  • cargo bench -p sedona-raster-gdal --bench rs_as_raster --no-run
  • cargo test -p sedona-raster-gdal
    • blocked in this environment by existing missing raster fixture files in other tests
  • PATH="/home/kontinuation/workspace/github/apache/sedona-db/.venv/bin:$PATH" quarto render docs/reference/sql/rs_asraster.qmd
    • front matter validated; example execution is currently blocked by an existing sedonadb._lib Python import mismatch in this environment

github-actions bot requested a review from paleolimbot on June 14, 2026 08:18
Kontinuation force-pushed the pr-i-rs-as-raster branch 2 times, most recently from 07fd0b7 to 0fbfcce, on June 17, 2026 13:45
Kontinuation changed the title from "feat(rust/sedona-raster-gdal): add rs_asraster" to "feat(rust/sedona-raster-gdal): add RS_AsRaster for rasterizing geometries" on Jun 20, 2026
Kontinuation marked this pull request as ready for review on June 22, 2026 02:12

paleolimbot (Member) left a comment

Thank you!

The tests are pretty light given the complexity of the functionality. We now have some Python utilities for raster input and output that can make a wider variety of tests more readable and/or let us compare results against something like rasterio to ensure correctness.

Like the other raster functions, I'm a little concerned about doing something expensive/potentially non-cancellable in a synchronous UDF; however, we need some working + correct functions before we can evaluate alternatives 🙂

RS_Width(
  RS_AsRaster(
    ST_GeomFromWKT('POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))'),
    RS_FromPath('../../../submodules/sedona-testing/data/raster/test4.tiff'),

Can this example be made user-runnable using a URL (it can be a URL to the tiff in the sedona-testing repo)?

Comment on lines +22 to +29
- returns: raster
  args:
    - name: geometry
      type: geometry
    - name: reference_raster
      type: raster
    - name: pixel_type
      type: string

I don't think we should solve this now but these signatures result in pretty much unreadable SQL (we can probably improve this with some unified "options" handling in the future using struct arrays or JSON).

Comment on lines +34 to +49
fn base_raster() -> arrow_array::StructArray {
    with_global_gdal(|gdal| {
        let driver = gdal.get_driver_by_name("MEM").unwrap();
        let dataset = driver.create_with_band_type::<u8>("", 4, 3, 1).unwrap();
        dataset
            .set_geo_transform(&[10.0, 2.0, 0.0, 20.0, 0.0, -2.0])
            .unwrap();
        dataset.set_projection("EPSG:4326").unwrap();
        let band = dataset.rasterband(1).unwrap();
        band.set_no_data_value(Some(0.0)).unwrap();
        let mut buffer = Buffer::new((4, 3), vec![0u8; 12]);
        band.write((0, 0), (4, 3), &mut buffer).unwrap();
        sedona_raster_gdal::dataset_to_indb_raster(&dataset).unwrap()
    })
    .unwrap()
}

Can we use the RasterSpec to generate this a little more readably?

use crate::gdal_common::{band_data_type_to_gdal, nodata_f64_to_bytes, with_gdal};
use crate::gdal_dataset_provider::{configure_thread_local_options, thread_local_provider};

pub fn rs_as_raster_udf() -> SedonaScalarUDF {

Not for now, but this seems like it would be most effective as an aggregate function (ST_Collect + RS_AsRaster is a workaround, but probably a much more expensive one).

Comment on lines +239 to +255
fn parse_pixel_type(value: &str) -> Result<BandDataType> {
    match value.trim().to_ascii_uppercase().as_str() {
        "D" => Ok(BandDataType::Float64),
        "F" => Ok(BandDataType::Float32),
        "I" => Ok(BandDataType::Int32),
        "S" => Ok(BandDataType::Int16),
        "US" => Ok(BandDataType::UInt16),
        "B" => Ok(BandDataType::UInt8),
        "I8" | "INT8" => Ok(BandDataType::Int8),
        "U64" | "UINT64" => Ok(BandDataType::UInt64),
        "I64" | "INT64" => Ok(BandDataType::Int64),
        other => exec_err!(
            "Unsupported pixelType: {} (expected one of D, F, I, S, US, B, I8, U64, I64)",
            other
        ),
    }
}

These are pretty cryptic...can we also accept the strings we use elsewhere (int8, uint8, etc.)?
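
One way the suggestion above could look, as a minimal sketch rather than the PR's actual code: each short code gains a descriptive alias in the same match arm. The `BandDataType` enum below is a simplified stand-in for the crate's type, the error type is a plain `String` instead of a DataFusion error, and the descriptive spellings other than `int8` and `uint8` (for example `float64`) are assumed names, not ones confirmed in this thread.

```rust
/// Simplified stand-in for the crate's band data type enum (assumption, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BandDataType {
    UInt8,
    Int8,
    UInt16,
    Int16,
    Int32,
    Int64,
    UInt64,
    Float32,
    Float64,
}

/// Parse a pixel type, accepting both the short codes and descriptive names.
/// Matching is case-insensitive and ignores surrounding whitespace.
pub fn parse_pixel_type(value: &str) -> Result<BandDataType, String> {
    match value.trim().to_ascii_uppercase().as_str() {
        "D" | "FLOAT64" => Ok(BandDataType::Float64),
        "F" | "FLOAT32" => Ok(BandDataType::Float32),
        "I" | "INT32" => Ok(BandDataType::Int32),
        "S" | "INT16" => Ok(BandDataType::Int16),
        "US" | "UINT16" => Ok(BandDataType::UInt16),
        "B" | "UINT8" => Ok(BandDataType::UInt8),
        "I8" | "INT8" => Ok(BandDataType::Int8),
        "U64" | "UINT64" => Ok(BandDataType::UInt64),
        "I64" | "INT64" => Ok(BandDataType::Int64),
        other => Err(format!(
            "Unsupported pixelType: {other} (expected a short code such as D, F, I, S, US, B \
             or a descriptive name such as uint8, int32, float64)"
        )),
    }
}
```

Keeping the short codes alongside the descriptive names means existing SQL that passes `'B'` or `'D'` would keep working.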

Comment on lines +291 to +297
let start_col = ((env.MinX - ulx) / scale_x).floor() as isize;
let end_col_excl = ((env.MaxX - ulx) / scale_x).ceil() as isize;
let start_row = ((env.MaxY - uly) / scale_y).floor() as isize;
let end_row_excl = ((env.MinY - uly) / scale_y).ceil() as isize;

let width = (end_col_excl - start_col).max(0) as usize;
let height = (end_row_excl - start_row).max(0) as usize;

Should these use a checked cast? I may be reading this incorrectly but it is probably possible to accidentally pass a huge geometry and a very small reference raster and overflow an isize.
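
For context, Rust's `as` cast from a float to an integer saturates at the integer's bounds and maps NaN to 0, so the cast itself does not wrap; the risk is a silently clamped index followed by a subtraction that can overflow or a width that is absurdly large. A minimal sketch of a checked alternative follows. The helper names (`checked_index`, `checked_extent`, `column_window`) and the `max_cells` cap are hypothetical and not part of the PR.

```rust
/// Convert a floating-point pixel index to `i64`, rejecting NaN, infinities,
/// and magnitudes beyond 2^53 (the range where every integer is exact in f64).
fn checked_index(value: f64) -> Result<i64, String> {
    const LIMIT: f64 = 9_007_199_254_740_992.0; // 2^53
    if !value.is_finite() || value.abs() > LIMIT {
        return Err(format!("pixel index {value} is out of range"));
    }
    Ok(value as i64)
}

/// Number of cells in `[start, end_excl)`, clamped at zero, with an upper cap.
fn checked_extent(start: i64, end_excl: i64, max_cells: usize) -> Result<usize, String> {
    let span = end_excl
        .checked_sub(start)
        .ok_or_else(|| "extent overflows i64".to_string())?;
    let span = usize::try_from(span.max(0))
        .map_err(|_| "extent does not fit in usize".to_string())?;
    if span > max_cells {
        return Err(format!("extent of {span} cells exceeds the limit of {max_cells}"));
    }
    Ok(span)
}

/// Compute the starting column and width of the window covering `[min_x, max_x]`
/// on a grid whose upper-left x is `ulx` and whose pixel width is `scale_x`.
pub fn column_window(
    min_x: f64,
    max_x: f64,
    ulx: f64,
    scale_x: f64,
    max_cells: usize,
) -> Result<(i64, usize), String> {
    let start = checked_index(((min_x - ulx) / scale_x).floor())?;
    let end_excl = checked_index(((max_x - ulx) / scale_x).ceil())?;
    let width = checked_extent(start, end_excl, max_cells)?;
    Ok((start, width))
}
```

The same pattern would apply to rows. A huge geometry combined with a tiny pixel size then produces an error instead of a clamped window.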

Comment on lines +420 to +421
BandDataType::UInt64 => initialize_band_t::<u64>(dataset, width, height, init_value as u64),
BandDataType::Int64 => initialize_band_t::<i64>(dataset, width, height, init_value as i64),

These two could generate potentially lossy values. It may be worth casting the nodata/init value based on the requested data type (or erroring for now until that can be supported).
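
A minimal sketch of the "error for now" option, assuming the init value arrives as an `f64` as in the excerpt above: convert only when the value is a whole number inside the target range, and return an error otherwise. The function names are hypothetical and the error type is a plain `String` for brevity.

```rust
/// Convert an `f64` to `u64` only if the value is a whole number within range.
pub fn f64_to_u64_exact(value: f64) -> Result<u64, String> {
    const U64_END: f64 = 18_446_744_073_709_551_616.0; // 2^64, first value that does not fit
    if value.is_finite() && value.fract() == 0.0 && value >= 0.0 && value < U64_END {
        Ok(value as u64)
    } else {
        Err(format!("{value} cannot be represented exactly as u64"))
    }
}

/// Convert an `f64` to `i64` only if the value is a whole number within range.
pub fn f64_to_i64_exact(value: f64) -> Result<i64, String> {
    const I64_END: f64 = 9_223_372_036_854_775_808.0; // 2^63, first value that does not fit
    if value.is_finite() && value.fract() == 0.0 && value >= -I64_END && value < I64_END {
        Ok(value as i64)
    } else {
        Err(format!("{value} cannot be represented exactly as i64"))
    }
}
```

This only guards the final cast. An integer above 2^53 may already have been rounded when it was first stored as an `f64`, which is why carrying the value in the requested data type end to end is the more complete fix.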

Ok(())
}

fn calc_num_iterations(args: &[ColumnarValue]) -> usize {

This is fine for now but we should improve our pattern for iterating over pairs of raster(s)/geometry(ies) without expanding scalars.
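
To illustrate the idea of pairing arguments without expanding scalars, here is a small sketch under heavy simplification. `Arg` is a hypothetical stand-in for a columnar value (it is not DataFusion's `ColumnarValue`), and `zip_args` is an invented helper; a scalar is borrowed once per row instead of being materialized into an array.

```rust
/// Hypothetical stand-in for a columnar argument: one value shared by all rows,
/// or one value per row.
pub enum Arg<T> {
    Scalar(T),
    Array(Vec<T>),
}

impl<T> Arg<T> {
    /// Borrow the value for `row`; a scalar returns the same reference every time.
    fn get(&self, row: usize) -> &T {
        match self {
            Arg::Scalar(v) => v,
            Arg::Array(vs) => &vs[row],
        }
    }

    /// Row count for arrays, `None` for scalars.
    fn row_count(&self) -> Option<usize> {
        match self {
            Arg::Scalar(_) => None,
            Arg::Array(vs) => Some(vs.len()),
        }
    }
}

/// Iterate over pairs of values without copying a scalar once per row.
/// Two scalars yield a single pair; mismatched array lengths are an error.
pub fn zip_args<'a, A, B>(
    a: &'a Arg<A>,
    b: &'a Arg<B>,
) -> Result<impl Iterator<Item = (&'a A, &'a B)> + 'a, String> {
    let n = match (a.row_count(), b.row_count()) {
        (Some(x), Some(y)) if x != y => {
            return Err(format!("length mismatch: {x} vs {y}"));
        }
        (Some(x), _) | (_, Some(x)) => x,
        (None, None) => 1,
    };
    Ok((0..n).map(move |i| (a.get(i), b.get(i))))
}
```

With a per-row geometry argument and a scalar reference raster, the raster would be borrowed for every row rather than repeated in memory.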

Comment on lines +546 to +556
fn wkb_from_wkt(gdal: &Gdal, wkt: &str) -> Result<Vec<u8>> {
    let geom = gdal.geometry_from_wkt(wkt).unwrap();
    geom.wkb().map_err(|e| exec_datafusion_err!("{e}"))
}

fn bytes_to_f64_vec(bytes: &[u8]) -> Vec<f64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| f64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

I have seen both of these functions before. Can we use the previous versions of them?
