feat(rust/sedona-raster-gdal): add RS_AsRaster for rasterizing geometries#956
feat(rust/sedona-raster-gdal): add RS_AsRaster for rasterizing geometries#956Kontinuation wants to merge 4 commits into
Conversation
07fd0b7 to
0fbfcce
Compare
0fbfcce to
3ec13dc
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you!
The tests here are pretty light given the complexity of the functionality here. We now have some Python utilities for input and output of rasters that can make a wider variety of tests more readable and/or compare them to something like rasterio to ensure correctness.
Like the other raster functions, I'm a little concerned about doing something expensive/potentially non-cancellable in a synchronous UDF; however, we need some working + correct functions before we can evaluate alternatives 🙂
| RS_Width( | ||
| RS_AsRaster( | ||
| ST_GeomFromWKT('POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))'), | ||
| RS_FromPath('../../../submodules/sedona-testing/data/raster/test4.tiff'), |
There was a problem hiding this comment.
Can this example be user-runnable using a URL (can be URL to the sedona-testing repo tiff)
| - returns: raster | ||
| args: | ||
| - name: geometry | ||
| type: geometry | ||
| - name: reference_raster | ||
| type: raster | ||
| - name: pixel_type | ||
| type: string |
There was a problem hiding this comment.
I don't think we should solve this now but these signatures result in pretty much unreadable SQL (we can probably improve this with some unified "options" handling in the future using struct arrays or JSON).
| fn base_raster() -> arrow_array::StructArray { | ||
| with_global_gdal(|gdal| { | ||
| let driver = gdal.get_driver_by_name("MEM").unwrap(); | ||
| let dataset = driver.create_with_band_type::<u8>("", 4, 3, 1).unwrap(); | ||
| dataset | ||
| .set_geo_transform(&[10.0, 2.0, 0.0, 20.0, 0.0, -2.0]) | ||
| .unwrap(); | ||
| dataset.set_projection("EPSG:4326").unwrap(); | ||
| let band = dataset.rasterband(1).unwrap(); | ||
| band.set_no_data_value(Some(0.0)).unwrap(); | ||
| let mut buffer = Buffer::new((4, 3), vec![0u8; 12]); | ||
| band.write((0, 0), (4, 3), &mut buffer).unwrap(); | ||
| sedona_raster_gdal::dataset_to_indb_raster(&dataset).unwrap() | ||
| }) | ||
| .unwrap() | ||
| } |
There was a problem hiding this comment.
Can we use the RasterSpec to generate this a little more readably?
| use crate::gdal_common::{band_data_type_to_gdal, nodata_f64_to_bytes, with_gdal}; | ||
| use crate::gdal_dataset_provider::{configure_thread_local_options, thread_local_provider}; | ||
|
|
||
| pub fn rs_as_raster_udf() -> SedonaScalarUDF { |
There was a problem hiding this comment.
Not for now, but this seems like it would be most effective as an aggregate function (ST_Collect + AsRaster is a workaround but probably much more intensive)
| fn parse_pixel_type(value: &str) -> Result<BandDataType> { | ||
| match value.trim().to_ascii_uppercase().as_str() { | ||
| "D" => Ok(BandDataType::Float64), | ||
| "F" => Ok(BandDataType::Float32), | ||
| "I" => Ok(BandDataType::Int32), | ||
| "S" => Ok(BandDataType::Int16), | ||
| "US" => Ok(BandDataType::UInt16), | ||
| "B" => Ok(BandDataType::UInt8), | ||
| "I8" | "INT8" => Ok(BandDataType::Int8), | ||
| "U64" | "UINT64" => Ok(BandDataType::UInt64), | ||
| "I64" | "INT64" => Ok(BandDataType::Int64), | ||
| other => exec_err!( | ||
| "Unsupported pixelType: {} (expected one of D, F, I, S, US, B, I8, U64, I64)", | ||
| other | ||
| ), | ||
| } | ||
| } |
There was a problem hiding this comment.
These are pretty cryptic...can we also accept the strings we use elsewhere (int8, uint8, etc.)?
| let start_col = ((env.MinX - ulx) / scale_x).floor() as isize; | ||
| let end_col_excl = ((env.MaxX - ulx) / scale_x).ceil() as isize; | ||
| let start_row = ((env.MaxY - uly) / scale_y).floor() as isize; | ||
| let end_row_excl = ((env.MinY - uly) / scale_y).ceil() as isize; | ||
|
|
||
| let width = (end_col_excl - start_col).max(0) as usize; | ||
| let height = (end_row_excl - start_row).max(0) as usize; |
There was a problem hiding this comment.
Should these use a checked cast? I may be reading this incorrectly but it is probably possible to accidentally pass a huge geometry and a very small reference raster and overflow an isize.
| BandDataType::UInt64 => initialize_band_t::<u64>(dataset, width, height, init_value as u64), | ||
| BandDataType::Int64 => initialize_band_t::<i64>(dataset, width, height, init_value as i64), |
There was a problem hiding this comment.
These two could generate potentially lossy values. It may be worth casting the nodata/init value based on the requested data type (or erroring for now until that can be supported).
| Ok(()) | ||
| } | ||
|
|
||
| fn calc_num_iterations(args: &[ColumnarValue]) -> usize { |
There was a problem hiding this comment.
This is fine for now but we should improve our pattern for iterating over pairs of raster(s)/geometry(ies) without expanding scalars.
| fn wkb_from_wkt(gdal: &Gdal, wkt: &str) -> Result<Vec<u8>> { | ||
| let geom = gdal.geometry_from_wkt(wkt).unwrap(); | ||
| geom.wkb().map_err(|e| exec_datafusion_err!("{e}")) | ||
| } | ||
|
|
||
| fn bytes_to_f64_vec(bytes: &[u8]) -> Vec<f64> { | ||
| bytes | ||
| .chunks_exact(8) | ||
| .map(|chunk| f64::from_le_bytes(chunk.try_into().unwrap())) | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
I have seen both of these functions before. Can we use the previous versions of them?
Summary
rs_asrasterraster functionrs_asrasterDependency
mainTesting
cargo clippy -p sedona-raster-gdal -p sedona --all-targets -- -D warningscargo bench -p sedona-raster-gdal --bench rs_as_raster --no-runcargo test -p sedona-raster-gdalPATH="/home/kontinuation/workspace/github/apache/sedona-db/.venv/bin:$PATH" quarto render docs/reference/sql/rs_asraster.qmdsedonadb._libPython import mismatch in this environment