Version 2.0 development #476


Draft: wants to merge 27 commits into main

Conversation

@leouieda (Member) commented Nov 29, 2024

This is a PR to keep track of development in the 2.0 branch. It won't be merged through this PR when 2.0 is ready.

PRs that add things meant for 2.0 should be made against this branch and not main. This includes documentation overhauls and breaking changes.

The docs for this branch are deployed to https://www.fatiando.org/verde/dev2.0

See #331 and the 2.0 milestone

DO NOT MERGE THIS PR. The default "squash and merge" is no good here; we'll have to merge this branch in a way that preserves its history.

@leouieda added this to the v2.0.0 milestone Nov 29, 2024
@santisoler (Member) commented:

Awesome @leouieda! I configured a branch protection rule for 2.0-dev that forbids pushing directly to it (a PR must be opened) and requires branches to be up to date before merging into 2.0-dev.

leouieda and others added 17 commits May 8, 2025 10:47

Add the first tutorial of the new documentation structure (see #433). It tries to be as simple as possible about how to generate a grid from some data: no cross-validation or other fancy things if it can be avoided.

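As a rough sketch of the kind of minimal workflow such a tutorial could cover (using the 1.x-style `Spline` API; names and defaults may change in 2.0):

```python
import numpy as np
import verde as vd

# Synthetic scattered data, just for illustration.
rng = np.random.default_rng(42)
coordinates = (rng.uniform(0, 10, 300), rng.uniform(-5, 5, 300))
data = np.sin(coordinates[0]) * np.cos(coordinates[1])

# Fit an interpolator to the data and generate a regular grid from it.
spline = vd.Spline()
spline.fit(coordinates, data)
grid = spline.grid(spacing=0.5)
```
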
Talk about how to handle map projections using pyproj and the ways Verde can generate grids and profiles using them.

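For instance, a sketch of the projection workflow (the `projection` argument to `grid` exists in Verde 1.x; the 2.0 interface may differ):

```python
import numpy as np
import pyproj
import verde as vd

# Synthetic geographic data for illustration.
rng = np.random.default_rng(0)
longitude = rng.uniform(-60, -50, 300)
latitude = rng.uniform(-30, -20, 300)
data = np.cos(np.radians(latitude))

# A Mercator projection converts geographic to Cartesian coordinates.
projection = pyproj.Proj(proj="merc", lat_ts=-25)
easting, northing = projection(longitude, latitude)

# Fit on projected coordinates, then pass the projection to grid() so
# the output grid is generated in geographic coordinates.
spline = vd.Spline().fit((easting, northing), data)
grid = spline.grid(spacing=0.5, projection=projection)
```
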
Create a "dev2" folder in the gh-pages branch for the 2.0-dev branch
docs. This is useful since a lot of the work on 2.0 is a complete
rewrite of the docs. Need to undo this before the 2.0 release.
The datasets module was deprecated and replaced by the Ensaio package. Remove the data folder and the datasets module from Verde, along with the gallery and old tutorial which used the removed module. Remove the tests and baseline images used to test the datasets, and drop pytest-mpl as a testing requirement.

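The replacement workflow looks roughly like this, assuming one of Ensaio's fetch functions (`fetch_southern_africa_gravity` is used here as an example; the exact names are Ensaio's, not Verde's):

```python
import ensaio
import pandas as pd

# Ensaio downloads the sample data and caches it locally,
# returning the path to the downloaded file.
fname = ensaio.fetch_southern_africa_gravity(version=1)
data = pd.read_csv(fname)
print(data.head())
```
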
The ScipyGridder class was deprecated and replaced by Linear, Cubic, and KNearest. Remove it from the codebase along with its tests and docs.

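A sketch of the migration, assuming the replacements keep the standard gridder interface (in released Verde the classes are `Linear`, `Cubic`, and `KNeighbors`):

```python
import numpy as np
import verde as vd

rng = np.random.default_rng(0)
coordinates = (rng.uniform(0, 10, 100), rng.uniform(0, 10, 100))
data = coordinates[0] + coordinates[1]

# Before (deprecated): gridder = vd.ScipyGridder(method="linear")
# After: dedicated classes with the same fit/grid interface.
gridder = vd.Linear()
gridder.fit(coordinates, data)
grid = gridder.grid(spacing=1)
```
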
The client argument was deprecated in favor of Dask's delayed interface. Remove it from the few places where it was still present.

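For reference, a sketch of the Dask-based pattern that replaces it (the `delayed` argument to `cross_val_score` is in Verde 1.x and is assumed to carry over):

```python
import dask
import numpy as np
import verde as vd

rng = np.random.default_rng(0)
coordinates = (rng.uniform(0, 10, 200), rng.uniform(0, 10, 200))
data = np.hypot(*coordinates)

# Instead of passing a dask.distributed Client, get lazy results
# and compute them with whichever scheduler is active.
scores = vd.cross_val_score(vd.Spline(), coordinates, data, delayed=True)
scores = dask.compute(*scores)
```
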
Numba will be required from 2.0 on, so the slower numpy engine isn't needed anymore. Remove the engine argument from Spline and VectorSpline2D, remove the utils functions that are no longer used, and remove the numpy-based code from both classes.

We're using numba's parallel execution in Spline and VectorSpline2D and it's great. But it turns out that using it in a `ThreadPoolExecutor` (which dask uses) when numba was installed with pip causes a crash. This is because the threading backend that pip-installed numba uses on Mac (called "workqueue") is not thread-safe and aborts the process when it's run inside another threading layer. There are workarounds in the numbagg package but they're complicated and involve caching of jit-compiled functions. We'll use a simpler solution: allow users to turn off parallelism and add a note in the docs about avoiding it when it would cause a crash.

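A minimal sketch of what that opt-out could look like, assuming a hypothetical `parallel` keyword on `Spline` (the actual 2.0 argument name may differ):

```python
import verde as vd

# Hypothetical flag: disable numba's parallel execution when fitting
# inside another threading layer (like dask's ThreadPoolExecutor) to
# avoid aborts from pip's "workqueue" threading backend on Mac.
spline = vd.Spline(parallel=False)
```
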
Use caching of pip and conda as much as possible. Use pip instead of conda for the tests now that the datasets module was removed. Remove the unused dependencies sphinx-gallery, cartopy, and pooch. Bump the minimum version of Python to 3.10 and update dependencies.

Co-authored-by: Santiago Soler <[email protected]>

It's no longer needed since we found a workaround for the case where the distance is 0. It was already deprecated and can now be removed.

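A sketch of handling the zero-distance case directly, assuming the biharmonic spline Green's function r²(ln r − 1), whose limit as r → 0 is 0 (not necessarily the exact code in this branch):

```python
import numpy as np

def greens_function(distance):
    # r**2 * (log(r) - 1) tends to 0 as r -> 0, so set those entries
    # explicitly instead of adding a small "fudge" to the distances.
    result = np.zeros_like(distance, dtype=float)
    nonzero = distance > 0
    r = distance[nonzero]
    result[nonzero] = r**2 * (np.log(r) - 1)
    return result
```
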
The newer KDTree in SciPy is already very fast and the difference isn't as big anymore. It's not worth the effort of maintaining an optional dependency in terms of CI runs and code complexity. Since Verde doesn't have optional dependencies now, we don't need the extra CI run.

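The SciPy replacement is straightforward to adopt, along these lines:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(1000, 2))

# Build the tree once, then query the 3 nearest neighbors of each point.
tree = KDTree(points)
distances, indices = tree.query(points, k=3)
```
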
This function was deprecated and should be removed in 2.0. The tests can easily be run with pytest and don't need it.

On second thought, that's not really necessary since the R² is good in most cases. If RMSE is wanted, we can still get it through SplineCV and cross_val_score.

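For example, a sketch using the scikit-learn style `scoring` argument (which Verde's `cross_val_score` accepts in 1.x and is assumed to carry over):

```python
import numpy as np
import verde as vd

rng = np.random.default_rng(5)
coordinates = (rng.uniform(0, 10, 200), rng.uniform(0, 10, 200))
data = np.hypot(*coordinates)

# Score with RMSE instead of the default R².
scores = vd.cross_val_score(
    vd.Spline(), coordinates, data, scoring="neg_root_mean_squared_error"
)
```
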
It was missing from the docstring.

We never want to predict on random points. And if we did, we could still do it with the predict method.

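If predicting on arbitrary scattered points is ever needed, the `predict` method covers it, along these lines:

```python
import numpy as np
import verde as vd

rng = np.random.default_rng(1)
coordinates = (rng.uniform(0, 10, 200), rng.uniform(0, 10, 200))
data = np.sin(coordinates[0])

spline = vd.Spline().fit(coordinates, data)

# Predict on any set of points, random or otherwise.
random_points = (rng.uniform(0, 10, 50), rng.uniform(0, 10, 50))
predicted = spline.predict(random_points)
```
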
For security, it's better to pin versions of third-party Actions to commit hashes. The tags used for versions can be changed to point to a different (possibly malicious) commit, while a hash can't. Dependabot can still update the pinned hashes, so maintenance isn't a huge issue.

This paper no longer describes current Verde and can be removed. The published version is on JOSS and archived in several places.

leouieda and others added 7 commits May 8, 2025 11:00

When nodes aren't given and are the same as the data coordinates, the Jacobian is symmetric and we can calculate only the upper or lower triangular part. In serial computation, exploiting this leads to a ~50% decrease in computation time. In parallel, numba does some kind of optimization that doesn't work on the symmetric implementation, and the symmetric code is actually a little bit slower than the naive implementation. So only use the new code when building the Jacobian in serial. This doesn't have a huge impact on overall performance but it's significant enough.

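A sketch of the idea (with a placeholder Green's function for illustration; the actual implementation in this branch will differ):

```python
import numba
import numpy as np

@numba.jit(nopython=True)
def greens_function(east, north):
    # Placeholder biharmonic spline Green's function.
    distance = np.sqrt(east**2 + north**2)
    if distance == 0:
        return 0.0
    return distance**2 * (np.log(distance) - 1)

@numba.jit(nopython=True)
def jacobian_symmetric(east, north, jacobian):
    # When forces coincide with the data points, jacobian[i, j] equals
    # jacobian[j, i], so fill only the upper triangle and mirror it,
    # roughly halving the work in serial execution.
    for i in range(east.size):
        for j in range(i, east.size):
            value = greens_function(east[i] - east[j], north[i] - north[j])
            jacobian[i, j] = value
            jacobian[j, i] = value
    return jacobian
```
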
Ditch the `setup.cfg` file and use only `pyproject.toml`. Move the flake8 configuration to its own `.flake8` file.

Co-authored-by: Leonardo Uieda <[email protected]>

The community has been moving toward putting the package in a src folder. This has some advantages: for example, we don't need to make a temp folder to run the tests in, since the module isn't importable from the repository root without being installed. Also move the tests to a folder outside of the package to avoid distributing them; they aren't used outside of development. This makes the tarballs and wheels a bit smaller, which is good.

Remove the `block_split` function, which moved to Bordado. Rename the `spacing` argument in `BlockReduce`, `BlockMean`, `BlockKFold`, `BlockShuffleSplit`, and `train_test_split` to `block_size`, and `shape` to `block_shape`. These are the argument names in `bordado.block_split` and they make much more sense. Update tests and examples to match and add Bordado as a dependency.

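From the user's side, the rename looks roughly like this (assuming `BlockReduce` keeps its 1.x `filter` interface here, only with the new argument name):

```python
import numpy as np
import verde as vd

rng = np.random.default_rng(2)
coordinates = (rng.uniform(0, 10, 500), rng.uniform(0, 10, 500))
data = rng.normal(size=500)

# 1.x: vd.BlockReduce(np.median, spacing=1)
# 2.0-dev: the argument is block_size instead of spacing.
reducer = vd.BlockReduce(np.median, block_size=1)
block_coordinates, block_data = reducer.filter(coordinates, data)
```
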
The `line_coordinates` function was moved to Bordado. Remove it from the codebase and use the Bordado version. Also remove the `spacing_to_size` function, which was only used by `line_coordinates`, along with the associated tests and the API entry.

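Usage then comes from Bordado directly, along these lines (assuming `bordado.line_coordinates` keeps the size/spacing interface of the Verde version):

```python
import bordado as bd

# Evenly spaced points along a line, by number of points...
values = bd.line_coordinates(0, 10, size=11)

# ...or by the spacing between points.
values = bd.line_coordinates(0, 10, spacing=2.5)
```
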
This file is created by setuptools_scm and shouldn't have been in the repository.

The combination of autosummary and numpydoc was causing warnings about missing stub files: numpydoc was building a toctree for methods, but the pages didn't exist (they are all in the class's main page). Disable this option and remove the autogenerated attribute pages, since we already describe the important attributes in the docstrings.

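The relevant knob is numpydoc's `numpydoc_class_members_toctree` option in the Sphinx `conf.py` (a real numpydoc setting, shown here as a sketch of the fix described above):

```python
# conf.py: stop numpydoc from building a toctree of class methods,
# which pointed to autosummary stub pages that don't exist.
numpydoc_class_members_toctree = False
```
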
leouieda added 3 commits May 14, 2025 10:34

The Chain class was introduced to help perform trend estimation and spline interpolation in a single step, preventing data leakage from the trend estimation step. In practice, it's rarely used since the trend ended up not being that necessary. The existence of Chain makes it necessary to have the `filter` method, which is awkward. It also forces BlockMean and BlockReduce to be classes with a single method (filter) instead of functions, as would be more natural. Removing Chain fixes all of these issues and won't have a large impact on usability. With Chain removed, the filter method of BaseGridder can also be removed.

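Without Chain, the same two-step workflow is a few explicit lines (a sketch using the 1.x `Trend` and `Spline` APIs):

```python
import numpy as np
import verde as vd

rng = np.random.default_rng(3)
coordinates = (rng.uniform(0, 10, 300), rng.uniform(0, 10, 300))
data = 2 * coordinates[0] + rng.normal(size=300)

# Instead of vd.Chain([("trend", vd.Trend(1)), ("spline", vd.Spline())]):
trend = vd.Trend(degree=1).fit(coordinates, data)
residuals = data - trend.predict(coordinates)
spline = vd.Spline().fit(coordinates, residuals)

# Predictions are the trend plus the interpolated residuals.
predicted = trend.predict(coordinates) + spline.predict(coordinates)
```
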
This avoids clashes with the Python standard library io module.

These are all of the `check_*` functions that were scattered throughout the code. Also take the opportunity to modernize them and combine some that could be made into a single function. The `check_data` and `check_coordinates` functions were combined into `check_tuple_of_arrays`. The `check_data_names` and `check_extra_coords_names` functions were combined into `check_names`. `check_fit_input` was given more functionality, like ravelling the input arrays and better checking of compatibility between data and weights. The option to unpack the data and weights arguments was removed, since we're moving towards having estimators always deal with multiple data components.

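As a rough sketch of what a combined check like `check_tuple_of_arrays` might do (hypothetical body; only the function name comes from the commit message):

```python
import numpy as np

def check_tuple_of_arrays(arrays, name):
    # Hypothetical sketch: convert the inputs to arrays and make sure
    # they all share the same shape, raising a helpful error if not.
    arrays = tuple(np.asarray(array) for array in arrays)
    shapes = {array.shape for array in arrays}
    if len(shapes) > 1:
        raise ValueError(f"Invalid {name}. Shapes must all match: {shapes}")
    return arrays
```
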