Version 2.0 development #476
Draft pull request: leouieda wants to merge 27 commits into main from 2.0-dev.
The first tutorial of the new documentation structure (see #433). It tries to be as simple as possible about how to generate a grid from some data, avoiding cross-validation and other fancy things where possible.
Talk about how to handle them using pyproj and the ways Verde can generate grids and profiles with it.
Create a "dev2" folder in the gh-pages branch for the 2.0-dev branch docs. This is useful since a lot of the work on 2.0 is a complete rewrite of the docs. Need to undo this before the 2.0 release.
The datasets module was deprecated and replaced by the Ensaio package. Remove the data folder and the datasets module from Verde. Remove the gallery and the old tutorial that used the removed module. Remove the tests and baseline images used to test datasets. Remove pytest-mpl as a testing requirement.
This class was deprecated and replaced by Linear, Cubic, and KNearest. Remove it from the codebase along with tests and docs.
The client argument was deprecated in favor of the delayed interface of Dask. Remove it now from the few places that it was present.
Numba will be required starting with 2.0, so the slower numpy engine isn't needed anymore. Remove the engine argument from Spline and VectorSpline2D. Remove utils functions that are no longer used. Remove the numpy-based code from both classes.
We're using numba's parallel execution in Spline and VectorSpline2D and it's great. But it turns out that using it inside a `ThreadPoolExecutor` (which dask uses) crashes when numba was installed with pip. This is because the threading backend that pip-installed numba uses on Mac (called "workqueue") is not thread-safe and aborts the process when it runs inside another threading layer. There are workarounds in the numbagg package but they're complicated and involve caching of jit-compiled functions. We'll use a simpler solution: allow users to turn off parallelism, and add a note in the docs about avoiding it when it would cause a crash.
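As a point of reference, numba's threading layer and thread count can also be controlled from the environment, which is a simpler escape hatch than code changes (which layers are available depends on how numba was installed):

```shell
# Switch numba away from the non-thread-safe "workqueue" layer
# (the "omp" and "tbb" layers require the matching package installed).
export NUMBA_THREADING_LAYER=omp
# Alternatively, cap numba at one thread so the nested threading
# layer is never exercised.
export NUMBA_NUM_THREADS=1
```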
Use caching of pip and conda as much as possible. Use pip instead of conda for the tests now that the datasets module was removed. Remove unused dependencies sphinx-gallery, cartopy, and pooch. Bump minimum version of Python to 3.10 and update dependencies. Co-authored-by: Santiago Soler <[email protected]>
It's no longer needed since we found a workaround for the case where the distance is 0. It had been deprecated and can now be removed.
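The zero-distance workaround amounts to masking instead of fudging the distances. A minimal sketch, assuming a thin-plate-style Green's function r²(log(r) − 1) (the function name is illustrative, not Verde's actual internal API):

```python
import numpy as np


def greens_function(distance):
    """Biharmonic spline Green's function with the r == 0 case handled.

    The limit of r**2 * (log(r) - 1) as r -> 0 is 0, so set the zero
    entries directly instead of adding a small fudge value to distances.
    """
    distance = np.asarray(distance, dtype=float)
    result = np.zeros_like(distance)
    nonzero = distance > 0
    r = distance[nonzero]
    result[nonzero] = r**2 * (np.log(r) - 1)
    return result
```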
The newer KDTree in Scipy is already very fast and the difference isn't as big. It's not worth the effort of maintaining an optional dependency in terms of CI runs and code complexity. Since Verde doesn't have optional dependencies now, we don't need the extra run of the CI.
This function was deprecated and should be removed in 2.0. The tests can easily be run with pytest and don't need it.
On second thought, that's not really necessary as the R² is good in most cases. If RMSE is wanted, we can still do it on the SplineCV and cross_val_score.
It was missing from the docstring.
We never want to predict on random points. And if we did, we could still do this with the predict method.
For security, it's better to pin versions of third-party Actions to commit hashes. This is because the tags used for versions can be changed to point to a different (possibly malicious) commit while the hash can't. Dependabot can still update using the hashes so it's not a huge issue.
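In a workflow file the pinning looks like this (the hash below is illustrative; use the hash of the exact release you audited):

```yaml
# Before: a mutable tag that could later be re-pointed at a different commit.
- uses: actions/checkout@v4
# After: pinned to an immutable commit hash; keep the tag as a comment
# so humans (and Dependabot) know which version it corresponds to.
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
```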
This paper no longer describes current Verde and can be removed. The published version is on JOSS and archived in several places.
When nodes aren't given and are the same as the data coordinates, the Jacobian is symmetric and we can calculate only the upper or lower triangular part. In serial computation, exploiting this leads to a ~50% decrease in computation time. In parallel, numba does some kind of optimization that doesn't work on the symmetric implementation, and the symmetric code is actually a little slower than the naive implementation. So only use the new code when building the Jacobian in serial. This doesn't have a huge impact on overall performance but it's significant enough.
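A sketch of the serial symmetric build, assuming a generic Green's function callable (names are illustrative, not Verde's internals):

```python
import numpy as np


def jacobian_symmetric(east, north, greens_function):
    """Build the Jacobian exploiting symmetry when nodes == data points.

    Only the upper triangle is computed; the lower triangle is mirrored,
    roughly halving the work in serial execution.
    """
    n = east.size
    jacobian = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            distance = np.hypot(east[i] - east[j], north[i] - north[j])
            jacobian[i, j] = greens_function(distance)
            jacobian[j, i] = jacobian[i, j]
    return jacobian
```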
Ditch the `setup.cfg` file and use only `pyproject.toml`. Move flake8 configurations to their own `.flake8` file. Co-authored-by: Leonardo Uieda <[email protected]>
The community has been moving toward putting the package in a src folder. This has some advantages: for example, we don't need to make a temp folder to run the tests in, since the module isn't importable from the repository root without being installed. Also move the tests to a folder outside of the package to avoid distributing them. The tests aren't needed or used outside of development, and leaving them out makes the tarballs and wheels a bit smaller, which is good.
Remove the `block_split` function, which moved to Bordado. Rename the `spacing` arguments in `BlockReduce`, `BlockMean`, `BlockKFold`, `BlockShuffleSplit`, and `train_test_split` to `block_size`, and `shape` to `block_shape`. These are the new argument names in `bordado.block_split` and they make much more sense. Update tests and examples to match. Add Bordado as a dependency.
This function was moved to Bordado. Remove it from the codebase and use the Bordado version. Also removed the `spacing_to_size` function which was only used by `line_coordinates`. Remove the associated tests and the API entry.
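For context, the behavior that moved to Bordado can be sketched like this (a simplified stand-in, not bordado's actual implementation):

```python
import numpy as np


def line_coordinates(start, stop, size=None, spacing=None):
    """Evenly spaced points along a line, from either a size or a spacing."""
    if (size is None) == (spacing is None):
        raise ValueError("Provide exactly one of 'size' or 'spacing'.")
    if spacing is not None:
        # The old spacing_to_size logic: round to the nearest number of
        # intervals that covers [start, stop].
        size = int(round((stop - start) / spacing)) + 1
    return np.linspace(start, stop, size)
```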
This is created by setuptools_scm and shouldn't have been in the repository.
The combination of autosummary and numpydoc was causing warnings about missing stub files because numpydoc was building a toctree for methods but the pages didn't exist (they are all in the main page). Disable this option and remove the attribute autogenerated pages since we already describe important attributes in the docstring.
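The relevant switch is numpydoc's class-members toctree option in Sphinx's `conf.py`:

```python
# conf.py: don't generate a toctree of per-method stub pages, since the
# methods are all documented on the class page itself.
numpydoc_class_members_toctree = False
```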
The Chain class was introduced to help perform trend estimation and spline interpolation in a single step. This was to help prevent data leakage from the trend estimation step. In practice, this is never really used since the trend ended up not being that necessary. The existence of Chain makes it necessary to have the `filter` method, which is awkward. It also makes it necessary for BlockMean and BlockReduce to be classes with a single method (filter) instead of functions, as would be more natural. Removing the Chain fixes all of these issues and won't have a large impact on usability. With the Chain removed, the filter method of the BaseGridder can also be removed.
This avoids clashes with the Python standard library io module.
These are all of the `check_*` functions that are scattered throughout the code. Also take the opportunity to modernize them and combine some that could be made into a single function. Functions `check_data` and `check_coordinates` were combined into `check_tuple_of_arrays`. Function `check_data_names` and `check_extra_coords_names` were combined into `check_names`. Function `check_fit_input` was given more functionality, like ravelling the input arrays and better checking of compatibility between data and weights. The option to unpack data and weights arguments was removed since we're moving towards having estimators always deal with multiple data components.
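A minimal sketch of what a combined checker like `check_tuple_of_arrays` might do (illustrative only, not the actual implementation):

```python
import numpy as np


def check_tuple_of_arrays(arrays, name="coordinates"):
    """Convert inputs to arrays and require that they all share one shape."""
    arrays = tuple(np.asarray(array) for array in arrays)
    shapes = {array.shape for array in arrays}
    if len(shapes) > 1:
        raise ValueError(
            f"Invalid {name}. All arrays must have the same shape. "
            f"Got shapes: {sorted(shapes)}."
        )
    return arrays
```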
This is a PR to keep track of development in the 2.0 branch. It won't be merged through the PR interface when 2.0 is ready.
PRs that add things meant for 2.0 should be made against this branch and not main. This includes documentation overhauls and breaking changes.
The docs for this branch are deployed to https://www.fatiando.org/verde/dev2.0
See #331 and the 2.0 milestone
DO NOT MERGE THIS PR. The default of "squash and merge" is no good here. We'll have to merge this branch preserving the history.