[RELEASE] cuvs v24.08 #274

raydouglass · 2024-08-01T17:27:44Z

❄️ Code freeze for `branch-24.08` and v24.08 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.08 until release (merging of this PR).

What is the purpose of this PR?

Update documentation
Allow testing for the new release
Enable a means to merge branch-24.08 into main for the release

Forward-merge branch-24.06 into branch-24.08

Contributes to rapidsai/build-planning#31. Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Jake Awe (https://github.com/AyodeAwe) - Dante Gama Dessavre (https://github.com/dantegd) - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) URL: #145

Forward-merge branch-24.06 into branch-24.08

This change allows serializing to a std::ostream and deserializing from a std::istream. This also fixes some minor docstring issues in the C++ serialization APIs. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Divye Gala (https://github.com/divyegala) URL: #173
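The stream-based design can be illustrated with a toy round trip. This is a minimal Python sketch of the general idea only, not the cuVS C++ API: the `serialize` and `deserialize` functions and the byte layout (two `uint32` header fields followed by `float32` rows) are assumptions made up for this example.

```python
import io
import struct


def serialize(stream, rows):
    """Write a toy float matrix: n_rows and dim as uint32, then row-major float32 data."""
    dim = len(rows[0]) if rows else 0
    stream.write(struct.pack("<II", len(rows), dim))
    for row in rows:
        stream.write(struct.pack(f"<{dim}f", *row))


def deserialize(stream):
    """Read back a matrix written by serialize() from any binary stream."""
    n_rows, dim = struct.unpack("<II", stream.read(8))
    return [list(struct.unpack(f"<{dim}f", stream.read(4 * dim))) for _ in range(n_rows)]


# The same functions work for a file, a socket wrapper, or an in-memory buffer.
buffer = io.BytesIO()
serialize(buffer, [[1.0, 2.0], [3.5, -4.0]])
buffer.seek(0)
restored = deserialize(buffer)
```

Writing against a generic stream rather than a filename is what lets the caller decide where the bytes end up.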

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #175

This PR overhauls how `ops-codeowners` reviews are handled. `ops-codeowners` is replaced by `ci-codeowners` & `packaging-codeowners`. The coverage of files is expanded as well. Additionally, the process will change: reviews will be assigned to a member of the teams instead of a manual request to `ops-codeowners`. --------- Co-authored-by: Bradley Dice <bdice@bradleydice.com>

This PR removes text builds of the documentation, which we do not currently use for anything. Contributes to rapidsai/build-planning#71. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) - Jake Awe (https://github.com/AyodeAwe) URL: #180

Use raft's large workspace resource for large temporary allocations during ANN index build. This is the port of rapidsai/raft#2194, which didn't make it into raft before the algorithms were ported to cuVS. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #181

…wup (#185)

Contributes to rapidsai/build-planning#31

Contributes to rapidsai/dependency-file-generator#89

Since #145 was merged, we've made some small adjustments to the approach for `rapids-build-backend`. This catches `cuvs` up with those changes:

* consolidates version-handling in `ci/` scripts
* uses `--file-key` instead of `--file_key` in `rapids-dependency-file-generator` calls

Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #185

This renames the `build_index` method to `build` in the Python API for CAGRA. All of the other Python APIs use a `build` method for building the index, as do the C++ and Rust APIs. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #187

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #186

There is a bug in the current CAGRA graph rank-based neighbor reordering process. A low recall or illegal memory access can occur if there are many detourable nodes from a node to its neighbors, e.g. there is a small subgraph in the initial kNN graph. This PR fixes this. Authors: - tsuki (https://github.com/enp1s0) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #192

The Python library is distributed as conda package `cuvs`, but the installation docs say `pycuvs`. This fixes that. Looked for other uses like this:

```shell
git grep -i pycuvs
```

Didn't find any. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #193

Porting the ANN benchmarks from RAFT.

- [x] Make it build

Sanity check that benchmarks work (runs and gives reasonable recall for Deep-1M dataset):

- [x] cuVS brute force kNN
- [x] cuVS IVF-Flat
- [x] cuVS IVF-PQ (+ refinement)
- [x] cuVS CAGRA
- [x] cuVS CAGRA-Q (+ refinement)
- [x] Faiss GPU/CPU IVF-Flat & IVF-PQ
- [x] HNSW
- [x] CAGRA + HNSW
- [x] GGNN

NB: the indices built using the old ANN_BENCH in raft tend to crash in cuvs search benchmarks during index deserialization, so don't forget to build the indexes anew when testing.

Authors: - Artem M. Chirkin (https://github.com/achirkin) - Malte Förster (https://github.com/mfoerste4) - Tamas Bela Feher (https://github.com/tfeher) - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - James Lamb (https://github.com/jameslamb) - Corey J. Nolet (https://github.com/cjnolet) URL: #130

Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #203

Add an example project using the cuvs bindings uploaded to crates.io, as well as some basic instructions on how to compile it. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #206

This updates the notebook link from https://github.com/rapidsai/cuvs/tree/HEAD/exmples (doesn't exist) to https://github.com/rapidsai/cuvs/tree/branch-24.08/notebooks Authors: - Ray Bell (https://github.com/raybellwaves) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #191

Some minor fixes that are required to publish our rust bindings to crates.io:

* Using relative paths in the cuvs-sys cmake files didn't work; get around this by symlinking required files instead
* Need to specify an actual version for cuvs-sys and ndarray-rand packages in the rust/cuvs/Cargo.toml file

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Ray Douglass (https://github.com/raydouglass) URL: #207

Port from rapidsai/raft#2350 Authors: - Yinzuo Jiang (https://github.com/jiangyinzuo) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #202

With the deployment of rapids-build-backend, we need to make sure our dependencies have alpha specs. Contributes to rapidsai/build-planning#31 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #209

Contributes to rapidsai/build-planning#80 Adds constraints to avoid pulling in CMake 3.30.0, for the reasons described in that issue. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Ben Frederickson (https://github.com/benfred) - Bradley Dice (https://github.com/bdice) URL: #214

Usage of the CUDA math libraries is independent of the CUDA runtime. Make their static/shared status separately controllable. Contributes to rapidsai/build-planning#35 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Ben Frederickson (https://github.com/benfred) URL: #216

This PR introduces the new vector addition feature to CAGRA.

Rel: rapidsai/raft#1775
Original PR: rapidsai/raft#2157

CAGRA-Q is not supported.

## Usage

```cpp
auto additional_dataset = raft::make_host_matrix<float, int64_t>(res, updated_dataset_size, dim);
cuvs::neighbors::cagra::extend(res, raft::make_const_mdspan(additional_dataset.view()), cagra_index);
```

## Algorithm

Graph degree: d

The algorithm consists of two stages: rank-based reordering and reverse edge addition.

1. Rank-based reordering
   - 1-1. Obtain d' (=2d) nearest neighbor vectors (V) of a given new vector using the CAGRA search.
   - 1-2. Count the number of detourable edges using the result of step 1-1 and the neighbor list of the input index. Then we prune (3*d/2) edges in the same way as the CAGRA graph optimization. Through this operation, we decide d/2 neighbors.
2. Reverse edge addition
   - 2-1. Count the number of incoming edges for all nodes.
   - 2-2. Add d/2 reverse edges from the nodes added to the neighbor list in Step 1 by replacing a node with the new node. To prevent the connection to the replaced node from being lost, we add the replaced node to the neighbor list of the new node. This allows us to make a detour connection. The replaced node is the one with the largest number of incoming edges among the last d/2 nodes of the neighbor list, excluding nodes that would duplicate ones already in the neighbor list.

## Performance

In this experiment, we first split the dataset into two parts: the initial and the additional part. Then, we extend the CAGRA index built from the initial part to include the additional part.

![search-eval](https://github.com/rapidsai/raft/assets/12711693/0fbae9e5-defc-4263-9d34-176667fb3359)

The recall drop relative to the baseline grows as the number of added vectors increases. Therefore, rebuilding the CAGRA index is recommended when one wants to add a lot of vectors.

Authors: - tsuki (https://github.com/enp1s0) - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #151
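The rank-based reordering stage of the vector addition algorithm can be sketched on the CPU. This is a simplified Python illustration, not the cuVS implementation: the names `count_detours` and `prune_candidates` are hypothetical, and the sketch assumes that an edge from the new vector to candidate Y counts as detourable once for every closer candidate Z whose existing neighbor list already contains Y (the real kernel may apply additional rank conditions).

```python
def count_detours(candidates, graph):
    """For each candidate (sorted nearest first), count closer candidates that already link to it."""
    counts = []
    for j, y in enumerate(candidates):
        closer = candidates[:j]
        counts.append(sum(1 for z in closer if y in graph.get(z, ())))
    return counts


def prune_candidates(candidates, graph, keep):
    """Keep the `keep` candidates with the fewest detours; ties go to the nearer candidate."""
    counts = count_detours(candidates, graph)
    order = sorted(range(len(candidates)), key=lambda j: (counts[j], j))
    return [candidates[j] for j in order[:keep]]


# Graph degree d = 4, so the search returns 2d = 8 candidates and d/2 = 2 survive.
d = 4
existing_graph = {10: [11, 13], 11: [10]}            # hypothetical neighbor lists
candidates = [10, 11, 12, 13, 14, 15, 16, 17]        # nearest first
kept = prune_candidates(candidates, existing_graph, keep=d // 2)
print(kept)  # [10, 12]
```

Candidate 11 is nearer than 12 but is dropped, because it is already reachable through 10; this is the sense in which the pruning prefers edges that add new paths.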

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #220

After updating everything to CUDA 12.5.1, use `shared-workflows@branch-24.08` again. Contributes to rapidsai/build-planning#73 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #234

Attempting to pin the version of raft to a custom fork wasn't working, and it was still using the version installed by conda. Fix by mirroring the `CUML_RAFT_CLONE_ON_PIN` logic found in the cuml cmake files. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Divye Gala (https://github.com/divyegala) URL: #235

Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #229

A small change that reduces the number of arguments in one of the wrapper layers in the detail namespace of CAGRA. The goal is twofold: 1) Simplify the overly long signature of `select_and_run` (which has many instances) 2) Give access to all search parameters for future upgrades of the search kernel. This is to simplify the integration (and review) of the persistent kernel (#215). No performance or functional changes expected. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #227

rapidsai/raft#2346 introduced a breaking change in the API. This PR fixes the API usage. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #249

Contributes to rapidsai/build-planning#31

In short, RAPIDS DLFW builds want to produce wheels with unsuffixed dependencies, e.g. `cudf` depending on `rmm`, not `rmm-cu12`. This PR is part of a series across all of RAPIDS to try to support that type of build by setting up CUDA-suffixed and CUDA-unsuffixed dependency lists in `dependencies.yaml`. For more details, see:

* rapidsai/build-planning#31 (comment)
* rapidsai/cudf#16183

## Notes for Reviewers

### Why target 24.08?

This is targeting 24.08 because:

1. it should be very low-risk
2. getting these changes into 24.08 prevents the need to carry around patches for every library in DLFW builds using RAPIDS 24.08

Authors: - James Lamb (https://github.com/jameslamb) - Paul Taylor (https://github.com/trxcllnt) Approvers: - Bradley Dice (https://github.com/bdice) URL: #247

Port rapidsai/raft#2323 PR from RAFT [Cleans up a collection of anti-patterns in the cuvs CMake code while also enabling building faiss from latest main] Authors: - Tarang Jain (https://github.com/tarang-jain) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Corey J. Nolet (https://github.com/cjnolet) - Paul Taylor (https://github.com/trxcllnt) - Ray Douglass (https://github.com/raydouglass) URL: #241

Add extra information to benchmark context for better reproducibility and performance analysis:

1. Full command line used to call the executable (so you can copy-paste and run again).
2. More CUDA device information: whether HMM, AST, or host atomics are available (how GPU can efficiently communicate with CPU).
3. Host information: min/max frequencies, used virtual processors and cores, available physical memory and swap (does the benchmark segfault due to not enough host memory? is SMT enabled? etc).

Addresses parts of #160 Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #248

Authors: - rhdong (https://github.com/rhdong) - Ben Frederickson (https://github.com/benfred) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: #174

Authors: - rhdong (https://github.com/rhdong) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #251

Utils / Helpers to enable FAISS migration to cuVS from RAFT. Authors: - Tarang Jain (https://github.com/tarang-jain) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #213

iteraton -> iteration Authors: - Ikko Eltociear Ashimine (https://github.com/eltociear) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #232

Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #244

This PR allows us to guarantee the connectivity of the CAGRA search graph using an approximate MST.

It has been empirically shown that the graph indexes generated by CAGRA for search provide comparable search accuracy to other libraries, but reachability from any node to all nodes is not guaranteed. In fact, it has been confirmed that the number of strongly connected components (SCC) of graph indexes created by CAGRA is not 1 in some 100M-scale datasets. This problem can be alleviated by increasing the degree of the search graph, but this would increase the size of the graph index. It is desirable to address this problem without increasing the degree of the search graph.

A prior study has shown that this can be solved by using a Minimum Spanning Tree (MST)-like approach, but in general, MST calculation takes a long time. However, what is needed here is not an exact MST, but, for example, an approximate MST in which the total number of edges is not necessarily minimum. Such an approximate MST could be computed quickly on GPUs.

This PR contains an implementation that creates an approximate MST on the GPU at high speed based on the above policy and uses it to guarantee the connectivity of the search graph.

This functionality is not always required, so it is an opt-in feature. A member variable named `guarantee_connectivity` is added to `index_params`; set this variable to `true` if you wish to use this feature.

> cuvs::neighbors::cagra::index_params index_params;
> index_params.guarantee_connectivity = true;
> auto index = cuvs::neighbors::cagra::build(res, index_params, dataset_view);

Authors: - Akira Naruse (https://github.com/anaruse) - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #237
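The connectivity property discussed in this entry (a single strongly connected component) can be checked on a small graph with a few lines of Python. This is a CPU-side sketch for illustration only, using Kosaraju's algorithm; it is not how cuVS builds or verifies the graph, and `count_sccs` is a hypothetical helper name.

```python
def count_sccs(graph):
    """Count strongly connected components of a directed graph given as adjacency lists."""
    n = len(graph)

    # Pass 1: iterative DFS, recording nodes in order of completion.
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in reverse completion order.
    reverse = [[] for _ in range(n)]
    for src, nbrs in enumerate(graph):
        for dst in nbrs:
            reverse[dst].append(src)

    seen = [False] * n
    components = 0
    for start in reversed(order):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
    return components


# Node 3 can reach the cycle 0 -> 1 -> 2 -> 0, but nothing points back to it.
disconnected = [[1], [2], [0], [0]]
# One extra edge (2 -> 3) makes every node reachable from every other node.
connected = [[1], [2], [0, 3], [0]]
print(count_sccs(disconnected), count_sccs(connected))  # 2 1
```

A fixed-degree search graph with more than one component means some vectors cannot be reached from certain starting points, which is the situation the opt-in flag is meant to rule out.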

Currently, in IVF index building (both IVF-Flat and IVF-PQ), a large dataset is usually in pageable host memory or an mmap-ed file. In both cases, after the cluster centers are trained, the entire dataset needs to be copied twice to the GPU: once for assigning vectors to clusters, and once for copying vectors to the corresponding clusters. Both copies are done using `batch_load_iterator` in a chunk-by-chunk fashion. Since the source buffer is in pageable memory, the current `batch_load_iterator` implementation doesn't support kernel and memcopy overlapping. This PR adds support for prefetching with `cudaMemcpyAsync` on pageable memory. We overlap kernel execution with copies by launching the kernel first, followed by the prefetch of the next chunk.

We benchmarked the change on L40S. The results show a 3%-21% speedup on index building, without a meaningful impact on search recall (differences of about 1-2%, similar to run-to-run variance).

| algo | dataset | model | with prefetching (s) | without prefetching (s) | speedup |
| -- | -- | -- | -- | -- | -- |
| IVF-PQ | deep-100M | d64b5n50K | 97.3547 | 100.36 | 1.03 |
| IVF-PQ | wiki-all-10M | d64-nlist16K | 14.9763 | 18.1602 | 1.21 |
| IVF-Flat | deep-100M | nlist50K | 78.8188 | 81.4461 | 1.03 |

This PR is related to the issue submitted to RAFT: rapidsai/raft#2106 Authors: - Rui Lan (https://github.com/abc99lr) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #230
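The overlap idea can be sketched without a GPU. The Python example below is an analogy only: it uses a background thread where the PR uses `cudaMemcpyAsync`, and `prefetching_batches` and `load_chunk` are hypothetical names, not part of `batch_load_iterator`.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetching_batches(load_chunk, n_chunks):
    """Yield chunks in order while the next chunk is loaded in the background."""
    if n_chunks <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_chunk, 0)
        for i in range(n_chunks):
            current = pending.result()  # wait for the chunk requested earlier
            if i + 1 < n_chunks:
                # Start loading the next chunk; it overlaps with the caller's
                # processing of `current` after the yield below.
                pending = pool.submit(load_chunk, i + 1)
            yield current


dataset = list(range(10))
chunk_size = 4


def load_chunk(i):
    # Stand-in for a slow host-to-device copy.
    return dataset[i * chunk_size:(i + 1) * chunk_size]


for chunk in prefetching_batches(load_chunk, 3):
    total = sum(chunk)  # stand-in for the kernel that consumes the chunk
```

Only one chunk is in flight at a time, so the extra memory cost of this scheme is a single additional chunk buffer.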

`libcuvs.so` contains fp16 kernels that are not accessible (missing headers and missing public entry points). This PR removes the unused kernels. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: #268

Random sampling of training set for IVF methods was reverted in rapidsai/raft#2144 due to the large memory usage of the subsample method. Since then, PR rapidsai/raft#2155 has implemented a new random sampling method with improved memory utilization. Using that we can now enable random sampling of IVF methods (rapidsai/raft#2052 and rapidsai/raft#2077). Random subsampling has measurable overhead for IVF-Flat, therefore it is only enabled for IVF-PQ. Authors: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #122
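As a rough picture of what random subsampling of a training set means, the Python sketch below draws row indices without replacement. It is an illustration under stated assumptions, not the RAFT or cuVS sampling method: `sample_training_rows` and its `fraction` and `seed` parameters are hypothetical.

```python
import random


def sample_training_rows(n_rows, fraction, seed=0):
    """Pick a sorted random subset of row indices, without replacement."""
    n_train = max(1, int(n_rows * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), n_train))


# Train cluster centers on roughly 10% of a 1000-row dataset.
train_rows = sample_training_rows(1000, 0.1)
```

Sampling indices first and gathering only the selected rows afterwards keeps the working set proportional to the training subset rather than to the full dataset.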

copy-pr-bot · 2024-08-01T17:27:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


raydouglass and others added 30 commits May 20, 2024 17:41

DOC v24.08 Updates [skip ci]

105e47e

Merge pull request #140 from rapidsai/branch-24.06

50f2907

Forward-merge branch-24.06 into branch-24.08

Merge pull request #147 from rapidsai/branch-24.06

29ebe32

Forward-merge branch-24.06 into branch-24.08

Merge pull request #149 from rapidsai/branch-24.06

a05fc3c

Forward-merge branch-24.06 into branch-24.08

Merge pull request #150 from rapidsai/branch-24.06

dee0d0f

Forward-merge branch-24.06 into branch-24.08

Merge branch-24.06 into branch-24.08

9a06102

Merge pull request #169 from benfred/branch-24.08-merge-24.06

c39d999

Forward-merge branch-24.06 into branch-24.08

Merge pull request #176 from rapidsai/branch-24.06

b8a65b4

Forward-merge branch-24.06 into branch-24.08

Add refine to the Python and C api's (#175)

b771085


Add python serialization API's for ivf-pq and ivf_flat (#186)

9dc3a4d


Adding IVF examples (#203)

25630a0


Add rust example (#206)

7bac20f


Fix compilation error when _CLK_BREAKDOWN is defined in cagra. (#202)

8dfef2b


Add python bindings for ivf-* extend functions (#220)

37fe9f4


KyleFromNVIDIA and others added 17 commits July 19, 2024 17:23

Moving over C++ API of CAGRA+hnswlib from RAFT (#229)

63285f7


Use raft::util::popc(...) public API (#249)

b442756


[FEA] expose python & C API for prefiltered brute force (#174)

2517826


[Opt] introduce the masked_matmul to prefiltered brute force. (#251)

33698a5


chore: update search_plan.cuh (#232)

0227527


Add cuvs_bench python folder, config files and constraints (#244)

812fffd


raydouglass requested review from a team as code owners August 1, 2024 17:27

github-actions bot added ci cpp CMake Python labels Aug 1, 2024

Update Changelog [skip ci]

653bf27

raydouglass merged commit 0be69fd into main Aug 8, 2024
5 checks passed
