[TODO] Develop the memory_reserved operator and optimize performance · Issue #122 · Oneflow-Inc/one-yolov5 · GitHub
Open
2 tasks
ccssu opened this issue Mar 27, 2023 · 2 comments
ccssu (Collaborator) commented on Mar 27, 2023

  • Use profiling tools to optimize performance (targets: performance optimization of the GLM repository, https://github.com/Oneflow-Inc/libai/tree/main/projects/GLM, and of the one-yolov5 classification model)
  • Learn how OneFlow manages memory

Getting started with the profiling tools

one-yolov5 project

Project: https://github.com/Oneflow-Inc/one-yolov5
Dataset path: @oneflow-25:/data/home/fengwen/imagenette160
Weights path: @oneflow-25:/data/home/fengwen/weight_v1_2_0

If running nsys fails with the following error:
The target application terminated. One or more process it created re-parented.
Waiting for termination of re-parented processes.
Use the `--wait` option to modify this behavior.

comment out the check_git_status() line in train.py.

GLM project

Project: https://github.com/Oneflow-Inc/libai/tree/main/projects/GLM
Weights paths:
@oneflow-25:/data/home/xiezipeng/glm-10b-chinese
@oneflow-25:/data/home/xiezipeng/glm-10b

ccssu (Collaborator, Author) commented on Mar 27, 2023

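Getting started with nsys using NVTX annotations

NVTX (NVIDIA Tools Extension) lets developers annotate their code with custom markers. These markers can be visualized in performance analysis tools such as NVIDIA Nsight Systems (nsys), which helps developers understand the performance characteristics of their code and pinpoint where to optimize.

NVTX tutorial: https://nvtx.readthedocs.io/en/latest/index.html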

Python Demo

import numpy as np
import cupy as cp
import nvtx

@nvtx.annotate("fft function", color="blue")
def fast_fft(input_array):
    with nvtx.annotate("Copy input array to GPU and CuPy", color="red"):
        gpu_array = cp.array(input_array)
    with nvtx.annotate("GPU FFT operation", color="yellow"):
        result = cp.fft.fft(gpu_array)
    with nvtx.annotate("Copy back to CPU and Numpy", color="green"):
        cpu_result = cp.asnumpy(result)
    return cpu_result

for i in range(5):
    print(fast_fft(np.random.random(10)))
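Because the nvtx package is only needed during profiling runs, one way to keep annotations in training code (for example in one-yolov5) without making nvtx a hard dependency is to fall back to a no-op context manager when the import fails. The following is a minimal sketch of that approach. `nvtx.annotate` is the real API used in the demo above. The fallback `annotate` shim and the `normalize` function are hypothetical illustrations, and the workload is plain Python so it runs without a GPU.

```python
import contextlib
import math

try:
    import nvtx  # real NVTX Python package (pip install nvtx)

    annotate = nvtx.annotate
except ImportError:
    # Hypothetical fallback: a no-op stand-in with the same call shape.
    # Objects built by @contextmanager also work as decorators
    # (they derive from contextlib.ContextDecorator).
    @contextlib.contextmanager
    def annotate(message=None, color=None, **kwargs):
        yield


@annotate("normalize function", color="blue")
def normalize(values):
    """Scale a list of numbers to unit Euclidean length (hypothetical workload)."""
    with annotate("compute norm", color="red"):
        norm = math.sqrt(sum(v * v for v in values))
    with annotate("divide", color="green"):
        return [v / norm for v in values]


if __name__ == "__main__":
    for _ in range(5):
        print(normalize([3.0, 4.0]))
```

Profile it the same way as the demo (`nsys profile python3 <script>.py`). When nvtx is not installed, the script behaves like unannotated code.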

Launch command:

nsys profile python3 demo.py

[Image: nsys screenshot]
The nsys report for the screenshot above: report1.zip

C++ Demo

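#include <cuda_runtime.h>
#include "nvToolsExt.h"  // NVTX v2 header; link with -lnvToolsExt
#include <iostream>

// CUDA kernel: C[i] = A[i] + 1.0f for every i < N
__global__ void vectorAdd(const float *A, float *C, int N) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) {  // guard: the last block may contain surplus threads
        C[i] = A[i] + 1.0f;
    }
}

// Launch the CUDA kernel four times inside nested NVTX ranges
void launch_kernel(const float *A, float *C, int N) {
    nvtxRangePushA(__FUNCTION__);  // open a range named after this function ("launch_kernel")
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    for (int i = 0; i < 4; i++) {
        nvtxRangePushA("vectorAdd");  // open a nested range for each launch
        vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(A, C, N);
        // Kernel launches are asynchronous: this range covers the host-side
        // launch only, not the kernel's execution on the GPU.
        nvtxRangePop();  // close the vectorAdd range
    }
    nvtxRangePop();  // close the launch_kernel range
}

int main() {
    const int N = 100;
    float *A, *C;
    cudaMallocManaged(&A, N * sizeof(float));
    cudaMallocManaged(&C, N * sizeof(float));
    for (int i = 0; i < N; i++) {
        A[i] = static_cast<float>(i);
        C[i] = 0.0f;
    }
    std::cout << "Launching kernel..." << std::endl;
    launch_kernel(A, C, N);
    // Wait for the kernels to finish before the host reads managed memory
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    std::cout << "C[N - 1] = " << C[N - 1] << std::endl;  // expected: 100
    cudaFree(A);
    cudaFree(C);
    return 0;
}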
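The launch configuration in the C++ demo relies on ceiling division for `blocksPerGrid` plus the `i < N` guard inside the kernel. The small pure-Python simulation below (not CUDA; the function name `launch_indices` is hypothetical) walks the same 1-D grid and shows that every element is written exactly once while surplus threads in the last block do nothing.

```python
def launch_indices(n, threads_per_block=256):
    """Simulate which element each thread of a 1-D CUDA grid would write."""
    # Ceiling division, as in (N + threadsPerBlock - 1) / threadsPerBlock
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    written = []
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            # Same formula as blockDim.x * blockIdx.x + threadIdx.x
            i = threads_per_block * block_idx + thread_idx
            if i < n:  # the kernel's bounds guard
                written.append(i)
    return blocks_per_grid, written


if __name__ == "__main__":
    blocks, written = launch_indices(100)
    print(blocks, len(written))
```

With N = 100 and 256 threads per block, the grid has a single block, and threads 100 to 255 are masked out by the guard.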


ccssu added the Guide label on Mar 27, 2023
hhhfccz commented on Mar 27, 2023

For the memory_reserved operator, the current plan is to call the CUDA API directly. This requires changing OneFlow's BinAllocator, together with lazy_init; I will submit a PR this week.
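To make the distinction a memory_reserved query reports concrete, here is a hypothetical toy caching allocator, modeled on the semantics of PyTorch's `torch.cuda.memory_reserved` (bytes the caching allocator holds from the device, including cached blocks that are currently free) versus `memory_allocated` (bytes handed out to tensors). It is a sketch under stated assumptions: it is not OneFlow's BinAllocator and does not reflect the planned CUDA-API-based implementation. Device memory is simulated with integer block ids, and every name (`ToyBinAllocator`, `allocate`, `free`, `empty_cache`) is illustrative.

```python
from collections import defaultdict


class ToyBinAllocator:
    """Hypothetical caching allocator: freed blocks stay reserved for reuse."""

    def __init__(self, alignment=512):
        self.alignment = alignment
        self.free_bins = defaultdict(list)  # rounded size -> cached free block ids
        self.block_size = {}                # block id -> size in bytes
        self.next_id = 0
        self.reserved = 0   # bytes obtained from the (simulated) device
        self.allocated = 0  # bytes currently handed out to callers

    def _round_up(self, nbytes):
        return (nbytes + self.alignment - 1) // self.alignment * self.alignment

    def allocate(self, nbytes):
        size = self._round_up(nbytes)
        if self.free_bins[size]:
            block = self.free_bins[size].pop()  # reuse a cached block: reserved unchanged
        else:
            block = self.next_id  # stand-in for a real device allocation (e.g. cudaMalloc)
            self.next_id += 1
            self.block_size[block] = size
            self.reserved += size
        self.allocated += size
        return block

    def free(self, block):
        size = self.block_size[block]
        self.allocated -= size
        self.free_bins[size].append(block)  # cache the block instead of releasing it

    def empty_cache(self):
        for size, blocks in self.free_bins.items():
            for block in blocks:
                del self.block_size[block]  # stand-in for a real device free (e.g. cudaFree)
                self.reserved -= size
        self.free_bins.clear()

    def memory_allocated(self):
        return self.allocated

    def memory_reserved(self):
        return self.reserved


if __name__ == "__main__":
    alloc = ToyBinAllocator()
    a = alloc.allocate(1000)  # rounded up to 1024 bytes
    alloc.free(a)
    print(alloc.memory_allocated(), alloc.memory_reserved())  # 0 1024
```

Bins here are keyed by the exact rounded size for simplicity; a production bin allocator typically groups sizes into ranges and splits or merges blocks.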
