Highlights
- Arctic Code Vault Contributor
- Developer Program Member
1,540 contributions in the last year
Contribution activity
September 2020
Created a pull request in PaddlePaddle/Paddle-Lite that received 2 comments
[CORE][PROFILE] Write output tensor to file for each OP when precision profiler enabled
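Status: awaiting review. Main changes: fix the failure to write output to file under OpenCL; fix the problem with tensor names containing backslashes; by default, write each op's output tensor to its own file in a specified directory on the phone, one file per output tensor. See the following log: [I 9/14 15:29: 5.311 …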
+97 −33 • 2 comments
- [cherry-pick][BugFix][OPENCL] BugFix for OpenCL: image memory malloc; dropout kernel register; precision profiler enhance; layout pass bugfix for opencl
- [PASS][BugFix] Fix layout pass for opencl when convert model
- [OPENCL][BugFix] Fix conv when memory reuse. test=develop
- [cherry-pick][PROFILE] Add ENV var controls whether write output tensor of each op to files; Rename output tensor name when mem_reuse pass enabled by default etc. (#4348)
- [PROFILE] Add ENV var controls whether write output tensor of each op to files; Rename output tensor name when mem_reuse pass enabled by default etc.
- [DEMO] Support multi inputs for mobile_light demo. test=develop
- [cherry-pick][PROFILE][BugFix] Precision profiler writes output tensor to files for each op; Fix dropout opencl kernel register
- [BugFix][OPENCL][KERNEL] Fix opencl dropout. test=develop
- [cherry-pick][BugFix][KERNEL][OPENCL] Fix opencl conv3x3 group
- [BugFix][KERNEL][OPENCL] Fix conv3x3 group. test=develop
- [arm] fix xiaodu a53 crash problem
- [cherry-pick] fix xiaodu run kernnel in a53 problem. test=develop
- delete ch_four contorl in conv_3x3_dw. test=develop
- fix conv_conv fusion error in conv_dw+conv_1x1
- [cherry-pick] [ARM] Add int64 implement for `gather` and `greater_than`
- [Cherry-pick][ARM] rm redundant time-profile func. test=develop
- renote conv_conv, in some case conv_Conv compute error
- [arm] fix conv_3x3_dw compute error in no-equal-padding. test=develop
- [Cherry- pick] fix pooling3x3s2 max. test=develop
- remove paddle mobile old project , never say good bye
- [Cherry-pick][Bugfix][OpenCL][Core] fix opencl multi-run result error
- [BUG FIX] Fix the issue that light_api_shared.so can not work on full_publish compiling
- [Framework] Add method for specifying initial size of `workspace_`
- win32 thread-local support, test=develop
- [Bugfix][OpenCL][Core] fix opencl multi-run result error when using memory_optimize_pass
- [cherry-pick][PROFILE] Add ENV var controls whether write output tensor of each op to files; Rename output tensor name when mem_reuse pass enabled by default etc. (#4348)
- Dot
- [arm]Bilinear resize compute error fix
- [NPU] Fix build error caused by flatbuffer if the target is tiny_publish
- [Cherry-pick][Bugfix][OpenCL] fix depthwise_conv2d_3x3 with dilation > 1 (#4281)
- [arm] add reduce_sum op on arm. test=develop
- [Cherry-pick][BugFix][OpenCL] Fix concat image impl when axis is not 1 (#4241)
- [Bugfix][OpenCL] fix depthwise_conv2d_3x3 with dilation > 1
- [BugFix][OpenCL] Fix concat image impl when axis is not 1. test=develop
- [BugFix][PASS] Handle layout cast when input place is <host, float, ImageDefault, 0>. test=develop
Created an issue in AI-performance/embedded-ai.bench that received 1 comment
TensorFlow Lite does not set big-core binding (taskset)
Reducing variance between runs on Android. Most modern Android phones use ARM big.LITTLE architecture where some cores are more power hungry but fa…
1 comment
1 contribution in private repositories, Sep 15