i-chaochen/age-gender-estimation
Keras implementation of a CNN network for age and gender estimation
Educational resources on AI systems.
An open autonomous driving platform
Awesome resources for GPUs
i-chaochen/awesome-modern-cpp
A collection of resources on modern C++
i-chaochen/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
i-chaochen/DeepLearningSystem
An introduction to the core principles of deep learning systems.
Deep Interest Network for Click-Through Rate Prediction / Deep Interest Evolution Network for Click-Through Rate Prediction
Few-shot learning in NLP
pull request comment ROCmSoftwarePlatform/tensorflow-upstream
Small test cleanup of third_party/xla
retest Ubuntu-CPU please
comment created time in 2 days
pull request comment ROCmSoftwarePlatform/tensorflow-upstream
Small test cleanup of third_party/xla
Can this PR fix dot_dimension_sorter_test on the XLA side?
It was passing for me, so I assumed it was fixed in the meantime. Will wait to see if it shows up in the logs.
Great! Please also create a PR to upstream it into XLA. Thanks for the fix!
comment created time in 2 days
pull request comment ROCmSoftwarePlatform/tensorflow-upstream
Small test cleanup of third_party/xla
retest gpu-pycpp please
retest cpu-pycpp please
retest gpu-non-pip-multi please
comment created time in 2 days
pull request comment ROCmSoftwarePlatform/tensorflow-upstream
Small test cleanup of third_party/xla
Can this PR fix dot_dimension_sorter_test on the XLA side?
comment created time in 2 days
pull request comment openxla/xla
[ROCm] Unifying hip/cuda blas-lt APIs
Hi @akuegel, I wonder if you have any suggestions on this PR? Thanks!
comment created time in 2 days
pull request comment openxla/xla
[ROCm] HipBLASLt row major layout support.
Hi @ddunl, I wonder where this PR stands; does it still need approval from another reviewer?
comment created time in 3 days
pull request comment openxla/xla
[ROCm] Unifying hip/cuda blas-lt APIs
@ddunl could you find the right people to review this PR, please? Thanks!
comment created time in 4 days
push event ROCmSoftwarePlatform/triton
commit sha cfb3b2e967299a183897214b91615a3dadc9f517
keep rocm side only
push time in 4 days
create branch ROCmSoftwarePlatform/triton
branch: triton-mlir-xla-softmax
created branch time in 4 days
pull request comment openxla/xla
Thanks @ezhulenev, I will recheck it tomorrow once you're finished.
comment created time in 5 days
pull request comment openxla/xla
Thanks for the notice. Yes, we are monitoring it with our nightly CI and will try our best to catch up.
comment created time in 5 days
pull request comment openxla/xla
Out of curiosity, do you know which change caused the error? I know there have been a lot of changes to stream_executor recently, and I'm wondering if those are causing issues for you.
I think it's something related to https://github.com/openxla/xla/commit/d04387fd4075af9d70858b2910cfa19a28a28fee, and this is a follow-up based on https://github.com/openxla/xla/commit/5c405c4a25d7f32036c03995383462d6ce9a5619.
It seems the ROCm GPU executor now requires //xla/stream_executor:kernel,
but CUDA doesn't need it (or maybe it gets it from somewhere else):
https://github.com/openxla/xla/pull/5867/files#diff-923ec2612240244a60ae28c17f798a7daeccbab84a8e9de9efd7ba96c311c763R107
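To illustrate the kind of BUILD change being discussed, here is a minimal sketch; the target name, source file, and other deps below are hypothetical, not the actual entries in the XLA BUILD files:

```
# Hypothetical excerpt from xla/stream_executor/rocm/BUILD.
# If the ROCm executor now uses Kernel symbols directly, the
# dependency must be declared explicitly rather than picked up
# transitively (which may be how the CUDA side still gets it).
cc_library(
    name = "rocm_gpu_executor",  # hypothetical target name
    srcs = ["rocm_gpu_executor.cc"],
    deps = [
        "//xla/stream_executor:kernel",  # the newly required dep
        # ... existing deps unchanged ...
    ],
)
```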
comment created time in 5 days
push event ROCmSoftwarePlatform/triton
commit sha b661030287ebc3fcc2832bd53499e60706a0a924
[BACKEND] Fixes for latest llvm changes. To build against llvm/llvm-project@50665511c79f62d97c0d6602e1577b0abe1b982f.
push time in 8 days
push event ROCmSoftwarePlatform/triton
commit sha 6b833f1d5397241e30e1b14477663453ddb22b4c
OpenXLA-specific changes:
- Disable short pointer.
- Fixup expensiveLoadOrStore().
push time in 8 days
PR opened openxla/xla
[ROCm] Fix kernel name due to https://github.com/openxla/xla/commit/d04387fd4075af9d70858b2910cfa19a28a28fee
@akuegel @ezhulenev Thanks in advance!
pr created time in 8 days
pull request comment openxla/xla
[ROCm] gpu command buffer for ROCm
Done
comment created time in 10 days
push event ROCmSoftwarePlatform/xla
commit sha abba1481c40cc7dbdaccbc4bbceb5b38da2b11f4
Initialize creation_pass_id in GPU Compiler

This sets the creation pass id and the logical creation pass id to -1 for all ops in the input HLO. -1 is the special value which identifies ops present in the input HLO. By setting -1 for input ops, we will be able to differentiate ops originating from the input from ops generated by optimization passes.

PiperOrigin-RevId: 566929969
commit sha cd44c588740c677d115ac7fd1b30cc74acc6e4a5
[XLA:GPU] Priority fusion: don't allow partial fusions.

Only consider cases where a producer can be fused with all the consumers; otherwise, fusion is not beneficial.

PiperOrigin-RevId: 566932055
commit sha cf1f0bf7121b1c7b444e5a9b63deba7453fa1588
Integrate LLVM at llvm/llvm-project@2baf4a06ef06

Updates LLVM usage to match [2baf4a06ef06](https://github.com/llvm/llvm-project/commit/2baf4a06ef06)

PiperOrigin-RevId: 566960191
commit sha e6a84715a45d8c9d18acba1da45587c6e7e5f815
Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/752d6d83d403986227dffe42beb5014843cf2ddb. PiperOrigin-RevId: 566971528
commit sha 4b472be5f3ac0d003fa7323a23cb2c90cd07ab29
Fix layout logic in IrArray::Index::Linearize.

The logic previously assumed the layout was major-to-minor. In practice, I believe this assumption is correct in all places where Linearize is called, because LayoutNormalization converts most instructions to be major-to-minor. But in general, IrArray supports arbitrary layouts, so we should obey the given layout in IrArray::Index::Linearize as well. I am working on a change which calls IrArray::Index::Linearize in another place, and it requires Linearize to correctly handle non-major-to-minor layouts.

Also fix a comment incorrectly stating multidim was major-to-minor.

PiperOrigin-RevId: 566971940
commit sha 7c7ff5a0c07bce1cfa3004f2989a04418b6775da
Internal, fix build error. PiperOrigin-RevId: 566975236
commit sha 6ea1dbae84db61b8b1684018b88726dc091ae837
Update Docker image tags and digests in Kokoro and Bazel

This changes the Kokoro job config back to its original state (using the latest-python3.X Docker image tags) and updates the Bazel remote toolchain config to the latest Docker containers (the ones created in https://github.com/tensorflow/tensorflow/actions/runs/6239795702/job/16938502214, which the latest tags also refer to).

PiperOrigin-RevId: 566975309
commit sha ac171499bdc795a1e5c4e464798095933e7984f2
[NFC] Log NCCL calls for AllToAll PiperOrigin-RevId: 566988443
commit sha c5138a43b771b6ecb779a66c6dd2acb0b25d0604
Rename tsl/cuda/BUILD to tsl/cuda/BUILD.bazel.

Unify _stub and _lib targets in tsl/cuda (under an unsuffixed name), and update users to point directly to the merged target. This also allows us to remove some .bzl macros; a sketch of this pattern follows this push event.

PiperOrigin-RevId: 566996155
commit sha aca5aa0e853b201c94420f41c22f5b837dd97f76
ROCm adds gpu command buffer
push time in 10 days
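A minimal sketch of the target-unification pattern from the tsl/cuda commit above, using a hypothetical cudart pair of targets (the real names and sources in tsl/cuda/BUILD.bazel may differ):

```
# Before: separate cudart_stub and cudart_lib targets.
# After: one unsuffixed target, plus an alias so existing users
# keep building while they migrate. All names are illustrative.
cc_library(
    name = "cudart",  # merged _stub + _lib target
    srcs = ["cudart_stub.cc"],
)

alias(
    name = "cudart_stub",  # temporary alias for unmigrated deps
    actual = ":cudart",
)
```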
Pull request review comment openxla/xla
[ROCm] gpu command buffer for ROCm
load( "//xla/stream_executor:build_defs.bzl", "stream_executor_friends", )+load(+ "//xla:xla.bzl",+ "xla_cc_test",
Thanks for pointing it out. It's removed.
comment created time in 10 days
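For context, the load() in the diff above would only be justified if the file actually declared test targets with it; a hedged sketch of what legitimate usage would look like, with a hypothetical test name, source, and dep:

```
load("//xla:xla.bzl", "xla_cc_test")

# Only worth loading xla_cc_test if a target like this exists:
xla_cc_test(
    name = "command_buffer_test",  # hypothetical
    srcs = ["command_buffer_test.cc"],
    deps = ["//xla/stream_executor:command_buffer"],  # illustrative dep
)
```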
push event ROCmSoftwarePlatform/xla
commit sha c7bb8d7976996b467ee0297ffb854a2d84e31fe7
ROCm adds gpu command buffer
push time in 10 days
push event i-chaochen/DeepLearningSystem
commit sha 7b8abfb9f2ed948159b82a9ceef499d8bcc89b71
fix issues from #63.
push time in 10 days
create branch ROCmSoftwarePlatform/tensorflow-upstream
created branch time in 10 days