dnbaker/README.md

Hi, I'm Daniel 👋

I'm a PhD candidate in the Department of Computer Science at Johns Hopkins University. Previously, I was a Bioinformatics Scientist at ARUP Laboratories, where I worked on cell-free circulating tumor DNA (ctDNA) analysis and clinical genomics after my training in Physics [BS] and Biophysics/Computational Biology [MS].

🔭 Currently working on similarity search, clustering, and indexing for large-scale data

😄 Pronouns: He/Him/His

A quick tour of my interests

  1. Practical randomized algorithms

This ranges from libraries providing sketch data structures and coresets to projects using random projections and DCI.

My work on coresets and clustering is primarily part of the minicore project, which aims to provide a standard utility for coreset construction and weighted clustering, especially for exponential family models and shortest-paths metrics.

  2. Computational Biology

The bonsai project provides methods for metagenomic analysis, along with k-mer encoding/decoding and I/O, while Dashing performs scalable sketching and comparison of sequence data.
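
To make "k-mer encoding/decoding" concrete, here is a minimal sketch of the common 2-bit packing scheme (A=0, C=1, G=2, T=3) for k-mers of up to 32 bases. This is an illustration only, not bonsai's actual interface: the names `encode_base`, `encode_kmer`, and `decode_kmer` are hypothetical, and the real library may use a different bit layout or handle ambiguous bases differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; not taken from bonsai.

// Map a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3); -1 for anything else.
inline int encode_base(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Pack a k-mer (1 <= k <= 32) into a 64-bit integer, two bits per base,
// with the first base ending up in the most significant used bits.
// Returns false if the length is out of range or a base is not A/C/G/T.
inline bool encode_kmer(const std::string& kmer, std::uint64_t& out) {
    if (kmer.empty() || kmer.size() > 32) return false;
    std::uint64_t v = 0;
    for (char c : kmer) {
        const int b = encode_base(c);
        if (b < 0) return false;
        v = (v << 2) | static_cast<std::uint64_t>(b);
    }
    out = v;
    return true;
}

// Unpack k bases from an encoded value, reversing encode_kmer.
inline std::string decode_kmer(std::uint64_t v, unsigned k) {
    assert(k >= 1 && k <= 32);
    std::string out(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        out[k - 1 - i] = "ACGT"[v & 3u];  // lowest two bits hold the last base
        v >>= 2;
    }
    return out;
}
```

Packing k-mers into integers like this makes hashing and comparison a single-word operation, which is one reason the scheme is popular in sequence-analysis tools.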

BMFtools performs molecular demultiplication over barcoded sequencing data, reducing error rates while eliminating redundant information. Designed for ctDNA, this method can reduce error rates by orders of magnitude, allowing confident detection of very rare events.

  3. General C++

Most of my projects fall into this category, serving as tools I can reuse in various projects.

Some of my favorites:

  • vec provides type-generic abstractions over x86-64 vectorization, making it easy to write fast, portable code.
  • kspp is an RAII-based variant of kstring from klib, with extra niceties that make appending printf-style formatted text easy.
  • aesctr provides STL-style random number generators built on fast aes-ctr and wyhash.
  • circularqueue provides a range-based circular queue container that uses power-of-two sizes.
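
As a rough illustration of why a circular queue might use power-of-two sizes: when the capacity is a power of two, wrapping an index around the buffer becomes a bitwise AND with `capacity - 1` instead of a modulo. The sketch below is a hypothetical fixed-capacity ring (`Pow2Ring` is a made-up name) and is not the circularqueue API, which may grow dynamically and expose iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer for illustration;
// not the circularqueue library's interface.
template <typename T>
class Pow2Ring {
    std::vector<T> buf_;    // storage; size is always a power of two
    std::size_t mask_;      // capacity - 1, used to wrap indices
    std::size_t head_ = 0;  // index of the oldest element
    std::size_t size_ = 0;  // number of stored elements

    // Smallest power of two that is >= n (returns 1 for n == 0).
    static std::size_t round_up_pow2(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

public:
    explicit Pow2Ring(std::size_t min_capacity)
        : buf_(round_up_pow2(min_capacity)), mask_(buf_.size() - 1) {}

    std::size_t capacity() const { return buf_.size(); }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

    // Append at the tail; returns false when the ring is full.
    bool push(const T& value) {
        if (size_ == buf_.size()) return false;
        buf_[(head_ + size_) & mask_] = value;  // AND replaces modulo
        ++size_;
        return true;
    }

    // Remove and return the oldest element; the ring must not be empty.
    T pop() {
        assert(size_ > 0);
        T value = buf_[head_];
        head_ = (head_ + 1) & mask_;
        --size_;
        return value;
    }
};
```

Requesting a capacity of 5 here yields 8 slots; the rounding wastes some memory in exchange for cheaper index arithmetic.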

Pinned

  1. Fast and accurate genomic distances using HyperLogLog (C++, 128 stars, 7 forks)

  2. C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings (C++, 81 stars, 4 forks)

  3. Bonsai: Fast, flexible taxonomic analysis and classification (C++, 54 stars, 7 forks)

  4. FRP: Fast Random Projections (C++, 30 stars, 2 forks)

  5. Barcoded Molecular Families (C++, 18 stars, 5 forks)

  6. Type-generic SIMD library for optimized generic code generation (C++, 9 stars)
