Time for Rust/Nextflow

Let's see where this goes.

Photo by Mackenzie Cruz / Unsplash

I've been fascinated by Rust for two years, and it's time to start coding. Let's see where this goes. The first thing I'll try is a fairly basic exercise: implementing a bioinformatics application that, given a FASTA file, computes the symmetry of DNA curvature values across nucleosome-sized windows of sequence. The algorithm is one I encountered early in my time as a PhD student, and comes from this paper.

So what's the plan? Build out the MVP, then make it suitable for the public. Basically, the progression looks like this:

  1. Write a design document. Essentially a short but more technical and illustrative version of this list, covering the end-to-end development pipeline in its full glory.
  2. Make a Rust "hello world": start with a binary.
  3. Add FASTA-reading capability there with tests.
  4. Add SymCurv algorithm and tests.
  5. Add bedGraph (.bed.gz) outputting capability.
  6. Add bigWig outputting capability.
  7. Refactor Rust binary as Rust binary+library.
  8. Publish to crates.io.
  9. Add PyO3 Python bindings to Rust library.
  10. Integrate NumPy.
  11. Publish auto-generated rustdoc documentation to docs.rs, auto-generated Python docs to Read the Docs.
  12. Publish Python package to PyPI.
  13. Build a GitHub Pages page to be the homepage, with usage examples and installation information. This tends to be just a longer, prettier version of the repository's README.md page, but nevertheless it's expected.
  14. Build FASTA-processing Nextflow pipeline locally, with nf-test unit testing included. Decide on a virtualization or container strategy to provide the runtimes. This might be Docker, might be something else.
  15. Add Nextflow pipeline to public NF Tower.
  16. Build out reporting features of NF pipeline with MultiQC, and find a destination for reports (perhaps GitHub pages).
  17. Add an integration test to the symcurve Rust repository by having a GitHub Actions (CI/CD) workflow trigger the NF Tower pipeline.
  18. Contribute symcurve pipeline to nf-core.
  19. Bells and whistles like badging the GitHub repository.
  20. Add a public UCSC Genome Browser hub with tracks for several genomes, and find a free/cheap host for the corresponding processed bigWig/bedGraph files.
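
To make steps 2 and 3 a little more concrete, here is a minimal sketch of what the FASTA-reading piece might start out as, using only the Rust standard library. The names `FastaRecord`, `read_fasta`, and `windows` are hypothetical placeholders, not anything from the paper or an existing crate, and the window size is left as a parameter because the actual nucleosome-sized window belongs in the design document. The sketch also assumes plain ASCII sequence data and does not implement SymCurv itself.

```rust
use std::io::{self, BufRead};

/// One FASTA record: the header text after '>' and the concatenated sequence.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct FastaRecord {
    pub id: String,
    pub seq: String,
}

/// Parse FASTA records from any buffered reader (a file, stdin, or a byte slice).
/// Lines that appear before the first '>' header are ignored.
pub fn read_fasta<R: BufRead>(reader: R) -> io::Result<Vec<FastaRecord>> {
    let mut records = Vec::new();
    let mut current: Option<FastaRecord> = None;

    for line in reader.lines() {
        let line = line?;
        let line = line.trim();
        if let Some(header) = line.strip_prefix('>') {
            // A new header closes the record that was being built, if any.
            if let Some(done) = current.take() {
                records.push(done);
            }
            current = Some(FastaRecord {
                id: header.trim().to_string(),
                seq: String::new(),
            });
        } else if let Some(rec) = current.as_mut() {
            // Sequence lines may be wrapped; join them into one string.
            rec.seq.push_str(line);
        }
    }
    // Push the final record, which has no following header to close it.
    if let Some(done) = current {
        records.push(done);
    }
    Ok(records)
}

/// Return every overlapping window of `size` bases with its 0-based start offset.
/// Assumes ASCII input; returns an empty list if `size` is 0 or exceeds the sequence.
pub fn windows(seq: &str, size: usize) -> Vec<(usize, &str)> {
    if size == 0 || seq.len() < size {
        return Vec::new();
    }
    (0..=seq.len() - size)
        .map(|start| (start, &seq[start..start + size]))
        .collect()
}
```

Because `read_fasta` accepts any `BufRead`, tests can feed it an in-memory byte slice such as `">chr1\nACGT\n".as_bytes()` instead of a file on disk, which keeps the unit tests in step 3 fast and free of fixtures. Collecting all records into a `Vec` is the simplest choice for a sketch; whole genomes would probably call for a streaming iterator instead.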

A lot of details are missing but that covers the gist of it. Creating a professional open source project has a lot of components.