Time for Rust/Nextflow
Let's see where this goes.
I've been fascinated by Rust for two years, and it's time to start coding. The first thing I'll try is a fairly basic exercise: implementing a bioinformatics application that, given a FASTA file, computes the symmetry of DNA curvature values across nucleosome-sized windows of sequence. I first encountered the algorithm early in my time as a PhD student; it comes from this paper.
So what's the plan? Build out the MVP, then make it suitable for the public. Basically, a progression along these lines:
- Write a design document: essentially a short but more technical and illustrative version of this list, covering the end-to-end development pipeline in its full glory.
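To make the "symmetry across windows" idea a bit more concrete, here's a rough sketch of the shape such a computation could take. Big caveat: this is not the published algorithm. The function names (`window_asymmetry`, `sliding_asymmetry`) and the score itself (mean absolute difference between values mirrored around the window center) are placeholders I made up for illustration, and the sketch assumes per-base curvature values have already been computed somehow.

```rust
/// Hypothetical asymmetry score for one window (NOT the published SymCurv
/// definition): the mean absolute difference between values mirrored around
/// the window center. 0.0 means perfectly symmetric; larger means less so.
fn window_asymmetry(window: &[f64]) -> f64 {
    let half = window.len() / 2;
    if half == 0 {
        // Windows of length 0 or 1 have no mirrored pairs to compare.
        return 0.0;
    }
    let total: f64 = (0..half)
        .map(|i| (window[i] - window[window.len() - 1 - i]).abs())
        .sum();
    total / half as f64
}

/// Slide a fixed-size window over per-base curvature values and score each
/// window. Returns one score per window start position, or an empty vector
/// if the window size is zero or the input is shorter than the window.
fn sliding_asymmetry(curvature: &[f64], window_size: usize) -> Vec<f64> {
    if window_size == 0 || curvature.len() < window_size {
        return Vec::new();
    }
    curvature
        .windows(window_size)
        .map(window_asymmetry)
        .collect()
}
```

Calling `sliding_asymmetry(&values, w)` with a nucleosome-sized `w` would give one score per window start, which is roughly the kind of per-position signal that could later be written out as a track.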
- Make a Rust "hello world": start with a binary.
- Add FASTA-reading capability there with tests.
- Add SymCurv algorithm and tests.
- Add bedGraph (.bed.gz) output capability.
- Add bigWig output capability.
- Refactor the Rust binary into a Rust binary plus library.
- Publish to crates.io.
- Add PyO3 Python bindings to the Rust library.
- Integrate NumPy.
- Publish auto-generated rustdoc documentation to docs.rs, and auto-generated Python docs to Read the Docs.
- Publish the Python package to PyPI.
- Build a GitHub Pages page to be the homepage, with usage examples and installation information. This tends to be just a longer, prettier version of the repository's README.md page, but nevertheless it's expected.
- Build a FASTA-processing Nextflow pipeline locally, with nf-test unit tests included. Decide on a virtualization or container strategy to provide the runtimes. This might be Docker, or it might be something else.
- Add the Nextflow pipeline to the public NF Tower.
- Build out the reporting features of the NF pipeline with MultiQC, and find a destination for the reports (perhaps GitHub Pages).
- Add an integration test to the Rust symcurve repository by having a GitHub Action (CI/CD) trigger the NF Tower pipeline.
- Contribute the symcurve pipeline to nf-core.
- Add bells and whistles, like badges on the GitHub repository.
- Add a public UCSC Genome Browser hub with tracks for several genomes, and find a free/cheap host for the corresponding processed bigWig/bedGraph files.
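For a taste of the early steps in that list, here's a minimal sketch of what the FASTA-reading piece might look like using only the standard library. The names `FastaRecord` and `read_fasta` are hypothetical, and the sketch makes simplifying assumptions: it loads everything into memory, does no validation of sequence characters, and silently ignores any lines before the first header.

```rust
use std::io::{self, BufRead};

/// One FASTA record: the header line (without the leading '>') and the
/// sequence with its line breaks removed.
#[derive(Debug, PartialEq)]
struct FastaRecord {
    id: String,
    seq: String,
}

/// Read all records from any buffered reader (a file, stdin, or a byte slice).
fn read_fasta<R: BufRead>(reader: R) -> io::Result<Vec<FastaRecord>> {
    let mut records = Vec::new();
    let mut current: Option<FastaRecord> = None;

    for line in reader.lines() {
        let line = line?;
        // Drop trailing whitespace, including a '\r' from Windows line endings.
        let line = line.trim_end();

        if let Some(header) = line.strip_prefix('>') {
            // A new header closes the record we were building, if any.
            if let Some(done) = current.take() {
                records.push(done);
            }
            current = Some(FastaRecord {
                id: header.to_string(),
                seq: String::new(),
            });
        } else if let Some(rec) = current.as_mut() {
            // Sequence lines are concatenated onto the current record.
            rec.seq.push_str(line);
        }
        // Lines before the first header are ignored in this sketch.
    }

    // Flush the final record, which has no following header to close it.
    if let Some(done) = current {
        records.push(done);
    }
    Ok(records)
}
```

Because `read_fasta` accepts anything implementing `BufRead`, tests can feed it an in-memory byte slice while the binary wraps a `File` in a `BufReader`. A real implementation would probably stream records instead of collecting them, since genome-sized FASTA files are large.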
A lot of details are missing, but that covers the gist of it. Creating a professional open source project involves a lot of components.