19 Data Science Incubator

19.1 Finding incubator participants

Incubator project lead should be housed within ALVSCE.
Where to find?
- Office hour and workshop attendees.
- Search arizona.edu for interesting and relevant projects.
- Attend seminars and brainstorm.
Advertising

Announcement Text:

The CCT Data Science Team is excited to announce the ALVSCE Data Science Incubator. This is an ongoing opportunity for researchers in ALVSCE to work with our team to understand and take advantage of research opportunities enabled by new and emerging data resources and computing methods. This opportunity is open to all researchers and staff in ALVSCE.

Our team of domain data scientists and software engineers has expertise in state-of-the-art technologies and methods alongside domain expertise. These include statistics, data synthesis, data management, GIS, simulation modeling, analytics, cloud and cluster computing, software design and engineering, visualization, and more. The ALVSCE Data Science Incubator will provide up to ten full-time days of support for ALVSCE projects spread over three months to work on focused, intensive, collaborative projects.

We invite short project plans (1-2 pages) for a data-intensive research collaboration focusing on a research problem with clearly defined outcomes. This support is funded by USDA Hatch funds, and can be leveraged to develop funding for longer term collaborations. See our project plan guidelines and contact the group at cct-datascience@arizona.edu for more information.

19.2 Project Plan Development

Generally starts with one or two conversations to be sure the project is appropriate for an incubator
Naming convention for incubators are “Short title / main contact’s name”
A new folder for the project should be created in the incubator Google Drive folder; notes and project plan will go there
Ideally, the project lead creates a first draft of a project plan. In practice, this often involves nudging, guidance, and a few rounds of revisions before a project plan is ready.
Project plans should clearly identify expected product, tasks required to result in that product, and who (project member or data science team member) is responsible for each of the tasks.
Include any hard deadlines.
Determine with project lead where to put data/code/outputs and whether they are comfortable with having a public repository and / or planning to share data.
project plan can be versioned and modified as the project evolves.

19.3 Incubator Hour

To keep the group apprised of developments for each incubator, we will hold a 1-1.5 hour meeting biweekly during the off-week from Sprint Planning.

Our running notes for incubator hour are on HackMD.

19.4 Starting incubator projects

19.4.1 Collaboration

When ready to start, send an email to notify project lead and collaborators that we are ready, propose a rough timeline, and schedule a kickoff meeting.
Kickoff meeting
- After project plan is accepted
- Review project plan, define scope, see if there are any other changes or updates
- Make sure to define contributions from all parties and how the project will build capacity.
- Set expectations for conclusion, skill development, and eventual hand-off
- Review, update, agree on timeline for each deliverable and project conclusion / next steps post incubator if applicable
- Schedule regular check-ins and co-working times.
  - weekly or biweekly is a good starting point

19.4.2 Internally managing time and project progress

Create a repository if necessary or ensure you have access to one that exists. It is preferable if these are under a collaborator account or organization from the get go.
Add GitHub epic for incubator project to incubator board when initially developing the project—that way we can have a backlog of ideas and potential projects.
Create an Epic with sub-tasks for each project plan to track progress internally.
Move epic through relevant categories as project progresses.
Track time in approximately hourly increments on group’s Toggl page.
The two weeks of FTE work is the time required to complete the proposed scope of work. Additional time for planning, writing the project plan, and follow up post-project plan aren’t included in this. However, for purposes of reporting it is useful to know or be able to reasonably estimate the total time spent.

19.5 Finishing incubator projects

19.5.1 Wrap-up steps

Before the final meeting:

Transfer ownership of GitHub repo(s) to collaborator (if applicable)
Make sure that the project gives recognition to our group (guidelines)
Make sure collaborators feel comfortable taking over as primary maintainers of project
Archive code with Zenodo or UA Library ReData
Where data is likely to be useful independent of the analysis code, it should be split into a separate archive and put in an appropriate repository.
Preferred license and copyright is CC-0 for data and MIT License for code.
- See “Why does Dryad use CC0”
- copyright can be listed as “The University of Arizona and Authors”

Send an email to schedule a final wrap-up meeting and to mark the conclusion of the incubator phase of the collaboration.

Example email

Hi Jeremy,

As we come to the end of the incubator phase of our collaboration with AZMET, I just want to take some time and reflect on all the great stuff we’ve accomplished together. I looked over the goals we set out to accomplish in the original project plan, and I think we’ve done a good job meeting them.

Static QA/QC routines that check data prior to going into the database ✅
Static checks for whether data for a station was retrieved in the previous hour: I don’t think we put this one into place yet, but this could be a pretty simple addition to the report that just checks for NAs. I opened an issue to remind me to look into this.
A simple R package to wrap the API ✅
A “dynamic” QA/QC procedure that compares recent data to historical ✅
A Cooperative Extension bulletin describing the new AZMet QA/QC procedures: I’d be happy to help spread the word and provide edits to a draft bulletin. I also need to write a blog-post for our website and promote the azmet R package.
Availability of code on GitHub with a permissive open source license ✅

At our meeting on Monday, we’ll discuss any final steps to the incubator phase and next steps for continued collaboration (i.e. funding continued work on the QA/QC report). We’ll also have a conversation about the incubator process itself—what worked well, what didn’t work, and what could be improved.

Looking forward to seeing you then!

19.5.2 Wrap-up Meeting

Have a final meeting. Here are some questions you can ask:

Do you feel like you have the resources you need to take ownership (with our continued support)?
Do you feel we met the objectives of the project plan?
Did the project take longer, shorter, or about as long as you expected?
For each phase of the incubator (project plan writing, work, and wrap-up): What went well? What didn’t go well? What suggestions do you have for improvement?
Is it ok to share the product on our blog, social media, in UA newsletters, etc. (as appropriate)?
What/when is the next step for continued work or funding (as appropriate)?

Take notes on the answers and share them with the team at the next incubator meeting.

19.5.3 After the meeting

Write a blog post on finished product, including links to repository and any other outputs. This can rely heavily on the project plan, README contents, other content generated during the project.
Add the incubator to our website
Done isn’t defined as the end of the collaboration. We can still provide ongoing support during the publication process and through office hours, help define follow on projects, allocate time paid by external funding, assist in project plan development.

19.6 Defining and acknowledging contributions

Members of the CCT-Data Science team who make substantial and substantive contributions to incubator projects should be acknowledged or included as co-authors as appropriate. This should be discussed early in the project.

The CRedIT Taxonomy provides a list of contribution types that may warrant co-authorship. All DIAG contributors should be involved in review and revision—as co-authors, we should feel comfortable with the content.

More details in the Authorship and Licensing Guidelines page.