Building a Hosted Platform for Managing Monographic Source Materials: Report on the First Year of the Mellon Grant to University of Michigan

In March 2015 University of Michigan Press received $899,000 from the Andrew W. Mellon Foundation for a project entitled “Building a Hosted Platform for Managing Monographic Source Materials.” The proposal was for a suite of activities to be conducted over a period of three years (April 1, 2015 to March 31, 2018) with the end deliverable being a publishing platform built on the Hydra/Fedora framework, to be made available open source for reuse as well as in the form of a hosted for-fee solution. This report describes progress made on the project during the first year (April 1, 2015 to March 31, 2016) as well as some of the challenges encountered. It is a version of the required annual grant update submitted to the Foundation.

While in the short term the primary application of the platform is to address the “companion website” problem (an increasing demand from authors for a way of presenting research data alongside their books), in the longer term it will also provide the infrastructure to publish long-form presentations of digital scholarship (the monographs of the future?). The Hydra/Fedora framework has become popular among large research libraries building data repositories and other content management solutions, and the proposal leverages the developing relationship between libraries and university presses to further their joint goal of better serving the digital scholarship needs of faculty.

To provide a proof of concept and keep platform development grounded in real-world needs, five case studies are being published on the platform during the period of the grant; one each from Indiana University Press, University of Michigan Press, University of Minnesota Press, Northwestern University Press, and Penn State University Press. Charles Watkinson, Director of University of Michigan Press, and Jeremy Morse, Director of Publishing Technology, are co-PIs on the grant and Melissa Baker-Young is the project manager.

Progress Made and Accomplishments

Good progress was made toward the final goals of the grant in the first year and the project is on track. The main accomplishments are outlined below.

1. Undertaking a collaborative but controlled approach to design

Under the co-leadership of Melissa Baker-Young and a User Experience specialist (50% of the time of Michigan Publishing’s front-end developer, Jon McGlone), the first nine months of the project were focused on gathering user requirements from colleagues at University of Michigan Press and the editors, and sometimes authors, of the four partner case studies. Key personas were identified and user stories constructed, leading to a series of wireframes iteratively developed with case study partners. This strong emphasis on planning before software development enabled the team to communicate clearly with Data Curation Experts (DCE) when they started programming work in February 2016. During the remaining three months, the wireframes were refined as the constraints of the software modules that DCE was building on became apparent.

2. Leveraging library strengths in metadata organization

Nicole Scholtz, a data librarian, is dedicating 5% of her effort to the grant over all three years. She has been an active participant in the needs assessment calls with partners, focusing on the development of a metadata template distributed to the partner presses to prepare materials for ingest to the platform. This Google Spreadsheet allows the project team to capture descriptive, structural, and rights metadata in a consistent, low-barrier way; provides the authors and editors with a consistent format for organizing their materials; and will drive a batch-upload process for content ingest. By comparing metadata templates from partner presses, the team is able to clearly communicate the development needs for the Hydra ingest system. The metadata template is now very long and will be further refined during the project to include more informative “help text” and to present different “views” of fields that need to be completed by individuals with different roles in the project (author, editor, permissions expert, etc.).
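As a rough illustration only of how role-specific “views” of such a template might be checked before batch upload, the sketch below reads a CSV export of a spreadsheet and reports which fields each role still needs to complete. The column names, role groupings, and sample rows are hypothetical and are not taken from the actual template.

```python
import csv
import io

# Hypothetical column groups per role; the real template's fields are not listed in this report.
REQUIRED_BY_ROLE = {
    "author": ["title", "creator", "description"],   # descriptive metadata
    "editor": ["section", "sequence"],               # structural metadata
    "permissions": ["rights_holder", "license"],     # rights metadata
}


def missing_fields(row, role):
    """Return the columns required of `role` that are blank in `row`."""
    return [f for f in REQUIRED_BY_ROLE[role] if not (row.get(f) or "").strip()]


def check_template(csv_text):
    """Map spreadsheet row numbers to {role: [blank fields]}; complete rows are omitted."""
    problems = {}
    # Row 1 of the spreadsheet is the header, so data rows start at 2.
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        gaps = {role: missing_fields(row, role) for role in REQUIRED_BY_ROLE}
        gaps = {role: fields for role, fields in gaps.items() if fields}
        if gaps:
            problems[number] = gaps
    return problems


# Invented sample data, for illustration only.
SAMPLE = """title,creator,description,section,sequence,rights_holder,license
Clip 1,A. Author,Interview excerpt,Chapter 2,1,Museum X,
Still 4,,Production still,Chapter 3,2,,
"""

if __name__ == "__main__":
    for row_number, gaps in check_template(SAMPLE).items():
        print(row_number, gaps)
```

A report of this kind could be returned to each contributor so that only the fields relevant to their role are flagged, which is one possible reading of the “views” described above.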

3. Leveraging library strengths to understand IP issues

Melissa Levine, lead copyright officer at the Library, is dedicating 5% of her effort to the grant for the first two years. She has coordinated with copyright experts at the partner universities to lead “deep dive” conversations into the particular intellectual property challenges faced by the five case study projects. The conversations focused on both specific issues and the broader challenges that those issues suggested. The “IP Deep Dive Report” on the conversations records some of the problems authors and editors faced, particularly in clearing third-party rights to materials to be used on the platform. These were as much “social” as strictly “legal” in nature. For example, Northwestern needed to navigate relationships with Russian museums for use of video clips and stills, while Indiana’s subcontract was renegotiated to allow their author to travel to the Pacific Northwest to negotiate with environmental activists/musicians for the rights to host some of their creations on a third-party site. The general discussions have informed the development of a new proposal to the Foundation (Emory University with University of Michigan), funded in late February 2016, to develop a “Model Contract for Digital Scholarship” that will include a sample permissions letter to third-party rights-holders.

4. Platform development

Software development during the first year of the grant commenced with a co-development effort between Data Curation Experts (2.0 FTE software developers and 0.5 FTE Project Manager) and developers local to University of Michigan Library (2.5 FTE and 0.5 FTE Project Manager). This co-development covered a span of 14 weeks of active development from March through May 2016, with a 2-week planning and setup period in February. During this period, we followed DCE’s practice of managing software development according to a lightweight version of the Agile Scrum methodology. This practice allowed for daily check-ins between the DCE and Michigan camps, close collaboration between developers, and a regular feedback cycle of reviewing progress and collaboratively planning the next stage of development. It allowed us to ensure that the vendor was fully aware of our project needs, that development was tracking with the needs of our stakeholders, and that sufficient knowledge transfer in Hydra expertise was taking place to allow Michigan developers to continue apace without DCE, as we anticipate they will do over the next two years.

Development effort so far has concentrated on the foundational work of deploying the Hydra stack in our environment and building out its limited, institutional repository-focused data model to accommodate the more complex hierarchy necessary to represent a publisher’s lists of monographs and the structures that can be found within them. Much of the basic functionality to build the Hydra equivalent of a barebones “companion website” is in place; however, work has yet to begin on developing a user interface that empowers a press editor to make use of this functionality or to present the materials properly to a reader/researcher, and this is where our efforts will be focused in the next phase of development.

5. Business plan development

Business planning consultant Brian O’Leary of Magellan Media presented a strawman business planning report in October 2015. His initial focus was on the potential for generating income by using the platform to provide a hosting service for “companion websites” from other university presses, but this activity did not show a level of income sufficient for sustainability. After several revisions, a revised business plan was completed by the end of the first year; it presents revenue and expense projections at a high level for a five-year time frame, starting in 2015. Three sources of revenue are anticipated: platform payment by University of Michigan Press for a new UMP Ebook Collection; “hosting as a service” for smaller publishers interested in a comprehensive platform solution branded as their own (Lever Press as first client); and per-title support for presses interested in mounting companion websites. The latest version of the business plan (version 4) shows a small surplus by 2020. As set out in the grant proposal, Magellan Media will continue to refine the plan over the next two years.

Setbacks and challenges

No major setbacks were encountered during the first year of the project, but some challenges arose, which are described below.

1. Hiring programmers

The initial proposal included funds to hire two junior Ruby-on-Rails programmers to work internally at University of Michigan. Due to challenges finding the right talent within the grant timeline, we instead approached the Foundation on September 18, 2015 with a request to use the first year’s worth of programming grant money to purchase 14 weeks’ worth of work from Data Curation Experts – a leading Hydra/Fedora development company. The Foundation authorized this on October 13, 2015. This was not only a pragmatic decision so that we could begin development, but also offered an opportunity to co-develop with DCE and get existing Library developers supporting the grant to become more familiar with the Hydra/Fedora codebase. The participants of various Hydra projects currently under development at Michigan will continue to look for areas of overlap and common needs, which will reduce redundancy in effort but also afford opportunities to leverage DCE development for other Library needs. A new recruitment effort is currently underway with the aim of hiring staff to continue development work for years 2 and 3 of the grant, although DCE will remain a backup option and we have carried forward some funding for them to help with documentation/reintegration of our work into the Hydra Community.

2. Publishing Gabii

While the four partner case studies clearly differentiate the “narrative” (published as monographs outside the platform) from the “data” (published on the platform), the University of Michigan Press case study is deliberately more ambitious. A Mid-Republican House from Gabii presents an archaeological report that is primarily accessed from an interactive 3D model that exists alongside a “narrative” and “data” view. This is long-form digital scholarship that cannot be adequately represented in print and poses a number of challenges to which we had to find solutions:

  • The research team uses a MySQL database (ARK), with a PHP front-end, to gather hard data from the dig site, including hi-res still images of the artifacts. We will take a snapshot of the data relevant to the specific building (House B) and treat it as an image database with associated metadata. This strategy allows us to host the database on our current DLXS imageclass platform at present, and makes it part of the planned migration of all imageclass content and functionality into Hydra.
  • The 3D model that represents the Gabii site is generated in and exported from the popular Unity 3D game engine. While Unity 5.3 can export to the open WebGL format, browser compatibility is still an issue. We are weighing the benefits of using the WebGL version now vs. starting with the proprietary Unity format (which requires a browser plugin) and waiting for WebGL support to improve. Throughout the 3D content, anchor points will contain additional textual information and links to the relevant records in the ARK database.
  • The text manuscript contains links to the anchor points in the 3D model. The text needs to be expanded for accessibility purposes to ensure that any readers unable to access the 3D content will find its most salient information conveyed in the text. The links will also be encoded in such a way that those users unable to access the 3D content will be able to bypass it and link directly to the ARK database.

3. Branding the platform

While we believed that we could come up with a name for the platform and a visual identity internally, it became apparent that we had underestimated this aspect of product design. We needed professional help not only to develop an identity that might encourage client interest as we built toward a sustainability model, but also to ensure that the multiple internal stakeholders bought into the project. We decided to hire a professional branding agency (Berman Creative) with non-grant funds to lead us through a discovery process, create a “strategic brand platform,” and develop both a textual and visual identity. Berman was selected because of its experience with other platform developers (notably Credo Reference) and with libraries (most recently in rebranding ALA’s CHOICE magazine). The discovery exercise led by Jeff Berman helped define some of the unique selling points of our platform and built upon the insights developed by our business consultant Brian O’Leary of Magellan Media, who interviewed a number of stakeholders in late 2015 in preparing a strawman business plan. We are happy with both the name, Fulcrum, and the visual identity (now displayed on fulcrum.org) that we ended up with, and the concepts of durability, flexibility, and discoverability that now define the brand express values that we will continue to integrate into the product. The separation of Fulcrum Scholar (the technology solution) and Fulcrum Services (value-added services built on the infrastructure) provides us with a way of articulating the different elements of the platform.

4. Developing with open source software

Building our platform entirely on open source software has yielded definite benefits but also its own challenges, which bear out the adage that open source software is “free as in puppies.” Among the challenges we expected were a lack of documentation and a lack of industrial-grade efficiency in the code design. Less expected, but extremely impactful, was the sudden rapid development of the core Hydra component, Curation Concerns, on which we were directly building our platform. Spurred by the Hydra-in-a-Box project, this codebase underwent a period of rapid evolution, requiring our team to take time regularly to update our own codebase in order to stay in sync. While we undoubtedly inherited bug fixes and feature improvements that will benefit our platform, many of the changes are not directly applicable and, of course, the schedule of these updates is beyond our influence. At the time of our initial beta release we will be able to freeze our dependency on a specific version of Curation Concerns and plan around integration with future upgrades. In this period of high flux within our own software development, however, it has been a new challenge to deal with the ripple effects of similar flux elsewhere within a vibrant and active open source community.

Plans and Goals for Year 2

In the second year, case study publications will start to appear on the Fulcrum platform as a series of beta releases, one per case study. The first of these, for the Northwestern title, is scheduled for release on August 17, with subsequent iterations for Indiana, Minnesota, and Penn State university presses to be released in September and October. These beta versions will not have full functionality or a refined appearance but will be public and available to readers. Refinements of the public interface will continue to be released after initial launch, with development time reserved for refactoring the codebase, to improve overall design and maintainability, and to bring it into line with subsequent developments elsewhere in the Hydra community (a recurring cost for which all projects built on open source software must account). By the end of year 2, the four partner case studies should be approaching their final presentation form. Gabii, the Michigan case study, will not yet be displayed on Fulcrum but will be made available on Michigan’s existing DLXS platform in a draft form to meet contractual responsibilities to the authors. This stopgap measure will prove valuable to the Michigan Library, as migrating the content to Fulcrum will serve as a useful test case for the large-scale migration of content from DLXS to Hydra which the Library must accomplish in time.

In the meantime, the development team will work on adding support in Hydra for 3D content and fully encoded text. This latter problem is a significant requirement for our purposes that has not been addressed elsewhere in the Hydra community; as such, it represents an opportunity for Michigan to make a significant contribution to expanding the types of content which the growing Hydra community will be able to support. In pursuit of this innovation, members of the Fulcrum development team are partnering with other staff within the Library to form an Investigation Team to determine the most effective method for representing fully encoded text in Hydra that will serve our functional needs while preserving the semantic structures present in our legacy TEI-encoded content in DLXS. While the scope of this work reaches across our institution and will merit the consultation of field experts, the requirements of the Fulcrum project serve as the immediate need that places a timetable on the investigation. This is yet another example of Mellon’s support spurring innovation in ways that reach far beyond the scope of any one grant.

With the conclusion of the DCE engagement in May, programming responsibilities will move to the internal Michigan team. At present this consists of the UX/UI specialist and one full-time developer supported by the grant, with another to be added shortly (as of this writing, an offer is being extended to the second of these developers after another difficult candidate search), plus one senior developer whose effort is being contributed 100% for the duration of the grant. Getting the new programmer fully up-to-speed and working in parallel with other U of M Library Hydra/Fedora projects will be a focus of summer and fall 2016.

Further refinement of the ingest template will focus on intellectual property issues. There are some important questions remaining as to how some of the licensing restrictions imposed by rights granting organizations can be managed. For example, is there any way to implement geographical restriction by IP address? And to what extent can thumbnail versions of images be used to represent images to which the author has no rights? Melissa Levine and Nicole Scholtz will continue to investigate these issues. The ingest template will also be made more user-friendly, with help text and a clearer communication of roles and responsibilities for populating particular fields.

Brian O’Leary at Magellan Media will continue to refine the business plan. The major revenue source is predicted to be the portion of the platform fees recouped from mounting a University of Michigan Press Ebook Collection, so the estimates gathered as the Press builds its ebook offering will provide increasingly accurate numbers for insertion into the planning matrix. Further publicity about the Fulcrum platform will attract interest and attention from other potential partners for the platform hosting and companion website hosting offers, giving a better sense of likely demand.

List of Recent Publications

Michigan Publishing took steps to increase attention on the project in a number of electronic and in-person venues. A summary of the project was published, and more recently a dedicated website was launched in order to promote our work as an infrastructure offering that could support the work of other scholarly humanities publishers. In addition, presentations were made at the following conferences: