Rights declarations in the Connecticut Digital Archive

With the launch of standardized rights statements in the repository, this is a good time to give the project some context by looking at the current state of rights declarations in the CTDA. We looked at the rights declarations* for objects added to the repository before January 2, 2020 and analyzed them.

First, we should talk about how we count objects in the repository. We are very proud of the over 1.5 million objects that have been added to the CTDA over the past 5+ years. That figure is a raw count of all digital objects in the repository, including all book, manuscript, and newspaper pages. In fact, page objects make up over 90% of the digital objects in the repository.

We excluded page objects from this summary because most, if not all, of these objects have no descriptive metadata. Also, users who search and view these pages interact with them as parts of a larger object, such as a book or newspaper. For example, when we send metadata records to the Digital Public Library of America (DPLA), we do not send the records for page objects; we send the records for the books, manuscripts, and newspaper issues that contain the pages. We took the same approach for this review.

129,323 objects with descriptive metadata reviewed

139 unique rights declarations in the raw data

After excluding page objects, we had 129,323 objects in the repository with descriptive metadata records. Across these objects, we found 139 unique rights declarations. We also found that a little over 90% of objects (117,316) had some type of rights information in the metadata record and almost 10% (12,007) lacked any rights declaration. You can view the raw data we pulled.
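To make the counting concrete, here is a minimal Python sketch of how a tally like this could be produced. It assumes that each object's MODS record is available as an XML string. How records are exported from the repository is not described here, so that input format is an assumption. The function names `rights_declaration` and `summarize` are hypothetical. Only the `accessCondition` element and its `type="use and reproduction"` attribute come from the footnote below. Distinct declarations are counted by exact string match, so near-duplicates are counted separately.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Optional

# Standard MODS v3 namespace; ElementTree qualifies tag names with it.
MODS_NS = "http://www.loc.gov/mods/v3"


def rights_declaration(mods_xml: str) -> Optional[str]:
    """Return the text of the first non-empty
    <accessCondition type="use and reproduction"> element, or None."""
    root = ET.fromstring(mods_xml)
    for elem in root.iter(f"{{{MODS_NS}}}accessCondition"):
        if elem.get("type") == "use and reproduction":
            text = elem.text or ""
            if text.strip():  # skip empty or whitespace-only elements
                return text   # returned verbatim so near-duplicates stay distinct
    return None


def summarize(records: list[str]) -> dict:
    """Count objects with and without a rights declaration, plus the number
    of distinct declarations (exact string match)."""
    declarations = [rights_declaration(r) for r in records]
    present = [d for d in declarations if d is not None]
    total = len(records)
    return {
        "total": total,
        "with_rights": len(present),
        "without_rights": total - len(present),
        "unique_declarations": len(set(present)),
        "percent_with_rights": round(100 * len(present) / total, 1) if total else 0.0,
    }
```

Applied to the figures above, the same percentage calculation gives 117,316 / 129,323, or about 90.7% of objects with rights information.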

90.7% have rights info in the metadata

The raw data tells us a couple of things. First, 139 unique rights declarations is a lot for a researcher to navigate when using the repository. Second, if you look at the declarations themselves, many are nearly identical, but the system counts them as unique entries because of a missing period here, an extra space there, or a difference in capitalization. To make sense of this data, we needed to do some more work with the raw rights declarations.

We used OpenRefine to identify declarations that used different language to declare the same rights status, and then reconciled these declarations to get a better picture of rights in the repository. For example, the rights declarations "No known copyright restrictions" and "No Known Copyright" count as two unique data points in the repository, even though they signify the same rights status. We reconciled both of these to "No Known Copyright" for our analysis. After reconciling as many of the different declarations as we could, we found 94 unique rights declarations for objects in the repository. You can find the reconciled rights declarations data set on the CTDA Resource Center, along with the original rights data, in this spreadsheet.
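The reconciliation itself was done in OpenRefine. As a rough illustration of the idea, the Python sketch below clusters declarations that differ only in case, ASCII punctuation, spacing, or word order. It is loosely modeled on OpenRefine's "fingerprint" clustering method and skips that method's accent-folding step. A hand-curated mapping then handles variants that use different wording, such as the two example declarations above. The names `fingerprint`, `reconcile`, and `manual_map` are hypothetical and are not part of OpenRefine.

```python
from __future__ import annotations

import string
from collections import Counter, defaultdict

# Translation table that deletes ASCII punctuation (curly quotes are not covered).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(value: str) -> str:
    """Key that ignores case, ASCII punctuation, extra whitespace, duplicate
    words, and word order (an approximation of OpenRefine's fingerprint key)."""
    tokens = value.strip().lower().translate(_STRIP_PUNCT).split()
    return " ".join(sorted(set(tokens)))


def reconcile(declarations: list[str], manual_map: dict[str, str]) -> dict[str, str]:
    """Map each raw declaration to a reconciled label.

    Declarations sharing a fingerprint form one cluster. A cluster takes its
    label from manual_map if its key is listed there; otherwise it takes its
    most frequent raw spelling (ties go to the first one seen)."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for d in declarations:
        clusters[fingerprint(d)].append(d)
    mapping: dict[str, str] = {}
    for key, members in clusters.items():
        label = manual_map.get(key) or Counter(members).most_common(1)[0][0]
        for m in members:
            mapping[m] = label
    return mapping
```

Counting the distinct values in the returned mapping gives the reconciled total. Working on exported values like this matches the approach described below: the source records themselves are left untouched.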

94 unique rights declarations after reconciling the data

For the reconciliation process, we only manipulated the data in OpenRefine after it was exported from the repository. No metadata was changed or modified in any of the records in the repository system.

After reconciling the data, 25,173 objects are labeled as having no known copyright or no copyright. According to the rights declarations, 24,227 objects have no known copyright, 680 are not in copyright in the United States, and 266 are in the public domain.
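As a quick check, the three categories add up to the reported total. That total is roughly a fifth of the 129,323 objects reviewed:

$$24{,}227 + 680 + 266 = 25{,}173, \qquad \frac{25{,}173}{129{,}323} \approx 19.5\%$$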

25,173 objects declared to have no known copyright or no copyright

594 objects are clearly labeled as In Copyright, with 390 allowing Non Commercial Use only and 86 allowing Educational Use only.

594 objects clearly declared in copyright

Most of the other 78,977 objects' rights declarations contain broad copyright information or tell users to contact the institution to determine the copyright status of the object. Most of these declarations also include information on how the objects can be used and reproduced. Use and reproduction information is important to our Community Members and researchers, but it can be confusing when it shares a metadata field with copyright information. Before we decided to implement standardized rights statements, the only place to enter use and reproduction information was alongside the copyright declaration. Now, we are introducing a local use field for all objects along with standardized rights statements. This means the copyright statement for an object and any use and reproduction policies will be clearly marked and displayed for researchers as separate fields.

So, what's next? Where do we go from here with all of this data? The second phase of implementing standardized rights statements in the repository will be to look at objects added to the repository prior to February 1, 2020 (129,000+ and counting) and apply standardized rights statements to them. Part of this process will also examine the validity and appropriateness of the rights declarations applied to collections and items. For example, are there items declared to be in copyright that are really in the public domain, or vice versa? We are not sure, but we will find out. Implementing standardized rights statements gives us the opportunity not only to standardize the language of rights statements, but also to create a more accurate representation of the rights of items in the repository.

We have some ideas and possible plans in the works for the second phase of this project, but we do not have exact details yet. What we are sure of is that, by working with our Community Members, we will be able to tackle this project head-on and eventually turn the 139 unique rights declarations in the CTDA into 6 standardized rights statements.

Keep an eye out for more rights analysis throughout the year as we track the number of objects with standardized rights statements in the repository.


* - We looked at the <accessCondition type="use and reproduction"> MODS metadata element for each object.