The D-Light of Digital Collections

My Digital Collection

December 5, 2008

This semester for my Digital Collections course I put together a small digital collection for physics TAs.

You can visit my collection, the Physics TA Library.

Why I built it

Physics graduate programs often hire teaching assistants (TAs) to teach introductory physics courses and laboratories. These teaching assistants are typically graduate students with normal course loads and little to no formal teaching experience. As a result, they are often unfamiliar with general teaching and classroom management strategies. They may also be unfamiliar with the Physics Education Research (PER) community and the tools, resources, and techniques it has developed to help physics teachers make a greater impact in the classroom.

Collection Proposal, Purpose, and Goals

With this audience in mind, I built a collection of 50+ resources aimed directly at the physics TA population. Resources include links to teaching materials and supporting articles that introduce TAs to PER. Forums are also available to anyone who registers an account, so that TAs can share teaching and research tips with each other.

The goal of the digital collection is to help bridge the teaching experience gap by providing targeted resources that help to improve the teaching practice of new TAs. A secondary purpose is to introduce physics TAs to the vetted, high-quality online teaching resources developed by the PER community.

What I learned

For this proposal, I used the comPADRE Digital Library’s platform. This was my first time using the interface from a cataloger’s perspective, and I was surprised at how long it took to catalog something from scratch. My initial estimate was that in 25 hours I could select ~100 resources in total and catalog perhaps 30 of them from scratch. However, that estimate left out the time needed to create header images, adjust the policy, about, and FAQ pages, and put out fires. Once that time was factored into my 25 hours, I ended up adding ~35 appropriate existing comPADRE records to the collection (adjusting their metadata as necessary) and cataloging ~15 new records from scratch. Given the specificity of the metadata fields (see a detail page here), cataloging from scratch took much longer than I expected.

All in all, though, I think it came out well, and I hope you find it interesting. I’d love to hear any feedback you have.

Digital Collections | Tagged: comPADRE, Physics TAs
Posted by Lyle

Why a Digital Library?

December 5, 2008

Putting together a good team isn’t easy, metadata is expensive, technology is constantly changing, funding is difficult to come by, and sustainability is a joke. So then why put together a digital library?

One argument is that you have the chance to reach a greater audience when they need the information. But I’m not sure that’s enough. I think what pushes it over the edge is that you can find things in a digital library.

Sure, the search algorithms used today have issues, and we have cause to complain about them. But then again, computers have opened up new vistas for storing and finding information. You can search all of Moby Dick in seconds. You can find a single record out of millions using its ISBN.
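To make the retrieval point concrete, here’s a minimal Python sketch of the two kinds of lookups mentioned above: finding a record directly by its ISBN, and finding records by keyword through an inverted index. The catalog entries and ISBN-like keys are made up for illustration; real catalogs use far more sophisticated indexing and ranking.

```python
import re
from collections import defaultdict

# Hypothetical catalog keyed by ISBN (the keys are made up, not real ISBNs).
# A dict lookup takes roughly constant time however many records there are.
catalog = {
    "9780000000001": "Moby-Dick; or, The Whale",
    "9780000000002": "Pride and Prejudice",
    "9780000000003": "The Whale Road",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(records):
    """Map each word to the set of ISBNs whose title contains that word."""
    index = defaultdict(set)
    for isbn, title in records.items():
        for word in tokenize(title):
            index[word].add(isbn)
    return index

def keyword_search(index, query):
    """Return the ISBNs whose titles contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches

index = build_inverted_index(catalog)
print(catalog["9780000000002"])                 # Pride and Prejudice
print(sorted(keyword_search(index, "whale")))   # ['9780000000001', '9780000000003']
```

The index is built once, so each search only touches the lists for the query’s words instead of rereading every record, which is the same trade a card catalog makes, minus the limits on cross-indexing.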

Before digital collections, information was stored in inventory lists or card catalogs, which were limited in time, space, and cross-indexing. Having the catalog online means multiple people can use it at the same time. Additionally, given good metadata, cross-indexing and ‘similar to’ algorithms can be automated.
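One simple way to automate a ‘similar to’ feature from good metadata is to compare the subject terms of two records. The sketch below uses Jaccard similarity (shared terms divided by all distinct terms) on hypothetical subject sets; it’s one basic technique among many, not how any particular digital library does it.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_to(target_id, subjects, top_n=3):
    """Return up to top_n other record ids, most similar first (ties by id)."""
    target = subjects[target_id]
    scored = [
        (jaccard(target, terms), record_id)
        for record_id, terms in subjects.items()
        if record_id != target_id
    ]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated records
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored[:top_n]]

# Hypothetical records: each id maps to the subject terms in its metadata.
subjects = {
    "r1": {"mechanics", "kinematics", "lab"},
    "r2": {"mechanics", "kinematics", "tutorial"},
    "r3": {"optics", "lab"},
    "r4": {"thermodynamics"},
}
print(similar_to("r1", subjects))  # ['r2', 'r3']
```

The quality of the suggestions depends entirely on the subject terms, which is why consistent metadata matters so much here.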

These features that allow people to find things are what make digital collections exciting to me.

Now, don’t get me wrong, card catalogs were great! Browsing shelves for similar books is also great! But they aren’t necessarily enough when you’re looking for something.

Retrieval is the promise of a digital collection. It’s why we build them and what we need them for.

Here’s to retrieval.

Digital Collections | Tagged: retrieval
Posted by Lyle

Sustainability

December 3, 2008

The amorphous, rapidly evolving, and fickle nature of the Web makes forays into creating DLs dangerous. There are no guarantees that your effort will be worth anything in 3 years (and it likely won’t be worth much).

Because of this, there are questions as to whether massive expenditures on digital collections are really worth it.

It’s a question we have to ask, and it has no easy answer.

Sure, digital collections are neat, have great information, and aren’t expensive compared to holdings in physical buildings, but the expenditure is constant, and digital directories of web contents will always be a step behind the actual Web.

Given this, can we justify the cost of creating a DL for the long term? And if so, who will pay? The state? The federal government? Ad-revenue?

I have 3 thoughts on this topic:

1. Most digital collections will be abandoned or fail.

Not all projects have a chance at a long life and many are stillborn. Some successful ones will flicker and die. All of this is likely a good thing. Not all ideas are grand, and not all ventures are needed. Sometimes you have to accept failure, cull, and focus on the successful parts.

2. Standards are the lifeblood of long-term successful collections.

Open standards are important, and I’ve focused on them in this blog so far, because they allow existing effort to be transferred between systems. Any project with long-term aspirations needs to be ready to move to a new technical system or risk failure. Having no transition path in case of disaster is a recipe for failure, and in the long term, disaster always strikes.

3. The survivors will serve specific needs.

I think that surviving collections will serve needs beyond general search. Mega-conglomerate search engines will always attract more users than small web directories. However, communities and individuals will also have needs the big engines don’t meet. Whether that need is a catalog for a local physical library, a collection of papers presented at conferences for a professional organization, or your collection of shot glasses, local collections will always be needed – and those communities and individuals will be the ones to support the surviving collections.

But these are just my thoughts on the ultimate sustainability of digital collections. Your thoughts?

Digital Collections | Tagged: Sustainability
Posted by Lyle

Digital Libraries and Accessibility

November 30, 2008

There are unique opportunities available to reach disadvantaged audiences using digital libraries.

Blind users may use screen-reading technologies to access catalog records, interfaces may be designed so that they can be enlarged for low-vision users, and information can be organized in ways that make it usable by people with cognitive impairments.

However, just as with traditional libraries, certain technical standards and techniques should be followed when preparing your digital library to ensure that it is accessible. In particular, there are two sets of recommendations I suggest you view: the WCAG 1.0 Checklist and the WebAIM Section 508 Checklist.

From these checklists, I’d like to highlight two easy-to-implement objectives that will help ensure your interface is accessible:

1. Ensure that no information is indicated by color alone.

For instance, indicating only through shading that records are out of stock isn’t enough for users with screen readers. Likewise, keys and other charts should convey information in ways that do not rely solely on color.

2. Ensure all images have descriptive text for those who may not be able to view them.

Images should always have alt attributes holding descriptive text. With them, users who can’t view the images will still be able to understand the point of the page.
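Checking for missing alt text is easy to automate. Here’s a minimal sketch using Python’s standard-library HTMLParser that lists images with no alt attribute at all. It treats an empty alt="" as acceptable, since that is the usual way to mark purely decorative images so screen readers skip them; deciding whether existing alt text is actually descriptive still takes a human.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing <img ... /> tags are routed here too, via the
        # default handle_startendtag.
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src", "(no src)"))

def find_images_without_alt(html):
    """Return the src values of images lacking an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing

page = (
    '<p><img src="cover.jpg" alt="Cover of the lab manual">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.png"/></p>'
)
print(find_images_without_alt(page))  # ['chart.png']
```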

Implementing these two objectives will help ensure your library is accessible to all.

Uncategorized
Posted by Lyle

Is it Unethical to include Web 2.0 features in Digital Libraries?

November 9, 2008

Greetings from not-so-sunny Iceland. This week I thought I’d write a bit about the ethics of including Web 2.0 features in digital collections.

The thought is that users are becoming so used to having commenting/rating/tagging features on commercial records that they now “expect” these tools. By providing them, digital librarians are providing what users want while adding value to their collection.

Of course, it’s not that simple. Aside from the technical challenges, there are serious ethical questions that must be addressed regarding these tools.

For instance, if users are allowed to rate or comment on material, should the library censor such comments? Especially considering the nature of discourse on the web today (see YouTube), are libraries willing to allow unfettered comments in their online catalogs? If not, who will do the censoring?

Additionally, there are serious questions as to whether these tools are valid according to librarian ethics.

Strictly speaking, including ratings and tags on bibliographic records violates the Library Bill of Rights interpretation on Labels and Rating Systems. By including potentially prejudicial user-generated keywords (let alone ratings) on bibliographic records, digital collections are directly flouting the tenets of this interpretation of the Library Bill of Rights.

Does that mean that digital libraries shouldn’t be including these Web 2.0 features? That’s a different story. Users do want comments and they do want ratings.

I think the better question revolves around the point of this tenet. Determining why this interpretation exists will help us determine how we can provide these features without unduly influencing future use. One thing is for certain – the demand to include these features in catalogs will only grow.

Uncategorized
Posted by Lyle

CWIS

November 3, 2008

CWIS is a great option if you are looking for turnkey digital library software. Like Omeka, it is a free, easy-to-install option for collection builders. Developed by the Internet Scout Project team and NSDL’s AMSER portal, it gives you a full digital library that is currently in use by an NSDL Pathway Project.

CWIS features include:

resource comments and ratings
keyword and fielded searching
a recommender system
forums
OAI 2.0 export (with oai_dc and nsdl_dc schemas)
RSS feeds
user-definable schema (comes with full qualified Dublin Core) and prepackaged taxonomies
user interface themes
turnkey installation
integrated usage tracking

Personally, I would recommend it over Omeka, as I believe it is further along in its development. CWIS’s OAI export options are also extremely helpful for ensuring that any data put into the system can be exported should you need to switch systems in the future.
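Because OAI-PMH is an open protocol, the same few lines of code can pull records out of any compliant repository, which is what makes a future migration realistic. The sketch below is generic OAI-PMH 2.0, not CWIS-specific code: it builds a ListRecords request for oai_dc and parses one page of the response, returning the resumptionToken used to fetch the next page. The base URL is hypothetical, and network access and error handling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; follow-up pages send only the token."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = "oai_dc"
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], next_token) for one response page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [el.text for el in record.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None

# Usage with a hypothetical endpoint:
# url = list_records_url("https://example.org/oai")
# fetch url (e.g. with urllib.request.urlopen), then call parse_list_records
# on the response text, repeating with the returned token until it is None.
```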

Regarding accessibility, CWIS has a W3C/WAI-compliant interface and supports ACCLIP metadata fields. Although it is designed for installation on a LAMP (Linux, Apache, MySQL, PHP) system, all it requires is PHP and MySQL.

On a personal note, I’ve been a fan of the CWIS technical team for the past few years, and they have made remarkable strides recently in improving CWIS import and export options.

At the 2008 NSDL annual meeting, I attended the tutorial session “Collection Development with CWIS,” where Ed Almasy presented CWIS’s features and set up a full CWIS installation in about 10 minutes.

If you’re looking to build a collection from scratch (or from an OAI service), take a look.

Digital Collections | Tagged: Amser, CWIS, NSDL
Posted by Lyle

Solutions for Correlating Collections to K-12 Content Standards for Interoperability

October 6, 2008

Another presentation I attended at the National Science Digital Library Annual Meeting was Solutions for Correlating Collections to K-12 Content Standards for Interoperability.

I’d like to start with a personal comment: it was great to see Diny Golder doing well. She was going through chemotherapy when I last saw her, so I was very happy to see that she has made a full recovery.

Now to the preliminary information:

Diny runs a company, JES & Co., that aggregates K-12 state and national content standards. By updating these standards as states change them, and by keeping them in a consistent import format, JES & Co is instrumental in keeping educational digital libraries up-to-date with evolving content standards.

However, going from a big list of 300,000+ standards to the alignment of resources can be a daunting challenge for most digital libraries. Additionally, because state standards constantly change, resources that are aligned to out-of-date standards are a constant problem.

Now to the NSDL presentation:

Because of those issues, tools to help with standards alignment are greatly needed. To help fill that need, JES & Co, in conjunction with the Center for Natural Language Processing, has been working on tools to ease standards correlation.

I’ll skip the manual tool that was demonstrated and focus on the semi-automatic Content Assignment Tool (CAT), which was very interesting. Given a URL, CAT guesses which state and national standards an object might be aligned to. Aligners can then pick the matching standards from the suggestions, or pick standards manually if necessary. The system also improves as more alignments are made. Best of all, an object that has been aligned to a single state standard can then be aligned to other states’ standards through an automated process.

This tool therefore has the potential to help reduce the human overhead in aligning learning objects to state standards. Additionally, since JES & Co attempts to correlate standards between old and new versions of state standards, correlations from out-of-date standards to new standards can generally be made automatically.
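The automated cross-state (and old-to-new) alignment described above can be pictured as following correlation links between standards. Here’s a minimal sketch of that idea, not the actual CAT or ASN implementation: the standard identifiers and correlation table are hypothetical, and links are followed transitively.

```python
def propagate_alignments(alignments, correlations):
    """Expand each resource's standards by following correlation links.

    alignments:   resource id -> set of standard ids a person aligned it to
    correlations: standard id -> set of standard ids judged to correspond
                  (another state's standard, or a newer version of the same one)
    """
    expanded = {}
    for resource, standards in alignments.items():
        found = set(standards)
        to_visit = list(standards)
        while to_visit:
            current = to_visit.pop()
            for linked in correlations.get(current, ()):
                if linked not in found:  # this check also stops cycles
                    found.add(linked)
                    to_visit.append(linked)
        expanded[resource] = found
    return expanded

# Hypothetical identifiers: an old California standard, its 2008 revision,
# and a Texas standard correlated with the revision.
correlations = {
    "CA-2000-PHYS-1": {"CA-2008-PHYS-3"},
    "CA-2008-PHYS-3": {"CA-2000-PHYS-1", "TX-2008-SCI-7"},
}
alignments = {"lesson-42": {"CA-2000-PHYS-1"}}
print(sorted(propagate_alignments(alignments, correlations)["lesson-42"]))
# ['CA-2000-PHYS-1', 'CA-2008-PHYS-3', 'TX-2008-SCI-7']
```

Following links transitively is a simplification: two standards that each correlate with a third aren’t necessarily equivalent to each other, so a real system would probably want a person to confirm the expanded alignments.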

To get your hands on the Achievement Standards Network (ASN) standards, visit JES & Co. Or, click here to see your state’s standards as represented in the ASN. Or visit the Center for Natural Language Processing for more info on CAT.

P.S. As a slight aside, Elizabeth Liddy from the Center for Natural Language Processing is amazing. If you get a chance, visit the CNLP’s list of projects, recent conference presentations, and publications. Their hands are in everything from eye-tracking studies of metadata records to semantics and information correlation. If you are interested in processes that automatically extract semantic information from documents, this is one of the places to watch.

P.P.S. Standards are probably less tenuously connected to digital libraries than you might expect. Standards alignments can be very important in some educational digital libraries, and are generally described using the Dublin Core conformsTo element. In practical terms, having resources aligned to standards can let a teacher know immediately upon access whether a digital lesson plan is appropriate for their classroom. Additionally, a search interface based on alignments could aid resource discovery by letting teachers search for digital objects directly from the standard they wish to teach.
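As an illustration of the search-by-standard idea, the sketch below reads dct:conformsTo values out of (heavily trimmed) metadata records and builds an index from each standard to the resources aligned with it. The record XML and standard URIs are hypothetical placeholders; real ASN identifiers and NSDL records look different.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CONFORMS_TO = "{http://purl.org/dc/terms/}conformsTo"

def standards_index(records):
    """records: resource id -> metadata XML string.

    Returns standard URI -> sorted list of resource ids that conform to it.
    """
    index = defaultdict(set)
    for resource_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for element in root.iter(CONFORMS_TO):
            if element.text and element.text.strip():
                index[element.text.strip()].add(resource_id)
    return {standard: sorted(ids) for standard, ids in index.items()}

records = {
    "lesson-1": """<record xmlns:dct="http://purl.org/dc/terms/">
        <dct:conformsTo>http://example.org/standards/S1</dct:conformsTo>
        <dct:conformsTo>http://example.org/standards/S2</dct:conformsTo>
    </record>""",
    "video-7": """<record xmlns:dct="http://purl.org/dc/terms/">
        <dct:conformsTo>http://example.org/standards/S1</dct:conformsTo>
    </record>""",
}
by_standard = standards_index(records)
print(by_standard["http://example.org/standards/S1"])  # ['lesson-1', 'video-7']
```

A teacher-facing interface would then only need to look up the standard they plan to teach in this index.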

Digital Collections | Tagged: ASN, CAT, CNLP, content standards, semantic processing
Posted by Lyle

Learning Object Repositories and Reuse

October 2, 2008

I’m taking a one-week hiatus from discussing OAI and metadata since I attended the National Science Digital Library Annual Meeting this week. Instead, I’ll write a series of posts about some of the sessions I attended and the issues discussed in each.

The first talk I attended was by Thomas Wrensch and dealt with the “Reuse” of NSDL Resources. Reuse is a hot topic for educational digital libraries as the ability to re-purpose materials into multiple educational contexts is key to making usable learning object repositories.

For instance, if you have a lesson plan tied to a particular book’s exercises, that is likely not usable outside of its specific context. However, if you have a video displaying a generic acid-base reaction, that video might be embedded in a number of different educational contexts without loss of value.

Ensuring that your learning object repository provides objects teachers can use in multiple situations is the primary idea behind reusable objects. Of course, strip away all context and you can also end up with useless objects. There is therefore a tension between rich resources that describe particular concepts and generic resources that can be used in multiple situations.

Now to Tom’s talk. Tom crawled the San Diego Supercomputer Center’s archive of the entire NSDL to look at how NSDL objects were being reused. He found that they weren’t being reused very much (at least not re-stored or linked to), except for some particular non-learning objects (for instance, the Get Adobe Acrobat image was the most replicated resource among all NSDL objects). While he did find more paragraph-level repurposing (cutting, pasting, and adapting single paragraphs), his tools were not yet sophisticated enough to demonstrate this in a way he found completely satisfactory.
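At its simplest, finding the most replicated file in a crawl is a matter of hashing file contents and counting duplicates. This is a generic sketch of that technique, not Tom’s actual tooling, and the crawl data is made up. Exact hashing only catches byte-identical copies, which is part of why paragraph-level reuse (cut, pasted, and adapted text) is so much harder to measure.

```python
import hashlib
from collections import Counter

def most_replicated(files, top_n=3):
    """files: iterable of (location, content_bytes) pairs from a crawl.

    Returns up to top_n (copy_count, first_location) pairs, most copies first.
    Only byte-identical files count as copies.
    """
    counts = Counter()
    first_location = {}
    for location, content in files:
        digest = hashlib.sha256(content).hexdigest()
        counts[digest] += 1
        first_location.setdefault(digest, location)
    return [
        (count, first_location[digest])
        for digest, count in counts.most_common(top_n)
    ]

# Hypothetical crawl: the same "get the reader" button saved in three collections.
crawl = [
    ("collection-a/get_reader.gif", b"GIF89a...button"),
    ("collection-b/images/get_reader.gif", b"GIF89a...button"),
    ("collection-c/btn.gif", b"GIF89a...button"),
    ("collection-a/lesson.html", b"<html>lesson</html>"),
]
print(most_replicated(crawl, top_n=1))  # [(3, 'collection-a/get_reader.gif')]
```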

As another element of his study into reuse, Tom also conducted a survey across a number of collections to see how individuals were using objects in the NSDL. While the survey data is not yet fully analyzed, one interesting finding was that teachers were very concerned about how resources could legally be used in their classes. ‘They had heard about students being sued for downloading music, and they weren’t sure that they wouldn’t be guilty of the same thing’ when downloading and using resources for teaching.

I talked with Tom after his talk regarding this point, and he said that he was shocked because it wasn’t part of a survey question. This was an issue that “nearly every” teacher brought up without prompting when discussing digital objects and their use.

If you are interested, click here to find out more about Tom or here for his NSDL abstract and presentation (coming soon).

Digital Collections | Tagged: NSDL, Reuse
Posted by Lyle

Metadata Crosswalks

September 24, 2008

No matter what metadata standards your digital library exports, services you need to interact with will occasionally require a different standard.

Metadata crosswalks, which map one metadata format to another, are a common way of dealing with this issue.

Many standards bodies offer crosswalks. For instance, the Library of Congress provides standard crosswalks from MARC to Dublin Core and from Dublin Core to MARC.

To give a concrete example, the LOC’s MARC-Dublin Core crosswalk maps the MARC fields 245 and 246 to dc:title.
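Here is a toy crosswalk in Python that sends MARC 245 (Title Statement) and 246 (Varying Form of Title) to dc:title. The record layout is a simplified, hypothetical stand-in for real MARC (no indicators, no MARCXML), and keeping only subfields $a and $b is an assumption of this sketch rather than the LOC’s exact rule. Notice that the ISBD punctuation (the trailing " /") survives the mapping: a small example of one format’s cataloging practices leaking into another’s.

```python
# Simplified, hypothetical MARC layout: tag -> list of fields, where each
# field is a list of (subfield code, value) pairs. Indicators are ignored.
MARC_TO_DC = {
    "245": "dc:title",  # Title Statement
    "246": "dc:title",  # Varying Form of Title
}
TITLE_SUBFIELDS = {"a", "b"}  # assumption: title proper + remainder of title

def crosswalk_to_dc(marc_record):
    """Map the MARC fields this sketch knows about into a Dublin Core dict."""
    dc = {}
    for tag, fields in marc_record.items():
        target = MARC_TO_DC.get(tag)
        if target is None:
            continue  # fields without a mapping are silently dropped
        for subfields in fields:
            value = " ".join(
                text.strip() for code, text in subfields if code in TITLE_SUBFIELDS
            )
            if value:
                dc.setdefault(target, []).append(value)
    return dc

record = {
    "100": [[("a", "Melville, Herman,")]],
    "245": [[("a", "Moby-Dick, or, The whale /"), ("c", "Herman Melville.")]],
    "246": [[("a", "Whale")]],
}
print(crosswalk_to_dc(record))
# {'dc:title': ['Moby-Dick, or, The whale /', 'Whale']}
```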

However, crosswalking is dangerous. Most organizations follow best-practice guidelines to ensure that they catalog in a particular metadata format correctly, and even then it is difficult to keep the use of a metadata field consistent within a single project.

If you add in that a remote project’s metadata may not have originated in the same format, but in a field ‘like’ the one you use, you have potential chaos. For instance, although your dc:title field might have certain best practices, the MARC 245 field it was mapped from may have had completely different best practice standards.

Because of this, crosswalks are frowned upon in some fields, and when you must use one, it is best to crosswalk directly between the two formats rather than chaining through an intermediate format. It’s also important to understand the best practices of both formats if you plan on creating your own.

Despite the danger of garbled fields, crosswalks are often necessary as projects from various fields try to interoperate.

Next time I’ll talk about how digital libraries exchange metadata records using OAI.

Digital Collections | Tagged: Crosswalks
Posted by Lyle

Qualified Metadata Fields

September 15, 2008

(I had wrist surgery Friday morning, so this post will be shorter than normal.)

When harvesting an unqualified <dct:audience> field:

<dct:audience>Learner</dct:audience>

you can’t tell whether the digital library is using a particular controlled vocabulary, random user-entered keywords, or something else.

Enter qualified metadata fields, which let you specify the range of values that may appear in a particular metadata field.

For instance, here is the NSDL’s qualification for the Audience field (from the Metadata Guidelines).

Administrator
Educator
General Public
Learner
Parent/Guardian
Professional/Practitioner
Researcher

The fact that a field uses this vocabulary would appear in the XML as:

<dct:audience xsi:type="nsdl_dc:NSDLAudience">Learner</dct:audience>

Notice that the only difference is the xsi:type designation (although additional changes would appear in the header of the file).

By having a prearranged agreement on what controlled vocabularies different metadata fields hold, you can get more meaningful data exchange between libraries. And if you can also agree on which fields and vocabularies to use when assigning metadata, you can get very meaningful data exchange.

But even if you don’t agree on a particular controlled vocabulary, by designating which vocabularies you are using with qualified metadata fields, you enable partners to create valid crosswalks to their own vocabularies.
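Here’s a minimal sketch of how a harvester might use the xsi:type designation to check values against the NSDL audience vocabulary listed above. It’s an illustration under stated assumptions: the record is trimmed to the two namespace declarations it needs, and because ElementTree doesn’t resolve prefixes inside attribute values, it compares the literal string nsdl_dc:NSDLAudience, which only works if the publisher uses that prefix.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
AUDIENCE = "{http://purl.org/dc/terms/}audience"

# The NSDL audience vocabulary, keyed by its xsi:type value (literal prefix assumed).
VOCABULARIES = {
    "nsdl_dc:NSDLAudience": {
        "Administrator", "Educator", "General Public", "Learner",
        "Parent/Guardian", "Professional/Practitioner", "Researcher",
    },
}

def check_audience(xml_text):
    """Return (value, status) for every dct:audience element in a record."""
    results = []
    for element in ET.fromstring(xml_text).iter(AUDIENCE):
        value = (element.text or "").strip()
        scheme = element.get(XSI_TYPE)
        if scheme is None:
            status = "unqualified"  # no way to tell which vocabulary applies
        elif scheme not in VOCABULARIES:
            status = "unknown vocabulary"
        elif value in VOCABULARIES[scheme]:
            status = "valid"
        else:
            status = "not in vocabulary"
        results.append((value, status))
    return results

record = """<record xmlns:dct="http://purl.org/dc/terms/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dct:audience xsi:type="nsdl_dc:NSDLAudience">Learner</dct:audience>
    <dct:audience xsi:type="nsdl_dc:NSDLAudience">Students</dct:audience>
    <dct:audience>Learner</dct:audience>
</record>"""
for value, status in check_audience(record):
    print(value, "->", status)
```

The unqualified third element is exactly the ambiguous case from the start of this post: the value looks fine, but nothing says which vocabulary it came from.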

Next time, I will talk about crosswalks again, but this time between metadata formats.

Digital Collections | Tagged: Qualified Metadata
Posted by Lyle
