NYPL Labs : The Process Behind the Product

“Summer Camp For Archivists” Sounds So Much Better

July 19, 2009 by Mark A. Matienzo
Filed in archives

Crossposted to thesecretmirror.com.

I’m staying with colleagues and good friends during my week-long stint in Charlottesville, Virginia for Rare Book School. If you’re here – particularly if you’re in my class (Daniel Pitti’s Designing Archival Description Systems) – let me know. I’m looking forward to a heady week dealing with descriptive standards, knowledge representation, and as always, doing my best to sell the archives world on Linked Data. Notes and thoughts will follow, as always, on thesecretmirror.com.


m-Libraries 2009: Thinking about the mobile future

June 30, 2009 by Michael Lascarides
Filed in Conferences, Mobile

I just returned from the 2009 m-Libraries conference at the University of British Columbia in Vancouver (which, let’s get this out of the way up front, was an exceptionally beautiful and well-organized venue). The topic of this second annual conference was the influence of the rapid expansion of mobile technology on libraries.

While those of us here at the Digital Experience Group have plenty on our plate with the upcoming overhaul of NYPL.org and more, a good part of our “charter” is to anticipate ways that changes in technology can keep the Library relevant in the future. It was in this spirit that I approached the conference, with an ear towards how other institutions are dealing with one massive technological upheaval: the explosive growth of mobile technologies.

As a good conference on new and speculative topics should do, m-Libraries 2009 left me with more questions than answers, but fortunately they are much better questions than I had when I arrived. I have already seen a few blog posts covering the conference session by session (see Paul Coyne, Vicki Owen, and the official conference blog, among others), so I won’t post another. Rather, in the spirit of pushing the conversation forward, I’d like to enumerate the questions I feel went incompletely answered.

What is mobile technology?

“Mobile” is one of those large amorphous concepts that means something different to different people, and all attendees seemed to be coming at it in different ways. It’s just too big. Are we talking about just phones? Text? Mobile web? How about Wi-Fi? Is it enough to just be wireless? If I have a satellite Internet connection in my shack in the middle of the desert but I never move, am I a mobile learner? In a sense, this question of “what is mobile?” was implied but never addressed head-on, and I naively wanted to try to get people to answer it. It’s not really necessary to have an answer to this question, but I think it would have been a good plenary or workshop topic leading to interesting discussion.

What is the difference between the mobile user, the web user, and the physical user?

I would have liked to see more research about the patterns of mobile use, and specifically more about the movement between modes. There was a fair amount of talk about “the mobile user”, yet the mobile user is also the wired user as well as the offline user. The interesting question is where those modes overlap. This can be observed anecdotally just by walking into the Rose Reading Room, our most physical and traditional of spaces, any day of the week and observing the sheer variety of mobile devices spread out in front of the patrons. This is fertile ground for future research.

What is the context of mobile tools?

A related question. The applications of mobile technology can be brought to bear on the library in any number of ways, and at all levels. There is neither a “right” way nor a necessary way to use mobile technology. In session after session, presenters stressed the need to listen to users and meet their expectations rather than expect them to adapt to the library’s tools. 

One of my favorite sessions was by a pair of reference librarians from Temple University who described their mobile reference-on-demand project as a “complete failure”. Their plan was to let students wandering in the stacks of their (dark, low-ceilinged, 1960s Brutalist) library send a text to the reference desk when they needed help. Cheekily titled “Ask Us Upstairs”, the project fared dismally. Yet they offered a half-dozen reasons why things went wrong, and in the process illuminated the intimacy with which mobile tech operates: students didn’t like being approached in the stacks by people they didn’t know; the stacks were considered “student-occupied territory”; voice calls were a better fit than text in the stacks; students did very little browsing in the stacks and were already very directed and intentional before going into what was considered a slightly claustrophobic space; and so on. By sharing the problems they ran into, the Temple folks perfectly illustrated the psychological and emotional design challenges that need to be overcome to make a successful mobile application.

Are we doing enough with SMS text messages?

A common refrain of the conference was that mobile doesn’t necessarily mean mobile internet: SMS text messages are still the standard means of communication for the vast majority of the wired world, due to both the lack of mobile data infrastructure (in the developing world) and the prohibitive expense of data plans for students and low-income users (in the developed world). While there’s understandably lots of excitement about iPhone apps and the mobile Web, a strong takeaway was that we should not undersell the utility of text-based reminders and other tools.
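
To make the constraint concrete: everything has to fit in one 160-character message. The Ruby sketch below is purely hypothetical, with invented message wording and function names, but it shows the kind of budgeting a text-based reminder tool has to do: truncate the book title rather than overflow a single SMS segment.

    # Hypothetical sketch: compose a hold-pickup reminder that fits in one
    # 160-character SMS segment, truncating the title rather than the
    # branch name or date. Wording and names are invented for illustration.
    SMS_LIMIT = 160

    def reminder(title, branch, date)
      tail = " is ready for pickup at #{branch} until #{date}. -NYPL"
      room = SMS_LIMIT - tail.length
      body = title.length > room ? title[0, room - 3] + '...' : title
      body + tail
    end

    msg = reminder('The Curious Incident of the Dog in the Night-Time',
                   'Mid-Manhattan', 'Jul 24')
    puts msg
    puts msg.length <= SMS_LIMIT  # => true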

Do we see this coming?

The scale of the mobile revolution is staggering, and hearing about how rapidly the developing world is becoming connected was a real “We’re gonna need a bigger boat” moment for many in the audience. Ken Banks of the FrontlineSMS project pointed out in a highly engaging talk that 65% of the world’s population now has access to a mobile phone, with more coming online all the time. As technologies start to piggyback on those connections, there is serious potential for the mobile network to be a disruptive technology (to use Clay Christensen’s term) in any number of ways. What does society look like when a majority of citizens carry an always-on global communication device in their pockets? For the Library, we need to have mobile tech on our radar both in terms of reaching new audiences (perhaps a future source of distance learning resources) and in terms of the expectations our existing audiences bring to the tools we deliver.

Where do we start?

We’ve already (sort of) launched NYPL Mobile, but that’s still a beta project. There’s a lot of room for experimentation, and the rest of the world is driving innovation at a breakneck pace. Mobile traffic on NYPL.org, while still a small part of the overall total, has increased over 300% in the past year without any concession to mobile web surfing on our part. At a certain point, a critical mass of dedicated local users (and our mobile users are overwhelmingly local; there’s that intimacy thing again) is going to expect state-of-the-art mobile services for exploring our collections and physical spaces, not as a cool luxury but as standard practice.

As we plan our strategy for the next couple of years, we need to keep an eye on developments in this field to ensure that we don’t start looking irrelevant to our most dedicated users.


Batch Reindexing for Drupal + Solr

May 13, 2009 by Mark A. Matienzo
Filed in drupal

Crossposted to thesecretmirror.com. Sorry for any duplication!

Hey, do you use Drupal on a site with several thousand nodes? Do you also use the Apache Solr Integration module? If you’re like me, you’ve probably needed to reindex your site but couldn’t be bothered to wait for those pesky cron runs to finish – in fact, that’s what led me to file a feature request on the module to begin with.

Well, fret no more, because thanks to me and Greg Kallenberg, my illustrious fellow Applications Developer at NYPL DGTL, you can finally use Drupal’s Batch API to reindex your site. The module is available as an attachment from that same issue node on drupal.org. Nota bene: this is a really rough module, with code swiped pretty shamelessly from the Example Use of the Batch API page on drupal.org. It works, though, and it works well enough as we tear stuff down and build it back up over and over again.


Infomaki goes Open Source!

May 6, 2009 by Michael Lascarides
Filed in Open source, Ruby on Rails, User testing, infomaki

We’re happy to announce that, as promised, our Infomaki usability testing tool has been released Open Source under the GNU General Public License. If you would like to tinker with its inner workings, you can grab a copy at https://sourceforge.net/projects/infomaki/.

This first release is a “throw it over the side and see if it swims” release. To get it running, you will need to have a decent familiarity with the Ruby on Rails programming framework. It has spotty-to-nonexistent test coverage, a bit of vestigial code, and possibly some dependencies on a RubyGem or two that we probably forgot to package. Also, there are some non-user-friendly features in the admin, such as the fact that it’s possible to delete screenshots that are in use, causing errors.
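
If that last rough edge bites you, the usual Rails idiom for it is a before_destroy guard on the model. The sketch below is hypothetical; the model and association names are invented and may not match Infomaki’s actual schema.

    # Hypothetical sketch: block deletion of a screenshot while any
    # question still points at it. In Rails 2.x, returning false from a
    # before_destroy callback aborts the destroy.
    class Screenshot < ActiveRecord::Base
      has_many :questions

      before_destroy :ensure_not_in_use

      private

      def ensure_not_in_use
        questions.empty?
      end
    end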

If you’re not the adventurous type, you might want to wait a couple of weeks for a more novice-friendly release. But there’s been definite interest in the software, so those who want to have at it can have at it!

We’ll soon be adding documentation to the Sourceforge site to accompany the source code, but here’s the Quick Start version for getting it running on your local machine:

  1. Have Ruby on Rails 2.2+ running, along with the database of your choice.
  2. Check out the Infomaki source code from our Subversion repository or by downloading the .zip file (see the Sourceforge project site for details).
  3. Update the config/database.yml file with your local database settings.
  4. From the command line, change the directory to the root of the Infomaki application and run “rake db:migrate” to build the database structure.
  5. Install the required RubyGems (running “rake gems:install” from command line should do the trick).
  6. Start the Rails server (type “script/server”). If all goes well, you should see the project home page at http://0.0.0.0:3000/
  7. To create content, go to http://0.0.0.0:3000/initiatives and log in with username “admin@admin.com” and password “rootroot”.

Hopefully this will be of use to a couple of people. We welcome suggestions and code patches. Have fun!


Visualizing search data: What’s the right amount of visibility?

April 29, 2009 by Michael Lascarides
Filed in Ruby on Rails, Search, privacy, visualization


In a fortuitous bit of timing for us here at the Library, Google released its Analytics API last week. We had already been discussing ways of exposing site usage statistics to NYPL staff, and the API makes that job a lot easier. This means that we can set about accessing our 18 months or so of Analytics data with some programming, and thereby turn that raw data into web sites and other visualizations. As soon as the API was announced, I hacked together a Ruby on Rails application to start digging into the ungodly amount of cool data now exposed (side note: I’ll soon be posting a technical how-to for any Ruby-ists interested in exploring their Analytics). Within a couple of hours, we were pulling live stats data into a locally-formatted web application.
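
For any fellow Ruby-ists who want to try the same trick before that how-to appears, here is a minimal sketch of the general approach, not our actual application: trade Google credentials for a ClientLogin token, then query the Data Export API’s data feed for internal search keywords. The credentials, profile ID and date range below are placeholders, and it assumes your profile has site search configured; a real app would also use a proper XML parser rather than regexes.

    require 'net/http'
    require 'net/https'
    require 'uri'
    require 'cgi'

    EMAIL      = 'you@example.org'   # placeholder credentials
    PASSWORD   = 'secret'
    PROFILE_ID = 'ga:1234567'        # placeholder Analytics profile ID

    # Step 1: trade credentials for a ClientLogin auth token.
    def auth_token
      url = URI.parse('https://www.google.com/accounts/ClientLogin')
      http = Net::HTTP.new(url.host, url.port)
      http.use_ssl = true
      body = "accountType=GOOGLE&Email=#{CGI.escape(EMAIL)}" +
             "&Passwd=#{CGI.escape(PASSWORD)}&service=analytics&source=labs-sketch"
      http.post(url.path, body).body[/^Auth=(.+)$/, 1]
    end

    # Step 2: request internal site-search keywords, sorted by unique searches.
    def top_search_terms(token, max_results = 25)
      params = { 'ids'         => PROFILE_ID,
                 'dimensions'  => 'ga:searchKeyword',
                 'metrics'     => 'ga:searchUniques',
                 'sort'        => '-ga:searchUniques',
                 'start-date'  => '2009-01-01',
                 'end-date'    => '2009-04-28',
                 'max-results' => max_results.to_s }
      query = params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join('&')
      url = URI.parse("https://www.google.com/analytics/feeds/data?#{query}")
      http = Net::HTTP.new(url.host, url.port)
      http.use_ssl = true
      xml = http.get("#{url.path}?#{url.query}",
                     'Authorization' => "GoogleLogin auth=#{token}").body
      # Crude regex parse, good enough for a sketch of the Atom feed.
      terms  = xml.scan(/name=["']ga:searchKeyword["'][^>]*value=["']([^"']*)["']/).flatten
      counts = xml.scan(/name=["']ga:searchUniques["'][^>]*value=["']([^"']*)["']/).flatten
      terms.zip(counts)
    end

    top_search_terms(auth_token).each { |term, n| puts "#{n}\t#{term}" }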

First came a tag cloud of search terms sized in proportion to their popularity. Next came a thumbnail gallery of Digital Gallery images ranked by popularity, and lastly a keyword explorer where we could enter a keyword (say, “dvd”, “military”, “japan”, “staten island” or “art deco”) and retrieve a thorough analysis of that keyword: where searchers for that keyword are located, what other terms they searched for, and what pages they visited the most after searching.

This is revolutionary for us. We’ve already been exploring lists of the most popular search terms, but to see them proportionally scaled in a tag cloud really drives home the relative popularity of different terms. And the keyword explorer is just amazing. We’ve long known that the Digital Gallery is extremely popular among military history buffs (to pick a random subject), but to see the follow-up searches for “military” on the DG really paints a picture of the breadth of topics and level of detail of these enthusiasts.
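
The arithmetic behind such a cloud is simple. Here is a toy sketch, with invented counts, that linearly interpolates each term’s font size between a floor and a ceiling based on where its count falls in the range:

    # A minimal sketch of tag-cloud sizing: scale each term's font size
    # linearly between MIN_PX and MAX_PX according to its search count.
    MIN_PX = 10.0
    MAX_PX = 48.0

    def cloud_sizes(counts)
      lo, hi = counts.values.min, counts.values.max
      counts.inject({}) do |sizes, (term, n)|
        share = hi == lo ? 1.0 : (n - lo).to_f / (hi - lo)
        sizes[term] = (MIN_PX + share * (MAX_PX - MIN_PX)).round
        sizes
      end
    end

    # Invented counts; one runaway term dwarfs everything else.
    puts cloud_sizes('tumblebooks' => 9500, 'jobs' => 1200, 'dvd' => 900).inspect
    # => {"tumblebooks"=>48, "jobs"=>11, "dvd"=>10}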

There is no doubt that we’ll be using the aggregate search data internally. For example, looking at searches containing “dvd” reveals a huge list of movies and DVDs being sought by our users whether we have them in stock or not, which can then be used by our Acquisitions department to meet demand. And searches relevant to divisions (e.g., “maps” or “manuscripts”) or locations (“staten island”) can help staff in those areas gain valuable insight into the needs and wants of their users.

But almost immediately, we also started thinking about ways of using some of this data to bring context to the Library experience of the end users. Could we display popular searches by location on screens in the branch libraries? Could we make maps of the country showing the most popular searches by city? Should we let end users explore other relevant searches by keyword? The API solves a big chunk of the technical hurdles, which leaves only the question, “Are we sure this is a good idea?”

We checked with our counsel’s office about whether our privacy policy precludes us from publishing this data or its derivatives, and to my surprise, they didn’t have any problem with it from a legal point of view. Still, it makes me nervous to start producing any public products using user search data in any form until we’ve had a thorough discussion of what current best practices are.

Just to be clear, this data from Analytics is always aggregate data. There’s no way to say that a particular search came from a particular person or computer. But it can potentially get pretty specific: terms that were searched a single time, for example, or the city from which a specific term was searched.

So I’d like to throw open the question to the library community and the public: what is the balance of rich context versus privacy? Should we make none of this data public, or all of it (that is, allow users to see the context of every search term)? Should we only make contextual search information public when it reaches a certain threshold of use (the “safety in numbers” argument)? If so, what is that threshold? Is it a case-by-case basis? Is it possible to generalize any guidelines? Have other institutions explored a similar policy?
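
For what it’s worth, the “safety in numbers” option is trivial to implement; the hard part is entirely in choosing the threshold. A toy sketch, with an arbitrary cutoff and invented counts:

    # "Safety in numbers" sketch: only terms searched at least MIN_USES
    # times are eligible for public display. The cutoff of 10 is arbitrary;
    # picking it is exactly the policy question posed above.
    MIN_USES = 10

    def publishable_terms(term_counts, min_uses = MIN_USES)
      term_counts.select { |term, n| n >= min_uses }.map { |term, n| term }
    end

    counts = { 'military' => 842, 'staten island' => 315, 'rare phrase' => 1 }
    puts publishable_terms(counts).sort.inspect
    # => ["military", "staten island"]  (the one-off search stays private)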

We’re very interested in hearing from you in the comments.


Yesterday at “Digital Dilemmas”

April 17, 2009 by Barbara Taranto
Filed in Communities, Conferences, Frameworks

It’s not every day that Cliff Lynch and Dan Cohen are in the same city, let alone at the same meeting. But yesterday New York was lucky enough to have both of them in town for a one-day workshop, Digital Dilemmas, hosted by the Metropolitan New York Library Council in association with OCLC. If you’ve ever heard either of them speak, you’ll know what I mean.

Cliff began the day with a bird’s-eye view of the challenges in the current digital landscape – a sort of tour de force for librarians deeply entangled in the day-to-day struggles of information management. As usual, the 45 minutes seemed like 10, chock-full of Cliff’s special blend of data, anecdote and insightful allusion. He advised the audience members to think broadly about our respective vocations in relation to the communities we serve and not to reduce the discussion to the mechanics of digital activities. Libraries, he remarked, are an integral part of, and partners with, their constituents, and those constituents are members of a greater society. The society is changing. Learning and education are changing, and consequently the place of libraries in society is changing.

Dan Cohen, who rounded out the day, had a similar message, but delivered it in an entirely different manner. Dan conducted a crowdsourcing experiment to demonstrate information seeking and retrieval behaviors. He posted an image of an artifact on Twitter and asked those who were following him whether they could identify the object. He then turned back to his presentation. Shortly after, the Twitter dialogue box kept popping up to interrupt him: his “followers” were responding. By the time he had finished discussing the idea of the “open library” – a virtual environment where library activities occur outside the formal, professional structure – he had made his point. The community had participated in creating new knowledge and had done so in an interactive and sometimes noisy fashion. Dan’s point wasn’t that libraries are obsolete but that libraries and librarians must engage with their communities in the places where knowledge is being acquired, and since knowledge-acquiring behaviors are changing, libraries and librarians must change.

This is hardly bleeding-edge stuff, but it does warrant repeating, since it reminds us that we must remain interested in the big picture as well as the minutiae. Our communities would be well served if we imagined ourselves as partners in this endeavor rather than as an endangered species.

Whatever your opinion, it is well worth checking out the meeting proceedings, which will be posted shortly.

Kudos to Jason Kucsma, Digitization and Emerging Technologies Manager at METRO, for putting this together.


65 is a magic number

April 13, 2009 by Michelle Misner
Filed in Projects, Workflow

[Image: “The magic numbers.” New York Public Library, Digital ID 410411.]

We’ve reached a milestone in our list of Basecamp projects: 65 projects! And we have more in our queue that have been submitted but not yet approved.

Over the past few months, we’ve developed a process for managing the flow of projects we’re working on, from submission through approval and documentation.

We start by filling out a project submission form with information on the deadline, the project owners and a brief description of the project requirements. Once the project has been reviewed and approved, we assign it an alphanumeric code and it gets promoted to Basecamp. We have the following codes:

  • I for Internal sites and applications  (such as the staff intranet)
  • D for sites and applications used by the Digital Experience Group (such as our project submission form built in Ruby on Rails)
  • N for public NYPL.org sites and applications
  • V for audio/Video projects
  • E for Externally hosted sites (such as Flickr)
  • S for other Screens (such as information screens in library locations)

As of our last count, we have 41 projects filed under the letter N. The biggest project is N0004, the nypl.org redesign project. N0004 has 17 sub-projects. We divided the redesign work into specific aspects, including content development, theming and configuration, redirects, third-party applications, archival collections, blogs, taxonomies, events and locations. Some of the sub-projects are very small but didn’t fit anywhere else, like implementing the Qwidget application on the site. And some are quite large in scope, such as Search, which is dependent on the Solr indexing projects and the new library catalog.
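
For the curious, the scheme is simple enough to capture in a few lines. This is an illustrative sketch, not our actual tooling; the category letters come from the list above, and the zero-padded counter matches codes like N0004:

    # Illustrative sketch of the project-code scheme: a category letter
    # plus a zero-padded sequence number, e.g. "N0004". Not our real tool.
    CATEGORIES = %w(I D N V E S)

    def next_code(letter, counters)
      raise ArgumentError, "unknown category #{letter}" unless CATEGORIES.include?(letter)
      counters[letter] += 1
      format('%s%04d', letter, counters[letter])
    end

    counters = Hash.new(0)
    puts next_code('N', counters)  # => "N0001"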

To paraphrase one of my colleagues, soon we may have more projects than library locations!


Stat of the Week No. 5 – The Most Popular Search Term at the Library is…

April 10, 2009 by Michael Lascarides
Filed in Search, Stat of the Week, Usability

OK, so Stat of the Week has returned! Can we just agree to pretend those twelve skipped weeks never happened? Thanks.

Recently, the Digital Experience Group gave a presentation to the occasional meeting of our Site Managers, the people who run each of our branch libraries. These are the people who work every day with our patrons, help them with their requests, and get them situated on the computers. They know the NYPL patron as well as anyone. We asked this group, “What do you think the most frequently searched word on the NYPL web site is?” We got a lot of good guesses: “Jobs”. “DVD”. “Books”. “Hours”. “Classes”. “Late fees”.

Good guesses all. Many of these searches are definitely in the top 25 or so on a regular basis. But over the past year, one search term has consistently occupied the number one spot, and not by a small margin. I mean, we’re talking a 1996 Chicago Bulls level of dominance. That term is…

Tumblebooks.

When we revealed that answer to the site managers, the response was an audible “Huh?”

Tumblebooks are animated storybooks for children. They’re built and maintained on a third-party web site, but they’re available, free of charge, to anyone who visits via the web site of a participating library (like the NYPL). The site managers knew that they were very popular. But the number one search term? Consistently? Really? Yes, really. And they could be even further in front: as you can see from the screenshot above, the third most popular search term is “tumble books” as two words!

I’m not quite sure how Tumblebooks got so popular. Are they promoted in the schools? Great word of mouth? Did Oprah say something nice about them? I really don’t know. Feel free to comment if you can cast some light on the mystery.

Regardless, I love the unexpectedness of this discovery, and the popularity of Tumblebooks has some lessons for us:

  1. Don’t make assumptions. As the Site Managers’ meeting showed, we probably could have interviewed every member of our front-line staff and never had anyone bring up Tumblebooks as a top priority for navigation. But in the quantitative statistics, there it is. We should never assume we know what’s getting a lot of traffic based on “common sense” without verification.
  2. The popularity of content can lead to insights about audiences. We might not be able to interview every single patron as they arrive on the site, but since our most popular search is for animated storybooks, we can probably infer that there’s a big audience of parents looking for good content to keep their children occupied. We already sort of knew that, but you can be sure we’ll be trying to figure out if the web site has a higher percentage of young parents in its audience than the physical libraries. Are stay-at-home moms using the web site to avoid dragging the kids to the library? Should we be increasing the amount of kids’ content on the site? All sorts of good questions start to flow.
  3. If someone’s doing a lot of searching for an item, we might want to make better links to it. The Library’s link to Tumblebooks is not terribly hidden; it’s on the “Books and Materials” page, which is already one of the top 5 most popular on the whole site. But even though it’s only a single click away from the home page, enough visitors miss it and resort to search that we might want to take a look at how the links are presented. And in fact the current Books page is a bit of a mess, with a lack of clear hierarchies. The number of “tumblebooks” searches is evidence of a design that’s not working.
  4. Search terms reveal context. A search for “tumblebooks” clearly marks the user as a child or someone who’s caring for a child. When we return this search result, not only should we return the Tumblebooks link as the first result, but we should also suggest secondary links to children’s programs, other online storybooks, etc. (a rough sketch of the idea follows this list). Our upcoming search redesign (much more on this in future posts) will take this notion into account.
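
Here is that fourth point as a rough, hypothetical sketch. The term list and URLs are invented for illustration, and the normalization step also folds the two-word “tumble books” variant into the canonical term:

    # Hypothetical sketch of "best bets": map known high-traffic search
    # terms to a primary link plus contextual secondary suggestions.
    # The URL and suggestions below are invented, not our configuration.
    BEST_BETS = {
      'tumblebooks' => {
        :link        => '/books-materials/tumblebooks',
        :suggestions => ["Children's programs", 'Other online storybooks',
                         'Find your local branch'],
      },
    }

    def best_bet_for(query)
      key = query.downcase.gsub(/\s+/, '')  # "tumble books" == "tumblebooks"
      BEST_BETS[key]
    end

    if hit = best_bet_for('Tumble Books')
      puts "Best bet: #{hit[:link]}"
      hit[:suggestions].each { |s| puts "See also: #{s}" }
    end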

Finally, the popularity of Tumblebooks itself is instructive. It’s a really good web site. Once you follow the link from a library, it’s simple to find a book, it downloads quickly, the usability is excellent (as it should be when your target demographic is four years old), there’s no login or other restrictive administrative overhead, there’s a great variety of storybooks and the content is fun and engaging.

There’s something to be said for design so simple a four-year-old can grasp it intuitively. It’s not an easy trick to pull off.


Calling All Widget Builders! (#2)

by Andrew Wilson
Filed in widgets

The New York Public Library is updating its earlier request for the development of widgets for our homework resources (posted Feb. 27, 2009). Rather than building several widgets all at once, we’ve decided to simplify the process and build them one at a time, learning from the successes and problems of each before we build the next. Our first widget is going to be a List Building Widget that will include platform integration for iGoogle, Facebook, MySpace, blogs and web pages, and/or desktops. The widget will allow users to:

  • Build lists of favorite books, movies, music, games, etc.
  • Build lists of materials needed for homework projects
  • Share lists with friends

We are looking for open source designs that can be made available to and repurposed by other organizations seeking to engage young people. The widgets will be developed as part of a 2008 National Leadership Grant from the Institute of Museum and Library Services titled Homework NYC Widgets: A Decentralized Approach to Homework Help By Public Libraries.

Read the RFP.

The project team will accept and answer questions about the proposal via comments on the NYPL Labs blog.  Proposals for the List Building Widget should be sent as an email attachment to Josh Greenberg, Joshua_Greenberg@nypl.org,  by May 1, 2009.


The Most Exciting Thing About Web Standards Is … umm …

April 7, 2009 by Trevor Thornton
Filed in CSS, Code, Content, Design, Javascript, Planning, Semantic Web, Usability, drupal, interfaces, users

As you may have heard, we’re currently in the process of redesigning NYPL.org from the ground up – a huge undertaking that seems to get bigger the deeper we get into it. We in the UX team have taken the redesign as an opportunity to implement new guidelines for best practices in web design that will be used as standards in future projects. The purpose of these guidelines is to provide a consistent experience for all users regardless of platform or browser capabilities, while ensuring that the site is flexible enough to incorporate new features and functionality in the future.

Here is a quick overview of some of the new guidelines we’re putting in place:

Markup

XHTML 1.0 Strict will be the standard Document Type Definition for all generated web pages, and all pages should validate. All content will be structured using semantic XHTML markup, with layout and presentational properties specified in external CSS files (see below). Presentational markup (e.g. use of <b>, <i>, <u>, etc.) will be avoided. Class and ID names applied to content elements should be descriptive of the element’s function and not specific to presentational attributes (“title” or “subtitle” rather than “bold” or “red”).

CSS

All CSS rules will be defined in external files – use of embedded and inline styles should be avoided where possible. To best support users accessing the site with Internet Explorer 6 (currently about 25-30% of our overall traffic), use of CSS2 and CSS3 selectors and declarations will be avoided where practical as these are not supported in IE6. CSS2/3 can be used to enhance certain aspects of the design for browsers that support it, but care will be taken that the overall design and user experience is not significantly compromised in IE6. Browser-specific rules and hacks will be avoided whenever possible.

Javascript

Javascript will be utilized only in cases where its use provides a significant improvement to the user experience. Where it is used, Javascript should be unobtrusive, with scripts contained in external files and implemented in such a way that there is no loss in basic functionality for users with Javascript disabled.

Accessibility

Many of the guidelines above, in addition to representing best practices for web development, also serve to ensure that the web site is accessible to users with disabilities. Our goal is to fully comply with Priorities 1 and 2 of the W3C Web Content Accessibility Guidelines. In this effort we are fortunate to have the input of the staff and patrons of the Andrew Heiskell Braille and Talking Book Library, who have provided very useful insight into practical issues faced by users who rely on screen reading and enlarging software to view web content. While the W3C guidelines are important, having insight into real user behavior is invaluable.
