|By David Weinberger||
|March 18, 2014 05:07 PM EDT||
Dean Krafft, Chief Technology Strategist for Cornell University Library, is at Harvard to talk about the Mellon-funded Linked Data for Libraries (LD4L) project he leads. The grantees include Cornell, Stanford, and the Harvard Library Innovation Lab (which is co-sponsoring the talk with ABCD). (I provide nominal leadership for the Harvard team working on this.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Dean will talk about the LD4L project by talking about its building blocks. [Dean had lots of information and a lot on the slides. I did a particularly bad job of capturing it.]
Mellon last December put up $1M for a 2-year project that will end in Dec. 2015. The participants are Cornell, Stanford, and the Harvard Library Innovation Lab.
Cornell: Dean Krafft, Jon Corso-Rickert, Brian Lowe, Simeon Warner
Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur
Harvard: Paul Deschner, Paolo Ciccarese, me
Aim: Create a Scholarly Resource Semantic Info Store model that works within and across institutions to create a network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to info, patterns of usage, and more.
Ld4L wants to have a common language for talking about scholarly materials. – Outcomes: – Create a SRSIS ontology sufficiently expressive to encompass catalog metadata and other contextual elements – Create a SRSIS semantic editing display, and discovery system based on Vitro to support the incremental ingest of semantic data from multiple info sources – Create a Project Hydra-compatible interface to SRSIS, an active triples software component to facilitate easy use of the data
Why use Linked Data?
LD puts the emphasis on the relationships. Everything is related.
Benefits: The connections have meaning. And it supports “many dimensions of nearness”
Dean explains RDF triples. They connect subjects with objects via a consistent set of relationships.
A nice feature of LOD is that the same URL that points to a human-readable page can also be taken as a query to show the machine-readable data.
There’s commonality among references: shared types, shared relationships, shared instances defined as types and linked by relationships.
LOD is great for sharing data. There’s a startup cost, but as you share more data repositories and types, the costs/effort goes up linearly, not at the steeper rate of traditional approaches.
Dean shows the mandatory graphic of a cloud of LOD sources.
VIVO: Vivo was the inspiration for LD4L. It makes info about researchers discoverable. It’s software, data, a standard, and a community. It connects scientists and scholars through their research and scholarship. It provides self-describing data via shared ontologies. It provides search results enhanced by what it knows. And it does simple reasoning.
Vivo is built on the VIVO/Vitro platform. It has ingest tools, ontology editing tools, instance editing tools, and a display system. It models people, organizations, grants, etc., the relationships among them, and links to URIs elsewhere. It describes people in the process of doing research. It’s discipline-neutral. It uses existing domain terminology to describe the content of research. It’s modular, flexible, and extensible.
VIVO harvests much of its data automatically from verified sources.
It takes a complexity of inputs and makes them discoverable and usable.
All the data in VIVO is public and visible.
Dean shows us a page, and then traverses the network of interrelated authors.
He points out that other institutions are able to mash up their data with VIVO. E.g., the ICTS has info about 1.2M publications that they’ve integrated with VIVO’s data. E.g., you can see research papers created with federal funding but not deposited in PubMed Central.
VIVO is extensible. LASP extended VIVO to include spacecraft. Brown U. is extending it to support the humanities and artistic works, adding “performances,” for example.
The LD4L ontology will use components of the VIVO-ISF ontology. When new ontologies are needed, it will draw upon VIVO design patterns. The basis for SRSIS implementations will be Vitro plus LD4L ontologies. The multi-institution LD4L demo search will adapt VIVOsearch.org.
The 8M items at Cornell have generated billions of triples.
Project Hydra. Hydra is a tech suite and a partnership. You put your data there and can have many different apps. 22 institutions are collaborating.
Fundamental assumption: No single system can provide the full range of repository-based solutions for a given institution’s needs, yet sustainable solutions do require a common repository. Hydra is now building a set of “heads” (UI’s) for media, special collections, archives, etc.
Fundamental assumption: No single institution can build the full range of what it needs, so you need to work with others.
Hydra has an open architecture with many contributors to a common core. There are collaboratively built solution bundles.
Fedora, Ruby on Rails for Blacklight, Solr, etc.
LD4L will create an activeTriples Hyrdra component to mimic ActiveFedora.
Our Lab’s LibraryCloud/ShelfRank is another core element. It provides model for access to library data. Provides concrete example for creating an ontology for usage.
LD4L – the project
We’re now developing use cases. We have 32 on the wiki. [See the wiki for them]
We’re identifying data sources: Biblio, person (VIVO), usage (LibCloud, circ data, BorrowDirect circ), collections (EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH), annotations (CuLLR, Stanford DMW, Bloglinks, DBpedia LibGuides), subjects and authorities (external sources). Imagine being able to look at usage across 50 research libraries…
Assembling the Ontology:
Whenever possible the project will use existing ontologies
Timeline: By the end of the year we hope to be piloting initial ingests.
Workshop: Jan. 2015. 10-12 institutions. Aim: get feedback, make a “sales pitch” to other organizations to join in.
June 2015: Pilot SRSIS instances at Harvard and Stanford. Pilot gather info across all three instances.
Dec. 2015: Instances implemented.
Q: Who anointed VIVO a standard?
A: It’s a de facto.
Q: SKOS is considered a great start, but to do anything real with it you have to modify it, and if it changes you’re screwed.
A: (Paolo) I think VIVO uses SKOS mainly for terms, not hierarchies. But I’m not sure.
Q: What are ActiveTriples?
A: It’s a Ruby Gem that serves as an interface for Hydra into a Fedora repository. ActiveTriples will serve the same function for a backend triple store. So you can swap different triple stores into the Fedora repository. This is Simeon Warner’s project.
Q: Does this mean you wouldn’t have to have a Fedora backend to take advantage of Hydra?
A: Yes, that’s part of it.
Q: Are you bringing in GIS linked data?
A: Yes, to the extent that we can and it makes sense to.
A: David Siegel: We have 6M data points from 1.1M Hollis records. LibraryCloud is ingesting them.
Q: What’s the product at the end?
A: We promised Mellon the ontology and instances of LOD based on the ontology at each of the 3 institutions, and search across the three.
Q: Harvard doesn’t have a Fedora backend…
A: We’d like to pull from non-catalog sources. That might well be an OAI-PMH ingest, or some other non-Fedora source.
Q: What is Simeon interested in with regard to Arxiv.org?
A: There isn’t a direct relationship.
Q: He’s also working on ORCID.
A: We have funding to do some level of integration of ORCID and VIVO.
Q: What is the bibliographic scope? BibFrame isn’t really defining items, etc. They’ve pushed it into annotations.
A: We’re interested in capturing some of that. BibFrame is offering most of what we need, but we have to look at each case. Then we communicate with them and hope that BibFrame does most of the work.
Q: Are any of your use cases posit tagging of contents, including by users perhaps with a controlled vocabulary?
A: We’ll be doing tagging at the object level. I’m unsure whether we’re willing to do tagging within the object.
A: [paolo] We assume we don’t have access to the full text.
A: You could always point into our data.
Q: How can we help?
A: We’re accumulating use cases and data sources. If you’re aware of any, let us know.
Q: It’s been hard for libraries to put enough effort into authority control, to associate values comparable across different subject schemes…there’s a lot of work to make things work together. What sort of vocabulary or semantic links will you be using? The hard part is getting values to work across domains.
A: One way to deal with that is to bring together the disparate info. By pulling together enough info, you can sometimes use the network to you figure that out. But in general the disambiguation challenge (and text fields are even worse) is not something we’re going to solve.
Q: Are the working groups institutionally based?
A: No. They’re cross-institution.
[I'm very excited about this project, and about the people working on it.]
A completely new computing platform is on the horizon. They’re called Microservers by some, ARM Servers by others, and sometimes even ARM-based Servers. No matter what you call them, Microservers will have a huge impact on the data center and on server computing in general. Although few people are familiar with Microservers today, their impact will be felt very soon. This is a new category of computing platform that is available today and is predicted to have triple-digit growth rates for some ...
Oct. 24, 2016 02:00 PM EDT Reads: 34,140
SYS-CON Events announced today that MathFreeOn will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. MathFreeOn is Software as a Service (SaaS) used in Engineering and Math education. Write scripts and solve math problems online. MathFreeOn provides online courses for beginners or amateurs who have difficulties in writing scripts. In accordance with various mathematical topics, there are more tha...
Oct. 24, 2016 01:00 PM EDT Reads: 1,000
In past @ThingsExpo presentations, Joseph di Paolantonio has explored how various Internet of Things (IoT) and data management and analytics (DMA) solution spaces will come together as sensor analytics ecosystems. This year, in his session at @ThingsExpo, Joseph di Paolantonio from DataArchon, will be adding the numerous Transportation areas, from autonomous vehicles to “Uber for containers.” While IoT data in any one area of Transportation will have a huge impact in that area, combining sensor...
Oct. 24, 2016 01:00 PM EDT Reads: 822
SYS-CON Events announced today that SoftNet Solutions will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. SoftNet Solutions specializes in Enterprise Solutions for Hadoop and Big Data. It offers customers the most open, robust, and value-conscious portfolio of solutions, services, and tools for the shortest route to success with Big Data. The unique differentiator is the ability to architect and ...
Oct. 24, 2016 01:00 PM EDT Reads: 853
The best way to leverage your Cloud Expo presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering Cloud Expo and @ThingsExpo will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at Cloud Expo. Product announcements during our show provide your company with the most reach through our targeted audiences.
Oct. 24, 2016 12:45 PM EDT Reads: 4,698
More and more brands have jumped on the IoT bandwagon. We have an excess of wearables – activity trackers, smartwatches, smart glasses and sneakers, and more that track seemingly endless datapoints. However, most consumers have no idea what “IoT” means. Creating more wearables that track data shouldn't be the aim of brands; delivering meaningful, tangible relevance to their users should be. We're in a period in which the IoT pendulum is still swinging. Initially, it swung toward "smart for smar...
Oct. 24, 2016 12:30 PM EDT Reads: 974
@ThingsExpo has been named the Top 5 Most Influential Internet of Things Brand by Onalytica in the ‘The Internet of Things Landscape 2015: Top 100 Individuals and Brands.' Onalytica analyzed Twitter conversations around the #IoT debate to uncover the most influential brands and individuals driving the conversation. Onalytica captured data from 56,224 users. The PageRank based methodology they use to extract influencers on a particular topic (tweets mentioning #InternetofThings or #IoT in this ...
Oct. 24, 2016 12:15 PM EDT Reads: 8,407
SYS-CON Events announced today that Niagara Networks will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Niagara Networks offers the highest port-density systems, and the most complete Next-Generation Network Visibility systems including Network Packet Brokers, Bypass Switches, and Network TAPs.
Oct. 24, 2016 11:45 AM EDT Reads: 1,309
In an era of historic innovation fueled by unprecedented access to data and technology, the low cost and risk of entering new markets has leveled the playing field for business. Today, any ambitious innovator can easily introduce a new application or product that can reinvent business models and transform the client experience. In their Day 2 Keynote at 19th Cloud Expo, Mercer Rowe, IBM Vice President of Strategic Alliances, and Raejeanne Skillern, Intel Vice President of Data Center Group and ...
Oct. 24, 2016 11:45 AM EDT Reads: 1,511
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
Oct. 24, 2016 10:45 AM EDT Reads: 3,861
Virgil consists of an open-source encryption library, which implements Cryptographic Message Syntax (CMS) and Elliptic Curve Integrated Encryption Scheme (ECIES) (including RSA schema), a Key Management API, and a cloud-based Key Management Service (Virgil Keys). The Virgil Keys Service consists of a public key service and a private key escrow service.
Oct. 24, 2016 10:15 AM EDT Reads: 1,071
Fact is, enterprises have significant legacy voice infrastructure that’s costly to replace with pure IP solutions. How can we bring this analog infrastructure into our shiny new cloud applications? There are proven methods to bind both legacy voice applications and traditional PSTN audio into cloud-based applications and services at a carrier scale. Some of the most successful implementations leverage WebRTC, WebSockets, SIP and other open source technologies. In his session at @ThingsExpo, Da...
Oct. 24, 2016 09:00 AM EDT Reads: 2,321
Fifty billion connected devices and still no winning protocols standards. HTTP, WebSockets, MQTT, and CoAP seem to be leading in the IoT protocol race at the moment but many more protocols are getting introduced on a regular basis. Each protocol has its pros and cons depending on the nature of the communications. Does there really need to be only one protocol to rule them all? Of course not. In his session at @ThingsExpo, Chris Matthieu, co-founder and CTO of Octoblu, walk you through how Oct...
Oct. 24, 2016 08:15 AM EDT Reads: 3,155
SYS-CON Events announced today that Embotics, the cloud automation company, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Embotics is the cloud automation company for IT organizations and service providers that need to improve provisioning or enable self-service capabilities. With a relentless focus on delivering a premier user experience and unmatched customer support, Embotics is the fas...
Oct. 24, 2016 08:00 AM EDT Reads: 866
The Internet of Things (IoT), in all its myriad manifestations, has great potential. Much of that potential comes from the evolving data management and analytic (DMA) technologies and processes that allow us to gain insight from all of the IoT data that can be generated and gathered. This potential may never be met as those data sets are tied to specific industry verticals and single markets, with no clear way to use IoT data and sensor analytics to fulfill the hype being given the IoT today.
Oct. 24, 2016 07:30 AM EDT Reads: 2,578
@ThingsExpo has been named the Top 5 Most Influential M2M Brand by Onalytica in the ‘Machine to Machine: Top 100 Influencers and Brands.' Onalytica analyzed the online debate on M2M by looking at over 85,000 tweets to provide the most influential individuals and brands that drive the discussion. According to Onalytica the "analysis showed a very engaged community with a lot of interactive tweets. The M2M discussion seems to be more fragmented and driven by some of the major brands present in the...
Oct. 24, 2016 05:45 AM EDT Reads: 11,400
WebRTC has had a real tough three or four years, and so have those working with it. Only a few short years ago, the development world were excited about WebRTC and proclaiming how awesome it was. You might have played with the technology a couple of years ago, only to find the extra infrastructure requirements were painful to implement and poorly documented. This probably left a bitter taste in your mouth, especially when things went wrong.
Oct. 24, 2016 05:00 AM EDT Reads: 5,552
The Quantified Economy represents the total global addressable market (TAM) for IoT that, according to a recent IDC report, will grow to an unprecedented $1.3 trillion by 2019. With this the third wave of the Internet-global proliferation of connected devices, appliances and sensors is poised to take off in 2016. In his session at @ThingsExpo, David McLauchlan, CEO and co-founder of Buddy Platform, discussed how the ability to access and analyze the massive volume of streaming data from millio...
Oct. 24, 2016 05:00 AM EDT Reads: 3,108
SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit http://www.pulzzesystems.com.
Oct. 24, 2016 05:00 AM EDT Reads: 2,511
Successful digital transformation requires new organizational competencies and capabilities. Research tells us that the biggest impediment to successful transformation is human; consequently, the biggest enabler is a properly skilled and empowered workforce. In the digital age, new individual and collective competencies are required. In his session at 19th Cloud Expo, Bob Newhouse, CEO and founder of Agilitiv, will draw together recent research and lessons learned from emerging and established ...
Oct. 24, 2016 04:30 AM EDT Reads: 1,342