(PDF) Spatial cyberinfrastructures, ontologies, and the · PDF fileSpatial cyberinfrastructures, ontologies, and the humanities ... critical as the technical components for its ... humanities, - DOKUMEN.TIPS (2024)


Spatial cyberinfrastructures, ontologies, and the humanities

Renee E. Sieber (a,b,1), Christopher C. Wellen (a,c), and Yuan Jin (d)

(a) Department of Geography, (b) School of Environment, and (d) School of Computer Science, McGill University, Montreal, QC, Canada H3A 2K6; and (c) Department of Geography, University of Toronto, Toronto, ON, Canada M5S 3G3

Edited by Michael Goodchild, University of California, Santa Barbara, CA, and approved February 1, 2011 (received for review September 24, 2009)

We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women’s Writings database (MQWW), the only online database on historical Chinese women’s writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.

geospatial ontologies | prosopography | ontology integration

We report on research into building a cyberinfrastructure (CI) for Chinese biographical and geographic data. This CI contains the McGill-Harvard-Yenching Library Ming Qing Women’s Writings database (MQWW), the China Biographical Database (CBDB), and the China Historical Geographical Information System (CHGIS). It represents the integration of important data related to the humanities. CHGIS is one of the first historical GISs in the world. MQWW is the only online database on historical Chinese women’s writings. The two major databases in Chinese history are the CHGIS and the CBDB. They serve as central repositories of historical geoadministrative units and official names of notable individuals. Substantial resources have been devoted to the accurate depiction of places for each time period and provenance of names. Additionally, China possesses the largest number of biographies in the world.

Key to this integration is that the databases to be linked retain their separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. We aim for flexibility within the system architecture, in cases where the data structure needs to be altered for new ontologies, concepts, databases or user groups.

This project is a unique CI within East Asian studies and historical research, and serves as a general guide for CIs in the humanities. It provides a proof of concept for the seamless linkage of heterogeneous databases, which may differ in type, location, and content. Interoperability of concepts is essential for CI and is driven by ontologies. Because one database is geographic, the project represents the application of spatial ontologies in the humanities, where computational ontologies are used to create underlying semantics for database access.

CIs are well promoted in the sciences for interoperability among distributed databases. The problem is how one accomplishes data integration and conducts concept-based searches in the humanities. CIs are arguably much harder to build in the humanities than in the sciences, which is why these promotions have yet to transcend theory to actual design and implementation. We focus this paper on spatial issues in a humanities CI, which include issues of conflicting data, differing data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their kinship networks, publishing venues, and the literary and social communities. Last, we discuss the social side of CI development, because people are considered to be as critical as the technical components for its success.

Role of CI in the Humanities

A CI takes its name from the infrastructure required by an industrial society to function: the fabric of roads, running water, and power. A CI provides the infrastructure for a knowledge society. As originally envisioned in a report by the National Science Foundation (NSF) (1), a CI includes high-performance computing; observation, measurement, and integration services; graphical user interfaces (GUIs); and visualization services. It is possible that future knowledge societies will be reliant on CI even as current industrial societies are reliant on industrial infrastructure; however, CI will most likely be initiated in the research community, which has both the need for CI and the means to devote research and development resources to nascent technologies. Therefore, the sciences are developing a shared platform, a CI, for locating and integrating data, visualizing this data, and enabling analyses and simulations.

Spatiality plays an important role in CIs. An essential aspect of a CI is a geographically distributed computing environment connected by a network. Much of the world’s information contains geolocations. Handling that geographically distributed spatial information can demand specialized knowledge (e.g., for geovisualization and spatial statistics) (2).

Author contributions: R.E.S. designed research; C.C.W. and Y.J. performed research; C.C.W. and Y.J. analyzed data; and R.E.S. and C.C.W. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

(1) To whom correspondence should be addressed. E-mail: [emailprotected].

5504–5509 | PNAS | April 5, 2011 | vol. 108 | no. 14 www.pnas.org/cgi/doi/10.1073/pnas.0911052108



CIs are “also the more intangible layer of expertise and the best practices, standards, tools, collections and collaborative environments that can be broadly shared across communities of inquiry” (3). Building a CI is a social as well as a scientific/technological endeavor (1, 3–5), requiring people to collaborate on envisioning and developing this infrastructure; it cannot be the domain of technical experts alone. It remains a very technical task, undoubtedly because the computational challenges are significant and innovation emphasizes science needs, such as high-performance computing (1, 4–7). To date, CI research has been a joint project between the physical or natural sciences and the computing sciences (5, 6). In the North American context there are few examples of CI development for the humanities (8), and none within the paradigm envisioned by the NSF (1). There have been spatial data infrastructures and gazetteers for the humanities (9, 10), which serve as precursors to CIs. However, there has been little research in the humanities into the cross-database nature of CIs, much less one interoperable with a spatial database.

Challenges have not deterred strong promotion of CIs by the humanities, which produced its own seminal call for an “outpouring of creative energy [in which humanities researchers should] lead rather than follow in the design of this new cultural infrastructure” (3). The humanities must build their own CI because no one else will. Humanities computing is a robust field (11) but is hampered by limited resources for computation research and skill development relative to science, few sustainable models for project management, and weaker connections with industry that could assist in development (12). The disciplines could be viewed as a reaction to excessive focus on quantification in the academy. Humanities data tends to be highly unstructured and makes greater use of multimedia, although Short (11) reminds us that structured databases form an important component of humanities research. There is every reason to think that the federation of structured content and ability to serve scholars in diverse locations would be of benefit to researchers in the arts.

Role of Ontologies in CI

CI designers seek to integrate data and models from varied sources with disparate terminologies and conceptual models. We turn to ontologies to achieve that integration (4, 13). Computer science defines an ontology as a theory that formalizes in some logical structure knowledge important to the understanding of a domain (14). With CIs, ontologies can serve as metamodels for automated reasoning and allow, for instance, concept-based queries over large datasets with multiple data schema or automatic workflow generation (4, 5, 15). Ontologies can be thought of as controlled, standardized, and structured vocabularies: sets of concepts that also possess properties and relations. Multiple database schemas can map to a single ontology, which enables users to interact with a single structure when seeking data. Ontologies could provide an intuitive way for users to cross-query multiple databases, relying on concepts in the ontology instead of variables in specific databases.

Spatial ontologies have unique design considerations (16, 17). Much geographic data are represented by a series of x,y coordinates, so ontologies can be very useful in describing, for example, the extent of a mountain or the relation of sacred sites to nearby rivers. Spatial information has a geometric component (e.g., latitude and longitude) and also a relational component (e.g., part of). Specific instances of spatial features have numerous properties that can be used for identification (e.g., place name). Name, ZIP code, centroid, and polygon outline are all fairly nonambiguous ways to identify a US county.

Whereas current ontologies like GeoOWL provide a good framework for spatial information generally, there has been no examination of how well current ontologies handle structured spatial information in humanities contexts, where spatial features may be indicated by only a place name, a place name with some additional information (e.g., the place name of which it is a part), and geometric coordinates. Additionally, information from various sources may disagree. One historical text may claim a person’s lineage from a prestigious region; another text may dispute that claim. Humanities scholars may consider the two claims equally relevant and equally worthy of study, even if one claim is demonstrably false. CIs should support multiple and competing ontologies (4, 18). Most spatial ontologies consider the real. However, throughout time, people have considered all sorts of places: imaginary places, metaphoric places (e.g., a specific castle as a human body; body parts being used to describe parts of a river), as well as religious and spiritual places (e.g., nirvana). For this reason, a humanities CI should seek to present relevant information to users, but not necessarily resolve contested information.

The use of historical data presents additional unique challenges. In a historical GIS, each feature has a temporal duration for which it is valid, and a feature or number of features that predate and postdate it in time (9).

Humanities scholars may very well resist ontologies. To domains steeped in individuality and fluidity, ontologies may represent a fossilization of a particular school of thought, and spatial ontologies a method to rigidify political boundaries while ignoring extrastate phenomena (11, 12). Like Friedlander (19), we view ontologies as a shared and ongoing conversation with researchers and other interested parties who want the functionality that a CI provides. Indeed, our project responds to calls by humanities researchers for scientists to get involved in realizing humanities computation.†
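The idea of several database schemas mapping to one ontology, so that a user queries by concept rather than by database-specific variable, can be illustrated with a minimal Python sketch. This is not the project's implementation: the concept names, table names, and column names below are all hypothetical, chosen only to show how a concept-based lookup could determine which databases are able to answer a query.

```python
# Hypothetical mapping from ontology concepts to (table, column) pairs.
# Table and column names are invented; the real schemas are not given here.
CONCEPT_TO_FIELDS = {
    "personName": {
        "CBDB": ("people", "name"),
        "MQWW": ("writers", "writer_name"),
    },
    "placeName": {
        "CBDB": ("addresses", "addr_name"),
        "MQWW": ("writers", "native_place"),
        "CHGIS": ("features", "feature_name"),
    },
}


def fields_for(concept):
    """Return {database: (table, column)} for one ontology concept."""
    if concept not in CONCEPT_TO_FIELDS:
        raise KeyError(f"concept not in ontology: {concept}")
    return dict(CONCEPT_TO_FIELDS[concept])


def databases_for(concepts):
    """Return the databases that expose every concept in the query."""
    per_concept = [set(fields_for(c)) for c in concepts]
    if not per_concept:
        return []
    return sorted(set.intersection(*per_concept))


if __name__ == "__main__":
    print(databases_for(["placeName"]))
    print(databases_for(["personName", "placeName"]))
```

The user-facing vocabulary stays fixed at the concept level; adding a database would, under this sketch, only require adding entries to the mapping.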

Project Overview

We built a proof-of-concept CI to integrate three databases related to Chinese geography: history, gender, and culture. The information exists in multiple data models and languages (i.e., English, simplified Chinese, and traditional Chinese). CHGIS, physically located at Harvard, is a database of populated places and historical administrative units for Chinese history between 221 BCE and 1911 CE. CBDB, a prosopographic or biographical database, is currently housed at Academia Sinica in Taiwan. It is considered to be the authority of Chinese historical names, although the database contains mostly male names. Until recently, it was a relational database, which structures the characteristics of those names on the basis of multiple variables (e.g., place, time, occupation, kinship, nonkinship affiliations, writings, and office holding). Currently, it resembles a network database, although the exact data structure is now proprietary. MQWW, at McGill University in Canada, is a repository of scanned images of texts by women writers held in rare-book collections along with some biographic information about the writers. It is the only online database of historical Chinese women’s writings. Researchers can link between women based on exchange of their work and correspondence, obtain contextual information on family and friends, note the ethnicity and marital status of the women writers, and access other information about women writers and their works.

Fig. 1. CI system architecture, which is composed of three tiers: GUI, application, and data. [Diagram labels: Browser, Servlet, JSP, Query API, Query Broker, Application Ontologies, DB Connection, WS Connection; databases CBDB, MQWW, and CHGIS; connector labels HTTP, HTML, Request, Response, SPARQL, XML, JDBC.]

†Mactavish A (2001) Practice, technology, and academic capitalism in the arts and humanities computing curriculum. Paper presented at the Humanities Computing Curriculum/The Computing Curriculum in the Arts and Humanities, November 9–10, 2001, Nanaimo, BC, Canada.

CI System Architecture. The system architecture is shown in Fig. 1. A user on the Internet poses a query in their browser to the GUI (Fig. 2). The query is sent to a servlet, which passes requests and responses to and from the browser. The servlet interacts with JavaServer Pages (JSP), which allow the GUI to create dynamic Web pages. It also interacts with the query application programming interface (API), which expresses the query in SPARQL, with the result being expressed in the resource description framework (RDF). An API is an online library of tools to support an application, and RDF is a standard of the World Wide Web Consortium (W3C). RDF is expressed in triplets, with a subject, predicate, and object, much like English (e.g., personID:personLivedAt ‘Shanghai’). A query can generate multiple triplets (e.g., person’s name plus person’s birthplace). The major communication language for RDF is the SPARQL protocol and RDF query language (SPARQL), another standard of the W3C. SPARQL provides a structured way to pose questions of an ontology (SELECT * WHERE {?personID:personLivedAt ‘Shanghai’}). These RDF triplets and SPARQL queries express the concepts of the ontology.

The SPARQL query is sent to the query broker, which reframes the ontology-based triplets into a set of RDF triplets that can be understood by the database-specific application ontologies (AOs). AOs are built from the database schema or structures. The original query may require two or more databases for an answer, which is reflected in the triplets and the cross-database SPARQL queries. The query broker also determines the order in which the queries are answered and the number of sequential searches that must be made to complete the original query. The separate result sets, returned in eXtensible markup language (XML), are combined in the query broker and returned to the user.
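To make the triplet idea concrete, the following is a minimal Python sketch of subject-predicate-object matching, in which a None argument plays the role of a SPARQL variable such as ?personID. It uses only the standard library and is not the project's query API; the identifiers and names in the sample data are invented for illustration, and only the predicate personLivedAt and the place names Shanghai and Luoyang come from the text.

```python
# Illustrative triples (subject, predicate, object). The person identifiers
# and names are invented for this example.
TRIPLES = [
    ("person:1", "personLivedAt", "Shanghai"),
    ("person:1", "personName", "Writer A"),
    ("person:2", "personLivedAt", "Luoyang"),
    ("person:2", "personName", "Writer B"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples that fit the pattern; None behaves like a variable."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def people_living_at(triples, place):
    """Rough analogue of SELECT * WHERE {?personID personLivedAt place},
    followed by a second pattern that fetches each person's name."""
    result = {}
    for person, _, _ in match(triples, predicate="personLivedAt", obj=place):
        names = [o for _, _, o in match(triples, subject=person,
                                        predicate="personName")]
        result[person] = names
    return result


if __name__ == "__main__":
    print(people_living_at(TRIPLES, "Shanghai"))
```

The second pattern inside people_living_at mirrors the remark that one query can generate multiple triplets, for example a person's name in addition to the place where the person lived.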

CI Ontologies. The CI is driven throughout by ontologies. Ontologies are used to construct the input parameters in the GUI so the user sees concepts like birth or writer. We decided on a tiered ontology design. An upper-level ontology (ULO) guides the creation of a domain ontology (DO), which extends the concepts and properties of the ULO to ensure relevancy to domains like Chinese history. The AOs connect each database schema to the DO. The relationship of ontology to architecture is seen in Fig. 3, which only shows the DO concepts and properties for place. This follows the tiers proposed by others (18), with the exception that we included a ULO.

The highly abstract ULO is both database independent and more general than Chinese history, gaining relevance beyond the specifics of this project. We included it with the hope that it would ease the addition of any new concepts and databases (e.g., if the CI were to include a database on Chinese Islamic sites), because it ensures a database-independent template for designing the DO. The ULO was constructed from long-existing and well-tested standards such as GeoOWL (for space) and Friend of a Friend (FOAF; for people). These standards were extended in our ULO by adding subproperties and subclasses where necessary. For instance, the FOAF served as a basic template for our ontology of people, but it required subproperties of the FOAF:name property to specify the language of the name (Chinese characters or romanized characters) as well as the type of the name, because throughout China, people had numerous types of names.

The DO inherits classes and relationships from the ULO and, like the ULO, is shared across all databases. To build our DO, we initially treated relational data tables as classes and the fields in tables as properties. We then related the tables and fields to the ULO, creating subclasses and subproperties of the ULO where needed and also creating some new ones. The databases generally fit well into the DO structure because the humanities scholars, with whom we collaborated, thought along similar lines. The greatest exception was place, which the DO models with the class called Feature. Table 1 describes how each database models the class Feature. Figs. 4 and 5 diagram the ULO and the DO class Feature. Note that the DO posits some subproperties of ULO properties (e.g., partOf, a subproperty of geo:relationship) and adds others that have no precedent in the ULO (such as featureValidFrom, the first date a feature is valid). This “middling-out” process, where the ontology construction begins from the top and bottom and meets in the middle, is a recommended best practice (20).

Place undergirds many debates on human relationships. Our databases primarily contain place names as opposed to place geometry. The existing geometry is fairly simple, largely point-based as opposed to lines or polygons. For Chinese spatial data, it was far easier to locate the center of power in each administrative unit than to delineate the unit itself. The CHGIS contains extensive records concerning relationships in space and time, in the form of two tables. The first is called part_of, and indicates which geographic features form part of other features. This “parthood” is mereotopological, the political relations of wholes governing parts (mereological) and spatial relations of wholes containing parts (topological). Emphasis on properties of a geolocation, simple geometry, and ontological spatial relations as opposed to geometry are what distinguish much humanities information about space.

Fig. 2. Screenshot of search function of GUI, searching for all poets who lived in Luoyang, China. (A) Ontology-based search function. (B) Cross-database results of the search (MQWW and CHGIS).

As in Chavez (10), relationships are valid only for a particular time interval. The second table is called preceded_by and indicates which features precede other features in time and what event precipitated the end of one feature and the beginning of another (e.g., subdivision or amalgamation). This leads to a network model of space, where points are surrogates for unknown or vague polygons but spatial relationships such as containment are explicit with reference to the Chinese political hierarchy. This hierarchy is three tiered, where provinces govern (and contain) prefectures, which in turn govern (and contain) counties. This political hierarchy was explicitly included in the spatial ontology, from which spatial relationships can be inferred. The spatial relationships are available from the CHGIS database for instances in the ontology. The rows in the part_of table of CHGIS are mapped to featureIsPartOf relationships in the DO and the rows in the preceded_by table of CHGIS are mapped to featurePrecededBy relationships in the DO. The DO explicitly specifies that these two relationships have as inverses featureHasPart and featureFollowedBy, respectively.

For now, the system does a simple lookup of CHGIS. Adding some simple rules, upcoming versions of the ontologies will support further spatial queries. Rules will specify, for instance, that parthood and containment are transitive, meaning that if A contains B and B contains C, then A also contains C. Thus, if a query requests the names of all women writers active in a certain province, the query broker will have sufficient information to send queries to the databases with clauses selecting women writers not only in that province, but also in the prefectures and counties that province governs. For instances of classes in the CI to be qualitatively georeferenced, they can refer to some administrative unit in the Chinese historical administrative hierarchy.

Designing ontologies required balance. We balanced the generality of an ontology with the specificity of each database, repurposing existing standards to ensure interoperability and using best practices, while making ontologies compatible with each database and with the various domains. We tied the DO structurally to the databases, not to a single database. Having it resemble a single database defeats the purpose of the integration. We could not support the nuanced representation of geometry in GeoOWL; where there was geometric information, we expressed only latitudes and longitudes (due to the access provided by CHGIS web services). Our proof of concept currently expresses concepts with spatial features (e.g., birth). All instances in a class are supported; such is the nature of structured data. However, a variable will show up only if it is supported in an ontology.

An incompletely resolved problem is disambiguation for competing or overlapping information. Each database contains features (e.g., Sichuan province), some of which refer to identical events (e.g., lived at). With no global (cross-database) unique identifiers in the databases, we cannot know exactly which features in MQWW are identical to those in the CHGIS. We can only compare their attributes, which eliminates many, but not all, potential matches. At the moment, we handle competing information by displaying all related data.

A solution proposed by humanities scholars with whom we worked was to designate authority databases: databases that definitively model and contain, for example, places and people. Primary keys used to identify items in these authority databases would identify those same items in any database. This database-specific solution is straightforward but has many disadvantages: it requires controlling the scope of the authority databases, and it could very well stifle the independence considered vital in humanities disciplines (21). Other options include ontology-enabled disambiguation, rules, or other domain knowledge, or users who specify items that are identical across databases. Either option could be used to identify competing information that should be retained. The first option is computationally intense; the second option may be more work than users are willing to undertake.

Following suggestions in Chavez (10), we use time as a partial way to disambiguate place. With historical spatial databases, and contrasting with most spatial ontologies, places depend highly on temporal properties. Space and time are interwoven: each feature in CHGIS has a beginning and an end date.
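The planned transitivity rule for parthood, combined with the time-bounded validity of relationships, can be sketched in a few lines of Python. This is an illustration of the rule described above rather than the project's code: the rows imitate the shape of the CHGIS part_of table, but the feature names, years, and the inclusive treatment of interval endpoints are all assumptions made for the example.

```python
# Hypothetical rows in the style of the CHGIS part_of table:
# (child, parent, valid_from, valid_to). Names and years are invented.
PART_OF = [
    ("Prefecture P", "Province X", 1400, 1700),
    ("County C1", "Prefecture P", 1400, 1600),
    ("County C2", "Prefecture P", 1650, 1700),
    ("Prefecture Q", "Province Y", 1400, 1700),
]


def parts_of(feature, year, rows=PART_OF):
    """Return every feature transitively contained in `feature` in `year`.

    A row counts only if the year falls inside its validity interval
    (endpoints treated as inclusive, which is an assumption).
    """
    found = set()
    frontier = [feature]
    while frontier:
        parent = frontier.pop()
        for child, row_parent, start, end in rows:
            if row_parent == parent and start <= year <= end and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def places_for_query(feature, year, rows=PART_OF):
    """Expand a query on one unit to the unit plus everything it governs."""
    return {feature} | parts_of(feature, year, rows)


if __name__ == "__main__":
    print(sorted(places_for_query("Province X", 1500)))
    print(sorted(places_for_query("Province X", 1660)))
```

A query for writers active in a province in a given year could then be sent with a clause covering every name in the expanded set, and the year filter shows how time narrows which features are candidates at all.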

Social Endeavor of CI.Building a CI is a social as well as a technicalendeavor. The project represents collaboration across dis-ciplines, including East Asian studies, history, and geographicinformation science (GIScience), and will require numerousskills to maintain over time. Like similar infrastructure ini-

DomainOntology

UpperLevel

Ontology

ApplicationOntologies

QueryBroker

DBs

23

1

Fig. 3. System process of how ontologies interact with other elements ofthe architecture.

Table 1. Competing data and differing data models regarding spatial data in the three databases

Database Spatial model

China Biographical Database (CBDB) Place names, feature types, and a five-tier spatial hierarchy. Part-of spatial relation.Point geometry and nearby current place names.

China Historical GIS (CHGIS) Place names, feature types, and a five-tier spatial hierarchy. Part-of spatial relation.Point geometry and some more complex geometry. Beginning and ending dates for features.Preceded-by temporal relation.

Ming-Qing Women’s Writings (MQWW) Place names and a two-tier spatial hierarchy (county-level names and provincial-level names).Part-of spatial relation.

Spatial model includes attribute information (e.g., place names), type of feature, number, temporal predicates if any (e.g., preceded by), hierarchypredicates if any (e.g., part of), types of hierarchy (i.e., geographic scale), and type of geometry (e.g., point based).

Sieber et al. PNAS | April 5, 2011 | vol. 108 | no. 14 | 5507

SOCIALSC

IENCE

SSP

ECIALFEATU

RE

(PDF) Spatial cyberinfrastructures, ontologies, and the · PDF fileSpatial cyberinfrastructures, ontologies, and the humanities ... critical as the technical components for its ... humanities, - DOKUMEN.TIPS (5)

tiatives, limited resources constrain our project. We built a proofof concept with students, who have their own timelines, and withresearchers who move on to other projects. There is no definitivelong-term custodian. Because this CI serves a niche of users withspecific needs, as opposed to a large number of everyday users(as represented by Google), few broad initiatives or support willlikely emerge for further technological development.One solution to resource constraints is the use of open source

software, which offers robust and extensible platforms and mayspeed application development. We used preexisting open sourcesoftware for some components to “mash up” with our own tools.Mashups and APIs reflect the Web of shareable user content andapplications, Web 2.0. SPARQL and RDF represent the se-mantic Web, Web 3.0. Our query broker is contained in an opensource API called D2R. D2R allows for conducting queries ofrelational databases using SPARQL and RDF. D2R began asa graduate student project and, because it is open source, can bemodified and maintained by its small user community. Despitebenefits, D2R required modifications to our CI and to D2R. Weproposed a Web 2.0-based architecture, with databases wrappedin XML and Web services, which are a more generalized versionof APIs. D2R necessitated handling database structures directlywithin the query broker. D2R also constrained the query brokerprocess. D2R searches across databases in reverse order: a firstsearch is in CBDB to obtain the result of a place name, anda second for that place name in CHGIS yields all latitudes andlongitudes of all name occurrences, which could generate a largeresult set. The results sets are compared for matches. BecauseD2R is open source, we could modify D2R’s source code tocontrol the order in which queries are posed. Numerous human-ities projects have decided on open source, using or devel-

oping their own reusable code (e.g., the Perseus Project). Offeringopen source software to the larger humanities community alle-viates certain constraints but does not avoid modifications andrequisite skills to do them.Humanities CIs should support multiple languages (3). Many

software programs continue to be written for an English (American) audience, and may not recognize simplified or traditional Chinese. Character encoding problems forced us to switch from a faster XML parser to a slower one. In MQWW, Chinese characters are encoded in binary; relational data tables are encoded in latin1_swedish; and the database is encoded in UTF-8. It is conceivable to convert all data to a single standard, but it is cumbersome to impose standards on legacy databases; hence the focus on trying to make the CI adapt to the databases as opposed to the databases conforming to each other.

Designing for adaptability allowed for a better response to exogenous factors. CBDB moved from easy (Harvard, relational database) to more difficult access (Academia Sinica, network database). For proof of concept, we recreated the relational data structure on our site. This decision made sense to control validation of result sets, but it delayed integration with the live database. Exogenous factors also governed data access via Web services, the gateway to individual databases. We can design elaborate concepts, for example, concerning geographic hierarchy, in an ontology. However, scale is not yet exposed in the Web services of CHGIS, so the gateway does not release the data required for concepts. Limitations imposed by CHGIS services suggest the emergent hurdles CIs must handle when data owners allow access to their data on their own terms, and reinforce the need for flexibility in system design, to the extent possible.

Allowing for heterogeneity impacted clarity. In our project, the link between the databases and the ULOs proved insufficiently clear. This contrasted with the DOs, which were based on extensive consultations with database developers. Treating each database schema as an initial draft of an ontology reduces the normally top-heavy process of ontology creation and increases understanding among participants and database developers. A fully functioning CI could be an open environment; new databases join the CI or existing databases can easily change structure. In such an environment, having a ULO or DO more complex than the sum of the schema of the participating databases is desirable, as it enables the addition of databases without necessarily altering the DO and ULO.

Another route to clarity is through the GUI (22). The DO drives the GUI, but an ontology posed problems for a GUI and query interface. Which items in the ontologies should be used as input parameters and how much information should be drawn from the database and displayed? Irrespective of the concept, we required a basic profile for everything returned to the GUI. We needed to balance returning only the ID number or label of each match for a query (i.e., “yes, the person you are looking for is in the database”) with returning every triplet associated with the individual (DO and ULO), the latter of which could overwhelm the user. We are currently building GUI 2.0 (Fig. 2), with a separate GUI schema, to determine input parameters and size of result sets.

It is uncertain how the best CI, database, or GUI design will attract people to deeply engage with databases, at least in the way databases are analyzed in the sciences. Even without the CI, MQWW found few scholars using the database for more than reading the scanned texts. The CI increases complexity via ontologies and middleware, but we try to hide the complexity behind the GUI. These complexities are not equal. Consider personal computers, whose intuitive GUIs are complex, though the command line is a simple user interface. A complex architecture has facilitated uptake, not hampered it. A more complex CI could be a more usable CI; complexity of the architecture may be the only way to improve usability.
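The idea of making the CI adapt to the mix of encodings noted above for MQWW, rather than converting the legacy data, can be illustrated with a small adapter layer. The following Python sketch is illustrative only and is not the project's middleware. The source labels (`binary`, `latin1`, `utf8`), the function names, and the assumption that binary columns hold UTF-8 bytes and that Latin-1 tables hold UTF-8 bytes read back as Latin-1 are all hypothetical.

```python
import unicodedata


def decode_binary(raw: bytes) -> str:
    """Assumption: a binary column stores UTF-8 bytes."""
    return raw.decode("utf-8")


def repair_latin1(text: str) -> str:
    """Undo mojibake from UTF-8 bytes that were read back as Latin-1.

    If the round trip fails, the text was probably genuine Latin-1,
    so it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical mapping from a source's declared encoding to its adapter.
ADAPTERS = {
    "binary": decode_binary,
    "latin1": repair_latin1,
    "utf8": lambda text: text,
}


def normalize(value, source_encoding: str) -> str:
    """Return NFC-normalized Unicode text for a value from any source."""
    text = ADAPTERS[source_encoding](value)
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    name = "明清"
    raw = name.encode("utf-8")
    print(normalize(raw, "binary") == name)
    print(normalize(raw.decode("latin-1"), "latin1") == name)
```

Each database keeps its own storage format; only the gateway needs to know which adapter applies, so adding a database means adding one entry to the mapping.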

Fig. 4. Upper-level ontology, as it pertains to space. White, class; gray, data type.

Fig. 5. Domain ontology, as it pertains to space.

5508 | www.pnas.org/cgi/doi/10.1073/pnas.0911052108 Sieber et al.


Conclusions

We focused on integration of spatial and nonspatial information for humanities research and the challenges that spatial data bring to a CI system architecture. Cross-cutting themes to emerge were the positives and pitfalls of standardization, mostly in the realm of ontologies but also in the use of tools. We also noted the tension between flexibility and clarity. The project highlights the value of non–high-performance computing CIs for research.

Our project, like most CIs, is resource intensive and fairly top-down in design, even as it employs a participatory approach. What are the alternatives? In addition to authority databases, which are structured versions of standard reference (authority) texts, data could be collected and uploaded in an unstructured manner. Data mining provides a computational solution to unstructured data, defined as inductively determining the frequency of adjacent word pairs, triplets, quadruplets, and so on. Coincident word groups produce semantic results but contain little meaning. We cannot determine the meaning of Shanghai Rose (the tea, the movie, the color) and cannot distinguish Shanghai (the movie) from Shanghai, China.

The Text Encoding Initiative (TEI) allows for aggregation by formalized annotation. Situated between ontologies and tags, the TEI produces rich results because the tags are expert driven. However, the TEI is not designed for databases and cannot handle spatial concepts in which a series of x,y coordinates characterize a feature. One could revert to complete user-driven tags or folksonomies. Small variations in tags (e.g., Beijing, Peking, peking) may result in mismatches, and tags cannot provide the semantic reliability of the TEI. Someone must tag concepts; GIScience is well aware of difficulties in capturing metadata. Working with multimedia (e.g., scanned maps), one could use social networking sites such as Flickr. The danger is that data sits on proprietary or ephemeral platforms (3). Irrespective of individual efforts, the data needs to remain reasonably accessible. Ontologies can provide the reasoning that TEI cannot, and the controlled vocabulary, nested concepts, and relationships that informal tags cannot.

Initiatives like the Deep Web propose to create the semantics through extracting information from the supporting Web sites,‡ but automation of semantics is fairly remote. Developers of Web 2.0 tools are not waiting for researchers to build the perfect ontology, rather identifying either spatial properties or geometry as needed. Flickr, for example, is crowdsourcing boundary files through geotagged photographs. Flickr also is building relationships like nested hierarchies of place. Those relationships are imperfect and will never deliver on the promise of interoperability (e.g., with historical places). There is vast interest in the use of digital earths (e.g., Google Earth) in the humanities, such as in mashups. They have facile GUIs but lack the topology.

CI development for the humanities should meld the perfect (the authoritative, the highly specified) with the imperfect (the unstructured data, the rapidly innovating Web tool). This integration will not deliver on the original promise envisioned by the National Science Foundation or American Council of Learned Societies, but it will maximize humanities data resources. In the interim, we believe one can learn much from CIs like ours. The GIScience community will not only need to partner with the humanities, but must demonstrate the value of our more complicated tools, build easier-to-use software, and extend functionality of digital earths.
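The tag mismatch mentioned above (Beijing, Peking, peking) shows in miniature what a controlled vocabulary adds to informal tags. The following Python sketch is a hedged illustration, not part of the project's ontologies. The `GAZETTEER` entries and the function names are hypothetical, and a real authority file such as CHGIS would also carry identifiers, time spans, and hierarchy.

```python
from typing import Optional

# Hypothetical controlled vocabulary: preferred label -> known variants.
GAZETTEER = {
    "Beijing": {"Beijing", "Peking", "北京"},
    "Shanghai": {"Shanghai", "上海"},
}

# Case-insensitive lookup from every variant to its preferred label.
LOOKUP = {
    variant.casefold(): preferred
    for preferred, variants in GAZETTEER.items()
    for variant in variants
}


def resolve(tag: str) -> Optional[str]:
    """Map a free-form tag to its preferred label, or None if unknown."""
    return LOOKUP.get(tag.strip().casefold())


def same_place(a: str, b: str) -> bool:
    """True only when both tags resolve to the same preferred label."""
    resolved_a, resolved_b = resolve(a), resolve(b)
    return resolved_a is not None and resolved_a == resolved_b


if __name__ == "__main__":
    print("Peking" == "peking")            # raw string comparison fails
    print(same_place("Peking", "peking"))  # vocabulary lookup succeeds
    print(resolve("Shanghai Rose"))        # unknown tags stay unresolved
```

A lookup of this kind removes spelling and case mismatches only. It cannot tell Shanghai the movie from Shanghai the city; that distinction needs the typed concepts and relationships an ontology provides.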

ACKNOWLEDGMENTS. Special thanks go to Jin Xing for Web services and to Jimmy Li and Rafi Ahmed for GUI development. Support for this work was provided by Canada’s Social Science and Humanities Research Council’s International Opportunities Fund.

‡Madhavan J, Afanasiev L, Antova L, Halevy A (2009) Harnessing the Deep Web: Present and future. Paper presented at the Fourth Biennial Conference on Innovative Data Systems Research (CIDR), January 4–7, 2009, Asilomar, CA.

1. Atkins D, et al. (2003) Revolutionizing Science and Engineering Through Cyberinfrastructure. Report of the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure (National Science Foundation, Washington, DC).

2. Anselin L, Florax R, Rey S, eds (2004) Advances in Spatial Econometrics: Methodology, Tools and Applications (Springer, Berlin).

3. Unsworth J (2006) Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on CI for the Humanities and Social Sciences (Amer Council Learned Soc, New York).

4. Kintigh KW (2006) The promise and challenge of archaeological data integration. Am Antiq 71:567–578.

5. Ribes D (2007) Social studies of cyberinfrastructure: GEON as a community building endeavor. The Geosciences Network (GEON) Short Papers. Available at http://www.geongrid.org/publications/Geon-2.0/science-applications/Geon-42-43.pdf.

6. Buetow KH (2005) Cyberinfrastructure: Empowering a “third way” in biomedical research. Science 308:821–824.

7. Hey T, Trefethen AE (2005) Cyberinfrastructure for e-Science. Science 308:817–821.

8. Broughton J, Jackson GA (2008) Bamboo Planning Project. An Arts and Humanities Community Planning Project to Develop Shared Technology Services for Research (Univ of California Press, Berkeley).

9. Gregory IN, Bennett C, Gilbam VL, Southall HR (2002) The Great Britain historical GIS project: From maps to changing human geography. Cartogr J 39:37–49.

10. Chavez RF (2000) Generating and reintegrating geospatial data. Proceedings of the Fifth ACM Conference on Digital Libraries (Assoc for Computing Machinery, New York), pp 250–251.

11. Short H (2006) The role of humanities computing: Experiences and challenges. Lit Linguist Comput 21:15–27.

12. Blackwell C, Crane G (2009) Conclusion: Cyberinfrastructure, the Scaife Digital Library and classics in a digital age. Digital Humanities Quarterly 3(1). Available at http://digitalhumanities.org/dhq/vol/3/1/000035/000035.html.

13. Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284:28–37.

14. Guarino N, Giaretta P (1995) Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, ed Mars N (IOS, Amsterdam), pp 25–32.

15. Siber D, et al. (2007) GEONGrid portal: Enabling access to distributed resources and problem solving environments for earth science research. The Geosciences Network (GEON) Short Papers. Available at http://www.geongrid.org/publications/Geon-2.0/information-technology-innovations/Geon-18-19.pdf.

16. Egenhofer MJ (2002) Toward the semantic geospatial web. Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems (Association for Computing Machinery, New York), pp 1–4.

17. Lieberman H, Paternò F, Wulf V, eds (2006) End User Development. Human–Computer Interaction Series (Springer, New York).

18. Bowers S, Lin K, Ludascher B (2004) On integrating scientific resources through semantic registration. Sixteenth International Conference on Scientific and Statistical Database Management (SSDBM’04) (Institute of Electrical and Electronics Engineers, Washington, DC), p 349.

19. Friedlander A, et al. (2009) Working together or apart: Promoting the next generation of digital scholarship. Report of a workshop cosponsored by the Council on Library and Information Resources and the National Endowment for the Humanities. Available at www.clir.org/pubs/reports/pub145/pub145.pdf.

20. Uschold M, Gruninger M (1996) Ontologies: Principles, methods and applications. Knowl Eng Rev 11:93–136.

21. Boast R, Bravo M, Srinivasan R (2007) Return to Babel: Emergent diversity, digital resources, and local knowledge. Inf Soc 23:395–403.

22. Martin K, Lin X, Lunin L (2003) User centric design and implementation of a digital historic costume collection. Proc Am Soc Information Sci Technol 40(1):280–290.

Sieber et al. PNAS | April 5, 2011 | vol. 108 | no. 14 | 5509

