The Core Data Standard (CDS) Package
The Arches community has developed the CDS Package v1.0 as an initial data management module for Arches Server v2.0.0. The CDS Package demonstrates the structure and contents of an Arches Package, and implements the following standards:
- CIDOC Conceptual Reference Model (CRM) (www.cidoc-crm.org). The CRM provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.
- International Core Data Standard for Archaeological and Architectural Heritage. This is a soon-to-be-finalized standard for the inventory of both archaeological and architectural heritage, which is based on the earlier Core Data Index to Historic Buildings and Monuments of the Architectural Heritage (adopted by the Council of Europe in 1992) and the Core Data Standard for Archaeological Sites and Monuments (adopted by CIDOC in 1995). The new standard under preparation (referred to here as the “CDS”) was used to identify the data fields of the CDS Package. Organizations that deploy the CDS Package can customize those data fields to meet their specific requirements.
You may wish to familiarize yourself with these data standards as part of your CDS installation and deployment effort. If you ever wish to further customize the CDS, familiarity with both will help you greatly along the way.
Some of the key contents of the CDS Package include:
- Resource Graphs
- Authority Documents and Thesauri
- Data Import Files
The following sections summarize these components of an Arches Package.
Arches is designed to manage cultural heritage data anywhere in the world. Needless to say, that’s an ambitious goal. After all, architecture considered culturally significant in San Francisco - a city founded in 1776 - might not merit much comment in Cairo or London.
So, how does Arches resolve this?
Arches Server requires a set of Resource Graphs that define the set of resource types that you wish to include in your inventory and the terms that you will use to describe them.
What is an Arches Resource Graph?
In Arches, the term “Resource Graph” refers to a class of cultural heritage records. Things like “Architectural Heritage”, “Investigation Activity”, and “Person” are all examples of Resource Graphs. Think of a Resource Graph as defining the attributes for a particular category of information that Arches will manage.
A Resource is simply one instance of a particular Resource Graph.
Resource Graphs are described following the CIDOC Conceptual Reference Model (CRM). The CRM is an ontology for cultural heritage information that has been developed by the International Committee for Documentation (CIDOC) of the International Council of Museums. Arches uses the CRM because it was adopted by the International Organization for Standardization as the standard (ISO 21127:2006) for the interchange of cultural heritage information.
CDS Package Resource Graphs
The CDS Package defines the following Resource Graphs:
- ARCHAEOLOGICAL HERITAGE (ARTIFACT).E18
- ARCHAEOLOGICAL HERITAGE (SITE).E27
- ARCHITECTURAL HERITAGE.E18
- LANDSCAPE HERITAGE.E27
- MARITIME HERITAGE.E18
- INVESTIGATION.E7
- MANAGEMENT.E7
- DESIGNATION AND PROTECTION.E7
- HISTORICAL EVENT.E5
- DOCUMENT.E31
- IMAGE.E38
- PERSON.E21
- ORGANIZATION.E74
Heritage Resources: Heritage Resources are archaeological, built, landscape, or other immovable cultural heritage. In the CDS, Heritage Resources include:
Archaeological Heritage (element) : a single archaeological entity that could stand alone or be an element of a larger archaeological group (e.g., a bath house within a Roman villa)
Archaeological Heritage (site) : an area of archaeological potential or an area of known or discovered archaeological elements
What’s the difference between archaeological elements and sites?
While conceptually these categories overlap, the CDS differentiates between the two because of the way that they are represented using the CIDOC CRM.
Architectural Heritage : culturally significant buildings, structures, and groups thereof
Landscape Heritage : areas of land designed and created intentionally by man, such as garden and parkland landscapes constructed for aesthetic reasons, organically evolved areas of land resulting from an initial social, economic, administrative, and/or religious imperative and that has developed its present form by association with and in response to its natural environment, or areas of land that are culturally significant due to powerful religious, artistic or cultural associations of the natural element rather than material cultural evidence, which may be insignificant or even absent (UNESCO)
Maritime Heritage : underwater heritage (both under sea and inland), which may include heritage inundated by sea level rise and dam construction, shipwrecks and aircraft, as well as heritage afloat (e.g., ships, sailing vessels)
Activities: Activities are events or actions that may take place during a given time span and at a location or area. In the CDS, Activities include:
Investigation Activity : an activity undertaken with the explicit intention of gathering information about, and understanding of, a Heritage Resource, and the creation of an information source to record that information and understanding
Management Activity : an activity undertaken to prevent damage to, promote the survival of, and promote the understanding and appreciation of Heritage Resources.
Designation and Protection Activity : an activity which implements or revokes statutory and non-statutory designation and protection regimes which may apply to Heritage Resources
Historical Event : any activity that took place in the past, including both human and natural events
Documents: Documents are information carriers such as books, texts, periodicals, inscriptions, audio files, video files, 3-D models, or images. In the CDS, Documents include:
Document : an information carrier, other than an Image, whether physical or digital, e.g., books, maps, PDFs, word-processed documents
Image : an information carrier that represents an external form, whether physical or digital, e.g., photographs, slides, drawings, JPEG or TIFF files.
Actors: Actors refer to individuals or groups of people. In the CDS, Actors include:
Person : real persons (i.e., who live or are assumed to have lived)
Organization : a group or legally identifiable body
Data Import Files
As you’ve seen, the CDS package comes with the following 13 Resource Types:
- Heritage Resources
  - Archaeological Heritage (element)
  - Archaeological Heritage (site)
  - Architectural Heritage
  - Landscape Heritage
  - Maritime Heritage
- Activities
  - Investigation activity
  - Management activity
  - Designation and protection activity
  - Historical event
- Documents
  - Document
  - Image
- Actors
  - Person
  - Organization
Each of these Resource Types has a set of attributes. Attributes are really just pieces of information about each resource. For example, one of the Attributes of the Archaeological Heritage (Element) resource is its cultural period. In the Arches user interface, Attributes are organized into Information Themes. See the companion Arches User Guide for more information on the user interface.
Some attributes are common to many Resource Types. “Name” is a good example: it is common to all Resource Types, so every resource can have a name associated with it. Most Resource Types have quite a few attributes.
Why do I need to know about attributes?
If you already have heritage data in a database or spreadsheet, you can build a file that will let you import your data into Arches. And to do this, you need to know exactly which attributes each resource has.
To see exactly which attributes are associated with a resource, open this file within the CDS repo:
Here are the first few records of CDS attributes.csv:
Each line in this file simply states that an Arches resource has an attribute that is associated with it. Notice the three columns:
- ResourceType: The Arches Resource Type of the Resource
- AttributeName: The attribute associated with the Resource Type
- AuthorityDocument: If this column is populated, then the attribute value must be a unique identifier found in the authority file that is named in this column.
You can think of the Resource Attributes.txt file as a listing of the set of data that you can load into Arches.
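As a rough illustration of how the three columns can be used, the sketch below groups attribute names by Resource Type. The sample rows and the comma delimiter are assumptions made for this example (check the actual file in your CDS package); only the column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file in the CDS package may use a
# different delimiter and will list many more attributes.
SAMPLE = """ResourceType,AttributeName,AuthorityDocument
ARCHITECTURAL HERITAGE.E18,NAME.E41,
ARCHITECTURAL HERITAGE.E18,CULTURAL PERIOD.E55,CULTURAL PERIOD AUTHORITY DOCUMENT.csv
ARCHITECTURAL HERITAGE.E18,FROM DATE.E49,
"""


def attributes_by_resource_type(text, delimiter=","):
    """Map each resource type to {attribute name: authority document or None}."""
    result = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        authority = row["AuthorityDocument"] or None
        result[row["ResourceType"]][row["AttributeName"]] = authority
    return dict(result)


attrs = attributes_by_resource_type(SAMPLE)
for name, authority in attrs["ARCHITECTURAL HERITAGE.E18"].items():
    print(name, "->", authority or "no authority document")
```

A lookup like this makes it easy to see, before building an import file, which attributes expect an identifier from an authority file and which accept a plain value.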
What’s up with all the “.E Numbers”?
Resources, attributes, and all other entities in Arches are instances of CIDOC CRM classes. The CRM identifies its classes with an “Exx” code (for example, E18), so we append the CRM class identifier to our Arches entities so that it’s clear what each Arches entity actually represents.
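If you need to separate the readable label from the CRM class identifier when processing these names, a small helper along these lines may be enough. The function name is hypothetical and is not part of Arches; it only assumes that the class code follows the last period in the name.

```python
def split_crm_class(entity_name):
    """Split 'ARCHITECTURAL HERITAGE.E18' into ('ARCHITECTURAL HERITAGE', 'E18')."""
    label, dot, crm_class = entity_name.rpartition(".")
    # Require a suffix of the form "E" followed by digits, such as E18 or E55.
    if not dot or not crm_class.startswith("E") or not crm_class[1:].isdigit():
        raise ValueError(f"no CRM class suffix in {entity_name!r}")
    return label, crm_class


print(split_crm_class("ARCHITECTURAL HERITAGE.E18"))
print(split_crm_class("ARCHAEOLOGICAL HERITAGE (SITE).E27"))
```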
Creating a Data Import File
The CDS comes with a file that you can use to load information into Arches. The file is located at:
The package reads the content of resource_info.csv and loads it into a database. If you open this file from the CDS package, you’ll see that it already contains data for a set of heritage resources. You’ll need to replace the information in this file with your own data if you want to automatically populate Arches with data.
By the way, don’t let the file name fool you: “resource_info.csv” is a |-delimited file (a “pipe”-delimited file) that illustrates the structure of a valid Arches import file.
Why use a “|” instead of commas?
The Arches import file allows you to include text blocks (so that you can import free text). Text often includes commas, but rarely includes a |. So we use the | character to distinguish columns in the resource_info.csv file.
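To see why the pipe helps, here is a minimal sketch using Python’s standard csv module. The record is made up for illustration (the attribute name “EXAMPLE TEXT ATTRIBUTE”, its value, and its GROUPID are not from the CDS package), and this is not necessarily how Arches itself parses the file.

```python
import csv
import io

# A made-up record: "EXAMPLE TEXT ATTRIBUTE" and its value are illustrative only.
line = (
    "15897|ARCHITECTURAL HERITAGE.E18|EXAMPLE TEXT ATTRIBUTE|"
    "Stone keep, partly ruined, on a mound|EXAMPLE-0\n"
)

# QUOTE_NONE keeps any quotation marks in free text as ordinary characters.
reader = csv.reader(io.StringIO(line), delimiter="|", quoting=csv.QUOTE_NONE)
row = next(reader)

print(len(row))              # number of pipe-separated columns
print(row[3])                # the free text, with its commas intact
print(len(line.split(",")))  # a naive comma split cuts the text apart
```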
Here are some representative records from a load file:
And here’s how to understand what’s going on: Each line in the file defines a resource and an attribute to load into Arches. The column headers RESOURCEID, RESOURCETYPE, ATTRIBUTENAME, ATTRIBUTEVALUE, and GROUPID define the content of the file. Most are self-explanatory, with the exception of the “GROUPID” field (which we’ll look at in more detail later).
Let’s look at line 1:
15897|ARCHITECTURAL HERITAGE.E18|COMPILER.E82|ROD FITZGERALD|COMPILER.E82-0
This record tells Arches that:
- We’re going to load information about an “ARCHITECTURAL HERITAGE.E18” resource (RESOURCETYPE)
- Our resource can be uniquely identified by the string “15897” (RESOURCEID)
- We’re going to load a value for the “COMPILER.E82” attribute (ATTRIBUTENAME)
- The value of the Compiler.E82 attribute is “ROD FITZGERALD” (ATTRIBUTEVALUE)
- This record is part of a group of records identified as “COMPILER.E82-0” (GROUPID)
Notice that we can import lots of information about a single resource simply by referencing the same RESOURCEID and RESOURCETYPE.
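The breakdown above can be sketched in a few lines of Python. The `ImportRecord` type and `parse_record` function are hypothetical names used only for this example; the field order follows the column headers RESOURCEID, RESOURCETYPE, ATTRIBUTENAME, ATTRIBUTEVALUE, and GROUPID.

```python
from typing import NamedTuple


class ImportRecord(NamedTuple):
    resource_id: str
    resource_type: str
    attribute_name: str
    attribute_value: str
    group_id: str


def parse_record(line):
    """Split one pipe-delimited line into its five named fields."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, found {len(fields)}: {line!r}")
    return ImportRecord(*fields)


record = parse_record(
    "15897|ARCHITECTURAL HERITAGE.E18|COMPILER.E82|ROD FITZGERALD|COMPILER.E82-0"
)
print(record.resource_type, record.resource_id)
print(record.attribute_name, "=", record.attribute_value)
print("group:", record.group_id)
```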
Adding Multiple Values to an Attribute
Why all this formalism and complexity?
Because many cultural heritage objects have more than one value for a given attribute. For example, a single resource can often have many names. Or the characteristics of a resource may change over time. Without allowing for multiple values for an attribute, Arches wouldn’t be able to track the evolution of a resource.
So, the Arches data import file structure is important because many resource attributes may have multiple values. Indeed, an Architectural Heritage resource may be associated with several cultural periods, have many addresses, and have several protection grades.
That’s why Arches allows you to load many attribute values for a single resource attribute.
Notice in the file shown that we can add several compilers into Arches for the same resource. In fact, lines 3, 4, and 5 also define compilation records. In general, Arches will allow you to add multiple values for a given attribute.
At present, Arches’ user interface can show multiple values for all entities except for the Summary Description, Distinguishing Features, and Location Description entities.
Now look at lines 8 through 11 in the example file.
Individually, each row assigns a value for a specific attribute. The FROM DATE.E49 and TO DATE.E49 attributes are years. Line 8 seems to be describing a Cultural Period, and Line 10 the Architectural Resource type.
But notice that the records in rows 8 through 11 all share the same “PHASE TYPE ASSIGNMENT.E17-0” GROUPID. This means that these 4 records together describe a resource.
By grouping these rows together with the same GROUPID, you tell Arches that these 4 records are not independent; rather, together they describe the resource.
Here’s how to interpret what’s going on. Each of the records starts with:
This means that each record contains information about the Architectural Heritage.E18 Resource with the unique identifier of 15897. Okay, let’s focus on what comes next:
8 CULTURAL PERIOD.E55|PERIOD_UID:28|PHASE TYPE ASSIGNMENT.E17-0
9 FROM DATE.E49|1066|PHASE TYPE ASSIGNMENT.E17-0
10 ARCHITECTURAL HERITAGE TYPE.E55|AH:THE_TE_UID:68841|PHASE TYPE ASSIGNMENT.E17-0
11 TO DATE.E49|1540|PHASE TYPE ASSIGNMENT.E17-0
Look at line 8. Its ATTRIBUTENAME is “CULTURAL PERIOD.E55”, and the value of this attribute is “PERIOD_UID:28”. Recall from the Chapter “Step 3: Load Controlled Vocabularies” that we use an Authority Document for Cultural Periods. So, the ATTRIBUTEVALUE “PERIOD_UID:28” in line 8 is a pointer to a concept in CULTURAL PERIOD AUTHORITY DOCUMENT.csv (which works out to “Medieval”).
OK, now look at line 9. Its ATTRIBUTENAME is “FROM DATE.E49” and its ATTRIBUTEVALUE is “1066”. Simple enough: “from date” is the year 1066.
Line 10 tells Arches to use the unique identifier “AH:THE_TE_UID:68841” in ARCHITECTURAL HERITAGE TYPE AUTHORITY DOCUMENT.csv to define the value for “ARCHITECTURAL HERITAGE TYPE.E55” (a “Motte”), and line 11 says that the resource has a “TO DATE.E49” value of “1540”.
And here’s the payoff: by giving these 4 records the same GROUPID, we are telling Arches that:
“The ARCHITECTURAL HERITAGE.E18 resource with unique identifier of “15897” was a “Motte” during the “Medieval” cultural period, specifically between the years 1066 and 1540.”
Recall that Arches will support many combinations of cultural period, heritage type, and from/to dates for a single resource.
Now, we can understand that the purpose of the GROUPID is to allow for grouping a set of related attributes into a coherent record.
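The grouping idea can be sketched in Python as follows. The four records are the ones discussed above, written out in full; `group_records` is a hypothetical helper, not an Arches function, and it assumes each attribute appears at most once within a group.

```python
from collections import defaultdict

LINES = [
    "15897|ARCHITECTURAL HERITAGE.E18|CULTURAL PERIOD.E55|PERIOD_UID:28|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|FROM DATE.E49|1066|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|ARCHITECTURAL HERITAGE TYPE.E55|AH:THE_TE_UID:68841|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|TO DATE.E49|1540|PHASE TYPE ASSIGNMENT.E17-0",
]


def group_records(lines):
    """Collect attribute values under their (RESOURCEID, GROUPID) pair."""
    groups = defaultdict(dict)
    for line in lines:
        resource_id, _resource_type, name, value, group_id = line.split("|")
        groups[(resource_id, group_id)][name] = value
    return dict(groups)


groups = group_records(LINES)
phase = groups[("15897", "PHASE TYPE ASSIGNMENT.E17-0")]
for name, value in phase.items():
    print(name, "=", value)
```

Each entry in `groups` then corresponds to one coherent record, such as a single combination of cultural period, heritage type, and from/to dates.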
The actual values that you use for GROUPID are arbitrary; just make sure that you use a unique value for each set of attributes that you want to group.
One last note on GROUPID and resource_info: notice that in this resource_info file, records with the same GROUPID are grouped together (located on adjacent lines). This grouping within the resource_info file is necessary to ensure your grouped attributes are not duplicated upon import into Arches.
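Before importing, you may want to check that grouped records really do sit on adjacent lines. The following is a minimal sketch of such a check. `find_split_groups` is a hypothetical helper, and treating the pair (RESOURCEID, GROUPID) as the group key is an assumption made for this example.

```python
def find_split_groups(lines):
    """Return (RESOURCEID, GROUPID) pairs whose records are not on adjacent lines."""
    finished = set()  # groups we have already moved past
    split = []        # groups that reappear after being finished
    current = None
    for line in lines:
        fields = line.rstrip("\n").split("|")
        key = (fields[0], fields[4])
        if key == current:
            continue
        if key in finished and key not in split:
            split.append(key)
        if current is not None:
            finished.add(current)
        current = key
    return split


adjacent = [
    "15897|ARCHITECTURAL HERITAGE.E18|CULTURAL PERIOD.E55|PERIOD_UID:28|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|FROM DATE.E49|1066|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|COMPILER.E82|ROD FITZGERALD|COMPILER.E82-0",
]
# Same records, but the compiler line now interrupts the phase group.
interrupted = [adjacent[0], adjacent[2], adjacent[1]]

print(find_split_groups(adjacent))
print(find_split_groups(interrupted))
```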
By design, Arches can accommodate a wide variety of data. The only real data requirement is this: If you want to name a resource, you must provide one name with a NAME TYPE.E55 of “Primary.” Because you can give a resource as many names as you want, Arches uses the “Primary” designation to identify the preferred name of the resource.
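A simple pre-import check for this requirement might look like the sketch below. It assumes that the NAME TYPE.E55 value appears in the load file as the literal text “Primary”; in a real load file the value may instead be an identifier from an authority document, in which case you would pass that identifier as `primary_value`. The sample names and the “Alternate” value are made up, and the function is not part of Arches.

```python
from collections import Counter


def resources_without_single_primary_name(lines, primary_value="Primary"):
    """Return IDs of resources that have names but not exactly one primary name."""
    named = set()
    primary_counts = Counter()
    for line in lines:
        resource_id, _resource_type, name, value, _group_id = line.split("|")
        if name == "NAME.E41":
            named.add(resource_id)
        elif name == "NAME TYPE.E55" and value == primary_value:
            primary_counts[resource_id] += 1
    return sorted(r for r in named if primary_counts[r] != 1)


# Made-up records: the names and the "Alternate" value are illustrative only.
sample = [
    "15897|ARCHITECTURAL HERITAGE.E18|NAME.E41|Example Castle|NAME.E41-0",
    "15897|ARCHITECTURAL HERITAGE.E18|NAME TYPE.E55|Primary|NAME.E41-0",
    "15898|ARCHITECTURAL HERITAGE.E18|NAME.E41|Example Mill|NAME.E41-0",
    "15898|ARCHITECTURAL HERITAGE.E18|NAME TYPE.E55|Alternate|NAME.E41-0",
]
print(resources_without_single_primary_name(sample))
```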
Which Attributes can be Grouped?
Not all attributes can (or should) be grouped together. Based on the Core Data Standard for Archaeological and Architectural Heritage (introduced in the “Deploying Arches” chapter), Arches allows for the following attributes to be grouped:
- NAME.E41 and NAME TYPE.E55
- SPATIAL COORDINATES_GEOMETRY.E47 and SPATIAL COORDINATES QUALIFIER
- Any combination of attributes starting with ADRESS_
- Any combination of PERIOD AUTHORITY DOCUMENT, FROM DATE.E49, TO DATE.E49, and HERITAGE RESOURCE TYPE AUTHORITY DOCUMENT (e.g.: Archaeological Heritage (Element), Archaeological Heritage (Site), Architectural Heritage, Landscape Heritage, Maritime Heritage)
Loading data into Arches requires three things:
- Validly structured and populated Resource Graphs, which define the resources, their attributes, and how they are related to each other
- Validly structured Authority Files and an ENTITY_TYPE_X_ADOC file, which define the values that can be associated with attributes constrained by controlled vocabularies
- A resource_info.csv file with content that adheres to the constraints imposed by the Resource Graphs and the Authority Files. This file contains the business data that is to be imported into the Arches database.
Once you’ve prepared your data load file, you may insert its contents into Arches. Be sure to confirm that you are referencing your Authority Documents properly (e.g.: the unique ID you use in your load file should also exist in the appropriate Authority Document).
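One way to confirm those references before loading is to compare each constrained value against the identifiers in the matching Authority Document. The sketch below assumes you have already collected the valid identifiers into a dictionary keyed by attribute name; that dictionary, the helper name, and the invalid value “PERIOD_UID:9999” are all hypothetical.

```python
def find_unknown_authority_values(lines, authority_ids_by_attribute):
    """Return (line number, attribute, value) for values missing from their authority set."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        _resource_id, _resource_type, name, value, _group_id = line.split("|")
        allowed = authority_ids_by_attribute.get(name)
        # Attributes without an authority set are treated as unconstrained.
        if allowed is not None and value not in allowed:
            problems.append((line_number, name, value))
    return problems


# Hypothetical identifier sets, as if read from the Authority Documents.
authority_ids = {
    "CULTURAL PERIOD.E55": {"PERIOD_UID:28"},
    "ARCHITECTURAL HERITAGE TYPE.E55": {"AH:THE_TE_UID:68841"},
}
load_lines = [
    "15897|ARCHITECTURAL HERITAGE.E18|CULTURAL PERIOD.E55|PERIOD_UID:28|PHASE TYPE ASSIGNMENT.E17-0",
    "15897|ARCHITECTURAL HERITAGE.E18|CULTURAL PERIOD.E55|PERIOD_UID:9999|PHASE TYPE ASSIGNMENT.E17-1",
    "15897|ARCHITECTURAL HERITAGE.E18|FROM DATE.E49|1066|PHASE TYPE ASSIGNMENT.E17-0",
]
problems = find_unknown_authority_values(load_lines, authority_ids)
for line_number, name, value in problems:
    print(f"line {line_number}: {value!r} is not a known value for {name}")
```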
Arches will load and index each of the cultural heritage data records that you’ve defined in your data file. Don’t worry if it takes a few attempts to get your data to load cleanly. You can always re-run the Arches build scripts to get a pristine instance of Arches.