NHGIS Project


Introduction

The National Historical Geographic Information System (NHGIS) provides access to summary statistics, harmonized time series tables, and GIS boundary files for U.S. censuses and other nationwide surveys from 1790 through the present. Through the NHGIS Data Finder, users may filter and sort source tables, time series tables, and boundary files, then select and download multiple items, for different geographic levels and from different years, all in one request.

NHGIS supplies data designed for use in spreadsheet applications (e.g., Microsoft Excel), statistical software (e.g., Stata, SPSS, SAS, R), or GIS applications (e.g., Esri ArcGIS). It does not provide tools for data analysis, mapping, or reporting. Most NHGIS data cover all areas in the United States. Data for smaller geographic units (e.g., census blocks and block groups) are available for individual states. NHGIS is funded by grants from the National Science Foundation and the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Key Personnel

  • Product Owners - David Van Riper, Jonathan Schroeder
  • Tech Owners - Kevin Horne (IT Lead)
  • Domain Experts - Kevin Horne, Jake Wellington

Architectural Overview

Applications

You can view the code documentation in HTML form here: https://pages.github.umn.edu/mpc/nhgis-webapp/.




Architecture with the split app and Spark: Architecture.pdf

Environments

  • sandbox
    • Deploy to sandbox at any time.
    • There is NO expectation that the sandbox environment meets any particular production requirement.
    • Use this environment to test concepts.
  • demo
    • Deploy to demo to show new features.
    • The demo environment should never be assumed to be stable in its entirety, unless otherwise specified.
  • internal
    • Deploy to internal when a new release is about to be deployed to live.
    • After a release to live, internal should mirror live and is expected to be functioning properly.
    • Typically, deploying to internal is covered within the Release Checklist (see below).
  • staging
    • Deploy to staging when ....
    • Typically, deploying to staging is covered within the Release Checklist (see below).
  • live
    • Deploy to live when a new release is absolutely ready to be deployed for public use.
    • After any release to live, it is expected to be functioning properly.
    • Typically, deploying to live is covered within the Release Checklist (see below).



Test Suites (Jenkins)

The goal is that each NHGIS repository have its own set of Jenkins builds. Some Jenkins builds (see below) produce pages with details on the current level of test coverage.




Metadata

Metadata are data maintained by researchers and used by the web application to facilitate extracts. The metadata also assist researchers in developing time series.

Datasets

Datasets and Data Tables are loaded into the metadata via the Source Dataset Ingest process. Datasets typically originate from a source external to the MPC, most often the US Census Bureau (e.g. Census of Population and Housing 2010, American Community Survey 2008-2012). Metadata for datasets exist for the years/decades from 1790 to the present day. Most datasets provide state- and/or county-level data. Many modern-day datasets (i.e. 1970-present) provide multiple geography levels.

Data for 'Source' extracts are derived from data files. See the Source Dataset Ingest process that inventories the data files into the metadata.

Shapefiles

Shapefiles are maintained by the MPC's Spatial Core and packaged into .zip files to which the NHGIS extract engine has access. When a user selects a shapefile via NHGIS, they are actually selecting one of these zip files containing shapefiles. Shapefile zip files are included within the metadata via the Shapefile Ingest process. There are plans to provide users with data from a database in lieu of shapefiles; that project will begin in 2015.
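
A boundary-file zip can be inspected with nothing more than the Python standard library. The sketch below (with a hypothetical file name) just lists the component files that make up the bundled shapefile:

```python
# Hypothetical sketch: inspect one of the shapefile zip files NHGIS serves.
# The file name below is illustrative, not an actual NHGIS zip.
import zipfile

def list_shapefile_members(zip_path):
    """Print the member files of a boundary-file zip."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # A shapefile is really a bundle: .shp (geometry), .shx (index),
            # .dbf (attributes), .prj (projection), and friends.
            print(name)

list_shapefile_members("us_state_1790.zip")  # hypothetical file name
```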

Shapefiles for 'Shapefile' extracts are derived from zip files. See the Shapefile Ingest process that inventories the zip files into the metadata.

Time Series

Time Series are data products developed by researchers within the MPC. Researchers have inventoried and catalogued dataset attributes called aggregate data variables, or agg data vars. One key aspect of time series is the manner in which the data are linked across time.

When initially released, time series used a form of "nominal" integration whereby locations are linked across time by codes and/or names. For example, the state of Virginia in 1790 is linked to the state of Virginia in all subsequent decades via a code. This form of integration does not take into account changes in a location's size. It is incumbent upon users of nominally-linked time series to be aware of, say, the change that occurred in Virginia in 1863 when West Virginia was carved out as a new state. Typically, a drastic change like that is evident in the values between years. However, boundary changes happen all the time (e.g. cities growing via annexation, new counties carved from older counties), and sometimes the changes in area are not immediately obvious.
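
To make the nominal linkage concrete, here is a minimal sketch (with rounded, made-up population figures) of joining rows from two census years purely by a shared state code:

```python
# Minimal sketch of nominal integration: rows from different census years are
# linked only by a shared code (a state FIPS code here). Population figures
# are illustrative, not actual census values.
rows_1860 = [{"fips": "51", "name": "Virginia", "pop": 1_600_000}]
rows_1870 = [
    {"fips": "51", "name": "Virginia", "pop": 1_200_000},
    {"fips": "54", "name": "West Virginia", "pop": 440_000},
]

series_by_code = {}
for year, rows in (("1860", rows_1860), ("1870", rows_1870)):
    for row in rows:
        series_by_code.setdefault(row["fips"], {"name": row["name"]})[year] = row["pop"]

# Virginia's 1860 and 1870 values land in one series even though the 1870
# area is much smaller; nothing in the link itself flags the 1863 split.
for fips, series in sorted(series_by_code.items()):
    print(fips, series)
```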

Data for 'Nominally-Integrated Time Series' extracts are derived from the source data files. There is no ingest process for nominally-integrated time series.

Time series are being expanded to provide spatially-integrated time series, whereby one particular time period is selected as the base time (e.g. 2010) and the data from other years are provided using the 2010 geography codes and boundaries. Data for these extracts must be prepared in advance as standardized data files: a base year is selected (e.g. 2010) and data for other years are adjusted (interpolated) to correspond to the standard year's boundaries (see the Standardized Data File Production process).
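
The interpolation step can be pictured as reallocating each source zone's value among the base-year zones it overlaps. Below is a hedged sketch of the idea, with hypothetical zones and precomputed overlap weights; the real production process is far more involved:

```python
# Hedged sketch of standardizing data to a base year's boundaries: each
# source zone's value is split among the base-year (e.g. 2010) zones it
# overlaps, using weights assumed precomputed from boundary overlays.
source_values = {"county_A_2000": 10000, "county_B_2000": 4000}

# weights[source_zone][target_zone] = share of the source zone's value
# allocated to that 2010 zone; shares sum to 1 per source zone.
weights = {
    "county_A_2000": {"county_X_2010": 0.75, "county_Y_2010": 0.25},
    "county_B_2000": {"county_Y_2010": 1.0},
}

standardized = {}
for src, value in source_values.items():
    for target, w in weights[src].items():
        standardized[target] = standardized.get(target, 0.0) + value * w

print(standardized)  # {'county_X_2010': 7500.0, 'county_Y_2010': 6500.0}
```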

Data for 'Spatially-Integrated Time Series' extracts are derived from standardized data files. See the Standardized Data File Ingest process that inventories the standardized data files into the metadata.

Geographies


How geographies work: NHGISGeographies.pdf

Files

Parquet Files

Parquet files contain the complete set of HDFS tabular data (accessible to Spark) for distribution to users via extracts.
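
As a hedged illustration of what "accessible to Spark" means in practice, a job could read and subset these files roughly like this (the path and column names are invented for the example):

```python
# Hypothetical sketch only: the Parquet path and column names below are
# invented; the real extract logic lives in the NHGIS codebase.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nhgis-extract-sketch").getOrCreate()

# Read the tabular data for one dataset and subset it the way an extract
# would: one geographic level, a handful of requested columns.
df = spark.read.parquet("/path/to/nhgis/parquet/some_dataset")
subset = df.filter(df.geog_level == "county").select("gisjoin", "pop_total")
subset.write.csv("/tmp/extract_output", header=True)
```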

Data Files

Data files contain the complete set of fixed-width tabular data distributed to users via extracts.

development : serves both the demo website and the internal website

  • /pkg/ipums/istads/nhgis_project/conversion/06_convert_standardized_nhgis/outdata
  • /pkg/ipums/istads/data/current/aggdata -> /pkg/ipums/istads/nhgis_project/conversion/06_convert_standardized_nhgis/outdata
  • /pkg/ipums/istads/data/aggdata -> current/aggdata


production : serves both the staging website and the live website

  • [ee.nhgis.org] /web/data/current/agg_data
  • [ee.nhgis.org] /web/data/pre_deploy/agg_data [used as a staging area when deploying new/revised data files]
  • [ee.nhgis.org] /web/data/archive_data/agg_data [used to archive previous versions of revised data files]



Boundary Files

Boundary files are .zip files (containing shapefiles) distributed to users via extracts.

development : serves both the demo website and the internal website

  • /pkg/ipums/istads/data/current/shape
  • /pkg/ipums/istads/data/shape -> current/shape


production : serves both the staging website and the live website

  • [ee.nhgis.org] /web/data/current/shape
  • [ee.nhgis.org] /web/data/pre_deploy/shape [used as a staging area when deploying new/revised boundary files]
  • [ee.nhgis.org] /web/data/archive_data/shape [used to archive previous versions of revised boundary files]



Database (SQL) Dumps

  • /pkg/ipums/istads/db/metadata_source_sql
    • dev_work
      • build_db
        • dumps from istads_build during IT-related tasks
      • demo_db
        • symlinks to dumps (in websites/deployed) deployed to demo website
      • test_db
        • dumps of test metadata databases
    • mdt_researcher_db
      • to_deploy
        • target location for MDT when dumping istads_metadata_researcher
      • weekly_cron
        • target directory for CRON job that dumps istads_metadata_researcher on a weekly basis (see the sketch after this list)
    • web_sites
      • deployed
        • physical SQL dump files deployed to any website: demo, internal, staging, or live
      • internal_db
        • symlinks to dumps (in deployed) deployed to internal website
      • live_db
        • symlinks to dumps (in deployed) deployed to live website
      • staging_db
        • symlinks to dumps (in deployed) deployed to staging website


  • /pkg/ipums/istads/db/extract_source_sql
    • istads_extracts_live
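
The weekly_cron directory above is the target of a scheduled job that dumps istads_metadata_researcher. A minimal sketch of such a job, assuming command-line mysqldump access with credentials handled elsewhere (the file naming is illustrative):

```python
# Minimal sketch, assuming mysqldump access with credentials handled
# elsewhere (e.g. ~/.my.cnf); the file naming is illustrative.
import datetime
import subprocess

OUT_DIR = "/pkg/ipums/istads/db/metadata_source_sql/mdt_researcher_db/weekly_cron"

def dump_researcher_db():
    stamp = datetime.date.today().strftime("%Y%m%d")
    out_path = f"{OUT_DIR}/istads_metadata_researcher_{stamp}.sql.gz"
    # Dump the canonical researcher database and gzip the result.
    subprocess.run(
        f"mysqldump --single-transaction istads_metadata_researcher"
        f" | gzip > {out_path}",
        shell=True,
        check=True,
    )

if __name__ == "__main__":
    dump_researcher_db()
```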



Scripts: Manage Metadata via SQL

Maintaining metadata schema

  • /pkg/ipums/istads/db/schemata
    • extracts
      • Contains a diagram of the current extracts database (has not been kept current!)
    • metadata
      • Contains folders and files relating to managing the contents of the metadata database
      • deploy_00_populate_convenience_tables
        • Contains scripts used to produce a metadata database that can be deployed to an application environment: demo, internal, staging, and live
      • deploy_01_to_demo
        • Contains scripts deploying a prepared metadata database to the demo applications (Web UI and Extract Engine)
      • deploy_02_to_internal
        • Contains scripts deploying a prepared metadata database to the internal applications (Web UI and Extract Engine)
      • deploy_03_to_staging
        • Contains scripts deploying a prepared metadata database to the staging applications (Web UI and Extract Engine)
      • deploy_04_to_live
        • Contains scripts deploying a prepared metadata database to the live applications (Web UI and Extract Engine)
      • deploy_05_pseudo_live
        • Contains scripts dumping the production extract and metadata databases and re-deploying them to demo-db databases: istads_extracts_pseudo_live and istads_metadata_pseudo_live
      • test_db
        • Contains a folder for each test database
        • v0047_018917_72_0001
          • Contains scripts used to generate a test database (v47) from an existing metadata database (v72.1.18917)
        • v0048_020730_73_0000
          • Contains scripts used to generate a test database (v48) from an existing metadata database (v73.0.20730)
      • v74
        • A folder corresponding to a schema version containing:
          1. a folder for each IT update: the first update is the schema change; subsequent updates are changes to content within the given schema
          2. a Visio document and its corresponding PDF document providing a visual representation of the current schema
          3. an empty SQL dump file for the current schema
        • v74_00_021758_add_fips_code_to_integ_geog_name
          • Contains scripts needed to prepare the updates for the specified version
          • sql
            • Contains:
              1. a YML file used to generate the sanity check SQL files (both pre- and post-)
              2. SQL files used to update the database: typically there are at least three SQL files; one for a pre-sanity check, one to implement the change, and another for a post-sanity check (see the sketch below)
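
As a hedged sketch of the YML-driven sanity-check generation described above (the real spec format lives in the schemata repository; the YAML keys below are invented for illustration):

```python
# Hedged sketch of generating pre-/post-sanity-check SQL from a YAML spec.
# Requires PyYAML. The keys (table, expected_rows) are invented; the real
# spec format is defined in the schemata repository.
import yaml

SPEC = """
checks:
  - table: integ_geog_names
    expected_rows: 125000
"""

def render_checks(spec_text, phase):
    spec = yaml.safe_load(spec_text)
    lines = [f"-- {phase}-sanity checks (generated)"]
    for check in spec["checks"]:
        # Emit one row-count probe per table to compare against expectation.
        lines.append(
            f"SELECT '{check['table']}' AS tbl, COUNT(*) AS n, "
            f"{check['expected_rows']} AS expected FROM {check['table']};"
        )
    return "\n".join(lines)

print(render_checks(SPEC, "pre"))
```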



Scripts: Ingest

Creating data files & ingesting datasets

  • /pkg/ipums/istads/ingest
    • Contains the key dataset groups ingested since the deployment of the ruby-based NHGIS web application
    • NOTE: It is not possible to include the entire ingest directory hierarchy in this document. Therefore, several key directories are included along with several dataset examples.
    • NOTE: The directory structure evolves with each new ingest: the most recent ingests (e.g. ACS 20155, Haines 1954) can be used as templates for future ingest efforts.
    • acs
      • Contains a directory for each ACS dataset ingested: e.g. acs20145 represents ACS 2014 5-Year
      • acs20145
        • Contains base directories for ingesting the dataset(s) related to the Census Bureau's ACS 2014 5-Year dataset
        • 01_prep_metadata
          • Contains worksteps and data used by researchers to prepare metadata delivered to IT; the contents of this directory are at the complete discretion of researchers
        • 02_load_metadata
          • Contains worksteps used by IT to load metadata into the metadata database
        • 03_prep_data_files
          • Contains worksteps used by IT to build the data files that will eventually be deployed to the agg_data directory and accessed by the extract engine
        • IT_api
          • Contains input/output for comparing NHGIS extract results with those from the Census Bureau's API
        • IT_reports
          • Contains reports generated from various steps in the ingest process
        • IT_specs
          • Contains files prepared by IT for use as input to one or more ingest steps
        • TDR_specs
          • Contains files prepared by Researchers for use as input to one or more ingest steps
      • acs20155
        • Contains base directories for ingesting the dataset(s) related to the Census Bureau's ACS 2015 5-Year dataset
        • 01_prep_metadata
          • Contains worksteps and data used by researchers to prepare metadata delivered to IT; the contents of this directory are at the complete discretion of researchers
        • 02_load_metadata_phase1
          • Contains worksteps used by IT to load metadata into the metadata database where the metadata do not require knowledge of the data files
        • 03_prep_data_files
          • Contains worksteps used by IT to build the data files that will eventually be deployed to the agg_data directory and accessed by the extract engine
        • 04_load_metadata_phase2
          • Contains worksteps used by IT to load metadata into the metadata database where the metadata are related to the data files constructed in the previous step
        • IT_reports
          • Contains reports generated from various steps in the ingest process
        • IT_specs
          • Contains files prepared by IT for use as input to one or more ingest steps
        • TDR_specs
          • Contains files prepared by Researchers for use as input to one or more ingest steps
    • cbp
      • Contains a directory for each County Business Patterns dataset ingested: so far we've only generated the data files for CBP_1998_2003_US_ST since releasing NHGIS as a rails application
      • cbp_1998_2003_us_st
        • Contains base directories for rebuilding data files for dataset CBP_1998_2003_US_ST
        • 03_prep_data_files
          • Contains worksteps used by IT to build the data files that will eventually be deployed to the agg_data directory and accessed by the extract engine
    • census_2010
      • Contains a directory for each 2010 Decennial data product released by the US Census Bureau
      • NOTE: do not use these directories as a template or guide for any future ingest
      • pl_94-171
      • sf1
      • sf2
    • haines_ag_census
      • 1945
        • 01_prep_metadata
          • Contains worksteps and data used by researchers to prepare metadata delivered to IT; the contents of this directory are at the complete discretion of researchers
        • 02_load_metadata
          • Contains worksteps used by IT to load metadata into the metadata database
        • 03_prep_data_files
          • Contains worksteps used by IT to build the data files that will eventually be deployed to the agg_data directory and accessed by the extract engine
        • IT_reports
          • Contains reports generated from various steps in the ingest process
        • IT_specs
          • Contains files prepared by IT for use as input to one or more ingest steps
        • TDR_specs
          • Contains files prepared by Researchers for use as input to one or more ingest steps
      • 1950
        • 01_prep_metadata
          • Contains worksteps and data used by researchers to prepare metadata delivered to IT; the contents of this directory are at the complete discretion of researchers
        • 02_load_metadata
          • Contains worksteps used by IT to load metadata into the metadata database
        • 03_prep_data_files
          • Contains worksteps used by IT to build the data files that will eventually be deployed to the agg_data directory and accessed by the extract engine
        • IT_reports
          • Contains reports generated from various steps in the ingest process
        • IT_specs
          • Contains files prepared by IT for use as input to one or more ingest steps
        • TDR_specs
          • Contains files prepared by Researchers for use as input to one or more ingest steps



Databases

MySQL


demo-db.pop.umn.edu


Jenkins Databases: Metadata (managed by IT and loaded from SQL dump files)

| Build | Database | Application |
|---|---|---|
| live | nhgis_ci_live_ee_metadata_test | Extract Engine |
| live | nhgis_ci_live_web_metadata_test | Web UI |
| master | nhgis_ci_master_ee_metadata_test | Extract Engine |
| master | nhgis_ci_master_ing_metadata_test | Data Mgmt |
| master | nhgis_ci_master_web_metadata_test | Web UI, Metadata Gem, & Extract Var Gem |
| pullreq | nhgis_ci_pullreq_ee_metadata_test | Extract Engine |
| pullreq | nhgis_ci_pullreq_ing_metadata_test | Data Mgmt |
| pullreq | nhgis_ci_pullreq_web_metadata_test | Web UI, Metadata Gem, & Extract Var Gem |


Jenkins Databases: Rails (managed by Rails applications via Jenkins)

| Build | Database | Application |
|---|---|---|
| live | nhgis_ci_live_ee_extracts_test | Extract Engine |
| live | nhgis_ci_live_web_test | Web UI |
| master | nhgis_ci_master_ee_extracts_test | Extract Engine |
| master | nhgis_ci_master_ing_test | Data Mgmt |
| master | nhgis_ci_master_web_extracts_test | Metadata Gem & Extract Var Gem |
| master | nhgis_ci_master_web_test | Web UI |
| pullreq | nhgis_ci_pullreq_ee_extracts_test | Extract Engine |
| pullreq | nhgis_ci_pullreq_ing_test | Data Mgmt |
| pullreq | nhgis_ci_pullreq_web_extracts_test | Metadata Gem & Extract Var Gem |
| pullreq | nhgis_ci_pullreq_web_test | Web UI |


Metadata Databases (non-application specific)

| Database | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| istads_build | metadata (work) | IT | SQL dumps | IT for ingest and preparing new metadata databases for application deployment (i.e. demo, internal, staging, live) | Data Management |
| istads_metadata_researcher | metadata (canonical) | Researchers (& IT as needed) | MDT | as the canonical metadata database; its convenience tables are empty and it requires processing before being deployed to an application | MDT (& Data Management) |


Application Databases

| Database | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| istads_metadata_demo / istads_metadata_demo_alt | metadata | IT | SQL dumps | demo applications; only one demo metadata database is active; the other is inactive and available for reloading | demo Web UI & demo Extract Engine |
| istads_metadata_internal / istads_metadata_internal_alt | metadata | IT | SQL dumps | internal applications; only one internal metadata database is active; the other is inactive and available for reloading | internal Web UI & internal Extract Engine |
| istads_metadata_sandbox | metadata | IT | SQL dumps | sandbox applications | sandbox Web UI & sandbox Extract Engine |
| nhgis_extracts_demo | rails | Rails App | Rails App | demo Extract Engine | demo Extract Engine |
| nhgis_extracts_internal | rails | Rails App | Rails App | internal Extract Engine | internal Extract Engine |
| nhgis_web_demo | rails | Rails App | Rails App | demo Web UI | demo Web UI |
| nhgis_web_internal | rails | Rails App | Rails App | internal Web UI | internal Web UI |



prod-db.pop.umn.edu
Application Databases

| Database | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| istads_metadata_staging / istads_metadata_staging_alt | metadata | IT | SQL dumps | staging applications; only one staging metadata database is active; the other is inactive and available for reloading | staging Web UI & staging Extract Engine |
| istads_metadata_live / istads_metadata_live_alt | metadata | IT | SQL dumps | live applications; only one live metadata database is active; the other is inactive and available for reloading | live Web UI & live Extract Engine |
| nhgis_extracts_staging | rails | Rails App | Rails App | staging Extract Engine | staging Extract Engine |
| nhgis_extracts_live | rails | Rails App | Rails App | live Extract Engine | live Extract Engine |
| nhgis_web_staging | rails | Rails App | Rails App | staging Web UI | staging Web UI |
| nhgis_web_live | rails | Rails App | Rails App | live Web UI | live Web UI |



PostgreSQL


pg-demo-0.pop.umn.edu
Jenkins Databases: Rails (managed by Rails applications via Jenkins)

| Build | Database | Applications |
|---|---|---|
| live | nhgis_ci: tractor_ci_live_test | Tractor & Extract Engine |
| master | nhgis_ci: tractor_ci_master_test | Tractor & Extract Engine |
| pullreq | nhgis_ci: tractor_ci_pullreq_test | Tractor & Extract Engine |


Application Databases

| Database Name | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| tractor_jobs_demo | rails | Rails App (Tractor) | Rails App (Tractor) | demo Tractor & demo Extract Engine | Tractor & Extract Engine |


pg-internal-0.pop.umn.edu
Application Databases

| Database Name | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| tractor_jobs_internal | rails | Rails App (Tractor) | Rails App (Tractor) | internal Tractor & internal Extract Engine | Tractor & Extract Engine |


pg-staging-0.pop.umn.edu
Application Databases

| Database Name | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| tractor_jobs_staging | rails | Rails App (Tractor) | Rails App (Tractor) | staging Tractor & staging Extract Engine | Tractor & Extract Engine |


pg-live-0.pop.umn.edu
Application Databases

| Database Name | Type | Managed by | Loaded from | Used by | Applications |
|---|---|---|---|---|---|
| tractor_jobs_live | rails | Rails App (Tractor) | Rails App (Tractor) | live Tractor & live Extract Engine | Tractor & Extract Engine |



Applications & Gems

Web UI (APP)

The NHGIS Web UI is the webapp front-end for NHGIS. This web application provides users with the ability to review and filter metadata, to make selections into a data cart, and to submit, review, and download extracts.


  • Repository (Github) : location of project code itself
    • Ruby on Rails
    • development (branch): master
    • release (branch): rx.x
    • deployment (tag): vx.x.x



  • Common Tasks


| Task | More Information |
|---|---|
| Setup | |
| Getting System Status | How to tell if things are normal or, if not, what's broken: check the webapp (data2.nhgis.org/main), submit an extract, and make sure the extract completes |
| Restart | Restart server |
| Deployment | Deployment |
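
A quick status probe can be scripted; this is a minimal sketch that only checks that the landing page responds over HTTPS (an assumption), and it does not replace actually submitting an extract and confirming it completes:

```python
# Hedged sketch of an automated "is it up?" check against the webapp landing
# page; only the URL host/path come from the docs above.
import urllib.request

def webapp_is_up(url="https://data2.nhgis.org/main", timeout=10):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("webapp OK" if webapp_is_up() else "webapp DOWN")
```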



Extract Engine (APP)

The NHGIS Extract Engine is the application on the back-end of NHGIS. This application prepares the files (e.g. codebook, syntax files, shapefiles, tabular data files) for an extract request job.


  • Repository (Github) : location of project code itself
    • Ruby on Rails
    • development (branch): master
    • release (branch): rx.x
    • deployment (tag): vx.x.x



  • Common Tasks


| Task | More Information |
|---|---|
| Setup | |
| Getting System Status | How to tell if things are normal or, if not, what's broken: check the webapp (data2.nhgis.org/main), submit an extract, and make sure the extract completes |
| Restart | Stop/Start/Restart |
| Deployment | Deployment (LEGACY: See NHGIS capistrano) |



Metadata Toolkit (MDT)

The NHGIS Metadata Toolkit (MDT) is the application that allows researchers to update the NHGIS metadata database.

Common Tasks



Metadata GEM

The NHGIS Metadata gem is a Ruby gem that is used in other NHGIS projects. The classes defined within this gem are linked to the NHGIS metadata database.

Setup



Extract Var GEM

The NHGIS Extract Var gem is a Ruby gem that is used in other NHGIS projects. The classes defined within this gem manage commonly used extract-related objects.

Setup



Data Management with database access

NHGIS Data Management with database access is a collection of rake tasks, classes, and methods used to manage the NHGIS metadata database.

Setup



Data Management without database access

NHGIS Data Management without database access is a collection of rake tasks, classes, and methods used to 1) generate and manage NHGIS data files and 2) generate SQL for updating the metadata database. These are activities that do not require any access to the metadata database.

Setup



Data Management: Standardized Datafiles (Interpolation)

NHGIS Standardized Datafiles (Interpolation) is a collection of python-based tasks, classes, and methods used to generate the standardized data files from which spatially-integrated time series are derived.



Data Management: Parquet (NEW)

NHGIS Parquet is a collection of python-based tasks, classes, and methods used to generate parquet data files.
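
A minimal sketch of the conversion idea using pandas and pyarrow; the column names and field widths below are invented, not the NHGIS record layout:

```python
# Minimal sketch, assuming pandas and pyarrow are available; the column
# names and field widths are illustrative, not the NHGIS record layout.
import pandas as pd

colspecs = [(0, 14), (14, 18), (18, 27)]   # hypothetical field positions
names = ["gisjoin", "year", "pop_total"]   # hypothetical column names

df = pd.read_fwf("input_file.dat", colspecs=colspecs, names=names)
df.to_parquet("output_file.parquet")       # engine: pyarrow by default
```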



Metadata SQL: Updates, Deployments, Test Databases

NHGIS Metadata SQL contains the scripts and SQL used to modify the NHGIS metadata database when the NHGIS Metadata Toolkit (MDT) cannot be used to make the changes. Such changes include schema modifications and ad-hoc content changes.

  • Repository (Github) : location of project code itself
    • SQL, scripts, other
    • development (branch): master
    • production (branch): master
  • Jenkins
    • none



Data Management

Ingest

  • Datasets
    • data management repository
      • provides ingest behavior requiring access to the metadata database.
      • /pkg/ipums/istads/ingest/git/nhgis_data_mgmt
    • data management (no database) repository
      • provides ingest behavior requiring no access to the metadata database.
      • /pkg/ipums/istads/ingest/git/nhgis_ingest
    • schemata repository
      • contains steps and processes which update the metadata database (e.g. v73_06_021756_ingest_ACS2015_5).
      • Also see Scripts: Ingest for more information on the directory structure
      • NOTE: this repository contains a README document referring to the appropriate ingest directory (e.g. v64_01_018498_ingest_ACS2013_3/README.txt)
    • NHGIS dataset/data_file Ingest Process
    • The source dataset ingest directories contain the steps and configurations for ingesting a source dataset.
    • Typically, each step has an input and an output.
    • These directories identify a workflow that was followed to complete a particular data item.
    • The steps have changed over time as the software has changed.



  • Standardized Datasets
    • NHGIS Standardized Time Series (preparing metadata databases and constructing standardized data files)
    • The data management repository contains behavior for ingesting standardized datasets
    • The schemata repository contains steps and processes which update the metadata database for standardized data files
      • v70_02_018854_add_standardized_datasets_65_tst
      • v73_01_021514_add_standardized_time_series_tables_rls_7
    • The standardized dataset ingest directories contain researcher requirements and configurations for preparing the standardized data files.
      • NOTE: the output directory must be the 7th sub-directory of /pkg (i.e. /pkg/1/2/3/4/5/6/output) so that the backup process ignores these files.
      • /pkg/ipums/istads/ingest/STANDARDIZED_DATASETS/
        • [standardized dataset]/
          • data_files/
            • configurations/
            • output/
              • [standardized dataset data directory]/
                • [year]/
                  • [geog_level]/
                    • file.dat
                    • file_l.dat
                    • file_u.dat
          • TDR_specs/



Metadata (via SQL)

Make changes and dump them to a SQL file; see NHGIS build metadata database.

Others as Needed

What other common tasks does your project have?

Deployments

Release Checklists

NHGIS Release Checklists are Google Doc spreadsheets that drive the release process when deploying the NHGIS web app, the NHGIS extract engine, NHGIS data files, NHGIS shapefiles, or an NHGIS metadata database to production. Whenever a new release is planned, a new release checklist is drafted. IT typically prepares a draft document and then meets with researchers to finalize it.

  • There are several types of release checklists:
    • Webapp: release checklists for deployments of the NHGIS web application (UI) or the NHGIS extract engine (EE)
    • Data/Metadata: release checklists for deployments of metadata databases and/or data files and/or shapefiles
    • Shapefiles Only: release checklists for deployments of shapefiles only (with no metadata database updates)


  • Each release checklist contains three sections:  
    • Review: describes all of the changes in the release along with all of the review activities required by any/all parties prior to authorization for final release.
    • Deploy: describes all of the steps taken by IT to deploy the release to production.
    • Followup: describes all of the steps to be taken once the release to production has been completed.



  1. Create a new Google Doc Spreadsheet by copying the most recent deployment found in NHGIS Release Checklists/Completed
    • Assumption: You have access to the release checklist folders in Google Docs: MPC IT Shared Google Drive.
    • Names
      • e.g. Data/Metadata - v99.99.99999 -- [description]
      • e.g. Extract Engine - X.Y.Z -- [description]
      • e.g. WebApp - X.Y.Z -- [description]
    • Review Page
      • Clear out the first four columns; these will be adjusted during the Release Checklist meeting.
      • Include the corresponding Redmine Tickets
    • Deploy Page
    • Followup Page
      • Clear out the first three columns; these will be adjusted during the Release Checklist meeting.
  2. Conduct the Release Checklist meeting
    • For the Review and Followup Pages, assign specific actions to persons with specific target dates.
    • The people assigned should attend the meeting and understand what they are supposed to do.
  3. Deploy the application following the instructions set forth in the Deploy Page.
  4. Follow up as prescribed in the Follow-up Page.
  5. Move the completed checklist to the Completed directory at the next Bi-Weekly Workgroup meeting.



Metadata

Prepare wcon.db

A metadata database must be prepared before it can be deployed to any web app/extract engine environment.

  1. Determine base SQL dump file from which a webapp/extract-engine-ready metadata database will be prepared:
    • e.g. metadata_source_sql/dev_work/build_db/v72_02_018917_refine_geog_labels_innodb.sql.gz
    • e.g. metadata_source_sql/mdt_researcher_db/to_deploy/v72_01_018962_istads_metadata_researcher.sql.gz
  2. Prepare, Add, Commit, and Push a new populate file to the nhgis-schemata repository:
    • e.g. metadata/deploy_00_populate_convenience_tables/v72_02_018917_pop_conv_tables.txt
    • NOTE: Using a previous version as a template and making adjustments can often work well.
  3. Execute the steps in the populate file:
    • Load base SQL dump file into istads_build (or another work database).
    • Populate the convenience tables.
    • Drop MDT-related tables.
    • Run the Verify process and make sure there are no errors or issues.
    • Dump the contents of istads_build into a new source SQL dump file which can be loaded to website metadata databases.
      • e.g. metadata_source_sql/dev_work/build_db/v72_02_018917_refine_geog_labels_wcon_innodb.sql.gz
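
A hedged sketch of these steps as shell commands driven from Python; the populate/drop script names are placeholders for whatever the populate file actually specifies, and the Verify process is only noted as a comment:

```python
# Minimal sketch, assuming command-line mysql/mysqldump access; the
# populate/drop script names are placeholders for whatever the populate
# file specifies.
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

BASE = "v72_02_018917_refine_geog_labels"

# 1. Load the base dump into the work database.
sh(f"gunzip -c {BASE}_innodb.sql.gz | mysql istads_build")
# 2. Populate the convenience tables, then drop the MDT-related tables.
sh("mysql istads_build < populate_convenience_tables.sql")   # placeholder
sh("mysql istads_build < drop_mdt_tables.sql")               # placeholder
# 3. Run the Verify process here and stop on any error or issue.
# 4. Dump istads_build as the new webapp-ready ("wcon") source dump.
sh(f"mysqldump istads_build | gzip > {BASE}_wcon_innodb.sql.gz")
```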



Deploy wcon.db

Once a webapp-ready metadata database has been prepared (see above), it can be deployed for use by the NHGIS webapp and NHGIS extract engine.

  1. Determine source SQL dump file: e.g. metadata_source_sql/dev_work/build_db/v73_08_021756_shapefile_corrections_wcon_innodb.sql.gz
  2. Determine metadata database:
    • demo: istads_metadata_demo -- or -- istads_metadata_demo_alt?
    • internal: istads_metadata_internal -- or -- istads_metadata_internal_alt?
    • staging: istads_metadata_staging -- or -- istads_metadata_staging_alt?
    • live: istads_metadata_live -- or -- istads_metadata_live_alt?
  3. Prepare, Add, Commit, and Push a new deploy file to the nhgis-schemata repository:
    • demo: metadata/deploy_01_to_demo/v73_08_021756_to_demo.txt
    • internal: metadata/deploy_02_to_internal/v73_08_021756_to_internal.txt
    • staging: metadata/deploy_03_to_staging/v73_08_021756_to_staging.txt
    • live: metadata/deploy_04_to_live/v73_08_021756_to_live.txt
    • NOTE: Using a previous version as a template and making adjustments can often work well.
  4. Execute the steps in the deploy file:
    • Deploy source SQL dump file:
      • Copy source SQL dump file to /pkg/ipums/istads/db/metadata_source_sql/web_sites/deployed
      • Symlink the source SQL dump file to:
        • demo: /pkg/ipums/istads/db/metadata_source_sql/dev_work/demo_db
        • internal: /pkg/ipums/istads/db/metadata_source_sql/web_sites/internal_db
        • staging: /pkg/ipums/istads/db/metadata_source_sql/web_sites/staging_db
        • live: /pkg/ipums/istads/db/metadata_source_sql/web_sites/live_db
    • Load deployed source SQL dump file to database.
    • Swap databases:
      • Stop extract engines
      • Swap database for web app
      • Swap database for extract engine (automatically updates popscores and restarts the extract engines)



Load Pseudo-Live databases

  • AFTER a new metadata database is released, the next steps are:
    • dump the contents of the live extracts database and the live metadata database (which was updated to include the most current popularity scores) and
    • reload these databases into "demo-db" databases: istads_metadata_pseudo_live and istads_extracts_pseudo_live.


  1. Prepare, Add, Commit, and Push a new deploy file to the nhgis-schemata repository:
    • e.g. metadata/deploy_05_to_pseudo_live/20170217.txt
    • NOTE: Using a previous version as a template and making adjustments can often work well.
  2. Execute the steps in the deploy file:
    • Dump the databases:
      • extracts: e.g. /pkg/ipums/istads/db/extracts_source_sql/istads_extracts_live/20170217.sql
      • metadata: e.g. /pkg/ipums/istads/db/metadata_source_sql/istads_metadata_live/20170217.sql
    • Load the dumped databases:
      • extracts: istads_extracts_pseudo_live
      • metadata: istads_metadata_pseudo_live
    • GZIP the dumped SQL files:
      • extracts: e.g. /pkg/ipums/istads/db/extracts_source_sql/istads_extracts_live/20170217.sql.gz
      • metadata: e.g. /pkg/ipums/istads/db/metadata_source_sql/istads_metadata_live/20170217.sql.gz



Troubleshooting / Common Problems

What tends to go wrong? When it does, how do we (1) know it and (2) fix it?

Popscores are not updating

"Popscores" refers to Popularity Scores. NHGIS captures statistics from requests, translates these statistics into popularity scores (in the extract engine's popscores table), and updates the metadata database on a regular basis (via a CRON job).

If the popscores do not update properly, here are some steps to help identify the source of the problem:

  1. Log into ee.nhgis.org
  2. Go to directory /web/data2.nhgis.org/rails/istads/current/log
  3. Check log: cron.log
  4. Access the extracts database and look at the popscores table
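
A minimal sketch of step 4 from the command line via Python; the popscores table name comes from the description above, while the database name follows the Application Databases table and the updated_at column is a Rails convention assumed here:

```python
# Minimal sketch of inspecting the popscores table, assuming command-line
# mysql access; updated_at is an assumed Rails-convention column.
import subprocess

query = "SELECT COUNT(*) AS n, MAX(updated_at) AS last_update FROM popscores;"
subprocess.run(["mysql", "nhgis_extracts_live", "-e", query], check=True)
```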

Infrastructure Details

Gory details about software stacks and configurations. Subpages entirely appropriate here.


This section is under construction!

NHGIS (Legacy)


Historical Archive for ISTADS documentation