Designing a suite of measurements to understand the critical zone

Many scientists have begun to refer to the earth surface environment from the upper canopy to the depths of bedrock as the critical zone (CZ). Identification of the CZ as an integral object worthy of study implicitly posits that the study of the whole earth surface will provide benefits that do not arise when studying the individual parts. To study the CZ, however, requires prioritizing among the measurements that can be made – and we do not generally agree on the priorities. Currently, the Susquehanna Shale Hills Critical Zone Observatory (SSHCZO) is expanding from a small original focus area (0.08 km², Shale Hills catchment) to a larger watershed (164 km², Shavers Creek watershed) and is grappling with the prioritization. This effort is an expansion from a monolithologic first-order forested catchment to a watershed that encompasses several lithologies (shale, sandstone, limestone) and land use types (forest, agriculture). The goal of the project remains the same: to understand water, energy, gas, solute, and sediment (WEGSS) fluxes that are occurring today in the context of the record of those fluxes over geologic time as recorded in soil profiles, the sedimentary record, and landscape morphology. Given the small size of the Shale Hills catchment, the original design incorporated measurement of as many parameters as possible at high temporal and spatial density. In the larger Shavers Creek watershed, however, we must focus the measurements. We describe a strategy of data collection and modeling based on a geomorphological and land use framework that builds on the hillslope as the basic unit. Interpolation and extrapolation beyond specific sites rely on geophysical surveying, remote sensing, geomorphic analysis, the study of natural integrators such as streams, groundwaters or air, and application of a suite of CZ models.
We hypothesize that measurements of a few important variables at strategic locations within a geomorphological framework will allow development of predictive models of CZ behavior. In turn, the measurements and models will reveal how the larger watershed will respond to perturbations both now and into the future. Published by Copernicus Publications on behalf of the European Geosciences Union. 212 S. L. Brantley et al.: Designing a suite of measurements to understand the critical zone


Introduction
The critical zone (CZ) is changing due to human impacts over large regions of the globe at rates that are geologically significant (Crutzen, 2002; Vitousek et al., 1997a, b; Wilkinson and McElroy, 2007). To maintain a sustainable environment requires that we learn to project the future of the CZ. Models are therefore needed that accurately describe CZ processes and that can be used to project, or "earthcast," the future. At present we generally cannot earthcast all the properties of the CZ, but we can run models to project certain processes based on scenarios of human behavior (Godderis and Brantley, 2014). However, many of our models are inadequate to make successful projections. For example, we cannot a priori predict the streamflow in a catchment even if we know the average climate conditions, current soil textures, and current vegetation, because we are often uncertain how much water is lost to evapotranspiration and to groundwater (Beven, 2011). Likewise, we cannot a priori predict the depth or chemistry of regolith on a hillslope even if we know its lithology and tectonic and climatic history, because we do not fully understand what controls the rates of regolith formation toward average steady-state values. Perturbations that occur over timescales shorter than the characteristic time needed to reach steady state for a given gradient result in short-term changes, but in general the gradients are thought to move toward steady-state average values determined by the operative feedbacks. In Fig. 1, some examples of these emergent properties are shown as depth or spatial gradients that are identified in the brown boxes. Scientists from different disciplines generally focus on different emergent properties as shown in Fig. 1, and thus tend to think about processes operating at disparate timescales. However, CZ science is built upon the hypothesis that an investigation of the entire object, the CZ, across all timescales (Fig. 1) will yield insights that disciplinary-specific investigations cannot.
This is a challenging task, given that the driving mechanisms for landscape change also span disparate timescales, from tectonic forcing over millions of years to glacial-interglacial climate change, to the recent influence of humans on the landscape. Each setting or observatory for analysis of the CZ must grapple with processes at different timescales to understand the dynamics and evolution of the system. At the Susque-covered seasons for more than a year, and measured soil moisture at 105 locations (Fig. 2).
The Shale Hills catchment is also situated on a single lithology (shale), which reduced the complexity of the system by simplifying the boundary conditions for models. To aid in use of models for interpretation, we used an approach of monitoring at ridgetops (i.e. 1-D sites), along catenas (2-D transects), and across the full catchment (full 3-D), and we have similarly targeted models for 1-D, 2-D and 3-D simulations. The work led us to measure two types of hillslopes (2-D sites) which dominate the catchment: planar hillslopes that experience downslope but nonconvergent flow of water and soil, and swales that experience downslope convergent flow of water and soil. Much of our effort focused on understanding soils and waters in these two hydrologic units (Fig. 2).
The goal of the SSHCZO project now is to upscale from Shale Hills to the entire 164 km² Shavers Creek watershed (Fig. 3). The expansion from 0.08 to 164 km² is an expansion from a zeroth-order catchment to a watershed with three HUC-12 watersheds (this terminology refers to hydrologic unit codes as defined by the US Geological resources to measure and model the dynamics and evolution of the entire CZ system? This paper describes our philosophy of measurement and our previous paper describes the modelling approach (Duffy et al., 2014). Obviously, due to the wide range of CZ processes across environmental gradients, the specifics of an ideal sampling design will vary from site to site. Nonetheless, we describe the philosophy behind our approach as a way to hypothesize an answer to the question, how can we adequately and efficiently measure the entire CZ? We then present specific examples for the first part of our expansion from Shale Hills to a sandstone subcatchment within Shavers Creek.

Rationale for the measurement plan
The choices of measurements to be made during the expansion are driven by data needs for the models under development (Table 1), at the same time that the models are driven by observations in the field. The suite of models shown in Table 1 is one way to understand the entire CZ as an object of study, rather than as a set of disparate systems. In coordination with this modelling approach, a stratified sampling plan is being implemented in the CZO and paired with geophysical, catchment-scale stream and remote sensing measurements. The models will then be used to upscale from limited point or subregion measurements to the whole watershed and from limited temporal measurements to longer timescales.
Perhaps the largest difficulty in spatially characterizing the CZ in any observatory is the assessment of the extremely heterogeneous land surface, including regolith and pore fluids down to bedrock. In other words, while assessment of atmospheric and surface water pools can be technically challenging, mixing of these pools is much faster than mixing of the biotic pool, the regolith and rock reservoir, or the pool of soil porewater, making assessment of the spatial distribution of these latter reservoirs exceedingly difficult (Niu et al., 2014). On the other hand, the rates of changes in the land sur- porally characterizing today's fluxes in the CZ in any observatory may be measuring the fast-changing fluxes of the atmospheric pools and fluxes.
In recognition of these difficulties, the project started at Shale Hills precisely because it is a catchment almost 100 % underlain by Rose Hill formation shale with land use strictly as managed forest. Surface heterogeneities at Shale Hills were largely related to hillslope position, colluvium related to the Last Glacial Maximum (LGM), fracturing, and relatively limited spatial variations in vegetation. To understand the CZ at the Shavers Creek watershed, on the other hand, we must grapple with a more complex set of variations related to differences in lithology, land use, climate change, and landscape adjustment to changes in base level due to tectonics, eustasy or stream capture (Fig. 1). Here, base level is used to refer to the reference level or elevation down to which the watershed is currently being graded.
In recognition of the new complexities within Shavers Creek, the sampling strategy was designed using a stratified sampling plan based on geological and geomorphological knowledge rather than random sampling. An implicit hypothesis underlying this approach is that sampling can be more limited when it is stratified on the basis of geological and geomorphological knowledge. For example, many randomly chosen soil pits might be necessary if the distinction between swales and planar hillslopes were not recognized; if these two features are recognized and representative pits are dug to investigate them, the number of pits can be minimized. Furthermore, one of the models under development for the CZO is a regolith formation model: by using this model to understand regolith formation, the number of pits in regolith can similarly be minimized.
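The difference between stratified and purely random siting can be illustrated with a short sketch. It is only an illustration: the landform labels, the list of candidate sites, and the function name `pick_stratified_sites` are assumptions made for this example and are not part of the SSHCZO sampling tools.

```python
import random
from collections import defaultdict


def pick_stratified_sites(candidates, per_stratum=1, seed=0):
    """Pick `per_stratum` sites from each landform class (stratum).

    `candidates` is a list of (site_id, landform) tuples. The landform
    labels are hypothetical placeholders for mapped geomorphic units.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for site_id, landform in candidates:
        by_stratum[landform].append(site_id)
    # Draw the same number of sites from every stratum, so that a rare
    # landform (e.g. a swale) is never missed by chance.
    return {
        landform: rng.sample(sites, min(per_stratum, len(sites)))
        for landform, sites in sorted(by_stratum.items())
    }


# Hypothetical catchment: 90 planar-hillslope cells and 10 swale cells.
candidates = [(i, "planar") for i in range(90)]
candidates += [(i, "swale") for i in range(90, 100)]
chosen = pick_stratified_sites(candidates, per_stratum=1)
```

With one pit per stratum, both landforms are always represented by two pits in total, whereas two pits drawn at random from the same hypothetical grid would often both fall on planar hillslopes.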
Measurements at Shale Hills will soon be supplemented with targeted instrumentation in the two new subcatchments of Shavers Creek watershed. The subcatchments identified on calcareous shale. This second subcatchment will also host several farms and will allow assessment of the effects of this land use on WEGSS fluxes.
The targeted subcatchment data will be amplified by measurements of chemistry and streamflow along the mainstem of Shavers Creek as well as catchment-wide meteorological measurements to upscale from Shale Hills to Shavers Creek (Fig. 3). The upscaling will rely on only a small number of sites for soil, vegetation, pore fluid, and soil gas measurements in each subcatchment. To extrapolate from and interpolate between these limited land surface measurements, models of landscape evolution (LE-PIHM), soil development (Regolith-RT-PIHM, WITCH), distribution of biota (BIOME4, CARAIB), C and N cycling (Flux-PIHM-BGC), sediment fluxes (PIHM-SED), solute fluxes (RT-Flux-PIHM, WITCH), soil gases (CARAIB), and energy and hydrologic fluxes (PIHM, Flux-PIHM) will be used. In effect, the plan is to substitute "everything everywhere" with measurements of "only what is needed" by using models of the CZ. As a simple example, a regolith formation model is under development that will predict distributions of soil thickness on a given lithology under a set of boundary conditions. Since much of the water flowing through these small catchments flows as interflow through the soil and upper fractured zone (Sullivan et al., 2015), use of the regolith formation model is necessary to predict the distribution of permeability in the catchment. The model will be groundtruthed with pinpointed field measurements. With this approach, water fluxes in the subcatchments and in Shavers Creek watershed itself will eventually be estimated. groundwater levels and chemistry.
These streams and ground waters are natural spatial and temporal integrators over the watershed and therefore provide constraints on the 3-D-upscaled models. Stream and ground water data will parameterize and constrain model-data comparison and data assimilation, as described below.

3 Data assimilation
The choice of targeted measurements is derived at least in part from an observational system simulation experiment (OSSE) completed for the Shale Hills catchment using the Flux-PIHM model ( for the catchment. Prior to the OSSE, a sensitivity analysis was performed (Shi et al., 2014a) to determine the six most influential model parameters that were needed to constrain and produce a successful simulation. We defined "successful simulation" as one that reproduced the temporal variations of the four land surface-hydrologic fluxes (stream discharge, sensible heat flux, latent heat flux, and canopy transpiration) and the three state variables (soil moisture, water table depth, and surface brightness temperature) (Table 1) with high correlation coefficients and small root mean square errors. Once the six most influential model parameters were determined (porosity, van Genuchten alpha and beta, Zilitinkevich parameters, minimum stomatal resistance, and canopy water storage), the OSSE was then performed.
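The two skill measures named in this definition, the correlation coefficient and the root mean square error, can be sketched as follows. This is a minimal illustration, not the Flux-PIHM evaluation code: the function names and the acceptance thresholds `min_r` and `max_rmse` are assumptions, since the text does not state numerical cutoffs.

```python
import math


def pearson_r(sim, obs):
    """Pearson correlation coefficient between simulated and observed series."""
    n = len(sim)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(sim))


def is_successful(sim, obs, min_r, max_rmse):
    """True when one flux or state variable is reproduced well enough.

    `min_r` and `max_rmse` are hypothetical thresholds chosen by the user.
    """
    return pearson_r(sim, obs) >= min_r and rmse(sim, obs) <= max_rmse


# Made-up example series (arbitrary units), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 2.1, 3.1, 4.1]
ok = is_successful(simulated, observed, min_r=0.9, max_rmse=0.2)
```

In practice the check would be repeated for each of the four fluxes and three state variables, with thresholds chosen per variable because their units and magnitudes differ.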
The OSSE evaluated which of the fluxes and state variables were most important in constraining those model parameters. Shi et al. (2014b) found that the calibration coefficients for the most important model parameters were most sensitive to observations of (i) stream discharge, (ii) soil moisture, and (iii) surface brightness temperature. and latent heat fluxes.) The OSSE has also been validated with assimilation of field observations at Shale Hills (Shi et al., 2015b).
On the basis of this OSSE, we are targeting measurement of stream discharge, soil moisture, and surface brightness temperature for each of the SSHCZO subcatchments on shale, sandstone, and calcareous shale. These measurements should allow us to reproduce subcatchment-averaged land-atmosphere fluxes and subsurface hydrology adequately. Once the three subcatchments are parameterized, the models will then be upscaled to the entire Shavers Creek watershed using information from lidar, SSURGO, geological maps, geophysical surveying, and land use.
Currently, the OSSE has only been used for assimilation of water and energy data but is being expanded to include biogeochemical variables. In other words, our ultimate aim is to complete an OSSE for C and N fluxes in each subcatchment. In the long run, we could also extend the OSSE to assimilate data for other solutes and for sediments.
Modeling results from Shale Hills indicated that an accurate simulation of the subcatchment spatial patterns in soil moisture was achieved using a relatively limited set of hydrologic measurements made at a few points (Shi et al., 2015a). Specifically, we had to measure (i) stream discharge at the outlet, (ii) soil moisture at a few locations, and (iii) groundwater levels at a few locations. The soil moisture (ii) and groundwater (iii) data used to calibrate the model were from three nearly co-located sites in the valley floor. These sites (referred to as RTHnet on Fig. 2) were the only sites with continuous data at the time of model calibration. Notably, COSMOS data were not yet available. The measurements were averaged across the three RTHnet sites (see data posted at http://criticalzone.org/shale-hills/data/dataset/3615/) to provide one calibration point in the model. Extending from this calibration point to the entire catchment was attempted using data from the SSURGO database (http://www.nrcs.usda.gov/wps/ for the whole soil column for each soil series (Supplement Table S2). These soil core measurements for each soil series were used to constrain the shape of the soil water retention curve for each soil series in the catchment in the model.
The result of this effort was that for the monolithologic 0.08 km² catchment of Shale Hills, five soil series were identified and soil properties measured (Lin et al., 2006). As we proceed with work on the new subcatchments, one of two approaches will be used. First, it is possible that relatively few soil moisture measurement locations are required in any given catchment, as long as we can obtain soil hydraulic properties for each soil series. Using the SSURGO soils database, such measurements could be made to parameterize the model. Alternatively, spatially extensive soil moisture measurements based on COSMOS may be adequate to infer the variations in soil hydraulic properties on a series-by-series basis or based on geomorphological criteria. The overall plan is to use (i) SSURGO, (ii) geomorphological constraints, (iii) COSMOS, and (iv) soil moisture measurements along the catenas to parameterize Flux-PIHM.
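The soil water retention curve that the core measurements constrain can be sketched with the standard van Genuchten form. This is an illustration under stated assumptions: the text names "van Genuchten alpha and beta" but does not give the exact formulation used in Flux-PIHM, so the sketch treats `beta` as the shape exponent (often written n) with m = 1 - 1/beta, and all parameter values in the example are hypothetical.

```python
def van_genuchten_theta(h, theta_r, theta_s, alpha, beta):
    """Volumetric water content at pressure head h.

    h        pressure head in m (negative when unsaturated)
    theta_r  residual water content (m3 m-3)
    theta_s  saturated water content, close to porosity (m3 m-3)
    alpha    inverse air-entry scale (1/m)
    beta     shape exponent (> 1); assumed here to play the role of n
    """
    if h >= 0:
        return theta_s  # saturated soil
    m = 1.0 - 1.0 / beta
    # Effective saturation between 0 (residual) and 1 (saturated).
    effective_saturation = (1.0 + (alpha * abs(h)) ** beta) ** (-m)
    return theta_r + (theta_s - theta_r) * effective_saturation


# Hypothetical parameter set for one soil series.
params = dict(theta_r=0.05, theta_s=0.45, alpha=1.0, beta=2.0)
theta_wet = van_genuchten_theta(-1.0, **params)
theta_dry = van_genuchten_theta(-10.0, **params)
```

Fitting `alpha` and `beta` separately for each soil series is one way the shape of the curve could differ between series while the model structure stays the same.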

4 Implementation in the Garner Run subcatchment
In this section we introduce the Garner Run subcatchment, one of the two new focus areas planned within the Shavers Creek watershed. In addition to describing the geologic and geomorphologic setting, we detail the sampling strategy. Preliminary observations and measurements from soil pits, vegetation surveys, and surface water monitoring are also presented.

Geologic and geomorphic context of Garner Run
A central underlying hypothesis of the CZO work is that the use of geomorphological analysis can inform the sampling strategy so that measurements can be limited in number. Therefore, we include a description of current knowledge of the geomorphological setting of the new subcatchment at Garner Run. The subcatchment drains a syncli- the hillslopes parallel bedding in the sandstone (Fig. 5). Indeed, subtle bedding planes can be observed in lidar-derived elevation data (Fig. 6b). The strong lithologic control on landscape form is manifested clearly in the high-resolution (1 m) bare-earth lidar topography.
One question that must be addressed is whether the Rose Hill Shale of the Clinton Group, which underlies Shale Hills and lies stratigraphically above the Tuscarora, may be present in the Garner Run subcatchment. Down-valley of the Garner Run study area, the Rose Hill Shale has been mapped in a low-sloping bench at the foot of Tussey Mountain (Figs. 3 and 4, Flueckinger, 1969). Although the entirety of the Garner Run study area is mapped as the Tuscarora Formation, the continuation of a low-sloping bench along the entire valley (Fig. 7) could be consistent with the presence of Rose Hill Shale throughout the catchment. In general, bedrock exposure is poor in the Shavers Creek watershed, but lidar topographic analysis, field mapping, and targeted geophysical surveys will aid in resolving uncertainties in subsurface composition necessary for modeling water, solute, and sediment fluxes.

steep (mean slope = 12-17°) compared to those at Shale Hills (mean slope = 14-21°), despite having stronger underlying bedrock (quartzite vs. shale). This observation of steeper hillslopes in Shale Hills vs. Garner Run is particularly curious given their presumably similar histories of climate and tectonism. If the two landscapes were in a topographic steady state with erosion rate equal to rock uplift rate, we would expect Garner Run to have evolved with time to have steeper slopes in order to erode and transport its more resistant bedrock and coarser sediment. This hillslope conundrum could be related to the role of local base level and transient landscape adjustment (Whipple et al., 2013).
Specifically, analysis of stream longitudinal profiles on Garner Run and the mainstem of Shavers Creek reveals prominent knickpoints at elevations of ∼320 m and 380 m, respectively (Fig. 7). Such breaks in channel slope geomorphically insulate the upper stream reaches from the mainstem of Shavers Creek and could be consistent with different rates of base level fall upstream and downstream of the knickpoint. Equivalently, the knickpoints could delineate different rates of local river incision into bedrock in the upper and lower reaches. Published cosmogenic nuclide-derived bedrock lowering rates ranging from 5-10 m Myr⁻¹ from similar nearby watersheds (Portenga et al., 2013) may be a good estimate for rates in Garner Run upstream of the knickpoint (Fig. 7). These rates are indeed 3-4 times lower than bedrock lowering rates inferred for the Shale Hills catchment (20-40 m Myr⁻¹), which lies downstream of the knickpoint on Shavers Creek (Ma et al., 2013; West et al., 2013, 2014). The origin of these knickpoints is likely due to some combination of the following: regional base level adjustment on the Susquehanna River since the Neogene (3.5-15 Ma) due to epeirogenic uplift, stream capture and drainage reorganization (e.g. Willett et al., 2014), or temporal and spatial variations in bedrock exposure at the surface (e.g. Cook et al., 2009). Testing these competing controls will require additional direct measurements of bedrock lowering rates with cosmogenic nuclides at Garner Run, in addition to bedrock river incision models. Such models can account for both variations in rock strength and temporal changes in relative base level.
In addition to variations in structure, lithology, and base level, Quaternary climate variations have left a strong imprint on the landscape of Shavers Creek. While the relicts of periglacial processes at Shale Hills are mostly observed in the subsurface colluvial stratigraphy (West et al., 2013), at Garner Run these processes have left behind boulder fields, solifluction lobes, and landslides observed at the land surface (Fig. 6). Such features are found throughout central Pennsylvania south of the LGM (last glacial maximum) limit (Gardner et al., 1991). These features document a major reorganization of the uppermost CZ by processes such as permafrost thaw. For example, the Leading Ridge hillslope (the southern hillslope defining the Garner Run subcatchment) is characterized by hummocky topography at the 5-10 m scale, with partially vegetated boulder fields observed to be common. The other side of the catchment, the Tussey Mountain hillslope, is steeper at the top, has greater relief, retains evidence of past translational slides, and contains open, unvegetated boulder fields. At the foot of the Tussey Mountain hillslope is a strong slope break that demarcates a low-sloping region characterized by abundant solifluction lobes (Figs. 6 and 7). Such features were either not as active, or their evidence has been erased or buried, at the Shale Hills subcatchment.

Water and energy flux measurements at Garner Run: Tower HOG
One of our major focuses is measuring precipitation and evapotranspiration (ET). These fluxes are drivers for landscape evolution as they are manifested today (Fig. 1). These measurements are also needed for the land surface water balance to constrain today's WEGSS fluxes. First, we are installing a laser disdrometer (LPM, Thies Clima GmbH) to measure precipitation amount and type in Garner Run. Another disdrometer has been in use at Shale Hills since 2008. The disdrometer will be deployed as part of the "tower hydrological observation gear", referred to here as Tower HOG (Table 2).
Tower HOG will be placed outside the watershed on Tussey Mountain ridgeline (Fig. 3). The remote, rocky terrain in Garner Run made constructing a new tower in the center of the watershed challenging. In contrast, a communications tower that is surrounded by representative forests already exists on the ridge top above the watershed, and we have therefore chosen this to host the eddy covariance flux instrumentation. Although the measurement footprint (i.e. fetch) for the tower measurements will include other areas, the tower instrumentation will be sensitive to fluxes from the forest in Garner Run. The tower measurements can also be compared to regional measurements such as the National Atmospheric Deposition Program (NADP) measurements and samples of rainwater. For example, according to the nearest NADP site, Garner Run receives 1006 mm yr⁻¹ precipitation with an average pH of 5.0 (Thomas et al., 2013).
In addition to precipitation, sensible and latent heat fluxes (i.e. using eddy covariance), or skin temperature (upwelling terrestrial radiation) must also be measured to constrain Flux-PIHM (Shi et al., 2013). A small clearing below the tower site on the Tussey ridgeline makes the site unsuitable for skin temperature measurements representative of the forest, so we are only collecting eddy covariance measurements at the Tussey ridgeline. Of course, the complex terrain at both Shale Hills and Garner Run makes eddy covariance measurements difficult to interpret in stable micrometeorological conditions. Since the primary energy partitioning happens during the day, however, daytime flux measurements are sufficient to constrain the modeling. For the Garner Run subcatchment, we may also be able to use upwelling infrared radiation measurements currently being made at the nearby Shale Hills. These radiative energy fluxes are measured using a four-component radiometer, i.e., one that measures upwelling and downwelling terrestrial and solar radiation (Table 2). With both the eddy covariance measurements at Garner Run and radiative flux measurements at Shale Hills, Flux-PIHM should be well constrained.
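The land surface water balance that the precipitation and ET measurements feed into can be written as a simple residual, which gives an independent estimate to compare against tower-based ET. The sketch below is an illustration under stated assumptions: the function name, the discharge value, and the default of zero storage change and zero deep groundwater loss are hypothetical. Only the 1006 mm yr⁻¹ precipitation figure comes from the text above.

```python
def et_residual_mm(precip_mm, discharge_mm,
                   storage_change_mm=0.0, groundwater_loss_mm=0.0):
    """Annual evapotranspiration (mm) as the water-balance residual.

    ET = P - Q - dS - G, where P is precipitation, Q is stream discharge
    expressed as a depth over the catchment, dS is the change in stored
    water, and G is loss to deep groundwater.
    """
    return precip_mm - discharge_mm - storage_change_mm - groundwater_loss_mm


precip_mm = 1006.0    # annual precipitation quoted in the text
discharge_mm = 450.0  # hypothetical placeholder, not a measured value
et_mm = et_residual_mm(precip_mm, discharge_mm)
```

Any mismatch between this residual and the eddy covariance ET lumps together measurement error and the neglected storage and groundwater terms, which is one reason to use several approaches side by side.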

Vegetation mapping
Vegetation has important impacts on the WEGSS fluxes and has important but poorly understood impacts on regolith formation and sediment transport. As we study individual subcatchments to understand WEGSS budgets, we seek to learn enough about the fluxes to extrapolate to the entire Shavers Creek watershed: we therefore seek to understand some of the biogeochemical controls on WEGSS fluxes. For example, one of our goals is to assess nitrate fluxes out of Shavers Creek. To do this, it may be necessary to determine tree species (Williard et al., 2005). Ultimately, we plan to run an OSSE to compare model predictions to measurements as a way to determine the important parameters for predicting carbon and nitrogen fluxes.

The vegetation has already been mapped in Garner Run subcatchment. The objective of the ground-based vegetation sampling design for the subcatchment was to measure spatial variability in vegetation across the catena (ridge top, midslope, and valley floor positions) defined for Ground HOG. These measurements set the stage for later re-measurements to understand temporal variability. For example, future assessments will quantify above-ground biomass, an important carbon pool. Variability in forest composition, standing biomass, and productivity across a watershed is generally related to gradients in biotic and abiotic resources such as soil chemistry or structure, water flux, and incoming solar energy. Therefore, the relatively restricted vegetation analysis design (Fig. 5) will be upscaled based on the team's developing knowledge of the distribution of soils across the watershed as well as lidar-based estimates of tree biomass and seasonal patterns of leaf area index and tree diameter growth. Given that we have not yet run an OSSE for carbon or nitrogen fluxes, our measurements of vegetation are relatively broad to enable such analysis in the future.
The vegetation measurements are important not only for C and N fluxes, but also for water flux. At Shale Hills, seasonal variation in tree transpiration has been estimated using tree sap flux sensors (Meinzer et al., 2013). While we sampled many different tree species in multiple locations at Shale Hills (Fig. 2), a more restricted number will be sampled at Garner Run. For example, sap flux sensors are being deployed at only the midslope positions of Ground HOG (Fig. 5 addition, all approaches to measuring water fluxes are imperfect; errors can best be constrained when multiple approaches are used.
In addition to these sap flux measurements limited to midslope pits, vegetation has been sampled in linear transects parallel to the slope contour at each of the four soil pits (Fig. 5, Sect. 4.4), i.e., at each of the following pits: Leading Ridge ridge top (LRRT), Leading Ridge midslope (LRMS), Leading Ridge valley floor (LRVF), Tussey Mountain midslope (TMMS). Each vegetation transect was 10 m along the direction perpendicular to the valley axis and ∼700-1400 m parallel to the valley axis.
Measurements along the transects yielded vegetation and forest floor cover data for 4.1 ha in the subcatchment (Table 3). They provide vegetation input data for land surface hydrologic models, and also evaluation data for a spatially distributed biogeochemistry model (Flux-PIHM-BGC, Table 1). In the transected area, 2241 trees > 10 cm diameter at breast height were measured, mapped, and permanently tagged. Understory vegetation composition was measured at 5 m intervals along transects and coarse woody debris was measured in 25 m planar transects parallel to the main transect, spaced every 100 m. Forest floor cover was classified as rock (typically boulder clasts from periglacial block fall), bare soil, or leaf litter every 1 m along each transect, and the dimensions (a, b, c axes) of the five largest exposed rocks were recorded every 25 m. Forest floor biomass was measured every 25 m along transects by removing the organic horizon from a 0.03 m² area for laboratory analysis: samples were dried, weighed and measured for carbon loss on ignition.
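Two of the summary quantities that such transect data support, tree basal area (total stem cross-sectional area per unit land area) and the fraction of forest floor points in each cover class, can be computed as in the sketch below. The function names and the example inputs are hypothetical and are not taken from Table 3.

```python
import math
from collections import Counter


def basal_area_m2_per_ha(dbh_cm, plot_area_m2):
    """Basal area (m2 ha-1) from diameters at breast height in cm."""
    # Convert each diameter in cm to a radius in m (d / 200), then sum
    # the circular stem cross-sections.
    stem_area_m2 = sum(math.pi * (d / 200.0) ** 2 for d in dbh_cm)
    return stem_area_m2 / (plot_area_m2 / 10_000.0)


def cover_fractions(point_classes):
    """Fraction of sampled points that fall in each cover class."""
    counts = Counter(point_classes)
    total = len(point_classes)
    return {cover: count / total for cover, count in counts.items()}


# Hypothetical 10 m x 10 m transect segment with a single 20 cm tree.
ba = basal_area_m2_per_ha([20.0], plot_area_m2=100.0)

# Hypothetical cover classes recorded at four 1 m points.
fractions = cover_fractions(["rock", "litter", "litter", "bare soil"])
```

Computing both quantities per transect segment rather than per transect would expose the within-position variability that a single mean value hides.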
The results from these linear transects document variations in vegetation across catena positions (Table 3), as well as spatial variation in vegetation within a position. For example, mean tree basal area (BA; the ratio of the total cross-sectional area of tree stems to the total land surface area) in the LRRT transect is 25.3 m² ha⁻¹; however, BA measurements ranged from 0 to 79 m² ha⁻¹. Similarly, 16 % of points sampled every meter in LRRT fell on rock, yet at certain points along the transect rock cover was as high as 100 % or as low as 0 %. Vegetation measurements will be combined with data on surface rockiness (from transects) and a suite of ground and remotely sensed measurements from the watershed such as slope, curvature, aspect, solar radiation, and soil depth to model vegetation dynamics from environmental conditions and interpolate vegetation structure in areas of the watershed not directly sampled. Future resampling of linear transects will allow assessment of carbon uptake in vegetation, as well as changes in forest composition and structure.
Additional key vegetation parameters will be assessed at the soil pits described in Sect. 3.4. These additional measurements include root distributions, leaf area index (LAI, described in the next paragraph), litter fall, tree diameter growth and tree sap flux. Root distributions are being measured at all four soil pits in Garner Run using a combination of soil cores to accurately assess the high length densities near the surface.

Root distributions, combined with soil water depletion patterns, can inform depth of tree water use over the season, which is an input parameter in the PIHM suite of models. Currently, a look-up table (http://www.ral.ucar.edu/research/land/technology/lsm/parameters/VEGPARM.TBL) is used to determine the rooting depth of each landcover type in the PIHM suite of models. Using field-measured rooting depth as model input may improve the modeling of water uptake. In addition, profile wall mapping is being used to analyze the architecture, mycorrhizal colonization, and anatomy of deep roots. By characterizing and understanding the controls on root traits along a hillslope, we will eventually be able to use such observations to inform both models of water cycling (Flux-PIHM) and regolith formation (RT-Flux-PIHM, see Table 1).
At weekly intervals in the spring and fall and monthly during the summer, LAI will be assessed with a Li-2200 plant canopy analyzer (LI-COR Inc., Lincoln, Nebraska USA). The Moderate Resolution Imaging Spectroradiometer (MODIS) also provides remotely sensed 8-day composite LAI (Knyazikhin et al., 1999; Myneni et al., 2002). The MODIS LAI product, however, has a spatial resolution of 1 km², which cannot resolve the spatial structure in LAI within small watersheds. The product also has a notable bias compared to field measurements (e.g. Shi et al., 2013). The LAI field measurements will be used for detailed information on leaf phenology, which is an important driver for the modeling of water and carbon fluxes for land surface and hydrologic models (e.g., PIHM, Flux-PIHM (Table 1)), and provides calibration or evaluation data for biogeochemistry models like Flux-PIHM-BGC (Naithani et al., 2013; Shi et al., 2013).
Another important value we must estimate is net primary productivity (NPP). With NPP it is possible to constrain carbon and nutrient fluxes in vegetation stocks, which can be large components of the overall budgets. To estimate aboveground NPP, we will measure annual variation in trunk growth with dendrobands emplaced on examples of each of the six dominant tree species near each soil pit site. In addition, traps at each soil pit are being used to assess litter fall. One of the key model outputs of Flux-PIHM-BGC is NPP, which can be evaluated using these measured data.

Soil observations
To first order, the Garner Run subcatchment land surface falls into one of three categories: (i) fully soil mantled with few boulders emerging at the ground surface, (ii) boulder-covered with tree canopy, and (iii) boulder-covered without tree canopy. To assess the spatial heterogeneity of soils in the Garner Run subcatchment, we focused efforts on four soil pits: three on the north-facing planar slope of Leading Ridge (LRRT, LRMS, LRVF) and one mid-slope pit on the south-facing slope of Tussey Mountain (TMMS) (Fig. 5). This deployment of observations in soil pits along a catena, with an additional pit on the opposite valley wall, is here referred to as "Ground HOG" (ground hydrological observation gear) (Fig. 5, Supplement Fig. S1) and is the result of our focus on a minimalist sampling design.
In addition, the surface cover at Garner Run consists of coarse blocks of the Tuscarora sandstone ranging in diameter from ∼ 10-200 cm, making it challenging to excavate large soil pits and limiting the number of such installations ([...] on slopes that were planar in planview to avoid areas of convergent flow. The midslope pits were located on convex-up hillslopes for reasons discussed below. Given our catena design, we excavated pits in the following soil series: TMMS, LRRT and LRMS (Hazleton-Dekalb association, very steep), and LRVF (Andover extremely stony loam, 0-8 % slopes).

The rationale for the positions of the pits in Ground HOG is as follows. First, regolith formation at a ridge top is the simplest to understand and model (see, for example, Lebedeva et al., 2007, 2010) because net flux of water and earth materials is largely 1-D: i.e., net water flux is downward and net earth material flux is upward over geological time. Regolith-RT-PIHM is a model under development to simulate regolith development quantitatively for such 1-D systems, using constraints from cosmogenic isotope analysis (Table 1). Second, Regolith-RT-PIHM will also be able to model convex-upward hillslopes by assessment of the hillslope as a 2-D system that incorporates downslope transport of water and soil (e.g. Lebedeva and Brantley, 2013). By analyzing soil pits along a planar hillslope as we did for Shale Hills (Jin et al., 2010b), both 1-D and 2-D models of regolith formation will be enabled. With such conceptual and numerical models, we will extrapolate to other hillslopes within Shavers Creek watershed. Third, at Shale Hills we discovered that both planar hillslopes and swales were important, requiring measurements at both (Graham and Lin, 2010; Jin et al., 2011; Thomas et al., 2013). No such swales have been observed at Garner Run, allowing focus on just one catena in the minimalist design. Finally, the importance of aspect on soil development and WEGSS fluxes at Shale Hills has been noted (Lin, 2011, 2010; Ma et al., 2011; West et al., 2014) on shale, as well as on sandstones in Pennsylvania (Carter and Ciolkosz, 1991). For that reason, one additional pit was sited on the northern side of the catchment to make observations to constrain the effect of aspect (Fig. 5).
At each pit location, we described the soil profile, which typically had the following structure: an upper rocky layer with a thin organic soil, a leached layer with large clasts mostly absent, a sandy mineral soil with a thin layer of accumulated organic and sesquioxide material, and a deeper clay-rich layer with larger rock fragments interspersed (Fig. S3, Table S2). Additionally, for each pit we sampled soils at 10 cm intervals for chemistry, grain size, organic matter, and composition analysis (Table S4).
Most of the Garner Run subcatchment has been mapped to lie on Tuscarora sandstone (Flueckinger, 1969). This sandstone, deposited in the Lower Silurian, has been interpreted as reworked beach sediments during original deposition (Cotter, 1982). The unit has been mildly metamorphosed so that pressure solution has cemented the fabric of the rock: as such, the unit is often referred to as a quartzite. Cotter (1982) reported the unit to be close to 98 % SiO₂. Weathering of sandstone is largely controlled by the porosity, the fraction of non-quartz grains, the composition of the cement (Turkington and Paradise, 2005), and the pH of soil porewaters (Certini et al., 2003). The porosity is important because it dictates how much water enters the weathering rock; in addition, during seasonal drying, salts deposited inside a sandstone can crystallize and disintegrate the rock (Labus and Bochen, 2012). Thermal cycling can also crack sandstones (Turkington and Paradise, 2005), as can tree roots (Amundson, 2004). The average of the bulk compositions of four rock samples collected from the bottoms of the Ground HOG soil pits was used to estimate an average composition of the quartzite for comparison to similar analyses of bulk regolith samples (all measured using Li metaborate fusion followed by analysis by inductively coupled plasma atomic emission spectroscopy, Table S3). In Garner Run samples, the Tuscarora was observed to be close to 98 % SiO₂. A small amount of titanium (Ti), generally present in sandstones in highly insoluble minerals, was observed to be present (Table S3). By calculating the normalized concentrations for elements assuming Ti is insoluble, we assessed the loss or gain of elements from the regolith as compared to Ti in the underlying Tuscarora sandstone. These normalized concentrations are referred to as mass transfer coefficients, τ_{i,j}, where i is the immobile element and j is the mobile element (Anderson et al., 2002; Brimhall and Dietrich, 1987).
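The normalization described above can be illustrated with a minimal sketch. It assumes the commonly used form of the mass transfer coefficient, τ_{i,j} = (C_{j,w} / C_{j,p}) × (C_{i,p} / C_{i,w}) − 1, where w denotes weathered regolith and p denotes parent rock. The function name and the concentrations below are hypothetical illustrations, not values from Table S3.

```python
def mass_transfer_coefficient(c_mobile_regolith, c_mobile_parent,
                              c_immobile_regolith, c_immobile_parent):
    """Return tau_{i,j} for mobile element j normalized to immobile element i.

    tau = -1: element j completely lost relative to the parent
    tau =  0: no net loss or gain
    tau >  0: element j added relative to the parent
    All concentrations must share the same units (e.g. wt %).
    """
    values = (c_mobile_parent, c_immobile_regolith, c_immobile_parent)
    if c_mobile_regolith < 0 or min(values) <= 0:
        raise ValueError("concentrations must be positive")
    enrichment = c_mobile_regolith / c_mobile_parent
    immobile_ratio = c_immobile_parent / c_immobile_regolith
    return enrichment * immobile_ratio - 1.0


# Hypothetical concentrations (wt %), chosen only to illustrate the arithmetic.
parent = {"Ti": 0.10, "Al": 1.00, "Mg": 0.05}
regolith = {"Ti": 0.20, "Al": 1.00, "Mg": 0.20}

tau = {
    element: mass_transfer_coefficient(
        regolith[element], parent[element], regolith["Ti"], parent["Ti"]
    )
    for element in ("Al", "Mg")
}
```

In this made-up example Ti doubles in the regolith, so an element whose concentration is unchanged (Al) has lost half its mass relative to Ti, while an element whose concentration quadrupled (Mg) shows net addition.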
From this assessment of regolith mass balance, it was observed that Al, Ca, Na, Si, and P were either largely unchanged (τ ≈ 0) or highly depleted (τ < 0) compared to the underlying rock. In contrast, Mg, K, and Fe were all significantly enriched in the soils (τ > 0) compared to the protolith (Fig. S3). On each plot, the star represents the parent composition (τ ≈ 0), plotted at an arbitrary depth. These observations are consistent with arguments in the literature that ridgetop soils are residual, poorly developed, and thin (Ciolkosz et al., 1990). In contrast, downslope soils generally developed not only from rock in place but also from colluvium (Fig. 5). Furthermore, soils in Pennsylvania commonly show a brown over red color layering that has been attributed to exposure of earlier regolith to weathering (producing the red layer) followed by emplacement of colluvium that experienced additional weathering (the brown layer) (Hoover and Ciolkosz, 1988). Such polygenetic histories will make regolith formation modelling more complex. The addition of Mg, K, and Fe to the soils, even at the ridgetop where downslope transport is unlikely to have been significant (Fig. S3), could either be explained by exogenous additions to the soil or by protolith compositional variation which was not assessed in the small set of five rock samples. For example, some interfingered shales are known to occur within the Tuscarora formation (Flueckinger, 1969) and could have provided the excess Mg, K, and Fe. Alternately, addition of these elements could have been caused by (i) dust inputs (Ciolkosz et al., 1990), which were likely to be important especially during the glacial period and just after, or (ii) fines percolating downward from weathering of the overlying Rose Hill shale before it was eroded away (Fig. 4). Movement of fines out of the Rose Hill shale is known to be happening today from our work at Shale Hills (Jin et al., 2010a).

Ground HOG
The Ground HOG instrumentation enables the in situ measurement of soil moisture and temperature, as well as gas and pore-fluid compositions, all at multiple depths (Fig. 5, Fig. S2). Ground HOG complements the atmospheric measurements taken by Tower HOG instrumentation (Sect. 3.2). Because the sites are difficult to access, measurements were automated to the extent possible. However, the lack of access to electricity and the cost of automated sensors (for CO₂ for example) meant that a completely

automated monitoring system was infeasible. Therefore, our final approach (Fig. S2) included a few automated components recording a continuous time series of data, coupled with additional components to be monitored manually at lower temporal resolution. In selecting depths for soil sampling we wanted to instrument the site so that results could be compared across all watersheds, which meant we focused on a depth-based (as opposed to horizon-based) sampling scheme. In addition, we wanted to emphasize surface soils that have the highest water and biogeochemical flux rates. These layers also have the strongest influence on the atmospheric boundary layer. At the same time, we wanted to also document deep soil processes critical to understanding weathering [...] , 1974; Topp et al., 1980; Topp and Ferre, 2002). We placed 12 (4 depths × 3 pit faces) in each pit. The automated sensors were emplaced at depths expected to have the most dynamic soil moisture. In contrast, the waveguides measure deep soil moisture where temporal variability is expected to be low. The use of waveguides added spatial replication at all depths (Fig. 5, Fig. S2). Co-located with every soil moisture waveguide is a soil gas access tube to sample soil gas for measurements of the depth distribution of CO₂ and O₂ at a low temporal frequency. At 20 cm below the soil surface and 20 cm above the bottom of the uphill face of the pit, sensors are continuously measuring soil CO₂ (GP001 CO₂ probe, Forerunner Research, Canada) and O₂ (SO-110 Sensor, Apogee Instruments, Utah, USA) at the two midslope catena positions. We selected the midslope catenas for these sensors because they provide the best locations for contrasting north and south aspects. We placed one sensor at the D-20 location (20 cm above the pit bottom) to document controls on acid and oxidative weathering near the bedrock interface.
The second sensor is near the surface to monitor a zone of high biological CO₂ and O₂ processing. We did not install the sensors at the shallowest depth (10 cm) because we found that high diffusion and advection at shallower depths cause the gas concentrations at 10 cm to reflect atmospheric conditions, providing less information on soil biology (Jin et al., 2014; Hasenmueller et al., 2015).

Lysimeters (Super Quartz, Prenart Equipment ApS, Denmark) have been emplaced to allow periodic manual sampling of soil pore water for chemical analysis at 20 cm and D-20 cm depths in all catena locations. The rationale for these depths is the same as described above for the automated CO₂ and O₂ sensors (they are co-located in the midslope pits). Overall, these Ground HOG measurements will parameterize the regolith formation models (Table 1) and will be used to test hypotheses linking hydrology, biotic production/consumption of soil gases, and weathering rates.

Upscaling from the pits to the catena using geophysics
To supplement the Ground HOG observations, we use geophysical and large-footprint methods to interpolate between and extrapolate beyond soil pits. For example, a cosmic-ray neutron detector (CR-1000B, Hydroinnova Inc.) has been emplaced to measure large-scale (∼ 0.5 km radius) average soil moisture every 30 min. This COSMOS unit, already used in a variety of ecosystems, will measure spatially averaged [...] (e.g. canopy storage, snow, water vapor; Franz et al., 2013). The sensor has been installed near the LRVF (Leading Ridge valley floor) pit to provide spatially averaged moisture estimates across the valley. The COSMOS fills the gap between small-scale point measurements (Fig. 5) and large-scale satellite remote sensing. The footprint of COSMOS is optimal for hydrometeorological model calibration and validation in small watersheds. One sensor was installed at Shale Hills in 2011 and we are currently testing the COSMOS data with PIHM. We anticipate that the results from both catchments will clarify the capabilities of cosmic-ray moisture sensing technology in steep terrain and will offer valuable insights into the problem of upscaling soil moisture measurements.
Ground HOG measurements will be further complemented by geophysical mapping along the catenas, including ground penetrating radar (GPR) transects of subsurface structure. Electromagnetic induction (EMI) mapping of soil electrical conductivity will similarly be used to measure soil spatial variations between pits. We plan repeated GPR and EMI surveys, in combination with terrain analysis using lidar topography, to identify subsurface hydrological features and soil distribution using published procedures (Zhu et al., 2010a, b). We will also field check regolith depths using augers and drills. With repeated geophysical surveys over time (e.g., in different seasons and/or before and after storm events), we will explore temporal changes in heterogeneous soilscapes and subsurface hydrologic dynamics, as demonstrated in previous studies at Shale Hills (Guo et al., 2014; Zhang et al., 2014).
Such geophysical mapping is necessary to link soil-pit point measurements. Depth to bedrock along the catenas will also be mapped using the geophysical surveys and compared to pit measurements. These data can be used for upscaling biogeochemical patterns and processes. For example, we expect that soil depth and soil moisture exert the strongest controls on variation in soil gas concentrations, as observed in many places, including Shale Hills (Hasenmueller et al., 2015; Jin et al., 2014) [...] coupled with catchment scale soil moisture (from COSMOS) and soil depth (from GPR) data to upscale soil gas characteristics to the whole catchment. An example of the utility of this approach is shown here from an investigation completed using a ground penetrating radar unit (TerraSIRch Subsurface Interface Radar System-3000). The unit was used to map the depth to bedrock in the Garner Run hillslope near the three major monitoring sites (LRVF, LRMS, LRRT) (Fig. 5). Multiple GPR traverses were completed by pulling the antennae along the ground surface. A distance-calibrated survey wheel with encoder was bolted onto these antennae to provide greater control of signal pulse transmission and data collection. The survey wheel occasionally slipped in the challenging terrain, so that some line lengths recorded by the survey wheel were slightly different from the actual lengths. In order to surface normalize the radar records collected, relative elevation data were collected at major slope breaks along the traverse line with an engineering level and stadia rod.
A traverse line was established that ascends Leading Ridge in essentially a west to east direction from near Garner Run to the summit, running from about 494 to 588 m a.s.l. (Fig. 5). The dominant soils mapped along this traverse line (Table S2) include: Andover, Albrights, Hazleton, and Dekalb. The very deep, poorly drained Andover and moderately well to somewhat poorly drained Albrights soils have been reported in general to have formed in colluvium derived from acid sandstone and shale on upland toe-slope and foot-slope positions. The moderately deep, excessively drained Dekalb and the deep and very deep, well-drained Hazleton soils formed on higher-lying slope positions in residuum weathered from acid sandstone. These soils have moderate potential for penetration with GPR. The traverse line was cleared of debris but the ground surface remained highly irregular with numerous rock fragments and exposed tree roots. These obstacles often halted the movement and caused poor coupling of the antennas with the ground. In this study, flags were inserted in the ground at noticeable breaks in the topography along the traverse line. User marks were inserted on the radar records as the antenna passed by these survey flags. Later, the elevations of these points were determined using an engineering level and stadia rod. The elevation data were entered into the radar data files and used to "surface normalize" or "terrain correct" the radar records. In this preliminary investigation, the soil-bedrock interface was not easy to identify. This was attributed to poor antenna coupling with the ground surface in the challenging rocky terrain, noise in the radar records caused by rock fragments in the overlying soil, irregular and fractured bedrock surfaces, and varying degrees of hardness in both rock fragments and the underlying bedrock. These factors weakened the amplitude, consistency and continuity of reflections from the soil-bedrock interface.
Nevertheless, we describe the preliminary results below. Figure 9 shows two surface-normalized plots of the data that were collected with the 400 MHz antenna as it was pulled down Leading Ridge from the summit area to near Garner Run (the stream). In these plots, the distance scale is measured from the summit area to near Garner Run. While differences in gross reflection patterns can be used to differentiate rock from soil, on these images the soil-bedrock interface is diffuse. However, we collected four repeated GPR transects using both 400 and 270 MHz antennas. Compared with the 400 MHz antenna, the lower resolution of the 270 MHz antenna smoothed out irregularities in the bedrock surface and reduced the noise from smaller, less extensive subsurface features, thus improving the interpretability of the soil-bedrock interface. Based on a total of 14 748 soil-depth measurements from ∼ 400 m long GPR images along this traverse line, the interpreted depth to bedrock averaged 1.37 m, with a range of 0.58 to 2.42 m. Table 4 summarizes the depth to bedrock values estimated from the two radar traverses shown in Fig. 9. Each entry in the table indicates the frequency of depth to bedrock data collected with the 400 MHz antenna along a traverse line that descended from Leading Ridge. Data are grouped into four soil depth classes. The GPR-derived soil depths are reasonable compared to the values we estimated in the soil pits.
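The grouping of interpreted depths into four classes can be sketched as follows. This is an illustration only: the class limits (0.5, 1.0, and 1.5 m) and class names are assumptions based on commonly used soil-survey depth classes, since the limits used for Table 4 are not stated in the text, and the depth values are hypothetical rather than taken from the radar records.

```python
from bisect import bisect_right
from collections import Counter

# Assumed class limits in metres (not taken from Table 4).
CLASS_LIMITS_M = (0.5, 1.0, 1.5)
CLASS_NAMES = ("shallow", "moderately deep", "deep", "very deep")


def depth_class(depth_m):
    """Return the class name for one depth; a depth equal to a limit
    falls in the deeper class."""
    return CLASS_NAMES[bisect_right(CLASS_LIMITS_M, depth_m)]


def class_frequencies(depths_m):
    """Return the fraction of depth picks falling in each class."""
    if not depths_m:
        raise ValueError("at least one depth measurement is required")
    counts = Counter(depth_class(d) for d in depths_m)
    total = len(depths_m)
    return {name: counts.get(name, 0) / total for name in CLASS_NAMES}


# Hypothetical interpreted depths to bedrock (m) along one traverse.
depths = [0.58, 0.95, 1.20, 1.37, 1.60, 2.42]
freqs = class_frequencies(depths)
mean_depth = sum(depths) / len(depths)
```

Keeping the class limits in one tuple makes it straightforward to swap in whatever limits are actually used for a given table.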

Hydrology: groundwater measurements
Several methods are needed in a catchment to characterize physical and chemical interactions of water with regolith and rock. First, physical inputs and outputs to a catchment, including precipitation, interception, ET, soil infiltration, and groundwater discharge, must be understood. In practice, groundwater flows are often omitted from comprehensive hydrology-meteorology-vegetation models (e.g. the Variable Infiltration Capacity (VIC) hydrologic model, or the Noah Land-Surface Model (LSM)); however, at Shale Hills, we have estimated that 5 % of the nonevapotranspired water that enters the catchment reaches the regional groundwater table and flows to the stream as a deep flow component (Sullivan et al., 2015). At Garner Run, we also expect groundwater to play a significant role in streamflow and geochemical dynamics. For example, some researchers have found that drainage and runoff in sandstone catchments are controlled to a great extent by bedrock (Hattanji and Onda, 2004), and specifically by flow through fractures in the upper meters of sandstone directly beneath the soil (Williams et al., 2010). In this section and the next section we focus on quantifying flows through and between surface water and groundwater. We aim to measure the relative magnitudes, timing, and spatial variability of these fluxes. We emphasize methodologies for measuring and characterizing groundwater and streamwater in order to characterize groundwater residence times and to identify subsurface flow paths and the drivers and controls on water-rock interactions.

Our plans for well installation and solid earth sampling by coring are reduced compared to sampling at Shale Hills. At Shale Hills, 28 wells were emplaced and then intermittently monitored (Fig. 2). In Garner Run, two deep cores (> 50 m) will be extracted at two locations near the Garner Run catchment: one (∼ 100 m) on Tussey ridge, i.e. the ridge that divides Shavers Creek from the watersheds to the northwest, and one (50-75 m) on the smaller divide within Shavers Creek between Garner Run and Roaring Run (see Fig. 3). Three shallow wells will be installed and cores (∼ 10 m) will be collected at the catena sites (Fig. 5). Two to four additional monitoring wells will be installed along the stream reach on the valley floor. In drilling boreholes for assessment of groundwater, we also sample borehole solid-phase chemistry and mineralogy. All core samples will be analyzed for bulk chemistry and mineralogy to characterize the weathering reactions and protolith in the critical zone. All boreholes will have groundwater monitoring wells installed, with screened intervals spanning the water table and with instrumentation as shown in Fig. 5. Monitoring at the wells will include hourly water level measurements using autonomous pressure loggers, hourly temperature measurements at two depths below the water table, and monthly water samples collected and analyzed for major ion chemistry. A pumping test will be conducted at the adjacent valley floor wells to measure aquifer storativity and hydraulic conductivity.

Deep core samples and groundwater monitoring will provide a baseline understanding of the geologic/pedologic and hydrologic system on the new sandstone lithology. Subsequent hypotheses about controls on weathering and hydrologic dynamics, as well as historical flow and solute fluxes, will be constrained by these observations at the catchment boundaries.

Hydrology: streamflow and chemistry measurements
The Garner Run study reach is approximately 500 m long within the catchment (Fig. 5) and consists of a rocky, often braided, channel. We have deployed a flume at the downstream end of the reach to measure discharge, and are monitoring stage continuously using a pressure transducer (Hobo U-20, Onset Computer Corp., Hyannis, MA). Surface water-groundwater (SW-GW) exchange characteristics have been measured using a short-term deployment of a distributed temperature sensor (DTS), and will be supplemented by a series of tracer injection tests to investigate hyporheic exchange characteristics over a wider range of stream discharges. Stream chemistry, including DO, pH, TDS, NO [...] Stream chemistry is also being monitored intermittently at higher temporal resolution using a s::can spectrometer and an autosampler during storm events. The s::can is an in situ measurement instrument for several water quality parameters (pH, TDS, DOC, NO₃⁻, DO, NH₄⁺, K, F) (s::can GmbH, Vienna, Austria). The chemistry and tracer test data will help quantify the flux of fluid and solutes through the subcatchment. The stream chemistry and discharge data will be combined with soil moisture, soil pore water chemistry, and groundwater data to estimate relative contributions to the stream, and underlying processes related to weathering in the near surface and aquifer.
Preliminary results from Garner Run indicate lower concentrations of Ca, Mg, and K compared to Shale Hills. In addition, as expected, an initial constant injection tracer test at Garner Run revealed significant exchange with the subsurface during low-flow conditions (∼ 0.004 m³ s⁻¹). Tracer test and temperature results documented that the stream loses water in some sections and gains water in others over the 500 m experimental reach. One point stood out along the reach: both DTS and stream chemistry measurements are consistent with a significant input of groundwater at ∼ 100 m downstream of the catena (Fig. 5). The DTS time series data will be analyzed to identify locations and magnitudes of groundwater inputs, as well as characteristic responses to rainfall events or changes in stream discharge. In combination with the tracer tests, DTS, and chemistry results, we will use well logs and lidar topography to explain the lithological and geomorphologic controls on the SW-GW system. To characterize the major controls and processes governing WEGSS fluxes through the entire Shavers Creek catchment, we are making strategic measurements across the watershed to represent variability: stream discharge, stream chemistry, lithology, and geomorphology. Stream discharge and chemistry are being monitored along the main stem of Shavers Creek (SCAL, SCBL, and SCO) as shown in Fig. 3. At each location we are constructing a stage-discharge rating curve, and monitoring stage continuously using pressure transducers (Hobo U-20, Onset Computer Corp., Hyannis, MA). Streamwater-groundwater exchange characteristics will be measured as the channel crosses varying lithologies using a series of tracer injection tests. [...] will be measured monthly at each sampling site along Shavers Creek.
Analyses from the main stem of Shavers Creek provide a spatial integration of solute behavior from upstream lithologies and land use types. Eventually, with data from the three subcatchments on shale, sandstone, and calcareous shale, we will make estimates for nonmonitored catchments and test up-scaled estimates of the processes observed in each small watershed. Preliminary stream chemistry and discharge results indicate significant variability among the three monitoring locations along Shavers Creek (Fig. 10). We see declining concentrations with increasing discharge for Mg and Ca (not shown), and somewhat chemostatic behavior for Si, K, nitrate and others. In this context, chemostatic is used to refer to concentrations in a stream that vary little with discharge (Godsey et al., 2009). Concentrations of Si decrease downstream (a dilution trend), while concentrations increase for Mg and nitrate, possibly due to agricultural amendments in the lower half of the watershed. The variety of behaviors will be investigated with respect to land use and lithology changes through the catchment.
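One common way to quantify how strongly concentration varies with discharge is to fit a power law, C = aQ^b, by least squares on log-transformed data: a slope b near 0 indicates chemostatic behavior, whereas b near −1 indicates simple dilution. The sketch below assumes this power-law form; the function name and the synthetic discharge and concentration values are hypothetical and are not the Shavers Creek data.

```python
import math


def fit_cq_power_law(discharge, concentration):
    """Fit C = a * Q**b by ordinary least squares in log10-log10 space.

    Returns (a, b). Inputs must be positive, equal in length, and
    contain at least two distinct discharge values.
    """
    if len(discharge) != len(concentration) or len(discharge) < 2:
        raise ValueError("need at least two paired observations")
    if min(discharge) <= 0 or min(concentration) <= 0:
        raise ValueError("discharge and concentration must be positive")
    x = [math.log10(q) for q in discharge]
    y = [math.log10(c) for c in concentration]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("discharge values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b


# Synthetic examples (hypothetical units), built from known exponents.
q = [0.01, 0.1, 1.0, 10.0]
c_diluting = [5.0 / qi for qi in q]               # pure dilution, b = -1
c_chemostatic = [2.0 * qi ** -0.05 for qi in q]   # nearly constant, b = -0.05

a_dil, b_dil = fit_cq_power_law(q, c_diluting)
a_chem, b_chem = fit_cq_power_law(q, c_chemostatic)
```

Fitting in log space treats proportional errors evenly across several orders of magnitude of discharge, which is why the slope b is usually reported rather than a linear regression coefficient.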

Conclusions: measuring and modelling the CZ
Many environmental scientists worldwide are embracing the concept of the critical zone: the surface environment considered over all relevant timescales from the top of the vegetation canopy to the bottom of ground water. CZ science is built upon the hypothesis that an investigation of the entire object (the CZ) will yield insights that more disciplinary-specific investigations cannot. To understand the evolution and dynamics of the CZ, we are developing a suite of simulation models as shown in Table 1 (Duffy et al., 2014). These models are being parameterized based on measurements made at the Susquehanna Shale Hills Critical Zone Observatory (SSHCZO), which is currently expanding from less than 1 to 165 km². In this paper we described an approach for assessing the CZ in the larger watershed. In effect, our measurement design is a hypothesis in answer to this question: if we want to understand the dynamics and evolution of the entire CZ, what measurements are needed and where should they be made? Our approach emphasizes upscaling from 1-D to 2-D to 3-D using a catena paradigm for ground measurements that are extended with geological, geophysical, lidar, stream and meteorological measurements. Of course, our dataset has very low or no sampling replication within each catchment and we have only designed for one catchment per parent material. Obviously, there is a tension between monitoring a core dataset over time (a geological or hydrological approach) vs. the replication that is needed for spatial characterization (a soil science or ecological approach). Our spatial design was chosen based on the implicit assumption that implementation of Ground HOG and Tower HOG in each subcatchment could be upscaled to the entire watershed by interpolation, extrapolation, and modelling as described in Table 1.
For example, we are testing the hypothesis that fewer soil pits are needed because we are using a regolith formation model and geological knowledge to site the few pits that we dig.
As an example of this approach, we point to our earlier observation of loss of Al, Na, Si and P from the soils at the same time that we identified significant enrichment in Mg, K, and Fe (Fig. 8). Simple mass balance arguments can be used to show that the enrichments in these latter elements are not likely due to residual accumulation during weathering of the parent orthoquartzite: a prohibitively large thickness of quartzite would have had to weather away without loss of any Mg, K or Fe to enrich the soils adequately. On the other hand, accumulation of dust during weathering over a significant time period could explain the enrichment. Alternately, downward mobilization of fine particles from weathering of the overlying Rose Hill shale or interfingered shaley units might adequately explain the enrichment in these elements. Use of Regolith-RT-PIHM (Table 1) or WITCH (Godderis et al., 2006) [...] in the Shavers Creek watershed. In other words, the numerical models in Table 1 will be used to extend beyond the limited observations. Of course, we can also augment the sampling design described here with brief measurement campaigns inside and outside the subcatchments or Shavers Creek watershed as warranted. For example, while we will only monitor soil CO₂ continuously at a few catena positions and soil depths, we can augment these high frequency data with spatially extensive, but temporally limited, measurements using manual soil gas samplers. Likewise, we may characterize vegetation and surface soil properties at 3-5 additional catchments of each parent material type using the transect design that we initiated at Garner Run (Fig. 5). In general, these outside measurements will be discipline-specific excursions to understand a specific variable. Another example is a set of measurements that are ongoing in a catchment to the north of Shavers Creek to investigate regolith formation and hillslope form where the erosion rate is considerably faster.
At this site, we anticipate learning how to parameterize or run models of regolith formation by exploring the impact of the rate of erosion (Table 1).
As we improve our understanding of the behavior of components of the critical zone, the point is to discover system-wide patterns and processes. Throughout, upscaling will remain a challenge. There is no comprehensive mathematical model of the critical zone, partly because it would be arduous to parameterize and perhaps more importantly because we do not yet understand all the interacting governing processes (Fig. 1). The research in Shavers Creek, and the work done at other critical zone observatories around the world, is an attempt to develop a system-wide process model (or ensemble of models) and to identify the essential measurements required for parameterization. The most robust models we have are conceptual models, and the most predictive are complex numerical simulations. However, both typically include only a portion of the critical zone. We seek a model that successfully explains the dynamics between topography, groundwater levels, and regolith thickness; at present we are working mostly with conceptual relationships drawn between pairs of factors (Fig. 1).

Table row and notes: Campbell Scientific CC5MPX digital camera; every 24 h. a All four components of radiation (upwelling and downwelling; longwave and shortwave) will only be measured at the Shale Hills Tower HOG due to the location of the Garner Run Tower HOG. To model Garner Run we will use the Shale Hills data. b Originally designed as part of the tower system but will be deployed at the LRVF Ground HOG location because the Garner Run tower will be located outside of the catchment. c The turbulent fluxes (sensible and latent heat) and the momentum flux are computed at 30 min intervals via eddy covariance using these data collected at [...]

Figure caption: [...] and Garner Run (middle panel) subcatchments, emphasizing differences in slope asymmetry and hillslope length. Soil production and erosion rates for Shale Hills subcatchment were measured based on U-series isotopes and meteoric 10Be concentrations in regolith, respectively (Ma et al., 2013; West et al., 2013, 2014). Erosion rate for Garner Run subcatchment is estimated based on detrital 10Be concentrations from nearby sandstone catchments with similar relief [...].

Figure caption: [...] Y axis indicates the depth below the organic-mineral horizon interface. The normalized concentration is the mass transfer coefficient determined using average parent composition from five rocks (Supplement Table S3) from the bottom of several of the pits and Ti as the immobile element. One explanation for these plots is that Al has largely been removed or moved downward in the profile while Mg, K, and Fe have largely been added to the profile. In these plots, τ = −1 when an element is completely depleted compared to Ti in the parent material, τ = 0 when no loss or gain has occurred, and τ > 0 when the element has been added to the profile.

Table 4. [...]

Figure 10. (a) Mg, (b) Si, and (c) nitrate concentrations and stream discharge measured at three locations on Shavers Creek: Above Lake (SCAL, blue), Below Lake (SCBL, red), and the Outlet (SCO, yellow).