Bulletin of the Seismological Society of America
Bulletin of the Seismological Society of America; February 2006; v. 96; no. 1; p. 90-106; DOI: 10.1785/0120050067
© 2006 Seismological Society of America

Comparison of Short-Term and Time-Independent Earthquake Forecast Models for Southern California

Agnès Helmstetter1, Yan Y. Kagan2 and David D. Jackson2

1 Lamont-Doherty Earth Observatory
61 Route 9W
Palisades, New York 10964
agnes@ldeo.columbia.edu (A.H.)

2 Department of Earth and Space Sciences
University of California
Los Angeles, California 90095-1567
djackson@ucla.edu
ykagan@ucla.edu (D.D.J., Y.Y.K.)


    Abstract
We first developed a time-independent forecast for southern California by smoothing the locations of magnitude 2 and larger earthquakes. We show that small m ≥2 earthquakes give a reasonably good prediction of m ≥5 earthquakes. Our forecast outperforms other time-independent models (Kagan and Jackson, 1994; Frankel et al., 1997), mostly because of its higher spatial resolution. We then developed a method to estimate daily earthquake probabilities in southern California using the Epidemic Type Earthquake Sequence model (Kagan and Knopoff, 1987; Ogata, 1988; Kagan and Jackson, 2000). The forecasted seismicity rate is the sum of a constant background seismicity, proportional to our time-independent model, and of the aftershocks of all past earthquakes. Each earthquake triggers aftershocks at a rate that increases exponentially with its magnitude and decreases with time following Omori's law. We use an isotropic kernel to model the spatial distribution of aftershocks for small (m ≤5.5) mainshocks. For larger events, we smooth the density of early aftershocks to model the density of future aftershocks. The model also assumes that all earthquake magnitudes follow the Gutenberg-Richter law with a uniform b-value. We use a maximum likelihood method to estimate the model parameters and to test the short-term and time-independent forecasts. A retrospective test using a daily update of the forecasts between 1 January 1985 and 10 March 2004 shows that the short-term model increases the average probability of an earthquake occurrence by a factor of 11.5 compared with the time-independent forecast.


    Introduction
Several studies show that many earthquakes are triggered in part by preceding events. Aftershocks are the most obvious examples, but many large earthquakes are preceded by and probably triggered by smaller ones. Recent studies indeed suggest that we can explain the triggering of a large earthquake by a previous smaller one using the same laws as for the triggering of small earthquakes (aftershocks) by a large one (mainshock) (Helmstetter and Sornette, 2003c; Helmstetter et al., 2005). The physics of earthquake triggering, which we use in our forecasts, is probably sufficient to explain the acceleration of seismicity sometimes observed before large earthquakes (e.g., Bufe and Varnes, 1993).

Typically, the seismicity rate just after and close to a large m ≥7 earthquake can increase by a factor of 10⁴ and stay above the background level for several decades. Small earthquakes also contribute significantly to earthquake triggering because they are much more numerous than larger ones (Helmstetter, 2003; Helmstetter et al., 2005). As a consequence, many large earthquakes are triggered by previous smaller earthquakes (foreshocks). Moreover, many events are apparently triggered through a cascade process in which triggered earthquakes in turn trigger others.

Short-term clustering has been recognized for some time, and several quantitative models of clustering have been proposed (Kagan and Knopoff, 1987; Ogata, 1988; Reasenberg and Jones, 1989; Kagan, 1991; Reasenberg, 1999; Kagan and Jackson, 2000; Helmstetter and Sornette, 2003a; Ogata, 2004). Short-term forecasts based on earthquake clustering have already been developed. Kagan and Knopoff (1987) performed retrospective tests using California seismicity and showed that earthquake clustering provides a way to improve earthquake forecasting significantly compared with time-independent forecasts. Jackson and Kagan (1999) and Kagan and Jackson (2000) have calculated short-term hazard estimates in real time for the northwest and southwest Pacific regions since 1999 (see http://scec.ess.ucla.edu/~ykagan/predictions_index.html). More recently, Gerstenberger et al. (2005) (see also Agnew [2005]) proposed a method to provide daily forecasts of seismic hazard in California.

We have devised and implemented another method for issuing daily earthquake forecasts for southern California. Short-term effects may be viewed as temporary perturbations to a long-term earthquake potential. This long-term forecast could be a Poisson process or a time-dependent process that includes, for example, stress shadows. It can include any geologic information based on fault geometry and slip rate, as well as data from geodesy or paleoseismology. As a first step, we have measured the time-independent seismic activity using instrumental seismicity (1932–2003) only. We show that this simple model performs better than a more sophisticated model that incorporates geologic data and characteristic earthquakes (Frankel et al., 1997).

Unlike the Gerstenberger et al. (2005) prediction scheme, we use a particular stochastic point process (Daley and Vere-Jones, 2004), the epidemic type earthquake sequence (ETES) model (Kagan and Knopoff, 1987; Ogata, 1988), to obtain short-term earthquake forecasts including foreshocks and aftershocks. This model is usually called "epidemic type aftershock sequence" (ETAS), but in addition to aftershocks it also describes background seismicity, mainshocks, and foreshocks, using the same laws for all earthquakes. We use a maximum likelihood approach to estimate the parameters by maximizing the forecasting skill of the model (Kagan and Knopoff, 1987; Gerstenberger et al., 2005; D. Schorlemmer et al., unpublished manuscript, 2005).

We will compare our model with other models as part of the regional earthquake likelihood model (RELM) project to test in real time daily and long-term models for California (Kagan et al., 2003a; Jackson et al., 2004; D. Schorlemmer et al., unpublished manuscript, 2005).


    Time-Independent Forecasts
Definition of the Model
We have developed a method to estimate the probability of an earthquake as a function of space and magnitude from past instrumental seismicity. We estimate the density of seismicity µ(r) by declustering and smoothing past seismicity. We use the composite seismicity catalog from the Advanced National Seismic System (ANSS), available at http://quake.geo.berkeley.edu/anss/catalog-search.html. We selected earthquakes above the minimum magnitude md = 3 for the period 1932–1979 and above md = 2 since 1980. We computed the background density of earthquakes on a grid that covers southern California with a resolution of 0.05° × 0.05°. The boundary of the grid is 32.45° N to 36.65° N in latitude and 121.55° W to 114.45° W in longitude. Before computing the density of seismicity, we need to decluster the catalog to remove the largest clusters of seismicity, which would otherwise produce large peaks of seismicity that do not represent the time-independent average. We used Reasenberg's (1985) declustering algorithm with parameters rfact = 20, xmeff = 2.00, p1 = 0.99, τmin = 1.0 day, τmax = 10.0 days, and a minimum cluster size of five events. The parameters of the declustering procedure were adjusted so that the resulting catalog is close to a Poisson process. In particular, we checked that there is no residual change in seismicity rate after large earthquakes in the declustered catalog.

The declustered catalog is shown in Figure 1. Note that a better declustering method exists, which does not require a space-time window to define aftershocks and instead uses an ETES-type model to estimate the probability that each earthquake is an aftershock (Kagan and Knopoff, 1976; Kagan, 1999; Zhuang et al., 2004). This method is, however, more complex to use and time consuming for a large number of earthquakes. The declustering procedure is applied only to the data used to build the time-independent model µ(r), which is also used as input to the short-term model. The catalog used to test both models is not declustered.


Figure 1. Declustered catalog, obtained with Reasenberg's (1985) algorithm, including 6,861 m ≥3 earthquakes in the time window 1932–1979 and 46,937 earthquakes with m ≥2 for 1980–2003.

 

We estimate the density of seismicity in each cell by smoothing the location of each earthquake i with an isotropic adaptive kernel Kdi(r) (Izenman, 1991). The bandwidth di associated with earthquake i decreases as the density of seismicity at the location r_i of this earthquake increases, so that we have a better resolution (smaller di) where the density is higher.

To estimate di for each earthquake, we need an initial estimate of the density µ*(r_i) at the location of this earthquake. We estimate µ*(r_i) using a smoothing kernel with a fixed bandwidth d = dmin = 0.5 km (the current location accuracy), summing over all earthquakes


$$\mu^*(\vec{r}_i) = \sum_{j=1}^{N} K_{d_{\min}}\left(|\vec{r}_i - \vec{r}_j|\right) \tag{1}$$
We use an isotropic kernel Kd(r) given by


$$K_d(r) = \frac{C(d)}{\left(r^2 + d^2\right)^{1.5}} \tag{2}$$
where C(d) is a normalizing factor, so that the integral of Kd(r) over an infinite area equals 1.

The bandwidth di associated with each earthquake is then given by


$$d_i = \max\left[\frac{d_0}{\sqrt{\mu^*(\vec{r}_i)}},\; d_{\min}\right] \tag{3}$$
so that di is proportional to the average distance between events around this earthquake.

The background density at any point r (given as a number of m ≥ md events per year per km²) is then estimated by


$$\mu(\vec{r}) = \frac{1}{T}\sum_{i=1}^{N} K_{d_i}\left(|\vec{r} - \vec{r}_i|\right) \tag{4}$$

where T is the duration (in years) of the catalog.
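The three smoothing steps of equations (1) to (4) can be sketched in Python as below. This is a minimal, unoptimized O(N²) sketch, not the authors' code. It assumes epicenters already projected to kilometres, C(d) = d/(2π) (the value that normalizes kernel (2) over the plane), and a floor di ≥ dmin on the adaptive bandwidth; the value of d0 in the example is purely illustrative, because its scale depends on how µ* is normalized.

```python
import math

def kernel(r, d):
    """Isotropic kernel of eq. (2): C(d) / (r^2 + d^2)^1.5 with C(d) = d / (2*pi),
    which makes its integral over the infinite plane equal to 1."""
    return (d / (2.0 * math.pi)) / (r * r + d * d) ** 1.5

def adaptive_bandwidths(events, d0, d_min=0.5):
    """Pilot density mu*(r_i) with fixed bandwidth d_min (eq. 1), then
    d_i = max(d0 / sqrt(mu*(r_i)), d_min) (eq. 3, floor at d_min assumed)."""
    bandwidths = []
    for xi, yi in events:
        pilot = sum(kernel(math.hypot(xi - xj, yi - yj), d_min) for xj, yj in events)
        bandwidths.append(max(d0 / math.sqrt(pilot), d_min))
    return bandwidths

def background_density(x, y, events, bandwidths, duration_years):
    """Eq. (4): mu(r) = (1/T) sum_i K_{d_i}(|r - r_i|), events per year per km^2."""
    total = sum(kernel(math.hypot(x - xi, y - yi), di)
                for (xi, yi), di in zip(events, bandwidths))
    return total / duration_years

# Hypothetical example: a tight cluster plus one isolated event (coordinates in km)
events = [(0.1 * i, 0.0) for i in range(20)] + [(100.0, 0.0)]
bw = adaptive_bandwidths(events, d0=5.0)
print(bw[0] < bw[-1])  # True: the isolated event is smoothed over a wider area
```

The adaptive bandwidth is what gives the model its high resolution in active zones while still spreading isolated events over a broad area.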

Our forecasts are given as an average number of events in each cell. The background density defined by (4) has spatial variations at scales smaller than the grid resolution (≈5 km). Therefore, we need to integrate µ(r) defined by (4) over each cell to obtain the background rate in this cell. The advantage of the function (2) is that we can compute analytically the integral of Kd(x, y) over one dimension, x or y, and then compute numerically the integral in the other dimension.
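A sketch of that cell integration, analytic in x and numerical in y, is given below. It assumes C(d) = d/(2π), cell corners already expressed in kilometres relative to a local projection, and composite Simpson's rule for the numerical step; the function names are illustrative. The analytic part uses the antiderivative of (x² + a²)^(-3/2), which is x / (a² √(x² + a²)), with a² = y² + d².

```python
import math

def kernel_x_integral(x1, x2, y, d):
    """Exact integral over x in [x1, x2] of C(d) / (x^2 + y^2 + d^2)^1.5,
    with C(d) = d / (2*pi); x and y are offsets from the kernel centre."""
    a2 = y * y + d * d

    def primitive(x):
        return x / (a2 * math.sqrt(x * x + a2))

    return d / (2.0 * math.pi) * (primitive(x2) - primitive(x1))

def kernel_cell_integral(x0, y0, x1, x2, y1, y2, d, n=200):
    """Fraction of a kernel centred at (x0, y0) falling in the cell [x1, x2] x [y1, y2]:
    analytic in x, composite Simpson rule with n (even) intervals in y."""
    h = (y2 - y1) / n
    total = 0.0
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * kernel_x_integral(x1 - x0, x2 - x0, y1 + k * h - y0, d)
    return total * h / 3.0
```

Summing `kernel_cell_integral` over all earthquakes, each with its own bandwidth di, and dividing by T would give the expected yearly rate in one cell.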

We estimate the parameter d0 in (3) by maximizing the likelihood of the model. We use the data from 1932 to 1995 to compute the density µ(r) in each cell and the data from 1996 to 2003 to evaluate the model.

The log likelihood of the model is given by the sum over all cells:


$$LL = \sum_{i_x}\sum_{i_y} \log p\left[\mu(i_x, i_y),\, n(i_x, i_y)\right] \tag{5}$$
where n is the number of events that occurred in the cell (ix, iy).

Assuming a Poisson process, the probability p(µ(ix, iy), n) of having n events in the cell (ix, iy) is given by


$$p\left[\mu(i_x, i_y), n\right] = \frac{\mu(i_x, i_y)^{n}\, e^{-\mu(i_x, i_y)}}{n!} \tag{6}$$
where µ(ix, iy) represents the seismicity density integrated over the cell (ix, iy) per unit time. The optimization gives d0 = 0.0045. The background density for the declustered catalog is shown in Figure 2.
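Equations (5) and (6), and a simple way to choose d0 by maximizing them, could be sketched as follows. This is an illustration, not the authors' optimizer: the grid search and the `build_forecast` callable (which would rebuild the expected counts from the 1932–1995 data for a given d0) are hypothetical.

```python
import math

def poisson_log_prob(mu, n):
    """log p(mu, n) for a Poisson count with mean mu (eq. 6)."""
    if n == 0:
        return -mu
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def binned_log_likelihood(expected, observed):
    """Eq. (5): sum over cells of log p(mu(ix, iy), n(ix, iy)).
    expected: {(ix, iy): expected count over the test period}
    observed: {(ix, iy): observed count}; absent cells count as zero."""
    return sum(poisson_log_prob(mu, observed.get(cell, 0)) for cell, mu in expected.items())

def best_d0(candidates, build_forecast, observed):
    """Grid search: build_forecast(d0) is a hypothetical user-supplied function
    returning the expected counts per cell for that smoothing parameter."""
    return max(candidates, key=lambda d0: binned_log_likelihood(build_forecast(d0), observed))
```

The learning and testing periods must differ; scoring the forecast on the data used to build it would drive d0 toward zero.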


Figure 2. Time-independent density obtained by declustering and smoothing the ANSS catalog.

 

Note that using two different data sets to compute µ(r) and LL is important; otherwise, the optimization of LL gives a smoothing distance di = dmin for all earthquakes (d0 = 0), that is, all the weight is at the location of the observed earthquakes (Kagan and Jackson, 1994).

Note also that it is not necessary to introduce bins to test the models. We could have computed the LL of the continuous model µ(r), given by a sum over all events:


$$LL = \sum_{i=1}^{N} \log \mu(\vec{r}_i) - E(N) \tag{7}$$
where E(N) is the expected number of target earthquakes. The binning was introduced to follow the rules adopted by RELM for the real-time tests of earthquake forecasts in southern California. In these tests, each author must submit a table of the number of events in each cell, not a program. Expected numbers of earthquakes cannot easily be tested for a continuous model. The binning also avoids problems due to location errors.
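For comparison with the binned score, equation (7) takes only a few lines. In this sketch the densities are assumed to be already multiplied by the test duration, so that E(N) is their integral over the test region; the 1000 km² region is a made-up example.

```python
import math

def continuous_log_likelihood(density_at_events, expected_total):
    """Eq. (7): LL = sum_i log mu(r_i) - E(N) for a spatial Poisson process.
    density_at_events: forecast density (events per km^2 over the test period)
    at each target earthquake; expected_total: E(N) over the test region."""
    return sum(math.log(mu) for mu in density_at_events) - expected_total

# A uniform forecast over a hypothetical 1000 km^2 region with E(N) = 5,
# scored against 5 observed events
uniform_ll = continuous_log_likelihood([5 / 1000.0] * 5, 5.0)
```

Unlike the binned score, this value depends on the unit of area, but the difference between two models scored on the same target events does not, since a change of units shifts every model's LL by the same N log(scale).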

Comparison with Other Time-Independent Models
We have compared this model with other time-independent forecasts for southern California, the model of Kagan and Jackson (1994) (see also http://moho.ess.ucla.edu/~kagan/s_cal_tbl_new.dat) and the model of Frankel et al. (1997).

Kagan and Jackson (1994, 2000) Forecasts.
Kagan and Jackson (1994) (KJ94) used a similar smoothing method to forecast the seismicity rate in California (see also Kagan et al. [2003b]). The main differences between their algorithm and the present work are:

To compare our model with the KJ94 model, we modified it to use the same grid (from 31.95° to 37.05° in latitude and from –122.05° to –113.95° in longitude) with the same resolution of 0.1°. Both models were developed using only data before 1990 to estimate the parameters and the density µ. We estimated the parameter d0 (defined in equation 3) of our time-independent model by using the data until 1 January 1986 to compute µ and the data from 1 January 1986 to 1 January 1990 to estimate the likelihood LL of the model. The optimization gives d0 = 0.004. We then used this value of d0 and the data until 1 January 1990 to estimate the average density µ(r).

We use the log likelihood defined in (5) to compare the KJ94 model with our model. Because we want to test only the spatial distribution of earthquakes, not the predicted total number, we normalized both models by the observed number of earthquakes (N = 56). We obtain LL = –433 for the KJ94 model and LL = –389 for our model. Both models are shown in Figure 3 together with the observed m ≥5 earthquakes since 1990. The present work thus improves the prediction of KJ94 by a factor (ratio of probabilities) exp((433 – 389)/56) = 2.2 per earthquake, despite being much simpler (isotropic and point-source model). This result suggests that including small earthquakes (m ≥2) to predict larger ones (m ≥5) considerably improves the predictions, because large earthquakes, in general, occur at the same location as smaller ones (Kafka and Levin, 2000).


Figure 3. Time-independent seismicity density for the Kagan and Jackson (1994) model (left) and for the present work (right). White circles represent m ≥5 earthquakes that occurred between 1990 and 2004.

 

For comparison, a purely uniform model, with an expected number of 4.0 m ≥5 events per year, has a likelihood of –461. The prediction gain relative to this uniform model is 3.6 for our model and 1.6 for KJ94.
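The probability gains quoted above follow from the log likelihoods as G = exp((LLmodel − LLref)/N). The short check below reproduces them from the values given in the text.

```python
import math

def probability_gain(ll_model, ll_reference, n_events):
    """Average probability gain per earthquake: exp((LL_model - LL_ref) / N)."""
    return math.exp((ll_model - ll_reference) / n_events)

# Log likelihoods quoted in the text for the N = 56 m >= 5 earthquakes since 1990
print(round(probability_gain(-389, -433, 56), 1))  # this model vs KJ94: 2.2
print(round(probability_gain(-389, -461, 56), 1))  # this model vs uniform: 3.6
print(round(probability_gain(-433, -461, 56), 1))  # KJ94 vs uniform: 1.6
```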

Frankel et al. (1997) Forecasts.
The Frankel et al. (1997) (F97) model is a more complex model that includes both smoothed historical and instrumental seismicity (using m ≥4 earthquakes since 1933 and m ≥6 earthquakes since 1850) and characteristic earthquakes on known faults, with a seismicity rate constrained by the geologic slip rate and a rupture length controlled by the fault length. The magnitude distribution follows the Gutenberg-Richter (GR) law with b = 0.9 for small magnitudes (m ≤6.2) and a bump for m >6.2 due to characteristic events. We adjusted our model to use only data before 1996 to build the model, and the same grid as F97 with a resolution of 0.1°. We assumed a GR distribution with b = 1 and an upper magnitude cutoff at m 8 (Bird and Kagan, 2004). We used the ANSS catalog for the period 1932–1995 to estimate the average rate of m ≥4 earthquakes (without declustering the catalog). We then estimated the average rate of m ≥5 earthquakes from the number of m ≥4 events by using the GR law. We use m ≥5 earthquakes in the ANSS catalog that occurred since 1996 to compare the models. Both models are illustrated in Figure 4.
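The extrapolation from the m ≥4 rate to the m ≥5 rate can be written as a small helper. This sketch assumes a GR density proportional to 10^(−bm) truncated sharply at mmax, with b = 1 and mmax = 8 as stated above; with these values the factor is 0.0999/0.9999, almost exactly 0.1.

```python
def gr_fraction_above(m_low, m_target, b=1.0, m_max=8.0):
    """Fraction of m >= m_low events that have m >= m_target, for a GR law
    with exponent b truncated at m_max."""
    if m_target >= m_max:
        return 0.0
    tail = 10 ** (-b * (m_max - m_low))
    return (10 ** (-b * (m_target - m_low)) - tail) / (1.0 - tail)

def rate_m5_from_m4(rate_m4, b=1.0, m_max=8.0):
    """Expected rate of m >= 5 events given the rate of m >= 4 events."""
    return rate_m4 * gr_fraction_above(4.0, 5.0, b, m_max)
```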


Figure 4. Time-independent density of m ≥5 earthquakes for the Frankel et al. (1997) model (left) and for our model (right). White circles represent m ≥5 earthquakes that occurred between 1996 and 2004.

 

We test how each model explains the number of observed events, as well as their location and magnitude, by comparing the likelihood of each model. The log likelihood is defined by


$$LL = \sum_{i_x}\sum_{i_y}\sum_{i_m} \log p\left[\mu(i_x, i_y)\, P(i_m)\, T,\; n(i_x, i_y, i_m)\right] \tag{8}$$
where n is the number of events that occurred in the cell (ix, iy) and in the magnitude bin im. The magnitude range (5.0–8.0) is divided in bins of 0.1 unit. The expected number in this bin is µ(ix, iy) P(im)T.
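Equation (8) adds a magnitude dimension to the Poisson score. In the sketch below, P(im) is taken from a GR law with b = 1 truncated at m 8 over 0.1-unit bins from 5.0 to 8.0, as described in the text; the dictionary layout of rates and counts is an illustrative choice.

```python
import math

def gr_bin_probabilities(m_min=5.0, m_max=8.0, width=0.1, b=1.0):
    """P(im) for consecutive magnitude bins of the given width, for a GR law
    with exponent b truncated at m_max (the probabilities sum to 1)."""
    n_bins = int(round((m_max - m_min) / width))
    norm = 1.0 - 10 ** (-b * (m_max - m_min))
    return [(10 ** (-b * i * width) - 10 ** (-b * (i + 1) * width)) / norm
            for i in range(n_bins)]

def magnitude_binned_ll(rates, observed, duration, probs):
    """Eq. (8): sum over cells and magnitude bins of log p(mu(ix, iy) P(im) T, n).
    rates: {(ix, iy): yearly rate of events above m_min}
    observed: {(ix, iy, im): count}; duration: T in years."""
    ll = 0.0
    for (ix, iy), mu in rates.items():
        for im, p in enumerate(probs):
            expected = mu * p * duration
            n = observed.get((ix, iy, im), 0)
            ll += -expected if n == 0 else n * math.log(expected) - expected - math.lgamma(n + 1)
    return ll
```

Because the bin probabilities sum to 1, a cell with no observed events contributes exactly minus its total expected count, as in the purely spatial score (5).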

The log likelihood is LL = –155 for our model and LL = –161 for the F97 model. For comparison, a time-independent model (with a uniform density, a GR magnitude distribution with b = 1, and the same expected number of events as our model) gives LL = –168. Our model has a probability gain of 1.5 compared with F97 and a gain of 2.4 compared with the uniform model. Our model thus predicts the observed earthquake occurrence since 1996 better than the F97 model does. F97, however, better predicts the observed number of events, because the number of m ≥5 earthquakes in the period 1996–2004 was smaller than the average rate between 1932 and 1995 (predicted number N = 14.6 for F97 and N = 26.6 for our model, compared with the observed number N = 15). The difference in likelihood between the two models is mainly due to the choice of the kernel and of the minimum magnitude used to estimate the seismicity rate. F97 used a smoother kernel, with a fixed characteristic smoothing distance of 10 km and an approximately 1/r decay, and only m ≥4 earthquakes.


    Time-Dependent Forecasts
Definition of the ETES Model
The ETES model is based on two empirical laws of seismicity, which can also be reproduced by a multitude of physical mechanisms: the GR law to model the magnitude distribution and Omori's law to characterize the decay of triggered seismicity with the time since the mainshock (Kagan and Knopoff, 1987; Ogata, 1988; Kagan, 1991; Kagan and Jackson, 2000; Helmstetter and Sornette, 2003a; Rhoades and Evison, 2004). This model assumes that all earthquakes may be simultaneously mainshocks, aftershocks, and possibly foreshocks. Each earthquake triggers direct aftershocks at a rate that increases exponentially, as 10^(αm), with the earthquake magnitude m and that decays with time according to Omori's law. We also assume that all earthquakes have the same magnitude distribution, which is independent of the past seismicity. Each earthquake thus has a finite probability of triggering a larger earthquake. An observed "aftershock" sequence in the ETES model is the sum of a cascade of events in which each event can trigger more events.

The global seismicity rate λ(t, r, m) is the sum of a background rate µb(r), usually taken as a spatially nonhomogeneous Poisson process, and of the events triggered by all past earthquakes


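$$\lambda(t, \vec{r}, m) = P_m(m)\left[\mu_b(\vec{r}) + \sum_{i \,|\, t_i < t} \phi_{m_i}\left(\vec{r} - \vec{r}_i,\, t - t_i\right)\right] \tag{9}$$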
where Pm(m) is a time-independent magnitude distribution (see equation 13). The function φm(r, t) gives the spatiotemporal distribution of triggered events at point r and at time t after an earthquake of magnitude m


$$\phi_m(\vec{r}, t) = \rho(m)\, \psi(t)\, f(\vec{r}, m) \tag{10}$$
where ρ(m) is the average number of earthquakes triggered directly by an earthquake of magnitude m ≥ md


$$\rho(m) = k\, 10^{\alpha (m - m_d)} \tag{11}$$
the function ψ(t) is Omori's law normalized so that its integral over 0 < t < ∞ equals 1


$$\psi(t) = \frac{(p - 1)\, c^{\,p-1}}{(t + c)^{p}} \tag{12}$$
and f(r, m) is the normalized aftershock density at a position r relative to the mainshock of magnitude m. We have tested different choices for f(r, m), which are described in the Spatial Distribution of Aftershocks section. We fix the c-value in Omori's law (12) at 0.0035 day (5 min). This parameter is not important as long as it is much smaller than the time window T = 1 day of the forecast.
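The conditional rate (9), built from (10) to (13), can be evaluated directly at one point in space, time, and magnitude. The sketch below uses the power-law spatial kernel (17) with C_pl = d/(2π) and, as a stand-in for d(m), the form of equation (19) with fd = 1; the values of k, α, and p in the example are hypothetical, not fitted parameters.

```python
import math

def gr_pdf(m, b=1.0, md=2.0, m_max=8.0):
    """Truncated GR density P_m(m) of eq. (13); zero outside [md, m_max]."""
    if not md <= m <= m_max:
        return 0.0
    return b * math.log(10) * 10 ** (-b * (m - md)) / (1.0 - 10 ** (-b * (m_max - md)))

def productivity(m, k, alpha, md=2.0):
    """rho(m) = k 10^(alpha (m - md)), eq. (11)."""
    return k * 10 ** (alpha * (m - md))

def omori(t, p, c=0.0035):
    """psi(t) = (p - 1) c^(p - 1) / (t + c)^p, eq. (12); integrates to 1 for p > 1."""
    return (p - 1) * c ** (p - 1) / (t + c) ** p

def f_powerlaw(r, d):
    """Power-law spatial kernel with C_pl = d / (2 pi), normalized on the plane."""
    return d / (2 * math.pi) / (r * r + d * d) ** 1.5

def etes_rate(t, x, y, m, history, mu_b, k, alpha, p, d_of_m):
    """lambda(t, r, m) of eq. (9) with phi from eq. (10).
    history: (t_i, x_i, y_i, m_i) tuples, t in days, x and y in km;
    mu_b(x, y): background density; d_of_m(m): spatial scale in km."""
    triggered = sum(
        productivity(mi, k, alpha) * omori(t - ti, p)
        * f_powerlaw(math.hypot(x - xi, y - yi), d_of_m(mi))
        for ti, xi, yi, mi in history if ti < t)
    return gr_pdf(m) * (mu_b(x, y) + triggered)

# Hypothetical usage: uniform background, one m 6 event half a day earlier
d_of_m = lambda m: 0.5 + 0.01 * 10 ** (0.5 * m)   # form of eq. (19) with f_d = 1 (assumed)
rate = etes_rate(1.0, 2.0, 0.0, 3.0, [(0.5, 0.0, 0.0, 6.0)],
                 lambda x, y: 1e-4, k=0.02, alpha=0.8, p=1.2, d_of_m=d_of_m)
```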

The exponent {alpha} has been found equal or close to 1.0 for the southern California seismicity (Felzer et al., 2004; Helmstetter et al., 2005), equal to the GR b-value, showing that small earthquakes are collectively as important as larger ones for seismicity triggering. Note that in the sum in (9) we consider only earthquakes above the detection magnitude md. Smaller undetected earthquakes may also have an important contribution to the rate of triggered seismicity. These undetected earthquakes may thus bias the parameters of the model, that is, the parameters estimated by optimizing the likelihood of the models are "effective parameters," which more or less account for the influence of undetected small earthquakes.

The ETES model assumes that each primary aftershock may trigger its own aftershocks (secondary events). Secondary aftershocks may themselves trigger tertiary aftershocks and so on, creating a cascade process. The exponent p, which describes the time distribution of direct aftershocks, is larger than the observed Omori exponent, which characterizes the whole cascade of direct and secondary aftershocks (Helmstetter and Sornette, 2002).

As a first step, we use a simple GR magnitude distribution


$$P_m(m) = \frac{b \ln(10)\, 10^{-b(m - m_d)}}{1 - 10^{-b(m_{\max} - m_d)}} \tag{13}$$
with a uniform b-value equal to 1.0, an upper cutoff at mmax = 8 (Bird and Kagan, 2004), and a minimum magnitude md = 2. If we want to predict relatively small m ≤ 4 earthquakes, we must take into account the fact that small earthquakes are missing from the catalog after a large mainshock. The procedure for correcting for undetected small earthquakes is described in the Threshold Magnitude section.

The background rate µb in (9) is given by


$$\mu_b(\vec{r}) = \mu_s\, \mu_0(\vec{r}) \tag{14}$$
where µ0(r) is equal to our time-independent model µ(r) (4) normalized to 1, so that µs represents the expected number of background events per day with m ≥ md.

We use the QDDS and the QDM java applications (available at http://quake.wr.usgs.gov/research/software) to obtain the data (time, locations, and magnitude) in real time from several regional networks (southern and northern California, Nevada) and to create a composite catalog. We automatically update our forecast each day. The model parameters are estimated by optimizing the prediction (maximizing the likelihood of the model) using retrospective tests. The inversion method and the results are presented in the Definition of the Likelihood and Estimation of the Model Parameters section.

Application of ETES Model for Time-Dependent Forecasts
By definition, the ETES model provides the average instantaneous seismicity rate λ(t) at time t given by (9), if we know all earthquakes that occurred until time t. To forecast the seismicity between the present time tp and a future time tp + T, we cannot use expression (9) directly, because a significant fraction of the earthquakes that will occur between tp and tp + T will be triggered by earthquakes that themselves occur between tp and tp + T (see Fig. 5). Therefore, using expression (9) to provide short-term seismicity forecasts with a time window T of 1 day may significantly underestimate the number of earthquakes (Helmstetter and Sornette, 2003a).


Figure 5. Plot of the magnitude versus time for a few days in the ANSS catalog, which illustrates the fact that a significant fraction of earthquakes that will occur in the next day (between the present time tp and tp + T) may be triggered by earthquakes that will occur in the next day (tp < t < tp + T).

 

To solve this problem, Helmstetter and Sornette (2003a) proposed generating synthetic catalogs with the ETES model and predicting the seismicity for the next day by averaging the number of earthquakes over many scenarios. This method provides a much better estimate of the number of earthquakes than the direct use of (9) but is much more complex and time consuming. Helmstetter and Sornette (2003a) have shown that, for synthetic ETES catalogs, using the integral of the rate (9) between tp and tp + T, computed from past earthquakes only, to predict the number of earthquakes in that interval underestimates the number of earthquakes that actually occur by an approximately constant factor, independent of the number of future events. This means that the effect of yet unobserved seismicity is to amplify the aftershock rate of past earthquakes by a constant factor.
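The scenario-averaging idea can be illustrated with a time-only simulation. This is a simplified sketch in the spirit of Helmstetter and Sornette (2003a), not their code: space is ignored, magnitudes follow (13), delays follow (12), and each event inside the window may trigger further events before tp + T. The parameter values used in the check (µs = 2 per day, k = 0.2, α = 0.5, p = 1.2) are hypothetical and give a subcritical cascade with a branching ratio of about 0.4.

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals falling in [0, lam]."""
    n, s = 0, rng.expovariate(1.0)
    while s <= lam:
        n += 1
        s += rng.expovariate(1.0)
    return n

def sample_gr(rng, b=1.0, md=2.0, m_max=8.0):
    """Inverse-CDF draw from a GR law truncated at m_max (eq. 13)."""
    u = rng.random()
    return md - math.log10(1.0 - u * (1.0 - 10 ** (-b * (m_max - md)))) / b

def survival(tau, p, c):
    """Fraction of direct aftershocks occurring later than delay tau (from eq. 12)."""
    return (c / (tau + c)) ** (p - 1)

def sample_delay(lo, hi, p, c, rng):
    """Delay drawn from psi(t) restricted to [lo, hi], by inverting survival()."""
    s = survival(lo, p, c) - rng.random() * (survival(lo, p, c) - survival(hi, p, c))
    return c * (s ** (-1.0 / (p - 1)) - 1.0)

def simulate_window(history, tp, T, mu_s, k, alpha, p, c, rng, md=2.0):
    """Number of m >= md events in (tp, tp + T] for one synthetic scenario.
    history: (t_i, m_i) pairs known at tp; space is ignored in this sketch."""
    t_end = tp + T

    def children(ti, mi, t_from):
        # direct aftershocks of (ti, mi) that fall between t_from and t_end
        lo, hi = t_from - ti, t_end - ti
        lam = k * 10 ** (alpha * (mi - md)) * (survival(lo, p, c) - survival(hi, p, c))
        return [(ti + sample_delay(lo, hi, p, c, rng), sample_gr(rng, md=md))
                for _ in range(sample_poisson(lam, rng))]

    # background events, uniform in time over the window
    pending = [(tp + rng.random() * T, sample_gr(rng, md=md))
               for _ in range(sample_poisson(mu_s * T, rng))]
    for ti, mi in history:
        if ti < tp:
            pending += children(ti, mi, tp)
    count = 0
    while pending:  # each new event may trigger more events inside the window
        ti, mi = pending.pop()
        count += 1
        pending += children(ti, mi, ti)
    return count
```

Averaging `simulate_window` over many scenarios and comparing with the time integral of (9) from past events alone shows the underestimation described above; the ratio between the two is the amplification factor.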

This result suggests a simple way to take into account the effect of yet unobserved earthquakes. We can use the ETES model (9) to predict the number of earthquakes between tp and tp + T but with effective parameters k, µs, and α, which may differ from the true ETES parameters. Instead of using the likelihood of the ETES model to estimate these parameters, as done by Kagan (1991), we estimate the parameters of the model by optimizing the likelihood of the forecasts, defined in the Definition of the Likelihood and Estimation of the Model Parameters section. These effective parameters depend on the duration (horizon) T of the forecasts.
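With effective parameters, the forecast number of events for the next window is just the integral of (9) over time and space. The sketch below assumes the spatial kernel integrates to 1 over the region (so only the time integral of Omori's law remains) and treats µs, k, and α as already-fitted effective values; how they are fitted is not shown.

```python
def omori_window_fraction(t_start, t_end, p, c=0.0035):
    """Integral of psi(t) of eq. (12) between delays t_start and t_end (p > 1):
    (c / (t_start + c))^(p-1) - (c / (t_end + c))^(p-1)."""
    return (c / (t_start + c)) ** (p - 1) - (c / (t_end + c)) ** (p - 1)

def forecast_count(history, tp, T, mu_s, k, alpha, p, c=0.0035, md=2.0):
    """Expected number of m >= md events in [tp, tp + T] from (9), using only
    events (t_i, m_i) before tp; mu_s, k and alpha stand for the effective
    parameters tuned for the horizon T."""
    n = mu_s * T
    for ti, mi in history:
        if ti < tp:
            n += k * 10 ** (alpha * (mi - md)) * omori_window_fraction(tp - ti, tp + T - ti, p, c)
    return n
```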

Threshold Magnitude
An important problem when modeling the occurrence of relatively small earthquakes m ≤ 4 in California is that the completeness magnitude significantly increases after large earthquakes (Kagan, 2004). One effect of missing earthquakes is that the model overestimates the observed number of earthquakes because small earthquakes are not detected. But another effect of missing early aftershocks is to underestimate the predicted seismicity rate, because we miss the contribution from these undetected small earthquakes in the future seismicity rate estimated from the ETES model (9). Indeed, secondary aftershocks (triggered by a previous aftershock) represent an important fraction of aftershocks (Felzer et al., 2003; Helmstetter and Sornette, 2003b).

We have developed a method to correct for both effects of undetected small aftershocks. We first estimate the threshold magnitude as a function of the time since the mainshock and of the mainshock magnitude. We analyzed all aftershock sequences of m ≥6 earthquakes in southern California since 1985. We propose the following relation for the threshold magnitude mc(t, m) at time t (in days) after a mainshock of magnitude m:


$$m_c(t, m) = m - 4.5 - 0.75 \log_{10}(t) \quad \text{and} \quad m_c(t, m) \ge 2 \tag{15}$$
Of course, the threshold fluctuates from one sequence to another, but relation (15) is accurate to within ≈0.2 magnitude units. This relation is illustrated in Figure 6 for the 1992 Joshua Tree m 6.1, 2003 San Simeon m 6.5, and 1992 Landers m 7.3 aftershock sequences.
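A sketch of relation (15) applied to a catalog of large events is shown below. Combining several m ≥5 events by taking the largest threshold is an assumption of this sketch. For a Landers-size m 7.3 mainshock the threshold is 2.8 after 1 day and 4.3 after 0.01 day (about 15 min), and it falls back to md = 2 after roughly 12 days.

```python
import math

def threshold_magnitude(t, large_events, md=2.0):
    """Completeness magnitude at time t (days) from relation (15):
    m_j - 4.5 - 0.75 log10(t - t_j) for each earlier m >= 5 event (t_j, m_j),
    combined here by taking the maximum (an assumption), never below md."""
    mc = md
    for tj, mj in large_events:
        if mj >= 5.0 and tj < t:
            mc = max(mc, mj - 4.5 - 0.75 * math.log10(t - tj))
    return mc
```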


Figure 6. Magnitude versus time since mainshock for aftershocks of Joshua Tree m 6.1 (a), San Simeon m 6.5 (b), and Landers m 7.3 earthquakes (c). The continuous line represents the threshold magnitude estimated from (15) and includes the effect of all m ≥5 earthquakes. The vertical lines in (c) are due to the increase of mc(t) after large m ≥5 aftershocks. Dates of these earthquakes are shown in Table 2.

 


Table 2 Comparison of the Predicted Number of m ≥6 Events per Day, for the Days When a m ≥6 Earthquake Occurred, at the Location of the Earthquake (i.e., within the Cell of 0.05° x 0.05°), for the ETES Model (NETES, using Models 10 and 11 in Table 1) and for the Time-Independent Model (NTI)
 
We use expression (15) to estimate the detection magnitude mc(t) at the time of each earthquake. The time-dependent detection threshold mc(t) is larger than the usual threshold md for earthquakes that occurred shortly after a large m ≥5 earthquake. We select only earthquakes with m > mc to estimate the seismicity rate (9) and the likelihood of the forecasts (21).

We can also correct the forecasts for the second effect, the missing contribution of undetected aftershocks to the sum (9). We take into account the effect of missing earthquakes with md < m < mc(t) by adding a contribution to the number ρ(m) of aftershocks of detected earthquakes m > mc(t), that is, by replacing ρ(m) in (11) by


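$$\rho^*(m) = k\, 10^{\alpha (m - m_d)} + k\, \frac{b}{b - \alpha}\left[10^{\,b\,(m_c(t) - m_d)} - 10^{\,\alpha\,(m_c(t) - m_d)}\right] \tag{16}$$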
where mc(t) is the detection threshold at the time t of the earthquake, estimated by (15), due to the effect of all previous m ≥5 earthquakes. The second contribution corresponds to the effect of all earthquakes with md < m < mc(t) that occur, on average, for each detected earthquake. In practice, for a reasonable value of α ≈ 0.8, this correction (16) is of the same order as the contribution from observed earthquakes, because a large fraction of aftershocks are secondary aftershocks (Felzer et al., 2003), and because small earthquakes are collectively as important as larger ones for earthquake triggering if α = b.
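The correction term can be computed by integrating the productivity k 10^(α(m − md)) over the undetected magnitudes md < m < mc, weighted by the GR law and divided by the fraction of events above mc. The sketch below does this under the assumption of an untruncated GR law for those magnitudes, including the limit α = b; treat it as an illustration of the bookkeeping rather than the authors' exact expression.

```python
import math

def rho_corrected(m, mc, k, alpha, b=1.0, md=2.0):
    """Productivity of a detected event of magnitude m, plus the expected direct
    aftershocks of the undetected md < m' < mc events that accompany each detected
    event, assuming an untruncated GR law with exponent b for m'."""
    rho = k * 10 ** (alpha * (m - md))
    delta = mc - md
    if delta <= 0:
        return rho
    if math.isclose(alpha, b):
        # limit of the general expression when alpha -> b
        extra = k * b * math.log(10) * delta * 10 ** (b * delta)
    else:
        extra = k * b / (b - alpha) * (10 ** (b * delta) - 10 ** (alpha * delta))
    return rho + extra
```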

Spatial Distribution of Aftershocks
We have tested different choices for the spatial kernel f(r, m), which models the aftershock density at a distance r from the mainshock of magnitude m. We used a power-law function


$$f_{pl}(\vec{r}, m) = \frac{C_{pl}}{\left[r^2 + d(m)^2\right]^{1.5}} \tag{17}$$
and a Gaussian distribution


$$f_{gs}(\vec{r}, m) = C_{gs} \exp\left[-\frac{r^2}{2\, d(m)^2}\right] \tag{18}$$
where Cpl and Cgs are normalizing factors, such that the integral of f(r, m) over an infinite surface is equal to 1. The spatial regularization distance d(m) accounts for the finite rupture size and for location errors. We assume that d(m) is given by


$$d(m) = 0.5 + f_d \times 0.01 \times 10^{0.5\, m} \ \text{km} \tag{19}$$
where the first term accounts for location accuracy and the second term represents the aftershock zone length of an earthquake of magnitude m. The parameter fd is adjusted by optimizing the prediction and should be close to 1.0 if the aftershock zone size is equal to the rupture length as estimated by Wells and Coppersmith (1994).

The Gaussian kernel (18), which describes the density of earthquakes at point r, is equivalent to the Rayleigh distribution ∝ r exp[–(r/d)²/2] of distances |r| used by Kagan and Jackson (2000). The choice of an exponent 1.5 in (17) is motivated by recent studies (Ogata, 2004; Console et al., 2003; Zhuang et al., 2004) that inverted this parameter from earthquake catalogs by maximizing the likelihood of the ETES model and all found an exponent close to 1.5. This choice is also convenient because the function (17) can be integrated analytically. It predicts that the aftershock density decreases with the distance r from the mainshock as 1/r³ in the far field, proportionally to the static stress change.
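The two kernels and the scale d(m) can be checked numerically. This sketch assumes Cpl = d/(2π) and Cgs = 1/(2πd²), which are the values that make (17) and (18) integrate to 1 over the plane, and uses a midpoint rule for the radial integral. For the power law, the mass within radius R is 1 − d/√(R² + d²), which shows how slowly its 1/r³ tail converges compared with the Gaussian.

```python
import math

def rupture_scale(m, f_d=1.0):
    """d(m) = 0.5 + f_d * 0.01 * 10^(0.5 m) km, eq. (19)."""
    return 0.5 + f_d * 0.01 * 10 ** (0.5 * m)

def f_powerlaw(r, d):
    """Eq. (17) with C_pl = d / (2 pi)."""
    return d / (2 * math.pi) / (r * r + d * d) ** 1.5

def f_gauss(r, d):
    """Eq. (18) with C_gs = 1 / (2 pi d^2)."""
    return math.exp(-r * r / (2 * d * d)) / (2 * math.pi * d * d)

def mass_within(kernel, d, radius, n=20000):
    """Midpoint-rule estimate of the integral of 2 pi r f(r) dr from 0 to radius."""
    h = radius / n
    return h * sum(2 * math.pi * (i + 0.5) * h * kernel((i + 0.5) * h, d) for i in range(n))
```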

For large earthquakes, which have a rupture length larger than the grid cell size of 0.05° (≈5 km) and a large number of aftershocks, we can improve the model by using a more complex anisotropic kernel, as done previously by Wiemer and Katsumata (1999), Wiemer (2000), and Gerstenberger et al. (2005). We use the locations of early aftershocks as witnesses for estimating the mainshock fault plane and the other active faults in the vicinity of the mainshock. We compute the distribution of later aftershocks of large m ≥5.5 mainshocks by smoothing the locations of early aftershocks


$$f(\vec r, m) \propto \sum_{i} f(\vec r - \vec r_i, m_i) \tag{20}$$
where the sum runs over the mainshock and over all earthquakes that occurred within a distance $D_{aft}(m)$ of the mainshock, before the present time tp, and no later than $T_{aft}$ after the mainshock. We took $D_{aft}(m) = 0.02 \times 10^{0.5m}$ km (approximately two rupture lengths) and $T_{aft}$ = 2 days.

The kernel f($\vec r$, m) in (20) used to smooth the locations of early aftershocks is either a power law (17) or a Gaussian (18), with an aftershock zone length given by (19) for the mainshock but fixed to d = 2 km for the aftershocks. The density of aftershocks estimated using (20) is shown in Figure 7 for the Landers earthquake, using a power-law kernel (Fig. 7a) or a Gaussian kernel (Fig. 7b). The distribution of aftershocks that occurred more than 2 hr after Landers (black dots) agrees well with the prediction based on aftershocks that occurred in the first 2 hr (white circles). In particular, the largest aftershock (Big Bear, m 6.4, latitude 34.2°, longitude –116.8°), which occurred about 3 hr after Landers, was preceded by other earthquakes in the first 2 hr after Landers and is well predicted by our method. The Gaussian kernel (18) produces a more localized density of aftershocks than the power-law kernel does.
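Below is a minimal Python sketch of the smoothing in (20). It rests on several assumptions of ours:

- Events are hypothetical (time in days, x and y in km on a flat projection, magnitude) tuples.
- The Gaussian kernel (18) is used.
- The density is evaluated at the cell centers of a 5-km grid centered on the mainshock.
- The sum is divided by the number of kernels so that it integrates to about 1. This normalization is not spelled out above.

Only the selection windows ($D_{aft}$, $T_{aft}$, tp) and the kernel widths (d(m) from (19) for the mainshock, 2 km for the aftershocks) come from the text.

```python
import math


def gauss2d(dx_km, dy_km, d_km):
    """Isotropic Gaussian kernel (18), in km^-2, with unit total mass."""
    return math.exp(-(dx_km ** 2 + dy_km ** 2) / (2.0 * d_km ** 2)) / (2.0 * math.pi * d_km ** 2)


def early_aftershocks(main, events, t_p, t_aft=2.0):
    """Events after the mainshock, before t_p, at most t_aft days after it,
    and within D_aft(m) = 0.02 * 10**(0.5 * m) km (about two rupture lengths).
    Events are hypothetical (time_days, x_km, y_km, magnitude) tuples."""
    t0, x0, y0, m0 = main
    d_aft = 0.02 * 10 ** (0.5 * m0)
    return [
        (t, x, y, m)
        for (t, x, y, m) in events
        if t0 < t < t_p and t - t0 <= t_aft and math.hypot(x - x0, y - y0) <= d_aft
    ]


def smoothed_density(main, events, t_p, f_d=1.0, cell_km=5.0, half_width_km=100.0):
    """Density (km^-2) at cell centres of a square grid centred on the mainshock,
    keyed by (ix, iy). Dividing by the number of kernels is our assumption."""
    _, x0, y0, m0 = main
    # Mainshock kernel width from eq. (19); aftershock kernels fixed at d = 2 km.
    kernels = [(x0, y0, 0.5 + f_d * 0.01 * 10 ** (0.5 * m0))]
    kernels += [(x, y, 2.0) for (_, x, y, _) in early_aftershocks(main, events, t_p)]
    n_half = int(half_width_km / cell_km)
    density = {}
    for ix in range(-n_half, n_half + 1):
        for iy in range(-n_half, n_half + 1):
            xc, yc = x0 + ix * cell_km, y0 + iy * cell_km
            total = sum(gauss2d(xc - kx, yc - ky, d) for (kx, ky, d) in kernels)
            density[(ix, iy)] = total / len(kernels)
    return density


# Toy example: an m 7.3 mainshock at the origin, forecast made 2 hr later.
main = (0.0, 0.0, 0.0, 7.3)
events = [
    (0.01, 20.0, 0.0, 3.0),   # early aftershock, selected
    (0.05, 40.0, 5.0, 2.5),   # early aftershock, selected
    (3.0, 10.0, 0.0, 2.0),    # after t_p, ignored
    (0.02, 200.0, 0.0, 4.0),  # farther than D_aft (about 89 km), ignored
]
t_p = 2.0 / 24.0
density = smoothed_density(main, events, t_p)
```

In this toy case, the cell containing an early aftershock ends up with a far higher rate than the mirror-image cell on the other side of the mainshock. That asymmetry is what an isotropic kernel cannot produce.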


Figure 7. Density of aftershocks estimated by smoothing the location of early aftershocks (white circles) that occurred less than 2 hr after the Landers mainshock (m 7.3, 28 June 1992), using either a Gaussian kernel (18) (a) or a power-law kernel (17)(b).

 

The advantage of using the observed aftershocks to predict the spatial distribution of future aftershocks is that this method is completely automatic and fast. It uses only the times and locations of aftershocks, which are available soon after the earthquake. It predicts the spatial distribution of future aftershocks accurately less than 1 hr after the mainshock, provided that enough aftershocks have occurred. Our method also has the advantage of taking into account the geometry of the active-fault network close to the mainshock, which is reflected by the spatial distribution of aftershocks.

Therefore, even if the spatial distribution of aftershocks is controlled by the Coulomb stress change, it may be more accurate, much simpler, and faster to use the method described above than to compute the Coulomb stress change. Indeed, the Coulomb stress-change calculation requires knowledge of the mainshock fault plane and slip distribution, which are available only several hours or days after a large earthquake (Scotti et al., 2003; Steacy et al., 2004). Felzer et al. (2003) have already shown that a simple forecasting model (a simplified ETES model) based on the times, locations, and magnitudes of all previous aftershocks predicts the locations of future aftershocks better than Coulomb stress-change calculations do.

Definition of the Likelihood and Estimation of the Model Parameters
We use a maximum likelihood method to test the forecasts and to estimate the parameters. We have five parameters to estimate: p (the Omori exponent defined in equation 12), k and α (see equation 11), µs (the number of background events per day, defined by equation 14), and fd (the parameter defined by equation 19, which describes the size of the aftershock zone).

The log likelihood (LL) of the forecasts is defined by (Kagan and Jackson, 2000; Kagan et al., 2003b; D. Schorlemmer et al., unpublished manuscript, 2005)


$$LL = \sum_{it}\sum_{ix}\sum_{iy}\sum_{im} \log p\big[N_p(it,ix,iy,im),\; n(it,ix,iy,im)\big] \tag{21}$$
where n is the number of events that occurred in the bin (it, ix, iy, im).

The expected number of events per bin Np(it, ix, iy, im) is given by the integral of the predicted seismicity rate λ($\vec r$, t, m) over each space-time-magnitude bin:


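$$N_p(it,ix,iy,im) = \int_{t(it)}^{t(it)+T}\int_{x(ix)}^{x(ix)+\Delta x}\int_{y(iy)}^{y(iy)+\Delta y}\int_{m(im)}^{m(im)+\Delta m}\lambda(\vec r, t, m)\,dm\,dy\,dx\,dt \tag{22}$$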
We take a step of T = 1 day in time, 0.05 degree in space, and 0.1 in magnitude. The forecasts are updated each day at midnight Los Angeles time. We assume a Poisson process (6) to estimate the probability p(Np, n) of having exactly n events in a bin for which the expected number of events is Np.
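Under the Poisson assumption, each bin contributes log p(Np, n) = n log Np − Np − log n! to (21). The minimal Python sketch below sums this over every bin of a forecast. The names are illustrative, and the bins are keyed by hypothetical (it, ix, iy, im) tuples.

```python
import math


def poisson_log_prob(n_p, n):
    """log p(N_p, n) = n log N_p - N_p - log n!  (Poisson, eq. 6); needs N_p > 0."""
    return n * math.log(n_p) - n_p - math.lgamma(n + 1)


def log_likelihood_full(expected, observed):
    """Eq. (21): sum of log p(N_p, n) over every space-time-magnitude bin.

    expected: {(it, ix, iy, im): N_p} for every bin of the forecast
    observed: {(it, ix, iy, im): n} for bins with events (missing bins mean n = 0)
    """
    return sum(poisson_log_prob(n_p, observed.get(b, 0)) for b, n_p in expected.items())


expected = {(0, 0, 0, 0): 0.5, (0, 1, 0, 0): 0.1, (0, 0, 1, 0): 0.02}
observed = {(0, 0, 0, 0): 1}
print(log_likelihood_full(expected, observed))  # ln 0.5 - 0.62, about -1.313
```

Here three bins with forecasts 0.5, 0.1, and 0.02 and a single event in the first bin give LL = ln 0.5 − 0.62 ≈ −1.31. The empty bins enter only through −Np.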

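We can simplify the expression for LL by noting that the seismicity rate needs to be computed only in the bins (ix, iy, im) that contain a nonzero number n of observed events. Because the Poisson log probability is n log Np − Np − log n!, a bin with n = 0 contributes only −Np, and these terms add up to the total number of predicted events in each time bin. Combining (21) and (6), we can rewrite LL as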


$$LL = \sum_{it}\Big[-N_p(it) + \sum_{ix,iy,im:\; n>0}\big(n\log N_p(it,ix,iy,im) - \log n!\big)\Big] \tag{23}$$
where Np(it) is the total predicted number of events in the time bin it (between t(it) and t(it) + T)


Formula 025

(24)
The factor fi in (24) is the integral over the grid of the spatial kernel fi($\vec r$, t). It is smaller than 1.0 because part of the kernel falls outside the finite grid.
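The sparse form (23) needs only two inputs: the total forecast Np(it) for each time bin, which (24) provides without visiting every cell, and Np in the few bins that contain events. The minimal Python sketch below is our own illustration, and the dictionary layout is an assumption. Building the totals from a small, fully enumerated toy forecast and comparing the result with the full sum of Poisson log probabilities over every bin gives the same value.

```python
import math


def log_likelihood_sparse(n_p_total, n_p_at_events, observed):
    """Eq. (23): LL = sum_it [ -N_p(it) + sum_{bins with n > 0} (n log N_p - log n!) ].

    n_p_total:      {it: N_p(it)}, total forecast in each time bin (from eq. 24)
    n_p_at_events:  {(it, ix, iy, im): N_p} only for bins that contain events
    observed:       {(it, ix, iy, im): n} with n > 0
    """
    ll = -sum(n_p_total.values())
    for b, n in observed.items():
        ll += n * math.log(n_p_at_events[b]) - math.lgamma(n + 1)
    return ll


# Toy forecast enumerated over all bins, only to build the inputs for a check.
expected = {(0, 0, 0, 0): 0.5, (0, 1, 0, 0): 0.1, (1, 0, 0, 0): 0.3, (1, 1, 1, 0): 2.0}
observed = {(0, 0, 0, 0): 1, (1, 1, 1, 0): 3}
n_p_total = {}
for (it, _, _, _), n_p in expected.items():
    n_p_total[it] = n_p_total.get(it, 0.0) + n_p
n_p_at_events = {b: expected[b] for b in observed}
print(log_likelihood_sparse(n_p_total, n_p_at_events, observed))
```

The cost of the sparse form scales with the number of observed events plus the number of days, rather than with the number of space-magnitude cells.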

We maximize the log likelihood LL defined by (21) using a simplex algorithm (Press et al., 1992, p. 402), with all m ≥2 earthquakes from 1 January 1985 to 10 March 2004 as target events. The seismicity rate (9) includes the aftershocks of all m ≥2 earthquakes since 1 January 1980 that occurred within the grid (32.45° N to 36.65° N in latitude and 121.55° W to 114.45° W in longitude) or less than 1° outside it. There are 65,664 target earthquakes above the threshold magnitude mc in the time and space window used to compute the LL. We test two models for the spatial distribution of aftershocks: the power-law kernel (17) and the Gaussian kernel (18).

We use the probability gain per earthquake G to quantify the performance of the short-term prediction by comparison to the time-independent forecasts


$$G = \exp\left(\frac{LL_{ETES} - LL_{TI}}{N}\right) \tag{25}$$
where $LL_{TI}$ is the log likelihood of the time-independent model, $LL_{ETES}$ is the log likelihood of the ETES model, and N is the total number of target events. The time-independent model is obtained by taking the background density µ($\vec r$) described in the Time-Independent Forecasts section and normalizing it so that the total forecasted number of m ≥ mmin events equals the observed number. The gain defined by (25) is related to the information per earthquake I (in bits) defined by Kagan and Knopoff (1977) (see also Daley and Vere-Jones [2004] and Harte and Vere-Jones [2005]) by $G = 2^I$.
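Equation (25) and its relation to the information score reduce to two one-line conversions. In the minimal Python sketch below, the function names and the log-likelihood values are hypothetical. A gain of 11.5 corresponds to an average log-likelihood improvement of ln 11.5 ≈ 2.44 per target event, and to I = log2 11.5 ≈ 3.52 bits per earthquake.

```python
import math


def probability_gain(ll_etes, ll_ti, n_events):
    """Eq. (25): probability gain per earthquake, G = exp[(LL_ETES - LL_TI) / N]."""
    return math.exp((ll_etes - ll_ti) / n_events)


def information_bits(gain):
    """Information per earthquake I in bits, from G = 2**I."""
    return math.log2(gain)


# Hypothetical log likelihoods chosen so that G = 11.5 over 65,664 events.
n = 65_664
ll_ti = -250_000.0
ll_etes = ll_ti + n * math.log(11.5)
g = probability_gain(ll_etes, ll_ti, n)
print(g, information_bits(g))  # about 11.5 and 3.52 bits
```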

Some caution is needed in interpreting the probability gain of the ETES model. The timing of earthquakes is controlled by Omori's law, which diverges to infinity as time approaches zero. Calculating the likelihood function for aftershock sequences illustrates this point: after a strong earthquake, the rate of aftershock occurrence increases by a factor of thousands. Because ln(1000) ≈ 6.9, a single early aftershock contributes to the likelihood function about as much as seven additional free parameters would. Both the likelihood optimization and the probability gain therefore depend strongly on early aftershocks. As Figure 6 demonstrates, many early aftershocks are missing from earthquake catalogs (Kagan, 2004). The likelihood thus depends substantially on poor-quality data from the beginning of the aftershock sequence.

Similarly, earthquake hypocenters are concentrated on a fractal set with a correlation dimension slightly above 2.0 (Helmstetter et al., 2005). At small interearthquake distances, random location errors raise the dimension to close to 3.0. The likelihood therefore depends substantially on location uncertainty, because the width of the kernel $K_d(\vec r)$ (equations 2 and 4) can be made smaller when a catalog with higher location accuracy is used.


    Results and Discussion
Model Parameters and Likelihood
The model parameters are obtained by maximizing the LL. The optimization usually converges after about 100 iterations. We have checked that the final values do not depend on the initial parameters. The results are given in Table 1 and in Figure 8. We have tested different versions of the model (various spatial kernels, unconstrained or fixed {alpha} value, and different values of the minimum magnitude).


Table 1 Model Parameters, Log Likelihood of ETES Model (LLETES) and of the Time-Independent Model (LLTI), Number N of Target Events, and Probability Gain G (25).
 

Figure 8. Results of the optimization of the log likelihood LL for the unconstrained model 1 by using a Gaussian kernel. Value of LL as a function of each model parameter (a)–(e) and as a function of the number of iterations (f).

 

Figure 9 shows an example of our daily forecasts (using model 3 in Table 1) for 23 October 2004. All six earthquakes that occurred that day are located in areas of high predicted seismicity rate (large values of Np). All except one occurred close enough in time and space to a recent earthquake that the short-term predicted number Np($\vec r$) is larger than the average rate µ($\vec r$). The probability gain (25) per earthquake for this day is 26.


Figure 9. (a) Forecasted number of events with m ≥2 per cell for 23 October 2004 (logarithmic scale). Black circles represent observed earthquakes with m ≥2 that occurred during this day. Two of these events are aftershocks of the m 6.5, 22 December 2003 San Simeon earthquake (located at latitude 35.7° and longitude –121.1°). Three are associated with the m 6, 28 September 2004 Parkfield mainshock (latitude 35.81° and longitude –120.37°). One is an aftershock of an m 3.7, 9 September 2004 earthquake (latitude 35.09° and longitude –117.52°). The predicted number of events for this day was 8.39, and the observed number was 6. Most of these earthquakes are better predicted by the ETES model than by the time-independent model (i.e., Np > µ in the cells within which these earthquakes occurred). Only one event, which occurred 11 km from the San Simeon earthquake (latitude 35.81° and longitude –121.02°), was better predicted by the time-independent model, because it occurred just outside the main aftershock zone. (b) Ratio of the forecasted numbers of events estimated using the ETES and the time-independent models (logarithmic scale). High values of Np/µ (up to 800) are associated with recent large earthquakes, such as Parkfield, San Simeon, Landers, Hector Mine, and Northridge.

 

Figure 8 shows the LL of the daily forecasts for the period from 1 January 1985 to 10 March 2004, for each iteration of the optimization, as a function of the model parameters. The variation of the LL with each model parameter indicates how well that parameter is resolved. The unconstrained inversion gives a probability gain G = 11.7 and an exponent α = 0.43. This is much smaller than the direct estimates α = 0.8 ± 0.1 (Helmstetter, 2003) and α = 1 (Felzer et al., 2004; Helmstetter et al., 2005), obtained by fitting the number of aftershocks as a function of the mainshock magnitude. The optimization with α fixed to 0.8, closer to the observed value, gives a probability gain G = 11.1, slightly smaller than that of the best model. Note that the parameters k and α (defined by equation 11) are negatively correlated in Table 1: k is larger for a smaller α, which keeps the number of forecasted earthquakes constant.
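A short calculation illustrates the negative correlation between k and α. For illustration only, we assume that the productivity of equation 11 has the form ρ(m) = k 10^{α(m − md)} and that magnitudes follow a Gutenberg-Richter law with b = 1, truncated to [2, 8]. Neither the exact form of equation 11 nor these values appear in this section. We hold the mean number of direct aftershocks per earthquake fixed at a hypothetical 0.5. The required k is then inversely proportional to the average of 10^{α(m − md)}, and that average grows with α.

```python
import math


def mean_boost(alpha, b=1.0, m_d=2.0, m_max=8.0):
    """Mean of 10**(alpha * (m - m_d)) over a Gutenberg-Richter law with
    b-value b, truncated to [m_d, m_max] (assumed values, for illustration)."""
    span = m_max - m_d
    norm = 1.0 - 10 ** (-b * span)  # probability mass of the truncated GR law
    if abs(b - alpha) < 1e-12:
        return b * math.log(10) * span / norm
    return b / (b - alpha) * (1.0 - 10 ** (-(b - alpha) * span)) / norm


def k_for_fixed_productivity(alpha, n_direct=0.5):
    """k giving n_direct direct aftershocks per earthquake on average, assuming
    rho(m) = k * 10**(alpha * (m - m_d)) (an assumed form of eq. 11)."""
    return n_direct / mean_boost(alpha)


for alpha in (0.43, 0.8):
    print(alpha, k_for_fixed_productivity(alpha))
print(k_for_fixed_productivity(0.43) / k_for_fixed_productivity(0.8))  # about 2.7
```

With these assumed values, k must be about 2.7 times larger for α = 0.43 than for α = 0.8 to produce the same total number of triggered events. The ratio does not depend on the hypothetical n_direct.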

Comparison of Predicted and Observed Aftershock Rate
Figure 10 compares the predicted number of events following the Landers mainshock for the unconstrained model 2 (see Table 1) and for models 3 and 5, which have α fixed to 0.8. Model 3 underestimates the number of aftershocks but predicts the correct variation of the seismicity rate with time. In contrast, model 2 (with α = 0.43) greatly underestimates the number of aftershocks until 10 days after Landers, because the low value of α yields a relatively small increase of seismicity at the time of the mainshock. Model 2 then fits the end of the aftershock sequence well: by that time enough aftershocks have occurred that secondary aftershocks raise the predicted seismicity rate. The saturation of the number of aftershocks at early times in Figure 10 (for both the model and the data) is due to the increase of the threshold magnitude mc (see equation 15), which returns to its usual value md = 2 about 10 days after Landers. Adding the corrective term ρ*(m) defined by (16), which accounts for the contribution of undetected early aftershocks to the rate of triggered seismicity, better predicts the rate of aftershocks just after Landers. On average, however, it gives a smaller probability gain than the model without this term (see models 3 and 5 in Table 1 and Fig. 10).


Figure 10. Observed (solid black line) and predicted number of m ≥2 earthquakes per day as a function of the time since the Landers mainshock, for model 2 (circles), model 3 (crosses), and model 5 (diamonds). The saturation at t ≤ 10 days is due to the incompleteness of the catalog for small magnitudes (see Fig. 6).

 

Figure 11 shows the predicted number of earthquakes and the probability gain (see equation 25) in the time window 1992–1995 for model 3. The model underestimates the rate of aftershocks for the Joshua Tree (m 6.1) and Landers (m 7.3) mainshocks, slightly overestimates it for Northridge (m 6.6), and provides a good fit (not shown) for Hector Mine (m 7.1) and San Simeon (m 6.5). All models overestimate the aftershock productivity of the 1987 m 6.6 Superstition Hills earthquake by more than a factor of 2. This shows that aftershock productivity varies from one sequence to another in a way the model does not take into account, possibly in part because of errors in magnitudes. A model that estimates the parameters of each aftershock sequence (aftershock productivity, Omori p exponent, and G-R b-value), such as the STEP model (Gerstenberger et al., 2005), may therefore perform better than the ETES model, which uses the same parameters for all earthquakes (except for the increase in productivity ρ(m) with magnitude).


Figure 11. (a) Observed (black) and predicted (gray) number of m ≥2 earthquakes per day in southern California for model 3 (see Table 1). Dashed line is the background rate µs = 2.81/day. (b) Probability gain per earthquake defined in (25).

 

Figure 12 shows the predicted number of m ≥2 earthquakes per day for model 3 (see Table 1) as a function of the observed number. Most points in this plot are close to the diagonal, which means that the model generally predicts the number of events per day well. A few points, however, have a large observed number of earthquakes but a small predicted number. These points correspond to days on which a large earthquake and its first aftershocks occurred after a period when seismicity was close to its background level, so that the predicted seismicity rate was small.


Figure 12. Predicted number of m ≥2 earthquakes per day for model 3 (see Table 1) as a function of the observed number, for the period 1985–2003. The dashed line represents the perfect fit. The horizontal line is the background rate µs = 2.81/day.

 

The model could be extended to account for fluctuations of aftershock productivity, as done in the STEP model, by using early aftershocks to estimate the productivity ρ(m) of large earthquakes. Whether magnitude errors, biases, and systematic effects contribute significantly to prediction efficiency still needs to be investigated, however. A method that adjusts its parameters to the available data may seem to perform better, especially in retrospective testing, where various adjustments are possible. But if fluctuations in aftershock rate are caused by technical factors and biases, this forecasting advantage may be spurious.

Proportion of Aftershocks in Seismicity
The background seismicity is estimated to be µs = 2.81 m ≥2 earthquakes per day for model 3, compared with an average seismicity rate µ = 9.4 per day. Triggered earthquakes thus make up about 70% of the total. This number underestimates the actual fraction of triggered earthquakes, for two reasons. First, it does not count the early aftershocks that occur within a few hours of a mainshock, between the present time tp and the end of the prediction window tp + T (see Fig. 6). Second, we have removed from the catalog aftershocks smaller than the threshold magnitude mc(t, m) given by (15).
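The 70% figure is one minus the ratio of the background rate to the average rate:

```latex
1 - \frac{\mu_s}{\mu} = 1 - \frac{2.81}{9.4} \approx 1 - 0.30 = 0.70
```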

Because the background rate represents only a small fraction of the total seismicity rate, the declustering procedure used to estimate µ($\vec r$) has only a minor influence on the performance of our short-term forecast.

Scaling of Aftershock Productivity with Mainshock Magnitude
There may be several reasons why the optimization selects the small value α = 0.43, compared with the value α = 1 estimated by Felzer et al. (2004) and Helmstetter et al. (2005). A smaller α corresponds to a weaker influence of large earthquakes. A model with a small α thus has a shorter memory in time and can adapt faster to fluctuations of the observed seismicity. A smaller α also predicts a larger proportion of secondary aftershocks after a large mainshock. It can therefore better account for fluctuations of aftershock productivity: if the rate of early aftershocks is low, a model with a small α will predict a small number of future aftershocks (fewer secondary aftershocks).

A model with a smaller α is also less sensitive to magnitude errors. An error of 0.3 in the mainshock magnitude changes the rate of direct aftershocks by a factor of 2.0 for α = 1 and by a factor of 1.3 for α = 0.4. Finally, a model with a smaller α may better forecast the spatial distribution of aftershocks. Because the observed spatial distribution of aftershocks differs significantly from the isotropic model (used for m ≤5.5 earthquakes), a model with a smaller α may perform better than the model with the true α. A small α gives more importance to secondary aftershocks and can thus better model the heterogeneity of the spatial distribution of aftershocks. In contrast, a larger α produces a quasi-isotropic distribution at short times, dominated by the mainshock contribution.
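The factors 2.0 and 1.3 follow from assuming that the number of direct aftershocks scales as 10^{αm}. Under that assumption, a magnitude error δm multiplies the number of direct aftershocks by 10^{α δm}. A minimal Python check, with a function name of our own choosing:

```python
def rate_error_factor(alpha, dm=0.3):
    """Factor by which a magnitude error dm changes the predicted number of
    direct aftershocks, assuming productivity proportional to 10**(alpha * m)."""
    return 10 ** (alpha * dm)


print(rate_error_factor(1.0))  # about 2.0 (1.995)
print(rate_error_factor(0.4))  # about 1.3 (1.318)
```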

The corrective contribution ρ*(m) > ρ(m) (16), introduced to account for missing aftershocks, can also bias the value of α. Using the term ρ*(m) with a value of α smaller than the true value overestimates the contribution of small earthquakes just after a large earthquake, when mc > md. For this reason, we did not use this contribution (except for model 5 in Table 1).

The main interest of short-term forecasts is to predict the rate of seismicity after a large mainshock, and it is precisely then that the best model, with α ≈ 0.4, clearly underestimates the observations. Therefore, we fix α = 0.8 (models 3 and 4 in Table 1). This model gives a slightly smaller likelihood than the best model but fits the data better just after a large mainshock.

Spatial Distribution of Aftershocks
For the unconstrained models 1 and 2 (α is an adjustable parameter), the power-law kernel (17) gives a slightly better LL than the Gaussian kernel (18) (see Table 1). When α is fixed to 0.8, however, the Gaussian kernel works a little better (see models 3 and 4 in Table 1). The parameter fd defined in (19) is the ratio of the typical aftershock zone size d(m) to the mainshock rupture length $L(m) = 0.01 \times 10^{0.5m}$ km. For the Gaussian kernel (18), fd ≈ 1; that is, the average distance between a mainshock and its direct aftershocks is close to the mainshock rupture length.

For the power-law kernel (17), the average distance is not defined. In this case, d(m) is the distance at which fpl(r) starts decreasing with r. Inverting for fd with a power-law kernel gives an unrealistically small value, fd ≤ 0.06, for model 2 (see Table 1). As a result, d(m) ≈ 0.5 km (the fixed minimum value of d(m), equal to the location error) independently of the magnitude of the triggering earthquake for m ≤5. This implies short-range interactions, with most of the predicted rate concentrated in the cell of the triggering earthquake. Using a complex spatial distribution of aftershocks for m ≥5.5 earthquakes (obtained by smoothing the locations of early aftershocks; see the Spatial Distribution of Aftershocks section) slightly improves the LL compared with the simple isotropic kernel (see models 3 and 6 in Table 1).

Probability Gain as a Function of Magnitude
Table 1 shows how the probability gain G (25) varies with the minimum magnitude mmin of target events. In models 7–11, we used m ≥2 earthquakes to estimate the forecasted rate of m ≥ mmin earthquakes, with mmin ranging from 3 to 6. These models use the same parameters as model 3, except that the background rate µs is multiplied by $10^{-b(m_{min}-m_d)}$ to give the background rate of m ≥ mmin earthquakes. The probability gain is slightly larger for mmin = 3 than for mmin = 2, but G then decreases with mmin for mmin ≥ 4. For mmin = 6 (only eight earthquakes), the time-independent model (with a rate adjusted so that it predicts the exact number of observed events) even performs better than the ETES model (G < 1) for model 10 in Table 1.
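The background rate is rescaled with the uniform Gutenberg-Richter law. The minimal Python sketch below assumes b = 1 and md = 2; the b-value is an assumption here, since it is not given in this section.

```python
def background_rate(mu_s, m_min, b=1.0, m_d=2.0):
    """Background rate (events/day) of m >= m_min events, rescaled from the rate
    mu_s of m >= m_d events with a Gutenberg-Richter law of uniform b-value.
    b = 1.0 is an assumption for illustration."""
    return mu_s * 10 ** (-b * (m_min - m_d))


for m_min in (3.0, 4.0, 5.0, 6.0):
    print(m_min, background_rate(2.81, m_min))
```

With µs = 2.81 per day and these assumptions, the background rate is 0.281 m ≥3 events and 2.81 × 10⁻⁴ m ≥6 events per day. The ETES term is rescaled in the same way, so the gain for large mmin rests on very few events.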

We do not think that this variation with mmin means that our model predicts only small earthquakes (aftershocks), or that larger earthquakes have a different distribution in space and time than smaller ones. Rather, these results reflect the large fluctuations of the probability gain from one earthquake to another. The difference in likelihood between ETES and the time-independent model comes mainly from a few large aftershock sequences. We thus need a large number of earthquakes and aftershock sequences to compare different forecasts (see also the discussion at the end of the section on Definition of the Likelihood and Estimation of the Model Parameters).

Table 2 compares the predicted seismicity rate at the time and location of each m ≥6 earthquake, estimated for the ETES model and for the time-independent model. For each earthquake, Table 2 gives two values of the predicted number of earthquakes, using the same parameters of the ETES model, but changing the time at which we update the forecasts, either midnight (universal time) for model 10 (see line 10 in Table 1) or at 1:00 p.m. for model 11. The large differences in the predicted seismicity rate between these two models show that the forecasts are very sensitive to short-term clustering, which has a large influence on the predicted seismicity rate. This suggests that the number of m ≥6 earthquakes in the catalog (eight earthquakes from 1985 to 2004) is too small to compare our short-term and time-independent models for this magnitude range.

Some of these m ≥6 earthquakes were preceded by a short-term (hours) increase of seismicity (Superstition Hills, Joshua Tree, Landers, Big Bear, Hector Mine). Even so, the time-independent model performs better than the ETES model if the forecasts are not updated between the foreshock activity and the mainshock (e.g., with model 10, between Elmore Ranch and Superstition Hills, and between Landers and Big Bear). Landers occurred about two months after Joshua Tree, and its hypocenter was just outside the Joshua Tree aftershock zone. The predicted seismicity rate at the Landers hypocenter before the precursory foreshock activity (which started 6 hr before Landers) was therefore slightly lower than the average rate. Joshua Tree had foreshocks, but they started 2 hr before the mainshock and thus were not included in the daily forecasted rate of either ETES model. Hector Mine was also preceded by foreshocks, with m ≥3.6, which started about 20 hr before the mainshock. As a result, the seismicity rate predicted by ETES model 10 at Hector Mine was 120 times larger than the average rate. The other large m ≥6 earthquakes (Elmore Ranch, Northridge, San Simeon) were not preceded by any significant foreshock activity, so the forecasted seismicity rate at their locations was smaller than the average rate.

Updating the forecasts more often (each hour, or after each earthquake) would of course improve the performance of our short-term forecasts. But optimizing and testing the forecasts would then be much more difficult and time consuming if the duration of the forecasts (one day) is different from the interval between two forecasts. Moreover, preliminary earthquake catalogs are much less accurate in the first few hours, especially after a strong earthquake.


    Discussion and Conclusion
We have first developed a time-independent model of seismicity in southern California, obtained by smoothing the location of previous m ≥2 earthquakes. Including small earthquakes improves the spatial resolution of our model; therefore, our forecasts outperform the previous model of Kagan and Jackson (1994), which used only m ≥5.5 events (both historical and instrumental). Our model also performs better than a more complex one, which incorporates geological data (Frankel et al., 1997), when tested on m ≥ 5 earthquakes since 1996. Note that the difference between those models may be negligible for hazard assessment because of the smoothing inherent in forecasting ground motion. The better resolution obtained with our method may be important however for testing and understanding the physical mechanisms of earthquake triggering.

We have then developed daily earthquake forecasts, which use our time-independent model for the background seismicity level, by adding a time-dependent contribution to model triggered seismicity. Our model is based on empirical laws of seismicity: the G-R magnitude distribution, Omori's law, and the exponential increase of triggered seismicity with the mainshock magnitude. Our model includes only data from earthquake catalogs (time, magnitude, and locations).

Our model also forecasts well the spatial distribution of future aftershocks by smoothing the locations of early aftershocks. We can obtain a good forecast of the aftershocks within a few hours of a large m ≥5.5 earthquake, based on plentiful early aftershocks. Even if the spatial distribution of aftershocks is controlled by Coulomb stress changes, our empirical method may be more accurate and faster than direct calculations of the Coulomb stress change. Our method is accurate because the distribution of early aftershocks represents well the mainshock rupture surface and because our method accounts for secondary aftershocks.

Retrospective tests for m ≥2 earthquakes in the period 1 January 1985 to 10 March 2004 show that our short-term model realizes a probability gain of 11.5 over a stationary Poisson forecast. Several features of our model could be improved. First, geologic slip rate and geodetic strain rate data could be used to better constrain the time-independent seismicity. Second, a better estimate of the magnitude distribution, resulting from statistical studies of the relationship between fault geometry and earthquakes, could improve the forecasting of large quakes. Third, other research (e.g., Gerstenberger et al., 2005) suggests that aftershock productivity and magnitude distribution may vary considerably from one sequence to another. Comparing our model with others proposed to the RELM working group (Kagan et al., 2003a; Jackson et al., 2004; D. Schorlemmer et al., unpublished manuscript, 2005) should help to improve all available models. For example, seismicity forecast for the STEP model of Gerstenberger et al. (2005) can be compared directly with our model.

Both of our models have been tested on the same data used to build them. The time-independent model (also used as the background rate in ETES) depends on the locations of all earthquakes until 2003 (with aftershocks removed). A better test would be a pseudo-real-time test that uses completely different data to estimate the model parameters and to compare the models. Unfortunately, there are not enough data to do so. The likelihood of a real-time prediction will thus probably be smaller (for both models) than in the tests performed in this article. The results should not vary much, however, because the number of adjusted parameters (4) is much smaller than the number of target earthquakes (65,664).

We have tested our models on relatively "small" earthquakes, using a minimum magnitude mmin = 5 for the time-independent model and values of mmin ranging from 2 to 6 for our daily forecasts. Damaging earthquakes usually have m ≥6, but there are not enough large earthquakes to perform meaningful tests. Still, our time-independent model, obtained by smoothing m ≥2 earthquakes, correctly predicts the locations of m ≥5 earthquakes. This is encouraging and suggests that our model also applies to larger, damaging earthquakes, because large events are likely to occur at the same locations as smaller ones.

In addition to the epicenter, seismic-hazard estimation also requires the specification of the fault plane. Kagan and Jackson (1994) have developed a method to forecast the orientation of the fault plane by smoothing the focal mechanisms of past m ≥5.5 earthquakes. As for forecasting epicenters, it may be useful to include small earthquakes in the forecasts to improve the resolution.


    Appendix
We acknowledge the Advanced National Seismic System for the earthquake catalog. We are grateful to the Editor Andrew Michael, to the reviewer Kristy Tiampo, and to an anonymous reviewer for useful suggestions. This work is partially supported by NSF-EAR02-30429, NSF-EAR-0409890, the Southern California Earthquake Center (SCEC), the James S. McDonnell Foundation 21st century scientist award/studying complex system, and the Brinson Foundation. SCEC is funded by NSF Cooperative Agreement EAR-0106924 and USGS Cooperative Agreement 02HQAG0008. The SCEC contribution number for this article is 895.

Manuscript received April 4, 2005

Agnew, D. C. (2005). Earthquakes: future shock in California, Nature435 ,284 –285, doi 10.1038/435284a.[CrossRef][Medline]

Bird, P., and Y. Y. Kagan (2004). Plate-tectonic analysis of shallow seismicity: apparent boundary width, beta, corner magnitude, coupled lithosphere thickness, and coupling in seven tectonic settings, Bull. Seism. Soc. Am.94 , no. 6,2380 –2399.[Abstract/Free Full Text]

Bufe, C. G., and D. J. Varnes (1993). Predictive modeling of the seismic cycle of the greater San-Francisco Bay region, J. Geophys. Res.98 ,9871 –9883.

Console, R., M. Murru, and A. M. Lombardi (2003). Refining earthquake clustering models, J. Geophys. Res.108 , 2468, doi 10.1029/2002JB002130.[CrossRef]

Daley, D. J., and D. Vere-Jones (2004). Scoring probability forecasts for point processes: the entropy score and information gain, J. Appl. Probability41A (special issue),297 –312.[CrossRef]

Felzer, K. R., R. E. Abercrombie, and G. Ekström (2003). Secondary aftershocks and their importance for aftershock forecasting, Bull. Seism. Soc. Am.93 ,1433 –1448.[Abstract/Free Full Text]

Felzer, K. R., R. E. Abercrombie, and G. Ekström (2004). A common origin for aftershocks, foreshocks, and multiplets, Bull. Seism. Soc. Am.94 ,88 –99.[Abstract/Free Full Text]

Frankel, A., C. Mueller, T. Barnhard, D. Perkins, E. Leyendecker, N. Dickman, S. Hanson, and M. Hopper (1997). Seismic hazard maps for California, Nevada, and Western Arizona/Utah, U.S. Geol. Surv. Open-File Rept. 97-130.

Gerstenberger, M. C., S. Wiemer, L. M. Jones, and P. A. Reasenberg (2005). Real-time forecasts of tomorrow's earthquakes in California, Nature 435, 328–331, doi 10.1038/nature03622.

Harte, D., and D. Vere-Jones (2005). The entropy score and its uses in earthquake forecasting, Pure Appl. Geophys. 162, no. 6–7, 1229–1253.

Helmstetter, A. (2003). Is earthquake triggering driven by small earthquakes? Phys. Rev. Lett. 91, 058501.

Helmstetter, A., and D. Sornette (2002). Sub-critical and super-critical regimes in epidemic models of earthquake aftershocks, J. Geophys. Res. 107, 2237, doi 10.1029/2001JB001580.

Helmstetter, A., and D. Sornette (2003a). Predictability in the ETAS model of interacting triggered seismicity, J. Geophys. Res. 108, 2482, doi 1029/2003JB002485.

Helmstetter, A., and D. Sornette (2003b). Importance of direct and indirect triggered seismicity in the ETAS model of seismicity, Geophys. Res. Lett. 30, 1576, doi 1029/2003GL017670.

Helmstetter, A., and D. Sornette (2003c). Foreshocks explained by cascades of triggered seismicity, J. Geophys. Res. 108, 2457, doi 10.1029/2003JB002409.

Helmstetter, A., D. D. Jackson, and Y. Y. Kagan (2005). Importance of small earthquakes for stress transfers and earthquake triggering, J. Geophys. Res. 110, B05S08, doi 10.1029/2004JB003286.

Izenman, A. J. (1991). Recent developments in non-parametric density estimation, J. Am. Statist. Assoc. 86, 205–224.

Jackson, D. D., and Y. Y. Kagan (1999). Testable earthquake forecasts for 1999, Seism. Res. Lett. 70, no. 4, 393–403.

Jackson, D. D., D. Schorlemmer, M. Gerstenberger, Y. Y. Kagan, A. Helmstetter, S. Wiemer, and N. Field (2004). Prospective tests of southern California earthquake forecasts (abstract), Eos Trans. AGU 85, no. 47 (Fall Meet. Suppl.), S21C-08.

Kafka, A. L., and S. Z. Levin (2000). Does the spatial distribution of smaller earthquakes delineate areas where larger earthquakes are likely to occur? Bull. Seism. Soc. Am. 90, 724–773.

Kagan, Y. Y. (1991). Likelihood analysis of earthquake catalogues, Geophys. J. Int. 106, 135–148.

Kagan, Y. Y. (1999). Universality of the seismic moment-frequency relation, Pure Appl. Geophys. 155, 537–573.

Kagan, Y. Y. (2004). Short-term properties of earthquake catalogs and models of earthquake source, Bull. Seism. Soc. Am. 94, no. 4, 1207–1228.

Kagan, Y. Y., and D. D. Jackson (1994). Long-term probabilistic forecasting of earthquakes, J. Geophys. Res. 99, 13,685–13,700.

Kagan, Y. Y., and D. D. Jackson (2000). Probabilistic forecasting of earthquakes, Geophys. J. Int. 143, 438–453.

Kagan, Y., and L. Knopoff (1976). Statistical search for non-random features of the seismicity of strong earthquakes, Phys. Earth Planet. Interiors 12, 291–318.

Kagan, Y. Y., and L. Knopoff (1977). Earthquake risk prediction as a stochastic process, Phys. Earth Planet. Interiors 14, no. 2, 97–108.

Kagan, Y. Y., and L. Knopoff (1987). Statistical short-term earthquake prediction, Science 236, 1563–1567.

Kagan, Y. Y., D. D. Jackson, D. Schorlemmer, and M. Gerstenberger (2003a). Testing hypotheses of earthquake occurrence (abstract), Eos Trans. AGU 84, no. 47 (Fall Meet. Suppl.), S31G-01.

Kagan, Y. Y., Y. F. Rong, and D. D. Jackson (2003b). Probabilistic forecasting of seismicity, in Earthquake Science and Seismic Risk Reduction, F. Mulargia and R. J. Geller (Editors), Kluwer, Dordrecht, 185–200.

Ogata, Y. (1988). Statistical models for earthquake occurrence and residual analysis for point processes, J. Am. Statist. Assoc. 83, 9–27.

Ogata, Y. (2004). Space-time model for regional seismicity and detection of crustal stress changes, J. Geophys. Res. 109, no. B3, art. no. B03308; Correction, J. Geophys. Res. 109, no. B6, art. no. B06308.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in Fortran: The Art of Scientific Computing, Second Ed., Cambridge Univ. Press, New York, 992 pp.

Reasenberg, P. A. (1985). Second-order moment of central California seismicity, 1969–82, J. Geophys. Res. 90, 5479–5495.

Reasenberg, P. A. (1999). Foreshock occurrence before large earthquakes, J. Geophys. Res. 104, 4755–4768.

Reasenberg, P. A., and L. M. Jones (1989). Earthquake hazard after a mainshock in California, Science 243, 1173–1176.

Rhoades, D. A., and F. F. Evison (2004). Long-range earthquake forecasting with every earthquake a precursor according to scale, Pure Appl. Geophys. 161, no. 1, 47–72.

Scotti, O., S. Steacy, M. Cocco, J. Zahradnik, and J. McCloskey (2003). Coulomb stress modelling as a practical tool in real-time aftershock hazard assessment: the example of the PRESAP blind test (abstract), Eos Trans. AGU 84, no. 46 (Fall Meet. Suppl.), S31A-06.

Steacy, S., D. Marsan, S. S. Nalbant, and J. McCloskey (2004). Sensitivity of static stress calculations to the earthquake slip distribution, J. Geophys. Res. 109, B04303, doi 10.1029/2002JB002365.

Wells, D. L., and K. J. Coppersmith (1994). New empirical relationships among magnitude, rupture length, rupture width, rupture area, and surface displacement, Bull. Seism. Soc. Am. 84, 974–1002.

Wiemer, S. (2000). Introducing probabilistic aftershock hazard mapping, Geophys. Res. Lett. 27, 3405–3408.

Wiemer, S., and K. Katsumata (1999). Spatial variability of seismicity parameters in aftershock zones, J. Geophys. Res. 104, 13,135–13,151.

Zhuang, J., Y. Ogata, and D. Vere-Jones (2004). Analyzing earthquake clustering features by using stochastic reconstruction, J. Geophys. Res. 109, B05301, doi 10.1029/2003JB002879.




This article has been cited by other articles:


K. R. Felzer and D. Kilb
A Case Study of Two M~5 Mainshocks in Anza, California: Is the Footprint of an Aftershock Sequence Larger Than We Think?
Bulletin of the Seismological Society of America, October 1, 2009; 99(5): 2721–2735.


D. A. Rhoades and M. C. Gerstenberger
Mixture Models for Improved Short-Term Earthquake Forecasting
Bulletin of the Seismological Society of America, April 1, 2009; 99(2A): 636–646.


S. Hainzl, A. Christophersen, and B. Enescu
Impact of Earthquake Rupture Extensions on Parameter Estimations of Point-Process Models
Bulletin of the Seismological Society of America, August 1, 2008; 98(4): 2066–2072.


D. Schorlemmer, M. C. Gerstenberger, S. Wiemer, D. D. Jackson, and D. A. Rhoades
Earthquake Likelihood Model Testing
Seismological Research Letters, January 1, 2007; 78(1): 17–29.


R. Console, M. Murru, F. Catalli, and G. Falcone
Real Time Forecasts through an Earthquake Clustering Model Constrained by the Rate-and-State Constitutive Law: Comparison with a Purely Stochastic ETAS Model
Seismological Research Letters, January 1, 2007; 78(1): 49–56.


A. Helmstetter, Y. Y. Kagan, and D. D. Jackson
High-resolution Time-independent Grid-based Forecast for M ≥ 5 Earthquakes in California
Seismological Research Letters, January 1, 2007; 78(1): 78–86.


Y. Y. Kagan, D. D. Jackson, and Y. Rong
A Testable Five-Year Forecast of Moderate and Large Earthquakes in Southern California Based on Smoothed Seismicity
Seismological Research Letters, January 1, 2007; 78(1): 94–98.

