ObsPy Tutorial
Handling Waveform Data
</div> </div> </div> image: User:Abbaszade656 / Wikimedia Commons / CC-BY-SA-4.0

Workshop for the "Training in Network Management Systems and Analytical Tools for Seismic"

Baku, October 2018

Seismo-Live: http://seismo-live.org

Authors:

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = 12, 8

  • read waveform data is returned as a Stream object.
In [2]:
from obspy import read
st = read("./data/waveform_PFO.mseed", format="mseed")
print(st)
2 Trace(s) in Stream:
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
  • UNIX wildcards can be used to read multiple files simultaneously
  • automatic file format detection, no need to worry about file formats

    • currently supported: MiniSEED, SAC, SEG-Y, SEG-2, GSE1/2, Seisan, SH, DataMark, CSS, wav, y, Q, ASCII, Guralp Compressed Format, Kinemetrics EVT, K-NET&KiK formats, PDAS, Reftek 130, WIN (keeps growing...)
    • more file formats are included whenever a basic reading routine is provided (or e.g. sufficient documentation on data compression etc.)
    • custom user-specific file formats can be added (through plug-ins) to be auto-discovered in a local ObsPy installation
In [3]:
from obspy import read
st = read("./data/waveform_*")
print(st)
8 Trace(s) in Stream:
GR.BFO..BHE   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
GR.BFO..BHN   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
GR.BFO..BHZ   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
SY.PFO.S3.MXE | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXN | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXZ | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
  • for MiniSEED files, only reading short portions of files without all of the file getting read into memory is supported (saves time and memory when working on large collections of big files)
In [4]:
from obspy import UTCDateTime

t = UTCDateTime("2011-03-11T05:46:23.015400Z")
st = read("./data/waveform_*", starttime=t + 10 * 60, endtime=t + 12 * 60)
print(st)
8 Trace(s) in Stream:
GR.BFO..BHE   | 2011-03-11T05:56:23.021088Z - 2011-03-11T05:58:23.021088Z | 20.0 Hz, 2401 samples
GR.BFO..BHN   | 2011-03-11T05:56:23.021088Z - 2011-03-11T05:58:23.021088Z | 20.0 Hz, 2401 samples
GR.BFO..BHZ   | 2011-03-11T05:56:23.021088Z - 2011-03-11T05:58:23.021088Z | 20.0 Hz, 2401 samples
II.PFO.00.BHZ | 2011-03-11T05:56:23.019500Z - 2011-03-11T05:58:23.019500Z | 20.0 Hz, 2401 samples
II.PFO.10.BHZ | 2011-03-11T05:56:23.019500Z - 2011-03-11T05:58:23.019500Z | 40.0 Hz, 4801 samples
SY.PFO.S3.MXE | 2011-03-11T05:56:23.084250Z - 2011-03-11T05:58:23.078750Z | 6.2 Hz, 744 samples
SY.PFO.S3.MXN | 2011-03-11T05:56:23.084250Z - 2011-03-11T05:58:23.078750Z | 6.2 Hz, 744 samples
SY.PFO.S3.MXZ | 2011-03-11T05:56:23.084250Z - 2011-03-11T05:58:23.078750Z | 6.2 Hz, 744 samples

The Stream Object

  • A Stream object is a collection of Trace objects
In [5]:
from obspy import read
st = read("./data/waveform_PFO.mseed")
print(type(st))
<class 'obspy.core.stream.Stream'>
In [6]:
print(st)
2 Trace(s) in Stream:
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
In [7]:
print(st.traces)
[<obspy.core.trace.Trace object at 0x11d7c6940>, <obspy.core.trace.Trace object at 0x11d9b2710>]
In [8]:
print(st[0])
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
  • More traces can be assembled using + operator (or using the .append() and .extend() methods)
In [9]:
st1 = read("./data/waveform_PFO.mseed")
st2 = read("./data/waveform_PFO_synthetics.mseed")

st = st1 + st2
print(st)

st3 = read("./data/waveform_BFO_BHE.sac")

st += st3
print(st)
5 Trace(s) in Stream:
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
SY.PFO.S3.MXE | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXN | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXZ | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
6 Trace(s) in Stream:
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
SY.PFO.S3.MXE | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXN | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXZ | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
GR.BFO..BHE   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
  • convenient (and nicely readable) looping over traces
In [10]:
for tr in st:
    print(tr.id)
II.PFO.00.BHZ
II.PFO.10.BHZ
SY.PFO.S3.MXE
SY.PFO.S3.MXN
SY.PFO.S3.MXZ
GR.BFO..BHE
  • Stream is useful for applying the same processing to a larger number of different waveforms or to group Traces for processing (e.g. three components of one station in one Stream)

The Trace Object

  • a Trace object is a single, contiguous waveform data block (i.e. regularly spaced time series, no gaps)
  • a Trace object contains a limited amount of metadata in a dictionary-like object (as Trace.stats) that fully describes the time series by specifying..
    • recording location/instrument (network, station, location and channel code)
    • start time
    • sampling rate
In [11]:
st = read("./data/waveform_PFO.mseed")
tr = st[0]  # get the first Trace in the Stream
print(tr)
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
In [12]:
print(tr.stats)
         network: II
         station: PFO
        location: 00
         channel: BHZ
       starttime: 2011-03-11T05:46:23.019500Z
         endtime: 2011-03-11T06:36:22.969500Z
   sampling_rate: 20.0
           delta: 0.05
            npts: 60000
           calib: 1.0
         _format: MSEED
           mseed: AttribDict({'dataquality': 'M', 'number_of_records': 32, 'encoding': 'STEIM1', 'byteorder': '>', 'record_length': 4096, 'filesize': 356352})
  • For custom applications it is sometimes necessary to directly manipulate the metadata of a Trace.
  • The metadata of the Trace will stay consistent, as all values are derived from the starttime, the data and the sampling rate and are updated automatically
In [13]:
print(tr.stats.delta, "|", tr.stats.endtime)
0.05 | 2011-03-11T06:36:22.969500Z
In [14]:
tr.stats.sampling_rate = 5.0
print(tr.stats.delta, "|", tr.stats.endtime)
0.2 | 2011-03-11T09:06:22.819500Z
In [15]:
print(tr.stats.npts)
60000
In [16]:
tr.data = tr.data[:100]
print(tr.stats.npts, "|", tr.stats.endtime)
100 | 2011-03-11T05:46:42.819500Z
  • convenience methods make basic manipulations simple
In [17]:
tr = read("./data/waveform_PFO.mseed")[0]
tr.plot()
Out[17]:
In [18]:
print(tr)
tr.resample(sampling_rate=100.0)
print(tr)
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:23.009500Z | 100.0 Hz, 300000 samples
In [19]:
print(tr)
tr.trim(tr.stats.starttime + 12 * 60, tr.stats.starttime + 14 * 60)
print(tr)
tr.plot()
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:23.009500Z | 100.0 Hz, 300000 samples
II.PFO.00.BHZ | 2011-03-11T05:58:23.019500Z - 2011-03-11T06:00:23.019500Z | 100.0 Hz, 12001 samples
Out[19]:
In [20]:
tr.detrend("linear")
tr.taper(max_percentage=0.05, type='cosine')
tr.filter("lowpass", freq=0.1)
tr.plot()
Out[20]:
In [21]:
# try tr.<Tab> for other methods defined for Trace
tr.detrend?
In [22]:
print(tr.data[:20])
[  0.00000000e+00   1.14214131e-11   1.36532184e-10   8.27107696e-10
   3.42897869e-09   1.10577664e-08   2.98034791e-08   7.02499050e-08
   1.49294808e-07   2.92257039e-07   5.35256796e-07   9.27855448e-07
   1.53594149e-06   2.44484947e-06   3.76269890e-06   5.62394072e-06
   8.19309900e-06   1.16686963e-05   1.62873519e-05   2.23280418e-05]
  • Data can be directly modified e.g. ..

..by doing arithmetic operations (fast, handled in C by NumPy)

In [23]:
print(tr.data ** 2 + 0.5)
[  5.00000000e-01   5.00000000e-01   5.00000000e-01 ...,   2.19703659e+10
   2.19251720e+10   2.18793418e+10]

..by using numpy.ndarray builtin methods (also done in C by NumPy)

In [24]:
print(tr.data.max())
print(tr.data.mean())
print(tr.data.ptp())
# try tr.data.<Tab> for a list of numpy methods defined on ndarray
560511.615943
6903.56237783
1074087.25528

..by using numpy functions (also done in C by NumPy)

In [25]:
import numpy as np
print(np.abs(tr.data))
# you can try np.<Tab> but there is a lot in there
# try np.a<Tab>
[  0.00000000e+00   1.14214131e-11   1.36532184e-10 ...,   1.48224039e+05
   1.48071510e+05   1.47916672e+05]

..by feeding pointers to existing C/Fortran routines from inside Python!

This is done internally in several places, e.g. for cross correlations, beamforming or in third-party filetype libraries like e.g. libmseed.

  • Trace objects can also be manually generated with data in a numpy.ndarray (e.g. when needing to parse waveforms from non-standard ascii files)
In [26]:
from obspy import Trace

x = np.random.randint(-100, 100, 500)
tr = Trace(data=x)
tr.stats.station = "XYZ"
tr.stats.starttime = UTCDateTime()

tr.plot()
Out[26]:
  • Stream objects can be assembled from manually generated Traces
In [27]:
from obspy import Stream

tr2 = Trace(data=np.random.randint(-300, 100, 1000))
tr2.stats.starttime = UTCDateTime()
tr2.stats.sampling_rate = 10.0
st = Stream([tr, tr2])

st.plot()
Out[27]:

Builtin methods defined on Stream / Trace

  • Most methods that work on a Trace object also work on a Stream object. They are simply executed for every trace. See ObsPy documentation for an overview of available methods (or try st.<Tab>).
    • st.filter() - Filter all attached traces.
    • st.trim() - Cut all traces.
    • st.resample() / st.decimate() - Change the sampling rate.
    • st.trigger() - Run triggering algorithms.
    • st.plot() / st.spectrogram() - Visualize the data.
    • st.attach_response()/st.remove_response(), st.simulate() - Instrument correction
    • st.merge(), st.normalize(), st.detrend(), st.taper(), ...
  • A Stream object can also be exported to many formats, so ObsPy can be used to convert between different file formats.
In [28]:
st = read("./data/waveform_*.sac")
st.write("output_file.mseed", format="MSEED")
!ls -l output_file*
-rw-r--r--  1 lion  staff  737280 Nov  4 09:56 output_file.mseed

Trace Exercises

  • Make an numpy.ndarray with zeros and (e.g. use numpy.zeros()) and put an ideal pulse somewhere in it
  • initialize a Trace object with your data array
  • Fill in some station information (e.g. network, station, ..)
  • Print trace summary and plot the trace
  • Change the sampling rate to 20 Hz
  • Change the starttime of the trace to the start time of this session
  • Print the trace summary and plot the trace again
In [29]:
x = np.zeros(300)
x[100] = 1.0
tr = Trace(data=x)
tr.stats.station = "ABC"
tr.stats.sampling_rate = 20.0
tr.stats.starttime = UTCDateTime(2014, 2, 24, 15, 0, 0)
print(tr)
tr.plot()
.ABC.. | 2014-02-24T15:00:00.000000Z - 2014-02-24T15:00:14.950000Z | 20.0 Hz, 300 samples
Out[29]:
  • Use tr.filter(...) and apply a lowpass filter with a corner frequency of 1 Hertz.
  • Display the preview plot, there are a few seconds of zeros that we can cut off.
In [30]:
tr.filter("lowpass", freq=1)
tr.plot()
Out[30]:
  • Use tr.trim(...) to remove some of the zeros at start and at the end
  • show the preview plot again
In [31]:
tr.trim(tr.stats.starttime + 3, tr.stats.endtime - 5)
tr.plot()
Out[31]:
  • Scale up the amplitudes of the trace by a factor of 500
  • Add standard normal gaussian noise to the trace (use np.random.randn())
  • Display the preview plot again
In [32]:
tr.data = tr.data * 500
tr.data = tr.data + np.random.randn(len(tr))
tr.plot()
Out[32]:

Stream Exercises

  • Read all Tohoku example earthquake data into a stream object ("./data/waveform_*")
  • Print the stream summary
In [33]:
st = read("./data/waveform_*")
print(st)
8 Trace(s) in Stream:
GR.BFO..BHE   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
GR.BFO..BHN   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
GR.BFO..BHZ   | 2011-03-11T05:46:23.021088Z - 2011-03-11T06:36:22.971088Z | 20.0 Hz, 60000 samples
II.PFO.00.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.969500Z | 20.0 Hz, 60000 samples
II.PFO.10.BHZ | 2011-03-11T05:46:23.019500Z - 2011-03-11T06:36:22.994500Z | 40.0 Hz, 120000 samples
SY.PFO.S3.MXE | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXN | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
SY.PFO.S3.MXZ | 2011-03-11T05:47:31.587750Z - 2011-03-11T06:36:22.974250Z | 6.2 Hz, 18152 samples
  • Use st.select() to only keep traces of station BFO in the stream. Show the preview plot.
In [34]:
st = st.select(station="BFO")
st.plot()