Friday, March 9, 2012

Three ways to read in a data file, then plot it [UPDATED: + 1!]

UPDATE #2:  There are now 5 methods that I've tried.  My clear favorite is Pandas Data Tables.  It's much more powerful than Astropy tables, and no harder to use.  Here it is:

import pandas
pspecs = pandas.read_table(thelist, delim_whitespace=True, comment="#")p.info()
pspecs.info
pspecs.keys()
pspecs.head()
pspecs.tail()


UPDATE:  Astropy has standardized this process, so I'm adding one more method.  Python4Astronomers recommends using this way.  An example:

from astropy.io import ascii
import matplotlib.pylab as plt 
tbl = ascii.read("simple_table.csv")
tbl.colnames  
print tbl['col1']
plt.plot(tbl['col1'], tbl['col2']
plt.show() 

Below is obsolete, and useful only to show how frustrating the standard Python tutorials can be for scientists.  They treat "read in a text file, and manipulate its data" as a curiosity, rather than our bread and butter.  If only "Python for Data Analysis" had been written when I started learning Python!


First, venting about tutorials:

  • I am ready to throw python out of a tall building.  The tutorials are terrible.  I don't care about tuples yet; I just want to load a text file, plot x vs y, and maybe flag some of the datapoints.  That's not so hard, is it?
  • The most relevant tutorial I've found is Using Python for Interactive Data Analysis  (from STScI).  But it assumes your input files are .FITS, which isn't helpful when you have simple .text files.  There's no section on "Reading in data from a text file".  Why not?
  • The matplotlib demos are even worse, because they auto-generate fake data.  Cute, but would it kill them to have an example of actually reading in real data?

So, sans tutorials, but with my wife holding my hand, I wrote a basic python script that reads in a text file, and plots the data.  It reads in the text file as perl would: line by line, laboriously, which I understand is not in the python spirit, but I wanted to see how to go line-by-line if need be.  So here is "simple-read-in-and-plot-demo.py":

# minimal plot of spectrum in python
import sys
import os
import numpy
from pylab import *
wavelength = []
flux = []
sigma = []
inE = 'infile.txt'
myfile = open(inE)
for line in myfile:
    if line.startswith('#'):
        continue
    else:
        line = line.strip() # like chomp in perl
        fields = line.split()
        wavelength.append(fields[0])
        flux.append(fields[1])
        sigma.append(fields[2])
for i in range(0,5):
    print wavelength[i], flux[i]
plot(wavelength, flux, linewidth=1.0)
xlabel('wavelength (Angstrom)')
ylabel('specific intensity)')
title('Please work')
grid(True)
show()
Victory!   That was extremely painful, but educational.


Next, let's try that again, but using the numpy function "loadtxt" to read in the spectrum.  (loadtxt is kinda like readcol in IDL.  Here's a nice tutorial on loadtxt.)  Way shorter code:
# minimal plot of spectrum in python, now with loadtxt
import sys
import os
import numpy
from pylab import *
inE = 'somedata.txt'
wavelength,flux,sigma = numpy.loadtxt(inE, comments="#", skiprows=3, usecols=(0,1,2), unpack=True)
plot(wavelength, flux)
xlabel('wavelength (Angstrom)')
ylabel('specific intensity)')
title('This was a bit easier')
show()
Yay for numpy.loadtxt!  You can set your delimiter and datatype, if you need to.  One word of advice:  that "unpack=True" bit is needed to write simple arrays, rather than weird multidimensional python arrays.  It's the ThippCrinkleSplort=no setting.

So example two made the same output, but with a much shorter, higher-level code.  Actually that wasn't that painful.  Wish I'd found loadtxt at the beginning!


Now, let's do this a third way, using the powerful, flexible, and intimidating package asciitable.  It supports all kind of astronomy tables, including tables from IRAF's daophot, CDS, latex, and even (shudder) fixed-width tables.  The default output is a NumPy record array:  "this powerful container efficiently supports both column-wise and row access to the table and comes with the full NumPy stack of array manipulation methods."  (Which sounds powerful, but I have no idea how to use it.  Of course the documentation for a record array is absolutely opaque for a newbie.)  But I figured out the simplest usage:

import asciitable
from pylab import *
inE = 'input.txt'
data = asciitable.read(inE)
plot(data.col1, data.col2)     # See?  columns are in col1, col2, etc...
show()

So that was a pretty rewarding exercise!  Took all fricking morning, but now I know how to read in ascii text files and do simple plotting.   (Next, I should read Perry's tutorial...)


 

6 comments:

  1. Hey Foobar, I've been struggling with python too, specifically with reading *.npz files. I was thinking I ought to make an entry on it. But I don't have a real blog right now. You want me to send it to you? Or can you support multiple bloggers here?

    ReplyDelete
  2. I am using mathplotlib to do some plotting..
    My task is to read data from file, and plot it. I am successful with normal data... But when my data has dates in format(3-Nov,4-Dec etc) i am struggling to plot it on graph... Any help??

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. what changes in code is required for scatter plot?
    I am very new to python! (3hrs old :P)

    ReplyDelete
  5. Oh! I found out!
    replaced the last two lines of the 3rd method with the following:
    matplotlib.pyplot.scatter(data.col1, data.col2) # See? columns are in col1, col2, etc...
    matplotlib.pyplot.show()
    also added these two lines in the beginning:
    import matplotlib.pyplot
    import pylab

    ReplyDelete