One of the problems you might run into early is that not all market data is clean. When I worked at Chase Manhattan Bank in the mid-1990s, I was on a project where we had to 'scrub' the data because we were building a huge data warehouse of price data so the bank could price every single financial instrument it owned. It contained prices for everything from wheat to stocks to bonds to airplanes. I was the GUI project leader, responsible for building the Java-based tools that risk management used to update the prices.
The methodology we used was a standard industry technique: run a window forward through the data (take, for example, 200 values at a time) and compare each new value against the mean and standard deviation of that window. If a value was more than one standard deviation away from the rest of the data, it was suspect. If it was several standard deviations away, it was very suspect.
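That rolling-window check can be sketched in a few lines. This is a minimal illustration of the technique described above, not the bank's actual code; the function name, the 200-value window, and the 1-sigma threshold are illustrative defaults I've chosen, not anything from the original system.

```python
import numpy as np

def flag_outliers(prices, window=200, threshold=1.0):
    """Flag values that sit more than `threshold` standard deviations
    away from the mean of the preceding `window` values.

    Returns a list of (index, price, z_score) for suspect points;
    the larger the z-score, the more suspect the value.
    """
    prices = np.asarray(prices, dtype=float)
    suspects = []
    for i in range(window, len(prices)):
        ref = prices[i - window:i]            # trailing window of prior values
        mean, std = ref.mean(), ref.std()
        if std == 0:                          # flat window: no basis for a z-score
            continue
        z = abs(prices[i] - mean) / std
        if z > threshold:
            suspects.append((i, prices[i], z))
    return suspects
```

In practice you would tune the window length and threshold to the instrument: one standard deviation flags a lot of ordinary noise, so flagged points usually go to a human (or a second filter) for review rather than being dropped automatically.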
The rest I can't disclose because I signed a non-disclosure agreement, but suffice it to say that when you're forecasting, you want to scrub the data and look for outliers.
I especially had this problem when trading Forex, and later with some data feeds, because there would be huge outliers and run-ups or run-downs, especially at times when the major exchanges aren't open. For example, on Sunday afternoon and early evening Pacific time, Tokyo is just coming online and there isn't a lot of trading going on. During those hours there can be dramatic up and down jumps that show up badly when you start to do calculations. Filtering these out with data scrubbing can help things dramatically -- otherwise your indicators can show wild jumps.
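One simple way to scrub such a tick series before feeding it to indicators is to replace isolated spikes with a local median. This is a minimal sketch of that idea under my own assumptions (a small trailing window and a 3-sigma cutoff); it is not the exact filter used in any system mentioned above.

```python
import numpy as np

def clean_spikes(prices, window=5, threshold=3.0):
    """Replace isolated spikes with the median of a small trailing
    window. Values within `threshold` standard deviations of the
    local median are left untouched.
    """
    prices = list(map(float, prices))
    cleaned = prices[:]
    for i in range(window, len(prices)):
        ref = cleaned[i - window:i]           # use already-cleaned history
        med = sorted(ref)[len(ref) // 2]      # local median
        spread = np.std(ref)
        if spread > 0 and abs(prices[i] - med) / spread > threshold:
            cleaned[i] = med                  # suspect tick: damp it to the median
    return cleaned
```

Using the already-cleaned history for the window means one bad tick can't poison the statistics used to judge the ticks that follow it, which matters in thin off-hours trading where spikes tend to cluster.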