Tuesday, October 22, 2013

Why should textbooks be free?

Since I started studying, I've noticed something really strange: textbooks and science books are surprisingly expensive.

Some books that I would consider essential for an engineering undergrad, like Oppenheim and Willsky's book on Signals and Systems, go well above 150 USD. I've even seen biological sciences books go for way more than that.

Furthermore, it is unlikely you will use Oppenheim's book for the whole length of an undergraduate course, and unless you start doing research in that area, it is very likely you will never touch it again in your entire life. There is always the possibility of renting the book, which goes for about 99 USD on Amazon.

Scientific books are thought to be expensive because there is a great deal of research behind them: tons of money are invested so the book can be written, and a professor (who is already being paid) has to devote some time to writing it. There is also a proofreading process (which is sometimes done by undergrad students who are likewise already being paid).

Economically speaking, it just does not make any sense: the US government is most likely paying many of the grad students, postdocs and professors writing the book via grant money. The professor may or may not get an advance on the book, and the royalties are around 10% of the cost of the book, which means that Oppenheim and Willsky together should get around 15 bucks for every copy sold.

Also, it is not like scientific books earn you big bucks. Mandatory books like S&S may bring in good money, but most likely you won't see a lot of royalty money, especially for highly topical books used in advanced graduate courses.

Then the next question is: why charge for it at all? Originally it made sense, since printing was the only way to communicate and teach new scientific ideas, and printing is an expensive process. However, the internet brought that cost down. If you are living in the internet age and you are writing a book because you want to educate people, there is no good reason you cannot give your book away for free. With 10% royalties you are clearly not getting rich, and a thousand copies can be distributed with the click of a mouse.

You can always publish it and expect that some people will buy it in print (I know I still do sometimes), but I do believe it is a researcher's duty to let people access the contents of the book freely.

Why? For one, the money used to develop the knowledge that goes into the book is most likely taxpayers' money. The money given to you so you can have a hefty team of undergrads, grads and postdocs is probably also taxpayers' money. And the fact that you have students going out of their way to help write a book might actually hurt them in their pursuit of a graduate degree.

And finally, I believe that as an educator, the ultimate goal of a professor should be to educate as many people as they can reach. If their objective is to make money, they are probably in the wrong business anyway. The main question I like to ask is: do you care that people pay for your book, or do you care that people read your book? If the answer is the former, you probably do not mind that the IEEE and other publishing houses charge 20 bucks for an 8-page article.

Luckily I'm not alone on this, and many great Machine Learning professors have made their books freely available on the internet. I do believe it is possible to get a great ML education based only on free books, although some of the best ones are still not available for free download.

Thursday, May 23, 2013

Stationarity in ML Data (An EEG application)

A stationary process in statistics is one whose distribution does not change over time or space. In layman's terms, this just means that if we measure the data today and test it tomorrow, the underlying distribution should remain more or less the same.

This is a fundamental concept in Machine Learning. Stationarity allows us to train models today and expect them to work on data gathered in the future. Sure, models get retrained constantly, but most of them assume that all of the data comes from a stationary process. There is no point in modeling your data with some parameters today (mu and sigma if it is Gaussian) if you expect tomorrow's parameters to be wildly different. However, much of the data out there is non-stationary.
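To make that concrete, here is a tiny sketch on synthetic data (an illustration I made up, not anything from a real experiment) of what happens when you fit a trivial threshold classifier on "day 1" and the distribution has drifted by "day 2":

```python
# A minimal sketch (synthetic data) of why a model fit on "day 1" data
# can degrade when the distribution drifts on "day 2".
import numpy as np

rng = np.random.default_rng(0)

# Day 1: two classes with Gaussian features
day1_a = rng.normal(loc=0.0, scale=1.0, size=1000)
day1_b = rng.normal(loc=3.0, scale=1.0, size=1000)

# A trivial classifier: threshold at the midpoint of the day-1 class means
threshold = (day1_a.mean() + day1_b.mean()) / 2

# Day 2: same classes, but the whole distribution has shifted (non-stationarity)
day2_a = rng.normal(loc=1.5, scale=1.0, size=1000)
day2_b = rng.normal(loc=4.5, scale=1.0, size=1000)

acc_day1 = np.mean(np.concatenate([day1_a < threshold, day1_b >= threshold]))
acc_day2 = np.mean(np.concatenate([day2_a < threshold, day2_b >= threshold]))
print(f"day-1 accuracy: {acc_day1:.2f}, day-2 accuracy: {acc_day2:.2f}")
```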

Think about training a robot to follow a path based on images taken in spring, and then trying to have the robot follow the same path in winter. We ran into this issue a lot when working on the Tsukuba Challenge project: they allowed you to collect data in the summer, but the competition was in the fall, when the trees had no leaves left to show.

Interesting problems like these arise in many other areas, such as computer vision, where we would like to think that the objects we train on are not rotated or otherwise transformed, when in reality they are. For a more extensive review of CV approaches to this, you can check LeCun's Convolutional Neural Networks.

Some of the most interesting problems in Neuroscience are also non-stationary (we like to think the signals are stationary, but they aren't). EEG readings that we take today are often affected by many environmental and subject-dependent conditions. Not to mention that reading EEG from the human scalp is not an exact science. The whole process tends to be messy, time consuming and difficult to replicate.

One cool approach to deal with this is to transform the data in such a way that you obtain invariant features. For example, in a CV system, instead of measuring the size and color of a circle you could measure its radius and circumference, and if those parameters follow the circle's circumference equation, you could be confident it was a circle regardless of scale or position. Convolutional NNs do something of the sort with input images. You could also do extensive training, which means training with every possible transformation of the data.
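As a toy illustration of that circle example (nothing more than that), here is a sketch that decides whether a set of contour points looks like a circle by comparing its perimeter to its mean radius, a quantity that does not care about scale, rotation or position:

```python
# Toy sketch of the "invariant feature" idea: for any circle,
# circumference / radius is approximately 2*pi, no matter its size or where it sits.
import numpy as np

def looks_like_a_circle(points, tol=0.05):
    """points: (N, 2) array of contour points of a closed shape."""
    center = points.mean(axis=0)
    radii = np.linalg.norm(points - center, axis=1)
    # perimeter approximated by summing distances between consecutive points
    perimeter = np.linalg.norm(np.diff(points, axis=0, append=points[:1]), axis=1).sum()
    ratio = perimeter / radii.mean()
    return abs(ratio - 2 * np.pi) / (2 * np.pi) < tol

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = 5.0 * np.column_stack([np.cos(theta), np.sin(theta)]) + [10, -3]
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(looks_like_a_circle(circle), looks_like_a_circle(square))  # True False
```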

In EEG we try to use frequency analysis, since (in theory) it tends to be more reliable than the raw time series. A recent paper in the MIT Journal of Neurocomputation has a great introduction to this topic, and to how non-stationarity is attacked using things like stationary feature extraction and adaptive model algorithms.
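Band power features are the usual way this frequency analysis shows up in practice. Here is a minimal sketch using SciPy's Welch estimator on a synthetic signal (the band names and limits are the usual EEG conventions, not something taken from the paper):

```python
# Frequency-domain (band power) features for an EEG-like signal, on synthetic data.
import numpy as np
from scipy.signal import welch

fs = 250.0                       # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
# synthetic "EEG": a 10 Hz (alpha) oscillation buried in noise
signal = np.sin(2 * np.pi * 10 * t) + rng.normal(scale=1.0, size=t.size)

freqs, psd = welch(signal, fs=fs, nperseg=512)

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
df = freqs[1] - freqs[0]
for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    power = psd[mask].sum() * df   # approximate integral of the PSD over the band
    print(f"{name}: {power:.3f}")
```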

Stationary feature extraction is the jargon for what I described before with CV: many people use things like Common Spatial Patterns, which tend to remove much of the EEG non-stationarity and leave us with nice stationary features to use with our favorite ML algorithm.
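For reference, here is a bare-bones sketch of the textbook CSP recipe (the generalized eigenvalue formulation) on synthetic two-class epochs; the actual pipelines used in the literature differ in the details:

```python
# Bare-bones Common Spatial Patterns on epochs of shape (n_trials, n_channels, n_samples).
import numpy as np
from scipy.linalg import eigh

def csp_filters(epochs_a, epochs_b, n_filters=4):
    def mean_cov(epochs):
        return np.mean([np.cov(e) for e in epochs], axis=0)  # channel covariance per trial

    c_a, c_b = mean_cov(epochs_a), mean_cov(epochs_b)
    # generalized eigenvalue problem: c_a w = lambda (c_a + c_b) w
    eigvals, eigvecs = eigh(c_a, c_a + c_b)
    # keep filters with the smallest and largest eigenvalues (most discriminative)
    order = np.argsort(eigvals)
    picks = np.concatenate([order[: n_filters // 2], order[-n_filters // 2:]])
    return eigvecs[:, picks].T                                # (n_filters, n_channels)

def csp_features(epochs, filters):
    # log-variance of the spatially filtered signals, the usual CSP feature
    filtered = np.einsum("fc,tcs->tfs", filters, epochs)
    return np.log(filtered.var(axis=2))

rng = np.random.default_rng(2)
a = rng.normal(size=(30, 8, 250))
b = rng.normal(size=(30, 8, 250)) * 1.5                       # different variance profile
W = csp_filters(a, b)
print(csp_features(a, W).shape)                               # (30, 4)
```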

Adaptive model algorithms are those that update the model's parameters in subsequent recording sessions. As users get accustomed to having their EEG readings plotted in front of them, they also tend to learn how to control them better, and adaptive algorithms are used to address this source of non-stationarity. Think of it as a videogame that learns your behavior as you get better at playing it, and can react better to your inputs.
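One very simple flavor of adaptation, just to fix ideas (and not the method from the paper), is to keep an exponentially weighted running estimate of each class and let new session data gently shift it:

```python
# A toy adaptive classifier: running class means that are updated as new labelled
# trials arrive, so the decision boundary follows the drifting sessions.
import numpy as np

class AdaptiveMeanClassifier:
    def __init__(self, alpha=0.1):
        self.alpha = alpha            # how fast old sessions are forgotten
        self.means = {}

    def partial_fit(self, x, label):
        x = np.asarray(x, dtype=float)
        if label not in self.means:
            self.means[label] = x
        else:
            # exponential moving average: new data gently shifts the class mean
            self.means[label] = (1 - self.alpha) * self.means[label] + self.alpha * x

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda lbl: np.linalg.norm(x - self.means[lbl]))

clf = AdaptiveMeanClassifier()
clf.partial_fit([0.0, 0.0], "rest")
clf.partial_fit([1.0, 1.0], "move")
print(clf.predict([0.9, 0.8]))        # "move"
```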

The approach in the aforementioned paper is interesting in that it uses a really simple statistical concept, the Kullback-Leibler (KL) divergence, which is a fancy term for a measure of how much two probability distributions differ.
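For two Gaussians the KL divergence has a closed form, which is what makes the Gaussian assumption so convenient later on. A quick sketch:

```python
# Closed-form KL divergence between two multivariate Gaussians.
import numpy as np

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """KL( N(mu_p, cov_p) || N(mu_q, cov_q) ) in nats."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

mu1, cov1 = np.zeros(2), np.eye(2)                  # "day 1" feature distribution
mu2, cov2 = np.array([0.5, -0.2]), 1.5 * np.eye(2)  # shifted "day 2" distribution
print(kl_gaussian(mu1, cov1, mu2, cov2))            # > 0, grows with the mismatch
```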

They assume that if you have a training session recorded on day 1, and a small sample of data from day 2, you can use the KL divergence to measure how different the day-1 and day-2 distributions are, and build a linear transformation such that the information from day 1 remains relevant to the data you will obtain on day 2.
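I am not reproducing the paper's exact construction here, but the general flavor of such a linear map can be sketched like this: whiten the data with the day-1 statistics and re-color it with the statistics estimated from the small day-2 sample, which for Gaussians drives the KL divergence between the two distributions to zero:

```python
# Sketch of a linear alignment between day-1 and day-2 Gaussian statistics
# (my own illustration of the general idea, not the paper's method).
import numpy as np
from scipy.linalg import sqrtm

def align(day1_data, day2_sample):
    mu1, cov1 = day1_data.mean(axis=0), np.cov(day1_data, rowvar=False)
    mu2, cov2 = day2_sample.mean(axis=0), np.cov(day2_sample, rowvar=False)
    # A = cov2^{1/2} cov1^{-1/2} maps N(mu1, cov1) onto N(mu2, cov2)
    A = np.real(sqrtm(cov2) @ np.linalg.inv(sqrtm(cov1)))
    return (day1_data - mu1) @ A.T + mu2

rng = np.random.default_rng(3)
day1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=500)
day2 = rng.multivariate_normal([1, -1], [[2, 0.0], [0.0, 0.5]], size=50)
aligned = align(day1, day2)
print(np.cov(aligned, rowvar=False).round(2))       # close to the day-2 covariance
```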

The rest of the paper goes on about how to obtain these transformation matrices using different flavors of this approach: one where the day-2 labels are available, and one where they aren't.

The method looks eerily similar to what you would do in a state model, where you try to approximate (predict) the values of the next state given the previous state's parameters, and then do an update once you can read the data for the next state.
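To make that analogy concrete, here is a stripped-down scalar predict/update loop in the Kalman-filter style, which is my own toy illustration rather than anything from the paper:

```python
# Scalar predict/update loop: predict the next session's parameter from the
# current estimate, then correct it once new data is observed.
import numpy as np

def predict_update(observations, process_var=0.1, obs_var=1.0):
    x, p = 0.0, 1.0                       # initial state estimate and its variance
    estimates = []
    for z in observations:
        # predict: state assumed to carry over, uncertainty grows
        p = p + process_var
        # update: blend the prediction with the new observation
        k = p / (p + obs_var)             # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

rng = np.random.default_rng(4)
true_drift = np.linspace(0, 2, 20)        # a slowly drifting "session parameter"
obs = true_drift + rng.normal(scale=0.5, size=20)
print(np.round(predict_update(obs), 2))
```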

The paper goes the safe route and assumes both distributions are Gaussian. I can think of a nice extension where you only approximate them as Gaussian, but in reality allow different probability distributions to model the shape of the data; with simple Gaussian approximations that should not be so hard.

The paper is nice, but it is still in preprint, so you will need a university account to access it, and maybe a couple of tricks from your university's library.