The Machine Learning Journey: 2012

Thursday, September 6, 2012

I don't know about you, but to me it sounds like a ripoff

Have you ever stopped to wonder why ink cartridges are so expensive?

The market explanation is that the cartridges are where the companies make their profit; they are probably losing money on each printer they sell.

Which raises the question: why are journals so expensive?

Have you tried downloading a paper from a publisher like Elsevier or the IEEE without being connected to a network that has a license for it?

The standard price of a paper will be about $30 USD. Which again raises the question: is it really that expensive to publish a paper?

Recently the IEEE answered my question by telling me that I (and only I) have the possibility of making my paper free for the world by paying a one-time fee of $3000 USD!!

Of course, your contract with the IEEE says that the exact camera-ready paper that you submitted is the one that they can make open access. If you change a single paragraph, you can actually upload it to your favorite server for little or no cost.

The IEEE says that the price, the $3000, is to cover the publishing costs. So they have answered my question: the publishing costs are $3000 USD, because they wouldn't think of using my money to make a profit, right?

So, what if I want to publish my own journal? The IEEE just told me that the price for each paper is $3000 USD, so publishing a single volume with 15 papers should cost $45,000 USD, right?

A single volume would be roughly equivalent to a book of about 120 pages, if each paper is 15 pages (because they print on both sides).

So, there we have it: it costs $45,000 USD to publish a full edition of a volume. Now, I do not know how many copies they release per volume, but let's do an estimate.

The market price of publishing your own book is about $5 USD for a paperback (150 color pages), so $45,000 USD gets you 9000 printed volumes.

And then we ask, do they print 9000 copies? Perhaps they do. I don't really know, and it's a number they don't seem to make public. If each university receives printed copies of the journals, they might well be over 9000 printed copies and their numbers would make sense. But on the other hand, if only the authors receive these copies, it's just a huge ripoff, since I hardly think that the hosting costs of the papers are over $270,000 USD (45,000 x 6 volumes per year).
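For anyone who wants to redo the back-of-the-envelope numbers, here is a small Python sketch. Every figure in it is one of the rough estimates from this post (the $3000 fee, 15 papers per volume, about $5 per printed copy, 6 volumes per year), not an official IEEE number, and the variable names are just mine.

```python
# Rough estimates taken from the post; none of these are official figures.
FEE_PER_PAPER_USD = 3000       # open-access fee quoted to the author
PAPERS_PER_VOLUME = 15         # assumed number of papers in one volume
PRINT_COST_PER_COPY_USD = 5    # assumed self-publishing price of one paperback
VOLUMES_PER_YEAR = 6           # assumed number of volumes per year

# Money collected if every paper in a volume pays the fee.
volume_budget = FEE_PER_PAPER_USD * PAPERS_PER_VOLUME

# How many printed copies that budget would pay for.
copies_covered = volume_budget // PRINT_COST_PER_COPY_USD

# The same budget over a whole year of volumes.
yearly_budget = volume_budget * VOLUMES_PER_YEAR

print(f"Fees per volume: ${volume_budget:,}")
print(f"Printed copies covered: {copies_covered:,}")
print(f"Fees per year: ${yearly_budget:,}")
```

Change any of the constants at the top if you have better numbers than mine.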

Friday, August 10, 2012

Bad Reviews and Trolling

"You weren't smart enough two capitalize Christian

The cosmos exist, and they have an obvious design.

God exists."

-Random Comment Found in YouTube

I recently tweeted that "Bad Reviews are like Trolling", and I kept thinking that such a thought deserved a bit of explaining, and that we might as well learn a bit about how to do a good review.

When someone asks you to review a paper for a journal or a conference, they expect you to devote an appropriate amount of time to reading, analyzing and reviewing the paper. The thing is, people might get 5 or 6 of these every month, and more in months when there is a conference looming. I have to say that I've received more bad reviews than good ones. I'm not saying I got rejected; I'm saying the reviews were lazy, badly written and obviously rushed. It is like YouTube comments: there are so many videos demanding your attention that you cannot be bothered to write a good, comprehensive review for each of them. So a lot of people do what is commonly known as trolling, that is, they just leave a negative comment without any suggestion or room for discussion.

Trolling usually is characterized by 3 things:

Bad grammar: trolling is obviously done without care, so there is no care in writing well either.
A negative comment, usually without any suggestion or room for discussion.
Disagreement based on a deep personal belief rather than on a well-informed and researched position.

Look at these three points again: a bad review of a paper usually has the same characteristics. Most reviewers won't even tell you what is wrong if they rejected it.

What is worse, most of the time a reviewer is both the pipeline worker and the final judge on whether a paper is accepted or rejected for a publication or a conference. It seems almost unfair that months of work get evaluated in a short burst by someone who might be unprepared or unwilling to do the job.

But not everything is lost. I've seen great reviews, with constructive criticism and always a chance to reply to the comments. Not only that, they are also written in pristine and clear English, so the review itself is not confusing to the authors.

Some suggestions you might like to follow when doing a review:

If you attack the author's spelling, try to give concrete examples of what you think is a mistake; perhaps the authors do not consider it one.
If you attack the author's methodology, try suggesting a better one, and point out the errors or the points you would change in what they are doing so far.
If you attack the idea in general: please do not do it. If in the end it was a good idea, you'll look foolish. Ideas are too personal, and a paper should never be rejected on the basis of "I did not like your idea".
If you attack the organization of the paper: perhaps you could suggest a better way to organize it, and give a reason or two why the paper should be organized the way you are suggesting.
If you have issues with the theory behind the paper, be clear in pointing out why you are not convinced, and point to references or proofs that the theory is wrong.

Remember that you are not on a review committee as an almighty god but rather as a humble quality control manager. Your job is to see that the work is not plagiarized, that the work makes scientific sense and that the work is readable for most of the audience. You are not an editor, so you do not get to impose your style of writing, and you are not a scientific adviser, so you are not to impose your scientific ideas.

Friday, June 22, 2012

My last seminar and Non Parametrics for the Lay Man

At the University of Tokyo, we have to present 3 seminars over the course of our PhDs: the first one is a survey, the second one is your midterm evaluation and the last one is another survey.

They are also called Rinko (輪講), which roughly translates as reading (or discussion) circle.

The format of the seminar is the following:

Presenters: 3 Presenters (PhD and Master students)

Audience: A room packed with about 70 students from different research groups. You have people from every background in Electrical Engineering and CS: information, semiconductors, power systems, computer science, robotics. You also have your professor and, usually, a couple of other professors who might or might not be related to your topic. They are the professors of the other 2 people presenting with you. The administration tries to pick professors from related fields.

Time: 25 minutes to present your slides
Materials: A paper-like document of at most 8 pages on your topic, plus the slides for your presentation.
Questions: 5-10 minutes at the end, either from the professors or the students.

Given this format, it is tricky to introduce a new topic like Non Parametric Bayesian methods. I had to decide between two options: spend my time trying to teach them the inner workings of things like the Dirichlet Process or the Gamma Process, and then try to explain how things like Naive Bayes or LDA benefit from them; or spend my time showing them some cool applications and areas where they could use these methods, and prepare an easy-to-read paper, not too deep and with lots of citations for them to look up if they were interested.

Needless to say, I went with the latter option, and here it is, my version of what I'd call Non Parametric Bayesian Methods for the layman.

This document is NOT FOR LEARNING Non Parametric Methods, but rather for seeing how you can use them and getting a friendly introduction to the topic. I introduced basic things about the DP and the IBP, but I did not mention things like inference or Gibbs Sampling.

If you wish to learn about the DP or the IBP, you can always go to the papers I cite. But I recommend that you do not use this as your main source of information; I know I wouldn't.

I would like to make this a living document, so if you have suggestions or ideas, I can always add them to the final paper. I left a ton of things out of the paper due to space constraints (8 pages). So send me an email or a tweet if you wish to add something or point out a typo; I'm sure it's full of those. @leonpalafox

Paper

Slides

Note: These slides and document are free for you to use, distribute and modify as you wish. If you want to give me a little credit, just point here or to my webpage, and that will be more than enough.

Cheers

Monday, May 28, 2012

Reading List for the ICML 2012

The list of accepted papers for ICML 2012 is out, and following some of my colleagues, I'll post the papers that caught my eye at first glance:

(Disclaimer: Since my research tends to be on nonparametric statistics, I tend to gravitate towards papers on those topics)

Gaussian Process Regression Networks
Andrew Wilson, David Knowles, Zoubin Ghahramani

Abstract: We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the nonparametric flexibility of Gaussian processes. GPRN accommodates input (predictor) dependent signal and noise correlations between multiple output (response) variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both elliptical slice sampling and variational Bayes inference procedures for GPRN. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on real datasets, including a 1000 dimensional gene expression dataset.

Quick Opinion: Based on the abstract and a quick reading of the arXiv version, this sure looks like a nice variation on Gaussian Processes. Merging Bayesian Neural Networks with GPs seems like a good idea for specific problems like Gene Regulatory Network inference, given that some people have been using Recursive Neural Networks for such tasks.

Modeling Images using Transformed Indian Buffet Processes
Ke Zhai, Yuening Hu, Jordan Boyd-Graber, Sinead Williamson

Abstract: Latent feature models are attractive for image modeling; images generally contain multiple objects. However, many latent feature models ignore that objects can appear at different locations, or require pre-segmentation of images. While the transformed Indian buffet process (tIBP) provides a method for modeling transformation-invariant features in simple, unsegmented binary images, in its current form it is inappropriate for real images because of computational constraints and modeling assumptions. We combine the tIBP with likelihoods appropriate for real images. We also develop an efficient inference scheme using the cross-correlation between images and features that is both theoretically and empirically faster than existing inference techniques. We demonstrate that, using our method, we are able to discover reasonable components and achieve effective image reconstruction in natural images.

Quick Opinion: I could not find the PDF, so I am going by the abstract alone. The paper seems pretty interesting, although I'm curious about how exactly they extended the tIBP with new likelihoods.

Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Brian Kulis, Michael Jordan

Abstract: Bayesian models offer great flexibility for clustering applications—Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for shared clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters. We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.

Quick Opinion: I think this extension was something that was missing in ML. I'm very intrigued by this paper in particular, since I remember reading about how k-means can be seen as a limiting case of a mixture of Gaussians with spherical covariances.
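To make the idea a bit more concrete, here is a minimal Python sketch of the kind of hard-clustering procedure the abstract describes, as I understand it: do k-means-style updates, but open a new cluster whenever a point is farther than a penalty `lam` (in squared distance) from every existing center, so the implied objective is the usual k-means cost plus `lam` for every cluster. The function names, the toy points and the value of `lam` are my own choices for illustration, and dropping empty clusters is my assumption, so treat this as a sketch and not as the authors' implementation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def dp_means_sketch(points, lam, max_iter=100):
    """k-means-like clustering where a far-away point starts a new cluster."""
    centers = [mean(points)]          # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:     # too far from every center: open a new cluster
                centers.append(p)
                best = len(centers) - 1
            new_labels.append(best)
        # Drop empty clusters (my assumption) and renumber the rest from 0.
        used = sorted(set(new_labels))
        new_labels = [used.index(label) for label in new_labels]
        centers = [
            mean([p for p, label in zip(points, new_labels) if label == k])
            for k in range(len(used))
        ]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = dp_means_sketch(toy_points, lam=9)
print(labels, len(centers))
```

Notice that the number of clusters is never passed in; it comes out of the choice of `lam`. A very large `lam` keeps everything in a single cluster, and a small one splits the data into many.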

Friday, March 30, 2012

Machine Learning and Memes

Inspired by this great post

This is a story of how our life through the peer reviewing process goes in the ML community.

First, we usually start looking for ideas to do new research on, and more often than not, I'm like this:

Then, I start reading papers on the topic, and find out a lot of papers like:

Then, I finally send a paper to a journal or conference to be peer reviewed and I find different types of reviewers:

The grammar defender, who will punish every little grammar mistake and recommend you have your paper checked by the editors of The New Yorker.

The organizer who wants to increase the perceived quality of the journal/conference by being overly strict when doing reviews.

The guy who knows nothing of your topic, but still, looks at the comparison tables with other works and asks:

The author of one of the papers you cite, who thinks no one but him/her has more authority on the topic:

But sometimes, you find someone who gives you some hope that this area may get better over time:

So, keep submitting until you are lucky enough that the three reviewers of your paper are like the last guy.

Good luck

Monday, February 27, 2012

An introduction to linear regression - Cost Function (ML for the Layman)

I've tried to keep away from posting tutorials on ML topics, mainly because I did not feel well prepared to do it yet. I hardly think I'm prepared now, but I can definitely give it a better shot, and with your feedback I can, at least, get an idea of where I can improve.

Disclaimer: Even though this introduction will be basic, and I mean as basic as it can get, you'll still need some knowledge of matrix operations, basic algebra and calculus to get through some of the explanations.

Imagine you want to sell your car. Let's say it is a 2007 Prius with 20,000 miles. It is in very good condition and you would like to do a survey of how much your car is worth on the market. As a seller, you definitely want to get the best price for it.

So, how do you price a car? If you know nothing about cars (like me), you can go to someone who has a better idea. In our case, that is the internet.

We can see that the price is set by things like the age, the maker, the mileage, the overall condition, etc. We will call these things "features". So a set of features is basically the characteristics of our car, or of any object.

So which features should we choose? For the sake of simplicity let's choose year and mileage. It's important to notice that there are whole research areas around how to choose features, but we are beginners here, so we do not care about that. It is important to know, however, that having many features is not always better than having few features.

Now that you have chosen a set of features, how do you compare one car to another? We go and look for data. For car comparison, we can go to different websites where you can see different combinations of these features for different cars, or to your local newspaper's classifieds, or to Craigslist.

Let's create a mock data set of 5 cars using only 2 features, year and mileage. All of them are Prius:

$year=(2007,2005,2006,2007,2010)$
$mileage=(50000, 60000,54000,40000,20000)$
$price=(12000,9700,10500,13000,20000)$

It is intuitive to think that the price of our car will be directly related to age and mileage. A low mileage and a recent year increase the price, while an old car with a high mileage has a lower price. So there should be an abstract way to write this relation.

To model this kind of data, we use linear regression, which states that a variable is the result of a linear combination of other variables. That is, our price actually follows some combination of mileage and year.

For the first element (12,000 USD for a 2007 model with 50,000 miles):

$12000=a_1*(50000)+a_2*(2007)$

A second characteristic is that every element has to share the same $a_1$ and $a_2$, so that we can use those values with our own Prius. There would be no point in finding individual values for each car when what we want is the best price for ours. So we can write a general equation:

$Price=a_1*mileage+a_2*year$
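If you prefer code to equations, the same model can be sketched as a tiny Python function. The name `predict_price` is just something I made up for this post; the formula is exactly the one above.

```python
def predict_price(a1, a2, mileage, year):
    """Linear model from the post: price = a1 * mileage + a2 * year."""
    return a1 * mileage + a2 * year


# First car in the mock data set: 50,000 miles, model year 2007,
# priced with the guess a1 = 1 and a2 = 1.
guess = predict_price(1, 1, 50000, 2007)
print(guess)  # 52007
```

The same function works for any car, which is the whole point of sharing $a_1$ and $a_2$ across the data set.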

How do we find the $a_1$ and $a_2$ that solve this model? How do I know that "1" and "1" are not good choices? Well, for starters, summing up the year and the mileage does not seem like a good idea.

We need some function that'll tell us how bad or how good our prediction is. This is usually called the "cost" function. How do we build a cost function? Well, our intuition tells us that we have to compare the truth with our guess, that is, subtract one from the other. So for our guess of "1" and "1" for $a_1$ and $a_2$:

$Cost=(12000-(1*50000+1*2007))$

We can see there is something off here: since we are looking for the minimum cost, we can make the values in $a$ large enough and get incredibly low (negative) costs. So we square the difference to get a nice function that has a lower bound (it never goes below a certain value).

$Cost^2=(12000-(1*50000+1*2007))^2$

We now have an intuition that the best we can do is $cost=0$: since a squared number can never be negative, we cannot do any better than that.

This cost, however, is only the cost for 1 example. We need the cost for all our cars, so we sum all of them:

$Total cost^2=\sum_{i=1}^5(Price_i-(1\times Mileage_i+1\times year_i))^2$
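Here is a short Python sketch of that sum for the guess of 1 and 1, using the mock data set from above. It keeps the squared error of each car separately before adding them up, so you can see which cars hurt the most. The variable names are my own.

```python
# Mock data set from the post (five Prius listings).
years = [2007, 2005, 2006, 2007, 2010]
mileages = [50000, 60000, 54000, 40000, 20000]
prices = [12000, 9700, 10500, 13000, 20000]

a1, a2 = 1, 1  # the guess used in the text

# Squared difference between the real price and the predicted price, per car.
squared_errors = [
    (price - (a1 * mileage + a2 * year)) ** 2
    for price, mileage, year in zip(prices, mileages, years)
]
total = sum(squared_errors)

print(squared_errors)
print(total)
```

The first entry of the list is the single-car cost we computed by hand a moment ago.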

We now have the intuition that our choice of 1 and 1 may not be a very good choice at all. Just for kicks, let's see what the cost would be for different values of $a_1$ and $a_2$:

a1, a2    Cost
0, 0      917,340,000
0, 1      675,707,259
1, 0      6,595,340,000
1, 1      7,252,615,259
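You do not have to trust my arithmetic. The following Python sketch recomputes the four costs from the mock data set; `total_cost` is a helper name I made up for this post.

```python
from itertools import product

# Mock data set from the post (five Prius listings).
years = [2007, 2005, 2006, 2007, 2010]
mileages = [50000, 60000, 54000, 40000, 20000]
prices = [12000, 9700, 10500, 13000, 20000]


def total_cost(a1, a2):
    """Sum of squared differences between real and predicted prices."""
    return sum(
        (price - (a1 * mileage + a2 * year)) ** 2
        for price, mileage, year in zip(prices, mileages, years)
    )


# Every combination of 0 and 1 for (a1, a2), in the same order as the table.
table = {guess: total_cost(*guess) for guess in product([0, 1], repeat=2)}
for (a1, a2), cost in table.items():
    print(f"{a1}, {a2}    {cost:,}")
```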

The costs are terrible! There is, however, something happening for "0,1", since the cost decreased compared to "0,0". But there has to be something better than just random guessing, right?

Our guess of 1,1 is just terrible. But we gained an intuition: we see that our solution has to be nearer to "0,1". Also, some of you may have noticed by now that no matter which values we put in, there is no way we are going to get a zero. It's just not possible. Not without some extra help, which we will call an offset.
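If you want to see this without doing the sums by hand, here is a brute-force Python sketch that tries a small grid of values around "0,1" and keeps the cheapest one. This is not real optimization, just organized guessing, and the grid limits and step sizes are arbitrary choices of mine.

```python
# Mock data set from the post (five Prius listings).
years = [2007, 2005, 2006, 2007, 2010]
mileages = [50000, 60000, 54000, 40000, 20000]
prices = [12000, 9700, 10500, 13000, 20000]


def total_cost(a1, a2):
    """Sum of squared differences between real and predicted prices."""
    return sum(
        (price - (a1 * mileage + a2 * year)) ** 2
        for price, mileage, year in zip(prices, mileages, years)
    )


# Hypothetical grid: a1 from -0.5 to 0.5 in steps of 0.05, a2 from 0 to 10.
a1_grid = [i / 20 for i in range(-10, 11)]
a2_grid = range(0, 11)

# Keep the pair with the lowest total cost.
best_a1, best_a2 = min(
    ((a1, a2) for a1 in a1_grid for a2 in a2_grid),
    key=lambda pair: total_cost(*pair),
)
best_cost = total_cost(best_a1, best_a2)

print(best_a1, best_a2, best_cost)
```

Whatever the grid finds, the best cost is lower than the one for "0,1" but still well above zero, which is the point made above.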

Next time, we'll talk about optimization, or how to stop you from trying every possible combination by hand, how to use the offset, and a bit about comparing pears and apples.

See you later

Remember to visit my webpage www.leonpalafox.com. And if you want to keep up with my most recent research, you can tweet me at @leonpalafox.
Also, check my Google+ account, be sure to contact me, and send me a message so I follow you as well.

Thursday, January 26, 2012

Geoff Hinton Memes

A week ago, Yann LeCun started an ML-related meme by writing the Geoff Hinton facts (à la Chuck Norris Facts).

I'm writing them here as they appear, and if you have more, please send me yours:

Geoff Hinton doesn't need to make hidden units. They hide by themselves when he approaches

Deep Belief Nets actually believe deeply in Geoff Hinton.

Geoff Hinton uses an infinite amount of training data for each experiment - twice.

Others prove theorems. Geoff Hinton proves axioms.

Geoff Hinton once built a neural network that beat Chuck Norris on MNIST.

Geoff Hinton discovered how the brain really works. Once each year for the last 25 years.

Markov random fields think Geoff Hinton is intractable.

If you defy Geoff Hinton, he will maximize your entropy in no time. Your free energy will be gone even before you reach equilibrium.

Geoff Hinton can make you regret without bounds.

Geoff Hinton can make your weight decay (your weight, but unfortunately not mine).

Geoff Hinton doesn't need support vectors. He can support high-dimensional hyperplanes with his pinky finger.

Geoff Hinton frequents Bayesians.

Most farm-houses are surrounded by nice fields. Geoff Hinton's house is surrounded by mean fields.

All kernels that ever dared approaching Geoff Hinton woke up convolved.

The only kernel Geoff Hinton has ever used was a kernel of truth.

After an encounter with Geoff Hinton, support vectors become unhinged

Geoff Hinton's generalizations are boundless.

Geoff Hinton goes directly to third Bayes.

Never interrupt one of Geoff Hinton's talks: you will suffer his wrath if you maximize the bargin'.