## 20110927

### Notes on Matlab's Parallel Tools

A couple of days ago I sat through an information session on Matlab's parallel tools. The session largely convinced me that Matlab's parallel tools are not worth using.

Basically, Matlab's simple parallel tools do not offer much benefit over conventional Matlab optimizations, and Matlab's advanced parallel tools are no easier to use than the tools in, say, Python, or other simple parallelism solutions.

Matlab's simple parallel tools (parfor) offer some advantage over unoptimized code, but not much. Parfor parallelizes loop execution on a single machine. However, the matrix operators that underlie even single-threaded Matlab execution are already parallelized through multithreaded implementations of BLAS. Therefore, most applications gain automatic single-machine, multi-core parallelism simply by vectorizing the code and expressing operations in terms of matrix operators.
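The same point carries over to NumPy, which I'll use for examples here. A small sketch (function names are mine, for illustration): both functions compute the same row norms, but the vectorized form hands the work to optimized, potentially multithreaded native routines instead of the interpreter.

```python
# Loop vs. vectorized computation of row norms. Both give the same answer,
# but the vectorized expression dispatches to optimized (often
# multithreaded) BLAS-style array routines.
import numpy as np

def row_norms_loop(A):
    # One row at a time: many small interpreted operations.
    out = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        out[i] = np.sqrt(np.sum(A[i] * A[i]))
    return out

def row_norms_vectorized(A):
    # A single array expression over the whole matrix.
    return np.sqrt((A * A).sum(axis=1))

A = np.random.rand(1000, 200)
assert np.allclose(row_norms_loop(A), row_norms_vectorized(A))
```

Timing the two (e.g. with `timeit`) typically shows the vectorized version winning by a wide margin, with no explicit parallel constructs involved.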

At an intermediate level, for embarrassingly parallel tasks, it is not difficult to script job submission and to do some weak communication through the filesystem. If you are even a vaguely competent programmer, this approach is far more pleasant than forking over $20K of your grant money for the distributed computing toolbox and then learning to use Matlab's distributed computing API.

Matlab's more advanced tools, such as distributed computing and CUDA interfaces, require additional training beyond typical Matlab use. I would argue that any programmer who has the ability to use open source equivalents would benefit from doing so. For an experienced programmer, using these features in Matlab is no more difficult than using existing open source solutions. Furthermore, if you are using an open source framework, you never have to worry about the cost of licenses, or that people will be unable to use your code because they cannot afford Matlab.

For me, the only benefit of Matlab is that more scientists seem to know it as their only programming language, and so it facilitates collaboration. The major (major major) downside of Matlab is that it is F*cking expensive, and so it is effectively an elitist platform confined to large universities and corporations; coding publicly funded science in Matlab therefore limits participation, and is in some sense unethical, in my opinion. But, to get back to the point, Matlab's advanced parallel tools are new, and require additional training to use. This means that parallel code in Matlab can't be as readily shared within the scientific community, and also that existing open source solutions actually have an older and larger body of users. Thus, the main advantage of Matlab is not present for its parallel toolboxes.

Finally, having used PyCUDA, I believe that the Python bindings for GPU computing are actually more advanced than those in Matlab, and the greater expressiveness of the Python language makes working with CUDA kernels much less of a hassle. Python's numerical routines are on par with Matlab's in terms of speed, and the development branch of Matplotlib is in some ways nicer than Matlab's plotting utilities.

I would argue that there is no longer any reason to use Matlab over Python. Of course, I am biased: I learned to use Pylab before I learned to use Matlab. I believe that the Python language is as easily learned as Matlab. Both Matlab and Python have quirks that will allow you to shoot yourself in the foot in terrible ways, and I'm not sure either is strictly better in this regard. Their core numerical and plotting routines are comparable. The only advantage Matlab has to offer is that more scientific code has already been written in Matlab for historical reasons. We can help to change this by favoring Pylab for developing new analysis code, and publishing our Python code to the wider scientific community. Andreas Klöckner sets an excellent example in this regard.

I should add that this is just my personal opinion, and that I have more experience using Python and CUDA, as well as scripting over clusters and using the filesystem for inter-process communication, than I do using Matlab's parallel tools. Therefore, I am naturally biased toward familiar, open, free, but perhaps slightly more time-consuming, solutions.

Labels:
Matlab,
Parallel Computing,
Python

## 20110721

### How to update your project website on Sourceforge

The Sourceforge website confuses me. I need these notes so I can remember how to maintain a project web page.

This link creates a new project

Pick a name less than 15 chars long, e.g. qwerasdfzxcv, and create a new project.

Go to http://nameofyourproject.sourceforge.net/

You will then find a link explaining how to update your project website.

Ideally I want to use sshfs to mount my project website. I'm not sure that's possible, but this page allegedly explains how to use SCP, which might be a good start.

OK, so the username for a Sourceforge project is "username,projectname".

The server you talk to in order to manage the website is web.sourceforge.net.

Your documents live on a path of the form /home/groups/q/qw/qwerasdfzxcv/htdocs/ ( first letter, first two letters, then the full project name ).

A typical SCP command might look like

scp FILE username,projectname@web.sourceforge.net:/home/groups/q/qw/qwerasdfzxcv/htdocs/

Is this enough information to sshfs mount your website ? Nope. It seems like it might not be possible. It looks like one can only SCP to or from this space, not interact with it over SSH. So, to modify this, you need to pull down the entire directory, make edits, and then put it back up.

To pull down the entire web space as a local directory htdocs

scp -r user,project@web.sourceforge.net:/home/groups/q/qw/qwerasdfzxcv/htdocs ./

Once you're done making changes, put it back

scp -r ./htdocs/* user,project@web.sourceforge.net:/home/groups/q/qw/qwerasdfzxcv/htdocs/
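Since I always forget the path scheme, here is a small hypothetical helper (names are mine) that builds the pull and push commands above from a username and project name. It only constructs the command strings; you would still run them through a shell yourself, or with subprocess.

```python
# Build the SourceForge pull/push scp command strings, so the long remote
# path only has to be written once.

def sf_htdocs(project):
    # Path scheme: first letter / first two letters / full project name.
    return "/home/groups/%s/%s/%s/htdocs" % (project[0], project[:2], project)

def pull_cmd(user, project):
    return "scp -r %s,%s@web.sourceforge.net:%s ./" % (user, project, sf_htdocs(project))

def push_cmd(user, project):
    # The * glob means this one must be run through a shell.
    return "scp -r ./htdocs/* %s,%s@web.sourceforge.net:%s/" % (user, project, sf_htdocs(project))

print(pull_cmd("username", "qwerasdfzxcv"))
# e.g. subprocess.run(pull_cmd(u, p), shell=True, check=True) to execute
```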

## 20110429

### Is Maximizing Mutual Information, or Information, Equivalent to the Maximum Coverage Problem ?

( Ignore, Personal notes )

I need some measure theory / analysis / math help to proceed :

update : This question has been answered in depth at MathOverflow

Say you have N cells. Say also that you would like to construct the best possible decoder using only K of these cells. How can we find these K best cells ?

This problem appears superficially similar to the maximum coverage problem.

(From Wiki) Formally, (unweighted) Maximum Coverage

Given a number $k$ and a collection of sets $S=S_1,S_2,\ldots,S_m$, where $S_i \subseteq \left\{e_1, e_2, \ldots, e_n \right\}$, find a subset $S^\prime \subseteq S$ of sets, such that $\left| S^\prime \right| \leq k$ and the number of covered elements $\left| \bigcup_{S_i \in S^\prime}{S_i} \right|$ is maximized.

A very hand-wavy argument :

We know that greedy selection approximates maximum coverage as well as can be hoped. Perhaps selecting the K best cells can be re-phrased as the maximum coverage problem ? Forget about mutual information for the moment and just consider entropy, or information. Given a number $k$ and a collection $S=S_1,S_2,\ldots,S_m$ of random variables, I would like to find a subset $S^\prime \subseteq S$ of variables such that $\left| S^\prime \right| \leq k$ and the joint entropy $H( S^\prime )$ is maximized.
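For reference, the standard greedy heuristic for (unweighted) maximum coverage is tiny; in the analogy here, "number of newly covered elements" would be replaced by "increase in joint entropy". A quick sketch of my own:

```python
# Greedy maximum coverage: repeatedly take the set that covers the most
# still-uncovered elements, choosing at most k sets.
def greedy_max_coverage(sets, k):
    covered, chosen = set(), []
    candidates = [set(s) for s in sets]
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda s: len(s - covered))
        if not best - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= best
        candidates.remove(best)
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}]
chosen, covered = greedy_max_coverage(sets, 2)
assert covered == {1, 2, 3, 4, 5, 6, 7}  # picks {4,5,6,7}, then {1,2,3}
```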

Entropy of random variables seems to behave, algebraically, like the magnitude of a set. There is probably some formal way of stating this, like "they are both ((some abstract algebraic structure here))" or something, but this is what I can say for now: let $S(x)$ be a hypothetical set representation of random variable $x$ such that $|S(x)|=H(x)$, i.e. the magnitude of $S(x)$ equals the entropy of $x$. Maybe $|S(x)|$ isn't really a magnitude but some other notion of "volume" ( needs details ).

Entropy is like the size of a set : $H(x)\sim|S(x)|$

Joint entropy is like union : $H(x,y)\sim|S(x)\cup S(y)|$

Mutual information is like intersection : $I(x;y)\sim|S(x)\cap S(y)|$

Conditional entropy is like set-minus : $H(x|y)\sim|S(x)\setminus S(y)|$

Repeatedly selecting the variable that would most increase joint entropy gives a greedy algorithm for finding $S^\prime$ with large joint entropy. Maybe a result analogous to "greedy selection is the best polynomial time approximation for maximum cover" holds for greedy maximization of joint entropy ?

Let's explore $H(x)\sim|S(x)|$ a bit more. Is it really possible to think of random variables as sets ?

Say I have a collection $X=x_1,x_2,\ldots,x_n$ of independent random variables with unit entropy $H(x_i)=1$. If I create a collection $S=S_1,S_2,\ldots,S_m\,\,st.\,\,S_i\subseteq X$ of subsets of $X$, I know that the joint entropy of each set equals its size $H(S_i)=|S_i|$, since each element $x\in S_i$ is an independent random variable with unit entropy. Under this construction, maximum cover for $S$ and maximizing the joint entropy of $\bigcup_{S_i\in S^\prime} S_i$ are equivalent.
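This construction is easy to check numerically for independent fair bits (a quick sketch of mine, in plain Python):

```python
# For n independent fair bits (unit entropy each), the joint distribution
# has 2^n equally likely outcomes, so the joint entropy is exactly n bits:
# H(S_i) = |S_i| under this construction.
import itertools, math

def joint_entropy(p):
    # Entropy in bits of a discrete distribution {outcome: probability}.
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def independent_fair_bits(n):
    outcomes = itertools.product([0, 1], repeat=n)
    return {o: 0.5 ** n for o in outcomes}

for n in range(1, 6):
    assert abs(joint_entropy(independent_fair_bits(n)) - n) < 1e-9
```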

It isn't clear that I can go in the reverse direction and argue that this equivalence holds for any collection of random variables.

super handwave ( I need some measure theory / analysis / math majors to proceed ) :

Consider some sort of limiting case where the sets are actually subsets of the reals ( with some nice property, like measurable, or compact, or something ) $S_i\subseteq \mathbb{R}$, so that you can have real-valued entropy corresponding to the volumes of these sets $H(S_i)=|S_i|$. The collection of disjoint sets that can be formed by union, intersection, and set-minus would give a collection of independent elements with which you could construct a weighted set-cover analogy of joint-entropy maximization. If I can get away with this, then I can also easily convert maximizing joint entropy to maximizing mutual information, by considering only the parts of $S_i$ that intersect with some target variable $A$, such that $I(A;S_i)=|S(A)\cap S(S_i)|$. If this is possible, it probably doesn't involve taking actual subsets of the reals, just proving things based on the algebraic behavior of information. Something like this has almost certainly been investigated somewhere; I just need to track it down and understand it.

## 20110428

### Mutual Information Does not Necessarily Predict Decoding Accuracy

( Ignore, personal notes )

Entropy and Mutual Information are increasingly used for analysis in Neuroscience. Frequently, spike times and continuous variables are binned into discrete processes. This avoids certain conceptual problems with reasoning about information on continuous variables, but can add its own complications. Mutual information between a spike train and a stimulus does not necessarily predict measures of decoding accuracy like correlation or root mean squared error (RMSE). While we may know that there are N bits of mutual information between a count process (derived from spiking data) and a discrete variable (derived from a continuous stimulus), we do not know how "important" this information is.

The "importance" of information is fixed by the experimenter, and may represent a prejudiced expectation of what the neuron encodes, or may represent a practical constraint of the experiment. For instance, in brain machine interface (BMI) research, we are interested in reconstructing from neural recordings how the arm moves (kinematics). A good decoding will be highly correlated with measured kinematics, or will minimize the RMSE between the decoding and the measurement. We place more importance on the information that pertains to large-amplitude kinematics, but the mutual information for a discrete random variable does not necessarily capture this importance.

This is intuitive when considering binary representations of integers. Consider two discrete random variables A and B that generate N bit integers. Say that K bits in A and B are always the same, and that the remaining N-K bits are independent. In this case, A and B will have K bits of mutual information $I(A;B)=K$. What do we know about $|A-B|$, the absolute difference between these two processes ?

If A and B share the highest order bits, then errors will be limited to the N-K lower order bits and the error is bounded as $|A-B|\sim O(2^{N-K})$. However, if A and B share their low order bits, while the high order bits are independent, then the magnitude of $|A-B|$ will scarcely differ from the case where all bits of A and B are independent. Mutual information does not tell you the magnitude of the impact of the shared information on the values of A and B.
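A quick simulation of this example (a sketch of mine): both conditions have $I(A;B)=K$ by construction, but the mean absolute error differs enormously depending on which bits are shared.

```python
# A and B are N-bit integers sharing K bits; compare mean |A - B| when the
# shared bits are the high-order bits vs. the low-order bits.
import random

def sample_pair(n_bits, k_shared, share_high):
    shared = [random.randint(0, 1) for _ in range(k_shared)]
    a_free = [random.randint(0, 1) for _ in range(n_bits - k_shared)]
    b_free = [random.randint(0, 1) for _ in range(n_bits - k_shared)]
    a_bits = shared + a_free if share_high else a_free + shared
    b_bits = shared + b_free if share_high else b_free + shared
    to_int = lambda bits: sum(b << (len(bits) - 1 - i) for i, b in enumerate(bits))
    return to_int(a_bits), to_int(b_bits)

def mean_abs_error(n_bits, k_shared, share_high, trials=2000):
    total = 0
    for _ in range(trials):
        a, b = sample_pair(n_bits, k_shared, share_high)
        total += abs(a - b)
    return total / trials

random.seed(0)
err_high = mean_abs_error(10, 5, share_high=True)   # bounded by 2^(10-5) = 32
err_low  = mean_abs_error(10, 5, share_high=False)  # nearly as bad as no sharing
assert err_high < err_low
```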

A population of cells in the brains of rats ( and probably primates ) has receptive fields that would seem to suffer from this decoding issue. Grid cells in the entorhinal cortex are known to encode a rat's position in the environment. These cells have spatially periodic receptive fields. If you were to listen to a grid cell, and place a point on a map representing the animal's location each time the cell fired, you would see a hexagonal pattern of "bumps". Different grid cells represent different spatial frequencies ( larger / smaller bumps ) or different phases ( slightly shifted hexagonal grids of bumps ). Decoding the position of an animal from grid cell activity is much like decoding the value of an integer from its binary representation.

Say I can record from some grid cells. If these cells contain a range of spatial scales, starting with the largest period and decreasing, I can figure out where the animal is. Start with the cell that has the largest period receptive field to exclude some areas of the room, and narrow down the position using cells with progressively finer spatial scales. However, if I start with the cells with small, high spatial frequency maps, I may be able to restrict the animal's location to a grid-like collection of possibilities, but this information is not particularly useful if the gross location of the animal is missing.

It is unclear to me whether decoding kinematics could suffer from a similar problem. At the very least, we know that the lower bounds on decoding accuracy for a given mutual information are quite bad, with the grid cell encoding as a worst case scenario. In practice, decoding accuracy might correlate quite well with mutual information.

It seems like some of my confusion might be cleared up by the concept of distortion in information theory. Apparently there is a relationship between distortion and mutual information that holds for both discrete and continuous random variables.

Particularly confused / speculative / handwave part :

Consider histograms with N equal-sized bins ( as opposed to equal-width bins ). These bins can be enumerated by N integers, each $\log_2(N)$ bits long. Each additional bit of information excludes half of the remaining bins. However, this half could be "all points larger than 10", or it might be "all points N s.t. N%2=0". When it comes to honing in on the position of a point in space, the former is more useful and will reduce error, while the latter barely reduces error at all.

Perhaps considering the information present in a collection of histograms, ordered from most to least coarse, could reveal the relative contribution of "bits" to different magnitudes of error ? But then, why not consider the Fourier decomposition in the stimulus-value domain, or all arbitrary partitions of the sample space ? When the sample space is continuous, one might be tempted to take ever finer partitions, which should cause the entropy of the distribution to diverge. At this point, differential entropy looks like it might be much better than discrete entropy for some applications.

Labels:
information theory,
math,
neuroscience,
speculation

## 20110413

### Some Functions

(ignore this, personal notes)

I keep forgetting this and having to re-derive it, so I'm putting it up here. Exponential decay is a very simple response function. To a first approximation, the response function of a synapse might be modeled as an exponential. The exponential isn't perfect, however, since synapses don't react immediately. Sometimes we use a function of the form $t\exp(-t)$, which has a rising phase as well as a falling phase. This function is actually the result of convolving two exponential decay functions together. Generally, the family $t^n \exp(-t)$ captures the idea of having $n$ feed-forward variables coupled by exponential decay, $X_n'=X_{n-1}-X_n$. However, I keep forgetting the appropriate constants to convert between the system of differential equations and the family of response functions. I'm not going to derive that right now, but rather show a series of different parameterizations of the response functions, so I can remember how the response changes as $n$ varies.

The integral of $t^n \exp(-t)$ actually gets larger for larger $n$. To keep the integral 1, divide by $n!$.

The peak response of $t^n \exp(-t)$ arrives at time $t=n$. Rescale time with $t:=nt$ to get a peak response at $t=1$.

Rescaling time changes the integral again, but this is easily fixed by multiplying by $n$.
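These normalization facts are easy to verify numerically (a quick sketch of mine, plain Python):

```python
# Check: (1) the integral of t^n e^(-t) over [0, inf) is n! (the Gamma
# function at n+1), and (2) the peak of t^n e^(-t) is at t = n.
import math

def f(t, n):
    return t ** n * math.exp(-t)

def integral(n, T=60.0, steps=120000):
    # Midpoint rule on [0, T]; the tail beyond T is negligible for small n.
    dt = T / steps
    return sum(f((i + 0.5) * dt, n) for i in range(steps)) * dt

for n in range(1, 6):
    assert abs(integral(n) / math.factorial(n) - 1) < 1e-4
    # Setting the derivative n t^(n-1) e^-t - t^n e^-t to zero gives t = n.
    assert f(n, n) >= f(n - 0.01, n) and f(n, n) >= f(n + 0.01, n)
```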

Notice how the curves are getting progressively more smooth for larger $n$. I wonder what this looks like on a log axis ?

So, this family of functions appears to tend toward a log-Gaussian in the limit. I don't know if that's particularly meaningful.

## 20110410

## 20110223

### ZSH Colored prompt

Just a quick post because this was driving me crazy: how do you get a colored prompt in ZSH on Ubuntu?

Various suggestions online failed for me. The escape sequences for getting colors seemed to vary wildly between sources, and none of them worked.

I followed the instructions here for loading one of the default prompts, e.g.:

```
autoload -U promptinit
```

Then I typed

```
promptinit
prompt -l
prompt bigfade
echo $PROMPT
```

to see what the proper escape sequences were (they didn't look like anything provided online), e.g.:

```
%B%F{blue}█▓▒░%B%F{white}%K{blue}%n@%m%b%k%f%F{blue}%K{black}░▒▓█%b%f%k%F{blue}%K{black}█▓▒░%B%F{white}%K{black} %D{%a %b %d} %D{%I:%M:%S%P}
```

So, apparently %B gives you bold, %F{colorname} sets the foreground color, and %b and %f return these to defaults? Anyway, I finally got my simple blue prompt:

```
%}%B%F{yellow}%K{black}/home/mrule>%b%f%k
%B%F{blue}%~$ %b%f
```

## 20110217

### Willmore et al. — Neural Representation of Natural Images in Visual Area V2 (2010)

Ben D. B. Willmore, Ryan J. Prenger, and Jack L. Gallant — Neural Representation of Natural Images in Visual Area V2 doi:10.1523/JNEUROSCI.4099-09.2010

The authors explore response properties in area V2 using natural images. The spatial but not the temporal statistics of natural images are present in their stimuli. They use the Berkeley Wavelet Transform as a non-linear preprocessing step to generate components to relate to V2 responses by reverse correlation. Phases are quadrature encoded. The authors find that the BWT fits V1 and some V2 cells well with only a few excitatory components. Other cells in V2 are a mixture of several wavelet components as well as suppression from other components. Qualitatively, it appears that components are summed to create robust, broad-band edge detectors in V2. The authors note that inhibition plays a large role in the receptive field properties and that natural input statistics are important for recruiting realistic inhibition.

Labels:
neuroscience,
papers

## 20110212

## 20110205

### 3D Printed Polyhedral Lamp

These are instructions for building a very bright lamp with 20 bulbs and a truncated icosahedral core. [Thingiverse entry].

Parts :

- 1 × pentagonal hook http://www.thingiverse.com/thing:6117 "p hook 3"
- 12 × basic pentagons http://www.thingiverse.com/thing:5961 "p" ( pentagon )
- 20 × hexagonal lamp brackets http://www.thingiverse.com/thing:6055
- 20 × lamp sockets Cooper Wiring 732-BOX Sign/Scoreboard Lampholder
- 20 × low wattage CFL bulbs, like these
- 1 × electrical plug SA440-BKCC10 15A black nonpolar plug or equivalent ( any plug will do )
- 16 feet of 12 to 16 gauge lamp cord
- 6 feet of rope or chain
- 2 × (optional) rope holder thing http://www.thingiverse.com/thing:6149

Materials:

- electrical tape
- super-glue ( I used Gorilla brand )

Tools:

- Pliers
- 3D printer
- razor knife
- wire cutters
- wire strippers
- Phillips-head screwdriver

Assembly:

First print out the indicated quantity of all printed parts.

More detailed assembly instructions for the lamp socket brackets can be found on the thingiverse page. Trim the bracket until the black socket rests flush inside. This is important, since we need the hexagonal cover plate to bond to both the bracket and the socket for a good fit.

The orientation of the socket within the bracket will matter later. The socket has a wide ridge. Align this ridge with a side of the bracket for 10 pieces. Align the ridge with a corner of the bracket for the other 10. Aligning randomly also works, as long as you don't align all sockets so that the wide parts face a side.

Print out 12 pentagonal pieces. All pieces have extra plastic to stabilize the hinge while printing. This can be removed easily with a razor knife.

Perform a test assembly with just the hexagonal pieces. Leave out the pentagons for now since they are hard to remove once assembled. Ensure that all light sockets fit properly and don't collide. You may have to experiment, rotating and swapping between pieces, to get everything to fit well. If all else fails you can tap apart one of the brackets and re-orient it.

Carefully unfold your test assembly into an as-linear-as-possible planar arrangement like below. The exact arrangement doesn't really matter, just so long as there isn't too much branching.

The lamp sockets clip onto 12 to 14 gauge electrical wire. The only 12 gauge wire I could find had insulation too thick to work with these sockets, so I used 16 gauge wire instead, which just barely works. Using scissors or a knife, separate one end of the lamp cord. Protect the ends with electrical tape. Starting at the far end, clamp the sockets to the cable in turn. The sockets are difficult to close; I had to use pliers to get enough force.

Before you get excited and attach the plug to test everything, slide the pentagonal hook piece over the cable. The top of the printed piece should face away from the assembly, toward the plug. I neglected to do this, and had to disassemble my plug to add this piece.

To assemble the plug, use needle-nose pliers to remove the orange stopper from the front of the plug. Remove the prongs. Thread the lamp cord through. Split and strip about 13mm from the end of each wire. Wrap the exposed wire around the bolts attached to the prongs, and tighten the bolts well. Replace the prongs and stopper.

Test each of your sockets. Turn everything over and plug in some lightbulbs. I did it the dangerous way, adding and removing bulbs (I only had 2 at the time) while the thing was plugged in. People who don't want to die should unplug the setup while moving the bulbs. Better yet, order the bulbs with the rest of your parts and put them all in at once to test.

The next step is tricky. Unplug the setup and remove the bulbs. Turn over the setup. You are going to need to fold the pieces back into the polyhedral shape. The lamp cord is inflexible and resists folding, but bending each joint beforehand helps. Adding in the pentagons while folding provides more stability. As the polyhedron becomes more complete, it becomes more difficult to add pieces. If you're having trouble getting a hinge to mate, pry up slightly the side that is already in the polyhedron. The hinges come together more easily if pushed together from the side, rather than if pushed down from above.

When it was all done, the compressed cable overpowered the super-glue on a couple of brackets; thankfully, this mistake is easily fixed with more super-glue and some patience. You should end up with an object that looks more than a little bit like the detonation mechanism for an atomic bomb. The final assembly is very strong, and the hinges will hold together without additional glue.

The last piece you'll insert is the one that contains the power cord and the rope or chain for hanging the lamp. Attach the rope or chain before you add this piece. Don't use polypropylene rope like I did; it doesn't hold knots. A chain would look nicer anyway.

That's it. You're done. Hang the lamp somewhere, insert bulbs, and power up your own miniature sun.

Labels:
makerbot,
Projects,
very bright lights

## 20110203

### Lighting Element

I plan on assembling 20 of these in a soccer ball shape, to create a ridiculously luminous lamp, which I'm going to call "modern sun" (even though that has been done). It will wake me up naturally in the morning to combat this grey New England sky.

Labels:
3D printing,
elements,
lighting,
makerbot,
thingiverse,
things

## 20110202

### A few questions

Every now and then, I get confused by the conversion of spike trains to continuous signals. A lot of existing signal processing algorithms have assumptions that make them unsuitable for spike-like data. To circumvent this, we usually smooth or bin the spike trains in some way. This conversion is always arbitrary, and I still haven't wrapped my head around what it means when we do it.

In an effort to learn more about this, I asked myself (and some friends) three questions:

Given a continuous signal and a spike source encoding part of said input, how do you optimally reconstruct the continuous signal from the spikes ?

Given the same as above, how might you optimally reconstruct the spikes from the signal ?

Given a single-channel analog data stream, how might you best encode it in a neural spike train ?

While it's possible to write down mathematical expressions for all of the above, we don't really have algorithms to optimize or solve the raw expressions.

We can do a couple of things instead :

Given a continuous signal and a spike source encoding part of said input, how do you optimally reconstruct the continuous signal from the spikes ?

- Reverse correlation can give you a reconstruction of a continuous stimulus or response based on spikes. Similar but more advanced algorithms also exist to perform this conversion. Of the three questions, I believe this one is most nearly solved.

Given the same as above, how might you optimally reconstruct the spikes from the signal ?

- Rather than model spikes, model the conditional intensity function (the variable rate in a Poisson process). This has the benefit of capturing the variability of the spiking response. Individual spike trains can be drawn from this conditional intensity function. If your spike trains are highly reliable, the conditional intensity function will become obviously spike-like, and this will effectively reconstruct spike times. If your spike trains are unreliable, the conditional intensity will look more like a smoothly fluctuating rate. There is a lot of flexibility in how you fit this model, and algorithms for "point process modeling" are the subject of current research.

Given a single-channel analog data stream, how might you best encode it in a neural spike train ?

- Perform a decomposition with a sparseness constraint in both space and time. This will give you a basis whose functions are sparsely and briefly activated. Conceivably, you could convert a time course in the new basis to spikes with less error than, say, assuming rate coding. This research sits closer to machine learning; specifically, Olshausen has done work in this area.
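
As a concrete illustration of the second answer, here is a minimal sketch (my own, with made-up names) of drawing a spike train from a conditional intensity function by a Bernoulli approximation in small time bins:

```python
import random

def sample_spikes(rate, duration, dt=0.001, seed=0):
    # Draw one spike train from a conditional intensity function rate(t), in Hz:
    # in each small bin the spike probability is approximately rate(t) * dt.
    rng = random.Random(seed)
    spikes = []
    for i in range(int(duration / dt)):
        t = i * dt
        if rng.random() < rate(t) * dt:
            spikes.append(t)
    return spikes
```

A sharply peaked rate function yields reliable, spike-like output; a smooth rate yields variable trains, matching the distinction drawn above.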

Labels:
neuroscience,
notes

## 20110201

### Improved Polyhedron Kit

This kit uses the same type of snap at all locations, so there's no puzzle of figuring out where to put snaps when assembling the polyhedron.

Labels:
elements,
makerbot,
thingiverse,
things

## 20110131

### Collins Assisi & al. Using the Structure of Inhibitory Networks to Unravel Mechanisms of Spatiotemporal Patterning

Collins Assisi, Mark Stopfer, Maxim Bazhenov (2011) Using the Structure of Inhibitory Networks to Unravel Mechanisms of Spatiotemporal Patterning, DOI 10.1016/j.neuron.2010.12.019 [via]

Initially I was quite excited by this paper. I was a little disappointed that only a qualitative description of the results was provided. The figures were remarkably clear.

The way they constructed graphs with a given k-coloring was just to use bipartite, tripartite, ..., k-partite graphs to explore 1, 2, ..., k colorings. This didn't seem all that biologically plausible to me, but they do some controls suggesting that the results will generalize.
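
For concreteness, a k-partite construction like the one they describe can be sketched as follows (my own toy code, not theirs):

```python
def k_partite_adjacency(k, group_size):
    # Mutual inhibition between any two neurons in *different* groups;
    # each group is an independent set, so the graph is k-colorable by construction.
    n = k * group_size
    group = [i // group_size for i in range(n)]
    return [[int(group[i] != group[j]) for j in range(n)] for i in range(n)]
```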

Real neural networks probably have connectivity such that the actual number of colors required to color the graph is huge, and that in practice there aren't really as many oscillatory subpopulations as there are colors. They address this with their "multiple coloring" explanation.

They note that as the number of colors increased, the orderly activity decreased. It sounds like the timescale of adaptation determines this: you need the k colorings to be able to form a cycle that fits within the natural wavelength of slow inhibitory processes.

I wonder if the sets of neurons with identical color in this paper are related to the potential wells seen in Tkačik & al. In Tkačik & al. the wells are stable minima for spontaneous activity. In Assisi & al., if you turn off adaptation, single-color clusters are also the stable minima.

I would be interested in seeing a more theoretical analysis using LIF, theta, or rate-based neurons.

Labels:
neuroscience,
papers

### Breakout Elements

Hello. Now that I've got my MakerBot in practically working order, I'm thinking of posting one thing every week. Maybe even more frequently than that. Since I just started, I have a backlog of things I've designed.

Today's thing is "Breakout Elements". I designed this script to make snap-together brackets for Sparkfun breakout-boards. This thing also sets a new standard in terms of snap dimensions and distances.

Labels:
breakout,
elements,
makerbot,
thingiverse,
things

## 20110114

### Fake Dynamic DNS, SSH, BASH, and Dropbox

I recently wanted to set up remote access to my home computer, which doesn't have a stable IP address. Rather than do the sane thing and set up dynamic DNS, I decided to experiment with an ad-hoc solution involving Dropbox.

In brief: the home computer writes its current IP address to a Dropbox folder, and scripts on remote computers reference this information to log in remotely.

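The remote side can then be a one-liner; for example, a hypothetical helper (the names and the username are mine, not from the original scripts) that turns the synced file's contents into an ssh invocation:

```python
def ssh_command(ip_file_contents, user="mrule"):
    # Build the ssh invocation from the IP address the home machine
    # last wrote into the Dropbox-synced file. The username is a placeholder.
    ip = ip_file_contents.strip()
    return f"ssh {user}@{ip}"
```
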
### Hatsopoulos &al — Encoding of Movement Fragments in the Motor Cortex (2007)

Nicholas G. Hatsopoulos, Qingqing Xu, and Yali Amit, Encoding of Movement Fragments in the Motor Cortex, The Journal of Neuroscience, May 9, 2007 - 27(19)

This paper explores the spatio-temporal response fields of M1 neurons. The authors were able to reconstruct spiking activity based on movement trajectory information within -100 to +300 ms. They use the average area under the receiver operating characteristic (ROC) curve for predicted spikes to quantify the predictive power of their model.
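
The ROC area they use has a convenient rank-based form; a minimal sketch (my own implementation, not the authors' code):

```python
def roc_auc(scores, labels):
    # Area under the ROC curve equals the probability that a randomly chosen
    # positive (spike) outscores a randomly chosen negative (no spike),
    # counting ties as 1/2 (the Mann-Whitney U formulation).
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 is chance; 1.0 means predicted spike probabilities perfectly separate spikes from non-spikes.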

Labels:
neuroscience,
paper

## 20110113

### Vangeneugden & al. — Distinct Mechanisms for Coding of Visual Actions in Macaque Temporal Cortex (2011)

Joris Vangeneugden, Patrick A. De Maziere, Marc M. Van Hulle, Tobias Jaeggli, Luc Van Gool, and Rufin Vogels — Distinct Mechanisms for Coding of Visual Actions in Macaque Temporal Cortex, The Journal of Neuroscience, January 12, 2011 - 31(2)

The authors explore the coding of action perception, specifically walking, in extrastriate cortex. They used stick-silhouette figures, and used measured neural responses as inputs to a linear classifier in an attempt to see how well the neurons encoded the stimuli.
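
As a toy version of that kind of read-out, a nearest-centroid rule (which is a linear classifier for two classes) could look like this; entirely my own sketch, not the authors' method:

```python
def nearest_centroid(train, labels, x):
    # Classify x by the closest class mean -- for two classes this decision
    # boundary is a hyperplane, i.e. a simple linear read-out.
    classes = sorted(set(labels))
    def centroid(c):
        pts = [v for v, l in zip(train, labels) if l == c]
        return [sum(col) / len(pts) for col in zip(*pts)]
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(classes, key=lambda c: dist2(centroid(c), x))
```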

Labels:
neuroscience,
paper

### Nelissen & al. — Charting the Lower Superior Temporal Region, a New Motion-Sensitive Region in Monkey Superior Temporal Sulcus (2006)

Koen Nelissen, Wim Vanduffel, and Guy A. Orban — Charting the Lower Superior Temporal Region, a New Motion-Sensitive Region in Monkey Superior Temporal Sulcus, The Journal of Neuroscience 26(22)

This paper explores motion sensitive areas in the superior temporal sulcus (STS) using fMRI. Their approach is to present a diversity of stimuli with different properties, then look at the pairwise differences in fMRI activation between each type of stimulus.

Labels:
neuroscience,
paper

## 20110112

### Learning Transformational Invariants from Natural Movies

Cadieu & Olshausen, 2009, Learning Transformational Invariants from Natural Movies

My favorite thing about this paper is its cool spatiotemporal eigenfunction video. Although their mathematical methods don't relate directly to biology, their model provides a new way of thinking about separation of form and motion information in early visual processing.

Their model has two layers, the first of which resembles basis-function learning explored in previous papers, and the second of which learns a sparse representation of motion based on modulations of spatial basis functions. They aim to extract transformation invariants : the features common to moving objects that depend entirely on the movement, and very little on the form of the object.

Labels:
neuroscience,
paper

## 20110111

### Pietro Berkes &al. — Spontaneous Cortical Activity Reveals Hallmarks of an Optimal Internal Model of the Environment (2011)

Pietro Berkes, Gergő Orbán, Máté Lengyel, József Fiser (2011) Spontaneous Cortical Activity Reveals Hallmarks of an Optimal Internal Model of the Environment, Science Vol. 331 no. 6013 pp. 83-87, DOI: 10.1126/science.119587

One theory for early sensory cortex is that the brain learns the statistics of the environment, and predicts incomplete information about the environment in a Bayesian optimal manner.
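
The "Bayesian optimal" prediction in that sentence is just Bayes' rule; as a minimal discrete sketch (mine, not from the paper):

```python
def posterior(prior, likelihood):
    # Bayes' rule over a discrete hypothesis space: P(h | d) is proportional
    # to P(d | h) * P(h), renormalized to sum to one.
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```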

Labels:
bayesian,
information theory,
neuroscience,
paper,
statistics,
summary

## 20110110

### M. Szabo, R. Almeida, G. Deco, M. Stetter — A neuronal model for the shaping of feature selectivity in IT by visual categorization (2005)

M. Szabo, R. Almeida, G. Deco, M. Stetter (2005) A neuronal model for the shaping of feature selectivity in IT by visual categorization, Neurocomputing 65–66 (2005) 195–201

The authors propose that this effect might be explained by interaction between IT and another part of the brain, speculatively prefrontal cortex (PFC), a region implicated in attention and judgement. A competing hypothesis would be that area IT changes how it interprets visual input, rather than relying on such top-down judgements. It is possible that both mechanisms are at play, but on different timescales. However, this paper explains experimental data using only the attention model.

I think I will start cross-posting here my contributions to http://pqdb.org/posts/, at least those that relate to my honest research efforts (as opposed to baseless speculation).

## 20110104

### Subject 3, Trial 4, "Walk on uneven terrain"

CMU motion capture dataset

Subject 3, Trial 4, "Walk on uneven terrain": 100 overlaid walkers

scale: Uniform(0.4, 1.0); offset: Gaussian(μ = 0, σ = [1 2 3 4 6 8])

Labels:
dots,
motion capture,
rendered,
science,
stimuli
