# Size's PE data analysis


I have made some small inroads into analyzing Size’s PE data. The analysis so far is very basic because of the unsophisticated programs I am forced to use. I hate Quattro Pro. Excel is OK for basic statistical analysis, but Quattro Pro is worse: after making a scatter plot of the data, I could not even figure out how to change the size of the data points, and it cannot do two-dimensional histogram plots at all. How ridiculous.

What I want to do first is make a probability density estimate of the data generator. That is, the data points we have are samples from the “data generator”, which represents the “truth” about PE as best we can know it. I want a density plot of this “data generator function”.

I need MATLAB at least. I will have to figure out some way of getting it.
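Until MATLAB turns up, a density estimate like this can be sketched in plain Python. The sketch below is a standard two-dimensional Gaussian kernel density estimate: one Gaussian per data point, averaged so that the surface integrates to 1. It assumes the data is a list of (weeks, inches gained) pairs; the function names and sample values are hypothetical, and the per-axis bandwidth is Scott’s rule of thumb (sample standard deviation times n^(-1/6) in two dimensions, ignoring any correlation between the axes), a common default rather than anything tuned for Size’s data.

```python
import math

def scott_bandwidths(points):
    """Per-axis bandwidths from Scott's rule, h = std * n**(-1/(d+4)) with d = 2.

    Needs at least two points (the sample variance divides by n - 1).
    """
    n = len(points)
    factor = n ** (-1.0 / 6.0)
    bandwidths = []
    for axis in range(2):
        values = [p[axis] for p in points]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
        bandwidths.append(math.sqrt(variance) * factor)
    return bandwidths

def kde_2d(points, x, y, hx, hy):
    """Density estimate at (x, y): the average of one 2D Gaussian per data point."""
    norm = 1.0 / (2.0 * math.pi * hx * hy)
    total = 0.0
    for px, py in points:
        zx = (x - px) / hx
        zy = (y - py) / hy
        total += norm * math.exp(-0.5 * (zx * zx + zy * zy))
    return total / len(points)

# Made-up (weeks, inches gained) samples, for illustration only.
samples = [(4, 0.25), (8, 0.5), (8, 0.4), (12, 0.75), (22, 1.5)]
hx, hy = scott_bandwidths(samples)
print(kde_2d(samples, 8, 0.5, hx, hy))
```

Evaluating `kde_2d` over a grid of (weeks, inches) values gives the surface to feed into a contour or 3D plot.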

Here is the stupid plot of the data in the most basic form possible. The yellow points are a linear regression of the data. Unfortunately, I doubt QuattroPenis preserved the multiplicity of the data points, so the regression is flawed.

Ugh, I need to stop this Quattro Pro nonsense else I will kill something.
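Whether the multiplicity matters is easy to check. The sketch below is a hypothetical plain-Python weighted least-squares fit of y = a + b·x on made-up (weeks, inches) values. Fitting every original point gives the same line as fitting the unique points weighted by how often each occurs, whereas simply dropping the duplicates changes the slope. So a regression computed on the original data is unaffected by how the plot draws overlapping points.

```python
from collections import Counter

def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    w_sum = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

# Made-up (weeks, inches) data in which one point occurs three times.
data = [(4, 0.25), (8, 0.5), (8, 0.5), (8, 0.5), (12, 0.5), (16, 1.0)]

# 1. Fit on every original point (each weight 1).
xs, ys = zip(*data)
fit_all = weighted_linear_fit(xs, ys, [1] * len(data))

# 2. Fit on unique points only: what you get if multiplicity is lost.
unique = sorted(set(data))
ux, uy = zip(*unique)
fit_unique = weighted_linear_fit(ux, uy, [1] * len(unique))

# 3. Fit on unique points weighted by multiplicity: matches fit 1.
counts = Counter(data)
fit_weighted = weighted_linear_fit(ux, uy, [counts[p] for p in unique])

print(fit_all, fit_unique, fit_weighted)
```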

Hi Tube,

Try Mathematica. It’s really one of the best pieces of software ever made. OK, it’s a little tough to learn for people who haven’t used a scripting language before, but with Mathematica almost any kind of data handling is possible.
I’ve done some data analysis on Size’s data myself with Mathematica. Maybe I should post some plot screenshots too.

I prefer MATLAB for data analysis; as far as statistical analysis goes, it can do anything Mathematica can and more. For instance, it can use artificial neural networks for pattern recognition. I’d take Mathematica if I could get it, though. I am going to have to figure out how and where to get either one.

Good work Tube, you just made my favorites. :up:

I really like the linear regression, but yeah, better software would make it easier to read and understand. Have you looked for the software you want to use on eMule?

I haven’t tried file sharing yet. I think that’s what eMule is, right?

PS. After thinking about it for a few minutes, I think the linear regression is in fact accurate, since it was computed on the original data and not derived from the plotting engine.

PPS. I made my own favorites too! ;)

>>I haven’t tried file sharing yet. I think thats what eMule is right?<<

Yep, check this post for a very brief introduction.

>>PPS. I made my own favorites too!<<

I saw that. Very nice. :thumbs:

I’d like to know about that dot at around 3.5 inches in 4 months. :D

Started 10/22/2003: BPEL 5.5", EG 4.0". 4/12/2004: BPEL 6.875", EG 5.2". 30-min exercise workout and pills.

The member is RB, who went from 5" to 8.5" ELBP in 22 weeks. Pretty incredible.

## Sneak peek

Here’s a sneak peek of what I’ve been working on. I’ll explain later.

Hmm.

This reminds me of something: a TV commercial.

It had something to do with taste buds. When the camera zoomed in, there were actors in taste-bud-on-the-tongue costumes complaining about something. Salt? Beer?

OK, the three-dimensional plot I posted above shows the same data as the two-dimensional plot at the top of this thread, in case you hadn’t guessed.

I modelled each data point as a Gaussian distribution along both axes of these plots: time on one axis and gains on the other. The reasoning behind this is pretty hand-wavy. Basically, I figured we could assume each data point has some error in it. We are also totally neglecting the PE techniques used, so you could say that variation in technique shows up here as uncertainty in the gains made. Put more simply, there should be some uncertainty in each data point because, well, what if they had stretched with more effective techniques, or for longer? There is also the variation in each person’s biochemical makeup. For each data point I used 0.5 weeks as the error in time and 0.1 inches as the error in length. Then I added up all of these Gaussian curves.

But there was a problem. In regions of the graph with many, many data points (closer to the origin), these data points overwhelmed the graph, so all that could be seen was a giant tower near the origin; everywhere else on the graph was so small in comparison that you could not see anything.
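Here is a minimal sketch of that summation, assuming each point becomes a unit-volume 2D Gaussian whose standard deviations are the stated errors (0.5 weeks in time, 0.1 inches in length). The function name and the data are hypothetical. Because nothing is divided by the number of points, a cluster of points stacks into exactly the kind of tower described: five coincident points make a peak five times taller than a lone point.

```python
import math

SIGMA_WEEKS = 0.5   # assumed time uncertainty per data point
SIGMA_INCHES = 0.1  # assumed length uncertainty per data point

def summed_gaussians(points, t, gain, sigma_t=SIGMA_WEEKS, sigma_g=SIGMA_INCHES):
    """Height of the surface at (t, gain): sum of one 2D Gaussian per data point."""
    norm = 1.0 / (2.0 * math.pi * sigma_t * sigma_g)
    height = 0.0
    for pt, pg in points:
        zt = (t - pt) / sigma_t
        zg = (gain - pg) / sigma_g
        height += norm * math.exp(-0.5 * (zt * zt + zg * zg))
    return height

# Hypothetical data: five identical points near the origin, one isolated point.
points = [(1, 0.1)] * 5 + [(20, 1.5)]
tower = summed_gaussians(points, 1, 0.1)
lone = summed_gaussians(points, 20, 1.5)
print(tower, lone, tower / lone)
```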

So I had to normalize everything so that the height of each peak was not proportional to the number of data points nearby. Unfortunately, I have not perfected the normalization method. As you can see, I now have the opposite problem: in regions of sparse data, the peaks are larger. This comes from the way I normalized things, which I won’t bother you with the details of (technically, I normalized based on the density of data points in each bin along the x-axis).
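The exact normalization isn’t spelled out, so the following is only one plausible reading of “normalized based on the density of data points in each bin along the x-axis”: bin the points along the time axis and scale each point’s Gaussian by 1/(number of points in its bin). The bin width, function names, and data are hypothetical. With this weighting every occupied time bin contributes the same total volume, which would explain why an isolated point ends up as a taller peak than a crowded cluster.

```python
import math
from collections import Counter

def bin_weights(points, bin_width):
    """Weight each point by 1 / (number of points sharing its time bin)."""
    bins = [math.floor(t / bin_width) for t, _ in points]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

def weighted_surface(points, weights, t, gain, sigma_t=0.5, sigma_g=0.1):
    """Sum of 2D Gaussians (0.5 weeks x 0.1 inches), each scaled by its weight."""
    norm = 1.0 / (2.0 * math.pi * sigma_t * sigma_g)
    height = 0.0
    for (pt, pg), w in zip(points, weights):
        zt = (t - pt) / sigma_t
        zg = (gain - pg) / sigma_g
        height += w * norm * math.exp(-0.5 * (zt * zt + zg * zg))
    return height

# Hypothetical data: a four-point cluster in one bin and a lone point far away.
points = [(1, 0.1), (1.2, 0.15), (1.4, 0.05), (1.6, 0.12), (20, 1.5)]
weights = bin_weights(points, bin_width=2.0)
print(weights)
print(weighted_surface(points, weights, 1.2, 0.1))  # cluster peak, now reduced
print(weighted_surface(points, weights, 20, 1.5))   # lone peak, now the tallest
```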

So now I have a better idea: I figure the uncertainty in each data point should be proportional to the number of data points in its vicinity.
I’m working on this now.
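A minimal sketch of that idea, assuming “in the vicinity” means within a fixed radius measured in units of the base errors (0.5 weeks, 0.1 inches): each point’s two widths are the base errors multiplied by the number of points within that radius, the point itself included, so an isolated point keeps the original widths. The radius, function names, and data are hypothetical choices, not a tested method.

```python
import math

BASE_SIGMA_T = 0.5  # weeks: the original per-point time error
BASE_SIGMA_G = 0.1  # inches: the original per-point length error

def adaptive_sigmas(points, radius):
    """Scale both base errors by the number of points within `radius`.

    Distances are measured in units of the base errors, and each point
    counts itself, so an isolated point keeps the original widths.
    """
    sigmas = []
    for t1, g1 in points:
        n = 0
        for t2, g2 in points:
            dist = math.hypot((t1 - t2) / BASE_SIGMA_T, (g1 - g2) / BASE_SIGMA_G)
            if dist <= radius:
                n += 1
        sigmas.append((BASE_SIGMA_T * n, BASE_SIGMA_G * n))
    return sigmas

def adaptive_surface(points, sigmas, t, gain):
    """Sum of unit-volume 2D Gaussians, each with its own (sigma_t, sigma_g)."""
    height = 0.0
    for (pt, pg), (st, sg) in zip(points, sigmas):
        zt = (t - pt) / st
        zg = (gain - pg) / sg
        height += math.exp(-0.5 * (zt * zt + zg * zg)) / (2.0 * math.pi * st * sg)
    return height

# Hypothetical data: five coincident points near the origin, one isolated point.
points = [(1, 0.1)] * 5 + [(20, 1.5)]
sigmas = adaptive_sigmas(points, radius=3.0)
print(adaptive_surface(points, sigmas, 1, 0.1))
print(adaptive_surface(points, sigmas, 20, 1.5))
```

One thing to watch: if both widths grow linearly with the neighbour count n, a pile of n coincident points spreads its total volume n over an area proportional to n², so its peak height falls like 1/n and a dense region can end up lower than an isolated point. How fast the widths grow is the knob to tune; scaling them by the square root of n instead keeps such a pile’s peak at the same height as a single point.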

## Dataset 1 View 2

OK, I am going to post some new graphs of the same data taken from different viewpoints. I am calling this data “dataset1”. All of the graphs in dataset1 have gone through the same data processing routine; the only difference is how the data is presented visually.

Here is view 2:

## Dataset 1 View 3

View 3:

## Dataset 1 View 4

View 4:

