Generalization is the part
of induction in which formal rules are extracted from a set of observations.
Overfitting is a modeling error that occurs when a function is
too closely fit to a limited set of data points. Overfitting the model
generally takes the form of making an overly complex model to explain the
individual points, or groups of points, in the data under study. In reality, the data being studied
often has some degree of error or random noise within it. Thus, attempting to
make the model conform too closely to
slightly inaccurate data can infect the model with fundamental errors and
reduce its predictive power.
While developing a model, our objective is to
be able to predict or forecast the dependent variable. When the
accuracy of prediction is low, we say the model is underfitting; when
the accuracy of prediction is too high for the known (training) cases, we say the
model is overfitting. Overfitting can go unnoticed when the test data
is made by subsetting the training data, because the model is then scored on points it has already seen;
it can also arise when we combine the results of several different models for prediction. Underfitting arises
when the model is not well defined, for example when it is too simple to capture the underlying pattern.
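A minimal Python sketch of the difference, using hypothetical synthetic data (a true trend y = 2x + 1 plus a fixed ±0.5 zig-zag standing in for noise). The three models and their names (`fit_constant`, `fit_line`, `fit_memorizer`) are illustrative choices, not taken from the text: a constant that always predicts the training mean (underfits), an ordinary least-squares line, and a nearest-neighbour lookup table that memorizes the training points (overfits).

```python
# Hypothetical synthetic data: true trend y = 2x + 1, plus a fixed +/-0.5 zig-zag
# standing in for noise so the numbers are reproducible.
train_x = [float(x) for x in range(8)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Unseen points halfway between the training points, with noise-free targets.
test_x = [x + 0.5 for x in range(7)]
test_y = [2 * x + 1 for x in test_x]


def fit_constant(xs, ys):
    """Underfits: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: (my - slope * mx) + slope * x


def fit_memorizer(xs, ys):
    """Overfits: a lookup table returning the y of the nearest training x."""
    table = list(zip(xs, ys))
    return lambda x: min(table, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

The memorizer scores perfectly on the known cases but worse than the straight line on unseen points, while the constant model does badly on both.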
Solutions to the overfitting problem:
There are a few solutions to avoid overfitting.
Cross-validation: cross-validation is a very powerful technique to prevent overfitting.
In cross-validation we divide our data into small parts. In standard
k-fold cross-validation, the data is partitioned into k subsets, which are
called folds. Then we iteratively train the algorithm on k-1 folds while
using the remaining fold as the test set (the holdout fold).
Because cross-validation lets you tune hyperparameters using only your original training set, you can keep your test set as a truly
unseen dataset for selecting your final model.
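A minimal k-fold cross-validation sketch in Python (standard library only). The function names, the shuffling seed, and the small dataset are hypothetical; the two candidate models (a constant mean and a least-squares line) are only there to show cross-validation comparing models without touching a separate test set.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_val_mse(fit, xs, ys, k, seed=0):
    """Average held-out MSE over k folds; fit(xs, ys) must return a predict function."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]  # the other k-1 folds
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k


def fit_mean(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


# Hypothetical data: y = 3x with a small alternating offset.
xs = [float(i) for i in range(10)]
ys = [3 * x + (0.2 if i % 2 else -0.2) for i, x in enumerate(xs)]
for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    print(name, round(cross_val_mse(fit, xs, ys, k=5), 3))
```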
Train with more data: this won't work every time, but training with
more data can help algorithms detect the signal better.
Remove features: you can analyze each feature
in the data, imagine scenarios to check where and when it is used
and how important it is, and if a feature is useless, you can easily
spot it and remove it.
Early stopping: when you train a learning algorithm iteratively, many iterations are performed, and
up to a certain point each new iteration improves the model and reduces the
error. After that point, the model's ability to generalize can weaken as it begins to
overfit the training data. Early stopping means stopping the training process before
the learner passes that point.
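A minimal patience-based early-stopping helper in Python. The class name, the `patience` and `min_delta` parameters, and the validation-loss curve are hypothetical; in real training each value would be the loss on held-out validation data after one epoch, and you would keep the weights from the best epoch.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation-loss curve: it falls, then rises as the model starts to overfit.
curve = [0.90, 0.60, 0.45, 0.40, 0.38, 0.39, 0.41, 0.44, 0.50, 0.57]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(curve):
    # In real training, one epoch of weight updates would happen here first.
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch)
```

The `patience` window avoids stopping on a single noisy uptick in validation loss.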
Regularization: this refers to a broad range of techniques for
artificially forcing your model to be simpler. The regularization strength is a
hyperparameter as well, which means it can be tuned through cross-validation.
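One common form of regularization is an L2 (ridge) penalty. A minimal one-feature sketch in Python, assuming a model y ≈ a + b·x in which only the slope b is penalized; the function name and data are hypothetical. The penalty weight `lam` is the hyperparameter that cross-validation would tune, and larger values shrink the slope toward zero, giving a simpler model.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((y - (a + b*x))**2) + lam * b**2 over a and b.

    The intercept a is not penalized, so the closed form is
    b = Sxy / (Sxx + lam) and a = mean(y) - b * mean(x).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)
    a = my - b * mx
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
for lam in [0.0, 10.0, 30.0]:
    print(lam, fit_ridge_1d(xs, ys, lam))  # slope shrinks as lam grows
```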
Ensembling: ensembles are machine learning methods for combining
predictions from multiple separate models. There are two common
ensembling methods.
Bagging attempts to reduce the chance of overfitting complex models.
Boosting attempts to improve the predictive flexibility of simple models.
The two methods work in opposite directions: bagging
uses complex base models and tries to smooth out
their predictions, while boosting uses simple base models and tries to
“boost” their aggregate complexity.
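A minimal bagging sketch in Python. The base model here is a nearest-neighbour lookup, chosen as an example of a flexible, high-variance model; that choice, the function names, `n_models`, and the data are assumptions for illustration. Each base model is fit to a bootstrap sample (drawn with replacement) and the ensemble averages their predictions. Boosting is not shown.

```python
import random


def fit_nearest_neighbour(xs, ys):
    """High-variance base model: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bagging(fit, xs, ys, n_models=25, seed=0):
    """Fit `n_models` base models on bootstrap samples; return (predict, models)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Ensemble prediction: plain average of the base models' predictions.
        return sum(m(x) for m in models) / len(models)

    return predict, models


# Hypothetical noisy data around y = 2x + 1.
xs = [float(i) for i in range(8)]
ys = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
predict, models = bagging(fit_nearest_neighbour, xs, ys, n_models=25, seed=1)
print(predict(2.5))
```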
Overfitting problem in backpropagation
When doing backpropagation
in artificial neural networks, we train on our data and the errors are minimized as much as possible, so
everything seems to work; but sometimes the network then faces data that is not like the data
we trained it on, and it performs poorly. This problem is caused by overfitting.
We can avoid overfitting in
backpropagation using the techniques listed above: cross-validation,
training with more data, early stopping, regularization and ensembling.
Overfitting can also be avoided by
using as much training data as possible and ensuring as much diversity as possible
in the data. This cuts down on the potential existence of features that might
be discriminative in the training data, but are otherwise spurious.
Another option is jittering (adding noise to the training data during training).
The problem with this approach is choosing
the correct level of noise.
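A minimal jittering sketch in Python: each training input is copied several times with Gaussian noise added, while the target is kept. The function name, `copies`, and the Gaussian noise model are assumptions; `sigma` is exactly the noise level the text says is hard to choose, and one way to pick it is to treat it as a hyperparameter and compare validation error for a few values.

```python
import random


def jitter(xs, ys, sigma, copies=5, seed=0):
    """Augment a 1-D training set: each input is repeated `copies` times with
    Gaussian noise of standard deviation `sigma` added; targets are unchanged."""
    rng = random.Random(seed)
    new_x, new_y = [], []
    for x, y in zip(xs, ys):
        for _ in range(copies):
            new_x.append(x + rng.gauss(0.0, sigma))
            new_y.append(y)
    return new_x, new_y


aug_x, aug_y = jitter([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], sigma=0.1, copies=4)
print(len(aug_x), aug_x[:4])
```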
Overfitting phenomenon in the polynomial interpolation problem
Interpolation is a method of estimating values between known data points. When
graphical data contains a gap, but data is available on either side of the gap
or at a few specific points within the gap, an estimate of the values within the
gap can be made by interpolation. The simplest method of
interpolation is to draw straight lines between the known data points and
consider the function as the combination of those straight lines. This method,
called linear interpolation, usually introduces considerable error.
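A minimal piecewise-linear interpolation sketch in Python using the standard `bisect` module; the function name and the sample points (taken from y = x²) are illustrative. Between 1 and 2 the straight line gives 2.5 where the true value at 1.5 is 2.25, which shows the kind of error linear interpolation introduces.

```python
from bisect import bisect_right


def linear_interpolate(xs, ys, x):
    """Piecewise-linear interpolation; xs must be sorted ascending and x in [xs[0], xs[-1]]."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of the known points")
    i = bisect_right(xs, x) - 1  # index of the known point at or just left of x
    if i == len(xs) - 1:         # x equals the last known point
        return ys[-1]
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    t = (x - x0) / (x1 - x0)     # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]
print(linear_interpolate(xs, ys, 1.5), 1.5 ** 2)
```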
To illustrate the overfitting phenomenon, we can treat our data by drawing lines between the points, which is called
interpolation; this can separate out extra data that is not used for training or is
irrelevant. We can also use another, more precise technique that
connects the points with a curve.
Polynomials can be used to approximate
complicated curves, for example, the shapes of letters in typography. For functions that are expensive
to evaluate, such as the natural logarithm or trigonometric functions, pick a few known data
points, create a lookup
table, and interpolate between
those data points. This results in significantly faster computations. Polynomial
interpolation also forms the basis for algorithms in numerical quadrature, numerical ordinary
differential equations and Secure Multi-Party Computation.
A polynomial is
a mathematical expression comprising a sum of terms, each term including a
variable or variables raised to a power and multiplied by a coefficient.
The simplest polynomials have one variable.
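For reference, the general one-variable polynomial of degree n with coefficients a_0, ..., a_n, and the standard Lagrange form of the unique polynomial of degree at most n that passes through n + 1 points (x_0, y_0), ..., (x_n, y_n) with distinct x values (the notation is chosen here, not taken from the original):

```latex
p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 = \sum_{k=0}^{n} a_k x^k

p(x) = \sum_{i=0}^{n} y_i \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j},
\qquad p(x_i) = y_i \quad (i = 0, \dots, n)
```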
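When we overfit, we use a curve that touches (approximately)
all the data points it can reach and ignore whatever does not lie on the curve.
Such a curve can approximately satisfy every point in the range, but between the points it can swing far away from the true trend, so it generalizes poorly.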
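A minimal Python sketch of this effect, assuming hypothetical data: eight points from the trend y = 2x + 1 with a fixed ±0.5 zig-zag as noise. The degree-7 interpolating polynomial (built here in Lagrange form; the function name is illustrative) passes exactly through every point, yet near the ends of the range it moves far from the true trend.

```python
def lagrange_interpolant(xs, ys):
    """Return the polynomial of degree <= len(xs) - 1 through all (x, y) points (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)  # basis factor, 1 at xi and 0 at xj
            total += term
        return total
    return p


# Hypothetical data: true trend y = 2x + 1 with a fixed +/-0.5 zig-zag as "noise".
xs = [float(k) for k in range(8)]
ys = [2 * x + 1 + (0.5 if k % 2 == 0 else -0.5) for k, x in enumerate(xs)]
p = lagrange_interpolant(xs, ys)

print(max(abs(p(x) - y) for x, y in zip(xs, ys)))  # error at the training points (essentially 0)
for x in [0.5, 3.5, 6.5]:
    print(x, round(p(x), 3), 2 * x + 1)             # curve vs. true trend between points
```

With this data the curve dips to roughly -0.94 at x = 0.5, where the true trend is 2, even though it fits all eight known points exactly.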
Q: Give a list of machine learning models for
supervised learning. How are they different? What are their similarities?
What representation do they use? Which ML methods use a tree structure for representing
their model? Which methods use graph or network representations? Which methods
use a list structure or rule sets for representing models?
Machine learning models
The model you
choose depends on the target you want to achieve. Algorithms used in machine learning can be roughly categorized into
three main categories: supervised, unsupervised and reinforcement learning.
Neural networks: this is a model of
supervised learning. A neural network processes an input vector into a resulting
output vector through a model inspired by neurons and their connectivity in the
brain. The model consists of layers of neurons that are interconnected with each
other. It has inputs and sometimes
target outputs, and the output of each neuron is determined by its activation function. The output is
computed by applying the input vector to the input layer of the network and
then computing the outputs of each neuron through the network.
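A minimal sketch of this forward computation in Python. The sigmoid activation, the fully connected layout, the function names, and the weights are assumptions for illustration; training the weights (for example by backpropagation) is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, inputs):
    """Propagate an input vector through fully connected layers.

    `layers` is a list of (weights, biases) pairs; weights[j][i] is the weight
    from input i of that layer to neuron j.
    """
    activations = list(inputs)
    for weights, biases in layers:
        # Each neuron: weighted sum of the previous layer's outputs, plus bias, through the activation.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output, with made-up weights.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.5, 0.7]], [0.2]),
]
print(forward(network, [1.0, 2.0]))
```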