Generalization=

Generalization is the part of induction in which formal rules are extracted from a set of observations.

Overfitting=

A modeling error which occurs when a function is fit too closely to a limited set of data points. Overfitting generally takes the form of building an overly complex model to explain the data under study, whether individual points or groups of points. In reality, the data being studied often contains some degree of error or random noise. Thus, fitting the model too closely to slightly inaccurate data can infect the model with fundamental errors and reduce its predictive power.

While developing a model, our objective is to predict or forecast the dependent variables. In scenarios where prediction accuracy is low, we say the model is underfitting; in cases where accuracy is high on the known (training) cases but does not carry over to unseen cases, we say the model is overfitting. Overfitting will arise when the test data is made only by subsetting the training data, or where we have used the combined results of several different models for prediction. Underfitting arises when the model is not well defined (too simple for the data).

Avoiding the overfitting problem:

There are a few solutions for avoiding overfitting.

Cross validation:

Cross validation is a very powerful technique for avoiding overfitting. In cross validation we divide our data into small parts using standard k-fold cross validation: the data is partitioned into k subsets, called folds. We then iteratively train the algorithm on k-1 folds while using the remaining fold as the validation set. This allows you to keep your test set as a truly unseen dataset for selecting your final model.
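For illustration, a minimal k-fold cross validation sketch in Python using scikit-learn; the toy dataset and the logistic regression model are placeholders, not part of the original notes:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy stand-in data; replace with the real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5 folds

# Each iteration trains on k-1 folds and validates on the held-out fold,
# so the separate test set stays truly unseen.
scores = cross_val_score(model, X, y, cv=kfold)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())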

Train with more data

It does not work every time, but training with more data can help algorithms detect the signal better.

Remove features

You can analyze each feature on its own: imagine the scenarios in which it would matter, and check where, when, and how the model actually uses it and what its basic importance is. If a feature turns out to be useless, you can easily judge that and remove it, as sketched below.
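A rough sketch of judging features by importance; the random forest and the cut-off threshold here are assumptions, not the only way to do this:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data with only a few genuinely informative features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by impurity-based importance and drop the weakest ones.
importances = forest.feature_importances_
keep = importances > 0.5 * importances.mean()  # hypothetical threshold
print("kept feature indices:", np.flatnonzero(keep))
X_reduced = X[:, keep]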

Early stopping

When you are training a learning algorithm, many iterations are performed. Up to a point, each new iteration improves the model and reduces the error. Beyond that point, the model's ability to generalize weakens as it begins to overfit the data. Early stopping means stopping the training process before the learner passes that point.
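A minimal early stopping sketch, assuming scikit-learn's MLPClassifier, which holds out a validation fraction and stops once the validation score stops improving; the network size and patience values are arbitrary:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stop training when the held-out validation score has not improved
# for n_iter_no_change consecutive epochs.
net = MLPClassifier(hidden_layer_sizes=(32,), early_stopping=True,
                    validation_fraction=0.2, n_iter_no_change=10,
                    max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("stopped after", net.n_iter_, "iterations")
print("test accuracy:", net.score(X_test, y_test))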

Regularization

This refers to a broad range of techniques for artificially forcing your model to be simpler. The regularization strength is a hyperparameter as well, which means it can be tuned through cross validation.
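For example, a sketch of tuning the L2 penalty strength through cross validation with scikit-learn's RidgeCV; the candidate alpha values are arbitrary:

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

# RidgeCV tries each penalty strength (the hyperparameter alpha)
# by cross validation and keeps the best one.
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
print("chosen alpha:", ridge.alpha_)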

Ensembling

These are machine learning methods for combining predictions from multiple separate models. Two methods are common in ensembling (a sketch follows the list):

· Bagging attempts to reduce the chance of overfitting complex models.

· Boosting attempts to improve the predictive flexibility of simple models.

The two work in opposite directions: bagging uses complex base models and tries to smooth out their predictions, while boosting uses simple base models and tries to “boost” their aggregate complexity.
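A small sketch contrasting the two, assuming scikit-learn's defaults (BaggingClassifier bags full decision trees; AdaBoostClassifier boosts depth-1 stumps):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0)    # complex base models, averaged
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)  # simple base models, built sequentially

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, "mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())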

Avoid the overfitting problem in backpropagation

When doing backpropagation in artificial neural networks, we train on our data accurately and minimize the errors as far as possible. Everything goes right until the network has to face data that is not of the kind we trained the system on; this is what causes overfitting.

We can avoid overfitting in backpropagation using the different techniques listed above: cross-validation, training with more data, early stopping, regularization, and ensembling.

It can be avoided by using as much training data as possible, ensuring as much diversity as possible in the data. This cuts down on the potential existence of features that might be discriminative in the training data, but are otherwise spurious.

It can also be avoided by jittering (adding noise). The problem with this approach is how to correctly choose the level of noise.
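A minimal jittering sketch; the Gaussian noise model and the sigma value are assumptions, and choosing sigma is exactly the hard part mentioned above:

import numpy as np

rng = np.random.default_rng(0)

def jitter(X, sigma=0.05):
    # Add small Gaussian noise to the training inputs.
    return X + rng.normal(scale=sigma, size=X.shape)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_augmented = np.vstack([X, jitter(X)])  # originals plus noisy copies
print(X_augmented)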

Overfitting phenomena in the polynomial interpolation problem

Polynomial interpolation is a method of estimating values between known data points. When graphical data contains a gap, but data is available on either side of the gap or at a few specific points within the gap, estimates of values within the gap can be made by interpolation. The simplest method of interpolation is to draw straight lines between the known data points and consider the function as the combination of those straight lines. This method, called linear interpolation, usually introduces considerable error.
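For instance, a sketch of linear interpolation across a gap using NumPy; the sample values are made up:

import numpy as np

# Known points on either side of a gap in the data.
x_known = np.array([0.0, 1.0, 4.0, 5.0])
y_known = np.array([0.0, 1.0, 2.0, 2.2])

# Estimate values inside the gap with straight lines between
# neighbouring known points (piecewise-linear interpolation).
x_gap = np.array([2.0, 3.0])
print(np.interp(x_gap, x_known, y_known))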

In the overfitting phenomenon we can categorize our data by drawing lines through it, which is what interpolation does; this separates out extra data that is irrelevant or was not used for training. We can also use another, more precise technique that connects the points exactly.

Polynomials can be used to approximate complicated curves, for example the shapes of letters in typography: pick a few known data points, create a lookup table, and interpolate between those data points. This results in significantly faster computations. Polynomial interpolation also forms the basis for algorithms in numerical quadrature and numerical ordinary differential equations, as well as secure multi-party computation and secret sharing schemes.

A polynomial is a mathematical expression comprising a sum of terms, each term including a variable or variables raised to a power and multiplied by a coefficient. The simplest polynomials have one variable, for example p(x) = 3x^2 + 2x - 1.

Applying this to overfitting: we can use a curve that touches approximately all of the data points it can reach and ignore whatever does not lie on the curve; such a curve satisfies the whole range only approximately.
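To make this concrete, a sketch with NumPy: a degree-9 polynomial through 10 noisy points touches every point exactly (overfitting the noise), while a degree-3 fit only follows the range approximately; the underlying sine curve and the noise level are assumptions:

import numpy as np

rng = np.random.default_rng(0)

# Ten noisy samples of a simple underlying curve.
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

smooth = np.polyfit(x, y, deg=3)   # captures the trend
overfit = np.polyfit(x, y, deg=9)  # passes through every noisy point

x_new = 0.55  # a point between the samples
print("degree 3 prediction:", np.polyval(smooth, x_new))
print("degree 9 prediction:", np.polyval(overfit, x_new))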

Q= Give a list of machine learning models for supervised learning. How are they different? What are their similarities? What representations do they use? Which ML methods use a tree structure to represent their model? Which methods use graph or network representations? Which methods use list structures or rule sets to represent their models?

Machine learning models

The model you choose depends on the target you want to achieve. Algorithms used in machine learning can be roughly divided into three main categories: supervised, unsupervised, and reinforcement learning.

Neural networks

This is a model for supervised learning. A neural network processes an input vector into an output vector through a model inspired by neurons and their connectivity in the brain. The model consists of layers of interconnected neurons; it has inputs and sometimes target outputs, and each neuron's output is determined by an activation function. The output is computed by applying the input vector to the input layer of the network, then computing the output of each neuron forward through the network.
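A bare-bones forward pass sketch in NumPy; the layer sizes, random weights, and sigmoid activation are illustrative choices (real weights would be learned by backpropagation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # input layer (4) -> hidden layer (3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # hidden layer (3) -> output (1)

x = np.array([0.5, -1.0, 0.25, 2.0])  # input vector

# Apply the input to the input layer, then compute each layer in turn.
hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)
print(output)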

Decision tree