Data Science FAQs
Data Science Online Training in Ameerpet Hyderabad
We provide Data Science online training from Ameerpet, Hyderabad. We are one of the best institutes providing high-quality Data Science online training all over India. IT professionals and students from India and abroad who are unable to attend regular classes can attend our Data Science online training from home at a time that suits them. For more details on Data Science online training and Data Science FAQs, please call 9290971883 / 9247461324, or send an email to revanthonlinetraining@gmail.com
2. What is the difference between supervised and unsupervised machine learning?
Supervised machine learning learns from previous experience in the form of labeled examples. It uses labeled datasets to train algorithms that classify data or predict outcomes accurately.
Unsupervised machine learning is a technique in which the model is trained on unlabeled data, so you do not need to supervise it. It helps you find previously unknown patterns in the data, such as clusters.
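The contrast can be sketched with a toy example. This is a minimal illustration, not a production method: the data values, the nearest-neighbor classifier, and the two-cluster grouping routine are all assumptions chosen for brevity. The supervised function uses the labels; the unsupervised one never sees them.

```python
# Toy 1-D data: hours studied (hypothetical values chosen for illustration).
hours = [1.0, 1.5, 2.0, 7.0, 8.0, 9.0]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]  # known answers


def predict_nearest_neighbor(x, train_x, train_y):
    """Supervised: return the label of the closest labeled training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def two_means(points, iterations=10):
    """Unsupervised: split points into two clusters without using any labels."""
    centers = [min(points), max(points)]  # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to its closest center.
            closest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[closest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


prediction = predict_nearest_neighbor(6.5, hours, labels)
clusters = two_means(hours)
print(prediction)
print(clusters)
```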
3. What is selection bias?
Selection bias occurs when the sample obtained is not representative of the population intended to be analysed.
4. What are the different kernel functions in SVM?
Four commonly used types of kernels in SVM are:
- Linear Kernel
- Polynomial kernel
- Radial basis function (RBF) kernel
- Sigmoid kernel
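As a rough sketch, the four kernels can be written as plain functions of two vectors. The parameter names gamma, coef0 and degree and their default values are assumptions that follow a common convention; they are not taken from this FAQ.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # Depends only on the squared Euclidean distance between x and y.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```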
5. What does NLP stand for?
NLP stands for Natural Language Processing. NLP is a branch of artificial intelligence which gives machines the ability to read and understand human languages.
6. What is pruning in a decision tree?
Pruning is the process of removing the sub-nodes of a decision node. It can be thought of as the opposite of splitting.
7. What is ensemble learning?
Ensemble learning is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the overall model.
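One simple way to combine learners is majority voting. The sketch below assumes three hypothetical threshold "models" invented for illustration; real ensembles would combine trained models in the same way.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the individual model predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical models that disagree on where the decision boundary lies.
models = [
    lambda score: "spam" if score > 0.3 else "ham",
    lambda score: "spam" if score > 0.5 else "ham",
    lambda score: "spam" if score > 0.7 else "ham",
]


def ensemble_predict(score):
    """Ask every model for a prediction and return the majority label."""
    return majority_vote([model(score) for model in models])


print(ensemble_predict(0.6))
print(ensemble_predict(0.4))
```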
8. What is Random Forest?
Random forest is a versatile machine learning method that can perform both regression and classification tasks. It can also help with dimensionality reduction (through feature importance scores) and is fairly robust to missing values and outliers. It is a type of ensemble learning method in which a group of weak models, here individual decision trees, combine to form a more powerful model.
9. What is a false positive and a false negative?
A false positive is an incorrect identification of the presence of a condition when it’s absent.
A false negative is an incorrect identification of the absence of a condition when it’s actually present.
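These definitions can be made concrete by counting outcomes for a binary test. The actual and predicted values below are made up for illustration, with True meaning "condition present".

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1   # predicted present, actually present
        elif guess and not truth:
            counts["FP"] += 1   # predicted present, actually absent
        elif not guess and truth:
            counts["FN"] += 1   # predicted absent, actually present
        else:
            counts["TN"] += 1   # predicted absent, actually absent
    return counts


actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]
counts = confusion_counts(actual, predicted)
print(counts)
```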
10. What is statistical power?
Statistical power is a property of a binary hypothesis test: it is the probability that the test rejects the null hypothesis given that the alternative hypothesis is true.
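For one simple case the power can be computed directly. The sketch below assumes a one-sided z-test with a known standard deviation; the function name and parameters are illustrative, and other tests need different formulas.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mean is larger than under H0), known sigma."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)   # rejection threshold in z units
    shift = effect * (n ** 0.5) / sigma        # mean of the z statistic under H1
    return 1 - std_normal.cdf(critical - shift)


# Power grows with the sample size for a fixed true effect.
for n in (10, 25, 50):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With no true effect (effect=0) the rejection probability equals alpha, which is the false positive rate of the test.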
11. What is the difference between “long” and “wide” format data?
In the wide-format data, a subject’s repeated responses will be in a single row, and each response is in a separate column.
In long-format data, each row holds one time point for one subject, so a subject with repeated responses spans several rows. We can recognize data in wide format by the fact that columns generally represent groups.
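A small sketch of the two layouts, using plain dictionaries. The subjects, column names (week1, week2) and values are hypothetical.

```python
# Wide format: one row per subject, one column per time point.
wide = [
    {"subject": "A", "week1": 5.0, "week2": 6.5},
    {"subject": "B", "week1": 4.0, "week2": 4.5},
]


def wide_to_long(rows, id_key="subject"):
    """Turn each (subject, time point) pair into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_key:
                long_rows.append({id_key: row[id_key], "time": column, "value": value})
    return long_rows


long_format = wide_to_long(wide)
for row in long_format:
    print(row)
```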
12. What is p-value?
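When we perform a hypothesis test in statistics, a p-value helps us determine the strength of the results. The p-value is a number between 0 and 1: it is the probability of observing results at least as extreme as those actually observed, assuming the null hypothesis is true. A small p-value (commonly 0.05 or less) indicates strong evidence against the null hypothesis, while a large p-value indicates weak evidence against it.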
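As an illustration, here is an exact two-sided p-value for a coin-flip experiment. The function name is hypothetical, and the "outcomes no more likely than the observed one" rule is one common convention for two-sided exact tests, not the only one.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value for observing `heads` in `flips` under H0: P(heads) = p_null."""
    def prob(k):
        return comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)

    observed = prob(heads)
    # Sum the probability of every outcome that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(flips + 1) if prob(k) <= observed * (1 + 1e-9))


p_value = binomial_two_sided_p(heads=9, flips=10)
print(p_value)
```

Nine heads in ten flips gives a p-value below 0.05, so a fair coin would rarely produce a result this lopsided.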
13. Give an example where the median is a better measure than the mean.
The mean is the most frequently used measure of central tendency because it uses all the values in the data set to give you an average. For data from skewed distributions, the median is better than the mean because it is not influenced by extreme values. Household income is a typical example: a few very high earners pull the mean well above what a typical household earns, while the median stays representative.
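A quick check with made-up numbers shows the effect of a single extreme value on each measure.

```python
from statistics import mean, median

# Hypothetical annual incomes (in thousands); one extreme value skews the data.
incomes = [30, 32, 35, 38, 40, 1000]

print(mean(incomes))    # pulled far upward by the single outlier
print(median(incomes))  # stays near the typical value
```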
14. Give examples of data that has neither a Gaussian nor a log-normal distribution.
Categorical data of any type cannot have a Gaussian or log-normal distribution.
Exponentially distributed data is another example, e.g. the amount of time that a car battery lasts or the amount of time until an earthquake occurs.
15. Why is resampling done?
Resampling is done to (i) estimate the accuracy of sample statistics by using subsets of the available data or by drawing randomly with replacement from a set of data points; (ii) exchange labels on data points when performing significance tests (permutation tests); (iii) validate models by using random subsets (bootstrapping, cross-validation).
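Point (i) can be sketched as a bootstrap estimate of the standard error of the mean. The sample values, the number of resamples and the fixed seed are assumptions made for a reproducible illustration.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(data, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    resampled_means = [
        mean(rng.choices(data, k=len(data)))  # one bootstrap sample, drawn with replacement
        for _ in range(n_resamples)
    ]
    # The spread of the resampled means approximates the standard error.
    return stdev(resampled_means)


sample = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]
se = bootstrap_standard_error(sample)
print(se)
```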
16. What is the Law of Large Numbers?
The Law of Large Numbers is a theorem which states that as the number of trials increases, the average of the results gets closer to the expected value.
E.g. the proportion of heads when flipping a fair coin 100,000 times should be closer to 0.5 than when flipping it 100 times.
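A short simulation illustrates the idea. The seed and the flip counts are arbitrary choices; the exact proportions depend on the random number generator, so none are quoted here.

```python
import random


def proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips of a fair coin and return the fraction that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```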
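17. How do you combat overfitting and underfitting?
To combat overfitting and underfitting, we first detect them by estimating model accuracy on data the model was not trained on, either by resampling the data (k-fold cross-validation) or by keeping a separate validation dataset to evaluate the model. Overfitting can then be reduced by simplifying the model, adding regularization, or collecting more data; underfitting can be reduced by using a more flexible model or more informative features.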
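The splitting step of k-fold cross-validation can be sketched as follows. This minimal version does not shuffle the data first, which is usually advisable in practice; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once, and the model is trained k times on the remaining data.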
18. What is A/B testing?
A/B testing is a form of two-sample hypothesis testing used to compare two versions, the control and the variant, of a single variable. It is commonly used to improve and optimize user experience and marketing.
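One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below assumes large samples and uses made-up conversion counts; other tests may suit other metrics better.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both versions convert at the same rate."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # shared rate under H0
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: control (A) vs variant (B).
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(z, p)
```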
19. What is Survivorship Bias?
It is the logical error of focusing on the people or things that survived some selection process while overlooking those that did not, typically because the failures are less visible. This can lead to wrong conclusions in many different ways.
20. How do you control for biases?
Bias can be controlled and minimized in many ways. Two common ways are randomization, where participants are assigned to groups by chance, and random sampling, in which each member of the population has an equal probability of being chosen.
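Both techniques are easy to sketch with the standard library. The participant names, group labels and fixed seed are hypothetical and only serve to make the example reproducible.

```python
import random


def randomize_assignment(participants, seed=0):
    """Randomization: shuffle participants and split them into two groups by chance."""
    rng = random.Random(seed)
    shuffled = participants[:]          # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


def simple_random_sample(population, k, seed=0):
    """Random sampling: every member has the same chance of being selected."""
    return random.Random(seed).sample(population, k)


people = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
groups = randomize_assignment(people)
chosen = simple_random_sample(people, 3)
print(groups)
print(chosen)
```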
Institute Address :
B1, 3rd Floor, Eureka Court, Near Image Hospital, Ameerpet, Hyderabad, India