Sayed Jamal Mirkamali

Well, Mike Piazza has a slightly higher career batting average (2127 hits / 6911 at-bats = 0.308) than Hank Aaron (3771 hits / 12364 at-bats = 0.305). But can we say with confidence that his skill is actually higher, or is it possible he just got lucky a bit more often?

In this series of posts about an empirical Bayesian approach to batting statistics, we’ve been estimating batting averages by modeling them as a binomial distribution with a beta prior. But we’ve been looking at a single batter at a time. What if we want to compare two batters, give a probability that one is better than the other, and estimate by how much?

This is a topic rather relevant to my own work and to the data science field, because understanding the difference between two proportions is important in A/B testing. One of the most common examples of A/B testing is comparing clickthrough rates (“out of X impressions, there have been Y clicks”)- which on the surface is similar to our batting average estimation problem (“out of X at-bats, there have been Y hits””).1

Here, we’re going to look at an empirical Bayesian approach to comparing two batters.2 We’ll define the problem in terms of the difference between each batter’s posterior distribution, and look at four mathematical and computational strategies we can use to resolve this question. While we’re focusing on baseball here, remember that similar strategies apply to A/B testing, and indeed to many Bayesian models.

Most machine learning algorithms have been developed and statistically validated for linearly separable data. Popular examples are linear classifiers like Support Vector Machines (SVMs) or the (standard) Principal Component Analysis (PCA) for dimensionality reduction. However, most real world data requires nonlinear methods in order to perform tasks that involve the analysis and discovery of patterns successfully.

The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radius basis function (RBF) kernel that is used to perform nonlinear dimensionality reduction via BF kernel principal component analysis (kPCA).

You may be thinking that this title makes no sense at all. ML, AI, ANN and Deep learning have made it into the everyday lexicon and here I am, proclaiming that ML is dead. Well, here is what I mean…

The open sourcing of entire ML frameworks marks the end of a phase of rapid development of tools, and thus marks the death of ML as we have known it so far. The next phase will be marked with ubiquitous application of these tools into software applications. And that is how ML will live forever, because it will seamlessly and inextricably integrate into our lives.

Assessing the fit of a count regression model is not necessarily a straightforward enterprise; often we just look at residuals, which invariably contain patterns of some form due to the discrete nature of the observations, or we plot observed versus fitted values as a scatter plot. Recently, while perusing the latest statistics offerings on ArXiv I came across Kleiber and Zeileis (2016) who propose the rootogram as an improved approach to the assessment of fit of a count regression model. The paper is illustrated using R and the authors’ countreg package (currently on R-Forge only). Here, I thought I’d take a quick look at the rootogram with some simulated species abundance data.

Integrates the theory and applications of statistics using R. A Course in Statistics with R has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, inclusive of classical, nonparametric, and Bayesian schools, is developed with definitions, motivations, mathematical expression and R programs in a way which will help the reader to understand the mathematical development as well as R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated in a way which makes effective use of visualization techniques and the related statistical techniques underlying them through practical applications, and hence helps the reader to achieve a clear understanding of the associated statistical models.

On Tuesday, analytics company SAS announced Viya, its new analytics and visualization architecture. A SAS spokeswoman said Viya would become the "foundation for all future SAS products."

The goal of Viya is to make analytics more accessible to all users and to better support cloud-native apps and data stored in the cloud.

The announcement came at the 2016 SAS Global Forum, the company's annual conference for SAS users and executives, which took place in Las Vegas. Viya will be available as part of an early adopter program in May, but it will reach general availability sometime in Q3 of 2016.


Latest Comments

K2 Content

  • A synergetic R-Shiny portal for modeling and tracking of COVID-19 data
    A synergetic R-Shiny portal for modeling and tracking of COVID-19 data

    Dr. Mahdi Salehi, an associate member of SDAT and assistant professor of statistics at the University of Neyshabur, introduced a useful online interactive dashboard that visualize and follows confirmed cases of COVID-19 in real-time. The dashboard was publicly made available on 6 April 2020 to illustrate the counts of confirmed cases, deaths, and recoveries of COVID-19 at the level of country or continent. This dashboard is intended as a user-friendly dashboard for researchers as well as the general public to track the COVID-19 pandemic, and is generated from trusted data sources and built-in open-source R software (Shiny in particular); ensuring a high sense of transparency and reproducibility.

    Access the shiny dashboard:

    Written on Friday, 08 January 2021 07:03 in SDAT News Read 589 times Read more...
  • First Event on Play with Real Data
    First Event on Play with Real Data

    Scientific Data Analysis Team (SDAT) intends to organize the first event on the value of data to provide data holders and data analyzers with an opportunity to extract maximum value from their data. This event is organized by International Statistical Institute (ISI) and SDAT hosted at the Bu-Ali Sina University, Hamedan, Iran. 

    Organizers and the data providers will provide more information about the goals of the initial ideas, team arrangement, competition processes, and the benefits of attending this event on a webinar hosted at the ISI Gotowebianr system. Everyone invites to participate in this webinar for free, but it is needed to register at the webinar system by 30 December 2020. 

    Event Time: 31 December 2020 - 13:30-16:30 Central European Time (CET)

    Register for the webinar: 

    More details about this event: 

    Aims and outputs:

    • Playing with real data by explorative and predictive data analysis techniques 
    • A platform between a limited number of data providers and hundreds to thousands of data scientist Teams
    • Improving creativity and scientific reasoning of data scientist and statisticians 
    • Finding the possible “bugs” with the current data analysis methods and new developments
    • Learn different views about a dataset.


    The best-report awards consist of a cash prize:
    $400 for first place,
    $200 for second place, and
    $100 for third place.

    Important Dates: 

    Event Webinar: 31 December 2020 - 13:30-16:30 Central European Time (CET). 
    Team Arrangement: 01 Jan. 2021 - 07 Jan. 2021
    Competition: 10 Jan. 2021 - 15 Jan. 2021
    First Assessment Result: 25 Jan. 2021
    Selected Teams Webinar: 30 Jan. 2021
    Award Ceremony: 31 Jan. 2021

    Please share this event with your colleagues, students, and data analyzers. 

    Written on Wednesday, 23 December 2020 13:45 in SDAT News Read 744 times Read more...
  • Development of Neuroimaging Symposium and Advanced fMRI Data Analysis
    Development of Neuroimaging Symposium and Advanced fMRI Data Analysis

    The Developement of Structural and Functional Neuroimaging Symposium hold at the School of Sciences, Shiraz University in April 17 2019.  The Advanced fMRI Data Analysis Workshop also held in April 18-19 2019. For more information please visit: 

    Written on Sunday, 21 April 2019 12:18 in SDAT News Read 1191 times Read more...
  • Releasing Rfssa Package by SDAT Members at CRAN
    Releasing Rfssa Package by SDAT Members at CRAN

    The Rfssa package is available at CRAN. Dr. Hossein Haghbin and Dr. Seyed Morteza Najibi (SDAT Members) have published this package to provide the collections of necessary functions to implement Functional Singular Spectrum Analysis (FSSA) for analysing Functional Time Series (FTS). FSSA is a novel non-parametric method to perform decomposition and reconstruction of FTS. For more information please visit github homepage of package. 

    Written on Sunday, 03 March 2019 21:03 in SDAT News Read 1339 times
  • Data Science Symposium
    Data Science Symposium

    Symposium of Data Science Developement and its job opportunities hold at the Faculty of Science, Shiraz University in Feb 20 2019. For more information please visit: 

    Written on Friday, 01 February 2019 00:13 in SDAT News Read 1859 times Read more...

About Us

SDAT is an abbreviation for Scientific Data Analysis Team. It consists of groups who are specialists in various fields of data sciences including Statistical Analytics, Business Analytics, Big Data Analytics and Health Analytics. 

Get In Touch

Address:  No.15 13th West Street, North Sarrafan, Apt. No. 1 Saadat Abad- Tehran

 Phone: +98-910-199-2800


Login Form