I can use for and while loops to repeatedly execute code
I can use conditional (if) statements to execute code if a condition is met, including the use of boolean logic
I can manipulate lists, including indexing and slicing into them to access subsets and using list methods (e.g., .sort(), .append() and .count())
I can write a user-defined function
I can create and work with dictionaries
I can create and format different types of plots using the Matplotlib library
I can access and manipulate data stored in NumPy arrays and Pandas dataframes, including using their respective methods and vectorized operations
I can calculate various measures of central tendency (e.g., mean, median) and dispersion (e.g., standard deviation, interquartile range) from a dataset and know when particular measures are appropriate
I can create and interpret different types of plots for representing distributions, including box plots, histograms, and probability density and cumulative distribution functions
I can use one-sided and two-sided hypothesis tests to compare a sample mean to a reference value or to another sample mean (t-tests), and to compare multiple sample means (ANOVA)
I can use and interpret normality tests, including the Kolmogorov-Smirnov, Lilliefors and Shapiro-Wilk tests
I can interpret and select appropriate types of plots used for representing: Amounts, Distributions, Proportions, X-Y relationships and Uncertainty
I can use LabPlot to make commonly used types of plots, including scatter, histogram, bar, box and violin plots
I can make different types of plots using Seaborn, including regression, categorical, distribution, matrix and relational plots
I can use LabPlot to create a scatter plot and fit data to basic functions such as linear, polynomial, exponential functions as well as custom functions
I can smooth data with a variety of SciPy functions, including unweighted and weighted sliding average, median smoothing and Savitzky-Golay smoothing
I can fit univariate data using the NumPy polyfit() function and fit multivariate data using the NumPy linalg.lstsq() function
I can use the SciPy curve_fit() function to carry out both linear and non-linear regression
I can recognize and correct misleading elements of a plot, including violations of proportional ink, overlapping points and misuse of color
I can improve plot interpretability by using redundant coding, multi-panel figures and informative titles, captions and tables
I can improve plot aesthetics by minimizing the amount of non-data ink and balancing the data and the context, by using appropriately-scaled axis titles and by using filled plot elements
I can use the Scikit-Learn PCA class to carry out principal components analysis (PCA) and make and interpret scores and loadings plots
I can implement a supervised machine learning model with Scikit-Learn by: creating the model, training it with the training data and evaluating it using test data
I can build a multiple linear regression model using various methods of pre-processing and evaluate it using a train-test split and k-fold cross validation using Scikit-Learn
I can build a random forest classification model and evaluate it using a train-test split and k-fold cross validation with a confusion matrix using Scikit-Learn
I can use the RDKit library to represent molecules with various methods, including SMILES and Morgan fingerprints
I can use RDKit to calculate molecular descriptors, search for substructures and calculate molecular similarity
I can use the RDKit and py3Dmol libraries to visualize molecules
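
As a quick self-check for the item on central tendency and dispersion, the sketch below shows how a single outlier can affect the mean and the median differently. It is only an illustration under stated assumptions: it uses Python's standard-library statistics module rather than NumPy or Pandas, and the summarize() helper and the data values are hypothetical, not part of the course material.

```python
import statistics


def summarize(values):
    """Return basic central-tendency and dispersion measures for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    # (using the module's default "exclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                     # interquartile range
    }


# Hypothetical replicate measurements; the last value is an outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
summary = summarize(data)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up values the mean is pulled up to 12.5 by the outlier while the median stays at 10.05, which is one reason the median is often preferred for skewed data or data containing outliers.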