Workflow For Statistical Analysis and Report Writing

I generally break my projects into 4 pieces:

  1. load.R
  2. clean.R
  3. func.R
  4. do.R

load.R: Takes care of loading all the data required. Typically this is a short file that reads in data from files, URLs and/or ODBC. Depending on the project, at this point I'll either write out the workspace using save() or just keep things in memory for the next step.

clean.R: This is where all the ugly stuff lives - taking care of missing values, merging data frames, handling outliers.

func.R: Contains all of the functions needed to perform the actual analysis. source()'ing this file should have no side effects other than loading the function definitions. This means that you can modify this file and reload it without having to go back and repeat steps 1 & 2, which can take a long time to run for large data sets.

do.R: Calls the functions defined in func.R to perform the analysis and produce charts and tables.

The main motivation for this setup is working with large data, where you don't want to reload the data each time you make a change to a subsequent step. Also, keeping my code compartmentalized like this means I can come back to a long-forgotten project, quickly read load.R to work out what data I need to update, and then look at do.R to work out what analysis was performed.

Statistical Analysis with mixed numbers

The problem with using np.log10 is that the base-10 logarithm is not defined for zero or negative numbers. In other words, 10^x = y has no real solution x when y <= 0. If you want or need to use that function in particular, you will need to add 3 to all your options. That is, instead of going from -2 to 2 they should go from 1 to 5.

import numpy as np

# Options after adding 3, so they run from 1 to 5 instead of -2 to 2
feel = [1, 2, 3, 4, 5]
for i in feel:
    print(np.log10(i))

This outputs:

0.0
0.3010299956639812
0.47712125471966244
0.6020599913279624
0.6989700043360189
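
If the data are still on the original -2 to 2 scale, the shift can be applied in code before taking the logarithm. The following is only a minimal sketch: the `responses` array and the `SHIFT` constant are hypothetical names made up for illustration, and it assumes the smallest possible value is -2.

```python
import numpy as np

# Hypothetical responses on the original -2..2 scale.
responses = np.array([-2, -1, 0, 1, 2])

# Adding 3 moves the scale to 1..5, so every value is strictly positive.
SHIFT = 3
shifted = responses + SHIFT

# log10 is now defined for every entry.
log_values = np.log10(shifted)

for original, value in zip(responses, log_values):
    print(original, value)
```

The constant 3 only works because the lowest option is -2. For a different scale, the shift has to be larger than the absolute value of the most negative option so that every shifted value is above zero.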

Is there a list of data analysis tools without coding requirement for beginners?

There are a lot of tools for data analysis. If you want to start coding, I recommend R and Python. For tools without coding, KNIME and Weka are popular!
I also recently tried AutoDataMiner (https://autodataminer.com/). It is really simple to use.


