Workflow for statistical analysis and report writing
I generally break my projects into 4 pieces:
- load.R
- clean.R
- func.R
- do.R
load.R: Takes care of loading all the data required. Typically this is a short file, reading in data from files, URLs and/or ODBC. Depending on the project, at this point I'll either write out the workspace using save() or just keep things in memory for the next step.
clean.R: This is where all the ugly stuff lives - taking care of missing values, merging data frames and handling outliers.
func.R: Contains all of the functions needed to perform the actual analysis. source()'ing this file should have no side effects other than loading up the function definitions. This means you can modify this file and reload it without having to go back and repeat steps 1 & 2, which can take a long time to run for large data sets.
do.R: Calls the functions defined in func.R to perform the analysis and produce charts and tables.
The main motivation for this setup is working with large data, where you don't want to reload the data each time you make a change to a subsequent step. Keeping my code compartmentalized like this also means I can come back to a long-forgotten project, quickly read load.R to work out what data I need to update, and then look at do.R to work out what analysis was performed.
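The four files above are R scripts, but the separation itself is language-agnostic. As a rough sketch of the same load/clean/func/do structure in Python (the other language used in this document), with hard-coded stand-in data and illustrative function names that are not from the original:

```python
def load():
    """load.R analogue: read the raw data (here, a hard-coded stand-in)."""
    return [3, -1, None, 7, 200]

def clean(raw):
    """clean.R analogue: drop missing values and extreme outliers."""
    return [x for x in raw if x is not None and abs(x) < 100]

# func.R analogue: pure functions only, so importing/reloading this part
# has no side effects and never forces the data to be reloaded.
def mean(values):
    return sum(values) / len(values)

def do():
    """do.R analogue: call the analysis functions and report the result."""
    data = clean(load())
    return mean(data)

print(do())  # -> 3.0
```

In a real project each stage would live in its own module, so editing the analysis functions never requires re-running the (slow) load and clean stages.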
Statistical Analysis with mixed numbers
The problem with using np.log10 is that the base-10 logarithm is undefined for non-positive numbers. In other words, 10^x = y with y <= 0 has no real solution x. If you want or need to use that function in particular, you will need to add 3 to all your values. That is, instead of going from -2 to 2 they should go from 1 to 5.
import numpy as np
feel = [1, 2, 3, 4, 5]
for i in feel:
    print(np.log10(i))
This outputs:
>>> 0.0
>>> 0.3010299956639812
>>> 0.47712125471966244
>>> 0.6020599913279624
>>> 0.6989700043360189
Is there a list of data analysis tools without coding requirement for beginners?
There are a lot of tools for data analysis. If you want to start coding, I recommend R and Python. For tools without coding, KNIME and Weka are popular!
I also recently tried AutoDataMiner (https://autodataminer.com/). It is really simple to use.