Data science basics
In this tutorial, you will learn to do the following in Julia:
Load and write CSV files
Manipulate DataFrames
You can run the code for this tutorial in code/technical-analysis_julia/data-science-basics.ipynb.
Make sure you have installed Julia and all the required dependencies (follow instructions here).
You need to install and import the following libraries:
# import libraries
using CSV;
using Dates;
using DataFrames;
using Statistics;
using Plots;
using StatsPlots;
using RollingFunctions;
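The imports above only work once the packages are installed. As a minimal sketch, assuming you use Julia's built-in package manager and have network access, the third-party packages could be installed as follows. Dates and Statistics ship with Julia as standard libraries, so they are not listed.

```julia
using Pkg

# Third-party packages imported in this tutorial.
# Dates and Statistics are standard libraries and need no installation.
Pkg.add(["CSV", "DataFrames", "Plots", "StatsPlots", "RollingFunctions"])
```

This only needs to be run once per Julia environment.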
Read and output csv file
The CSV package reads and writes delimited text files. Piping the parsed file into DataFrame gives us a table that we can manipulate with the DataFrames package.
# load data
df = CSV.File("../../database/hkex_ticks_day/hkex_0001.csv") |> DataFrame
# save as csv
CSV.write("test.csv", df)
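To try the read and write steps without the tutorial's database, here is a small round-trip sketch. The table `toy`, its values, and the temporary file name are made up for illustration. It also uses `CSV.read(path, DataFrame)`, which is an alternative spelling of `CSV.File(path) |> DataFrame`.

```julia
using CSV
using DataFrames
using Dates

# Toy price table; the values are invented for illustration only.
toy = DataFrame(
    Date = [Date(2019, 1, 2), Date(2019, 1, 3), Date(2019, 1, 4)],
    Close = [80.5, 79.9, 81.2],
    Volume = [1_200_000, 950_000, 1_430_000],
)

# Write to a temporary directory so no project file is overwritten.
path = joinpath(mktempdir(), "toy_prices.csv")
CSV.write(path, toy)

# Read the file back into a DataFrame.
roundtrip = CSV.read(path, DataFrame)

println(roundtrip)
println(eltype(roundtrip.Date))  # column type detected by the parser
```

CSV detects column types while parsing, so it is worth checking the type of the date column after loading a file, since the filters later in this tutorial compare it against `Date` values.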
Data inspection
The first and last functions are similar to head and tail in pandas. The describe function returns a summary of each column of the dataframe.
first(df, 5) # show first 5 rows
last(df, 5) # last 5 rows
describe(df) # get summary of df
We can get the column names by:
names(df) # column names
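The inspection functions can be tried on a small table as well. The following is a sketch on a hypothetical ten-row DataFrame called `toy`; the column names mirror the stock data, but the values are invented.

```julia
using DataFrames
using Dates

# Ten consecutive days with made-up closing prices.
toy = DataFrame(
    Date = Date(2019, 1, 1):Day(1):Date(2019, 1, 10),
    Close = collect(80.0:1.0:89.0),
)

head3 = first(toy, 3)    # first 3 rows, like head(3) in pandas
tail3 = last(toy, 3)     # last 3 rows, like tail(3) in pandas
summary_df = describe(toy)  # one summary row per column
cols = names(toy)        # column names as strings

println(head3)
println(tail3)
println(summary_df)
println(cols)
println(size(toy))       # (number of rows, number of columns)
```

Note that `describe` returns a DataFrame itself, so its result can be filtered or saved like any other table.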
Data selection
With stock price data we often need the rows within a particular date range. The comparisons below are strict, so rows dated exactly on either boundary are excluded:
df[(df.Date .> Date(2017, 1)) .& (df.Date .< Date(2019, 1)), :]
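The date filter can be checked on a table small enough to inspect by eye. This sketch uses an invented DataFrame `toy` and the hypothetical names `lo` and `hi` for the bounds. It shows the boolean-mask form used above next to an equivalent call to `filter`.

```julia
using DataFrames
using Dates

# Made-up rows placed around the two boundaries.
toy = DataFrame(
    Date = [Date(2016, 12, 30), Date(2017, 1, 1), Date(2017, 6, 15),
            Date(2018, 12, 31), Date(2019, 1, 1)],
    Close = [70.0, 71.0, 75.0, 82.0, 83.0],
)

# Date(2017, 1) is 2017-01-01: omitted month and day default to 1.
lo, hi = Date(2017, 1), Date(2019, 1)

# Boolean mask: element-wise comparisons combined with element-wise AND.
mask = (toy.Date .> lo) .& (toy.Date .< hi)
by_mask = toy[mask, :]

# Equivalent row filter using a predicate on the Date column.
by_filter = filter(:Date => d -> lo < d < hi, toy)

println(by_mask)
```

Both forms keep only the rows dated 2017-06-15 and 2018-12-31 here. To include the boundary dates, replace `>` and `<` with `>=` and `<=`.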
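Alternatively, we could list the months of interest and keep the rows whose (year, month) pair appears in that list. Note that this matches only the listed months, not every month between them: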
dates = [Date(2017, 1),Date(2018)];
yms = [yearmonth(d) for d in dates];
print(yms) # [(2017, 1), (2018, 1)]
first(df[in(yms).(yearmonth.(df.Date)), :], 10) # show the first 10 matching rows
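To see what the membership test selects, here is a sketch on an invented table `toy`. It also shows one possible way to build the month list for a whole span of months from a `Month(1)` step range; the variable names are hypothetical.

```julia
using DataFrames
using Dates

# Made-up rows spread over several months.
toy = DataFrame(
    Date = [Date(2017, 1, 3), Date(2017, 1, 31), Date(2017, 2, 1),
            Date(2018, 1, 2), Date(2018, 2, 1)],
    Close = [71.0, 72.0, 73.0, 80.0, 81.0],
)

# Keep rows whose (year, month) is one of the listed pairs.
months = [(2017, 1), (2018, 1)]
keep = in(months).(yearmonth.(toy.Date))
selected = toy[keep, :]

# To cover every month in a span, generate the pairs from a date range.
span = yearmonth.(Date(2017, 1):Month(1):Date(2017, 2))
in_span = toy[in(span).(yearmonth.(toy.Date)), :]

println(selected)
println(in_span)
```

Here `in(months)` creates a one-argument function that tests membership in `months`, and the trailing dot applies it to each row's (year, month) pair.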
We can select one or more columns with select, which returns a new DataFrame:
close = select(df, :Close) # select column "Close"
close_volume = select(df, [:Close, :Volume]) # select columns "Close" and "Volume"
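The sketch below contrasts `select` with plain property access on an invented table `toy`. The difference matters in practice: `select` returns a DataFrame even for a single column, while `toy.Close` returns the underlying vector.

```julia
using DataFrames
using Dates

# Made-up table with the same column names as the stock data.
toy = DataFrame(
    Date = [Date(2019, 1, 2), Date(2019, 1, 3)],
    Close = [80.5, 79.9],
    Volume = [1_200_000, 950_000],
)

close_df = select(toy, :Close)               # one-column DataFrame
close_vec = toy.Close                        # plain vector of prices
two_cols = select(toy, [:Close, :Volume])    # two-column DataFrame
no_volume = select(toy, Not(:Volume))        # every column except Volume

println(close_df)
println(close_vec)
println(names(no_volume))
```

Use the vector form when passing a column to numeric functions such as `mean`, and the DataFrame form when the result should stay a table.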
References
Attention