Data science basics

In this tutorial, you will learn to do the following in Julia:

  • Load and write CSV files

  • Manipulate DataFrames

You can run the code for this tutorial in code/technical-analysis_julia/data-science-basics.ipynb. Make sure you have installed Julia and all the required dependencies (follow the instructions here).

You will need to install and import the following packages:

# import libraries
using CSV;
using Dates;
using DataFrames;
using Statistics;
using Plots;
using StatsPlots;
using RollingFunctions;
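If the packages are not installed yet, one way to add them is through Julia's built-in package manager Pkg. This is a minimal sketch that assumes you install into your active environment. The list of third-party packages is taken from the imports above. Dates and Statistics ship with Julia as standard libraries, so they only need `using`.

```julia
using Pkg

# Third-party packages imported above (all registered in Julia's General registry).
# Dates and Statistics are standard libraries and need no installation.
Pkg.add(["CSV", "DataFrames", "Plots", "StatsPlots", "RollingFunctions"])
```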

Read and write CSV files

The CSV package reads and writes CSV files. Piping the parsed file into DataFrame (from the DataFrames package) gives a DataFrame that you can then manipulate:

# load data
df = CSV.File("../../database/hkex_ticks_day/hkex_0001.csv") |> DataFrame
# save as csv
CSV.write("test.csv", df)
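To check that writing and reading are symmetric without needing the HKEX file, here is a small round trip. It is a sketch on a made-up DataFrame named toy, written to a temporary file. It also shows CSV.read(path, DataFrame), an equivalent alternative to piping CSV.File into DataFrame.

```julia
using CSV, DataFrames, Dates

# Made-up sample data standing in for the real price file
toy = DataFrame(
    Date = [Date(2017, 1, 3), Date(2017, 1, 4), Date(2017, 1, 5)],
    Close = [85.0, 86.2, 85.7],
    Volume = [1_200_000, 980_000, 1_050_000],
)

path = joinpath(mktempdir(), "sample.csv")  # file inside a fresh temporary directory
CSV.write(path, toy)                        # write the DataFrame as CSV

# Two equivalent ways to load the file back into a DataFrame
back1 = CSV.File(path) |> DataFrame
back2 = CSV.read(path, DataFrame)

println(back1 == toy)  # ISO dates such as 2017-01-03 are parsed back as Date values
```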

Data inspection

The first and last functions are similar to head and tail in pandas. The describe function returns a summary DataFrame with one row per column of df, listing statistics such as the mean, minimum, median, maximum and number of missing values.

first(df, 5) # show first 5 rows
last(df, 5) # last 5 rows
describe(df) # get summary of df
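Here is a self-contained illustration of the three functions on a small made-up DataFrame named toy. It is a sketch, not taken from the tutorial's data. The means returned by describe can be checked by hand.

```julia
using DataFrames

# Made-up data for illustration
toy = DataFrame(Close = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                Volume = [100, 200, 300, 400, 500, 600])

top = first(toy, 2)        # first 2 rows, like pandas head(2)
bottom = last(toy, 2)      # last 2 rows, like pandas tail(2)
summary_df = describe(toy) # one row per column: mean, min, median, max, nmissing, ...

println(top)
println(bottom)
println(summary_df.mean)   # mean of each column, in column order
```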

We can get the column names with:

names(df) # column names
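names returns the column names as Strings. The symbol form used in selections like :Close comes from propertynames. names also accepts a type to keep only the columns whose element type matches. Below is a small sketch on a made-up DataFrame named toy.

```julia
using DataFrames, Dates

# Made-up single-row DataFrame for illustration
toy = DataFrame(Date = [Date(2017, 1, 3)], Close = [85.0], Volume = [1_200_000])

println(names(toy))          # column names as Strings
println(propertynames(toy))  # column names as Symbols, e.g. :Close
println(names(toy, Number))  # only the numeric columns (Date is not a Number)
```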

Data selection

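Stock price data is usually analysed over a particular date range. To select the rows in that range, build a Boolean mask with broadcast comparisons (.> and .<) and combine the conditions element-wise with .&. Note that both bounds below are exclusive, and Date(2017, 1) means 1 January 2017: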

df[(df.Date .> Date(2017, 1)) .& (df.Date .< Date(2019, 1)), :]
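The mask can be tried on a few made-up rows to see exactly which dates survive. The sketch below uses a hypothetical DataFrame named toy. It also shows DataFrames' filter with a column => predicate pair, which expresses the same condition one row at a time.

```julia
using DataFrames, Dates

# Made-up prices on both sides of the range boundaries
toy = DataFrame(Date = [Date(2016, 12, 30), Date(2017, 6, 1), Date(2018, 12, 31), Date(2019, 1, 2)],
                Close = [80.0, 90.0, 85.0, 84.0])

start_date, end_date = Date(2017, 1), Date(2019, 1)  # 2017-01-01 and 2019-01-01

# Boolean-mask indexing (both bounds exclusive)
mask = (toy.Date .> start_date) .& (toy.Date .< end_date)
in_range = toy[mask, :]

# Same selection with filter: the predicate receives one Date per row
in_range2 = filter(:Date => d -> start_date < d < end_date, toy)

println(in_range)
```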

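Alternatively, we could generate a list of dates and keep the rows whose (year, month) pair, obtained with yearmonth, matches one of them. Unlike the range filter above, this selects whole calendar months (here January 2017 and January 2018) rather than a continuous range: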

dates = [Date(2017, 1), Date(2018)];
yms = [yearmonth(d) for d in dates];
print(yms) # [(2017, 1), (2018, 1)]

first(df[in(yms).(yearmonth.(df.Date)), :], 10) # first 10 rows whose month is in yms
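The membership test can be checked on a few made-up rows. The sketch below uses a hypothetical DataFrame named toy. Here in(wanted) builds a function that tests whether its argument is in wanted, and it is broadcast over the (year, month) pairs. For a contiguous run of months, the list can be generated from a date range with a Month(1) step instead of being typed by hand.

```julia
using DataFrames, Dates

# Made-up rows spread over several months
toy = DataFrame(Date = [Date(2017, 1, 3), Date(2017, 2, 1), Date(2018, 1, 15), Date(2018, 3, 1)],
                Close = [85.0, 86.0, 90.0, 91.0])

wanted = [(2017, 1), (2018, 1)]           # (year, month) pairs to keep
keep = in(wanted).(yearmonth.(toy.Date))  # true where the row's month is listed
jan_rows = toy[keep, :]

# Generate consecutive (year, month) pairs from a monthly date range
month_list = yearmonth.(Date(2017, 1):Month(1):Date(2017, 3))
println(month_list)
```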

We can select one or more columns with select, which returns a new DataFrame:

close_df = select(df, :Close) # select column "Close" (avoids shadowing Base.close)
close_volume_df = select(df, [:Close, :Volume]) # select columns "Close", "Volume"
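select always returns a DataFrame, even for a single column. When you need the column itself as a vector, for example to pass it to a statistics function, use property access instead. Below is a small sketch on a made-up DataFrame named toy. It contrasts these forms and shows Not for dropping a column.

```julia
using DataFrames

# Made-up data for illustration
toy = DataFrame(Close = [85.0, 86.2], Volume = [1_200_000, 980_000], Open = [84.5, 85.9])

close_only = select(toy, :Close)         # 1-column DataFrame (a copy)
close_vec = toy.Close                    # the column itself as a Vector (no copy)
close_volume = toy[:, [:Close, :Volume]] # indexing also returns a DataFrame (a copy)
no_open = select(toy, Not(:Open))        # every column except Open
```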


Attention

All investments entail inherent risk. This repository seeks to solely educate people on methodologies to build and evaluate algorithmic trading strategies. All final investment decisions are yours and as a result you could make or lose money.