OPEN-SOURCE SCRIPT

Scientific Correlation Testing Framework

116
Scientific Correlation Testing Framework - Comprehensive Guide
Introduction to Correlation Analysis
What is Correlation?
Correlation is a statistical measure that describes the degree to which two assets move in relation to each other. Think of it like measuring how closely two dancers move together on a dance floor.

Perfect Positive Correlation (+1.0): Both dancers move in perfect sync, same direction, same speed
Perfect Negative Correlation (-1.0): Both dancers move in perfect sync but in opposite directions
Zero Correlation (0): The dancers move completely independently of each other
In financial markets, correlation helps us understand relationships between different assets, which is crucial for:

Portfolio diversification
Risk management
Pairs trading strategies
Hedging positions
Market analysis
Why This Script is Special
This script goes beyond simple correlation calculations by providing:

Two different correlation methods (Pearson and Spearman)
Statistical significance testing to ensure results are meaningful
Rolling correlation analysis to track how relationships change over time
Visual representation for easy interpretation
Comprehensive statistics table with detailed metrics

Deep Dive into the Script's Components
1. Input Parameters Explained-
Symbol Selection:
This allows you to select the second asset to compare with the chart's primary asset
Default is Apple (NASDAQ:AAPL), but you can change this to any symbol
Example: If you're viewing a Bitcoin chart, you might set this to "NASDAQ:TSLA" to see if Bitcoin and Tesla are correlated

Correlation Window (60): This is the number of periods used to calculate the main correlation
Larger values (e.g., 100-500) provide more stable, long-term correlation measures
Smaller values (e.g., 10-50) are more responsive to recent price movements
60 is a good balance for most daily charts (about 3 months of trading days)
Rolling Correlation Window (20): A shorter window to detect recent changes in correlation
This helps identify when the relationship between assets is strengthening or weakening
Default of 20 is roughly one month of trading days
Return Type: This determines how price changes are calculated
Simple Returns: (Today's Price - Yesterday's Price) / Yesterday's Price
Easy to understand: "The asset went up 2% today"
Log Returns: Natural logarithm of (Today's Price / Yesterday's Price)
More mathematically elegant for statistical analysis
Better for time-additive properties (returns over multiple periods)
Less sensitive to extreme values.

Confidence Level (95%): This determines how certain we want to be about our results
95% confidence means we accept a 5% chance of being wrong (false positive)
Higher confidence (e.g., 99%) makes the test more strict
Lower confidence (e.g., 90%) makes the test more lenient
95% is the standard in most scientific research
Show Statistical Significance: When enabled, the script will test if the correlation is statistically significant or just due to random chance.

Display options control what you see on the chart:

Show Pearson/Spearman/Rolling Correlation: Toggle each correlation type on/off
Show Scatter Plot: Displays a scatter plot of returns (limited to recent points to avoid performance issues)
Show Statistical Tests: Enables the detailed statistics table
Table Text Size: Adjusts the size of text in the statistics table

2.Functions explained-
calcReturns():
This function calculates price returns based on your selected method:

Log Returns:
Formula: ln(Price_t / Price_t-1)
Example: If a stock goes from $100 to $101, the log return is ln(101/100) = ln(1.01) ≈ 0.00995 or 0.995%
Benefits: More symmetric, time-additive, and better for statistical modeling
Simple Returns:
Formula: (Price_t - Price_t-1) / Price_t-1
Example: If a stock goes from $100 to $101, the simple return is (101-100)/100 = 0.01 or 1%
Benefits: More intuitive and easier to understand

rankArray():
This function calculates the rank of each value in an array, which is used for Spearman correlation:

How ranking works:
The smallest value gets rank 1
The second smallest gets rank 2, and so on
For ties (equal values), they get the average of their ranks
Example: For values [5, 2, 7, 2, 9]
Sorted: [2, 2, 5, 7, 9]
Ranks: [1.5, 1.5, 3, 4, 5] (the two 2s tie for ranks 1 and 2, so they both get 1.5)
Why this matters: Spearman correlation uses ranks instead of actual values, making it less sensitive to outliers and non-linear relationships.

pearsonCorr():
This function calculates the Pearson correlation coefficient:

Mathematical Formula:
r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]
Where x and y are the two variables, and n is the sample size
What it measures:
The strength and direction of the linear relationship between two variables
Values range from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship)
0 indicates no linear relationship
Example:
If two stocks have a Pearson correlation of 0.8, they have a strong positive linear relationship
When one stock goes up, the other tends to go up in a fairly consistent proportion

spearmanCorr():
This function calculates the Spearman rank correlation:

How it works:
Convert each value in both datasets to its rank
Calculate the Pearson correlation on the ranks instead of the original values
What it measures:
The strength and direction of the monotonic relationship between two variables
A monotonic relationship is one where as one variable increases, the other either consistently increases or decreases
It doesn't require the relationship to be linear
When to use it instead of Pearson:
When the relationship is monotonic but not linear
When there are significant outliers in the data
When the data is ordinal (ranked) rather than interval/ratio
Example:
If two stocks have a Spearman correlation of 0.7, they have a strong positive monotonic relationship
When one stock goes up, the other tends to go up, but not necessarily in a straight-line relationship

tStatistic():
This function calculates the t-statistic for correlation:

Mathematical Formula: t = r × √((n-2)/(1-r²))
Where r is the correlation coefficient and n is the sample size
What it measures:
How many standard errors the correlation is away from zero
Used to test the null hypothesis that the true correlation is zero
Interpretation:
Larger absolute t-values indicate stronger evidence against the null hypothesis
Generally, a t-value greater than 2 (in absolute terms) is considered statistically significant at the 95% confidence level

criticalT() and pValue():
These functions provide approximations for statistical significance testing:

criticalT():
Returns the critical t-value for a given degrees of freedom (df) and significance level
The critical value is the threshold that the t-statistic must exceed to be considered statistically significant
Uses approximations since Pine Script doesn't have built-in statistical distribution functions
pValue():
Estimates the p-value for a given t-statistic and degrees of freedom
The p-value is the probability of observing a correlation as strong as the one calculated, assuming the true correlation is zero
Smaller p-values indicate stronger evidence against the null hypothesis
Standard interpretation:
p < 0.01: Very strong evidence (marked with **)
p < 0.05: Strong evidence (marked with *)
p ≥ 0.05: Weak evidence, not statistically significant

stdev():
This function calculates the standard deviation of a dataset:

Mathematical Formula: σ = √(Σ(x-μ)²/(n-1))
Where x is each value, μ is the mean, and n is the sample size
What it measures:
The amount of variation or dispersion in a set of values
A low standard deviation indicates that the values tend to be close to the mean
A high standard deviation indicates that the values are spread out over a wider range
Why it matters for correlation:
Standard deviation is used in calculating the correlation coefficient
It also provides information about the volatility of each asset's returns
Comparing standard deviations helps understand the relative riskiness of the two assets.

3.Getting Price Data-
price1: The closing price of the primary asset (the chart you're viewing)
price2: The closing price of the secondary asset (the one you selected in the input parameters)

Returns are used instead of raw prices because:
Returns are typically stationary (mean and variance stay constant over time)
Returns normalize for price levels, allowing comparison between assets of different values
Returns represent what investors actually care about: percentage changes in value

4.Information Table-
Creates a table to display statistics
Only shows on the last bar to avoid performance issues
Positioned in the top right of the chart
Has 2 columns and 15 rows
Populating the Table
The script then populates the table with various statistics:

Header Row: "Metric" and "Value"
Sample Information: Sample size and return type
Pearson Correlation: Value, t-statistic, p-value, and significance
Spearman Correlation: Value, t-statistic, p-value, and significance
Rolling Correlation: Current value
Standard Deviations: For both assets
Interpretation: Text description of the correlation strength
The table uses color coding to highlight important information:

Green for significant positive results
Red for significant negative results
Yellow for borderline significance
Color-coded headers for each section

=> Practical Applications and Interpretation
How to Interpret the Results
Correlation Strength
0.0 to 0.3 (or 0.0 to -0.3): Weak or no correlation
The assets move mostly independently of each other
Good for diversification purposes
0.3 to 0.7 (or -0.3 to -0.7): Moderate correlation
The assets show some tendency to move together (or in opposite directions)
May be useful for certain trading strategies but not extremely reliable
0.7 to 1.0 (or -0.7 to -1.0): Strong correlation
The assets show a strong tendency to move together (or in opposite directions)
Can be useful for pairs trading, hedging, or as a market indicator
Statistical Significance
p < 0.01: Very strong evidence that the correlation is real
Marked with ** in the table
Very unlikely to be due to random chance
p < 0.05: Strong evidence that the correlation is real
Marked with * in the table
Unlikely to be due to random chance
p ≥ 0.05: Weak evidence that the correlation is real
Not marked in the table
Could easily be due to random chance
Rolling Correlation
The rolling correlation shows how the relationship between assets changes over time
If the rolling correlation is much different from the long-term correlation, it suggests the relationship is changing
This can indicate:
A shift in market regime
Changing fundamentals of one or both assets
Temporary market dislocations that might present trading opportunities
Trading Applications
1. Portfolio Diversification
Goal: Reduce overall portfolio risk by combining assets that don't move together
Strategy: Look for assets with low or negative correlations
Example: If you hold tech stocks, you might add some utilities or bonds that have low correlation with tech
2. Pairs Trading
Goal: Profit from the relative price movements of two correlated assets
Strategy:
Find two assets with strong historical correlation
When their prices diverge (one goes up while the other goes down)
Buy the underperforming asset and short the outperforming asset
Close the positions when they converge back to their normal relationship
Example: If Coca-Cola and Pepsi are highly correlated but Coca-Cola drops while Pepsi rises, you might buy Coca-Cola and short Pepsi
3. Hedging
Goal: Reduce risk by taking an offsetting position in a negatively correlated asset
Strategy: Find assets that tend to move in opposite directions
Example: If you hold a portfolio of stocks, you might buy some gold or government bonds that tend to rise when stocks fall
4. Market Analysis
Goal: Understand market dynamics and interrelationships
Strategy: Analyze correlations between different sectors or asset classes
Example:
If tech stocks and semiconductor stocks are highly correlated, movements in one might predict movements in the other
If the correlation between stocks and bonds changes, it might signal a shift in market expectations
5. Risk Management
Goal: Understand and manage portfolio risk
Strategy: Monitor correlations to identify when diversification benefits might be breaking down
Example: During market crises, many assets that normally have low correlations can become highly correlated (correlation convergence), reducing diversification benefits
Advanced Interpretation and Caveats
Correlation vs. Causation
Important Note: Correlation does not imply causation
Example: Ice cream sales and drowning incidents are correlated (both increase in summer), but one doesn't cause the other
Implication: Just because two assets move together doesn't mean one causes the other to move
Solution: Look for fundamental economic reasons why assets might be correlated
Non-Stationary Correlations
Problem: Correlations between assets can change over time
Causes:
Changing market conditions
Shifts in monetary policy
Structural changes in the economy
Changes in the underlying businesses
Solution: Use rolling correlations to monitor how relationships change over time
Outliers and Extreme Events
Problem: Extreme market events can distort correlation measurements
Example: During a market crash, many assets may move in the same direction regardless of their normal relationship
Solution:
Use Spearman correlation, which is less sensitive to outliers
Be cautious when interpreting correlations during extreme market conditions
Sample Size Considerations
Problem: Small sample sizes can produce unreliable correlation estimates
Rule of Thumb: Use at least 30 data points for a rough estimate, 60+ for more reliable results
Solution:
Use the default correlation length of 60 or higher
Be skeptical of correlations calculated with small samples
Timeframe Considerations
Problem: Correlations can vary across different timeframes
Example: Two assets might be positively correlated on a daily basis but negatively correlated on a weekly basis
Solution:
Test correlations on multiple timeframes
Use the timeframe that matches your trading horizon
Look-Ahead Bias
Problem: Using information that wouldn't have been available at the time of trading
Example: Calculating correlation using future data
Solution: This script avoids look-ahead bias by using only historical data

Best Practices for Using This Script
1. Appropriate Parameter Selection
Correlation Window:
For short-term trading: 20-50 periods
For medium-term analysis: 50-100 periods
For long-term analysis: 100-500 periods
Rolling Window:
Should be shorter than the main correlation window
Typically 1/3 to 1/2 of the main window
Return Type:
For most applications: Log Returns (better statistical properties)
For simplicity: Simple Returns (easier to interpret)
2. Validation and Testing
Out-of-Sample Testing:
Calculate correlations on one time period
Test if they hold in a different time period
Multiple Timeframes:
Check if correlations are consistent across different timeframes
Economic Rationale:
Ensure there's a logical reason why assets should be correlated
3. Monitoring and Maintenance
Regular Review:
Correlations can change, so review them regularly
Alerts:
Set up alerts for significant correlation changes
Documentation:
Keep notes on why certain assets are correlated and what might change that relationship
4. Integration with Other Analysis
Fundamental Analysis:
Combine correlation analysis with fundamental factors
Technical Analysis:
Use correlation analysis alongside technical indicators
Market Context:
Consider how market conditions might affect correlations
Conclusion
This Scientific Correlation Testing Framework provides a comprehensive tool for analyzing relationships between financial assets. By offering both Pearson and Spearman correlation methods, statistical significance testing, and rolling correlation analysis, it goes beyond simple correlation measures to provide deeper insights.

For beginners, this script might seem complex, but it's built on fundamental statistical concepts that become clearer with use. Start with the default settings and focus on interpreting the main correlation lines and the statistics table. As you become more comfortable, you can adjust the parameters and explore more advanced applications.

Remember that correlation analysis is just one tool in a trader's toolkit. It should be used in conjunction with other forms of analysis and with a clear understanding of its limitations. When used properly, it can provide valuable insights for portfolio construction, risk management, and pair trading strategy development.

Declinazione di responsabilità

Le informazioni ed i contenuti pubblicati non costituiscono in alcun modo una sollecitazione ad investire o ad operare nei mercati finanziari. Non sono inoltre fornite o supportate da TradingView. Maggiori dettagli nelle Condizioni d'uso.