The Ouroboros King, the demo update

Hi everyone! Over the last few weeks I’ve been on vacation, which has let me spend a lot of time adding content to the game. Since the previous update, I’ve added:

  • Many new pieces: portal mage, immortal, cardinal, pawn, and fool
  • The item system, including gold rewards for winning battles and a shop
  • A difficulty system to make sure everyone can enjoy the game
  • Quality-of-life improvements to the initial army and the map system: you’re no longer shown too many new units at the same time, and you always have relevant options on the map
  • Some polish to the sounds and new tracks by my brother Licus

With all of this, I am very happy with the demo in terms of gameplay. But there are still many visual improvements that I’d like to add, mainly animations to improve the game’s juice.

Next steps

Since Steam Next Fest is at the beginning of October and I already have a working demo, I’ll focus on marketing over the following weeks. I intend to test many different things to see what sticks and build some momentum before the festival. Make sure to wishlist the game on Steam if you haven’t yet. I’ll surely do a minor update when the cover art is ready, and if I have some time to spare I’ll add some extra animations.

After the festival, I’ll go back to development, working on a dynamic monologue system and on content for the other two stages.

Thanks for reading and as always, subscribe for more updates.



The Ouroboros King, the relic update

Game updates

I’m happy to announce that during the past few weeks I’ve added the following changes to the game:

  • Introduction of relics, some of which affect combat while others affect army upgrades. You will always start with an Alarm Bell, which tells you when your king is in danger (this should make the game more accessible). Other relics are available by visiting treasure nodes on the map
  • Added rocks that block unit movement during combat
  • Added a soundtrack, composed by my brother Licus
  • Added some animations and sound effects to improve the game’s juice
  • Nerfed the Berzerker, as it was too strong. It now moves either 3 squares vertically or horizontally, or 2 squares diagonally, and it can no longer kill the enemy king on its own in most situations

You can play the updated demo on the same itch link.

Progress

I’ve created a Steam page for the game with placeholder art. If you’re reading this, go wishlist it now, thanks! I don’t really expect it to get much attention right now, but I needed it to enroll in festivals. On that note, I got rejected from Tacticon and I’ve enrolled in October’s Steam Next Fest. Enrolling in festivals this early may be a bit reckless, but I expect development to speed up soon: I’ll be on vacation in August, and in September my daughter will start kindergarten.

I’ve also started talks with an artist to commission art for the game.

My next steps will be adding the item system, along with a rewind mechanic that should make the game a lot more accessible.

Thanks for reading and as always, subscribe for more updates.



The Ouroboros King, work in progress release

Over the last few months, I’ve been working on a chess roguelike game: the Ouroboros King. I’m trying to do with chess what Slay the Spire did with card games.

I’ve finally released the first version of the Ouroboros King, and you can play it on itch. All constructive feedback is welcome.

This first version contains the following elements:

  • A procedurally generated map à la Slay the Spire
  • An army management system, so you can change your piece formation
  • A combat system, which is basically chess with some variations (it doesn’t tell you when you’re in check, kings can be captured, and new pieces are available)
  • An event system, where you can upgrade your army and recruit new pieces after winning a combat

However, it’s still lacking many elements that I want to incorporate into the game:

  • An item system, with consumable combat bonuses
  • A relic system, with permanent bonuses
  • A dialog system and lore descriptions to tell the game’s story (similarly to how the Souls series or Hollow Knight tell their stories)
  • Many extra alternative chess pieces, to add more variety
  • Battlefield modifications, such as rocks that block movement
  • 2 extra stages with boss fights at the end (this release includes only the 1st stage)
  • Endgame difficulty options and unlockables to extend the game’s life
  • Background music

Many of them will make it into the free demo, and the rest will be available in the final version that I plan to release on Steam.



HS Battlegrounds, optimizing your late game Naga board (post-nerf)

In May 2022 the Naga tribe was introduced to HS Battlegrounds. From the start, the tribe was completely OP, with decent early-game units and crazy late-game scaling. Since then Naga have been nerfed twice, lowering both the initial stats and the scaling potential of some minions. In this post I’ll help you build a Naga board optimized for scaling, using the tools of numerical analysis.

The growth engine

This late-game scaling comes from growth engines that interact with spells and the new Spellcraft mechanic. Many Naga scale when you play spells, but not all of them are equally effective. Here are the scaling Naga in decreasing order of effectiveness:

  • Tidemistress Athissa is not as OP as it used to be, but it’s still very strong. If you get 6 procs (a quite conservative amount: 4 Spellcrafts on board plus 2 extra spells cycled), that is +18/+18 on your board, more than a golden Lightfang with 4 tribes or a Charly and a Pumba. Note that Athissa procs on all spells, including coins, blood gems and discovers from triples. We’ll compare the other minions to Athissa.
  • Critter Wrangler has half of Athissa’s scaling on Spellcrafts and none on other spells. All in all, it will be ~40% as effective as Athissa, depending on whether Quilboar are in the lobby and on the number of triples you get.
  • Eventide Brute (after you cast a spell, gain +1/+1). It has ~33% of Athissa’s scaling, and since it concentrates all the buffs on a single minion, it’s more vulnerable to poison and Leeroy.
  • Lava Lurker (the 1st Spellcraft spell cast on this each turn is permanent). The best spell to use on it is Shoal Commander’s, which gives it +7/+7 assuming you have 7 Naga. If you optimize your setup for the Lurker and get 1 golden Lurker and 2 golden Commanders, you could get +28/+28 of scaling per turn, which is still below the conservative estimate for Athissa. All in all, Lava Lurker can help you in the mid-game, but it falls short as a scaling engine.
  • Corrupted Myrmidon (Start of combat: double this minion’s stats). It doesn’t grow on its own but makes better use of buffs than other minions. Assuming you get all Athissa procs on it, you’ll get an extra ~25%, plus you can double the stats from gems. If you have Critter Wrangler instead, you’ll double its efficiency on spells from hand. Another bonus is that it gives you a lot of tempo if you already have some Spellcrafts to buff it. As with Eventide Brute, concentrating buffs on this minion will make you susceptible to poison and Leeroy.

The clear winner by a wide margin is Athissa. In its absence, you can try to survive with a combination of Wranglers, Brutes, Corrupted Myrmidons and Lava Lurker.

Spellcraft minions

There are 7 Spellcraft minions, 6 of which are Naga and the other one gives you Nagas. Let’s analyze them:

  • Orgozoa, the Tender is not a Naga, but it procs Athissa and also gives you more Nagas to round out your composition or proc Athissa again. Once you have 4 Naga on the board, it gives you the best scaling, since it can discover more spells for extra procs.
  • Glowscale is great for combat, letting you give Divine Shield (DS) to your biggest minion.
  • Other Spellcraft minions. They offer a moderate amount of stats and taunt/windfury. They can be useful in helping you survive while you get your growth engine, but won’t help you scale as much as Orgozoa, and their buffs aren’t as significant as DS in the late game. The best of them in terms of stats is Shoal Commander. However, even a golden Commander only gives +14/+14 in combat stats, which is easily outclassed by one or two turns of scaling with Athissa. The only case where it’s relevant, and even necessary, is when you include Lava Lurker in your composition.

The ideal composition

Once we know the pieces of the puzzle, it’s time to think about the best way to assemble it. How many Spellcraft minions should we get? Is Lava Lurker worth it?

To analyze the composition, I’ve simulated the number of +1/+1 buffs we get for many different board combinations. These simulations make the following assumptions:

  • We have 6 “stable” minions that you are growing and 1 flex slot that you use to rotate spells
  • 3 played spells per turn from the shop (Spellcraft, coins, gems, discovers)
  • 80% of the spells are Spellcraft, and 20% are other types
  • We have a maximum of 1 Corrupted Myrmidon (or a golden one), which gets an equivalent of an extra 80% of the Critter Wrangler procs (you may put DS on other minions or use the discover from Orgozoa) and 20% of the Athissa procs
  • We have a maximum of 1 Lava Lurker (or a golden one) and it gets +7/+7 each turn (+14/+14 if golden), equivalent to having 1 Shoal Commander (2 if golden) and 7 Naga on board

With this in mind, we can calculate the number of procs as follows:

Spells cast = Other spells + Spellcraft minions

Athissa procs = Spells cast * (3 * Athissa + 6 * golden Athissa)

Critter Wrangler procs = 80% * Spells cast * 80% * (1.5 * Critter Wrangler + 3 * golden Critter Wrangler)

Eventide Brute procs = Spells cast * (Eventide Brute + 2 * Golden Brute)

Corrupted Myrmidon procs = (20% * Athissa procs + Critter Wrangler procs) * (Corrupted Myrmidon + 1.5 * Golden Corrupted Myrmidon)

Lava Lurker procs = 7 * Lava Lurker + 14 * Golden Lava Lurker

Procs = Athissa + Critter Wrangler + Eventide Brute + Corrupted Myrmidon + Lava Lurker

The best composition gets an equivalent of 104 +1/+1 procs per turn and consists of 2 golden Athissa, 2 golden Critter Wranglers, 1 golden Myrmidon and 1 Spellcraft minion.

The best composition without golden Athissa gets an equivalent of 79 +1/+1 procs and consists of 3 golden Wranglers, 1 golden Corrupted Myrmidon and 2 Spellcraft minions.

The best composition without any golden minions gets an equivalent of 46 +1/+1 procs and consists of 2 Athissa, 1 Critter Wrangler, 1 Corrupted Myrmidon and 2 Spellcraft minions.
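The formulas above are easy to check in code. Below is a minimal Python sketch that transcribes them as written, assuming 3 shop spells per turn as in the list of assumptions; the dictionary keys (`"athissa"`, `"golden_wrangler"` and so on) and the function name are hypothetical labels, not anything taken from the spreadsheet. For the best non-golden composition it gives 45.6 procs, matching the ~46 quoted above. For the best golden composition it gives 100.8 rather than the quoted 104, so treat it as an approximation of the spreadsheet rather than a copy of it.

```python
# Spells played from the shop each turn (coins, gems, discovers, Spellcrafts)
SHOP_SPELLS = 3


def procs_per_turn(comp):
    """Equivalent +1/+1 procs per turn, transcribing the formulas above.

    comp maps hypothetical minion keys to counts; "golden_" marks golden
    copies, e.g. {"athissa": 2, "wrangler": 1, "myrmidon": 1, "spellcraft": 2}.
    """
    def count(name):
        return comp.get(name, 0)

    spells_cast = SHOP_SPELLS + count("spellcraft")
    athissa = spells_cast * (3 * count("athissa") + 6 * count("golden_athissa"))
    # Both 80% factors are taken as-is from the Critter Wrangler formula
    wrangler = (0.8 * spells_cast * 0.8
                * (1.5 * count("wrangler") + 3 * count("golden_wrangler")))
    brute = spells_cast * (count("brute") + 2 * count("golden_brute"))
    # Extra stats the Myrmidon gets from 20% of Athissa procs plus Wrangler procs
    myrmidon = ((0.2 * athissa + wrangler)
                * (count("myrmidon") + 1.5 * count("golden_myrmidon")))
    lurker = 7 * count("lurker") + 14 * count("golden_lurker")
    return athissa + wrangler + brute + myrmidon + lurker


no_golden = {"athissa": 2, "wrangler": 1, "myrmidon": 1, "spellcraft": 2}
all_golden = {"golden_athissa": 2, "golden_wrangler": 2,
              "golden_myrmidon": 1, "spellcraft": 1}
print(round(procs_per_turn(no_golden), 1))   # 45.6
print(round(procs_per_turn(all_golden), 1))  # 100.8
```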

I’ve measured the importance of each minion by calculating its average number of appearances in the top 10 compositions for each scenario. All copies are golden unless the scenario forbids it:

                     All compositions   No golden Athissa   No golden minions
Athissa              2.1                0.8                 2
Corrupted Myrmidon   1                  1                   0.5
Critter Wrangler     1.4                2.8                 0.8
Lava Lurker          0.1                0.3                 0.5
Eventide Brute       0                  0.1                 0.1
Spellcraft minions   1.4                1                   2.1
Avg. procs per turn  97                 74                  44

I’ve made this spreadsheet calculator to compute the number of procs you’d get for a given composition. It’s read-only so it stays unchanged, but you can copy it to your own spreadsheet and use it there.

The flex slot

As suggested above, the flex slot is used to rotate minions that give you spells (Spellcraft, Seashell Collector, Quilboar). However, at the end of the turn, you should be playing a minion on that slot.

If you expect an easy combat, you can try to get an extra spell for the next round by playing a Spellcraft minion or a Quilboar that gets gems in combat. If you play a Spellcraft minion, do so after playing all your spells so it doesn’t “steal” any procs.

If you’re pressured, try to get a Leeroy, Mantid Queen, Ghastcoiler or Selfless Hero to strengthen your board.

Getting there

This article only covers the ideal composition in a vacuum, but in a BG game you need to survive while you build your comp. In some cases it will be impossible to build full scaling and you’ll keep your early Lurker or Brute on the board, and that’s completely fine.

Conclusion

I’ve done the math on scaling for Naga comps; here are the main takeaways:

  • Get as many copies of Athissa as you can
  • Critter Wrangler is a great minion to complement Athissa
  • A Corrupted Myrmidon (especially a golden one) is a great receptor of Athissa and Wrangler buffs
  • Lava Lurker (if you have Shoal Commander) and Eventide Brute are also viable
  • Get between 1 and 3 Spellcraft minions on the board, Orgozoa and Glowscale are the best
  • Round out your comp with another Spellcraft minion for a bit more scaling, or another useful unit if you’re under pressure


How I built my first video game

I’ve always wanted to learn how to make video games, but I had never gotten around to it.

This August I decided to change that. I had a two-week vacation in a remote and quiet place and used my spare time to build my first game. The final result is a short and unpolished game, but I have the satisfaction of having finalised the project and made almost all assets from scratch.

I had no previous experience in video game design, but I have coded for some time and I used to draw a lot as a kid. To prepare for the project, I did a 2D platformer Unity tutorial in the afternoons during the week before my vacation. I also came up with a whole game concept: a roguelike where you’re an evil weapon (inspired by Nightblood from the Stormlight Archive) trying to escape its confinement by tricking a human into wielding it… but that ended up being waaaay too much and I had to cut the scope multiple times.

Once I had an idea, I started planning out the main parts I needed for the project:

  • Level outline and player control
  • Enemies, attack and death animations
  • Aesthetic level design
  • Enemy sprites
  • Player sprites
  • Sound
  • Menu and victory screen

Level outline and player control

For the level outline, I wanted something short since it was my first project. Also the cultist theme made me think of ancient rituals and a Stonehenge aesthetic, including stone monuments. I ended up building a level that consisted of three main parts:

  • A couple of platforms to get the player started on the jump mechanics
  • A plateau with space for a couple of enemies that could be engaged individually
  • A final area where you had to fight many enemies at the same time

As for movement, I just used the same control scheme I’d learned from the platformer tutorial, and improved the jump a bit by learning from other tutorials. I also added gamepad compatibility by following this tutorial. Anecdotally, I missed a small step in the middle of that tutorial and ended up wasting more than an hour trying to figure out what was wrong…

This is the result after the first iteration:

Enemies, attack and death animations

Since I wanted to do attack animations I needed some sprites that could do that, not just a bean. So, let me introduce you to …

Bean with a sword

I made it and animated it using Photoshop’s basic tools. And since I didn’t want to waste much time I used it for the player and tweaked its size and color for the enemies.

Once I had my beans in place, I started coding the player’s attack controls and the health system. For the player’s attack, I followed this Blackthornprod tutorial. For the health system, I used what I had learned from the 2d platformer tutorial.

After that, I started coding the enemy AI. I began with a very simple approach that ended up doing the job, with no need for extra complexity. This is the AI’s behavior:

  • If you’ve recently been hit wait for a bit, else
  • If the player is in range wait a bit and then attack, else
  • If the player is in sight follow him, else
  • If you’re in front of an obstacle turn around, else
  • Walk in the direction you’re facing
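The priority list above maps naturally onto a chain of early returns. The game itself was built in Unity, so this Python version is only a language-agnostic sketch of the decision order; the `EnemyState` fields and the action names are hypothetical, and the timers (how long to wait, the pause before an attack) are left out.

```python
from dataclasses import dataclass


@dataclass
class EnemyState:
    """What the enemy can perceive this frame (hypothetical fields)."""
    recently_hit: bool
    player_in_range: bool
    player_in_sight: bool
    facing_obstacle: bool


def choose_action(state: EnemyState) -> str:
    """Pick one action by checking the rules in priority order."""
    if state.recently_hit:
        return "wait"          # pause briefly after taking a hit
    if state.player_in_range:
        return "attack"        # the short wait before striking would be a timer
    if state.player_in_sight:
        return "follow"
    if state.facing_obstacle:
        return "turn_around"
    return "walk_forward"


print(choose_action(EnemyState(False, False, True, False)))  # follow
```

Because each rule returns immediately, a higher-priority condition (such as having just been hit) always overrides the ones below it, which is exactly the "else" chain in the list.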

I had also initially planned “mage” enemies that shoot fireballs at you, but realised that I didn’t have time to implement them (coding + drawing animations), so I cut them from the project.

Mage sketch, inspired by Final Fantasy’s black mages and a staff in Riot’s game design video series

Here is the result after the second iteration:

Aesthetic level design

Something I wanted to focus on during this project was learning to design beautiful levels like the ones in Hollow Knight. I searched a bit and found these two awesome tutorials from a small YouTube channel.

After tinkering a bit with some elements, the final level setup was:

  • Some fog (from the above tutorials)
  • Black squares to cover “blank” regions
  • 2 or 3 layers of grass parallax in the front (Photoshop brushes)
  • The player layer
  • Rocks and walls (copied from Google Images)
  • 3 layers of half-assed (time restrictions…) mountain parallax in the back
  • A blue sky with stars and a moon that’s too high up to be seen at any moment

I know it lacks polish, but I wanted to finish the project during my vacation so I had to move on.

Here’s the third iteration:

Enemy sprites

My first enemy: the soldier (sword added with PS)

I don’t really like pixel art and I don’t have a drawing tablet, so I decided to draw the characters, take pictures and use Photoshop to digitize them. The traditional way to do this is with the pen tool, but I quickly realized that it would take too much time, so I ended up using the magic wand and some filters to make the lines more even. I think the result looks nice enough while being quite fast. If there’s interest I may write a guide detailing my method.

The first step of animating is having a clear character model, and the second is defining which animations you need. The animations I needed for the enemy character were:

  • Walk/run
  • Attack
  • Die

The die and attack animations were fairly easy, but running was harder. For the run animation I took inspiration from a Shovel Knight GIF: I used an online tool to break it down into frames and basically copied the leg positions from them.

Finally, I added a particle system to simulate blood splashes when the player or an enemy is hit (here’s a couple of tutorials).

Player sprites

I initially planned on having a hooded guy wielding an evil scythe as the player character

The fact that I used the same placeholder sprites for the player character and the enemies, and that I was low on time, led me to do the same for the final sprites. I just added jumping and idle animations and called it a day.

Sound

The sounds I needed were:

  • Jump
  • Slash
  • Character hurt
  • Character dies
  • Background music (I shamelessly copied the song of the prayer from FF X)
  • Click
  • Victory song

I just recorded all of those with my phone in less than half an hour (all mouth noises). Afterwards, I did some light editing with Audacity and followed a couple of tutorials to get them into the game.

Menu and victory screen

For the menu, I just followed this tutorial and added the player sprites. I also built victory and defeat screens using the same principles.

Conclusion

Here’s the final result:

And that’s it. I learned a lot from this project and had fun doing it. The result is nowhere near what professional video games look like, but it does look better than I anticipated.

This project helped me get a better understanding of what making a full game entails, even if at a small scope. It’s an exercise I’d recommend to all aspiring game developers before getting into bigger projects.

Next, I’ll try to make a project with more focus on playability and less focus on assets. Let’s see how it goes.



How to use simulations in data science

Simulation is a very powerful tool that is missing from many data scientists’ toolkits. In this article, I will teach you how to use simulation in combination with other analytical tools.

I will be sharing some educational and professional examples of simulation with Python code. If you are a data scientist (or on the road to becoming one), you’ll love the possibilities that simulation opens for you.

What is simulation?

Simulating is digitally running a series of events and recording their outcomes. Simulations help us when we have a good understanding of how individual events work, but not of how the aggregate works.

In physics, simulations are often used when we have a hard-to-solve differential equation. We know the starting state, and we know the rules for infinitesimal (very small) changes, but we don’t have a closed formula for longer timespans. Simulation allows us to project that initial state into the future, step by step.

In data science, we usually work with probabilistic events. Sometimes we can easily aggregate them analytically. Other times there is no analytical solution, or it’s very hard to reach it. We can estimate the probabilities and expected results of complex chains of events by running multiple simulations and aggregating the results. This can be very useful to understand the risks we are exposed to.

Simulation is also used in artificial intelligence. When interacting with others, simulation can allow us to anticipate their behavior and plan accordingly. For example, DeepMind’s AlphaGo uses simulations to look several moves into the future and better assess the best moves in its current position.

To run a simulation we will need a model of the underlying events. This model will tell us what can happen at any given point, the probabilities of each outcome and how we should evaluate the results.

The better our model, the better the accuracy of the simulation. However, simulations with imperfect models can still be helpful and give us a ballpark estimate.

Simulation is a subject where examples work better than theory, so let’s jump into some use cases.

Example 1. Estimate the value of pi by using simulation

This task can be done in many ways. One of the easiest is as follows:

  1. Draw a square of side 2 and with its center at the origin of coordinates of a 2d plane
  2. Draw the inscribed circle of that square (radius 1 and its center at the origin of coordinates)
  3. Sample random points from the square (two uniform distributions from -1 to 1)
  4. Whenever you draw a point, check whether it is inside the circle or not
  5. The proportion of points that fall inside the circle approximates the ratio between the area of the circle and the area of the square, so:

    \[{Num\_points\_inside\_circle \over Num\_total\_points} \approx {Area\_of\_circle \over Area\_of\_square} = {\pi \cdot 1^2 \over 2 \cdot 2} =  {\pi \over 4}\]

And finally:

    \[\pi \approx 4 \cdot {Num\_points\_inside\_circle \over Num\_total\_points}\]

Here is Python code to simulate the value of pi:

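import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

num_sims = 5000
# Sample points uniformly from the square [-1, 1] x [-1, 1]
x_random = np.random.uniform(-1, 1, num_sims)
y_random = np.random.uniform(-1, 1, num_sims)

# A point lies inside the unit circle if x^2 + y^2 < 1
inside_circle = (x_random ** 2 + y_random ** 2) < 1

# pi ≈ 4 * (points inside the circle / total points)
print(4 * inside_circle.mean())

# Plot the running estimate to show how it converges
plt.figure(figsize=[8, 5])
one_to_n = np.arange(1, num_sims + 1)
plt.plot(one_to_n, 4 * inside_circle.cumsum() / one_to_n)
plt.show()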
Pi simulation convergence

Similar methods can be used to estimate the value of integrals via simulation.
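As a minimal sketch of that idea, assuming NumPy: averaging a function at uniformly sampled points and multiplying by the interval length estimates its integral, which is the one-dimensional analogue of counting points inside the circle. `mc_integrate` is a hypothetical helper name, and the integral of sin(x) over [0, π] (exact value 2) is just an example.

```python
import numpy as np


def mc_integrate(f, a, b, num_samples=100_000, seed=1234):
    """Estimate the integral of f over [a, b] with plain Monte Carlo.

    The mean of f at uniform random points, times the interval length (b - a),
    converges to the integral as num_samples grows.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, num_samples)
    return (b - a) * f(x).mean()


estimate = mc_integrate(np.sin, 0, np.pi)
print(estimate)  # close to the exact value, 2
```

The error shrinks roughly with the square root of the number of samples, so each extra digit of precision costs about 100 times more samples.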

Example 2. Solve a difficult probability problem

Solve this problem by P. Winkler:

One hundred people line up to board an airplane. Each has a boarding pass with an assigned seat. However, the first person to board has lost his boarding pass and takes a random seat. After that, each person takes the assigned seat if it is unoccupied, and one of the unoccupied seats at random otherwise. What is the probability that the last person to board gets to sit in his assigned seat?

The problem can be solved using logic and probabilities, but it can also be solved by simply programming the described behavior and running some simulations:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def simulate_boarding(num_passengers):
    """Return 1 if the last passenger gets their assigned seat, else 0.
    Passenger i is assigned seat i."""
    free_seats = set(range(num_passengers))
    for passenger in range(num_passengers - 1):
        if passenger == 0 or passenger not in free_seats:
            # Lost boarding pass, or assigned seat taken: pick a random free seat
            seat = list(free_seats)[np.random.randint(len(free_seats))]
        else:
            # Assigned seat is free: take it
            seat = passenger
        free_seats.remove(seat)
    # Exactly one seat is left for the last passenger
    return int(free_seats.pop() == num_passengers - 1)

num_sims = 10000
num_passengers = 100

is_same_seat = np.array([simulate_boarding(num_passengers) for _ in range(num_sims)])

print(is_same_seat.mean())
# Plot the running estimate to show how it converges
plt.figure(figsize=[8, 5])
one_to_n = np.arange(1, num_sims + 1)
plt.plot(one_to_n, is_same_seat.cumsum() / one_to_n)
plt.show()
Probability simulation convergence

You can find more probability problems to practice here.

Example 3. Simulating game outcomes

How many games would it take Magnus Carlsen (Elo of 2847 as of 18-07-2021) to get back to his current rating if he were reset to 1000?

To solve this problem we need to understand how the Elo system works.

First, given two players’ Elo ratings, the probability of player1 beating player2 (strictly speaking, player1’s expected score) is:

    \[P(\textrm{player1 beats player2}) = {1 \over 1 + 10 ^{(Elo_2 - Elo_1)/400}}\]

Second, after the game, player1’s Elo rating is updated as follows:

    \[Elo_1= Elo_1+K \cdot (\textrm{result} - P(\textrm{player1 beats player2}))\]

Where:

  • result is 1 for a win, 0.5 for a draw and 0 for a loss
  • K (also known as K-factor) is the maximum possible adjustment per game and varies depending on the player’s age, games played and ELO
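As a worked example with made-up numbers, take Elo_1 = 2000, Elo_2 = 1600 and K = 20:

    \[P(\textrm{player1 beats player2}) = {1 \over 1 + 10^{(1600 - 2000)/400}} = {1 \over 1 + 10^{-1}} = {10 \over 11} \approx 0.909\]

    \[\textrm{Win: } Elo_1 = 2000 + 20 \cdot (1 - 0.909) \approx 2001.8 \qquad \textrm{Loss: } Elo_1 = 2000 + 20 \cdot (0 - 0.909) \approx 1981.8\]

Beating a much weaker player barely moves the rating, while losing to one costs almost the full K.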

Now that we have a model, we just have to initialize Magnus’s current Elo to 1000 and code a while loop that:

  1. Has Magnus play a game against a player of his current Elo
  2. Calculates the probability of winning using the real Elo and simulates the outcome of the game
  3. Updates Magnus’s current Elo according to the result
  4. Stops the loop if Magnus has reached his real Elo
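import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def get_prob(elo1, elo2):
    # Expected score of player 1 against player 2
    return 1 / (1 + 10 ** ((elo2 - elo1) / 400))

def update_elo(elo, prob, result, k):
    # Elo update: move the rating towards the observed result
    return elo + k * (result - prob)

def play_until_top(real_elo, initial_elo):
    """Simulate games until the rating reaches real_elo; return the rating history."""
    current_elo = initial_elo
    num_games = 0
    k = 40
    elo_list = [initial_elo]
    while current_elo < real_elo:
        # K-factor: 40 for new players, 20 after 30 games, 10 above 2400
        if num_games > 30:
            k = 20
        if current_elo > 2400:
            k = 10
        # The true win probability depends on Magnus's real strength...
        prob_win = get_prob(real_elo, current_elo)
        result = 1 if np.random.rand() < prob_win else 0
        # ...but the rating update uses the expected score between two players
        # of equal rating (the opponent has Magnus's current Elo), i.e. 0.5
        current_elo = update_elo(current_elo, 0.5, result, k)
        elo_list.append(current_elo)
        num_games += 1
    return elo_list

num_sims = 1000

# The history includes the starting rating, so games played = len(history) - 1
num_games = np.array([len(play_until_top(2847, 1000)) - 1 for _ in range(num_sims)])

print(num_games.mean())
plt.figure(figsize=[8, 5])
plt.hist(num_games, bins=50)

elo_history = np.array(play_until_top(2847, 1000))
plt.figure(figsize=[8, 5])
plt.plot(np.arange(len(elo_history)), elo_history)
plt.show()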
Example Elo trajectory
Games to real Elo distribution

Another cool example would be to simulate the NBA playoffs. For a first approach, you can assume that each team has a probability of winning proportional to the games they won during the regular season (GW) so that in any game the probability of team 1 winning is GW1 / (GW1 + GW2). You can also analyze how probabilities change if you change the series from Best of 7 to Best of 5 or Best of 9.
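A single series is the building block of such a playoff simulation. The sketch below assumes the GW1 / (GW1 + GW2) per-game model described above, independent games and no home-court advantage; the function names and the 60/40 win records are hypothetical. It prints team 1's estimated chance of winning best-of-5, best-of-7 and best-of-9 series.

```python
import numpy as np


def simulate_series(gw1, gw2, best_of, rng):
    """Simulate one best-of-N series; return True if team 1 wins it.

    Team 1 wins each game with probability GW1 / (GW1 + GW2).
    """
    p_team1 = gw1 / (gw1 + gw2)
    wins_needed = best_of // 2 + 1
    wins1 = wins2 = 0
    while wins1 < wins_needed and wins2 < wins_needed:
        if rng.random() < p_team1:
            wins1 += 1
        else:
            wins2 += 1
    return wins1 == wins_needed


def series_win_prob(gw1, gw2, best_of=7, num_sims=10_000, seed=1234):
    """Fraction of simulated series won by team 1."""
    rng = np.random.default_rng(seed)
    wins = sum(simulate_series(gw1, gw2, best_of, rng) for _ in range(num_sims))
    return wins / num_sims


# Hypothetical regular-season records: 60 wins vs 40 wins
for best_of in (5, 7, 9):
    print(best_of, series_win_prob(60, 40, best_of=best_of))
```

As a sanity check, with a per-game probability of 0.6 the exact best-of-7 probability is about 0.710, so the middle estimate should land close to that. Longer series favor the stronger team, which is why the best-of-9 figure comes out higher than the best-of-5 one. A full bracket would chain calls to `simulate_series`, feeding each winner into the next round.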

Example 4. Business application, estimating value at risk

Collectors LTD is a debt collection company focused on enterprise debt. It buys portfolios of business loans that have defaulted at some point and tries to collect the payments for those loans. Some of the companies will be bankrupt and won’t be able to pay, and others are likely to go bankrupt in the future. The key to Collectors LTD’s business is in estimating the value it can get back from a portfolio. For this reason, Collectors LTD has developed a model that predicts the probability of a company repaying part of that debt. Among those companies that repay some of the debt, the amount paid is distributed uniformly from 0% to 100%. Collectors LTD can use its model in combination with simulation to evaluate the expected return of the portfolio, and how volatile that return is.
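Under this model, the expected amount collected can be written down directly, because the fraction repaid is uniform between 0% and 100% (mean 50%) and, in the model, independent of whether a company repays:

    \[E[\textrm{collected}] = \sum_{i} debt_i \cdot P(\textrm{repay}_i) \cdot E[\textrm{pct\_paid}_i] = 0.5 \cdot \sum_{i} debt_i \cdot P(\textrm{repay}_i)\]

What this formula does not give is the spread of the total, for example how bad a 1-in-20 outcome is, and that is where simulation helps.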

Since I can’t share the real data with you, I’ve created a synthetic dataset that mimics the relevant properties:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def generate_synthetic_portfolio(num_companies):
    # Heavy-tailed debt amounts: many small loans and a few very large ones
    debt = 100000 * np.random.weibull(0.75, num_companies)

    # Per-company repayment probability, centered on 20% and clipped to [0, 1]
    prob_repayment = np.random.normal(0.2, 0.1, num_companies)
    prob_repayment = np.clip(prob_repayment, a_min=0, a_max=1)

    return debt, prob_repayment

num_companies = 1000
debt, prob_repayment = generate_synthetic_portfolio(num_companies)

Given the synthetic portfolio above, estimate the expected amount to be collected and the amount we can expect to collect with 95% confidence (the 5th percentile of the simulated distribution).

def simulate_collection(debt, prob_repay):
    num_companies = len(debt)
    # Each company repays (or not) independently, with its own probability
    did_repay = np.random.rand(num_companies) < prob_repay
    # Companies that repay pay a uniform fraction (0-100%) of their debt
    pct_paid = np.random.rand(num_companies)
    amount_collected = debt * did_repay * pct_paid
    return amount_collected.sum()

num_sims = 1000
amount_collected = np.array(
    [simulate_collection(debt, prob_repayment) for _ in range(num_sims)]
)

print(f"Total debt: {np.round(debt.sum())} usd")
print(f"Average amount collected: {np.round(amount_collected.mean())} usd")
# Amount collected or exceeded in 95% of the simulations (5th percentile)
percentile_95 = np.round(np.percentile(amount_collected, 5))
print(f"Amount collected with 95% confidence: {percentile_95} usd")

plt.figure(figsize=[8, 5])
plt.hist(amount_collected, bins=50)
plt.show()
Debt collection distribution

Keep in mind that this solution assumes the probabilities of collection are independent of one another. This isn’t true for systemic risks such as a global economic downturn.
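One common way to relax that assumption is to add a shared shock: in each simulated scenario, a single random multiplier scales every company's repayment probability, so bad years hit the whole portfolio at once. The sketch below is only an illustration, not Collectors LTD's model; the lognormal shock and `shock_sigma = 0.5` are arbitrary assumptions, the function name is hypothetical, and the portfolio is regenerated with NumPy's `default_rng` so the block runs on its own. Comparing the 5th percentiles of the two runs shows how much a shared shock can widen the downside.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Hypothetical portfolio drawn the same way as the synthetic one above
num_companies = 1000
debt = 100000 * rng.weibull(0.75, num_companies)
prob_repayment = np.clip(rng.normal(0.2, 0.1, num_companies), 0, 1)


def simulate_collection_shock(debt, prob_repay, shock_sigma, rng):
    """One simulated scenario with a shared economic shock.

    A single lognormal multiplier scales every company's repayment
    probability together. shock_sigma = 0 gives a multiplier of exactly 1,
    which recovers the independent case.
    """
    shock = rng.lognormal(mean=0.0, sigma=shock_sigma)
    p = np.clip(prob_repay * shock, 0, 1)
    did_repay = rng.random(len(debt)) < p
    pct_paid = rng.random(len(debt))
    return (debt * did_repay * pct_paid).sum()


num_sims = 2000
independent = np.array([simulate_collection_shock(debt, prob_repayment, 0.0, rng)
                        for _ in range(num_sims)])
correlated = np.array([simulate_collection_shock(debt, prob_repayment, 0.5, rng)
                       for _ in range(num_sims)])

print("5th percentile, independent:", np.round(np.percentile(independent, 5)))
print("5th percentile, correlated: ", np.round(np.percentile(correlated, 5)))
```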

Conclusion

I hope you’ve liked these examples and that you can find applications of simulation in your day-to-day data science job. If you’ve enjoyed the article, please subscribe and share it with your friends.



13 essential tips for learning machine learning and data science

When you start learning, it’s very hard to have a clear direction. You often waste time on uninteresting, useless, or outdated topics. You wander and run in circles.

However, once you’ve mastered the topic, it’s easy to look back and see the fastest path from noob to pro. If only you could go back in time and give yourself the roadmap… I can’t do that for myself, but I can do it for others. This is the objective of this article: to give you the tips I wish I had known when I started learning data science and machine learning.

To build this list, I first wrote down what has been useful to me in my experience as a data scientist. Then I went to Reddit to seek help in curating and completing the list, where the post got 300+ upvotes and 35+ comments. I hope you find it helpful!

1. Get solid mathematics, probabilities, and statistics foundations

Mathematics and statistics are at the core of machine learning. So it will be very difficult to understand machine learning algorithms if you don’t know the building blocks.

However, this doesn’t mean you need to be a math wizard. You should understand math and stats concepts such as vectors, matrices, derivatives, probability distributions, independent variables, and standard deviation. More advanced mathematics (like learning to prove theorems) won’t help you much when studying machine learning, even though it can be a lot of fun.

2. Learn either Python or R, and learn it well

When doing data science and machine learning, you will spend most of your time coding in R/Python. So it’s important to learn the ins and outs of your language of choice.

Data scientists spend a lot of time cleaning and manipulating data, so you should give special attention to data manipulation libraries. The most popular ones are Pandas for Python and data.table and dplyr for R.
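As a taste of what these libraries look like, here is a minimal Pandas sketch that adds a column, filters rows and aggregates by group. The `sales` table and its columns are made up for illustration.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["apple", "pear", "apple", "apple", "pear"],
    "units": [10, 4, 7, 3, 5],
    "price": [0.5, 0.8, 0.5, 0.5, 0.8],
})

# Create a new column from existing ones
sales["revenue"] = sales["units"] * sales["price"]

# Filter rows with a boolean condition
apples = sales[sales["product"] == "apple"]

# Aggregate by group
revenue_by_store = sales.groupby("store", as_index=False)["revenue"].sum()

print(apples)
print(revenue_by_store)
```

In R, data.table and dplyr offer equivalent filter, mutate and group-by operations with their own syntax.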

3. Learn good programming practices

Writing clean and efficient code will make it easier to share your work with others. And even if you work alone, it will make your own code easier to debug and maintain. Entire books have been written about this, so I’ll give you a short list:

  1. Use consistent and descriptive names for variables, columns, and functions
  2. Don’t repeat code, use functions or classes if you need to do the same process multiple times
  3. Understandable code is better than compact code: 10 lines everybody understands beat 2 lines nobody understands
  4. Don’t overoptimize your code at the start, but know where the bottlenecks (parts that won’t work well if you increase the volume of data) are in case you need it to scale
  5. Use consistent indentation and try to limit line length
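Here is a small before-and-after sketch of tips 1 and 2 in Python. The prices, the 21% tax rate and the `total_tax` helper are hypothetical.

```python
# Before: cryptic names and the same calculation copy-pasted
a = [100.0, 250.0, 80.0]
b = [120.0, 60.0]
t1 = (a[0] + a[1] + a[2]) * 0.21
t2 = (b[0] + b[1]) * 0.21

# After: descriptive names and one reusable function
VAT_RATE = 0.21  # hypothetical tax rate


def total_tax(prices, tax_rate=VAT_RATE):
    """Return the tax owed on a list of prices."""
    return sum(prices) * tax_rate


online_order_prices = [100.0, 250.0, 80.0]
store_order_prices = [120.0, 60.0]
online_tax = total_tax(online_order_prices)
store_tax = total_tax(store_order_prices)
print(f"Online order tax: {online_tax:.2f}, store order tax: {store_tax:.2f}")
```

If the tax rate changes or a third order appears, the second version needs one edit or one extra call, while the first needs every copy updated by hand.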

4. You don’t need to learn all the different supervised learning models

This is one I struggled with. When I started learning I thought that every situation would need a different type of model and that I needed to learn them all to be well equipped. But this is far from true. Linear/logistic regression is surprisingly effective for tabular data problems. And XGBoost or random forest will help you if you have a lot of non-linearities. Artificial neural nets are great for image and NLP problems but are otherwise overkill and more difficult to set up.

Additionally, you don’t have to keep up with every published paper. Most staple techniques in the industry are decades old. If you ever face a truly unusual problem, that may be a good moment to dive into the literature.

5. Once you know the basics and understand them well, it’s mostly about doing projects

After completing one or two ML courses, don’t spend more time on theory; dive straight into doing some projects. If you’re lacking some knowledge, you can pick it up along the way.

Working on projects puts your knowledge into practice and helps you figure out whether you really understood everything. Additionally, by doing projects you build valuable experience that will help you get hired later on.

6. Doing tutorials and reviewing other people’s projects is very helpful at the start

When you’re learning a new tool or model and don’t feel confident about using it on your own, looking at an example is a great way to get some inspiration.

7. You can learn everything online for free, but some paid resources can be helpful

For example, studying for a master’s degree will give you credentials and a class of peers. I’ve actually written a full article about self-learning vs. studying for a master’s.

Additionally, some useful online resources are paid. I have personally tried to distill my years of experience as a data scientist into Data Projects, a product to learn data science by doing real-world projects. I hope it can help others as much as it would’ve helped me.

8. Explaining your work to others is a great way to consolidate your knowledge

It’s also a great way to work on your communication. You can do this by telling your friends, blogging, or making YouTube videos. Communication will be a crucial skill when working with others.

9. Don’t despair if you don’t get it right

Nobody gets it right the first time. Trial and error is the way to go, especially in fields like this one, where there is no single exact solution.

10. Lean on online communities

The internet is full of helpful and generous people. If you’re struggling with something, search for it, and if you don’t find the answer, ask in forums such as Reddit or Stack Overflow.

11. Learn more about your problem domain

Don’t focus only on the purely technical side; try to understand what is really behind the problems you’re modeling. It will help you choose the best error metric for the problem, select the most insightful variables, and communicate with non-technical stakeholders in their own language.
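As an example of how the domain shapes the metric, this minimal sketch scores two hypothetical demand forecasts with mean absolute error (MAE) and root mean squared error (RMSE). The numbers are made up; the point is that the two metrics can rank the same models differently.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: large errors are penalized more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical daily demand and two forecasts
actual = [100, 100, 100, 100]
model_a = [110, 90, 110, 90]    # always a bit off
model_b = [100, 100, 100, 130]  # usually perfect, one big miss

for name, forecast in [("A", model_a), ("B", model_b)]:
    print(f"Model {name}: MAE={mae(actual, forecast):.1f}, RMSE={rmse(actual, forecast):.1f}")
```

MAE prefers model B while RMSE prefers model A. If one large miss is very costly (say, an empty shelf on a busy day), RMSE and model A may be the better choice; if every unit of error costs the same, MAE and model B may be.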

12. Work with messy data

Don’t just stick to problems with pre-cleaned data. The world is messy, and having some experience cleaning and structuring data will prepare you for future challenges.
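To give a flavor of what messy data looks like, here is a minimal sketch that parses a hypothetical column of amounts typed in inconsistent formats. The formats and the `parse_amount` helper are illustrative assumptions; in particular, it treats commas as thousands separators, which isn’t true in every locale.

```python
import re

# Hypothetical raw values, as they might arrive from a spreadsheet export
raw_amounts = [" 1200 ", "1,500.50", "$300", "N/A", "", "abc"]


def parse_amount(value):
    """Return the amount as a float, or None if it is missing or unparseable."""
    cleaned = value.strip().replace("$", "").replace(",", "")
    if cleaned.upper() in {"", "N/A", "NA"}:
        return None
    if not re.fullmatch(r"\d+(\.\d+)?", cleaned):
        return None  # better to flag it for review than to guess
    return float(cleaned)


parsed = [parse_amount(v) for v in raw_amounts]
n_missing = sum(p is None for p in parsed)
print(parsed)
print(f"{n_missing} of {len(parsed)} values need attention")
```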

13. Work on what makes you curious, that will keep you motivated

Following your curiosity and your passions will make sure you don’t abandon your path to becoming a data scientist halfway through. Additionally, it makes the whole learning experience a lot more fun!


How to self-learn data science from scratch

When I learned data science, I didn’t know where to start, so I wasted many hours learning only tangentially useful stuff. Now, after more than five years as a data science consultant, I know what I would’ve done differently. In this article, I will offer you a roadmap on how to self-learn data science, with links to useful resources.

Data science prerequisites

Even though I believe everyone can learn data science, those with a technical background will have a head start. Before getting into DS-specific subjects, it is useful to have some notions of mathematics, statistics and probability.

It is not necessary to be an expert in any of those, but you need a solid foundation. If you’ve never studied any of those, don’t worry, I’m here to help. In the following paragraphs, I’ll briefly describe each prerequisite and link to educational resources.

Mathematics for data science

To get started with data science, you need to get familiar with some of mathematics’ most common objects. These Khan Academy lessons about vectors, matrices and functions are a good place to start. Also, here’s the summary (in more formal mathematical language) of a Stanford course. These concepts are the building blocks of most machine learning algorithms and provide you with a framework for structuring data. Getting to this level of mathematics will allow you to understand and use the algorithms that others have invented and implemented, and to get results with them.
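To see how these objects structure data, here is a minimal NumPy sketch: each row of a matrix is an observation, each column a feature, and a linear function turns the whole matrix into predictions with a single matrix-vector product. The house data and weights are made up.

```python
import numpy as np

# Hypothetical data: each row is a house, columns are [size in m2, number of rooms]
houses = np.array([
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 4.0],
])

# A vector of weights: price contribution per m2 and per room (made-up values)
weights = np.array([2000.0, 5000.0])
intercept = 10000.0


def predict_price(features):
    """A linear function: intercept + features @ weights, for every row at once."""
    return intercept + features @ weights


print(houses.shape)           # (rows, columns) = (observations, features)
print(predict_price(houses))
```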

If you really like mathematics, you can go deeper by taking full calculus and linear algebra courses. This will require a lot more work but will give you a more complete understanding of the inner workings of machine learning algorithms and of how to implement and adjust them.

Probability and statistics

Probability lies at the core of the data scientist’s view of the world. When dealing with big numbers and random events, probability and statistics provide the tools to make sense of them. It isn’t only about the exact methods or formulas, but also about developing a probabilistic intuition. These Khan Academy courses on probability and statistics are beginner-friendly and have all the information you’ll need. Here is a mathematically formal summary of a probability course from Stanford.
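Simulation is a handy way to build that intuition. As an illustration (not taken from the courses above), this sketch looks at the classic birthday problem, the probability that at least two people in a group of 23 share a birthday, and compares a simulation with the exact formula, assuming 365 equally likely birthdays.

```python
import random


def exact_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday) = 1 - P(all birthdays differ)."""
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days
    return 1 - p_all_different


def simulated_shared_birthday(n, days=365, trials=20_000, seed=42):
    """Estimate the same probability by simulating many random groups."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # a repeated day means a shared birthday
            hits += 1
    return hits / trials


print(f"exact: {exact_shared_birthday(23):.3f}")
print(f"simulated: {simulated_shared_birthday(23):.3f}")
```

With only 23 people the probability is already above 50%, far higher than many people’s first guess.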

In addition to formal education in probability and statistics, reading non-fiction books can also help you develop an intuition. I recommend the following books, in no particular order: Thinking, Fast and Slow; Factfulness; Thinking in Bets; and Fooled by Randomness (or any of Nassim Taleb’s books).

Finally, reading about statistical paradoxes will help you make sense of data when you face unintuitive conclusions.
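Simpson’s paradox is a classic example. The sketch below uses the kidney-stone treatment figures commonly used to illustrate it, stored as (successes, patients) by stone size: treatment A wins within each group yet loses overall, because it was given more often to the harder, large-stone cases.

```python
# (successes, patients) by treatment and kidney stone size
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def overall_rate(groups):
    """Success rate after pooling all groups together."""
    successes = sum(ok for ok, _ in groups.values())
    patients = sum(total for _, total in groups.values())
    return successes / patients


for treatment, groups in results.items():
    by_size = ", ".join(f"{size} {ok / total:.0%}" for size, (ok, total) in groups.items())
    print(f"Treatment {treatment}: {by_size}, overall {overall_rate(groups):.0%}")
```

Treatment A has the higher success rate for both small (93% vs 87%) and large stones (73% vs 69%), yet B looks better overall (83% vs 78%). Stone size is a confounding variable here.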

Data-oriented programming language

A big part of a data scientist’s job is reading, manipulating and analyzing data. This is usually done by coding in a data-oriented language. These languages allow us to write instructions for a computer to execute. Even though there are many different programming languages, most of them use very similar structures. The two most popular data-oriented programming languages are Python and R, and you can start with either one. If at some later point you work with people using the other one, you can use that as an opportunity to learn it.

If you’ve never coded before, don’t worry. Both of them can be a good first point of contact with programming. A lot has been written about which one is better, but the truth is they have different strengths.

R’s strong points are:

  • It is designed for data and statistical work, so manipulating data is easier
  • There is a vast universe of statistics libraries
  • The Shiny library makes it very easy to make a web app with no previous web design experience
  • RStudio is a wonderful IDE (I haven’t found one that I like as much for Python)

Python’s strong points are:

  • It’s a general-purpose programming language as well as one of the most popular languages overall
  • It usually runs faster than R
  • It has better packages for deep learning

I personally prefer R because of its more compact syntax in the data.table package and also because I have more experience with it.

Learning R

If you are new to programming, I recommend you start with one of these resources:

If you have been coding for a while, you can get the basics with Learn R in Y minutes.

Once you know the basics, it’s time to learn one of the two main data manipulation libraries: data.table (my personal favorite) or dplyr. Another useful library is ggplot2 for making beautiful graphics.

Learning Python

If Python is your first programming language, you can start with any of these:

If you’re already familiar with coding you can just read this documentation.

And once you’ve mastered Python’s basics, you can move on to the specialized tools for manipulating data: Pandas and NumPy. Here’s a tutorial and here’s a video to help you learn those packages.
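The core idea of NumPy is operating on whole arrays instead of writing Python loops. This minimal sketch, with made-up temperature readings, computes the same conversion both ways and filters the array with a boolean mask.

```python
import numpy as np

# Hypothetical temperature readings in Celsius
celsius = np.array([12.5, 18.0, 21.3, 25.8, 30.1])

# Loop version: one element at a time
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius]

# Vectorized version: one expression over the whole array
fahrenheit = celsius * 9 / 5 + 32

# Boolean mask: keep only the readings above 25 degrees
hot_days = celsius[celsius > 25]

print(fahrenheit)
print(hot_days)
```

On large arrays the vectorized version is typically much faster, because the loop runs inside NumPy’s compiled code rather than in Python.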

Learn machine learning

Now we get to the exciting part.

There are many different techniques and tools in machine learning. One of them has been my most used analytical tool during my years as a data science consultant. And that technique is supervised learning, in both of its forms: classification and regression.

Supervised learning, also known as predictive modeling, is about learning from examples for which we know the correct answer in advance. In regression the answer is a numerical value, and in classification it is categorical.

Predictive models can be used to make demand forecasts, identify risky borrowers and estimate the market price of a house, among many other uses.

Here are some courses that will teach you the main framework to approach predictive modeling problems, as well as some supervised learning models:

In my experience, 3 families of models can help you solve most supervised learning problems you’ll ever encounter:

  1. Linear and logistic models (explained in the above courses) are easy to understand, easy to interpret, fast to train and reasonably accurate
  2. XGBoost (an implementation of gradient boosted trees) is a top-of-the-class model in terms of accuracy, speed and ease of use. However, it’s not as easy to interpret as linear models. Here’s an introduction to decision trees (a prerequisite) and a couple of articles about how XGBoost works
  3. Neural networks are great for natural language processing and image models. However, I’d leave them to more advanced data scientists since they’re more difficult to set up

Here are some examples of using linear regression in R and Python, and of using XGBoost in both languages.
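For a self-contained starting point, here is a minimal Python sketch of linear regression solved with NumPy’s least-squares routine on synthetic data. The true intercept (3) and slope (2) are arbitrary assumptions; in practice you would more likely use a modeling library such as scikit-learn or statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3 + 2*x + noise
x = rng.uniform(0, 10, size=200)
y = 3 + 2 * x + rng.normal(0, 1, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find beta minimizing ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"intercept={intercept:.2f}, slope={slope:.2f}")

# Predictions for new observations
x_new = np.array([1.0, 5.0])
y_pred = intercept + slope * x_new
print(y_pred)
```

The estimates should land close to the true values because the noise is small relative to the signal.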

SQL

SQL is the most widely used database language, and most companies use one of its variants for their databases. Even Amazon’s Athena and Google’s BigQuery can be queried using SQL syntax.

So if you’re planning on getting a job in data science, I recommend you learn SQL, since most employers will require it. If you’re doing personal projects, it’s up to you. For small-scale projects, you’ll probably just save your data in text files. For bigger projects, SQL skills may come in handy.
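To practice the syntax without installing a database server, you can use SQLite, which ships with Python’s standard library. The `customers` table is a made-up example; the query uses standard SQL that should look similar in most dialects, although details vary between databases.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers (country, spend) VALUES (?, ?)",
    [("ES", 120.0), ("ES", 80.0), ("FR", 250.0), ("DE", 50.0), ("FR", 40.0)],
)

# Typical analytical query: filter, group, aggregate and sort
query = """
    SELECT country, COUNT(*) AS n_customers, SUM(spend) AS total_spend
    FROM customers
    WHERE spend > 45
    GROUP BY country
    ORDER BY total_spend DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```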


What’s next?

Once you’ve learned the basics about R/Python and supervised learning, it’s time to practice. Do a project with open data or participate in a Kaggle competition. Or get a job as a data scientist and learn while getting paid. Practice is what will help you hone your skills and generate proof of your knowledge.

How to get a job (in data science)

In this article, I’ll give you a structured approach to getting a data science job.

In fact, I’ll be sharing all the techniques that have helped me get offers from startups and management consulting firms, along with examples of my own resume and project portfolio. Additionally, I’ll talk about what I look for when screening CVs and running interviews.

So if you want to get a job in data science, you’ll love the actionable steps in this guide.

Let’s get started.

Be eligible for a data science job

Before going into the details of job hunting, let’s get this out of the way: no amount of tricks will get you a job if you don’t have the required skills. So the first step is to learn the fundamentals: coding in a data-friendly language (ideally Python or R), some machine learning, and SQL. Those are the basics for an entry position in data science. And they can be learned for free on the internet.

If your market is very hot or you’re looking for internships, you may get hired with a technical degree (CS, math, physics, engineering) and no specialized data science knowledge.

Some work experience may also help to make you a more attractive candidate. Adjacent positions like data analyst and data engineer can help you move to a data science position.

Additionally, some domain knowledge of the industry of the companies you’re applying to will be a great asset to your job search.

You have to present your story in the best light possible

And this is true throughout the whole hiring process, from your resume to interviews.

When you are looking for a job, you are both the product and the salesperson. No one but you will highlight your qualities. There are many ways to explain who you are and what kind of work you’ve done in the past, and you should choose the most persuasive one in every situation. To do this, focus on two main principles:

  1. Make your story as interesting as possible by being specific enough
  2. Adapt your story to your audience, highlighting what is more relevant to them

For example, when Mary is asked in an interview what she does for a living, she can say “I do customer segmentation”, which is as true as it is boring and unspecific. But she can do better. She could say “I use algorithms to segment users according to their past purchases”. That sounds more interesting.

Moreover, if Mary is in an interview with a software engineer, she can specify that she uses a mix of SQL and Python code for her analysis. If she is talking to the marketing manager, she can explain how her segmentations helped increase the email open rate by 12%.

Additionally, she should try to use her audience’s own words. Her official job title states “Business Analyst”, but she’s using SQL and Python to do her job. If she’s applying for a “Data Analyst” position, she could say her current job is a “Data Analyst” position too.

These ideas apply to interviews as well as the wording of your resume and any other document you present.

Improve the steps of the funnel where you’re weak

A job search is like a sales funnel. You find some job postings and apply to them. Some of those applications will get you interviews. And some of the interviews will result in job offers.

By thinking of the process as a funnel, you can isolate its parts and try to optimize them separately. For example, imagine John has sent lots of applications and isn’t getting any interviews. In that case, before sending more applications John should make sure his CV is well-formatted and that he is a good fit for the positions he’s applying to.

The main parts of the job search funnel are:

How to get a job – search funnel
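To make the funnel idea concrete, here is a minimal sketch that computes each step’s conversion rate and compares it with a target. The counts, the target rates and the `conversion_rates` helper are hypothetical; you would replace them with your own numbers.

```python
# Hypothetical counts from a job search
funnel = {"applications": 40, "interviews": 2, "offers": 1}

# Hypothetical target conversion rates you might set for each step
targets = {"interviews": 0.15, "offers": 0.25}


def conversion_rates(counts):
    """Conversion rate of each step relative to the previous one."""
    steps = list(counts)
    return {cur: counts[cur] / counts[prev] for prev, cur in zip(steps, steps[1:])}


rates = conversion_rates(funnel)
for step, rate in rates.items():
    status = "OK" if rate >= targets[step] else "needs work"
    print(f"{step}: {rate:.0%} (target {targets[step]:.0%}), {status}")
```

In this made-up case, like John’s, the weak step is going from applications to interviews, so the CV and the choice of positions are where to focus.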

Applications, increasing the funnel input

The first step to getting a job is finding job postings. The main ways to do so are:

  • Asking your network, which may even let you skip directly to the interview phase
  • Online job searches, I’d suggest searching about once per week (LinkedIn has by far been the best for me)
  • Improving your LinkedIn profile and setting it as open to work
  • Local job banks
  • Company jobs pages if you’re interested in specific companies
  • Cold emails to people in your industry

Once you’ve found some job posts, you have to decide which ones to apply for. My rule of thumb is to apply to whatever job you feel confident you can do, regardless of requirements. Very often companies will post job offers where it’s almost impossible to find a person who meets 100% of the requirements. If you don’t fulfill some of them but feel like you can pick them up easily on the job, just apply.

Maybe you have confidence issues and feel you may not be worthy of the job. In this case, look at what kind of jobs your classmates got, and shoot for something at that level. If people who studied with you did it, you can do it too!

Unless you’ve applied to at least 20 positions, your best bet is to keep sending more applications. Think of it this way: if 20 people apply for a job, your base probability of getting an offer is 5%. Lately, on LinkedIn, I’ve seen many postings with as many as 100 applicants.
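The arithmetic behind this rule of thumb can be sketched as follows, under the simplifying (and admittedly optimistic) assumption that every application is an independent draw with the same 5% chance of ending in an offer.

```python
def prob_at_least_one_offer(n_applications, p_offer=0.05):
    """P(at least one offer) = 1 - P(no offer from any application).

    Assumes every application is independent with the same chance of an offer.
    """
    return 1 - (1 - p_offer) ** n_applications


for n in (1, 10, 20, 50):
    print(f"{n} applications: {prob_at_least_one_offer(n):.0%}")
```

With a 5% rate, 20 applications give roughly a 64% chance of at least one offer; with 100 applicants per posting (a 1% rate), the same 20 applications give only about 18%.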

Improving your LinkedIn profile

On my last job search, I was contacted by many recruiters who found me through LinkedIn and brought relevant offers. This will probably happen more and more as you advance in your career.

But to get noticed you have to work on your profile. Here’s what I did:

  • Create a complete profile with all your relevant work and learning experiences
  • Follow LinkedIn’s advice to improve your profile
  • Write an “about” section that sounds professional
  • Get your friends to endorse you for the skills relevant to your job search
  • Take LinkedIn’s skill certificates to make your profile stand out
  • Accept random connections, sometimes they have offers for you
  • Set your profile as “Open to work”

Successful applications and getting noticed

Once you’ve selected some job posts, you need to send the best application you can. Your objective when applying is to convince HR that you’re a good fit for the job.

Before going into which documents to send, let’s talk about referrals. This is the step in the process with the highest potential return on your effort. If you have a connection working at the company, talk to them and ask for a referral. This will bring extra attention to your application and possibly let you skip some of the required steps. Just do it if it’s available.

Whether or not you can get a referral, applying for a position involves sending some documents that show who you are. Whatever documents you include, make them PDFs. Word and other editable formats may display differently on different computers.

How to write your CV

The most common document is your resume or CV (Curriculum Vitae). The objective of your CV is to communicate who you are to the recruiter. Here are a few guidelines on how to write it:

  1. Use a nice template, for example
  2. Only include a picture if you look good in it, and in the picture, dress for the job you want
  3. One page, include only relevant information
  4. Make everything easy to understand
  5. Don’t get too technical (your CV will probably be screened first by someone with no technical knowledge)

Here’s the resume I used the last time I was looking for a job (2021).

In addition to your CV, you may want to include a cover letter or a project portfolio.

Cover letter

The cover letter should be about why this job is a good match for you and why you are a good match for the job. It could increase your chances of getting a positive response, especially in more formal recruiting processes such as management consulting. You can send it as an email or as an attached PDF (max 1 page). You can structure your cover letter as follows:

  1. Introduction: why you’re writing this document
  2. Why the company is a good fit for you, for example: it’s a market leader, very innovative, you personally use their products, …
  3. Why the position is a good fit for you
  4. Why you are a good fit for the position and how your previous work experience and education have prepared you for this job (try to address all the points in the job description)
  5. Conclusion

Here’s an example of a cover letter I used some time ago to get into management consulting.

Project portfolio tips

Another document you can send is a project portfolio. This is a document explaining some of the projects you’ve worked on. If you have done some projects that you’re proud of, this can make them shine. In fact, the last time I applied for a job I impressed some of the interviewers with my project portfolio.

If you do so, keep in mind the following points:

  1. Don’t make it too long: 2-3 projects, 10-15 slides max
  2. For every project, explain the technology used, the process, and the results
  3. If possible, showcase projects relevant to your potential employer (same industry, technologies, or modeling problems)
  4. Don’t assume the reader has previous knowledge of your projects, give all the necessary context
  5. Don’t share any confidential information

If you send a GitHub link, make sure to keep your profile clean and organized. Also, add a clear README file with a summary to each of your projects. Otherwise, reviewers may not know where to start and will just skip them.

What I look for when screening applications

At my past job, I screened applications of potential candidates for data science consulting jobs. This is what I looked for in order of relevance:

  1. Evidence of proactiveness and problem-solving in previous work experience and side projects
  2. Fundamental data science skills (ML, R/Python, SQL). I personally don’t care much whether they’ve taken a master’s degree or some MOOCs
  3. Numerical and coding skills (technical degree, side projects, …)

What if you don’t get answers to your job applications?

Looking for a job can be tough, especially when companies and recruiters ignore you. Don’t despair. If you find yourself in this spot, here’s what could be happening:

  • You haven’t applied to enough jobs or have been unlucky so far, send more applications
  • Your resume is poorly formatted or difficult to understand, work on it
  • You have the skills but not the credentials, try to explain better why you are a good candidate and maybe give recruiters some proof of your skills (for example: portfolio, LinkedIn certificates)
  • You don’t have the necessary skills for the positions you’re applying to, in which case you should level up your knowledge or apply to other more suitable jobs

Proving yourself: Tests and assignments

If after sending your application letter the company likes you, they will contact you. At this point, some companies will assess your skills and commitment with an assignment or a test.

Tests are like exams. You will have limited time to answer a series of theoretical questions or practical exercises. Ask about it and try to prepare in advance. Doing similar tests from other companies or preparation websites will help. Also, try to schedule it for a time when you’re rested.

Here are some resources to prepare for a Data Science test:

Assignments are small projects that may take anywhere from about 2 to 12 hours. Some assignments are downright abusive. If you aren’t too interested in the job, now is the time to get out of the process.

Other assignments are interesting and fun challenges. You can take them as a chance to see what the job will be like and also test your skills.

Whatever the type of assignment, remember that it’s a relatively small investment compared to the time you’ll spend at the job if you end up getting an offer.

What to do if you are not passing the tests

Don’t worry if you get turned down at one test or assignment, flukes can happen. And sometimes recruiters don’t know what they’re doing.

However, if you fail at tests repeatedly, that means you should review the theory. Try to remember which questions you missed and study those topics and adjacent ones. Also, try to practice doing tests, as that always helps.

If you’re having trouble with assignments, then it’s a matter of practice. Try to do projects on your own and explain them to friends. Reviewing projects by others can also help.

The interview

Interviews come in many shapes and forms but tend to follow a common pattern. Most of them will consist of 4 main parts: introduction, HR-type questions, technical questions, and your questions. Preparing for each of the parts will improve your odds of getting a job offer.

Before an interview, you should review the job description and make sure you understand all concepts mentioned in it. This will automatically make you a better candidate. Bonus points if you think for some time about their business and how data science can improve their bottom line.

In the first part, the interviewer will present herself and also give some information about the company and the role. Then she will ask you to introduce yourself. You should prepare your introduction and practice it in front of a friend to project a better image of yourself.

HR-type questions are usually about your motivations, your character, and your soft skills. Some typical HR questions are:

  • Why did you decide to apply to this role?
  • Tell us about your strengths and weaknesses
  • What do your colleagues think of you?
  • Can you describe your management style?

It is impossible to have a prepared answer for all these kinds of questions. However, taking the time to prepare an answer for some of these will make you better at coming up with good answers for other questions. Additionally, writing down a description of the impression you want to give is also a good way to prepare.

There are 4 main types of technical questions:

  • Explaining a previous project
  • Case-type questions about how you would approach a certain task
  • Theory questions (here are some examples)
  • Practical exercises

Again, the range of possible questions here is almost unlimited. If you’re applying to a big company, you may find some information about their interviewing style online. Having more experience with data science projects will give you an edge on technical questions, but you can always get blindsided by a theory question about an algorithm you’ve never used. If this happens, acknowledge that you don’t know it and offer another subject you could talk about.

Finally, when it’s your turn, asking a couple of questions will make you look interested in the job. You can spend 30 minutes googling the company before the interview to stand out from the competition by asking interesting questions.

How I run interviews

I have run data science interviews at both my current and past jobs. One was for consulting, the other for a SaaS that optimizes retail stock management. In both cases, interviews consisted of a data science business case, in which candidates have to solve business problems using analytical tools. It’s not so much about coding (in fact we don’t do live coding) as it is about problem-solving and knowing when to apply each data science technique. More specifically, what I look for when interviewing is:

  1. Problem-solving, understanding business problems and developing data-driven strategies to solve them.
  2. Communication skills, capacity to explain complex concepts in a clear and concise manner
  3. Leadership and initiative, ability and willingness to propose and run projects as well as to mentor more junior colleagues.
  4. Code craftsmanship, love for writing clear and easy to maintain code, while being conscious of the problems associated with excess complexity.
  5. Analysis depth, for example by identifying confounding variables and getting to the root cause of issues.

Just keep in mind that this is based on my personal opinions and what my company needs. Other interviews may be different.

How to get better at interviews

Many people struggle with interviews. The good news is, practice can help a lot.

Practice your introduction in front of the mirror until it’s perfect. Create or get some interview scripts, and get a friend or relative to interview you. Or if you’re still in college, maybe get together with some other students to interview each other.

After 5-10 mock interviews, you will be more articulate and more confident in yourself.

If you feel like anxiety is an issue in interviews, try to do breathing exercises to relax before going in.


Tying it up 

The job search doesn’t end when you get the offer, but when you sign the contract. Now it’s time to negotiate the terms. If there is something you don’t like, you have a right to say it and try to reach a different agreement. This can range from salary to vacation days.

Words of encouragement

Finally, the most important thing to keep in mind is that getting a job, like many things in life, is a numbers game. No amount of effort and skill will guarantee that you get the job. They may forget about your CV, the position may be filled by the CEO’s nephew, or the recruiter may be an ass.

Additionally, for every posting, there may be lots of applicants. So don’t lose faith and don’t let rejection bring you down. Even if it’s tedious and takes a long time, in the end it’s still worth it.

Suppose you send 50 applications and each one takes you about an hour. Then you interview with 10 companies, spending an average of 3 hours with each. That’s a total of 80 hours, which is about 4-5% of the hours someone with a full-time job works in a year. If you get a 10% raise, it’s a great investment. If you get a 5% raise, better conditions, or a more fulfilling job, it’s still well worth it.
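The same back-of-the-envelope arithmetic in Python, assuming a full-time year of roughly 1,800 to 2,000 working hours (an assumption that varies by country and contract):

```python
applications, hours_per_application = 50, 1
interviews, hours_per_interview = 10, 3

total_hours = applications * hours_per_application + interviews * hours_per_interview

# Assumed range of working hours in a full-time year
for yearly_hours in (1800, 2000):
    share = total_hours / yearly_hours
    print(f"{total_hours} h is {share:.1%} of a {yearly_hours} h working year")

# A 10% raise is worth about 10% of a year's hours of pay, every year
raise_in_hours = 0.10 * 2000
print(f"A 10% raise is worth about {raise_in_hours:.0f} h of pay per year")
```

Even in the first year, the raise pays back the 80 hours more than twice over.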

So, go get it!

How to estimate the impact of algorithms

You’ve just finished training a credit risk tree model with a whopping 57 AUC score, and you feel great. And you should. But let’s dig deeper. How much better will this model be than using no model? Or than using the previous model, which had an AUC of 48?

Have you ever wondered what the impact of the algorithm you are building is? How much money you are making for your company? How many lives your campaigns are saving?

Every member of an organization should know how their actions contribute to the organization’s goals. This allows them to prioritize and be more efficient in their work.

The impact of an algorithm is tied to the actions it enables

To estimate the impact of an algorithm, first, we’ll need to define a metric. This will usually be money, because it’s the main means of exchanging value and one of the main goals of businesses. However, depending on the nature of your project, you can use metrics such as lives or time saved.

To estimate the impact of an action, we have to calculate the difference in our metric between two different scenarios:

  1. The current outcome (measured in the metric we’ve defined). This can be 0 if nothing can be done without the algorithm
  2. The outcome we expect to get by using the algorithm instead

Building simplified models of the situation will allow us to make estimations of the impact. This is similar to how we would build a business case.

Let’s make it clearer with an example

STL limited (Short Term Loans) is a credit company that gives 1-year loans. This is how their business currently works:

  1. They give loans at a 10% interest rate to everyone who applies for one
  2. Their default rate is 10% (the percentage of customers who don’t pay back all the money they owe)
  3. Customers who default have paid back an average of 30% of the loan amount before defaulting
  4. The average loan amount is 1.000$
  5. Every year 100.000 new customers apply for a loan

With this information, we can estimate how much they currently earn per year using a simple Excel spreadsheet. We will first estimate the expected earnings per non-defaulting customer (NDC) and per defaulting customer (DC). We will then combine these estimates into the expected earnings per customer, weighting each case by its probability (the default rate and its complement).
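A minimal Python sketch of that spreadsheet. It assumes the only cash flows are the interest paid by non-defaulting customers and the part of the loan repaid by defaulting ones, with no funding or operating costs; the variable names are illustrative.

```python
# Figures from STL limited's business description
interest_rate = 0.10        # charged on every 1-year loan
default_rate = 0.10         # share of customers who don't repay in full
recovery_rate = 0.30        # share of the loan a defaulter repays before defaulting
loan_amount = 1_000         # average loan, in $
customers_per_year = 100_000

# Expected earnings per non-defaulting customer (NDC): the interest
ndc_earnings = loan_amount * interest_rate                  # 100 $

# Expected earnings per defaulting customer (DC): repaid part minus the money lent
dc_earnings = loan_amount * recovery_rate - loan_amount     # -700 $

# Combine both cases, weighting each by its probability
per_customer = (1 - default_rate) * ndc_earnings + default_rate * dc_earnings

yearly_earnings = per_customer * customers_per_year
print(f"{per_customer:.0f} $ per customer, {yearly_earnings:,.0f} $ per year")
```

This gives 20$ per customer and 2M$ per year, the base case the article compares against.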

Building a model to improve earnings

John, the lead data scientist at STL limited, has developed a probability of default model. He has trained it using customer employment data that was collected anyway for regulatory reasons.

John uses the model to make predictions on a holdout set (data the model has never seen before). He then divides the customers into four equally sized groups based on their predicted probability of default. The following table shows the probability of default for each of the groups:

By changing the default rate in the previous spreadsheet, we can estimate the expected earnings per customer for each of the groups:

The average customer in group 4 loses money for STL limited.
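The break-even default rate shows which groups lose money. Under the loan terms above, a non-defaulting customer earns STL 100$ (10% of 1.000$) and a defaulting one costs 700$ (the 1.000$ lent minus the 300$ repaid). Writing p for a group’s default rate (a symbol introduced here for illustration):

```latex
\begin{aligned}
\mathbb{E}[\text{earnings per customer}] &= (1 - p)\cdot 100 + p\cdot(-700) = 100 - 800\,p \\
\mathbb{E}[\text{earnings per customer}] > 0 &\iff p < \tfrac{100}{800} = 12{,}5\%
\end{aligned}
```

At p = 10% this gives the 20$ per customer behind the 2M$ base case, and any group whose average default rate is above 12,5% loses money.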

How much would STL limited earn if they only gave credit to people in groups 1, 2 and 3?

By only giving credit to customers with positive expected earnings, STL could make a total of 3,5M$ per year. This means the model would have an impact of 1,5M$ per year (3,5M$ minus the 2M$ of the base case).
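The same comparison can be sketched as a small Python function: compute yearly earnings when lending to every group (the current outcome), then when lending only to groups with positive expected earnings, and take the difference. The function and parameter names are hypothetical, the earnings formula follows the simplified model above, and the per-group default rates are an input that would come from John’s table.

```python
def expected_earnings(default_rate, loan=1_000, interest=0.10, recovery=0.30):
    """Expected earnings per customer for a 1-year loan, in $ (simplified model)."""
    ndc = loan * interest          # a non-defaulting customer pays the interest
    dc = loan * recovery - loan    # a defaulting customer repays only part of the loan
    return (1 - default_rate) * ndc + default_rate * dc


def selection_impact(group_default_rates, customers_per_group):
    """Compare lending to every group with lending only to groups that earn money.

    Returns (earnings_with_model, earnings_lending_to_all, impact), per year in $.
    """
    group_earnings = [expected_earnings(rate) * customers_per_group
                      for rate in group_default_rates]
    lending_to_all = sum(group_earnings)                    # current outcome
    with_model = sum(e for e in group_earnings if e > 0)    # drop loss-making groups
    return with_model, lending_to_all, with_model - lending_to_all


# Sanity check: if every group defaulted at the base-case 10%, no group is dropped
with_model, lending_to_all, impact = selection_impact([0.10] * 4, customers_per_group=25_000)
print(f"{with_model:,.0f} {lending_to_all:,.0f} {impact:,.0f}")
```

For STL, customers_per_group is 25.000 (100.000 applicants split into four equal groups). Passing the four default rates from John’s table should reproduce the 3,5M$ and 1,5M$ figures, provided the spreadsheet uses the same formula.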

Wrapping it up

This impact estimation method relies on simplifications and leaves out second-order consequences of the actions. Additionally, future performance isn’t guaranteed to match past performance. To account for these sources of uncertainty, I generally multiply the impact estimate by a conservative factor of 50-80%.
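Applied to the STL example, a 50-80% factor turns the 1,5M$ estimate into a more conservative range:

```latex
1{,}5\,\text{M\$} \times 0{,}5 = 0{,}75\,\text{M\$} \qquad 1{,}5\,\text{M\$} \times 0{,}8 = 1{,}2\,\text{M\$}
```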

Nevertheless, the objective of these estimations is not perfect accuracy but a ballpark figure that allows us to compare and prioritize.
