The Ouroboros King, post-launch devlog

The game was released about a month ago (February 27th, 2023), and it’s sold over 5000 copies (I estimate ~30-35% of sales will get into my bank account after discounts, VAT, Steam cut, and income tax). This would be a flop for an indie studio with several people, but I’m doing this part-time, so it is a success for me. It looks like it’ll pay for the salary I’ve forgone by only working part-time, even though it’s just my first commercial game.

In this post, I’ll document what I did since the last update:

  • Finished most of the game by the beginning of January (a bit later than planned)
  • Tried to get players for the beta but didn’t really manage to get too many
  • Emailed streamers by the end of January, and then the magic happened. Some big streamers like Retromation, Aliensrock, Olexa, and Sifd picked up the game and wishlists shot up. All the feedback I didn’t get through the official beta, I got via an increase in players due to streamer exposure, but the deadline was very tight at that point since I wanted to release by February 27th. I’m still not sure whether I should’ve pushed the release a couple of weeks further
  • At the beginning of February, I was contacted by illustrator Isaac Murgadella offering to do artwork for the game. I was hesitant because I feared the game might lose its identity so close to the release date. However, I also feared that the art I had at the moment resembled Magnus Carlsen too much and that he may not like that (I’d emailed his team about it and never got a response). I ended up deciding to go with it, and I’m super happy I did as the game looks a lot better
  • Participated in Steam’s Next Fest and wishlists kept increasing
  • I wrote to j4nw (developer of Pawnbarian), who inspired me to get into this journey, and we decided to bundle together for my launch. He’s been super nice and I’m very happy to work with him
  • Ported the game to Linux (very easy). I also tried to port it to Mac, but it wasn’t so easy and I decided to wait until after the release
  • With one week left until release, I felt that there were too few relics, so I sprinted and added ~20 new ones, doubling the number
  • Finally released the game on February 27th, and found out that I’d left some bugs in it, so I spent a week patching all bugs that appeared
  • My Discord server grew to ~250 people and I kept getting feedback and requests for more content

Plans for the future

Given the game’s moderate success, I’ve decided to expand a bit on it with some of the top requests I’ve gotten from my Discord, as well as port it to more platforms and localize it into other languages. Here’s an estimated timeline of how this could happen:

  • Infinity mode: you can play endlessly after beating the game. I released the final version of it on April 7th
  • Practice mode: set up a board and play against friends or the AI. Expected by mid-April
  • Content update, with new units and maybe new relics and items. Expected by end of April, maybe later if lots of playtesting is required
  • Mac port. I’m not 100% on the complexity of this, but hopefully, I can have it done by mid-May
  • Localization for German, Spanish, French, Chinese, Russian, and Catalan. Here I need to work on a system that makes sure that all the text in the game is read from a file, so I can easily change languages. After that, I’ll have to send the text to translators. I expect this to be done by the end of May, but I’m aware it could take longer
  • Mobile port as a free demo + in-app purchase for the full game. It will require a UI redesign and making sure the controls work on mobile. I’ve never done this before, but I hope to have it done by the end of June.

Keep in mind that I’m working on this solo and any real-life issues may delay this plan. But if there are no emergencies I think it’s doable.


The Ouroboros King, big update

Since these posts aren’t getting much traction, I haven’t made any in what feels like an eternity. Since I last posted, I have made lots of improvements to the game:

  • Improved the map both visually and its inner workings, now offering more paths
  • Added some juice, by improving some of the game’s visual effects, adding a slight screen shake, and blood stains on the board
  • Added a lot more pieces, relics, and items
  • Prepared the infrastructure for the full game (3 chapters + final boss)

You can check out all these improvements on the Steam demo and on itch.

Additionally, I’ve participated in a couple of festivals, getting to ~250 wishlists. Assuming that I double the number of wishlists by launch, that the wishlist conversion rate is 20%, and that the price is 10€, the game will make ~1.000€, and I’ll probably get half of that amount after the Steam cut and taxes. This figure is way below my initial expectations, but it’s only my first commercial game and I’ll still be happy. However, I’ll still do what I can to get it to more people.
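The estimate above can be written out as a small Python sketch. The function and parameter names are made up for illustration, and the 50% net share is just the rough "half" mentioned above, not a precise figure.

```python
def estimate_launch_revenue(current_wishlists, growth_factor, conversion_rate,
                            price_eur, net_share):
    """Back-of-the-envelope revenue estimate from wishlists (all inputs are guesses)."""
    launch_wishlists = current_wishlists * growth_factor
    copies_sold = launch_wishlists * conversion_rate
    gross_eur = copies_sold * price_eur
    net_eur = gross_eur * net_share  # rough share left after Steam cut and taxes
    return copies_sold, gross_eur, net_eur


# Numbers from the post: 250 wishlists, doubled by launch, 20% conversion, 10 EUR price
copies, gross, net = estimate_launch_revenue(250, 2, 0.20, 10, 0.5)
print(f"copies: {copies:.0f}, gross: {gross:.0f} EUR, net: {net:.0f} EUR")
# copies: 100, gross: 1000 EUR, net: 500 EUR
```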

My plan is to release it in February. Before launch, the plan goes as follows:

  • Finish the game by the end of December. This includes a new type of map location involving sacrifices in exchange for more powerful units and relics, completing the game lore (I have a draft in my head but it needs some ironing), and populating the final stages (90% of the content is made, I just need to tell the game where to show it)
  • Run a beta to gather some feedback. I’ll get players for the beta from alphabetagamer and some specific subreddits. My plan is to credit and give Steam keys to everyone who contributes to improving the game
  • Contact streamers (hopefully by mid-January) to start generating some interest in the game
  • Participate in February’s Steam Next Fest
  • Release a week after Next Fest

So that’s that. Hopefully, the plan works out and I can be here in a couple of months talking about a successful launch 😉

The Ouroboros King, the demo update

Hi everyone, during the last few weeks I’ve been on vacation, which has allowed me to spend a lot of time adding more content to the game. Since the previous update, I’ve added:

  • Many new pieces: portal mage, immortal, cardinal, pawn, and fool
  • The item system, including gold rewards for winning battles and a shop
  • A difficulty system to make sure everyone can enjoy the game
  • Quality of life improvements to the initial army and map system, making sure you’re not shown too many new units at the same time and you always have relevant options on the map
  • Some polish to the sounds and new tracks by my brother Licus

With all of this, I am very happy with the demo in terms of gameplay. But there are still many visual improvements that I’d like to add, mainly animations to improve the game’s juice.

Next steps

Since Steam’s Next Fest is at the beginning of October and I already have a working demo, I’ll focus on marketing during the following weeks. I intend to test many different things and see what sticks to try to build some momentum before the festival. Make sure to wishlist the game on Steam if you haven’t yet. I’ll surely do a minor update when the cover art is ready, and if I have some time to spare I’ll add some extra animations.

After the festival, I’ll go back to developing the game, working on a dynamic monologue system and on content for the other 2 stages.

Thanks for reading and as always, subscribe for more updates.


The Ouroboros King, the relic update

Game updates

I’m happy to announce that during the past few weeks I’ve added the following changes to the game:

  • Introduction of relics, some of which affect combat while others affect army upgrades. You will always start with an Alarm Bell, which tells you when your king is in danger (this should make the game more accessible). Other relics will be available by visiting treasure nodes on the map
  • Added rocks that block unit movement during combat
  • Added a soundtrack, composed by my brother Licus
  • Added some animations and sound effects to improve the game’s juice
  • Nerfed the Berzerker, as it was too strong. Now it moves either 3 squares vertically or horizontally, or 2 squares diagonally. It can no longer kill the enemy king on its own in most situations

You can play the updated demo on the same itch link.

Progress

I’ve created a Steam page for the game, with placeholder art. If you’re reading this, go wishlist it now, thanks! I don’t really expect it to get noticed too much right now, but I needed it up to enroll in festivals. On that note, I got rejected from Tacticon and I’ve enrolled in October’s Steam Next Fest. Starting to enroll in festivals so early may be a bit reckless, but I expect development speed to pick up soon, as I’ll be on vacation during August and my daughter will start kindergarten in September.

I’ve also started talks with an artist to commission art for the game.

My next steps will be adding the item system along with a rewind mechanic that should make the game a lot more accessible.

Thanks for reading and as always, subscribe for more updates.


The Ouroboros King, work in progress release

During the last months, I’ve been working on a chess roguelike game: the Ouroboros King. I’m trying to do with chess what Slay the Spire did with card games.

I’ve finally released the first version of the Ouroboros King, and you can play it on itch. All constructive feedback is welcome.

This first version contains the following elements:

  • A procedurally generated map à la Slay the Spire
  • An army management system, so you can change your piece formation
  • A combat system that is basically chess with some variations (it doesn’t tell you when you’re in check, kings can be captured, new pieces are available)
  • An event system, where you can upgrade your army and recruit new pieces after winning a combat

However, it’s still lacking many elements that I want to incorporate into the game:

  • An item system, with consumable combat bonuses
  • A relic system, with permanent bonuses
  • A dialog system and lore descriptions to tell the game’s story (similarly to how the Souls series or Hollow Knight tell their stories)
  • Many extra alternative chess pieces, to add more variety
  • Battlefield modifications, such as rocks that block movement
  • 2 extra stages with boss fights at the end (this release includes only the 1st stage)
  • Endgame difficulty options and unlockables to extend the game’s life
  • Background music

Many of them will make it to the free demo, and the rest will be available in the final version that I plan to release on Steam.


HS Battlegrounds, optimizing your late game Naga board (post-nerf)

In May 2022 the Naga tribe was introduced to HS Battlegrounds. From the start, the tribe was completely OP, with decent early-game units and crazy late-game scaling. Since then they’ve been nerfed twice, lowering both the initial stats and scaling potential of some minions. In this post I’ll help you build a Naga board optimized for scaling, using the tools of numerical analysis.

The growth engine

This scaling is thanks to growth engines that interact with spells and the new Spellcraft mechanic. There are many Naga that scale when you play spells, but not all of them are equally effective. Here are the scaling Nagas in decreasing order of effectiveness:

  • Tidemistress Athissa is not as OP as it used to be, but still very strong. If you get 6 procs (a quite conservative amount: 4 Spellcrafts on board and cycling 2 extra spells), that is +18/+18 on your board, more than a golden Lightfang with 4 tribes or a Charly and a Pumba. Note that Athissa procs on all spells, including coins, blood gems and discovers from triples. We’ll compare the other minions to Athissa.
  • Critter Wrangler, half the scaling of Athissa on Spellcrafts, none on other spells. All in all, this will be ~40% as effective as Athissa, depending on whether Quilboar are in the lobby and the number of triples you get.
  • Eventide Brute (after you cast a spell, gain +1/+1). ~33% of Athissa’s scaling and it gets all the buffs, making it more vulnerable to poison/Leeroy.
  • Lava Lurker (the 1st Spellcraft spell cast on this each turn is permanent). The best spell you can use on it is Shoal Commander’s one, which gives it +7/+7 assuming you have 7 Nagas. If you optimize your setup for the Lurker and get 1 golden Lurker and 2 golden Commanders, you could get +28/+28 scaling per turn, which is still below the conservative estimate for Athissa. All in all, Lava Lurker can help you in the mid-game, but it falls short as a scaling engine.
  • Corrupted Myrmidon (Start of combat: double this minion’s stats). It doesn’t grow on its own but utilizes buffs better than other minions. Assuming you get all Athissa procs on it, you’ll get an extra ~25% plus you can double the stats from gems. If you have Critter Wrangler instead, you’ll double its efficiency on spells from hand. Another bonus is that it gives you a lot of tempo if you already have some Spellcrafts to buff it. As with Eventide Brute, concentrating buffs on this will make you susceptible to poison and Leeroy.

The clear winner by a wide margin is Athissa. In its absence, you can try to survive with a combination of Wranglers, Brutes, Corrupted Myrmidons and Lava Lurker.

Spellcraft minions

There are 7 Spellcraft minions, 6 of which are Naga and the other one gives you Nagas. Let’s analyze them:

  • Orgozoa, the Tender is not a Naga, but procs Athissa and also gives you more Nagas to round out your composition or proc Athissa again. Once you have 4 Naga on the board, this gives you the best scaling since it can discover more spells for extra procs.
  • Glowscale is great for combat, giving you the ability to DS your biggest minion.
  • Other Spellcraft minions. They offer a moderate amount of stats and taunt/windfury. They can be useful in helping you survive while you get your growth engine, but won’t help you scale as much as Orgozoa, and their buffs aren’t as significant as DS in the late game. The best of them in terms of stats is Shoal Commander. However, even if you get a golden Commander, it will give +14/+14 in combat stats, which can be easily outclassed by one or two turns of scaling with Athissa. The only case when it’s relevant and even necessary is when you include Lava Lurker in your composition.

The ideal composition

Once we know the pieces of the puzzle, it’s time to think about the best way to assemble it. How many Spellcraft minions should we get? Is Lava Lurker worth it?

To analyze the composition, I’ve simulated the number of +1/+1 buffs we get for many different board combinations. These simulations make the following assumptions:

  • We have 6 “stable” minions that you are growing and 1 flex slot that you use to rotate spells
  • 3 played spells per turn from the shop (Spellcraft, coins, gems, discovers)
  • 80% of the spells are Spellcraft, and 20% are other types
  • We have a maximum of 1 Corrupted Myrmidon (or a golden one), which gets an equivalent of an extra 80% of the Critter Wrangler procs (you may put DS on other minions or use the discover from Orgozoa) and 20% of the Athissa procs
  • We have a maximum of 1 Lava Lurker (or a golden one) and it gets +7/+7 each turn (+14/+14 if golden), equivalent to having 1 Shoal Commander (2 if golden) and 7 Naga on board

With this in mind, we can calculate the number of procs as follows:

Spells cast = Other spells + Spellcraft minions

Athissa procs = Spells cast * (3 * Athissa + 6 * golden Athissa)

Critter Wrangler procs = 80% * Spells cast * (1.5 * Critter Wrangler + 3 * golden Critter Wrangler)

Eventide Brute procs = Spells cast * (Eventide Brute + 2 * Golden Brute)

Corrupted Myrmidon procs = (20% * Athissa procs + 80% * Critter Wrangler procs) * (Corrupted Myrmidon + 1.5 * Golden Corrupted Myrmidon)

Lava Lurker procs = 7 * Lava Lurker + 14 * Golden Lava Lurker

Procs = Athissa + Critter Wrangler + Eventide Brute + Corrupted Myrmidon + Lava Lurker
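As a rough illustration, the calculation can be sketched in Python. This is not the spreadsheet itself: the function and parameter names are hypothetical, and it follows the assumptions list in giving the Myrmidon 20% of the Athissa procs and 80% of the Critter Wrangler procs. Under that reading it lands on roughly the same figures as the compositions quoted below.

```python
def naga_procs(athissa=0, golden_athissa=0, wrangler=0, golden_wrangler=0,
               brute=0, golden_brute=0, myrmidon=0, golden_myrmidon=0,
               lurker=0, golden_lurker=0, spellcraft_minions=0,
               other_spells=3, spellcraft_share=0.8):
    """Equivalent number of +1/+1 procs per turn for a given board (sketch)."""
    spells_cast = other_spells + spellcraft_minions
    athissa_procs = spells_cast * (3 * athissa + 6 * golden_athissa)
    # Wrangler only scales on the Spellcraft share of the spells cast
    wrangler_procs = spellcraft_share * spells_cast * (1.5 * wrangler + 3 * golden_wrangler)
    brute_procs = spells_cast * (brute + 2 * golden_brute)
    # Assumption: the Myrmidon receives 20% of Athissa and 80% of Wrangler procs
    myrmidon_procs = (0.2 * athissa_procs + 0.8 * wrangler_procs) * (
        myrmidon + 1.5 * golden_myrmidon)
    lurker_procs = 7 * lurker + 14 * golden_lurker
    return athissa_procs + wrangler_procs + brute_procs + myrmidon_procs + lurker_procs


all_golden = naga_procs(golden_athissa=2, golden_wrangler=2,
                        golden_myrmidon=1, spellcraft_minions=1)
no_golden_athissa = naga_procs(golden_wrangler=3, golden_myrmidon=1,
                               spellcraft_minions=2)
no_golden = naga_procs(athissa=2, wrangler=1, myrmidon=1, spellcraft_minions=2)
print(f"{all_golden:.1f} {no_golden_athissa:.1f} {no_golden:.1f}")  # 104.6 79.2 46.8
```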

The best composition gets an equivalent of 104 +1/+1 procs per turn and consists of 2 golden Athissa, 2 golden Critter Wranglers, 1 golden Corrupted Myrmidon and 1 Spellcraft minion.

The best composition without golden Athissa gets an equivalent of 79 +1/+1 procs and consists of 3 golden Critter Wranglers, 1 golden Corrupted Myrmidon and 2 Spellcraft minions.

The best composition without any golden minions gets an equivalent of 46 +1/+1 procs and consists of 2 Athissa, 1 Critter Wrangler, 1 Corrupted Myrmidon and 2 Spellcraft minions.

I’ve measured the importance of each minion by calculating the average number of appearances on the top 10 compositions for each scenario. All copies are golden unless forbidden by the scenario:

                      All compositions   No golden Athissa   No golden minions
Athissa               2.1                0.8                 2
Corrupted Myrmidon    1                  1                   0.5
Critter Wrangler      1.4                2.8                 0.8
Lava Lurker           0.1                0.3                 0.5
Eventide Brute        0                  0.1                 0.1
Spellcraft Minions    1.4                1                   2.1
Avg. procs per turn   97                 74                  44

I’ve made this spreadsheet calculator to calculate the number of procs you’d get based on your composition. It’s read-only so it remains the same, but you can copy it to another spreadsheet and use it if you want.

The flex slot

As suggested above, the flex slot is used to rotate minions that give you spells (Spellcraft, Seashell Collector, Quilboar). However, at the end of the turn, you should be playing a minion on that slot.

If you feel like the combat will be easy, you can try to get an extra spell for the next round by playing a Spellcraft minion or a Quilboar that gets gems on combat. If you play a Spellcraft minion, you should do so after playing all your spells so it doesn’t “steal” any procs.

If you’re pressured, try to get a Leeroy, Mantid Queen, Ghastcoiler or Selfless Hero to strengthen your board.

Getting there

This article just covers the ideal composition in a void, but in a BG game you need to survive while you build your comp. In some cases, it will be impossible to build full scaling and you’ll keep your early Lurker or Brute on the board. That’s completely fine.

Conclusion

I’ve done the math on scaling for Naga comps. Here are the main takeaways:

  • Get as many copies of Athissa as you can
  • Critter Wrangler is a great minion to complement Athissa
  • A Corrupted Myrmidon (especially golden) is a great receptor of Athissa and Wrangler buffs
  • Lava Lurker (if you have Shoal Commander) and Eventide Brute are also viable
  • Get between 1 and 3 Spellcraft minions on the board; Orgozoa and Glowscale are the best
  • Round out your comp with another Spellcraft for a bit more scaling or another useful unit if under pressure

How I built my first video game

I’ve always wanted to learn how to make video games, but I had just never gotten around to it.

This August I decided to change that. I had a two-week vacation in a remote and quiet place and used my spare time to build my first game. The final result is a short and unpolished game, but I have the satisfaction of having finalised the project and made almost all assets from scratch.

I had no previous experience in video game design, but I have coded for some time and I used to draw a lot as a kid. In preparation for the project, I did a 2d platformer Unity tutorial in the afternoons during the week prior to my vacation. I also thought of a whole game concept of a roguelike where you’re an evil weapon (inspired by Nightblood from Stormlight Archive) that is trying to escape its confinement by tricking a human into wielding it… but that ended up being waaaay too much and I had to cut the scope multiple times.

Once I had an idea, I started planning out the main parts I needed for the project:

  • Level outline and player control
  • Enemies, attack and death animations
  • Aesthetic level design
  • Enemy sprites
  • Player sprites
  • Sound
  • Menu and victory screen

Level outline and player control

For the level outline, I wanted something short since it was my first project. Also, the cultist theme made me think of ancient rituals and a Stonehenge aesthetic, including stone monuments. I ended up building a level that consisted of three main parts:

  • A couple of platforms to get the player started on the jump mechanics
  • A plateau with space for a couple of enemies that could be engaged individually
  • A final area where you had to fight many enemies at the same time

As for the movement, I just used the same control scheme that I’d learned from the platformer tutorial, and improved the jump a bit by learning from other tutorials. I also added gamepad compatibility by following this tutorial. Incidentally, I missed a small step in the middle of that tutorial and ended up wasting more than an hour trying to figure out what was wrong…

This is the result after the first iteration:

Enemies, attack and death animations

Since I wanted to do attack animations I needed some sprites that could do that, not just a bean. So, let me introduce you to …

Bean with a sword

I made it and animated it using Photoshop’s basic tools. And since I didn’t want to waste much time I used it for the player and tweaked its size and color for the enemies.

Once I had my beans in place, I started coding the player’s attack controls and the health system. For the player’s attack, I followed this Blackthornprod tutorial. For the health system, I used what I had learned from the 2d platformer tutorial.

After that, I started coding the enemy AI. I started with a very easy approach that ended up doing the job, with no need for extra complexity. This is the AI’s behavior:

  • If you’ve recently been hit wait for a bit, else
  • If the player is in range wait a bit and then attack, else
  • If the player is in sight follow him, else
  • If you’re in front of an obstacle turn around, else
  • Walk in the direction you’re facing
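The priority list above maps naturally onto a chain of early returns. The game itself was built in Unity, so the following is only an illustrative Python sketch of the decision order, not the game’s code; `EnemyState`, its fields and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnemyState:
    """Hypothetical snapshot of what the enemy knows on this frame."""
    recently_hit: bool = False
    player_in_range: bool = False
    player_in_sight: bool = False
    facing_obstacle: bool = False


def choose_action(state: EnemyState) -> str:
    """Return the first action whose condition holds, in priority order."""
    if state.recently_hit:
        return "wait"               # stagger after taking a hit
    if state.player_in_range:
        return "wait_then_attack"   # short wind-up, then attack
    if state.player_in_sight:
        return "follow_player"
    if state.facing_obstacle:
        return "turn_around"
    return "walk_forward"           # default patrol behaviour


print(choose_action(EnemyState(player_in_sight=True, facing_obstacle=True)))
# follow_player
```

Because the checks run in order, a higher-priority condition always wins: an enemy that has just been hit waits even if the player is within attack range.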

I had also initially planned for “mage” type enemies that shoot fireballs at you, but realised that I didn’t have time to implement that (coding + drawing animations). So I just cut that out of the project.

Mage sketch, inspired by Final Fantasy’s black mages and a staff in Riot’s game design video series

Here is the result after the second iteration:

Aesthetic level design

Something that I wanted to focus on during this project was learning to design beautiful levels such as the ones in Hollow Knight. I searched a bit and found these two awesome tutorials from a small YouTube channel.

After tinkering a bit with some elements, the final level setup was:

  • Some fog (from the above tutorials)
  • Black squares to cover “blank” regions
  • 2 or 3 layers of grass parallax in the front (Photoshop brushes)
  • The player layer
  • Rocks and walls (copied from Google Images)
  • 3 layers of half-assed (time restrictions…) mountain parallax in the back
  • A blue sky with stars and a moon that’s too high up to be seen at any moment

I know it lacks polish, but I wanted to finish the project during my vacation so I had to move on.

Here’s the third iteration:

Enemy sprites

My first enemy: the soldier (sword added with PS)

I don’t really like pixel art and I don’t have a drawing tablet, so I decided to draw the characters, take pictures and use Photoshop to digitize them. The traditional way to do this is with the pen tool, but I quickly realized that this would take too much time, so I ended up using the magic wand and some filters to make the lines more even. I think the result looks nice enough while being quite fast. If there’s interest I may write a guide detailing my method.

The first step of animating is having a clear character model and the second one is defining which animations you need. The animations that I needed for the enemy character were:

  • Walk/run
  • Attack
  • Die

The die and attack animations were fairly easy, but running was harder. For the run animation I took inspiration from a Shovel Knight GIF. I used an online tool to break it down into frames and basically copied the leg positions from the frames.

Finally, I added a particle system to simulate blood splashes when the player or an enemy is hit (here’s a couple of tutorials).

Player sprites

I initially planned on having a hooded guy wielding an evil scythe as the player character

The fact that I used the same placeholder sprites for the player character and the enemies and that I was low on time, led me to do the same for the final sprites. I just added jumping and idle animations and called it a day.

Sound

The sounds I needed were:

  • Jump
  • Slash
  • Character hurt
  • Character dies
  • Background music (I shamelessly copied the song of the prayer from FF X)
  • Click
  • Victory song

I just recorded all of those with my phone in less than half an hour (all mouth noises). Afterwards, I did some light editing with Audacity and followed a couple of tutorials to get them into the game.

Menu and victory screen

For the menu, I just followed this tutorial and added the player sprites. I also built victory and defeat screens using the same principles.

Conclusion

Here’s the final result:

And that’s it. I learned a lot from this project and had fun doing it. The result is nowhere near what professional video games look like, but it does look better than I anticipated.

This project helped me get a better understanding of what making a full game entails, even if at a small scope. It’s an exercise I’d recommend to all aspiring game developers before getting into bigger projects.

Next, I’ll try to make a project with more focus on playability and less focus on assets. Let’s see how it goes.


How to use simulations in data science

Simulation is a very potent tool that is often lacking in many data scientists’ toolkits. In this article, I will teach you how to use simulation in combination with other analytical tools.

I will be sharing some educational and professional examples of simulation with Python code. If you are a data scientist (or on the road to becoming one), you’ll love the possibilities that simulation opens for you.

What is simulation?

Simulating is digitally running a series of events and recording their outcomes. Simulations help us when we have a good understanding of how individual events work, but not of how the aggregate works.

In physics, simulations are often used when we have a hard-to-solve differential equation. We know the starting state, and we know the rules for infinitesimal (very small) changes, but we don’t have a closed formula for longer timespans. Simulation allows us to project that initial state into the future, step by step.
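A minimal sketch of that step-by-step idea is the explicit Euler method. The example below uses a simple decay equation, dy/dt = -k·y, chosen only because its exact solution is known and the simulated value can be checked against it; the function name and the parameter values are illustrative.

```python
import math


def simulate_decay(y0, k, t_end, dt):
    """Project dy/dt = -k * y forward in time with the explicit Euler method."""
    y = y0
    num_steps = int(round(t_end / dt))
    for _ in range(num_steps):
        y += -k * y * dt  # apply the rule for one very small change
    return y


approx = simulate_decay(y0=1.0, k=0.5, t_end=2.0, dt=0.001)
exact = math.exp(-0.5 * 2.0)  # closed-form solution y0 * exp(-k * t)
print(approx, exact)
```

Smaller time steps bring the simulated value closer to the exact one, at the cost of more iterations.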

In data science, we usually work with probabilistic events. Sometimes we can easily aggregate them analytically. Other times there is no analytical solution, or it’s very hard to reach it. We can estimate the probabilities and expected results of complex chains of events, by running multiple simulations and aggregating the results. This can be very useful to understand the risks we are exposed to.

Simulation is also used in hard artificial intelligence. When interacting with others, simulation can allow us to anticipate their behavior and plan accordingly. For example, DeepMind’s AlphaGo uses simulations to calculate some moves into the future and make a better assessment of the best moves in its current position.

To run a simulation we will need a model of the underlying events. This model will tell us what can happen at any given point, the probabilities of each outcome and how we should evaluate the results.

The better our model, the better the accuracy of the simulation. However, simulations with imperfect models can still be helpful and give us a ballpark estimate.

Simulation is a subject where examples work better than theory, so let’s jump into some use cases.

Example 1. Estimate the value of pi by using simulation

This task can be done in many ways. One of the easiest is as follows:

  1. Draw a square of side 2 and with its center at the origin of coordinates of a 2d plane
  2. Draw the inscribed circle of that square (radius 1 and its center at the origin of coordinates)
  3. Sample random points from the square (two uniform distributions from -1 to 1)
  4. Whenever you draw a point, check whether it is inside the circle or not
  5. The proportion of points inside the circle will approximate the ratio of the circle’s area to the square’s area, so:

    \[{Num\_points\_inside\_circle \over Num\_total\_points} \approx {Area\_of\_circle \over Area\_of\_square} = {\pi \cdot 1^2 \over 2 \cdot 2} =  {\pi \over 4}\]

And finally:

    \[\pi  \approx 4 \cdot {Num\_points\_inside\_circle \over Num\_total\_points}\]

Here is Python code to simulate the value of pi:

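import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

num_sims = 5000
# Sample points uniformly from the square [-1, 1] x [-1, 1]
x_random = np.random.uniform(-1, 1, num_sims)
y_random = np.random.uniform(-1, 1, num_sims)

# True for the points that fall inside the unit circle
inside_circle = (x_random**2 + y_random**2) < 1

# The fraction of points inside the circle approximates pi / 4
print(4 * inside_circle.mean())

# Plot how the running estimate converges as more points are sampled
plt.figure(figsize=[8, 5])
one_to_n = np.arange(1, num_sims + 1)
plt.plot(one_to_n, 4 * inside_circle.cumsum() / one_to_n)
plt.show()

Figure: Pi simulation convergence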

Similar methods can be used to estimate the value of integrals via simulation.
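As a minimal sketch of that idea, the integral of a function over [a, b] can be estimated as the interval length times the average of the function at uniformly sampled points. The helper name `mc_integrate` and the choice of integrating sin(x) from 0 to pi (exact value 2) are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(1234)


def mc_integrate(f, a, b, num_samples=100_000):
    """Monte Carlo estimate of the integral of f over [a, b]."""
    x = rng.uniform(a, b, num_samples)  # uniform sample points in [a, b]
    return (b - a) * f(x).mean()        # interval length times average value


estimate = mc_integrate(np.sin, 0.0, np.pi)
print(estimate)  # should be close to the exact value, 2
```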

Example 2. Solve a difficult probability problem

Solve this problem by P. Winkler:

One hundred people line up to board an airplane. Each has a boarding pass with an assigned seat. However, the first person to board has lost his boarding pass and takes a random seat. After that, each person takes the assigned seat if it is unoccupied, and one of the unoccupied seats at random otherwise. What is the probability that the last person to board gets to sit in his assigned seat?

The problem can be solved using logic and probabilities, but it can also be solved by simply programming the described behavior and running some simulations:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def simulate_boarding(num_passengers):
    """Return 1 if the last passenger gets their assigned seat, else 0."""
    free_seats = set(range(num_passengers))
    # Seat everyone except the last passenger
    for passenger in range(num_passengers - 1):
        if passenger == 0 or passenger not in free_seats:
            # Lost boarding pass or seat already taken: pick a free seat at random
            seat = list(free_seats)[np.random.randint(len(free_seats))]
        else:
            seat = passenger
        free_seats.remove(seat)
    # Exactly one seat is left; check whether it is the last passenger's own
    return int((num_passengers - 1) in free_seats)

num_sims = 10000
num_passengers = 100

is_same_seat = np.array([simulate_boarding(num_passengers) for _ in range(num_sims)])

print(is_same_seat.mean())

# Plot how the running estimate converges as more simulations are run
plt.figure(figsize=[8, 5])
one_to_n = np.arange(1, num_sims + 1)
plt.plot(one_to_n, is_same_seat.cumsum() / one_to_n)
plt.show()

Figure: Probability simulation convergence

You can find more probability problems to practice here.

Example 3. Simulating game outcomes

How many games would it take Magnus Carlsen (Elo of 2847 as of 18-07-2021) to get back to his current rating if he was dropped at 1000?

To solve this problem we need to understand how the Elo system works.

First, given two players’ Elo ratings, the probability of player1 beating player2 is:

    \[P(\textrm{player1 beats player2}) = {1 \over 1 + 10 ^{(Elo_2 - Elo_1)/400}}\]

Second, after the game, player1’s Elo rating is updated as follows:

    \[Elo_1= Elo_1+K \cdot (\textrm{result} - P(\textrm{player1 beats player2}))\]

Where:

  • result is 1 for a win, 0.5 for a draw and 0 for a loss
  • K (also known as K-factor) is the maximum possible adjustment per game and varies depending on the player’s age, games played and Elo
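To get a feel for the two formulas, here is a small worked example. It is a sketch with made-up ratings and an assumed K of 20; the function names `expected_score` and `updated_rating` are illustrative only.

```python
def expected_score(elo1, elo2):
    """Probability of player1 beating player2 under the Elo model."""
    return 1 / (1 + 10 ** ((elo2 - elo1) / 400))

def updated_rating(elo1, elo2, result, k):
    """Player1's new rating after a game (result: 1 win, 0.5 draw, 0 loss)."""
    return elo1 + k * (result - expected_score(elo1, elo2))

# Equal ratings: a coin flip
print(expected_score(1500, 1500))  # 0.5
# A 400-point advantage: roughly a 91% chance of winning
print(round(expected_score(1800, 1400), 3))  # 0.909
# Beating an equally rated player with K = 20 gains K/2 points
print(round(updated_rating(1500, 1500, 1, 20), 1))  # 1510.0
```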

Now that we have a model, we just have to initialize Magnus’s current Elo to 1000 and code a while loop that:

  1. Has Magnus play a game against a player of his current Elo
  2. Calculates the probability of winning using the real Elo and simulates the outcome of the game
  3. Updates Magnus’s current Elo according to the result
  4. Stops the loop if Magnus has reached his real Elo
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def get_prob(elo1, elo2):
    return 1/(1+10**((elo2 - elo1)/400))

def update_elo(elo, prob, result, k):
    return elo + k * (result - prob)

def play_until_top(real_elo, initial_elo):
    current_elo = initial_elo
    num_games = 0
    k = 40
    elo_list = [initial_elo]
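    while current_elo < real_elo:
        # Simplified K-factor schedule: 40 at the start, 20 after 30 games, 10 above 2400
        if num_games > 30:
            k = 20
        if current_elo > 2400:
            k = 10
        # The outcome depends on Magnus's real strength against an opponent
        # rated at his current Elo
        prob_win = get_prob(real_elo, current_elo)
        result = 1 if np.random.rand(1)[0] < prob_win else 0
        # The rating system expects 0.5 because both ratings are equal on paper
        current_elo = update_elo(current_elo, 0.5, result, k)
        elo_list.append(current_elo)
        num_games += 1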
    return elo_list

num_sims = 1000

# elo_list includes the initial rating, so games played = len(elo_list) - 1
num_games = [len(play_until_top(2847, 1000)) - 1 for i in range(num_sims)]
num_games = np.array(num_games)

print(num_games.mean())
plt.figure(figsize=[8,5])
plt.hist(num_games,bins=50)[2]

elo_history = np.array(play_until_top(2847, 1000))
plt.figure(figsize=[8,5])
plt.plot(np.arange(0, len(elo_history)), elo_history)
plt.show()
Example Elo trajectory
Games to real Elo distribution

Another cool example would be to simulate the NBA playoffs. For a first approach, you can assume that each team has a probability of winning proportional to the games they won during the regular season (GW) so that in any game the probability of team 1 winning is GW1 / (GW1 + GW2). You can also analyze how probabilities change if you change the series from Best of 7 to Best of 5 or Best of 9.
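A minimal sketch of that first approach for a single series could look like the following. The function name `simulate_series`, its parameters and the regular-season win counts in the usage lines are all made up for illustration. It relies on the fact that playing every game of the series and checking who won the majority gives the same winner as stopping once a team clinches.

```python
import numpy as np

def simulate_series(games_won_1, games_won_2, best_of=7, num_sims=10_000, seed=1234):
    """Estimate the probability that team 1 wins a best-of-N series."""
    rng = np.random.default_rng(seed)
    # Per-game win probability proportional to regular-season wins
    p_game = games_won_1 / (games_won_1 + games_won_2)
    wins_needed = best_of // 2 + 1
    # Number of games team 1 would win if all `best_of` games were played
    team1_wins = rng.binomial(best_of, p_game, size=num_sims)
    return (team1_wins >= wins_needed).mean()

# Hypothetical regular-season records: 60 wins vs 45 wins
for series_length in (5, 7, 9):
    print(series_length, simulate_series(60, 45, best_of=series_length))
```

Longer series tend to favor the stronger team, since a single lucky game matters less.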

Example 4. Business application, estimating value at risk

Collectors LTD is a debt collection company focused on enterprise debt. It buys portfolios of business loans that have defaulted at some point and tries to collect the payments for those loans. Some of the companies will be bankrupt and won’t be able to pay, and others are likely to go bankrupt in the future. The key to Collectors LTD’s business is in estimating the value it can get back from a portfolio. For this reason, Collectors LTD has developed a model that predicts the probability of a company repaying part of that debt. Among those companies that repay some of the debt, the amount paid is distributed uniformly from 0% to 100%. Collectors LTD can use its model in combination with simulation to evaluate the expected return of the portfolio, and how volatile that return is.

Since I can’t share the real data with you, I’ve created a synthetic dataset that mimics the relevant properties:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

def generate_synthetic_portfolio(num_companies):
    debt = 100000 * np.random.weibull(0.75, num_companies)
    
    prob_repayment = np.random.normal(0.2, 0.1, num_companies)
    prob_repayment = np.clip(prob_repayment, a_min=0, a_max=1)
    
    return debt, prob_repayment

num_companies = 1000
debt, prob_repayment = generate_synthetic_portfolio(num_companies)

Given the synthetically generated portfolio above, estimate the expected amount to be collected and the amount that will be collected with 95% probability (the 5th percentile of the collection distribution).

def simulate_collection(debt, prob_repay):
    num_companies = len(debt)
    did_repay = (np.random.rand(num_companies) < prob_repay)
    pct_paid = np.random.rand(num_companies)
    amount_collected = debt * did_repay * pct_paid
    return amount_collected.sum()

num_sims = 1000
amount_collected = np.array(
    [simulate_collection(debt, prob_repayment) for i in range(num_sims)]
)

print(f"Total debt: {np.round(debt.sum())} usd")
print(f"Average amount collected: {np.round(amount_collected.mean())} usd")
# 5th percentile: the amount reached or exceeded in 95% of the simulations
percentile_95 = np.round(np.sort(amount_collected)[int(0.05*num_sims)])
print(f"95% percentile collection: {percentile_95} usd")

plt.figure(figsize=[8,5])
plt.hist(amount_collected,bins=50)[2]
plt.show()
Debt collection distribution

Keep in mind that this solution assumes the probabilities of collection are independent of one another. This isn’t true for systemic risks such as a global economic downturn.
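As a sanity check on the simulated average, the expected collection can also be computed in closed form: each company repays with its own probability and, if it does, pays a uniform fraction that averages 50%. The sketch below rebuilds a synthetic portfolio with the same recipe but with NumPy's `default_rng`, so its numbers won't match the ones printed by the code above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1234)
num_companies = 1000

# Same synthetic recipe as in the example: Weibull debts, clipped normal probabilities
debt = 100000 * rng.weibull(0.75, num_companies)
prob_repayment = np.clip(rng.normal(0.2, 0.1, num_companies), 0, 1)

# Closed form: E[collected] = sum(debt * P(repay) * E[fraction paid]), with E[fraction] = 0.5
expected_collection = (debt * prob_repayment * 0.5).sum()

# Monte Carlo estimate of the same quantity
simulated = np.array([
    (debt * (rng.random(num_companies) < prob_repayment) * rng.random(num_companies)).sum()
    for _ in range(2000)
])

print(f"Closed-form expectation: {expected_collection:.0f} usd")
print(f"Simulated average:       {simulated.mean():.0f} usd")
```

The expected value holds even when repayments are correlated, because expectations add up regardless of independence. It is the spread, and therefore the percentile, that depends on the independence assumption.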

Conclusion

I hope you’ve liked these examples and that you can find applications of simulation in your day-to-day data science job. If you’ve enjoyed the article, please subscribe and share it with your friends.

13 essential tips for learning machine learning and data science

When you start learning, it’s very hard to have a clear direction. You often waste time on uninteresting, useless, or outdated topics. You wander and run in circles.

However, once you’ve mastered the topic, it’s easy to look back and see the fastest path from noob to pro. If only you could go back in time and give yourself the roadmap… Even if I cannot do that for myself, I can do it for others. This is the objective of this article: to give you the tips I wish I had known when I started learning data science and machine learning.

To build this list, first I wrote down what has been useful to me in my experience as a data scientist. Then I went to Reddit, to seek help in curating and completing the list, getting 300+ upvotes and 35+ comments. I hope you find it helpful!

1. Get solid mathematics, probabilities, and statistics foundations

Mathematics and statistics are at the core of machine learning. So it will be very difficult to understand machine learning algorithms if you don’t know the building blocks.

However, this doesn’t mean you need to be a math wizard. You should understand math and stats concepts such as vectors, matrices, derivatives, probability distributions, independent variables, and standard deviation. More advanced mathematics (like learning to prove theorems) won’t help you much when studying machine learning, even though it can be a lot of fun.

2. Learn either Python or R and learn them well

When doing data science and machine learning, you will spend most of your time coding in R/Python. So it’s important to learn the ins and outs of your language of choice.

Data scientists spend a lot of time cleaning and manipulating data, so you should give special attention to data manipulation libraries. The most popular ones are Pandas for Python and data.table and dplyr for R.

3. Learn good programming practices

Writing clean and efficient code will make it easier to share your work with others. And even if you work alone, it will make it easier for you to debug and maintain your own code. Entire books have been written about this, so I’ll give you a short list:

  1. Use consistent and descriptive names for variables, columns, and functions
  2. Don’t repeat code, use functions or classes if you need to do the same process multiple times
  3. Understandable code is better than compact code: 10 lines everybody understands beat 2 lines nobody understands
  4. Don’t overoptimize your code at the start, but know where the bottlenecks (parts that won’t work well if you increase the volume of data) are in case you need it to scale
  5. Use consistent indentation and try to limit line length
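As a small illustration of the first two points, here is a before-and-after sketch. The scenario, the names and the tax rate are all made up for the example.

```python
# Before: cryptic names and the same logic written twice
# a = [x * 1.21 for x in l1]
# b = [x * 1.21 for x in l2]

VAT_RATE = 0.21  # hypothetical tax rate, for illustration only

def add_vat(net_prices, vat_rate=VAT_RATE):
    """Return gross prices (rounded to cents) for a list of net prices."""
    return [round(price * (1 + vat_rate), 2) for price in net_prices]

# After: descriptive names and a single function reused for both lists
book_prices = add_vat([10.0, 20.0])
game_prices = add_vat([15.0])
print(book_prices, game_prices)
```

If the rate ever changes, there is now exactly one place to update.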

4. You don’t need to learn all the different supervised learning models

This is one I struggled with. When I started learning I thought that every situation would need a different type of model and that I needed to learn them all to be well equipped. But this is far from true. Linear/logistic regression is surprisingly effective for tabular data problems. And XGBoost or random forest will help you if you have a lot of non-linearities. Artificial neural nets are great for image and NLP problems but are otherwise overkill and more difficult to set up.

Additionally, you don’t have to keep up with all the published papers. Most staple techniques in the industry are decades old. If you ever have to face a truly unique problem, that may be a good moment to dive into the literature.

5. Once you know the basics and understand them well, it’s mostly about doing projects

After completing one or two ML courses, don’t spend your time on more theory, dive straight into doing some projects. If you’re lacking some knowledge, you can pick it up on the way.

Working on projects puts your knowledge into practice, and helps you figure out whether you really understood everything. Additionally, by doing projects you create valuable experiences that will help you get hired later on.

6. Doing tutorials and reviewing other people’s projects is very helpful at the start

When you’re learning a new tool or model and don’t feel confident about using it on your own, looking at an example is a great way to get some inspiration.

7. You can learn everything online for free, but some paid resources can be helpful

For example, studying a master’s will give you credentials and a class of peers. I’ve actually written a full article about self-learning vs studying a master’s.

Additionally, some useful online resources are paid. I have personally tried to distill my years of experience as a data scientist into Data Projects, a product to learn data science by doing real-world projects. I hope it can help others as much as it would’ve helped me.

8. Explaining your work to others is a great way to consolidate your knowledge

It’s also a great way to work on your communication. You can do this by telling your friends, blogging, or making YouTube videos. This will be a crucial skill when working with others.

9. Don’t despair if you don’t get it right

Nobody gets it right the first time. Trial and error is the way to go, especially in fields like this where there is no one exact solution.

10. Lean on online communities

The internet is full of helpful and generous people. If you’re struggling with something, search for it, and if you don’t find the answers, ask in the forums (Reddit or Stack Overflow).

11. Learn more about your problem domain

Don’t focus only on the purely technical, try to understand what is really behind the problems you’re modeling. It will help you decide which is the best error metric for the problem, select the most insightful variables, and communicate to non-technical stakeholders using their own language.

12. Work with messy data

Don’t just stick to problems with pre-cleaned data. The world is messy, and having some experience in cleaning and structuring data will prepare you for future challenges.

13. Work on what makes you curious, that will keep you motivated

Following your curiosity and your passions will make sure you don’t abandon your path to becoming a data scientist halfway through. Additionally, it makes the whole learning experience a lot more fun!

How to self-learn data science from scratch

When I learned data science I didn’t know where to start, so I wasted many hours learning only tangentially useful stuff. Now, after more than five years as a data science consultant, I know what I would’ve done differently. In this article, I will offer you a roadmap on how to self-learn data science, with links to useful resources.

Data science pre-requisites

Even though I believe everyone can learn data science, those with a technical background will have a head start. Before getting into DS-specific subjects, it is useful to have some notions of mathematics, statistics and probability.

It is not necessary to be an expert in any of those, but you need a solid foundation. If you’ve never studied any of those, don’t worry, I’m here to help. In the following paragraphs, I’ll briefly describe each prerequisite and link to educational resources.

Mathematics for data science

To get started with data science you need to get familiar with some of mathematics’ most common objects. These Khan Academy lessons about vectors, matrices and functions are a good place to start. Also, here’s the summary (in more formal mathematical language) of a Stanford course. These concepts are the building blocks of most machine learning algorithms and provide you with a framework for structuring data. Getting to this level of mathematics will allow you to understand and use the algorithms that others have invented and implemented, and to get results with them.

If you really like mathematics, you can dive deeper by taking full calculus and linear algebra courses. This will require a lot more work but will unlock a more complete understanding of the inner workings of machine learning algorithms and how to implement and adjust them.

Probability and statistics

Probability lies at the core of the data scientist’s view of the world. When dealing with big numbers and random events, probability and statistics provide the tools to make sense of them. It isn’t only about the exact methods or formulas, but also about developing a probabilistic intuition. These courses from Khan Academy on probability and statistics are beginner-friendly and have all the information you’ll need. Here is a mathematically formal summary of a probability course from Stanford.

In addition to formal education in probability and statistics, reading non-fiction books can also help to develop an intuition. I recommend the following books in no particular order: Thinking, Fast and Slow; Factfulness; Thinking in Bets; Fooled by Randomness (or any of Nassim Taleb’s books).

Finally, reading about statistical paradoxes will help you make sense of data when you face unintuitive conclusions.

Data-oriented programming language

A big part of a data scientist’s job is reading, manipulating and running analysis on data. This is usually done by coding in a data-oriented language. These languages allow us to write instructions for a computer to execute. Even though there are many different programming languages, most of them use very similar structures. The two most popular data-oriented programming languages are Python and R, and you can start with either one. If at some later point you work with people using the other one, you can use that as an opportunity to learn it.

If you’ve never coded before, don’t worry. Both of them can be a good first point of contact with programming. A lot has been written about which one is better, but the truth is they have different strengths.

R’s strong points are:

  • It is designed for data and statistical work, so manipulating data is easier
  • There is a vast universe of statistics libraries
  • The Shiny library makes it very easy to make a web app with no previous web design experience
  • RStudio is a wonderful IDE (I haven’t found one that I like as much for Python)

Python’s strong points are:

  • It’s a general-purpose programming language as well as one of the most popular languages overall
  • It usually runs faster than R
  • It has better packages for deep learning

I personally prefer R because of its more compact syntax in the data.table package and also because I have more experience with it.

Learning R

If you are new to programming, I recommend you start with one of these resources:

If you have been coding for a while, you can get the basics with learn R in Y minutes.

Once you know the basics, it’s time to learn one of the two main data manipulation libraries: data.table (my personal favorite) or dplyr. Another useful library is ggplot2 for making beautiful graphics.

Learning Python

If Python is your first programming language you can start with any of these:

If you’re already familiar with coding you can just read this documentation.

And once you’ve mastered Python’s basics, you can go into the specialized tools to manipulate data: Pandas and NumPy. Here’s a tutorial and here’s a video to help you learn those packages.

Learn machine learning

Now we get to the exciting part.

There are many different techniques and tools in machine learning. One of them has been my most used analytical tool during my years as a data science consultant. And that technique is supervised learning, in both of its forms: classification and regression.

Supervised learning, also known as predictive modeling, is about learning from examples in which we know in advance the correct answer. In regression the answer is a numerical value, and in classification it is categorical.

Predictive models can be used to make demand forecasts, identify risky creditors and estimate the market price of a house among many other uses.

Here are some courses that will teach you the main framework to approach predictive modeling problems, as well as some supervised learning models:

In my experience, 3 families of models can help you solve most supervised learning problems you’ll ever encounter:

  1. Linear and logistic models (explained in the above courses) are easy to understand, easy to interpret, fast to train and reasonably accurate
  2. XGBoost (an implementation of gradient-boosted trees) is a top-of-the-class model in terms of accuracy, speed and ease of use. However, it’s not as easy to interpret as linear models. Here’s an introduction to decision trees (pre-requisite) and a couple of articles about how XGBoost works
  3. Neural networks are great for natural language processing and image models. However, I’d leave them to more advanced data scientists since they’re more difficult to set up

Here are some examples of using linear regression in R and Python, and of using XGBoost in both languages.
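For a quick taste of the first family, here is a minimal sketch of a linear regression fitted with plain NumPy least squares. The house-price data is synthetic and every name and number in it is made up for illustration; in practice you would more likely use a dedicated modeling library.

```python
import numpy as np

rng = np.random.default_rng(1234)
num_houses = 500

# Synthetic data: price grows linearly with size, plus noise
size_m2 = rng.uniform(40, 200, num_houses)
price = 50_000 + 3_000 * size_m2 + rng.normal(0, 20_000, num_houses)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(num_houses), size_m2])
coefficients, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = coefficients

print(f"Estimated price per extra m2: {slope:.0f}")
print(f"Predicted price of a 100 m2 house: {intercept + slope * 100:.0f}")
```

The fitted slope reads directly as "price per additional square meter", which is the kind of interpretability that makes linear models such a good starting point.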

SQL

SQL is the most used database language and most companies use one of its variants for their database. Even Amazon’s Athena and Google’s BigQuery can be accessed using SQL syntax.

So if you’re planning on getting a job in data science I recommend you learn SQL since it will be a requirement for most employers. If you’re doing personal projects it’s up to you. For small-scale projects, you will be just saving your data on text files. For bigger projects, SQL skills may come in handy.
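If you’ve never seen SQL, the sketch below shows what a typical aggregation query looks like. It runs against an in-memory SQLite database through Python’s built-in `sqlite3` module; the `sales` table and its columns are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# Total sales per region, largest first
query = """
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
"""
totals = connection.execute(query).fetchall()
print(totals)  # [('south', 250.0), ('north', 150.0)]
connection.close()
```

SELECT, GROUP BY and ORDER BY work very similarly across most SQL variants, although each one has its own extensions.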

What’s next?

Once you’ve learned the basics about R/Python and supervised learning, it’s time to practice. Do a project with open data or participate in a Kaggle competition. Or get a job as a data scientist and learn while getting paid. Practice is what will help you hone your skills and generate proof of your knowledge.