This blog post is intended to give all of you my take on the last few days. Since it's been such an exciting time for us here, I'd like to give my own perspective on it. This is a pretty tech-heavy post, so be warned.

Last Thursday I travelled to Paris to give a couple of interviews to the mainstream French gaming media about Perpetuum. All went well, and I'd like to seize this opportunity to give a big shout-out to all the participants for their kindness and professionalism.

Something you don't want to see.

Saturday morning, while roaming the streets of Paris, I started receiving messages and phone calls from the dev team at home about a nice number of new registrations. As time went on, the number just went up and up. Having no idea what was going on, we were simply happy about it and tried to craft theories around it. Later that day I met Guillaume, who is one of our players, and he explained everything about the current situation in the MMO landscape. (Thanks Guillaume for the hospitality, I owe you one!) In the meantime, I was being spammed with messages that something weird was going on with the relay server, causing insane lag. You can imagine the number of calls, as even the battery in my mobile ran out!

Sunday night I got back to Budapest, went home, took a shower and immediately headed to the office to check on the situation and fix the server problem. In the office I found out Calvin had come to pick me up at the airport, but since my phone had died on me earlier, we managed to miss each other. From this point on, I completely lost track of days and hours and, with almost no sleep, focused only on the server issue.

We checked everything: optimized the SQL, implemented new caches on the server, and tried many other things, but these were minor issues compared to the evil seed that caused the real problem. The main difficulty is that it's very hard to generate load on the dev server similar to that on the live one. For the tech guys out there, the transaction coordinator (MSDTC) was NOT the problem; it causes an insignificant load, so we are fine with that.

We had to put in two all-nighters in a row. After that we simply had to rest, but it was made hard by the stress and the guilt of leaving the gimped server alone, not knowing what would await us when we woke up. Things were looking grim.

When we realized the seriousness of the problem, I contacted one of the most knowledgeable people I know in this field and asked him to help out, as a fresh mind always has a better chance of finding hard bugs. Our deepest respect, Soci! (Shameless promotion: http://soci.hu/). Luckily, he had time to check out our architecture, and together we were able to start an investigation session on Wednesday night. He gave us several suggestions and pointed out and helped optimize several things in the database layer. Then we moved on to inspect the relay server's code. Since the source is huge, the quickest and most realistic method was to attach an analyzer to the live server application. I must admit this was the last thing I wanted to do on my own, but at this point I was willing to sacrifice anything to find the root of the problem.

He instantly figured out that one innocent-looking little function, namely the one which returns who is online (in chat channels, for example) has an emergent behavior, resulting in an exponential load and suffocating the server. (Insert random sarcasm about glaring oversight and tech madness here.) We then ran Visual Studio performance monitor on the live server to dig down to the heart of it. Soci said it might cause some load, so we messaged the server with a warning about what we were doing. And then we accidentally the whole server. :) 0245 server time, bye-bye field containers!
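To illustrate the kind of growth described above, here is a minimal Python sketch. It is not the relay server's code: the class names, the `join` method, and the assumption that every join re-sends the full member list to every member are all hypothetical, chosen only to show how a cheap-looking "who is online" lookup can swamp a server as a channel fills up.

```python
class NaiveChannel:
    """Hypothetical channel: each join re-sends the full list to everyone."""

    def __init__(self):
        self.members = []
        self.items_sent = 0  # total list entries pushed to clients

    def join(self, name):
        self.members.append(name)
        # Assumption: every member receives the complete, current member list.
        for _ in self.members:
            self.items_sent += len(self.members)


class DeltaChannel:
    """Hypothetical alternative: only the change is sent to existing members."""

    def __init__(self):
        self.members = []
        self.items_sent = 0

    def join(self, name):
        # One "X joined" notice per member already in the channel...
        self.items_sent += len(self.members)
        self.members.append(name)
        # ...plus a single full list for the newcomer.
        self.items_sent += len(self.members)


def simulate(channel_cls, players):
    """Let `players` users join one channel and return the entries sent."""
    channel = channel_cls()
    for i in range(players):
        channel.join(f"player{i}")
    return channel.items_sent
```

Under these assumptions, 100 players joining a single channel produce 338,350 list entries in the naive version against 10,000 in the delta version, and the gap widens quickly as the player count climbs.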

Thursday we closed the session with Soci and went back to implement what we'd learned. Quick tests on the dev server showed pretty amazing results so it made us rather confident. With shaky hands we patched and let the first 50 players in. This was the moment of success! So we immediately let 100 more players in. During this period we constantly checked the load which became insignificant (2-3%) so we raised finally the cap to JUST OVER 9000!!!!! >:)
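The staged reopening described above can be sketched in a few lines of Python. This is an illustration only: `raise_cap_in_stages`, the `measure_load` callback, and the 50% load ceiling are assumptions made for the example, not the procedure or tooling actually used on the live server.

```python
def raise_cap_in_stages(stages, measure_load, max_load=0.5):
    """Walk through increasing player caps and stop at the first unsafe one.

    stages       -- player caps to try, in increasing order
    measure_load -- hypothetical callback: cap -> observed load (0.0 to 1.0)
    max_load     -- assumed ceiling above which the cap is not raised further

    Returns the highest cap whose measured load stayed at or below max_load,
    or 0 if even the first stage was too heavy.
    """
    accepted = 0
    for cap in stages:
        if measure_load(cap) > max_load:
            break  # keep the previous cap and investigate before going on
        accepted = cap
    return accepted


# Example with a made-up load model in which load grows linearly with the cap.
final_cap = raise_cap_in_stages([50, 150, 9001], lambda cap: cap / 10000)
```

With the made-up load model in the example, the last stage would exceed the ceiling, so the function settles on 150; with a load that stays at 2-3% for every stage, it would go all the way to the final cap.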

The time to chill out has finally come, so we popped open one of Gargaj’s precious Norwegian treasures that had been sitting on our shelves for some years, a bottle of blackcurrant wine. I don’t know if it was the quality of the booze or the grace of that moment, but it was one of the sweetest sips we ever had.

Well, that’s the end of that. Hopefully I managed to shed some light on the problem and our day-to-day march until we reached victory. The code belongs to us, but the world of Nia is yours. Enjoy!

Comments for this post

1 Snowman

heh, it wasn't that technical but does give us a good insight into how you probably felt. Good that you're really open about the things that you had to go through to get this problem sorted.

No doubt you will encounter future difficulties but for now, rest well o7

2 Arga

Thanks for a behind the scene look, maybe a quick cameo appearance on the next Podcast, eh? :)

3 Dan

for future bug hunting and performance test you could make bot players, so you would have like 500+ grophos spamming missiles on your test server ;] Dev of this other game did that to test server performance during fleet fights.

4 Wannes Jah

Hey crm and gargaj, congrats on fixing this and for the influx of players which started this whole thing. Me and a lot of others are really happy how you handled this and kept us informed every step of the way. Those were some pretty busy days on IRC : )

I hope you can get some well deserved rest to recuperate!

5 Kynes

As a tech person, I would enjoy reading more technical blogs like this!

6 DEV Crm

I didn't dare to go deeper into concurrency and stuff like that... :) I'll get braver next time.

7 San Vigil

"so we raised finally the cap to JUST OVER 9000!!!!! >:) "

HAHAHAHAHA +3 internets to Dev Crm for proper use of an awesome internet meme!! LOL

Great job to the whole team on fixing the issue guys! Hats off to your hard work and bold moves to save my GoBots! :)

8 Antiquado deLune

Thanks for this update! And thanks for the work you guys put in!

9 Josefius

Awesome sauce, hopefully I'll be home to get on some time this weekend.

10 Andrevich Tolstoy

You guys deserve every drop of that wine! I'll be subbing tomorrow, and I'm still undecided if I'm subbing for the awesome game or the awesome devs.

I loved the blog, too!

11 Sarah Haran

A new player here. Thank you vets for letting us new players play during this time.

12 Eta Carinea

Thanks for the update Dev Crm, sounds like it was one of those moments, and congrats on solving the issue. Must have been a good moment when you finally realised the problem was solved. The rest of the team did a great job in keeping the players up to date.
Soci sounds like a good guy to have around, and don't worry about getting too techie, I for one enjoy these types of blogs.

Eta

13 Ralph Law

Brilliant work to all the devs and the guest mastermind. Thank you for an excellent game!

14 Winter Solstice

Thank you for such a deep insight into what was going on behind the scenes and of course, thank you for the amazing hospitality shown to all of the 'Exiles'. We are honored to color the landscape of Nia with our charred robot wrecks. :)

15 Johnny EvilGuy

The handling of the problem was top-notch and like Andrevich, I'm not sure if it's the game or the dev team that pulled me in as a subscriber. Keep up the fantastic work and here's hoping for years of bright future for Perpetuum!

16 Gaulois

Yay! :)

17 Marlona Sky

Great work and keep the blogs coming. :)

18 Owen

Finally read that article. It must have looked pretty epic from your point of view. Awesome work anyway.

Keep it up fellows !

19 Zap Kalan

At the time of these problems, my acc was in trial-status. I am impressed by how you communicate with the community. I have a long history of online games (DaoC/evequest/eve/...) but none of them had this service.
Great work - hope you keep it like this ;)

PS: i bought my first 30 days *g*

20 Twiz

Glad you found the problem on the server thingie. At least now the new players can play the game as it should be played without the player cap and lag. \o/

21 excession

Always impressed with the devs honesty and communication in this game. I'd stay subbed here just to support these guys.

22 Zex Maxwell

As an old eve player, I wanted to say sorry for breaking your server, but it resulted in a fix so I'm not so sorry any more :P

23 Dan

MOAR dev blogs! ;]

24 Lonwolf

Outstanding job fellas! Nice how you solved the problem but still managed to let players play. In addition, I always enjoy the communication I get from the devs about the problems and constant updates. I associate this with great MMO customer service, which is sadly nearly nonexistent in games today.

Keep up the great work my friends!

25 Theo CN

Great blog.
It was a very nice read.