Square Service Outage
68 points by m_coder 8 years ago | 28 comments
- ewbourget 8 years ago
This is Erik from the Square engineering team. Our service has been restored; we will be following our standard postmortem process and will be making the results of this one public. We currently believe that this was caused by a bad deploy followed by a thundering herd capacity problem in our authentication service - no DDoS attack, etc.
History and details are available at issquareup.com.
We apologize for the downtime; this situation is well outside what we expect of our service and of ourselves.
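For readers unfamiliar with the term: a thundering herd happens when many clients retry or reconnect at the same moment and overwhelm a service that is trying to recover. One common client-side mitigation is exponential backoff with random jitter. The sketch below is a generic illustration, not Square's implementation; the function name `backoff_delays` and the `base` and `cap` defaults are assumptions chosen for the example.

```python
import random


def backoff_delays(attempts, base=0.5, cap=60.0, rng=None):
    """Return one randomized wait time (in seconds) per retry attempt.

    Hypothetical helper: `base` and `cap` are illustrative defaults,
    not values taken from any real client.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # Exponential ceiling: base, 2*base, 4*base, ... limited by `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a uniformly random time in [0, ceiling] so
        # clients that failed together do not all retry together.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(6)):
        print(f"retry {i}: wait {delay:.2f}s")
```

Without the jitter, every client that failed at the same instant would also retry at the same instant, which recreates the load spike on each retry round.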
- ewbourget 8 years ago
Posted: https://medium.com/square-corner-blog/incident-summary-2017-...
Please feel free to email me at ewb_at_squareup.com with any questions. Again, we apologize for the outage and take remediation of these issues seriously.
Less technical version: https://www.issquareup.com/incidents/y4mrzjz0nl2d
- cypherpunks01 8 years ago
Thanks Erik. Where will the postmortem be posted?
- ewbourget 8 years ago
It will be posted on issquareup.com.
- gmisra 8 years ago
Anecdata from a popular coffee shop ~1 mile from Square HQ:
- Staff at the cafe are extremely frustrated, no real visibility into what's going on.
- Finding the status page was a struggle. Following @square and @sqsupport was insufficient, as both accounts have been publicly silent during the entire outage. The status page, hosted at the non-obvious issquareup.com, is only listed on the profile pages of those social accounts. I located the page and shared it with the cafe staff, which provided some context as to what was going on.
- But the status page itself was not very useful to them. The information in it is moderately useful for a technical user, but most of Square's POS customers aren't technical. More importantly, most of the hands-on operators of these POS systems are even less technical.
- The only solution offered is to "switch to offline mode", but that only works if your Square app hasn't already logged you out, which had happened long before reading about the solution. This behavior is corroborated by Twitter anecdotes and other comments in this thread.
- There is no other solution path presented.
- Without any other information to share, staff is describing the issue as a "nationwide Square server crash" to all customers.
- Some customers just left when faced with the outage (alternatives are cash or an on-site, fee-based ATM)
- All of this is happening while the staff is continuing to take orders, serve customers, deal with irate customers, and generally be positive and courteous.
- The only reason they retried the app just now is because I read the comment from the Square engineer on this thread announcing service restoration.
Whatever user model Square has of the day-to-day operators of their POS, it seems to be wildly miscalibrated, especially around how to handle incident communication.
- tedmiston 8 years ago
> - Finding the status page was a struggle. Following @square and @sqsupport was insufficient, as both accounts have been publicly silent during the entire outage.
I mean, it's clear opening the two Twitter account pages, both have sent tons of replies during this time period.
On @sqsupport specifically they clearly state in the bio that their tweets aren't the right place to check for service outages:
> We're currently working through some issues. For live updates, please check http://issquareup.com
So this doesn't solve the problem of bringing Square online but it also doesn't really sound like the merchant is trying very hard as the right channel was easy to find. Besides adding email / text message alerts to merchants for downtime, Square is doing a lot more than most.
- Philip_with1L 8 years ago
Thanks for posting this, and you're right. I happen to work at a café and follow these types of events. I was unsure and only slightly confused about where to get the information. Our boss Slack'd the dashboard and then I found the support Twitter. I'd bet many non-tech baristas were confused about where to look for guidance. It all made sense to me once I found it.
- dvcc 8 years ago
Being down for an hour as a payment processor is crazy. Going off some old figures [0], and assuming 0 offline transactions (and a bunch of other assumptions too), I think it is around $3,500,000 in unprocessed transactions?
Must be stressful trying to bring it back online.
[0] https://techcrunch.com/2014/01/13/putting-squares-5b-valuati...
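The comment does not show its arithmetic, so the following is only a sanity check under a stated assumption: that payment volume is spread evenly over every hour of a 365-day year (real volume is not; it peaks during business hours). Working backward, the quoted $3,500,000 per hour corresponds to roughly $30.7 billion per year. The function names are hypothetical, and no figure from the linked article is used.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes a non-leap year


def implied_annual_volume(hourly_volume_usd):
    """Annual volume implied by an hourly figure, assuming uniform volume."""
    return hourly_volume_usd * HOURS_PER_YEAR


def unprocessed_volume(annual_volume_usd, outage_hours):
    """Volume missed during an outage, under the same uniform assumption."""
    return annual_volume_usd / HOURS_PER_YEAR * outage_hours


if __name__ == "__main__":
    annual = implied_annual_volume(3_500_000)
    print(f"implied annual volume: ${annual:,}")
    print(f"one-hour outage: ${unprocessed_volume(annual, 1):,.0f}")
```

Because the outage hit during the business day, the uniform assumption probably understates the volume affected in that particular hour.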
- tedmiston 8 years ago
From the page:
> While we continue working to resolve the issue, we recommend that all sellers switch to offline mode, which will enable you to continue taking payments via swiping. Offline mode instructions are available at: squ.re/offlinemode
Though there are some big caveats:
> - Your current swiping rate will be applied to offline transactions, so you’ll see no difference in fees.
> - When operating in Offline Mode, there is additional risk with any payments you accept. Square is not responsible for any loss due to declined cards or expired payments taken while offline or for chargebacks.
> - Square can not contact any customers on your behalf should a payment be declined or expire when taken in Offline Mode.
So if Square is somehow down for 73 hours, a lot of businesses lose a lot of money.
I guess as a business owner one should now consider having a backup credit card reader through a different service.
- agency 8 years ago
I was at a cafe when this went down and they said they couldn't switch to offline mode because this outage logged them out and apparently you need to be logged in to switch. They don't accept cash and ended up closing shop for the duration of the outage.
- niij 8 years ago
> don't accept cash
What is their reasoning for not accepting cash payments? I have never been somewhere that did not accept cash and can't see how that would benefit customers?
- jrobn 8 years ago
We use Square as our point-of-sale system at our spa. We are biting our nails now since most of our sales are $75+ and people don't generally carry around that kind of cash anymore. Our iPad also suddenly got signed out of the POS app. Luckily my phone was signed in so I put it in airplane mode to kick it into OFFLINE mode.
You can't sign into the Square dashboard either, so access to Square Appointments in the browser is a no go.
- askafriend 8 years ago
I just went to a coffee shop that I go to regularly and was confused when they said they're cash only for today. This explains why.
On that note, I also saw multiple people leave to go to a different coffee shop because they didn't have cash on them.
- pm90 8 years ago
This is a pretty huge deal. I really like Square and I do hope they come back soon. Like another poster said, I'm at a coffee shop and they are frustrated as fuck; most patrons don't carry much cash around here.
- joez 8 years ago
How bad is this?
Seems like they have offline mode. Do their customers know how to use this? What's the chance for increased fraudulent swipes?
- cypherpunks01 8 years ago
If you swipe a card and their backend errors out or is unreachable, it does prompt you to switch to offline mode (as long as you're already logged in and have taken online transactions recently).
If a customer knows the payment processor is offline, they can use an invalid card and it will appear to go through. Merchant will be stuck with the liability after the transaction is later sent and declined.
- Philip_with1L 8 years ago
Offline mode is great for when the shop's WiFi or local provider is having connectivity issues. The Square software and reader queue card swipes and transactions along with tip and receipt information until a connection is reestablished. You can just keep ringing people up.
Today, we went into offline mode automatically and processed a few transactions. Then our app was forcibly logged out of Square service and we couldn't get back in to continue in offline mode.
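The queue-then-submit behavior described in the last few comments is a store-and-forward pattern. The sketch below is a minimal illustration of that pattern only; it is not Square's code or API. `Payment`, `OfflineQueue`, and the `authorize` callback (standing in for the payment processor) are hypothetical names, and the sketch assumes the processor is reachable by the time `flush` is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Payment:
    card_token: str      # hypothetical opaque reference to the swiped card
    amount_cents: int


@dataclass
class OfflineQueue:
    pending: List[Payment] = field(default_factory=list)

    def record(self, payment: Payment) -> None:
        """Accept a payment locally; nothing is authorized yet."""
        self.pending.append(payment)

    def flush(
        self, authorize: Callable[[Payment], bool]
    ) -> Tuple[List[Payment], List[Payment]]:
        """Submit queued payments once the connection is back.

        Returns (approved, declined). As the thread notes, the merchant
        carries the loss for anything that ends up declined.
        """
        approved: List[Payment] = []
        declined: List[Payment] = []
        for payment in self.pending:
            (approved if authorize(payment) else declined).append(payment)
        self.pending = []
        return approved, declined


if __name__ == "__main__":
    queue = OfflineQueue()
    queue.record(Payment("card-a", 450))
    queue.record(Payment("card-b", 1200))
    # Stand-in processor that declines one card after the fact.
    ok, bad = queue.flush(lambda p: p.card_token != "card-b")
    print(len(ok), "approved,", len(bad), "declined")
```

The design risk raised above is visible in `flush`: the sale has already left the counter by the time `authorize` can say no.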
- danielhooper 8 years ago
AFAIK offline mode only works if you have successfully completed a non-offline transaction within the last 24 hours.
- zitterbewegung 8 years ago
According to the docs on offline mode, if the card is declined for any reason, you as the merchant are on the hook. And those payments will attempt to process when their service is restored.
- kayfox 8 years ago
Offline mode is only available if you're already logged in.
- huangc10 8 years ago
Is the actual failure with logging in and creating transactions, or with the checkout, or is everything down? This seems like it'll be a pretty big blow, especially with lunch soon on the West Coast.
At least good old hard cash still works.
- kayfox 8 years ago
No one can log in and it can't process transactions.
So, if you are logged in already you can use offline mode.
If you use their point-of-sale software to track cash sales and are not logged in already, you're pretty screwed at this point.
- Philip_with1L 8 years ago
Yes, this exactly. We went into offline mode first and then that stopped working completely (all card/tap payments rejected). So we asked every customer if they had cash before taking their order and were able to complete those transactions just fine. Soon afterwards, both of our terminals (iPads) were kicked out of the app and we resorted to paper and calculator for cash only.
- jrobn 8 years ago
Per issquareup.com: "We’re still experiencing issues; however, we are seeing initial positive improvements in response to the steps we have taken to remove load from the affected service"
Could this be a DoS of some kind?
- myowncrapulence 8 years ago
Been an hour... wow. Is this a DDoS on their auth services?
- jvehent 8 years ago
If your service has higher SLA requirements than your providers have contractually committed to, you're doing something wrong.
- cypherpunks01 8 years ago
I'm not sure what you mean. Who are you saying is doing the wrong thing here?
- emptythought 8 years ago
They're saying if you need more reliability than a service provides, but choose the cheap option with too low (or no) SLA, then you screwed up.
As a former POS engineer, this has been my gripe about these services from the get-go. With real payment processors and POS software/SaaS vendors, you... pay for guarantees about stuff like this, and they have clear workarounds. Does it screw up sometimes? Yeah. But you don't get opaque downtime like this, and you were given a clear workaround (and ALWAYS a clear offline mode you won't get locked out of flipping on, like the case here) in the first place.
This is a failure both on the customer's side and on Square's side. They basically scaled a pickup truck up to a delivery truck without considering why a delivery truck was designed differently in the first place, at least in some ways.
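The SLA point in this subthread can be put in numbers. The sketch below assumes a business that depends on a single provider with no fallback and that component failures are independent; the availability percentages are illustrative and do not come from any Square contract. Function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_budget_minutes(availability):
    """Yearly downtime allowed by an availability target (0.999 = 99.9%)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*parts):
    """Availability of a chain where every part must be up.

    Assumes independent failures, so the individual values multiply.
    """
    result = 1.0
    for availability in parts:
        result *= availability
    return result


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = downtime_budget_minutes(target)
        print(f"{target:.2%} allows about {minutes:.0f} minutes of downtime per year")
    # A shop that is itself up 99.9% of the time, on a provider that is up 99.9%:
    print(f"combined: {serial_availability(0.999, 0.999):.4%}")
```

Under these assumptions, a single outage of about an hour already exceeds the yearly budget of a 99.99% target, and a chain of hard dependencies is never more available than its weakest link.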