dbfs – command line access to SQL Server DMVs

Page content

With SQL Server on Linux, Microsoft has recognized that they’re opening up their products to a new set of users. People that aren’t used to Windows and it’s tools. In the Linux world we have a set of tools that work with our system performance data and present that to us as text. Specifically, the placeholder for nearly all of the Linux kernel’s performance and configuration data is the /proc virtual file system, procfs. Inside here you can find everything you need that represents the running state of your system. Processes, memory utilization, and disk performance data all of this is presented as files inside of directories inside /proc.

Now, let’s take this idea and extend it to SQL Server. In SQL Server we have DMVs, dynamic management views. These represent to current running state of our SQL Server. SQL Server exposes the data in DMVs as table data that we can query using T-SQL. 

So, Microsoft saw the need to bring these two things together, we can expose the internals of SQL Server and its DMVs to the command line via a virtual file system. And that’s exactly what dbfs does, it exposes all of SQL Server’s DMVs as text files in a directory. When you access one of the text files…you’ll execute query against the SQL Server and the query output comes back to you via standard output to you Linux console. From there you can use any of your Linux command line fu…and do what you want with the data returned. 

Setting up dbfs

So first, let’s go ahead and set this up. I already have the Microsoft SQL Server repo configured so I can install via yum. If you have SQL on Linux installed, you likely already have this repo too. If not, go ahead and follow the repo setup instructions here. To install dbfs we use yum on RHEL based distributions.

sudo yum install dbfs -y
First off, think about what’s going on under the hood here…we’re going to allow the system to execute queries against DMVs…so let’s try to keep this as secure as possible, I’m going to create a user that is allowed to only query DMVs with the VIEW SERVER STATE permission. So let’s do that…
Let’s log into our SQL Server via SQLCMD
sqlcmd -H localhost -U sa -p 
And execute this code to create a user named dbfs_user 
CREATE LOGIN [dbfs_user]
WITH PASSWORD=N'ThisSureIsntSecure',
Once created, let’s assign this user permissions to query DMVs
The next step is we need to create a directory where dbfs will place all the files representing the DMVs we wish to query
mkdir /tmp/dbfs  
Now, let’s go ahead and configure dbfs. I’m going to place it’s configuration file in /etc/ since that’s the standard location for configuration files on Linux systems.
sudo vi /etc/dbfs.config
And inside that file, let’s use the following configuration. Pretty straight forward. Define a configuration name, here you see server1, the hostname which is the locally installed SQL instance. We’ll use the username and password of the user we just created and also defined is a version. While this isn’t very well documented, the code here shows that if you’re on version 16 (SQL Server 2016) or newer it will create files dbfs files with a .json file extension which exposes your DMV data as…you guessed it JSON. Also if you want to add a second server to dbfs, just repeat the configuration inside the same text file.

Running dbfs

Now with all the preliminaries out of the way, let’s launch dbfs. Basic syntax here, the actual program name with the parameter -c pointing to the configuration file we just created and the -m parameter pointing to the directory we want to “mount” our DMVs into.
dbfs -c /etc/dbfs.config -m /tmp/dbfs/ 
Now, what’s interesting about dbfs is if you log out dbfs stays running. Honestly, I don’t like that, if this is the case it should be running as a service managed by systemd or whatever init daemon you’re using on your Linux distribution. I mentioned that on their GitHub repo. If this is going to be a user process, then I should have the choice the background the task myself.

Using dbfs

Looking at the source for dbfs it gets a list of all DMVs from sys.system_views from the SQL Server you configured it to connect to, then creates a file for each and every one of those DMVs. So we have full coverage of all the DMVs available to us and since you can use any bash command line fu to access the data now…the options are really limitless. Microsoft has a few good demos on the GitHub repo here. Let’s walk through a few examples now.
Accessing a DMV

This is pretty straight forward, you read from the file just like you would read from any other file on a Linux system. So let’s do that…we add the column -t option to make sure all the columns are aligned in the output.
cat dm_exec_connections | column -t
<p class="tab-convert:true lang:batch decode:true ">
  And our output looks like this…
session_id  most_recent_session_id  connect_time  net_transport  protocol_type  protocol_version  endpoint_id  encrypt_option  auth_scheme  node_affinity  num_reads  num_writes  last_read  last_write  net_packet_size  client_net_address  client_tcp_port  local_net_address  local_tcp_port  connection_id  parent_connection_id  most_recent_sql_handle
51          51                      Jun           19             2017           09:46:33:660AM    TCP          TSQL            1946157060   4              FALSE      NTLM        0          7           7                Jun                 19               2017               09:46:34:103AM  Jun            19                    2017                    09:46:34:107AM  8000  37641  1433  EDC82F4B-D333-4DCA-88BB-4AB2CE9  02000000a0c09f36765d6d3b8a15a90772d74e103ac8b653000000000000000
Notice in the output above how the connect_time column is split incorrectly? We need to tell column to use the tab as a delimiter. By default it uses whitespaces. So let’s do that…
cat dm_exec_connections  | column -t -s $'\t'
And now our output looks much better
session_id  most_recent_session_id  connect_time                net_transport  protocol_type  protocol_version  endpoint_id  encrypt_option  auth_scheme  node_affinity  num_reads  num_writes  last_read                   last_write                  net_packet_size  client_net_address  client_tcp_port  local_net_address  local_tcp_port  connection_id                    parent_connection_id  most_recent_sql_handle
51          51                      Jun 19 2017 09:29:36:340PM  TCP            TSQL           1895825409        4            FALSE           SQL          0              7          7           Jun 19 2017 09:29:36:347PM  Jun 19 2017 09:29:36:350PM  4096              42784            1433            EDFB041F-B319-4098-B4DE-80739A7                        01000100f0e88f076013d837050000000000000000000000000000000000000

Selecting off a subset of columns

Well you probably noticed that the output is a bit unruly since it’s outputting all of the DMV’s columns. So let’s tame that a bit and pull out particular columns. To do that we’ll use a tool called awk which will print out columns based on the numeric index, so $1 is the first column and so on. 
cat dm_exec_connections | awk '{ print $1,$3,$4,$7 }'
And our output looks like this
session_id connect_time net_transport endpoint_id
51 Jun 19 TCP
Something isn’t right…as DBAs we think of things in rows and columns. So we’re going to count across the top and think the 7th column is going to yield the 7th column and it’s data for each row, right? Well, it will but data processed by awk is whitespace delimited by default and is processed row by row. So the 7th column in the second line isn’t the same as the output in the first line. This can be really frustrating if your row data has spaces in it…like you know…dates.
So let’s fix that…the output from the DMVs via dbfs is tab delimited. We can define our delimiter for awk with -F which will allow for whitespaces in our data. Breaking the data only on the tabs. Let’s hope there isn’t any tabs in our data!
cat dm_exec_connections | awk -F $'\t' '{ print $1,$3,$4,$7 }'
And the output from that looks like this, much better but we don’t have the nice columns.
session_id  connect_time  net_transport  endpoint_id
51          Jun           19             2017         02:29:36:340PM  TCP  4
We’re so close, we can’t throw column on the end to make this nice and columnar because awk with this configuration it will remove the tab delimiters on it’s output stream. column by default will do the same thing too, but we can let column do the work for us and have it print tab delimiters in it’s output stream. 
cat dm_exec_connections | column -t -s $'\t' -o $'\t' | awk -F $'\t' '{ print $1,$3,$4,$7 }'
And voila, we end up with some nice neatly formatted output
session_id connect_time               net_transport endpoint_id
51         Jun 19 2017 02:29:36:340PM TCP           4          

Searching in Text

We can search for text in the output using grep, here’s a quick example looking for the dedicated admin connection in dm_os_schedulers
grep DAC dm_os_schedulers
And here’s the output. 
00000005391c0040	64	1048576	0	VISIBLE ONLINE (DAC)	1	1	1	3	5	2	940180	0	00000005392aa160	000000053906e040	0000000539070040	4000	44	0	0

SQL folks…keep in mind, grep will only output lines matched, so we loose the column headers here since they’re part of the standard output stream when accessing the file/DMV data.

Moving forward with dbfs

We need the ability to execute more complex queries from the command line. Vin Yu mentions this here. As DBAs we already have our scripts that we use day to day to help us access, and more importantly make sense of, the data in the DMVs. So dbfs should allow us to execute those scripts somehow. I’m thinking we can have it read a folder on the local Linux system at runtime, create files for those scripts and throw them in the mounted directory and allow them to be accesses like any of the other DMVs. The other option is we place those scripts as views on the server and access them via dbfs. Pros and cons either way. Since it’s open source…I’m thinking about implementing this myself :)

Next is, somehow we need the ability to maintain column context throughout the output stream, for DBAs it’s going to be tough sell having to deal with that. I know JSON is available, but we’re talking about DBAs and sysadmins here as a target audience. 

In closing is a great step forward…giving access into the DMVs from the command line opens up SQL Server to a set of people who are used to accessing performance data this way. Bravo!