L2 Application Support / SQL / OS / Autosys / Monitoring — Real Q&A (Concise Answers)

Crisp, interview-ready answers, grouped by topic for quick revision.


Application Support – L2

1) How would you troubleshoot a web application that is running slow for some users but not all?

Correlate by cohort first: which locations, browsers, ISPs, or app servers are affected? Then check network latency and CDN edges; server CPU/RAM and thread/connection pools; app logs and APM traces for slow DB calls; load balancer health and session stickiness (one unhealthy node behind sticky sessions only hurts the users pinned to it); and feature-flag or config drift between nodes.
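A minimal Python sketch of the cohort step. It assumes you can export (cohort, latency_ms) pairs from access logs or an APM tool. The cohort labels and the p95_by_cohort helper are hypothetical. Ranking cohorts by p95 latency shows quickly whether the slowness is confined to one region, browser, or node.

```python
from collections import defaultdict
from statistics import quantiles


def p95_by_cohort(records):
    """Return {cohort: p95 latency in ms}, slowest cohort first.

    records: iterable of (cohort, latency_ms) pairs, e.g. ("apac/safari", 870).
    """
    buckets = defaultdict(list)
    for cohort, latency_ms in records:
        buckets[cohort].append(latency_ms)
    result = {}
    for cohort, values in buckets.items():
        if len(values) < 2:
            result[cohort] = float(values[0])  # quantiles() needs two or more points
        else:
            # 20 slices -> 19 cut points; index 18 is the 95th percentile.
            # "inclusive" keeps the estimate within the observed min/max.
            result[cohort] = quantiles(values, n=20, method="inclusive")[18]
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    sample = [("eu/chrome", 110), ("eu/chrome", 95), ("eu/chrome", 130),
              ("apac/safari", 870), ("apac/safari", 940), ("apac/safari", 910)]
    for cohort, p95 in p95_by_cohort(sample).items():
        print(f"{cohort:12} p95={p95:.0f} ms")
```

Using p95 rather than the mean keeps a large number of fast requests from hiding a slow tail that only some users experience.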

2) What is the first step if you receive an application “500 Internal Server Error”?

Check application & web server logs for stack traces and failing modules. Confirm recent deployments/config changes. Validate database connectivity/credentials and environment variables.

3) How would you approach repeated application crashes during peak hours?

Inspect logs and crash/heap dumps for unhandled exceptions and out-of-memory errors; trend CPU, memory, and GC activity against traffic; review thread-pool and DB-pool limits and timeouts; reproduce with a load test; propose code or infra fixes and plan a controlled rollout.
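A rough Python sketch for confirming that crashes really cluster at peak hours. It assumes each log line starts with an ISO-8601 timestamp; the log format, the crash markers, and the crashes_by_hour helper are hypothetical and need adjusting to your application's logs.

```python
import re
import sys
from collections import Counter

# Assumed line format (hypothetical): "2024-05-01T14:03:22 LEVEL message..."
LINE = re.compile(r"^\d{4}-\d{2}-\d{2}T(\d{2}):\d{2}:\d{2}\s+(.*)$")


def crashes_by_hour(lines, markers=("OutOfMemoryError", "FATAL", "Unhandled exception")):
    """Count log lines containing a crash marker, bucketed by hour of day ("00"-"23")."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip continuation lines such as stack frames
        hour, rest = m.groups()
        if any(marker in rest for marker in markers):
            counts[hour] += 1
    return counts


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hour, n in sorted(crashes_by_hour(fh).items()):
            print(f"{hour}:00  {n}")
```

Compare the hourly counts with request volume for the same hours: crashes that track load point toward pool limits or memory pressure rather than a single bad input.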

4) A batch job failed with a “File not found” error. What’s your next step?

Validate configured path and working directory; check permissions/ownership; confirm the upstream job produced the file and mounts/paths are correct (containers/NFS).
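These checks can be scripted as a quick first pass. The sketch below is a minimal Python version: the diagnose_missing_file helper and its messages are hypothetical. It resolves relative paths against the job's working directory, then reports whether the file is missing, unreadable, or sitting under a directory tree that does not exist (a common symptom of an unmounted NFS share or container volume).

```python
import os
from pathlib import Path


def diagnose_missing_file(path_str, cwd=None):
    """Return findings explaining why a job might not find path_str.

    cwd: the job's working directory (defaults to this process's cwd).
    """
    findings = []
    path = Path(path_str)
    if not path.is_absolute():
        # Relative paths resolve against the job's working directory, not the script's
        base = Path(cwd) if cwd else Path.cwd()
        path = base / path
        findings.append(f"relative path; resolves to {path}")
    if path.exists():
        if os.access(path, os.R_OK):
            findings.append("file exists and is readable")
        else:
            findings.append("file exists but is not readable by this user")
        return findings
    # Walk up to the deepest directory that does exist to see where the path breaks
    parent = path.parent
    while not parent.exists() and parent != parent.parent:
        parent = parent.parent
    findings.append(f"file missing; deepest existing directory is {parent}")
    if parent == path.parent:
        findings.append("directory exists: did the upstream job produce the file?")
    else:
        findings.append("directory tree missing: check mounts (NFS, container volumes) and configured paths")
    return findings
```

Run it as the job's user (for example via sudo -u <svcuser>) so the permission check reflects what the job actually sees.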

5) How do you handle a high-severity incident impacting multiple users?

Open the incident bridge and assign roles (Incident Commander, Comms, Scribe); triage with monitoring and logs; apply a safe workaround to restore service; communicate impact and ETA at a regular cadence; after the fix, document the RCA and prevention actions.

SQL – L2

6) How do you find the top 5 slowest running queries (by average time) in SQL Server?

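-- Top 5 cached statements by average elapsed time (DMV times are in microseconds;
-- stats cover only plans still in cache since the last restart or eviction)
SELECT TOP 5
  qs.query_hash,
  qs.execution_count,
  qs.total_elapsed_time / 1000.0 / NULLIF(qs.execution_count, 0) AS avg_elapsed_ms,
  SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
    ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
      ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_ms DESC;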

7) How can you check if a table is locked in Oracle or SQL Server?

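-- Oracle: which sessions hold locks on which objects
SELECT o.owner, o.object_name, s.sid, s.serial#, s.username, lo.locked_mode
FROM v$locked_object lo
JOIN dba_objects o ON o.object_id = lo.object_id
JOIN v$session s ON s.sid = lo.session_id;

-- SQL Server: table-level locks on one table in the current database
SELECT request_session_id, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT' AND resource_database_id = DB_ID()
  AND resource_associated_entity_id = OBJECT_ID('<schema>.<table>');
-- Row/page locks and blocking chains: EXEC sp_who2 (BlkBy column) or sys.dm_exec_requests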

8) What’s the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only matching rows. LEFT JOIN returns all rows from the left table plus matches on the right (NULLs when missing). Use LEFT when you must preserve left-side completeness.
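A small, self-contained demonstration using Python's built-in sqlite3 module. The customers/orders tables and their data are made up for illustration; the join semantics are the same in SQL Server and Oracle.

```python
import sqlite3


def join_demo():
    """Compare INNER vs LEFT JOIN on a tiny hypothetical customers/orders schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
        INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 99.0);
    """)
    inner = con.execute("""
        SELECT c.name, o.amount FROM customers c
        JOIN orders o ON o.customer_id = c.id
        ORDER BY c.name, o.amount
    """).fetchall()
    left = con.execute("""
        SELECT c.name, o.amount FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        ORDER BY c.name, o.amount
    """).fetchall()
    con.close()
    return inner, left


if __name__ == "__main__":
    inner, left = join_demo()
    print("INNER:", inner)
    print("LEFT: ", left)
```

Chen has no orders, so the INNER JOIN drops him while the LEFT JOIN keeps him with a NULL (None) amount. A filter on the right table in the WHERE clause (e.g., WHERE o.amount > 0) removes those NULL rows again and quietly turns the LEFT JOIN back into an inner one; put such filters in the ON clause instead.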

9) How do you identify and kill a long-running query in SQL Server?

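-- Identify (longest-running first; total_elapsed_time here is in milliseconds)
SELECT session_id, status, command, wait_type, blocking_session_id,
       start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE status <> 'background' AND session_id <> @@SPID
ORDER BY total_elapsed_time DESC;

-- Terminate (rolls back the open transaction, which can take time; confirm with the owner first)
KILL <session_id>;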

10) How can you optimize a query that is performing poorly?

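Start with the execution plan to find scans, bad row estimates, and expensive operators; add selective indexes on filter and join columns; select only the columns you need instead of SELECT *; keep predicates sargable (no functions or implicit conversions on indexed columns); rewrite correlated subqueries as joins where possible; keep statistics current; consider caching or materialized views where appropriate.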
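SQLite's EXPLAIN QUERY PLAN stands in here for SQL Server's execution plan, because it runs anywhere with Python's standard library; the table, index name, and data are hypothetical. The sketch shows the core habit: read the plan, add a selective index, and confirm the plan changed from a full scan to an index search.

```python
import sqlite3


def plan(con, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the human-readable detail


def index_demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                    [(i % 500, i * 1.5) for i in range(5000)])
    query = "SELECT id, amount FROM orders WHERE customer_id = ?"
    before = plan(con, query, (42,))   # expect a full table SCAN
    con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = plan(con, query, (42,))    # expect SEARCH ... USING INDEX
    con.close()
    return before, after


if __name__ == "__main__":
    before, after = index_demo()
    print("before:", before)
    print("after: ", after)
```

Exact plan wording varies between SQLite versions (SCAN orders vs SCAN TABLE orders). SQL Server expresses the same idea as a Table Scan or Index Scan versus an Index Seek.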

Windows/Linux Troubleshooting – L2

11) A Linux service is not starting. How do you troubleshoot?

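# Systemd basics
systemctl status <service>
journalctl -u <service> --since "30 min ago"    # or: journalctl -xeu <service>

# Config, ports & permissions
systemctl cat <service>                         # unit file: ExecStart, User, EnvironmentFile
sudo ss -tulpen                                 # is the port already taken? (or: netstat -tulnp)
sudo -u <svcuser> /path/to/binary --version     # can the service user run the binary?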

12) How do you check if a port is listening in Windows?

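# PowerShell (the same netstat/tasklist commands work in cmd, minus these comments)
netstat -ano | findstr :<PORT>        # look for LISTENING; last column is the PID
tasklist /FI "PID eq <PID>"           # map the PID to a process name
Get-NetTCPConnection -LocalPort <PORT> -State Listen   # PowerShell-native alternative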
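netstat shows the server side; a client-side connection test confirms the port is actually reachable from where you run it (a firewall can block a port that netstat lists as LISTENING). A minimal, cross-platform Python sketch, with is_listening as a hypothetical helper name:

```python
import socket
import sys


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (something accepts it)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name resolution failure
        return False


if __name__ == "__main__":
    host, port = sys.argv[1], int(sys.argv[2])
    print(f"{host}:{port} accepting connections: {is_listening(host, port)}")
```

Run it both on the server itself (127.0.0.1) and from a client machine: success locally but failure remotely points at a firewall or a service bound only to localhost.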

13) A Windows server shows 100% CPU usage. How do you proceed?

Use Task Manager/Resource Monitor to identify the process; check for runaway threads/memory leaks; capture a PerfMon snapshot or dump if repeatable; restart service with change control; apply known hotfixes if applicable.

14) How do you check disk space usage in Linux?

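df -h                                              # filesystem usage
df -i                                              # inode usage ("disk full" despite free space)
sudo du -xsh /var/* 2>/dev/null | sort -h | tail   # largest directories, same filesystem only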
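When du is unavailable or its output needs post-processing, a minimal Python sketch can do the same ranking. dir_sizes is a hypothetical helper; it sums apparent file sizes (like du --apparent-size, not allocated blocks), does not follow symlinks, and skips entries it cannot read.

```python
import os
import sys


def dir_sizes(root):
    """Return [(bytes, path)] for each immediate subdirectory of root, largest first."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # files directly under root are not ranked
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path, onerror=lambda e: None):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or permission denied
        results.append((total, entry.path))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var"
    for size, path in dir_sizes(root)[:10]:
        print(f"{size / 2**20:10.1f} MiB  {path}")
```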

15) DNS resolution is failing on a Linux machine. Steps?

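resolvectl status                 # or check /etc/resolv.conf on non-systemd hosts
getent hosts example.com          # resolves via nsswitch and /etc/hosts, like applications do
dig example.com @<dns_ip>         # query the DNS server directly
nc -vz <dns_ip> 53                # TCP/53 reachability only; DNS mostly uses UDP, so a dig timeout is the better UDP test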
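Applications resolve names through the system resolver (getaddrinfo, which honours /etc/nsswitch.conf and /etc/hosts), while dig talks to a DNS server directly. Comparing the two separates a local resolver or config problem from a DNS server problem. A minimal Python sketch; resolve is a hypothetical helper name:

```python
import socket
import sys


def resolve(name):
    """Resolve name through the system resolver, the same path most applications use.

    Returns a sorted list of IP strings; raises socket.gaierror on failure.
    """
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "example.com"
    try:
        print(host, "->", ", ".join(resolve(host)))
    except socket.gaierror as exc:
        print(host, "failed:", exc)  # then compare with: dig <host> @<dns_ip>
```

If resolve() fails but dig against the same server succeeds, look at resolv.conf, nsswitch.conf, or systemd-resolved rather than the DNS server itself.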

Autosys – L2

16) How do you check the status of a job in Autosys?

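autorep -j <job_name>        # summary: last start/end, status (ST), run, exit code
autorep -j <job_name> -d     # detailed report for the job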

17) How do you force start an Autosys job?

sendevent -E FORCE_STARTJOB -J <job_name>

18) What’s the difference between ON ICE and ON HOLD in Autosys?

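ON ICE (sendevent -E JOB_ON_ICE): the job does not run, and dependents treat it as if it had succeeded, so they can still run; after JOB_OFF_ICE it waits for its next start condition. ON HOLD (sendevent -E JOB_ON_HOLD): the job does not run and its dependents wait too; after JOB_OFF_HOLD it starts right away if its start conditions were met while it was held.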

19) How do you find the last run details of a job?

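autorep -j <job_name> -d        # latest run: start/end times, status, exit code
autorep -j <job_name> -r -1     # report for the previous run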

20) How do you troubleshoot a job failure in Autosys?

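Check the status and exit code (autorep -j <job_name> -d); review the files named by std_out_file/std_err_file (autorep -j <job_name> -q prints the JIL, including those paths); verify the JIL definition (command, machine, owner, conditions); inspect upstream dependencies; check credentials and system resources on the agent machine; re-run with verbose logging if safe.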

Monitoring Tools – L2

21) What is the difference between threshold-based and anomaly-based alerts?

Threshold: triggers when a metric crosses a fixed bound. Anomaly: flags deviations from learned baselines using historical patterns.
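The difference is easiest to see side by side. This Python sketch uses the simplest possible anomaly model, a z-score against recent history; the function names, the 3-sigma limit, and the CPU samples are illustrative assumptions, and real tools usually add seasonality (time of day, day of week) to the baseline.

```python
from statistics import mean, stdev


def threshold_alert(value, limit):
    """Fixed bound: fire when the metric exceeds a static limit."""
    return value > limit


def anomaly_alert(history, value, z_limit=3.0):
    """Learned baseline: fire when value lies more than z_limit standard
    deviations from the mean of recent history (simple z-score model)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is unusual
    return abs(value - mu) / sigma > z_limit


if __name__ == "__main__":
    quiet_night = [9, 10, 11, 10, 9, 11, 10]  # CPU % at 02:00 on recent nights
    print(threshold_alert(40, 80), anomaly_alert(quiet_night, 40))  # -> False True
```

A 40% CPU spike at 02:00 stays under an 80% threshold but is far outside the usual night-time baseline, while a normal 85% during a busy period trips the threshold without being anomalous.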

22) In Nagios, how do you check if a host is reachable?

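Check host status and state history in the web UI, or run the host check plugin manually on the Nagios server, e.g. <plugins_dir>/check_ping -H <host> -w 100.0,20% -c 500.0,60% (the plugin directory varies by install). check_nrpe -H <host> confirms the NRPE agent answers, which is a separate test from ICMP reachability.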

23) How do you integrate monitoring alerts with ServiceNow?

Forward events via webhook/API (Event Management/MID). Include CI, severity, dedup keys; map to incident assignment rules; test end-to-end with a synthetic alert.
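A minimal Python sketch of the payload side. It assumes events land in ServiceNow's em_event table (for example via a REST POST to the Table API at /api/now/table/em_event, or through a MID Server connector). The field set, the severity mapping, the host names, and the build_event helper are illustrative assumptions to verify against your instance. The key idea is a deterministic dedup key.

```python
import hashlib
import json

# Assumed mapping from monitoring states to Event Management severities
# (1 = Critical ... 4 = Warning, 0 = Clear); verify against your instance.
SEVERITY = {"CRITICAL": "1", "MAJOR": "2", "MINOR": "3", "WARNING": "4", "OK": "0"}


def build_event(source, node, metric, state, description):
    """Build an em_event-style record with a stable deduplication key.

    The same source/node/metric always produces the same message_key, so a
    repeating alert updates one ServiceNow alert instead of opening new ones.
    """
    dedup = hashlib.sha1(f"{source}|{node}|{metric}".encode()).hexdigest()
    return {
        "source": source,          # monitoring tool name
        "node": node,              # should match the CI name in the CMDB
        "type": metric,
        "metric_name": metric,
        "severity": SEVERITY[state],
        "description": description,
        "message_key": dedup,
    }


if __name__ == "__main__":
    event = build_event("nagios", "app-server-01", "http_5xx_rate", "CRITICAL",
                        "5xx rate above 5% for 10 minutes")
    print(json.dumps(event, indent=2))
```

Sending it is one authenticated HTTPS POST. During the end-to-end test, send the same synthetic alert twice and confirm the second event updates the existing alert rather than creating a new incident.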

24) In Splunk, how do you find events from the last 15 minutes for a specific error code?

index=app_logs error_code=500 earliest=-15m@m latest=now

25) What’s the difference between proactive and reactive monitoring?

Proactive: catches early signals before users are affected (e.g., SLO burn-rate alerts, rising saturation). Reactive: responds after a user-visible incident or complaint. Invest in both, with a bias toward proactive.
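A burn-rate alert is the standard example of proactive monitoring: it fires while the error budget is being consumed too fast, before the SLO is actually breached. A minimal Python sketch; the function names and the 14.4 threshold are assumptions (14.4 is a commonly cited fast-burn value that spends 2% of a 30-day budget in one hour).

```python
def burn_rate(error_ratio, slo):
    """How fast the error budget is being consumed.

    1.0 means the budget would be used up exactly over the SLO window;
    higher values exhaust it proportionally sooner.
    """
    budget = 1.0 - slo  # e.g. a 99.9% SLO leaves 0.1% of requests to fail
    return error_ratio / budget


def should_page(error_ratio, slo, threshold=14.4):
    """Page when the observed burn rate meets or exceeds the chosen threshold."""
    return burn_rate(error_ratio, slo) >= threshold


if __name__ == "__main__":
    # Hypothetical last hour: 2% of requests failed against a 99.9% SLO
    print(f"burn rate ~ {burn_rate(0.02, 0.999):.1f}x")
    print("page on-call:", should_page(0.02, 0.999))
```

Pairing a fast-burn window like this with a slower, lower-threshold window catches both sudden outages and gradual degradation.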
