L2 Application Support / SQL / OS / Autosys / Monitoring — Real Q&A (Concise Answers)

Ashwani Singh, August 13, 2025

Crisp, interview-ready answers, grouped by topic for quick revision.

Application Support – L2

1) How would you troubleshoot a web application that is running slow for some users but not all?

Correlate by cohort: affected locations/browsers/ISPs. Check network latency and CDN edges; review server CPU/RAM and thread/connection pools; examine app logs/APM traces for slow DB calls; verify load balancer health and stickiness; compare feature flags/config drift.
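
The cohort correlation above can be sketched in a few lines. This is a minimal illustration, assuming request records have already been exported from access logs or an APM tool; the field names (`region`, `browser`, `latency_ms`) and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_latency_by_cohort(requests, key):
    """Group request latencies (ms) by one attribute and return the median per group."""
    groups = defaultdict(list)
    for req in requests:
        groups[req[key]].append(req["latency_ms"])
    return {cohort: median(values) for cohort, values in groups.items()}


# Hypothetical sample records, not real measurements.
sample = [
    {"region": "eu-west", "browser": "chrome", "latency_ms": 120},
    {"region": "eu-west", "browser": "firefox", "latency_ms": 140},
    {"region": "ap-south", "browser": "chrome", "latency_ms": 900},
    {"region": "ap-south", "browser": "firefox", "latency_ms": 1100},
]

by_region = median_latency_by_cohort(sample, "region")
by_browser = median_latency_by_cohort(sample, "browser")
```

In this sample the split by region separates the slow users cleanly while the split by browser does not, which would point the investigation at network path or CDN edge rather than client code.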

2) What is the first step if you receive an application “500 Internal Server Error”?

Check application & web server logs for stack traces and failing modules. Confirm recent deployments/config changes. Validate database connectivity/credentials and environment variables.

3) How would you approach repeated application crashes during peak hours?

Inspect logs for unhandled exceptions/memory leaks; analyze resource trends; review thread/DB pool limits/timeouts; reproduce with load tests; propose code/infra fixes and plan a controlled rollout.

4) A batch job failed with a “File not found” error. What’s your next step?

Validate configured path and working directory; check permissions/ownership; confirm the upstream job produced the file and mounts/paths are correct (containers/NFS).
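
The path checks above could be scripted roughly as follows. This is a sketch only: the function name `diagnose_path` and the report keys are made up for illustration, and it assumes the script runs as the same user and from the same working directory as the batch job.

```python
import os
from pathlib import Path


def diagnose_path(path_str):
    """Report how a configured path resolves and where the lookup breaks."""
    path = Path(path_str)
    # Relative paths depend on the job's working directory.
    resolved = path if path.is_absolute() else Path.cwd() / path
    report = {"resolved": str(resolved), "exists": resolved.exists()}
    if report["exists"]:
        # Readability for the current user (run this as the job's user).
        report["readable"] = os.access(resolved, os.R_OK)
    else:
        # Walk up to the nearest directory that does exist: a missing mount or
        # a typo usually shows up as the first missing path component.
        ancestor = resolved.parent
        while not ancestor.exists() and ancestor != ancestor.parent:
            ancestor = ancestor.parent
        report["nearest_existing_dir"] = str(ancestor)
    return report
```

If `nearest_existing_dir` stops at a mount point, check the mount; if it stops at the expected input directory, the upstream job most likely did not produce the file.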

5) How do you handle a high-severity incident impacting multiple users?

Start the bridge, assign roles (Incident Commander, Comms, Scribe); use monitoring/logs for triage; apply safe workaround; communicate impact/ETA; after fix, document RCA and prevention actions.

SQL – L2

6) How do you find the top 5 slowest running queries (by average time) in SQL Server?

SELECT TOP 5
  qs.query_hash,
  qs.execution_count,
  qs.total_elapsed_time / NULLIF(qs.execution_count, 0) AS avg_elapsed_us, -- microseconds
  SUBSTRING(st.text, 1, 4000) AS sample_sql
FROM sys.dm_exec_query_stats AS qs  -- covers plans currently in cache only
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_us DESC;

7) How can you check if a table is locked in Oracle or SQL Server?

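-- Oracle: TM (table-level DML) locks; for TM locks, id1 is the object_id
SELECT s.sid, s.serial#, l.type, l.id1, l.id2, l.lmode, l.request
FROM v$lock l JOIN v$session s ON s.sid = l.sid
WHERE l.type = 'TM';

-- SQL Server: object-level locks in the current database
SELECT request_session_id, resource_type, request_mode, request_status,
       OBJECT_NAME(resource_associated_entity_id) AS locked_object
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT' AND resource_database_id = DB_ID();
-- or: EXEC sp_who2 (the BlkBy column shows the blocking session)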

8) What’s the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only matching rows. LEFT JOIN returns all rows from the left table plus matches on the right (NULLs when missing). Use LEFT when you must preserve left-side completeness.
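
A small runnable illustration of the difference, using Python's built-in `sqlite3` module as a stand-in database. The `customers` and `orders` tables and their rows are invented for the example; the join semantics shown are the same in SQL Server and Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Bilal'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 75.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN: every customer; amount is NULL (None) where no order exists.
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()
```

Here `inner_rows` has three rows and omits Chen, while `left_rows` has four rows and includes `('Chen', None)`.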

9) How do you identify and kill a long-running query in SQL Server?

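-- Identify (longest-running first; total_elapsed_time is in milliseconds)
SELECT session_id, status, wait_type, blocking_session_id, start_time, command, total_elapsed_time
FROM sys.dm_exec_requests
WHERE status <> 'background' AND session_id <> @@SPID
ORDER BY total_elapsed_time DESC;

-- Terminate (rolls back the session's open transaction, which can itself take time)
KILL <session_id>;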

10) How can you optimize a query that is performing poorly?

Create selective indexes; avoid SELECT *; inspect the execution plan; simplify predicates; reduce correlated subqueries; consider caching/materialized views where appropriate.
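
To make "inspect the execution plan" concrete, here is a minimal sketch that compares a plan before and after adding an index. It uses SQLite (via Python's `sqlite3`) purely as a convenient stand-in; SQL Server and Oracle have their own plan tools, and the table, index name, and data below are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT id, amount FROM orders WHERE customer_id = 42"

plan_before = query_plan(conn, query)  # full table scan expected
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, query)   # index search expected
```

The first plan reports a scan of `orders`; the second reports a search using `idx_orders_customer`. The same before/after comparison is the habit to carry over to production engines.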

Windows/Linux Troubleshooting – L2

11) A Linux service is not starting. How do you troubleshoot?

# Systemd basics
systemctl status <service>
journalctl -u <service> --since "30 min ago"

# Config & ports
ss -tulpen     # or: netstat -tulnp
sudo -u <svcuser> /path/to/binary --version  # sanity check

12) How do you check if a port is listening in Windows?

REM Find the listening socket; the last column is the owning PID
netstat -ano | findstr LISTENING | findstr :<PORT>
REM Map the PID to a process name
tasklist /FI "PID eq <PID>"
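
As a complement to `netstat`, a client-side check confirms that the port actually accepts TCP connections (a firewall can block a port that is listening). This is a minimal cross-platform sketch; the function name `is_port_open` is illustrative.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("127.0.0.1", 8080)` checks a local service; running the same call from another machine tests the network path as well.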

13) A Windows server shows 100% CPU usage. How do you proceed?

Use Task Manager/Resource Monitor to identify the process; check for runaway threads/memory leaks; capture a PerfMon snapshot or dump if repeatable; restart service with change control; apply known hotfixes if applicable.

14) How do you check disk space usage in Linux?

df -h                                          # filesystem usage
du -sh /var/* 2>/dev/null | sort -rh | head    # largest entries under /var first
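
For a scripted check (for example, inside a health-check job), the same information is available from Python's standard library. A minimal sketch; the function name and the 85% limit in the usage note are arbitrary illustrations.

```python
import shutil


def usage_percent(path="/"):
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return round(usage.used / usage.total * 100, 1)
```

A monitoring script might flag the filesystem when `usage_percent("/var") > 85`. The figure can differ slightly from the `Use%` column of `df`, which accounts for blocks reserved for root.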

15) DNS resolution is failing on a Linux machine. Steps?

resolvectl status                 # or check /etc/resolv.conf on non-systemd hosts
dig example.com @<dns_ip>         # query a specific resolver directly
nc -vz <dns_ip> 53                # TCP reachability only; most DNS queries use UDP 53
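
It also helps to test resolution the way applications see it, through the system resolver (which honours `/etc/hosts` and `nsswitch.conf`, unlike `dig`). A minimal sketch using Python's `socket` module; the function name `resolve` is illustrative.

```python
import socket


def resolve(hostname):
    """Resolve a name through the system resolver; return sorted unique addresses."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed (no such name, resolver unreachable, and so on).
        return []
    return sorted({info[4][0] for info in infos})
```

If `dig` succeeds against the DNS server but `resolve("example.com")` returns an empty list, the problem is likely in local resolver configuration rather than in the DNS server.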

Autosys – L2

16) How do you check the status of a job in Autosys?

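autorep -J <job_name>       # summary: last start/end time and status (ST column)
autorep -J <job_name> -d    # detail view; note that -q prints the JIL definition, not the status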

17) How do you force start an Autosys job?

sendevent -E FORCE_STARTJOB -J <job_name>

18) What’s the difference between ON ICE and ON HOLD in Autosys?

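ON HOLD: the job will not start, and jobs that depend on it wait because their starting conditions are not satisfied; once taken off hold, it runs if its own conditions are already met. ON ICE: the job is skipped, and conditions that reference it are treated as satisfied, so dependents can still run; once taken off ice, it waits for its starting conditions to occur again rather than running immediately.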
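
The effect on downstream jobs can be captured in a toy model. This is not an AutoSys API; it is a hypothetical simplification of how a `success(upstream)` condition is evaluated, written only to make the distinction easy to remember.

```python
def success_condition_met(upstream_status):
    """Toy model: does success(upstream) evaluate as satisfied?"""
    if upstream_status == "ON_ICE":
        # An iced job is skipped, so conditions on it count as met.
        return True
    # ON_HOLD, RUNNING, FAILURE, INACTIVE, etc. do not satisfy the condition.
    return upstream_status == "SUCCESS"


def can_start(job_status, upstream_statuses):
    """Toy model: may this job start, given its own and its upstream statuses?"""
    if job_status in ("ON_HOLD", "ON_ICE"):
        return False  # the job itself is parked
    return all(success_condition_met(s) for s in upstream_statuses)
```

In this model `can_start("INACTIVE", ["ON_ICE"])` is true while `can_start("INACTIVE", ["ON_HOLD"])` is false, which mirrors the practical rule: ice a job to skip it without blocking the chain, hold it to pause the chain.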

19) How do you find the last run details of a job?

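autorep -J <job_name> -d       # detailed report of the most recent run
autorep -J <job_name> -r -1    # the run before that (-r takes a relative run number)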

20) How do you troubleshoot a job failure in Autosys?

Review job stdout/stderr; verify JIL definition; inspect dependent jobs; check credentials/system resources; re-run with verbose logging if safe.

Monitoring Tools – L2

21) What is the difference between threshold-based and anomaly-based alerts?

Threshold: triggers when a metric crosses a fixed bound. Anomaly: flags deviations from learned baselines using historical patterns.
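
A minimal sketch of the two styles, assuming a simple z-score against recent history as the "learned baseline". Real monitoring tools use more elaborate models (seasonality, trend); the function names, the limit of 90, and the sample CPU values are hypothetical.

```python
from statistics import mean, stdev


def threshold_alert(value, limit):
    """Fire when the metric exceeds a fixed bound."""
    return value > limit


def anomaly_alert(value, history, z_limit=3.0):
    """Fire when the metric deviates strongly from its recent baseline."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat baseline: any change is a deviation
    return abs(value - mu) / sigma > z_limit


# Hypothetical CPU% samples hovering around 40.
cpu_history = [40, 42, 41, 39, 40, 43, 41, 40]
```

With this history, a reading of 60 does not trip `threshold_alert(60, 90)` but does trip `anomaly_alert(60, cpu_history)`, since it sits far outside the usual range even though it is below the fixed limit.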

22) In Nagios, how do you check if a host is reachable?

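Check the host's status in the web UI, or run the check_ping (or check_icmp) plugin from the Nagios server, then review the host's notifications and state history. NRPE checks verify the agent on the host rather than basic reachability.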

23) How do you integrate monitoring alerts with ServiceNow?

Forward events via webhook or REST API (Event Management, typically through a MID Server). Include the CI, severity, and a dedup key; map events to incident assignment rules; test end-to-end with a synthetic alert.
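
The dedup key is the part most worth getting right, so here is a sketch of building an event payload with a stable key. The field names (`source`, `node`, `metric_name`, `severity`, `description`, `message_key`) are an assumption modelled on ServiceNow Event Management event fields; verify them against your instance before use. Nothing is sent over the network here.

```python
import hashlib
import json


def build_event(source, node, metric_name, severity, description):
    """Build an event dict whose message_key is stable per (source, node, metric)."""
    # Same source + node + metric -> same key, so repeats update one alert
    # instead of opening a new one each time. Description is deliberately excluded.
    raw_key = f"{source}|{node}|{metric_name}"
    message_key = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()[:16]
    return {
        "source": source,
        "node": node,                # should match a CI name for binding
        "metric_name": metric_name,
        "severity": severity,
        "description": description,
        "message_key": message_key,  # assumed dedup field; confirm on your instance
    }


# Hypothetical synthetic alert for an end-to-end test.
synthetic = build_event("nagios", "app-server-01", "cpu_load", "2", "Synthetic test alert")
payload = json.dumps({"records": [synthetic]})
```

Sending two events for the same node and metric with different descriptions yields the same `message_key`, which is what lets the receiving side collapse them into one alert.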

24) In Splunk, how do you find events from the last 15 minutes for a specific error code?

index=app_logs error_code=500 earliest=-15m@m latest=now

25) What’s the difference between proactive and reactive monitoring?

Proactive: catches early signals and prevents impact (e.g., SLO burn-rate alerts, saturation trends). Reactive: responds after a user-visible incident. Invest in both, with a bias toward proactive.
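
As an example of a proactive signal, an SLO burn rate compares the observed error rate with the error budget the SLO allows. A minimal sketch; the function name and the sample numbers are illustrative, and the alerting threshold is a policy choice not covered here.

```python
def burn_rate(errors, total, slo_target):
    """Observed error rate divided by the error budget (1 - SLO target).

    A value of 1.0 means the budget is being consumed exactly at the rate
    the SLO allows; higher values mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic, nothing burned
    error_budget = 1.0 - slo_target
    return (errors / total) / error_budget


# Hypothetical window: 50 failed requests out of 10,000 against a 99.9% SLO.
rate = burn_rate(errors=50, total=10_000, slo_target=0.999)
```

Here the error rate is 0.5% against a 0.1% budget, so the budget is burning at roughly five times the sustainable pace, well before most users would report a problem.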

