
  • Jon-Bonso

    Administrator
    May 8, 2020 at 6:09 pm

    Hi Varun,

    Not quite. An MTU problem does not always show up right from the beginning; whether it does depends on several factors.

    The scenario is basically asking how you can troubleshoot the issue in your Redshift cluster. All three answer options are based on the official AWS documentation:

    https://docs.aws.amazon.com/redshift/latest/dg/queries-troubleshooting.html#queries-troubleshooting-query-hangs

    As for the maximum transmission unit (MTU) size, this issue is caused by packet drop when the MTU size differs along the network path between two Internet Protocol (IP) hosts.

    In my opinion, it depends on the packet size. If the requests during those first few days only involve small packets, there will be no problem and the Redshift cluster will work as expected. However, the problem will surface once a host sends a packet that is larger than the MTU of the receiving host or of a device along the path, as per the official AWS documentation:

    If a host sends a packet that’s larger than the MTU of the receiving host or that’s larger than the MTU of a device along the path, the receiving host or device returns the following ICMP message: Destination Unreachable: Fragmentation Needed and Don’t Fragment was Set (Type 3, Code 4). This instructs the original host to adjust the MTU until the packet can be transmitted.

    https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#path_mtu_discovery
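    To make the path MTU discovery behavior quoted above more concrete, here is a minimal Python sketch that simulates it. It is a toy model, not how a real network stack works: the hop MTUs (9001, 1500, 9001) are assumed values, every packet is assumed to have the Don't Fragment flag set, and the `icmp_allowed` switch is a hypothetical knob standing in for a firewall or security group that does or does not let the Type 3, Code 4 message back to the sender. It shows that small packets cross the path fine, large ones trigger the ICMP message, and if that message is blocked the sender never learns to shrink its packets.

```python
from dataclasses import dataclass
from typing import List, Optional

# ICMP message text from the AWS EC2 MTU documentation.
ICMP_FRAG_NEEDED = (
    "Destination Unreachable: Fragmentation Needed and "
    "Don't Fragment was Set (Type 3, Code 4)"
)


@dataclass
class SendResult:
    delivered: bool
    icmp_message: Optional[str] = None
    next_hop_mtu: Optional[int] = None  # MTU reported back in the ICMP message


def send(packet_size: int, hop_mtus: List[int], icmp_allowed: bool = True) -> SendResult:
    """Forward one packet (Don't Fragment set) across each hop in order."""
    for mtu in hop_mtus:
        if packet_size > mtu:
            if icmp_allowed:
                # The hop drops the packet and tells the sender its MTU.
                return SendResult(False, ICMP_FRAG_NEEDED, mtu)
            # ICMP is blocked: the packet is silently dropped.
            return SendResult(False)
    return SendResult(True)


def path_mtu_discovery(size: int, hop_mtus: List[int], icmp_allowed: bool = True,
                       max_attempts: int = 10) -> Optional[int]:
    """Shrink the packet size until it gets through; None if it never does."""
    for _ in range(max_attempts):
        result = send(size, hop_mtus, icmp_allowed)
        if result.delivered:
            return size
        if result.next_hop_mtu is None:
            return None  # no feedback, so the sender cannot adjust
        size = result.next_hop_mtu
    return None


hops = [9001, 1500, 9001]  # assumed: jumbo-frame hosts with a 1500-MTU device between them
print(send(1200, hops).delivered)                          # small packet gets through
print(send(9001, hops).icmp_message)                       # full-size jumbo packet is rejected
print(path_mtu_discovery(9001, hops))                      # sender adjusts to 1500
print(path_mtu_discovery(9001, hops, icmp_allowed=False))  # ICMP blocked: never adjusts
```

    The last case is the one that matches the "query appears to hang" symptom: the packet is dropped and the sender keeps retrying at the same size.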

    Hence, the option that says: “Reduce the size of maximum transmission unit (MTU).” is still valid for this scenario.

    The MTU size determines the maximum size, in bytes, of a packet that can be transferred in one Ethernet frame over your network connection. If the packets are small during those first few days (perhaps because the cluster is only being used for initial testing), then everything may still work. But once the cluster is used fully, the host may start sending much larger packets, and that is what triggers this issue.

    This issue is also mentioned here: https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-drop-issues.html

    Take note of the word “Sometimes”:

    Queries Appear to Hang and Sometimes Fail to Reach the Cluster

    You experience an issue with queries completing, where the queries appear to be running but hang in the SQL client tool. Sometimes the queries fail to appear in the cluster, such as in system tables or the Amazon Redshift console.

    Possible solution:

    This issue can happen due to packet drop, when there is a difference in the maximum transmission unit (MTU) size in the network path between two Internet Protocol (IP) hosts. The MTU size determines the maximum size, in bytes, of a packet that can be transferred in one Ethernet frame over a network connection. In AWS, some Amazon EC2 instance types support an MTU of 1500 (Ethernet v2 frames) and other instance types support an MTU of 9001 (TCP/IP jumbo frames).
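    The two MTU values above translate directly into how much data fits in a single packet. Here is a rough calculation, assuming IPv4 and TCP headers of 20 bytes each with no options (real connections often carry TCP options such as timestamps, which lower the figure slightly). Note that the MTU counts the IP packet only; the Ethernet header is extra.

```python
IPV4_HEADER_BYTES = 20  # assumed: no IP options
TCP_HEADER_BYTES = 20   # assumed: no TCP options


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


for mtu in (1500, 9001):
    print(f"MTU {mtu}: up to {max_tcp_payload(mtu)} bytes of data per packet")
```

    That works out to 1460 bytes versus 8961 bytes, so a host using jumbo frames can put roughly six times as much data into each packet. Those full-size packets are exactly what gets dropped when a 1500-MTU device sits somewhere on the path.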

    The official AWS documentation supports the provided answer, but if you are not fully convinced, feel free to test this on your own AWS account:

    – Launch a single-node Redshift (dc2.large) cluster. Use your SQL client tool to send and receive small packets with a simple SELECT statement. Then do the opposite and run a complex INSERT, or a COPY command that loads data into a table from a data file. This drives up the packet size, and your request may fail with the following ICMP message:

    Destination Unreachable: Fragmentation Needed and Don’t Fragment was Set (Type 3, Code 4).

    This message instructs the originating host to use the lowest MTU size along the network path to resend the request. Without this negotiation, packet drop can occur because the request is too large for the receiving host to accept.
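    A rough way to see why the SELECT and the INSERT behave differently in that test is to compare how big the statements are on the wire. The sketch below is a back-of-the-envelope estimate only: the `sales_staging` table and its columns are hypothetical, it ignores any protocol framing the SQL driver adds, and it assumes the client host sends jumbo frames (MTU 9001) while some device on the path only accepts 1500.

```python
CLIENT_MTU = 9001   # assumed: client host uses jumbo frames
PATH_MTU = 1500     # assumed: smallest MTU somewhere on the path
HEADER_BYTES = 40   # IPv4 + TCP headers, no options
MSS = CLIENT_MTU - HEADER_BYTES


def statement_bytes(sql: str) -> int:
    """Size of the SQL text itself, in bytes."""
    return len(sql.encode("utf-8"))


def build_insert(table: str, rows: int) -> str:
    """Build a hypothetical multi-row INSERT statement."""
    values = ", ".join(f"({i}, 'customer_{i}', {i * 1.5})" for i in range(rows))
    return f"INSERT INTO {table} VALUES {values};"


def largest_packet(payload_bytes: int) -> int:
    """Size of the biggest IP packet needed to carry this payload."""
    return min(payload_bytes, MSS) + HEADER_BYTES


select_sql = "SELECT 1;"
insert_sql = build_insert("sales_staging", 5000)

for name, sql in (("SELECT", select_sql), ("INSERT", insert_sql)):
    packet = largest_packet(statement_bytes(sql))
    status = "exceeds" if packet > PATH_MTU else "fits within"
    print(f"{name}: {statement_bytes(sql)} bytes, largest packet {packet} bytes, "
          f"{status} path MTU {PATH_MTU}")
```

    The small SELECT never comes close to 1500 bytes, so it works regardless of the path. The large INSERT fills full jumbo-size packets, which is where the ICMP message (or, if that message is blocked, a silent drop) comes in.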

    IMPORTANT REMINDER: A Redshift cluster is quite expensive, so please delete it once you are done testing.

    https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

    I’ve developed and released a lot of enterprise applications for various clients. The incoming traffic on these apps varies widely, and most of the time they only reach peak load after a few days. Multinational investment banks and other large enterprises usually deploy their applications on a weekend so that it won’t affect their BAU (business-as-usual) operations. Say you deploy on a Saturday and do some smoke testing on Sunday. The issue might only be discovered on Monday, when all of the users are actively using the application.

    Regards,

    Jon Bonso